Google Gemini 2.0 vs Competition: DeepSeek R1 and OpenAI O3-mini (video, 5m)
Fireship recently released a video about Google’s new AI model, Gemini 2.0. The video opens with a joke about the dismay of JavaScript framework fans, who would rather see another framework video than more AI coverage. Although Gemini 2.0 did not take the top spot in current rankings, it brings concrete benefits that could reshape the market. Fireship argues that Gemini 2.0's greatest asset is its cost-effectiveness, offering strong performance at a fraction of competitors' prices. The model can process thousands of PDF pages, delivering better accuracy at much lower operating costs than older models like GPT-4o.
Google stands to gain a lot from Gemini. Fireship points out that while the company has struggled with various issues, such as a monopoly ruling and a controversial image generator, Gemini looks like a step in the right direction. The model appears to be a serious contender in the AI race, offering advanced features at very low prices. Its drastic price cuts, over 90% cheaper than competitors, have made it popular, especially among users looking to save costs.
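The "over 90% cheaper" claim can be checked with simple arithmetic using the per-million-token prices quoted in the video ($10 for GPT-4o versus $0.40 for Gemini Flash 2.0). Note these are the video's figures, not current official pricing:

```python
# Per-million-token prices as quoted in the video (not official current pricing).
GPT4O_PER_M = 10.00   # USD per 1M tokens for GPT-4o
FLASH2_PER_M = 0.40   # USD per 1M tokens for Gemini Flash 2.0

def cost_usd(tokens: int, price_per_m: float) -> float:
    """Cost in USD for a given token count at a per-million-token rate."""
    return tokens / 1_000_000 * price_per_m

# Relative discount of Flash 2.0 versus GPT-4o.
discount_pct = (1 - FLASH2_PER_M / GPT4O_PER_M) * 100
print(f"Discount: {discount_pct:.0f}%")                      # 96%
print(f"10M tokens on GPT-4o:   ${cost_usd(10_000_000, GPT4O_PER_M):.2f}")
print(f"10M tokens on Flash 2:  ${cost_usd(10_000_000, FLASH2_PER_M):.2f}")
```

At these rates the discount works out to 96%, which matches the video's "over 90% cheaper" framing.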
In the video, Fireship also discusses Gemini 2.0's flexibility in context processing. The Flash 2.0 model accepts up to 1 million tokens, a huge advantage over the competition. For app development and data-heavy workloads, this capacity exceeds what most competing models currently offer. Notably, all of these models can be used for free in Google's chatbot, making them accessible to the general public, not just developers.
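The transcript translates the 1-million-token window into roughly 100,000 lines of code or 16 novels. A quick sketch shows the arithmetic behind those equivalences; the tokens-per-line and tokens-per-novel figures below are illustrative assumptions, not measured values:

```python
# Context-window sizes as stated in the video.
FLASH_CONTEXT = 1_000_000   # tokens (Flash 2.0)
PRO_CONTEXT = 2_000_000     # tokens (Pro)

# Assumed rough conversion factors (not from the video):
TOKENS_PER_CODE_LINE = 10     # average tokens per line of source code
TOKENS_PER_NOVEL = 62_500     # ~50k words per novel * ~1.25 tokens/word

lines_of_code = FLASH_CONTEXT // TOKENS_PER_CODE_LINE
novels = FLASH_CONTEXT // TOKENS_PER_NOVEL
print(lines_of_code)  # 100000 lines of code
print(novels)         # 16 novels
```

With these assumptions, 1M tokens lands exactly on the video's "100,000 lines of code or 16 novels" figures, and the Pro model's 2M-token window doubles both.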
Fireship also analyzes benchmark results, noting that while Gemini 2.0 is making clear strides in artificial intelligence, it is not yet the best choice for higher-level intellectual tasks such as PhD-level math and science. It does, however, currently lead the LM Arena benchmark, outperforming most rivals, while in the WebDev Arena benchmark it ranks fifth. Despite these shortcomings, the video illustrates the progress Google has achieved.
In conclusion, Fireship highlights that although Google’s Gemini 2.0 has its limitations, its capabilities and affordability make it highly competitive in the AI domain. As of the time of writing this article, the YouTube video has garnered 1,090,393 views and 42,281 likes, reflecting growing interest in the topic.
Timeline summary
- Google releases Gemini 2.0, sparking concerns among JavaScript developers.
- Discussion on various models of Gemini 2.0 and Google's standing in the AI competition.
- Gemini 2.0 proves to have advantages in real-world use cases and cost efficiency.
- Examining Gemini's capability to process large amounts of data with high accuracy.
- Overview of Google's recent challenges and successes in the AI sector.
- Gemini's affordability makes it a popular choice among users.
- Cost comparison between Gemini and other AI models highlights its significant savings.
- Access to Gemini's models in a chatbot format for non-developers.
- Encouragement to use Gemini for summarizing content efficiently.
- Gemini's natural interaction quality is highlighted, enhancing user experience.
- Gemini ranks well on the LM Arena benchmark, despite some limitations.
- Google's open-source contributions and the revival of the Pebble watch.
- Introduction to Zavala, a deployment platform, as a key tool for developers.
- Zavala simplifies application deployment and management for developers.
- Closing remarks and a call to action for viewers to try Zavala.
Transcription
Yesterday, Google unleashed Gemini 2.0 on the public, which unfortunately just caused Fireship to drop yet another AI video to the dismay of all the JavaScript framework bros. As is tradition, this new large language model comes in a variety of confusingly named flavors, and it looks like Google is about to take yet another L on the AI race. Its most jacked deep thinking model comes in behind both OpenAI O3 Mini High and DeepSeek R1. That's on LiveBench, and anything less than first place is a failure for Google. However, the release of Gemini 2.0 is actually the biggest win Google's had in the AI race so far, because it beats the competition on some of the most real world use cases, and it does everything at a fraction of the cost. Like this guy on the internet explained how Gemini can process 6,000 pages of PDFs, and nothing else even comes close at the same cost, and Gemini does it with better accuracy. That's just one example of many, and in today's video, we'll take a closer look at Gemini to find out why you need to stop making fun of it. It is February 6th, 2025, and you're watching The Code Report. Google has taken a lot of L's in recent years. There was the Monopoly conviction, the overly woke image generator, and yesterday Alphabet stock dropped because their cloud revenue didn't hit expectations. But there have been many dubs within these L's. After deciding not to not be evil, their AI killbots have been selling like hotcakes, but more importantly, Gemini is a legit contender to win the AI race because it's good enough, smart enough, and doggone it, people like it, mostly because it's cheap. Not just a little bit cheaper, but like over 90% cheaper. Like to get a million tokens out of GPT-4o, it's gonna cost you 10 bucks, but to get a million tokens out of Gemini Flash 2, which is supposedly a better model than 4o, it'll only cost you 40 cents, which is nearly a 100% discount.
That's even cheaper than DeepSeek, well at least it was until they slashed their prices, although the true value of DeepSeek is that it's open source. Gemini also has a Lite model that's even less expensive and faster, and then there's a bigger Pro model that's more expensive. What's really awesome though is that if you're not a developer using the API, all these models can be used for free in the chatbot, and they can do things that no other LLM can do, like watch YouTube videos. In fact, if you're watching this video by putting your eyeballs in front of a screen, you're falling behind. What you should do right now is go to the DeepThinking model with Experimental Apps and have Gemini summarize this video for you, so you can get back to work playing Civilization 7. What's really crazy though is that Flash has a 1 million token context window, and that goes up to 2 million on the Pro model. That's like 100,000 lines of code, or 16 novels, and that means you can feed it way more data as a starting point compared to O3 Mini and DeepSeek, which are limited to 128k tokens. And it's a terrifying feature if you have a vector database or RAG startup, because that's way more context than most people will ever need. But another crazy thing I tried was chatting with Gemini using Flash 2.0. When your kids ask questions, like why does water always stay level even though we supposedly live on a curved ball, Gemini will talk to you in a way that feels so natural that you almost forget you're in Uncanny Valley. Yo, the Earth's hella round, but water's so chill it just does its own thing, and creates these totally flat vibes. Now when it comes to benchmarks, Gemini still falls behind OpenAI O3, and it's not the best choice if you're doing PhD level math and science. But surprisingly, Gemini is currently on top of the LM Arena benchmark, which is basically a blind taste test where people try out different LLMs and rank them.
It beats everything including DeepSeek and O1, although O3 is not on this list yet. However, if you're a web developer, a better benchmark is WebDev Arena, and there Gemini 2 comes in 5th, tied with O3 Mini. Meanwhile Sonnet and DeepSeek are on top, which aligns with my experience coding with these models in tools like Cursor. And it's also worth noting that Google's Imagen is currently sitting on top of the text to image leaderboard. Gemini is proprietary, but Google did give the open source community a big win recently by open sourcing the operating system for the Pebble watch, which died a long time ago but was the best smartwatch ever made. But in case you haven't heard, they're actually bringing the Pebble watch back. In addition, Google does have an open family of LLMs called Gemma, but they're gonna need a big update to compete with things like DeepSeek. But if you're building an app with tech like this, one of the most important choices you'll have to make is where to deploy your code, and that's why you need to know about Zavala, the sponsor of today's video. If you're old enough to remember when Heroku was actually good, Zavala is like a superior modern successor, where you can deploy entire full stack applications, databases, and static websites, all backed by Google Kubernetes Engine and Cloudflare. But most importantly, without all these painful YAML configs, when you finish grinding your fingers to the bone building an app with your favorite framework, you can easily ship it to production by one, connecting a Git repo or Docker image to Zavala, two, provisioning some resources, and finally three, clicking the deploy button. Not only will it host your web application and database, backed with DDoS protection, a CDN, and edge caching, and a beautiful graph to visualize it, but you can also fully automate getting your code from development to production by building CI/CD pipelines.
Give Zavala a try for free right now with $50 in free credits using the link below. This has been the Code Report, thanks for watching, and I will see you in the next one.