Grok 3 - the new king of AI from Elon Musk? (movie, 4m)

Just hours ago, yet another deep-thinking large language model hit the timeline, crushing existing benchmarks and reaching the coveted number one spot on the LM Arena leaderboard. This new model is none other than Elon's based and red-pilled Grok 3. Not only is it smart as hell, but it's also mostly uncensored, and will generate content that's illegal in many parts of the world. It has a deep-thinking mode like DeepSeek R1, it can apparently do text-to-video, and in the near future will have a paid subscription for something even more powerful called SuperGrok. I'm already paying for Twitter Premium Plus just to access Grok 3, so that's a slap in the face, but in today's video, we'll find out what makes Grok special, how they trained it, and if it truly is the best LLM in the world. It is February 18th, 2025, and you're watching The Code Report.

Last week, Elon attempted to troll OpenAI by offering to buy it out. Not surprisingly, the OpenAI board promptly rejected this offer, and Sam Altman is still on track to make it for-profit and get his big payday. The AI Game of Thrones is ruthless, and Mark Zuckerberg, one of the big players vying for the throne, took a big loss last week when it was revealed that he signed off on using 82 terabytes of pirated books to train Meta's Llama models. The books were obtained through the Library Genesis project, which contains millions of books and paywalled articles. I just can't believe Zuckerberg would do something like that, though, said nobody ever.

When it comes to training AI, though, one thing that's special about Grok is that it's the only model that has direct access to the firehose of data from Twitter, and xAI developers have optimized this model for maximum truth-seeking, even if that comes at the expense of being politically correct. That means you can use it to do things like generate images of celebrities or have it write a profanity-laced poem about racial stereotypes.
In the name of science, I tried this prompt on every LLM, and it was blocked on every single one of them except for Grok. The response it gave me is so offensive that I can't even show it here on YouTube, and if you posted something like this in a country that doesn't have the right to free speech, you could quite literally go to jail. Despite that, Grok 3 should be available in countries like Germany and the UK soon.

That's a huge win if you're a professional internet troll, but how good is Grok in reality? Well, currently it's sitting on top of LM Arena, which is basically a blind taste test where humans compare different LLMs side by side, and reaching the top means that it's pretty dang good. In addition, we have another benchmark showing Grok beating Gemini, Claude, DeepSeek, and GPT-4 when it comes to math, science, and coding. However, conveniently, it's missing OpenAI o3, and when you add that model in, it paints a much different picture. It's also missing the Codeforces and ARC-AGI benchmarks, and benchmarks are almost always cherry-picked for obvious reasons. The only thing I care about is my own proprietary vibe check, and it did things like generate valid Svelte 5 code in one shot, and helped me build a game in Godot. Overall, it did a great job and appears to be plateauing at the same level as all the other state-of-the-art models. Recently, the AI hype has shifted from creating bigger, better base models to creating better prompting frameworks like Deep Research and Big Brain mode.

Another interesting detail about Grok, though, is that they provided details on how it was actually trained at the Colossus supercomputer in Memphis, Tennessee, which is currently believed to be the world's largest AI supercomputer. It contains a cluster of over 200,000 NVIDIA H100 GPUs, with plans to expand to 1 million GPUs.
The facility uses so much electricity that they can't get it all from the grid, and brought in a bunch of portable diesel generators just to power the thing. And they're going to need all that power when SuperGrok comes out, which is expected to cost $30 per month, a highly competitive price compared to ChatGPT Pro at $200 per month.

As a developer, I'm already going broke paying for Claude, Cursor, Gemini, ChatGPT, Copilot, Codium, Midjourney, and WatsonX, but for some reason my code quality is worse than ever. A better strategy to learn how to code is to start from the ground up, and you can do that today for free thanks to this video's sponsor, Brilliant. Their platform provides interactive, hands-on lessons that demystify the complexity of deep learning. With just a few minutes of effort each day, you can understand the math and computer science behind this seemingly magic technology. I'd recommend starting with Python, then checking out their full How Large Language Models Work course if you really want to look under the hood of ChatGPT. Try everything Brilliant has to offer for free for 30 days by going to brilliant.org/fireship or using the QR code on screen.

This has been The Code Report. Thanks for watching, and I will see you in the next one.
