OpenAI Codex and the o4-mini model vs Claude, Copilot, and Firebase Studio (video, 5 min)
Fireship released a new episode covering the launch of two new reasoning models from OpenAI: o3 and o4-mini. Some people claim these models are at or above genius level, which the host jokes should put room-temperature superconducting hoverboards right around the corner. He riffs on an old saying: fool me once, shame on you; fool me o4 times, shame on me. Still, this time it might be the real deal. Fireship emphasizes that OpenAI is shipping at an incredible pace, with GPT-4.1 released days earlier and GPT-4o image generation and GPT-4.5 weeks before that. The new models are supposed to be very good at writing code, although the host jokingly asks viewers to stop posting comments on their videos.
- OpenAI released new reasoning models, o3 and o4-mini, with claims of genius-level performance.
- Skepticism around AI genius claims is highlighted with a humorous saying.
- OpenAI is actively releasing new models, including GPT-4.1 and others.
- The new models are lauded for their code-writing capabilities.
- OpenAI introduced an open-source CLI tool, Codex, similar to Claude Code.
- Despite numerous pricey tools, the speaker's code quality remains poor.
- Silicon Valley is in a competitive rush to win over software engineers.
- Rumors circulate that OpenAI is in talks to acquire Windsurf for $3 billion.
- Google's Firebase Studio, a development tool, generates code quickly.
- The current landscape for development tools is chaotic, prompting a test of o4-mini.
- The AI struggles with vague requirements but tries to build a YouTube clone.
- Comparison with Claude Code reveals similar difficulties in generating code.
- Firebase Studio is faster but struggles with specific requests.
- The overall effectiveness of AI tools is questioned, with their flaws highlighted.
- Developers are encouraged to keep building despite the challenges.
- Mux is introduced as a solution to video integration challenges for developers.
- A call to action to try Mux for free, emphasizing its scalability.
Transcription
Yesterday, OpenAI released two new reasoning models, o3 and o4-mini, and people claim that they're at or above genius level. That means room temperature superconducting hoverboards should be right around the corner, but these claims of AI genius feel like deja vu all over again. In San Francisco, there's an old saying that goes, fool me once, shame on you. Fool me o4 times, shame on me. But maybe this time it's the real deal. One thing's for sure though, OpenAI is shipping like crazy. This comes just days after they released GPT-4.1, and just weeks after 4o image gen and GPT-4.5. Hopefully they can use the genius of o4 to not create such confusingly stupid names. Remember, today we're talking about o4, not 4o, so try to keep up. It is April 17th, 2025, and you're watching The Code Report. These new reasoning models are supposed to be really super good at writing code, but you guys really need to stop posting these comments on their videos. Because this guy in a $2 million car has been following me around, and I have a bad feeling about it. But the good news is that OpenAI also released an open source CLI tool to go along with it called Codex. It's basically the exact same thing as Claude Code, which can write, execute, and analyze code directly from your terminal or IDE. In today's video we'll try out Codex, but I already pay thousands of dollars a month to vibe code with Lovable, Windsurf, Cursor, Firebase Studio, Claude Code, Copilot, Devin, Augment, and Bolt, yet my code quality is worse than ever. I know it's probably just a skill issue, but one thing's for sure, there's a massive arms race going on right now in Silicon Valley to capture the hearts, minds, and wallets of software engineers, especially the smart lazy ones who don't want to write code. The global economy might be on the verge of collapse, but the code shovel business is booming right now. In fact, there's a rumor that OpenAI is in talks to buy Windsurf for $3 billion.
And Windsurf is just a VS Code fork that adds a few AI bells and whistles. I'm starting to regret investing all my time working on Horse Tinder when I should have just forked VS Code and put a price tag on it. Like, Cursor, another VS Code fork, is doing $100 million in annual revenue. However, VS Code is built by Microsoft, and Microsoft is the biggest player in the developer tooling race. I believe their goal is to embrace, extend, and extinguish coders, I mean that in a good way of course, and they just released a massive upgrade to Copilot, called Agent Mode, that many people are calling the Cursor or Windsurf killer. Like OpenAI Codex and Claude Code, it can create files and run commands, and it integrates with Model Context Protocol servers. That's pretty cool, but at the moment, many people regard Gemini 2.5 as the best programming model out there. People will bow to it. And I'd say I have to agree. Last week, Google released Firebase Studio, which was formerly known as Project IDX, and it's a browser-based fork of VS Code that's hosted by Google. Not only does it generate code with Gemini 2.5, but it can also host and deploy that code automatically. The tooling situation for developers right now is more chaotic than I've ever seen in my life, but now let's try out o4-mini in OpenAI Codex to find out if it's truly a genius. To use it, you install it with npm, and then set an OpenAI API key as an environment variable. From there, you can run the codex command, and then give it a prompt. I'll go ahead and ask it to build something simple, like a YouTube clone. Apparently those requirements were not clear enough, so it asked me to clarify. As a developer, I feel its pain after getting years of half-assed requirements from clients. From there, it took a really long time to think, and then asked me to confirm a bunch of different actions. The end result was a bunch of empty directories, although I could see the code that it was trying to write in the terminal.
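The setup steps mentioned above (install with npm, set an OpenAI API key as an environment variable, run the `codex` command with a prompt) can be sketched as a small pre-flight check. This is only an illustration, not part of the video: the helper name `build_codex_command` is hypothetical, the npm package name in the comment is an assumption about how the CLI is published, and the sketch only builds the command line rather than running it.

```python
import os

# Assumed install step (package name is an assumption, not stated in the video):
#   npm install -g @openai/codex


def build_codex_command(prompt, env=None):
    """Return the argv list for a `codex "<prompt>"` invocation.

    Hypothetical helper: it only checks that OPENAI_API_KEY is set in the
    given environment mapping and builds the command. It does not run it.
    """
    env = os.environ if env is None else env
    if not env.get("OPENAI_API_KEY"):
        # The video says the key must be set as an environment variable.
        raise RuntimeError("OPENAI_API_KEY is not set")
    return ["codex", prompt]
```

To actually launch the tool, the returned list could be passed to `subprocess.run`, assuming the `codex` executable is on the PATH. As the video shows, a vague prompt such as "build a YouTube clone" may simply lead the tool to ask for clarification.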
The likely explanation is that I'm a giga-chad using Windows for development, but if you're a 0.10x developer on macOS, things should go a lot smoother. I asked it to write Svelte 5 code with runes, and still in 2025, it failed to do so. Now as a control, I also ran the same prompt in Claude Code and Firebase Studio. Claude Code also took an extremely long time, but it did figure out how to run the commands on Windows. But like all the other AI models, it can't seem to figure out how to write Svelte 5 rune code. It tried by using the dollar sign, but that ultimately resulted in an error and a non-working app. Let's see if Firebase Studio can generate a YouTube clone any better. The first thing you'll notice is that Firebase Studio is at least 10 times faster. However, when I asked it to generate Svelte 5 code, it completely ignored me and wrote everything in Next.js. That's disappointing, but like I've said before, if you want to be a good vibe coder, you should probably just use the most popular technologies like React. The thing about Firebase Studio, though, is that it's way easier to work with AI when it's integrated directly into an IDE and environment like this. The final verdict, though, is that they all kind of suck in their own special ways. Don't fall for OpenAI's genius hype, but also don't fall for the AI doomers who say these tools are worthless. Life is electric, and it's the greatest time ever to be a developer. Get out there and ship as much AI slop as you possibly can like you're on a mission from the gods. But if you try to build a YouTube clone like me with streaming video, you're going to want to know about Mux, the sponsor of today's video. If you've ever had to integrate video into an app, you know how easy it is to get started, but how difficult it is to get right. This is where API-first video infrastructure from Mux can help out.
Not only will it host and encode your videos for adaptive bitrate streaming, but it also provides real-time analytics, automatic thumbnail generation, and even live streaming, all through an API that's highly customizable. It can handle video streaming at your startup with zero users, but scales up for big companies like Substack, Patreon, and HubSpot. If you're a developer looking to build awesome video features, try it out for free right now at mux.com slash fireship. This has been The Code Report, thanks for watching, and I will see you in the next one.