OpenAI Codex and the o4-mini model vs. Claude, Copilot, and Firebase Studio (video, 5 min)
Fireship has published a new episode covering OpenAI's release of two new reasoning models, o3 and o4-mini. Some people describe these models as genius-level, which prompts the joke that room-temperature superconducting hoverboards must be right around the corner. The host quotes an "old San Francisco saying": fool me once, shame on you; fool me o4 times, shame on me. Still, maybe this time it really is the real deal. Fireship stresses that OpenAI is shipping new products at a breakneck pace, as shown by the recent releases of GPT-4.1 and GPT-4o image generation. The new models are supposed to be very good at writing code, though the host jokingly asks viewers to stop posting certain comments on their videos.
Timeline summary
- OpenAI released new reasoning models, o3 and o4-mini, which people claim are at genius level.
- Skepticism about claims of AI genius is underscored with a humorous saying.
- OpenAI is actively shipping new models, including GPT-4.1 and others.
- The new models are praised for their code-writing abilities.
- OpenAI introduced Codex, an open-source CLI tool similar to Claude Code.
- Despite paying for many expensive tools, the host's code quality remains low.
- A competitive race is under way in Silicon Valley to win over software engineers.
- Rumors are circulating that OpenAI may acquire Windsurf for $3 billion.
- Google's Firebase Studio, a development tool, generates code quickly.
- The current developer tooling landscape is chaotic, which prompts a test of o4-mini.
- The AI struggles with vague requirements but attempts to build a YouTube clone.
- A comparison with Claude Code reveals similar difficulties in generating code.
- Firebase Studio is faster but has trouble with specific requests.
- The overall effectiveness of AI tools is questioned, highlighting their flaws.
- Developers are encouraged to keep building despite the challenges.
- Mux is introduced as a solution to the challenges developers face when integrating video.
- A call to action to try Mux for free, highlighting its scalability.
Transcription
Yesterday, OpenAI released two new reasoning models, o3 and o4-mini, and people claim that they're at or above genius level. That means room-temperature superconducting hoverboards should be right around the corner, but these claims of AI genius feel like déjà vu all over again. In San Francisco, there's an old saying that goes: fool me once, shame on you; fool me o4 times, shame on me. But maybe this time it's the real deal. One thing's for sure, though: OpenAI is shipping like crazy. This comes just days after they released GPT-4.1, and just weeks after 4o image gen and GPT-4.5. Hopefully they can use the genius of o4 to not create such confusingly stupid names. Remember, today we're talking about o4, not 4o, so try to keep up. It is April 17th, 2025, and you're watching The Code Report.
These new reasoning models are supposed to be really super good at writing code, but you guys really need to stop posting these comments on their videos, because this guy in a $2 million car has been following me around, and I have a bad feeling about it. But the good news is that OpenAI also released an open source CLI tool to go along with it called Codex. It's basically the exact same thing as Claude Code: it can write, execute, and analyze code directly from your terminal or IDE.
In today's video we'll try out Codex, but I already pay thousands of dollars a month to vibe code with Lovable, Windsurf, Cursor, Firebase Studio, Claude Code, Copilot, Devin, Augment, and Bolt, yet my code quality is worse than ever. I know it's probably just a skill issue, but one thing's for sure: there's a massive arms race going on right now in Silicon Valley to capture the hearts, minds, and wallets of software engineers, especially the smart lazy ones who don't want to write code. The global economy might be on the verge of collapse, but the code shovel business is booming right now. In fact, there's a rumor that OpenAI is in talks to buy Windsurf for $3 billion.
And Windsurf is just a VS Code fork that adds a few AI bells and whistles. I'm starting to regret investing all my time working on Horse Tinder when I should have just forked VS Code and put a price tag on it. Like, Cursor, another VS Code fork, is doing $100 million in annual revenue. However, VS Code is built by Microsoft, and Microsoft is the biggest player in the developer tooling race. I believe their goal is to embrace, extend, and extinguish coders (I mean that in a good way, of course), and they just released a massive upgrade to Copilot, called Agent Mode, that many people are calling the Cursor or Windsurf killer. Like OpenAI Codex and Claude Code, it can create files and run commands, and it integrates Model Context Protocol servers.
That's pretty cool, but at the moment, many people regard Gemini 2.5 as the best programming model out there. People will bow to it. And I'd say I have to agree. Last week, Google released Firebase Studio, which was formerly known as Project IDX, and it's a browser-based fork of VS Code that's hosted by Google. Not only does it generate code with Gemini 2.5, but it can also host and deploy that code automatically.
The tooling situation for developers right now is more chaotic than I've ever seen in my life, but now let's try out o4-mini in OpenAI Codex to find out if it's truly a genius. To use it, you install it with npm, and then set an OpenAI API key as an environment variable. From there, you can run the codex command, and then give it a prompt. I'll go ahead and ask it to build something simple, like a YouTube clone. Apparently those requirements were not clear enough, so it asked me to clarify. As a developer, I feel its pain after getting years of half-assed requirements from clients. From there, it took a really long time to think, and then asked me to confirm a bunch of different actions. The end result was a bunch of empty directories, although I could see the code that it was trying to write in the terminal.
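[Editor's sketch] The setup described above (install with npm, set an OpenAI API key as an environment variable, then run the codex command with a prompt) could be wrapped in a small pre-flight check like the one below. This is only an illustration, not part of the video: the npm package name `@openai/codex`, the variable name `OPENAI_API_KEY`, and passing the prompt as a single command-line argument are assumptions based on the tool's public README at the time, and the helper names `missing_prerequisites`, `build_codex_command`, and `run_codex` are hypothetical.

```python
import os
import shutil
import subprocess
from typing import Callable, List, Mapping, Optional


def missing_prerequisites(
    env: Mapping[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return human-readable problems that would stop Codex from starting."""
    problems = []
    # The transcript says the API key is supplied via an environment variable;
    # the exact name OPENAI_API_KEY is an assumption.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # Assumption: the npm install puts an executable named `codex` on PATH.
    if which("codex") is None:
        problems.append("codex is not on PATH (try: npm install -g @openai/codex)")
    return problems


def build_codex_command(executable: str, prompt: str) -> List[str]:
    """Build the argument list for a one-off run: codex "<prompt>"."""
    return [executable, prompt]


def run_codex(prompt: str) -> int:
    """Check the setup, then hand the prompt to Codex; returns an exit code."""
    problems = missing_prerequisites(os.environ)
    if problems:
        for problem in problems:
            print(problem)
        return 1
    # shutil.which resolves the full path (including codex.cmd on Windows).
    executable = shutil.which("codex")
    return subprocess.run(build_codex_command(executable, prompt)).returncode
```

Calling `run_codex("build a YouTube clone with Svelte 5 runes")` would mirror the experiment in the video; the check runs first so that a missing key or a missing install is reported before anything is launched.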
The likely explanation is because I'm a giga-chad using Windows for development, but if you're a 0.10x developer on macOS, things should go a lot smoother. I asked it to write Svelte 5 code with runes, and still in 2025, it failed to do so. Now as a control, I also ran the same prompt in Claude Code and Firebase Studio. Claude Code also took an extremely long time, but it did figure out how to run the commands on Windows. But like all the other AI models, it can't seem to figure out how to write Svelte 5 rune code. It tried by using the dollar sign, but that ultimately resulted in an error and a non-working app.
Let's see if Firebase Studio can generate a YouTube clone any better. The first thing you'll notice is that Firebase Studio is at least 10 times faster. However, when I asked it to generate Svelte 5 code, it completely ignored me and wrote everything in Next.js. That's disappointing, but like I've said before, if you want to be a good vibe coder, you should probably just use the most popular technologies like React. The thing about Firebase Studio, though, is that it's way easier to work with AI when it's integrated directly into an IDE and environment like this.
The final verdict, though, is that they all kind of suck in their own special ways. Don't fall for OpenAI's genius hype, but also don't fall for the AI doomers who say these tools are worthless. Life is electric, and it's the greatest time ever to be a developer. Get out there and ship as much AI slop as you possibly can like you're on a mission from the gods.
But if you try to build a YouTube clone like me with streaming video, you're going to want to know about Mux, the sponsor of today's video. If you've ever had to integrate video into an app, you know how easy it is to get started, but how difficult it is to get right. This is where API-first video infrastructure from Mux can help out.
Not only will it host and encode your videos for adaptive bitrate streaming, but it also provides real-time analytics, automatic thumbnail generation, and even live streaming, all through an API that's highly customizable. It can handle video streaming at your startup with zero users, but scales up for big companies like Substack, Patreon, and HubSpot. If you're a developer looking to build awesome video features, try it out for free right now at mux.com/fireship. This has been The Code Report, thanks for watching, and I will see you in the next one.