
Claude 3.7 and Claude Code - New LLM Model and Tool for Developers (video, 6m)

Yesterday, Anthropic finally unveiled Claude 3.7 Sonnet, the highly anticipated large language model that has generated both excitement and concern within the programming community. The announcement video received significant attention, and the top comment showed viewers eagerly awaiting Fireship's take on the model. The author expressed appreciation for the trust viewers place in him, while acknowledging that his reviews can be less than thorough. After extensive testing involving millions of tokens, it is evident that Claude 3.7 marks a notable advancement. It is faster and more capable, and introduces a new thinking mode that emulates the success of DeepSeek R1 and OpenAI's o-series models. The most intriguing release, however, is Claude Code, a CLI tool that allows users to create, test, and execute code within any project, reigniting discussions about the potential for AI to replace programmers. While many coding influencers fear the implications, the video explores whether those fears are justified.

In a recent analysis released by Anthropic, it was highlighted that although software-related occupations make up only 3.4% of the workforce, over 37% of prompts pertain to math and coding. AI has not yet displaced human programmers, but it has diminished the demand for platforms like Stack Overflow. Claude 3.5, the previous version, already tops the WebDev Arena leaderboard for web development, and Claude 3.7 has surpassed its predecessor with a significant leap, claiming a 70.3% success rate in resolving real GitHub issues on a human-verified software engineering benchmark that many view as critical in evaluating AI's efficiency in coding tasks.

To access the Claude Code CLI, users can install it via npm; the tool uses Anthropic's API directly. Potential users should be aware of the costs, however, as Claude is priced over ten times higher than many other models. Once installed, the tool provides a claude command in the terminal, giving the model the full context of the code within a project. The author noticed that, despite this potential efficiency, the generated code sometimes fails to adhere to the frameworks and libraries specified in the original project, leading to confusion.
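As a reference point, the install-and-prompt flow described above looks roughly like the console session below. The npm package name matches Anthropic's published package, and /init and /cost are the CLI's built-in commands, but the session itself is an illustrative sketch, not a verbatim capture from the video:

```
# Install the Claude Code CLI globally (requires Node.js and an Anthropic API key)
npm install -g @anthropic-ai/claude-code

# Start an interactive session from inside your project directory
cd my-project
claude

# Inside the session: scan the project and write a CLAUDE.md context file,
# then check how much the session has cost so far
> /init
> /cost
```

Because the CLI bills against the Anthropic API per token, checking /cost after larger prompts is worth making a habit.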

Fireship then put Claude to a tougher test by building a project with a complex user interface using Svelte instead of React. Though Claude successfully generated the required components, execution revealed various issues. The author highlighted that the generated code did not properly incorporate the specified tech stack and suffered from design inefficiencies. Moreover, the costs associated with using the model raised questions about its overall value relative to traditional programming methods, undercutting some of the earlier assumptions about Claude's capabilities.

In conclusion, while Claude is an impressive advance in front-end development, the results point to significant challenges in back-end integration. Tools like Convex have shown substantial benefits in building applications quickly, especially when combined with Claude for enhanced productivity. As the video shows, even with promising features, Claude still has substantial room for improvement. As of this writing, the video has garnered 1,907,269 views and 80,824 likes, demonstrating the community's strong interest in the advancements and implications surrounding this model.

Timeline summary

  • 00:00 Introduction to Anthropic's release of Claude 3.7 Sonnet, a new large language model.
  • 00:05 The model is polarizing among programmers.
  • 00:12 The announcement video sparked excitement.
  • 00:18 The speaker shares their experience testing Claude 3.7.
  • 00:31 Claude 3.7 introduces a new thinking mode to enhance programming capabilities.
  • 00:40 Introduction of Claude Code, a CLI tool for coding projects.
  • 01:00 Anthropic's study reveals AI's growing impact on the labor force.
  • 01:24 Claude 3.7 outperforms its predecessor in software engineering benchmarks.
  • 01:32 The new model achieves a 70.3% success rate in solving GitHub issues.
  • 01:57 Claude CLI installation details and costs discussed.
  • 02:50 The speaker demonstrates creating a random name generator using Claude.
  • 03:27 Attempt to build a visual UI in Svelte instead of React.
  • 03:59 Comparative performance of Claude and OpenAI o3-mini-high in generating UIs.
  • 04:13 Issues found in Claude's generated code despite its impressive initial output.
  • 04:50 A dilemma involving end-to-end encryption and AI's limitations in fixing complex issues.
  • 05:07 Introduction of Convex, a tool for building apps with a focus on back-end development.
  • 05:38 Encouragement to create a free Convex project for enhanced productivity.
  • 05:44 Conclusion and thanks for watching.

Transcription

Yesterday, Anthropic finally released Claude 3.7 Sonnet, the highly anticipated large language model that's most loved and most feared by programmers. Their announcement video has everybody freaking out, and the top comment on that video is people waiting for this video. And let me just say, I'm humbled and honored that you put so much faith into my half-assed AI reviews. I've already burned through millions of tokens testing it, and the TLDR is that Claude 3.7 is straight gas. It's different, mad heat, high-key goated, on god, no cap, for real, for real. The new base model beat itself to become even better at programming, while adding a new thinking mode to copy the success of DeepSeek R1 and the OpenAI o-series models. But the craziest thing they released is something called Claude Code, a CLI tool that allows you to build, test, and execute code in any project, thus creating an infinite feedback loop that in theory should replace all programmers. All the code influencers are telling us we're cooked, and in today's video, we'll find out if they're right. It is February 25th, 2025, and you're watching The Code Report. A few weeks ago, Anthropic released a paper that studied how AI affects the labor force, and what they found is that despite software jobs only making up 3.4% of the workforce, over 37% of the prompts are related to math and coding. And although it hasn't taken any human programmer jobs yet, it has taken Stack Overflow's job. Now there's a lot of AI slopware out there, and it's hard to keep track of it all, but for web development, one of the better indicators is WebDev Arena, and Claude 3.5, the previous version, is already sitting on top of that leaderboard. But it was roughly tied with all of the other state-of-the-art models when it comes to the software engineering benchmark, which is human-verified and based on real GitHub issues.
What's crazy though is that the new 3.7 model absolutely crushed all the other models, including OpenAI o3-mini-high and DeepSeek, and is now capable of solving 70.3% of GitHub issues. That is, if we're to believe the benchmark, and after trying out the Claude Code CLI, I might actually believe it. It's in research preview currently, but you can install the Claude CLI with npm. Disclaimer though, it uses the Anthropic API directly, and Claude is not cheap, costing over 10 times more than models like Gemini Flash and DeepSeek. $15 per million output tokens costs more than my entire personality. Once installed, you'll have access to the claude command in the terminal, and that gives it the full context of existing code in your project. One thing I noticed immediately though is that the text decoration in the CLI looks almost identical to SST, an open-source tool we've covered on this channel before. That could be a coincidence, but it appears the Claude logo might also be plagiarized, based on this drawing of a butthole by one of my favorite authors, Kurt Vonnegut. Now there's nothing wrong with designing your logo after a sphincter, plenty of companies do it, but I think Claude is just a little too on the nose here. But now that I have Claude installed, I can run the init command, which will scan the project and create a markdown file with some initial context and instructions. That's cool, but now we have an open session, and one thing you might want to do is see how much money you've lost by prompting so far. With the cost command, I can see that creating that init file cost about $0.08. Now the first actual job I gave it was pretty easy, and that was to create a random name generator in Deno. After you enter a prompt, it will figure out what to do, and then have you confirm it with a yes or no. Like in this case here, it wants to generate a new file. It'll go ahead and write that file to the file system, and then it also creates a dedicated testing file as well.
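To make the Deno example concrete, here is a minimal sketch of the kind of file that prompt produces. The word lists, file layout, and function signature are illustrative assumptions, not the actual generated output:

```typescript
// Sketch of a random name generator of the sort Claude Code wrote for
// the prompt described above. Everything here is an assumed example.
const ADJECTIVES = ["brave", "calm", "eager", "fuzzy"];
const NOUNS = ["otter", "falcon", "maple", "comet"];

// Taking the random source as a parameter keeps the function pure, which
// is what lets the companion test file exercise it deterministically.
export function randomName(rng: () => number = Math.random): string {
  const adjective = ADJECTIVES[Math.floor(rng() * ADJECTIVES.length)];
  const noun = NOUNS[Math.floor(rng() * NOUNS.length)];
  return `${adjective}-${noun}`;
}
```

The dedicated test file matters more than the generator itself: a failing assertion is machine-readable feedback the model can act on.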
That's important, because using a strongly typed language along with test-driven development are ways for the AI to validate that the code it writes is actually valid. If that test fails, it can use the feedback to rewrite the business logic, and continue going back and forth until it gets it right. And in this example, it wrote what I would consider perfect code. But now let's do something more challenging and have it build an actual visual frontend UI, but instead of React, we'll use Svelte. When I generated the config, you'll notice that it understands the tech stack is using TypeScript and Tailwind, and then I'll prompt it for a moderately complex frontend UI. An application that can access my microphone and visualize the waveform. For this initial prompt, I had to confirm like 20 different things, and as you can see, it wrote a bunch of new components to my project. It took a lot longer than just prompting Claude in the web UI, but the end result was worth the wait. Here in the application, I can click through a waveform, frequency, and circular graphic that visualizes the sound of my voice. As a control, I had OpenAI o3-mini-high generate the same thing, and at first I got an error, which was easy to fix, but the end result looked like this. An embarrassing piece of crap compared to Claude. But upon closer inspection, there were a lot of problems in Claude's code. Like for one, it didn't use TypeScript or Tailwind at all, even though it should know that they're in our tech stack. It also failed to use the new Svelte 5 rune syntax. And the entire session cost me about 65 cents, which would have been better spent on an egg or a banana. But now I have one final test. Recently, Apple had to discontinue end-to-end encryption in the UK, because the government wanted a backdoor, and Apple refused to build one. If you've been affected by this, one thing you can do is build your own end-to-end encrypted app.
I can't do that myself in JavaScript, but every single large language model that I've tried fails. Let's see if Claude Code can fix this ChatGPT garbage code. It took quite a while and changed a lot of the code, but for whatever reason, it still fails to run. And unfortunately, because I've become so dependent on AI, I have no idea how to fix an error message like this, and all I can really do is wait for the next best model to come out. Throughout this video, we've seen how good Claude is at front-end dev, but the other half to your application is the back-end, and if you like building apps fast, you need to try out Convex, the sponsor of today's video. It's an open-source reactive database that provides type-safe queries, scheduled jobs, server functions, and real-time data sync like Firebase, just to name a few of its features. Best of all though, database queries are written in pure TypeScript, giving us this beautiful IDE autocomplete across the entire stack. But that also creates another side effect, making Convex really good at autonomous vibe coding with AI. Models like Claude can more easily understand Convex code, write it with fewer errors, and thus be more productive with it. If you know how to build a front-end app, you're already halfway there. Now use the link on the screen to create a free Convex project to build the other half. This has been The Code Report, thanks for watching, and I will see you in the next one.
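A Convex query of the kind described, pure TypeScript with types flowing to the client, looks roughly like the sketch below. It assumes the code generated by running Convex in a project (the ./_generated/server module) and a hypothetical messages table, so it is an illustration of the API shape rather than a runnable standalone file:

```typescript
// Sketch of a Convex server function. Assumes the Convex dev tooling has
// generated ./_generated/server and that a `messages` table exists in the
// project's schema; both are assumptions for illustration.
import { query } from "./_generated/server";

export const listMessages = query({
  args: {},
  handler: async (ctx) => {
    // Type-safe read from the reactive database; the inferred return type
    // is what powers autocomplete in the front-end client.
    return await ctx.db.query("messages").collect();
  },
});
```

Because the whole stack is typed TypeScript with a small, regular API surface, an LLM has fewer places to hallucinate, which is the "vibe coding" advantage the video points at.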