
Claude 3.7 and Claude Code - New LLM Model and Tool for Developers (video, 6m)

Yesterday, Anthropic finally unveiled Claude 3.7 Sonnet, the highly anticipated large language model that has generated both excitement and concern within the programming community. The announcement video received significant attention, and the top comment showed viewers eagerly awaiting Fireship's take on the model. The author expressed appreciation for the trust viewers place in him, while acknowledging that his reviews can be less than thorough. After extensive testing involving millions of tokens, it is evident that Claude 3.7 marks a notable advancement. It is faster and more capable, and introduces a new thinking mode that emulates the success of DeepSeek R1 and OpenAI's o-series models. The most intriguing release, however, is Claude Code, a CLI tool that allows users to create, test, and execute code within any project, reigniting discussions about the potential for AI to replace programmers. While many coding influencers fear the implications, the video explores whether those fears are justified.

In a recent analysis released by Anthropic, it was highlighted that although software-related occupations make up only 3.4% of the workforce, over 37% of prompts pertain to math and coding. AI has not yet displaced human programmers, but it has diminished the demand for platforms like Stack Overflow. Claude 3.5, the previous version, already tops the WebDev Arena leaderboard for web development, and Claude 3.7 has surpassed its predecessor with a significant leap, claiming a 70.3% success rate in resolving real GitHub issues on a human-verified software engineering benchmark that many view as critical in evaluating AI's efficiency in coding tasks.

To access the Claude Code CLI, users can install it via npm; the tool uses Anthropic's API directly. Potential users should be aware of the costs, however, as Claude is priced over ten times higher than many other models. Once installed, the tool provides a claude command in the terminal, giving the model the full context of the code within a project. The author noticed that, despite this potential efficiency, the generated code sometimes fails to adhere to the frameworks and libraries specified in the original project, leading to confusion.
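As a reference point, the install-and-prompt flow described above looks roughly like the console session below. The npm package name matches Anthropic's published package, and /init and /cost are the CLI's built-in commands, but the session itself is an illustrative sketch, not a verbatim capture from the video:

```
# Install the Claude Code CLI globally (requires Node.js and an Anthropic API key)
npm install -g @anthropic-ai/claude-code

# Start an interactive session from inside your project directory
cd my-project
claude

# Inside the session: scan the project and write a CLAUDE.md context file,
# then check how much the session has cost so far
> /init
> /cost
```

Because the CLI bills against the Anthropic API per token, checking /cost after larger prompts is worth making a habit.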

Fireship then put Claude to a tougher test by building a project with a complex user interface using Svelte instead of React. Though Claude successfully generated the required components, execution revealed various issues. The author highlighted that the generated code did not properly incorporate the specified tech stack and suffered from design inefficiencies. Moreover, the costs associated with using the model raised questions about its overall value relative to traditional programming methods, undercutting some of the earlier assumptions about Claude's capabilities.

In conclusion, while Claude is an impressive advance in front-end development, the results point to significant challenges in back-end integration. Tools like Convex have shown substantial benefits in building applications quickly, especially when combined with Claude for enhanced productivity. As the video shows, even with promising features, Claude still has substantial room for improvement. As of this writing, the video has garnered 1,907,269 views and 80,824 likes, demonstrating the community's strong interest in the advancements and implications surrounding this model.

Timeline summary

  • 00:00 Introduction to Anthropic's release of Claude 3.7 Sonnet, a new large language model.
  • 00:05 The model is polarizing among programmers.
  • 00:12 The announcement video sparked excitement.
  • 00:18 The speaker shares their experience testing Claude 3.7.
  • 00:31 Claude 3.7 introduces a new thinking mode to enhance programming capabilities.
  • 00:40 Introduction of Claude Code, a CLI tool for coding projects.
  • 01:00 Anthropic's study reveals AI's growing impact on the labor force.
  • 01:24 Claude 3.7 outperforms its predecessor in software engineering benchmarks.
  • 01:32 The new model achieves a 70.3% success rate in solving GitHub issues.
  • 01:57 Claude CLI installation details and costs discussed.
  • 02:50 The speaker demonstrates creating a random name generator using Claude.
  • 03:27 Attempt to build a visual UI in Svelte instead of React.
  • 03:59 Comparative performance of Claude and OpenAI o3-mini-high in generating UIs.
  • 04:13 Issues found in Claude's generated code despite its impressive initial output.
  • 04:50 A dilemma involving end-to-end encryption and AI's limitations in fixing complex issues.
  • 05:07 Introduction of Convex, a tool for building apps with a focus on back-end development.
  • 05:38 Encouragement to create a free Convex project for enhanced productivity.
  • 05:44 Conclusion and thanks for watching.

Transcription

Yesterday, Anthropic finally released Claude 3.7 Sonnet, the highly anticipated large language model that's most loved and most feared by programmers. Their announcement video has everybody freaking out, and the top comment on that video is people waiting for this video. And let me just say, I'm humbled and honored that you put so much faith into my half-assed AI reviews. I've already burned through millions of tokens testing it, and the TLDR is that Claude 3.7 is straight gas. It's different, mad heat, high-key goated, on god, no cap, for real, for real. The new base model beat itself to become even better at programming, while adding a new thinking mode to copy the success of DeepSeek R1 and the OpenAI o-series models. But the craziest thing they released is something called Claude Code, a CLI tool that allows you to build, test, and execute code in any project, thus creating an infinite feedback loop that in theory should replace all programmers. All the code influencers are telling us we're cooked, and in today's video, we'll find out if they're right. It is February 25th, 2025, and you're watching The Code Report. A few weeks ago, Anthropic released a paper that studied how AI affects the labor force, and what they found is that despite software jobs only making up 3.4% of the workforce, over 37% of the prompts are related to math and coding. And although it hasn't taken any human programmer jobs yet, it has taken Stack Overflow's job. Now there's a lot of AI slopware out there, and it's hard to keep track of it all, but for web development, one of the better indicators is WebDev Arena, and Claude 3.5, the previous version, is already sitting on top of that leaderboard. But it was roughly tied with all of the other state-of-the-art models when it comes to the software engineering benchmark, which is human-verified and based on real GitHub issues.
What's crazy though is that the new 3.7 model absolutely crushed all the other models, including OpenAI o3-mini-high and DeepSeek, and is now capable of solving 70.3% of GitHub issues. That is, if we're to believe the benchmark, and after trying out the Claude Code CLI, I might actually believe it. It's in research preview currently, but you can install the Claude CLI with npm. Disclaimer though, it uses the Anthropic API directly, and Claude is not cheap, costing over 10 times more than models like Gemini Flash and DeepSeek. $15 per million output tokens costs more than my entire personality. Once installed, you'll have access to the claude command in the terminal, and that gives it the full context of existing code in your project. One thing I noticed immediately though is that the text decoration in the CLI looks almost identical to SST, an open-source tool we've covered on this channel before. That could be a coincidence, but it appears the Claude logo might also be plagiarized, based on this drawing of a butthole by one of my favorite authors, Kurt Vonnegut. Now there's nothing wrong with designing your logo after a sphincter, plenty of companies do it, but I think Claude is just a little too on the nose here. But now that I have Claude installed, I can run the init command, which will scan the project and create a markdown file with some initial context and instructions. That's cool, but now we have an open session, and one thing you might want to do is see how much money you've lost by prompting so far. With the cost command, I can see that creating that init file cost about $0.08. Now the first actual job I gave it was pretty easy, and that was to create a random name generator in Deno. After you enter a prompt, it will figure out what to do, and then have you confirm it with a yes or no. Like in this case here, it wants to generate a new file. It'll go ahead and write that file to the file system, and then it also creates a dedicated testing file as well.
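To make the Deno example concrete, here is a minimal sketch of the kind of file that prompt produces. The word lists, file layout, and function signature are illustrative assumptions, not the actual generated output:

```typescript
// Sketch of a random name generator of the sort Claude Code wrote for
// the prompt described above. Everything here is an assumed example.
const ADJECTIVES = ["brave", "calm", "eager", "fuzzy"];
const NOUNS = ["otter", "falcon", "maple", "comet"];

// Taking the random source as a parameter keeps the function pure, which
// is what lets the companion test file exercise it deterministically.
export function randomName(rng: () => number = Math.random): string {
  const adjective = ADJECTIVES[Math.floor(rng() * ADJECTIVES.length)];
  const noun = NOUNS[Math.floor(rng() * NOUNS.length)];
  return `${adjective}-${noun}`;
}
```

The dedicated test file matters more than the generator itself: a failing assertion is machine-readable feedback the model can act on.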
That's important, because using a strongly typed language along with test-driven development are ways for the AI to validate that the code it writes is actually valid. If that test fails, it can use the feedback to rewrite the business logic, and continue going back and forth until it gets it right. And in this example, it wrote what I would consider perfect code. But now let's do something more challenging and have it build an actual visual frontend UI, but instead of React, we'll use Svelte. When I generated the config, you'll notice that it understands the tech stack is using TypeScript and Tailwind, and then I'll prompt it for a moderately complex frontend UI. An application that can access my microphone and visualize the waveform. For this initial prompt, I had to confirm like 20 different things, and as you can see, it wrote a bunch of new components to my project. It took a lot longer than just prompting Claude in the web UI, but the end result was worth the wait. Here in the application, I can click through a waveform, frequency, and circular graphic that visualizes the sound of my voice. As a control, I had OpenAI o3-mini-high generate the same thing, and at first I got an error, which was easy to fix, but the end result looked like this. An embarrassing piece of crap compared to Claude. But upon closer inspection, there were a lot of problems in Claude's code. Like for one, it didn't use TypeScript or Tailwind at all, even though it should know that they're in our tech stack. It also failed to use the new Svelte 5 rune syntax. And the entire session cost me about 65 cents, which would have been better spent on an egg or a banana. But now I have one final test. Recently, Apple had to discontinue end-to-end encryption in the UK, because the government wanted a backdoor, and Apple refused to build one. If you've been affected by this, one thing you can do is build your own end-to-end encrypted app.
I can't do that myself in JavaScript, but every single large language model that I've tried fails. Let's see if Claude Code can fix this ChatGPT garbage code. It took quite a while and changed a lot of the code, but for whatever reason, it still fails to run. And unfortunately, because I've become so dependent on AI, I have no idea how to fix an error message like this, and all I can really do is wait for the next best model to come out. Throughout this video, we've seen how good Claude is at front-end dev, but the other half to your application is the back-end, and if you like building apps fast, you need to try out Convex, the sponsor of today's video. It's an open-source reactive database that provides type-safe queries, scheduled jobs, server functions, and real-time data sync like Firebase, just to name a few of its features. Best of all though, database queries are written in pure TypeScript, giving us this beautiful IDE autocomplete across the entire stack. But that also creates another side effect, making Convex really good at autonomous vibe coding with AI. Models like Claude can more easily understand Convex code, write it with fewer errors, and thus be more productive with it. If you know how to build a front-end app, you're already halfway there. Now use the link on the screen to create a free Convex project to build the other half. This has been The Code Report, thanks for watching, and I will see you in the next one.
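A Convex query of the kind described, pure TypeScript with types flowing to the client, looks roughly like the sketch below. It assumes the code generated by running Convex in a project (the ./_generated/server module) and a hypothetical messages table, so it is an illustration of the API shape rather than a runnable standalone file:

```typescript
// Sketch of a Convex server function. Assumes the Convex dev tooling has
// generated ./_generated/server and that a `messages` table exists in the
// project's schema; both are assumptions for illustration.
import { query } from "./_generated/server";

export const listMessages = query({
  args: {},
  handler: async (ctx) => {
    // Type-safe read from the reactive database; the inferred return type
    // is what powers autocomplete in the front-end client.
    return await ctx.db.query("messages").collect();
  },
});
```

Because the whole stack is typed TypeScript with a small, regular API surface, an LLM has fewer places to hallucinate, which is the "vibe coding" advantage the video points at.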