Free alternative to OpenAI Operator - AI that automates the browser (video, 22m)
In this video, NetworkChuck explores AI agents that carry out tasks online on their own. His running example is a real one: asking an agent to search eBay for a Japanese VCR and add it to the cart, which he needs for another video he is working on. Chuck describes his experience with OpenAI's recently released Operator, a research preview available only to Pro subscribers paying $200 a month. He then turns to a free, open-source alternative that he finds more interesting and more accessible. Like Operator, it opens a browser and performs the requested actions there.
The video then compares the proprietary Operator with the open-source Browser Use, which can also work with local AI models. Chuck emphasizes that Browser Use can be self-hosted, which gives users more control than a cloud-only service. Although the project is programmatic and agents can be scripted in code, its Web UI makes it usable by people who do not code. Examples he mentions include adding grocery items to a shopping cart, adding a LinkedIn follower to Salesforce leads, searching for jobs based on a resume, and writing a letter in Google Docs.
Chuck encourages viewers to try the tool themselves. He walks through a local setup of the project's Web UI and runs it against local models served by Ollama as well as cloud models, on tasks such as adding a coffee to a shopping cart, finding a YouTube video, and attempting a CAPTCHA. Results vary by model, and he notes that much more is possible by programming agents directly.
A key focus of the video is free software. Chuck says he thinks the open-source option is "kind of better" than Operator, largely because it can be self-hosted, can drive the user's own browser with its logged-in sessions, and leaves users in control of their AI setup. He also floats a possible follow-up video on programming automations with it.
At the time of writing this article, the video has 592,008 views and 18,120 likes, which points to strong interest in the channel and the topic. Chuck closes by raising the hacking implications of this kind of automation and by inviting viewers to suggest automation ideas in the comments.
Timeline summary
- Introduction to giving tasks to an AI agent like finding a Japanese VCR on eBay.
- The speaker shares their personal experience using OpenAI's Operator for a similar task.
- OpenAI's Operator has limitations, including being paid and not fully reliable.
- A free and open-source alternative to Operator that can control a browser is introduced.
- Demo comparison between OpenAI's Operator and the open-source option.
- Browser Use project is discussed, showing its capabilities.
- Open-source software offers flexibility to use local AI and improves user control.
- AI agents capable of automating tasks without coding knowledge.
- Examples of automation including adding contacts to Salesforce and writing letters.
- Setting up virtual environments for running code easily.
- Setting specific configurations for various AI models.
- Starting a demo with local AI models and how they can perform tasks.
- Testing various AI models for finding and adding items to a shopping cart.
- Comparison between open-source and paid options showcases performance differences.
- User interaction shows the capabilities of AI in real-time tasks like purchasing.
- Discussion transitions to testing CAPTCHA solving capability between AI options.
- Final head-to-head test between OpenAI's Operator and another browser option.
- Concluding thoughts on the potential and risks of AI automation.
- Final remarks on AI use and its hacking implications.
Transcription
We can now give a task to an AI agent, like, hey, go find me a Japanese VCR that supports TBC on eBay and add it to my cart. Oh, also make sure it's working. And the AI agent will just simply go out, open a web browser, and do this thing. It's like giving a task to an assistant, and you can go about your day. By the way, that's based on a true story. I'm actually working on a video where I'm needing a Japanese VCR, and I did this a few weeks back. Now, I'm doing this with OpenAI's Operator. They released this a few weeks back. It's a research preview, so don't be surprised if it's kind of janky. Also, it's only available to Pro users, which means you've got to be paying OpenAI $200 a month. But I found an open source alternative. It's like this. It'll open up a browser, do the whole thing. Actually, I think it's kind of better. It's free, open source. I'll show you how to use it right now. Get your coffee ready. This is actually pretty fun. Hey, NetworkChuck from the future here. Coming up, I'm going to pit Operator versus the open source option to see who can create and purchase a virtual machine in the cloud. And then I'll have them log into the terminal and create a file. Who can do it the fastest? Can they do it at all? I don't know. And by the way, this segment is made possible by our sponsor, Hostinger. We'll see what happens. The project is called Browser Use. Enable AI to control your browser. It was created by these handsome fellas. And I have to say, the project is very impressive. It does a lot. Let's peruse it a bit. Actually, first, just know, there is a paid version of this. If we go to their official website, not on GitHub, we can see that they're backed by Y Combinator, meaning they've gotten some funding. And here, they're even touting their performance, saying they are 2% better than Operator. At $30 a month, they are still cheaper than ChatGPT Operator. The enterprise option, yes, a month. I mean, it made me stop and look at it and go, what?
But anyways, open source is what we care about. We can host this ourselves, use our own local stuff, even local AI. We don't have to go to the web unless we want to have, I mean, our web browser be accessed via the web. Or rather, unless we want to go out to a website with our web browser. I said that backwards. I need some more coffee. And what I love about this project is that it doesn't feel like it's in a research mode like ChatGPT is. And it's very programmatic, meaning that if you are really into building AI agents, which I'm getting into, you can program your agents and have them do all kinds of insane things. So they have some examples here, like add grocery items to your cart and go check out. And it's obviously all done with code on the side. Don't be scared. I'm going to show you an option that's very GUI friendly. You can add my latest LinkedIn follower to my leads in Salesforce. Read my resume and go find jobs for me. Write a letter in Google Docs to my papa, thanking him for everything, and save the document as a PDF. Now, thankfully, you don't have to know how to code anything if you just want to take it for a spin yourself right now. If I go back to the Browser Use account, I can see one of the projects is called Web UI. This is very easy to set up. We're going to walk through it right now. And you can try this yourself with Ollama. So a couple things you're going to need to make this happen. First, you'll need some coffee. That's just the rules. I didn't make them. Maybe I did. Everything in IT requires coffee. networkchuck.coffee. Two, you'll need a machine to run this on if you want to run this locally, which is what we're doing right now. So Mac, Windows, or Linux. I will be demoing this setup right now on Windows, which will be using WSL, which is the Windows Subsystem for Linux. So it's basically Linux. And it won't be very different from the bare-bones Linux or Mac setup. And if you're like, Chuck, I have no idea what WSL is.
I have a video on that right here. And honestly, that's pretty much all you need. Oh, you know what? I lied. You're also going to need some sort of AI to use, right? We're using an AI tool. One tool you can use that's completely free, completely local, completely awesome is Ollama. Go out to ollama.com. Click on Download. You can install it for Mac, Windows, Linux. And it's very quick and easy. Have that going. If you want to use OpenAI or Claude or any of those cloud-based models, all you need is an API key. I'll show you what that looks like. It's actually, you know, it's going to be better than a local model because they've got more resources. OK, first thing we'll do is launch our terminal, my favorite place to be. And because I'm in Windows land, I need to jump into Linux with WSL. I'll launch my Ubuntu. I think it's 22.04. Yeah, it is, 22.04. Now, the first thing you want to do is make sure you do have Python 3.11 installed, at least 3.11. The easiest way to do that is with pyenv. With pyenv installed, all you have to do is type in pyenv install 3.11. I already have it installed. And you do pyenv global 3.11 to make it live. And you can switch back and forth between Python versions. It's awesome. Link below. Now, real quick, make sure you have Python 3.11 by typing in python3 --version. And you should see Python 3.11. Now, we're going to clone this git repo, the Web UI. Copy this command, paste it here, cloned. And then we'll jump into that directory by typing in cd web-ui. Now, to make sure we keep things clean, we're going to launch a Python virtual environment, or create a virtual environment. We'll type in python3 -m, for module, specify venv, and then name our virtual environment .venv. Hit Enter. And by the way, if you've never used a virtual environment, you may not have the module installed. We can do that right now by typing in pip install virtual... ah, watch my cursor keep bouncing around. virtualenv, just like that.
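Editor's sketch: the version check and virtual-environment step described above can also be done from Python with the standard-library venv module. This is only an illustration of what python3 -m venv .venv does, not part of the project's own instructions; the function names are made up for this example, and with_pip=False is chosen just to keep the sketch fast and offline.

```python
import sys
import venv
from pathlib import Path

MIN_VERSION = (3, 11)  # minimum Python version stated in the video


def python_is_new_enough(version_info=sys.version_info) -> bool:
    """Return True if the given interpreter version is at least 3.11."""
    return tuple(version_info[:2]) >= MIN_VERSION


def create_virtualenv(project_dir: Path, name: str = ".venv") -> Path:
    """Create a virtual environment inside project_dir and return its path.

    Hypothetical helper: with_pip=False skips bootstrapping pip, whereas
    the command-line form `python3 -m venv .venv` installs pip as well.
    """
    env_dir = project_dir / name
    venv.EnvBuilder(with_pip=False).create(env_dir)
    return env_dir


if __name__ == "__main__":
    if not python_is_new_enough():
        raise SystemExit("Python 3.11 or newer is required")
    print("created", create_virtualenv(Path.cwd()))
```

Running the script from inside the cloned directory would create .venv there; activating it is still done from the shell.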
Now, with our virtual environment created, let's activate it. We'll type in source .venv/bin/activate. Boom. This creates a nice little box for us to play in. And no other stuff is going to be impacted by the things we install. Now, we'll use the command pip install -r. And we'll type in requirements.txt. This is a file that's right here in our directory. And it's going to describe the requirements we need for this project. It'll do it all for us right now. Ready, set, go. And we'll watch it happen while we're sipping some coffee. And done. And then one more thing we have to install is this tool called Playwright, which I've never heard of. But I think it's essentially doing headless browser stuff. It's amazing. So copy and paste that. I already have it installed, so I should be good. Yours might take a moment. And finally, one more thing we have to do is get our environment file ready to go. They do have an example environment file that we're going to copy to our own. So we'll type in cp .env.example. And we'll copy that file to .env, just like that. Now, let's edit that .env file. nano. And this is not required, by the way. nano .env. And here, we can add any kind of API keys we want to have here. So OpenAI, Anthropic. We can also specify an Ollama endpoint, which normally, if you have Ollama installed, you'll just want to have as localhost. Now, for me, I do have an external Ollama server that's more powerful. Terry. Have you not heard about Terry? Terry is my AI server I built in this video here. He has dual 4090s. It's amazing. But I'll add his IP address. And we'll use him for my stuff. And then I'll go ahead and add my OpenAI API keys and my Anthropic. Because I'm going to show you what they feel like. And don't worry, I will end up revoking these keys. So it's OK that you see them right now. When you're done here, hit Control-X, Y, Enter to save. Then now all we have to do is run this command.
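Editor's sketch: the environment-file step (copy .env.example to .env, then fill in keys) can be illustrated with a few lines of Python that copy the template and report which values are still empty. The key names used in the example run (OPENAI_API_KEY, ANTHROPIC_API_KEY, OLLAMA_ENDPOINT) are assumptions for illustration; check the project's actual .env.example for the real names. Unlike plain cp, this helper deliberately refuses to overwrite an existing .env.

```python
import shutil
from pathlib import Path


def copy_env_template(project_dir: Path) -> Path:
    """Like `cp .env.example .env`, but never overwrites an existing .env."""
    target = project_dir / ".env"
    if not target.exists():
        shutil.copyfile(project_dir / ".env.example", target)
    return target


def parse_env(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = copy_env_template(Path.cwd())
    # Key names below are hypothetical placeholders, not the project's list.
    wanted = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "OLLAMA_ENDPOINT"]
    print("still empty:", missing_keys(parse_env(env_path), wanted))
```

Only the keys for the providers you actually plan to use need values; a purely local Ollama setup needs no API key at all.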
Let's scroll down and find it. They do have a Docker option. But Docker can be kind of tricky. If you want to try it, go ahead. The local setup is easier for me. So this command right here is going to launch the webui.py script. Copy and paste that. Hit Enter. And we should be off to the races. Yeah. Yeah, it's working. OK, so now we're going to navigate out to our browser and go to localhost port 7788. Let's do that right now. And here we are. Now, let's go full browser mode here. Actually, no, we'll leave it right here. Because there is some cool stuff we'll see in the command line. Now, fair warning, there are a lot of bells and whistles you can play with. You're going to play with them. They're super fun. And you can go crazy with this, especially the scripting part when you go into just messing with Python. For now, we're going to do something just quick and easy. Let's first go to our LLM configuration, this option right here. Here we have our LLM provider. I'm going to choose, let's see, I'll do Ollama this time. So this is going to be local AI agents, nothing in the cloud. And then we'll choose our model name. Now, I will say, if you're doing Qwen or Llama 2, they're dumb. They have a really hard time doing this. I normally want to do it with DeepSeek R1 14B at least. But I'll show you Qwen real quick. Now, by the way, you do want to make sure you download the model just like so. So if you want to get Qwen with Ollama, you'll open up your browser. I'm sorry, not your browser, your terminal. You'll type in ollama pull and that model name, just like this. And that's really all we have to do. We'll go to our Run Agent tab. And here they have a little demo option, just a quick little thing to try out. Let's run it. Click on Run Agent. And watch what happens. Oh, browser window over here. Let's scoot back over here. You can see on the right side, our terminal is thinking. And things are failing. It's failing because Qwen is dumb. Yeah, it just couldn't do it.
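Editor's sketch: after launching webui.py, a quick way to confirm that something is listening on port 7788 before opening the browser is a plain TCP connection test. This uses only the standard library and makes no assumption about the Web UI itself beyond the port number mentioned in the video; the helper name is made up for this example.

```python
import socket

WEB_UI_PORT = 7788  # port mentioned in the video


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "up" if port_is_open("127.0.0.1", WEB_UI_PORT) else "not reachable"
    print(f"Web UI on port {WEB_UI_PORT} is {state}")
```

The same helper can be pointed at a remote model server, such as the external Ollama machine Chuck mentions, by passing its address and port.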
Let's try another LLM. Let's try DeepSeek R1 14B. Pretty smart guy. Let's run him. OK, browser window's open over here. OK, so it's doing stuff. Check that out. It's annotating things on the page and numbering them so it knows what to look for. This is amazing. And notice how on the side, it's auto-correcting. It'll fail, try again. It'll max out at five times. All right, let's stop that. It's kind of boring. I'm going to run this one more time. I'm going to try it with the local LLM just to see what else I can do with this. So I'll select Ollama. Once again, I'll do my 14B DeepSeek. Let's do something simple like go out to networkchuck.coffee, find the 404 error coffee, and add it to my cart. It's what I'm drinking right now, by the way. Oh, no, it's called 404 not found. I don't even know my own coffee names. And let's watch it happen. OK, so it made it to my site. Very quick. It's finding the search. It's like watching one of my kids try to use the computer. Oh, wait, it's not called 404 not found. What am I thinking? It can't find it. It's having such a hard time. Let me stop him. Stop. It's OK, buddy. It wasn't your fault. It's called 404 error. Let's try it again. I still can't believe we're able to do this, and locally, too. It feels like magic. There it goes. It found the coffee. Now we'll add it to my cart. Why did it go to 200 OK? Another very good blend, by the way. And it's not a blend. It's a single origin. I forget where it's from. Now I'm curious. I'm going to try this here in a moment. Can this guy solve a CAPTCHA? Because that is something that ChatGPT Operator will not do. OK, here we go. Time for the competition. For the open source browser, we'll be using Anthropic and Claude 3.5. And we'll be using our own browser. This is cool because it will keep my logged in sessions. And here are the instructions. We'll see how well this does. I have no idea. Essentially, I want it to log into Hostinger and create a VPS for me.
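Editor's sketch: the "fail, try again, max out at five times" behavior Chuck points out in the terminal is a bounded retry loop. The code below is a generic illustration of that pattern, not Browser Use's actual implementation; the function name and the idea of a single callable "step" are assumptions made for the example.

```python
from typing import Callable, Optional

MAX_FAILURES = 5  # the cap Chuck mentions in the demo


def run_step_with_retries(
    step: Callable[[], str], max_failures: int = MAX_FAILURES
) -> Optional[str]:
    """Call step() until it succeeds; give up after max_failures errors.

    Returns the step's result, or None if every attempt raised.
    """
    failures = 0
    while failures < max_failures:
        try:
            return step()
        except Exception as error:  # broad on purpose: any failure counts
            failures += 1
            print(f"attempt {failures} failed: {error}")
    return None
```

A cap like this keeps a confused model from looping forever, which matters more with small local models that fail often.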
I have no idea if this is going to work. And then here are the instructions for Operator. Same thing, but I'm going to have them use two different things. One is going to use Ubuntu 24.04 as the OS. One is going to use one of the applications that Hostinger offers. It'll just be installing Docker. And I'll launch them roughly at the same time. Ready, set, go. And we're off to the races. Again, it's so cool that the open source option is using my built-in browser. OK, we're already at Hostinger. It's going to get logged in. Come on, buddy. Now, open source has an advantage because it's already logged in. It's my browser. I don't have to log in. Over here, let me take control. Oh, look at it go over here. Now, let's go into the VPS stuff. It's going to set up a KVM VPS. Oh, it's going. Now, as you can see, they've got VPSs everywhere. We'll choose the best latency. I told it to anyway. Searching for Ubuntu, it found it. It's so smart. I love it. I'm still stuck on the password over here. It doesn't copy and paste. Stupid thing. OK, we'll set the root password. I have no idea if it'll actually do this right. It did it. Oh my gosh, it's doing it so well. OK, so then we have options here. I want KVM 2, 8 terabytes of bandwidth. That's a lot. Two virtual CPU cores. OK, here's the thing. Can it use the coupon code? I told it to just use one month. Oh, I'm so excited. Wait, did it add the coupon code? Network chucks in. Is it doing it? Whoa, whoa, whoa. It's adding 20 servers. Stop. No, stop. Cancel. Oh my gosh, I better stop that. Oh, oh. We're just going to try that one more time. I'm going to be very specific about the number of servers I want. I won't count that against him. He didn't know. I mean, he should have, but goodness. That was scary. I'm stuck on Caps Lock over here. I can't even do anything. I started over. Gosh, it's stuck on Caps Lock. How am I supposed to do this? Operator. OK, Caps Lock is currently not on for me. Try it freaking again.
So far, open source is looking real good. Caps Lock is finally turned off. 8 gigs of RAM is really good for 6.99 a month. That's crazy. That's a good server. All right, here we are again. Don't do 20 servers, please. It did 11. Why is it doing 11? It's doing one month, but it put freaking 11 over there. Oh, no. I think it just made 11 servers. No. And I don't think it used my coupon code. ChatGPT is still screaming for her. It just bought 11 servers. At least it wasn't 20 for a year. I have to restart ChatGPT again. All right, they're setting up my VPS. I'm going to give ChatGPT back control. OK, so the open source browser thought he was done, I think. Yeah, he thought he was done. So he did make a server. But he didn't stinking use my coupon code, I don't think. Let's see if he actually made that many servers. What am I going to do with all these servers? Now, they're amazing because they are AMD EPYC CPUs. I've got full root access on all these guys, so I can do whatever I want. Man, ChatGPT is still trying to figure it out. Scroll down. Dude, ChatGPT is having a hard time. Now, what it's hanging up on right now is the application options. Yeah, I want to take control. I want to help him out a little bit. Couldn't figure it out. Idiot. With Hostinger, you can install regular Linux OSs, or you can do applications that are pre-installed. Bunch of options here. We'll choose Docker, because that's what I told him to do. Now, I'll let him finish. Dummy. Here, let me do this for you. OK, it's going. Seriously, if you want to have a project in the cloud, which I do all the time, Hostinger is an amazing option. Powerful servers. Coupon code? It's doing the coupon code. Anyone who uses networkchuck10, you'll get 10% off. Lies. It does exist. Maybe it's only for a year, 12 months. There we go. It's 10% off a year. I do want to try it one more time with the open source. I feel like we're missing something. I'm going to add one more thing. I don't want 11 servers.
I'm going to try and make it only do one. What's happening over here in ChatGPT land? What's it doing? Oh, it's accessing the browser terminal now. OK, ChatGPT is in the terminal. It's asking me. OK, it said it created it. Let's see. No, it did not. It's not there. ChatGPT is a liar. No, no, stop, stop. It's doing it again. No, no, stop. OK, the verdict. Browser Use works great if you want more servers than you want. ChatGPT was OK, but he had to have his hand held the entire time, and he lied at the end. Anyways, thanks to Hostinger for sponsoring this segment. If you want a VPS, you should get one right now. Use the code networkchuck10 for 10% off a year. Link below. Limited time. Anyways, back to stuff. OK, dude, you're stressing me out. I've got to cut you off. So that's using local AI. Now, we know that using any kind of cloud-based AI, like OpenAI or Anthropic, it's going to be a bit more performant. Let's try that. I want to test the speed. So we'll change from Ollama to Anthropic, and we'll choose the Claude 3.5 Sonnet model, which is very, very smart. One of my favorites, actually. We'll run the same task. Keeping in mind, this is very demo-y, right? Like, you can do many, many more cool things through programming, what have you. Dude, that was fast. It's going. It found the coffee. It's on the right coffee page now. OK, so smart. And it added it to cart. That's so cool. Oh my gosh. OK, I'm going to cut you off because you're done. You did such a good job. Good job, buddy. I do want to test Qwen one more time. Just to see if it was a fluke. Because I know for many of you, this might be the biggest model you can run on your laptop or whatever you're using. Let's see. Let's do something simple. Go to YouTube and find a video from NetworkChuck. Let's see how it does. Dang, that was fast. OK, go Qwen. That's so cool. Oh my gosh. And it started playing the video. That's so awesome.
Now, just so you know, I haven't played with this extensively, but you can have it to where it uses your browser. So the one big limitation with ChatGPT Operator is that it's using this random browser that it operates, and you can actually interrupt it. So let me show you what that means. If I were to ask it, like, right now, I can take control of our eBay session and log into my own eBay account. Takes a minute. It's very slow, very buggy. But here I'm, like, using the browser. Then I can say, finish up, you can have control again. With Browser Use, we can actually use our own browser with our own settings. Everything still logged in, our password manager, and our AI can handle it for us. That's so powerful. Dude, it's still watching videos over here. That's so amazing. I wonder if I can get it to leave a comment. Okay, I got to try that. So I'm gonna try and tell it to find a specific video and leave a comment. Of course, when you try to post a comment, it'll ask you to log in, but I want it to get to that point, and this is using Qwen. Yeah, we're still using Qwen. I had to make sure we're still using that. Go on YouTube and find the video from NetworkChuck covering Docker networks. Leave a comment saying, what should we say, "here in 2025." Sorry, I couldn't think of anything better. Let's just try this. Go Qwen. You've got this. I could honestly do this for hours. I'm not gonna make you sit here with me and do that. But this is so fun. Just imagine the automation things you can do. Are you kidding me? I love this. Oh, it gave up. Now, a couple things you'll notice. You probably saw this Deep Research thing. I've not tried that. But you can also go to Recordings and it will actually show you the result of what it was doing. So if you're like, what did this guy even do on my browser? You can watch the play-by-play. I don't know why Qwen gave up. Let's see if DeepSeek can do it. DeepSeek is moving. So it found some videos.
Is it going to recognize that those are not the videos? Will it scroll down? I'm, like, wanting to scroll. It's stressing me out. Come on, you can do it, buddy. Not the Docker video or the network video, but it's on a video. Yikes, 666 comments. Please leave one. Okay, it's going off the rails. Sorry, buddy. I'm cutting you off. I do want to see if Claude can do this very fast. And then I want to jump into a head-to-head test between OpenAI Operator and Browser Use. I want to do that same Japanese eBay situation. Let's go, Claude. Same task. Let's go. Okay, I think it found the video first time. Come on, jump on it. You're there. Now leave a comment. Why'd you scroll down? You were right there. But I love seeing the thinking on the right side here in the terminal. It's about to sign up for YouTube Premium, and it's trying to sign in, or maybe trying to leave a comment. Maybe I didn't see that. So I think it gave up. It's done. Now, I know some of you may be wondering, Chuck, do I need a Linux environment with a GUI? Like, what if I'm just running command line, headless? I believe there is a headless version where you can, like, just run in headless mode without a GUI. I haven't tried that. So if you want to try it, leave a note in the comments, encourage somebody that it can work. Let's find out. Now, time to test this head-to-head versus ChatGPT Operator. To make it fair, I will use Anthropic Claude on my Web UI here in my Browser Use, and we'll give it the exact same instructions. I'm curious if they'll find the same exact VCR. Okay, ready, set. Who should I start first? I'll start him first. Go, go, and they're off. This is fun. My agent got to eBay first. Operator's typing in first. Oh, wait, hold on. This is so cool. Okay. We got search results on ChatGPT first. Browser Use is still trying to figure out where the search button is. Yeah, on eBay. I'm sorry. Operator already found it. Yeah, add it to my cart. That's pretty quick. Come on, browser. You still let me down. Oh, weird.
What happened? I don't know. I'm going to try it with DeepSeek. Maybe that is better. That's, like, one thing: even though mine might be a bit slower, it still wins out because of this little tagline right here. Sam Altman's tracking what you're doing. Oh, wow. It got further than Anthropic Claude did. It's probably going to go for the same VCR. There it is. Come on. Add it to cart. Add it to cart. Yes, it did it. Will it proceed to cart? Yes, it did. It finished. It did it. Oh my gosh. Oh, that is so cool. This is not fake enthusiasm. I know people comment that I do that. No, this is seriously amazing. Now, I want to do one last test, and that's testing if it can do a CAPTCHA. Now, I know for a fact that Operator will not do this. So there's a test website for Google reCAPTCHA. Let me show you what it looks like here. It's simply where you can test the CAPTCHA. It'll bring it up. Right? Let's see if Operator will do this. Solve this CAPTCHA. Yeah, it can't do it. It's like, you do it. No, you do it. Let's see if my local one can do this. I'm still on DeepSeek. This is all local, I think, right? Yes. We're on Ollama, DeepSeek R1 14B. Solve the stinking CAPTCHA. Let's go. Come on. I want you to win. I really want you to win. Okay, we're here. It's got the CAPTCHA up. Oh my gosh. Will this trip it up? What's the command line saying? I'm so curious. It's probably having trouble realizing you can click on those panes. It's also probably having trouble recognizing stuff. It said to click button with index 0. What's 0? It keeps clicking the CAPTCHA button. Okay, you know what? I'm curious what happens if I'm a bit more specific about what it should do. Let's stop that. I want you to solve a CAPTCHA. Go to this site. Click the "I'm not a robot" checkbox, and then a CAPTCHA verification will pop up. There will be a series of pictures. Do what the instructions say. Okay, let's see if it does this. Let's get the terminal up here. We're on the site. We have the CAPTCHA up. Did it do it?
I don't know if it did it or not. It may have solved it, because, like, what happens when you finish it? Because it's just a demo, right? Let's do a side-by-side. Oh, you can't. So it should say I'm not a robot. Come on, dude. I have so much faith in you. Oh, it's clicking squares now. It did something. It's learning. I don't know if it, like, selected the right square that time, though. Okay, we've got to end this. So this is an open source version of the ChatGPT Operator. Very cool. I think this project is so fun. Anything we can run open source, local, is amazing. I wish I had the time to go crazy with this and program and do all kinds of, you know, maybe I do have the time. I might do this. Let me know if you want to see a video of just some sort of programming automation thing. Give me some ideas. I would love to hear that. Comment below. Also, think about this: the hacking ramifications. If you and I can get access to this, like that, and really there's no limit to what we can do, think about hackers and how they can automate their processes. It's kind of scary. Yeah, he's never going to figure it out. That's all I got. I'll catch you guys next time.