Visual web scraping using GPT-4o and Make
On his channel, Yang discusses OpenAI's new model, GPT-4o, also known as GPT-4 Omni. This model brings significant advancements, enabling reasoning across different modalities: audio, vision, and text. Moreover, the new version is faster and half the price of the previous GPT-4. In today's video, Yang outlines how viewers can use the GPT-4o model for vision-based web scraping in make.com. The tutorial guides you step by step through a process that is not only simple but also very powerful for AI automations in business and for clients.
As part of this tutorial, Yang takes a screenshot of a selected webpage and then uses GPT-4o to extract the necessary data from that screenshot. Before that, he explains why image-based scraping can be superior to traditional web scraping. The key issue with traditional HTML scraping is that it is not very user-friendly: it requires an understanding of HTML and CSS. Additionally, website updates can break scraping scripts, which is a significant problem.
In contrast, vision-based web scraping, while more costly, offers increased flexibility by eliminating many of the complications associated with traditional methods. Yang demonstrates step by step how to take a screenshot of a specific site, CoinMarketCap, using Dumpling AI, a tool that he runs. While working with screenshots has traditionally been time-consuming and complicated, the new GPT-4o model greatly simplifies the process.
Yang shows how straightforward it can be to send a screenshot to GPT-4o to retrieve specific data, such as the prices of Bitcoin and Ethereum as well as the Fear and Greed Index. A crucial aspect of this process is the use of OpenAI's API, which can be somewhat complex, but Yang offers tips on how to construct the request in the appropriate JSON format. This helps improve the accuracy and reliability of the data extraction.
In conclusion, Yang emphasizes that this is more of a building block for a larger project than a standalone application. It proves useful where a project requires regular updates, or where the data may be encoded in non-text formats. Video statistics on Yang's channel indicate that this material has garnered 57,465 views and 1,187 likes at the time of writing this article. He encourages viewers to ask questions and subscribe for more similar tutorials in the future.
Timeline summary

- Introduction of OpenAI's new model GPT-4o, highlighting its advancements in reasoning across audio, vision, and text.
- Overview of using GPT-4o for vision-based web scraping in make.com.
- Explaining the process of taking a screenshot of a website for scraping.
- Discussion of the limitations of traditional web scraping methods.
- Introduction to vision-based scraping as a solution to traditional scraping challenges.
- Instructions on taking a screenshot of the target page using Dumpling AI.
- Defining specific data points to scrape, such as Bitcoin and Ethereum prices.
- Using the ChatGPT module in make.com to extract data from screenshots.
- Constructing JSON-format requests in OpenAI's playground.
- Demonstrating how to retrieve and verify the scraped values.
- Summary of vision-based scraping and its applications in different data extraction scenarios.
- Closing remarks inviting viewer interaction and comments.
Transcription
So OpenAI recently came out with a new model called GPT-4o, or GPT-4 Omni. This model is a huge step forward in that it can reason across different modalities, so audio, vision and text. And best of all, it's a lot faster and it's also half the price of GPT-4. So in today's video, we're going to go through how you can use the new GPT-4o model to do vision-based web scraping in make.com. This should be a pretty basic and quick tutorial, but it's also super powerful. You can use this as a building block in your AI automations for your own business or for your customers. So at a high level, what we're going to do is find a website that we want to scrape and take a screenshot of it. Then we're going to pass that screenshot into GPT-4o and get it to extract the data that we need. Now, before we do that, you might be wondering, why would we even want to do this? What's wrong with normal web scraping, where we just get the HTML, look for CSS selectors and pull out the data manually, or even markdown-based web scraping, where you use a tool like Dumpling AI, get the markdown and then use an LLM to extract it? So the problem with the first one, just getting the HTML and using CSS selectors, is, one, it's not very easy or user-friendly. You have to be somewhat familiar with HTML and CSS in order to do it. And two, as websites get updated over time, your scraping will break. So you might, for example, scrape a certain value based on a CSS selector. This is not the best example, but say you're scraping based on class names, something like this: sc-90s, blah, blah, blah. If the website gets updated so that that class changes, your scraping will break. The second approach is a bit more robust: using something like Dumpling AI to get the markdown from a page and then getting a large language model, like GPT-3.5 or GPT-4, to extract the data.
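The fragility described here can be sketched in a few lines. The markup and class names below are invented purely for illustration; they are not taken from CoinMarketCap:

```python
import re

# Hypothetical markup before and after a site redesign.
old_html = '<span class="sc-90s abc">$67,000</span>'
new_html = '<span class="sc-x1y zzz">$67,000</span>'  # class name changed

def scrape_price(html):
    # Hard-coded class selector, the brittle approach described above.
    m = re.search(r'class="sc-90s[^"]*">([^<]+)<', html)
    return m.group(1) if m else None

print(scrape_price(old_html))  # $67,000
print(scrape_price(new_html))  # None — the selector broke with the redesign
```

The second print shows the failure mode: nothing about the data changed, only the styling class, yet the scraper silently returns nothing.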
That's a bit more robust. It won't break from CSS changes or design changes on the website, but there are cases where the data you want to scrape might not be in text format. It might not be in the HTML, so you can't actually convert it to markdown; it might be in an image or a screenshot or something like that. And the other thing is it does take a bit more work; sometimes you have to fiddle around with the markdown a little. So vision-based scraping solves all of those. The downside with vision-based scraping is that it's a little more expensive, but we can see the trend: GPT-4o is half the price of GPT-4, and we've got models like Anthropic's Claude 3 Haiku, which is also really cheap and has a vision input option. So we can see that the trend is downwards. So I'm just going to show you today how you can use these new models that accept image input to do web scraping. It all starts with getting a screenshot of the page. Now, you can't really do this natively in make.com, so you're going to need a third-party provider. I'm going to use Dumpling AI, which is the product that I run; we do have a screenshot option. So I'm going to drag in Dumpling AI, scrape URL, and I'm going to do CoinMarketCap. You could probably do this with markdown-based scraping, but just for the purpose of this tutorial, I'm going to use a screenshot and do image-based or vision-based web scraping. So I'm going to put in this URL and select a screenshot instead of markdown. Usually what would happen is Dumpling AI will go to CoinMarketCap, strip out all the useless HTML and things you don't need, and return very nicely formatted markdown. In this case, we're not even going to do that. We're just going to do a screenshot, and we're going to run that. What this is going to do is Dumpling AI will go to CoinMarketCap and take a screenshot. So whilst it does that, let's talk about what data we actually want to scrape, right?
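Outside make.com, the screenshot step is just an authenticated HTTP call to a scraping service. The sketch below is purely illustrative: the endpoint URL, payload fields, and response shape are placeholders, not Dumpling AI's actual API:

```python
import json
import urllib.request

# Placeholder endpoint — not Dumpling AI's real API path.
ENDPOINT = "https://api.example.com/scrape"

def build_screenshot_request(page_url, api_key):
    # Ask the (hypothetical) service for a screenshot rather than markdown.
    payload = {"url": page_url, "format": "screenshot"}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_screenshot_request("https://coinmarketcap.com/", "sk-demo")
print(req.get_method(), req.full_url)
```

In the make.com scenario this whole step is handled by the Dumpling AI module, which returns a screenshot URL that later modules can reference as a variable.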
So as an example, maybe we just want to scrape the price of Bitcoin and Ethereum, right? So maybe we want to do: what is the price of Bitcoin? What's the price of Ethereum? What is the market cap? All that sort of stuff. And potentially we want to scrape the Fear and Greed Index as well, just to make it a bit more interesting. So let's do the Fear and Greed Index, Bitcoin and Ethereum. So let's just quickly see if this has worked. We've got this screenshot URL now, so let me open that. It's very annoying to copy URLs in make.com, but there we go. It's taken a screenshot of the page and you can see it's got Bitcoin, Ethereum, all that sort of stuff, and the Fear and Greed Index here. So in order to scrape this with GPT-4o, GPT-4 Omni, you have to actually use the ChatGPT module in make.com. But right now you can't use the make-a-completion option, because make.com hasn't actually updated their ChatGPT module yet to support GPT-4o. So you can choose GPT-4o here, but if you look in the messages, you can't actually select the message content to be an image, which is why you can't use this. What you actually need to do is use the more basic make-an-API-call option in the ChatGPT module. Now, this might be a bit intimidating and confusing, so I'm going to show you a quick hack on how you can use it very easily. What you're going to want to do is go to the playground in OpenAI. Here in the chat playground, you can actually construct what you want and then export it as JSON, which you can then paste into make.com. So let's do that. Maybe I'll say: you are a data gatherer. The user will provide you with a screenshot of crypto data. You will respond with the Bitcoin price, the Ethereum price and, what was the last thing we wanted? The Fear and Greed Index. And I'm going to ask it to respond in JSON format: respond in the following JSON format.
And what we're going to do here is Bitcoin price for one. So let's do a float. A float, for those of you who don't know, is like an integer but with decimal places; that's a simple way to explain it. We're going to do the same for Ethereum. And we're going to do the same for, what was it called? The Fear and Greed Index. This doesn't actually need to be a float; this can be an integer. Cool. And then what we need to do now is pass in the image. So OpenAI has an amazing feature where you can actually just link to the image. You don't have to convert the image into a data format like you have to do with Anthropic. So you can just paste in the image URL, and let's run this and see if it even works. So I've given it a system prompt and it's just responded with the data. Let's see if this is right. So the Bitcoin price matches, the index is 52 and 52, and Ethereum is 2902.93 and 2902.93. So it's extremely accurate and it's worked really well. So now how do we do this in make.com? What I'm going to do is delete the assistant response, because I don't want to copy it. I only want to copy the prompt and me putting in the image URL. So I'm going to go here, to the top right corner, and there's this JSON option. I'm going to copy all this JSON and go back into make.com. Again, this is the make-an-API-call option. I'm going to go to the body and just paste this in. So this is the full request for GPT-4o, and you can see here the URL. This is the URL for the screenshot that I pasted into the playground. But in reality, we don't want that to be hard-coded; we want it to be a variable, the screenshot URL we get back from Dumpling AI. So I want to put the screenshot URL in here as a variable. Cool. And then the final piece we need to do here is set the URL and the method.
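The JSON that the playground exports is a standard chat completions request body with the screenshot passed as an `image_url` content part. A sketch of what gets pasted into the module's Body field, assuming illustrative field names (`bitcoin_price`, etc.) in the requested schema, which are not quoted verbatim from the video:

```python
import json

# Placeholder: in make.com this becomes the Dumpling AI screenshot variable.
screenshot_url = "https://example.com/screenshot.png"

body = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "system",
            "content": (
                "You are a data gatherer. The user will provide a screenshot "
                "of crypto data. Respond in the following JSON format: "
                '{"bitcoin_price": float, "ethereum_price": float, '
                '"fear_and_greed_index": int}'
            ),
        },
        {
            # The image is passed by URL — no base64 conversion needed.
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": screenshot_url}}
            ],
        },
    ],
}

print(json.dumps(body, indent=2))
```

In the make-an-API-call module, the hard-coded `screenshot_url` value is replaced by the variable mapped from the Dumpling AI step.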
So if you go back to the playground, you'll see it says POST v1/chat/completions. So we're just going to copy that. The URL will be v1/chat/completions and the method will be POST. Awesome. So once we've done all that, let's press OK and run everything through from the beginning. Actually, before we do that: we're getting an output here in JSON format, right? So if we want to use it in make.com, we actually need to parse the JSON. Let's just do that for completeness. What I'm going to do now is add a data structure. There's a nice generate option here, so I'm going to generate and see if that works. It did not work. So let's try one, two, three. And this one, it's 50. Let's clean this up a bit. All right. Nope. Sometimes it might just be easier to input this manually rather than trying to get the generator to work; it's quite temperamental. Okay, you know what? Let's just do it manually. So we've got the Bitcoin price, and that is a number. We have, I think it was, the Ethereum price; this is also a number. And then we have the fear and greed index value, which is also a number. So hopefully we've done that correctly. I'm going to put in the JSON string as what OpenAI comes back with, but we don't have that here yet because we haven't run this module yet. So I'm going to put this in first as a placeholder. This is going to fail, but let's just run through first so we get all the values. So let's run everything. It's going to go to CoinMarketCap, take a screenshot, and run GPT-4o, GPT-4 Omni, which will extract the data from the screenshot. And then we're going to parse it with this JSON data structure. It's taking the screenshot; let's give it some time. And then GPT-4o is now running. We've got the output. Let's have a look. So in the body we've got our choices, and the output is here: the Bitcoin price, the Ethereum price, and the fear and greed index value.
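The parse step exists because the model's answer comes back as a JSON string nested inside the chat completions response, not as structured fields. A minimal sketch, using a hypothetical response with made-up values:

```python
import json

# Hypothetical chat completions response; real values change every run,
# since CoinMarketCap updates in real time.
response = {
    "choices": [
        {
            "message": {
                "content": '{"bitcoin_price": 67012.50, '
                           '"ethereum_price": 2902.93, '
                           '"fear_and_greed_index": 52}'
            }
        }
    ]
}

# The answer lives in choices[0].message.content as a JSON *string* —
# this is exactly what make.com's parse-JSON data structure decodes.
data = json.loads(response["choices"][0]["message"]["content"])
print(data["ethereum_price"])  # 2902.93
```

Once parsed, each field (all numbers, matching the data structure defined above) can be mapped into later modules individually.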
So you can see it's changed a bit, because CoinMarketCap is real time, so the price is always changing, which is why the values here are different to when we ran it a few minutes ago in the OpenAI playground. Then finally, this last module failed as we expected, because we weren't passing in the actual JSON. But now we have these options, so we can just map them. There we go. And now if I run it, it will all work, but there's not much point running it again since we've already run it. So yeah, that's basically how you do vision-based scraping with the new GPT-4o model. This is more of a building block than a project you would sell to your customers or build yourself; it's just one of the building blocks for a larger project. So as an example, maybe you or one of your customers need to scrape a website where the design regularly changes. Or maybe some of the data you want to scrape is not actually in text format; it's in an image or something like that. This sort of approach would work for that. Another common use case I see isn't even web scraping: maybe you have invoices or that sort of thing where you need to extract data from a document. You can pass screenshots of those documents or PDFs into GPT-4o and get it to extract information in a similar way to what I've done here. But yeah, keen to hear what you would build with a building block of vision-based scraping. That's it for this video. If you have any questions or thoughts, feel free to leave a comment below; I try my best to get back to everyone. If you like this tutorial and want to see more like it, please like and subscribe. It really does help me out a lot.