Can AI agents kill web browsers? (film, 15m)
The channel All Things Open presented a talk on the future of the web. The speaker, who used to live in the area, reflected on a career that included work with Mozilla, Microsoft, and other projects in the web technology space. He disclosed that he currently works on AgentQL, a piece of AI technology that turns any website into an API for automation and data extraction. Although the presentation touched on these innovations, the speaker focused on the current state of the web. He pointed to the phenomenon of 'enshittification,' which describes the lifecycle of web platforms: growing monetization and a deliberate lack of interoperability. As a result, users become captives of social platforms and closed systems, leading to chaos and frustration. New solutions, such as agents, may help counter this.
Timeline summary
- Introduction and thanks to the staff.
- Speaker reconnects with the audience.
- Discussion on the future of the web, introducing the concept of agents.
- Speaker's background in web development and AI technology.
- Speaker reminisces about early web experiences.
- Overview of the friendly early internet.
- Introduction of the concept of platform monopolies.
- Explanation of 'enshittification', a term coined by Cory Doctorow.
- Critique of platforms for lacking interoperability.
- Speaker describes workflow challenges in a multi-platform environment.
- Questioning the reliance on a browser built by an advertising company.
- Proposal for turning web pages into API endpoints.
- Introduction of existing technologies for creating agents.
- Emerging automation tools and their implications.
- Discussion on the future capabilities of personal agents.
- Explanation of the Model Context Protocol (MCP) for AI.
- Exploration of OAuth for agent authentication.
- Overview of the rebirth of RSS as AT Proto.
- Importance of LLMs in content curation.
- Concluding thoughts on a future where platforms and personal agents evolve.
- Final reflections on the value and accessibility of information.
Transcription
There we go. Wow, great changeover. Many thanks to the Carolina, the staff here at the theater, for making this possible. I've not had such a smooth plug-in before. So hi there. I'm curious. I used to live here eight years ago. How many of you actually know me? Put your little paw in the air. Okay, great, awesome. It's good to reconnect with you. I've been gone for a while. I ended up in Europe. I'm here to talk with you about the future of the web. Spoiler alert, it's agents. But let's see. I promise I'm going to tell you the deets.
All right. So I've worked with Mozilla on Firefox DevTools, W3C on web standards, Microsoft's Edge browsers, and even on the React team. So I really love the web. Full disclosure, I now work on AgentQL, which is a little piece of AI technology that turns any website into like an API surface for automation and data extraction. Useful for automations and for agents, but I'm not going to talk to you about a product today. I want to talk with you about the web, the web that I loved growing up.
I made web comics in my teens. True story. I was a cartoonist. Didn't get to do computer science at school. Kind of regret that. And this is my first website on Drupal, an open source and wonderful CMS. It was a website for other teenagers making comics, and that's how I got started in development and engineering. I loved this web. It changed my life. It opened the doors to the future for me. I always forget which one's forward and which one's backward. Bear with me.
The internet. It started as a network of computers you could use to keep in touch with family, make new friends, chat, play games, share photos. I remember it as a friendly web, not a disorganized or violent web. Well, except for some of those chat rooms. But then something happened. Platforms happened. Facebook, Instagram, Amazon. People moved off their blogs and forums and onto platforms where they were held hostage. The enshittification happened.
Now, this term, very scientific, coined by Cory Doctorow, describes the life cycle of platforms from Amazon to TikTok, and let me break it down for you right here. So, phase one. Acquire the most users by directing surpluses to them until they're locked in. You spend money. Give them a bunch of free videos. Give them a bunch of free access. Come on, everybody, join the party. Then, attract suppliers by directing the surpluses to them. We're going to have a creator's program. We're going to create words, blah, blah, blah. And then, you've got both the suppliers and the consumers in one place. You lock them in. You start diverting those surpluses over to the shareholders.
So, platforms, they promise ease of use. They promise great content. But it turned out that we went from free and open to gated and monetized. To keep users captive, they purposely lack interoperability. Holding content hostage and people hostage with pricey or no APIs, and obfuscating data, hiding it behind authentication. So, yeah, now we all live in a bunch of gardens.
So, this means that to get anything done, and this is actually one of my workflows, which has gotten worse and worse over the years, we have to flip between a bajillion different platforms. This is me just trying to keep up with web agent news from the agentic community. A lot of copy and paste and data entry. And if we want an aggregated experience, a one-stop shop, either we have to get other users and suppliers onto the platform we're using, or we have to pay yet another platform, like Buffer, to give us that aggregated experience. The enshittification runs deep.
The internet is a vast sea of unstructured content with spotty and clunky interop at best. So, why are we still using a browser built by an advertising company to navigate it? Wow, I didn't expect that one to get a cheer, but I guess it's the right audience. Don't we need something more intelligent? Okay, I told you it was spoilers, agents.
All right, what if we could turn any web page into an API endpoint, truly free that information? What if we had a way to navigate, aggregate, summarize, and interact with all that unstructured content without a platform, without even necessarily opening a browser? Our agents, they could visit any site, no ads, no trackers, and interact on our behalf, bring us just what we need to see, curate using our own personal algorithms as opposed to ones dictated to us by a platform owned by who knows who.
Today, we actually have this technology. I'm not painting some far-off picture. I'm telling you, you can literally go home and spend some time and build something like this, and I hope you will. We've already built the technologies that allow agents to do these things, and most of it comes from tools built for scraping and testing. Anybody here do scraping? No shame, no shame. There you go. You're freeing the information. Anybody here engage in testing? Sweet, okay, good company. This is not the most glamorous work on the internet, but it is perhaps the most difficult. These machines break down a lot, and they're a pain in the butt.
Scraping tools used to be imperative and rely on like fragile DOM selectors and XPath, but there's this new class of data extractors, one of which I happen to work with. They are declarative, and they allow LLMs to evaluate an entire page to find the content or the items on the page that need to be interacted with. Wow, amazing. Tools like Puppeteer and Selenium WebDriver, they've been used for testing and automation for years, and now these smart selector tools I just showed you, you combine them with automation libraries like Playwright, and suddenly you've got something that can interact with web pages via prompts.
Now sites, of course, have been implementing anti-bot measures in a cat-and-mouse game with scrapers for a long time, to fend off, well, bots, etc.
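The contrast drawn above between fragile selectors and a declarative query can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample page, the field names, and the class-name hints are all invented, and a simple hint matcher stands in for the LLM that the declarative extractors mentioned in the talk would use to locate content.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag; skipped so the stack stays balanced.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TextByClass(HTMLParser):
    """Collects the text of each element, keyed by each of its class names."""

    def __init__(self):
        super().__init__()
        self._stack = []   # class lists of the currently open elements
        self.found = {}    # class name -> list of text fragments

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append(classes)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            for cls in self._stack[-1]:
                self.found.setdefault(cls, []).append(text)


def extract(html, query):
    """Resolve a declarative query: output field -> acceptable class hints.

    The caller states WHAT it wants; the matcher decides where it lives.
    Here the matcher is a naive hint lookup (a stand-in for an LLM).
    """
    parser = TextByClass()
    parser.feed(html)
    return {
        field: next((parser.found[h][0] for h in hints if h in parser.found), None)
        for field, hints in query.items()
    }


# Hypothetical page and query, invented for this sketch.
PAGE = """
<article>
  <h2 class="headline">Agents are coming</h2>
  <span class="byline author">A. Writer</span>
  <p class="summary">Browsers may change.</p>
</article>
"""
QUERY = {
    "title": ["title", "headline"],
    "author": ["author", "byline"],
    "price": ["price"],
}

record = extract(PAGE, QUERY)
print(record)
```

The query names the fields it wants rather than a path through the DOM, so a layout change that renames or moves an element only requires a new hint, not a rewritten traversal. A field with no match comes back as `None` instead of raising.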
But these scrapers have also upped their game, creating things like IP rotation, proxy networks, credential sharing, CAPTCHA solvers. Services like these, they exist for all of these. We've got CAPTCHA solvers, smart proxy, and my particular favorite, ZenRows, which slices through, oh, I didn't get to ZenRows. ZenRows, which is my favorite because it slices through most defenses like a knife through butter.
All right, these tools, they're already coming together to create on-browser automations. This is one from Opera. This is Opera's answer to Operator. I expect we're going to see browsers attempting to extend into agentic use cases and automations using this sort of technology, but I feel like that's kind of their death throes. They're gonna try to extend it, but probably we're gonna end up with something very different. Okay. Platforms can't hold us hostage if we send our agents to fetch our data, or if we directly share it with each other.
Hmm, let's take a look at what's happening next. We know where we are now, but what's coming along the line to help bring about this future? Okay. Come on little clicker, you can do it. How exactly are our agents going to access content from people, data, and services? Well, it's a solved problem. Have you heard of MCP? Raise your little paws. Yeah, everyone's talking about MCP. For those of you who don't know what MCP is, it stands for Model Context Protocol. It is an open standard for AI assistants to communicate with data sources, and if you're wondering what's the enterprise story, well, you should talk to our friends from IBM. They're working on something, I believe it's called ACP, which is an extension to this that includes a little bit more for the enterprise side of things. Now, I think sites will be incentivized to make MCP servers available versus APIs because of the SEO effect.
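To make the MCP description a little more concrete: MCP messages are JSON-RPC 2.0, and a client discovers and invokes a server's tools with the `tools/list` and `tools/call` methods. The sketch below only builds those request messages. The tool name `search_articles` and its arguments are hypothetical, and the initialization handshake and transport that a real session needs are left out.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry a unique id so replies can be matched


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object of the kind MCP clients send."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Invoke one tool. "search_articles" and its arguments are invented here;
# a real server advertises its own tool names and input schemas.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "search_articles", "arguments": {"query": "web agents"}},
)

wire = json.dumps(call_tool)  # what would be sent over the transport
print(wire)
```

A site that publishes an MCP server is in effect publishing the list that `tools/list` returns, which is what an agent would read in place of a rendered page.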
MCPs are a way of making your site's services and content available for ingestion by LLMs or people's personal agents, that is, LLMs with tools. If you remember, SEO took off because everyone wanted to rank high in Google. Well, if people are using personal agents, people who are providing things are going to invest in making their content available via MCP to be ingested by those personal agents. Ba-boom.
Are you worried about authentication? Maybe your agent won't be able to authenticate and act on your behalf. Well, I have a completely boring example for you. It's just OAuth. Granular permissions and scopes in OAuth are perfect for agent access patterns. Technology exists today. Adopt it.
Now, when it comes to subscribing to your friends, maybe you remember RSS, the format for sharing content as web feeds. RSS use declined as social media platforms came to power. We all remember the death of Google Reader in 2013. Maybe we don't, but anyway, I think that was when RSS was declared archaic. But it's being reborn as the Authenticated Transfer Protocol, aka AT Proto, a decentralized protocol for large-scale social web applications currently in use by Bluesky. All right, so you don't have to all be on Twitter or on Facebook now to access each other's social feeds.
All right, somebody says, well, you've just described an endless scrolling hellscape of content from my friends. I actually did want Mark Zuckerberg's algorithm to just show me what I wanted to see. I understand. Information overload is a challenge in our modern environment. Fortunately, as my friend Laurie Voss says, and this is Laurie Voss from LlamaIndex, LLMs are good at transforming text into less text. Well, that's great because we have too much text right now. On-device LLMs are going to become standard. Chrome's already working on some APIs and web standards that do rely on on-device LLMs.
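Returning briefly to the OAuth point made earlier in this passage, the idea of granular scopes for agents can be sketched as a simple check. In OAuth 2.0 the `scope` value is a space-delimited list of strings; everything else here, including the action names and scope names, is hypothetical and invented for the illustration.

```python
# Hypothetical mapping from agent actions to the scopes each one requires.
REQUIRED_SCOPES = {
    "read_feed": {"feed:read"},
    "post_reply": {"feed:read", "posts:write"},
}


def parse_scope(scope):
    """Split an OAuth 2.0 scope string (space-delimited) into a set."""
    return set(scope.split())


def is_allowed(action, granted_scope):
    """An action is allowed only if every required scope was granted."""
    return REQUIRED_SCOPES[action] <= parse_scope(granted_scope)


# The user delegated read-only access to their agent.
granted = "feed:read"
print(is_allowed("read_feed", granted))   # the agent may read
print(is_allowed("post_reply", granted))  # but may not post
```

The design choice is that the agent never holds more authority than the token states, so a user can let an agent read on their behalf without also letting it write.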
And I think we're going to see more and more of them cropping up as they become smaller and less energy-sucking to run.
Okay, now somebody, and this should be the question you're starting with, is like, what about accessibility and design? How can we create a web experience that's even better for more people? Well, we got you there. Come on, baby. Ambient agents will invisibly adapt to our needs. Here's an example for you. Imagine you receive a high-priority piece of content, breaking news, while walking, and the agent repurposes this content into an audio file, a podcast, to play it in your ear. So you don't have to look down at your phone and get hit by a bus. This is an example of an ambient UX that's responding to your context. This goes both ways, though. Converting a podcast to an article you can read when you're disconnected on the subway. But think about this as well. It also means that all content can be accessible by default, reformatted into a format that a person can actually perceive. No more guilt and shame journeys for forgetting the WAI-ARIA tags. Content will just naturally flow into whatever it is that user needs, customized based on their abilities and desires. Yes.
Then there's adaptive UX. UI that can be generated on the fly. We already see this with projects like v0 from Vercel or Bolt.new. This example is LCARS from Star Trek. It adapts to whatever information the person requests or needs to see on the fly. This means we don't have to create every single piece of UI and anticipate all the edge cases. We're gonna let the agent figure out how to display the content as we need it. Spin up a player, spin up a gallery, whatever needs to be seen at that time.
Okay. Ten minutes is really hard. I'm sorry. I'm running a little over. What do I think is gonna happen next? Well, I suspect that platforms, browsers, and app stores are not the vehicles of the future. We're not developing a killer app.
I suspect something like a personal agent is going to overtake them all and it will be a unified platform, you know, the one platform to rule them all. It's something that many companies have fought over and the web has been a part of that fight for a long time. And I think now I see a great unifying force on the horizon. Why expose yourselves to a hundred apps doing God knows what with your information when you can use just one completely under your control with an algorithm that's adapted to you? I see in the future a return to personal feeds, a more accessible interface for more people with more modalities, and a smaller, less platformized, cozier web.
Now, we've all heard information wants to be free and that's what I grew up with, but it's actually a very small paraphrasing of a much larger quote that's very important. This is Stewart Brand, the author of the Whole Earth Catalog. I remember my mom had one of those. I don't know if you've ever heard of it before, but it was pretty popular in the 1970s, I hear. Information wants to be expensive because it is so valuable. Literally all the platforms on the web right now are data brokers, your data usually, and the right information in the right place can just change your life. On the other hand, information almost wants to be free because the cost of getting it is getting lower and lower all of the time.
Anyone can access information on the internet now. There's no way to stop us. AI has blown the lid off of data monopolies. So it's available to us, not just to big tech, not just to people who understand how to scrape data. The real question I have for you is what are you going to build with it? Thank you.