Recaptcha as a Tool for Surveillance - How Google Tracks Users? (film, 18m)
The CHUPPL video examines reCAPTCHA, the widely used tool that is supposed to protect websites from bots. The host asks how it can recognize human activity without traditional methods like reading distorted words or selecting images. The discussion then turns to how reCAPTCHA has allegedly become a mechanism for mass surveillance and corporate greed. With its presence on about 12.5 million websites, many of us encounter it daily, often unaware of its darker implications.
As the investigation unfolds, the host uncovers the myriad of side effects contributing to the surveillance of U.S. citizens. Controversial topics arise, including secret FISA orders and the links between reCAPTCHA and the NSA. Interestingly, reCAPTCHA is not very effective at stopping bots, yet it collects users' mouse movements and interactions, enabling Google to gather extensive information about their online activities. This phenomenon prompts questions regarding our identity in the eyes of corporations and the potential consequences of using such ubiquitous tools.
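The mouse-movement idea described in the video (scripted cursors travel in a straight line, human hands wobble) can be illustrated with a small sketch. This is not Google's actual method, which is not public; the function names, the sample paths, and the 0.999 threshold are all hypothetical choices made for illustration.

```python
import math


def path_straightness(points):
    """Ratio of endpoint distance to total distance travelled.

    1.0 means a perfectly straight path; lower values mean more wobble.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if travelled == 0:
        return 1.0  # cursor never moved; treat as perfectly straight
    return math.dist(points[0], points[-1]) / travelled


def looks_scripted(points, threshold=0.999):
    """Flag a path as scripted if it is (almost) perfectly straight.

    The threshold is an arbitrary illustrative value, not a known constant.
    """
    return path_straightness(points) >= threshold


# A naive macro: linear interpolation from (0, 0) to (100, 100).
straight = [(x, x) for x in range(0, 101, 10)]

# A hand-made, slightly wobbly path between the same two endpoints.
wiggly = [(0, 0), (12, 7), (25, 31), (41, 38), (55, 60), (73, 66), (88, 91), (100, 100)]

print(looks_scripted(straight))  # straight-line path
print(looks_scripted(wiggly))    # wobbly path
```

A single ratio like this is easy to fool by adding artificial jitter, which is consistent with the video's point that the checkbox alone is a weak bot filter.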
The video also argues that interacting with reCAPTCHA amounts to unpaid work for Google. The study cited in the video estimates that users have spent more than 819 million hours solving CAPTCHAs, equivalent to $6.1 billion of unpaid labor. A class-action lawsuit filed in January 2015 over this practice was nonetheless dropped, in large part because the time involved for each user is extremely small.
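As a rough sanity check on those two figures, the hourly rate they imply can be back-calculated. The sketch below assumes the totals are exactly 819 million hours and $6.1 billion (the video quotes both as lower bounds) and does not reproduce the study's own methodology, which the video does not describe; the names are hypothetical.

```python
# Totals as quoted in the video; both are stated as "more than", so treat
# the result as an approximation.
HOURS_SPENT = 819_000_000
LABOR_VALUE_USD = 6_100_000_000


def implied_hourly_rate(total_usd, total_hours):
    """Dollar value per hour implied by a total value and a total duration."""
    return total_usd / total_hours


rate = implied_hourly_rate(LABOR_VALUE_USD, HOURS_SPENT)
print(f"Implied rate: ${rate:.2f} per hour")  # prints "Implied rate: $7.45 per hour"
```

The implied rate of roughly $7.45 per hour shows the estimate values each user's time at a low hourly wage; the large total comes from the sheer number of hours, not from a generous rate.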
The host notes that Google says data collected through reCAPTCHA is not used for personalized advertising, and argues that this only raises the question of what it is used for. A closer look at Google's terms of service reveals ambiguous phrases, such as "general security purposes", that could let the company use the data for loosely defined ends. Combined with government data requests made under secret court orders, this raises significant privacy concerns and the potential for abuse by various entities.
Lastly, it’s worth noting that the CHUPPL video currently stands at 228,314 views and 17,000 likes, indicating substantial interest in the topic. Viewers are not just curious but also concerned about the implications of tools they engage with daily. This video encourages reflection on how corporate and governmental entities may observe us and what the future holds for our privacy.
Timeline summary
- Introduction to the question about how reCAPTCHA operates.
- Explanation of reCAPTCHA's presence on the web.
- Questioning reCAPTCHA's effectiveness at stopping bots.
- Revelation that reCAPTCHA's narrative is misleading.
- Investigation into mouse movement tracking by reCAPTCHA.
- Discussion on the tool's controversy, linking it to corporate greed.
- Query about a robot's incapacity to tick the box.
- Challenge with the antitrust trial context.
- Detailing how mouse movements are analyzed.
- Assertion that reCAPTCHA has shifted to spyware.
- The complexity of data collection through reCAPTCHA.
- Mention of secretive legal frameworks surrounding surveillance.
- Questioning Google's data collection practices.
- Describing reCAPTCHA's background functionality.
- Direct warning about reCAPTCHA's surveillance nature.
- Insight into data brokers and their practices.
- Discussion of a class action lawsuit against Google.
- Analysis of Google's ambiguous data use terms.
- Explaining the implications of government data requests.
- Final thoughts on reCAPTCHA's implications beyond bots.
- Concerns about the future of data privacy.
Transcription
How does this checkbox know that I'm not a robot? I didn't click any motorcycles, traffic lights, I didn't even type in distorted words, and yet it knew. This infamous tech is called reCAPTCHA, and when it comes to reach, few tools rival its presence across the web. It's on 12 and a half million websites, quietly sitting on pages that you visit every day. And it's actually not very good at stopping bots. Which of course led me to the question, if it's not stopping bots, what is it doing? This simple question sent us down a rabbit hole that was deeper and more complicated than we ever imagined. The box test isn't really about the box. A single checkbox. It turns out reCAPTCHA isn't what we think it is. And the public narrative around reCAPTCHA is an impossibly small sliver of the truth. And by accepting that sliver as the full truth, we've all been misled. Facts, your mouse movement. For months, we followed the data. We examined glossed over research, and uncovered evidence that most people don't know exists. This isn't the story of an inconsequential box. It's the story of a seemingly innocent tool, and how it became a gateway for corporate greed and mass surveillance. We found buried lawsuits, whispers of the NSA, and echoes of Edward Snowden. This is the story of the future of the internet, and who's trying to control it. Why can't a robot work out how to tick a box marked, I'm not a robot? In the largest antitrust trial in the United States in 25 years. Let's start here. What you've been told is wrong. Journalists told you such a small sliver of the truth, that I would consider it to be deceptive. Why can't a robot just click, I'm not a robot? The box test isn't really about the box. It actually tracks your mouse movements right before you check the box. See, if you run code to make an object move to a certain point, like a cursor, the simplest version will make it move in a straight line. Robots move like this. But humans naturally aren't that accurate. 
We don't move in a perfectly straight line. But humans move kind of like this. Humans tend to move their mice in wiggly, imperfect ways. And that is what this version of CAPTCHA was looking for. Okay. Mouse macro software. Find image. Look for the checkbox. Move mouse to checkbox. And make it click. Play. I mean, it's so easy. It's so easy to test this. Okay, you're probably thinking, why does any of this matter? I agree with you. I did agree with you. I actually halted this investigation for a few weeks because I thought it was quite boring. Until I went to renew my passport. Passportstatus.state.gov I got a CAPTCHA. Not a checkbox. Not fire hydrants. But the old one. And I clicked it. And it took me here. ReCAPTCHA seems to have become a spyware. It might be also that its primary purpose is doxing of US residents and spying on everyone else. I don't know what led me to do it. It felt like my mouse was moving itself. An entire page dedicated to documenting the horrors of ReCAPTCHA. Alleging national security implications for the US and foreign governments. Its ability to dox users. Mentioning secret FISA orders. The same type of orders that Edward Snowden risked his life to warn us about. It is one of the most secretive places in the world. Who put this together? Anonymous. If you're a webnative journalist looking to get in touch, we doubt you're going to have a hard time figuring out who we are anyway. This felt like a key. Left in plain sight, whispering, There's a door nearby, and it's meant to be open. This is what we're good at. This is what we do. It felt like someone on the other side already knew that. As if they'd been waiting for someone to come along and notice. Okay, let's get this out of the way. 
Recaptcha is not, and really has never been, very good at stopping bots. Recaptcha was founded in part by this guy in 2007. Version 1 looked like this. In 2009, Google bought Recaptcha. But in 2012, hackers were able to get bots through with a 99.1% success rate. V2 dropped in 2014. It was the first time we saw the infamous checkbox. This is also when it allegedly became spyware. In 2017, version 2 was cracked with an 85% success rate. The code to do so was made public, and still works to this day. 2018 was the launch of V3. According to researchers at UC Irvine, there's practically no difference between V2 and V3. A few months after the launch of V3, it was beaten with a 97% success rate. Google doesn't tell us how Recaptcha works. Besides using an advanced risk analysis engine. But these hackers, they spelled it out. Do you believe that Recaptcha should be thrown out? Yeah, I think it's time to deprecate this technology. This is Dr. Andrew Searles, and he's the lead author of this study. And so I essentially developed a mirror of Recaptcha that would make you have to solve multiple in a row and would tell you that you're wrong regardless of what you said. And showed some of the data that you can actually scrape. You can scrape so much data from somebody's user interaction from a website. Whoa, this has been my big question. What data is Google collecting? They can collect all sorts of data, right? Any information, any keystroke, any click, IP address, user agent string, all the websites, all the browsing history, cookie information. Recaptcha takes a pixel-by-pixel fingerprint of your browser. A real-time map of everything you do on the internet. I think it's like 10 million websites employ this technology. They essentially get access to any user interaction on that webpage. Are you saying that Google can see anything that anyone is doing on any website that has Recaptcha embedded in it? There's a very good chance. They have that capability, is what I will say. 
Recaptcha doesn't need to be good at stopping bots because it knows who you are. The new Recaptcha runs in the background, is invisible, and only shows challenges to bots or suspicious users. If there's any part of this video you should listen to, it's this. Stop making dinner, stop scrolling on your phone, and please listen. When I tell you that Recaptcha is watching you, I'm not saying that in some abstract, metaphorical way. Right now, Recaptcha is watching you. It knows that you're watching me, and it doesn't want you to know. It's not just Google. Now there is this whole sketchy market about brokers, advertising brokers that like to purchase large amounts of people's information. These brokers have information on probably all of us. They get this data from a variety of sources. Government records, like your birth certificate, your voter registration, court documents, or social media. The posts you've made, the posts you've liked, the quizzes you've taken, the websites you've visited. They package all of this together, and they sell it. They're not just selling this to advertisers. Many are selling it to whoever will pay for it. So I use a tool called Delete.me. They're also graciously sponsoring this portion of today's video. Delete.me goes through the strenuous steps of getting your data removed from hundreds of these data broker websites. Just in my first month, Delete.me searched 3,500 listings for my personal information, and they found data on me on 55 different data broker websites. And then they got it removed on my behalf. My favorite thing about this is that it's not a one-time deal. Delete.me routinely checks different data broker websites to see if they post anything new on me, and then they get it removed. I don't have to do a thing. Because they've sponsored this portion of today's video, Delete.me is offering a 20% discount to you if you use the link down below. It's joindelete.me.com.chuple20 or you can use code CHUPLE20 at checkout. 
So if you don't like the idea of corporations selling your personally identifiable information for financial gain, and you see the potential risks of the data economy, use that link. You'll get 20% off, and you'll get your first report back in seven days. Thank you to Delete.me. I'm seriously, genuinely, really impressed with what you all do. In January of 2015, a class action lawsuit was filed against Google. The allegations claimed that every time someone solved a reCAPTCHA challenge, they weren't just proving that they were human, but they were performing unpaid labor for Google without their knowledge. Look at this. Typing "morning overlooks" would get you a passing grade, obviously, but so would "adorning overlooks", and "pouring overlooks", even "egg overlooks", "horse overlooks", "blue man group overlooks". All of them will get you a passing grade and confirm you're a human. reCAPTCHA is testing you on this word alone. It has no idea what this one is. You, on the other hand, do. And when you submit the word "morning", you are fulfilling a contractual obligation that Google has with one of its clients, like the New York Times. And when you do this, you are helping to train one of Google's AI data sets. As one researcher says, you are doing unpaid labor that directly benefits the world's most powerful surveillance corporation. And even though the judge acknowledged this to be true, the class action lawsuit was dropped. In large part because the time involved for each user is extremely small. As a part of Dr. Searles's study, they estimated that users have spent more than 819 million hours taking these tests, which results in $6.1 billion of unpaid labor. Maybe we could find a way to turn a blind eye to all of this, as long as the tests served their purpose, but that doesn't seem to be the case. Google has said that they don't use the data collected from reCAPTCHA for targeted advertising, which actually scares me a bit more. 
If not for targeted ads, which is their whole business model, why is Google acting like an intelligence agency? So I dove into Google's 32,000 word terms of service. When you're writing a legal document, syntax matters. They only collect data as necessary. But because of this comma, the words as necessary only apply here. Not here. The word improve, general security purposes, those are keywords where you're like, okay, I see you clearly say it's not being used for personalized advertising. Congratulations, great. What does improve mean in that context? What does that security purposes entail? That's where Google is the art of saying nothing. This is Zach. He's a privacy watchdog, but he's also the co-creator of this website, a central hub dedicated to documenting the US's antitrust case against Google. Google has allowed themselves this general security purposes allowance to use the decisions and the data it collects for anything that they deem as security related. In 2015, Google failed an audit by the United Kingdom's ICO, saying that Google is too vague when describing how it uses personal data gathered from its services and products. Is sharing to the government considered security related? It probably could be defined that way by a creative lawyer. Check this out. If you want to submit a tip to the FBI, you're met with this notice acknowledging your right to anonymity. But even though the State Department doesn't use reCAPTCHA, the FBI and the NSA do. The federal court of 11 judges with the power to allow government to conduct electronic surveillance on you. And if they want to know who submitted the anonymous report, Google has to tell them. But the Foreign Intelligence Surveillance Court is anything but public. It's here somewhere inside this sprawling federal court complex. This is a court so secret, we don't even know exactly where it convenes inside the building. 
These companies, when they engage in a clandestine relationship with the government, as soon as they begin cooperating in part, they will ultimately end up cooperating in full. Look at this. I realize most people aren't submitting anonymous tips to the FBI. But listen. Earlier this month, 404 Media got its hands on internal emails from the Secret Service. They confirmed that the intelligence agency used a technology called LocateX, which uses location data harvested from ordinary apps installed on phones. Because users agreed to an opaque Terms of Service page, the Secret Service believes it doesn't need a warrant. Despite those apps often not saying that their data may end up with the authorities. Any information, any keystroke, any click, IP address, all the websites, all the browsing history. A real-time map of everything you do on the internet. It knows who you are. Google has essentially created a digital tollbooth with Recaptcha. They hide in the background, watching whatever you do before, during, and after you interact with it. Then it puts emphasis on the question, which human is this user? Rather than the ordinary, is this user human? If it can figure out who you are, it lets you through. It's also why this happened. Why can't a robot just click, I'm not a robot? Robots move like this. Okay, for the checkbox. It's so easy to test this. If Google can't figure out exactly who you are, let's say you clear your cookies, you're browsing incognito, maybe you're using a privacy-focused browser. They flag you as suspicious. You're grouped in with the bots, who are actually better at solving Recaptchas than you are. And you still have to pay the toll in a different way. Today, that means labeling Google's machine learning data sets. In my experience, the more that I try to hide my identity from Google, the more data sets they make me train. Finally, if you reject those two prior options, Recaptcha, the toll booth, stops you right there. 
The internet isn't open for you. In short, Recaptcha isn't about bots. It's about you. And US v. Google, what's the top-down on this situation? Here in the United States, there are a group of attorney generals and the Department of Justice are basically suing Google for a variety of monopolistic practices. And so the government's response was, we want you, Google, to get out of the advertising industry. So Google, as you can imagine that being one of the primary sources of revenue, said, okay, it looks like we're going to trial over this. I realize the irony of posting this video on YouTube, which is wholly owned by Google. We reached out to Google for comment. And by the time of filming this, they still haven't responded. So if they have, you'll see it on the screen somewhere right now. Google has all this power. What they have currently is empowering the data broker ecosystem, tons of this data flowing into government and data broker hands. What's interesting is, if Google is forced to sell their ad tech space, it may actually make privacy outcomes worse in the short term. Depending on who they sell it to, that company may be more evil than Google. And thanks again to Delete.me for sponsoring a portion of this video. Thanks so much to our Patreon supporters. You are my pride and joy, my muse. If you want to support us, hang out with us on Discord. I'd love to see you over there. Thanks for watching. Transcribed by https://otter.ai