One of Google's algorithms revealed by a GitHub leak? (video, 4m)
Fireship has released a fascinating video about one of the most tightly held secrets in technology: the Google search ranking algorithm. Thanks to documents accidentally pushed to GitHub, we now have an unprecedented look at how this algorithm actually works. The video’s author examines these documents and asks whether Google still lives up to its motto, “Don't be evil.” From the video, we learn that Google has not always been as honest about its ranking algorithm as it would like us to believe. When Google was founded in the late 90s, the idea was that a search engine could operate entirely on an algorithm, a radical concept compared to earlier search engines like Ask Jeeves or Yahoo, which relied on human curation that didn't scale well. Over the years, as SEO gained importance, practitioners learned to manipulate rankings, especially through backlink spamming.
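The backlink-based ranking behind early Google, PageRank, can be illustrated with a short sketch. This is not Google's code and has nothing to do with the leaked documents. It is a minimal power-iteration version of the commonly taught, normalized form of the algorithm, and the function name `pagerank`, the `links` dictionary layout, and the page names are assumptions made up for illustration. The damping factor of 0.85 is the value usually quoted from the original paper.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    # Every page starts with the same initial rank.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Baseline share that every page receives regardless of backlinks.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: "a" and "b" both link to "c", and "c" links back to "a".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this toy graph, the page with the most incoming links ends up with the highest rank. It also shows why backlink spam worked against the original scheme: adding more pages that link to a target raises the target's rank.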
As time went on, Google’s algorithm evolved to require high-quality content to achieve top rankings. The author jokes that producing such content is too hard for SEO experts who still need to make a living. More seriously, many of Google's statements about how its algorithm operates appear to be false. It’s worth noting that while Google has confirmed the authenticity of the documents, we don’t know exactly what they are. They could be internal training materials, they could be outdated, or they could even be a false flag meant to protect the algorithm. Officially, Google has implied that these documents are out of context, outdated, and incomplete.
One of the more controversial findings within these documents is a metric called ‘site authority,’ which contradicts Google’s previous denials about using domain authority for ranking. Another finding concerns Google’s statement that clicks are not a direct ranking factor, a claim that was already undermined during Google’s antitrust lawsuit, which revealed a system known as NavBoost. NavBoost aggregates various user interactions such as clicks, hovers, and scrolls, and its appearance in the leaked documents suggests that clicks play a significant role in search outcomes.
The documents also suggest that data collected from users via the Chrome browser can influence search rankings, which is hardly surprising. High-quality backlinks remain important too, albeit not in the straightforward way they once were. The author further points out that human raters are used for rating and whitelisting critical content, with fields such as IsCovidAuthority and IsElectionAuthority serving this purpose. He laments that top rankings today are dominated by authoritative sites like Wikipedia and Reddit, along with paid advertisers, which crowds out smaller websites.
Overall, the leaked materials paint a complex and concerning picture of Google’s operations. From the author's perspective, this situation undermines trust in Google and highlights the changing landscape of SEO. Traditional SEO seems to be dying as AI-generated summaries and paid advertising take over search results. At the time of writing this article, the video has garnered 1,396,141 views and 56,150 likes, indicating significant interest in the subject and the need for a better understanding of how online search really works.
Timeline summary
- Introduction to Google's search ranking algorithm as a closely guarded secret.
- Concerns about the potential consequences if the algorithm's secrets were exposed.
- Accidental leak of documents on GitHub providing insight into Google's search mechanism.
- Discussion about the honesty of Google regarding its algorithms.
- Preview of the video content based on the leaked documents.
- Brief history of Google's founding and initial algorithmic approach.
- SEO tactics evolving to exploit Google's PageRank algorithm.
- Current demand for quality content to achieve high search rankings.
- Uncertainty around the authenticity and context of the leaked documents.
- Contradiction of Google's claims regarding domain authority in ranking.
- Revelation about the importance of clicks as a ranking factor.
- Continued significance of backlinks in search rankings despite algorithm changes.
- Utilization of human evaluators for content rating and validation.
- Critique of the current state of web searches dominated by authoritative sites.
- Conclusion on the impact of the leak and the relevance of SEO.
Transcription
One of the most tightly held secrets in all of technology is how the Google search ranking algorithm actually works. If the secret ever got out, Google would implode because SEO experts would get every keyword to link to a landing page for fake Viagra pills. Unfortunately, Google accidentally pushed thousands of documents to GitHub of all places, a website owned by their Bing rival Microsoft, that provide an unprecedented look behind the curtain of Google search. As a bit of an SEO guru myself, I was left shocked and utterly devastated when I found out that Google has not been totally honest about the algorithm. In today's video, we'll take a look at what's inside these documents and find out if Google has been living up to its credo of "Don't be evil." It is May 31st, 2024, and you're watching The Code Report.
When Google was founded in the late 90s by Larry and Sergey at Stanford, it was all based on the idea that a search engine could be handled entirely with an algorithm, which at the time was a radical idea that differed from search engines like Ask Jeeves and Yahoo, which relied on unscalable human curation. They wrote a legendary paper called The Anatomy of a Large-Scale Hypertextual Web Search Engine that detailed something called the PageRank algorithm. Every web page has an initial rank, and that ranking grows and improves based on the number of high-quality incoming backlinks. This worked pretty well at first, but eventually SEO gurus realized that all you had to do was spam a bunch of backlinks with the anchor text of your keyword to dominate the extremely valuable top search result placement. However, over the years, the algorithm has become more complex, and nowadays you actually have to make really good content to get the top ranking. But that's too hard, and SEO gurus still need to put food on their families. And sadly, many of the statements Google has made about how the algorithm works appear to be lies.
It's important to point out that although Google has confirmed that these documents are real, we still don't really know exactly what they are. They could be internal training documents, they could be old and outdated, or it could be a false flag in Google's 5D chess game to protect the algorithm. Officially though, Google has implied that these documents are out of context, outdated, and incomplete. Another interesting point is that the leaked code uses the Elixir programming language, which is not a language that Google would normally use internally. But now let's get into the true lies. In the past, Google has denied the use of domain authority for ranking. However, in these documents, there's a site authority metric that seems to contradict that claim. Another highly sus thing Google has said in the past is that clicks are not a direct ranking factor. Well, we actually learned a while ago that that's a fib during Google's antitrust lawsuit, which revealed a system called NavBoost, or Glue, that aggregates a bunch of different interactions like clicks, hovers, scrolls, swipes, etc. What's a unicorn click? NavBoost was confirmed once again in the leaked documents, which define it as click and impression signals for Craps, so it looks like clicks are actually important, not surprisingly. Another potential fib is that, based on these documents, it looks like data collected from users in the Chrome browser affects search rankings. Not surprised. And another thing that's not surprising is that backlinks still matter. It's not the simple PageRank algorithm that it used to be, but getting those high-quality backlinks is still important. And finally, the most shockingly unsurprising thing is that actual humans are used for rating and whitelisting critical content. Fields like IsCovidAuthority or IsElectionAuthority are used for this.
And through my investigation, I also found this one called WebRefCompactFlatPropertyValue that appears to be hiding the true shape of the Earth. Now, I'm no urologist, but overall, this leak looks pretty bad. I can't believe a big corporation would lie to us, but the real tragedy here is the web itself. In the early days, Google was the best way to find interesting websites and forums created by random weirdos, but nowadays, the top rankings are almost entirely dominated by authoritative sites like Wikipedia and Reddit, in addition to paid advertisers. And it's like, what is even the point of a website nowadays if AI is just going to summarize your website anyway and never get you a click-through? SEO has been dead for a long time, and now with this leak, it's even more deader. This has been The Code Report. Thanks for watching, and I will see you in the next one.