One of Google's algorithms revealed by a GitHub leak? (video, 4m)

One of the most tightly held secrets in all of technology is how the Google search ranking algorithm actually works. If the secret ever got out, Google would implode because SEO experts would get every keyword to link to a landing page for fake Viagra pills. Unfortunately, Google accidentally pushed thousands of documents to GitHub of all places, a website owned by their Bing rival Microsoft, and those documents provide an unprecedented look behind the curtain of Google search. As a bit of an SEO guru myself, I was left shocked and utterly devastated when I found out that Google has not been totally honest about the algorithm. In today's video, we'll take a look at what's inside these documents and find out if Google has been living up to its credo of "Don't be evil." It is May 31st, 2024, and you're watching The Code Report.
When Google was founded in the late 90s by Larry and Sergey at Stanford, it was all based on the idea that a search engine could be handled entirely with an algorithm, which at the time was a radical idea that differed from search engines like Ask Jeeves and Yahoo, which relied on unscalable human curation. They wrote a legendary paper called The Anatomy of a Large-Scale Hypertextual Web Search Engine that detailed something called the PageRank algorithm. Every web page has an initial rank, and that rank improves based on the number of high-quality incoming backlinks.
This worked pretty well at first, but eventually SEO gurus realized that all you had to do was spam a bunch of backlinks with the anchor text of your keyword to dominate the extremely valuable top search result placement. However, over the years, the algorithm has become more complex, and nowadays you actually have to make really good content to get the top ranking. But that's too hard, and SEO gurus still need to put food on their families. And sadly, many of the statements Google has made about how the algorithm works appear to be lies.
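To make the PageRank idea mentioned above concrete, here is a minimal sketch in Python. It is not Google's code and has nothing to do with the leaked documents. It is a simplified, normalized variant of the idea in the original paper: every page starts with an equal rank, and each round a page passes a share of its rank to the pages it links to. The function name `pagerank`, the toy link graph, the iteration count, and the even redistribution of rank from pages with no outgoing links are all assumptions made for illustration; 0.85 is the commonly cited damping factor.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to.

    Hypothetical helper for illustration only; real search ranking is far
    more complex than this.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)

    # Every page starts with the same initial rank.
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Assumption: rank held by pages with no outgoing links is spread
        # evenly over all pages so the total rank stays at 1.
        dangling = sum(ranks[p] for p in pages if not links.get(p))
        new_ranks = {
            p: (1.0 - damping) / n + damping * dangling / n for p in pages
        }

        # Each page splits its damped rank evenly among the pages it links to.
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * ranks[page] / len(targets)
            for target in targets:
                new_ranks[target] += share

        ranks = new_ranks

    return ranks


if __name__ == "__main__":
    # Made-up link graph: three pages link to "c", and "c" links back to "a".
    graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, rank in sorted(pagerank(graph).items()):
        print(page, round(rank, 3))
```

In this toy graph, "c" collects the most backlinks and ends up with the highest rank, and "a" outranks "b" and "d" because its single backlink comes from the highly ranked "c". That is also why spamming backlinks worked so well against the early algorithm.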
It's important to point out that although Google has confirmed that these documents are real, we still don't really know exactly what they are. They could be internal training documents, they could be old and outdated, or it could be a false flag in Google's 5D chess game to protect the algorithm. Officially though, Google has implied that these documents are out of context, outdated, and incomplete. Another interesting point is that the leaked code uses the Elixir programming language, which is not a language that Google would normally use internally.
But now let's get into the true lies. In the past, Google has denied the use of domain authority for ranking. However, in these documents, there's a site authority metric that seems to contradict that claim. Another highly sus thing Google has said in the past is that clicks are not a direct ranking factor. Well, we actually learned a while ago that that's a fib during Google's antitrust lawsuit, which revealed a system called NavBoost, or Glue, which aggregates a bunch of different interactions like clicks, hovers, scrolls, swipes, etc. What's a unicorn click? NavBoost was confirmed once again in the leaked documents, which define it as click and impression signals for Craps, so it looks like clicks are actually important, not surprisingly.
Another potential fib: based on these documents, it looks like data collected from users in the Chrome browser affects search rankings. Not surprised. And another thing that's not surprising is that backlinks still matter. It's not the simple PageRank algorithm that it used to be, but getting those high-quality backlinks is still important. And finally, the most shockingly unsurprising thing is that actual humans are used for rating and whitelisting critical content. Fields like IsCovidAuthority or IsElectionAuthority are used for this.
And through my investigation, I also found this one called WebRefCompactFlatPropertyValue that appears to be hiding the true shape of the Earth. Now, I'm no urologist, but overall, this leak looks pretty bad. I can't believe a big corporation would lie to us, but the real tragedy here is the web itself. In the early days, Google was the best way to find interesting websites and forums created by random weirdos, but nowadays, the top rankings are almost entirely dominated by authoritative sites like Wikipedia and Reddit, in addition to paid advertisers. And it's like, what is even the point of a website nowadays if AI is just going to summarize your website anyway and never get you a click-through? SEO has been dead for a long time, and now with this leak, it's even more deader. This has been The Code Report. Thanks for watching, and I will see you in the next one.
