2024 Massive Leak – Google’s Search Secrets Revealed: What You Need to Know

For years, search engine optimizers and digital marketers have speculated about what’s happening behind the scenes at Google and what its tech and algorithms are doing behind closed doors. Whether or not you know the space, entities like Moz, SEMrush, and hundreds of others have worked diligently to survey experts, run tests, compare notes, and compile “what experts generally believe and see leads to results” – EVERY. SINGLE. YEAR.

This has been going on for over 20 years!

I remember reading Matt Cutts’ blog trying to glean any information I could. Ironically, it’s mostly been for nothing, as TO THIS DAY the basics still dominate. But one exciting thing that happened this month was that Google accidentally published all of its secrets in the form of leaked API documentation covering 14,000+ attributes that outline how it does what it does. Here is a quick synopsis of everything, and Olympia Marketing’s takeaways…

The Big Google Leak

On March 13, thousands of internal Google documents were released on GitHub by an automated bot called yoshi-code-bot. These documents, from Google’s internal Content API Warehouse, were shared with Rand Fishkin, co-founder of SparkToro. Michael King, CEO of iPullRank, also reviewed them and will provide further insights for Search Engine Land soon.

Why This Matters

These documents offer a rare glimpse into Google’s ranking algorithms, an invaluable resource for SEOs. They also let us see whether Google has been telling the truth all these years in what it has revealed, and how accurate many of the predictions from those “in the industry” have been.

Turns out we were right about a lot – Google lies about virtually everything – but there are some surprising revelations. This leak is set to be one of the most significant events in the history of SEO, similar to the Yandex Search ranking factors leak in 2023.

Key Direct Takeaways from the Documents

  • The information is up-to-date from March 2024.
  • Ranking Features: The API documentation includes 2,596 modules and 14,014 attributes.
  • Weighting: The documents do not specify how ranking features are weighted.
  • Twiddlers: These functions adjust the ranking of a document based on various factors.
  • Demotions: Content can be downgraded for reasons like mismatched links, user dissatisfaction, product reviews, location, and exact match domains.
  • Change History: Google keeps a record of every version of every page it has indexed but only uses the last 20 changes for link analysis.
  • Links Matter: Link diversity and relevance are crucial, and PageRank is still a significant factor.
  • Successful Clicks Matter: High-quality content and user experience are essential for good rankings.
  • Content Scores: Longer documents may be truncated, while shorter content is scored on originality.
  • Your Money Your Life (YMYL) content gets special attention.
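
To make the “twiddler” idea from the list above concrete, here is a minimal Python sketch of a re-ranking pass: small functions that each nudge a result’s score before the final sort. Everything in it is an assumption for illustration only. The names (Result, demote_exact_match_domain, boost_fresh, apply_twiddlers), the signals, and the multipliers are all made up, since the leaked documents don’t say how anything is weighted.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Result:
    """A search result with a base score from an earlier ranking stage (hypothetical)."""
    url: str
    score: float
    exact_match_domain: bool = False
    days_since_update: int = 0


# A "twiddler" here is just a function that takes a result and returns an adjusted one.
Twiddler = Callable[[Result], Result]


def demote_exact_match_domain(r: Result) -> Result:
    # Invented multiplier: the leak mentions this demotion but not its weight.
    return replace(r, score=r.score * 0.8) if r.exact_match_domain else r


def boost_fresh(r: Result) -> Result:
    # Invented rule: treat anything updated in the last 30 days as "fresh".
    return replace(r, score=r.score * 1.1) if r.days_since_update <= 30 else r


def apply_twiddlers(results: List[Result], twiddlers: List[Twiddler]) -> List[Result]:
    """Run every twiddler over every result, then sort by the adjusted score."""
    adjusted = []
    for r in results:
        for twiddle in twiddlers:
            r = twiddle(r)
        adjusted.append(r)
    return sorted(adjusted, key=lambda r: r.score, reverse=True)


results = [
    Result("best-cheap-widgets.example", 1.0, exact_match_domain=True, days_since_update=400),
    Result("widgetco.example/guide", 0.9, days_since_update=10),
]
ranked = apply_twiddlers(results, [demote_exact_match_domain, boost_fresh])
for r in ranked:
    print(f"{r.url}: {r.score:.2f}")
```

In this toy run the page that started with the higher base score ends up second, because it picks up the made-up demotion and misses the made-up freshness boost. That is the general shape of a re-ranking layer, not a claim about Google’s actual rules.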

Thanks to Search Engine Land for gathering their Insights from the Experts:

Michael King:
To rank well, you need to drive successful clicks across a variety of queries and earn diverse links. This strategy indicates to Google that your content is valuable and deserving of a high ranking.

Rand Fishkin (who hasn’t worked in SEO for a long time but is still respected in digital marketing circles):
Building a strong brand is more important than ever. A notable, well-recognized brand can significantly improve your organic search rankings and traffic.

Additional Important Findings

  • Entities and Authorship: Google tracks author information and uses it in ranking.
  • SiteAuthority: Google uses a metric called “siteAuthority” to evaluate sites.
  • Chrome Data: Google uses data from the Chrome browser for ranking.
  • Whitelists: Google has whitelists for certain domains, particularly related to elections and COVID.
  • Small Sites: Google may boost or demote small personal sites or blogs using specific features.
  • Freshness and Relevance: Google considers dates in the byline, URL, and on-page content for freshness. Google compares page embeddings to determine if a document is a core topic of the website. Domain registration information is also stored and used.
  • Page Titles and Fonts: Page titles are important, and Google has a metric to measure their relevance. Google measures the average weighted font size of terms in documents and anchor text.
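
For anyone wondering what “comparing page embeddings” could look like in practice, here is a rough Python sketch. It is an assumption on my part, not something the leak spells out: the documents don’t describe how the embeddings are built or compared, so I’m swapping in a common technique (cosine similarity between a page vector and the average of the site’s page vectors). The function names and the tiny 3-number vectors are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def site_centroid(page_vectors: List[Sequence[float]]) -> List[float]:
    """Average the page vectors column by column to get one 'site topic' vector."""
    n = len(page_vectors)
    return [sum(column) / n for column in zip(*page_vectors)]


# Toy embeddings (made up): real ones would have hundreds of dimensions.
site_pages = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.7, 0.3, 0.1],
]
centroid = site_centroid(site_pages)

on_topic = cosine_similarity([0.85, 0.15, 0.0], centroid)
off_topic = cosine_similarity([0.0, 0.1, 0.9], centroid)
print(f"on-topic page:  {on_topic:.3f}")
print(f"off-topic page: {off_topic:.3f}")
```

A page that points the same way as the rest of the site scores close to 1, while a page about something unrelated scores close to 0. Whatever Google actually does, the practical takeaway is the same: keep a site focused on its core topic.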

Olympia Marketing’s Thoughts

There’s not really anything too crazy here, but these are the things that stand out the most:

Tons of Factors Determine Ranking – The number of factors Google is using to rank things is astounding: if you tally up the modules and attributes, you get more than 16,000 (2,596 + 14,014 = 16,610). Most businesses don’t understand this. Luckily, if you get the basics correct you don’t need much else.

Demotion of Keyword Match Domains is Nonsense – Even though the documents say Google does sometimes demote keyword match domains (and I’ve seen this with clearly bogus domains with lots of dashes), it’s been my experience, having done this 100+ times, that keyword match domains do phenomenally well. More lying from Google even in their leaks? 4D chess perhaps…

Your Web Design Matters – Click tracking: it’s always been thought that Google DOES include on-site behavior in determining ranking, even though Google has adamantly denied this. We now know they were lying and do, in fact, use this as a metric in weighing websites (and for good reason). So tweaking your UI to make it easier, more helpful, etc. DOES have an impact, not just on general conversion rates but also on how many people will eventually see your stuff (by way of its impact on SEO).

Chrome – Chrome being used to mine data on sites for ranking purposes is interesting, although not surprising. I can already see some ways this could be gamed over time potentially.

Newer Content Helps – Having fresh content with the publishing date clearly marked is helpful. I’ve debated this for some time, but it seems pretty clear (unless this leak is meant to throw people off completely) that newer, fresher content does actually help, and it’s certainly something I’ve seen in my recent tests.

Privacy & Author Reputation – Domain registration information being stored, the tracking of personal blogs, the saving of copies of every website (something I’ll get back to in a second), and authors effectively having a “profile” as far as Google Search is concerned are all interesting, but concerning. Basically, this could be a privacy nightmare. Google could know behind the scenes, for example based on non-public ICANN data, who owns websites and then demote them.

A couple of thoughts on the above… I know directly of a case that was settled out of court involving a prominent person in the SEO world who was personally attacked by Google. Although only one site was in violation of some of Google’s policies, they decided to go after ALL sites owned or operated by this person regardless of circumstance, topic, or legal precedent (i.e., these could have been owned, held, or operated by wholly separate businesses – just their ASSOCIATION with this person caused them to be attacked and removed from Google’s SERPs).

Google storing old copies of websites is interesting, and it sort of aligns with (but is also at odds with) something we’ve noticed for some time. Starting about 18 months ago (late 2022 to early 2023), Google seemed to simply stop indexing pages. At first I believed this was an anomaly with the site I was seeing it on, but after running tests on multiple new sites I noticed the same thing over and over: Google simply wasn’t crawling and indexing sites. In fact, to this day, pages get indexed only if you FORCE GOOGLE’S CRAWLERS via indexing tools like Tag Parrot (which manually submits your full site via your Search Console account and your sitemap). I’ll be doing a highlight here shortly on a client where we saw exactly this. The client in question has been adding to their website nonstop for half a decade, but only about 20 pages were indexed. By setting up manual submissions we were able to get all 700+ of their pages indexed and exploded their views and impressions to over 100,000/month (and now they’re top ranked nationally in their discipline).

Google is the World’s Ministry of Information

Additionally, the fact that Google has whitelisted “authority” sites for elections and COVID is also terrible but not unexpected. Google is the world’s de facto “ministry of information,” controlling much of the information one can connect to, see, or find. It controls a massive percentage of both the on-ramps to content and the content itself (via YouTube, which is the #2 search engine on the planet). We also know this because Google publicly announced, shortly after the supposed “covid outbreak,” that YouTube would be heavily censoring any and all information not deemed “mainstream/approved.”

So to close things out: although some interesting things have come to light, it’s nothing that crazy or that surprising. It does finally confirm that Google has been bald-faced lying for decades, even to its closest “allies” in the digital marketing community, which really does bring into question their original slogan… “don’t be evil.” And one has to ask: why would I trust an organization that has now been associated with so much fraud at the advertising level, and so much lying at the public and SEO level, to be the gatekeeper of information globally?
