HUGE Google Search document leak reveals inner workings of ranking algorithm

A trove of leaked Google documents has given us an unprecedented look inside Google Search and revealed some of the most important elements Google uses to rank content.

What happened. Thousands of leaked internal documents, which appear to come from Google’s internal Content API Warehouse, were shared with Rand Fishkin, SparkToro co-founder, earlier this month.

  • Read on to discover what we’ve learned from Fishkin, as well as Michael King, iPullRank CEO, who also reviewed the documents (and plans to provide further analysis for Search Engine Land soon).

Why we care. This leak gives us a glimpse into how Google’s ranking algorithm works, which is invaluable for SEOs who can understand what it all means. In 2023, we got an unprecedented look at Yandex Search ranking factors via a leak, which was one of the biggest stories of that year.

This Google document leak? It will likely be one of the biggest stories in the history of SEO and Google Search.

What’s inside. Here’s what we know about the leaked documents from Fishkin and King:

  • Current: The documentation indicates this information is accurate as of March.
  • Ranking features: 2,596 modules are represented in the API documentation with 14,014 attributes.
  • Weighting: The documents did not specify how any of the ranking features are weighted – just that they exist.
  • Twiddlers: These are re-ranking functions that “can adjust the information retrieval score of a document or change the ranking of a document,” according to King.
  • Demotions: Content can be demoted for a variety of reasons, such as:
    • A link doesn’t match the target site.
    • SERP signals indicate user dissatisfaction.
    • Product reviews.
    • Location.
    • Exact match domains.
    • Porn.
  • Change history: Google apparently keeps a copy of every version of every page it has ever indexed, meaning it can “remember” every change ever made to a page. However, Google only uses the last 20 changes of a URL when analyzing links.
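
To make the twiddler idea concrete, here is a minimal Python sketch of what a re-ranking function could look like in principle. It is purely illustrative: the Doc class, the exact_match_domain flag, the 0.5 demotion factor and the function names are all assumptions made up for this example. The leaked documents do not specify weights, and nothing here reflects Google's actual code.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Doc:
    url: str
    score: float                      # hypothetical base retrieval score
    exact_match_domain: bool = False  # hypothetical demotion flag


# In this sketch, a "twiddler" is any function that takes a list of
# documents and returns an adjusted list.
Twiddler = Callable[[List[Doc]], List[Doc]]


def demote_exact_match_domains(docs: List[Doc]) -> List[Doc]:
    # Assumed behavior: halve the score of flagged documents.
    # The 0.5 factor is invented; the leak does not reveal weights.
    return [
        replace(d, score=d.score * 0.5) if d.exact_match_domain else d
        for d in docs
    ]


def rerank(docs: List[Doc], twiddlers: List[Twiddler]) -> List[Doc]:
    # Apply each twiddler in turn, then sort by the adjusted score.
    for twiddle in twiddlers:
        docs = twiddle(docs)
    return sorted(docs, key=lambda d: d.score, reverse=True)


results = [
    Doc("a.example", 1.0, exact_match_domain=True),
    Doc("b.example", 0.8),
]
ranked = rerank(results, [demote_exact_match_domains])
print([d.url for d in ranked])  # ['b.example', 'a.example']
```

The point of the sketch is only that re-ranking happens after an initial score exists: a document that starts on top can end up lower once an adjustment is applied.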

Links matter. Shocking, I know. This leak confirms that link diversity and relevance remain key. And PageRank is still very much alive within Google’s ranking features. PageRank for a website’s homepage is considered for every document.

Successful clicks matter. This should not be a shocker, but if you want to rank well, elements of the leak clearly indicate that you need to keep creating great content and user experiences. Google uses a variety of measurements, including badClicks, goodClicks, lastLongestClicks and unsquashedClicks. As King put it:

  • “[Y]ou need to drive more successful clicks using a broader set of queries and earn more link diversity if you want to continue to rank. Conceptually, it makes sense because a very strong piece of content will do that. A focus on driving more qualified traffic to a better user experience will send signals to Google that your page deserves to rank.”

Documents and testimony from the U.S. vs. Google antitrust trial confirmed that Google uses clicks in ranking.

Brand matters. Fishkin’s big takeaway from the leak is that brand matters more than anything else:

  • “If there was one universal piece of advice I had for marketers seeking to broadly improve their organic search rankings and traffic, it would be: ‘Build a notable, popular, well-recognized brand in your space, outside of Google search.’”

Entities matter. Google stores author information associated with content and tries to determine whether an entity is the author of the document.

SiteAuthority. Google uses something called “siteAuthority”.

Chrome data. A module called ChromeInTotal indicates that Google uses data from its Chrome browser for search ranking.

Whitelists. A couple of modules indicate Google whitelists certain domains related to elections and COVID – isElectionAuthority and isCovidLocalAuthority. That said, we’ve long known that Google (and Bing) have “exception lists” when “specific algorithms inadvertently impact websites.”

The leaker. Erfan Azimi, CEO and director of SEO for digital marketing agency EA Eagle Digital, posted a video claiming responsibility for leaking the documents to Fishkin. Azimi is not employed by Google, but said he got the documents from a former Googler.
