What the Google Search data leak means for SEO

May 31, 2024
Dan Lauer
5 min read
Beginner

As Batman villain Harvey Dent put it, “You either die a hero, or live long enough to see yourself become the villain.” Dent was referring to Julius Caesar’s lifelong accumulation of power and the inevitably tragic result, but the sentiment could now also describe Google’s increasingly fractious relationship with website owners and the wider SEO community.

The last two years have been a whirlwind for Google, from the DOJ antitrust lawsuit to 10 confirmed algorithm system updates (including four core updates with significant implications), the reactive launch of the Search Generative Experience (SGE) in Labs, the bumpy arrival of AI Overviews, and the contentious growth of Reddit content at the top of the SERP. Almost all of these updates and events have, understandably, generated negative press for Google.

Then, like a bolt from the blue, this week’s leaked API documents about Google Search appear to have spilled some of the search giant’s most closely guarded secrets. SEO practitioner Erfan Azimi has come forward as the source of the leak: documentation from Google’s internal Content API Warehouse that was exposed on GitHub in March 2024. Azimi shared the documents with respected SEO stalwart Rand Fishkin, who verified their authenticity and gave his take. Likewise, technical SEO expert Mike King reviewed thousands of pages of the leaked documents and analyzed the potential impact on internal ranking features and signals.

At first, some were skeptical about the leaks, but they appear legitimate, having been scrutinized by trusted figures across the SEO industry, including a number of ex-Googlers. Then, late this Wednesday, Google responded with a statement of its own:

“We would caution against making inaccurate assumptions about Search based on out-of-context, outdated, or incomplete information. We’ve shared extensive information about how Search works and the types of factors that our systems weigh, while also working to protect the integrity of our results from manipulation.”

What does Google’s API leak actually reveal?

With 2,500 pages of API documentation containing information about more than 14,000 attributes, the leak describes features that nobody outside of Google even knew existed. Worryingly, there is plenty of data in this leak that contradicts many of Google’s public statements over the last 20 years. It appears, for example, that click-through rates do indeed affect ranking, that subdomains have their own rankings, and that domain age is a ranking factor—all of which SEOs have previously speculated about only for the search giant to deny them.

This may well turn out to be one of the biggest stories in the history of SEO, and there are countless unanswered questions yet to be addressed. Google itself urges caution in using this information, as do DAC’s SEO specialists, for a number of reasons:

  1. The leak does not provide full-context details of Google’s algorithm system.
  2. Features or factors mentioned in the leaked documents aren’t necessarily currently being used to rank search results.
  3. We don’t know how much weight these various signals carry compared to other ranking factors.

This changes everything in SEO… or does it?

In short, the basics remain unchanged—for now—and the fundamentals of building authority remain relevant. Create great content that builds relationships, drives qualified traffic, and gives people a reason to share or cite. Generating search demand to build your brand’s online visibility is still the name of the game.

There are, however, some hints and tips to be taken on board. The revelation that Google uses click data in its ranking signals, for instance, means optimizing for user engagement should be balanced with the need for “helpful” content, with SEO and UX working in tandem rather than competing with one another.

The leak won’t change the fact that DAC’s SEO strategies are built on proven, fundamental SEO. Our experts will continue to stress the importance of making data-based decisions to create and distribute relevant, high-quality, “helpful” content that provides great on-site user experiences.

The future of search is unwritten

Google’s public perception has been taking hits for a couple of years now. Having faced complaints about the quality of its search results, the search giant seemingly lost the AI arms race after being caught flat-footed by ChatGPT (then botching its own SGE and Gemini launches). Since the announcement of AI Overviews at Google I/O, multiple examples of inappropriate AI answers have surfaced, including responses with hate speech and frankly dangerous health, medical, and financial advice. The rollout was so bad that users were trying to figure out how to opt out of AI Overviews within 24 hours. Before this leak, the most recent cherry on top of the controversy cake was Reddit content appearing prominently for potentially sensitive YMYL (Your Money or Your Life) queries.

More than anything else, Google’s biggest problem is trust. Has the search giant finally lived long enough to become the villain? Will it meet the same fate as Harvey “Two-Face” Dent in The Dark Knight? Only time will tell. 

Until then, Google needs to be far more transparent and collaborative with website owners and the SEO industry. By building bridges and earning back trust over time, Google may undo some of the damage it inflicted on itself. Will Bing, Yahoo, and other search engines be able to take advantage and steal market share in the meantime? Watch this space… 
