Google’s Search algorithms all but rule the internet. Because Google is the dominant search engine on the planet, Search rankings can make or break a website, which has everyone and their aunt battling to claim the top spots. Those algorithms are a closely guarded secret, but leaked documents claim to shed some light on how Google Search operates.
SparkToro claims to have accessed over 2,500 pages of API documentation that it says originates from Google’s internal “Content API Warehouse”. Those documents contain what appear to be key details of Google’s Search algorithm. Android Authority notes that the documents don’t show how Search ranks different websites or how it treats different site characteristics. But they do seem to show what Google actually collects in its bid to offer users the most useful Search results.
Interestingly, the site also claims these documents leaked to GitHub back in March, only to be removed. For now, however, SparkToro is working with iPullRank to try to figure out what all these alleged APIs are meant to do.
It’s a very big deal to get a glimpse into how Search does its thing. Google hasn't explicitly acknowledged the leak, but it has all but confirmed the documents are legitimate. In a statement to The Verge, the company cautioned against making “inaccurate assumptions about Search based on out-of-context, outdated, or incomplete information”. The implication is that even though the documents are legit, they’re no longer relevant. At least, according to Google.
Google regularly offers tips and best practices to help websites improve and better optimize their content for Search. However, it has never come out and told people exactly what they should be doing, likely to stop people from gaming the system. That does happen, and Google’s continual updates to the algorithm seem, in part, designed to combat this kind of activity.
Google Search and how the algorithm works
Google has always said that it advises “people-first content” focusing on readers and users rather than search engines. The general philosophy there is “EEAT”, or experience, expertise, authoritativeness and trustworthiness, which is all pretty self-explanatory. However, the leaked documents suggest Google actually takes a different approach.
Analysis of these documents by SparkToro and iPullRank suggests that other factors may be involved. These include domain authority, Chrome data, clicks as a measure of success, the author in the byline and, seemingly, a sandbox that segregates new sites that haven’t yet earned the search engine’s trust.
These are all factors Google has denied using in the past. While it makes sense that Google wants to keep the secrets of its flagship product, well, a secret, these documents do suggest it’s been deliberately misleading.
Other factors noted in the documents are things we have known about in the past, like the fact that freshness of content matters, as does linking in and out to other relevant content. Branding and change history also play a part, while demotion can occur for things like links not matching their target and the presence of pornography, among other things. So the more things you have that Google likes, the more visible your content is likely to be.
Still, Google is sticking to its guns and is adamant that these documents are either out of date, inaccurate or not the full picture of how Google Search works. So we'll just have to see how this story develops over the coming weeks.