User data is important in Google’s ranking systems. What we learned from Liz Reid’s appeal statement.

I found some interesting things in the latest document from the trial between the DOJ and Google. Google has appealed the ruling, which requires the company to share confidential information with competitors.

Key Takeaways:

  • Google was ordered to share information with competitors after being found to be an illegal monopolist. Google doesn’t want to reveal its extensive user data.
  • Google’s data on page quality and freshness is proprietary. They don’t want to give it away.
  • Indexed pages are annotated with signals, including ones that identify spam sites.
  • If spammers got hold of these spam signals, it would be difficult to stop spam.
  • User data is important to Google’s Glue system. This is where information about each search query is stored: what the user saw and how they interacted with the search results.
  • User data is important for training RankEmbed BERT – one of the deep learning systems behind search.

Okay, let’s get to the interesting stuff!

Google has proprietary signals for page quality and freshness

That’s no real surprise, but I found it interesting that freshness signals are among Google’s core proprietary secrets.

Here you can learn more about the importance of Google’s proprietary freshness signals:

Crawled pages are marked with “proprietary page comprehension annotations.”

Every page in the Google index is annotated to help Google understand the page. This includes signals to detect spam and duplicate pages. I’ve already written about how every page in the index has a spam score.

Spam scores could be used to reverse engineer ranking systems

Google doesn’t want to share these spam scores with its competitors.

If the spam scores are made public, it could lead to more spam and make it harder for Google to combat spam.

Google creates the index based on these annotated pages

The pages that Google has annotated are organized in the index according to how often Google expects the content to be accessed and how fresh the content needs to be.
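
To make the idea concrete, here is a minimal sketch of organizing pages by expected access frequency and freshness need. Google's real logic is not public, so everything here is an assumption for illustration: the `AnnotatedPage` fields, the tier names, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedPage:
    url: str
    expected_daily_accesses: float  # hypothetical: how often the page is expected to be served
    max_staleness_hours: float      # hypothetical: how old the stored copy may get


def assign_tier(page: AnnotatedPage) -> str:
    """Pick an index tier from access frequency and freshness need (made-up thresholds)."""
    if page.expected_daily_accesses >= 1000 or page.max_staleness_hours <= 1:
        return "hot"
    if page.expected_daily_accesses >= 10 or page.max_staleness_hours <= 24:
        return "warm"
    return "cold"


pages = [
    AnnotatedPage("https://example.com/breaking-news", 50, 0.5),
    AnnotatedPage("https://example.com/popular-guide", 5000, 720),
    AnnotatedPage("https://example.com/weekly-roundup", 20, 168),
    AnnotatedPage("https://example.com/old-archive", 0.2, 8760),
]
tiers = {p.url: assign_tier(p) for p in pages}
```

In this toy version, a page lands in a faster tier either because many people are expected to need it or because it goes stale quickly.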

Only a fraction of the pages make it into the Google index

Google argues that providing competitors with a list of indexed URLs would allow them to “forgo crawling and analyzing the larger web and instead focus on crawling only the fraction of the pages that Google has included in its index.” Building this index costs Google a lot of time and money. They don’t want to give that away for free.

The role of user data in Google’s ranking systems

This is the most interesting part. I feel like we don’t pay enough attention to how Google uses user data. (Stay tuned to my YouTube channel, as I will soon be releasing a very interesting video with my thoughts on the importance of user-side data, which is probably the most important factor in Google’s ranking systems.)

User data is used to build Glue and RankEmbed models

Google Glue is a huge table of user activity. It collects the text of searched queries, the user’s language, location, and device type, as well as information about what was displayed on the SERP, what the user clicked or hovered over, how long they stayed on a SERP, and more.
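
One way to picture a row in a table like this is the sketch below. The field names follow the description above, but the actual Glue schema is not public, so `QueryLogRow`, `ResultInteraction`, and every field in them are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResultInteraction:
    url: str
    position: int          # where the result appeared on the SERP
    clicked: bool = False
    hovered: bool = False


@dataclass
class QueryLogRow:
    query: str
    language: str
    location: str
    device_type: str
    serp_dwell_seconds: float  # how long the user stayed on the SERP
    results: List[ResultInteraction] = field(default_factory=list)

    def clicked_urls(self) -> List[str]:
        """Return the URLs the user clicked, in SERP order."""
        return [r.url for r in self.results if r.clicked]


row = QueryLogRow(
    query="sourdough starter recipe",
    language="en",
    location="CA",
    device_type="mobile",
    serp_dwell_seconds=12.5,
    results=[
        ResultInteraction("https://example.com/a", 1, hovered=True),
        ResultInteraction("https://example.com/b", 2, clicked=True),
    ],
)
```

Each row ties together what was searched, what was shown, and what the searcher did with it.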

Even more interesting is RankEmbed BERT. RankEmbed BERT is one of the deep learning systems underlying search. From Pandu Nayak’s testimony, we learned that RankEmbed BERT is used to rerank the results returned by traditional ranking systems. RankEmbed BERT is trained on click and query data from actual users.
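
The general "retrieve first, then rerank" pattern can be sketched as follows. This is not how RankEmbed BERT works internally, which is not public. The `embed` function is a toy bag-of-words stand-in for a learned embedding model, and all names here are assumptions for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a learned embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rerank(query, candidates):
    """Reorder first-stage candidates by similarity to the query."""
    q = embed(query)
    return sorted(candidates, key=lambda doc: cosine(q, embed(doc)), reverse=True)


# Hypothetical output of a traditional first-stage ranker.
first_stage = [
    "history of sourdough bread",
    "easy sourdough starter recipe",
    "bread machine reviews",
]
reranked = rerank("sourdough starter recipe", first_stage)
```

A real system would swap the toy `embed` for a model trained on user data, which is exactly why that data matters so much.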

The AI systems behind search are constantly learning in order to give searchers satisfying results. Google looks at what searchers click on and whether or not they return to the SERPs. Google also runs live experiments looking at what searchers click and stay on. These actions help train RankEmbed BERT. Further fine-tuning is done through the scores from the quality raters. I will post more about this soon. The key point I want to highlight is that user satisfaction is by far the most important thing we should be optimizing for!

From Liz Reid’s document that we are analyzing today, we can see that user data is used to train, build and operate RankEmbed models.

Once again, we learn that the user data used to train these models includes query, location, time of search, and how the user interacted with what was shown to them.
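
As a rough illustration of how logged interactions could become training examples, here is a small sketch. It is not Google's pipeline. The log layout, the `to_training_examples` helper, and the 30-second dwell threshold are all assumptions made up for this example.

```python
def to_training_examples(log_rows, min_dwell_seconds=30.0):
    """Turn logged searches into (query, location, hour, url, label) tuples.

    label is 1 when the result was clicked and the searcher stayed at least
    min_dwell_seconds before returning to the results page (an assumed proxy
    for satisfaction), otherwise 0.
    """
    examples = []
    for row in log_rows:
        for shown in row["shown"]:
            satisfied = shown["clicked"] and shown["dwell_seconds"] >= min_dwell_seconds
            examples.append(
                (row["query"], row["location"], row["hour"], shown["url"], int(satisfied))
            )
    return examples


# Hypothetical log: one search with three results shown.
logs = [
    {
        "query": "fix leaky tap",
        "location": "CA",
        "hour": 19,
        "shown": [
            {"url": "https://example.com/a", "clicked": True, "dwell_seconds": 5.0},
            {"url": "https://example.com/b", "clicked": True, "dwell_seconds": 240.0},
            {"url": "https://example.com/c", "clicked": False, "dwell_seconds": 0.0},
        ],
    }
]
examples = to_training_examples(logs)
```

Here the quick bounce on the first result counts against it, while the long visit to the second result counts as a satisfied click.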

This is about the actions that users take within Google search results. What I really want to know is what role Chrome data plays. Does Google check whether users engage with your pages, fill out your forms, make your recipes, and more? I think they do. The judgment summary from this trial notes that Chrome data is used in the ranking systems, but it shares few details.

Google says that if someone had the user data from Glue and RankEmbed, they could use it to train an LLM

This user data is the key to Google’s success.

It’s worth reading the whole statement from Liz Reid.

I’m also once again offering quality ratings for websites.

If you liked this, you’ll love my newsletter:

Or join us in the Search Bar for the latest news on search and AI.
