In case you missed it, earlier this week I published a piece on everything we learned about Google’s search algorithms from the DOJ vs. Google trial. Boy, there is a lot!
In this article we will concentrate on AI.
Key takeaways:
-Google’s AI trains on the Google Common Corpus (GCC).
-A version of Gemini called Magit is fine-tuned for AI Overviews.
-OpenAI has developed its own proprietary search index.
-A system called FastSearch grounds Gemini’s AI.
-Publishers have little control over how AI uses their content.
-Google aims to build a “super assistant” for everything.
Here is the court document if you want to read it yourself.
1. Google Common Corpus is where most of the training data for Google’s AI comes from
You have probably heard of Common Crawl, a repository of public web crawl data. While some of Google’s AI training data comes from Common Crawl, most of it comes from another source called the Google Common Corpus.
The court documents tell us that Google uses its Google Common Corpus (“GCC”) to train its Gemini GenAI models. GCC is a dataset that contains large amounts of information from the web and is stored in a repository Google calls “Docjoins”. This repository does not contain all public information on the internet, but rather documents that have been “visited by Googlebot at least once in recent months.”
2. A version of Gemini called Magit is fine-tuned to create answers for AI Overviews
There is a specific version of Gemini called Magit. It is fine-tuned specifically to produce the text in AI Overviews.
Gemini is pretrained on text, mainly from the internet, and then further fine-tuned on specific datasets so that it can handle particular tasks such as solving math problems, answering questions, or writing code.
Google does not use click and query data from users to train Gemini. It considered doing so, but did not find that the benefits of pretraining on search data were worth the cost.
I found it interesting that they said the Magit model was fine-tuned to “produce text answers in the desired format for AI Overviews.” There is no mention here of Magit being used to predict which links AI Overviews show. We do not know whether the links in AIOs come from FastSearch (see below) or are ranked by the regular search ranking algorithms.
3. OpenAI has developed its own search index
Did you know that OpenAI has its own search index? I didn’t!
In the trial documents we learned that OpenAI built its own search index because it had quality problems with third-party search providers.
Historically, ChatGPT pulled from Bing’s index.
It may still do so, although recent tests show that ChatGPT seems to be pulling information from Google Search. (It was later found that OpenAI was previously listed as a customer of SerpApi, a tool that scrapes Google Search.)
I couldn’t find any definitive information about OpenAI’s search index.
4. A system called FastSearch grounds Gemini’s AI
FastSearch is based on RankEmbed signals, a set of search ranking signals. It generates an abbreviated, ranked list of web results that a language model can use to produce a response grounded in search. FastSearch is faster than a full web search, but the results are lower quality.
Suppose, for example, you ask Gemini a question about a current news event. Gemini should recognize that this event is not in its training data and ground (that is, verify) its answer in Google Search. FastSearch would be used to generate a short list of web pages on which Gemini’s response can be grounded.
FastSearch is integrated into Vertex AI Vector Search. This can be used via the API to ground LLM answers in Google search results (or even in your own documents).
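To make the idea of grounding more concrete, here is a toy sketch in Python. It is not how FastSearch or RankEmbed work internally (the court documents do not give that level of detail). It only shows the general pattern: build a short ranked list of pages for a query, then hand that list to a language model as the basis for its answer. The bag-of-words "embedding", the function names, and the example.com pages are all made up for illustration.

```python
import math
from collections import Counter

# Hypothetical mini "index" of pages; the URLs and text are invented for this example.
PAGES = [
    {"url": "https://example.com/election", "text": "city election results announced today"},
    {"url": "https://example.com/recipes", "text": "easy pasta recipes for dinner"},
    {"url": "https://example.com/weather", "text": "storm warning issued for the city today"},
]


def embed(text):
    """Toy bag-of-words vector, standing in for a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def shortlist_pages(query, pages, k=2):
    """Return the k pages most similar to the query, best match first."""
    query_vec = embed(query)
    ranked = sorted(pages, key=lambda p: cosine(query_vec, embed(p["text"])), reverse=True)
    return ranked[:k]


def build_grounded_prompt(query, shortlist):
    """Assemble a prompt that asks the model to answer only from the shortlisted sources."""
    sources = "\n".join(
        f"[{i}] {page['url']}: {page['text']}" for i, page in enumerate(shortlist, start=1)
    )
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {query}"


if __name__ == "__main__":
    question = "city election results"
    print(build_grounded_prompt(question, shortlist_pages(question, PAGES)))
```

The trade-off mirrors the one described above: a short list is quick to build and cheap to pass to the model, but anything that misses the cut cannot inform the answer.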
5. We have no say in how Google uses our content for AI
The court ruled that Google does not have to change its policies to give publishers more choice over how Google uses their content.
As it stands now, site owners can use the Google-Extended directive in robots.txt to prevent Gemini’s AI models from being trained on their content. However, this does not prevent your website from appearing in AI Overviews or AI Mode. If you want to opt out of those features, you essentially have to opt out of Search. It looks like that won’t change.
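For anyone who wants to see what that looks like, here is a small sketch. The robots.txt rules below block the Google-Extended token while leaving every other crawler alone, and the Python part uses the standard library's urllib.robotparser to check how a rule-following crawler would read them. The example.com URL and the helper name is_allowed are placeholders. This only simulates the rules locally; it says nothing about how Google processes them, and it has no effect on AI Overviews or AI Mode.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: opt out of the Google-Extended token, allow everything else.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given user agent may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some-article"  # placeholder URL
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

Note that Googlebot stays allowed under these rules, which is why the page can still be crawled for Search and remain eligible for AI Overviews.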
It was interesting to see this initiative launch this week: the RSL Collective. The aim is to create a system in which AI companies can be required to pay publishers for the use of their content. It’s a good idea, but I don’t see any evidence that AI companies are bound by these rules, nor any method for paying website owners for their content. Still, I am keeping a close eye on Stripe’s new Tempo system, a new blockchain that could possibly become the framework for payments in an agent-to-agent web.
Further information on Agent2Agent communication can be found in my blog post about how agent-to-agent communication will probably change the web radically.
6. One day we could have a Google AI assistant that can do anything!
Take a look at this:
“In the long term, GenAI companies aim to transform chatbots into a kind of ‘[s]uper [a]ssistant.’ A super assistant could help perform ‘any task’ requested by the user.”
I know that seems impossible. But recently Google DeepMind CEO Demis Hassabis wrote that Google’s vision is to build a world model that makes a universal AI assistant possible. This would be not only for searching the internet, but also for the real world. The article talks about using agents like Google’s Genie, which can simulate real environments and train robots for real-world tasks.
Perhaps one day we will tell our grandchildren about the world we lived in, where Google was a tool in which you typed keywords and received text-based answers on a screen.
Here is something interesting. Try searching for “search engine”. You won’t see the Google.com homepage there. I believe Google’s ultimate goal is to become our super helpful everyday assistant for everything we need in life.
If you liked that, you will love my newsletter!
Or join us in The Search Bar for real-time news on SEO and AI.
Marie
Related articles in my blog:
What I learned at Google I/O 2025: A new era of search
What is the future of Google Search with AI? Will AI Mode replace traditional search?
From RankBrain to BERT and more: A look at the role of AI in Google’s search algorithms
What the Google trial documents reveal about clicks, links and other ranking signals