Several website owners have recently contacted me about a scary spike in indexing that then drops back down to normal levels. The spike shows up in the coverage reporting in Google Search Console, and the site owners thought it might be a sign that something terrible was happening from an SEO perspective. For example, that Google was suddenly indexing URLs it shouldn't be, potentially affecting the overall quality of the site. That's not the case for most sites, and I'll explain what was actually going on below. It's worth noting that I've seen this situation several times over the years. Ultimately, the answer is right there in the report, but it can be hard to see at first glance.
Double vision: Don't forget to scroll down in the coverage reporting and check "Indexed, though blocked by robots.txt."
It's easy to miss the yellow/orange section at the bottom of the coverage reporting that covers "Indexed, though blocked by robots.txt," yet it can often explain a sudden increase, and then decrease, in a site's overall indexing. I find that many site owners don't know to look there, and even when they do see it, they often don't know whether it's a big problem from an SEO perspective.
I think one of the most confusing things about this report is that when it increases, the site's overall indexing increases as well. That makes sense when you think about it: the main page indexing report shows all pages indexed by Google, and URLs that are indexed despite being blocked by robots.txt are technically indexed, so they should show up there. But it's still very confusing for some site owners. I'll cover this situation in more detail below.
For example, here are two sites that saw an increase in indexing which then dropped back down pretty quickly. If you check the "Indexed, though blocked by robots.txt" report in Search Console, you can see the same trend from an indexing standpoint. Yes, that's where the rise and fall happened. And it's totally fine from an SEO perspective.
And "Indexed, though blocked by robots.txt" shows the same rise and fall:
Here’s another example of indexing rising but then falling off quickly:
And again, "Indexed, though blocked by robots.txt" shows the same rise and fall:
The reason for the increase in indexing in Search Console:
First, blocking URLs via robots.txt does not prevent pages from being indexed. Google has explained this many times, including in its documentation. Google can still index the URLs, just without crawling the content. For example, if Google picks up links to those URLs, they can end up indexed (but without Google ever crawling the page content).
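To make the distinction concrete, here's a minimal sketch in Python using the standard library's urllib.robotparser (the robots.txt rules and URL are hypothetical, and this is obviously not Google's actual pipeline). A Disallow rule only answers the question "may this URL be crawled?" It says nothing about indexing, which is why a blocked URL can still be indexed if links to it are discovered.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for an example site.
robots_txt = """
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.strip().splitlines())

url = "https://www.example.com/private/report.html"

# can_fetch() answers "may this user agent crawl the URL?" -- nothing more.
if not parser.can_fetch("Googlebot", url):
    # Crawling is blocked, so the page body (including any meta robots
    # noindex tag inside it) is invisible to Google. If other pages link
    # to the URL, it can still be indexed without its content being crawled.
    print(f"{url} is blocked from crawling, but could still be indexed.")
```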
Additionally, if these pages end up in the SERPs (which is sometimes the case), Google will simply display a message explaining that a snippet could not be generated, along with a link to its documentation explaining why you might be seeing that message. And as you can imagine, the documentation notes that blocking via robots.txt can be one cause.
So yes, Google decided to index these URLs, but no, it won't be a big problem from an SEO or site-quality perspective, nor will it have a negative impact on rankings. I'll explain why next. Additionally, Google's systems decided to deindex these URLs not long after they were indexed, as you can see in the graphs above.
Why spikes in "Indexed, though blocked by robots.txt" are (usually) not a problem.
After seeing the number of URLs reported as "Indexed, though blocked by robots.txt" increase over the years, and wondering how those URLs might affect Google's quality evaluation of a site, I decided to ask John Mueller of Google about this topic in a Search Central hangout video in 2021. John explained that URLs blocked by robots.txt would not have an impact on quality because Google cannot crawl those URLs and see the content there. You can see my tweet about it below, which includes a link to the video with John.
And here's another clip from John that specifically addresses "Indexed, though blocked by robots.txt." John explains that it's more of a warning for site owners in case they actually want those URLs indexed and crawlable. If you don't want them crawled and you definitely want the URLs blocked by robots.txt, then it's totally fine. And he explained again that the URLs can't be associated with the site quality-wise because Google can't crawl the content.
Summary: Not every sudden spike in indexing is scary and problematic. And make sure to check ALL of the reports in GSC.
The next time you notice a sudden increase (and then decrease) in indexing, be sure to check the reporting at the bottom of the coverage report in Search Console. The answer could simply be URLs that are "Indexed, though blocked by robots.txt." And if that's the case, you may not need to worry at all. Those URLs are technically indexed, but Google can't crawl the content, so they can't be associated with the site from a quality standpoint. If that's the situation, breathe a sigh of relief and move on to more important SEO work.
GG