Google's John Mueller explains why disallowed pages are sometimes indexed and why the related Search Console reports can be dismissed

Google’s John Mueller answered a question about why Google indexes pages that are disallowed from crawling by robots.txt and why it’s safe to ignore the related Search Console reports about those crawls.
Bot Traffic To Query Parameter URLs
The person asking the question documented that bots were creating links to non-existent query parameter URLs (?q=xyz) pointing to pages that have noindex meta tags and are also blocked in robots.txt. What prompted the question is that Google is crawling the links to those pages, getting blocked by robots.txt (without seeing the noindex robots meta tag), then reporting them in Google Search Console as “Indexed, though blocked by robots.txt.”
The person asked the following question:
“But here’s the big question: why would Google index pages when they can’t even see the content? What’s the advantage in that?”
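The setup described above might look like the following minimal sketch (example.com, the file placement, and the exact parameter pattern are assumptions for illustration, not details from the original post):

    # robots.txt on example.com (hypothetical)
    # Blocking crawling here also means Googlebot never fetches the
    # pages, so it never sees the noindex directive they carry.
    User-agent: *
    Disallow: /*?q=

    <!-- In the HTML of the affected pages (hypothetical) -->
    <!-- Invisible to Google while the disallow above is in place -->
    <meta name="robots" content="noindex">

Because Google can index a URL it has never crawled based purely on links pointing at it, such URLs can still end up in the index as bare references, which is exactly what the “Indexed, though blocked by robots.txt” status describes.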
Google’s John Mueller confirmed that if they can’t crawl the page, they can’t see the noindex meta tag. He also makes an interesting mention of the site: search operator, advising that its results can be ignored because “average” users won’t see them.
He wrote:
“Yes, you’re correct: if we can’t crawl the page, we can’t see the noindex. That said, if we can’t crawl the pages, then there’s not a lot for us to index. So while you might see some of those pages with a targeted site:-query, the average user won’t see them, so I wouldn’t fuss over it. Noindex is also fine (without robots.txt disallow), it just means the URLs will end up being crawled (and end up in the Search Console report for crawled/not indexed — neither of these statuses cause issues to the rest of the site). The important part is that you don’t make them crawlable + indexable.”
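In practical terms, Mueller’s suggestion amounts to dropping the robots.txt disallow so Googlebot can fetch the pages and read the directive. A minimal sketch, reusing the same hypothetical URLs as above:

    # robots.txt (hypothetical): leave the ?q= URLs crawlable
    User-agent: *
    Disallow:

    <!-- The noindex is now visible to Googlebot and will be honored -->
    <meta name="robots" content="noindex">

The same directive can also be sent as an HTTP response header (X-Robots-Tag: noindex), which is useful when the URLs in question don’t render a normal HTML page.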
Takeaways:
1. Mueller’s answer confirms the limitations of using the site: advanced search operator for diagnostic purposes. One of those reasons is that it’s not connected to the regular search index; it’s a separate thing altogether (an example query appears after this list).
Google’s John Mueller commented on the site: search operator in 2021:
“The short answer is that a site: query is not meant to be complete, nor used for diagnostics purposes.
A site query is a specific kind of search that limits the results to a certain website. It’s basically just the word site, a colon, and then the website’s domain.
This query limits the results to a specific website. It’s not meant to be a comprehensive collection of all the pages from that website.”
2. A noindex tag without a robots.txt disallow is fine for these kinds of situations, where a bot is linking to non-existent pages that get discovered by Googlebot.
3. URLs with the noindex tag will generate a “crawled/not indexed” entry in Search Console, and those entries won’t have a negative effect on the rest of the website.
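As an illustration of the targeted site: query Mueller refers to, a check like the following might surface those blocked-but-indexed URLs even though ordinary searchers would never encounter them (example.com is a placeholder; inurl: narrows the results to the query parameter):

    site:example.com inurl:q=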
Read the question and answer on LinkedIn:
Why would Google index pages when they can’t even see the content?
Featured Image by Shutterstock/Krakenimages.com
SEJ STAFF Roger Montti, Owner at Martinibuster.com
I have 25 years of hands-on experience in SEO, evolving along with the search engines by keeping up with the latest ...