Google On Search Console Noindex Detected Errors via @sejournal, @martinibuster


Google’s John Mueller answered a question on Reddit about a seemingly false ‘noindex detected in X-Robots-Tag HTTP header’ error reported in Google Search Console for pages that do not have that specific X-Robots-Tag or any other related directive or block. Mueller suggested some possible reasons, and multiple Redditors provided reasonable explanations and solutions.

Noindex Detected

The user who started the Reddit discussion described a scenario that may be familiar to many. Google Search Console reports that it couldn’t index a page because it was blocked from indexing the page (which is different from being blocked from crawling). Checking the page reveals no presence of a noindex meta element, and there is no robots.txt blocking the crawl.
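
For reference, the directive GSC claims to have detected can be served in either of two forms, and per the report, neither was present on the affected pages:

    HTTP response header (what GSC reported):   X-Robots-Tag: noindex
    Equivalent HTML meta element:               <meta name="robots" content="noindex">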

Here is what they described as their situation:

  • “GSC shows “noindex detected in X-Robots-Tag http header” for a large portion of my URLs. However:
  • Can’t find any noindex in HTML source
  • No noindex in robots.txt
  • No noindex visible in response headers when testing
  • Live Test in GSC shows page as indexable
  • Site is behind Cloudflare (We have checked page rules/WAF etc)”

They also reported that they tried spoofing Googlebot and tested various IP addresses and request headers and still found no clue as to the source of the X-Robots-Tag.

Cloudflare Suspected

One of the Redditors commented in that discussion to suggest troubleshooting whether the problem originated from Cloudflare.

They offered comprehensive step-by-step instructions on how to diagnose whether Cloudflare or something else was preventing Google from indexing the page:

“First, compare Live Test vs. Crawled Page in GSC to check if Google is seeing an outdated response. Next, inspect Cloudflare’s Transform Rules, Response Headers, and Workers for modifications. Use curl with the Googlebot user-agent and cache bypass (Cache-Control: no-cache) to check server responses. If using WordPress, disable SEO plugins to rule out dynamic headers. Also, log Googlebot requests on the server and check if X-Robots-Tag appears. If all fails, bypass Cloudflare by pointing DNS directly to your server and retest.”
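
A minimal sketch of the curl step described above (the URL is a placeholder; -D - prints the response headers and -o /dev/null discards the body, so this performs a full GET rather than a HEAD request, which some servers handle differently):

    # Fetch the page as Googlebot, bypassing caches, and print the response headers
    curl -s -D - -o /dev/null \
      -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
      -H "Cache-Control: no-cache" \
      "https://www.example.com/affected-page/"

    # Then look in the output for a line such as:
    # x-robots-tag: noindex

Note that even a spoofed user agent won’t match what the real Googlebot sees if the server varies its responses by IP address, which is where the Rich Results test described below comes in.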

The OP (original poster, the one who started the discussion) responded that they had tested all of those solutions but were unable to test a cache of the site via GSC, only the live site (from the actual server, not Cloudflare).

How To Test With An Actual Googlebot

Interestingly, the OP stated that they were unable to test their site using Googlebot, but there is actually a way to do that.

Google’s Rich Results Tester uses the Googlebot user agent, which also originates from a Google IP address. This tool is useful for verifying what Google sees. If an exploit is causing the site to display a cloaked page, the Rich Results Tester will reveal exactly what Google is indexing.

Google’s Rich Results support page confirms:

“This tool accesses the page as Googlebot (that is, not using your credentials, but as Google).”

401 Error Response?

The following probably wasn’t the solution, but it’s an interesting bit of technical SEO knowledge.

Another user shared the experience of a server responding with a 401 error response. A 401 response means “unauthorized,” and it happens when a request for a resource is missing authentication credentials or the provided credentials are not the correct ones. Their solution to make the blocked indexing messages in Google Search Console go away was to add an entry in the robots.txt to block crawling of login page URLs.
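
A minimal robots.txt sketch of that fix (the login paths below are hypothetical; adjust them to whichever URLs are returning 401s):

    # robots.txt — stop crawling of login URLs that return 401 responses
    User-agent: *
    Disallow: /login/
    Disallow: /wp-login.php

The idea is that once Googlebot stops requesting those URLs, it no longer receives the 401 responses that were triggering the reports.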

Google’s John Mueller On GSC Error

John Mueller dropped into the discussion to offer his help in diagnosing the issue. He said that he has seen this issue arise in relation to CDNs (Content Delivery Networks). An interesting thing he said was that he’s also seen this happen with very old URLs. He didn’t elaborate on that last point, but it seems to imply some kind of indexing bug related to old indexed URLs.

Here’s what he said:

“Happy to take a look if you want to ping me some samples. I’ve seen it with CDNs, I’ve seen it with really-old crawls (when the content was there long ago and a site just has a lot of old URLs indexed), maybe there’s something new here…”

Key Takeaways: Google Search Console Noindex Detected

  • Google Search Console (GSC) may report “noindex detected in X-Robots-Tag http header” even when that header is not present.
  • CDNs, such as Cloudflare, may interfere with indexing. Steps were shared to check whether Cloudflare’s Transform Rules, Response Headers, or cache are affecting how Googlebot sees the page.
  • Outdated indexing data on Google’s side may also be a factor.
  • Google’s Rich Results Tester can verify what Googlebot sees because it uses Googlebot’s user agent and IP address, revealing discrepancies that might not be visible from spoofing a user agent.
  • 401 Unauthorized responses can prevent indexing. A user shared that their issue involved login pages that needed to be blocked via robots.txt.
  • John Mueller suggested CDNs and historically crawled URLs as possible causes.