Does Google Have A Problem With Big Robots.txt Files? via @sejournal, @MattGSouthern

Google addresses the topic of robots.txt files and whether it’s a good SEO practice to keep them within a reasonable size.

This topic is discussed by Google’s Search Advocate John Mueller during the Google Search Central SEO office-hours hangout recorded on January 14.

David Zieger, an SEO manager for a large news publisher in Germany, joins the livestream with concerns about a “huge” and “complex” robots.txt file.

How huge are we talking here?

Zieger says there are over 1,500 lines with a “multitude” of disallows that keeps growing over the years.

The disallows prevent Google from indexing HTML fragments and URLs where AJAX calls are used.

Zieger says it’s not possible to set a noindex, which is another way to keep the fragments and URLs out of Google’s index, so he’s resorted to filling the site’s robots.txt with disallows.
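
For context, disallow rules of this kind usually target URL patterns rather than individual pages. A minimal, hypothetical sketch of what such entries might look like (the paths are invented for illustration and are not taken from Zieger’s file):

```
User-agent: *
# Block HTML fragments and AJAX endpoints from crawling (hypothetical paths)
Disallow: /fragments/
Disallow: /ajax/header
Disallow: /ajax/footer
```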

Are there any negative SEO effects that can result from a huge robots.txt file?

Here’s what Mueller says.

SEO Considerations For Large Robots.txt Files

A large robots.txt file will not directly cause any negative impact on a site’s SEO.

However, a large file is harder to maintain, which may lead to accidental issues down the road.

Mueller explains:

“No direct negative SEO issues with that, but it makes it a lot harder to maintain. And it makes it a lot easier to accidentally push something that does cause issues.

So just because it’s a large file doesn’t mean it’s a problem, but it makes it easier for you to create problems.”

Zieger follows up by asking if there are any issues with not including a sitemap in the robots.txt file.

Mueller says that’s not a problem:

“No. Those different ways of submitting a sitemap are all equivalent for us.”
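
For reference, one of those submission methods is a Sitemap line inside robots.txt itself; another is submitting the file through Search Console. A hypothetical example of the robots.txt approach (the URL is illustrative):

```
Sitemap: https://www.example.com/sitemap.xml
```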

Zieger then launches into several more follow-up questions that we’ll take a look at in the next section.

Does Google Recognize HTML Fragments?

Zieger asks Mueller what the SEO impact of radically shortening the robots.txt file would be, such as removing all the disallows, for example.

The following questions are asked:

  • Does Google recognize HTML fragments that aren’t relevant to site visitors?
  • Would HTML fragments end up in Google’s search index if they weren’t disallowed in robots.txt?
  • How does Google deal with pages where AJAX calls are used? (Such as a header or footer element)

He sums up his questions by stating that most of what’s disallowed in his robots.txt file is header and footer elements that aren’t interesting for the user.

Mueller says it’s hard to know exactly what would happen if those fragments were suddenly allowed to be indexed.

A trial and error approach might be the best way of figuring this out, Mueller explains:

“It’s hard to say what you mean with regards to those fragments.

My idea there would be to try to figure out how those fragment URLs are used. And if you’re unsure, maybe take one of these fragment URLs and allow its crawling, and look at the content of that fragment URL, and then check to see what happens in search.

Does it affect anything with regards to the indexed content on your site?
Is some of that content suddenly findable within your site?
Is that a problem or not?

And try to work based on that, because it’s very easy to block things by robots.txt, which actually are not used for indexing, and then you spend a lot of time maintaining this big robots.txt file, but it actually doesn’t change that much for your website.”
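
One way to run that kind of test without rewriting the whole file is a targeted Allow rule for a single fragment URL, since Google applies the most specific (longest) matching rule. A hypothetical sketch, with illustrative paths:

```
User-agent: *
# Temporarily allow crawling of one fragment URL to observe what happens in search
Allow: /fragments/test-header.html
# Keep the broad block in place for everything else (hypothetical path)
Disallow: /fragments/
```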

Other Considerations For Building A Robots.txt File

Zieger has one last follow-up regarding robots.txt files, asking if there are any specific guidelines to follow when building one.

Mueller says there’s no specific format to follow:

“No, it’s basically up to you. Like some sites have big files, some sites have small files, they should all just work.

We have the open source code of the robots.txt parser that we use. So what you can also do is get your developers to run that parser for you, or kind of set it up so that you can test it, and then check the URLs on your website with that parser to see which URLs would actually get blocked and what that would change. And that way you can test things before you make them live.”

The robots.txt parser Mueller refers to can be found on GitHub.
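
Google’s parser (the google/robotstxt repository) is a C++ library, but the kind of pre-deployment check Mueller describes can also be sketched with Python’s standard-library robots.txt parser. This is a minimal stand-in for illustration rather than Google’s tool; the rules, user agent, and URLs below are hypothetical, and the standard-library parser applies rules in file order rather than Google’s longest-match logic, which is why the Allow line comes first:

```python
# Sketch: check which URLs a draft robots.txt would block before deploying it.
# Uses Python's built-in parser as a stand-in for Google's open-source C++ parser.
from urllib.robotparser import RobotFileParser

draft_rules = """
User-agent: *
Allow: /fragments/test-header.html
Disallow: /fragments/
""".splitlines()

parser = RobotFileParser()
parser.parse(draft_rules)

# Hypothetical URLs to verify against the draft rules
urls_to_check = [
    "https://www.example.com/fragments/test-header.html",
    "https://www.example.com/fragments/sidebar.html",
    "https://www.example.com/article/some-story",
]

for url in urls_to_check:
    allowed = parser.can_fetch("Googlebot", url)
    print(f"{url} -> {'allowed' if allowed else 'blocked'}")
```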

Hear the full discussion in the video below:


Featured Image: Screenshot from YouTube.com/GoogleSearchCentral, January 2022.