You Don’t Need Robots.txt On Root Domain, Says Google via @sejournal, @MattGSouthern


Google's Gary Illyes shares an unconventional but valid method for centralizing robots.txt rules on CDNs.

  • Robots.txt files can be centralized on CDNs, not just root domains.
  • Websites can redirect robots.txt requests from the main domain to a CDN.
  • This unorthodox approach complies with updated standards.

In a recent LinkedIn post, Google Analyst Gary Illyes challenged a long-standing belief about the placement of robots.txt files.

For years, the accepted wisdom has been that a website's robots.txt file must reside at the root domain (e.g., example.com/robots.txt).

However, Illyes has clarified that this isn't an absolute requirement and revealed a lesser-known aspect of the Robots Exclusion Protocol (REP).

Robots.txt File Flexibility

The robots.txt file doesn't have to be located at the root domain (example.com/robots.txt).

According to Illyes, having two separate robots.txt files hosted on different domains is permissible: one on the primary website and another on a content delivery network (CDN).

Illyes explains that websites can centralize their robots.txt file on the CDN while still controlling crawling for their main site.

For instance, a website could have two robots.txt files: one at https://cdn.example.com/robots.txt and another at https://www.example.com/robots.txt.

This approach allows you to maintain a single, comprehensive robots.txt file on your CDN and redirect requests from your main domain to this centralized file.
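On the server side, the redirect itself is a one-line rule. As a minimal sketch, assuming an nginx server on the main domain and the hypothetical hostnames from the example above:

```nginx
# Hypothetical nginx rule on www.example.com: send any request for
# /robots.txt to the centralized copy hosted on the CDN.
location = /robots.txt {
    return 301 https://cdn.example.com/robots.txt;
}
```

Equivalent rules exist for other servers (e.g., a `Redirect` directive in Apache); the key point is that the main domain answers /robots.txt with a redirect rather than a file.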

Illyes notes that crawlers complying with RFC 9309 will follow the redirect and use the target file as the robots.txt file for the original domain.
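The practical effect is that one set of rules governs both hostnames. A short sketch with Python's standard-library `urllib.robotparser` (the robots.txt content and URLs are hypothetical) shows the centralized rules being applied to the main domain's URLs:

```python
from urllib import robotparser

# Hypothetical rules hosted at https://cdn.example.com/robots.txt.
# Per RFC 9309, a crawler requesting https://www.example.com/robots.txt
# and receiving a redirect there treats this content as the rules
# for www.example.com.
centralized_rules = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(centralized_rules.splitlines())

# The single rule set now answers questions about the main domain.
print(parser.can_fetch("*", "https://www.example.com/private/page"))  # False
print(parser.can_fetch("*", "https://www.example.com/public/page"))   # True
```

(`RobotFileParser.read()` would normally fetch the file itself and follows HTTP redirects automatically; the string is parsed directly here to keep the sketch self-contained.)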

Looking Back At 30 Years Of Robots.txt

As the Robots Exclusion Protocol celebrates its 30th anniversary this year, Illyes' revelation highlights how web standards continue to evolve.

He even speculates whether the file needs to be named "robots.txt" at all, hinting at possible changes in how crawl directives are managed.

How This Can Help You

Following Illyes' guidance can help you in the following ways:

  1. Centralized Management: By consolidating robots.txt rules in one location, you can maintain and update crawl directives across your entire web presence.
  2. Improved Consistency: A single source of truth for robots.txt rules reduces the risk of conflicting directives between your main site and CDN.
  3. Flexibility: This approach allows more adaptable configurations, especially for sites with complex architectures or those using multiple subdomains and CDNs.

A streamlined approach to managing robots.txt files can improve both site management and SEO efforts.


Featured Image: BestForBest/Shutterstock

SEJ STAFF Matt G. Southern, Senior News Writer at Search Engine Journal

Matt G. Southern, Senior News Writer, has been with Search Engine Journal since 2013. With a bachelor's degree in communications, ...

