Google Reminds Websites To Use Robots.txt To Block Action URLs via @sejournal, @MattGSouthern


In a LinkedIn post, Gary Illyes, an Analyst at Google, reiterated long-standing guidance for website owners: use the robots.txt file to prevent web crawlers from accessing URLs that trigger actions, such as adding items to carts or wishlists.

Illyes highlighted a common complaint: unnecessary crawler traffic overloading servers, often stemming from search engine bots crawling URLs intended for user actions.

He wrote:

“Looking at what we’re crawling from the sites in the complaints, way too often it’s action URLs such as ‘add to cart’ and ‘add to wishlist.’ These are useless for crawlers, and you likely don’t want them crawled.”

To avoid this wasted server load, Illyes advised blocking access in the robots.txt file for URLs with parameters like “?add_to_cart” or “?add_to_wishlist.”

As an example, he suggests:

“If you have URLs like:
https://example.com/product/scented-candle-v1?add_to_cart
and
https://example.com/product/scented-candle-v1?add_to_wishlist

You should probably add a disallow rule for them in your robots.txt file.”
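
Illyes’ post doesn’t spell out the exact directives, but a minimal robots.txt sketch along those lines, assuming the query parameters match his example URLs, might look like this:

# Hypothetical example: keep crawlers away from cart/wishlist action URLs
User-agent: *
Disallow: /*?add_to_cart
Disallow: /*?add_to_wishlist

The * wildcard is supported by Google’s crawlers, so these rules match any path ending in the “?add_to_cart” or “?add_to_wishlist” parameters. If your platform appends these parameters after others, a broader pattern may be needed; test the rules against your actual URLs before deploying.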

While using the HTTP POST method can also prevent the crawling of such URLs, Illyes noted crawlers can still make POST requests, so robots.txt remains advisable.

Reinforcing Decades-Old Best Practices

Alan Perkins, who engaged in the thread, pointed out that this guidance echoes web standards introduced in the 1990s for the same reasons.

Quoting from a 1994 document titled “A Standard for Robot Exclusion”:

“In 1993 and 1994 there have been occasions where robots have visited WWW servers where they weren’t welcome for various reasons…robots traversed parts of WWW servers that weren’t suitable, e.g. very deep virtual trees, duplicated information, temporary information, or cgi-scripts with side-effects (such as voting).”

The robots.txt standard, proposing rules to restrict well-behaved crawler access, emerged as a “consensus” solution among web stakeholders back in 1994.

Obedience & Exceptions

Illyes affirmed that Google’s crawlers fully obey robots.txt rules, with rare exceptions thoroughly documented for scenarios involving “user-triggered or contractual fetches.”

This adherence to the robots.txt protocol has been a pillar of Google’s web crawling policies.

Why SEJ Cares

While the advice may seem rudimentary, the re-emergence of this decades-old best practice underscores its relevance.

By leveraging the robots.txt standard, sites can help keep overzealous crawlers from hogging bandwidth with unproductive requests.

How This Can Help You

Whether you run a small blog or a large e-commerce platform, following Google’s advice to leverage robots.txt for blocking crawler access to action URLs can help in several ways:

  • Reduced Server Load: You can reduce needless server requests and bandwidth usage by preventing crawlers from hitting URLs that invoke actions like adding items to carts or wishlists.
  • Improved Crawler Efficiency: Giving more explicit rules in your robots.txt file about which URLs crawlers should avoid can lead to more efficient crawling of the pages/content you want to be indexed and ranked.
  • Better User Experience: With server resources focused on actual user actions rather than wasted crawler hits, end users will likely experience faster load times and smoother functionality.
  • Stay Aligned With Standards: Implementing the guidance puts your site in compliance with the widely adopted robots.txt protocol, which has been an industry best practice for decades.

Revisiting robots.txt directives could be a simple but impactful step for websites looking to exert more control over crawler activity.

Illyes’ messaging indicates that the decades-old robots.txt rules remain applicable in the modern web environment.


Featured Image: BestForBest/Shutterstock