What Is Googlebot? How Google's Web Crawler Works


What Is Googlebot?

Googlebot is the main program Google uses to automatically crawl (or visit) webpages and observe what's on them.

As Google’s main website crawler, its job is to keep Google’s immense database of content, known as the index, up to date.

That's because the more current and comprehensive this index is, the better and more relevant your search results will be.

There are two main versions of Googlebot:

  • Googlebot Smartphone: The primary Googlebot web crawler. It crawls websites as if it were a user on a mobile device.
  • Googlebot Desktop: This version of Googlebot crawls websites as if it were a user on a desktop computer, checking the desktop version of your site.

There are also more specific crawlers like Googlebot Image, Googlebot Video, and Googlebot News.

Why Is Googlebot Important for SEO?

Googlebot is important for SEO because, in most cases, your pages wouldn’t be crawled and indexed without it. If your pages aren’t indexed, they can’t be ranked and shown in search engine results pages (SERPs).

And no rankings means no organic (unpaid) search traffic.

Google bots crawling your site, indexing your page, and then being shown on the SERP if it meets ranking criteria.

Plus, Googlebot regularly revisits websites to check for updates.

Without it, new content or changes to existing pages wouldn't be reflected in search results. And an out-of-date index makes maintaining your visibility in search results more difficult.

How Googlebot Works

Googlebot helps Google serve relevant and accurate results in the SERPs by crawling webpages and sending the data to be indexed.

Let’s look at the crawling and indexing stages more closely:

Crawling Webpages

Crawling is the process of discovering and exploring websites to gather information. Gary Illyes, an expert at Google, explains the process in this video:

YouTube video thumbnail

Googlebot is constantly crawling the internet to discover new and updated content.

It maintains a continuously updated database of webpages, including those discovered during previous crawls along with new sites.

This database is like Googlebot’s personal adventure map, guiding it on where to explore next.

That's because Googlebot also follows links between pages to continuously discover new or updated content.

Like this:

Googlebot following links between pages to continuously discover new or updated content.

Once Googlebot discovers a page, it may visit and fetch (or download) its content.

Google can then render (or visually process) the page, simulating how a real user would see and experience it.

During the rendering phase, Google runs any JavaScript it finds. JavaScript is code that lets you add interactive and responsive elements to webpages.

Rendering JavaScript lets Googlebot see content in a similar way to how your users see it.
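To see this difference yourself, you can fetch a page the way a crawler’s first pass does and compare it to what your browser shows. Here’s a minimal sketch in Python using the requests library (our choice for illustration; the URL is a placeholder, and the Chrome version in the user-agent string is illustrative because Google updates it regularly). Anything your site injects with JavaScript won’t appear in this raw response:

import requests

# Googlebot Smartphone identifies itself with a user-agent string like this
GOOGLEBOT_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 "
    "Mobile Safari/537.36 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)

def fetch_as_googlebot(url: str) -> str:
    """Fetch a page's raw HTML, before any JavaScript runs."""
    response = requests.get(url, headers={"User-Agent": GOOGLEBOT_UA}, timeout=10)
    response.raise_for_status()
    return response.text

# Content added client-side by JavaScript won't show up in this output
print(fetch_as_googlebot("https://example.com/")[:500])

If important content is missing from that raw HTML, Googlebot only sees it after the rendering step described above.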

You can check how easily Googlebot can crawl your site with a tool like Semrush’s Site Audit. Open the tool, insert your domain, and click “Start Audit.”

Site Audit search bar with a domain entered and the “Start Audit” button clicked.

If you’ve already run an audit or created projects, click the “+ Create project” button to set up a new one.

"Projects" leafage   connected  Site Audit with the “+ Create project” fastener  clicked.

Enter your domain, name your project, and click “Create project.”

Input boxes to enter a domain and project name along with the “Create project” button clicked.

Next, you’ll be asked to configure your settings.

If you’re just starting out, you can use the default settings in the “Domain and limit of pages” section.

Then, click on the “Crawler settings” tab to select the user agent you would like to crawl with. A user agent is a label that tells websites who's visiting them. Like a name tag for a search engine bot.

There is no major difference between the bots you can choose from. They’re each designed to crawl your site like Googlebot would.
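As a side note, here’s a minimal sketch (in Python, purely for illustration) of the kind of check a website might run on that name tag. Because the user-agent header is self-reported, a match only means the visitor claims to be Googlebot; a stronger, DNS-based check appears later in this article:

def is_claimed_googlebot(user_agent: str) -> bool:
    """Check whether a visitor's user-agent label claims to be Googlebot."""
    # Self-reported and easily spoofed; use reverse DNS to truly verify
    return "Googlebot" in user_agent

print(is_claimed_googlebot(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
))  # prints: True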

Crawler settings page on Site Audit with the “User agent” section highlighted.

Check out our Site Audit configuration guide for more details on how to customize your audit.

When you’re ready, click “Start Site Audit.”

Scheduling settings page on Site Audit with the “Start Site Audit” button clicked.

You’ll then see an overview page like the one below. Navigate to the “Issues” tab.

Site Audit overview report with the “Issues” tab highlighted.

Here, you’ll see a full list of errors, warnings, and notices affecting your website’s health.

Click the “Category” drop-down and select “Crawlability” to filter the errors.

Site Audit Issues page with the “Category” drop-down opened and “Crawlability” selected.

Not sure what an error means and how to address it?

Click “Why and how to fix it” or “Learn more” next to any line for a short explanation of the issue and tips on how to resolve it.

Crawlability issues with “Why and how to fix it” next to broken internal link issues clicked, showing tips on how to resolve the issue.

Go through and fix each issue to make it easier for Googlebot to crawl your website.

Indexing Content

After Googlebot crawls your content, it sends it for indexing consideration.

Indexing is the process of analyzing a page to understand its contents and assessing signals like relevance and quality to decide if it should be added to Google’s index.

Here’s how Google’s Gary Illyes explains the concept:

YouTube video thumbnail

During this process, Google processes (or examines) a page’s content. It also tries to determine if a page is a duplicate of another page on the internet, so it can choose which version to show in its search results.

Once Google filters out duplicates and assesses relevant signals, like content quality, it may decide to index your page.

Then, Google’s algorithms perform the ranking stage of the process to determine if and where your content should appear in search results.

From your “Issues” tab, filter for “Indexability.” Work your way through the errors first, either by yourself or with the help of a developer. Then, tackle the warnings and notices.

Indexability issues on Site Audit like hreflang conflicts within page source code, duplicate content issues, etc.

Further reading: Crawlability & Indexability: What They Are & How They Affect SEO

How to Monitor Googlebot's Activity

Regularly checking Googlebot’s activity lets you spot any indexability and crawlability issues and fix them before your site’s organic visibility falls.

Here are two ways to do this:

Use Google Search Console’s Crawl Stats Report

Use Google Search Console’s “Crawl stats” report for an overview of your site’s crawl activity, including information on crawl errors and average server response time.

To access your report, log in to Google Search Console and navigate to “Settings” from the left-hand menu.

Left-hand side navigation bar on Google Search Console with “Settings” clicked.

Scroll down to the “Crawling” section. Then, click the “Open Report” button in the “Crawl stats” row.

Settings page on Google Search Console with “Crawling” highlighted and “Open Report” next to “Crawl stats” clicked.

You’ll see three crawling trend charts. Like this:

Crawl stats chart showing graphs over time for “Total crawl requests,” “Total download size,” and “Average response time.”

These charts show the evolution of three metrics over time:

  • Total crawl requests: The number of crawl requests Google’s crawlers (like Googlebot) have made in the past three months
  • Total download size: The number of bytes Google’s crawlers have downloaded while crawling your site
  • Average response time: The amount of time it takes for your server to respond to a crawl request

Take note of significant drops, spikes, and trends in each of these charts. And work with your developer to spot and address any issues, like server errors or changes to your site structure.

The “Crawl requests breakdown” section groups crawl data by response, file type, purpose, and Googlebot type.

Crawl requests breakdown showing crawl data grouped by response, file type, purpose, and Googlebot type.

Here’s what this data tells you:

  • By response: Shows you how your server has handled Googlebot’s requests. A high percentage of “OK (200)” responses is a good sign. It means most pages are accessible. On the other hand, codes like 404 or 301 can indicate broken links or moved content that you may need to fix.
  • By file type: Tells you the type of files Googlebot is crawling. This can help uncover issues related to specific file types, like images or JavaScript.
  • By purpose: Indicates the reason for a crawl. A high discovery percentage indicates Google is dedicating resources to finding new pages. High refresh numbers mean Google is frequently checking existing pages.
  • By Googlebot type: Shows which Googlebot user agents are crawling your site. If you’re noticing crawling spikes, your developer can check the user agent type to determine whether there is an issue (see the verification sketch below).
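To confirm that a spike really comes from Google, and not from a bot spoofing Googlebot’s user agent, you can use the reverse DNS verification Google documents: reverse-resolve the requesting IP, confirm the hostname falls under googlebot.com or google.com, then forward-resolve that hostname and make sure it maps back to the same IP. Here’s a minimal sketch in Python (the sample IP is from a range Googlebot commonly crawls from):

import socket

def verify_googlebot_ip(ip: str) -> bool:
    """Verify an IP belongs to Google using a two-way DNS check."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse lookup
    except socket.herror:
        return False
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        # Forward-confirm: the hostname must resolve back to the same IP
        return ip in socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False

print(verify_googlebot_ip("66.249.66.1"))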

Analyze Your Log Files

Log files are documents that record details about every request made to your server by browsers, people, and other bots, along with how they interact with your site.

By reviewing your log files, you can find information like:

  • IP addresses of visitors
  • Timestamps of each request
  • Requested URLs
  • The type of request
  • The amount of data transferred
  • The user agent, or crawler bot

Here’s what a log file looks like:

Example of a log file with information about different requests made to a server.
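If you want to poke at the raw data yourself before reaching for a tool, here’s a minimal sketch in Python that parses lines in the widely used combined log format and tallies status codes for requests claiming to be Googlebot. The sample line and its values are made up, and the pattern assumes the combined format; adjust it if your server is configured differently:

import re
from collections import Counter

# One log line in the combined format (all values are made up)
SAMPLE = (
    '66.249.66.1 - - [10/Oct/2024:13:55:36 +0000] "GET /blog/ HTTP/1.1" '
    '200 5316 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.google.com/bot.html)"'
)

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\S+) "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_status_counts(lines):
    """Tally status codes for requests whose user agent claims Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and "Googlebot" in match.group("agent"):
            counts[match.group("status")] += 1
    return counts

print(googlebot_status_counts([SAMPLE]))  # Counter({'200': 1})

Run over a full log file, the printed tally makes sudden shifts, like a jump in 404s or 500s, easy to spot.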

Analyzing your log files lets you dig deeper into Googlebot’s activity and identify details like crawling issues, how often Google crawls your site, and how fast your site loads for Google.

Log files are kept on your web server. So to download and analyze them, you first need to access your server.

Some hosting platforms have built-in file managers. This is where you can find, edit, delete, and add website files.

A built-in file manager on a hosting platform dashboard to find, edit, delete, and add website files.

Alternatively, your developer or IT specialist can also download your log files using a File Transfer Protocol (FTP) client like FileZilla.

Once you have your log file, use Semrush’s Log File Analyzer to understand that data and answer questions like:

  • What are your most crawled pages?
  • Which pages weren’t crawled?
  • What errors were found during the crawl?

Open the tool and drag and drop your log file into it. Then, click “Start Log File Analyzer.”

Log File Analyzer start screen with a section to drag & drop or browse for log files.

Once your results are ready, you’ll see a chart showing Googlebot’s activity on your site in the past 30 days. This helps you identify unusual spikes or drops.

You’ll also see a breakdown of different status codes and requested file types.

Googlebot’s activity on a site along with a breakdown of different status codes and requested file types.

Scroll down to the “Hits by Pages” table for more specific insights on individual pages and folders.

“Hits by Pages” table on Log File Analyzer with specific data and insights for individual pages and folders.

You can use this information to look for patterns in response codes and investigate any availability issues.

For example, a sudden increase in error codes (like 404 or 500) across multiple pages could indicate server problems causing widespread website outages.

Then, you can contact your website hosting provider to help diagnose the problem and get your website back on track.

How to Block Googlebot 

Sometimes, you might want to prevent Googlebot from crawling and indexing entire sections of your site. Or even specific pages.

This could be because:

  • Your site is under maintenance and you don’t want visitors to see incomplete or broken pages
  • You want to hide resources like PDFs or videos from being indexed and appearing in search results
  • You want to keep certain pages from being made public, like intranet or login pages
  • You need to optimize your crawl budget and ensure Googlebot focuses on your most important pages

Here are three ways to do that:

Robots.txt File

A robots.txt file is a set of instructions that tells search engine crawlers, like Googlebot, which pages or sections of your site they should and shouldn’t crawl.

It helps manage crawler traffic and can prevent your site from being overloaded with requests.

Here’s an example of a robots.txt file:

Example of a robots.txt file showing pages or sections of a site that should and shouldn’t be crawled.

For example, you could add a robots.txt rule to prevent crawlers from accessing your login page. This helps keep your server resources focused on more important areas of your site.

Like this:

User-agent: Googlebot
Disallow: /login/

Further reading: Robots.txt: What Is Robots.txt & Why It Matters for SEO

However, robots.txt files don’t necessarily keep your pages out of Google’s index. Googlebot can still find these pages (e.g., if other pages link to them), and then they may still be indexed and shown in search results.

If you don’t want a page to appear in the SERPs, use meta robots tags.

Meta Robots Tags

A meta robots tag is a piece of HTML code that lets you control how an individual page is crawled, indexed, and displayed in the SERPs.

Definitions of and differences between “Robots.txt” and “Meta Robots Tag.”

Some examples of robots tags, and their instructions, include:

  • noindex: Do not index this page
  • noimageindex: Do not index images on this page
  • nofollow: Do not follow the links on this page
  • nosnippet: Do not show a snippet or description of this page in search results

You can add these tags to the <head> section of your page’s code. For example, if you want to block Googlebot from indexing your page, you could add a noindex tag.

Like this:

<meta name="googlebot" content="noindex">

This tag will prevent Googlebot from showing the page in search results. Even if other sites link to it.
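One caveat worth knowing: meta robots tags only work in HTML. For non-HTML resources, like the PDFs and videos mentioned earlier, the same instructions can be sent as an X-Robots-Tag HTTP response header instead. Here’s a minimal sketch using Flask (an assumed setup; the route and file path are hypothetical). In practice, this header is often set in your web server’s configuration rather than in application code:

from flask import Flask, send_file

app = Flask(__name__)

@app.route("/whitepaper.pdf")
def whitepaper():
    # PDFs can't carry a meta robots tag, so the noindex instruction
    # travels in the X-Robots-Tag response header instead
    response = send_file("files/whitepaper.pdf")  # hypothetical path
    response.headers["X-Robots-Tag"] = "noindex"
    return response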

Further reading: Meta Robots Tag & X-Robots-Tag Explained

Password Protection

If you want to block both Googlebot and users from accessing a page, use password protection.

This method ensures that only authorized users can view the content. And it prevents the page from being indexed by Google.

Examples of pages you might password protect include:

  • Admin dashboards
  • Private member areas
  • Internal company documents
  • Staging versions of your site
  • Confidential project pages

If the page you’re password protecting is already indexed, Google will eventually remove it from its search results.
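To make the mechanics concrete, here’s a minimal sketch of HTTP Basic Auth using Flask (an assumed setup; the route and credentials are placeholders). Googlebot can’t supply credentials, so it only ever receives the 401 response, and the protected content is never fetched or indexed:

from flask import Flask, Response, request

app = Flask(__name__)

# Placeholder credentials: in practice, store these securely
USERNAME, PASSWORD = "staff", "change-me"

@app.route("/staging/")
def staging():
    auth = request.authorization
    if not auth or auth.username != USERNAME or auth.password != PASSWORD:
        # 401 plus a WWW-Authenticate header makes browsers prompt for a
        # login; crawlers like Googlebot never get past this response
        return Response(
            "Authentication required",
            401,
            {"WWW-Authenticate": 'Basic realm="Staging"'},
        )
    return "Private staging content"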

Make It Easy for Googlebot to Crawl Your Website

Half the battle of SEO is making sure your pages even show up in the SERPs. And the first step is ensuring Googlebot can actually crawl your pages.

Regularly monitoring your site’s crawlability and indexability helps you do that.

And finding issues that might be hurting your site is easy with Site Audit.

Plus, it lets you run on-demand crawls and schedule automatic re-crawls on a daily or weekly basis. So you’re always on top of your site’s health.

Try it today.