Get Started With GSC Queries In BigQuery


BigQuery has a number of advantages not found in other tools when it comes to analyzing large volumes of Google Search Console (GSC) data.

It lets you process billions of rows in seconds, enabling deep analysis across massive datasets.

This is a step up from Google Search Console, which only allows you to export 1,000 rows of data and may have data discrepancies.

You read all about why you should be using BigQuery as an SEO pro. You figured out how to plug GSC into BigQuery. Data is flowing!

Now what?

It's time to start querying the data. Understanding and effectively querying the data is key to gaining actionable SEO insights.

In this article, we'll walk through how you can get started with your queries.

Understanding GSC Data Structure In BigQuery

Data is organized in tables. Each table corresponds to a specific Google Search Console report. The official documentation is very extensive and clear.

However, if you are reading this, it's because you want to understand the context and the key elements before diving into it.

Taking the time to figure this out means that you will be able to create better queries more efficiently while keeping costs down.

GSC Tables, Schema & Fields In BigQuery

Schema is the blueprint that maps what each field (each piece of information) represents in a table.

You have three distinct schemas presented in the official documentation because each table doesn't necessarily hold the same kind of data. Think of tables as dedicated folders that organize specific types of information.

Each report is stored separately for clarity. You've got:

  • searchdata_site_impression: Contains performance data for your property aggregated by property.
  • searchdata_url_impression: Contains performance data for your property aggregated by URL.
  • exportLog: Each successful export to either table is logged here.

A few important notes on tables:

  • You'll find in the official documentation that things don't run the way we expect them to: "Search Console exports bulk data once per day, though not necessarily at the same time for each table."
  • Tables are retained forever, by default, with the GSC bulk export.
  • In the URL-level table (searchdata_url_impression), you have Discover data. The field is_anonymized_discover specifies if the data row is subject to the Discover anonymization threshold.

Fields are individual pieces of information, the specific type of data in a table. If this were an Excel file, we'd refer to fields as the columns in a spreadsheet.

If we're talking about Google Analytics, fields are metrics and dimensions. Here are key data fields available in BigQuery when you import GSC data:

  • Clicks – Number of clicks for a query.
  • Impressions – Number of times a URL was shown for a query.
  • CTR – Clickthrough rate (clicks/impressions).
  • Position – Average position for a query.

Let's take the searchdata_site_impression table schema as an example. It contains 10 fields:

Field | Explanation
data_date | The day when the data in this row was generated, in Pacific Time.
site_url | URL of the property, sc-domain:property-name or the full URL, depending on your validation.
query | The user's search query.
is_anonymized_query | If true, the query field will return null.
country | Country from which the search query originated.
search_type | Type of search (web, image, video, news, discover, googleNews).
device | The device used by the user.
impressions | The number of times a URL was shown for a particular search query.
clicks | The number of clicks a URL received for a search query.
sum_top_position | This calculation figures out where your website typically ranks in search results. It looks at the highest position your site reaches in different searches and calculates the average.
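Note that sum_top_position is not itself the average position: per the bulk export documentation's guidance, you derive average position as sum_top_position / impressions + 1 (positions are zero-based). A minimal sketch of that calculation, with invented numbers:

```python
# Derive average position from GSC bulk-export aggregates.
# sum_top_position is a zero-based sum of the topmost position across
# impressions, hence the "+ 1" to get the familiar 1-based position.
def average_position(sum_top_position: float, impressions: int) -> float:
    if impressions == 0:
        raise ValueError("impressions must be > 0")
    return sum_top_position / impressions + 1

# Hypothetical row: 120 impressions whose zero-based positions sum to 240.
print(average_position(240, 120))  # 3.0 -> the property averaged position 3
```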

Putting It Together

In BigQuery, the dataset for the Google Search Console (GSC) bulk export typically refers to the collection of tables that store the GSC data.

The dataset is named "searchconsole" by default.

BigQuery Search Console tables

Unlike the performance tab in GSC, you have to write queries to ask BigQuery to return data. To do that, you need to click on the "Run a query in BigQuery" button.

Run SQL query option among three other options on the welcome screen. Screenshot from Google Cloud Console, January 2024

Once you do that, you should have access to BigQuery Studio, where you will be creating your first SQL query. However, I don't recommend you click on that button yet.

Screenshot of BigQuery Studio, January 2024

In Explorer, once you open your project, you will see the datasets; the icon is a logo with squares with dots in them. This is where you see if you have GA4 and GSC data, for instance.

Dataset for the Search Console tables

When you click on a table, you get access to the schema. You can see the fields to confirm this is the table you want to query.

If you click on "QUERY" at the top of the interface, you can create your SQL query. This is better because it loads up some information you need for your query.

It will fill out the FROM clause with the appropriate table, set a default limit, and a date that you can change if you need to.

Screenshot from Google Cloud Console, January 2024

Getting Started With Your First Query

Search Console > BigQuery export was previously only available to companies with devs/ a super techy SEO. Now it's available to everyone!

Writing SQL is a more and more important skill for marketers & I'm making something to help with that – if you'd like to test it DM me 🙂 https://t.co/voOESJfo1e

— Robin Lord (@RobinLord8) February 21, 2023

The queries we are going to discuss here are simple, efficient, and low-cost.

Disclaimer: The previous statement depends on your specific situation.

Sadly, you cannot work in the sandbox if you want to learn how to use BigQuery with GSC data. You must enter your billing details. If this has you freaked out, fear not; costs should be low.

  • The first 1 TiB per month of query data is free.
  • If you have a tight budget, you can set Cloud Billing budget alerts — you can set a BigQuery-specific alert and get notified as soon as data usage charges occur.

In SQL, the "SELECT *" statement is a powerful command used to retrieve all columns from a specified table, or specific columns as per your specification.

This statement enables you to view the entire dataset or a subset based on your selection criteria.

A table comprises rows, each representing a unique record, and columns, storing different attributes of the data. Using "SELECT *," you can examine all fields in a table without specifying each column individually.

For instance, to explore a Google Search Console table for a specific date, you might employ a query like:


SELECT *
FROM `yourdata.searchconsole.searchdata_site_impression`
WHERE data_date = '2023-12-31'
LIMIT 5;

You always need to make sure that the FROM clause specifies your searchdata_site_impression table. That's why it is recommended to start by clicking the table first, as it automatically fills in the FROM clause with the right table.

Important: We limit the data we load by using the data_date field. It's a good practice to limit costs (along with setting a limit).
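If you want to see how this filter-and-limit shape behaves before spending any BigQuery quota, you can rehearse it locally. A sketch using SQLite with a toy searchdata_site_impression table (table contents are invented for illustration; BigQuery syntax differs slightly):

```python
import sqlite3

# Toy stand-in for the searchdata_site_impression table (illustrative only).
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE searchdata_site_impression (
    data_date TEXT, query TEXT, clicks INTEGER, impressions INTEGER)""")
con.executemany(
    "INSERT INTO searchdata_site_impression VALUES (?, ?, ?, ?)",
    [("2023-12-31", "bigquery gsc", 3, 40),
     ("2023-12-31", "sql tutorial", 1, 25),
     ("2023-12-30", "seo tools", 7, 90)],
)

# Same shape as the BigQuery query: filter on data_date, cap the rows.
rows = con.execute(
    "SELECT * FROM searchdata_site_impression "
    "WHERE data_date = '2023-12-31' LIMIT 5"
).fetchall()
print(rows)  # only the two 2023-12-31 rows come back
```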

Results from the first query we made, shown in a table format

Your First URL Impression Query

If you want to see information for every URL on your site, you'd ask BigQuery to pull information from the searchdata_url_impression table, selecting the url, clicks, and impressions fields.

This is what the query would look like in the console:

SELECT
  url,
  SUM(clicks) AS clicks,
  SUM(impressions) AS impressions
FROM
  `yourtable.searchdata_url_impression`
WHERE
  data_date = '2023-12-25'
GROUP BY
  url
ORDER BY
  clicks DESC
LIMIT
  100

You always need to make sure that the FROM clause specifies your searchdata_url_impression table.

When you export GSC data into BigQuery, the export contains partitioned tables. The partition is the date.

This means that the data in BigQuery is structured in a way that allows for quick retrieval and analysis based on the date.

That's why the date is automatically included in the query. However, you may have no data if you select the latest date, as the data may not have been exported yet.

Breakdown Of The Query

In this example, we select the url, clicks, and impressions fields for the 25th of December, 2023.

We group the results by URL, with the sum of clicks and impressions for each of them.

Lastly, we order the results based on the number of clicks for each URL and limit the number of rows (URLs) to 100.
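The group-sum-order-limit steps above can be rehearsed locally too. A sketch using SQLite with an invented searchdata_url_impression table, mirroring the article's query:

```python
import sqlite3

# Toy searchdata_url_impression table (URLs and numbers are invented).
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE searchdata_url_impression (
    data_date TEXT, url TEXT, clicks INTEGER, impressions INTEGER)""")
con.executemany(
    "INSERT INTO searchdata_url_impression VALUES (?, ?, ?, ?)",
    [("2023-12-25", "/a", 5, 100),
     ("2023-12-25", "/a", 2, 50),   # same URL, another query row
     ("2023-12-25", "/b", 9, 80),
     ("2023-12-24", "/a", 4, 60)],  # excluded by the date filter
)

# Aggregate per URL for one day, busiest URLs first.
rows = con.execute("""
    SELECT url, SUM(clicks) AS clicks, SUM(impressions) AS impressions
    FROM searchdata_url_impression
    WHERE data_date = '2023-12-25'
    GROUP BY url
    ORDER BY clicks DESC
    LIMIT 100
""").fetchall()
print(rows)  # [('/b', 9, 80), ('/a', 7, 150)]
```

Notice how the two /a rows collapse into one aggregated row, exactly as per-query rows collapse per URL in the real export.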

Recreating Your Favorite GSC Report

I recommend you read the GSC bulk data export guide. You should be using the export, so I will not be providing information about table optimization. That's a tad bit more advanced than what we are covering here.

GSC's performance tab shows one dimension at a time, limiting context. BigQuery allows you to combine multiple dimensions for better insights.

Using SQL queries means you get a neat table. You don't need to understand the ins and outs of SQL to make the best use of BigQuery.

This query is courtesy of Chris Green. You can find some of his SQL queries on GitHub.

SELECT
  query,
  is_anonymized_query AS anonymized,
  SUM(impressions) AS impressions,
  SUM(clicks) AS clicks,
  SUM(clicks)/NULLIF(SUM(impressions), 0) AS CTR
FROM
  `yourtable.searchdata_site_impression`
WHERE
  data_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 28 DAY)
GROUP BY
  query,
  anonymized
ORDER BY
  clicks DESC

This query provides insights into the performance of individual queries over the past 28 days, considering impressions, clicks, and CTR.

It also considers whether the queries are anonymized or not, and the results are sorted based on the total number of clicks in descending order.

This recreates the data you would usually find in the Search Console "Performance" report for the last 28 days of data, with results by query and differentiating anonymized queries.

Feel free to copy/paste your way to glory, but always make sure you update the FROM clause with the right table name. If you are curious to learn more about how this query was built, here is the breakdown:

  • SELECT clause:
    • query: Retrieves the individual queries.
    • is_anonymized_query AS anonymized: Renames the is_anonymized_query field to anonymized.
    • SUM(impressions) AS impressions: Retrieves the total impressions for each query.
    • SUM(clicks) AS clicks: Retrieves the total clicks for each query.
    • SUM(clicks)/NULLIF(SUM(impressions), 0) AS CTR: Calculates the click-through rate (CTR) for each query. The use of NULLIF prevents division-by-zero errors.
  • FROM clause:
    • Specifies the source table as yourtable.searchdata_site_impression.
  • WHERE clause:
    • Filters the data to include only rows where the data_date is within the last 28 days from the current date.
  • GROUP BY clause:
    • Groups the results by query and anonymized. This is essential since aggregations (SUM) are performed, and you want the totals for each unique combination of query and anonymized.
  • ORDER BY clause:
    • Orders the results by the total number of clicks in descending order.
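The NULLIF guard is easy to see in action. A quick SQLite rehearsal of the same expression (table and numbers are invented): NULLIF turns a zero denominator into NULL, so the division yields NULL instead of erroring.

```python
import sqlite3

# Demonstrate NULLIF(x, 0): a zero denominator becomes NULL, so the
# division produces NULL rather than a divide-by-zero error.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (query TEXT, clicks INTEGER, impressions INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)",
                [("has traffic", 5, 100), ("never shown", 0, 0)])

# CAST to REAL so SQLite doesn't do integer division (BigQuery's "/"
# already returns a float, so the cast isn't needed there).
rows = con.execute(
    "SELECT query, CAST(clicks AS REAL) / NULLIF(impressions, 0) AS ctr FROM t"
).fetchall()
print(rows)  # [('has traffic', 0.05), ('never shown', None)]
```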

Handling The Anonymized Queries

According to Noah Learner, the Google Search Console API delivers 25 times more data than the GSC performance tab for the same search, providing a more comprehensive view.

In BigQuery, you can also access the information regarding anonymized queries.

It doesn't omit the rows, which helps analysts get complete sums of impressions and clicks when you aggregate the data.

Understanding the volume of anonymized queries in your Google Search Console (GSC) data is key for SEO pros.

When Google anonymizes a query, it means the actual search query text is hidden in the data. This impacts your analysis:

  • Anonymized queries remove the ability to parse search query language and extract insights about searcher intent, themes, etc.
  • Without the query data, you miss opportunities to identify new keywords and optimization opportunities.
  • Not having query data restricts your ability to connect search queries to page performance.

The First Query Counts The Number Of Anonymized Vs. Not Anonymized Queries

SELECT
  CASE
    WHEN query IS NULL AND is_anonymized_query = TRUE THEN "no query"
    ELSE "query"
  END AS anonymized_query,
  COUNT(is_anonymized_query) AS query_count
FROM
  `yourtable.searchdata_url_impression`
GROUP BY anonymized_query

Breakdown Of The Query

In this example, we use a CASE statement in order to verify, for each row, whether the query is anonymized or not.

If so, we return "no query" in the query field; if not, "query."

We then count the number of rows each query type has in the table and group the results by each of them. Here's what the result looks like:
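The CASE-then-count logic above can be rehearsed locally as well. A sketch using SQLite, where is_anonymized_query is stored as 0/1 and the row values are invented:

```python
import sqlite3

# Count anonymized vs. non-anonymized rows with the same CASE logic.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE searchdata_url_impression (
    query TEXT, is_anonymized_query INTEGER)""")
con.executemany(
    "INSERT INTO searchdata_url_impression VALUES (?, ?)",
    [("bigquery gsc", 0), ("sql tutorial", 0),
     (None, 1), (None, 1), (None, 1)],  # anonymized rows carry NULL queries
)

rows = con.execute("""
    SELECT CASE
             WHEN query IS NULL AND is_anonymized_query = 1 THEN 'no query'
             ELSE 'query'
           END AS anonymized_query,
           COUNT(*) AS query_count
    FROM searchdata_url_impression
    GROUP BY anonymized_query
    ORDER BY anonymized_query
""").fetchall()
print(rows)  # [('no query', 3), ('query', 2)]
```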

Anonymized queries shown in results

Advanced Querying For SEO Insights

BigQuery enables complex analysis you can't pull off in the GSC interface. This means you can also create custom intel by surfacing patterns in user behavior.

You can analyze search trends, seasonality over time, and keyword optimization opportunities.

Here are some things you should be aware of to help you debug the filters you put in place:

  • The date could be an issue. It may take up to two days for you to receive the data you want to query. If BigQuery says in the top right corner that your query would require 0 MB to run, it means the data you want isn't there yet or that there is no data for your query.
  • Use the preview if you want to see what a field will return in terms of values. It shows you a table with the data.
  • The country abbreviations you will get in BigQuery are in a different format (ISO-3166-1-Alpha-3) than you are used to. Some examples: FRA for France, UKR for Ukraine, USA for the United States, etc.
  • Want to get "pretty" queries? Click on "More" within your query tab and select "Format query." BigQuery will handle that part for you!
  • If you want more queries right away, I suggest you sign up for the SEOlytics newsletter, as there are quite a few SQL queries you can use.
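If the alpha-3 country codes trip you up when reporting, a tiny lookup on the export's country field is often enough. A minimal sketch, seeded only with the examples above (extend it as needed, or use a full library such as pycountry for complete coverage):

```python
# Minimal lookup from ISO 3166-1 alpha-3 codes (as returned in the GSC
# export's country field) to readable names. Deliberately incomplete.
ALPHA3_TO_NAME = {
    "FRA": "France",
    "UKR": "Ukraine",
    "USA": "United States",
}

def country_name(code: str) -> str:
    # Fall back to the raw code when it isn't in the lookup table.
    return ALPHA3_TO_NAME.get(code.upper(), code)

print(country_name("fra"))  # France
print(country_name("ESP"))  # ESP (not in this tiny table)
```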

Conclusion

Analyzing GSC data in BigQuery unlocks transformative SEO insights, enabling you to track search performance at scale.

By following the best practices outlined here for querying, optimizing, and troubleshooting, you can get the most out of this powerful dataset.

Reading this isn't going to make you an expert instantly. This is the first step in your adventure!

If you want to know more, check out Jake Peterson's blog post, start practicing for free with Robin Lord's Lost at SQL game, or simply stay tuned because I have a few more articles coming!

If you have questions or queries, do not hesitate to let us know.

Featured Image: Tee11/Shutterstock