How To Use Google Sheets For Web Scraping With AI via @sejournal, @andreaatzori


Scraping data from webpages is a relatively advanced task that, until recently, required a degree of technical skill. The idea of diving into code or scripts for data extraction seemed overwhelming for many, myself included.

Data scraping can power many SEO tasks, such as auditing, competitor analysis, and examining website and data structure.

Google Sheets offers simple solutions to help.

One of those solutions is the IMPORTXML function, which allows users to scrape webpage data using just a few parameters. It makes data extraction accessible to a wider audience, especially those who are not well-versed in programming languages.

While this function is impressive, the real breakthrough came with the adoption and integration of generative AI into the mix.

In this guide, we’ll show you how to use Google Sheets and AI, particularly ChatGPT, for web scraping without needing advanced coding skills.

The Tools: AI And Chatbots

We are now all familiar with AI, ChatGPT, and similar chatbots.

In fact, many of us use solutions like ChatGPT to write our own code, scripts, and programs with little or no programming knowledge.

It is as simple as providing detailed instructions in the form of prompts and working with the chatbot to build tools that, until recently, we believed were way beyond us.

But most importantly, these are tools that are profoundly changing the way we approach our day-to-day work.

For example, if we ask ChatGPT the following question, “What is the IMPORTXML function and how can I use it in Google Sheets to scrape the title of an HTML webpage? Provide the necessary code to do that in Google Sheets,” the response is highly accurate. In a matter of seconds, we have our formula ready to use in Google Sheets.
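A formula of this kind looks like =IMPORTXML("https://www.example.com/", "//title"). The first argument is the page URL (example.com is only a placeholder here), and the second is an XPath query. The XPath //title means "every title element anywhere in the document." To show what that query picks out, here is a minimal Python sketch using only the standard library that mimics the //title lookup on an HTML string. It illustrates the idea and is not how Google Sheets implements IMPORTXML. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of every <title> element, like the XPath //title."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
            self.titles.append("")  # start a new title entry

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data  # accumulate text inside <title>


def extract_titles(html_text):
    """Return the stripped text of each <title> element found in html_text."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return [t.strip() for t in parser.titles]


sample = "<html><head><title> Example Page </title></head><body><p>Hi</p></body></html>"
print(extract_titles(sample))  # ['Example Page']
```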

But to be honest, that was a very basic and simple task that we could have easily completed without ChatGPT.

The Task

So, how does this work if we want to extract data that is a bit less standard than a page title or description?

For example, how does this work if we want to extract the following data from the PPC category page of Search Engine Journal?

List all featured articles, their authors, the link URLs, and the article descriptions for the columns listed on https://www.searchenginejournal.com/category/paid-media/pay-per-click/.

Can we do that directly with ChatGPT?

Executing With ChatGPT

When creating prompts, it took a few attempts to provide instructions detailed enough for the chatbot to fully understand the objective of the task and return good results.

In many cases, it felt like the AI was under pressure to return quick results regardless of their accuracy.

But let me explain.

The task was to analyze the page and list all featured articles, their authors, the link URLs, and the description for each of the 30 articles listed on the page, then compile the data into a table and finally export it to a CSV file.

Simple, right?
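For context, the final "compile and export" step of that task is the easy part once the fields have been extracted. Below is a minimal Python sketch of that step using the standard csv module. The rows are hypothetical placeholders, not data from the Search Engine Journal page, and the column names simply mirror the four fields requested.

```python
import csv
import io

# Hypothetical placeholder rows; real values would come from the scraped page.
rows = [
    {"title": "Example article A", "author": "Author A",
     "url": "https://www.example.com/a/", "description": "Short summary of A."},
    {"title": "Example article B", "author": "Author B",
     "url": "https://www.example.com/b/", "description": "Summary, with a comma."},
]


def to_csv(records, fieldnames=("title", "author", "url", "description")):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fieldnames), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # fields containing commas are quoted automatically
    return buffer.getvalue()


csv_text = to_csv(rows)
print(csv_text)

# To write a file instead, open it with newline="" as the csv docs recommend:
# with open("articles.csv", "w", newline="", encoding="utf-8") as f:
#     f.write(csv_text)
```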

At first, ChatGPT returned just a sample of 7 articles with only their titles and URLs; after a reworked prompt, it managed to list and export all 30 articles and their links.

Now, that was good. So, to complete the task, we just needed to add the authors and the article descriptions.

But this is where the bot stumbled: it was not able to provide an accurate description of each article, despite our providing examples of the page element it needed to find and copy.

ChatGPT kept ignoring the instructions and providing its own article descriptions time and time again.

ChatGPT even failed when we tried a different approach: downloading a copy of the page HTML and uploading it.

ChatGPT extract (Screenshot from ChatGPT, February 2024)

This time, it was able to provide accurate data for 7 articles but couldn’t go beyond that. The chatbot reported:

“…the structure and content of the page present significant challenges for comprehensive data extraction in a single session.

The page is quite extensive and complex, and it’s not feasible to extract all 30 articles in the current format of interaction.”

ChatGPT extracting from 30 articles (Screenshot from ChatGPT, February 2024)

ChatGPT + Google Sheets

So, back to IMPORTXML and Google Sheets.

This time, getting ChatGPT to provide the formulas for each field was a breeze.

ChatGPT extracting instructions (Screenshot from ChatGPT, February 2024)

Here are some of the formulas suggested by the chatbot, which you can easily try yourself in Google Sheets to extract:

Title

=IMPORTXML("https://www.searchenginejournal.com/category/paid-media/pay-per-click/", "//*[@id='archives-wrapper']/article/div/div[2]/h2/a")

Author Name

=IMPORTXML("https://www.searchenginejournal.com/category/paid-media/pay-per-click/", "//*[@id='archives-wrapper']/article/div/div[2]/p[1]/a")

URL Link

=IMPORTXML("https://www.searchenginejournal.com/category/paid-media/pay-per-click/", "//*[@id='archives-wrapper']/article/div/div[2]/h2/a/@href")

Description

=IMPORTXML("https://www.searchenginejournal.com/category/paid-media/pay-per-click/", "//*[@id='archives-wrapper']/article/div/div[2]/p[2]")

In no time, we were able to extract the data into the spreadsheet.

Google Sheets (Screenshot from Google Sheets, February 2024)
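To see what each of the four XPath expressions selects, the Python sketch below runs the same paths through the standard library's xml.etree.ElementTree. It uses a small, hypothetical snippet of markup built to match their shape, not Search Engine Journal's actual HTML, which may differ. ElementTree supports only a subset of XPath, so two adjustments are needed. Paths start with .// instead of //, and the trailing /@href step is replaced by reading the attribute with .get("href"). The function name scrape is illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup shaped to match the XPath paths above.
# It is NOT Search Engine Journal's real HTML, which may differ and is not
# guaranteed to be well-formed XML.
SAMPLE = """
<html><body>
  <div id="archives-wrapper">
    <article><div>
      <div class="thumb"/>
      <div>
        <h2><a href="https://www.example.com/article-1/">Article one</a></h2>
        <p><a href="https://www.example.com/author/jane/">Jane Doe</a></p>
        <p>First article description.</p>
      </div>
    </div></article>
    <article><div>
      <div class="thumb"/>
      <div>
        <h2><a href="https://www.example.com/article-2/">Article two</a></h2>
        <p><a href="https://www.example.com/author/john/">John Roe</a></p>
        <p>Second article description.</p>
      </div>
    </div></article>
  </div>
</body></html>
"""

# Shared prefix of all four formulas, rewritten for ElementTree's XPath subset.
BASE = ".//*[@id='archives-wrapper']/article/div/div[2]"


def scrape(markup):
    """Return (title, author, url, description) tuples, one per <article>."""
    root = ET.fromstring(markup.strip())
    titles = [a.text for a in root.findall(BASE + "/h2/a")]
    authors = [a.text for a in root.findall(BASE + "/p[1]/a")]
    # ElementTree cannot select attributes with "/@href", so read it per element.
    urls = [a.get("href") for a in root.findall(BASE + "/h2/a")]
    descriptions = [p.text for p in root.findall(BASE + "/p[2]")]
    # Each list is one spreadsheet column; zip lines them up into rows.
    return list(zip(titles, authors, urls, descriptions))


for row in scrape(SAMPLE):
    print(row)
```

Each formula fills its own column independently, so the rows line up only by position. If one article were missing, say, an author link, that column would be shorter and the rows below it would drift out of alignment. The same is true of the four separate IMPORTXML columns in the spreadsheet.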

Additionally, by using simple nested formulas, we can quickly pull the data from multiple pages at the same time.

In the example below, I was able to extract the same data for each article (title, author, URL link, and description) from the first 10 pages of the PPC section.

The result is a total of 300 articles scraped in less than a minute!

Google Sheets extract results (Screenshot from Google Sheets, February 2024)
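The exact nested formula behind that result isn't shown here. One way to combine several IMPORTXML calls in Google Sheets is an array literal, in which a semicolon stacks results vertically: ={IMPORTXML(url1, xpath); IMPORTXML(url2, xpath)}. The Python sketch below generates such a formula for the first N pages. It assumes the paginated archive URLs follow a /page/N/ pattern, which is a hypothetical pattern you should verify against the live site. The helper names are also illustrative. Ten pages of 30 articles each gives the 300 rows mentioned above.

```python
BASE_URL = "https://www.searchenginejournal.com/category/paid-media/pay-per-click/"
TITLE_XPATH = "//*[@id='archives-wrapper']/article/div/div[2]/h2/a"


def page_url(base_url, page):
    """Hypothetical pagination: page 1 is the base URL, later pages use /page/N/."""
    return base_url if page == 1 else f"{base_url}page/{page}/"


def stacked_importxml(base_url, xpath, pages):
    """Build one Sheets formula that stacks IMPORTXML results for pages 1..pages."""
    calls = [
        f'IMPORTXML("{page_url(base_url, n)}", "{xpath}")'
        for n in range(1, pages + 1)
    ]
    # In a Sheets array literal, ";" starts a new row, so results stack vertically.
    return "={" + "; ".join(calls) + "}"


formula = stacked_importxml(BASE_URL, TITLE_XPATH, 2)
print(formula)
```

Paste the generated formula into a single cell, and repeat with the author, URL, and description XPaths to fill the other three columns.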

Comparing The Two

So, how do ChatGPT and ChatGPT + Google Sheets IMPORTXML compare?

In my experience, I could not find an easy and quick way to use ChatGPT to scrape the data I was looking for. Mind you, that doesn’t mean it is impossible, and there might be several ways to do it, but I didn’t find any.

What worked for me was a combination of the different tools, and that served me really well for my intended purpose.

ChatGPT was extremely useful for writing the IMPORTXML formulas I needed in Google Sheets, and those formulas did the rest.

An additional bonus of the ChatGPT + Google Sheets approach is that you can use the free 3.5 version of ChatGPT to build your IMPORTXML formulas, instead of needing version 4 to scan the page and extract the data.

Key Takeaway

This highlights a critical aspect of how AI has transformed the way we think and work.

The best tool for the job isn’t simply AI, Google Sheets, or any specific software on its own, but rather a combination of tools and skills.

It’s through this integrated approach that we create workflows that are efficient and effective, thus improving our overall productivity.


Featured Image: Visual Generation/Shutterstock