Google’s “Information Gain” Patent For Ranking Web Pages via @sejournal, @martinibuster

Google was recently granted a patent on ranking web pages, which may offer insights into how AI Overviews ranks content. The patent describes a method for ranking pages based on what a user might be interested in next.

Contextual Estimation Of Link Information Gain

The name of the patent is Contextual Estimation Of Link Information Gain. It was filed in 2018 and granted in June 2024. It’s about calculating a ranking score called Information Gain that is used to rank a second set of web pages that are likely to be of interest to a user as a slightly different follow-up topic related to a previous question.

The patent starts with general descriptions then adds layers of specifics over the course of paragraphs. An analogy can be that it’s like a pizza. It starts out as a mozzarella pizza, then they add mushrooms, so now it’s a mushroom pizza. Then they add onions, so now it’s a mushroom and onion pizza. There are layers of specifics that build up to the entire context.

So if you read just one section of it, it’s easy to say, “It’s clearly a mushroom pizza” and be completely mistaken about what it really is.

There are layers of context but what it’s building up to is:

  • Ranking a web page that is relevant for what a user might be interested in next.
  • The context of the invention is an automated assistant or chatbot.
  • A search engine plays a role in a way that seems similar to Google’s AI Overviews.

Information Gain And SEO: What’s Really Going On?

A couple of months ago I read a comment on social media asserting that “Information Gain” was a significant factor in a recent Google core algorithm update. That mention surprised me because I’d never heard of information gain before. I asked some SEO friends about it and they’d never heard of it either.

What the person on social media had asserted was something like Google was using an “Information Gain” score to boost the ranking of web pages that had more information than other web pages. So the idea was that it was important to create pages that have more information than other pages, something along those lines.

So I read the patent and discovered that “Information Gain” is not about ranking pages with more information than other pages. It’s really about something that is more profound for SEO because it might help to understand one dimension of how AI Overviews might rank web pages.

TL/DR Of The Information Gain Patent

What the information gain patent is really about is even more interesting because it may give an indication of how AI Overviews (AIO) ranks web pages that a user might be interested in next. It’s kind of like introducing personalization by anticipating what a user will be interested in next.

The patent describes a scenario where a user makes a search query and the automated assistant or chatbot provides an answer that’s relevant to the question. The information gain scoring system works in the background to rank a second set of web pages that are relevant to what the user might be interested in next. It’s a new dimension in how web pages are ranked.

The Patent’s Emphasis On Automated Assistants

There are multiple versions of the Information Gain patent dating from 2018 to 2024. The first version is similar to the last version with the most significant difference being the addition of chatbots as a context for where the information gain invention is used.

The patent uses the phrase “automated assistant” 69 times and uses the phrase “search engine” only 25 times. Like with AI Overviews, search engines do play a role in this patent but it’s generally in the context of automated assistants.

As will become evident, there is nothing to suggest that a web page containing more information than the competition is likelier to be ranked higher in the organic search results. That’s not what this patent talks about.

General Description Of Context

All versions of the patent describe the presentation of search results within the context of an automated assistant and natural language question answering. The patent starts with a general description and progressively becomes more specific. This is a feature of patents in that they apply for protection for the widest contexts in which the invention can be used and become progressively specific.

The entire first section (the Abstract) doesn’t even mention web pages or links. It’s just about the information gain score within a very general context:

“An information gain score for a given document is indicative of additional information that is included in the document beyond information contained in documents that were previously viewed by the user.”

That is a nutshell description of the patent, with the key insight being that the information gain scoring happens on pages after the user has seen the first search results.

More Specific Context: Automated Assistants

The second paragraph in the section titled “Background” is slightly more specific and adds an additional layer of context for the invention because it mentions links. Specifically, it’s about a user that makes a search query and receives links to search results – no information gain score calculated yet.

The Background section says:

“For example, a user may submit a search request and be provided with a set of documents and/or links to documents that are responsive to the submitted search request.”

The next part builds on top of a user having made a search query:

“Also, for example, a user may be provided with a document based on identified interests of the user, previously viewed documents of the user, and/or other criteria that may be utilized to identify and provide a document of interest. Information from the documents may be provided via, for example, an automated assistant and/or as results to a search engine. Further, information from the documents may be provided to the user in response to a search request and/or may be automatically served to the user based on continued searching after the user has ended a search session.”

That last sentence is poorly worded.

Here’s the original sentence:

“Further, information from the documents may be provided to the user in response to a search request and/or may be automatically served to the user based on continued searching after the user has ended a search session.”

Here’s how it makes more sense:

“Further, information from the documents may be provided to the user… based on continued searching after the user has ended a search session.”

The information provided to the user is “in response to a search request and/or may be automatically served to the user”

It’s a little clearer if you put parentheses around it:

Further, information from the documents may be provided to the user (in response to a search request and/or may be automatically served to the user) based on continued searching after the user has ended a search session.

Takeaways:

  • The patent describes identifying documents that are relevant to the “interests of the user” based on “previously viewed documents” “and/or other criteria.”
  • It sets a general context of an automated assistant “and/or” a search engine.
  • Information from the documents that are based on “previously viewed documents” “and/or other criteria” may be shown after the user continues searching.

More Specific Context: Chatbot

The patent next adds an additional layer of context and specificity by mentioning how chatbots can “extract” an answer from a web page (“document”) and show that as an answer. This is about showing a summary that contains the answer, kind of like featured snippets, but within the context of a chatbot.

The patent explains:

“In some cases, a subset of information may be extracted from the document for presentation to the user. For example, when a user engages in a spoken human-to-computer dialog with an automated assistant software process (also referred to as “chatbots,” “interactive personal assistants,” “intelligent personal assistants,” “personal voice assistants,” “conversational agents,” “virtual assistants,” etc.), the automated assistant may perform various types of processing to extract salient information from a document, so that the automated assistant can present the information in an abbreviated form.

As another example, some search engines will provide summary information from one or more responsive and/or relevant documents, in addition to or instead of links to responsive and/or relevant documents, in response to a user’s search query.”

The last sentence sounds like it’s describing something that’s like a featured snippet or like AI Overviews where it provides a summary. The sentence is very general and ambiguous because it uses “and/or” and “in addition to or instead of” and isn’t as specific as the preceding sentences. It’s an example of a patent being general for legal reasons.

Ranking The Next Set Of Search Results

The next section is called the Summary and it goes into more detail about how the Information Gain score represents how likely the user is to be interested in the next set of documents. It’s not about ranking search results. It’s about ranking the next set of search results (based on a related topic).

It states:

“An information gain score for a given document is indicative of additional information that is included in the given document beyond information contained in other documents that were already presented to the user.”

Ranking Based On Topic Of Web Pages

It then talks about presenting the web page in a browser, audibly reading the relevant part of the document or audibly/visually presenting a summary of the document (“audibly/visually presenting salient information extracted from the document to the user, etc.”).

But the part that’s really interesting is when it next explains using a topic of the web page as a representation of the content, which is used to calculate the information gain score.

It describes many different ways of extracting the representation of what the page is about. But what’s important is that it describes calculating the Information Gain score based on a representation of what the content is about, like the topic.

“In some implementations, information gain scores may be determined for one or more documents by applying data indicative of the documents, such as their entire contents, salient extracted information, a semantic representation (e.g., an embedding, a feature vector, a bag-of-words representation, a histogram generated from words/phrases in the document, etc.) across a machine learning model to generate an information gain score.”
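To make the idea of scoring from a bag-of-words representation concrete, here is a minimal Python sketch. It is not the patent’s method: the patent describes applying the representation across a machine learning model, while this sketch swaps in a simple term-overlap heuristic. The function names `tokenize` and `information_gain_score` and the 0-to-1 scale are assumptions made for illustration only.

```python
import re


def tokenize(text):
    """Reduce a text to a set of lowercase word tokens (a crude bag of words)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def information_gain_score(candidate, viewed_docs):
    """Hypothetical score: the fraction of the candidate's distinct terms
    that do not appear in any document the user has already viewed."""
    candidate_terms = tokenize(candidate)
    if not candidate_terms:
        return 0.0  # an empty document adds nothing new
    seen_terms = set()
    for doc in viewed_docs:
        seen_terms |= tokenize(doc)
    new_terms = candidate_terms - seen_terms
    return len(new_terms) / len(candidate_terms)
```

Under this heuristic a page that only repeats what the user has already viewed scores 0.0 and a page made up entirely of unseen terms scores 1.0. A real system would presumably rely on a learned model and richer representations such as embeddings, as the quoted passage suggests.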

The patent goes on to describe ranking a first set of documents and using the Information Gain scores to rank additional sets of documents that anticipate follow up questions or a progression within a dialog of what the user is interested in.

The automated assistant can in some implementations query a search engine and then apply the Information Gain rankings to the multiple sets of search results (that are relevant to related search queries).

There are multiple variations of doing the same thing but in general terms this is what it describes:

“Based on the information gain scores, information contained in one or more of the new documents may be selectively provided to the user in a manner that reflects the likely information gain that can be attained by the user if the user were to be presented information from the selected documents.”

What All Versions Of The Patent Have In Common

All versions of the patent share general similarities over which more specifics are layered in over time (like adding onions to a mushroom pizza). The following are the baseline of what all the versions have in common.

Application Of Information Gain Score

All versions of the patent describe applying the information gain score to a second set of documents that have additional information beyond the first set of documents. Obviously, there are no criteria or information to guess what the user is going to search for when they start a search session. So information gain scores are not applied to the first search results.

Examples of passages that are the same for all versions:

  • A second set of documents is identified that is also related to the topic of the first set of documents but that have not yet been viewed by the user.
  • For each new document in the second set of documents, an information gain score is determined that is indicative of, for the new document, whether the new document includes information that was not contained in the documents of the first set of documents…

Automated Assistants

All four versions of the patent refer to automated assistants that show search results in response to natural language queries.

The 2018 and 2023 versions of the patent both mention search engines 25 times. The 2018 version mentions “automated assistant” 74 times and the latest version mentions it 69 times.

They all make references to “conversational agents,” “interactive personal assistants,” “intelligent personal assistants,” “personal voice assistants,” and “virtual assistants.”

It’s clear that the emphasis of the patent is on automated assistants, not the organic search results.

Dialog Turns

Note: In everyday language we use the word dialogue. In computing they spell it dialog.

All versions of the patent refer to a way of interacting with the system in the form of a dialog, specifically a dialog turn. A dialog turn is the back and forth that happens when a user asks a question using natural language, receives an answer and then asks a follow up question or another question altogether. This can be natural language in text, text to speech (TTS), or audible.

The main aspect the versions have in common is the back and forth in what is called a “dialog turn.” All versions of the patent have this as a context.

Here’s an example of how the dialog turn works:

“Automated assistant client 106 and remote automated assistant 115 can process natural language input of a user and provide responses in the form of a dialog that includes one or more dialog turns. A dialog turn may include, for instance, user-provided natural language input and a response to natural language input by the automated assistant.

Thus, a dialog between the user and the automated assistant can be generated that allows the user to interact with the automated assistant …in a conversational manner.”

Problems That Information Gain Scores Solve

The main feature of the patent is to improve the user experience by understanding the additional value that a new document provides compared to documents that a user has already seen. This additional value is what is meant by the phrase Information Gain.

There are multiple ways that information gain is useful and one of the ways that all versions of the patent describe is in the context of an audio response and how a long-winded audio response is not good, including in a TTS (text to speech) context.

The patent explains the problem of a long-winded response:

“…and so the user may wait for substantially all of the response to be output before proceeding. In comparison with reading, the user is able to receive the audio information passively, however, the time taken to output is longer and there is a reduced ability to scan or scroll/skip through the information.”

The patent then explains how information gain can speed up answers by eliminating redundant (repetitive) answers and by avoiding answers that aren’t sufficient and force the user into another dialog turn.

This part of the patent refers to the information density of a section in a web page, a section that answers the question with the least amount of words. Information density is about how “accurate,” “concise,” and “relevant” the answer is, while avoiding repetitiveness. Information density is important for audio/spoken answers.

This is what the patent says:

“As such, it is important in the context of an audio output that the output information is relevant, accurate and concise, in order to avoid an unnecessarily long output, a redundant output, or an extra dialog turn.

The information density of the output information becomes particularly important in improving the efficiency of a dialog session. Techniques described herein address these issues by reducing and/or eliminating presentation of information a user has already been provided, including in the audio human-to-computer dialog context.”

The idea of “information density” is important in a general sense because it communicates better for users but it’s probably extra important in the context of being shown in chatbot search results, whether it’s spoken or not. Google AI Overviews shows snippets from a web page but perhaps more importantly, communicating in a concise manner is the best way to be on topic and make it easy for a search engine to understand content.

Search Results Interface

All versions of the Information Gain patent are clear that the invention is not in the context of organic search results. It’s explicitly within the context of ranking web pages within a natural language interface of an automated assistant and an AI chatbot.

However, there is a part of the patent that describes a way of showing users the second set of results within a “search results interface.” The scenario is that the user sees an answer and then is interested in a related topic. The second set of ranked web pages is shown in a “search results interface.”

The patent explains:

“In some implementations, one or more of the new documents of the second set may be presented in a manner that is selected based on the information gain stores. For example, one or more of the new documents can be rendered as part of a search results interface that is presented to the user in response to a query that includes the topic of the documents, such as references to one or more documents. In some implementations, these search results may be ranked at least in part based on their respective information gain scores.”

“…The user can then select one of the references and information contained in the particular document can be presented to the user. Subsequently, the user may return to the search results and the references to the document may again be provided to the user but updated based on new information gain scores for the documents that are referenced.

In some implementations, the references may be reranked and/or one or more documents may be excluded (or significantly demoted) from the search results based on the new information gain scores that were determined based on the document that was already viewed by the user.”
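The reranking behavior in that passage can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `novelty` heuristic (share of unseen terms), the `rerank` function, and the `min_gain` cutoff are all hypothetical and do not come from the patent, which does not specify a formula or a threshold.

```python
import re


def terms(text):
    """Set of lowercase word tokens in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def novelty(doc_text, viewed_texts):
    """Hypothetical stand-in for an information gain score:
    share of the document's terms not found in already-viewed texts."""
    doc_terms = terms(doc_text)
    if not doc_terms:
        return 0.0
    seen = set().union(*(terms(t) for t in viewed_texts))
    return len(doc_terms - seen) / len(doc_terms)


def rerank(results, viewed_ids, min_gain=0.2):
    """Re-score the remaining references after the user has viewed some
    documents, drop those below min_gain, and sort the rest by score."""
    viewed_texts = [results[doc_id] for doc_id in viewed_ids]
    scored = []
    for doc_id, text in results.items():
        if doc_id in viewed_ids:
            continue  # already seen, so it is not offered again
        gain = novelty(text, viewed_texts)
        if gain >= min_gain:
            scored.append((doc_id, gain))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored]
```

For example, calling `rerank(results, {"a"})` after the user has read document `"a"` would push a near-duplicate of `"a"` out of the list while keeping a page on a related but unseen topic. Excluding outright is one design choice; the quoted text also allows for demoting instead.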

What is a search results interface? I think it’s just an interface that shows search results.

Let’s pause here to underline that it should be clear at this point that the patent is not about ranking web pages that are comprehensive about a topic. The overall context of the invention is showing documents within an automated assistant.

A search results interface is just an interface. It’s never described as being organic search results.

There’s more that is the same across all versions of the patent but the above are the important general outlines and context of it.

Claims Of The Patent

The claims section is where the scope of the actual invention, for which they are seeking legal protection, is described. It is mainly focused on the invention and less so on the context. Thus, there is no mention of search engines, automated assistants, audible responses, or TTS (text to speech) within the Claims section. What remains is the context of a search results interface, which presumably covers all of the contexts.

Context: First Set Of Documents

It starts out by outlining the context of the invention. This context is receiving a query, identifying the topic, ranking a first group of relevant web pages (documents), selecting at least one of them as being relevant, and either showing the document or communicating the information from the document (like a summary).

“1. A method implemented using one or more processors, comprising: receiving a query from a user, wherein the query includes a topic; identifying a first set of documents that are responsive to the query, wherein the documents of the set of documents are ranked, and wherein a ranking of a given document of the first set of documents is indicative of relevancy of information included in the given document to the topic; selecting, based on the rankings and from the documents of the first set of documents, a most relevant document providing at least a portion of the information from the most relevant document to the user;”

Context: Second Set Of Documents

Then what immediately follows is the part about ranking a second set of documents that contain additional information. This second set of documents is ranked using the information gain scores to show more information after showing a relevant document from the first group.

This is how it explains it:

“…in response to providing the most relevant document to the user, receiving a request from the user for additional information related to the topic; identifying a second set of documents, wherein the second set of documents includes at one or more of the documents of the first set of documents and does not include the most relevant document; determining, for each document of the second set, an information gain score, wherein the information gain score for a respective document of the second set is based on a quantity of new information included in the respective document of the second set that differs from information included in the most relevant document; ranking the second set of documents based on the information gain scores; and causing at least a portion of the information from one or more of the documents of the second set of documents to be presented to the user, wherein the information is presented based on the information gain scores.”
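The two-stage flow in claim 1 can be laid out as a short Python sketch. Everything here is a simplification for illustration: relevance is approximated by query-term overlap, “quantity of new information” is approximated by a count of unseen terms, and the names `first_turn` and `follow_up` are hypothetical rather than taken from the patent.

```python
import re


def words(text):
    """Set of lowercase word tokens in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(query, text):
    """Assumed relevance measure: share of query terms found in the text."""
    query_terms = words(query)
    if not query_terms:
        return 0.0
    return len(query_terms & words(text)) / len(query_terms)


def first_turn(query, corpus):
    """Rank the first set by relevance and pick the most relevant document.
    Assumes corpus is a non-empty dict of document id -> text."""
    ranked = sorted(corpus, key=lambda d: relevance(query, corpus[d]), reverse=True)
    return ranked, ranked[0]


def follow_up(first_set, most_relevant, corpus):
    """Build the second set (first set minus the document already shown)
    and rank it by how many terms are new relative to that document."""
    shown = words(corpus[most_relevant])
    second_set = [d for d in first_set if d != most_relevant]
    return sorted(second_set, key=lambda d: len(words(corpus[d]) - shown), reverse=True)
```

The point of the sketch is the ordering of steps: the first answer is chosen by relevance alone, and the information gain style scoring only enters once the user asks for more on the topic.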

Granular Details

The rest of the claims section contains granular details about the concept of Information Gain, which is a ranking of documents based on what the user already has seen and represents a related topic that the user may be interested in. The purpose of these details is to lock them in for legal protection as part of the invention.

Here’s an example:

The method of claim 1, wherein identifying the first set comprises:
causing to be rendered, as part of a search results interface that is presented to the user in response to a previous query that includes the topic, references to one or more documents of the first set;
receiving user input that that indicates selection of one of the references to a particular document of the first set from the search results interface, wherein at least part of the particular document is provided to the user in response to the selection;

To make an analogy, it’s describing how to make the pizza dough, clean and cut the mushrooms, etc. It’s not important for our purposes to understand it as much as the general view of what the patent is about.

Information Gain Patent

An opinion was shared on social media that this patent has something to do with ranking web pages in the organic search results. I saw it, read the patent and discovered that’s not how the patent works. It’s a good patent and it’s important to correctly understand it. I analyzed multiple versions of the patent to see what they had in common and what was different.

A careful reading of the patent shows that it is clearly focused on anticipating what the user may want to see based on what they have already seen. To accomplish this the patent describes the use of an Information Gain score for ranking web pages that are on topics that are related to the first search query but not specifically relevant to that first query.

The context of the invention is generally automated assistants, including chatbots. A search engine could be used as part of finding relevant documents but the context is not solely an organic search engine.

This patent could be applicable to the context of AI Overviews. I would not limit the context to AI Overviews as there are additional contexts such as spoken language in which Information Gain scoring could apply. Could it apply in additional contexts like Featured Snippets? The patent itself is not explicit about that.

Read the latest version of the Information Gain patent:

Contextual estimation of link information gain

Featured Image by Shutterstock/Khosro