Microsoft has published a patent application named Web Content Reliability Classification that describes how to develop a reliability score for a website or for content on a website. The Bing Search team could use the technology to better rank websites and web content, but that does not mean it is being used in the Bing Search results today.
The patent was published on November 2, 2023, after being filed on July 5, 2023 – you can read it over here.
Highlights. Here are some interesting highlights from this patent application.
- The reliability score can be used to block content, rank content, provide a content warning, and select a source to answer a question, along with other uses.
- Traffic data can indicate whether a source is popular, but popular is not the same thing as reliable.
- Natural language processing can be used to determine whether online content is grammatical, but grammatical is also not the same thing as reliable.
- The present technology identifies reliable content by leveraging expert scoring for a small amount of web content and then iteratively extending these scores to other content based on how web content is linked.
- User interactions may also be leveraged for determining a reliability score.
- The high reliability score is generated by first identifying high reliability online content within a web graph.
- These initially scored sites may be described as seed sites.
- Ratings for the seed sites may be taken from authoritative lists of known reliable content providers.
- An output of the technology is a high reliability score and a low reliability score for a web content.
- Different applications can consume this score to perform or guide different functions, including search, filtering, content warning generation, and the like.
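To make the last two highlights concrete, here is a minimal sketch of how an application might consume a high and a low reliability score. The function names, thresholds, and ranking formula are all hypothetical illustrations and are not taken from the patent.

```python
# Hypothetical consumer of the two scores; thresholds and formula are assumptions.

def choose_action(high, low, block_at=0.8, warn_at=0.5):
    """Pick a filtering action from a page's high/low reliability scores."""
    if low > high and low >= block_at:
        return "block"
    if low > high and low >= warn_at:
        return "warn"
    return "allow"


def rank(results):
    """Sort (url, relevance, high, low) tuples, nudging relevance by reliability."""
    return sorted(
        results,
        key=lambda r: r[1] * (1.0 + r[2] - r[3]),  # assumed blend, not the patent's
        reverse=True,
    )


results = [
    ("shaky.example", 1.0, 0.1, 0.9),
    ("solid.example", 0.9, 0.9, 0.1),
]
ranked_urls = [url for url, *_ in rank(results)]
```

In this toy setup, a slightly less relevant but far more reliable page ends up ranked first, and a page whose low score dominates gets a warning or a block instead of a normal listing.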
The abstract. Here is the abstract of the patent:
Technology described herein assigns a reliability score to web content, such as a web site or portion of a website. In one aspect, an output of the technology is a high reliability score and a low reliability score for a web content. The high reliability score represents conformance to high reliability sites, while the low reliability score represents conformance to low reliability sites. The high reliability score may be generated by first identifying high reliability online content within a compressed web graph. In a first iteration, the high reliability score of the seeds is used to score online content that is linked to the seed sites. At a high level, the more links that originate from high reliability sources, the higher the reliability score for the linked content. The low reliability score is similar, but uses outgoing links to low reliability sites instead of incoming links from high reliability sites.
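The propagation idea in the abstract can be sketched in a few lines of code. This is only an illustration under stated assumptions: the abstract does not give a formula, so the `damping` factor, the saturating "1 minus product" combination rule, the fixed iteration count, and the example domains are all made up for this sketch.

```python
from collections import defaultdict


def propagate(links, high_seeds, low_seeds, iterations=3, damping=0.5):
    """Return (high, low) score dicts for every page in a link graph.

    links maps a source page to the pages it links to. Seed scores stay fixed.
    The combination rule below is an assumption, not the patent's formula.
    """
    inbound = defaultdict(set)
    pages = set(high_seeds) | set(low_seeds)
    for src, targets in links.items():
        pages.add(src)
        for dst in targets:
            pages.add(dst)
            inbound[dst].add(src)

    high = {p: high_seeds.get(p, 0.0) for p in pages}
    low = {p: low_seeds.get(p, 0.0) for p in pages}

    for _ in range(iterations):
        new_high, new_low = dict(high), dict(low)
        for page in pages:
            if page not in high_seeds:
                # High score: driven by incoming links from reliable pages.
                miss = 1.0
                for src in inbound[page]:
                    miss *= 1.0 - damping * high[src]
                new_high[page] = 1.0 - miss
            if page not in low_seeds:
                # Low score: driven by outgoing links to unreliable pages.
                miss = 1.0
                for dst in set(links.get(page, ())):
                    miss *= 1.0 - damping * low[dst]
                new_low[page] = 1.0 - miss
        high, low = new_high, new_low
    return high, low


links = {
    "seed-good.example": ["a.example", "b.example"],
    "other-good.example": ["a.example"],
    "a.example": ["c.example"],
    "d.example": ["seed-bad.example"],
}
high, low = propagate(
    links,
    high_seeds={"seed-good.example": 1.0, "other-good.example": 1.0},
    low_seeds={"seed-bad.example": 1.0},
)
```

In this toy graph, `a.example` is linked from two high reliability seeds and so ends up with a higher high reliability score than `b.example`, which is linked from only one. `d.example` picks up a low reliability score because it links out to a low reliability seed.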
Why we care. Many SEOs enjoy reading patent documents from the Google and Bing Search teams. We know that just because a patent has been filed, it does not mean a search engine is using the technology as described in the patent in the live search results. Either way, it can be educational and useful to understand how the search scientists who work at Google and Bing think about these ranking and scoring challenges.
Hat tip to Glenn Gabe for spotting this patent.