The basic idea of "popularity ranking" (also called prestige ranking) is to find pages that are popular, and to rank them higher than other pages that contain the specified keywords. Since most searches are intended to find information from popular pages, ranking such pages higher is generally a good idea. For instance, the term "google" may occur in vast numbers of pages, but the page google.com is the most popular among the pages that contain the term "google". The page google.com should therefore be ranked as the most relevant answer to a query consisting of the term "google".
Traditional measures of the relevance of a page, such as TF-IDF-based measures, can be combined with the popularity of the page to get an overall measure of the relevance of the page to the query. Pages with the highest overall relevance value are returned as the top answers to a query.
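As a rough illustration of such a combination, the sketch below blends the two scores with a weighted sum. The text does not say how the measures are combined, so the linear formula, the `weight` parameter, the assumption that both scores lie in [0, 1], and the example URLs and numbers are all hypothetical.

```python
def overall_relevance(tfidf_score, popularity, weight=0.5):
    """Blend keyword relevance with popularity.

    Assumption: both inputs are already normalized to [0, 1], and a
    simple weighted sum is used; real engines may combine them differently.
    """
    return weight * tfidf_score + (1.0 - weight) * popularity


def rank(candidates, weight=0.5):
    """Sort (url, tfidf_score, popularity) tuples, best overall score first."""
    return sorted(
        candidates,
        key=lambda c: overall_relevance(c[1], c[2], weight),
        reverse=True,
    )


# Hypothetical scores for two pages that both contain the term "google".
candidates = [
    ("blog.example/google-tips", 0.9, 0.1),
    ("google.com", 0.8, 0.95),
]
ranked = rank(candidates)
```

With these made-up numbers, google.com is ranked first even though its keyword score is slightly lower, because its popularity is much higher.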
This raises the question of how to define and how to find the popularity of a page. One way would be to find how many times a page is accessed and use that number as a measure of the site's popularity. However, getting such information is impossible without the cooperation of the site, so this approach is infeasible for a Web search engine to implement.
A very effective alternative is to use hyperlinks to a page as a measure of its popularity. Many people have bookmark files that contain links to sites that they use frequently. Sites that appear in a large number of bookmark files can be inferred to be very popular sites. Bookmark files are usually stored privately and are not accessible on the Web. However, many users do maintain Web pages with links to their favorite Web pages. Many Web sites also have links to other related sites, which can also be used to infer the popularity of the linked sites. A Web search engine can fetch Web pages (by a process called crawling) and analyze them to find links between the pages.
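The link-analysis step can be sketched with Python's standard library. This is only a minimal illustration: it assumes the crawler has already fetched the page's HTML, and the names `LinkExtractor` and `extract_links` and the example URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the targets of <a href="..."> tags found in one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return the absolute URLs that the given page links to."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


page = '<a href="/about.html">About</a> <a href="http://b.example/">B</a>'
links = extract_links("http://a.example/index.html", page)
```

Running this over every crawled page yields, for each page, the list of pages it points to, which is the raw material for link-based popularity measures.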
A first solution to estimating the popularity of a page is to use the number of pages that link to the page as a measure of its popularity. However, this by itself has the drawback that many sites have a large number of useful pages, yet external links often point only to the root page of the site. The root page in turn has links to other pages in the site. These other pages would then be wrongly inferred to be not very popular, and would have a low ranking when answering a query.
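A small sketch makes the drawback concrete. It counts incoming links over a toy link graph; the function name `inlink_counts`, the dictionary representation of the graph, and the example URLs are assumptions made for illustration.

```python
from collections import Counter


def inlink_counts(link_graph):
    """Count, for each page, how many distinct other pages link to it.

    link_graph maps a page URL to the list of URLs it links to.
    """
    counts = Counter({page: 0 for page in link_graph})
    for source, targets in link_graph.items():
        for target in set(targets):  # count each linking page only once
            if target != source:     # ignore links from a page to itself
                counts[target] += 1
    return counts


# Hypothetical graph: three external pages link to the root of a site,
# and only the root links to the site's inner page.
graph = {
    "a.example/": ["site.example/"],
    "b.example/": ["site.example/"],
    "c.example/": ["site.example/"],
    "site.example/": ["site.example/docs"],
    "site.example/docs": [],
}
counts = inlink_counts(graph)
```

Here the root page gets a count of 3 while the inner page gets only 1, even though the inner page may be just as useful to readers. This is the undercounting problem described above.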
Tuesday, November 25