Overview
In Chapter 17 we introduce the student to some algorithms that are the basis of information
storage and retrieval on the Internet. We first show how the World Wide Web is naturally
modeled as a digraph with nodes being web pages and directed edges corresponding to
hyperlink references. We then show how search engines use ranking functions to determine the
order in which web pages are presented in answer to a query by the user. Two ranking
functions, PageRank and HITS, are discussed. The idea of hashing is then introduced, and the
main methods of collision resolution are discussed. Web caching is then described, and the
notion of consistent hashing and its implementation are discussed as an efficient way to
maintain web caches. We finish the chapter by a discussion of the mathematical ideas
underlying the RSA Public-Key Cryptography system, and describe this system in some detail.
Chapter Objectives
After completing the chapter the student will know:
• How the World Wide Web can be modeled by a digraph
• How search engines use forward and reverse indexing to build a database of web pages
when responding to a query
• How search engines give web pages a ranking when presenting answers to a query
• The ideas behind the two main ranking functions for web pages, PageRank and HITS
Instructor Notes
The algorithms discussed in this chapter should impress the student with the importance of