Main Components of a Search Engine
Web Crawling
Hardware setup
Web Crawling
Simple Breadth-First Search Crawler
insert set of initial URLs into a queue Q
while Q is not empty
    currentURL = dequeue(Q)
    download page from currentURL
    for any hyperlink found in the page
        if hyperlink is to a new page
            enqueue hyperlink URL into Q
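
The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not a production crawler. The names `bfs_crawl`, `fetch_http`, `LinkParser`, and the `max_pages` cap are assumptions introduced here, not part of the original slides. "New page" is interpreted as "URL not enqueued before", with fragments stripped and only http/https links followed. The download step is passed in as a function so the crawl logic can be tried without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_http(url):
    """Download a page; return "" on common network or URL errors."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return ""


def bfs_crawl(seed_urls, fetch=fetch_http, max_pages=100):
    """Crawl breadth-first; return URLs in the order they were downloaded."""
    queue = deque()
    seen = set()  # every URL ever enqueued, used for the "new page" test
    for url in seed_urls:
        if url not in seen:
            seen.add(url)
            queue.append(url)

    visited = []
    # max_pages is an added safety cap; the pseudocode runs until Q is empty.
    while queue and len(visited) < max_pages:
        current_url = queue.popleft()          # currentURL = dequeue(Q)
        html = fetch(current_url)              # download page
        visited.append(current_url)

        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:              # for any hyperlink in the page
            # Resolve relative links and drop "#fragment" parts.
            link, _fragment = urldefrag(urljoin(current_url, href))
            if urlparse(link).scheme not in ("http", "https"):
                continue
            if link not in seen:               # hyperlink is to a new page
                seen.add(link)
                queue.append(link)             # enqueue into Q
    return visited
```

To experiment offline, pass a stand-in for the download step, for example `bfs_crawl(["http://example.test/"], fetch=lambda url: pages.get(url, ""))`, where `pages` is a hypothetical dict mapping URLs to HTML strings. Because the queue is first-in, first-out, all pages linked from the seeds are downloaded before any page two links away.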