How does a search engine work?

Search Engine Optimization (SEO) Blog Series

Have you ever wondered how to get your business website to rank higher in Google? If you desperately want to answer this question, then you need to know what Search Engine Optimization (SEO) is. In this blog series, I will give you a brief introduction to SEO and explore the basic SEO techniques available today.

Before you understand what SEO is, you need to understand how a (web based) search engine works. A search engine consists of three important operations:

  • Web crawling
  • Indexing
  • Serving results

Web Crawling

A web crawler or web spider is a piece of automated software that systematically (based on rules) browses the World Wide Web and collects information to be used for indexing. A real world analogy of this would be you visiting every bus stop (website) in the city you live in (the Internet) and taking a picture of the bus schedule in each stop (gathering content for the index). A search engine website such as Google has many web crawlers (known as Googlebots in their case) since there are billions of pages on the Internet. Web crawling is a never-ending process due to the fact that the Internet is always growing.

Going to every single bus stop in Toronto would be a hectic task
Going to every single bus stop in Toronto would be a hectic task

The web crawler process typically begins with a list of addresses of web pages usually generated from the previous web crawl process. The web crawler visits each of these web pages and detects links to other web pages. These newly detected links are added to the list of pages to crawl. The crawler also saves the content of the web page in order to be indexed later. The web crawler process ends when there are no more web pages to crawl or when an algorithmic condition is met. Only crawl 1,000 web pages is an example of an algorithmic condition.


Once the web crawler has completed the content gathering processes, the index table needs to then be created or updated. An index table is used because of the speed benefits it provides when the search engine results are returned to the user. The creation or updating of an index table is usually a long process, however this is acceptable since the process is hidden from the user. The major steps of building an index are:

1. Collect the documents returned from the web crawler. For example, suppose the web crawler returned the following documents:

Document Content
D1 Humpty Dumpty sat on a wall, Humpty Dumpty had a great fall.
D2 Jack and Jill went up the hill to fetch a pail of water.
D3 Private Jack Powers tried to save General George Humpty by climbing the Berlin wall.


2. Remove stop words and punctuation marks from the documents. Stop words are extremely common words in the English language like “a”, “the” and “or”. These words are removed in order to improve efficiency of the search engine when returning results.

Document Content
D1 Humpty Dumpty sat wall Humpty Dumpty great fall
D2 Jack Jill went up hill fetch pail water
D3 Private Jack Powers tried save General George Humpty climbing Berlin wall


3. More linguistic processing is completed by converting each word to its root word. For example, “climbing” to “climb” or “friends” to “friend” or “children” to “child”.

4. Create an index of terms where it contains the document and the frequency in which the word occurs. Below is a sample of the index based on the content above:

Term Document, Frequency
Humpty D1, 2 D3, 1
Jack D2, 1 D3, 1
Jill D2, 1
Powers D2

This example is a very simple indexing method. Search engines today use more complex techniques. The frequency of a term in a document is an important property, however other properties (such as positioning of a term in a document or the geographical location of the server hosting the document) can also be added to the index table.


Serving results

The serving results process occurs once the user enters words (the query) into the search engine and presses okay. Suppose the user searched the following query “I love humpty chips”. Obviously, the search engine returns Documents D1 and D3, but which document is more relevant to the user? This is not an issue for this user since the sample size is so small. But what if it is not? Search engines today rank documents using a sophisticated (secret) technique based on many factors such as the frequency property in our previous example. Overall, the main objective of Search Engine Optimization (SEO) is the understanding of what those factors are in order to improve your website’s rank.


Further Readings

Here is a selected list of readings you can do if you would like to gain more knowledge of how search engine works:

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>