This paper presents a technique for building a search engine using a combination of trie data structure and a unique ‘keyword prefixing’ method. The primary aim of the paper is to reduce keyword access times. It is shown that a trie based approach in conjunction with binary search has a better performance over pure binary search on a lexicographically sorted list of keywords. The paper also discusses the merits of using trie based approach in conjunction with B-Tree based indexing methods that are commonly used in today’s search engines. The method of keyword prefixing is described and the process of keyword list generation is outlined.
Before presenting the technique, some important characteristics of user search and the search keywords are described under this section, as they are central to the search technique developed here.
Although we as users use general purpose search engines for day to day information needs, most of our queries are confined to a specific domain of knowledge. When a user sends a query to the search engine, it is almost always about a particular subject. For example, it could be about medicine or about computers or a recent event or something to do with finance or politics. It rarely cuts across several domains. In other words, a single query is domain-specific most of the time.
A new prefixing method that will be described shortly makes use of this property to achieve better search performance.