Ph.D. Theses

A Comparison of Keyword-Based and Semantics-Based Searching

By David Goldschmidt
Advisor: Mukkai Krishnamoorthy
April 26, 2005

Since its emergence in the early 1990s, the World Wide Web has rapidly evolved into a global information space of incomparable size. Keyword-based search engines such as Google attempt to index Web pages for the benefit of human users. Sophisticated as such search engines have become, they are still often unable to bridge the gap between HTML and the human. Surveys indicate that almost 25% of Web searchers are unable to find useful results in the first set of URLs returned. Further, the human-oriented nature of most Web data prevents software agents from providing humans with more efficient and effective means of processing that information.

To help solve this problem, Tim Berners-Lee, the inventor of the Web, has architected the Semantic Web, in which machine-interpretable information provides an automated means for machines to traverse the Web on behalf of their human counterparts. Though its foundation has been established, the Semantic Web and its related applications are still in their infancy. As the Semantic Web continues to form, a necessary cornerstone application is a search engine capable of tying components of the Semantic Web together into a comprehensive and traversable landscape.

Through our research, we have architected a Semantic Web Search Engine (SWSE) that performs semantics-based searching, providing more predictable and accurate results. We have also implemented a prototype of SWSE that serves as a proof of concept. To evaluate our efforts and compare keyword-based searching to semantics-based searching, we constructed the Google CruciVerbalist (GCV), which attempts to solve crossword puzzles by reformulating the clues into "Google-friendly" queries that are sent to Google via the Google API. The search results are used to derive candidate answers to clues, which are then fed into the grid-solving components of GCV.
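
The clue-to-candidate pipeline described above can be illustrated with a minimal sketch. It is not the GCV implementation: the stop-word list, the function names `reformulate_clue` and `candidate_answers`, and the frequency-based ranking are all assumptions made for illustration, and hard-coded result snippets stand in for an actual search call.

```python
import re

# Hypothetical stop-word list; the actual reformulation rules used by GCV
# are not described in this abstract.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "to", "and", "or"}


def reformulate_clue(clue):
    """Reduce a crossword clue to a plain keyword query (illustrative only)."""
    words = re.findall(r"[A-Za-z0-9']+", clue.lower())
    return " ".join(w for w in words if w not in STOP_WORDS)


def candidate_answers(snippets, length):
    """Collect words of the required length from result snippets.

    Candidates are returned most frequent first, with ties broken
    alphabetically. This ranking is an assumption, not the thesis's method.
    """
    counts = {}
    for snippet in snippets:
        for word in re.findall(r"[A-Za-z]+", snippet):
            if len(word) == length:
                key = word.upper()
                counts[key] = counts.get(key, 0) + 1
    return sorted(counts, key=lambda w: (-counts[w], w))


# Made-up snippets standing in for search results.
query = reformulate_clue("Capital of France")
snippets = ["Paris is the capital of France", "Visit Paris today"]
candidates = candidate_answers(snippets, 5)
```

A grid solver would then try these candidates against the crossing letters already placed in the puzzle. The length filter is the only crossword constraint applied in this sketch.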

As a culmination of our research, GCV was integrated with the SWSE prototype to quantitatively show how semantics-based searching improves upon keyword-based searching in machine intelligence. Mimicking the human brain's ability to create and traverse relationships between facts, our research helps the Web to "think" using semantics-based reasoning, opening the door to intelligent search applications using the Semantic Web.
