Research on computerised Information Search and Retrieval can be traced back to the late 1950s, and a significant number of retrieval techniques have been invented since then. In particular, the research field has evolved profoundly with the birth and proliferation of the World Wide Web. Today, search engines built on various Information Retrieval techniques have thoroughly changed the ways in which people search for and acquire information and knowledge. Nevertheless, in many situations search engines have difficulty retrieving relevant, high-quality information, despite strenuous efforts to expand their indices and refine their ranking functions. The challenge is amplified by the large scale and continuous growth of the Web.
As an extension of the current Web, the semantic Web provides a graceful framework which facilitates knowledge representation and logical inference among distributed sources. In recent years, researchers from various research fields have attempted to integrate semantic Web technologies (e.g., ontologies) into Information Retrieval in order to improve the retrieval process, resulting in a new search paradigm: semantic search. Much of the research effort on semantic search has been directed at improving retrieval accuracy and supporting natural navigation in rich information spaces by exploiting the inference capability of semantic Web languages.
A review of related work in the literature reveals that two issues of paramount importance, knowledge acquisition (ontology learning in particular) and entity ranking, have not been sufficiently studied. In this thesis, we first propose a model for a semantic search framework. The model identifies the fundamental components of common semantic search systems and the connections between them, and serves as the high-level reference model in this thesis. We then focus on two important issues: domain ontology learning and ranking for semantic search systems.
We have developed a new probabilistic method which automatically learns domain ontologies from an unstructured text corpus. The method consists of several essential steps. First, ontological concepts are extracted from a text corpus and transformed into representations in a low-dimensional semantic space, which is computed using probabilistic topic models. Second, an Information Theory Principle for Concept Relationships is proposed for establishing relationships between concepts. We have also developed two iterative algorithms that organise the concepts into a domain ontology. To assess the practical performance of the method, extensive experiments are performed and the results are evaluated by domain experts using popular performance measures from the field of Information Retrieval. The results are also compared with those generated by some existing methods on the same dataset. The comparison shows that our proposed method considerably outperforms the existing methods.
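As a rough illustration of how an information-theoretic comparison between concepts might work, the sketch below assumes that each concept is represented as a probability distribution over topics and uses the asymmetry of Kullback-Leibler divergence to guess which of two concepts is the broader one. The abstract does not state the actual formulation of the principle, so the decision rule, the function names, and the example vectors are all hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) of two discrete distributions."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def broader_concept(name_a, p_a, name_b, p_b):
    """Hypothetical rule: a concept whose topic distribution is more spread
    out covers the other one more cheaply, so it is treated as the broader."""
    if kl_divergence(p_a, p_b) < kl_divergence(p_b, p_a):
        # a is well approximated by b, so b is assumed to be the broader concept
        return name_b
    return name_a


# Made-up topic distributions over four topics (assumed, not from the thesis).
general = [0.25, 0.25, 0.25, 0.25]   # mass spread over all topics
specific = [0.85, 0.05, 0.05, 0.05]  # mass concentrated on one topic

print(broader_concept("information retrieval", general, "ranking", specific))
```

Under this assumed rule, the concept with the flatter distribution is selected as the broader one; a real system would also need a threshold to decide whether two concepts are related at all.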
In semantic search, ranking methods are required to rank not only documents, but also generic entities of various types. Taking this into consideration, we propose a model called Rational Research, which intuitively simulates a research environment and reflects the searching behaviour of researchers. The ontologies (more specifically, the schema ontology, the knowledge base, and the domain ontology learned using our proposed method) are modelled as a directed, weighted graph. Entities in the ontologies represent objects in the real world (e.g., publication, author, and journal), and links between those entities represent the activities of researchers (e.g., browsing an author's publication list or journal issues). On the basis of the Rational Research model, we have developed a new algorithm called RareRank for ranking different types of entities in semantic search systems. Experiments are conducted by applying the RareRank algorithm to the entities retrieved for a number of test queries. The results are evaluated using ranking performance measures and compared with those of existing algorithms. RareRank is shown to perform better than the existing algorithms in this experimental study.
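To make the idea of ranking entities on a directed, weighted graph more concrete, the sketch below runs a generic random-walk power iteration in the style of weighted PageRank. It is not the RareRank algorithm, whose details are not given in this abstract; the damping factor, the edge weights, and the entity names are assumptions chosen only for illustration.

```python
def weighted_rank(edges, damping=0.85, iterations=50):
    """Score nodes of a directed, weighted graph by a random walk.

    edges maps each source node to a dict of {target node: link weight}.
    """
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)

    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives a small share from random jumps.
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            out = edges.get(u, {})
            total = sum(out.values())
            if total == 0:
                # A node without outgoing links spreads its score evenly.
                for v in nodes:
                    nxt[v] += damping * score[u] / n
            else:
                # Otherwise the score follows links in proportion to weight.
                for v, w in out.items():
                    nxt[v] += damping * score[u] * w / total
        score = nxt
    return score


# Hypothetical entities and weights (not taken from the thesis).
graph = {
    "author_A": {"paper_1": 2.0, "paper_2": 1.0},
    "paper_1": {"journal_X": 1.0},
    "paper_2": {"journal_X": 1.0},
    "journal_X": {"paper_1": 1.0},
}
scores = weighted_rank(graph)
for entity in sorted(scores, key=scores.get, reverse=True):
    print(entity, round(scores[entity], 3))
```

Because all entity types share one graph, a single pass produces comparable scores for authors, papers, and journals alike, which is the property that entity ranking in semantic search relies on.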
To demonstrate the effectiveness of our proposed solutions for ontology learning and entity ranking, a semantic search prototype called IRIS@UNMC has been developed for searching and retrieving research-related information. It offers entity retrieval and ranking by implementing the RareRank algorithm. Furthermore, the learned domain ontology is used to provide users with navigation support and query suggestions.