Search engines are widely used to find documents for many purposes. One frequently searched kind of document is the office document. Office documents are classified as semi-structured because documents in the same category sometimes share a consistent structure. Office documents also come in various categories and formats. Building a search engine requires implementing two main processes: indexing and querying. Each process consists of several methods, each with its own function. Not every method is suitable for these processes, so suitable methods must be selected to produce an optimal search engine for a specific, defined domain. This paper explains how to recognize the patterns of office documents that are used to build a search engine. It also explains the selection of methods used to build an optimal search engine for the office document domain. The search engine is evaluated with several testing scenarios to calculate its precision for a set of queries and to determine how optimal it is. The proposed search engine focuses more on the effectiveness of its results than on processing efficiency. However, the evaluation still covers both the effectiveness and the efficiency of the system.
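The two processes and the precision measure mentioned above can be illustrated with a minimal sketch. It is not the method selected in this paper: the whitespace tokenizer, the term-overlap ranking, the function names, and the three sample documents are all assumptions chosen only to keep the example small.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real text processing)."""
    return text.lower().split()


def build_index(docs):
    """Indexing process: map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Query process: rank documents by how many distinct query terms they contain."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Sort by descending score, then by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for d in retrieved if d in relevant) / len(retrieved)


# Hypothetical office documents, invented for illustration only.
docs = {
    "d1": "invitation letter for staff meeting",
    "d2": "meeting minutes of budget review",
    "d3": "decree on staff promotion",
}
index = build_index(docs)
results = search(index, "staff meeting")
```

Given a set of documents judged relevant for the query, `precision(results, relevant)` returns the share of retrieved documents that are relevant. A real system would replace the tokenizer and the ranking function with the methods chosen for the target domain.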