Identifying and improving retrieval for procedural questions

In everyday life, people elicit information from one another by asking questions, yet the most common way of obtaining information from a search engine is to pose keywords. Research suggests that users are better at expressing their information needs in natural language; nevertheless, the vast majority of work on improving document retrieval has focused on queries posed as sets of keywords or as Boolean expressions. This paper focuses on improving document retrieval for the subset of natural-language questions that ask how something is done. We classify questions as asking either for a description of a process or for a statement of fact, with better than 90% accuracy. We then identify non-content features of documents that are relevant to questions asking about a process, and we demonstrate that these features can be used to significantly improve the precision of document retrieval for such questions. Our approach, which exploits the structure of documents, yields a significant improvement in precision at rank one for questions asking how something is done.
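The pipeline the abstract describes, classify the question, then favor documents whose structure suggests a process description, can be sketched as follows. This is an illustrative sketch only: the cue phrases and the step-counting feature below are hypothetical stand-ins, not the classifier or the learned features reported in the paper.

```python
import re

# Hypothetical cue phrases for recognizing process ("how is it done") questions.
# The paper's actual classifier is statistical; this rule-based version is a sketch.
PROCEDURAL_CUES = ("how do", "how to", "how can", "steps to")

def classify_question(question: str) -> str:
    """Label a question 'procedural' (asks how something is done)
    or 'factual' (asks for a statement of fact)."""
    q = question.lower()
    return "procedural" if any(cue in q for cue in PROCEDURAL_CUES) else "factual"

# A stand-in structural (non-content) feature: numbered steps or lines
# beginning with "Step", typical of documents describing a process.
STEP_PATTERN = re.compile(r"^\s*(?:\d+[.)]|step\b)", re.IGNORECASE | re.MULTILINE)

def procedural_score(doc_text: str) -> int:
    """Count step-like structural markers in a document."""
    return len(STEP_PATTERN.findall(doc_text))

def rerank(question: str, docs: list) -> list:
    """For procedural questions, promote documents with more step-like
    structure; for factual questions, leave the ranking unchanged."""
    if classify_question(question) == "procedural":
        return sorted(docs, key=procedural_score, reverse=True)
    return docs
```

Because `sorted` is stable, documents with equal structural scores keep their original retrieval order, so the structural signal only reorders where it actually discriminates.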
