Using question classification to model user intentions of different levels

Detecting user information need is a fundamental issue in automatic question answering systems. Based on real questions collected from online question answering communities, this paper proposes a three-level question type taxonomy to model user information need. The three levels are based on interrogative patterns, hidden user intentions, and specific answer expectations. A question can have multiple types at levels 2 and 3. Question type assignment at levels 2 and 3 is subjective and may vary between users. Shallow lexical, syntactic, and semantic features are used to model the inherent subjectivity of user intentions. Classification experiments with several machine learning methods are conducted on a corpus of real questions collected from the web. The experimental results are promising, indicating that user information need and subjectivity can be modeled statistically and that strong correlations exist between question types of the same level.
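As an illustration of the setup described above, the following minimal sketch shows one way to represent a question with a single level-1 type and multiple level-2 and level-3 types, and to split a multi-label level into one binary task per type (the standard binary relevance decomposition). The class, function, and type names, the example questions, and the feature set are all hypothetical and are not taken from the paper; the paper's actual features and classifiers may differ.

```python
from dataclasses import dataclass, field

# Assumed list of interrogative words; the paper's level-1 patterns may differ.
WH_WORDS = ("what", "who", "when", "where", "why", "how", "which")


@dataclass
class LabeledQuestion:
    """A question annotated at the three taxonomy levels (names are hypothetical)."""
    text: str
    level1: str                                # one interrogative-pattern type
    level2: set = field(default_factory=set)   # possibly several intention types
    level3: set = field(default_factory=set)   # possibly several answer-expectation types


def lexical_features(text):
    """Shallow lexical features: lowercase unigrams plus the first wh-word found."""
    tokens = text.lower().rstrip("?").split()
    feats = {f"word={t}" for t in tokens}
    wh = next((t for t in tokens if t in WH_WORDS), "none")
    feats.add(f"wh={wh}")
    return feats


def binary_relevance(questions, level):
    """Turn a multi-label level into one binary dataset per type.

    Returns {type: [(features, is_positive), ...]}, so that any binary
    classifier can be trained independently for each type.
    """
    labels = sorted(set().union(*(getattr(q, level) for q in questions)))
    return {
        lab: [(lexical_features(q.text), lab in getattr(q, level)) for q in questions]
        for lab in labels
    }


# Hypothetical annotated questions, for illustration only.
questions = [
    LabeledQuestion("How do I reset my router?", "how",
                    {"seek_instruction"}, {"procedure"}),
    LabeledQuestion("Which laptop is better for travel?", "which",
                    {"seek_opinion", "seek_recommendation"}, {"comparison"}),
]

tasks = binary_relevance(questions, "level2")
```

Each entry in `tasks` is an ordinary binary classification problem, so the second question contributes a positive example to both `seek_opinion` and `seek_recommendation`. Binary relevance ignores dependencies between types, so it would not by itself exploit the correlations between same-level types that the experiments point to.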
