Questionnaire Free Text Summarisation Using Hierarchical Classification

This paper investigates the summarisation of the free text element of questionnaire data using hierarchical text classification. The approach assumes that text summarisation can be treated as a classification task: several class labels are associated with each document, and together these labels constitute the summary. A hierarchical classification approach is proposed, which has the advantage that different levels of classification can be used and the summarisation can be customised according to the branch of the tree in which the current document is located. The approach is evaluated using free text from questionnaires used in the SAVSNET (Small Animal Veterinary Surveillance Network) project. The results demonstrate the viability of using hierarchical classification to generate free text summaries.
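To make the idea of "summary as a path of class labels" concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes a tiny hand-built hierarchy and a keyword-overlap rule standing in for a trained classifier at each node. The labels, the keywords, and the names `Node`, `classify_path` and `summarise` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node in a hypothetical class hierarchy."""
    label: str
    keywords: frozenset = frozenset()
    children: list = field(default_factory=list)


def tokenize(text):
    """Lowercase the text and split it into a set of alphabetic tokens."""
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text.lower())
    return set(cleaned.split())


def classify_path(root, text):
    """Descend from the root, choosing at each level the child whose
    keywords overlap most with the document. Stop when no child matches."""
    tokens = tokenize(text)
    path = []
    node = root
    while node.children:
        best = max(node.children, key=lambda c: len(c.keywords & tokens))
        if not best.keywords & tokens:
            break  # no child of this node matches the document
        path.append(best.label)
        node = best
    return path


def summarise(root, text):
    """Join the labels collected along the path into a short summary."""
    labels = classify_path(root, text)
    return " > ".join(labels) if labels else "unclassified"


# Hypothetical two-level hierarchy; not taken from the SAVSNET data.
ROOT = Node("root", children=[
    Node("gastrointestinal", frozenset({"vomiting", "diarrhoea", "stool"}), [
        Node("vomiting", frozenset({"vomiting", "vomited"})),
        Node("diarrhoea", frozenset({"diarrhoea", "stool"})),
    ]),
    Node("respiratory", frozenset({"cough", "coughing", "sneezing"}), [
        Node("coughing", frozenset({"cough", "coughing"})),
        Node("sneezing", frozenset({"sneezing"})),
    ]),
])

print(summarise(ROOT, "Cat presented with a persistent cough"))
print(summarise(ROOT, "Routine vaccination"))
```

In this sketch the choice of branch at the first level determines which second-level labels are available, which is how the summary ends up customised to the part of the tree in which the document falls. A real system would replace the keyword rule with a classifier learned from labelled questionnaire text.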
