Temporal and Contextual Evaluation of Background Knowledge Discovery for Short Text Classification

Background Knowledge (BK) plays an essential role in machine learning for short-text and non-topical classification. In this paper, the authors present and evaluate two Information Retrieval techniques used to assemble four sets of BK over the past seven years. These sets were applied to classify a commercial corpus of search queries by the apparent age of the user. Temporal and contextual evaluations were used to examine the results of various classification scenarios, providing insight into the choice, significance, and range of tuning parameters. The evaluations also demonstrated the impact of the dynamic Web collection on classification results, and the advantages of Automatic Query Expansion (AQE) over basic search. The authors discuss other results of this research and its implications for the advancement of short text classification.
