Extracting information from informal communication

This thesis addresses the problem of extracting information from informal communication. Textual informal communication, such as e-mail, bulletin boards and blogs, has become a vast information resource. However, such information is poorly organized and difficult for a computer to understand because it lacks editing and structure. Thus, techniques that work well for formal text, such as newspaper articles, may be insufficient for informal text. One focus of this thesis is to advance the state of the art for sub-problems of the information extraction task. We make contributions to the problems of named entity extraction, co-reference resolution and context tracking, and we channel our efforts toward methods that are particularly applicable to informal communication. We also consider a type of information that is largely specific to informal communication: preferences and opinions. Individuals often express their opinions on products and services in such communication. Others may read these "reviews" to try to predict their own experiences. However, humans do a poor job of aggregating and generalizing large sets of data. We develop techniques that predict unobserved opinions. We address both the single-user case, where information about the items is known, and the multi-user case, where we can generalize opinions without external information. Experiments on large-scale rating data sets validate our approach.
