Combinatorial Markov random fields and their applications to information organization

We propose a new type of undirected graphical model called a Combinatorial Markov Random Field (Comraf) and discuss its advantages over existing graphical models. We develop an efficient inference methodology for Comrafs based on combinatorial optimization of information-theoretic objective functions; both global and local optimization schemes are discussed. We apply Comrafs to multi-modal clustering tasks: standard (unsupervised) clustering, semi-supervised clustering, interactive clustering, and one-class clustering. For the one-class clustering task, we show analytically that the proposed optimization method is optimal under certain simplifying assumptions. We empirically demonstrate the power of Comraf models by comparing them to other state-of-the-art machine learning techniques in both the text clustering and image clustering domains. For unsupervised clustering, we show that Comrafs consistently and significantly outperform three previous state-of-the-art clustering techniques on six real-world textual datasets. For semi-supervised clustering, we show that the Comraf model is superior to a well-known constrained optimization method. For interactive clustering, Comraf obtains higher accuracy than a Support Vector Machine trained on a large amount of labeled data. For one-class clustering, Comrafs demonstrate superior performance over two previously proposed methods. We summarize our thesis by giving a comprehensive recipe for machine learning modeling with Comrafs.
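The abstract does not spell out the objective or the search procedure, so the following is only a minimal sketch of what local combinatorial optimization of an information-theoretic objective can look like. It assumes, purely for illustration, that the objective is the mutual information between document clusters and words, and that the local scheme greedily reassigns one document at a time. The function names (`mutual_information`, `cluster_joint`, `local_search`) and the toy data are hypothetical and are not taken from the thesis.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information (in bits) of a non-negative joint count table."""
    joint = joint / joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # row marginal, shape (k, 1)
    py = joint.sum(axis=0, keepdims=True)   # column marginal, shape (1, m)
    prod = px * py                          # broadcasts to the joint's shape
    mask = joint > 0                        # skip zero cells (0 * log 0 = 0)
    return float((joint[mask] * np.log2(joint[mask] / prod[mask])).sum())

def cluster_joint(counts, labels, k):
    """Sum document rows of a document-word count matrix into k cluster rows."""
    agg = np.zeros((k, counts.shape[1]))
    np.add.at(agg, labels, counts)
    return agg

def local_search(counts, k, sweeps=10, seed=0):
    """Greedy local search: move single documents while the objective improves."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=counts.shape[0])
    best = mutual_information(cluster_joint(counts, labels, k))
    for _ in range(sweeps):
        improved = False
        for d in range(counts.shape[0]):
            keep = labels[d]
            for c in range(k):
                if c == keep:
                    continue
                labels[d] = c  # tentatively move document d to cluster c
                score = mutual_information(cluster_joint(counts, labels, k))
                if score > best + 1e-12:
                    best, keep, improved = score, c, True
            labels[d] = keep   # commit the best assignment found for d
        if not improved:
            break
    return labels, best

# Toy data (assumed): two documents use words 0-1, two use words 2-3.
counts = np.array([[5, 5, 0, 0],
                   [5, 5, 0, 0],
                   [0, 0, 5, 5],
                   [0, 0, 5, 5]], dtype=float)
labels, score = local_search(counts, k=2)
```

A global scheme would instead search over whole partitions (for example with branch and bound), which this sketch does not attempt.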
