Cross-partition clustering: revealing corresponding themes across related datasets

This article studies the task of discovering correspondences across related domains on the basis of real-world data collections. We address this task through a dedicated extension of distributional data-clustering methods. The method is demonstrated empirically on synthetic data and on texts addressing different religions, where the goal is to identify commonalities shared by all of the religions. This article generalises our previous studies on this subject and demonstrates empirical improvement over them, as well as over other comparable methods.
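The abstract names the base technique only as distributional data clustering. As a rough illustration, the sketch below implements soft distributional clustering at a fixed inverse temperature β, using an information-bottleneck-style update p(c|x) ∝ p(c)·exp(−β·KL(p(y|x) ‖ p(y|c))). The `cross` option, which builds the centroids seen by each element only from elements of other partitions, is a hypothetical way to bias the clustering toward themes shared across partitions; it is an assumption made for illustration and not the article's algorithm. The function names, parameters, and uniform element weights are likewise illustrative assumptions.

```python
import math
import random


def kl(p, q, eps=1e-12):
    """KL divergence D(p || q) for discrete distributions given as lists.

    Zero entries of q are floored at eps so the result stays finite.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def soft_cluster(dists, partition, k, beta=5.0, iters=50, cross=True, seed=0):
    """Soft distributional clustering of feature distributions p(y|x).

    dists:     one feature distribution per element (each sums to 1).
    partition: pre-given partition label of each element.
    cross:     if True (hypothetical variant), the centroids used to score an
               element are built only from elements of *other* partitions, so
               features private to the element's own partition are absent from
               every centroid it is compared with and add the same penalty to
               all clusters.
    Returns the membership probabilities p(c|x), one list of length k per element.
    """
    rng = random.Random(seed)
    n, m = len(dists), len(dists[0])
    # Random initial memberships, normalised per element.
    memb = []
    for _ in range(n):
        w = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(w)
        memb.append([wi / total for wi in w])

    for _ in range(iters):
        for i in range(n):
            pool = [j for j in range(n) if not cross or partition[j] != partition[i]]
            if not pool:  # only one partition: fall back to all elements
                pool = list(range(n))
            scores = []
            for c in range(k):
                weight = sum(memb[j][c] for j in pool) + 1e-12
                # Centroid p(y|c): membership-weighted mean of the pooled
                # distributions (uniform element weights p(x) are assumed).
                cent = [sum(memb[j][c] * dists[j][y] for j in pool) / weight
                        for y in range(m)]
                prior = weight / len(pool)
                # log p(c) - beta * KL(p(y|x) || p(y|c))
                scores.append(math.log(prior) - beta * kl(dists[i], cent))
            top = max(scores)  # subtract the max for a numerically stable softmax
            exps = [math.exp(s - top) for s in scores]
            z = sum(exps)
            memb[i] = [e / z for e in exps]  # in-place (sequential) update
    return memb
```

For example, with four elements in two partitions, where features 0 and 1 mark one shared theme, features 2 and 3 mark another, and features 4 and 5 are private to partitions 0 and 1 respectively, elements that share a theme should end up with the same most-likely cluster even though they come from different partitions. Memberships are updated sequentially rather than simultaneously: with simultaneous updates under `cross=True`, two partitions whose initial labelings disagree could keep swapping labels from one sweep to the next.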
