Resolving Entity Morphs in Censored Data

In some societies, internet users have to create information morphs (e.g. “Peace West King” to refer to “Bo Xilai”) to avoid active censorship or achieve other communication goals. In this paper, we address the new problem of resolving entity morphs to their real targets. We exploit temporal constraints to collect cross-source comparable corpora relevant to any given morph query and to identify target candidates. We then propose several novel similarity measurements, including surface features, meta-path based semantic features, and social correlation features, and combine them in a learning-to-rank framework. Experimental results on Chinese Sina Weibo data demonstrate that our approach is promising and significantly outperforms baseline methods.
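To make the ranking idea concrete, the following is a minimal sketch of how per-candidate similarity scores might be combined to rank target candidates for a morph. It is not the paper's implementation: the function names, the feature keys `semantic` and `social`, and the hand-set weights are all hypothetical, the surface score uses Python's `difflib` string ratio as a stand-in for the paper's surface features, and a fixed linear combination stands in for a learned ranking model.

```python
from difflib import SequenceMatcher


def surface_similarity(morph: str, candidate: str) -> float:
    """String-overlap ratio in [0, 1]; a stand-in for real surface features."""
    return SequenceMatcher(None, morph, candidate).ratio()


def rank_candidates(morph, candidates, weights):
    """Rank candidates by a weighted sum of feature scores.

    candidates: dict mapping candidate name -> dict of precomputed scores
                (hypothetical keys such as "semantic" and "social").
    weights:    dict mapping feature name -> weight; in a learning-to-rank
                setting these would be learned, here they are placeholders.
    """
    scored = []
    for name, feats in candidates.items():
        # Add the surface score to the precomputed feature scores.
        features = {"surface": surface_similarity(morph, name), **feats}
        score = sum(weights[key] * value for key, value in features.items())
        scored.append((name, score))
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Illustrative, made-up scores; not data from the paper.
    weights = {"surface": 1.0, "semantic": 1.0, "social": 1.0}
    candidates = {
        "Candidate A": {"semantic": 0.9, "social": 0.8},
        "Candidate B": {"semantic": 0.1, "social": 0.2},
    }
    for name, score in rank_candidates("Some Morph", candidates, weights):
        print(name, round(score, 3))
```

In this sketch a morph that shares no characters with its target still ranks the target first when the semantic and social scores are strong, which is the reason for combining several kinds of evidence rather than relying on string overlap alone.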
