Chinese Named Entity Abbreviation Generation Using First-Order Logic

Normalizing named entity abbreviations to their standard forms is an important preprocessing task for question answering, entity retrieval, event detection, microblog processing, and many other applications. With the rapid growth of microblogs, this task has received increasing attention in recent years. In this paper, we propose a novel entity abbreviation generation method that uses first-order logic to model long-distance constraints. To reduce the human effort of manually annotating a corpus, we also introduce a method that automatically constructs training data using simple strategies. Experimental results demonstrate that the proposed method achieves better performance than state-of-the-art approaches.
