Be Appropriate and Funny: Automatic Entity Morph Encoding

Internet users are keen on creating different kinds of morphs to avoid censorship or to express strong sentiment or humor. For example, in Chinese social media, users often use the entity morph “方便面 (Instant Noodles)” to refer to “周永康 (Zhou Yongkang)” because it shares one character, “康 (Kang)”, with the well-known instant noodle brand “康师傅 (Master Kang)”. We developed a wide variety of novel approaches to automatically encode proper and interesting morphs, which can effectively pass decoding tests.
