Named Entity System for Tweets in Hindi Language

Due to the growing need for smart-health applications in the Hindi language, there is a rapidly growing demand for a health-related Named Entity Recognition (NER) system for Hindi. To that end, this research extracts tweets dated 1st October 2016 to 15th October 2017 from Patanjali, Dabur and other Hindi-language-oriented, Twitter-based health sites, and considers four NE types: Person, Disease, Consumable and Organization. To the best of our knowledge, this Twitter dataset and set of NE types is among the first such resources for the Hindi language. This article introduces a three-stage NER system for tweets in the Hindi language (the HinTwtNER system): a pre-processing stage; a machine learning stage (Hyperspace Analogue to Language (HAL) and Conditional Random Field (CRF)); and a post-processing stage. HinTwtNER uses binary features and achieves an overall F-score of 49.87%, which is comparable to Twitter-based NER systems for English and other languages.

Keywords: Conditional Random Field, Hindi, Hyperspace Analogue to Language, Machine Learning, Named Entity Recognition, Online Social Network, Tweets, Twitter
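As an illustration of the HAL component named in the machine learning stage, the following is a minimal sketch of a HAL-style weighted co-occurrence count over a token sequence. The function name `hal_matrix`, the window size, and the sample tokens are assumptions made for this example; the abstract does not state the window size used by HinTwtNER or how the HAL vectors are passed to the CRF.

```python
from collections import defaultdict


def hal_matrix(tokens, window=5):
    """Build HAL-style weighted co-occurrence counts (illustrative sketch).

    counts[target][context] accumulates a weight for every context word
    that appears up to `window` positions BEFORE the target word.
    Closer words receive larger weights: window - distance + 1.
    The window size is an assumed parameter, not taken from the paper.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for i, target in enumerate(tokens):
        for distance in range(1, window + 1):
            j = i - distance
            if j < 0:
                break  # ran past the start of the sequence
            counts[target][tokens[j]] += window - distance + 1
    return counts


# Hypothetical, transliterated tokens used only to exercise the function.
sample = ["patanjali", "ka", "amla", "juice"]
matrix = hal_matrix(sample, window=2)
for word in sample:
    print(word, dict(matrix[word]))
```

Each row of the resulting table describes a word by the weighted words that precede it, so words used in similar contexts end up with similar rows. One plausible use, again an assumption, is to derive extra features for a sequence labeller such as a CRF from these rows.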
