Taxonomic survey of Hindi Language NLP systems

Natural language processing (NLP) is the automatic handling of natural human language by machines. NLP has a wide spectrum of applications that help automate tasks such as translating text from one language to another, retrieving and summarizing data from very large repositories, filtering spam email, identifying fake news in digital media, gauging public sentiment and feedback, identifying people's political opinions and views on government policies, and providing effective medical assistance based on a patient's past history records. Hindi is an official language of India, with nearly 691 million users in India and 366 million in the rest of the world. At present, a number of government and private-sector projects and researchers, in India and abroad, are working towards developing NLP applications and resources for Indian languages. This survey reports on the resources and applications available for Hindi language NLP.
