Artificial Immune Systems-Based Classification Model for Code-Mixed Social Media Data

Abstract This paper proposes an artificial immune systems-based classification model for code-mixed social media data. Artificial immune systems are computational models inspired by the biological immune system. Here, they are used to optimize the initial parameters of a long short-term memory (LSTM) model. The resulting artificial immune systems-based LSTM model is then used to classify the words of Hindi-English code-mixed data as Hindi, English, or ambiguous. Each word is represented using popular word embedding features together with character embedding features. The proposed method helps identify a word's context by extracting the user's intent in using an ambiguous word in a code-mixed sentence. Extensive experiments reveal that the artificial immune systems-based classification model outperforms competitive models, especially when the social media data contain ambiguous words.
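
The abstract does not say which artificial immune systems algorithm is used or which LSTM parameters are tuned. As an illustration only, the following minimal sketch assumes a clonal-selection style search (in the spirit of CLONALG) over a real-valued parameter vector. The names `clonal_selection` and `toy_loss` are hypothetical, and `toy_loss` is a simple stand-in for the validation loss that an LSTM would report for a given set of initial parameters.

```python
import random


def clonal_selection(fitness, dim, pop_size=20, n_select=5, clone_factor=4,
                     n_random=2, generations=50, bounds=(-1.0, 1.0), seed=0):
    """Minimise `fitness` over a real-valued vector with a clonal-selection loop.

    Assumption: this is a generic CLONALG-style sketch, not the paper's method.
    """
    rng = random.Random(seed)
    lo, hi = bounds

    def random_antibody():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    population = [random_antibody() for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the antibodies with the lowest loss (highest affinity).
        population.sort(key=fitness)
        selected = population[:n_select]

        clones = []
        for rank, antibody in enumerate(selected):
            # Better-ranked antibodies receive more clones and smaller mutations.
            n_clones = max(1, (clone_factor * n_select) // (rank + 1))
            step = 0.1 * (hi - lo) * (rank + 1) / n_select
            for _ in range(n_clones):
                clone = [min(hi, max(lo, g + rng.gauss(0.0, step)))
                         for g in antibody]
                clones.append(clone)

        # Replacement: keep the best of parents and clones, then add a few
        # random newcomers to preserve diversity.
        pool = sorted(population + clones, key=fitness)
        population = pool[:pop_size - n_random]
        population += [random_antibody() for _ in range(n_random)]

    return min(population, key=fitness)


def toy_loss(params):
    # Hypothetical stand-in for "validation loss of an LSTM initialised with
    # params"; a real objective would train and evaluate the model.
    return sum((p - 0.3) ** 2 for p in params)


best = clonal_selection(toy_loss, dim=3)
print(best, toy_loss(best))
```

In an actual experiment, `toy_loss` would be replaced by a function that builds the LSTM from the candidate parameters, trains it briefly on the code-mixed data, and returns its validation loss. That evaluation dominates the running time, so the population size and number of generations would need to be chosen accordingly.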
