Language Identification for Hindi Language Transliterated Text in Roman Script Using Generative Adversarial Networks

This work presents a novel approach to identifying text written in Roman script that conveys meaning in the Hindi language. The proposed methodology identifies the language based on the semantic meaning of the text. Features are first extracted from the text and fed to an artificial neural network (ANN). The final output of the ANN is multiplied with the feature vector, and the result is passed through an autoencoder and a generative adversarial network (GAN), which train the model in a semi-supervised manner. The feature extraction step defines the feature vector, and the ANN then estimates the probability that the language is classified correctly. The data set was curated from open data on the Web and from common chat applications. Parts of that data set were used as training and test data, and as data for a comparative study for the purpose of evaluation.
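
The first stages of the pipeline described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the character-bigram features, the fixed vocabulary `VOCAB`, the single sigmoid unit standing in for the ANN, and the randomly initialised `WEIGHTS` are all assumptions made for illustration, and the autoencoder and GAN stages are omitted. The sketch only shows how an ANN output probability can be multiplied with the feature vector before it is passed on.

```python
import math
import random
from collections import Counter


def char_ngram_features(text, vocab, n=2):
    """Return normalised character n-gram frequencies over a fixed vocabulary.

    Assumption: character n-grams are used purely for illustration;
    the abstract does not specify which features are extracted.
    """
    cleaned = text.lower()
    grams = Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))
    total = sum(grams.values()) or 1  # avoid division by zero on short input
    return [grams[g] / total for g in vocab]


def ann_probability(features, weights, bias):
    """A single sigmoid unit standing in for the ANN (an assumption)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def scale_features(features, probability):
    """Multiply the feature vector by the ANN output."""
    return [probability * x for x in features]


# Hypothetical vocabulary and untrained weights, for illustration only.
VOCAB = ["aa", "ha", "ai", "th", "he", "in"]
rng = random.Random(0)
WEIGHTS = [rng.uniform(-1.0, 1.0) for _ in VOCAB]
BIAS = 0.0

features = char_ngram_features("aap kaise hain", VOCAB)
p = ann_probability(features, WEIGHTS, BIAS)
scaled = scale_features(features, p)  # would be the autoencoder's input
```

In this sketch `scaled` has the same length as `VOCAB`, and each entry is the corresponding feature weighted by the ANN's confidence. In a trained system the weights would be learned rather than drawn at random.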