ArCAR: A Novel Deep Learning Computer-Aided Recognition for Character-Level Arabic Text Representation and Recognition

Arabic text classification is the task of assigning Arabic text content to its proper category. In this paper, a novel deep learning Arabic text computer-aided recognition (ArCAR) system is proposed to represent and recognize Arabic text at the character level. The input Arabic text is quantized into a 1D vector for each Arabic character, and these vectors together form the 2D array that serves as input to the ArCAR system. The ArCAR system is validated with 5-fold cross-validation on two applications: Arabic text document classification and Arabic sentiment analysis. For document classification, the ArCAR system achieves its best performance on the Alarabiya-balance dataset, with overall accuracy, recall, precision, and F1-score of 97.76%, 94.08%, 94.16%, and 94.09%, respectively. For Arabic sentiment analysis, ArCAR achieves its best performance on the balanced version of the hotel Arabic reviews dataset (HARD), with overall accuracy and F1-score of 93.58% and 93.23%, respectively. These results suggest that the proposed ArCAR offers a practical solution for accurate Arabic text representation, understanding, and classification.
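The character-level quantization described above can be illustrated with a minimal sketch. It assumes a one-hot encoding per character, which is a common choice for character-level models but is not confirmed by the abstract. The alphabet (the 28 basic Arabic letters plus a space), the fixed length `max_len`, and the names `ALPHABET`, `CHAR_INDEX`, and `quantize` are hypothetical and chosen only for illustration; they are not the settings used in ArCAR.

```python
# Hypothetical alphabet: the 28 basic Arabic letters plus a space.
# The alphabet actually used by ArCAR is not given in the abstract.
ALPHABET = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي "
CHAR_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def quantize(text, max_len=16):
    """Encode text as a max_len x len(ALPHABET) matrix of 0/1 values.

    Each row is the 1D vector for one character (one-hot is an assumption).
    Characters outside the alphabet map to an all-zero row, and texts
    shorter than max_len are padded with all-zero rows.
    """
    matrix = []
    for ch in text[:max_len]:  # truncate texts longer than max_len
        row = [0] * len(ALPHABET)
        idx = CHAR_INDEX.get(ch)
        if idx is not None:
            row[idx] = 1
        matrix.append(row)
    while len(matrix) < max_len:  # pad to a fixed number of rows
        matrix.append([0] * len(ALPHABET))
    return matrix


if __name__ == "__main__":
    encoded = quantize("كتاب", max_len=6)
    print(len(encoded), "rows x", len(encoded[0]), "columns")
```

Stacking the per-character vectors row by row yields the fixed-size 2D array that a convolutional network could take as input; a real system would likely also need to handle diacritics, digits, and punctuation.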
