Multi-label Scientific Document Classification

Identifying the labels of scientific documents is a significant research area with numerous applications, such as digital libraries. Authors assign one or more categories to their documents manually, and these categories are organized in a tree-structured taxonomy such as the ACM Computing Classification System (CCS). The problem becomes more complex when a document belongs to multiple categories, and manual assignment grows harder as the number of expected labels increases. Moreover, existing schemes do not achieve high accuracy on real scientific document datasets. A multi-label classification problem can be handled in two ways: by converting it into single-label classification, or by adapting an algorithm to handle multiple labels directly. This research focuses on the conversion approach. We propose a solution inspired by the particle swarm optimization (PSO) algorithm that can assign labels from the taxonomy, and we evaluate a set of similarity measures for document relatedness that are used in the proposed approach. The solution is evaluated on two document datasets retrieved from J.UCS and ACM, achieving an average accuracy of 77 percent in comparison with state-of-the-art algorithms.
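The abstract does not say which conversion method or which similarity measures are used. As a minimal sketch, assuming two standard problem-transformation methods (binary relevance and label powerset) and assuming cosine similarity over raw term-frequency vectors as one example of a document-relatedness measure, the code below shows how a multi-label dataset can be turned into single-label problems. The toy documents, their ACM CCS codes, and all function names are hypothetical, and the PSO-based label assignment itself is not reproduced here.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy dataset: each document carries one or more ACM CCS-style codes.
docs = [
    ("particle swarm optimization for feature selection", {"I.2", "G.1"}),
    ("bayesian classifier for conference papers", {"I.2", "H.3"}),
    ("citation mining in digital libraries", {"H.3"}),
]


def binary_relevance(dataset):
    """Convert one multi-label problem into one binary (yes/no) problem per label."""
    labels = sorted(set().union(*(ls for _, ls in dataset)))
    return {label: [(text, label in ls) for text, ls in dataset] for label in labels}


def label_powerset(dataset):
    """Treat each distinct label combination as a single class (one single-label problem)."""
    return [(text, "+".join(sorted(ls))) for text, ls in dataset]


def cosine_similarity(a, b):
    """Cosine similarity between whitespace-tokenized term-frequency vectors.

    Assumption: cosine is shown only as one example of a relatedness measure.
    """
    ta, tb = Counter(a.split()), Counter(b.split())
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm = sqrt(sum(v * v for v in ta.values())) * sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(sorted(binary_relevance(docs)))  # ['G.1', 'H.3', 'I.2']
    print(label_powerset(docs)[0][1])  # G.1+I.2
    print(round(cosine_similarity(docs[0][0], docs[1][0]), 3))  # 0.183
```

Binary relevance keeps the number of classes small but ignores correlations between labels; label powerset preserves label combinations at the cost of one class per distinct combination, which grows quickly as the number of labels increases.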
