
Automated Classification and Categorization of Mathematical Knowledge

The Mathematics Subject Classification (MSC) is a widely used system for categorizing mathematical papers and knowledge. We present results of machine learning of MSC categories from the full texts of papers in the mathematical digital libraries DML-CZ and NUMDAM. The F1-measure achieved on the classification task for top-level MSC categories exceeds 89%. We describe and evaluate our methods for measuring the similarity of papers in the digital library based on their full texts.
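The abstract names two measurable quantities: an F1-measure for assigning top-level MSC categories, and a full-text similarity between papers. The sketch below illustrates both under stated assumptions; it is not the authors' method. F1 is shown for a single paper's set of MSC codes (the abstract does not say how scores were averaged across categories or papers), and similarity uses TF-IDF weighting with cosine similarity, one common choice that the abstract does not confirm. The toy documents and MSC code sets are hypothetical.

```python
import math
from collections import Counter


def f1_score(gold: set[str], predicted: set[str]) -> float:
    """F1 = 2PR / (P + R) for one paper's set of top-level MSC codes (multi-label)."""
    true_pos = len(gold & predicted)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by raw term frequency times log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, already tokenized "full texts".
papers = [
    "banach space bounded linear operator".split(),
    "bounded linear operator hilbert space spectrum".split(),
    "graph coloring chromatic number".split(),
]
vecs = tfidf_vectors(papers)
print(cosine(vecs[0], vecs[1]))  # positive: the papers share four terms
print(cosine(vecs[0], vecs[2]))  # 0.0: no shared terms
print(f1_score({"46", "47"}, {"47"}))  # precision 1, recall 0.5
```

A real pipeline would add tokenization, stop-word handling, and possibly a dimensionality-reduction step before comparing documents; those choices are omitted here.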

Petr Sojka | Radim Řehůřek
