"Building a search engine for algorithms" by Suppawong Tuarob, Prasenjit Mitra, and C. Lee Giles with Martin Vesely as coordinator

A significant number of scholarly articles in computer science and other disciplines contain algorithms that concisely describe how to solve a wide variety of computational problems. Automatically finding and extracting these algorithms from scholarly digital documents would enable algorithm indexing, search, discovery, and analysis. Currently, only well-known algorithms are cataloged. To find new and cutting-edge algorithms, a user must manually search through a large collection of scholarly documents or author homepages. In this article, we describe an initial prototype of AlgorithmSeer, a system for extracting, indexing, and searching for algorithms in scholarly documents. The initial system has been tested as part of the CiteSeerX digital library and search engine. Current issues and future directions, such as algorithm information extraction and classification, are also discussed.
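The abstract does not say how extraction and indexing are implemented. As a rough illustration only, the sketch below assumes that an algorithm can be spotted by a caption line such as "Algorithm 1: ..." and that the extracted captions are searched through a boolean (AND) inverted index. The regex, `extract_captions`, and `AlgorithmIndex` are hypothetical names invented for this example; they are not part of AlgorithmSeer or CiteSeerX.

```python
import re
from collections import defaultdict

# Hypothetical heuristic: a line like "Algorithm 2: Greedy set cover" is treated
# as the caption of a pseudocode block. This is an illustrative assumption,
# not AlgorithmSeer's actual detector.
CAPTION = re.compile(r"^\s*Algorithm\s+(\d+)\s*[:.]\s*(.+)$", re.IGNORECASE)


def extract_captions(doc_id, text):
    """Return (doc_id, algorithm number, caption text) for each caption-like line."""
    found = []
    for line in text.splitlines():
        m = CAPTION.match(line)
        if m:
            found.append((doc_id, int(m.group(1)), m.group(2).strip()))
    return found


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class AlgorithmIndex:
    """Toy inverted index mapping each term to the algorithms whose captions contain it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> {(doc_id, number), ...}
        self.captions = {}                # (doc_id, number) -> caption text

    def add(self, doc_id, number, caption):
        key = (doc_id, number)
        self.captions[key] = caption
        for term in tokenize(caption):
            self.postings[term].add(key)

    def search(self, query):
        """Return keys of algorithms whose caption contains every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get avoids inserting empty entries into the defaultdict for unknown terms
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


papers = {
    "paper-a": "Intro text\nAlgorithm 1: Linear space longest common subsequence",
    "paper-b": "Algorithm 1. PageRank power iteration\nAlgorithm 2: Greedy set cover",
}
index = AlgorithmIndex()
for doc_id, text in papers.items():
    for doc, num, cap in extract_captions(doc_id, text):
        index.add(doc, num, cap)

print(index.search("longest subsequence"))  # [('paper-a', 1)]
```

A caption regex is the simplest possible detector and will miss pseudocode blocks without captions or match prose that happens to start with "Algorithm 3."; a production system would need a more robust detector and a ranked, rather than boolean, retrieval model.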
