Integrating Projects from Multiple Open Source Code Forges

Much of the data about free, libre, and open source software (FLOSS) development comes from studies of code forges or code repositories used for managing projects. This paper presents a method for integrating data about open source projects by matching projects (entities) across multiple code forges. After a review of the relevant literature, a few of the methods it describes are chosen and applied to the FLOSS domain, including a comparison of some simple scoring systems for pairwise project matches. Finally, the paper describes the limitations of this approach and offers recommendations for future work.
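The abstract does not say which scoring systems were compared, so the following is only an illustrative sketch of what a simple pairwise score for cross-forge project matching might look like. Everything in it is an assumption made for illustration: the record fields (`name`, `url`), the function names, the weights, and the threshold are hypothetical, and name similarity is computed with Python's standard `difflib.SequenceMatcher` rather than any measure taken from the paper.

```python
from difflib import SequenceMatcher
from itertools import product


def normalize(name):
    """Lowercase a project name and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def pair_score(a, b):
    """Score one candidate pair of project records (dicts with 'name' and 'url').

    Hypothetical scoring: name similarity in [0, 1], plus a bonus of 1.0
    when both records list the same non-empty home page URL.
    """
    name_sim = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    same_url = bool(a.get("url")) and a.get("url") == b.get("url")
    return name_sim + (1.0 if same_url else 0.0)


def match_projects(forge_a, forge_b, threshold=1.5):
    """Return (name_a, name_b, score) for every cross-forge pair at or above threshold."""
    matches = []
    for a, b in product(forge_a, forge_b):
        score = pair_score(a, b)
        if score >= threshold:
            matches.append((a["name"], b["name"], score))
    # Highest-scoring pairs first.
    return sorted(matches, key=lambda m: m[2], reverse=True)
```

Comparing every record in one forge against every record in another is quadratic in the number of projects, so a sketch like this would only be practical for small samples; larger data sets would need some way of narrowing the candidate pairs first.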
