Estimating Digitization Costs in Digital Libraries Using DiCoMo

The estimate of digitization costs is a very difficult task. It is difficult to make exact predictions due to the great quantity of unknown factors. However, digitization projects need to have a precise idea of the economic costs and the times involved in the development of their contents. The common practice when we start digitizing a new collection is to set a schedule, and a firm commitment to fulfill it (both in terms of cost and deadlines), even before the actual digitization work starts. As it happens with software development projects, incorrect estimates produce delays and cause costs overdrafts. Based on methods used in Software Engineering for software development cost prediction like COCOMO and Function Points, and using historical data gathered during five years at the Miguel de Cervantes Digital Library, during the digitization of more than 12.000 books, we have developed a method for time and cost estimates named DiCoMo (Digitization Costs Model) for digital content production in general. This method can be adapted to different production processes, like the production of digital XML or HTML texts using scanning and OCR, and undergoing human proofreading and error correction, or for the production of digital facsimiles (scanning without OCR). The accuracy of the estimates improve with time, since the algorithms can be optimized by making adjustments based on historical data gathered from previous tasks.

[1]  E. E. Grant,et al.  Exploratory experimental studies comparing online and offline programming performance , 1968, CACM.

[2]  Barry W. Boehm,et al.  Software Engineering Economics , 1993, IEEE Transactions on Software Engineering.

[3]  John E. Gaffney,et al.  Software Function, Source Lines of Code, and Development Effort Prediction: A Software Science Validation , 1983, IEEE Transactions on Software Engineering.

[4]  Tom DeMarco,et al.  Peopleware: Productive Projects and Teams , 1987 .

[5]  Barry W. Boehm,et al.  Calibrating the COCOMO II Post-Architecture model , 1998, Proceedings of the 20th International Conference on Software Engineering.

[6]  Barry W. Boehm,et al.  Cost models for future software life cycle processes: COCOMO 2.0 , 1995, Ann. Softw. Eng..

[7]  Richard E. Fairley,et al.  Software engineering concepts , 1985, McGraw-Hill series in software engineering and technology.

[8]  Ana Magazinovic Exploring Cost Estimation Inaccuracy - Why do practitioners still fail to predict the actuals? , 2008 .