Editorial for the Inaugural Issue of the ACM Journal of Data and Information Quality (JDIQ)

A growing component of organizational operations today involves the collection, storage, and dissemination of vast volumes of data on an unprecedented scale. This expansion, however, does not come without growing pains. Organizations are often unable to translate this data into meaningful insights that can be used to improve business processes and change the way we work. The reasons for this difficulty can often be traced to issues of data and information quality, involving both problematic symptoms and their underlying causes. Previously collected data can turn out to be inconsistent, inaccurate, incomplete, or out-of-date. Organizations can have inappropriate or conflicting strategies across the “pockets” of an enterprise that interfere with the ability to get the right information to the right stakeholders in the right format at the right place and time. To make matters worse, the boundary of stakeholders is broadening and increasingly involves extended enterprises, often reaching a global interenterprise scale. The time horizon for the use of information likewise becomes an open and moving target.

In recent years, several terms have emerged to refer to these issues, such as Information Quality and Data Quality. We have chosen to name this journal Data and Information Quality to cover the full range of issues, and we will generally use these terms interchangeably.

Complicating matters is the fact that today’s organizations need to do more with their data if they are to compete effectively. Data quality, as measured by fitness for use in a particular application, is a major consideration, and possibly a thorny issue, when discussing topics such as data privacy and protection, data lineage and provenance, enterprise architecture, data mining, data cleaning, and data integration processes such as entity resolution and master data management. Particularly in the area of data integration, organizations must grapple with incomplete customer data, inaccurate or conflicting data, and fuzzy data as they strive to develop measures of confidence for the information produced in this environment.

Even more daunting is the reality that even if organizations get the creation and management of information right for current stakeholders, there is always the prospect of unexpected future stakeholders to consider. How does one ensure that, over the long term, information will remain accessible, trustworthy, and meaningful in the face of rapidly changing computing and storage technologies and corresponding demands for use? What types of models, methods,