Chapter 15 – Using Publicly Available Data

This chapter examines publicly available data sets, considers their value for information integration and exploitation, and shows why acquiring and managing this kind of data is worthwhile. A public data set is a collection of data gathered as a byproduct of a legal or regulatory mandate that requires the registration of some event or transaction. In some cases, personal data supplied directly by individuals is made available in both individual and aggregated forms. There are also public data sets that government bodies make available as a convenience to their constituents, specifically for public use. In any of these cases, depending on the context or source, acquiring and integrating publicly available data can add significant value to internal data sets. There are three major management issues associated with the use of publicly available data: integration, privacy, and lack of structure. The integration issue is similar to general data integration problems. The privacy issue stems from the perception that any organization that collects data about individuals and then tries to exploit that information is invading those individuals' privacy. The structure issue is that much publicly available data does not come in a well-structured form that is easy to adapt.