Large-scale collective intelligence: from in the way to on the way

The amount of available data is exploding, driven by the rapid spread of advanced information technology. Groups and research communities worldwide continuously contribute new data sets. This data is expected to support data-driven decision making, which is crucial for interdisciplinary research, where a comprehensive picture of the subject requires large amounts of data from disparate sources. For example, epidemiological data analysis often relies on knowledge of population dynamics, climate change, migration of species, drug development, etc. Despite its increasing availability, the data still cannot be properly utilized by a research community for efficient decision making. Existing data sources are mostly used for regional comparative efforts; they vary widely in consistency, reliability, and completeness, as well as in data representation format. Managing this data is beyond the capabilities of individual research groups and institutions. Because the data flood is caused by the collective effort of large communities, the related task of large-scale data utilization should also be resolved through collaborative efforts within a large network of researchers. I consider an approach that supports efficient "crowdsourcing" of large-scale information consolidation and utilization tasks. I elaborate on the challenges of developing an infrastructure that engages a large community of researchers to share their data, collectively resolve data discrepancies, and harmonize their efforts in data reliability assessment, data fusion, and data-driven decision making.