On reasoning from data

Our society is currently entering a new phase in which gigabytes of information are becoming readily available for exploration over academic networks, digital libraries, and commercial information services, as well as in proprietary commercial and governmental databases. This development presents a substantial challenge: future intelligent systems must be able to store very large streams of data, summarize and index those streams using concise and efficient models, and subsequently perform very efficient retrieval and reasoning in response to real-time queries and updates. We informally refer to this task as reasoning from data.

Most previous AI research and applications have concentrated on relatively simple operations, for example, highly constrained queries on relatively static, immutable systems of knowledge such as mathematics, chess, and hardware component inventories, where it is possible to abstract rules that can be viewed as true and valid. There are many other domains in which data changes more or less rapidly and in which abstract truths are at best temporary or contingent, for example, robot environments, software environments, demographic databases and public-health data, ecological and economic systems, chemical processes, marketing and point-of-sale databases, financial time series, and video and text databases. In addition, these domains demand very fast responses to unanticipated queries and continuous updates over uncertain, dynamic, interactive, and rapidly changing environments. Such domains present a challenge for purely symbolic, rule-based approaches to AI. For instance, it appears to be difficult to give a formal logical specification of concepts such as an important electronic message, a fair scheduler, an urgent phone call, a good travel package to Hawaii, an intriguing new paper about Bayesian reasoning, a high-risk car, a good real-estate investment, or an interesting economic trend.

Statistical decision theory [Pearl 1988] provides a useful framework for modeling adaptive intelligent agents in stochastic and rapidly evolving domains. Moreover, it provides precise criteria (loss functions, expected utility) to evaluate the performance of such agents. However, when the environment is large, the process of fitting good models (finding maximum a posteriori models or even maximum likelihood models) to data generated by the environment is typically computationally intractable. When the environment is small, we often have trouble gathering sufficient statistics. Thus the models we can effectively devise are rarely accurate, regardless of the size of the environment.
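To make the decision-theoretic criterion concrete (in our notation, not the original's): given evidence e about the state of the environment, an agent chooses the action that maximizes expected utility,

\[
a^*(e) \;=\; \arg\max_{a \in A} \; \sum_{s} P(s \mid e)\, U(a, s),
\]

where U(a, s) is the utility of taking action a when the true state is s. Equivalently, the agent minimizes expected loss, and its performance can be scored by how far it falls short of this optimum.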
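The model-fitting objectives mentioned above can likewise be written explicitly (again, our notation): given data D generated by the environment, a maximum a posteriori model maximizes the posterior, while a maximum likelihood model maximizes the likelihood alone,

\[
\theta_{\mathrm{MAP}} \;=\; \arg\max_{\theta} P(D \mid \theta)\, P(\theta),
\qquad
\theta_{\mathrm{ML}} \;=\; \arg\max_{\theta} P(D \mid \theta).
\]

In a large environment, \(\theta\) ranges over an enormous space of candidate models, so performing either maximization exactly is typically intractable; in a small one, D is too sparse to pin \(\theta\) down.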
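Finally, returning to the reasoning-from-data task sketched at the outset, the following Python fragment is a minimal illustration of one of its ingredients (our own sketch; the class and key names are hypothetical, and no such system is described in the text). It summarizes a numeric stream with a concise, constant-space model, here Welford's running-moments algorithm, and answers queries from the summary alone rather than from the raw data:

```python
from collections import defaultdict

class StreamSummary:
    """Hypothetical sketch: summarize a numeric stream per key with a
    concise model (running count/mean/variance via Welford's algorithm),
    so queries are answered without storing the raw stream."""

    def __init__(self):
        # key -> [count, mean, M2], where M2 accumulates squared deviations
        self._stats = defaultdict(lambda: [0, 0.0, 0.0])

    def update(self, key, value):
        """Absorb one observation in O(1) time and O(1) space per key."""
        s = self._stats[key]
        s[0] += 1
        delta = value - s[1]
        s[1] += delta / s[0]
        s[2] += delta * (value - s[1])

    def query(self, key):
        """Answer a real-time query from the summary alone."""
        count, mean, m2 = self._stats[key]
        variance = m2 / (count - 1) if count > 1 else 0.0
        return {"count": count, "mean": mean, "variance": variance}

# Usage: updates arrive continuously; queries can be answered at any time.
summary = StreamSummary()
for price in [101.2, 99.8, 100.5, 102.1]:
    summary.update("stock_A", price)
print(summary.query("stock_A"))
```

The relevant design property is that each update costs constant time and the summary's size is independent of the stream's length, which is what makes real-time queries over very large, continuously updated streams feasible at all.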