Big data: the case of information systems

In this article, we present the main challenges that "big data" poses to information systems, that is, to the systems responsible for storing and processing data to support decision-making. We first describe two major applications of big data: information retrieval and economic intelligence. We then examine the place of open data and the web within big data, as well as the place the web occupies in science and society. Next, we discuss the computing methods and technologies used to handle big data, focusing on how data is stored, processed and analyzed in order to extract knowledge from it. Finally, we consider the challenges big data poses to businesses and citizens, particularly with regard to data quality and privacy protection.
