Multimodal social intelligence in a real-time dashboard system

Social networks provide one of the most rapidly evolving data sets in existence today, and traditional Business Intelligence applications struggle to take advantage of such data in a timely manner. The BBC SoundIndex, developed by the authors and others, enabled real-time analytics of music popularity using data from a variety of social networks. We present this system as a grounding example of how to overcome the challenges of working with social network data. We discuss a variety of technologies for implementing near real-time data analytics that transform Social Intelligence into Business Intelligence, and we evaluate their effectiveness in the music domain. The SoundIndex project helped to highlight a number of key research areas, including named entity recognition and sentiment analysis in Informal English. It also drew attention to the importance of metadata aggregation in multimodal environments. We explore challenges such as drawing data from a wide set of sources spanning many modalities, developing adjudication techniques to harmonize inputs, and performing deep analytics on extremely challenging Informal English snippets. Ultimately, we seek to provide guidance on developing applications in a variety of domains that allow an analyst to rapidly grasp the evolution of the social landscape, and to show how to validate such a system for a real-world application.
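
As an illustration of what "adjudication techniques to harmonize inputs" could look like in practice, the following is a minimal sketch of a Borda count that merges per-source artist rankings into one list. The abstract does not say which adjudication method SoundIndex used, so the choice of Borda count, the function name `borda_aggregate`, and the source and artist names are all assumptions made for illustration only.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Merge several best-first rankings into one list using a Borda count.

    `rankings` maps a (hypothetical) source name to a list of items ordered
    from most to least popular. Sources may list different subsets of items.
    """
    # Every item seen in any source is a candidate.
    candidates = set()
    for ranking in rankings.values():
        candidates.update(ranking)

    n = len(candidates)
    scores = defaultdict(int)
    for ranking in rankings.values():
        for position, item in enumerate(ranking):
            # Top position earns n - 1 points, the next n - 2, and so on.
            scores[item] += n - 1 - position
        # Items a source does not list receive no points from that source.

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(candidates, key=lambda item: (-scores[item], item))


# Hypothetical input: three sources ranking three placeholder artists.
rankings = {
    "site_a": ["Artist X", "Artist Y", "Artist Z"],
    "site_b": ["Artist Y", "Artist X", "Artist Z"],
    "site_c": ["Artist Y", "Artist Z"],
}
combined = borda_aggregate(rankings)
print(combined)
```

A positional scheme like this is simple to recompute as new data arrives, which suits a near real-time setting, but it treats every source as equally reliable; a real system would likely need to weight sources or filter spam before aggregating.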
