Mining diverse opinions

Network operations that support tactical missions are often characterized by evolving information that must be delivered over bandwidth-constrained communication networks and presented to a social/cognitive network whose human members have limited attention spans and operate under high stress. Most past research on data dissemination has examined syntactic redundancy between data items (e.g., common bit strings, entropy coding, and compression), but only limited work has examined the problem of reducing semantic redundancy with the goal of providing higher-quality information to end users. In this paper, we propose to measure semantic redundancy in large-volume text streams using online topic models and opinion analysis (e.g., topic = Location X and opinion = possible_hazard+, safe_zone-). By suppressing semantically redundant content, one can better utilize bottleneck resources such as bandwidth on a resource-constrained network or the attention time of a human user. However, unlike approaches based on syntactic redundancy (e.g., lossless compression, or lossy compression with small reconstruction errors), an approach based on semantic redundancy must deal with larger inaccuracies (e.g., false positive and false negative probabilities in an opinion classifier). This paper seeks to quantify the effectiveness of a semantic-redundancy-based approach, relative to its syntactic counterparts, as a function of such inaccuracies, and presents a detailed experimental evaluation using realistic information flows collected from an enterprise network with about 1500 users.
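The suppression idea described above can be illustrated with a minimal sketch. It is not the paper's method: it assumes each message already carries a topic label and an opinion label produced by some upstream topic model and opinion classifier (both hypothetical here), and it treats a message as semantically redundant when its (topic, opinion) pair has been seen before. The names `Message` and `suppress_redundant` are introduced only for this illustration.

```python
from typing import Iterable, List, Tuple

# Assumed message shape: (topic, opinion, text). The labels are taken as given;
# in practice they would come from a topic model and an opinion classifier.
Message = Tuple[str, str, str]


def suppress_redundant(stream: Iterable[Message]) -> List[Message]:
    """Keep only the first message for each (topic, opinion) pair."""
    seen = set()
    kept: List[Message] = []
    for topic, opinion, text in stream:
        key = (topic, opinion)
        if key in seen:
            # Same topic and same opinion as an earlier message: suppress it.
            continue
        seen.add(key)
        kept.append((topic, opinion, text))
    return kept


stream = [
    ("Location X", "possible_hazard+", "Smoke reported near Location X."),
    ("Location X", "possible_hazard+", "Location X looks dangerous, avoid."),
    ("Location X", "safe_zone-", "Location X is no longer a safe zone."),
    ("Location Y", "possible_hazard+", "Possible hazard at Location Y."),
]
kept = suppress_redundant(stream)
for topic, opinion, text in kept:
    print(f"{topic} | {opinion} | {text}")
```

In this sketch, a classifier error changes the key: a false label can either suppress a message that carried new information or let a redundant one through, which is the kind of inaccuracy the abstract refers to.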
