Directing Attention in Online Aggregate Sensor Streams via Auditory Blind Value Assignment

Multiparty collaborative applications, in which groups of people act in concert to achieve a real-world goal, are common. In such settings it is useful for a central planning agent to receive online audio-visual information from every participant. As the group grows, however, it becomes impractical to process all of the sensor streams directly: cognitive overload prevents the planner from analyzing each stream for situational awareness. An automatic method is therefore needed to assign a value to each stream and direct the planner's attention to the most valuable ones. We present an audio-based blind value assignment (BVA) method that addresses this problem, together with experiments demonstrating its efficacy. We show that audio BVA produces automatic value judgments that are broadly similar to human value judgments and superior to automatic judgments based on video information.
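
To make the stream-selection idea concrete, the Python sketch below ranks a set of audio streams by a simple per-stream value and hands the top few to the planner. This is an illustration only: the abstract does not specify the BVA scoring function, so the energy-times-novelty score and all function names here (stream_value, rank_streams) are assumptions, not the paper's method.

    import numpy as np

    def stream_value(audio: np.ndarray, frame_len: int = 512) -> float:
        """Score one audio stream (hypothetical stand-in for the BVA measure).

        Value = mean frame energy weighted by spectral novelty, i.e. streams
        that are both loud and changing score highest.
        """
        n_frames = len(audio) // frame_len
        frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
        # Short-time log-magnitude spectra, one row per frame.
        spectra = np.log1p(np.abs(np.fft.rfft(frames, axis=1)))
        # Novelty: mean spectral change between consecutive frames.
        if n_frames > 1:
            novelty = np.mean(np.linalg.norm(np.diff(spectra, axis=0), axis=1))
        else:
            novelty = 0.0
        # Energy: mean RMS over frames.
        energy = np.mean(np.sqrt(np.mean(frames ** 2, axis=1)))
        return float(energy * (1.0 + novelty))

    def rank_streams(streams: dict, top_k: int = 3) -> list:
        """Return the IDs of the top_k highest-value streams for the planner."""
        scores = {sid: stream_value(sig) for sid, sig in streams.items()}
        return sorted(scores, key=scores.get, reverse=True)[:top_k]

    # Example: three synthetic one-second streams at 16 kHz; the loud,
    # constantly changing one should be ranked first.
    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        t = np.linspace(0, 1, 16000)
        streams = {
            "quiet": 0.01 * rng.standard_normal(16000),
            "steady_tone": 0.5 * np.sin(2 * np.pi * 440 * t),
            "active": 0.5 * rng.standard_normal(16000),
        }
        print(rank_streams(streams, top_k=2))

Any real scoring function could be substituted for stream_value; for instance, a novelty measure in the spirit of Foote's audio novelty score would fit the same ranking interface.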
