3D Auditory Scene Visualizer with Face Tracking: Design and Implementation for Auditory Awareness Compensation

This paper presents the design and implementation of a 3D Auditory Scene Visualizer based on the visual information seeking mantra, ``overview first, zoom and filter, then details on demand''. The machine audition system HARK captures 3D sounds with a microphone array, and the natural language processing system SalienceGraph visualizes topic transitions by using discourse salience. The 3D visualizer, implemented in Java 3D, displays topic transitions and each sound stream as a beam originating from the microphones (overview mode), shows temporal snapshots with or without a specified focus area (zoom-and-filter mode), and shows detailed information about a particular sound stream (details-on-demand mode). This three-mode visualization gives the user auditory awareness enhanced by HARK and SalienceGraph. In addition, a face-tracking system automatically infers the user's intention by tracking the user's face. The resulting system enables users to manage and browse auditory scene files effectively, so it should help them cope with the information explosion and compensate for the lack of auditory awareness.
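To make the three-mode interaction concrete, the sketch below outlines how such a viewer might switch between overview, zoom-and-filter, and details-on-demand modes when the face tracker reports that the user is dwelling on a region of the scene. It is a minimal illustration in plain Java: the class and method names (AuditorySceneViewer, SoundStream, onFaceTracked, render) are hypothetical assumptions and do not correspond to the actual HARK, SalienceGraph, or Java 3D APIs used in the system.

// Minimal sketch of the three-mode interaction model described above.
// All names are illustrative assumptions, not the authors' actual APIs.
import java.util.List;

public class AuditorySceneViewer {

    // The three modes follow Shneiderman's mantra:
    // overview first, zoom and filter, then details on demand.
    enum Mode { OVERVIEW, ZOOM_AND_FILTER, DETAILS_ON_DEMAND }

    // Hypothetical record of one separated sound stream (e.g. from a microphone array).
    record SoundStream(int id, double azimuthDeg, double startSec, double endSec, String transcript) {}

    private Mode mode = Mode.OVERVIEW;

    // Hypothetical hook: dwelling gaze from the face tracker advances the mode.
    public void onFaceTracked(double gazeAzimuthDeg, boolean dwelling) {
        if (dwelling && mode == Mode.OVERVIEW) {
            mode = Mode.ZOOM_AND_FILTER;          // user stares at a region: zoom in
        } else if (dwelling && mode == Mode.ZOOM_AND_FILTER) {
            mode = Mode.DETAILS_ON_DEMAND;        // keep dwelling: show one stream's details
        } else if (!dwelling) {
            mode = Mode.OVERVIEW;                 // gaze released: back to overview
        }
        System.out.printf("gaze=%.1f deg -> mode=%s%n", gazeAzimuthDeg, mode);
    }

    // Render the scene according to the current mode.
    public void render(List<SoundStream> streams, double focusAzimuthDeg) {
        switch (mode) {
            case OVERVIEW ->
                // Draw every stream as a beam originating from the microphones.
                streams.forEach(s -> System.out.printf("beam #%d at %.0f deg%n", s.id(), s.azimuthDeg()));
            case ZOOM_AND_FILTER ->
                // Keep only streams near the focus area chosen by the user's gaze.
                streams.stream()
                       .filter(s -> Math.abs(s.azimuthDeg() - focusAzimuthDeg) < 30.0)
                       .forEach(s -> System.out.printf("focused beam #%d%n", s.id()));
            case DETAILS_ON_DEMAND ->
                // Show detailed information (transcript, timing) for the closest stream.
                streams.stream()
                       .min((a, b) -> Double.compare(Math.abs(a.azimuthDeg() - focusAzimuthDeg),
                                                     Math.abs(b.azimuthDeg() - focusAzimuthDeg)))
                       .ifPresent(s -> System.out.printf("stream #%d [%.1f-%.1fs]: %s%n",
                                                         s.id(), s.startSec(), s.endSec(), s.transcript()));
        }
    }

    public static void main(String[] args) {
        AuditorySceneViewer viewer = new AuditorySceneViewer();
        List<SoundStream> streams = List.of(
                new SoundStream(1, -40.0, 0.0, 2.5, "hello"),
                new SoundStream(2,  10.0, 1.0, 3.0, "good morning"));
        viewer.render(streams, 10.0);             // overview
        viewer.onFaceTracked(10.0, true);         // dwell -> zoom and filter
        viewer.render(streams, 10.0);
        viewer.onFaceTracked(10.0, true);         // dwell again -> details on demand
        viewer.render(streams, 10.0);
    }
}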

[1] Tetsuya Ogata, et al. Design and implementation of a robot audition system for automatic speech recognition of simultaneous speech, 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[2] Tetsuya Ogata, et al. Multiple moving speaker tracking by microphone array on mobile robot, 2005, INTERSPEECH.

[3] Jean Rouat, et al. Enhanced robot audition based on microphone array source separation with post-filter, 2004, 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[4] Yuji Matsumoto, et al. Japanese Dependency Analysis using Cascaded Chunking, 2002, CoNLL.

[5] Haiyuan Wu, et al. A Pixel-wise Object Tracking Algorithm with Target and Background Sample, 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[6] E. C. Cherry. Some Experiments on the Recognition of Speech, with One and with Two Ears, 1953, The Journal of the Acoustical Society of America.

[7] Tetsuya Ogata, et al. SalienceGraph: Visualizing Salience Dynamics of Written Discourse by Using Reference Probability and PLSA, 2008, PRICAI.

[8] Guy J. Brown, et al. Computational auditory scene analysis, 1994, Comput. Speech Lang.

[9] Jean Rouat, et al. Robust Recognition of Simultaneous Speech by a Mobile Robot, 2007, IEEE Transactions on Robotics.

[10] Ben Shneiderman, et al. Designing the User Interface, 2013.

[11] B. Shneiderman. Designing the User Interface (3rd Ed.), 1998.