Autonomic User Interface

The goal of autonomic computing is essentially to minimize human supervision: the computer system must, at some level, manage all of its own processes and seek human attention only to resolve high-level issues. In today's computer systems, the interface is by far the most demanding aspect of using a computer; we spend far more time pressing the "backspace" key than replacing broken disks. For a computer system to be truly autonomic, it must therefore possess an "autonomic user interface" (AUI). An autonomic user interface must provide users with a much higher level of service than today's interfaces while at the same time being self-aware, aware of its environment, adaptive to change, and self-healing. While tasks such as word processing and programming can be carried out efficiently with today's inflexible keyboard-and-mouse interface, many tasks that involve querying, controlling, and instructing a computer can be completed more easily with an autonomic, multi-modal interface that accepts speech and visual input and offers richer output modes than the traditional monitor. This paper discusses a natural interface to a computer that uses multiple cameras and microphones as sensors, focusing on ways of achieving autonomic characteristics in such interfaces through multiple sensors, cross-modality in input and output, learning algorithms, and models of the system architecture, the user, and the environment.
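To make the idea of cross-modal input concrete, the sketch below illustrates one simple way such an interface might combine position estimates from a camera-based face tracker and a microphone-array sound-source localizer. This is not taken from the paper; the names and the confidence-weighting scheme are assumptions chosen only to show how one modality can gracefully compensate when another degrades.

```python
# Minimal sketch (hypothetical): confidence-weighted fusion of position
# estimates from two sensing modalities, e.g. a camera-based face tracker
# and a microphone-array localizer. Not the paper's actual method.

from dataclasses import dataclass


@dataclass
class ModalityEstimate:
    x: float           # estimated horizontal position (metres)
    y: float           # estimated depth position (metres)
    confidence: float   # 0.0 (no usable information) .. 1.0 (fully trusted)


def fuse_estimates(estimates: list) -> tuple:
    """Combine per-modality estimates into a single position.

    Each modality contributes in proportion to its confidence, so the
    interface degrades gracefully when one sensor is noisy or unavailable
    (e.g. the camera loses the face, or background noise masks the voice).
    """
    total = sum(e.confidence for e in estimates)
    if total == 0.0:
        raise ValueError("no modality reported a usable estimate")
    x = sum(e.x * e.confidence for e in estimates) / total
    y = sum(e.y * e.confidence for e in estimates) / total
    return x, y


if __name__ == "__main__":
    vision = ModalityEstimate(x=1.20, y=0.95, confidence=0.8)  # face tracker
    audio = ModalityEstimate(x=1.35, y=1.00, confidence=0.4)   # mic array
    print(fuse_estimates([vision, audio]))  # result biased toward vision
```

In practice the confidences themselves would come from the sensors and learned models of the user and the environment; the fixed weights here are purely illustrative.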
