Semi-structured capture and display of telephone conversations

Speech dominates day-to-day communication, but despite technological advances, ordinary conversations have remained outside the realm of computer-supported work. This thesis addresses semi-structured audio, a framework for medium-specific information retrieval of stored voice that does not rely upon knowledge of the actual content. Instead, structure is derived from acoustical information inherent in the stored voice and augmented by user interaction. To demonstrate semi-structured audio, two software applications were constructed: the Listener, a tool for capturing structure while recording telephone conversations, and the Browser, for subsequent browsing of speech fragments. The Listener segments the audio signal, using changes in who is speaking to identify conversational turns, and pause detection to identify phrase boundaries. This conversational structure is dynamically determined during the phone call, and presented in a retrospective display that provides flexible capabilities for marking segments of interest. The Listener and Browser make use of the ChatViewer widget, which supports the display of, and user interaction with, collections of sound and text items called chats. This semi-structured approach makes it practical to retain and access large amounts of recorded speech.

Thesis Supervisor: Chris Schmandt
Title: Principal Research Scientist

This work was supported by Sun Microsystems, Inc.
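The abstract mentions pause detection as the means of finding phrase boundaries but gives no algorithmic detail. The following is only a minimal sketch of one common approach, a short-term energy threshold, and is not claimed to be the method the Listener uses. The function names, frame length, threshold, and minimum pause length are all illustrative assumptions.

```python
from typing import List, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each consecutive, non-overlapping frame.

    Trailing samples that do not fill a whole frame are ignored.
    """
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_pauses(
    samples: List[float],
    frame_len: int = 80,         # assumed: 10 ms at an 8 kHz telephone rate
    threshold: float = 0.01,     # assumed energy level below which a frame is "silent"
    min_pause_frames: int = 3,   # assumed: shorter silent runs are not counted as pauses
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of sufficiently long low-energy runs."""
    pauses = []
    run_start = None
    energies = frame_energies(samples, frame_len)
    # The infinite sentinel closes a silent run that reaches the end of the signal.
    for idx, energy in enumerate(energies + [float("inf")]):
        if energy < threshold:
            if run_start is None:
                run_start = idx
        elif run_start is not None:
            if idx - run_start >= min_pause_frames:
                pauses.append((run_start * frame_len, idx * frame_len))
            run_start = None
    return pauses


def split_segments(
    n_samples: int, pauses: List[Tuple[int, int]]
) -> List[Tuple[int, int]]:
    """Return the (start, end) spans of speech lying between the detected pauses."""
    segments = []
    cursor = 0
    for start, end in pauses:
        if start > cursor:
            segments.append((cursor, start))
        cursor = end
    if cursor < n_samples:
        segments.append((cursor, n_samples))
    return segments


if __name__ == "__main__":
    # Synthetic signal: constant-amplitude "speech", silence, then "speech" again.
    signal = [0.5] * 400 + [0.0] * 400 + [0.5] * 400
    pauses = find_pauses(signal)
    print(pauses)
    print(split_segments(len(signal), pauses))
```

Identifying conversational turns from changes in who is speaking would need a separate cue, such as which side of the telephone line carries the energy. That step is not sketched here because the abstract does not say how it is done.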
