Adaptations of multimodal content in dialog systems targeting heterogeneous devices

Dialog systems that adapt to different user needs and preferences appropriately have been shown to achieve higher levels of user satisfaction [4]. However, it is also important that dialog systems be able to adapt to the user's computing environment, because people are able to access computer systems using different kinds of devices such as desktop computers, personal digital assistants, and cellular telephones. Each of these devices has a distinct set of physical capabilities, as well as a distinct set of functions for which it is typically used. Existing research on adaptation in both hypermedia and dialog systems has focused on how to customize content based on user models [2, 4] and interaction history. Some researchers have also investigated device-centered adaptations that range from low-level adaptations such as conversion of multimedia objects [6] (e.g., video to images, audio to text, image size reduction) to higher-level adaptations based on multimedia document models [1] and frameworks for combining output modalities [3, 5]. However, to my knowledge, no work has been done on integrating and coordinating both types of adaptation interdependently. The primary problem I would like to address in this thesis is how multimodal dialog systems can adapt their content and style of interaction, taking the user, the device, and the dependency between them into account. Two main aspects of adaptability that my thesis considers are: (1) adaptability in content presentation and communication and (2) adaptability in computational strategies used to achieve system's and user's goals. Beside general user modeling questions such as how to acquire information about the user and construct a user model, this thesis also considers other issues that deal with device modeling such as (1) how can the system employ user and device models to adapt the content and determine the right combination of modalities effectively? (2) how can the system determine the right combination of multimodal contents that best suits the device? (3) how can one model the characteristics and constraints of devices? and (4) is it possible to generalize device models based on modalities rather than on their typical categories or physical appearance.