Visual Display, Pointing, and Natural Language: The Power of Multimodal Interaction

1998-01-01

Abstract

Communication in natural settings is effective and robust in part because information obtained through different modalities is integrated within modality-independent representations. While most information is conveyed by verbal language, the speaker's intonation, facial expressions, and gestures also carry semantic and pragmatic content. In many cases, different modalities reinforce each other, as when non-verbal cues stress the most salient concept communicated by an utterance. In other cases, the final meaning is distributed across modalities, as when a gesture is integrated into a verbal utterance to designate a location. Under the latter condition, understanding the meaning requires combining linguistic and para-linguistic cues. This paper discusses how pointing, natural language, and graphical layout should be integrated to enhance the usability of multimodal systems. Its primary goal is to contribute to a predictive model that accounts for communication behaviour during user interaction with intelligent multimodal systems capable of understanding keyboard-mediated natural language and mouse-supported pointing gestures. Its specific goal is to test the effect of different graphical layouts and of user expertise on the production of distinct referent identification strategies. To this end, a comprehensive exploratory analysis of a large corpus of locative acts, collected through simulation, is presented.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/1693