A content annotation language for spoken dialogues

Tovena, L.

This paper describes a content annotation language, called IF (Interchange Format), which is being used for coding dialogues in spontaneous speech and as an information exchange protocol within the speech-to-speech translation systems of the international C-STAR consortium. IF results from an effort to approximate the balance point between two conflicting requirements, high expressive power and reduced formal complexity, so as to preserve the possibility of good-quality translation while pursuing robustness of the processors. IF captures all the pieces of information that are necessary for a conversation to proceed successfully. It may therefore be used as an interlingua, since it need not be supplemented with extra annotations or paired with other types of representation, and as content annotation for dialogue databases, since some fields in its labels readily provide keywords. IF labels consist of four fields: an indication of the speaker's role in the dialogue, the speech act, a list of domain concepts describing the informational focus of the encoded fragment, and a list of attribute-value pairs carrying more specific information.
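The four-field label structure can be illustrated with a minimal sketch. The class name, field names, example values, and the textual rendering used here are assumptions made for illustration; the abstract itself specifies only which four kinds of information a label carries, not a concrete syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IFLabel:
    """Hypothetical container for the four fields of an IF label."""
    speaker: str                                              # speaker role (value codes are assumed)
    speech_act: str                                           # speech act of the fragment
    concepts: List[str] = field(default_factory=list)         # domain concepts (informational focus)
    arguments: Dict[str, str] = field(default_factory=dict)   # attribute-value pairs

    def keywords(self) -> List[str]:
        # Fields that could serve as keywords when indexing a dialogue database.
        return [self.speech_act, *self.concepts]

    def render(self) -> str:
        # Assumed textual form, for illustration only: speaker:act+concept+... (attr=value, ...)
        head = "+".join([self.speech_act, *self.concepts])
        args = ", ".join(f"{k}={v}" for k, v in self.arguments.items())
        base = f"{self.speaker}:{head}"
        return f"{base} ({args})" if args else base


# Illustrative label; the values are invented, not taken from any corpus.
label = IFLabel("c", "request-information", ["price", "room"], {"room-type": "double"})
print(label.render())
print(label.keywords())
```

Keeping the speech act and concepts as separate fields, rather than as one opaque string, is what makes keyword extraction for a dialogue database straightforward in this sketch.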