Universität Bielefeld - Sonderforschungsbereich 360
Eye-movement Research and the Investigation of Dialogue Structure
Thomas Clermont, Hendrik Koesling[1], Marc Pomplun,
Elke Prestin, and Hannes Rieser[2]
CRC 360 ``Situated Artificial Communicators''
University of Bielefeld
P.O. Box 10 01 31
33501 Bielefeld, Germany
(The talk will be given by Hendrik Koesling and Hannes Rieser)
[1] ihkoesli@techfak.uni-bielefeld.de
[2] rieser@lili.uni-bielefeld.de
Abstract
In this talk we report on eye-movement research and the investigation of
dialogue structure as it has been going on at several institutions at the
University of Bielefeld (cf. the reports listed in the bibliography).
Dialogues, one might argue, can be regarded as sequences of turns in which
certain micro- and macro-structures can be distinguished. How
can this be related to the investigation of agents' eye movements? Working
with empirical data, transcripts, and video tapes of two-person,
task-oriented dialogue reveals that agents do not behave as one would expect
from the standpoint of abstract semantics and pragmatics. Most researchers
do know about that, of course, at least since the Russell-Donnellan-Kripke
discussion about speakers' meaning and abstract meaning. Nevertheless, it is
not trivial to find out what speakers do and to start to develop theories
thereof. There has been little research in this area since the 1970s.
As we will see, the investigation of speakers' meaning is difficult
because one has to develop precise methods for observing what speakers
actually do. Although dialogues and video tapes provide only a rough idea of
what is going on, let us start from there. The following observations should
be uncontroversial with respect to these and similar data:
- Speakers select domains of interpretation and use them rather
flexibly. This is important if we want to understand the use of definite
descriptions, anaphora, and all sorts of relational expressions.
- Speakers describe things and situations frequently from an
agent-related perspective.
- The use of descriptive vocabulary, especially non-literal expressions
(tropes) and neologisms, is induced by the domain under discussion. In short:
Specific domains instigate specific wordings.
- The sequence of turns produced in describing the setup of an object
depends on the ontological structure an agent ``casts'' over this object.
- Agents coordinate their wording in order to complete tasks more
efficiently.
We refer to these observations as the ``flexible domain
constraint'' (1), the ``perspective constraint'' (2), the
``domain-description constraint'' (3), the ``ontology constraint'' (4), and
the ``coordination constraint'' (5), respectively.
If these constraints have some initial plausibility, it seems to be
worthwhile finding out more about them. How can we do that? How can we,
e.g., get more reliable information about the mechanisms of flexible domain
selection? One answer is: We try to find out where an agent's focus of
attention lies while he produces speech, say the description of an object. The
area singled out by attention can be taken as the relevant domain, leaving
its traces in the language tokens produced. However, being a mental state,
an agent's attention is not directly observable, hence we
must look for its nearest observable correlate, namely where his eyes rest.
Perhaps one cannot maintain the latter in general, but it seems to be
acceptable for tasks involving the description of objects seen. In
short, we identify the focus of attention with sequences of clustered foveal
fixations.
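As a minimal sketch, grouping raw gaze samples into fixation clusters can be done with a standard dispersion-based criterion; the function names, thresholds, and sample format below are illustrative assumptions, not the procedure actually used in our studies:

```python
def dispersion(window):
    """Spatial spread of a run of (x, y) gaze samples: x-range + y-range."""
    xs = [x for x, _ in window]
    ys = [y for _, y in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def detect_fixations(samples, max_dispersion=30.0, min_duration=3):
    """Group raw gaze samples (x, y) into fixation clusters.

    A run of consecutive samples counts as one fixation when its
    dispersion stays below max_dispersion and it spans at least
    min_duration samples. Thresholds are illustrative placeholders.
    Returns a list of (mean_x, mean_y, n_samples) tuples.
    """
    fixations = []
    start = 0
    while start < len(samples):
        end = start + min_duration
        if end > len(samples):
            break
        if dispersion(samples[start:end]) <= max_dispersion:
            # Grow the cluster while the dispersion stays low.
            while (end < len(samples)
                   and dispersion(samples[start:end + 1]) <= max_dispersion):
                end += 1
            xs = [x for x, _ in samples[start:end]]
            ys = [y for _, y in samples[start:end]]
            fixations.append((sum(xs) / len(xs), sum(ys) / len(ys), end - start))
            start = end
        else:
            start += 1  # no fixation here; slide the window forward
    return fixations
```

Whether the dispersion threshold is stated in screen pixels or degrees of visual angle depends on the eyetracker calibration; the value above is a placeholder.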
In our report we describe three experimental studies involving eyetracking
and the findings they led to: A 2D-blocks-world study, a 2D-``airplane''
study, and, finally, a 3D-``airplane'' study. The currently used
3D-setting seems to be the most promising for future research. Our first 2D-study was
based on task-oriented dialogues, in which an instructor told a constructor
to build up a blocks world as shown in Fig. 1. In the scene used, the
instructor had his blocks world presented on a computer monitor (hence 2D).
Only he was integrated into the eyetracker-setting shown in Fig. 2.
Figure 1: Blocks world
Figure 2: 2D-eyetracker setting involving the instructor of the
task-oriented dialogue
The 2D-study confirmed the intuitively set up constraints reported above. In
addition, new and unexpected observations emerged: The instructor's
eye movements are usually several construction steps ahead whereas the
speech production (the directives produced) ``lags behind''. We may
assume that the instructor's pushing ahead is connected with planning
procedures. Hence we call this constraint the ``asynchrony of planning and
production constraint'' (6). In case of production problems, however,
especially problems concerning word finding or selection of syntax patterns,
the focus of attention remains fixed on the object currently investigated,
the object which gives rise to the problem. We call these cases ``blocked
focus movement'' (7). Moreover, agents can coordinate their foci of
attention: They do so by verbally controlling their eye movements
(``coordination of focus constraint'' (8)).
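The asynchrony constraint (6) suggests a simple quantitative measure: for each object, compare the time of the instructor's first fixation on it with the onset of the utterance mentioning it. A minimal sketch, assuming timestamped event records; all names and the data layout are illustrative:

```python
def planning_production_lag(first_fixations, speech_onsets):
    """Estimate how far gaze runs ahead of speech production.

    first_fixations: {object_id: time_ms of first fixation on the object}
    speech_onsets:   {object_id: time_ms of the utterance mentioning it}
    Returns the mean lead in ms over objects present in both records
    (positive = eyes ahead of voice), or None if there is no overlap.
    The data layout is an illustrative assumption.
    """
    shared = first_fixations.keys() & speech_onsets.keys()
    if not shared:
        return None
    leads = [speech_onsets[obj] - first_fixations[obj] for obj in shared]
    return sum(leads) / len(leads)
```

A consistently positive mean lead would be one way to operationalise constraint (6); a lead near zero for a particular object could flag a ``blocked focus movement'' episode (7).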
The first 2D-study also reveals the connection between focusing and
discourse structure, especially the initiation of new turns by the
instructor: In the default case where neither he nor the constructor faces
major problems, the instructor can proceed unimpeded and produce his next
turn following the ontology constraint. Underlying his new move there will,
of course, be complex attitudinal states concerning the progress of
the task on the constructor's side. (Although this is a very interesting
aspect, we will not discuss mutuality and common ground
further in our report.) In non-default cases various sorts of ``side tracks''
(repairs, side sequences, back-tracking) are produced.
Figure 3: 2D-representation of various perspectives of a toy
airplane serving as the basis for the instructor's directives
The second 2D-study (see Fig. 3) revealed that the perspective constraint and
the ontology constraint are central for focus movement and discourse
production. It also matters whether agents can freely rotate their objects
of interest, since rotation of objects and eye movements are closely
related. Furthermore, eye movements also act as a sort of anticipatory
device for word selection.
Both 2D-studies to be reported yield only imperfect data concerning the
coordination between instructor and constructor whilst they are
carrying out their task. Therefore we have tried to develop a 3D-setting
where the instructor's and the constructor's eye movements can both be
recorded and matched with their speech production. As far as we know, this
is the first 3D-eyetracker setting of this kind in operation (see Fig. 4).
Using it we want to get a clearer understanding of the constraints (1)--(8)
mentioned above. We also hope to show a video sequence in which the
instructor's and the constructor's activities are integrated and provide an
impression about how they organize their interaction. This, however, will
very much depend on whether we can overcome the technical problems with the
eyetracker equipment.
Figure 4: 3D-setting with eyetracking equipment for the
instructor and the constructor
References
General
- [Asher, 1993]
Asher, N. (1993).
- Reference to Abstract Objects in Discourse. Kluwer Academic Publishers.
- [Chierchia, 1995]
Chierchia, G. (1995).
- Dynamics of Meaning---Anaphora, Presupposition, and the Theory
of Grammar. Chicago UP.
- [Clark, 1996]
Clark, H. H. (1996).
- Using Language. Cambridge UP.
- [Just and Carpenter, 1987]
Just, M. and Carpenter, P. (1987).
- The Psychology of Reading and Language Comprehension. Allyn & Bacon.
Selected Research Reports:
- [Clermont et al., 1995a]
Clermont, T., Meier, C., Pomplun, M., Prestin, E., Rieser, H., Ritter, H., and
Velichkovsky, B. (1995a).
- Augenbewegung, Fokus und Referenz. Technical Report 95/8,
SFB 360 ``Situierte Künstliche Kommunikatoren'', Univ. of Bielefeld.
- [Clermont et al., 1995b]
Clermont, T., Meier, C., Pomplun, M., Prestin, E., and Rieser, H. (1995b).
- Focus and Reference. Videofilm on Eye Movements and Focussing.
SFB 360 ``Situierte Künstliche Kommunikatoren'', Univ. of Bielefeld.
- [Essig, 1998]
Essig, K. (1998).
- Messung von binokularen Augenbewegungen in realen und virtuellen
3D-Szenarien. Diplomarbeit, Technische Fakultät der Universität Bielefeld.
- [Heydrich and Rieser, 1994]
Heydrich, W. and Rieser, H. (1994).
- Public Information and Mutual Error. In Kunze, J. and Stoyan, H., editors,
KI-94 Workshops, pages 110--2. Gesellschaft für Informatik: Saarbrücken.
Workshop ``Modellierung epistemischer Propositionen''.
- [Heydrich and Rieser, 1995]
Heydrich, W. and Rieser, H. (1995).
- Public Information and Mutual Error.
Technical Report 95/11, SFB 360 ``Situierte
Künstliche Kommunikatoren'', Univ. of Bielefeld.
- [Meier and Rieser, 1995a]
Meier, C. and Rieser, H. (1995a).
- Modelling Situated Agents' ``Reference Shifts'' in Task-Oriented
Dialogue. In Dreschler-Fischer, L. and Pribbenow, S., editors, KI-95
Activities: Workshops, Posters, Demos, pages 318--21. Gesellschaft für
Informatik: Bonn.
- [Meier and Rieser, 1995b]
Meier, C. and Rieser, H. (1995b).
- Modelling Situated Agents' ``Reference Shifts'' in Task-Oriented
Dialogue.
Technical Report 95/11, SFB 360 ``Situierte
Künstliche Kommunikatoren'', Univ. of Bielefeld.
- [Meier and Rieser, 1996]
Meier, C. and Rieser, H. (1996).
- Perception, Focus and Resolution of Metonymy.
In Gibbon, D., editor, Natural Language Processing and Speech
Technology. Results of the 3rd KONVENS Conference, pages 305--9.
- [Meyer-Fujara and Rieser, 1997]
Meyer-Fujara, J. and Rieser, H. (1997).
- Zur Semantik von Repräsentationsrelationen. Fallstudie Eins zum
SFB-Flugzeug. Technical Report 97/7, SFB 360 ``Situierte
Künstliche Kommunikatoren'', Univ. of Bielefeld.
- [Pomplun et al., 1997]
Pomplun, M., Rieser, H., Ritter, H., and Velichkovsky, B. (1997).
- Augenbewegungen als kognitionswissenschaftlicher
Forschungsgegenstand.
In Kluwe, R., editor, Strukturen und Prozesse intelligenter
Systeme. DUV.
- [Pomplun et al., 1996]
Pomplun, M., Ritter, H., and Velichkovsky, B. (1996).
- Disambiguating Complex Visual Information: Towards Communication of
Personal Views of a Scene.
Perception, 25: 931--948.
- [Pomplun et al., 1994]
Pomplun, M., Velichkovsky, B., and Ritter, H. (1994).
- An Artificial Neural Network for High Precision Eye Movement
Tracking.
In Nebel, B. and Dreschler-Fischer, L., editors, Lecture notes
in artificial intelligence: Proceedings KI-94, pages 63--9. Springer.
- [Rieser, 1997]
Rieser, H. (1997).
- Repräsentations-Metonymie, Perspektive und Koordination in
aufgabenorientierten Dialogen.
In Umbach, C., Grabski, M., and Hörnig, R., editors,
Perspektive in Sprache und Raum, pages 1--26. DUV.
- [Stampe, 1993]
Stampe, D. (1993).
- Heuristic Filtering and Reliable Calibration Methods for Video-Based
Pupil-Tracking Systems.
Behavioral Research Methods, Instruments, and Computers,
25: 137--42.
- [Velichkovsky et al., 1995]
Velichkovsky, B., Pomplun, M., and Rieser, H. (1995).
- Attention and Communication: Eye-Movement-Based Research Paradigms.
In Zangemeister, W. H., Stiehl, H. S., and Freksa, C., editors,
Visual Attention and Cognition. Elsevier.
Anke Weinberger, 1998-11-13, 1998-11-16