Assessing Reliability on Annotations (2):
Statistical Results for the DEIKON Scheme
Andy Lücking and Jens Stegmann
Abstract
This is the second part of a two-report mini-series focussing on issues in the evaluation
of annotations. In this empirically oriented report we document the annotation scheme
used in the DEIKON project, discuss the results obtained in the corresponding reliability
study, and conclude with some suggestions regarding forthcoming versions of the scheme.
Relevant statistical background, theoretical considerations in reliability statistics, and
an evaluation of some pertinent approaches are given in the first, more theoretically
oriented report [Stegmann and Lücking, 2005]. The following points are dealt with in
detail here. We describe the setting that was used to elicit the empirical data. The
annotation scheme under scrutiny is documented and exemplified. Aspects of our theoretical
work in linguistics are mentioned en passant.
Then we present, discuss, and interpret the actual results obtained for our scheme. We
find a high degree of correlation on the exact placement of time-stretched entities (word
and gesture phase boundaries), moderately good agreement on time-related categories that
appeal to structural configurations (e.g. the position of a gesture with respect to the
parts of accompanying speech), but rather weak agreement on the determination of gesture
function. The results for time-based type-i data thus look more promising than those
obtained for the more theoretically framed type-ii categories. However, the type-i results
must not be compared with the type-ii ones on superficial grounds, since the statistics
are of different kinds (correlation vs. agreement, i.e. not chance-adjusted vs.
chance-adjusted) and the results therefore have to be interpreted in different terms.
Finally, we discuss some issues in the future make-up of the annotation scheme with a
focus on its dialogue parts. Our suggestions amount to a shift towards a more
theory-oriented annotation.
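
The contrast between a correlation statistic and a chance-adjusted agreement statistic can
be illustrated with a minimal sketch. The abstract does not name the coefficients used in
the report, so Pearson's r and Cohen's kappa are assumed here purely as familiar
representatives of the two kinds. The boundary times and the category labels
("function_A", "function_B") are invented for illustration and are not DEIKON data.

```python
from collections import Counter
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement expected from each annotator's own label distribution.
    expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical time-based data: boundary placements in seconds, two annotators.
boundaries_1 = [0.40, 1.25, 2.10, 3.05]
boundaries_2 = [0.42, 1.22, 2.15, 3.00]
r = pearson_r(boundaries_1, boundaries_2)

# Hypothetical categorical data: one label per gesture, two annotators.
functions_1 = ["function_A", "function_A", "function_B",
               "function_A", "function_B", "function_A"]
functions_2 = ["function_A", "function_B", "function_B",
               "function_A", "function_A", "function_A"]
kappa = cohen_kappa(functions_1, functions_2)

print(f"correlation r = {r:.3f}, chance-adjusted kappa = {kappa:.3f}")
```

In this toy case the two annotators agree on four of six labels, yet kappa comes out far
lower than the raw proportion because part of that agreement is expected by chance. The
correlation involves no such correction, which is why the two figures are not directly
comparable.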
Anke Weinberger, 2006-05-29,
2006-05-30