Angewandte Sprachwissenschaft und Computerlinguistik
Universität Gießen
Abstract
In the past, several approaches to automatic discourse analysis have been
developed as applications of relational discourse theories which describe
the semantics of discourse. Often, these approaches deal with the analysis
of newspaper articles. When dealing with more complex text types, it seems
reasonable to consider document structure as an additional source of
knowledge about discourse (or textual) semantics. Document syntax (in the
form of logical document structure) and document pragmatics (genre-specific
text type structure) can play a major role in the relational analysis of
complex, linearly ordered texts such as scientific articles. In the SemDok
project, a discourse parser for scientific research articles is being
developed. SemDok is part of the Research Group "Text-technological
modelling of information", funded by the German Research Foundation (DFG),
and is scheduled to run in its second phase for three years (2005-2008).
While relational discourse parsing is traditionally based
on the analysis of discourse connectives as well as morphological and
sentence-syntactic features, in SemDok we additionally aim to describe and
process properties of the logical document structure, the thematic
structure, and the text type structure as (abstract) discourse markers.
Text-technological (XML-based) formalisms and methods are employed to
represent and process the various knowledge sources and the grammatical and
text-linguistic analyses on different levels.
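
The idea of treating logical document structure as an additional source of
discourse markers can be illustrated with a minimal sketch. It is not the
SemDok parser: the element names (article, section, title, para), the
connective lexicon, and the cue labels are hypothetical and chosen only for
illustration. The sketch collects lexical cues (connectives) and structural
cues (section boundaries) from an XML-annotated text in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-lexicon of discourse connectives (illustrative only).
CONNECTIVES = {"however": "CONTRAST", "because": "CAUSE", "therefore": "RESULT"}

# Hypothetical sample document; the element names are assumptions.
SAMPLE = """<article>
  <section>
    <title>Introduction</title>
    <para>Parsing is hard. However, structure helps.</para>
  </section>
  <section>
    <title>Method</title>
    <para>We use XML because it is explicit.</para>
  </section>
</article>"""


def collect_cues(xml_text):
    """Return (kind, value, label) tuples in document order."""
    root = ET.fromstring(xml_text)
    cues = []
    for section in root.iter("section"):
        # Structural cue: a section boundary acts as an abstract marker.
        title = section.findtext("title", default="")
        cues.append(("structural", title, "SECTION_START"))
        # Lexical cues: connectives found in the section's own paragraphs.
        for para in section.findall("para"):
            for token in (para.text or "").split():
                word = token.strip(".,;:").lower()
                if word in CONNECTIVES:
                    cues.append(("lexical", word, CONNECTIVES[word]))
    return cues


if __name__ == "__main__":
    for cue in collect_cues(SAMPLE):
        print(cue)
```

A relational parser could then weigh both kinds of cues when proposing
relations between segments; how the cues are actually combined is left open
here.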