Discourse relations and document structure

Henning Lobin

Angewandte Sprachwissenschaft und Computerlinguistik
Universität Gießen

In the past, several approaches to automatic discourse analysis have been developed as applications of relational discourse theories, which describe the semantics of discourse. These approaches often deal with the analysis of newspaper articles. For more complex text types, it seems reasonable to treat document structure as an additional source of knowledge about discourse (or textual) semantics. Document syntax (in the form of logical document structure) and document pragmatics (genre-specific text type structure) can play a major role in the relational analysis of complex, linearly ordered texts such as scientific articles. The project SemDok is part of the Research Group "Text-technological modelling of information", funded by the German Research Foundation (DFG), and is scheduled to run in its second phase for three years (2005–2008). In SemDok, a discourse parser for scientific research articles is being developed. Relational discourse parsing is traditionally based on the analysis of discourse connectives and on morphological and sentence-syntactic features. In SemDok, we additionally aim to describe and process properties of the logical document structure, the thematic structure, and the text type structure as (abstract) discourse markers. Text-technological (XML-based) formalisms and methods are employed to represent and process the various knowledge sources and the grammatical and text-linguistic analyses on different levels.
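The idea of treating logical document structure as an abstract discourse marker can be illustrated with a minimal sketch. It assumes a hypothetical XML vocabulary (`article`, `abstract`, and `section` with a `type` attribute) and a hypothetical mapping from structural units to RST-style relation labels. Neither is SemDok's actual schema or relation inventory, and the function `structural_relations` is illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-article encoding logical document structure.
# Element and attribute names are assumptions, not SemDok's schema.
DOC = """
<article>
  <abstract><para>We present a parser.</para></abstract>
  <section type="introduction"><para>Discourse parsing is hard.</para></section>
  <section type="method"><para>We combine several cues.</para></section>
  <section type="conclusion"><para>Structure helps.</para></section>
</article>
"""

# Hypothetical mapping: a structural unit acts as an abstract discourse
# marker suggesting an RST-style relation between that unit (satellite)
# and the main body of the article (nucleus).
STRUCTURAL_MARKERS = {
    "abstract": "Summary",
    "introduction": "Background",
    "conclusion": "Evaluation",
}


def structural_relations(xml_text):
    """Return (unit, relation) pairs suggested by document structure alone."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for child in root:
        # Sections are identified by their type attribute, other units by tag name.
        unit = child.get("type", child.tag)
        relation = STRUCTURAL_MARKERS.get(unit)
        if relation is not None:  # units without a mapping give no structural cue
            pairs.append((unit, relation))
    return pairs


if __name__ == "__main__":
    for unit, relation in structural_relations(DOC):
        print(f"{unit}: {relation}")
```

In a full parser, such structural cues would complement, rather than replace, the evidence from discourse connectives and from morphological and sentence-syntactic features.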

