Universität Bielefeld - Sonderforschungsbereich 360

Automatic Extraction of Lexico-Syntactic
Information from Treebanks

Prof. Gerald Gazdar, University of Sussex, Brighton, UK

Wednesday, December 10th, 1997
16.15 - 17.45 h
Lecture Hall 8 (Hörsaal 8)


Gerald Gazdar is one of the best-known linguists and computational linguists internationally, and his work has had a decisive influence on the development of both fields over the past twenty years. His major contributions include work on logical pragmatics in the 1970s, the explicit falsification of basic assumptions of Chomskyan transformational grammar, the development of Generalised Phrase Structure Grammar (GPSG), the main precursor of Head-driven Phrase Structure Grammar (HPSG), and the development of DATR, a widely used formalism for representing lexical knowledge.

More recently, Gerald Gazdar has turned to the problem of acquiring linguistic knowledge from corpora, which is the topic of this lecture. The lecture is based on joint work with Robert Gaizauskas and Diana McCarthy:

Most (morphological) words occur in more than one kind of syntactic environment. 'Eat', for example, can be used (at least) transitively and intransitively. Suppose we want to know
  1. which classes of syntactic environment the most commonly used words actually occur in, and
  2. how frequently they occur in each.
Then the obvious resource to exploit is a treebank.
The talk describes work that automatically extracts verbal subcategorization frames (and their frequencies) from the Penn Treebank (PTB-II), a task that turned out to be harder than the authors had originally envisaged.
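To make the task concrete, here is a deliberately naive sketch of the basic idea, assuming PTB-II-style bracketed trees as input. It is not the authors' method; the helper names (`parse`, `base`, `frames`) are hypothetical. For every VP it takes the first child tagged VB* as the head verb and records the categories of the sisters to its right (with function tags such as -SBJ or -LOC stripped) as that verb's frame, counting how often each frame occurs.

```python
from collections import Counter, defaultdict

def parse(s):
    """Parse a Penn-style bracketed string into (label, children) tuples; words are plain strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # consume "("
        # PTB files wrap each sentence in an unlabelled outer bracket: "( (S ...) )"
        if tokens[pos] == "(":
            label = ""
        else:
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()

def base(label):
    """Strip PTB-II function tags and indices, e.g. 'NP-SBJ-1' -> 'NP'."""
    if label.startswith("-"):  # special labels such as -NONE- or -LRB-
        return label
    return label.split("-")[0].split("=")[0]

def frames(tree, counts):
    """Naively record, for each VP, the head verb and the categories of its right sisters."""
    label, children = tree
    if base(label) == "VP":
        kids = [c for c in children if isinstance(c, tuple)]
        for i, (lab, sub) in enumerate(kids):
            if base(lab).startswith("VB") and sub and isinstance(sub[0], str):
                verb = sub[0].lower()
                frame = tuple(base(l) for l, _ in kids[i + 1:])
                counts[verb][frame] += 1
                break
    for c in children:
        if isinstance(c, tuple):
            frames(c, counts)

# Toy input: three invented sentences, not data from PTB-II.
counts = defaultdict(Counter)
for s in [
    "( (S (NP-SBJ (PRP She)) (VP (VBD ate) (NP (DT an) (NN apple))) (. .)) )",
    "( (S (NP-SBJ (PRP He)) (VP (VBD ate)) (. .)) )",
    "( (S (NP-SBJ (PRP They)) (VP (VBD ate) (NP (NN lunch)) (PP-LOC (IN in) (NP (NN town)))) (. .)) )",
]:
    frames(parse(s), counts)

print(dict(counts["ate"]))  # prints {('NP',): 1, (): 1, ('NP', 'PP'): 1}
```

Even on this toy input the sketch hints at why the real task is not straightforward: the locative PP is counted as part of the frame although it is arguably an adjunct, and the sketch makes no attempt to handle empty categories (-NONE-), auxiliaries, or any other complications a full treebank presents.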

In addition to giving an account of these difficulties and presenting some results, part of the talk will explain why knowing the relative frequencies of the verbal subcategorization frames in PTB-II might interest even those who have no concern with building efficient parsers for Wall Street Journal text.


Anke Weinberger, 1997-11-18