More recently Gerald Gazdar has been concerned with problems of linguistic knowledge acquisition from corpora, the topic of this lecture. The lecture is based on work done together with Robert Gaizauskas and Diana McCarthy:
Most (morphological) words occur in more than one kind of syntactic environment. Thus 'eat', for example, can be used (at least) transitively and intransitively. Suppose we want to know
- what classes of syntactic environment the most commonly used words do occur in, and
- how frequently.
Then the obvious resource to exploit is a treebank. This paper describes work that automatically extracts verbal subcategorization frames (and their frequencies) from the Penn Treebank (PTB-II), a task which turns out to be less easy to do than the authors originally envisaged.
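To make the idea of reading frames off a treebank concrete, here is a minimal sketch in Python. It is not the authors' extraction procedure. It assumes PTB-style bracketed trees and takes the categories of the sisters that follow a lexical verb inside its VP as a crude stand-in for the verb's frame. The function names (`parse`, `frames`, `count_frames`) and the sample sentences are hypothetical, invented for illustration.

```python
from collections import Counter

def parse(s):
    """Parse one bracketed tree into nested (label, children) tuples.
    Leaves (words) are plain strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()

def category(label):
    """Strip function tags, e.g. NP-SBJ -> NP (assumed tag convention).
    Labels that start with "-" (such as -NONE-) are left unchanged."""
    return label if label.startswith("-") else label.split("-")[0]

def frames(tree):
    """Yield (verb, frame) pairs, where frame is the tuple of sister
    categories to the right of the first VB* daughter of a VP.
    Simplifications: no lemmatization, no argument/adjunct distinction."""
    label, children = tree
    if label == "VP":
        for i, child in enumerate(children):
            if isinstance(child, tuple) and child[0].startswith("VB"):
                verb = child[1][0].lower()
                sisters = tuple(category(c[0]) for c in children[i + 1:]
                                if isinstance(c, tuple))
                yield verb, sisters
                break
    for child in children:
        if isinstance(child, tuple):
            yield from frames(child)

def count_frames(tree_strings):
    """Count (verb, frame) pairs over an iterable of bracketed trees."""
    counts = Counter()
    for s in tree_strings:
        counts.update(frames(parse(s)))
    return counts

# Invented sample trees, not taken from PTB-II.
SAMPLE = [
    "(S (NP (PRP They)) (VP (VBP eat) (NP (NNS apples))))",
    "(S (NP-SBJ (PRP I)) (VP (VBP eat) (NP (NN rice))))",
    "(S (NP (PRP They)) (VP (VBP eat)))",
    "(S (NP (PRP We)) (VP (VBP eat) (NP (NN lunch)) (PP (IN at) (NP (NN noon)))))",
]

counts = count_frames(SAMPLE)
for (verb, frame), n in counts.items():
    print(verb, frame, n)
```

On the invented sample, the transitive use of 'eat' is counted twice and the intransitive use once. The fourth tree is recorded as a separate NP PP frame even though the PP is a temporal modifier rather than a complement, which hints at why naive sister-counting is only a rough first approximation.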
In addition to giving an account of the difficulties and exhibiting some results, part of the talk will be spent on explaining why knowledge of the relative frequencies of the verbal subcategorization frames in PTB-II might be of interest even to people who have no concern with developing efficient parsers for Wall Street Journal text.