Semantic Dependency Relations

(This is a copy of my blog post made on the WordPress OpenCog Brainwave blog on 5th of October 2009 -- Linas Vepstas.)

I spent the weekend comparing the Stanford parser to RelEx, and learned a lot. RelEx really does deserve to be called a “semantic relation extractor”, and not just a “dependency relation extractor”. It provides a more abstract, more semantic output than the Stanford parser, which sticks very narrowly to the syntactic structure of a sentence.

I wrote up a few paragraphs on the most prominent differences; most of my updates were to the RelEx dependency relations page.

Here are the main bullet points:

* RelEx attempts basic entity extraction, and thus avoids generating nn noun modifier relations for named entities.
* RelEx will collapse the object and complement of a preposition into one. Stanford will do this for some, but not all relationships.
* RelEx will convert passive subjects into objects, and instead indicate passiveness by tagging the verb with a passive tense feature.
* RelEx avoids generating copulas, if at all possible, and instead indicates copular relations as predicative adjectives, or in other ways.
* RelEx extracts semantic variables from questions, with the intent of simplifying question answering. For example, “Where is the ball?” generates _pobj(_%atLocation, _$qVar) _psubj(_%atLocation, ball), which can then pattern-match a plausible answer: _pobj(under, couch).
* RelEx attempts to extract comparison variables.
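
To make the question-answering point concrete, here is a minimal Python sketch of how the relations for “Where is the ball?” could be matched against stored relations. This is not RelEx or OpenCog code: the tuple representation, the function names, and the fact list (including the assumed _psubj(under, ball) relation) are all invented for illustration. It also treats _%atLocation as an ordinary variable, ignoring the location typing it presumably carries.

```python
def is_variable(term):
    # Assumption for this sketch: anything starting with "_$" or "_%" is a variable.
    return term.startswith("_$") or term.startswith("_%")

def unify(pattern, fact, bindings):
    """Match one pattern relation against one fact.
    Returns the extended bindings, or None if they do not match."""
    if pattern[0] != fact[0]:          # relation names must agree
        return None
    new = dict(bindings)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_variable(p):
            if new.setdefault(p, f) != f:   # already bound to something else
                return None
        elif p != f:
            return None
    return new

def match(patterns, facts, bindings=None):
    """Backtracking search for bindings that satisfy every pattern."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        return bindings
    first, rest = patterns[0], patterns[1:]
    for fact in facts:
        extended = unify(first, fact, bindings)
        if extended is not None:
            result = match(rest, facts, extended)
            if result is not None:
                return result
    return None

# "Where is the ball?" as quoted above.
question = [("_pobj", "_%atLocation", "_$qVar"),
            ("_psubj", "_%atLocation", "ball")]

# Hypothetical relations from "The cup is on the table." and
# "The ball is under the couch."
facts = [("_pobj", "on", "table"), ("_psubj", "on", "cup"),
         ("_pobj", "under", "couch"), ("_psubj", "under", "ball")]

answer = match(question, facts)
print(answer)
```

With these facts the matcher first tries “on the table”, fails to find the ball there, backtracks, and ends up binding _$qVar to couch and _%atLocation to under.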

It's also clear to me that I could split up the RelEx processing into two stages: one which generates Stanford-style syntactic relations, and a second stage that generates the more abstract stuff. This might be a wise move … Since RelEx is already more than 3x faster than the Stanford parser, this could attract new users.
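
As a rough picture of what such a split might look like, here is a toy Python sketch using the passive-subject case from the list above. Nothing here reflects how RelEx is actually structured. The stage-one output is hand-written rather than produced by a parser, nsubjpass and auxpass are Stanford-style labels, and the ("tense", verb, "passive") tuple is a simplified stand-in for RelEx's feature tagging. Real output would also lemmatize the verb, which this sketch skips.

```python
def stage_one_syntactic():
    # Stand-in for a parser: Stanford-style relations for
    # "The ball was thrown." (hand-written, not real parser output).
    return [("nsubjpass", "thrown", "ball"),
            ("auxpass", "thrown", "was")]

def stage_two_semantic(relations):
    """Rewrite syntactic relations into a more abstract form:
    passive subjects become objects, and the verb gets a passive feature."""
    out = []
    passive_verbs = set()
    for name, head, dep in relations:
        if name == "nsubjpass":
            out.append(("_obj", head, dep))   # passive subject -> object
            passive_verbs.add(head)
        elif name == "auxpass":
            passive_verbs.add(head)           # auxiliary folded into the feature
        else:
            out.append((name, head, dep))     # pass everything else through
    for verb in sorted(passive_verbs):
        out.append(("tense", verb, "passive"))
    return out

semantic = stage_two_semantic(stage_one_syntactic())
print(semantic)
```

A user who only wants the syntactic layer would stop after the first stage; the second stage is a pure rewrite over its output.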