Universal Proposition Banks

One of my main research projects is to semi-automatically create the Universal Proposition Banks, a set of treebanks that enable the training of crosslingual parsers and the study of crosslingual semantics. This effort builds on the Universal Dependencies project and adds a layer of crosslingually unified shallow semantic information.

To illustrate what Universal Proposition Banks are, consider the following sentences:

  1. Letzte Woche habe ich den Futon bestellt. (German)
  2. Vahtimestari tilasi taksin. (Finnish)
  3. 同年12月29日, 美國大陸航空 訂購 10架787. (Mandarin Chinese)

While these sentences have different meanings, they all pertain to a similar action, namely ordering something (a futon, a taxi, or some airplanes). The Universal Proposition Banks formalize these shared semantics. The action of "ordering something", for instance, is marked with the frame label "order.02". The entity that places the order is marked up as A0, and the thing being ordered is marked up as A1.

[Figure: the three sentences above with semantic markup in the Universal Proposition Banks]
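To make the markup concrete, here is a minimal Python sketch of how the three example sentences could be represented. The `Proposition` class and its field names are hypothetical and not part of the Universal Proposition Banks release, and the argument spans are my reading of the sentences rather than the official annotation.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Proposition:
    """One predicate with its frame label and role-labeled arguments (hypothetical type)."""
    predicate: str  # surface form of the verb in the sentence
    frame: str      # English Proposition Bank frame label, e.g. "order.02"
    roles: Dict[str, str] = field(default_factory=dict)  # role label -> argument text


# Assumed argument spans for the three example sentences.
examples = [
    Proposition("bestellt", "order.02", {"A0": "ich", "A1": "den Futon"}),
    Proposition("tilasi", "order.02", {"A0": "Vahtimestari", "A1": "taksin"}),
    Proposition("訂購", "order.02", {"A0": "美國大陸航空", "A1": "10架787"}),
]

# All three sentences share one frame label despite being in different languages.
shared_frames = {p.frame for p in examples}
```

Because every sentence carries the same frame and role inventory, a query such as "what was ordered?" reduces to reading the A1 entry of each `order.02` proposition, regardless of language.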

We formalize these crosslingual shallow semantics using Proposition Bank frame and role labels. Check out version 1.0 of the Universal Proposition Banks, publicly available here!

Two-Step Annotation Projection

We proposed a two-step annotation projection approach to automatically generate the Universal Proposition Banks. Annotation projection relies on parallel corpora that consist of English sentences and their translations in a target language. We use an English SRL system to predict semantic frame and role labels for the English sentences. Then, we project this annotation along word alignments to the target language side. This produces a target language sentence with English frame and role labels.
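The projection step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our actual pipeline: the function `project_labels` is hypothetical, the English sentence is an assumed translation of the Finnish example, and the English labels and word alignments are written by hand where a real system would use an SRL model and a word aligner.

```python
from typing import Dict, List, Optional


def project_labels(
    source_labels: List[Optional[str]],
    alignments: Dict[int, int],
    target_length: int,
) -> List[Optional[str]]:
    """Copy each labeled source token's label to the target token it is aligned with."""
    target_labels: List[Optional[str]] = [None] * target_length
    for src_idx, label in enumerate(source_labels):
        if label is None:
            continue  # unlabeled source tokens contribute nothing
        tgt_idx = alignments.get(src_idx)
        if tgt_idx is not None:  # unaligned source tokens are skipped
            target_labels[tgt_idx] = label
    return target_labels


# Assumed English translation with hand-written labels (stand-in for SRL output).
english_tokens = ["The", "janitor", "ordered", "a", "taxi"]
english_labels = [None, "A0", "order.02", None, "A1"]

finnish_tokens = ["Vahtimestari", "tilasi", "taksin"]

# Hand-written word alignments: English token index -> Finnish token index.
alignments = {1: 0, 2: 1, 4: 2}

projected = project_labels(english_labels, alignments, len(finnish_tokens))
```

In this toy case every labeled English token has an aligned Finnish counterpart. With non-literal translations that is often not true, which is where filtering becomes necessary.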

In our canonical work, we presented a two-step filtering and bootstrapping approach to address the problem of translation shift in annotation projection, which occurs when a sentence is translated in a non-literal manner. Using this approach, we auto-generated Proposition Banks for 7 languages. The corresponding publication is:

Generating High Quality Proposition Banks for Multilingual Semantic Role Labeling. Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yunyao Li, Shivakumar Vaithyanathan and Huaiyu Zhu. 53rd Annual Meeting of the Association for Computational Linguistics, ACL 2015. [pdf]

Low Resource Languages

We are also interested in text analytics and semantic parsing over low-resource languages.

We applied our approach to Bengali, Malayalam and Tamil, three low-resource languages with over 300 million first-language speakers but almost no NLP resources and tools.

We propose a combination of annotation projection, crowdsourcing and limited expert involvement to create NLP resources for low-resource languages in a cost-effective way. The corresponding publication is:

Towards Semi-Automatic Generation of Proposition Banks for Low-Resource Languages. Alan Akbik and Yunyao Li. 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016.

Multilingual Aliasing

We also experimented with expert frame aliasing to attain better mappings between target languages and English frames. To this end, we defined a two-stage process for the curation of automatically determined mappings in which (1) incorrect mappings are manually removed, and (2) synonymous mappings are manually merged. Our experiments show that this significantly increases the quality of generated Universal Proposition Banks, while also increasing the saliency of the frame lexicon. The corresponding publication is:
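As a rough illustration of the two curation stages, the following Python sketch filters and then merges a set of verb-to-frame mappings. Everything in it is an assumption made for the example: the `curate` function, the representation of mappings as (verb, frame) pairs, and the sample data, which is invented and not taken from our experiments.

```python
from typing import Dict, Set, Tuple

Mapping = Tuple[str, str]  # (target-language verb, English frame label)


def curate(
    mappings: Set[Mapping],
    rejected: Set[Mapping],
    merge_into: Dict[str, str],
) -> Set[Mapping]:
    """Apply expert decisions: (1) drop rejected mappings, (2) merge synonymous frames."""
    # Stage 1: remove mappings the expert marked as incorrect.
    kept = {m for m in mappings if m not in rejected}
    # Stage 2: rewrite each frame to its merged representative, if one was chosen.
    return {(verb, merge_into.get(frame, frame)) for verb, frame in kept}


# Invented example data: automatically determined mappings for two German verbs.
auto_mappings = {
    ("bestellen", "order.02"),
    ("bestellen", "order.01"),
    ("kaufen", "buy.01"),
    ("kaufen", "purchase.01"),
}
rejected = {("bestellen", "order.01")}   # expert judges this mapping incorrect
merge_into = {"purchase.01": "buy.01"}   # expert treats these frames as synonymous

curated = curate(auto_mappings, rejected, merge_into)
```

Representing the expert decisions as plain data (a rejection set and a merge table) keeps the manual judgments separate from the automatically generated mappings, so the curation can be reapplied if the mappings are regenerated.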

Multilingual Aliasing for Auto-Generating Proposition Banks. Alan Akbik, Xinyu Guan and Yunyao Li. 26th International Conference on Computational Linguistics, COLING 2016. [pdf]

Alan Akbik

Text and Data Mining
IBM Research
akbika [ät] us [dot] ibm [dot] com