About

I am a research scientist at Zalando Research, researching advanced text analytics capabilities over large-scale multilingual text data that is often ungrammatical (Web text) and domain-specific. Before this, I was a postdoctoral researcher at IBM Research Almaden in San Jose, California, and before that a research associate at the TU Berlin. My research lies at the intersection of natural language processing (NLP) and information extraction (IE), with a particular focus on multilingual data and models of crosslingual semantics. One of my main research projects is the Universal Proposition Banks, a set of treebanks, currently covering 7 languages, annotated with a layer of crosslingually unified shallow semantics. In addition, I pursue research in semantic parsing, unsupervised induction of semantics, information discovery and language modeling.

If you'd like to know more, check out my publications and my projects, or contact me.


Latest News

  • News (01.03.2017): I'm extremely thrilled to join the newly created Zalando Research lab as research scientist to build up NLP and text analytics capabilities!
  • News (09.12.2016): Version 1.0 of Universal Proposition Banks released! It consists of treebanks in several languages with "universal" semantic role labeling annotation.
  • News (11.10.2016): Will give a guest talk at UC Berkeley on multilingual SRL on November 17th!
  • News (30.09.2016): Demo paper on Multilingual IE accepted at COLING 2016!
  • News (20.09.2016): Two full papers accepted at COLING 2016!
  • News (30.08.2016): Check out a screencast of PolyglotIE, our multilingual Information Extraction system!
  • News (29.07.2016): Paper on Semantic Role Labeling of underresourced languages accepted at EMNLP 2016!
  • News (15.06.2016): Check out a screencast of our Multilingual Semantic Role Labeler!
  • News (23.05.2016): Demonstration paper on Multilingual Semantic Role Labeling with Unified Labels accepted to ACL 2016!


Latest Publications

Multilingual Information Extraction with PolyglotIE. Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yonas Kbrom, Yunyao Li and Huaiyu Zhu. 26th International Conference on Computational Linguistics, COLING 2016. [pdf][video]

K-SRL: Instance-based Learning for Semantic Role Labeling. Alan Akbik and Yunyao Li. 26th International Conference on Computational Linguistics, COLING 2016. [pdf]

Multilingual Aliasing for Auto-Generating Proposition Banks. Alan Akbik, Xinyu Guan and Yunyao Li. 26th International Conference on Computational Linguistics, COLING 2016. [pdf]

Towards Semi-Automatic Generation of Proposition Banks for Low-Resource Languages. Alan Akbik, Vishwajeet Kumar and Yunyao Li. 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016. [pdf]

Polyglot: Multilingual Semantic Role Labeling with Unified Labels. Alan Akbik and Yunyao Li. 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016. [pdf]

Generating High Quality Proposition Banks for Multilingual Semantic Role Labeling. Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yunyao Li, Shivakumar Vaithyanathan and Huaiyu Zhu. 53rd Annual Meeting of the Association for Computational Linguistics, ACL 2015. [pdf]

more publications


Main Research

Universal Proposition Banks. In this line of research, I am investigating methods for semantically parsing text data in a wide range of languages, such as Arabic, Chinese, German, Hindi, Russian and many others. In order to train such parsers, we are automatically generating Proposition Bank-style resources from parallel corpora. We are making all resources publicly available, so check out the project overview page for more details and the generated Proposition Banks.

Multilingual Text Analytics. Text data is readily available in a multitude of human languages; on the Web and elsewhere, trends point to a relative decline of English and a rise in the use of non-English languages. Effectively mining such data for structured information of interest is a huge challenge, since traditionally, separate NLP pipelines and extractors need to be built for each language. In this research, I am investigating more cost-effective ways of creating high-quality extractors for multilingual data. Check out the project overview page for more details.


Alan Akbik

Text and Data Mining
Zalando Research
alan [dot] akbik [ät] zalando [dot] de