About

I am a senior research scientist at Zalando Research, where I work on deep learning methods for text analytics over large-scale multilingual text data. Before this, I was a researcher at IBM Research Almaden in San Jose, California, and before that a research associate at the TU Berlin. My research lies at the intersection of natural language processing (NLP) and information extraction (IE), with a particular focus on multilingual data and models of crosslingual semantics.

My current research focus is Flair, a new approach for state-of-the-art NLP. Here, we leverage large-scale neural language modeling at the character level to enable sequence labeling and text classification approaches that outperform previous approaches on core NLP tasks such as shallow syntactic parsing and named entity recognition. Check it out!

If you'd like to know more, check out my publications, the Flair NLP project, the Universal Proposition Banks, or contact me.


Latest News

  • News (24.06.2018): We are open-sourcing Flair, our very simple framework for state-of-the-art NLP! Try it out!
  • News (20.05.2018): Full paper accepted at COLING 2018, presenting a new type of word embedding and a new state of the art in NER!
  • News (20.12.2017): Two papers accepted at LREC 2018, on multimodal and multilingual resources released by us!
  • News (16.10.2017): We are happy to announce the ZAP annotation projection framework, which is now open source!
  • News (10.10.2017): I will be an invited speaker at the Data Natives conference!
  • News (30.06.2017): Two papers on annotation projection and semantic role labeling accepted at EMNLP 2017!


Latest Publications

Contextual String Embeddings for Sequence Labeling. Alan Akbik, Duncan Blythe and Roland Vollgraf. 27th International Conference on Computational Linguistics, COLING 2018. [pdf]

ZAP: An Open-Source Multilingual Annotation Projection Framework. Alan Akbik and Roland Vollgraf. 11th Language Resources and Evaluation Conference, LREC 2018. [pdf]

FEIDEGGER: A Multi-modal Corpus of Fashion Images and Descriptions in German. Leonidas Lefakis, Alan Akbik and Roland Vollgraf. 11th Language Resources and Evaluation Conference, LREC 2018. [pdf]

The Projector: An Interactive Annotation Projection Visualization Tool. Alan Akbik and Roland Vollgraf. 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017. [pdf][video]

CROWD-IN-THE-LOOP: A Hybrid Approach for Annotating Semantic Roles. Chenguang Wang, Alan Akbik, Laura Chiticariu, Yunyao Li, Fei Xia, Anbang Xu. 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017. [pdf]

Multilingual Information Extraction with PolyglotIE. Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yonas Kbrom, Yunyao Li and Huaiyu Zhu. 26th International Conference on Computational Linguistics, COLING 2016. [pdf][video]

K-SRL: Instance-based Learning for Semantic Role Labeling. Alan Akbik and Yunyao Li. 26th International Conference on Computational Linguistics, COLING 2016. [pdf]

Multilingual Aliasing for Auto-Generating Proposition Banks. Alan Akbik, Xinyu Guan and Yunyao Li. 26th International Conference on Computational Linguistics, COLING 2016. [pdf]

Towards Semi-Automatic Generation of Proposition Banks for Low-Resource Languages. Alan Akbik, Vishwajeet Kumar and Yunyao Li. 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016. [pdf]

Polyglot: Multilingual Semantic Role Labeling with Unified Labels. Alan Akbik and Yunyao Li. 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016. [pdf]

more publications


Main Research

Flair NLP. My main current line of research focuses on new neural approaches to core NLP tasks. In particular, we present an approach that leverages character-level neural language modeling to learn latent representations that encode "general linguistic and world knowledge". These representations are then used as word embeddings to set new state-of-the-art scores for classic NLP tasks such as multilingual named entity recognition and part-of-speech tagging. Check out the project overview page for more details.

Universal Proposition Banks. In this line of research, I am investigating methods for semantically parsing text data in a wide range of languages, such as Arabic, Chinese, German, Hindi, Russian and many others. In order to train such parsers, we are automatically generating Proposition Bank-style resources from parallel corpora. We are making all resources publicly available, so check out the project overview page for more details and the generated Proposition Banks.

Alan Akbik

Text and Data Mining
Zalando Research
alan [dot] akbik [ät] zalando [dot] de

(BTW: we are hiring!)