I am a senior research scientist at Zalando Research, where I research deep learning methods for text analytics over large-scale multilingual text data. Before this, I was a researcher at IBM Research Almaden in San Jose, California, and before that a research associate at TU Berlin. My research lies at the intersection of natural language processing (NLP) and information extraction (IE), with a particular focus on multilingual data and models of crosslingual semantics.
My current research focus is Flair, a new approach to state-of-the-art NLP. Here, we leverage large-scale neural language modeling at the character level to build sequence labeling and text classification approaches that set new state-of-the-art results on core NLP tasks such as shallow syntactic parsing and named entity recognition. Check it out!
Contextual String Embeddings for Sequence Labeling. Alan Akbik, Duncan Blythe and Roland Vollgraf. 27th International Conference on Computational Linguistics, COLING 2018. [pdf]
ZAP: An Open-Source Multilingual Annotation Projection Framework. Alan Akbik and Roland Vollgraf. 11th Language Resources and Evaluation Conference, LREC 2018. [pdf]
FEIDEGGER: A Multi-modal Corpus of Fashion Images and Descriptions in German. Leonidas Lefakis, Alan Akbik and Roland Vollgraf. 11th Language Resources and Evaluation Conference, LREC 2018. [pdf]
CROWD-IN-THE-LOOP: A Hybrid Approach for Annotating Semantic Roles. Chenguang Wang, Alan Akbik, Laura Chiticariu, Yunyao Li, Fei Xia and Anbang Xu. 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017. [pdf]
Multilingual Information Extraction with PolyglotIE. Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yonas Kbrom, Yunyao Li and Huaiyu Zhu. 26th International Conference on Computational Linguistics, COLING 2016. [pdf][video]
K-SRL: Instance-based Learning for Semantic Role Labeling. Alan Akbik and Yunyao Li. 26th International Conference on Computational Linguistics, COLING 2016. [pdf]
Multilingual Aliasing for Auto-Generating Proposition Banks. Alan Akbik, Xinyu Guan and Yunyao Li. 26th International Conference on Computational Linguistics, COLING 2016. [pdf]
Towards Semi-Automatic Generation of Proposition Banks for Low-Resource Languages. Alan Akbik, Vishwajeet Kumar and Yunyao Li. 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016. [pdf]
Polyglot: Multilingual Semantic Role Labeling with Unified Labels. Alan Akbik and Yunyao Li. 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016. [pdf]
Flair NLP. My main current line of research focuses on new neural approaches to core NLP tasks. In particular, we present an approach that leverages character-level neural language modeling to learn latent representations that encode "general linguistic and world knowledge". These representations are then used as word embeddings to set new state-of-the-art scores on classic NLP tasks such as multilingual named entity recognition and part-of-speech tagging. Check out the project overview page for more details.
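To make the idea more concrete, here is a minimal sketch of how a word embedding can be read off character-level language models, roughly following the description in the COLING 2018 paper: the forward LM's hidden state at the word's last character is concatenated with the backward LM's hidden state at the word's first character. The `toy_char_lm` function is a hypothetical stand-in (a running average over character codes), not a trained neural LM, and the exact state indexing is a simplifying assumption.

```python
import re
from typing import Callable, List, Tuple

Vector = List[float]


def toy_char_lm(text: str, dim: int = 4, decay: float = 0.5) -> List[Vector]:
    """Hypothetical stand-in for a character-level language model.

    Returns one hidden state per character. Each state is an exponential
    moving average of simple character features, so it depends on every
    character read so far (a real model would use a trained LSTM instead).
    """
    h = [0.0] * dim
    states = []
    for ch in text:
        feats = [((ord(ch) * (k + 1)) % 97) / 97 for k in range(dim)]
        h = [decay * h[k] + (1 - decay) * feats[k] for k in range(dim)]
        states.append(list(h))
    return states


def contextual_string_embeddings(
    sentence: str, lm: Callable[[str], List[Vector]] = toy_char_lm
) -> List[Tuple[str, Vector]]:
    # Forward LM reads left to right: fwd[i] has seen characters 0..i.
    fwd = lm(sentence)
    # Backward LM reads right to left; reversing its outputs aligns them
    # so that bwd[i] has seen characters i..n-1.
    bwd = lm(sentence[::-1])[::-1]
    embeddings = []
    for match in re.finditer(r"\S+", sentence):
        start, end = match.start(), match.end()
        # Forward state at the word's last character (word + left context)
        # concatenated with the backward state at its first character
        # (word + right context).
        embeddings.append((match.group(), fwd[end - 1] + bwd[start]))
    return embeddings


if __name__ == "__main__":
    for word, vec in contextual_string_embeddings("the cat sat"):
        print(word, [round(x, 3) for x in vec])
```

Because every state depends on the surrounding characters, the same word (`cat` in "the cat sat" vs. "a cat ran") receives different vectors, which is what makes the embeddings contextual. In practice, the Flair library ships pretrained forward and backward character LMs through its `FlairEmbeddings` class (for example `FlairEmbeddings('news-forward')`).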
Universal Proposition Banks. In this line of research, I am investigating methods for semantically parsing text data in a wide range of languages, such as Arabic, Chinese, German, Hindi, Russian and many others. In order to train such parsers, we are automatically generating Proposition Bank-style resources from parallel corpora. We are making all resources publicly available, so check out the project overview page for more details and the generated Proposition Banks.
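The general technique behind generating such resources from parallel corpora is annotation projection: label the English side of each sentence pair with a semantic role labeler, then copy the labels onto the translation through word alignments. The sketch below illustrates only that copying step, assuming word alignments are already given. The function `project_labels` and its rule of dropping conflicting labels are hypothetical simplifications, not the filtering actually used to build the Universal Proposition Banks.

```python
from typing import List, Optional, Sequence, Set, Tuple

Label = Optional[str]


def project_labels(
    source_labels: Sequence[Label],
    alignment: Sequence[Tuple[int, int]],
    target_len: int,
) -> List[Label]:
    """Hypothetical simplified projection of token labels onto a translation.

    alignment holds (source_index, target_index) word-alignment pairs.
    A target token that receives two different labels is left unlabeled.
    """
    target: List[Label] = [None] * target_len
    conflicted: Set[int] = set()
    for src, tgt in alignment:
        label = source_labels[src]
        if label is None or tgt in conflicted:
            continue
        if target[tgt] is None:
            target[tgt] = label
        elif target[tgt] != label:
            target[tgt] = None  # conflicting evidence: drop the label
            conflicted.add(tgt)
    return target


if __name__ == "__main__":
    # English "John bought a car" with PropBank-style role labels,
    # projected onto German "John hat ein Auto gekauft".
    en_labels = ["A0", "V", None, "A1"]
    alignment = [(0, 0), (1, 4), (2, 2), (3, 3)]
    print(project_labels(en_labels, alignment, target_len=5))
    # → ['A0', None, None, 'A1', 'V']
```

Note how the alignment handles the different word order: the predicate label moves to the sentence-final German participle "gekauft".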
Text and Data Mining
alan [dot] akbik [ät] zalando [dot] de
(BTW: we are hiring!)