I am a postdoctoral researcher at IBM Research Almaden in San Jose, California, working at the intersection of natural language processing (NLP) and large-scale data mining technologies. My current research focus is Shallow Semantic Parsing and Information Extraction in multilingual data. To enable this, I am researching an approach to auto-generate semantic role labelers for arbitrary languages that parse different languages into a shared semantic abstraction. I pursue this research in order to enable the development of crosslingual Information Extraction and Question Answering applications, as well as to facilitate studies of crosslingual semantics.
Multilingual Information Extraction with PolyglotIE. Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yonas Kbrom, Yunyao Li and Huaiyu Zhu. 26th International Conference on Computational Linguistics, COLING 2016. [pdf][video]
K-SRL: Instance-based Learning for Semantic Role Labeling. Alan Akbik and Yunyao Li. 26th International Conference on Computational Linguistics, COLING 2016. [pdf]
Multilingual Aliasing for Auto-Generating Proposition Banks. Alan Akbik, Xinyu Guan and Yunyao Li. 26th International Conference on Computational Linguistics, COLING 2016. [pdf]
Towards Semi-Automatic Generation of Proposition Banks for Low-Resource Languages. Alan Akbik, Vishwajeet Kumar and Yunyao Li. 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016. [pdf]
Polyglot: Multilingual Semantic Role Labeling with Unified Labels. Alan Akbik and Yunyao Li. 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016. [pdf]
Generating High Quality Proposition Banks for Multilingual Semantic Role Labeling. Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yunyao Li, Shivakumar Vaithyanathan and Huaiyu Zhu. 53rd Annual Meeting of the Association for Computational Linguistics, ACL 2015. [pdf]
Multilingual Semantic Role Labeling. In this line of research, I am investigating methods for semantically parsing text data in a wide range of languages, such as Arabic, Chinese, German, Hindi, Russian and many others. In order to train such parsers, we are automatically generating Proposition Bank-style resources from parallel corpora. We are making all resources publicly available, so check out the project overview page for more details and the generated Proposition Banks.
Multilingual Information Extraction. Text data is readily available in a multitude of human languages; on the Web and elsewhere, trends point to a relative decline of English and a rise in the use of non-English languages. Effectively mining such data for structured information of interest is a huge challenge, since traditionally, separate NLP pipelines and extractors need to be built for each language. In this research, I am investigating more cost-effective ways of creating high-quality extractors for multilingual data.
Text and Data Mining
akbika [ät] us [dot] ibm [dot] com