Publications

This page gives an overview of my scientific publications, ordered by year. Further down the page, you will also find other scientific output. Finally, there is a list of the bachelor and master theses I advised while I was a research associate at the Berlin Institute of Technology.

Latest Publications (since 2024)

Fundus: A Simple-to-Use News Scraper Optimized for High Quality Extractions. Max Dallabetta, Conrad Dobberstein, Adrian Breiding and Alan Akbik. ArXiv, 2024. [pdf]

OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs. Patrick Haller, Ansar Aynetdinov and Alan Akbik. 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics: System Demonstrations, NAACL 2024. [pdf]

BEAR: A Unified Framework for Evaluating Relational Knowledge in Causal and Masked Language Models. Jacek Wiland, Max Ploner and Alan Akbik. 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2024.

HunFlair2 in a cross-corpus evaluation of named entity recognition and normalization tools. Mario Sänger, Samuele Garda, Xing David Wang, Leon Weber-Genzel, Pia Droop, Benedikt Fuchs, Alan Akbik, Ulf Leser. ArXiv, 2024. [pdf]

PECC: Problem Extraction and Coding Challenges. Patrick Haller, Jonas Golde and Alan Akbik. 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, COLING-LREC 2024.

SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity. Ansar Aynetdinov and Alan Akbik. ArXiv, 2024. [pdf]

Large-Scale Label Interpretation Learning for Few-Shot Named Entity Recognition. Jonas Golde, Felix Hamborg and Alan Akbik. 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024. [pdf]

Parameter-Efficient Fine-Tuning: Is There An Optimal Subset of Parameters to Tune? Max Ploner and Alan Akbik. 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024. [pdf]


2023

CleanCoNLL: A Nearly Noise-Free Named Entity Recognition Dataset. Susanna Rücker and Alan Akbik. The 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023. [pdf]

Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs. Jonas Golde, Patrick Haller, Felix Hamborg, Julian Risch and Alan Akbik. The 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, EMNLP 2023. [pdf]

OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs. Patrick Haller, Ansar Aynetdinov and Alan Akbik. ArXiv, 2023. [pdf]

ZELDA: A Comprehensive Benchmark for Supervised Entity Disambiguation. Marcel Milich and Alan Akbik. 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023. [pdf]


2022

Task-Specific Embeddings for Ante-Hoc Explainable Text Classification. Kishaloy Halder, Josip Krapac, Alan Akbik, Anthony Brew, Matti Lyra. ArXiv, 2022. [pdf]

Medical Coding with Biomedical Transformer Ensembles and Zero/Few-shot Learning. Angelo Ziletti, Alan Akbik, Christoph Berns, Thomas Herold, Marion Legler, Martina Viell. 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track, NAACL 2022. [pdf]


2021

Early Detection of Sexual Predators in Chats. Matthias Vogt, Ulf Leser and Alan Akbik. Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL-IJCNLP 2021. [pdf]

HunFlair: An easy-to-use tool for state-of-the-art biomedical named entity recognition. Leon Weber, Mario Sänger, Jannes Münchmeyer, Maryam Habibi, Ulf Leser and Alan Akbik. Bioinformatics, 2021. [pdf]


2020

FLERT: Document-Level Features for Named Entity Recognition. Stefan Schweter and Alan Akbik. ArXiv, 2020. [pdf]

Task-Aware Representation of Sentences for Generic Text Classification. Kishaloy Halder, Alan Akbik, Josip Krapac and Roland Vollgraf. 28th International Conference on Computational Linguistics, COLING 2020. [pdf]


2019

FLAIR: An Easy-to-Use Framework for State-of-the-Art NLP. Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter and Roland Vollgraf. Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2019. [pdf]

Pooled Contextualized Embeddings for Named Entity Recognition. Alan Akbik, Tanja Bergmann and Roland Vollgraf. Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2019. [pdf]

Multilingual Sequence Labeling With One Model. Alan Akbik, Tanja Bergmann and Roland Vollgraf. Northern Lights Deep Learning Workshop, NLDL 2019. [pdf]


2018

Contextual String Embeddings for Sequence Labeling. Alan Akbik, Duncan Blythe and Roland Vollgraf. 27th International Conference on Computational Linguistics, COLING 2018. [pdf]

ZAP: An Open-Source Multilingual Annotation Projection Framework. Alan Akbik and Roland Vollgraf. 11th Language Resources and Evaluation Conference, LREC 2018. [pdf]

FEIDEGGER: A Multi-modal Corpus of Fashion Images and Descriptions in German. Leonidas Lefakis, Alan Akbik and Roland Vollgraf. 11th Language Resources and Evaluation Conference, LREC 2018. [pdf]


2017

The Projector: An Interactive Annotation Projection Visualization Tool. Alan Akbik and Roland Vollgraf. 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017. [pdf][video]

CROWD-IN-THE-LOOP: A Hybrid Approach for Annotating Semantic Roles. Chenguang Wang, Alan Akbik, Laura Chiticariu, Yunyao Li, Fei Xia, Anbang Xu. 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017. [pdf]


2016

Multilingual Information Extraction with PolyglotIE. Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yonas Kbrom, Yunyao Li and Huaiyu Zhu. 26th International Conference on Computational Linguistics, COLING 2016. [pdf][video]

K-SRL: Instance-based Learning for Semantic Role Labeling. Alan Akbik and Yunyao Li. 26th International Conference on Computational Linguistics, COLING 2016. [pdf]

Multilingual Aliasing for Auto-Generating Proposition Banks. Alan Akbik, Xinyu Guan and Yunyao Li. 26th International Conference on Computational Linguistics, COLING 2016. [pdf]

Improving Data Quality by Leveraging Statistical Relational Learning. Larysa Visengeriyeva, Alan Akbik and Manohar Kaul. 21st International Conference on Information Quality, ICIQ 2016.

Towards Semi-Automatic Generation of Proposition Banks for Low-Resource Languages. Alan Akbik, Vishwajeet Kumar and Yunyao Li. 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016. [pdf]

Polyglot: Multilingual Semantic Role Labeling with Unified Labels. Alan Akbik and Yunyao Li. 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016. [pdf]

Exploratory Relation Extraction in Large Multilingual Data. Alan Akbik. PhD Thesis.


2015

Generating High Quality Proposition Banks for Multilingual Semantic Role Labeling. Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yunyao Li, Shivakumar Vaithyanathan and Huaiyu Zhu. 53rd Annual Meeting of the Association for Computational Linguistics, ACL 2015. [pdf]

SCHNÄPPER: A Web Toolkit for Exploratory Relation Extraction. Thilo Michael and Alan Akbik. 53rd Annual Meeting of the Association for Computational Linguistics, ACL 2015. [pdf]


2014

Proceedings of the First AHA!-Workshop on Information Discovery in Text. Alan Akbik and Larysa Visengeriyeva. 25th International Conference on Computational Linguistics, COLING 2014. [pdf]

Extracting a Repository of Events and Event References from News Clusters. Silvia Julinda, Christoph Boden and Alan Akbik. AHA! Workshop on Information Discovery in Text, COLING 2014. [pdf]

Nerdle: Topic-Specific Question Answering Using Wikia Seeds. Umar Maqsud, Sebastian Arnold, Michael Hülfenhaus and Alan Akbik. 25th International Conference on Computational Linguistics, COLING 2014. [pdf]

Exploratory Relation Extraction from Large Text Corpora. Alan Akbik, Thilo Michael and Christoph Boden. 25th International Conference on Computational Linguistics, COLING 2014. [pdf]

The Weltmodell: A Data-Driven Commonsense Knowledge Base. Alan Akbik and Thilo Michael. 9th Edition of the Language Resources and Evaluation Conference, LREC 2014. [pdf]

Freepal: A Large Collection of Deep Lexico-Syntactic Patterns for Relation Extraction. Johannes Kirschnick, Alan Akbik, Holmer Hemsen. 9th Edition of the Language Resources and Evaluation Conference, LREC 2014. [pdf]


2013

Effective Selectional Restrictions for Unsupervised Relation Extraction. Alan Akbik, Larysa Visengeriyeva, Johannes Kirschnick, Alexander Löser. 6th International Joint Conference on Natural Language Processing, IJCNLP 2013. [pdf]

Propminer: A Workflow for Interactive Information Extraction and Exploration using Dependency Trees. Alan Akbik, Oresti Konomi and Michail Melnikov. The 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013. [pdf]

Automatic Preservation Watch using Information Extraction on the Web. Luis Faria, Alan Akbik, Barbara Sierman, Marcel Ras, Miguel Ferreira and Jose Carlos Ramalho. 10th International Conference on Preservation of Digital Objects, iPres 2013. [pdf]

QuoteMine: A Repository of Newsworthy Quotes. Alan Akbik, Martin Schenck. International Conference of the German Society for Computational Linguistics and Language Technology, GSCL 2013. [pdf]


2012 and earlier

Unsupervised Discovery of Relations and Discriminative Extraction Patterns. Alan Akbik, Larysa Visengeriyeva, Priska Herger, Holmer Hemsen, Alexander Löser. 24th International Conference on Computational Linguistics, COLING 2012. [pdf]

KrakeN: N-ary Facts in Open Information Extraction. Alan Akbik, Alexander Löser. The Knowledge Extraction Workshop at NAACL-HLT, 2012. [pdf]

Wanderlust: Extracting Semantic Relations from Natural Language Text Using Dependency Grammar Patterns. Alan Akbik, Jürgen Broß. Workshop on Semantic Search, WWW 2009. [paper][video]


Other Scientific Output

Workshop Organizer

Together with Larysa Visengeriyeva, I organized the First AHA!-Workshop on Information Discovery in Text, held at the 25th International Conference on Computational Linguistics, COLING 2014. You can download the workshop proceedings here: [pdf]

Programme Committee

  • 2017 European Chapter of the Association for Computational Linguistics, EACL 2017
  • 26th International Conference on Computational Linguistics, COLING 2016
  • 2016 Conference on Information and Knowledge Management, CIKM 2016
  • 5th Workshop on Automated Knowledge Base Construction, AKBC 2016
  • 25th International Conference on Computational Linguistics, COLING 2014
  • 4th Workshop on Automated Knowledge Base Construction, AKBC 2014

Thesis Advisor

While at the Berlin Institute of Technology, I advised a number of Bachelor and Master theses. I am happy to say that many of these works either contributed to or directly resulted in academic publications. Some of the theses were written in German, so I give English translations of their titles in parentheses.

Master Thesis: Extracting a Repository of Events and Event References from News Clusters
Silvia Julinda created an approach for mining events and their textual representations from news clusters on the Web. Her thesis work led to a publication at the AHA! Workshop on Information Discovery in Text at COLING 2014.

Master Thesis: Learning Semantic Relations with Distributional Similarity
Priska Herger used clustering methods on large corpora of text to determine broad relationship types that hold between nouns, such as hypernymy, meronymy, co-hypernymy and others.

Diplomarbeit: Automatisierte Extraktion textueller Änderungen aus dem Bearbeitungsverlauf von Online-Nachrichtenartikeln ("Extracting Microedits from Online News Articles")
Christian Niedrich mined online news for 'microedits', i.e. small changes that are made to online news articles after they have been published. He developed a method that automatically constructs a corpus of such edits.

Bachelor Thesis: A Workflow for Defining Information Extraction Patterns
Oresti Konomi designed a workflow and implemented a tool for defining Information Extraction patterns, aimed at users without a background in NLP. Together with the work from Michail Melnikov's bachelor thesis, this work has been published as an ACL 2013 demo.

Bachelor Thesis: Design und Umsetzung eines Systems zur verteilten Ausführung von patternbasierter Informationsextraktion ("Design and Implementation of a System for Distributed Pattern-Based Information Extraction")
Michail Melnikov built a system that executes complex Information Extraction patterns in a distributed environment. Together with the work from Oresti Konomi's bachelor thesis, this work has been published as an ACL 2013 demo.

Diplomarbeit: Automatisierte Extraktion von Zitaten und zugehörigen Themen aus Webdokumenten ("Automatic Extraction of Quotes and Speakers from Web Documents")
Philipp Keese built and evaluated an Information Extraction system that finds quotes and their speakers in Web documents.

Bachelor Thesis: Mining von Events in Twitter mit Fokus auf deutscher Politik ("Mining Twitter Events with Focus on German Politics")
Kfir Admoni built a system that continuously mines Twitter for 'hot topics' in news related to German politics.

Bachelor Thesis: Gezieltes Retrieval von faktenstarken Sätzen im Web auf Basis von Wikipedia ("Targeted Retrieval of Sentences with High Information Content from the Web")
Do Tuan Ahn built a system that finds and retrieves sentences that contain factual data from the Web.

Bachelor Thesis: Generation and Evaluation of N-ary Extraction Patterns for Open Information Extraction
Stephan Pieper investigated machine learning approaches to automatically learn patterns for N-ary open information extraction.

Bachelor Thesis: Information Extraction von Zitaten in türkischsprachigen Quellen ("Quote-Extraction from Turkish Newswire Text")
Ahmet Karakas built a pipeline for extracting quotes and speakers from Turkish-language text. He will also conduct a survey of existing NLP resources for the Turkish language. The results of his work will be integrated into the QuoteMine project.

Bachelor Thesis: Extraktion von Relationen und Konzepten von komplexen Nominalphrasen ("Extraction of Relations and Concepts from Complex Noun Phrases")
Umar Maqsud extracted information from complex noun phrases and investigated when such phrases can be used as concepts in a knowledge base.

Bachelor Thesis: Implementierung und Evaluierung eines Verfahrens zur Erhöhung der Qualität der flachen Extraktion komplexer Nominalphrasen ("Using World Knowledge to find Complex Noun Phrases in Shallow Parsing")
Stefan Schramm investigated a number of 'world knowledge' features in a CRF classifier for finding complex noun phrases, something that can normally only be achieved using a deep syntactic parser. He trained a classifier using different feature sets and evaluated the results.


My Master Thesis

Wanderlust: Extracting Semantic Relations from Natural Language Text Using Dependency Grammar Patterns

This is my Master thesis, which I submitted in May 2009. I examine Relation Extraction as a method for automatically generating a semantically annotated wiki from the English Wikipedia. This work has given rise to many ideas I have since been exploring.

Alan Akbik

Professor of Machine Learning
Humboldt-Universität zu Berlin
alan [dot] akbik [ät] hu-berlin [dot] de