Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning. David Schulte, Felix Hamborg and Alan Akbik. The 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024.
NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition. Elena Merdjanovska, Ansar Aynetdinov and Alan Akbik. The 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024. [pdf]
TransformerRanker: A Tool for Efficiently Finding the Best-Suited Language Models for Downstream Classification Tasks. Lukas Garbas, Max Ploner and Alan Akbik. ArXiv, 2024. [pdf]
LM-Pub-Quiz: A Comprehensive Framework for Zero-Shot Evaluation of Relational Knowledge in Language Models. Max Ploner, Jacek Wiland, Sebastian Pohl and Alan Akbik. ArXiv, 2024. [pdf]
Fundus: A Simple-to-Use News Scraper Optimized for High Quality Extractions. Max Dallabetta, Conrad Dobberstein, Adrian Breiding and Alan Akbik. The 62nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, ACL 2024. [pdf]
Choose Your Transformer: Improved Transferability Estimation of Transformer Models on Classification Tasks. Lukas Garbaciauskas, Max Ploner and Alan Akbik. The 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024. [pdf]
OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs. Patrick Haller, Ansar Aynetdinov and Alan Akbik. 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics: System Demonstrations, NAACL 2024. [pdf]
BEAR: A Unified Framework for Evaluating Relational Knowledge in Causal and Masked Language Models. Jacek Wiland, Max Ploner and Alan Akbik. 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2024. [pdf]
HunFlair2 in a cross-corpus evaluation of named entity recognition and normalization tools. Mario Sänger, Samuele Garda, Xing David Wang, Leon Weber-Genzel, Pia Droop, Benedikt Fuchs, Alan Akbik, Ulf Leser. Bioinformatics 2024. [pdf]
PECC: Problem Extraction and Coding Challenges. Patrick Haller, Jonas Golde and Alan Akbik. 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, COLING-LREC 2024. [pdf]
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity. Ansar Aynetdinov and Alan Akbik. ArXiv, 2024. [pdf]
Large-Scale Label Interpretation Learning for Few-Shot Named Entity Recognition. Jonas Golde, Felix Hamborg and Alan Akbik. 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024. [pdf]
Parameter-Efficient Fine-Tuning: Is There An Optimal Subset of Parameters to Tune? Max Ploner and Alan Akbik. 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024. [pdf]
CleanCoNLL: A Nearly Noise-Free Named Entity Recognition Dataset. Susanna Rücker and Alan Akbik. The 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023. [pdf]
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs. Jonas Golde, Patrick Haller, Felix Hamborg, Julian Risch and Alan Akbik. The 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, EMNLP 2023. [pdf]
OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs. Patrick Haller, Ansar Aynetdinov and Alan Akbik. ArXiv, 2023. [pdf]
ZELDA: A Comprehensive Benchmark for Supervised Entity Disambiguation. Marcel Milich and Alan Akbik. 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023. [pdf]
Task-Specific Embeddings for Ante-Hoc Explainable Text Classification. Kishaloy Halder, Josip Krapac, Alan Akbik, Anthony Brew, Matti Lyra. ArXiv, 2022. [pdf]
Medical Coding with Biomedical Transformer Ensembles and Zero/Few-shot Learning. Angelo Ziletti, Alan Akbik, Christoph Berns, Thomas Herold, Marion Legler, Martina Viell. 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track, NAACL 2022. [pdf]
Early Detection of Sexual Predators in Chats. Matthias Vogt, Ulf Leser and Alan Akbik. Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL-IJCNLP 2021. [pdf]
HunFlair: An easy-to-use tool for state-of-the-art biomedical named entity recognition. Leon Weber, Mario Sänger, Jannes Münchmeyer, Maryam Habibi, Ulf Leser and Alan Akbik. Bioinformatics 2021. [pdf]
FLERT: Document-Level Features for Named Entity Recognition. Stefan Schweter and Alan Akbik. ArXiv, 2020. [pdf]
Task-Aware Representation of Sentences for Generic Text Classification. Kishaloy Halder, Alan Akbik, Josip Krapac and Roland Vollgraf. 28th International Conference on Computational Linguistics, COLING 2020. [pdf]
FLAIR: An Easy-to-Use Framework for State-of-the-Art NLP. Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter and Roland Vollgraf. Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2019. [pdf]
Pooled Contextualized Embeddings for Named Entity Recognition. Alan Akbik, Tanja Bergmann and Roland Vollgraf. Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2019. [pdf]
Multilingual Sequence Labeling With One Model. Alan Akbik, Tanja Bergmann and Roland Vollgraf. Northern Lights Deep Learning Workshop, NLDL 2019. [pdf]
Contextual String Embeddings for Sequence Labeling. Alan Akbik, Duncan Blythe and Roland Vollgraf. 27th International Conference on Computational Linguistics, COLING 2018. [pdf]
ZAP: An Open-Source Multilingual Annotation Projection Framework. Alan Akbik and Roland Vollgraf. 11th Language Resources and Evaluation Conference, LREC 2018. [pdf]
FEIDEGGER: A Multi-modal Corpus of Fashion Images and Descriptions in German. Leonidas Lefakis, Alan Akbik and Roland Vollgraf. 11th Language Resources and Evaluation Conference, LREC 2018. [pdf]
The Projector: An Interactive Annotation Projection Visualization Tool. Alan Akbik and Roland Vollgraf. 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017. [pdf][video]
CROWD-IN-THE-LOOP: A Hybrid Approach for Annotating Semantic Roles. Chenguang Wang, Alan Akbik, Laura Chiticariu, Yunyao Li, Fei Xia, Anbang Xu. 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017. [pdf]
Multilingual Information Extraction with PolyglotIE. Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yonas Kbrom, Yunyao Li and Huaiyu Zhu. 26th International Conference on Computational Linguistics, COLING 2016. [pdf][video]
K-SRL: Instance-based Learning for Semantic Role Labeling. Alan Akbik and Yunyao Li. 26th International Conference on Computational Linguistics, COLING 2016. [pdf]
Multilingual Aliasing for Auto-Generating Proposition Banks. Alan Akbik, Xinyu Guan and Yunyao Li. 26th International Conference on Computational Linguistics, COLING 2016. [pdf]
Improving Data Quality by Leveraging Statistical Relational Learning. Larysa Visengeriyeva, Alan Akbik and Manohar Kaul. 21st International Conference on Information Quality, ICIQ 2016.
Towards Semi-Automatic Generation of Proposition Banks for Low-Resource Languages. Alan Akbik, Vishwajeet Kumar and Yunyao Li. 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016. [pdf]
Polyglot: Multilingual Semantic Role Labeling with Unified Labels. Alan Akbik and Yunyao Li. 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016. [pdf]
Exploratory Relation Extraction in Large Multilingual Data. Alan Akbik. PhD Thesis.
Generating High Quality Proposition Banks for Multilingual Semantic Role Labeling. Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yunyao Li, Shivakumar Vaithyanathan and Huaiyu Zhu. 53rd Annual Meeting of the Association for Computational Linguistics, ACL 2015. [pdf]
SCHNÄPPER: A Web Toolkit for Exploratory Relation Extraction. Thilo Michael and Alan Akbik. 53rd Annual Meeting of the Association for Computational Linguistics, ACL 2015. [pdf]
Proceedings of the First AHA!-Workshop on Information Discovery in Text. Alan Akbik and Larysa Visengeriyeva. 25th International Conference on Computational Linguistics, COLING 2014. [pdf]
Extracting a Repository of Events and Event References from News Clusters. Silvia Julinda, Christoph Boden and Alan Akbik. AHA! Workshop on Information Discovery in Text, COLING 2014. [pdf]
Nerdle: Topic-Specific Question Answering Using Wikia Seeds. Umar Maqsud, Sebastian Arnold, Michael Hülfenhaus and Alan Akbik. 25th International Conference on Computational Linguistics, COLING 2014. [pdf]
Exploratory Relation Extraction from Large Text Corpora. Alan Akbik, Thilo Michael and Christoph Boden. 25th International Conference on Computational Linguistics, COLING 2014. [pdf]
The Weltmodell: A Data-Driven Commonsense Knowledge Base. Alan Akbik and Thilo Michael. 9th Edition of the Language Resources and Evaluation Conference, LREC 2014. [pdf]
Freepal: A Large Collection of Deep Lexico-Syntactic Patterns for Relation Extraction. Johannes Kirschnick, Alan Akbik, Holmer Hemsen. 9th Edition of the Language Resources and Evaluation Conference, LREC 2014. [pdf]
Effective Selectional Restrictions for Unsupervised Relation Extraction. Alan Akbik, Larysa Visengeriyeva, Johannes Kirschnick, Alexander Löser. 6th International Joint Conference on Natural Language Processing, IJCNLP 2013. [pdf]
Propminer: A Workflow for Interactive Information Extraction and Exploration using Dependency Trees. Alan Akbik, Oresti Konomi and Michail Melnikov. The 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013. [pdf]
Automatic Preservation Watch using Information Extraction on the Web. Luis Faria, Alan Akbik, Barbara Sierman, Marcel Ras, Miguel Ferreira and Jose Carlos Ramalho. 10th International Conference on Preservation of Digital Objects, iPres 2013. [pdf]
QuoteMine: A Repository of Newsworthy Quotes. Alan Akbik, Martin Schenck. International Conference of the German Society for Computational Linguistics and Language Technology, GSCL 2013. [pdf]
Unsupervised Discovery of Relations and Discriminative Extraction Patterns. Alan Akbik, Larysa Visengeriyeva, Priska Herger, Holmer Hemsen, Alexander Löser. 24th International Conference on Computational Linguistics, COLING 2012. [pdf]
KrakeN: N-ary Facts in Open Information Extraction. Alan Akbik, Alexander Löser. The Knowledge Extraction Workshop at NAACL-HLT, 2012. [pdf]
Wanderlust: Extracting Semantic Relations from Natural Language Text Using Dependency Grammar Patterns. Alan Akbik, Jürgen Broß. Workshop on Semantic Search, WWW 2009. [paper][video]
Master Thesis: Extracting a Repository of Events and Event References from News Clusters
Silvia Julinda created an approach for mining events and their textual representations from news clusters on the Web. Her thesis work led to a publication at the AHA! Workshop on Information Discovery in Text at COLING 2014.
Priska Herger used clustering methods on large corpora of text to determine broad relationship types that hold between nouns, such as hypernymy, meronymy, co-hypernymy and others.
Diplomarbeit: Automatisierte Extraktion textueller Änderungen aus dem Bearbeitungsverlauf von Online-Nachrichtenartikeln
("Extracting Microedits from Online News Articles")
Christian Niedrich mined online news for 'microedits', i.e. small changes that are made to online news articles after they have been published. He is developing a method that automatically constructs a corpus of such edits.
Bachelor Thesis: A Workflow for Defining Information Extraction Patterns
Oresti Konomi designed a workflow and implemented a tool for defining Information Extraction patterns, aimed at users without a background in NLP. Together with the work from Michail Melnikov's bachelor thesis, this work has been published as an ACL 2013 demo.
Bachelor Thesis: Design und Umsetzung eines Systems zur verteilten Ausführung von patternbasierter Informationsextraktion
("Design and Implementation of a System for Distributed Pattern-Based Information Extraction")
Michail Melnikov built a system that executes complex Information Extraction patterns in a distributed environment. Together with the work from Oresti Konomi's bachelor thesis, this work has been published as an ACL 2013 demo.
Diplomarbeit: Automatisierte Extraktion von Zitaten und zugehörigen Themen aus Webdokumenten
("Automatic Extraction of Quotes and Speakers from Web Documents")
Philipp Keese built and evaluated an Information Extraction system that finds quotes and their speakers in Web documents.
Bachelor Thesis: Mining von Events in Twitter mit Fokus auf deutscher Politik
("Mining Twitter Events with Focus on German Politics")
Kfir Admoni built a system that continuously mines Twitter for 'hot topics' in news related to German politics.
Bachelor Thesis: Gezieltes Retrieval von faktenstarken Sätzen im Web auf Basis von Wikipedia
("Targeted Retrieval of Sentences with High Information Content from the Web")
Do Tuan Ahn built a system that finds and retrieves sentences that contain factual data from the Web.
Stephan Pieper investigated machine learning approaches to automatically learn patterns for N-ary open information extraction.
Bachelor Thesis: Information Extraction von Zitaten in türkischsprachigen Quellen
("Quote-Extraction from Turkish Newswire Text")
Ahmet Karakas built a pipeline for extracting quotes and speakers from Turkish-language text.
He will also conduct a survey of existing NLP resources for the Turkish language.
The results of his work will be integrated into the QuoteMine project.
Bachelor Thesis: Extraktion von Relationen und Konzepten von komplexen Nominalphrasen
("Extraction of Relations and Concepts from Complex Noun Phrases")
Umar Maqsud extracted information from complex noun phrases and investigated when such phrases can be used as concepts in a knowledge base.
Bachelor Thesis: Implementierung und Evaluierung eines Verfahrens zur Erhöhung der Qualität der flachen Extraktion komplexer Nominalphrasen
("Using World Knowledge to Find Complex Noun Phrases in Shallow Parsing")
Stefan Schramm investigated a number of 'world knowledge' features in a CRF classifier for finding complex noun phrases, something that can normally only be achieved using a deep syntactic parser. He trained a classifier using different feature sets and evaluated the results.
This is my Master thesis, which I submitted in May 2009. I examine Relation Extraction as a method for automatically generating a semantically annotated wiki from the English Wikipedia. This work has given rise to many ideas I have since been exploring.
Professor of Machine Learning
Humboldt-Universität zu Berlin
alan [dot] akbik [ät] hu-berlin [dot] de