NoiseBench
With NoiseBench, you can measure the robustness of your ML approach to real-world label noise!
Machine learning (ML) requires labeled training examples. However, in real-world use cases, training data is often imperfect. Even well-known, expert-created datasets are known to contain a significant percentage of incorrect labels. This poses a major challenge: ML models trained on imperfect data typically perform significantly worse than models trained on clean data.
With NoiseBench, we present a new benchmark for evaluating the noise-robustness of different learning approaches. Our benchmark focuses on the task of named entity recognition (NER) and includes different types of real noise such as:
- real mistakes by expert labelers
- mistakes by crowd workers
- errors from distant and weak supervision
- errors from LLMs
Use NoiseBench to evaluate the robustness of your learning approach to different types of real noise!
Getting Started
- Check out our GitHub repo! A minimal usage sketch follows below.
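
As a concrete starting point, here is a minimal sketch of how such an evaluation might look using the Flair NLP library: train a standard NER tagger on one of the noisy training variants and evaluate it against the clean test labels. The folder layout and file names below are placeholders, not the actual repository structure; see the repo for the real data format.

```python
from flair.datasets import ColumnCorpus
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Placeholder paths: train on a noisy label variant (e.g. crowd annotations),
# validate and test on the clean gold labels.
corpus = ColumnCorpus(
    "data/noisebench",
    column_format={0: "text", 1: "ner"},
    train_file="train_crowd.conll",   # noisy training labels (assumed file name)
    dev_file="dev_clean.conll",       # clean dev labels (assumed file name)
    test_file="test_clean.conll",     # clean test labels (assumed file name)
)

label_dict = corpus.make_label_dictionary(label_type="ner")

# A standard transformer-based sequence tagger as the baseline approach
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=TransformerWordEmbeddings("bert-base-cased", fine_tune=True),
    tag_dictionary=label_dict,
    tag_type="ner",
)

# Fine-tune on the noisy labels; the final test score (computed against clean
# labels) indicates how robust the approach is to this noise type.
trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "resources/taggers/noisebench_crowd",
    learning_rate=5e-5,
    mini_batch_size=32,
    max_epochs=10,
)
```

Swapping in a different noise variant (expert errors, crowd annotations, distant or weak supervision, LLM labels) would only change the training file, while the clean test set stays fixed, so scores remain comparable across noise types.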
Publications
NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition. Elena Merdjanovska, Ansar Aynetdinov and Alan Akbik. EMNLP 2024.