NoiseBench
With NoiseBench, you can measure the robustness of your ML approach to real-world label noise!
Machine learning (ML) requires labeled training examples. However, in real-world use cases, training data is often imperfect. Even well-known, expert-created datasets are known to contain a significant percentage of incorrect labels. This poses a major challenge: ML models trained on imperfect data typically perform significantly worse than models trained on clean data.
With NoiseBench, we present a new benchmark for evaluating the noise-robustness of different learning approaches. Our benchmark focuses on the task of named entity recognition (NER) and includes different types of real noise such as:
- real mistakes by expert labelers
- mistakes by crowd workers
- errors from distant and weak supervision
- errors from LLMs
Use NoiseBench to evaluate the robustness of your learning approach to different types of real noise!
Getting Started
- Check out our GitHub repo! A minimal usage sketch follows below.
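
As a concrete starting point, here is a minimal sketch of how such an evaluation might look using the Flair NLP library: train a standard NER tagger on one of the noisy training variants and evaluate it against the clean test labels. The folder layout and file names below are placeholders, not the actual repository structure; see the repo for the real data format.

```python
from flair.datasets import ColumnCorpus
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Placeholder paths: train on a noisy label variant (e.g. crowd annotations),
# validate and test on the clean gold labels.
corpus = ColumnCorpus(
    "data/noisebench",
    column_format={0: "text", 1: "ner"},
    train_file="train_crowd.conll",   # noisy training labels (assumed file name)
    dev_file="dev_clean.conll",       # clean dev labels (assumed file name)
    test_file="test_clean.conll",     # clean test labels (assumed file name)
)

label_dict = corpus.make_label_dictionary(label_type="ner")

# A standard transformer-based sequence tagger as the baseline approach
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=TransformerWordEmbeddings("bert-base-cased", fine_tune=True),
    tag_dictionary=label_dict,
    tag_type="ner",
)

# Fine-tune on the noisy labels; the final test score (computed against clean
# labels) indicates how robust the approach is to this noise type.
trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "resources/taggers/noisebench_crowd",
    learning_rate=5e-5,
    mini_batch_size=32,
    max_epochs=10,
)
```

Swapping in a different noise variant (expert errors, crowd annotations, distant or weak supervision, LLM labels) would only change the training file, while the clean test set stays fixed, so scores remain comparable across noise types.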
Publications
NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition. Elena Merdjanovska, Ansar Aynetdinov and Alan Akbik. EMNLP 2024.