Boldt
Boldt is a series of German language models trained at HU Berlin. Our overarching goal is to create state-of-the-art language models for German, and in the future for other domains, with fewer computational and data resources.
Boldt was trained using a new paradigm: we strictly filter the available German text down to its highest-quality subset, then train repeatedly for many epochs over this high-signal data. We find that Boldt outperforms all other state-of-the-art LLMs in the 1B parameter range, including Gemma-3 and Llama-3.2.
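As a rough illustration of the filter-then-repeat idea described above, the following Python sketch keeps only the top-scoring fraction of a corpus and then streams that subset for several epochs. It is not the Boldt pipeline: the function names, the `keep_fraction` parameter, and especially `toy_quality_score` are hypothetical stand-ins chosen for this example, and the actual filtering criteria and epoch counts are described in the paper.

```python
import random
from typing import Callable, Iterator, List


def filter_top_fraction(
    docs: List[str], score: Callable[[str], float], keep_fraction: float
) -> List[str]:
    """Keep the highest-scoring fraction of documents (hypothetical helper)."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(docs, key=score, reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]


def multi_epoch_stream(docs: List[str], epochs: int, seed: int = 0) -> Iterator[str]:
    """Yield every document once per epoch, reshuffling the order each epoch."""
    rng = random.Random(seed)
    for _ in range(epochs):
        order = docs[:]
        rng.shuffle(order)
        yield from order


def toy_quality_score(doc: str) -> float:
    # Assumption for illustration only: mean word length as a crude quality
    # proxy. A real pipeline would use a far stronger quality signal.
    words = doc.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


if __name__ == "__main__":
    corpus = [
        "a b c",
        "Sprachmodelle lernen",
        "ok ok",
        "Berliner Universität forscht",
    ]
    subset = filter_top_fraction(corpus, toy_quality_score, keep_fraction=0.5)
    for text in multi_epoch_stream(subset, epochs=3):
        print(text)  # each kept document appears once per epoch
```

The point of the sketch is the shape of the data flow: the filter runs once up front, and the small surviving subset is then seen many times, in contrast to a single pass over a large unfiltered corpus.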
We also release a suite of modernized German benchmarks to evaluate LLM performance. Our suite fixes errors in previous benchmarks to give more accurate readings of LLM abilities.
To get started:
- Check out all Boldt models on Hugging Face!
- Check out our paper for all details!
- Check out our German benchmarks!