Büble-LM

Büble-LM is our new state-of-the-art 2-billion-parameter language model (LM) for German!

Büble-LM is a state-of-the-art German language model based on Gemma-2B, adapted using trans-tokenization with a custom German SentencePiece tokenizer.

Büble significantly outperforms other German LMs such as Sauerkraut-2B and LLäMmlein-1B on most of the benchmarks we evaluated. It was trained with a novel trans-tokenization approach by Pieter Delobelle while he was a guest researcher at our chair!
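
For intuition, here is a toy sketch of the general idea behind trans-tokenization as we understand it: each token in the new German vocabulary gets an initial embedding built as a weighted average of embeddings from the original model's vocabulary. Everything in it is an assumption made for illustration, including the function name `init_target_embeddings`, the 2-dimensional vectors, and the alignment weights. It is not the code used to train this model.

```python
# Illustrative sketch only: toy data and hypothetical names,
# not the actual training code for this model.

def init_target_embeddings(source_embeddings, alignment_weights):
    """Initialize each target token as a weighted average of source embeddings.

    source_embeddings: dict mapping source token -> list of floats
    alignment_weights: dict mapping target token -> {source token: weight >= 0}
    """
    dim = len(next(iter(source_embeddings.values())))
    target_embeddings = {}
    for target_token, weights in alignment_weights.items():
        total = sum(weights.values())
        if total <= 0:
            raise ValueError(f"no usable alignment for {target_token!r}")
        vector = [0.0] * dim
        for source_token, weight in weights.items():
            source_vector = source_embeddings[source_token]
            for i in range(dim):
                # Normalize weights so they sum to 1 for each target token.
                vector[i] += (weight / total) * source_vector[i]
        target_embeddings[target_token] = vector
    return target_embeddings


# Made-up embeddings for a few tokens of the original (source) vocabulary.
source_embeddings = {
    "house": [1.0, 0.0],
    "home": [0.0, 1.0],
    "dog": [4.0, 4.0],
}

# Made-up alignment weights, e.g. counts from a word-aligned parallel corpus.
alignment_weights = {
    "Haus": {"house": 3.0, "home": 1.0},
    "Hund": {"dog": 1.0},
}

german_embeddings = init_target_embeddings(source_embeddings, alignment_weights)
print(german_embeddings["Haus"])  # [0.75, 0.25]
```

In this toy setup, "Haus" lands three quarters of the way toward "house" and one quarter toward "home", so the German token starts from a meaningful position instead of a random one before further training.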

More details on this model coming soon!

Getting Started