They don't have to be trained that way! The training data for 1-bit LLMs is the same as for any other LLM. A common way to generate this data is called 'model distillation': you take completions from a teacher model and use them to train a smaller student model (which is what you're describing)!
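The comment above describes the data-level flavor of distillation (training on teacher completions); another common variant matches the student's output distribution to the teacher's softened logits. A minimal sketch of that logit-matching objective, using numpy and a hypothetical `distill_loss` helper (the temperature and KL direction are conventional choices, not a fixed recipe):

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(teacher_logits, student_logits, T=2.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so gradients stay comparable across temperatures.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean() * T * T)

teacher = np.array([[2.0, 0.5, -1.0]])
student = np.array([[1.0, 1.0, 0.0]])
print(distill_loss(teacher, student))  # positive; approaches 0 as logits match
```

The point either way: the student (1-bit or not) trains on signals derived from the teacher, so the data pipeline is independent of the student's weight format.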
Maybe I wasn't clear; I think you've misunderstood me. I understand that all sorts of LLMs can be trained on a common corpus of data. But my understanding is that the choice to create a BitNet LLM must be made at training time, since the training algorithm itself has to be modified. In other words, an existing FP16 model cannot simply be post-quantized into a BitNet model.
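For context, this is why the choice is baked in at training time: BitNet b1.58 quantizes weights to {-1, 0, +1} in the forward pass while gradients update a latent full-precision copy via a straight-through estimator. A rough numpy sketch of the absmean quantization step (the function name and epsilon are my own; the round-and-clip recipe follows the paper's description):

```python
import numpy as np

def absmean_quantize(W, eps=1e-8):
    # Scale weights by their mean absolute value, then round and clip
    # to the ternary set {-1, 0, +1}, as in BitNet b1.58.
    gamma = np.abs(W).mean() + eps
    W_q = np.clip(np.round(W / gamma), -1, 1)
    return W_q, gamma

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
W_q, gamma = absmean_quantize(W)
# During training, W_q is used for the forward pass while gradients
# flow back to the latent full-precision W (straight-through estimator).
```

A pretrained FP16 model's weights were never optimized under this constraint, so naively applying the quantizer after the fact destroys accuracy rather than recovering a BitNet model.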