NF4

The QLoRA paper (May 2023) introduced the 4-bit NormalFloat (NF4) data type. QLoRA stores model weights in NF4 and uses BF16 for computation.

NF4 is specifically adapted to weights that follow a zero-centered normal distribution, as trained neural-network weights typically do: its 16 levels are placed at quantiles of the normal distribution rather than spaced uniformly. During inference, NF4 values are dequantized back to BF16 for computation.

HuggingFace

4-bit loading can be enabled with the load_in_4bit flag of the transformers library, typically via a BitsAndBytesConfig passed to from_pretrained.
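A minimal sketch of loading a model with NF4 quantization via transformers and bitsandbytes; the model name is a placeholder, and running this requires a GPU plus the bitsandbytes package, so it is shown here as a configuration example only.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# QLoRA-style settings: NF4 storage, BF16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4 bits
    bnb_4bit_quant_type="nf4",              # use NF4 rather than plain FP4
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to BF16 for compute
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

# Placeholder model id; substitute any causal LM checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-model",
    quantization_config=bnb_config,
)
```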
