NF4
The QLoRA paper of May 2023 introduced the 4-bit NormalFloat (NF4) data type. QLoRA stores model weights in NF4 and uses BF16 for computation.
NF4 is specifically adapted to weights that are approximately normally distributed, as trained neural network weights typically are: its 16 quantization levels are placed at quantiles of a normal distribution, so each level is used about equally often. At inference time the compressed NF4 weights are dequantized back to BF16 values for computation.
HuggingFace
In the Transformers library, 4-bit loading can be enabled with the load_in_4bit flag, typically set on a BitsAndBytesConfig passed to from_pretrained.
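A minimal loading sketch, assuming transformers, accelerate, and bitsandbytes are installed and a CUDA GPU is available; the checkpoint name is a placeholder, any causal LM works.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4 bit
    bnb_4bit_quant_type="nf4",              # NF4 rather than plain FP4
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to BF16 for compute
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # placeholder checkpoint
    quantization_config=bnb_config,
)
```

Setting bnb_4bit_compute_dtype to torch.bfloat16 matches the QLoRA recipe of NF4 storage with BF16 computation.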