llm-algorithm-16Mqat
Language Models
llm-16m-qat-runtime
Description
This task studies QAT method design for a 16M-scale autoregressive language model.
The environment is intentionally minimal:
- a compact GPT-style model
- FineWeb token shards prepared in the `parameter-golf` format
- SentencePiece tokenizer-based `val_bpb` evaluation
- optional weight-only QAT inserted through `runtime/weight_quant.py`
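As a rough illustration of what a weight-only QAT hook can look like, here is a minimal symmetric per-tensor fake-quantization forward pass in numpy. This is a hypothetical sketch, not the actual contents of `runtime/weight_quant.py`; the real scheme may use per-channel scales or a different rounding rule.

```python
import numpy as np

def quantize_weights(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric per-tensor "fake" weight quantization (forward pass only).

    Hypothetical sketch of a weight-only QAT hook; the repo's actual
    scheme (per-channel scales, zero points, rounding) may differ.
    """
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 at 4-bit signed
    scale = np.max(np.abs(w)) / qmax
    if scale == 0.0:                        # all-zero tensor: nothing to quantize
        return w.copy()
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                        # dequantize back to float
```

In training, such a function would be applied to the weights on the forward pass, with gradients passed straight through to the underlying float weights.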
Training follows a two-stage workflow:
- train an FP checkpoint for 1 epoch
- finetune each quantized configuration from that FP checkpoint for 1 epoch
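The two-stage workflow above can be sketched as a small sweep driver. The trainer callables and their signatures here are hypothetical stand-ins, not the repo's actual API; the point is only the structure: one shared FP run, then one finetune per (method, bit-width) pair.

```python
def run_qat_sweep(train_fp, finetune_quantized,
                  methods=("ste", "robuq", "lsq", "stableqat"),
                  bit_widths=(4, 3, 2)):
    """Hypothetical driver for the two-stage workflow.

    train_fp and finetune_quantized are assumed trainer callables that
    return a dict containing at least "val_bpb".
    """
    # Stage 1: one FP epoch, shared by every quantized configuration.
    fp_ckpt = train_fp(epochs=1)
    results = {"fp": fp_ckpt["val_bpb"]}
    # Stage 2: finetune each (method, bits) config from the FP checkpoint.
    for method in methods:
        for bits in bit_widths:
            run = finetune_quantized(fp_ckpt, method=method, bits=bits, epochs=1)
            results[f"{method}-{bits}bit"] = run["val_bpb"]
    return results
```

Starting every quantized run from the same FP checkpoint keeps the comparison controlled: differences in final `val_bpb` are attributable to the QAT method and bit width, not to FP pretraining variance.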
The current baselines compare:
- no QAT
- naive STE at 4/3/2 bits
- RobuQ at 4/3/2 bits
- LSQ at 4/3/2 bits
- StableQAT at 4/3/2 bits
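Of the listed baselines, LSQ has a well-known closed-form scale gradient (Esser et al., 2020). A minimal numpy sketch of its fake-quant forward pass and the straight-through gradient with respect to the step size, with function and variable names of my own choosing; the repo's LSQ baseline may differ in clip range or gradient scaling:

```python
import numpy as np

def lsq_fake_quant(v: np.ndarray, s: float, bits: int):
    """LSQ-style fake quantization: returns (v_hat, elementwise dL/ds factor).

    Sketch after Esser et al. (2020), signed range [-qn, qp].
    """
    qn, qp = 2 ** (bits - 1), 2 ** (bits - 1) - 1
    v_s = v / s
    q = np.clip(np.round(v_s), -qn, qp)
    v_hat = q * s
    # Straight-through gradient of v_hat w.r.t. s, elementwise:
    # clamped low -> -qn, clamped high -> qp, in range -> round(v/s) - v/s
    grad_s = np.where(v_s <= -qn, -qn,
             np.where(v_s >= qp, qp, q - v_s))
    g = 1.0 / np.sqrt(v.size * qp)          # LSQ gradient-scale factor
    return v_hat, grad_s * g
```

Naive STE differs only in that `s` is a fixed statistic of the weights rather than a learned parameter with this gradient.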
Primary metric:
- final `val_bpb`
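Assuming `val_bpb` follows the usual definition (total validation loss in bits, normalized by the byte length of the validation text), the conversion from a mean token-level loss is:

```python
import math

def bits_per_byte(mean_nll_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Convert mean token-level NLL (in nats) to bits per byte.

    Assumed definition of val_bpb: total loss in bits over the
    validation set, divided by its length in bytes.
    """
    total_bits = mean_nll_nats * n_tokens / math.log(2)
    return total_bits / n_bytes
```

Normalizing by bytes rather than tokens makes the metric comparable across tokenizers, which matters when the tokenizer is part of the environment.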
Code
Results
No results yet.