llm-algorithm-16Mqat

Language Models · llm-16m-qat-runtime

Description


This task studies quantization-aware training (QAT) method design for a 16M-parameter autoregressive language model.

The environment is intentionally minimal:

  • a compact GPT-style model
  • FineWeb token shards prepared in the parameter-golf format
  • val_bpb (validation bits per byte) evaluation with a SentencePiece tokenizer
  • optional weight-only QAT inserted through runtime/weight_quant.py
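A weight-only fake quantizer of the kind runtime/weight_quant.py would insert can be sketched as follows. This is an illustrative sketch, not the repository's code: the function name and the symmetric absmax scaling are assumptions.

```python
def fake_quant(weights, bits):
    """Symmetric absmax fake quantization: snap weights to a signed b-bit
    integer grid, then dequantize back to floats. Weight-only QAT quantizes
    only the weights; activations stay in full precision."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 at 4 bits, 3 at 3 bits
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0:                      # all-zero weights: nothing to quantize
        return list(weights)
    return [round(w / scale) * scale for w in weights]
```

During training the rounding is paired with a straight-through estimator, so gradients update the full-precision shadow weights while the forward pass sees the quantized values.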

Training follows a two-stage workflow:

  1. train a full-precision (FP) checkpoint for 1 epoch
  2. finetune each quantized configuration from that FP checkpoint for 1 epoch
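The two stages can be demonstrated end to end on a toy scalar model (illustrative only; the real runs use the GPT-style model and FineWeb shards described above). Stage 1 fits a full-precision weight; stage 2 resumes from it with a fake-quantized forward pass and straight-through gradients.

```python
def sgd_fit(w, data, lr=0.1, epochs=1, quantize=None):
    """SGD on a scalar linear model y = w * x. If `quantize` is given, the
    forward pass uses the quantized weight while the gradient updates the
    full-precision copy (straight-through estimator)."""
    for _ in range(epochs):
        for x, y in data:
            w_eff = quantize(w) if quantize else w
            grad = 2 * (w_eff * x - y) * x   # STE: d(w_eff)/dw treated as 1
            w -= lr * grad
    return w

def quant3(w, bits=3):
    """Fixed-range 3-bit fake quantizer, assuming weights live in [-2, 2]."""
    qmax = 2 ** (bits - 1) - 1
    scale = 2.0 / qmax
    return round(w / scale) * scale

data = [(1.0, 2.0), (2.0, 4.0)]                          # targets from y = 2x
w_fp = sgd_fit(0.0, data, epochs=20)                     # stage 1: FP checkpoint
w_qat = sgd_fit(w_fp, data, epochs=20, quantize=quant3)  # stage 2: QAT finetune
```

The point of stage 2 is that the weight is trained under the same quantizer it will be evaluated with, rather than quantizing the FP checkpoint post hoc.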

The current baselines compare:

  • no QAT
  • naive STE at 4/3/2 bits
  • RobuQ at 4/3/2 bits
  • LSQ at 4/3/2 bits
  • StableQAT at 4/3/2 bits

Primary metric:

  • final val_bpb
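Bits per byte is the summed validation cross-entropy (in nats) renormalized by the byte length of the text rather than its token count, which makes scores comparable across tokenizers. A sketch of the conversion (function names are illustrative):

```python
import math

def val_bpb(total_nll_nats, n_bytes):
    """Convert a summed negative log-likelihood in nats to bits per byte."""
    return total_nll_nats / (math.log(2) * n_bytes)

def bpb_from_mean_loss(mean_loss_nats, n_tokens, n_bytes):
    """Same quantity, starting from a mean per-token training loss."""
    return mean_loss_nats * n_tokens / (math.log(2) * n_bytes)
```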

Code

Results

No results yet.