llm-algorithm-16Mqat

Language Models · llm-16m-qat-runtime

Description


This task studies quantization-aware training (QAT) method design for a 16M-parameter autoregressive language model.

The environment is intentionally minimal:

  • a compact GPT-style model
  • FineWeb token shards prepared in the parameter-golf format
  • val_bpb (validation bits per byte) evaluation with a SentencePiece tokenizer
  • optional weight-only QAT inserted through runtime/weight_quant.py
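A weight-only fake quantizer of the kind runtime/weight_quant.py would insert can be sketched as follows. This is an illustrative sketch, not the repository's code: the function name and the symmetric absmax scaling are assumptions.

```python
def fake_quant(weights, bits):
    """Symmetric absmax fake quantization: snap weights to a signed b-bit
    integer grid, then dequantize back to floats. Weight-only QAT quantizes
    only the weights; activations stay in full precision."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 at 4 bits, 3 at 3 bits
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0:                      # all-zero weights: nothing to quantize
        return list(weights)
    return [round(w / scale) * scale for w in weights]
```

During training the rounding is paired with a straight-through estimator, so gradients update the full-precision shadow weights while the forward pass sees the quantized values.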

Training follows a two-stage workflow:

  1. train a full-precision (FP) checkpoint for 1 epoch
  2. finetune each quantized configuration from that FP checkpoint for 1 epoch
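The two stages can be demonstrated end to end on a toy scalar model (illustrative only; the real runs use the GPT-style model and FineWeb shards described above). Stage 1 fits a full-precision weight; stage 2 resumes from it with a fake-quantized forward pass and straight-through gradients.

```python
def sgd_fit(w, data, lr=0.1, epochs=1, quantize=None):
    """SGD on a scalar linear model y = w * x. If `quantize` is given, the
    forward pass uses the quantized weight while the gradient updates the
    full-precision copy (straight-through estimator)."""
    for _ in range(epochs):
        for x, y in data:
            w_eff = quantize(w) if quantize else w
            grad = 2 * (w_eff * x - y) * x   # STE: d(w_eff)/dw treated as 1
            w -= lr * grad
    return w

def quant3(w, bits=3):
    """Fixed-range 3-bit fake quantizer, assuming weights live in [-2, 2]."""
    qmax = 2 ** (bits - 1) - 1
    scale = 2.0 / qmax
    return round(w / scale) * scale

data = [(1.0, 2.0), (2.0, 4.0)]                          # targets from y = 2x
w_fp = sgd_fit(0.0, data, epochs=20)                     # stage 1: FP checkpoint
w_qat = sgd_fit(w_fp, data, epochs=20, quantize=quant3)  # stage 2: QAT finetune
```

The point of stage 2 is that the weight is trained under the same quantizer it will be evaluated with, rather than quantizing the FP checkpoint post hoc.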

The current baselines compare:

  • no QAT
  • naive STE at 4/3/2 bits
  • RobuQ at 4/3/2 bits
  • LSQ at 4/3/2 bits
  • StableQAT at 4/3/2 bits

Primary metric:

  • final val_bpb
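Bits per byte is the summed validation cross-entropy (in nats) renormalized by the byte length of the text rather than its token count, which makes scores comparable across tokenizers. A sketch of the conversion (function names are illustrative):

```python
import math

def val_bpb(total_nll_nats, n_bytes):
    """Convert a summed negative log-likelihood in nats to bits per byte."""
    return total_nll_nats / (math.log(2) * n_bytes)

def bpb_from_mean_loss(mean_loss_nats, n_tokens, n_bytes):
    """Same quantity, starting from a mean per-token training loss."""
    return mean_loss_nats * n_tokens / (math.log(2) * n_bytes)
```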

Code

Results

No results yet.