cv-meanflow-perceptual-loss

Computer Vision | alphaflow-main | rigorous codebase

Description

Flow Matching with Perceptual Loss

Background

Flow matching trains a neural network to predict velocity fields that transport samples from noise to data. Traditional training uses only MSE loss on the predicted velocity:

loss = ||v_pred - v_target||^2

However, we can also compute the denoised image from the predicted velocity:

x_denoised = x_t - t * v_pred

Perceptual losses (LPIPS, gradient loss, etc.) can then be applied to x_denoised, encouraging the network to generate high-quality images rather than only accurate velocities.
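The denoising formula can be checked numerically. The sketch below assumes the common linear interpolation path x_t = (1 - t) * x + t * noise with velocity target noise - x; the source does not state the path explicitly, so treat this convention (and every variable name here) as an assumption for illustration.

```python
import torch

torch.manual_seed(0)

# Toy batch standing in for CIFAR-10 images scaled to [-1, 1] (assumption).
x = torch.rand(8, 3, 32, 32) * 2 - 1
noise = torch.randn_like(x)
t = torch.rand(8).view(-1, 1, 1, 1)  # one timestep per sample, broadcastable

# Assumed linear path and its velocity target.
x_t = (1 - t) * x + t * noise
v_target = noise - x

# With the exact velocity, x_t - t * v recovers the clean image.
x_denoised = x_t - t * v_target
max_err = (x_denoised - x).abs().max().item()

# With an imperfect prediction, the image-space error is -t * (v_pred - v_target),
# so velocity errors are scaled by t when they reach x_denoised.
v_pred = v_target + 0.05 * torch.randn_like(v_target)
x_denoised_noisy = x_t - t * v_pred
image_error = x_denoised_noisy - x
```

Under this convention, any loss on x_denoised is a loss on a t-scaled velocity error, which is one reason image-space losses behave differently across timesteps.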

Research Question

Can adding perceptual losses to flow matching training improve FID scores?

Task

You are given custom_train_perceptual.py, a self-contained training script that trains a small DiT on CIFAR-10 (32x32) using flow matching with mean velocity objectives.

The editable region contains the loss computation in the training loop:

# Current: MSE loss only
loss_mse = ((pred_mean_vel - mean_vel_target) ** 2).mean()
loss = loss_mse

The fixed code already exposes:

  • lpips_fn(x_denoised, x_target) - perceptual loss
  • compute_gradient_loss(x_denoised, x_target) - gradient-domain loss
  • compute_multiscale_loss(x_denoised, x_target) - multi-resolution loss

Key constraint: Only apply auxiliary losses when t > 0.1 to avoid instability at small noise levels.
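As a rough illustration of how the editable region might combine these pieces, the sketch below adds one auxiliary term to the MSE loss and masks it per sample where t <= 0.1. Everything beyond the MSE line is an assumption: toy_gradient_loss is a hypothetical stand-in (the real compute_gradient_loss lives in the fixed code and may differ), the weight aux_weight=0.1 is arbitrary, per-sample masking is only one way to honor the constraint, and plain velocity is used in place of the script's mean-velocity tensors.

```python
import torch

def toy_gradient_loss(x, y):
    """Hypothetical stand-in for compute_gradient_loss: per-sample L1
    distance between horizontal and vertical finite differences."""
    def dx(im):
        return im[..., :, 1:] - im[..., :, :-1]
    def dy(im):
        return im[..., 1:, :] - im[..., :-1, :]
    return ((dx(x) - dx(y)).abs().mean(dim=(1, 2, 3))
            + (dy(x) - dy(y)).abs().mean(dim=(1, 2, 3)))

def combined_loss(pred_vel, target_vel, x_t, x_target, t,
                  aux_weight=0.1, t_min=0.1):
    # MSE on velocity, as in the original editable region.
    loss_mse = ((pred_vel - target_vel) ** 2).mean()
    # Denoised estimate from the predicted velocity.
    x_denoised = x_t - t.view(-1, 1, 1, 1) * pred_vel
    # Per-sample mask: auxiliary loss only where t > t_min.
    mask = (t > t_min).float()
    aux = toy_gradient_loss(x_denoised, x_target)  # shape (B,)
    # Average over unmasked samples; clamp avoids division by zero.
    loss_aux = (aux * mask).sum() / mask.sum().clamp(min=1.0)
    return loss_mse + aux_weight * loss_aux, loss_mse, loss_aux

torch.manual_seed(0)
B = 4
x = torch.rand(B, 3, 32, 32) * 2 - 1
noise = torch.randn_like(x)
t = torch.tensor([0.05, 0.3, 0.6, 0.9])
tb = t.view(-1, 1, 1, 1)
x_t = (1 - tb) * x + tb * noise          # assumed linear path
target_vel = noise - x
pred_vel = target_vel + 0.1 * torch.randn_like(target_vel)

total, loss_mse, loss_aux = combined_loss(pred_vel, target_vel, x_t, x, t)
# When every timestep is below the threshold, the auxiliary term vanishes.
_, mse_low, aux_low = combined_loss(pred_vel, target_vel, x_t, x,
                                    torch.full((B,), 0.05))
```

Returning the individual terms alongside the total makes it easier to log them separately and see whether the auxiliary loss or the MSE dominates at a given weight.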

Evaluation

  • Dataset: CIFAR-10 (32x32)
  • Model: SmallDiT (512 hidden, 8 layers, ~40M params)
  • Training: 10000 steps, batch size 128
  • Metric: FID (lower is better), computed with clean-fid against the CIFAR-10 train set
  • Inference: 10-step Euler sampler
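For readers unfamiliar with the sampler named above, here is a minimal sketch of a fixed-step Euler integrator from t = 1 (noise) to t = 0 (data). It is not the script's sampler: velocity_fn is a hypothetical callable taking (x, t_batch), the time direction assumes the convention x_t = (1 - t) * x + t * noise, and a MeanFlow model may instead be queried for a mean velocity over each interval.

```python
import torch

@torch.no_grad()
def euler_sample(velocity_fn, noise, num_steps=10):
    """Integrate dx/dt = v(x, t) backwards from t=1 to t=0 with Euler steps."""
    ts = torch.linspace(1.0, 0.0, num_steps + 1)
    x = noise
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        t_batch = torch.full((x.shape[0],), float(t_cur))
        # Step size is (t_cur - t_next) > 0; moving toward t=0 subtracts velocity.
        x = x - (t_cur - t_next) * velocity_fn(x, t_batch)
    return x

# Toy check: if all data sits at a single point x0, the exact velocity field
# under the assumed path is (x - x0) / t, and Euler lands on x0 at t=0.
torch.manual_seed(0)
x0 = torch.full((2, 3, 32, 32), 0.5)

def toy_velocity(x, t):
    return (x - x0) / t.view(-1, 1, 1, 1)

samples = euler_sample(toy_velocity, torch.randn(2, 3, 32, 32), num_steps=10)
```

With only 10 steps, discretization error on a learned velocity field is not negligible, so FID in this setup reflects both model quality and how well the field tolerates coarse integration.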

Baselines

  1. mse-only: Pure MSE loss on velocity
  2. mse-lpips: MSE + LPIPS perceptual loss (VGG features)
  3. mse-lpips-grad: MSE + LPIPS + gradient loss, with timestep-adaptive weighting

Code

custom_train_perceptual.py
"""Custom Flow Matching Training Script - Perceptual Loss Variant
Small-scale flow matching training on CIFAR-10 with a lightweight DiT.
The training objective (MeanFlow) is pre-implemented; your task is to
design an improved loss function, optionally using perceptual losses.
"""

import math
import os
import time

import lpips
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

Additional context files (read-only):

  • alphaflow-main/perceptual_utils.py

Results

Model                            Type      Best FID (small)  Best FID (medium)  Best FID (large)
lpips_grad                       baseline  17.790            17.190             14.490
lpips_spectral                   baseline  17.380            15.820             13.630
mse_base                         baseline  22.330            21.910             N/A
anthropic/claude-opus-4.6        vanilla   20.610            16.830             -
deepseek-reasoner                vanilla   26.940            26.610             -
google/gemini-3.1-pro-preview    vanilla   20.980            20.720             -
qwen/qwen3.6-plus                vanilla   -                 16.520             -
anthropic/claude-opus-4.6        agent     19.880            15.050             14.000
deepseek-reasoner                agent     20.810            20.160             19.160
google/gemini-3.1-pro-preview    agent     20.980            20.720             18.630
qwen/qwen3.6-plus                agent     N/A               16.520             N/A

Agent Conversations