Agent Conversation: deepseek-reasoner
Task: cv-multitask-loss
CV Multi-Task Loss Combination Strategy Design
Research Question
Design a novel multi-task loss combination strategy for jointly training fine-grained (100-class) and coarse (20-superclass) classification on CIFAR-100 that maximizes fine-class test accuracy.
Background
CIFAR-100 contains 100 fine classes organized into 20 coarse superclasses. Training a model with two classification heads (fine + coarse) provides a natural multi-task learning setup where the coarse task acts as an auxiliary signal. The key challenge is how to combine the two losses effectively.
Classic approaches include:
- Equal weighting: Simply sum the losses (baseline default)
- Uncertainty weighting (Kendall et al., 2018): Learn task-specific uncertainty as log-variance parameters
- Dynamic Weight Average (Liu et al., 2019): Weight tasks by their relative loss change rate
- PCGrad (Yu et al., NeurIPS 2020): Project conflicting task gradients onto each other's normal plane to reduce gradient interference
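As a numeric reference for the first three schemes, here is a minimal plain-Python sketch (scalar floats in place of tensors; the loss values and helper names are illustrative, not from the benchmark):

```python
import math

def equal_weighting(losses):
    # Baseline: unweighted sum of the task losses.
    return sum(losses)

def uncertainty_weighting(losses, log_vars):
    # Kendall et al. (2018): total = sum_k exp(-s_k) * L_k + s_k,
    # where s_k is a learned log-variance (plain floats here).
    return sum(math.exp(-s) * L + s for L, s in zip(losses, log_vars))

def dwa_weights(prev_losses, prev_prev_losses, T=2.0):
    # Liu et al. (2019): weight each task by a softmax over its relative
    # loss-change rate r_k = L_k(t-1) / L_k(t-2), scaled to sum to K tasks.
    K = len(prev_losses)
    r = [a / b for a, b in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(x / T) for x in r]
    return [K * e / sum(exps) for e in exps]

# Roughly ln(100) and ln(20): fine and coarse cross-entropy at initialization.
losses = [4.6, 3.0]
equal_weighting(losses)                    # 7.6
uncertainty_weighting(losses, [0.0, 0.0])  # equals the plain sum when both s_k = 0
dwa_weights([2.0, 1.0], [4.0, 1.0])        # non-improving task gets the larger weight
```

In the real setup the log-variances would be nn.Parameter tensors so the optimizer trains them; the DWA weights are recomputed each epoch from a two-epoch loss history.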
The coarse labels encode semantic hierarchy. The task is to balance the auxiliary coarse signal against the primary fine-class objective across different architectures and training stages.
What You Can Modify
The MultiTaskLoss class (lines 195-216) in custom_mtl.py. This class receives individual task losses and must combine them into a single scalar loss.
You can modify:
- The __init__ method: add learnable parameters (log-variances, weights, etc.)
- The forward method: implement any combination strategy
- Use epoch and total_epochs for curriculum/scheduling approaches
- Add any auxiliary state (e.g., loss history buffers)
The forward method receives:
- fine_loss: scalar tensor, cross-entropy for the 100-class fine prediction
- coarse_loss: scalar tensor, cross-entropy for the 20-class coarse prediction
- epoch: int, current epoch (0-indexed)
- total_epochs: int, total number of training epochs
Note: The MultiTaskLoss parameters are included in the optimizer, so learnable parameters will be trained.
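A minimal stand-in for that interface, using plain floats instead of tensors (EqualWeightMTL and its final_coarse_weight knob are illustrative, not the benchmark class, which subclasses nn.Module):

```python
class EqualWeightMTL:
    """Illustrative stand-in: equal weighting with a linear warm-down
    of the auxiliary coarse weight over training."""

    def __init__(self, final_coarse_weight=0.0):
        # Hypothetical knob: coarse weight reached at the final epoch.
        self.final_coarse_weight = final_coarse_weight

    def forward(self, fine_loss, coarse_loss, epoch, total_epochs):
        # epoch is 0-indexed, so epoch == total_epochs - 1 is the last one.
        progress = epoch / max(total_epochs - 1, 1)
        w_coarse = 1.0 + (self.final_coarse_weight - 1.0) * progress
        return fine_loss + w_coarse * coarse_loss

mtl = EqualWeightMTL()
mtl.forward(4.6, 3.0, epoch=0, total_epochs=200)    # 7.6: full coarse signal early
mtl.forward(4.6, 3.0, epoch=199, total_epochs=200)  # 4.6: coarse signal annealed away
```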
Evaluation
- Metric: Best fine-class test accuracy (%, higher is better)
- Architectures (all on CIFAR-100 with fine+coarse heads):
- ResNet-20 (shallow residual network)
- ResNet-56 (deeper residual network)
- VGG-16-BN (deep non-residual with BatchNorm) — hidden, evaluated on final submission only
- Training: SGD (lr=0.1, momentum=0.9, wd=5e-4), cosine annealing, 200 epochs
- Data augmentation: RandomCrop(32, pad=4) + RandomHorizontalFlip
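The lr values printed in the test logs below are consistent with this schedule; a quick sketch, assuming cosine annealing from 0.1 to 0 over 200 epochs with no restarts:

```python
import math

def cosine_lr(epoch, total_epochs=200, lr0=0.1):
    # eta_t = 0.5 * lr0 * (1 + cos(pi * t / T)); reaches 0 at t = T.
    return 0.5 * lr0 * (1.0 + math.cos(math.pi * epoch / total_epochs))

round(cosine_lr(160), 6)  # 0.009549, matching the epoch-160 log lines
round(cosine_lr(190), 6)  # 0.000616, matching the epoch-190 log lines
```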
pytorch-vision/custom_mtl.py [EDITABLE — lines 195–216 only]
1: """CV Multi-Task Loss Benchmark.
2:
3: Train vision models (ResNet, VGG) on CIFAR-100 with TWO classification heads
4: (fine: 100 classes, coarse: 20 superclasses) to evaluate multi-task loss
5: combination strategies.
6:
7: FIXED: Model architectures, data pipeline, training loop.
8: EDITABLE: MultiTaskLoss class.
Tool Input
{
  "op": "replace",
  "filename": "pytorch-vision/custom_mtl.py",
  "start_line": 195,
  "end_line": 216,
  "content": "class MultiTaskLoss(nn.Module):\n    \"\"\"Hierarchical Adaptive Weighting (HAW): Uncertainty weighting with curriculum decay on coarse task.\n\n    Learns per-task log-variances, and reduces coarse weight over epochs.\n    Total = exp(-log_var_fine) * L_fine + exp(-log_var_coarse) * (1 - epoch/total_epochs)^β * L_coarse + log_var_fine + log_var_coarse\n    \"\"\"\n\n    def __init__(self, num_tasks=2):\n        super().__init__()\n        self.log_vars = nn.Parameter(torch.zeros(num_tasks))\n        # beta controls decay strength; hardcoded\n        self.beta = 2.0\n\n    def forward(self, fine_loss, coarse_loss, epoch, total_epochs):\n        fine_weight = torch.exp(-self.log_vars[0])\n        coarse_weight = torch.exp(-self.log_vars[1])\n        # curriculum decay factor for coarse task\n        progress = epoch / total_epochs\n        decay = (1.0 - progress) ** self.beta\n        coarse_weight = coarse_weight * decay\n\n        total = fine_weight * fine_loss + coarse_weight * coarse_loss + self.log_vars[0] + self.log_vars[1]\n        return total"
}

OK: Replaced lines 195..216 with 23 line(s) in pytorch-vision/custom_mtl.py. Editable range: 195–217.
[Current file: pytorch-vision/custom_mtl.py | editable: 195–217 | total: 388 lines]
192| # ============================================================================
193|
194| # -- EDITABLE REGION START (lines 195-216) ------------------------------------
195 class MultiTaskLoss(nn.Module):
196     """Hierarchical Adaptive Weighting (HAW): Uncertainty weighting with curriculum decay on coarse task.
197
198     Learns per-task log-variances, and reduces coarse weight over epochs.
199     Total = exp(-log_var_fine) * L_fine + exp(-log_var_coarse) * (1 - epoch/total_epochs)^β * L_coarse + log_var_fine + log_var_coarse
200     """
201
202     def __init__(self, num_tasks=2):
203         super().__init__()
204         self.log_vars = nn.Parameter(torch.zeros(num_tasks))
205         # beta controls decay strength; hardcoded
206
[Test #1] (2 tests remaining; call submit(n=N) to choose which test result to submit as final)

### resnet20-cifar100mt (scripts/resnet20_cifar100mt.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=-2.0740 train_acc=71.45 test_acc=61.86 lr=0.009549
TRAIN_METRICS: epoch=170 train_loss=-2.9604 train_acc=74.15 test_acc=63.35 lr=0.005450
TRAIN_METRICS: epoch=180 train_loss=-4.1847 train_acc=77.10 test_acc=64.40 lr=0.002447
TRAIN_METRICS: epoch=190 train_loss=-6.0121 train_acc=79.91 test_acc=65.63 lr=0.000616
TRAIN_METRICS: epoch=200 train_loss=-8.9071 train_acc=81.40 test_acc=66.54 lr=0.000000
Final metrics (resnet20-cifar100mt): test_acc_resnet20-cifar100mt=66.55

### resnet56-cifar100mt (scripts/resnet56_cifar100mt.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=-2.8977 train_acc=78.32 test_acc=65.40 lr=0.009549
TRAIN_METRICS: epoch=170 train_loss=-4.0024 train_acc=81.83 test_acc=67.64 lr=0.005450
TRAIN_METRICS: epoch=180 train_loss=-5.513
Tool Input
{
  "op": "replace",
  "filename": "pytorch-vision/custom_mtl.py",
  "start_line": 195,
  "end_line": 217,
  "content": "class MultiTaskLoss(nn.Module):\n    \"\"\"Hierarchical Adaptive Weighting with regularization (HAW-Reg).\n\n    Uncertainty weighting with curriculum decay on coarse task, plus L2 penalty on log-variances.\n    Total = exp(-log_var_fine) * L_fine + exp(-log_var_coarse) * (1 - epoch/total_epochs)^β * L_coarse\n          + log_var_fine + log_var_coarse + λ * (log_var_fine^2 + log_var_coarse^2)\n    \"\"\"\n\n    def __init__(self, num_tasks=2):\n        super().__init__()\n        self.log_vars = nn.Parameter(torch.zeros(num_tasks))\n        # beta controls decay strength; hardcoded\n        self.beta = 2.0\n        # regularization strength\n        self.lambda_reg = 0.01\n\n    def forward(self, fine_loss, coarse_loss, epoch, total_epochs):\n        fine_weight = torch.exp(-self.log_vars[0])\n        coarse_weight = torch.exp(-self.log_vars[1])\n        # curriculum decay factor for coarse task\n        progress = epoch / total_epochs\n        decay = (1.0 - progress) ** self.beta\n        coarse_weight = coarse_weight * decay\n\n        total = (fine_weight * fine_loss + coarse_weight * coarse_loss +\n                 self.log_vars[0] + self.log_vars[1] +\n                 self.lambda_reg * (self.log_vars[0]**2 + self.log_vars[1]**2))\n        return total"
}

OK: Replaced lines 195..217 with 28 line(s) in pytorch-vision/custom_mtl.py. Editable range: 195–222.
[Current file: pytorch-vision/custom_mtl.py | editable: 195–222 | total: 393 lines]
192| # ============================================================================
193|
194| # -- EDITABLE REGION START (lines 195-216) ------------------------------------
195 class MultiTaskLoss(nn.Module):
196     """Hierarchical Adaptive Weighting with regularization (HAW-Reg).
197
198     Uncertainty weighting with curriculum decay on coarse task, plus L2 penalty on log-variances.
199     Total = exp(-log_var_fine) * L_fine + exp(-log_var_coarse) * (1 - epoch/total_epochs)^β * L_coarse
200           + log_var_fine + log_var_coarse + λ * (log_var_fine^2 + log_var_coarse^2)
201     """
202
203     def __init__(self, num_tasks=2):
204         super().__init__()
205         self.log_vars = nn.Parameter(torch.zeros(num_tasks))
206
[Test #2] (1 test remaining; call submit(n=N) to choose which test result to submit as final)

### resnet20-cifar100mt (scripts/resnet20_cifar100mt.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=-1.9509 train_acc=71.88 test_acc=61.86 lr=0.009549
TRAIN_METRICS: epoch=170 train_loss=-2.7809 train_acc=74.47 test_acc=64.64 lr=0.005450
TRAIN_METRICS: epoch=180 train_loss=-3.9269 train_acc=78.00 test_acc=64.98 lr=0.002447
TRAIN_METRICS: epoch=190 train_loss=-5.5384 train_acc=80.75 test_acc=66.63 lr=0.000616
TRAIN_METRICS: epoch=200 train_loss=-7.8515 train_acc=82.03 test_acc=67.05 lr=0.000000
Final metrics (resnet20-cifar100mt): test_acc_resnet20-cifar100mt=67.10

### resnet56-cifar100mt (scripts/resnet56_cifar100mt.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=-2.7788 train_acc=79.14 test_acc=66.01 lr=0.009549
TRAIN_METRICS: epoch=170 train_loss=-3.7797 train_acc=82.21 test_acc=67.59 lr=0.005450
TRAIN_METRICS: epoch=180 train_loss=-5.1890
Tool Input
{
  "op": "replace",
  "filename": "pytorch-vision/custom_mtl.py",
  "start_line": 195,
  "end_line": 222,
  "content": "class MultiTaskLoss(nn.Module):\n    \"\"\"Adaptive Task Weighting with Hierarchical Curriculum (ATWHC).\n\n    Learn task weights via softmax over learned logits, apply curriculum decay to coarse weight.\n    Total = w_fine * L_fine + w_coarse * (1 - epoch/total_epochs)^β * L_coarse.\n    \"\"\"\n    def __init__(self, num_tasks=2):\n        super().__init__()\n        self.logits = nn.Parameter(torch.zeros(num_tasks))\n        self.beta = 2.0\n        self.tau = 0.5  # temperature\n    def forward(self, fine_loss, coarse_loss, epoch, total_epochs):\n        weights = F.softmax(self.logits / self.tau, dim=0)\n        fine_weight = weights[0]\n        coarse_weight = weights[1] * (1.0 - epoch / total_epochs) ** self.beta\n        return fine_weight * fine_loss + coarse_weight * coarse_loss"
}

OK: Replaced lines 195..222 with 16 line(s) in pytorch-vision/custom_mtl.py. Editable range: 195–210.
[Current file: pytorch-vision/custom_mtl.py | editable: 195–210 | total: 381 lines]
192| # ============================================================================
193|
194| # -- EDITABLE REGION START (lines 195-216) ------------------------------------
195 class MultiTaskLoss(nn.Module):
196     """Adaptive Task Weighting with Hierarchical Curriculum (ATWHC).
197
198     Learn task weights via softmax over learned logits, apply curriculum decay to coarse weight.
199     Total = w_fine * L_fine + w_coarse * (1 - epoch/total_epochs)^β * L_coarse.
200     """
201     def __init__(self, num_tasks=2):
202         super().__init__()
203         self.logits = nn.Parameter(torch.zeros(num_tasks))
204         self.beta = 2.0
205         self.tau = 0.5  # temperature
206     def forward(self, fine_loss, coarse_loss, epoch, tota
[Test #3] (0 tests remaining; call submit(n=N) to choose which test result to submit as final)
[NOTE] This was your last test. You MUST now call submit(n=X) to choose which test result to submit as your final answer.

### resnet20-cifar100mt (scripts/resnet20_cifar100mt.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=0.0195 train_acc=26.51 test_acc=23.59 lr=0.009549
TRAIN_METRICS: epoch=170 train_loss=0.0108 train_acc=26.67 test_acc=23.86 lr=0.005450
TRAIN_METRICS: epoch=180 train_loss=0.0053 train_acc=26.63 test_acc=24.11 lr=0.002447
TRAIN_METRICS: epoch=190 train_loss=0.0019 train_acc=26.63 test_acc=24.04 lr=0.000616
TRAIN_METRICS: epoch=200 train_loss=0.0006 train_acc=26.63 test_acc=24.15 lr=0.000000
Final metrics (resnet20-cifar100mt): test_acc_resnet20-cifar100mt=24.69

### resnet56-cifar100mt (scripts/resnet56_cifar100mt.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=0.0077 train_acc=24.20 test_acc=21.72 lr=0.009549
TRAIN_METRIC
Tool Input
{
  "n": 1
}

Undo complete: Restored: pytorch-vision/custom_mtl.py
Tool Input
{
  "op": "replace",
  "filename": "pytorch-vision/custom_mtl.py",
  "start_line": 195,
  "end_line": 222,
  "content": "class MultiTaskLoss(nn.Module):\n    \"\"\"Hierarchical Adaptive Weighting with regularization (HAW-Reg).\n\n    Uncertainty weighting with curriculum decay on coarse task, plus L2 penalty on log-variances.\n    Total = exp(-log_var_fine) * L_fine + exp(-log_var_coarse) * (1 - epoch/total_epochs)^β * L_coarse\n          + log_var_fine + log_var_coarse + λ * (log_var_fine^2 + log_var_coarse^2)\n    \"\"\"\n\n    def __init__(self, num_tasks=2):\n        super().__init__()\n        self.log_vars = nn.Parameter(torch.zeros(num_tasks))\n        # beta controls decay strength; hardcoded\n        self.beta = 2.0\n        # regularization strength\n        self.lambda_reg = 0.01\n\n    def forward(self, fine_loss, coarse_loss, epoch, total_epochs):\n        fine_weight = torch.exp(-self.log_vars[0])\n        coarse_weight = torch.exp(-self.log_vars[1])\n        # curriculum decay factor for coarse task\n        progress = epoch / total_epochs\n        decay = (1.0 - progress) ** self.beta\n        coarse_weight = coarse_weight * decay\n\n        total = (fine_weight * fine_loss + coarse_weight * coarse_loss +\n                 self.log_vars[0] + self.log_vars[1] +\n                 self.lambda_reg * (self.log_vars[0]**2 + self.log_vars[1]**2))\n        return total"
}

OK: Replaced lines 195..222 with 28 line(s) in pytorch-vision/custom_mtl.py. Editable range: 195–222.
[Current file: pytorch-vision/custom_mtl.py | editable: 195–222 | total: 393 lines]
192| # ============================================================================
193|
194| # -- EDITABLE REGION START (lines 195-216) ------------------------------------
195 class MultiTaskLoss(nn.Module):
196     """Hierarchical Adaptive Weighting with regularization (HAW-Reg).
197
198     Uncertainty weighting with curriculum decay on coarse task, plus L2 penalty on log-variances.
199     Total = exp(-log_var_fine) * L_fine + exp(-log_var_coarse) * (1 - epoch/total_epochs)^β * L_coarse
200           + log_var_fine + log_var_coarse + λ * (log_var_fine^2 + log_var_coarse^2)
201     """
202
203     def __init__(self, num_tasks=2):
204         super().__init__()
205         self.log_vars = nn.Parameter(torch.zeros(num_tasks))
206
Tool Input
{
  "n": 2
}

[submit] Submitting result from test #2 as final.

### resnet20-cifar100mt (scripts/resnet20_cifar100mt.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=-1.9509 train_acc=71.88 test_acc=61.86 lr=0.009549
TRAIN_METRICS: epoch=170 train_loss=-2.7809 train_acc=74.47 test_acc=64.64 lr=0.005450
TRAIN_METRICS: epoch=180 train_loss=-3.9269 train_acc=78.00 test_acc=64.98 lr=0.002447
TRAIN_METRICS: epoch=190 train_loss=-5.5384 train_acc=80.75 test_acc=66.63 lr=0.000616
TRAIN_METRICS: epoch=200 train_loss=-7.8515 train_acc=82.03 test_acc=67.05 lr=0.000000
Final metrics (resnet20-cifar100mt): test_acc_resnet20-cifar100mt=67.10

### resnet56-cifar100mt (scripts/resnet56_cifar100mt.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=-2.7788 train_acc=79.14 test_acc=66.01 lr=0.009549
TRAIN_METRICS: epoch=170 train_loss=-3.7797 train_acc=82.21 test_acc=67.59 lr=0.005450
TRAIN_METRICS: epoch=180 train_loss=-5.1890 train_acc=85.41 test_acc=68.50 lr=0.002447