Agent Conversation: openai/gpt-5.4-pro

Model: openai/gpt-5.4-pro
Total steps: 6
Messages: 13
User (Step 0)

Task: ts-imputation

Time Series Imputation: Custom Model Design

Objective

Design and implement a custom deep learning model for time series missing value imputation. Your code goes in the Model class in models/Custom.py. Three reference implementations (DLinear, TimesNet, PatchTST) are provided as read-only.

Evaluation

Your model is trained and evaluated on three multivariate datasets, each with 25% of values randomly masked:

  • ETTh1 (7 variables)
  • Weather (21 variables)
  • ECL (321 variables)

All use seq_len=96. Metrics: MSE and MAE on masked regions only (lower is better).
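
The masked-region metrics can be sketched as a small helper. This is an illustrative reimplementation, not the evaluation harness's actual code; the function name `masked_mse_mae` is made up here, and it follows the log's convention that `mask` is 1 at observed positions and 0 at masked ones.

```python
import torch


def masked_mse_mae(pred, target, mask):
    """MSE/MAE over masked positions only (mask: 1 = observed, 0 = masked).

    pred, target, mask: [batch, seq_len, enc_in]
    """
    miss = 1.0 - mask                      # 1 exactly where values were hidden
    n = miss.sum().clamp_min(1.0)          # guard against an all-observed batch
    err = (pred - target) * miss           # zero out error at observed slots
    mse = (err ** 2).sum() / n
    mae = err.abs().sum() / n
    return mse.item(), mae.item()
```

Averaging only over `miss` matters: including the observed positions (where the model copies the input) would deflate both metrics.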

Time-Series-Library/models/Custom.py [EDITABLE — entire file only]

     1: import torch
     2: import torch.nn as nn
     3: 
     4: 
     5: class Model(nn.Module):
     6:     """
     7:     Custom model for time series imputation.
     8: 
     9:     Forward signature: forward(x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None)
    10:     - x_enc: [batch, seq_len, enc_in] — input with masked values set to 0
    11:     - x_mark_enc: [batch, seq_len, time_features] — time feature encoding
    12:     - x_dec: not used for imputation (None)
    13:     - x_mark_dec: not used for imputation (None)
    14:     - mask: [batch, seq_len, enc_in] — binary mask (1=observed, 0=masked)
    15: 
    16:     Must return: [batch, seq_len, enc_in] — reconstructed sequence
    17:     """
    18: 
    19:     def __init__(self, configs):
    20:         super(Model, self).__init__()
    21:         self.task_name = configs.task_name
    22:         self.seq_len = configs.seq_len
    23:         self.pred_len = configs.seq_len  # For imputation, pred_len = seq_len
    24:         self.enc_in = configs.enc_in
    25:         # TODO: Define your model architecture here
    26: 
    27:     def imputation(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask):
    28:         """
    29:         Imputation: reconstruct missing values in the input sequence.
    30:         Input: x_enc [batch, seq_len, enc_in] with zeros at masked positions
    31:         Mask: [batch, seq_len, enc_in], 1=observed, 0=masked
    32:         Output: [batch, seq_len, enc_in]
    33:         """
    34:         # TODO: Implement your imputation logic
    35:         return x_enc  # Placeholder: return input as-is
    36: 
    37:     def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):
    38:         if self.task_name == 'imputation':
    39:             dec_out = self.imputation(x_enc, x_mark_enc, x_dec, x_mark_dec, mask)
    40:             return dec_out
    41:         return None
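
One minimal way to fill in the skeleton's TODOs, before any real architecture, is a masked-mean baseline: impute each hidden entry with the per-channel mean of that sample's observed values. The class name `MeanFillModel` is hypothetical and it ignores `configs` and the time features; it only demonstrates the forward contract above.

```python
import torch
import torch.nn as nn


class MeanFillModel(nn.Module):
    """Hypothetical baseline for the imputation interface: fill masked
    entries with the per-channel mean of the observed values."""

    def __init__(self, configs=None):
        super().__init__()

    def imputation(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask):
        obs = mask.sum(dim=1, keepdim=True).clamp_min(1.0)    # observed count per channel
        mean = (x_enc * mask).sum(dim=1, keepdim=True) / obs  # mask-aware channel mean
        return mask * x_enc + (1.0 - mask) * mean             # keep observed, fill masked

    def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):
        if mask is None:
            mask = torch.ones_like(x_enc)
        return self.imputation(x_enc, x_mark_enc, x_dec, x_mark_dec, mask.float())
```

Note the `mask * x_enc + (1 - mask) * fill` pattern at the end: observed values pass through untouched, which the evaluation implicitly rewards since only masked regions are scored.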

Time-Series-Library/models/DLinear.py [READ-ONLY — do not edit]

     1: import torch
     2: import torch.nn as nn
     3: import torch.nn.functional as F
     4: from layers.Autoformer_EncDec import series_decomp
     5: 
     6: 
     7: class Model(nn.Module):
     8:     """
     9:     Paper link: https://arxiv.org/pdf/2205.13504.pdf
    10:     """
    11: 
    12:     def __ini
Assistant (Step 1, edit)
Tool Input
1{
2  "op": "replace",
3  "filename": "Time-Series-Library/models/Custom.py",
4  "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom layers.AutoCorrelation import AutoCorrelation, AutoCorrelationLayer\nfrom layers.Autoformer_EncDec import Encoder, EncoderLayer\nfrom layers.Embed import DataEmbedding, TokenEmbedding\n\n\nclass Model(nn.Module):\n    \"\"\"\n    Mask-Adaptive Expert Imputation with Innovation Propagation.\n\n    Key idea:\n    1) Build a strong nonparametric prior from observed values only using an\n       adaptive mixture of interpolation experts.\n    2) Predict only the residual innovation around that prior, instead of the\n       whole signal, using a periodic encoder and a low-rank channel adapter.\n    \"\"\"\n\n    def __init__(self, configs):\n        super(Model, self).__init__()\n        self.task_name = configs.task_name\n        self.seq_len = configs.seq_len\n        self.pred_len = configs.seq_len  # For imputation, pred_len = seq_len\n        self.enc_in = configs.enc_in\n        self.d_model = configs.d_model\n        self.dropout = configs.dropout\n        self.factor = configs.factor\n        self.n_heads = configs.n_heads\n        self.e_layers = configs.e_layers\n        self.d_ff = configs.d_ff\n        self.activation = configs.activation\n        self.embed = configs.embed\n        self.freq = configs.freq\n        self.moving_avg = getattr(configs, 'moving_avg', 25)\n        self.eps = 1e-5\n\n        # Adaptive interpolation experts: linear interpolation + multi-scale\n        # mask-normalized smoothing + zero-mean fallback.\n        self.kernel_sizes = [3, 7, 15, 31]\n        self.num_gate_features = len(self.kernel_sizes) + 3  # mask, span, one-sided + densities\n        self.num_experts = 1 + len(self.kernel_sizes) + 1\n\n        self.prior_gate = nn.Sequential(\n            nn.Linear(self.num_gate_features, 32),\n            nn.GELU(),\n            nn.Linear(32, self.num_experts)\n        )\n\n        # Residual branch on innovations around the prior.\n        
self.value_embedding = DataEmbedding(\n            self.enc_in, self.d_model, self.embed, self.freq, self.dropout\n        )\n        self.mask_embedding = TokenEmbedding(self.enc_in, self.d_model)\n        self.uncertainty_embedding = TokenEmbedding(self.enc_in, self.d_model)\n\n        attn_layers = [\n            EncoderLayer(\n                AutoCorrelationLayer(\n                    AutoCorrelation(\n                        mask_flag=False,\n                        factor=self.factor,\n                        attention_dropout=self.dropout,\n                        output_attention=False,\n                    ),\n                    self.d_model,\n                    self.n_heads,\n                ),\n                self.d_model,\n                self.d_ff,\n                moving_avg=self.moving_avg,\n                dropout=self.dropout,\n                activation=self.activation,\n            )\n            for _ in range(self.e_layers)\n        ]\n        self.encoder = Encoder(attn_layers, norm_layer=nn.LayerNorm(self.d_model))\n        self.delta_proj = nn.Linear(self.d_model, self.enc_in)\n\n        # Cross-variable residual propagation at each timestamp.\n        self.channel_rank = min(64, max(4, min(self.enc_in, self.d_model) // 2))\n        self.channel_norm = nn.LayerNorm(self.enc_in)\n        self.channel_down = nn.Linear(self.enc_in, self.channel_rank, bias=False)\n        self.channel_up = nn.Linear(self.channel_rank, self.enc_in, bias=False)\n\n        # Confidence gate decides how much residual correction to apply on top\n        # of the interpolation prior.\n        self.correction_gate = nn.Sequential(\n            nn.Linear(self.num_gate_features, 16),\n            nn.GELU(),\n            nn.Linear(16, 1)\n        )\n\n        self._reset_parameters()\n\n    def _reset_parameters(self):\n        # Prefer interpolation-like experts before learning starts.\n        with torch.no_grad():\n            expert_bias = torch.tensor([2.0, 1.0, 
0.5, 0.0, -0.5, -1.0], dtype=torch.float32)\n            self.prior_gate[-1].bias.copy_(expert_bias)\n            self.correction_gate[-1].bias.fill_(-0.5)\n\n        nn.init.normal_(self.delta_proj.weight, mean=0.0, std=1e-3)\n        nn.init.zeros_(self.delta_proj.bias)\n        nn.init.normal_(self.channel_up.weight, mean=0.0, std=1e-3)\n\n    def _masked_stats(self, x, mask):\n        obs_count = mask.sum(dim=1, keepdim=True).clamp_min(1.0)\n        mean = (x * mask).sum(dim=1, keepdim=True) / obs_count\n        centered = (x - mean) * mask\n        stdev = torch.sqrt((centered * centered).sum(dim=1, keepdim=True) / obs_count + self.eps)\n        return mean.detach(), stdev.detach()\n\n    def _masked_smooth(self, x, mask, kernel_size):\n        pad = kernel_size // 2\n        x_t = x.transpose(1, 2)\n        m_t = mask.transpose(1, 2)\n\n        weight = torch.ones(self.enc_in, 1, kernel_size, device=x.device, dtype=x.dtype)\n        num = F.conv1d(F.pad(x_t, (pad, pad)), weight, groups=self.enc_in)\n        den = F.conv1d(F.pad(m_t, (pad, pad)), weight, groups=self.enc_in)\n\n        avg = (num / den.clamp_min(self.eps)).transpose(1, 2)\n        density = (den / float(kernel_size)).clamp(0.0, 1.0).transpose(1, 2)\n        return avg, density\n\n    def _linear_interpolate(self, x, mask):\n        bsz, seq_len, channels = x.shape\n        large = float(seq_len + 1)\n\n        f_val = torch.zeros_like(x)\n        b_val = torch.zeros_like(x)\n        f_dist = torch.full_like(x, large)\n        b_dist = torch.full_like(x, large)\n\n        last_val = torch.zeros(bsz, channels, device=x.device, dtype=x.dtype)\n        last_dist = torch.full((bsz, channels), large, device=x.device, dtype=x.dtype)\n        for t in range(seq_len):\n            obs = mask[:, t, :] > 0.5\n            last_val = torch.where(obs, x[:, t, :], last_val)\n            last_dist = torch.where(obs, torch.zeros_like(last_dist), last_dist + 1.0)\n            f_val[:, t, :] = last_val\n          
  f_dist[:, t, :] = last_dist\n\n        next_val = torch.zeros(bsz, channels, device=x.device, dtype=x.dtype)\n        next_dist = torch.full((bsz, channels), large, device=x.device, dtype=x.dtype)\n        for t in range(seq_len - 1, -1, -1):\n            obs = mask[:, t, :] > 0.5\n            next_val = torch.where(obs, x[:, t, :], next_val)\n            next_dist = torch.where(obs, torch.zeros_like(next_dist), next_dist + 1.0)\n            b_val[:, t, :] = next_val\n            b_dist[:, t, :] = next_dist\n\n        f_valid = (f_dist < large).float()\n        b_valid = (b_dist < large).float()\n        both_valid = (f_valid * b_valid) > 0.5\n        only_f = (f_valid > 0.5) & (b_valid < 0.5)\n        only_b = (b_valid > 0.5) & (f_valid < 0.5)\n\n        interp = torch.zeros_like(x)\n        interp = torch.where(\n            both_valid,\n            (b_dist * f_val + f_dist * b_val) / (f_dist + b_dist).clamp_min(self.eps),\n            interp,\n        )\n        interp = torch.where(only_f, f_val, interp)\n        interp = torch.where(only_b, b_val, interp)\n        interp = torch.where(mask > 0.5, x, interp)\n\n        span = torch.where(\n            both_valid,\n            (f_dist + b_dist) / float(seq_len),\n            torch.ones_like(x),\n        )\n        span = torch.where(mask > 0.5, torch.zeros_like(span), span)\n        one_sided = ((f_valid + b_valid) < 1.5).float()\n        one_sided = torch.where(mask > 0.5, torch.zeros_like(one_sided), one_sided)\n        return interp, span.clamp(0.0, 1.0), one_sided\n\n    def _build_prior(self, x, mask):\n        interp, span, one_sided = self._linear_interpolate(x, mask)\n\n        experts = [interp]\n        densities = []\n        for kernel_size in self.kernel_sizes:\n            smooth, density = self._masked_smooth(x, mask, kernel_size)\n            experts.append(smooth)\n            densities.append(density)\n        experts.append(torch.zeros_like(x))\n\n        expert_stack = torch.stack(experts, 
dim=-1)\n        gate_features = torch.stack([mask, span, one_sided] + densities, dim=-1)\n        prior_weights = torch.softmax(self.prior_gate(gate_features), dim=-1)\n        prior = torch.sum(prior_weights * expert_stack, dim=-1)\n\n        density_stack = torch.stack(densities, dim=-1)\n        confidence = 0.5 * (1.0 - span) + 0.5 * density_stack.max(dim=-1).values\n        confidence = confidence.clamp(0.0, 1.0)\n        uncertainty = 1.0 - confidence\n        return prior, gate_features, uncertainty\n\n    def imputation(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask):\n        \"\"\"\n        Imputation: reconstruct missing values in the input sequence.\n        Input: x_enc [batch, seq_len, enc_in] with zeros at masked positions\n        Mask: [batch, seq_len, enc_in], 1=observed, 0=masked\n        Output: [batch, seq_len, enc_in]\n        \"\"\"\n        if mask is None:\n            mask = torch.ones_like(x_enc)\n        mask = mask.float()\n\n        means, stdev = self._masked_stats(x_enc, mask)\n        x_norm = ((x_enc - means) / stdev) * mask\n\n        prior, gate_features, uncertainty = self._build_prior(x_norm, mask)\n        filled = mask * x_norm + (1.0 - mask) * prior\n\n        # Innovation around the interpolation prior, only where we actually have\n        # observations. 
This turns the learning problem into propagating\n        # observed deviations to masked locations.\n        innovation = mask * (x_norm - prior)\n\n        enc_in = self.value_embedding(innovation, x_mark_enc)\n        enc_in = enc_in + self.mask_embedding(mask) + self.uncertainty_embedding(uncertainty)\n        enc_out, _ = self.encoder(enc_in, attn_mask=None)\n        delta_context = self.delta_proj(enc_out)\n\n        delta_channel = self.channel_up(self.channel_down(self.channel_norm(innovation)))\n\n        correction_gain = torch.sigmoid(self.correction_gate(gate_features)).squeeze(-1)\n        delta = correction_gain * (delta_context + delta_channel)\n\n        pred_norm = filled + (1.0 - mask) * delta\n        pred = pred_norm * stdev + means\n\n        # Keep observed values untouched and only impute masked entries.\n        pred = mask * x_enc + (1.0 - mask) * pred\n        return pred\n\n    def forward(self, x_enc, x_mark_e\n... (truncated)",
5  "after_line": 1,
6  "start_line": 1,
7  "end_line": 40
8}
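
The prior construction described in the payload's docstring, a softmax gate mixing several candidate fills per position, can be sketched standalone. `PriorGate` and all tensor shapes here are illustrative, not the exact module from the edit; the expert stack would hold e.g. the linear-interpolation and smoothing candidates.

```python
import torch
import torch.nn as nn


class PriorGate(nn.Module):
    """Sketch of a gated mixture of imputation experts: per-position
    softmax weights over K candidate fills (shapes are assumptions)."""

    def __init__(self, num_features, num_experts, hidden=32):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(num_features, hidden),
            nn.GELU(),
            nn.Linear(hidden, num_experts),
        )

    def forward(self, expert_stack, gate_features):
        # expert_stack:  [B, T, C, K] candidate imputations per position
        # gate_features: [B, T, C, F] gating inputs (mask, gap span, densities, ...)
        weights = torch.softmax(self.gate(gate_features), dim=-1)  # [B, T, C, K]
        return (weights * expert_stack).sum(dim=-1)                # [B, T, C]
```

Because the weights are a softmax, the prior is always a convex combination of the experts, which is what makes it a safe default for the residual branch to correct.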
Tool Result (Step 1)
OK: Replaced lines 1..40 with 246 line(s) in Time-Series-Library/models/Custom.py. Editable range: entire file.

[Current file: Time-Series-Library/models/Custom.py | editable: 1–247 | total: 247 lines]
     1  import torch
     2  import torch.nn as nn
     3  import torch.nn.functional as F
     4  
     5  from layers.AutoCorrelation import AutoCorrelation, AutoCorrelationLayer
     6  from layers.Autoformer_EncDec import Encoder, EncoderLayer
     7  from layers.Embed import DataEmbedding, TokenEmbedding
     8  
     9  
    10  class Model(nn.Module):
    11      """
    12      Mask-Adaptive Expert Imputation with Innovation Propagation.
    13  
    14      Key idea:
    15      1) Build a strong nonparametric prior from observed values only using an
    16         adaptive mixture of interpolation experts.
    17      2) Predict only the residual innovation around that prior, instead of the
    18         whole signal, using a periodic encoder and a low-rank channel adapter.
 
Assistant (Step 2, test)
Tool Result (Step 2)
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)

### ETTh1 (scripts/etth1.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=6 train_loss=0.0331812 vali_loss=0.0512693 test_loss=0.0424405
TRAIN_METRICS epoch=7 train_loss=0.0327520 vali_loss=0.0508572 test_loss=0.0422574
TRAIN_METRICS epoch=8 train_loss=0.0322304 vali_loss=0.0506850 test_loss=0.0421985
TRAIN_METRICS epoch=9 train_loss=0.0326539 vali_loss=0.0502444 test_loss=0.0418822
TRAIN_METRICS epoch=10 train_loss=0.0322635 vali_loss=0.0501821 test_loss=0.0422133
Test results (ETTh1):
  MSE: 0.042496, MAE: 0.137878

### Weather (scripts/weather.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=6 train_loss=0.1353455 vali_loss=0.0657196 test_loss=0.0257231
TRAIN_METRICS epoch=7 train_loss=0.1266806 vali_loss=0.0627417 test_loss=0.0258923
TRAIN_METRICS epoch=8 train_loss=0.1314173 vali_loss=0.0591142 test_loss=0.0258205
TRAIN_METRICS epoch=9 train_loss=0.1255150 vali_loss=0.0675052 test_loss=0.0259505
T
Assistant (Step 3, edit)
Tool Input
1{
2  "op": "replace",
3  "filename": "Time-Series-Library/models/Custom.py",
4  "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom layers.AutoCorrelation import AutoCorrelation, AutoCorrelationLayer\nfrom layers.Autoformer_EncDec import Encoder, EncoderLayer, series_decomp\nfrom layers.Embed import DataEmbedding, TokenEmbedding\n\n\nclass Model(nn.Module):\n    \"\"\"\n    Mask-Adaptive Expert Imputation with Innovation Propagation.\n\n    Key idea:\n    1) Build a strong nonparametric prior from observed values only using an\n       adaptive mixture of interpolation experts.\n    2) Predict only the residual innovation around that prior, instead of the\n       whole signal, using a periodic encoder and low-rank temporal/channel\n       propagation branches.\n    \"\"\"\n\n    def __init__(self, configs):\n        super(Model, self).__init__()\n        self.task_name = configs.task_name\n        self.seq_len = configs.seq_len\n        self.pred_len = configs.seq_len  # For imputation, pred_len = seq_len\n        self.enc_in = configs.enc_in\n        self.d_model = configs.d_model\n        self.dropout = configs.dropout\n        self.factor = configs.factor\n        self.n_heads = configs.n_heads\n        self.e_layers = configs.e_layers\n        self.d_ff = configs.d_ff\n        self.activation = configs.activation\n        self.embed = configs.embed\n        self.freq = configs.freq\n        self.moving_avg = getattr(configs, 'moving_avg', 25)\n        if self.moving_avg % 2 == 0:\n            self.moving_avg += 1\n        self.eps = 1e-5\n\n        # Adaptive interpolation experts: linear interpolation + multi-scale\n        # mask-normalized smoothing + zero-mean fallback.\n        self.kernel_sizes = [3, 7, 15, 31]\n        self.num_gate_features = len(self.kernel_sizes) + 3  # mask, span, one-sided + densities\n        self.num_experts = 1 + len(self.kernel_sizes) + 1\n\n        self.prior_gate = nn.Sequential(\n            nn.Linear(self.num_gate_features, 32),\n            nn.GELU(),\n            
nn.Linear(32, self.num_experts)\n        )\n\n        # Residual branch on innovations around the prior.\n        self.value_embedding = DataEmbedding(\n            self.enc_in, self.d_model, self.embed, self.freq, self.dropout\n        )\n        self.mask_embedding = TokenEmbedding(self.enc_in, self.d_model)\n        self.uncertainty_embedding = TokenEmbedding(self.enc_in, self.d_model)\n\n        attn_layers = [\n            EncoderLayer(\n                AutoCorrelationLayer(\n                    AutoCorrelation(\n                        mask_flag=False,\n                        factor=self.factor,\n                        attention_dropout=self.dropout,\n                        output_attention=False,\n                    ),\n                    self.d_model,\n                    self.n_heads,\n                ),\n                self.d_model,\n                self.d_ff,\n                moving_avg=self.moving_avg,\n                dropout=self.dropout,\n                activation=self.activation,\n            )\n            for _ in range(self.e_layers)\n        ]\n        self.encoder = Encoder(attn_layers, norm_layer=nn.LayerNorm(self.d_model))\n        self.delta_proj = nn.Linear(self.d_model, self.enc_in)\n\n        # Temporal propagation of observed innovations (DLinear-style but on the\n        # residual around the interpolation prior).\n        self.temporal_decomp = series_decomp(self.moving_avg)\n        self.linear_seasonal = nn.Linear(self.seq_len, self.seq_len)\n        self.linear_trend = nn.Linear(self.seq_len, self.seq_len)\n\n        # Cross-variable factor propagation at each timestamp.\n        self.channel_rank = min(64, max(4, min(self.enc_in, self.d_model) // 2))\n        self.channel_norm = nn.LayerNorm(self.enc_in)\n        self.channel_down = nn.Linear(self.enc_in, self.channel_rank, bias=False)\n        self.channel_up = nn.Linear(self.channel_rank, self.enc_in, bias=False)\n\n        # Confidence gate decides how much residual 
correction to apply on top\n        # of the interpolation prior.\n        self.correction_gate = nn.Sequential(\n            nn.Linear(self.num_gate_features, 16),\n            nn.GELU(),\n            nn.Linear(16, 1)\n        )\n\n        self._reset_parameters()\n\n    def _reset_parameters(self):\n        # Prefer interpolation-like experts before learning starts.\n        with torch.no_grad():\n            expert_bias = torch.tensor([2.0, 1.0, 0.5, 0.0, -0.5, -1.0], dtype=torch.float32)\n            self.prior_gate[-1].bias.copy_(expert_bias)\n            self.correction_gate[-1].bias.fill_(-0.25)\n\n            avg_init = (1.0 / self.seq_len) * torch.ones(self.seq_len, self.seq_len)\n            self.linear_seasonal.weight.copy_(avg_init)\n            self.linear_trend.weight.copy_(avg_init)\n            self.linear_seasonal.bias.zero_()\n            self.linear_trend.bias.zero_()\n\n        nn.init.normal_(self.delta_proj.weight, mean=0.0, std=1e-3)\n        nn.init.zeros_(self.delta_proj.bias)\n        nn.init.normal_(self.channel_up.weight, mean=0.0, std=1e-3)\n\n    def _masked_stats(self, x, mask):\n        obs_count = mask.sum(dim=1, keepdim=True).clamp_min(1.0)\n        mean = (x * mask).sum(dim=1, keepdim=True) / obs_count\n        centered = (x - mean) * mask\n        stdev = torch.sqrt((centered * centered).sum(dim=1, keepdim=True) / obs_count + self.eps)\n        return mean.detach(), stdev.detach()\n\n    def _masked_smooth(self, x, mask, kernel_size):\n        pad = kernel_size // 2\n        x_t = x.transpose(1, 2)\n        m_t = mask.transpose(1, 2)\n\n        weight = torch.ones(self.enc_in, 1, kernel_size, device=x.device, dtype=x.dtype)\n        num = F.conv1d(F.pad(x_t, (pad, pad), mode='replicate'), weight, groups=self.enc_in)\n        den = F.conv1d(F.pad(m_t, (pad, pad), mode='replicate'), weight, groups=self.enc_in)\n\n        avg = (num / den.clamp_min(self.eps)).transpose(1, 2)\n        density = (den / float(kernel_size)).clamp(0.0, 
1.0).transpose(1, 2)\n        return avg, density\n\n    def _linear_interpolate(self, x, mask):\n        bsz, seq_len, channels = x.shape\n        large = float(seq_len + 1)\n\n        f_val = torch.zeros_like(x)\n        b_val = torch.zeros_like(x)\n        f_dist = torch.full_like(x, large)\n        b_dist = torch.full_like(x, large)\n\n        last_val = torch.zeros(bsz, channels, device=x.device, dtype=x.dtype)\n        last_dist = torch.full((bsz, channels), large, device=x.device, dtype=x.dtype)\n        for t in range(seq_len):\n            obs = mask[:, t, :] > 0.5\n            last_val = torch.where(obs, x[:, t, :], last_val)\n            last_dist = torch.where(obs, torch.zeros_like(last_dist), last_dist + 1.0)\n            f_val[:, t, :] = last_val\n            f_dist[:, t, :] = last_dist\n\n        next_val = torch.zeros(bsz, channels, device=x.device, dtype=x.dtype)\n        next_dist = torch.full((bsz, channels), large, device=x.device, dtype=x.dtype)\n        for t in range(seq_len - 1, -1, -1):\n            obs = mask[:, t, :] > 0.5\n            next_val = torch.where(obs, x[:, t, :], next_val)\n            next_dist = torch.where(obs, torch.zeros_like(next_dist), next_dist + 1.0)\n            b_val[:, t, :] = next_val\n            b_dist[:, t, :] = next_dist\n\n        f_valid = (f_dist < large).float()\n        b_valid = (b_dist < large).float()\n        both_valid = (f_valid * b_valid) > 0.5\n        only_f = (f_valid > 0.5) & (b_valid < 0.5)\n        only_b = (b_valid > 0.5) & (f_valid < 0.5)\n\n        interp = torch.zeros_like(x)\n        interp = torch.where(\n            both_valid,\n            (b_dist * f_val + f_dist * b_val) / (f_dist + b_dist).clamp_min(self.eps),\n            interp,\n        )\n        interp = torch.where(only_f, f_val, interp)\n        interp = torch.where(only_b, b_val, interp)\n        interp = torch.where(mask > 0.5, x, interp)\n\n        span = torch.where(\n            both_valid,\n            (f_dist + 
b_dist) / float(seq_len),\n            torch.ones_like(x),\n        )\n        span = torch.where(mask > 0.5, torch.zeros_like(span), span)\n        one_sided = ((f_valid + b_valid) < 1.5).float()\n        one_sided = torch.where(mask > 0.5, torch.zeros_like(one_sided), one_sided)\n        return interp, span.clamp(0.0, 1.0), one_sided\n\n    def _build_prior(self, x, mask):\n        interp, span, one_sided = self._linear_interpolate(x, mask)\n\n        experts = [interp]\n        densities = []\n        for kernel_size in self.kernel_sizes:\n            smooth, density = self._masked_smooth(x, mask, kernel_size)\n            experts.append(smooth)\n            densities.append(density)\n        experts.append(torch.zeros_like(x))\n\n        expert_stack = torch.stack(experts, dim=-1)\n        gate_features = torch.stack([mask, span, one_sided] + densities, dim=-1)\n        prior_weights = torch.softmax(self.prior_gate(gate_features), dim=-1)\n        prior = torch.sum(prior_weights * expert_stack, dim=-1)\n\n        density_stack = torch.stack(densities, dim=-1)\n        confidence = 0.5 * (1.0 - span) + 0.5 * density_stack.max(dim=-1).values\n        confidence = confidence.clamp(0.0, 1.0)\n        uncertainty = 1.0 - confidence\n        return prior, gate_features, uncertainty\n\n    def imputation(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask):\n        \"\"\"\n        Imputation: reconstruct missing values in the input sequence.\n        Input: x_enc [batch, seq_len, enc_in] with zeros at masked positions\n        Mask: [batch, seq_len, enc_in], 1=observed, 0=masked\n        Output: [batch, seq_len, enc_in]\n        \"\"\"\n        if mask is None:\n            mask = torch.ones_like(x_enc)\n        mask = mask.float()\n\n        means, stdev = self._masked_stats(x_enc, mask)\n        x_norm = ((x_enc - means) / stdev) * mask\n\n        prior, gate_features, uncertainty = self._build_prior(x_norm, mask)\n        filled = mask * x_norm + (1.0 - mask) * 
prior\n\n        # Innovation around the interpolation prior, only where we actually have\n        # observations. This turns the learning problem into propagating\n        # observed deviations to masked locations.\n        innovation = mask * (x_norm - prior)\n\n        enc_in = self.\n... (truncated)",
5  "after_line": 1,
6  "start_line": 1,
7  "end_line": 247
8}
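
The `_masked_smooth` trick in the payload above, a mask-normalized moving average computed with one grouped `conv1d` each for the value sum and the observation count, can be sketched standalone. This is an illustrative reimplementation under the same conventions (mask 1 = observed), not the exact code from the edit.

```python
import torch
import torch.nn.functional as F


def masked_smooth(x, mask, kernel_size, eps=1e-5):
    """Mask-normalized moving average: windowed sum of observed values
    divided by the windowed count of observations."""
    pad = kernel_size // 2
    channels = x.shape[-1]
    x_t = (x * mask).transpose(1, 2)   # [B, C, T], zeros at masked slots
    m_t = mask.transpose(1, 2)
    weight = torch.ones(channels, 1, kernel_size, dtype=x.dtype, device=x.device)
    num = F.conv1d(F.pad(x_t, (pad, pad), mode='replicate'), weight, groups=channels)
    den = F.conv1d(F.pad(m_t, (pad, pad), mode='replicate'), weight, groups=channels)
    avg = num / den.clamp_min(eps)                       # ignores masked slots
    density = (den / float(kernel_size)).clamp(0.0, 1.0)  # fraction observed per window
    return avg.transpose(1, 2), density.transpose(1, 2)
```

Dividing by the observed count rather than `kernel_size` is the key point: a plain moving average over a zero-filled input would bias every window that overlaps a gap toward zero.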
Tool Result (Step 3)
OK: Replaced lines 1..247 with 266 line(s) in Time-Series-Library/models/Custom.py. Editable range: entire file.

[Current file: Time-Series-Library/models/Custom.py | editable: 1–266 | total: 266 lines]
     1  import torch
     2  import torch.nn as nn
     3  import torch.nn.functional as F
     4  
     5  from layers.AutoCorrelation import AutoCorrelation, AutoCorrelationLayer
     6  from layers.Autoformer_EncDec import Encoder, EncoderLayer, series_decomp
     7  from layers.Embed import DataEmbedding, TokenEmbedding
     8  
     9  
    10  class Model(nn.Module):
    11      """
    12      Mask-Adaptive Expert Imputation with Innovation Propagation.
    13  
    14      Key idea:
    15      1) Build a strong nonparametric prior from observed values only using an
    16         adaptive mixture of interpolation experts.
    17      2) Predict only the residual innovation around that prior, instead of the
    18         whole signal, using a periodic encoder and low-rank temp
Assistant (Step 4, test)
Tool Result (Step 4)
[Test #2] (1 test remaining; call submit(n=2) to submit this result)

### ETTh1 (scripts/etth1.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=5 train_loss=0.0387552 vali_loss=0.0550815 test_loss=0.0464709
TRAIN_METRICS epoch=6 train_loss=0.0375881 vali_loss=0.0539315 test_loss=0.0455713
TRAIN_METRICS epoch=7 train_loss=0.0366244 vali_loss=0.0545323 test_loss=0.0454442
TRAIN_METRICS epoch=8 train_loss=0.0368078 vali_loss=0.0545149 test_loss=0.0455774
TRAIN_METRICS epoch=9 train_loss=0.0366550 vali_loss=0.0547521 test_loss=0.0453765
Test results (ETTh1):
  MSE: 0.046623, MAE: 0.144486

### Weather (scripts/weather.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=5 train_loss=0.1339980 vali_loss=0.0557957 test_loss=0.0244370
TRAIN_METRICS epoch=6 train_loss=0.1362699 vali_loss=0.0556953 test_loss=0.0245191
TRAIN_METRICS epoch=7 train_loss=0.1216792 vali_loss=0.0604484 test_loss=0.0243143
TRAIN_METRICS epoch=8 train_loss=0.1454444 vali_loss=0.0616314 test_loss=0.0243583
TRA
Assistant (Step 5, edit)
Tool Input
1{
2  "op": "replace",
3  "filename": "Time-Series-Library/models/Custom.py",
4  "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom layers.AutoCorrelation import AutoCorrelation, AutoCorrelationLayer\nfrom layers.Autoformer_EncDec import Encoder, EncoderLayer, series_decomp\nfrom layers.Embed import DataEmbedding, TokenEmbedding\n\n\nclass Model(nn.Module):\n    \"\"\"\n    Mask-Adaptive Expert Imputation with Innovation Propagation.\n\n    Key idea:\n    1) Build a strong nonparametric prior from observed values only using an\n       adaptive mixture of interpolation experts.\n    2) Predict only the residual innovation around that prior, instead of the\n       whole signal, using a periodic encoder and low-rank temporal/channel\n       propagation branches.\n    \"\"\"\n\n    def __init__(self, configs):\n        super(Model, self).__init__()\n        self.task_name = configs.task_name\n        self.seq_len = configs.seq_len\n        self.pred_len = configs.seq_len  # For imputation, pred_len = seq_len\n        self.enc_in = configs.enc_in\n        self.d_model = configs.d_model\n        self.dropout = configs.dropout\n        self.factor = configs.factor\n        self.n_heads = configs.n_heads\n        self.e_layers = configs.e_layers\n        self.d_ff = configs.d_ff\n        self.activation = configs.activation\n        self.embed = configs.embed\n        self.freq = configs.freq\n        self.moving_avg = getattr(configs, 'moving_avg', 25)\n        if self.moving_avg % 2 == 0:\n            self.moving_avg += 1\n        self.eps = 1e-5\n\n        # Adaptive interpolation experts: linear interpolation + multi-scale\n        # mask-normalized smoothing + zero-mean fallback.\n        self.kernel_sizes = [3, 7, 15, 31]\n        self.num_gate_features = len(self.kernel_sizes) + 3  # mask, span, one-sided + densities\n        self.num_experts = 1 + len(self.kernel_sizes) + 1\n\n        self.prior_gate = nn.Sequential(\n            nn.Linear(self.num_gate_features, 32),\n            nn.GELU(),\n            
nn.Linear(32, self.num_experts)
        )

        # Residual branch on innovations around the prior.
        self.value_embedding = DataEmbedding(
            self.enc_in, self.d_model, self.embed, self.freq, self.dropout
        )
        self.mask_embedding = TokenEmbedding(self.enc_in, self.d_model)
        self.uncertainty_embedding = TokenEmbedding(self.enc_in, self.d_model)

        attn_layers = [
            EncoderLayer(
                AutoCorrelationLayer(
                    AutoCorrelation(
                        mask_flag=False,
                        factor=self.factor,
                        attention_dropout=self.dropout,
                        output_attention=False,
                    ),
                    self.d_model,
                    self.n_heads,
                ),
                self.d_model,
                self.d_ff,
                moving_avg=self.moving_avg,
                dropout=self.dropout,
                activation=self.activation,
            )
            for _ in range(self.e_layers)
        ]
        self.encoder = Encoder(attn_layers, norm_layer=nn.LayerNorm(self.d_model))
        self.delta_proj = nn.Linear(self.d_model, self.enc_in)

        # Temporal propagation of observed innovations (DLinear-style but on the
        # residual around the interpolation prior).
        self.temporal_decomp = series_decomp(self.moving_avg)
        self.linear_seasonal = nn.Linear(self.seq_len, self.seq_len)
        self.linear_trend = nn.Linear(self.seq_len, self.seq_len)
        self.temporal_scale = 0.5

        # Cross-variable factor propagation at each timestamp. We mix the full
        # filled signal and the residual innovation more aggressively when the
        # number of variables is large, since high-dimensional panels benefit
        # more from contemporaneous cross-series factors.
        self.channel_rank = min(64, max(4, min(self.enc_in, self.d_model) // 2))
        self.channel_norm = nn.LayerNorm(self.enc_in)
        self.channel_down = nn.Linear(self.enc_in, self.channel_rank, bias=False)
        self.channel_up = nn.Linear(self.channel_rank, self.enc_in, bias=False)
        self.channel_mix = float(self.enc_in) / float(self.enc_in + 16)
        self.channel_scale = min(1.0, float(self.enc_in) / 32.0)

        # Confidence gate decides how much residual correction to apply on top
        # of the interpolation prior.
        self.correction_gate = nn.Sequential(
            nn.Linear(self.num_gate_features, 16),
            nn.GELU(),
            nn.Linear(16, 1)
        )

        self._reset_parameters()

    def _reset_parameters(self):
        # Prefer interpolation-like experts before learning starts.
        with torch.no_grad():
            expert_bias = torch.tensor([2.0, 1.0, 0.5, 0.0, -0.5, -1.0], dtype=torch.float32)
            self.prior_gate[-1].bias.copy_(expert_bias)
            self.correction_gate[-1].bias.fill_(-0.25)

            avg_init = (1.0 / self.seq_len) * torch.ones(self.seq_len, self.seq_len)
            self.linear_seasonal.weight.copy_(avg_init)
            self.linear_trend.weight.copy_(avg_init)
            self.linear_seasonal.bias.zero_()
            self.linear_trend.bias.zero_()

        nn.init.normal_(self.delta_proj.weight, mean=0.0, std=1e-3)
        nn.init.zeros_(self.delta_proj.bias)
        nn.init.normal_(self.channel_up.weight, mean=0.0, std=1e-3)

    def _masked_stats(self, x, mask):
        obs_count = mask.sum(dim=1, keepdim=True).clamp_min(1.0)
        mean = (x * mask).sum(dim=1, keepdim=True) / obs_count
        centered = (x - mean) * mask
        stdev = torch.sqrt((centered * centered).sum(dim=1, keepdim=True) / obs_count + self.eps)
        return mean.detach(), stdev.detach()

    def _masked_smooth(self, x, mask, kernel_size):
        pad = kernel_size // 2
        x_t = x.transpose(1, 2)
        m_t = mask.transpose(1, 2)

        weight = torch.ones(self.enc_in, 1, kernel_size, device=x.device, dtype=x.dtype)
        num = F.conv1d(F.pad(x_t, (pad, pad), mode='replicate'), weight, groups=self.enc_in)
        den = F.conv1d(F.pad(m_t, (pad, pad), mode='replicate'), weight, groups=self.enc_in)

        avg = (num / den.clamp_min(self.eps)).transpose(1, 2)
        density = (den / float(kernel_size)).clamp(0.0, 1.0).transpose(1, 2)
        return avg, density

    def _linear_interpolate(self, x, mask):
        bsz, seq_len, channels = x.shape
        large = float(seq_len + 1)

        f_val = torch.zeros_like(x)
        b_val = torch.zeros_like(x)
        f_dist = torch.full_like(x, large)
        b_dist = torch.full_like(x, large)

        last_val = torch.zeros(bsz, channels, device=x.device, dtype=x.dtype)
        last_dist = torch.full((bsz, channels), large, device=x.device, dtype=x.dtype)
        for t in range(seq_len):
            obs = mask[:, t, :] > 0.5
            last_val = torch.where(obs, x[:, t, :], last_val)
            last_dist = torch.where(obs, torch.zeros_like(last_dist), last_dist + 1.0)
            f_val[:, t, :] = last_val
            f_dist[:, t, :] = last_dist

        next_val = torch.zeros(bsz, channels, device=x.device, dtype=x.dtype)
        next_dist = torch.full((bsz, channels), large, device=x.device, dtype=x.dtype)
        for t in range(seq_len - 1, -1, -1):
            obs = mask[:, t, :] > 0.5
            next_val = torch.where(obs, x[:, t, :], next_val)
            next_dist = torch.where(obs, torch.zeros_like(next_dist), next_dist + 1.0)
            b_val[:, t, :] = next_val
            b_dist[:, t, :] = next_dist

        f_valid = (f_dist < large).float()
        b_valid = (b_dist < large).float()
        both_valid = (f_valid * b_valid) > 0.5
        only_f = (f_valid > 0.5) & (b_valid < 0.5)
        only_b = (b_valid > 0.5) & (f_valid < 0.5)

        interp = torch.zeros_like(x)
        interp = torch.where(
            both_valid,
            (b_dist * f_val + f_dist * b_val) / (f_dist + b_dist).clamp_min(self.eps),
            interp,
        )
        interp = torch.where(only_f, f_val, interp)
        interp = torch.where(only_b, b_val, interp)
        interp = torch.where(mask > 0.5, x, interp)

        span = torch.where(
            both_valid,
            (f_dist + b_dist) / float(seq_len),
            torch.ones_like(x),
        )
        span = torch.where(mask > 0.5, torch.zeros_like(span), span)
        one_sided = ((f_valid + b_valid) < 1.5).float()
        one_sided = torch.where(mask > 0.5, torch.zeros_like(one_sided), one_sided)
        return interp, span.clamp(0.0, 1.0), one_sided

    def _build_prior(self, x, mask):
        interp, span, one_sided = self._linear_interpolate(x, mask)

        experts = [interp]
        densities = []
        for kernel_size in self.kernel_sizes:
            smooth, density = self._masked_smooth(x, mask, kernel_size)
            experts.append(smooth)
            densities.append(density)
        experts.append(torch.zeros_like(x))

        expert_stack = torch.stack(experts, dim=-1)
        gate_features = torch.stack([mask, span, one_sided] + densities, dim=-1)
        prior_weights = torch.softmax(self.prior_gate(gate_features), dim=-1)
        prior = torch.sum(prior_weights * expert_stack, dim=-1)

        density_stack = torch.stack(densities, dim=-1)
        confidence = 0.5 * (1.0 - span) + 0.5 * density_stack.max(dim=-1).values
        confidence = confidence.clamp(0.0, 1.0)
        uncertainty = 1.0 - confidence
        return prior, gate_features, uncertainty

    def imputation(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask):
        """
        Imputation: reconstruct missing values in the input sequence.
        Input: x_enc [batch, seq_len, enc_in] with zeros at masked positions
        Mask: [batch, seq_len, enc_in], 1=observed, 0=masked
        Output: [batch, seq_len, enc_in]
        """
        if mask is None:
            mask = torch.ones_like(x_enc)
        mask = mask.float()

        means, stdev = self._masked_stats(x_enc, mask)
        x_norm = ((x_enc - means) / stdev) * mask

... (truncated)",
  "after_line": 1,
  "start_line": 1,
  "end_line": 266
}
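The two-pass distance scan in `_linear_interpolate` above is the core of the nonparametric prior. A minimal pure-Python sketch of the same idea (plain lists instead of `[batch, seq_len, enc_in]` tensors; the function name and list-based interface are illustrative, not the model's API):

```python
def linear_interpolate(values, observed):
    """Distance-weighted linear interpolation of masked entries.

    values:   list of floats (masked entries may hold anything, e.g. 0.0)
    observed: list of 0/1 flags, 1 = observed
    A masked point between two observations gets
        (b_dist * f_val + f_dist * b_val) / (f_dist + b_dist),
    so the closer neighbour receives the larger weight; points with only
    one observed side copy that side's value, mirroring the model's
    forward and backward scans.
    """
    n = len(values)
    big = n + 1  # sentinel distance, same role as `large` in the model
    f_val, f_dist = [0.0] * n, [big] * n
    b_val, b_dist = [0.0] * n, [big] * n

    last_val, last_dist = 0.0, big
    for t in range(n):  # forward scan: last observed value and its distance
        if observed[t]:
            last_val, last_dist = values[t], 0
        elif last_dist < big:
            last_dist += 1
        f_val[t], f_dist[t] = last_val, last_dist

    next_val, next_dist = 0.0, big
    for t in range(n - 1, -1, -1):  # backward scan: next observed value
        if observed[t]:
            next_val, next_dist = values[t], 0
        elif next_dist < big:
            next_dist += 1
        b_val[t], b_dist[t] = next_val, next_dist

    out = []
    for t in range(n):
        if observed[t]:
            out.append(values[t])  # observed values pass through untouched
        elif f_dist[t] < big and b_dist[t] < big:
            w = f_dist[t] + b_dist[t]
            out.append((b_dist[t] * f_val[t] + f_dist[t] * b_val[t]) / w)
        elif f_dist[t] < big:
            out.append(f_val[t])   # trailing gap: hold the last observation
        elif b_dist[t] < big:
            out.append(b_val[t])   # leading gap: copy the next observation
        else:
            out.append(0.0)        # fully masked channel, like the zero expert
    return out
```

For example, `linear_interpolate([1.0, 0.0, 0.0, 4.0], [1, 0, 0, 1])` fills the two hidden points at 2.0 and 3.0.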
Tool ResultStep 5
OK: Replaced lines 1..266 with 275 line(s) in Time-Series-Library/models/Custom.py. Editable range: entire file.

[Current file: Time-Series-Library/models/Custom.py | editable: 1–275 | total: 275 lines]
     1  import torch
     2  import torch.nn as nn
     3  import torch.nn.functional as F
     4  
     5  from layers.AutoCorrelation import AutoCorrelation, AutoCorrelationLayer
     6  from layers.Autoformer_EncDec import Encoder, EncoderLayer, series_decomp
     7  from layers.Embed import DataEmbedding, TokenEmbedding
     8  
     9  
    10  class Model(nn.Module):
    11      """
    12      Mask-Adaptive Expert Imputation with Innovation Propagation.
    13  
    14      Key idea:
    15      1) Build a strong nonparametric prior from observed values only using an
    16         adaptive mixture of interpolation experts.
    17      2) Predict only the residual innovation around that prior, instead of the
    18         whole signal, using a periodic encoder and low-rank temp
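The combination step the docstring describes (predict only a residual innovation around the prior, gated by confidence) can be sketched in isolation. This is a toy reduction, not the model's exact tensor flow: the scalar sigmoid gate and the flat-list interface are assumptions for illustration.

```python
import math

def gated_residual_impute(prior, correction, gate_logit, observed, x):
    """Toy version of 'prior plus gated innovation' imputation.

    prior:      interpolation-based estimate per position
    correction: residual innovation predicted by the encoder
    gate_logit: confidence logit; sigmoid(gate_logit) scales the correction
    observed:   0/1 flags (1 = observed); observed inputs pass through
    """
    gate = 1.0 / (1.0 + math.exp(-gate_logit))  # sigmoid confidence gate
    out = []
    for p, c, m, v in zip(prior, correction, observed, x):
        filled = p + gate * c           # prior plus gated residual correction
        out.append(v if m else filled)  # never overwrite observed values
    return out
```

Initialising the gate bias slightly negative (as `_reset_parameters` does with `-0.25`) starts training close to the pure interpolation prior, so the learned correction is phased in gradually.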
AssistantStep 6
test

Tool ResultStep 6
[Test #3] (0 tests remaining)

### ETTh1 (scripts/etth1.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=6 train_loss=0.0360194 vali_loss=0.0518446 test_loss=0.0448159
TRAIN_METRICS epoch=7 train_loss=0.0350492 vali_loss=0.0517632 test_loss=0.0443042
TRAIN_METRICS epoch=8 train_loss=0.0352486 vali_loss=0.0517983 test_loss=0.0445040
TRAIN_METRICS epoch=9 train_loss=0.0351590 vali_loss=0.0518177 test_loss=0.0442372
TRAIN_METRICS epoch=10 train_loss=0.0350072 vali_loss=0.0517611 test_loss=0.0443840
Test results (ETTh1):
  MSE: 0.044474, MAE: 0.141060
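The reported MSE/MAE are computed on masked regions only (mask value 0 = hidden, to be reconstructed). A minimal sketch of that evaluation, using flat lists in place of `[batch, seq_len, enc_in]` tensors; the function name is illustrative:

```python
def masked_mse_mae(pred, target, mask):
    """MSE and MAE over masked (imputed) positions only.

    mask follows the task convention: 1 = observed, 0 = masked,
    so only entries with mask == 0 contribute to the metrics.
    """
    errs = [p - t for p, t, m in zip(pred, target, mask) if m == 0]
    if not errs:
        return 0.0, 0.0  # nothing was masked
    mse = sum(e * e for e in errs) / len(errs)
    mae = sum(abs(e) for e in errs) / len(errs)
    return mse, mae
```

Observed positions are excluded deliberately: the model copies them through, so including them would dilute both metrics toward zero.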

### Weather (scripts/weather.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=5 train_loss=0.1341699 vali_loss=0.0558085 test_loss=0.0245149
TRAIN_METRICS epoch=6 train_loss=0.1363803 vali_loss=0.0557269 test_loss=0.0246034
TRAIN_METRICS epoch=7 train_loss=0.1218875 vali_loss=0.0603000 test_loss=0.0244031
TRAIN_METRICS epoch=8 train_loss=0.1455532 vali_loss=0.0616333 test_loss=0.0244405
TRAIN_METRICS epoch=9 train_loss=0.125665