Agent Conversation: openai/gpt-5.4-pro
Model: openai/gpt-5.4-pro
Total steps: 6
Messages: 13
User (Step 0)
Task: ts-imputation
Time Series Imputation: Custom Model Design
Objective
Design and implement a custom deep learning model for time series missing value imputation. Your code goes in the Model class in models/Custom.py. Three reference implementations (DLinear, TimesNet, PatchTST) are provided as read-only.
Evaluation
Trained and evaluated on three multivariate datasets with 25% random masking:
- ETTh1 (7 variables)
- Weather (21 variables)
- ECL (321 variables)
All use seq_len=96. Metrics: MSE and MAE on masked regions only (lower is better).
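For concreteness, the masked-region metrics amount to averaging squared and absolute errors over the hidden entries only; a minimal sketch (illustrative helper, not the evaluation harness's actual code):

```python
import torch

def masked_mse_mae(pred, true, mask):
    """MSE/MAE over masked positions only (mask: 1=observed, 0=masked)."""
    missing = 1.0 - mask                      # evaluate where values were hidden
    n = missing.sum().clamp_min(1.0)          # guard against division by zero
    err = (pred - true) * missing
    mse = (err ** 2).sum() / n
    mae = err.abs().sum() / n
    return mse.item(), mae.item()

# Example: batch of 8, seq_len=96, 7 variables (ETTh1-like), 25% random masking
pred = torch.randn(8, 96, 7)
true = torch.randn(8, 96, 7)
mask = (torch.rand(8, 96, 7) >= 0.25).float()
print(masked_mse_mae(pred, true, mask))
```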
Time-Series-Library/models/Custom.py [EDITABLE — entire file only]
1: import torch
2: import torch.nn as nn
3:
4:
5: class Model(nn.Module):
6:     """
7:     Custom model for time series imputation.
8:
9:     Forward signature: forward(x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None)
10:     - x_enc: [batch, seq_len, enc_in] — input with masked values set to 0
11:     - x_mark_enc: [batch, seq_len, time_features] — time feature encoding
12:     - x_dec: not used for imputation (None)
13:     - x_mark_dec: not used for imputation (None)
14:     - mask: [batch, seq_len, enc_in] — binary mask (1=observed, 0=masked)
15:
16:     Must return: [batch, seq_len, enc_in] — reconstructed sequence
17:     """
18:
19:     def __init__(self, configs):
20:         super(Model, self).__init__()
21:         self.task_name = configs.task_name
22:         self.seq_len = configs.seq_len
23:         self.pred_len = configs.seq_len  # For imputation, pred_len = seq_len
24:         self.enc_in = configs.enc_in
25:         # TODO: Define your model architecture here
26:
27:     def imputation(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask):
28:         """
29:         Imputation: reconstruct missing values in the input sequence.
30:         Input: x_enc [batch, seq_len, enc_in] with zeros at masked positions
31:         Mask: [batch, seq_len, enc_in], 1=observed, 0=masked
32:         Output: [batch, seq_len, enc_in]
33:         """
34:         # TODO: Implement your imputation logic
35:         return x_enc  # Placeholder: return input as-is
36:
37:     def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):
38:         if self.task_name == 'imputation':
39:             dec_out = self.imputation(x_enc, x_mark_enc, x_dec, x_mark_dec, mask)
40:             return dec_out
41:         return None
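For context, a hedged sketch of how a harness might drive this skeleton (the real Time-Series-Library experiment loop differs in details such as time-feature construction; shapes follow the docstring above):

```python
import torch
from types import SimpleNamespace

from models.Custom import Model  # run from the Time-Series-Library root

# Minimal config carrying only the attributes the skeleton reads.
configs = SimpleNamespace(task_name='imputation', seq_len=96, enc_in=7)
model = Model(configs)

batch = torch.randn(4, configs.seq_len, configs.enc_in)   # ground-truth window
mask = (torch.rand_like(batch) >= 0.25).float()           # 25% random masking
x_enc = batch * mask                                      # masked values set to 0
x_mark = torch.zeros(4, configs.seq_len, 4)               # placeholder time features

out = model(x_enc, x_mark, None, None, mask=mask)         # [4, 96, 7]
loss = (((out - batch) * (1 - mask)) ** 2).sum() / (1 - mask).sum().clamp_min(1.0)
```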
Time-Series-Library/models/DLinear.py [READ-ONLY — do not edit]
1: import torch
2: import torch.nn as nn
3: import torch.nn.functional as F
4: from layers.Autoformer_EncDec import series_decomp
5:
6:
7: class Model(nn.Module):
8: """
9: Paper link: https://arxiv.org/pdf/2205.13504.pdf
10: """
11:
12: def __iniAssistantStep 1edit
Tool Input
1{
2 "op": "replace",
3 "filename": "Time-Series-Library/models/Custom.py",
4 "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom layers.AutoCorrelation import AutoCorrelation, AutoCorrelationLayer\nfrom layers.Autoformer_EncDec import Encoder, EncoderLayer\nfrom layers.Embed import DataEmbedding, TokenEmbedding\n\n\nclass Model(nn.Module):\n \"\"\"\n Mask-Adaptive Expert Imputation with Innovation Propagation.\n\n Key idea:\n 1) Build a strong nonparametric prior from observed values only using an\n adaptive mixture of interpolation experts.\n 2) Predict only the residual innovation around that prior, instead of the\n whole signal, using a periodic encoder and a low-rank channel adapter.\n \"\"\"\n\n def __init__(self, configs):\n super(Model, self).__init__()\n self.task_name = configs.task_name\n self.seq_len = configs.seq_len\n self.pred_len = configs.seq_len # For imputation, pred_len = seq_len\n self.enc_in = configs.enc_in\n self.d_model = configs.d_model\n self.dropout = configs.dropout\n self.factor = configs.factor\n self.n_heads = configs.n_heads\n self.e_layers = configs.e_layers\n self.d_ff = configs.d_ff\n self.activation = configs.activation\n self.embed = configs.embed\n self.freq = configs.freq\n self.moving_avg = getattr(configs, 'moving_avg', 25)\n self.eps = 1e-5\n\n # Adaptive interpolation experts: linear interpolation + multi-scale\n # mask-normalized smoothing + zero-mean fallback.\n self.kernel_sizes = [3, 7, 15, 31]\n self.num_gate_features = len(self.kernel_sizes) + 3 # mask, span, one-sided + densities\n self.num_experts = 1 + len(self.kernel_sizes) + 1\n\n self.prior_gate = nn.Sequential(\n nn.Linear(self.num_gate_features, 32),\n nn.GELU(),\n nn.Linear(32, self.num_experts)\n )\n\n # Residual branch on innovations around the prior.\n self.value_embedding = DataEmbedding(\n self.enc_in, self.d_model, self.embed, self.freq, self.dropout\n )\n self.mask_embedding = TokenEmbedding(self.enc_in, self.d_model)\n self.uncertainty_embedding = TokenEmbedding(self.enc_in, self.d_model)\n\n attn_layers = [\n EncoderLayer(\n AutoCorrelationLayer(\n AutoCorrelation(\n mask_flag=False,\n factor=self.factor,\n attention_dropout=self.dropout,\n output_attention=False,\n ),\n self.d_model,\n self.n_heads,\n ),\n self.d_model,\n self.d_ff,\n moving_avg=self.moving_avg,\n dropout=self.dropout,\n activation=self.activation,\n )\n for _ in range(self.e_layers)\n ]\n self.encoder = Encoder(attn_layers, norm_layer=nn.LayerNorm(self.d_model))\n self.delta_proj = nn.Linear(self.d_model, self.enc_in)\n\n # Cross-variable residual propagation at each timestamp.\n self.channel_rank = min(64, max(4, min(self.enc_in, self.d_model) // 2))\n self.channel_norm = nn.LayerNorm(self.enc_in)\n self.channel_down = nn.Linear(self.enc_in, self.channel_rank, bias=False)\n self.channel_up = nn.Linear(self.channel_rank, self.enc_in, bias=False)\n\n # Confidence gate decides how much residual correction to apply on top\n # of the interpolation prior.\n self.correction_gate = nn.Sequential(\n nn.Linear(self.num_gate_features, 16),\n nn.GELU(),\n nn.Linear(16, 1)\n )\n\n self._reset_parameters()\n\n def _reset_parameters(self):\n # Prefer interpolation-like experts before learning starts.\n with torch.no_grad():\n expert_bias = torch.tensor([2.0, 1.0, 0.5, 0.0, -0.5, -1.0], dtype=torch.float32)\n self.prior_gate[-1].bias.copy_(expert_bias)\n self.correction_gate[-1].bias.fill_(-0.5)\n\n nn.init.normal_(self.delta_proj.weight, mean=0.0, std=1e-3)\n nn.init.zeros_(self.delta_proj.bias)\n nn.init.normal_(self.channel_up.weight, mean=0.0, 
std=1e-3)\n\n def _masked_stats(self, x, mask):\n obs_count = mask.sum(dim=1, keepdim=True).clamp_min(1.0)\n mean = (x * mask).sum(dim=1, keepdim=True) / obs_count\n centered = (x - mean) * mask\n stdev = torch.sqrt((centered * centered).sum(dim=1, keepdim=True) / obs_count + self.eps)\n return mean.detach(), stdev.detach()\n\n def _masked_smooth(self, x, mask, kernel_size):\n pad = kernel_size // 2\n x_t = x.transpose(1, 2)\n m_t = mask.transpose(1, 2)\n\n weight = torch.ones(self.enc_in, 1, kernel_size, device=x.device, dtype=x.dtype)\n num = F.conv1d(F.pad(x_t, (pad, pad)), weight, groups=self.enc_in)\n den = F.conv1d(F.pad(m_t, (pad, pad)), weight, groups=self.enc_in)\n\n avg = (num / den.clamp_min(self.eps)).transpose(1, 2)\n density = (den / float(kernel_size)).clamp(0.0, 1.0).transpose(1, 2)\n return avg, density\n\n def _linear_interpolate(self, x, mask):\n bsz, seq_len, channels = x.shape\n large = float(seq_len + 1)\n\n f_val = torch.zeros_like(x)\n b_val = torch.zeros_like(x)\n f_dist = torch.full_like(x, large)\n b_dist = torch.full_like(x, large)\n\n last_val = torch.zeros(bsz, channels, device=x.device, dtype=x.dtype)\n last_dist = torch.full((bsz, channels), large, device=x.device, dtype=x.dtype)\n for t in range(seq_len):\n obs = mask[:, t, :] > 0.5\n last_val = torch.where(obs, x[:, t, :], last_val)\n last_dist = torch.where(obs, torch.zeros_like(last_dist), last_dist + 1.0)\n f_val[:, t, :] = last_val\n f_dist[:, t, :] = last_dist\n\n next_val = torch.zeros(bsz, channels, device=x.device, dtype=x.dtype)\n next_dist = torch.full((bsz, channels), large, device=x.device, dtype=x.dtype)\n for t in range(seq_len - 1, -1, -1):\n obs = mask[:, t, :] > 0.5\n next_val = torch.where(obs, x[:, t, :], next_val)\n next_dist = torch.where(obs, torch.zeros_like(next_dist), next_dist + 1.0)\n b_val[:, t, :] = next_val\n b_dist[:, t, :] = next_dist\n\n f_valid = (f_dist < large).float()\n b_valid = (b_dist < large).float()\n both_valid = (f_valid * b_valid) > 0.5\n only_f = (f_valid > 0.5) & (b_valid < 0.5)\n only_b = (b_valid > 0.5) & (f_valid < 0.5)\n\n interp = torch.zeros_like(x)\n interp = torch.where(\n both_valid,\n (b_dist * f_val + f_dist * b_val) / (f_dist + b_dist).clamp_min(self.eps),\n interp,\n )\n interp = torch.where(only_f, f_val, interp)\n interp = torch.where(only_b, b_val, interp)\n interp = torch.where(mask > 0.5, x, interp)\n\n span = torch.where(\n both_valid,\n (f_dist + b_dist) / float(seq_len),\n torch.ones_like(x),\n )\n span = torch.where(mask > 0.5, torch.zeros_like(span), span)\n one_sided = ((f_valid + b_valid) < 1.5).float()\n one_sided = torch.where(mask > 0.5, torch.zeros_like(one_sided), one_sided)\n return interp, span.clamp(0.0, 1.0), one_sided\n\n def _build_prior(self, x, mask):\n interp, span, one_sided = self._linear_interpolate(x, mask)\n\n experts = [interp]\n densities = []\n for kernel_size in self.kernel_sizes:\n smooth, density = self._masked_smooth(x, mask, kernel_size)\n experts.append(smooth)\n densities.append(density)\n experts.append(torch.zeros_like(x))\n\n expert_stack = torch.stack(experts, dim=-1)\n gate_features = torch.stack([mask, span, one_sided] + densities, dim=-1)\n prior_weights = torch.softmax(self.prior_gate(gate_features), dim=-1)\n prior = torch.sum(prior_weights * expert_stack, dim=-1)\n\n density_stack = torch.stack(densities, dim=-1)\n confidence = 0.5 * (1.0 - span) + 0.5 * density_stack.max(dim=-1).values\n confidence = confidence.clamp(0.0, 1.0)\n uncertainty = 1.0 - confidence\n return prior, gate_features, 
uncertainty\n\n def imputation(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask):\n \"\"\"\n Imputation: reconstruct missing values in the input sequence.\n Input: x_enc [batch, seq_len, enc_in] with zeros at masked positions\n Mask: [batch, seq_len, enc_in], 1=observed, 0=masked\n Output: [batch, seq_len, enc_in]\n \"\"\"\n if mask is None:\n mask = torch.ones_like(x_enc)\n mask = mask.float()\n\n means, stdev = self._masked_stats(x_enc, mask)\n x_norm = ((x_enc - means) / stdev) * mask\n\n prior, gate_features, uncertainty = self._build_prior(x_norm, mask)\n filled = mask * x_norm + (1.0 - mask) * prior\n\n # Innovation around the interpolation prior, only where we actually have\n # observations. This turns the learning problem into propagating\n # observed deviations to masked locations.\n innovation = mask * (x_norm - prior)\n\n enc_in = self.value_embedding(innovation, x_mark_enc)\n enc_in = enc_in + self.mask_embedding(mask) + self.uncertainty_embedding(uncertainty)\n enc_out, _ = self.encoder(enc_in, attn_mask=None)\n delta_context = self.delta_proj(enc_out)\n\n delta_channel = self.channel_up(self.channel_down(self.channel_norm(innovation)))\n\n correction_gain = torch.sigmoid(self.correction_gate(gate_features)).squeeze(-1)\n delta = correction_gain * (delta_context + delta_channel)\n\n pred_norm = filled + (1.0 - mask) * delta\n pred = pred_norm * stdev + means\n\n # Keep observed values untouched and only impute masked entries.\n pred = mask * x_enc + (1.0 - mask) * pred\n return pred\n\n def forward(self, x_enc, x_mark_e\n... (truncated)",
5 "after_line": 1,
6 "start_line": 1,
7 "end_line": 40
8}

Tool Result (Step 1)
OK: Replaced lines 1..40 with 246 line(s) in Time-Series-Library/models/Custom.py. Editable range: entire file.
[Current file: Time-Series-Library/models/Custom.py | editable: 1–247 | total: 247 lines]
1 import torch
2 import torch.nn as nn
3 import torch.nn.functional as F
4
5 from layers.AutoCorrelation import AutoCorrelation, AutoCorrelationLayer
6 from layers.Autoformer_EncDec import Encoder, EncoderLayer
7 from layers.Embed import DataEmbedding, TokenEmbedding
8
9
10 class Model(nn.Module):
11 """
12 Mask-Adaptive Expert Imputation with Innovation Propagation.
13
14 Key idea:
15 1) Build a strong nonparametric prior from observed values only using an
16 adaptive mixture of interpolation experts.
17 2) Predict only the residual innovation around that prior, instead of the
18 whole signal, using a periodic encoder and a low-rank channel adapter.
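The multi-scale smoothing experts in this edit reduce to a mask-normalized moving average: a grouped windowed sum of observed values divided by a windowed count of observations, so missing entries neither contribute to nor dilute the local mean. A standalone sketch of the idea, with zero padding as in this first version (`masked_smooth` here is an illustrative simplification of the `_masked_smooth` method in the edit):

```python
import torch
import torch.nn.functional as F

def masked_smooth(x, mask, kernel_size, eps=1e-5):
    """Moving average that ignores missing entries (mask: 1=observed)."""
    pad = kernel_size // 2
    channels = x.shape[-1]
    x_t = (x * mask).transpose(1, 2)   # [batch, channels, seq_len], zeros where missing
    m_t = mask.transpose(1, 2)
    weight = torch.ones(channels, 1, kernel_size, dtype=x.dtype)
    num = F.conv1d(F.pad(x_t, (pad, pad)), weight, groups=channels)  # windowed value sum
    den = F.conv1d(F.pad(m_t, (pad, pad)), weight, groups=channels)  # windowed observed count
    return (num / den.clamp_min(eps)).transpose(1, 2)

x = torch.randn(2, 96, 7)
mask = (torch.rand_like(x) >= 0.25).float()
smooth = masked_smooth(x, mask, kernel_size=7)   # [2, 96, 7]
```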
Assistant (Step 2): test
Tool Result (Step 2)
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)

### ETTh1 (scripts/etth1.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=6 train_loss=0.0331812 vali_loss=0.0512693 test_loss=0.0424405
TRAIN_METRICS epoch=7 train_loss=0.0327520 vali_loss=0.0508572 test_loss=0.0422574
TRAIN_METRICS epoch=8 train_loss=0.0322304 vali_loss=0.0506850 test_loss=0.0421985
TRAIN_METRICS epoch=9 train_loss=0.0326539 vali_loss=0.0502444 test_loss=0.0418822
TRAIN_METRICS epoch=10 train_loss=0.0322635 vali_loss=0.0501821 test_loss=0.0422133
Test results (ETTh1): MSE: 0.042496, MAE: 0.137878

### Weather (scripts/weather.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=6 train_loss=0.1353455 vali_loss=0.0657196 test_loss=0.0257231
TRAIN_METRICS epoch=7 train_loss=0.1266806 vali_loss=0.0627417 test_loss=0.0258923
TRAIN_METRICS epoch=8 train_loss=0.1314173 vali_loss=0.0591142 test_loss=0.0258205
TRAIN_METRICS epoch=9 train_loss=0.1255150 vali_loss=0.0675052 test_loss=0.0259505
T
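For reference before the next iteration: the `_linear_interpolate` expert in the current model is a forward/backward nearest-observation scan with distance-weighted blending. A single-channel sketch of the idea (edge handling simplified relative to the batched version in the edit):

```python
import torch

def interp_1d(x, mask):
    """x, mask: [seq_len]; fill gaps by distance-weighted neighbor blending."""
    seq_len = x.shape[0]
    big = float(seq_len + 1)
    f_val = torch.zeros(seq_len)
    f_dist = torch.full((seq_len,), big)
    b_val = torch.zeros(seq_len)
    b_dist = torch.full((seq_len,), big)
    val, dist = 0.0, big
    for t in range(seq_len):                        # forward pass: last observation
        if mask[t] > 0.5:
            val, dist = float(x[t]), 0.0
        else:
            dist += 1.0
        f_val[t], f_dist[t] = val, dist
    val, dist = 0.0, big
    for t in range(seq_len - 1, -1, -1):            # backward pass: next observation
        if mask[t] > 0.5:
            val, dist = float(x[t]), 0.0
        else:
            dist += 1.0
        b_val[t], b_dist[t] = val, dist
    w = b_dist / (f_dist + b_dist)                  # nearer neighbor gets more weight
    out = w * f_val + (1.0 - w) * b_val
    out = torch.where(f_dist >= big, b_val, out)    # no past observation: hold next value
    out = torch.where(b_dist >= big, f_val, out)    # no future observation: hold last value
    return torch.where(mask > 0.5, x, out)

x = torch.tensor([1.0, 0.0, 0.0, 4.0])
mask = torch.tensor([1.0, 0.0, 0.0, 1.0])
print(interp_1d(x, mask))                           # tensor([1., 2., 3., 4.])
```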
Assistant (Step 3): edit
Tool Input
1{
2 "op": "replace",
3 "filename": "Time-Series-Library/models/Custom.py",
4 "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom layers.AutoCorrelation import AutoCorrelation, AutoCorrelationLayer\nfrom layers.Autoformer_EncDec import Encoder, EncoderLayer, series_decomp\nfrom layers.Embed import DataEmbedding, TokenEmbedding\n\n\nclass Model(nn.Module):\n \"\"\"\n Mask-Adaptive Expert Imputation with Innovation Propagation.\n\n Key idea:\n 1) Build a strong nonparametric prior from observed values only using an\n adaptive mixture of interpolation experts.\n 2) Predict only the residual innovation around that prior, instead of the\n whole signal, using a periodic encoder and low-rank temporal/channel\n propagation branches.\n \"\"\"\n\n def __init__(self, configs):\n super(Model, self).__init__()\n self.task_name = configs.task_name\n self.seq_len = configs.seq_len\n self.pred_len = configs.seq_len # For imputation, pred_len = seq_len\n self.enc_in = configs.enc_in\n self.d_model = configs.d_model\n self.dropout = configs.dropout\n self.factor = configs.factor\n self.n_heads = configs.n_heads\n self.e_layers = configs.e_layers\n self.d_ff = configs.d_ff\n self.activation = configs.activation\n self.embed = configs.embed\n self.freq = configs.freq\n self.moving_avg = getattr(configs, 'moving_avg', 25)\n if self.moving_avg % 2 == 0:\n self.moving_avg += 1\n self.eps = 1e-5\n\n # Adaptive interpolation experts: linear interpolation + multi-scale\n # mask-normalized smoothing + zero-mean fallback.\n self.kernel_sizes = [3, 7, 15, 31]\n self.num_gate_features = len(self.kernel_sizes) + 3 # mask, span, one-sided + densities\n self.num_experts = 1 + len(self.kernel_sizes) + 1\n\n self.prior_gate = nn.Sequential(\n nn.Linear(self.num_gate_features, 32),\n nn.GELU(),\n nn.Linear(32, self.num_experts)\n )\n\n # Residual branch on innovations around the prior.\n self.value_embedding = DataEmbedding(\n self.enc_in, self.d_model, self.embed, self.freq, self.dropout\n )\n self.mask_embedding = TokenEmbedding(self.enc_in, self.d_model)\n self.uncertainty_embedding = TokenEmbedding(self.enc_in, self.d_model)\n\n attn_layers = [\n EncoderLayer(\n AutoCorrelationLayer(\n AutoCorrelation(\n mask_flag=False,\n factor=self.factor,\n attention_dropout=self.dropout,\n output_attention=False,\n ),\n self.d_model,\n self.n_heads,\n ),\n self.d_model,\n self.d_ff,\n moving_avg=self.moving_avg,\n dropout=self.dropout,\n activation=self.activation,\n )\n for _ in range(self.e_layers)\n ]\n self.encoder = Encoder(attn_layers, norm_layer=nn.LayerNorm(self.d_model))\n self.delta_proj = nn.Linear(self.d_model, self.enc_in)\n\n # Temporal propagation of observed innovations (DLinear-style but on the\n # residual around the interpolation prior).\n self.temporal_decomp = series_decomp(self.moving_avg)\n self.linear_seasonal = nn.Linear(self.seq_len, self.seq_len)\n self.linear_trend = nn.Linear(self.seq_len, self.seq_len)\n\n # Cross-variable factor propagation at each timestamp.\n self.channel_rank = min(64, max(4, min(self.enc_in, self.d_model) // 2))\n self.channel_norm = nn.LayerNorm(self.enc_in)\n self.channel_down = nn.Linear(self.enc_in, self.channel_rank, bias=False)\n self.channel_up = nn.Linear(self.channel_rank, self.enc_in, bias=False)\n\n # Confidence gate decides how much residual correction to apply on top\n # of the interpolation prior.\n self.correction_gate = nn.Sequential(\n nn.Linear(self.num_gate_features, 16),\n nn.GELU(),\n nn.Linear(16, 1)\n )\n\n self._reset_parameters()\n\n def _reset_parameters(self):\n # Prefer interpolation-like 
experts before learning starts.\n with torch.no_grad():\n expert_bias = torch.tensor([2.0, 1.0, 0.5, 0.0, -0.5, -1.0], dtype=torch.float32)\n self.prior_gate[-1].bias.copy_(expert_bias)\n self.correction_gate[-1].bias.fill_(-0.25)\n\n avg_init = (1.0 / self.seq_len) * torch.ones(self.seq_len, self.seq_len)\n self.linear_seasonal.weight.copy_(avg_init)\n self.linear_trend.weight.copy_(avg_init)\n self.linear_seasonal.bias.zero_()\n self.linear_trend.bias.zero_()\n\n nn.init.normal_(self.delta_proj.weight, mean=0.0, std=1e-3)\n nn.init.zeros_(self.delta_proj.bias)\n nn.init.normal_(self.channel_up.weight, mean=0.0, std=1e-3)\n\n def _masked_stats(self, x, mask):\n obs_count = mask.sum(dim=1, keepdim=True).clamp_min(1.0)\n mean = (x * mask).sum(dim=1, keepdim=True) / obs_count\n centered = (x - mean) * mask\n stdev = torch.sqrt((centered * centered).sum(dim=1, keepdim=True) / obs_count + self.eps)\n return mean.detach(), stdev.detach()\n\n def _masked_smooth(self, x, mask, kernel_size):\n pad = kernel_size // 2\n x_t = x.transpose(1, 2)\n m_t = mask.transpose(1, 2)\n\n weight = torch.ones(self.enc_in, 1, kernel_size, device=x.device, dtype=x.dtype)\n num = F.conv1d(F.pad(x_t, (pad, pad), mode='replicate'), weight, groups=self.enc_in)\n den = F.conv1d(F.pad(m_t, (pad, pad), mode='replicate'), weight, groups=self.enc_in)\n\n avg = (num / den.clamp_min(self.eps)).transpose(1, 2)\n density = (den / float(kernel_size)).clamp(0.0, 1.0).transpose(1, 2)\n return avg, density\n\n def _linear_interpolate(self, x, mask):\n bsz, seq_len, channels = x.shape\n large = float(seq_len + 1)\n\n f_val = torch.zeros_like(x)\n b_val = torch.zeros_like(x)\n f_dist = torch.full_like(x, large)\n b_dist = torch.full_like(x, large)\n\n last_val = torch.zeros(bsz, channels, device=x.device, dtype=x.dtype)\n last_dist = torch.full((bsz, channels), large, device=x.device, dtype=x.dtype)\n for t in range(seq_len):\n obs = mask[:, t, :] > 0.5\n last_val = torch.where(obs, x[:, t, :], last_val)\n last_dist = torch.where(obs, torch.zeros_like(last_dist), last_dist + 1.0)\n f_val[:, t, :] = last_val\n f_dist[:, t, :] = last_dist\n\n next_val = torch.zeros(bsz, channels, device=x.device, dtype=x.dtype)\n next_dist = torch.full((bsz, channels), large, device=x.device, dtype=x.dtype)\n for t in range(seq_len - 1, -1, -1):\n obs = mask[:, t, :] > 0.5\n next_val = torch.where(obs, x[:, t, :], next_val)\n next_dist = torch.where(obs, torch.zeros_like(next_dist), next_dist + 1.0)\n b_val[:, t, :] = next_val\n b_dist[:, t, :] = next_dist\n\n f_valid = (f_dist < large).float()\n b_valid = (b_dist < large).float()\n both_valid = (f_valid * b_valid) > 0.5\n only_f = (f_valid > 0.5) & (b_valid < 0.5)\n only_b = (b_valid > 0.5) & (f_valid < 0.5)\n\n interp = torch.zeros_like(x)\n interp = torch.where(\n both_valid,\n (b_dist * f_val + f_dist * b_val) / (f_dist + b_dist).clamp_min(self.eps),\n interp,\n )\n interp = torch.where(only_f, f_val, interp)\n interp = torch.where(only_b, b_val, interp)\n interp = torch.where(mask > 0.5, x, interp)\n\n span = torch.where(\n both_valid,\n (f_dist + b_dist) / float(seq_len),\n torch.ones_like(x),\n )\n span = torch.where(mask > 0.5, torch.zeros_like(span), span)\n one_sided = ((f_valid + b_valid) < 1.5).float()\n one_sided = torch.where(mask > 0.5, torch.zeros_like(one_sided), one_sided)\n return interp, span.clamp(0.0, 1.0), one_sided\n\n def _build_prior(self, x, mask):\n interp, span, one_sided = self._linear_interpolate(x, mask)\n\n experts = [interp]\n densities = []\n for kernel_size in 
self.kernel_sizes:\n smooth, density = self._masked_smooth(x, mask, kernel_size)\n experts.append(smooth)\n densities.append(density)\n experts.append(torch.zeros_like(x))\n\n expert_stack = torch.stack(experts, dim=-1)\n gate_features = torch.stack([mask, span, one_sided] + densities, dim=-1)\n prior_weights = torch.softmax(self.prior_gate(gate_features), dim=-1)\n prior = torch.sum(prior_weights * expert_stack, dim=-1)\n\n density_stack = torch.stack(densities, dim=-1)\n confidence = 0.5 * (1.0 - span) + 0.5 * density_stack.max(dim=-1).values\n confidence = confidence.clamp(0.0, 1.0)\n uncertainty = 1.0 - confidence\n return prior, gate_features, uncertainty\n\n def imputation(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask):\n \"\"\"\n Imputation: reconstruct missing values in the input sequence.\n Input: x_enc [batch, seq_len, enc_in] with zeros at masked positions\n Mask: [batch, seq_len, enc_in], 1=observed, 0=masked\n Output: [batch, seq_len, enc_in]\n \"\"\"\n if mask is None:\n mask = torch.ones_like(x_enc)\n mask = mask.float()\n\n means, stdev = self._masked_stats(x_enc, mask)\n x_norm = ((x_enc - means) / stdev) * mask\n\n prior, gate_features, uncertainty = self._build_prior(x_norm, mask)\n filled = mask * x_norm + (1.0 - mask) * prior\n\n # Innovation around the interpolation prior, only where we actually have\n # observations. This turns the learning problem into propagating\n # observed deviations to masked locations.\n innovation = mask * (x_norm - prior)\n\n enc_in = self.\n... (truncated)",
5 "after_line": 1,
6 "start_line": 1,
7 "end_line": 247
8}

Tool Result (Step 3)
OK: Replaced lines 1..247 with 266 line(s) in Time-Series-Library/models/Custom.py. Editable range: entire file.
[Current file: Time-Series-Library/models/Custom.py | editable: 1–266 | total: 266 lines]
1 import torch
2 import torch.nn as nn
3 import torch.nn.functional as F
4
5 from layers.AutoCorrelation import AutoCorrelation, AutoCorrelationLayer
6 from layers.Autoformer_EncDec import Encoder, EncoderLayer, series_decomp
7 from layers.Embed import DataEmbedding, TokenEmbedding
8
9
10 class Model(nn.Module):
11 """
12 Mask-Adaptive Expert Imputation with Innovation Propagation.
13
14 Key idea:
15 1) Build a strong nonparametric prior from observed values only using an
16 adaptive mixture of interpolation experts.
17 2) Predict only the residual innovation around that prior, instead of the
18 whole signal, using a periodic encoder and low-rank temp
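The new temporal branch in this edit mirrors DLinear on the innovation signal: decompose the residual into trend and seasonal parts, then apply per-component seq_len-by-seq_len linear maps initialized to uniform averaging, so the branch starts out as a gentle smoother. A standalone sketch, using `nn.AvgPool1d` as a stand-in for the repo's `series_decomp` moving-average trend extractor:

```python
import torch
import torch.nn as nn

class ResidualTemporal(nn.Module):
    """Seasonal/trend linear maps over time, initialized to uniform averaging."""
    def __init__(self, seq_len, kernel_size=25):
        super().__init__()
        self.avg = nn.AvgPool1d(kernel_size, stride=1, padding=kernel_size // 2,
                                count_include_pad=False)   # simple trend extractor
        self.linear_seasonal = nn.Linear(seq_len, seq_len)
        self.linear_trend = nn.Linear(seq_len, seq_len)
        with torch.no_grad():
            init = torch.full((seq_len, seq_len), 1.0 / seq_len)  # averaging matrix
            self.linear_seasonal.weight.copy_(init)
            self.linear_trend.weight.copy_(init)
            self.linear_seasonal.bias.zero_()
            self.linear_trend.bias.zero_()

    def forward(self, resid):                      # resid: [batch, seq_len, channels]
        r = resid.transpose(1, 2)                  # [batch, channels, seq_len]
        trend = self.avg(r)
        seasonal = r - trend
        out = self.linear_seasonal(seasonal) + self.linear_trend(trend)
        return out.transpose(1, 2)
```

In the actual edit the branch output is further scaled (`temporal_scale = 0.5`) and gated before being added on top of the interpolation prior.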
Assistant (Step 4): test
Tool Result (Step 4)
[Test #2] (1 test remaining; call submit(n=2) to submit this result)

### ETTh1 (scripts/etth1.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=5 train_loss=0.0387552 vali_loss=0.0550815 test_loss=0.0464709
TRAIN_METRICS epoch=6 train_loss=0.0375881 vali_loss=0.0539315 test_loss=0.0455713
TRAIN_METRICS epoch=7 train_loss=0.0366244 vali_loss=0.0545323 test_loss=0.0454442
TRAIN_METRICS epoch=8 train_loss=0.0368078 vali_loss=0.0545149 test_loss=0.0455774
TRAIN_METRICS epoch=9 train_loss=0.0366550 vali_loss=0.0547521 test_loss=0.0453765
Test results (ETTh1): MSE: 0.046623, MAE: 0.144486

### Weather (scripts/weather.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=5 train_loss=0.1339980 vali_loss=0.0557957 test_loss=0.0244370
TRAIN_METRICS epoch=6 train_loss=0.1362699 vali_loss=0.0556953 test_loss=0.0245191
TRAIN_METRICS epoch=7 train_loss=0.1216792 vali_loss=0.0604484 test_loss=0.0243143
TRAIN_METRICS epoch=8 train_loss=0.1454444 vali_loss=0.0616314 test_loss=0.0243583
TRA
Assistant (Step 5): edit
Tool Input
1{
2 "op": "replace",
3 "filename": "Time-Series-Library/models/Custom.py",
4 "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom layers.AutoCorrelation import AutoCorrelation, AutoCorrelationLayer\nfrom layers.Autoformer_EncDec import Encoder, EncoderLayer, series_decomp\nfrom layers.Embed import DataEmbedding, TokenEmbedding\n\n\nclass Model(nn.Module):\n \"\"\"\n Mask-Adaptive Expert Imputation with Innovation Propagation.\n\n Key idea:\n 1) Build a strong nonparametric prior from observed values only using an\n adaptive mixture of interpolation experts.\n 2) Predict only the residual innovation around that prior, instead of the\n whole signal, using a periodic encoder and low-rank temporal/channel\n propagation branches.\n \"\"\"\n\n def __init__(self, configs):\n super(Model, self).__init__()\n self.task_name = configs.task_name\n self.seq_len = configs.seq_len\n self.pred_len = configs.seq_len # For imputation, pred_len = seq_len\n self.enc_in = configs.enc_in\n self.d_model = configs.d_model\n self.dropout = configs.dropout\n self.factor = configs.factor\n self.n_heads = configs.n_heads\n self.e_layers = configs.e_layers\n self.d_ff = configs.d_ff\n self.activation = configs.activation\n self.embed = configs.embed\n self.freq = configs.freq\n self.moving_avg = getattr(configs, 'moving_avg', 25)\n if self.moving_avg % 2 == 0:\n self.moving_avg += 1\n self.eps = 1e-5\n\n # Adaptive interpolation experts: linear interpolation + multi-scale\n # mask-normalized smoothing + zero-mean fallback.\n self.kernel_sizes = [3, 7, 15, 31]\n self.num_gate_features = len(self.kernel_sizes) + 3 # mask, span, one-sided + densities\n self.num_experts = 1 + len(self.kernel_sizes) + 1\n\n self.prior_gate = nn.Sequential(\n nn.Linear(self.num_gate_features, 32),\n nn.GELU(),\n nn.Linear(32, self.num_experts)\n )\n\n # Residual branch on innovations around the prior.\n self.value_embedding = DataEmbedding(\n self.enc_in, self.d_model, self.embed, self.freq, self.dropout\n )\n self.mask_embedding = TokenEmbedding(self.enc_in, self.d_model)\n self.uncertainty_embedding = TokenEmbedding(self.enc_in, self.d_model)\n\n attn_layers = [\n EncoderLayer(\n AutoCorrelationLayer(\n AutoCorrelation(\n mask_flag=False,\n factor=self.factor,\n attention_dropout=self.dropout,\n output_attention=False,\n ),\n self.d_model,\n self.n_heads,\n ),\n self.d_model,\n self.d_ff,\n moving_avg=self.moving_avg,\n dropout=self.dropout,\n activation=self.activation,\n )\n for _ in range(self.e_layers)\n ]\n self.encoder = Encoder(attn_layers, norm_layer=nn.LayerNorm(self.d_model))\n self.delta_proj = nn.Linear(self.d_model, self.enc_in)\n\n # Temporal propagation of observed innovations (DLinear-style but on the\n # residual around the interpolation prior).\n self.temporal_decomp = series_decomp(self.moving_avg)\n self.linear_seasonal = nn.Linear(self.seq_len, self.seq_len)\n self.linear_trend = nn.Linear(self.seq_len, self.seq_len)\n self.temporal_scale = 0.5\n\n # Cross-variable factor propagation at each timestamp. 
We mix the full\n # filled signal and the residual innovation more aggressively when the\n # number of variables is large, since high-dimensional panels benefit\n # more from contemporaneous cross-series factors.\n self.channel_rank = min(64, max(4, min(self.enc_in, self.d_model) // 2))\n self.channel_norm = nn.LayerNorm(self.enc_in)\n self.channel_down = nn.Linear(self.enc_in, self.channel_rank, bias=False)\n self.channel_up = nn.Linear(self.channel_rank, self.enc_in, bias=False)\n self.channel_mix = float(self.enc_in) / float(self.enc_in + 16)\n self.channel_scale = min(1.0, float(self.enc_in) / 32.0)\n\n # Confidence gate decides how much residual correction to apply on top\n # of the interpolation prior.\n self.correction_gate = nn.Sequential(\n nn.Linear(self.num_gate_features, 16),\n nn.GELU(),\n nn.Linear(16, 1)\n )\n\n self._reset_parameters()\n\n def _reset_parameters(self):\n # Prefer interpolation-like experts before learning starts.\n with torch.no_grad():\n expert_bias = torch.tensor([2.0, 1.0, 0.5, 0.0, -0.5, -1.0], dtype=torch.float32)\n self.prior_gate[-1].bias.copy_(expert_bias)\n self.correction_gate[-1].bias.fill_(-0.25)\n\n avg_init = (1.0 / self.seq_len) * torch.ones(self.seq_len, self.seq_len)\n self.linear_seasonal.weight.copy_(avg_init)\n self.linear_trend.weight.copy_(avg_init)\n self.linear_seasonal.bias.zero_()\n self.linear_trend.bias.zero_()\n\n nn.init.normal_(self.delta_proj.weight, mean=0.0, std=1e-3)\n nn.init.zeros_(self.delta_proj.bias)\n nn.init.normal_(self.channel_up.weight, mean=0.0, std=1e-3)\n\n def _masked_stats(self, x, mask):\n obs_count = mask.sum(dim=1, keepdim=True).clamp_min(1.0)\n mean = (x * mask).sum(dim=1, keepdim=True) / obs_count\n centered = (x - mean) * mask\n stdev = torch.sqrt((centered * centered).sum(dim=1, keepdim=True) / obs_count + self.eps)\n return mean.detach(), stdev.detach()\n\n def _masked_smooth(self, x, mask, kernel_size):\n pad = kernel_size // 2\n x_t = x.transpose(1, 2)\n m_t = mask.transpose(1, 2)\n\n weight = torch.ones(self.enc_in, 1, kernel_size, device=x.device, dtype=x.dtype)\n num = F.conv1d(F.pad(x_t, (pad, pad), mode='replicate'), weight, groups=self.enc_in)\n den = F.conv1d(F.pad(m_t, (pad, pad), mode='replicate'), weight, groups=self.enc_in)\n\n avg = (num / den.clamp_min(self.eps)).transpose(1, 2)\n density = (den / float(kernel_size)).clamp(0.0, 1.0).transpose(1, 2)\n return avg, density\n\n def _linear_interpolate(self, x, mask):\n bsz, seq_len, channels = x.shape\n large = float(seq_len + 1)\n\n f_val = torch.zeros_like(x)\n b_val = torch.zeros_like(x)\n f_dist = torch.full_like(x, large)\n b_dist = torch.full_like(x, large)\n\n last_val = torch.zeros(bsz, channels, device=x.device, dtype=x.dtype)\n last_dist = torch.full((bsz, channels), large, device=x.device, dtype=x.dtype)\n for t in range(seq_len):\n obs = mask[:, t, :] > 0.5\n last_val = torch.where(obs, x[:, t, :], last_val)\n last_dist = torch.where(obs, torch.zeros_like(last_dist), last_dist + 1.0)\n f_val[:, t, :] = last_val\n f_dist[:, t, :] = last_dist\n\n next_val = torch.zeros(bsz, channels, device=x.device, dtype=x.dtype)\n next_dist = torch.full((bsz, channels), large, device=x.device, dtype=x.dtype)\n for t in range(seq_len - 1, -1, -1):\n obs = mask[:, t, :] > 0.5\n next_val = torch.where(obs, x[:, t, :], next_val)\n next_dist = torch.where(obs, torch.zeros_like(next_dist), next_dist + 1.0)\n b_val[:, t, :] = next_val\n b_dist[:, t, :] = next_dist\n\n f_valid = (f_dist < large).float()\n b_valid = (b_dist < large).float()\n both_valid 
= (f_valid * b_valid) > 0.5\n only_f = (f_valid > 0.5) & (b_valid < 0.5)\n only_b = (b_valid > 0.5) & (f_valid < 0.5)\n\n interp = torch.zeros_like(x)\n interp = torch.where(\n both_valid,\n (b_dist * f_val + f_dist * b_val) / (f_dist + b_dist).clamp_min(self.eps),\n interp,\n )\n interp = torch.where(only_f, f_val, interp)\n interp = torch.where(only_b, b_val, interp)\n interp = torch.where(mask > 0.5, x, interp)\n\n span = torch.where(\n both_valid,\n (f_dist + b_dist) / float(seq_len),\n torch.ones_like(x),\n )\n span = torch.where(mask > 0.5, torch.zeros_like(span), span)\n one_sided = ((f_valid + b_valid) < 1.5).float()\n one_sided = torch.where(mask > 0.5, torch.zeros_like(one_sided), one_sided)\n return interp, span.clamp(0.0, 1.0), one_sided\n\n def _build_prior(self, x, mask):\n interp, span, one_sided = self._linear_interpolate(x, mask)\n\n experts = [interp]\n densities = []\n for kernel_size in self.kernel_sizes:\n smooth, density = self._masked_smooth(x, mask, kernel_size)\n experts.append(smooth)\n densities.append(density)\n experts.append(torch.zeros_like(x))\n\n expert_stack = torch.stack(experts, dim=-1)\n gate_features = torch.stack([mask, span, one_sided] + densities, dim=-1)\n prior_weights = torch.softmax(self.prior_gate(gate_features), dim=-1)\n prior = torch.sum(prior_weights * expert_stack, dim=-1)\n\n density_stack = torch.stack(densities, dim=-1)\n confidence = 0.5 * (1.0 - span) + 0.5 * density_stack.max(dim=-1).values\n confidence = confidence.clamp(0.0, 1.0)\n uncertainty = 1.0 - confidence\n return prior, gate_features, uncertainty\n\n def imputation(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask):\n \"\"\"\n Imputation: reconstruct missing values in the input sequence.\n Input: x_enc [batch, seq_len, enc_in] with zeros at masked positions\n Mask: [batch, seq_len, enc_in], 1=observed, 0=masked\n Output: [batch, seq_len, enc_in]\n \"\"\"\n if mask is None:\n mask = torch.ones_like(x_enc)\n mask = mask.float()\n\n means, stdev = self._masked_stats(x_enc, mask)\n x_norm = ((x_enc - means) / stdev) * mask\n\n \n... (truncated)",
5 "after_line": 1,
6 "start_line": 1,
7 "end_line": 266
8}

Tool Result (Step 5)
OK: Replaced lines 1..266 with 275 line(s) in Time-Series-Library/models/Custom.py. Editable range: entire file.
[Current file: Time-Series-Library/models/Custom.py | editable: 1–275 | total: 275 lines]
1 import torch
2 import torch.nn as nn
3 import torch.nn.functional as F
4
5 from layers.AutoCorrelation import AutoCorrelation, AutoCorrelationLayer
6 from layers.Autoformer_EncDec import Encoder, EncoderLayer, series_decomp
7 from layers.Embed import DataEmbedding, TokenEmbedding
8
9
10 class Model(nn.Module):
11 """
12 Mask-Adaptive Expert Imputation with Innovation Propagation.
13
14 Key idea:
15 1) Build a strong nonparametric prior from observed values only using an
16 adaptive mixture of interpolation experts.
17 2) Predict only the residual innovation around that prior, instead of the
18 whole signal, using a periodic encoder and low-rank temp
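The dataset-dependent gains introduced in this edit, `channel_mix = enc_in / (enc_in + 16)` and `channel_scale = min(1, enc_in / 32)`, work out to concrete values per benchmark; a quick check:

```python
for name, enc_in in [('ETTh1', 7), ('Weather', 21), ('ECL', 321)]:
    channel_mix = enc_in / (enc_in + 16)
    channel_scale = min(1.0, enc_in / 32.0)
    print(f'{name:8s} mix={channel_mix:.3f} scale={channel_scale:.3f}')
# ETTh1    mix=0.304 scale=0.219
# Weather  mix=0.568 scale=0.656
# ECL      mix=0.953 scale=1.000
```

So the cross-variable branch is nearly switched off for the 7-variable ETTh1 and runs at full strength for the 321-variable ECL, consistent with the comment in the edit that high-dimensional panels benefit more from contemporaneous cross-series factors.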
Assistant (Step 6): test
Tool Result (Step 6)
[Test #3] (0 tests remaining)

### ETTh1 (scripts/etth1.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=6 train_loss=0.0360194 vali_loss=0.0518446 test_loss=0.0448159
TRAIN_METRICS epoch=7 train_loss=0.0350492 vali_loss=0.0517632 test_loss=0.0443042
TRAIN_METRICS epoch=8 train_loss=0.0352486 vali_loss=0.0517983 test_loss=0.0445040
TRAIN_METRICS epoch=9 train_loss=0.0351590 vali_loss=0.0518177 test_loss=0.0442372
TRAIN_METRICS epoch=10 train_loss=0.0350072 vali_loss=0.0517611 test_loss=0.0443840
Test results (ETTh1): MSE: 0.044474, MAE: 0.141060

### Weather (scripts/weather.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=5 train_loss=0.1341699 vali_loss=0.0558085 test_loss=0.0245149
TRAIN_METRICS epoch=6 train_loss=0.1363803 vali_loss=0.0557269 test_loss=0.0246034
TRAIN_METRICS epoch=7 train_loss=0.1218875 vali_loss=0.0603000 test_loss=0.0244031
TRAIN_METRICS epoch=8 train_loss=0.1455532 vali_loss=0.0616333 test_loss=0.0244405
TRAIN_METRICS epoch=9 train_loss=0.125665