Agent Conversation: openai/gpt-5.4-pro
Model: openai/gpt-5.4-pro
Total steps: 5
Messages: 11
User (Step 0)
Task: ts-exogenous-forecast
Exogenous Variable Forecasting: Custom Model Design
Objective
Design and implement a custom deep learning model for time series forecasting with exogenous (external) variables. The task uses features=MS: all variables serve as input, and only the target (the last dimension) is predicted. Your code goes in the Model class in models/Custom.py. Three reference implementations (DLinear, PatchTST, iTransformer) are provided as read-only.
Evaluation
Trained and evaluated on three datasets with MS features:
- ETTh1 (7 → 1, hourly electricity data)
- Weather (21 → 1, weather observations)
- ECL (321 → 1, electricity consumption)
All use seq_len=96, pred_len=96. Metrics: MSE and MAE on the target variable (lower is better). The framework automatically extracts outputs[:, :, -1:].
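The MS-mode extraction step can be illustrated with a minimal sketch (shapes chosen to match the ETTh1 setting; this is an illustration, not the framework's actual code):

```python
import torch

# Hypothetical shapes matching the ETTh1 MS run: 7 variables, pred_len=96.
batch, pred_len, c_out = 4, 96, 7

# A model in MS mode still returns every channel...
outputs = torch.randn(batch, pred_len, c_out)

# ...and the framework keeps only the target (last) channel for scoring.
target = outputs[:, :, -1:]
print(target.shape)  # torch.Size([4, 96, 1])
```

Only the last channel contributes to the reported MSE/MAE, so capacity spent on the other channels is wasted unless it helps the target.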
Time-Series-Library/models/Custom.py [EDITABLE — entire file only]
1: import torch
2: import torch.nn as nn
3:
4:
5: class Model(nn.Module):
6:     """
7:     Custom model for exogenous variable forecasting (features=MS).
8:
9:     Forward signature: forward(x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None)
10:     - x_enc: [batch, seq_len, enc_in] — all input variables
11:     - x_mark_enc: [batch, seq_len, time_features] — time feature encoding
12:     - x_dec: [batch, label_len+pred_len, dec_in] — decoder input
13:     - x_mark_dec: [batch, label_len+pred_len, time_features] — decoder time features
14:
15:     Must return: [batch, pred_len, c_out] for forecasting
16:     Note: c_out = enc_in. The framework extracts the target (last dim) for MS mode.
17:     """
18:
19:     def __init__(self, configs):
20:         super(Model, self).__init__()
21:         self.task_name = configs.task_name
22:         self.seq_len = configs.seq_len
23:         self.pred_len = configs.pred_len
24:         self.enc_in = configs.enc_in
25:         self.c_out = configs.c_out
26:         # TODO: Define your model architecture here
27:
28:     def forecast(self, x_enc, x_mark_enc, x_dec, x_mark_dec):
29:         """
30:         Forecasting with exogenous variables.
31:         Input: x_enc [batch, seq_len, enc_in] — all variables
32:         Output: [batch, pred_len, c_out] — predict all variables
33:         """
34:         # TODO: Implement your forecasting logic
35:         batch_size = x_enc.shape[0]
36:         return torch.zeros(batch_size, self.pred_len, self.c_out).to(x_enc.device)
37:
38:     def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):
39:         if self.task_name == 'long_term_forecast' or self.task_name == 'short_term_forecast':
40:             dec_out = self.forecast(x_enc, x_mark_enc, x_dec, x_mark_dec)
41:             return dec_out[:, -self.pred_len:, :]
42:         return None
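As a sanity check, the scaffold above already satisfies the shape contract. A condensed, runnable version shows this; the `types.SimpleNamespace` config and `label_len=48` for the decoder tensors are assumptions for the sketch, not part of the framework:

```python
import types

import torch
import torch.nn as nn


class Model(nn.Module):
    # Condensed copy of the scaffold, kept only for a shape check.
    def __init__(self, configs):
        super().__init__()
        self.task_name = configs.task_name
        self.pred_len = configs.pred_len
        self.c_out = configs.c_out

    def forecast(self, x_enc, x_mark_enc, x_dec, x_mark_dec):
        # Placeholder forecast: zeros of the required shape.
        return torch.zeros(x_enc.shape[0], self.pred_len, self.c_out,
                           device=x_enc.device)

    def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):
        if self.task_name in ('long_term_forecast', 'short_term_forecast'):
            dec_out = self.forecast(x_enc, x_mark_enc, x_dec, x_mark_dec)
            return dec_out[:, -self.pred_len:, :]
        return None


# Hypothetical config mirroring the ETTh1 MS run (7 variables, 96 -> 96).
cfg = types.SimpleNamespace(task_name='long_term_forecast',
                            seq_len=96, pred_len=96, enc_in=7, c_out=7)
model = Model(cfg)
out = model(torch.randn(2, 96, 7), torch.randn(2, 96, 4),
            torch.randn(2, 144, 7), torch.randn(2, 144, 4))
print(out.shape)  # torch.Size([2, 96, 7])
```

Any replacement architecture only has to preserve this `[batch, pred_len, c_out]` output; everything else inside `forecast` is free.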
Time-Series-Library/models/DLinear.py [READ-ONLY]
Assistant (Step 1: edit)
Tool Input
1{
2 "op": "replace",
3 "filename": "Time-Series-Library/models/Custom.py",
4 "content": "import math\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom layers.Autoformer_EncDec import series_decomp\nfrom layers.Embed import TemporalEmbedding, TimeFeatureEmbedding\n\n\nclass Model(nn.Module):\n \"\"\"\n Lag-Correlation Sparse Exogenous Mixture (LCSEM).\n\n Core idea:\n 1) Forecast the target with a strong decomposed linear self-forecast branch.\n 2) Build one future path per exogenous variable with a shared decomposed linear expert.\n 3) Use horizon-wise target-conditioned sparse attention, biased by multi-lag\n target/exogenous correlations, to mix only a small subset of exogenous experts.\n 4) Dynamically calibrate each selected exogenous expert before adding it as a\n residual correction to the target forecast.\n\n The framework evaluates only the last channel in MS mode, so the model focuses its\n capacity on the target (last dimension) while still returning [B, pred_len, c_out].\n \"\"\"\n\n def __init__(self, configs):\n super(Model, self).__init__()\n self.task_name = configs.task_name\n self.seq_len = configs.seq_len\n self.pred_len = configs.pred_len\n self.enc_in = configs.enc_in\n self.c_out = configs.c_out\n\n self.d_model = getattr(configs, 'd_model', 128)\n self.dropout = nn.Dropout(getattr(configs, 'dropout', 0.1))\n self.max_exogenous = 8\n self.corr_lags = [0, 1, 2, 3, 6, 12, 24]\n\n moving_avg = getattr(configs, 'moving_avg', 25)\n if isinstance(moving_avg, (list, tuple)):\n moving_avg = moving_avg[0]\n if moving_avg % 2 == 0:\n moving_avg += 1\n self.decomp = series_decomp(moving_avg)\n\n # Target self-forecast branch (DLinear-style on the target only).\n self.target_linear_seasonal = nn.Linear(self.seq_len, self.pred_len)\n self.target_linear_trend = nn.Linear(self.seq_len, self.pred_len)\n\n # Shared exogenous future experts.\n self.exo_linear_seasonal = nn.Linear(self.seq_len, self.pred_len)\n self.exo_linear_trend = nn.Linear(self.seq_len, self.pred_len)\n\n 
self._init_linear_as_average(self.target_linear_seasonal)\n self._init_linear_as_average(self.target_linear_trend)\n self._init_linear_as_average(self.exo_linear_seasonal)\n self._init_linear_as_average(self.exo_linear_trend)\n\n # History encoders for exogenous selection and calibration.\n self.hist_token = nn.Linear(self.seq_len, self.d_model, bias=False)\n self.diff_token = nn.Linear(self.seq_len, self.d_model, bias=False)\n self.future_token = nn.Linear(self.pred_len, self.d_model, bias=False)\n self.target_hist_token = nn.Linear(self.seq_len, self.d_model, bias=False)\n self.target_diff_token = nn.Linear(self.seq_len, self.d_model, bias=False)\n\n # Horizon queries: future calendar features + learned horizon embedding + target token.\n self.horizon_embedding = nn.Embedding(self.pred_len, self.d_model)\n embed_type = getattr(configs, 'embed', 'timeF')\n freq = getattr(configs, 'freq', 'h')\n if embed_type == 'timeF':\n self.future_time_embedding = TimeFeatureEmbedding(self.d_model, embed_type=embed_type, freq=freq)\n else:\n self.future_time_embedding = TemporalEmbedding(self.d_model, embed_type=embed_type, freq=freq)\n\n self.token_norm = nn.LayerNorm(self.d_model)\n self.query_norm = nn.LayerNorm(self.d_model)\n self.query_proj = nn.Linear(self.d_model, self.d_model)\n self.key_proj = nn.Linear(self.d_model, self.d_model)\n self.value_proj = nn.Linear(self.d_model, self.d_model)\n\n # Dynamic expert calibration and residual fusion.\n self.affine_head = nn.Sequential(\n nn.Linear(self.d_model, self.d_model),\n nn.GELU(),\n nn.Linear(self.d_model, 2)\n )\n self.context_head = nn.Sequential(\n nn.Linear(self.d_model, self.d_model),\n nn.GELU(),\n nn.Linear(self.d_model, 1)\n )\n self.mix_gate = nn.Sequential(\n nn.Linear(self.d_model * 2, self.d_model),\n nn.GELU(),\n nn.Linear(self.d_model, 1)\n )\n self.corr_scale = nn.Parameter(torch.tensor(1.0))\n\n # Start close to the self-forecast branch; let exogenous corrections grow only if useful.\n 
nn.init.constant_(self.mix_gate[-1].bias, -1.5)\n\n def _init_linear_as_average(self, layer):\n layer.weight = nn.Parameter(\n (1.0 / self.seq_len) * torch.ones(self.pred_len, self.seq_len)\n )\n if layer.bias is not None:\n nn.init.zeros_(layer.bias)\n\n def _build_future_query(self, x_mark_dec, batch_size, device):\n horizon_ids = torch.arange(self.pred_len, device=device)\n query = self.horizon_embedding(horizon_ids).unsqueeze(0).expand(batch_size, -1, -1)\n\n if x_mark_dec is not None and x_mark_dec.size(-1) > 0:\n future_marks = x_mark_dec[:, -self.pred_len:, :]\n query = query + self.future_time_embedding(future_marks)\n\n return query\n\n def _lag_corr_bias(self, target_hist, exo_hist):\n \"\"\"\n target_hist: [B, L]\n exo_hist: [B, L, N_exo]\n returns: [B, N_exo] absolute max correlation across a small lag bank\n \"\"\"\n if exo_hist.size(-1) == 0:\n return exo_hist.new_zeros(exo_hist.size(0), 0)\n\n target_hist = target_hist - target_hist.mean(dim=1, keepdim=True)\n target_hist = target_hist / (target_hist.std(dim=1, keepdim=True, unbiased=False) + 1e-5)\n\n exo_hist = exo_hist - exo_hist.mean(dim=1, keepdim=True)\n exo_hist = exo_hist / (exo_hist.std(dim=1, keepdim=True, unbiased=False) + 1e-5)\n\n corrs = []\n for lag in self.corr_lags:\n if lag == 0:\n corr = (target_hist.unsqueeze(-1) * exo_hist).mean(dim=1)\n elif lag < target_hist.size(1):\n corr = (target_hist[:, lag:].unsqueeze(-1) * exo_hist[:, :-lag, :]).mean(dim=1)\n else:\n continue\n corrs.append(corr.abs())\n\n if len(corrs) == 0:\n return exo_hist.new_zeros(exo_hist.size(0), exo_hist.size(-1))\n\n return torch.stack(corrs, dim=-1).amax(dim=-1)\n\n def _sparse_attention(self, logits):\n if logits.size(-1) == 0:\n return logits\n\n k = min(self.max_exogenous, logits.size(-1))\n if k < logits.size(-1):\n top_vals, top_idx = torch.topk(logits, k=k, dim=-1)\n sparse_logits = torch.full_like(logits, -1e9)\n sparse_logits.scatter_(-1, top_idx, top_vals)\n logits = sparse_logits\n return 
torch.softmax(logits, dim=-1)\n\n def forecast(self, x_enc, x_mark_enc, x_dec, x_mark_dec):\n \"\"\"\n Forecasting with exogenous variables.\n Input: x_enc [batch, seq_len, enc_in] — all variables\n Output: [batch, pred_len, c_out] — predict all variables\n \"\"\"\n batch_size, _, num_vars = x_enc.shape\n device = x_enc.device\n\n # RevIN-style normalization with last-value centering.\n last = x_enc[:, -1:, :]\n centered = x_enc - last\n scale = torch.sqrt(torch.var(centered, dim=1, keepdim=True, unbiased=False) + 1e-5)\n x_norm = centered / scale\n\n target_hist = x_norm[:, :, -1:] # [B, L, 1]\n target_seasonal, target_trend = self.decomp(target_hist)\n target_base = (\n self.target_linear_seasonal(target_seasonal.squeeze(-1))\n + self.target_linear_trend(target_trend.squeeze(-1))\n ) # [B, pred_len]\n\n target_pred_norm = target_base\n\n if num_vars > 1:\n exo_hist = x_norm[:, :, :-1] # [B, L, N_exo]\n exo_seasonal, exo_trend = self.decomp(exo_hist)\n exo_future = (\n self.exo_linear_seasonal(exo_seasonal.permute(0, 2, 1))\n + self.exo_linear_trend(exo_trend.permute(0, 2, 1))\n ) # [B, N_exo, pred_len]\n\n exo_diff = F.pad(exo_hist[:, 1:, :] - exo_hist[:, :-1, :], (0, 0, 1, 0))\n target_diff = F.pad(target_hist[:, 1:, :] - target_hist[:, :-1, :], (0, 0, 1, 0))\n\n exo_tokens = (\n self.hist_token(exo_hist.permute(0, 2, 1))\n + self.diff_token(exo_diff.permute(0, 2, 1))\n + self.future_token(exo_future)\n )\n exo_tokens = self.dropout(self.token_norm(exo_tokens))\n\n target_token = (\n self.target_hist_token(target_hist.squeeze(-1))\n + self.target_diff_token(target_diff.squeeze(-1))\n )\n\n query = self._build_future_query(x_mark_dec, batch_size, device) + target_token.unsqueeze(1)\n query = self.dropout(self.query_norm(query))\n\n logits = torch.einsum(\n 'bpd,bnd->bpn',\n self.query_proj(query),\n self.key_proj(exo_tokens)\n ) / math.sqrt(self.d_model)\n logits = logits + self.corr_scale * self._lag_corr_bias(target_hist.squeeze(-1), exo_hist).unsqueeze(1)\n\n 
attn = self._sparse_attention(logits) # [B, pred_len, N_exo]\n\n affine = self.affine_head(exo_tokens)\n gain = torch.tanh(affine[..., :1])\n bias = affine[..., 1:2]\n calibrated_exo_future = gain * exo_future + bias # [B, N_exo, pred_len]\n\n exo_context = torch.einsum('bpn,bnd->bpd', attn, self.value_proj(exo_tokens))\n exo_delta = (\n torch.einsum('bpn,bnp->bp', attn, calibrated_exo_future)\n + self.context_head(exo_context).squeeze(-1)\n )\n\n mix = torch.sigmoid(self.mix_gate(torch.cat([query, exo_context], dim=-1))).squeeze(-1)\n target_pred_norm = target_base + mix * exo_delta\n\n target_out = target_pred_norm.unsqueeze(-1) * scale[:, :, -1:] + last[:, :, -1:]\n\n # Return a full multivariate tensor; only the last channel is eval\n... (truncated)",
5 "after_line": 0,
6 "start_line": 1,
7 "end_line": 42
8}
Tool Result (Step 1)
OK: Replaced lines 1..42 with 249 line(s) in Time-Series-Library/models/Custom.py. Editable range: entire file.
[Current file: Time-Series-Library/models/Custom.py | editable: 1–249 | total: 249 lines]
1 import math
2 import torch
3 import torch.nn as nn
4 import torch.nn.functional as F
5
6 from layers.Autoformer_EncDec import series_decomp
7 from layers.Embed import TemporalEmbedding, TimeFeatureEmbedding
8
9
10 class Model(nn.Module):
11     """
12     Lag-Correlation Sparse Exogenous Mixture (LCSEM).
13
14     Core idea:
15     1) Forecast the target with a strong decomposed linear self-forecast branch.
16     2) Build one future path per exogenous variable with a shared decomposed linear expert.
17     3) Use horizon-wise target-conditioned sparse attention, biased by multi-lag
18        target/exogenous correlations, to mix only a small subset of exogenous experts.
19
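The top-k routing in `_sparse_attention` (visible in the escaped payload above) can be restated as a standalone sketch; the `-1e9` fill plays the role of negative infinity, so non-selected experts receive essentially zero weight after the softmax:

```python
import torch


def sparse_attention(logits: torch.Tensor, k: int) -> torch.Tensor:
    """Softmax restricted to the top-k logits along the last dimension."""
    k = min(k, logits.size(-1))
    if k < logits.size(-1):
        top_vals, top_idx = torch.topk(logits, k=k, dim=-1)
        masked = torch.full_like(logits, -1e9)  # ~ -inf after softmax
        masked.scatter_(-1, top_idx, top_vals)  # keep only the top-k logits
        logits = masked
    return torch.softmax(logits, dim=-1)


attn = sparse_attention(torch.tensor([[3.0, 0.5, 2.0, -1.0]]), k=2)
print(attn)  # only the two largest logits (3.0 and 2.0) keep non-zero weight
```

Capping the number of attended exogenous channels keeps the mixture sparse even on ECL, where 320 exogenous variables are available per target.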
Assistant (Step 2: test)
Tool Result (Step 2)
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)

### ETTh1 (scripts/etth1.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=1 train_loss=0.1693751 vali_loss=0.1008762 test_loss=0.0565066
TRAIN_METRICS epoch=2 train_loss=0.1568139 vali_loss=0.1017272 test_loss=0.0559623
TRAIN_METRICS epoch=3 train_loss=0.1514786 vali_loss=0.1019467 test_loss=0.0574856
TRAIN_METRICS epoch=4 train_loss=0.1476967 vali_loss=0.1044249 test_loss=0.0579439
Test results (ETTh1): MSE: 0.056116, MAE: 0.179302

### Weather (scripts/weather.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=1 train_loss=1.0075170 vali_loss=0.0030075 test_loss=0.0012176
TRAIN_METRICS epoch=2 train_loss=0.9407428 vali_loss=0.0028374 test_loss=0.0012368
TRAIN_METRICS epoch=3 train_loss=0.9200521 vali_loss=0.0031417 test_loss=0.0012587
TRAIN_METRICS epoch=4 train_loss=0.9129418 vali_loss=0.0033366 test_loss=0.0013579
TRAIN_METRICS epoch=5 train_loss=0.9082097 vali_loss=0.0032226 test_loss=0.0013119
Te
Assistant (Step 3: edit)
Tool Input
1{
2 "op": "replace",
3 "filename": "Time-Series-Library/models/Custom.py",
4 "content": "import math\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom layers.Autoformer_EncDec import series_decomp\nfrom layers.Embed import TemporalEmbedding, TimeFeatureEmbedding\n\n\nclass Model(nn.Module):\n \"\"\"\n Lag-Correlation Sparse Exogenous Mixture (LCSEM).\n\n Core idea:\n 1) Forecast the target with a strong decomposed linear self-forecast branch.\n 2) Build one future path per exogenous variable with a shared decomposed linear expert.\n 3) Use horizon-wise target-conditioned sparse attention, biased by multi-lag\n target/exogenous correlations, to mix only a small subset of exogenous experts.\n 4) Dynamically calibrate each selected exogenous expert before adding it as a\n residual correction to the target forecast.\n\n The framework evaluates only the last channel in MS mode, so the model focuses its\n capacity on the target (last dimension) while still returning [B, pred_len, c_out].\n \"\"\"\n\n def __init__(self, configs):\n super(Model, self).__init__()\n self.task_name = configs.task_name\n self.seq_len = configs.seq_len\n self.pred_len = configs.pred_len\n self.enc_in = configs.enc_in\n self.c_out = configs.c_out\n\n self.d_model = getattr(configs, 'd_model', 128)\n self.dropout = nn.Dropout(getattr(configs, 'dropout', 0.1))\n self.max_exogenous = 12\n self.corr_lags = [0, 1, 2, 3, 6, 12, 24]\n\n moving_avg = getattr(configs, 'moving_avg', 25)\n if isinstance(moving_avg, (list, tuple)):\n moving_avg = moving_avg[0]\n if moving_avg % 2 == 0:\n moving_avg += 1\n self.decomp = series_decomp(moving_avg)\n\n # Target self-forecast branch (DLinear-style on the target only).\n self.target_linear_seasonal = nn.Linear(self.seq_len, self.pred_len)\n self.target_linear_trend = nn.Linear(self.seq_len, self.pred_len)\n\n # Shared exogenous future experts.\n self.exo_linear_seasonal = nn.Linear(self.seq_len, self.pred_len)\n self.exo_linear_trend = nn.Linear(self.seq_len, self.pred_len)\n\n 
self._init_linear_as_average(self.target_linear_seasonal)\n self._init_linear_as_average(self.target_linear_trend)\n self._init_linear_as_average(self.exo_linear_seasonal)\n self._init_linear_as_average(self.exo_linear_trend)\n\n # History encoders for exogenous selection and calibration.\n self.hist_token = nn.Linear(self.seq_len, self.d_model, bias=False)\n self.diff_token = nn.Linear(self.seq_len, self.d_model, bias=False)\n self.future_token = nn.Linear(self.pred_len, self.d_model, bias=False)\n self.target_hist_token = nn.Linear(self.seq_len, self.d_model, bias=False)\n self.target_diff_token = nn.Linear(self.seq_len, self.d_model, bias=False)\n\n # Horizon queries: future calendar features + learned horizon embedding + target token.\n self.horizon_embedding = nn.Embedding(self.pred_len, self.d_model)\n embed_type = getattr(configs, 'embed', 'timeF')\n freq = getattr(configs, 'freq', 'h')\n if embed_type == 'timeF':\n self.future_time_embedding = TimeFeatureEmbedding(self.d_model, embed_type=embed_type, freq=freq)\n else:\n self.future_time_embedding = TemporalEmbedding(self.d_model, embed_type=embed_type, freq=freq)\n\n self.token_norm = nn.LayerNorm(self.d_model)\n self.query_norm = nn.LayerNorm(self.d_model)\n self.query_proj = nn.Linear(self.d_model, self.d_model)\n self.key_proj = nn.Linear(self.d_model, self.d_model)\n self.value_proj = nn.Linear(self.d_model, self.d_model)\n\n # Dynamic expert calibration and residual fusion.\n self.affine_head = nn.Sequential(\n nn.Linear(self.d_model, self.d_model),\n nn.GELU(),\n nn.Linear(self.d_model, 2)\n )\n self.context_head = nn.Sequential(\n nn.Linear(self.d_model, self.d_model),\n nn.GELU(),\n nn.Linear(self.d_model, 1)\n )\n self.mix_gate = nn.Sequential(\n nn.Linear(self.d_model * 2, self.d_model),\n nn.GELU(),\n nn.Linear(self.d_model, 1)\n )\n self.corr_scale = nn.Parameter(torch.tensor(1.0))\n\n # Start close to the self-forecast branch; let exogenous corrections grow only if useful.\n 
nn.init.constant_(self.mix_gate[-1].bias, -1.0)\n\n def _init_linear_as_average(self, layer):\n layer.weight = nn.Parameter(\n (1.0 / self.seq_len) * torch.ones(self.pred_len, self.seq_len)\n )\n if layer.bias is not None:\n nn.init.zeros_(layer.bias)\n\n def _build_future_query(self, x_mark_dec, batch_size, device):\n horizon_ids = torch.arange(self.pred_len, device=device)\n query = self.horizon_embedding(horizon_ids).unsqueeze(0).expand(batch_size, -1, -1)\n\n if x_mark_dec is not None and x_mark_dec.size(-1) > 0:\n future_marks = x_mark_dec[:, -self.pred_len:, :]\n query = query + self.future_time_embedding(future_marks)\n\n return query\n\n def _lag_corr_stats(self, target_hist, exo_hist):\n \"\"\"\n target_hist: [B, L]\n exo_hist: [B, L, N_exo]\n returns:\n corr_abs: [B, N_exo] absolute max correlation across the lag bank\n corr_signed: [B, N_exo] signed correlation at the selected lag\n \"\"\"\n if exo_hist.size(-1) == 0:\n empty = exo_hist.new_zeros(exo_hist.size(0), 0)\n return empty, empty\n\n target_hist = target_hist - target_hist.mean(dim=1, keepdim=True)\n target_hist = target_hist / (target_hist.std(dim=1, keepdim=True, unbiased=False) + 1e-5)\n\n exo_hist = exo_hist - exo_hist.mean(dim=1, keepdim=True)\n exo_hist = exo_hist / (exo_hist.std(dim=1, keepdim=True, unbiased=False) + 1e-5)\n\n corrs = []\n for lag in self.corr_lags:\n if lag == 0:\n corr = (target_hist.unsqueeze(-1) * exo_hist).mean(dim=1)\n elif lag < target_hist.size(1):\n corr = (target_hist[:, lag:].unsqueeze(-1) * exo_hist[:, :-lag, :]).mean(dim=1)\n else:\n continue\n corrs.append(corr)\n\n if len(corrs) == 0:\n empty = exo_hist.new_zeros(exo_hist.size(0), exo_hist.size(-1))\n return empty, empty\n\n corr_stack = torch.stack(corrs, dim=-1)\n best_idx = corr_stack.abs().argmax(dim=-1, keepdim=True)\n corr_signed = torch.gather(corr_stack, dim=-1, index=best_idx).squeeze(-1)\n corr_abs = corr_signed.abs()\n return corr_abs, corr_signed\n\n def _sparse_attention(self, logits):\n if 
logits.size(-1) == 0:\n return logits\n\n k = min(self.max_exogenous, logits.size(-1))\n if k < logits.size(-1):\n top_vals, top_idx = torch.topk(logits, k=k, dim=-1)\n sparse_logits = torch.full_like(logits, -1e9)\n sparse_logits.scatter_(-1, top_idx, top_vals)\n logits = sparse_logits\n return torch.softmax(logits, dim=-1)\n\n def forecast(self, x_enc, x_mark_enc, x_dec, x_mark_dec):\n \"\"\"\n Forecasting with exogenous variables.\n Input: x_enc [batch, seq_len, enc_in] — all variables\n Output: [batch, pred_len, c_out] — predict all variables\n \"\"\"\n batch_size, _, num_vars = x_enc.shape\n device = x_enc.device\n\n # RevIN-style normalization with last-value centering.\n last = x_enc[:, -1:, :]\n centered = x_enc - last\n scale = torch.sqrt(torch.var(centered, dim=1, keepdim=True, unbiased=False) + 1e-5)\n x_norm = centered / scale\n\n target_hist = x_norm[:, :, -1:] # [B, L, 1]\n target_seasonal, target_trend = self.decomp(target_hist)\n target_base = (\n self.target_linear_seasonal(target_seasonal.squeeze(-1))\n + self.target_linear_trend(target_trend.squeeze(-1))\n ) # [B, pred_len]\n\n target_pred_norm = target_base\n\n if num_vars > 1:\n exo_hist = x_norm[:, :, :-1] # [B, L, N_exo]\n exo_seasonal, exo_trend = self.decomp(exo_hist)\n exo_future = (\n self.exo_linear_seasonal(exo_seasonal.permute(0, 2, 1))\n + self.exo_linear_trend(exo_trend.permute(0, 2, 1))\n ) # [B, N_exo, pred_len]\n\n exo_diff = F.pad(exo_hist[:, 1:, :] - exo_hist[:, :-1, :], (0, 0, 1, 0))\n target_diff = F.pad(target_hist[:, 1:, :] - target_hist[:, :-1, :], (0, 0, 1, 0))\n\n exo_tokens = (\n self.hist_token(exo_hist.permute(0, 2, 1))\n + self.diff_token(exo_diff.permute(0, 2, 1))\n + self.future_token(exo_future)\n )\n exo_tokens = self.dropout(self.token_norm(exo_tokens))\n\n target_token = (\n self.target_hist_token(target_hist.squeeze(-1))\n + self.target_diff_token(target_diff.squeeze(-1))\n )\n\n query = self._build_future_query(x_mark_dec, batch_size, device) + 
target_token.unsqueeze(1)\n query = self.dropout(self.query_norm(query))\n\n corr_abs, corr_signed = self._lag_corr_stats(target_hist.squeeze(-1), exo_hist)\n future_match = F.cosine_similarity(\n exo_future,\n target_base.unsqueeze(1).expand(-1, exo_future.size(1), -1),\n dim=-1,\n eps=1e-6,\n )\n\n logits = torch.einsum(\n 'bpd,bnd->bpn',\n self.query_proj(query),\n self.key_proj(exo_tokens)\n ) / math.sqrt(self.d_model)\n logits = logits + (2.0 * self.corr_scale) * corr_abs.unsqueeze(1) + future_match.unsqueeze(1)\n\n attn = self._sparse_attention(logits) # [B, pred_len, N_exo]\n\n affine = self.affine_head(exo_tokens)\n gain = (torch.tanh(affine[..., :1]) + corr_signed.unsqueeze(-1)).clamp(-2.0, 2.\n... (truncated)",
5 "after_line": 0,
6 "start_line": 1,
7 "end_line": 249
8}
Tool Result (Step 3)
OK: Replaced lines 1..249 with 268 line(s) in Time-Series-Library/models/Custom.py. Editable range: entire file.
[Current file: Time-Series-Library/models/Custom.py | editable: 1–268 | total: 268 lines]
1 import math
2 import torch
3 import torch.nn as nn
4 import torch.nn.functional as F
5
6 from layers.Autoformer_EncDec import series_decomp
7 from layers.Embed import TemporalEmbedding, TimeFeatureEmbedding
8
9
10 class Model(nn.Module):
11     """
12     Lag-Correlation Sparse Exogenous Mixture (LCSEM).
13
14     Core idea:
15     1) Forecast the target with a strong decomposed linear self-forecast branch.
16     2) Build one future path per exogenous variable with a shared decomposed linear expert.
17     3) Use horizon-wise target-conditioned sparse attention, biased by multi-lag
18        target/exogenous correlations, to mix only a small subset of exogenous experts.
19
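The multi-lag correlation bias that this revision computes in `_lag_corr_stats` can be sketched in isolation. The lag bank and the normalization mirror the payload above; the toy series (an exogenous variable whose value reappears in the target two steps later) is an assumption made for illustration:

```python
import torch


def max_lag_corr(target: torch.Tensor, exo: torch.Tensor,
                 lags=(0, 1, 2, 3, 6, 12, 24)) -> torch.Tensor:
    """target: [B, L], exo: [B, L, N]. Returns [B, N]: the absolute
    correlation at the best lag, with exo shifted into the past."""
    t = (target - target.mean(1, keepdim=True)) \
        / (target.std(1, keepdim=True, unbiased=False) + 1e-5)
    e = (exo - exo.mean(1, keepdim=True)) \
        / (exo.std(1, keepdim=True, unbiased=False) + 1e-5)
    corrs = []
    for lag in lags:
        if lag == 0:
            corrs.append((t.unsqueeze(-1) * e).mean(1))
        elif lag < t.size(1):
            # Correlate later target values with earlier exogenous values.
            corrs.append((t[:, lag:].unsqueeze(-1) * e[:, :-lag, :]).mean(1))
    return torch.stack(corrs, -1).abs().amax(-1)


torch.manual_seed(0)
s = torch.randn(98)
target = s[:96].unsqueeze(0)   # [1, 96]
exo = s[2:].reshape(1, 96, 1)  # [1, 96, 1]: exo's value recurs in the target 2 steps later
print(max_lag_corr(target, exo))  # high, close to 1, found at lag=2
```

A leading exogenous channel therefore gets a large bias and is more likely to survive the top-k selection, which is the intended effect of adding this term to the attention logits.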
Assistant (Step 4: test)
Tool Result (Step 4)
[Test #2] (1 test remaining; call submit(n=2) to submit this result)

### ETTh1 (scripts/etth1.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=1 train_loss=0.1703812 vali_loss=0.1008947 test_loss=0.0567407
TRAIN_METRICS epoch=2 train_loss=0.1569466 vali_loss=0.1023170 test_loss=0.0561373
TRAIN_METRICS epoch=3 train_loss=0.1514480 vali_loss=0.1029651 test_loss=0.0578335
TRAIN_METRICS epoch=4 train_loss=0.1478871 vali_loss=0.1053999 test_loss=0.0585066
Test results (ETTh1): MSE: 0.056272, MAE: 0.179377

### Weather (scripts/weather.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=1 train_loss=1.0114511 vali_loss=0.0038976 test_loss=0.0014760
TRAIN_METRICS epoch=2 train_loss=0.9514343 vali_loss=0.0031899 test_loss=0.0012818
TRAIN_METRICS epoch=3 train_loss=0.9250270 vali_loss=0.0033208 test_loss=0.0013006
TRAIN_METRICS epoch=4 train_loss=0.9147430 vali_loss=0.0033153 test_loss=0.0012979
TRAIN_METRICS epoch=5 train_loss=0.9094940 vali_loss=0.0033257 test_loss=0.0012681
Tes
Assistant (Step 5: submit)
Tool Input
1{
2 "n": 2
3}
Tool Result (Step 5)
[submit] Submitting result from test #2 as final.

### ETTh1 (scripts/etth1.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=1 train_loss=0.1703812 vali_loss=0.1008947 test_loss=0.0567407
TRAIN_METRICS epoch=2 train_loss=0.1569466 vali_loss=0.1023170 test_loss=0.0561373
TRAIN_METRICS epoch=3 train_loss=0.1514480 vali_loss=0.1029651 test_loss=0.0578335
TRAIN_METRICS epoch=4 train_loss=0.1478871 vali_loss=0.1053999 test_loss=0.0585066
Test results (ETTh1): MSE: 0.056272, MAE: 0.179377

### Weather (scripts/weather.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=1 train_loss=1.0114511 vali_loss=0.0038976 test_loss=0.0014760
TRAIN_METRICS epoch=2 train_loss=0.9514343 vali_loss=0.0031899 test_loss=0.0012818
TRAIN_METRICS epoch=3 train_loss=0.9250270 vali_loss=0.0033208 test_loss=0.0013006
TRAIN_METRICS epoch=4 train_loss=0.9147430 vali_loss=0.0033153 test_loss=0.0012979
TRAIN_METRICS epoch=5 train_loss=0.9094940 vali_loss=0.0033257 test_loss=0.0012681
Test results (Weather)