Agent Conversation: openai/gpt-5.4-pro

Model: openai/gpt-5.4-pro
Total steps: 5
Messages: 11
User (Step 0)

Task: ts-exogenous-forecast

Exogenous Variable Forecasting: Custom Model Design

Objective

Design and implement a custom deep learning model for time series forecasting with exogenous (external) variables. The task uses features=MS: all variables are provided as input, and only the target (the last dimension) is predicted. Your code goes in the Model class in models/Custom.py. Three reference implementations (DLinear, PatchTST, iTransformer) are provided as read-only.

Evaluation

Your model is trained and evaluated on three datasets with MS features:

  • ETTh1 (7 → 1, hourly electricity data)
  • Weather (21 → 1, weather observations)
  • ECL (321 → 1, electricity consumption)

All use seq_len=96, pred_len=96. Metrics: MSE and MAE on the target variable (lower is better). The framework automatically extracts outputs[:, :, -1:].
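The extraction step can be illustrated with a small NumPy sketch (shapes chosen to resemble ETTh1; the real framework operates on torch tensors):

```python
import numpy as np

# Illustrative shapes: batch=4, pred_len=96, c_out=7 (ETTh1-like).
batch, pred_len, c_out = 4, 96, 7
outputs = np.random.randn(batch, pred_len, c_out)

# MS mode: the framework keeps only the last channel (the target)
# before computing MSE/MAE against the ground-truth target.
target_pred = outputs[:, :, -1:]

assert target_pred.shape == (batch, pred_len, 1)
```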

Time-Series-Library/models/Custom.py [EDITABLE — entire file only]

     1: import torch
     2: import torch.nn as nn
     3: 
     4: 
     5: class Model(nn.Module):
     6:     """
     7:     Custom model for exogenous variable forecasting (features=MS).
     8: 
     9:     Forward signature: forward(x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None)
    10:     - x_enc: [batch, seq_len, enc_in] — all input variables
    11:     - x_mark_enc: [batch, seq_len, time_features] — time feature encoding
    12:     - x_dec: [batch, label_len+pred_len, dec_in] — decoder input
    13:     - x_mark_dec: [batch, label_len+pred_len, time_features] — decoder time features
    14: 
    15:     Must return: [batch, pred_len, c_out] for forecasting
    16:     Note: c_out = enc_in. The framework extracts the target (last dim) for MS mode.
    17:     """
    18: 
    19:     def __init__(self, configs):
    20:         super(Model, self).__init__()
    21:         self.task_name = configs.task_name
    22:         self.seq_len = configs.seq_len
    23:         self.pred_len = configs.pred_len
    24:         self.enc_in = configs.enc_in
    25:         self.c_out = configs.c_out
    26:         # TODO: Define your model architecture here
    27: 
    28:     def forecast(self, x_enc, x_mark_enc, x_dec, x_mark_dec):
    29:         """
    30:         Forecasting with exogenous variables.
    31:         Input: x_enc [batch, seq_len, enc_in] — all variables
    32:         Output: [batch, pred_len, c_out] — predict all variables
    33:         """
    34:         # TODO: Implement your forecasting logic
    35:         batch_size = x_enc.shape[0]
    36:         return torch.zeros(batch_size, self.pred_len, self.c_out).to(x_enc.device)
    37: 
    38:     def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):
    39:         if self.task_name == 'long_term_forecast' or self.task_name == 'short_term_forecast':
    40:             dec_out = self.forecast(x_enc, x_mark_enc, x_dec, x_mark_dec)
    41:             return dec_out[:, -self.pred_len:, :]
    42:         return None

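The skeleton's forecast returns zeros, the weakest possible baseline. A common stronger starting point (the one this session adopts later via `_init_linear_as_average`) is a seq_len to pred_len linear map whose weights are all initialized to 1/seq_len, so the untrained layer already predicts the historical mean. A torch-free NumPy sketch of that initialization:

```python
import numpy as np

seq_len, pred_len = 96, 96

# Weight of shape [pred_len, seq_len] filled with 1/seq_len:
# every forecast step starts as the average of the input window.
W = np.full((pred_len, seq_len), 1.0 / seq_len)

history = np.arange(seq_len, dtype=float)  # toy target history
forecast = W @ history                     # shape [pred_len]

# Before any training, each horizon step equals the historical mean.
assert np.allclose(forecast, history.mean())
```

Gradient descent then only has to learn the deviation from this mean predictor, which tends to stabilize early training compared with random initialization.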
Time-Series-Library/models/DLinear.py [READ-ONLY]

Assistant (Step 1: edit)
Tool Input
1{
2  "op": "replace",
3  "filename": "Time-Series-Library/models/Custom.py",
4  "content": "import math\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom layers.Autoformer_EncDec import series_decomp\nfrom layers.Embed import TemporalEmbedding, TimeFeatureEmbedding\n\n\nclass Model(nn.Module):\n    \"\"\"\n    Lag-Correlation Sparse Exogenous Mixture (LCSEM).\n\n    Core idea:\n    1) Forecast the target with a strong decomposed linear self-forecast branch.\n    2) Build one future path per exogenous variable with a shared decomposed linear expert.\n    3) Use horizon-wise target-conditioned sparse attention, biased by multi-lag\n       target/exogenous correlations, to mix only a small subset of exogenous experts.\n    4) Dynamically calibrate each selected exogenous expert before adding it as a\n       residual correction to the target forecast.\n\n    The framework evaluates only the last channel in MS mode, so the model focuses its\n    capacity on the target (last dimension) while still returning [B, pred_len, c_out].\n    \"\"\"\n\n    def __init__(self, configs):\n        super(Model, self).__init__()\n        self.task_name = configs.task_name\n        self.seq_len = configs.seq_len\n        self.pred_len = configs.pred_len\n        self.enc_in = configs.enc_in\n        self.c_out = configs.c_out\n\n        self.d_model = getattr(configs, 'd_model', 128)\n        self.dropout = nn.Dropout(getattr(configs, 'dropout', 0.1))\n        self.max_exogenous = 8\n        self.corr_lags = [0, 1, 2, 3, 6, 12, 24]\n\n        moving_avg = getattr(configs, 'moving_avg', 25)\n        if isinstance(moving_avg, (list, tuple)):\n            moving_avg = moving_avg[0]\n        if moving_avg % 2 == 0:\n            moving_avg += 1\n        self.decomp = series_decomp(moving_avg)\n\n        # Target self-forecast branch (DLinear-style on the target only).\n        self.target_linear_seasonal = nn.Linear(self.seq_len, self.pred_len)\n        self.target_linear_trend = nn.Linear(self.seq_len, self.pred_len)\n\n        # Shared 
exogenous future experts.\n        self.exo_linear_seasonal = nn.Linear(self.seq_len, self.pred_len)\n        self.exo_linear_trend = nn.Linear(self.seq_len, self.pred_len)\n\n        self._init_linear_as_average(self.target_linear_seasonal)\n        self._init_linear_as_average(self.target_linear_trend)\n        self._init_linear_as_average(self.exo_linear_seasonal)\n        self._init_linear_as_average(self.exo_linear_trend)\n\n        # History encoders for exogenous selection and calibration.\n        self.hist_token = nn.Linear(self.seq_len, self.d_model, bias=False)\n        self.diff_token = nn.Linear(self.seq_len, self.d_model, bias=False)\n        self.future_token = nn.Linear(self.pred_len, self.d_model, bias=False)\n        self.target_hist_token = nn.Linear(self.seq_len, self.d_model, bias=False)\n        self.target_diff_token = nn.Linear(self.seq_len, self.d_model, bias=False)\n\n        # Horizon queries: future calendar features + learned horizon embedding + target token.\n        self.horizon_embedding = nn.Embedding(self.pred_len, self.d_model)\n        embed_type = getattr(configs, 'embed', 'timeF')\n        freq = getattr(configs, 'freq', 'h')\n        if embed_type == 'timeF':\n            self.future_time_embedding = TimeFeatureEmbedding(self.d_model, embed_type=embed_type, freq=freq)\n        else:\n            self.future_time_embedding = TemporalEmbedding(self.d_model, embed_type=embed_type, freq=freq)\n\n        self.token_norm = nn.LayerNorm(self.d_model)\n        self.query_norm = nn.LayerNorm(self.d_model)\n        self.query_proj = nn.Linear(self.d_model, self.d_model)\n        self.key_proj = nn.Linear(self.d_model, self.d_model)\n        self.value_proj = nn.Linear(self.d_model, self.d_model)\n\n        # Dynamic expert calibration and residual fusion.\n        self.affine_head = nn.Sequential(\n            nn.Linear(self.d_model, self.d_model),\n            nn.GELU(),\n            nn.Linear(self.d_model, 2)\n        )\n        
self.context_head = nn.Sequential(\n            nn.Linear(self.d_model, self.d_model),\n            nn.GELU(),\n            nn.Linear(self.d_model, 1)\n        )\n        self.mix_gate = nn.Sequential(\n            nn.Linear(self.d_model * 2, self.d_model),\n            nn.GELU(),\n            nn.Linear(self.d_model, 1)\n        )\n        self.corr_scale = nn.Parameter(torch.tensor(1.0))\n\n        # Start close to the self-forecast branch; let exogenous corrections grow only if useful.\n        nn.init.constant_(self.mix_gate[-1].bias, -1.5)\n\n    def _init_linear_as_average(self, layer):\n        layer.weight = nn.Parameter(\n            (1.0 / self.seq_len) * torch.ones(self.pred_len, self.seq_len)\n        )\n        if layer.bias is not None:\n            nn.init.zeros_(layer.bias)\n\n    def _build_future_query(self, x_mark_dec, batch_size, device):\n        horizon_ids = torch.arange(self.pred_len, device=device)\n        query = self.horizon_embedding(horizon_ids).unsqueeze(0).expand(batch_size, -1, -1)\n\n        if x_mark_dec is not None and x_mark_dec.size(-1) > 0:\n            future_marks = x_mark_dec[:, -self.pred_len:, :]\n            query = query + self.future_time_embedding(future_marks)\n\n        return query\n\n    def _lag_corr_bias(self, target_hist, exo_hist):\n        \"\"\"\n        target_hist: [B, L]\n        exo_hist:    [B, L, N_exo]\n        returns:     [B, N_exo] absolute max correlation across a small lag bank\n        \"\"\"\n        if exo_hist.size(-1) == 0:\n            return exo_hist.new_zeros(exo_hist.size(0), 0)\n\n        target_hist = target_hist - target_hist.mean(dim=1, keepdim=True)\n        target_hist = target_hist / (target_hist.std(dim=1, keepdim=True, unbiased=False) + 1e-5)\n\n        exo_hist = exo_hist - exo_hist.mean(dim=1, keepdim=True)\n        exo_hist = exo_hist / (exo_hist.std(dim=1, keepdim=True, unbiased=False) + 1e-5)\n\n        corrs = []\n        for lag in self.corr_lags:\n            if lag == 
0:\n                corr = (target_hist.unsqueeze(-1) * exo_hist).mean(dim=1)\n            elif lag < target_hist.size(1):\n                corr = (target_hist[:, lag:].unsqueeze(-1) * exo_hist[:, :-lag, :]).mean(dim=1)\n            else:\n                continue\n            corrs.append(corr.abs())\n\n        if len(corrs) == 0:\n            return exo_hist.new_zeros(exo_hist.size(0), exo_hist.size(-1))\n\n        return torch.stack(corrs, dim=-1).amax(dim=-1)\n\n    def _sparse_attention(self, logits):\n        if logits.size(-1) == 0:\n            return logits\n\n        k = min(self.max_exogenous, logits.size(-1))\n        if k < logits.size(-1):\n            top_vals, top_idx = torch.topk(logits, k=k, dim=-1)\n            sparse_logits = torch.full_like(logits, -1e9)\n            sparse_logits.scatter_(-1, top_idx, top_vals)\n            logits = sparse_logits\n        return torch.softmax(logits, dim=-1)\n\n    def forecast(self, x_enc, x_mark_enc, x_dec, x_mark_dec):\n        \"\"\"\n        Forecasting with exogenous variables.\n        Input: x_enc [batch, seq_len, enc_in] — all variables\n        Output: [batch, pred_len, c_out] — predict all variables\n        \"\"\"\n        batch_size, _, num_vars = x_enc.shape\n        device = x_enc.device\n\n        # RevIN-style normalization with last-value centering.\n        last = x_enc[:, -1:, :]\n        centered = x_enc - last\n        scale = torch.sqrt(torch.var(centered, dim=1, keepdim=True, unbiased=False) + 1e-5)\n        x_norm = centered / scale\n\n        target_hist = x_norm[:, :, -1:]  # [B, L, 1]\n        target_seasonal, target_trend = self.decomp(target_hist)\n        target_base = (\n            self.target_linear_seasonal(target_seasonal.squeeze(-1))\n            + self.target_linear_trend(target_trend.squeeze(-1))\n        )  # [B, pred_len]\n\n        target_pred_norm = target_base\n\n        if num_vars > 1:\n            exo_hist = x_norm[:, :, :-1]  # [B, L, N_exo]\n            
exo_seasonal, exo_trend = self.decomp(exo_hist)\n            exo_future = (\n                self.exo_linear_seasonal(exo_seasonal.permute(0, 2, 1))\n                + self.exo_linear_trend(exo_trend.permute(0, 2, 1))\n            )  # [B, N_exo, pred_len]\n\n            exo_diff = F.pad(exo_hist[:, 1:, :] - exo_hist[:, :-1, :], (0, 0, 1, 0))\n            target_diff = F.pad(target_hist[:, 1:, :] - target_hist[:, :-1, :], (0, 0, 1, 0))\n\n            exo_tokens = (\n                self.hist_token(exo_hist.permute(0, 2, 1))\n                + self.diff_token(exo_diff.permute(0, 2, 1))\n                + self.future_token(exo_future)\n            )\n            exo_tokens = self.dropout(self.token_norm(exo_tokens))\n\n            target_token = (\n                self.target_hist_token(target_hist.squeeze(-1))\n                + self.target_diff_token(target_diff.squeeze(-1))\n            )\n\n            query = self._build_future_query(x_mark_dec, batch_size, device) + target_token.unsqueeze(1)\n            query = self.dropout(self.query_norm(query))\n\n            logits = torch.einsum(\n                'bpd,bnd->bpn',\n                self.query_proj(query),\n                self.key_proj(exo_tokens)\n            ) / math.sqrt(self.d_model)\n            logits = logits + self.corr_scale * self._lag_corr_bias(target_hist.squeeze(-1), exo_hist).unsqueeze(1)\n\n            attn = self._sparse_attention(logits)  # [B, pred_len, N_exo]\n\n            affine = self.affine_head(exo_tokens)\n            gain = torch.tanh(affine[..., :1])\n            bias = affine[..., 1:2]\n            calibrated_exo_future = gain * exo_future + bias  # [B, N_exo, pred_len]\n\n            exo_context = torch.einsum('bpn,bnd->bpd', attn, self.value_proj(exo_tokens))\n            exo_delta = (\n                torch.einsum('bpn,bnp->bp', attn, calibrated_exo_future)\n                + self.context_head(exo_context).squeeze(-1)\n            )\n\n            mix = 
torch.sigmoid(self.mix_gate(torch.cat([query, exo_context], dim=-1))).squeeze(-1)\n            target_pred_norm = target_base + mix * exo_delta\n\n        target_out = target_pred_norm.unsqueeze(-1) * scale[:, :, -1:] + last[:, :, -1:]\n\n        # Return a full multivariate tensor; only the last channel is eval\n... (truncated)",
5  "after_line": 0,
6  "start_line": 1,
7  "end_line": 42
8}
Tool Result (Step 1)
OK: Replaced lines 1..42 with 249 line(s) in Time-Series-Library/models/Custom.py. Editable range: entire file.

[Current file: Time-Series-Library/models/Custom.py | editable: 1–249 | total: 249 lines]
     1  import math
     2  import torch
     3  import torch.nn as nn
     4  import torch.nn.functional as F
     5  
     6  from layers.Autoformer_EncDec import series_decomp
     7  from layers.Embed import TemporalEmbedding, TimeFeatureEmbedding
     8  
     9  
    10  class Model(nn.Module):
    11      """
    12      Lag-Correlation Sparse Exogenous Mixture (LCSEM).
    13  
    14      Core idea:
    15      1) Forecast the target with a strong decomposed linear self-forecast branch.
    16      2) Build one future path per exogenous variable with a shared decomposed linear expert.
    17      3) Use horizon-wise target-conditioned sparse attention, biased by multi-lag
    18         target/exogenous correlations, to mix only a small subset of exogenous experts.
    19      
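Step (3)'s sparse mixing amounts to a top-k softmax: all but the k largest exogenous logits are masked to -1e9 before normalizing, mirroring `_sparse_attention` in the file above. A NumPy sketch of the same operation:

```python
import numpy as np

def sparse_softmax(logits, k):
    """Keep only the top-k logits per row; softmax over those."""
    if k >= logits.shape[-1]:
        exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
        return exp / exp.sum(axis=-1, keepdims=True)
    # Indices of the k largest entries along the last axis (unsorted).
    idx = np.argpartition(logits, -k, axis=-1)[..., -k:]
    masked = np.full_like(logits, -1e9)
    np.put_along_axis(masked, idx,
                      np.take_along_axis(logits, idx, axis=-1), axis=-1)
    exp = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

logits = np.array([[3.0, 1.0, 0.5, 2.0, -1.0]])
attn = sparse_softmax(logits, k=2)

assert np.isclose(attn.sum(), 1.0)
# Only the two largest logits (indices 0 and 3) receive weight.
assert np.count_nonzero(attn > 1e-6) == 2
```

Masking with a large negative constant (rather than -inf) keeps the softmax numerically safe while driving the excluded experts' weights to zero.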
Assistant (Step 2: test)
Tool Result (Step 2)
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)

### ETTh1 (scripts/etth1.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=1 train_loss=0.1693751 vali_loss=0.1008762 test_loss=0.0565066
TRAIN_METRICS epoch=2 train_loss=0.1568139 vali_loss=0.1017272 test_loss=0.0559623
TRAIN_METRICS epoch=3 train_loss=0.1514786 vali_loss=0.1019467 test_loss=0.0574856
TRAIN_METRICS epoch=4 train_loss=0.1476967 vali_loss=0.1044249 test_loss=0.0579439
Test results (ETTh1):
  MSE: 0.056116, MAE: 0.179302

### Weather (scripts/weather.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=1 train_loss=1.0075170 vali_loss=0.0030075 test_loss=0.0012176
TRAIN_METRICS epoch=2 train_loss=0.9407428 vali_loss=0.0028374 test_loss=0.0012368
TRAIN_METRICS epoch=3 train_loss=0.9200521 vali_loss=0.0031417 test_loss=0.0012587
TRAIN_METRICS epoch=4 train_loss=0.9129418 vali_loss=0.0033366 test_loss=0.0013579
TRAIN_METRICS epoch=5 train_loss=0.9082097 vali_loss=0.0032226 test_loss=0.0013119
Te
Assistant (Step 3: edit)
Tool Input
1{
2  "op": "replace",
3  "filename": "Time-Series-Library/models/Custom.py",
4  "content": "import math\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom layers.Autoformer_EncDec import series_decomp\nfrom layers.Embed import TemporalEmbedding, TimeFeatureEmbedding\n\n\nclass Model(nn.Module):\n    \"\"\"\n    Lag-Correlation Sparse Exogenous Mixture (LCSEM).\n\n    Core idea:\n    1) Forecast the target with a strong decomposed linear self-forecast branch.\n    2) Build one future path per exogenous variable with a shared decomposed linear expert.\n    3) Use horizon-wise target-conditioned sparse attention, biased by multi-lag\n       target/exogenous correlations, to mix only a small subset of exogenous experts.\n    4) Dynamically calibrate each selected exogenous expert before adding it as a\n       residual correction to the target forecast.\n\n    The framework evaluates only the last channel in MS mode, so the model focuses its\n    capacity on the target (last dimension) while still returning [B, pred_len, c_out].\n    \"\"\"\n\n    def __init__(self, configs):\n        super(Model, self).__init__()\n        self.task_name = configs.task_name\n        self.seq_len = configs.seq_len\n        self.pred_len = configs.pred_len\n        self.enc_in = configs.enc_in\n        self.c_out = configs.c_out\n\n        self.d_model = getattr(configs, 'd_model', 128)\n        self.dropout = nn.Dropout(getattr(configs, 'dropout', 0.1))\n        self.max_exogenous = 12\n        self.corr_lags = [0, 1, 2, 3, 6, 12, 24]\n\n        moving_avg = getattr(configs, 'moving_avg', 25)\n        if isinstance(moving_avg, (list, tuple)):\n            moving_avg = moving_avg[0]\n        if moving_avg % 2 == 0:\n            moving_avg += 1\n        self.decomp = series_decomp(moving_avg)\n\n        # Target self-forecast branch (DLinear-style on the target only).\n        self.target_linear_seasonal = nn.Linear(self.seq_len, self.pred_len)\n        self.target_linear_trend = nn.Linear(self.seq_len, self.pred_len)\n\n        # Shared 
exogenous future experts.\n        self.exo_linear_seasonal = nn.Linear(self.seq_len, self.pred_len)\n        self.exo_linear_trend = nn.Linear(self.seq_len, self.pred_len)\n\n        self._init_linear_as_average(self.target_linear_seasonal)\n        self._init_linear_as_average(self.target_linear_trend)\n        self._init_linear_as_average(self.exo_linear_seasonal)\n        self._init_linear_as_average(self.exo_linear_trend)\n\n        # History encoders for exogenous selection and calibration.\n        self.hist_token = nn.Linear(self.seq_len, self.d_model, bias=False)\n        self.diff_token = nn.Linear(self.seq_len, self.d_model, bias=False)\n        self.future_token = nn.Linear(self.pred_len, self.d_model, bias=False)\n        self.target_hist_token = nn.Linear(self.seq_len, self.d_model, bias=False)\n        self.target_diff_token = nn.Linear(self.seq_len, self.d_model, bias=False)\n\n        # Horizon queries: future calendar features + learned horizon embedding + target token.\n        self.horizon_embedding = nn.Embedding(self.pred_len, self.d_model)\n        embed_type = getattr(configs, 'embed', 'timeF')\n        freq = getattr(configs, 'freq', 'h')\n        if embed_type == 'timeF':\n            self.future_time_embedding = TimeFeatureEmbedding(self.d_model, embed_type=embed_type, freq=freq)\n        else:\n            self.future_time_embedding = TemporalEmbedding(self.d_model, embed_type=embed_type, freq=freq)\n\n        self.token_norm = nn.LayerNorm(self.d_model)\n        self.query_norm = nn.LayerNorm(self.d_model)\n        self.query_proj = nn.Linear(self.d_model, self.d_model)\n        self.key_proj = nn.Linear(self.d_model, self.d_model)\n        self.value_proj = nn.Linear(self.d_model, self.d_model)\n\n        # Dynamic expert calibration and residual fusion.\n        self.affine_head = nn.Sequential(\n            nn.Linear(self.d_model, self.d_model),\n            nn.GELU(),\n            nn.Linear(self.d_model, 2)\n        )\n        
self.context_head = nn.Sequential(\n            nn.Linear(self.d_model, self.d_model),\n            nn.GELU(),\n            nn.Linear(self.d_model, 1)\n        )\n        self.mix_gate = nn.Sequential(\n            nn.Linear(self.d_model * 2, self.d_model),\n            nn.GELU(),\n            nn.Linear(self.d_model, 1)\n        )\n        self.corr_scale = nn.Parameter(torch.tensor(1.0))\n\n        # Start close to the self-forecast branch; let exogenous corrections grow only if useful.\n        nn.init.constant_(self.mix_gate[-1].bias, -1.0)\n\n    def _init_linear_as_average(self, layer):\n        layer.weight = nn.Parameter(\n            (1.0 / self.seq_len) * torch.ones(self.pred_len, self.seq_len)\n        )\n        if layer.bias is not None:\n            nn.init.zeros_(layer.bias)\n\n    def _build_future_query(self, x_mark_dec, batch_size, device):\n        horizon_ids = torch.arange(self.pred_len, device=device)\n        query = self.horizon_embedding(horizon_ids).unsqueeze(0).expand(batch_size, -1, -1)\n\n        if x_mark_dec is not None and x_mark_dec.size(-1) > 0:\n            future_marks = x_mark_dec[:, -self.pred_len:, :]\n            query = query + self.future_time_embedding(future_marks)\n\n        return query\n\n    def _lag_corr_stats(self, target_hist, exo_hist):\n        \"\"\"\n        target_hist: [B, L]\n        exo_hist:    [B, L, N_exo]\n        returns:\n            corr_abs:    [B, N_exo] absolute max correlation across the lag bank\n            corr_signed: [B, N_exo] signed correlation at the selected lag\n        \"\"\"\n        if exo_hist.size(-1) == 0:\n            empty = exo_hist.new_zeros(exo_hist.size(0), 0)\n            return empty, empty\n\n        target_hist = target_hist - target_hist.mean(dim=1, keepdim=True)\n        target_hist = target_hist / (target_hist.std(dim=1, keepdim=True, unbiased=False) + 1e-5)\n\n        exo_hist = exo_hist - exo_hist.mean(dim=1, keepdim=True)\n        exo_hist = exo_hist / 
(exo_hist.std(dim=1, keepdim=True, unbiased=False) + 1e-5)\n\n        corrs = []\n        for lag in self.corr_lags:\n            if lag == 0:\n                corr = (target_hist.unsqueeze(-1) * exo_hist).mean(dim=1)\n            elif lag < target_hist.size(1):\n                corr = (target_hist[:, lag:].unsqueeze(-1) * exo_hist[:, :-lag, :]).mean(dim=1)\n            else:\n                continue\n            corrs.append(corr)\n\n        if len(corrs) == 0:\n            empty = exo_hist.new_zeros(exo_hist.size(0), exo_hist.size(-1))\n            return empty, empty\n\n        corr_stack = torch.stack(corrs, dim=-1)\n        best_idx = corr_stack.abs().argmax(dim=-1, keepdim=True)\n        corr_signed = torch.gather(corr_stack, dim=-1, index=best_idx).squeeze(-1)\n        corr_abs = corr_signed.abs()\n        return corr_abs, corr_signed\n\n    def _sparse_attention(self, logits):\n        if logits.size(-1) == 0:\n            return logits\n\n        k = min(self.max_exogenous, logits.size(-1))\n        if k < logits.size(-1):\n            top_vals, top_idx = torch.topk(logits, k=k, dim=-1)\n            sparse_logits = torch.full_like(logits, -1e9)\n            sparse_logits.scatter_(-1, top_idx, top_vals)\n            logits = sparse_logits\n        return torch.softmax(logits, dim=-1)\n\n    def forecast(self, x_enc, x_mark_enc, x_dec, x_mark_dec):\n        \"\"\"\n        Forecasting with exogenous variables.\n        Input: x_enc [batch, seq_len, enc_in] — all variables\n        Output: [batch, pred_len, c_out] — predict all variables\n        \"\"\"\n        batch_size, _, num_vars = x_enc.shape\n        device = x_enc.device\n\n        # RevIN-style normalization with last-value centering.\n        last = x_enc[:, -1:, :]\n        centered = x_enc - last\n        scale = torch.sqrt(torch.var(centered, dim=1, keepdim=True, unbiased=False) + 1e-5)\n        x_norm = centered / scale\n\n        target_hist = x_norm[:, :, -1:]  # [B, L, 1]\n        
target_seasonal, target_trend = self.decomp(target_hist)\n        target_base = (\n            self.target_linear_seasonal(target_seasonal.squeeze(-1))\n            + self.target_linear_trend(target_trend.squeeze(-1))\n        )  # [B, pred_len]\n\n        target_pred_norm = target_base\n\n        if num_vars > 1:\n            exo_hist = x_norm[:, :, :-1]  # [B, L, N_exo]\n            exo_seasonal, exo_trend = self.decomp(exo_hist)\n            exo_future = (\n                self.exo_linear_seasonal(exo_seasonal.permute(0, 2, 1))\n                + self.exo_linear_trend(exo_trend.permute(0, 2, 1))\n            )  # [B, N_exo, pred_len]\n\n            exo_diff = F.pad(exo_hist[:, 1:, :] - exo_hist[:, :-1, :], (0, 0, 1, 0))\n            target_diff = F.pad(target_hist[:, 1:, :] - target_hist[:, :-1, :], (0, 0, 1, 0))\n\n            exo_tokens = (\n                self.hist_token(exo_hist.permute(0, 2, 1))\n                + self.diff_token(exo_diff.permute(0, 2, 1))\n                + self.future_token(exo_future)\n            )\n            exo_tokens = self.dropout(self.token_norm(exo_tokens))\n\n            target_token = (\n                self.target_hist_token(target_hist.squeeze(-1))\n                + self.target_diff_token(target_diff.squeeze(-1))\n            )\n\n            query = self._build_future_query(x_mark_dec, batch_size, device) + target_token.unsqueeze(1)\n            query = self.dropout(self.query_norm(query))\n\n            corr_abs, corr_signed = self._lag_corr_stats(target_hist.squeeze(-1), exo_hist)\n            future_match = F.cosine_similarity(\n                exo_future,\n                target_base.unsqueeze(1).expand(-1, exo_future.size(1), -1),\n                dim=-1,\n                eps=1e-6,\n            )\n\n            logits = torch.einsum(\n                'bpd,bnd->bpn',\n                self.query_proj(query),\n                self.key_proj(exo_tokens)\n            ) / math.sqrt(self.d_model)\n            logits = logits 
+ (2.0 * self.corr_scale) * corr_abs.unsqueeze(1) + future_match.unsqueeze(1)\n\n            attn = self._sparse_attention(logits)  # [B, pred_len, N_exo]\n\n            affine = self.affine_head(exo_tokens)\n            gain = (torch.tanh(affine[..., :1]) + corr_signed.unsqueeze(-1)).clamp(-2.0, 2.\n... (truncated)",
5  "after_line": 0,
6  "start_line": 1,
7  "end_line": 249
8}
Tool Result (Step 3)
OK: Replaced lines 1..249 with 268 line(s) in Time-Series-Library/models/Custom.py. Editable range: entire file.

[Current file: Time-Series-Library/models/Custom.py | editable: 1–268 | total: 268 lines]
     1  import math
     2  import torch
     3  import torch.nn as nn
     4  import torch.nn.functional as F
     5  
     6  from layers.Autoformer_EncDec import series_decomp
     7  from layers.Embed import TemporalEmbedding, TimeFeatureEmbedding
     8  
     9  
    10  class Model(nn.Module):
    11      """
    12      Lag-Correlation Sparse Exogenous Mixture (LCSEM).
    13  
    14      Core idea:
    15      1) Forecast the target with a strong decomposed linear self-forecast branch.
    16      2) Build one future path per exogenous variable with a shared decomposed linear expert.
    17      3) Use horizon-wise target-conditioned sparse attention, biased by multi-lag
    18         target/exogenous correlations, to mix only a small subset of exogenous experts.
    19     
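The multi-lag correlation bias that steers these attention logits can be sketched in NumPy; the toy data below (one exogenous column that copies the target, one pure-noise column) is illustrative, while the lag bank and the z-score-then-abs-max logic follow `_lag_corr_stats` in the file:

```python
import numpy as np

def lag_corr_bias(target, exo, lags=(0, 1, 2, 3, 6, 12, 24)):
    """target: [L]; exo: [L, N]. Returns [N]: the maximum
    |correlation| between the target and each exogenous series
    across a small bank of lags (exogenous series leading)."""
    def zscore(x):
        return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-5)

    t, e = zscore(target), zscore(exo)
    corrs = []
    for lag in lags:
        if lag == 0:
            corrs.append((t[:, None] * e).mean(axis=0))
        elif lag < len(t):
            # Compare target[lag:] with exo[:-lag]: exo leads by `lag`.
            corrs.append((t[lag:, None] * e[:-lag]).mean(axis=0))
    return np.abs(np.stack(corrs)).max(axis=0)

rng = np.random.default_rng(0)
target = np.sin(np.linspace(0, 8 * np.pi, 96))
exo = np.stack([target, rng.standard_normal(96)], axis=1)

bias = lag_corr_bias(target, exo)
# The copied column correlates near-perfectly; the noise column does not.
assert bias[0] > 0.9 and bias[0] > bias[1]
```

Using this bias additively on the logits nudges the sparse attention toward exogenous variables with a demonstrated lead-lag relationship, without hard-coding which variables to pick.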
Assistant (Step 4: test)
Tool Result (Step 4)
[Test #2] (1 test remaining; call submit(n=2) to submit this result)

### ETTh1 (scripts/etth1.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=1 train_loss=0.1703812 vali_loss=0.1008947 test_loss=0.0567407
TRAIN_METRICS epoch=2 train_loss=0.1569466 vali_loss=0.1023170 test_loss=0.0561373
TRAIN_METRICS epoch=3 train_loss=0.1514480 vali_loss=0.1029651 test_loss=0.0578335
TRAIN_METRICS epoch=4 train_loss=0.1478871 vali_loss=0.1053999 test_loss=0.0585066
Test results (ETTh1):
  MSE: 0.056272, MAE: 0.179377

### Weather (scripts/weather.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=1 train_loss=1.0114511 vali_loss=0.0038976 test_loss=0.0014760
TRAIN_METRICS epoch=2 train_loss=0.9514343 vali_loss=0.0031899 test_loss=0.0012818
TRAIN_METRICS epoch=3 train_loss=0.9250270 vali_loss=0.0033208 test_loss=0.0013006
TRAIN_METRICS epoch=4 train_loss=0.9147430 vali_loss=0.0033153 test_loss=0.0012979
TRAIN_METRICS epoch=5 train_loss=0.9094940 vali_loss=0.0033257 test_loss=0.0012681
Tes
Assistant (Step 5: submit)
Tool Input
1{
2  "n": 2
3}
Tool Result (Step 5)
[submit] Submitting result from test #2 as final.

### ETTh1 (scripts/etth1.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=1 train_loss=0.1703812 vali_loss=0.1008947 test_loss=0.0567407
TRAIN_METRICS epoch=2 train_loss=0.1569466 vali_loss=0.1023170 test_loss=0.0561373
TRAIN_METRICS epoch=3 train_loss=0.1514480 vali_loss=0.1029651 test_loss=0.0578335
TRAIN_METRICS epoch=4 train_loss=0.1478871 vali_loss=0.1053999 test_loss=0.0585066
Test results (ETTh1):
  MSE: 0.056272, MAE: 0.179377

### Weather (scripts/weather.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=1 train_loss=1.0114511 vali_loss=0.0038976 test_loss=0.0014760
TRAIN_METRICS epoch=2 train_loss=0.9514343 vali_loss=0.0031899 test_loss=0.0012818
TRAIN_METRICS epoch=3 train_loss=0.9250270 vali_loss=0.0033208 test_loss=0.0013006
TRAIN_METRICS epoch=4 train_loss=0.9147430 vali_loss=0.0033153 test_loss=0.0012979
TRAIN_METRICS epoch=5 train_loss=0.9094940 vali_loss=0.0033257 test_loss=0.0012681
Test results (Weather)