Agent Conversation: deepseek-reasoner
Model: deepseek-reasoner
Total steps: 9
Messages: 19
User (Step 0)
Task: ts-exogenous-forecast
Exogenous Variable Forecasting: Custom Model Design
Objective
Design and implement a custom deep learning model for time series forecasting with exogenous (external) variables. The task uses features=MS: all variables are fed as input, and only the target (the last dimension) is predicted. Your code goes in the Model class in models/Custom.py. Three reference implementations (DLinear, PatchTST, iTransformer) are provided read-only.
Evaluation
Trained and evaluated on three datasets with MS features:
- ETTh1 (7 → 1, hourly electricity data)
- Weather (21 → 1, weather observations)
- ECL (321 → 1, electricity consumption)
All use seq_len=96, pred_len=96. Metrics: MSE and MAE on the target variable (lower is better). The framework automatically extracts outputs[:, :, -1:].
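For reference, in MS mode the extraction amounts to slicing the last feature dimension of the model output before scoring. A minimal sketch of the convention (tensor sizes chosen for ETTh1; the variable names are illustrative, not the framework's own):

import torch

# Model output: [batch, pred_len, c_out]; in MS mode only the last
# channel (the target series) is compared against the ground truth.
outputs = torch.randn(32, 96, 7)        # e.g. ETTh1: c_out = enc_in = 7
target_pred = outputs[:, :, -1:]        # [32, 96, 1], what MSE/MAE see
assert target_pred.shape == (32, 96, 1)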
Time-Series-Library/models/Custom.py [EDITABLE — entire file only]
1: import torch
2: import torch.nn as nn
3:
4:
5: class Model(nn.Module):
6:     """
7:     Custom model for exogenous variable forecasting (features=MS).
8:
9:     Forward signature: forward(x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None)
10:     - x_enc: [batch, seq_len, enc_in] — all input variables
11:     - x_mark_enc: [batch, seq_len, time_features] — time feature encoding
12:     - x_dec: [batch, label_len+pred_len, dec_in] — decoder input
13:     - x_mark_dec: [batch, label_len+pred_len, time_features] — decoder time features
14:
15:     Must return: [batch, pred_len, c_out] for forecasting
16:     Note: c_out = enc_in. The framework extracts the target (last dim) for MS mode.
17:     """
18:
19:     def __init__(self, configs):
20:         super(Model, self).__init__()
21:         self.task_name = configs.task_name
22:         self.seq_len = configs.seq_len
23:         self.pred_len = configs.pred_len
24:         self.enc_in = configs.enc_in
25:         self.c_out = configs.c_out
26:         # TODO: Define your model architecture here
27:
28:     def forecast(self, x_enc, x_mark_enc, x_dec, x_mark_dec):
29:         """
30:         Forecasting with exogenous variables.
31:         Input: x_enc [batch, seq_len, enc_in] — all variables
32:         Output: [batch, pred_len, c_out] — predict all variables
33:         """
34:         # TODO: Implement your forecasting logic
35:         batch_size = x_enc.shape[0]
36:         return torch.zeros(batch_size, self.pred_len, self.c_out).to(x_enc.device)
37:
38:     def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):
39:         if self.task_name == 'long_term_forecast' or self.task_name == 'short_term_forecast':
40:             dec_out = self.forecast(x_enc, x_mark_enc, x_dec, x_mark_dec)
41:             return dec_out[:, -self.pred_len:, :]
42:         return None
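Before following the design iterations below, it may help to see the smallest change that turns the zero-returning stub into a real forecaster: a persistence (last-value) baseline that already satisfies the required shapes. This is an illustrative sketch, a drop-in replacement for the template's forecast method, not the model the agent builds:

    def forecast(self, x_enc, x_mark_enc, x_dec, x_mark_dec):
        # Persistence baseline: repeat the last observed timestep pred_len times.
        # x_enc: [batch, seq_len, enc_in] -> [batch, pred_len, c_out] (c_out == enc_in)
        last = x_enc[:, -1:, :]                  # [batch, 1, enc_in]
        return last.repeat(1, self.pred_len, 1)  # tiled over the horizon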
Time-Series-Library/models/DLinear.py [READ-ONLY]
Assistant (Step 1, edit)
Tool Input
1{
2 "op": "replace",
3 "filename": "Time-Series-Library/models/Custom.py",
4 "start_line": 1,
5 "end_line": 42,
6 "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom layers.Transformer_EncDec import Encoder, EncoderLayer\nfrom layers.SelfAttention_Family import FullAttention, AttentionLayer\nfrom layers.Embed import DataEmbedding_inverted\n\n\nclass ExogenousGate(nn.Module):\n \"\"\"Learn per-variable importance weight based on its time series.\"\"\"\n def __init__(self, enc_in, seq_len, hidden_dim=64):\n super(ExogenousGate, self).__init__()\n self.enc_in = enc_in\n self.seq_len = seq_len\n self.conv = nn.Conv1d(in_channels=seq_len, out_channels=hidden_dim, kernel_size=3, padding=1)\n self.pool = nn.AdaptiveAvgPool1d(1)\n self.fc = nn.Linear(hidden_dim, 1)\n self.sigmoid = nn.Sigmoid()\n \n def forward(self, x):\n # x: [batch, seq_len, enc_in]\n batch = x.shape[0]\n # Permute to [batch, enc_in, seq_len] for conv over time\n x_perm = x.permute(0, 2, 1) # [batch, enc_in, seq_len]\n # Treat each variable independently, conv across time\n # We'll apply conv across time dimension, but conv expects input channels = seq_len? Actually conv1d expects (batch, channels, length). \n # We'll reshape to treat each variable as batch dimension temporarily.\n x_reshaped = x_perm.reshape(-1, self.seq_len, 1) # [batch*enc_in, seq_len, 1]\n x_reshaped = x_reshaped.permute(0, 2, 1) # [batch*enc_in, 1, seq_len] for conv1d\n conv_out = F.relu(self.conv(x_reshaped)) # [batch*enc_in, hidden_dim, seq_len]\n pooled = self.pool(conv_out).squeeze(-1) # [batch*enc_in, hidden_dim]\n weight = self.sigmoid(self.fc(pooled)) # [batch*enc_in, 1]\n weight = weight.reshape(batch, self.enc_in, 1) # [batch, enc_in, 1]\n # Multiply original x by weight across variable dimension\n x_weighted = x * weight.permute(0, 2, 1) # [batch, seq_len, enc_in] * [batch, 1, enc_in]\n return x_weighted, weight.squeeze(-1)\n\n\nclass VariableEmbedding(nn.Module):\n \"\"\"Embed each variable's time series into d_model.\"\"\"\n def __init__(self, seq_len, d_model, enc_in, dropout=0.1):\n super(VariableEmbedding, self).__init__()\n self.seq_len = seq_len\n self.d_model = d_model\n self.enc_in = enc_in\n self.linear = nn.Linear(seq_len, d_model)\n self.var_pos_embed = nn.Parameter(torch.randn(1, enc_in, d_model))\n self.dropout = nn.Dropout(dropout)\n \n def forward(self, x):\n # x: [batch, seq_len, enc_in]\n batch = x.shape[0]\n # Permute to [batch, enc_in, seq_len]\n x_perm = x.permute(0, 2, 1) # [batch, enc_in, seq_len]\n # Embed each variable's time series\n embedded = self.linear(x_perm) # [batch, enc_in, d_model]\n # Add variable positional embedding\n embedded = embedded + self.var_pos_embed\n embedded = self.dropout(embedded)\n return embedded # [batch, enc_in, d_model]\n\n\nclass Model(nn.Module):\n \"\"\"\n Exogenous-Aware Transformer (ExAT) for exogenous variable forecasting.\n \"\"\"\n def __init__(self, configs):\n super(Model, self).__init__()\n self.task_name = configs.task_name\n self.seq_len = configs.seq_len\n self.pred_len = configs.pred_len\n self.enc_in = configs.enc_in\n self.c_out = configs.c_out\n \n # Normalization will be done manually as in Non-stationary Transformer\n self.exo_gate = ExogenousGate(configs.enc_in, configs.seq_len)\n self.var_embedding = VariableEmbedding(configs.seq_len, configs.d_model, configs.enc_in, configs.dropout)\n \n # Transformer encoder across variables\n self.encoder = Encoder(\n [\n EncoderLayer(\n AttentionLayer(\n FullAttention(False, configs.factor, attention_dropout=configs.dropout,\n output_attention=False), configs.d_model, configs.n_heads),\n configs.d_model,\n 
configs.d_ff,\n dropout=configs.dropout,\n activation=configs.activation\n ) for l in range(configs.e_layers)\n ],\n norm_layer=nn.LayerNorm(configs.d_model)\n )\n \n # Prediction head: map each variable's d_model to pred_len\n self.projection = nn.Linear(configs.d_model, configs.pred_len, bias=True)\n \n def forecast(self, x_enc, x_mark_enc, x_dec, x_mark_dec):\n # Normalization from Non-stationary Transformer\n means = x_enc.mean(1, keepdim=True).detach()\n x_enc = x_enc - means\n stdev = torch.sqrt(torch.var(x_enc, dim=1, keepdim=True, unbiased=False) + 1e-5)\n x_enc /= stdev\n \n # Exogenous gating\n x_weighted, _ = self.exo_gate(x_enc)\n \n # Variable embedding\n embedded = self.var_embedding(x_weighted) # [batch, enc_in, d_model]\n \n # Transformer encoder across variables (treat each variable as token)\n enc_out, _ = self.encoder(embedded, attn_mask=None) # [batch, enc_in, d_model]\n \n # Project to prediction length\n dec_out = self.projection(enc_out) # [batch, enc_in, pred_len]\n dec_out = dec_out.permute(0, 2, 1) # [batch, pred_len, enc_in]\n \n # De-Normalization\n dec_out = dec_out * (stdev[:, 0, :].unsqueeze(1).repeat(1, self.pred_len, 1))\n dec_out = dec_out + (means[:, 0, :].unsqueeze(1).repeat(1, self.pred_len, 1))\n return dec_out\n \n def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):\n if self.task_name == 'long_term_forecast' or self.task_name == 'short_term_forecast':\n dec_out = self.forecast(x_enc, x_mark_enc, x_dec, x_mark_dec)\n return dec_out[:, -self.pred_len:, :]\n return None"
7}
Tool Result (Step 1)
OK: Replaced lines 1..42 with 127 line(s) in Time-Series-Library/models/Custom.py. Editable range: entire file.
[Current file: Time-Series-Library/models/Custom.py | editable: 1–127 | total: 127 lines]
1 import torch
2 import torch.nn as nn
3 import torch.nn.functional as F
4 from layers.Transformer_EncDec import Encoder, EncoderLayer
5 from layers.SelfAttention_Family import FullAttention, AttentionLayer
6 from layers.Embed import DataEmbedding_inverted
7
8
9 class ExogenousGate(nn.Module):
10     """Learn per-variable importance weight based on its time series."""
11     def __init__(self, enc_in, seq_len, hidden_dim=64):
12         super(ExogenousGate, self).__init__()
13         self.enc_in = enc_in
14         self.seq_len = seq_len
15         self.conv = nn.Conv1d(in_channels=seq_len, out_channels=hidden_dim, kernel_size=3, padding=1)
16         self.pool = nn.AdaptiveAvgPool1d(1)
17
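Aside: this first revision carries a latent shape bug that its own inline comment hints at. The gate declares nn.Conv1d(in_channels=seq_len, ...) but then feeds it tensors of shape [batch*enc_in, 1, seq_len], i.e. one input channel, which would raise a channel-mismatch error at runtime. A shape-correct per-variable temporal convolution under the same fold-variables-into-batch idea might look like the following sketch (sizes chosen for ETTh1; all names are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

batch, seq_len, enc_in, hidden_dim = 32, 96, 7, 64
x = torch.randn(batch, seq_len, enc_in)

# Conv1d expects (N, C_in, L). To convolve over time with one channel per
# variable, fold the variable axis into the batch axis and set in_channels=1.
conv = nn.Conv1d(in_channels=1, out_channels=hidden_dim, kernel_size=3, padding=1)
x_flat = x.permute(0, 2, 1).reshape(batch * enc_in, 1, seq_len)  # (N*V, 1, L)
h = F.relu(conv(x_flat))                                         # (N*V, H, L)
score = F.adaptive_avg_pool1d(h, 1).squeeze(-1)                  # (N*V, H)
assert score.shape == (batch * enc_in, hidden_dim)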
Assistant (Step 2, edit)
Tool Input
1{
2 "op": "replace",
3 "filename": "Time-Series-Library/models/Custom.py",
4 "start_line": 1,
5 "end_line": 127,
6 "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom layers.Transformer_EncDec import Encoder, EncoderLayer\nfrom layers.SelfAttention_Family import FullAttention, AttentionLayer\n\n\nclass ExogenousGate(nn.Module):\n \"\"\"Learn per-variable importance weight based on its time series.\"\"\"\n def __init__(self, enc_in, seq_len, hidden_dim=64):\n super(ExogenousGate, self).__init__()\n self.enc_in = enc_in\n self.seq_len = seq_len\n self.fc1 = nn.Linear(seq_len, hidden_dim)\n self.fc2 = nn.Linear(hidden_dim, 1)\n self.sigmoid = nn.Sigmoid()\n \n def forward(self, x):\n # x: [batch, seq_len, enc_in]\n batch = x.shape[0]\n # Permute to [batch, enc_in, seq_len]\n x_perm = x.permute(0, 2, 1) # [batch, enc_in, seq_len]\n # Flatten batch and variable dimensions\n x_flat = x_perm.reshape(-1, self.seq_len) # [batch*enc_in, seq_len]\n # Compute weight\n hidden = F.relu(self.fc1(x_flat))\n weight = self.sigmoid(self.fc2(hidden)) # [batch*enc_in, 1]\n weight = weight.reshape(batch, self.enc_in, 1) # [batch, enc_in, 1]\n # Multiply original x by weight across variable dimension\n x_weighted = x * weight.permute(0, 2, 1) # [batch, seq_len, enc_in] * [batch, 1, enc_in]\n return x_weighted, weight.squeeze(-1)\n\n\nclass VariableEmbedding(nn.Module):\n \"\"\"Embed each variable's time series into d_model.\"\"\"\n def __init__(self, seq_len, d_model, dropout=0.1):\n super(VariableEmbedding, self).__init__()\n self.seq_len = seq_len\n self.d_model = d_model\n self.linear = nn.Linear(seq_len, d_model)\n self.dropout = nn.Dropout(dropout)\n \n def forward(self, x, var_pos_embed):\n # x: [batch, num_vars, seq_len]\n # var_pos_embed: [1, num_vars, d_model]\n embedded = self.linear(x) # [batch, num_vars, d_model]\n embedded = embedded + var_pos_embed\n embedded = self.dropout(embedded)\n return embedded # [batch, num_vars, d_model]\n\n\nclass Model(nn.Module):\n \"\"\"\n Exogenous-Aware Transformer (ExAT) for exogenous variable forecasting.\n \"\"\"\n def __init__(self, configs):\n super(Model, self).__init__()\n self.task_name = configs.task_name\n self.seq_len = configs.seq_len\n self.pred_len = configs.pred_len\n self.enc_in = configs.enc_in\n self.c_out = configs.c_out\n \n # Normalization will be done manually as in Non-stationary Transformer\n self.exo_gate = ExogenousGate(configs.enc_in, configs.seq_len)\n self.var_embedding = VariableEmbedding(configs.seq_len, configs.d_model, configs.dropout)\n \n # Transformer encoder across variables\n self.encoder = Encoder(\n [\n EncoderLayer(\n AttentionLayer(\n FullAttention(False, configs.factor, attention_dropout=configs.dropout,\n output_attention=False), configs.d_model, configs.n_heads),\n configs.d_model,\n configs.d_ff,\n dropout=configs.dropout,\n activation=configs.activation\n ) for l in range(configs.e_layers)\n ],\n norm_layer=nn.LayerNorm(configs.d_model)\n )\n \n # Prediction head: map each variable's d_model to pred_len\n self.projection = nn.Linear(configs.d_model, configs.pred_len, bias=True)\n \n def forecast(self, x_enc, x_mark_enc, x_dec, x_mark_dec):\n # Normalization from Non-stationary Transformer\n means = x_enc.mean(1, keepdim=True).detach()\n x_enc = x_enc - means\n stdev = torch.sqrt(torch.var(x_enc, dim=1, keepdim=True, unbiased=False) + 1e-5)\n x_enc /= stdev\n \n # Exogenous gating\n x_weighted, _ = self.exo_gate(x_enc)\n \n # Concatenate time features as additional variables\n if x_mark_enc is not None:\n x_cat = torch.cat([x_weighted, x_mark_enc], dim=-1) # [batch, seq_len, enc_in + 
time_feat_dim]\n else:\n x_cat = x_weighted\n num_vars = x_cat.shape[-1]\n \n # Prepare variable positional embedding (learned)\n if not hasattr(self, 'var_pos_embed'):\n # Create learnable positional embedding for up to max possible variables\n # We'll create for max_vars = enc_in + 10 (assuming time_feat_dim <= 10)\n max_vars = self.enc_in + 10\n self.var_pos_embed = nn.Parameter(torch.randn(1, max_vars, self.configs.d_model))\n # Slice to actual num_vars\n var_pos_embed = self.var_pos_embed[:, :num_vars, :]\n \n # Permute to [batch, num_vars, seq_len] for embedding\n x_perm = x_cat.permute(0, 2, 1) # [batch, num_vars, seq_len]\n \n # Variable embedding\n embedded = self.var_embedding(x_perm, var_pos_embed) # [batch, num_vars, d_model]\n \n # Transformer encoder across variables (treat each variable as token)\n enc_out, _ = self.encoder(embedded, attn_mask=None) # [batch, num_vars, d_model]\n \n # Project to prediction length\n dec_out = self.projection(enc_out) # [batch, num_vars, pred_len]\n # Keep only the original variables (first enc_in)\n dec_out = dec_out[:, :self.enc_in, :] # [batch, enc_in, pred_len]\n dec_out = dec_out.permute(0, 2, 1) # [batch, pred_len, enc_in]\n \n # De-Normalization\n dec_out = dec_out * (stdev[:, 0, :].unsqueeze(1).repeat(1, self.pred_len, 1))\n dec_out = dec_out + (means[:, 0, :].unsqueeze(1).repeat(1, self.pred_len, 1))\n return dec_out\n \n def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):\n if self.task_name == 'long_term_forecast' or self.task_name == 'short_term_forecast':\n dec_out = self.forecast(x_enc, x_mark_enc, x_dec, x_mark_dec)\n return dec_out[:, -self.pred_len:, :]\n return None"
7}
Tool Result (Step 2)
OK: Replaced lines 1..127 with 137 line(s) in Time-Series-Library/models/Custom.py. Editable range: entire file.
[Current file: Time-Series-Library/models/Custom.py | editable: 1–137 | total: 137 lines]
1 import torch
2 import torch.nn as nn
3 import torch.nn.functional as F
4 from layers.Transformer_EncDec import Encoder, EncoderLayer
5 from layers.SelfAttention_Family import FullAttention, AttentionLayer
6
7
8 class ExogenousGate(nn.Module):
9     """Learn per-variable importance weight based on its time series."""
10     def __init__(self, enc_in, seq_len, hidden_dim=64):
11         super(ExogenousGate, self).__init__()
12         self.enc_in = enc_in
13         self.seq_len = seq_len
14         self.fc1 = nn.Linear(seq_len, hidden_dim)
15         self.fc2 = nn.Linear(hidden_dim, 1)
16         self.sigmoid = nn.Sigmoid()
17
18     def forward(self, x):
19         # x:
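Two issues in this revision motivate the next edit: forecast() creates self.var_pos_embed lazily inside the forward pass, so the parameter is invisible to an optimizer constructed beforehand (and lands on whatever device the first batch happens to use), and it reads self.configs.d_model even though __init__ never stored configs, which would raise an AttributeError. The usual pattern is to register the parameter in __init__ and slice it in forward; a minimal standalone sketch:

import torch
import torch.nn as nn

class TokenPosEmbed(nn.Module):
    """Learnable per-token embedding: register in __init__, slice in forward."""
    def __init__(self, max_tokens, d_model):
        super().__init__()
        # Registered here, so optimizer construction and .to(device) both see it.
        self.pos = nn.Parameter(torch.randn(1, max_tokens, d_model))

    def forward(self, x):
        # x: [batch, num_tokens, d_model]; slice to however many tokens arrived.
        return x + self.pos[:, :x.shape[1], :]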
Assistant (Step 3, edit)
Tool Input
1{
2 "op": "replace",
3 "filename": "Time-Series-Library/models/Custom.py",
4 "start_line": 56,
5 "end_line": 137,
6 "content": " def __init__(self, configs):\n super(Model, self).__init__()\n self.task_name = configs.task_name\n self.seq_len = configs.seq_len\n self.pred_len = configs.pred_len\n self.enc_in = configs.enc_in\n self.c_out = configs.c_out\n self.d_model = configs.d_model\n self.configs = configs\n \n # Normalization will be done manually as in Non-stationary Transformer\n self.exo_gate = ExogenousGate(configs.enc_in, configs.seq_len)\n self.var_embedding = VariableEmbedding(configs.seq_len, configs.d_model, configs.dropout)\n \n # Transformer encoder across variables\n self.encoder = Encoder(\n [\n EncoderLayer(\n AttentionLayer(\n FullAttention(False, configs.factor, attention_dropout=configs.dropout,\n output_attention=False), configs.d_model, configs.n_heads),\n configs.d_model,\n configs.d_ff,\n dropout=configs.dropout,\n activation=configs.activation\n ) for l in range(configs.e_layers)\n ],\n norm_layer=nn.LayerNorm(configs.d_model)\n )\n \n # Prediction head: map each variable's d_model to pred_len\n self.projection = nn.Linear(configs.d_model, configs.pred_len, bias=True)\n \n # Learnable positional embedding for variables (including time features)\n # Assume max variables = enc_in + 20 (time features)\n max_vars = configs.enc_in + 20\n self.var_pos_embed = nn.Parameter(torch.randn(1, max_vars, configs.d_model))\n \n def forecast(self, x_enc, x_mark_enc, x_dec, x_mark_dec):\n # Normalization from Non-stationary Transformer\n means = x_enc.mean(1, keepdim=True).detach()\n x_enc = x_enc - means\n stdev = torch.sqrt(torch.var(x_enc, dim=1, keepdim=True, unbiased=False) + 1e-5)\n x_enc /= stdev\n \n # Exogenous gating\n x_weighted, _ = self.exo_gate(x_enc)\n \n # Concatenate time features as additional variables\n if x_mark_enc is not None:\n x_cat = torch.cat([x_weighted, x_mark_enc], dim=-1) # [batch, seq_len, enc_in + time_feat_dim]\n else:\n x_cat = x_weighted\n num_vars = x_cat.shape[-1]\n \n # Slice variable positional embedding to actual num_vars\n var_pos_embed = self.var_pos_embed[:, :num_vars, :]\n \n # Permute to [batch, num_vars, seq_len] for embedding\n x_perm = x_cat.permute(0, 2, 1) # [batch, num_vars, seq_len]\n \n # Variable embedding\n embedded = self.var_embedding(x_perm, var_pos_embed) # [batch, num_vars, d_model]\n \n # Transformer encoder across variables (treat each variable as token)\n enc_out, _ = self.encoder(embedded, attn_mask=None) # [batch, num_vars, d_model]\n \n # Project to prediction length\n dec_out = self.projection(enc_out) # [batch, num_vars, pred_len]\n # Keep only the original variables (first enc_in)\n dec_out = dec_out[:, :self.enc_in, :] # [batch, enc_in, pred_len]\n dec_out = dec_out.permute(0, 2, 1) # [batch, pred_len, enc_in]\n \n # De-Normalization\n dec_out = dec_out * (stdev[:, 0, :].unsqueeze(1).repeat(1, self.pred_len, 1))\n dec_out = dec_out + (means[:, 0, :].unsqueeze(1).repeat(1, self.pred_len, 1))\n return dec_out\n \n def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):\n if self.task_name == 'long_term_forecast' or self.task_name == 'short_term_forecast':\n dec_out = self.forecast(x_enc, x_mark_enc, x_dec, x_mark_dec)\n return dec_out[:, -self.pred_len:, :]\n return None"
7}
Tool Result (Step 3)
OK: Replaced lines 56..137 with 83 line(s) in Time-Series-Library/models/Custom.py. Editable range: entire file.
[Current file: Time-Series-Library/models/Custom.py | editable: 1–138 | total: 138 lines]
1 import torch
2 import torch.nn as nn
3 import torch.nn.functional as F
4 from layers.Transformer_EncDec import Encoder, EncoderLayer
5 from layers.SelfAttention_Family import FullAttention, AttentionLayer
6
7
8 class ExogenousGate(nn.Module):
9     """Learn per-variable importance weight based on its time series."""
10     def __init__(self, enc_in, seq_len, hidden_dim=64):
11         super(ExogenousGate, self).__init__()
12         self.enc_in = enc_in
13         self.seq_len = seq_len
14         self.fc1 = nn.Linear(seq_len, hidden_dim)
15         self.fc2 = nn.Linear(hidden_dim, 1)
16         self.sigmoid = nn.Sigmoid()
17
18     def forward(self, x):
19         # x:
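Every revision keeps the per-window instance normalization borrowed from the Non-stationary Transformer: subtract each series' lookback mean, divide by its lookback standard deviation, forecast in normalized space, then invert with the same statistics. A minimal standalone sketch of the roundtrip (the persistence head stands in for the real model; broadcasting replaces the explicit repeat used in the file):

import torch

x_enc = torch.randn(32, 96, 7)                  # [batch, seq_len, enc_in]
means = x_enc.mean(1, keepdim=True).detach()    # [32, 1, 7]
x_norm = x_enc - means
stdev = torch.sqrt(torch.var(x_norm, dim=1, keepdim=True, unbiased=False) + 1e-5)
x_norm = x_norm / stdev                         # zero mean, unit std per series

pred = x_norm[:, -1:, :].repeat(1, 96, 1)       # stand-in for the model head

# De-normalize: broadcast the window statistics over pred_len.
pred = pred * stdev + means                     # [32, 96, 7], original scale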
Assistant (Step 4, test)
Tool Result (Step 4)
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)

### ETTh1 (scripts/etth1.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=1 train_loss=0.1780973 vali_loss=0.1169172 test_loss=0.0648168
TRAIN_METRICS epoch=2 train_loss=0.1525517 vali_loss=0.1228985 test_loss=0.0665723
TRAIN_METRICS epoch=3 train_loss=0.1308659 vali_loss=0.1298345 test_loss=0.0693807
TRAIN_METRICS epoch=4 train_loss=0.1183508 vali_loss=0.1414869 test_loss=0.0731921
Test results (ETTh1): MSE: 0.063875, MAE: 0.191893

### Weather (scripts/weather.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=6 train_loss=0.8985955 vali_loss=0.0033878 test_loss=0.0014034
TRAIN_METRICS epoch=7 train_loss=0.8987076 vali_loss=0.0033776 test_loss=0.0013996
TRAIN_METRICS epoch=8 train_loss=0.8953281 vali_loss=0.0033789 test_loss=0.0014046
TRAIN_METRICS epoch=9 train_loss=0.8948039 vali_loss=0.0033792 test_loss=0.0014035
TRAIN_METRICS epoch=10 train_loss=0.8950643 vali_loss=0.0033775 test_loss=0.0014039
T
Assistant (Step 5, edit)
Tool Input
1{
2 "op": "replace",
3 "filename": "Time-Series-Library/models/Custom.py",
4 "start_line": 1,
5 "end_line": 138,
6 "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom layers.Transformer_EncDec import Encoder, EncoderLayer\nfrom layers.SelfAttention_Family import FullAttention, AttentionLayer\n\n\nclass ExogenousGate(nn.Module):\n \"\"\"Learn per-variable importance weight based on its time series.\"\"\"\n def __init__(self, enc_in, seq_len, hidden_dim=128):\n super(ExogenousGate, self).__init__()\n self.enc_in = enc_in\n self.seq_len = seq_len\n self.fc1 = nn.Linear(seq_len, hidden_dim)\n self.fc2 = nn.Linear(hidden_dim, hidden_dim)\n self.fc3 = nn.Linear(hidden_dim, 1)\n self.ln = nn.LayerNorm(hidden_dim)\n self.sigmoid = nn.Sigmoid()\n \n def forward(self, x):\n # x: [batch, seq_len, enc_in]\n batch = x.shape[0]\n # Permute to [batch, enc_in, seq_len]\n x_perm = x.permute(0, 2, 1) # [batch, enc_in, seq_len]\n # Flatten batch and variable dimensions\n x_flat = x_perm.reshape(-1, self.seq_len) # [batch*enc_in, seq_len]\n # Compute weight\n hidden = F.relu(self.fc1(x_flat))\n hidden = self.ln(hidden)\n hidden = F.relu(self.fc2(hidden))\n weight = self.sigmoid(self.fc3(hidden)) # [batch*enc_in, 1]\n # Scale weights to allow larger emphasis\n weight = weight * 2.0\n weight = weight.reshape(batch, self.enc_in, 1) # [batch, enc_in, 1]\n # Multiply original x by weight across variable dimension\n x_weighted = x * weight.permute(0, 2, 1) # [batch, seq_len, enc_in] * [batch, 1, enc_in]\n return x_weighted, weight.squeeze(-1)\n\n\nclass VariableEmbedding(nn.Module):\n \"\"\"Embed each variable's time series into d_model.\"\"\"\n def __init__(self, seq_len, d_model, dropout=0.1):\n super(VariableEmbedding, self).__init__()\n self.seq_len = seq_len\n self.d_model = d_model\n self.linear = nn.Linear(seq_len, d_model)\n self.ln = nn.LayerNorm(d_model)\n self.dropout = nn.Dropout(dropout)\n \n def forward(self, x, var_pos_embed):\n # x: [batch, num_vars, seq_len]\n # var_pos_embed: [1, num_vars, d_model]\n embedded = self.linear(x) # [batch, num_vars, d_model]\n embedded = self.ln(embedded)\n embedded = embedded + var_pos_embed\n embedded = self.dropout(embedded)\n return embedded # [batch, num_vars, d_model]\n\n\nclass Model(nn.Module):\n \"\"\"\n Exogenous-Aware Transformer (ExAT) for exogenous variable forecasting.\n \"\"\"\n def __init__(self, configs):\n super(Model, self).__init__()\n self.task_name = configs.task_name\n self.seq_len = configs.seq_len\n self.pred_len = configs.pred_len\n self.enc_in = configs.enc_in\n self.c_out = configs.c_out\n self.d_model = configs.d_model\n self.configs = configs\n \n # Normalization will be done manually as in Non-stationary Transformer\n self.exo_gate = ExogenousGate(configs.enc_in, configs.seq_len)\n self.var_embedding = VariableEmbedding(configs.seq_len, configs.d_model, configs.dropout)\n \n # Transformer encoder across variables\n self.encoder = Encoder(\n [\n EncoderLayer(\n AttentionLayer(\n FullAttention(False, configs.factor, attention_dropout=configs.dropout,\n output_attention=False), configs.d_model, configs.n_heads),\n configs.d_model,\n configs.d_ff,\n dropout=configs.dropout,\n activation=configs.activation\n ) for l in range(configs.e_layers)\n ],\n norm_layer=nn.LayerNorm(configs.d_model)\n )\n \n # Prediction head: map each variable's d_model to pred_len\n self.projection = nn.Linear(configs.d_model, configs.pred_len, bias=True)\n \n # Learnable positional embedding for variables (including time features)\n # Assume max variables = enc_in + 50 (time features)\n max_vars = configs.enc_in + 50\n self.var_pos_embed = 
nn.Parameter(torch.randn(1, max_vars, configs.d_model))\n \n def forecast(self, x_enc, x_mark_enc, x_dec, x_mark_dec):\n # Normalization from Non-stationary Transformer\n means = x_enc.mean(1, keepdim=True).detach()\n x_enc = x_enc - means\n stdev = torch.sqrt(torch.var(x_enc, dim=1, keepdim=True, unbiased=False) + 1e-5)\n x_enc /= stdev\n \n # Exogenous gating\n x_weighted, _ = self.exo_gate(x_enc)\n \n # Concatenate time features as additional variables\n if x_mark_enc is not None:\n x_cat = torch.cat([x_weighted, x_mark_enc], dim=-1) # [batch, seq_len, enc_in + time_feat_dim]\n else:\n x_cat = x_weighted\n num_vars = x_cat.shape[-1]\n \n # Slice variable positional embedding to actual num_vars\n if num_vars > self.var_pos_embed.shape[1]:\n # Expand if needed (should not happen with max_vars large enough)\n self.var_pos_embed = nn.Parameter(torch.cat([\n self.var_pos_embed,\n torch.randn(1, num_vars - self.var_pos_embed.shape[1], self.d_model, device=self.var_pos_embed.device)\n ], dim=1))\n var_pos_embed = self.var_pos_embed[:, :num_vars, :]\n \n # Permute to [batch, num_vars, seq_len] for embedding\n x_perm = x_cat.permute(0, 2, 1) # [batch, num_vars, seq_len]\n \n # Variable embedding\n embedded = self.var_embedding(x_perm, var_pos_embed) # [batch, num_vars, d_model]\n \n # Transformer encoder across variables (treat each variable as token)\n enc_out, _ = self.encoder(embedded, attn_mask=None) # [batch, num_vars, d_model]\n \n # Project to prediction length\n dec_out = self.projection(enc_out) # [batch, num_vars, pred_len]\n # Keep only the original variables (first enc_in)\n dec_out = dec_out[:, :self.enc_in, :] # [batch, enc_in, pred_len]\n dec_out = dec_out.permute(0, 2, 1) # [batch, pred_len, enc_in]\n \n # De-Normalization\n dec_out = dec_out * (stdev[:, 0, :].unsqueeze(1).repeat(1, self.pred_len, 1))\n dec_out = dec_out + (means[:, 0, :].unsqueeze(1).repeat(1, self.pred_len, 1))\n return dec_out\n \n def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):\n if self.task_name == 'long_term_forecast' or self.task_name == 'short_term_forecast':\n dec_out = self.forecast(x_enc, x_mark_enc, x_dec, x_mark_dec)\n return dec_out[:, -self.pred_len:, :]\n return None"
7}
Tool Result (Step 5)
OK: Replaced lines 1..138 with 152 line(s) in Time-Series-Library/models/Custom.py. Editable range: entire file.
[Current file: Time-Series-Library/models/Custom.py | editable: 1–152 | total: 152 lines]
1 import torch
2 import torch.nn as nn
3 import torch.nn.functional as F
4 from layers.Transformer_EncDec import Encoder, EncoderLayer
5 from layers.SelfAttention_Family import FullAttention, AttentionLayer
6
7
8 class ExogenousGate(nn.Module):
9     """Learn per-variable importance weight based on its time series."""
10     def __init__(self, enc_in, seq_len, hidden_dim=128):
11         super(ExogenousGate, self).__init__()
12         self.enc_in = enc_in
13         self.seq_len = seq_len
14         self.fc1 = nn.Linear(seq_len, hidden_dim)
15         self.fc2 = nn.Linear(hidden_dim, hidden_dim)
16         self.fc3 = nn.Linear(hidden_dim, 1)
17         self.ln = nn.LayerNorm(hidden_dim)
Assistant (Step 6, test)
Tool Result (Step 6)
[Test #2] (1 test remaining; call submit(n=2) to submit this result)

### ETTh1 (scripts/etth1.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=1 train_loss=0.1609662 vali_loss=0.1140823 test_loss=0.0678334
TRAIN_METRICS epoch=2 train_loss=0.1243619 vali_loss=0.1275137 test_loss=0.0777948
TRAIN_METRICS epoch=3 train_loss=0.0940622 vali_loss=0.1385370 test_loss=0.0877937
TRAIN_METRICS epoch=4 train_loss=0.0827605 vali_loss=0.1418833 test_loss=0.0920219
Test results (ETTh1): MSE: 0.067411, MAE: 0.202068

### Weather (scripts/weather.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=6 train_loss=0.8889313 vali_loss=0.0030862 test_loss=0.0013004
TRAIN_METRICS epoch=7 train_loss=0.8894307 vali_loss=0.0030691 test_loss=0.0013000
TRAIN_METRICS epoch=8 train_loss=0.8874146 vali_loss=0.0030462 test_loss=0.0012982
TRAIN_METRICS epoch=9 train_loss=0.8875334 vali_loss=0.0030393 test_loss=0.0012977
TRAIN_METRICS epoch=10 train_loss=0.8879560 vali_loss=0.0030314 test_loss=0.0012970
Te
Assistant (Step 7, edit)
Tool Input
1{
2 "op": "replace",
3 "filename": "Time-Series-Library/models/Custom.py",
4 "start_line": 8,
5 "end_line": 37,
6 "content": "class GlobalExogenousGate(nn.Module):\n \"\"\"Learn per-variable importance weight based on global context.\"\"\"\n def __init__(self, enc_in, seq_len, hidden_dim=128):\n super(GlobalExogenousGate, self).__init__()\n self.enc_in = enc_in\n self.seq_len = seq_len\n # Global context encoder\n self.global_pool = nn.AdaptiveAvgPool1d(1)\n self.fc1 = nn.Linear(seq_len * enc_in, hidden_dim)\n self.fc2 = nn.Linear(hidden_dim, hidden_dim)\n self.fc3 = nn.Linear(hidden_dim, enc_in)\n self.ln = nn.LayerNorm(hidden_dim)\n self.sigmoid = nn.Sigmoid()\n \n def forward(self, x):\n # x: [batch, seq_len, enc_in]\n batch = x.shape[0]\n # Compute global context: average across time\n x_pool = x.mean(dim=1) # [batch, enc_in]\n # Flatten across variables\n x_flat = x.view(batch, -1) # [batch, seq_len * enc_in]\n hidden = F.relu(self.fc1(x_flat))\n hidden = self.ln(hidden)\n hidden = F.relu(self.fc2(hidden))\n weights = self.sigmoid(self.fc3(hidden)) # [batch, enc_in]\n weights = weights.unsqueeze(1) # [batch, 1, enc_in]\n # Residual gating: weighted combination\n x_weighted = x * weights + x # [batch, seq_len, enc_in]\n return x_weighted, weights.squeeze(1)"
7}
Tool Result (Step 7)
OK: Replaced lines 8..37 with 29 line(s) in Time-Series-Library/models/Custom.py. Editable range: entire file.
[Current file: Time-Series-Library/models/Custom.py | editable: 1–151 | total: 151 lines]
1 import torch
2 import torch.nn as nn
3 import torch.nn.functional as F
4 from layers.Transformer_EncDec import Encoder, EncoderLayer
5 from layers.SelfAttention_Family import FullAttention, AttentionLayer
6
7
8 class GlobalExogenousGate(nn.Module):
9     """Learn per-variable importance weight based on global context."""
10     def __init__(self, enc_in, seq_len, hidden_dim=128):
11         super(GlobalExogenousGate, self).__init__()
12         self.enc_in = enc_in
13         self.seq_len = seq_len
14         # Global context encoder
15         self.global_pool = nn.AdaptiveAvgPool1d(1)
16         self.fc1 = nn.Linear(seq_len * enc_in, hidden_dim)
17         self.fc2 = nn.Linear(hidden_dim,
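The substantive change in this revision is the switch from a purely multiplicative gate to a residual one: since x * w + x = x * (1 + w) and w = sigmoid(...) lies in (0, 1), each variable's effective scale stays in (1, 2), so the gate can emphasize a variable but never suppress it entirely. (Note the pooled x_pool context is computed but never used.) The identity, checked numerically:

import torch

x = torch.randn(32, 96, 7)                     # [batch, seq_len, enc_in]
w = torch.sigmoid(torch.randn(32, 1, 7))       # per-variable gate in (0, 1)
assert torch.allclose(x * w + x, x * (1 + w))  # residual gating only amplifies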
Assistant (Step 8, edit)
Tool Input
1{
2 "op": "replace",
3 "filename": "Time-Series-Library/models/Custom.py",
4 "start_line": 74,
5 "end_line": 74,
6 "content": " self.exo_gate = GlobalExogenousGate(configs.enc_in, configs.seq_len)"
7}
Tool Result (Step 8)
OK: Replaced lines 74..74 with 1 line(s) in Time-Series-Library/models/Custom.py. Editable range: entire file.
[Current file: Time-Series-Library/models/Custom.py | editable: 1–151 | total: 151 lines]
1 import torch
2 import torch.nn as nn
3 import torch.nn.functional as F
4 from layers.Transformer_EncDec import Encoder, EncoderLayer
5 from layers.SelfAttention_Family import FullAttention, AttentionLayer
6
7
8 class GlobalExogenousGate(nn.Module):
9     """Learn per-variable importance weight based on global context."""
10     def __init__(self, enc_in, seq_len, hidden_dim=128):
11         super(GlobalExogenousGate, self).__init__()
12         self.enc_in = enc_in
13         self.seq_len = seq_len
14         # Global context encoder
15         self.global_pool = nn.AdaptiveAvgPool1d(1)
16         self.fc1 = nn.Linear(seq_len * enc_in, hidden_dim)
17         self.fc2 = nn.Linear(hidden_dim,
Assistant (Step 9, test)
Tool Result (Step 9)
[Test #3] (0 tests remaining)

### ETTh1 (scripts/etth1.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=1 train_loss=0.1602022 vali_loss=0.1119324 test_loss=0.0685405
TRAIN_METRICS epoch=2 train_loss=0.1193624 vali_loss=0.1280445 test_loss=0.0821681
TRAIN_METRICS epoch=3 train_loss=0.0928540 vali_loss=0.1363018 test_loss=0.0921022
TRAIN_METRICS epoch=4 train_loss=0.0816690 vali_loss=0.1411806 test_loss=0.0951280
Test results (ETTh1): MSE: 0.067990, MAE: 0.200467

### Weather (scripts/weather.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=6 train_loss=0.8898499 vali_loss=0.0031197 test_loss=0.0013067
TRAIN_METRICS epoch=7 train_loss=0.8887307 vali_loss=0.0030886 test_loss=0.0013020
TRAIN_METRICS epoch=8 train_loss=0.8884105 vali_loss=0.0030915 test_loss=0.0013042
TRAIN_METRICS epoch=9 train_loss=0.8881880 vali_loss=0.0030923 test_loss=0.0013049
TRAIN_METRICS epoch=10 train_loss=0.8875890 vali_loss=0.0030812 test_loss=0.0013033
Test results (Weather): MSE: 0.001306,