Agent Conversation: anthropic/claude-opus-4.6
Model: anthropic/claude-opus-4.6
Total steps: 6
Messages: 13
User (Step 0)
Task: ts-classification
Time Series Classification: Custom Model Design
Objective
Design and implement a custom deep learning model for multivariate time series classification. Your code goes in the Model class in models/Custom.py. Three reference implementations (DLinear, TimesNet, PatchTST) are provided as read-only.
Evaluation
Trained and evaluated on three UEA datasets:
- EthanolConcentration — spectral data classification
- FaceDetection — MEG brain imaging classification
- Handwriting — accelerometer-based character recognition
Training uses RAdam optimizer, CrossEntropyLoss, patience=10. Metric: accuracy (higher is better).
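For orientation, the training recipe above (RAdam, CrossEntropyLoss, patience-based early stopping) can be sketched on toy data. This is an illustrative stand-in, not the benchmark harness; the toy tensor sizes and learning rate are arbitrary:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a dataset: 64 samples, 8 features, 3 classes from a linear rule.
X = torch.randn(64, 8)
true_w = torch.randn(8, 3)
y = (X @ true_w).argmax(dim=1)

model = nn.Linear(8, 3)
optimizer = torch.optim.RAdam(model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()

best_loss, patience, bad_epochs = float("inf"), 10, 0
losses = []
for epoch in range(100):
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
    # Early stopping with patience=10, mirroring the evaluation setup.
    if loss.item() < best_loss:
        best_loss, bad_epochs = loss.item(), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The real harness tracks validation loss rather than training loss for its patience counter, but the control flow is the same.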
Time-Series-Library/models/Custom.py [EDITABLE — entire file only]
1: import torch
2: import torch.nn as nn
3: import torch.nn.functional as F
4:
5:
6: class Model(nn.Module):
7: """
8: Custom model for time series classification.
9:
10: Forward signature: forward(x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None)
11: - x_enc: [batch, seq_len, enc_in] — input time series
12: - x_mark_enc: [batch, seq_len] — padding mask (1=valid, 0=padding)
13: - x_dec: not used (None)
14: - x_mark_dec: not used (None)
15:
16: Must return: [batch, num_class] — class logits (before softmax)
17:
18: Note: configs.seq_len, configs.enc_in, and configs.num_class are set
19: dynamically from the dataset at runtime.
20: """
21:
22: def __init__(self, configs):
23: super(Model, self).__init__()
24: self.task_name = configs.task_name
25: self.seq_len = configs.seq_len
26: self.enc_in = configs.enc_in
27: self.num_class = configs.num_class
28: # TODO: Define your model architecture here
29:
30: def classification(self, x_enc, x_mark_enc):
31: """
32: Classification: assign a label to the input time series.
33: Input: x_enc [batch, seq_len, enc_in]
34: x_mark_enc: [batch, seq_len] padding mask
35: Output: [batch, num_class] logits
36: """
37: # TODO: Implement your classification logic
38: batch_size = x_enc.shape[0]
39: return torch.zeros(batch_size, self.num_class).to(x_enc.device)
40:
41: def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):
42: if self.task_name == 'classification':
43: dec_out = self.classification(x_enc, x_mark_enc)
44: return dec_out
45: return None
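The stub's required interface can be smoke-tested before any architecture work. Below, the stub `Model` is reproduced from the listing above, and a `SimpleNamespace` plus the tensor sizes are hypothetical stand-ins for the runtime-supplied `configs`:

```python
from types import SimpleNamespace

import torch
import torch.nn as nn


class Model(nn.Module):
    """Stub with the same interface as models/Custom.py."""
    def __init__(self, configs):
        super().__init__()
        self.task_name = configs.task_name
        self.num_class = configs.num_class

    def classification(self, x_enc, x_mark_enc):
        # Placeholder: all-zero logits of the required shape.
        return torch.zeros(x_enc.shape[0], self.num_class, device=x_enc.device)

    def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):
        if self.task_name == 'classification':
            return self.classification(x_enc, x_mark_enc)
        return None


# Hypothetical shapes: batch=4, seq_len=32, enc_in=3, 5 classes.
configs = SimpleNamespace(task_name='classification', seq_len=32, enc_in=3, num_class=5)
model = Model(configs)
x_enc = torch.randn(4, 32, 3)
x_mark_enc = torch.ones(4, 32)
logits = model(x_enc, x_mark_enc, None, None)
print(logits.shape)  # torch.Size([4, 5])
```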
Time-Series-Library/models/DLinear.py [READ-ONLY — do not edit]
1: import torch
2: import torch.nn as nn
3: import torch.nn.functional as F
4: from layers.Autoformer_EncDec import series_decomp
5:
6:
7: class Model(nn.Module):
8: """
9: Paper l
Assistant (Step 1): edit
Tool Input
1{
2 "op": "replace",
3 "filename": "Time-Series-Library/models/Custom.py",
4 "start_line": 1,
5 "end_line": 45,
6 "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport math\n\n\nclass MultiScaleBlock(nn.Module):\n \"\"\"\n Multi-scale depthwise-separable temporal convolution block.\n Uses depthwise convolutions at scales 3,5,7 with configurable dilation,\n pointwise channel mixing, and a feed-forward network with residual connections.\n Exponentially increasing dilation across stacked blocks expands the receptive field.\n \"\"\"\n def __init__(self, d_model, dilation=1, dropout=0.25):\n super().__init__()\n self.dw_convs = nn.ModuleList()\n for k in [3, 5, 7]:\n pad = ((k - 1) * dilation) // 2\n self.dw_convs.append(\n nn.Conv1d(d_model, d_model, kernel_size=k, padding=pad,\n dilation=dilation, groups=d_model)\n )\n self.pw_conv = nn.Conv1d(d_model * 3, d_model, kernel_size=1)\n self.norm1 = nn.BatchNorm1d(d_model)\n self.drop1 = nn.Dropout(dropout)\n\n self.ff = nn.Sequential(\n nn.Conv1d(d_model, d_model * 4, kernel_size=1),\n nn.GELU(),\n nn.Dropout(dropout),\n nn.Conv1d(d_model * 4, d_model, kernel_size=1),\n nn.Dropout(dropout)\n )\n self.norm2 = nn.BatchNorm1d(d_model)\n\n def forward(self, x):\n # x: [B, D, T]\n residual = x\n branches = [F.gelu(conv(x)) for conv in self.dw_convs]\n out = torch.cat(branches, dim=1) # [B, 3D, T]\n out = self.drop1(self.pw_conv(out)) # [B, D, T]\n x = self.norm1(residual + out)\n\n residual = x\n x = self.norm2(residual + self.ff(x))\n return x\n\n\nclass SpectralGating(nn.Module):\n \"\"\"\n Spectral Channel Gating: computes per-channel importance gates from\n frequency domain amplitude statistics (mean + std). 
Unlike SE-Net which\n uses spatial/temporal statistics, this captures the spectral characteristics\n of each feature channel, providing a complementary gating signal.\n \"\"\"\n def __init__(self, d_model, reduction=4):\n super().__init__()\n self.fc = nn.Sequential(\n nn.Linear(d_model * 2, d_model // reduction),\n nn.ReLU(inplace=True),\n nn.Linear(d_model // reduction, d_model),\n nn.Sigmoid()\n )\n\n def forward(self, x):\n # x: [B, D, T]\n x_fft = torch.fft.rfft(x, dim=-1)\n amp = torch.abs(x_fft) # [B, D, F]\n amp_mean = amp.mean(dim=-1) # [B, D]\n amp_std = amp.std(dim=-1) # [B, D]\n stats = torch.cat([amp_mean, amp_std], dim=-1) # [B, 2*D]\n gate = self.fc(stats) # [B, D]\n return x * gate.unsqueeze(-1)\n\n\nclass Model(nn.Module):\n \"\"\"\n Multi-Scale Spectral-Gated Network (MSSGNet) for Time Series Classification.\n\n Novel algorithmic contributions:\n 1. Multi-scale depthwise-separable convolutions with exponentially increasing\n dilation (1,2,4) capture temporal patterns at multiple resolutions efficiently.\n Depthwise convs are parameter-efficient; pointwise mixing enables cross-channel\n interaction.\n 2. Spectral channel gating uses FFT amplitude statistics (mean+std) to compute\n per-channel importance gates. This leverages the insight that a channel's\n spectral profile indicates its information density and classification relevance.\n 3. 
Triple pooling aggregation (mean + max + learned attention) provides\n complementary sequence-level representations: average behavior, salient\n features, and task-specific learned weighting.\n\n Forward signature: forward(x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None)\n - x_enc: [batch, seq_len, enc_in] — input time series\n - x_mark_enc: [batch, seq_len] — padding mask (1=valid, 0=padding)\n Must return: [batch, num_class] — class logits (before softmax)\n \"\"\"\n\n def __init__(self, configs):\n super(Model, self).__init__()\n self.task_name = configs.task_name\n self.seq_len = configs.seq_len\n self.enc_in = configs.enc_in\n self.num_class = configs.num_class\n\n # Hardcoded hyperparameters\n d_model = 128\n n_blocks = 3\n dropout = 0.25\n self.d_model = d_model\n\n # Input projection: map enc_in channels to d_model with a wider kernel\n self.input_proj = nn.Sequential(\n nn.Conv1d(self.enc_in, d_model, kernel_size=7, padding=3),\n nn.BatchNorm1d(d_model),\n nn.GELU(),\n nn.Dropout(dropout)\n )\n\n # Multi-scale blocks with exponentially increasing dilation\n self.blocks = nn.ModuleList([\n MultiScaleBlock(d_model, dilation=2**i, dropout=dropout)\n for i in range(n_blocks)\n ])\n\n # Spectral channel gating\n self.spectral_gate = SpectralGating(d_model)\n\n # Learned attention pooling\n self.attn_pool = nn.Linear(d_model, 1)\n\n # Classification head: takes concatenated triple-pooled features\n self.head = nn.Sequential(\n nn.LayerNorm(d_model * 3),\n nn.Linear(d_model * 3, d_model),\n nn.GELU(),\n nn.Dropout(dropout),\n nn.Linear(d_model, self.num_class)\n )\n\n def classification(self, x_enc, x_mark_enc):\n \"\"\"\n Classification: assign a label to the input time series.\n Input: x_enc [batch, seq_len, enc_in]\n x_mark_enc: [batch, seq_len] padding mask\n Output: [batch, num_class] logits\n \"\"\"\n B, T, C = x_enc.shape\n\n # Instance normalization for stationarity\n means = x_enc.mean(1, keepdim=True).detach()\n x = x_enc - means\n stdev = 
torch.sqrt(x.var(1, keepdim=True, unbiased=False) + 1e-5).detach()\n x = x / stdev\n\n # Project to d_model: [B, C, T] -> [B, D, T]\n x = self.input_proj(x.permute(0, 2, 1))\n\n # Multi-scale temporal processing with increasing dilation\n for block in self.blocks:\n x = block(x)\n\n # Spectral channel gating\n x = self.spectral_gate(x)\n\n # Transpose for pooling: [B, T, D]\n x_t = x.permute(0, 2, 1)\n\n if x_mark_enc is not None:\n mask = x_mark_enc.unsqueeze(-1) # [B, T, 1]\n x_masked = x_t * mask\n lengths = mask.sum(1).clamp(min=1) # [B, 1]\n\n # 1. Mean pooling over valid positions\n mean_pool = x_masked.sum(1) / lengths # [B, D]\n\n # 2. Max pooling (mask invalid positions with -inf)\n x_for_max = x_t.masked_fill(mask == 0, float('-inf'))\n max_pool = x_for_max.max(1)[0] # [B, D]\n max_pool = torch.nan_to_num(max_pool, nan=0.0, neginf=0.0)\n\n # 3. Learned attention pooling\n attn_scores = self.attn_pool(x_t).squeeze(-1) # [B, T]\n attn_scores = attn_scores.masked_fill(x_mark_enc == 0, float('-inf'))\n attn_weights = F.softmax(attn_scores, dim=-1) # [B, T]\n attn_weights = torch.nan_to_num(attn_weights, nan=0.0)\n attn_pooled = torch.einsum('bt,btd->bd', attn_weights, x_t) # [B, D]\n else:\n mean_pool = x_t.mean(1)\n max_pool = x_t.max(1)[0]\n attn_scores = self.attn_pool(x_t).squeeze(-1)\n attn_weights = F.softmax(attn_scores, dim=-1)\n attn_pooled = torch.einsum('bt,btd->bd', attn_weights, x_t)\n\n # Combine triple pooling: [B, 3*D]\n combined = torch.cat([mean_pool, max_pool, attn_pooled], dim=-1)\n\n return self.head(combined)\n\n def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):\n if self.task_name == 'classification':\n dec_out = self.classification(x_enc, x_mark_enc)\n return dec_out\n return None\n"
7}
Tool Result (Step 1)
OK: Replaced lines 1..45 with 203 line(s) in Time-Series-Library/models/Custom.py. Editable range: entire file.
[Current file: Time-Series-Library/models/Custom.py | editable: 1–203 | total: 203 lines]
1 import torch
2 import torch.nn as nn
3 import torch.nn.functional as F
4 import math
5
6
7 class MultiScaleBlock(nn.Module):
8 """
9 Multi-scale depthwise-separable temporal convolution block.
10 Uses depthwise convolutions at scales 3,5,7 with configurable dilation,
11 pointwise channel mixing, and a feed-forward network with residual connections.
12 Exponentially increasing dilation across stacked blocks expands the receptive field.
13 """
14 def __init__(self, d_model, dilation=1, dropout=0.25):
15 super().__init__()
16 self.dw_convs = nn.ModuleList()
17 for k in [3, 5, 7]:
18 pad = ((k - 1) * dilation) // 2
19
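The design note in Step 1 introduces spectral channel gating: per-channel gates computed from FFT amplitude statistics. A condensed, self-contained sketch of that mechanism (shapes and `d_model` are arbitrary for the demo), following the transcript's own class:

```python
import torch
import torch.nn as nn


class SpectralGating(nn.Module):
    # Gates each channel using mean/std of its FFT amplitude spectrum,
    # as described in the MSSGNet design above.
    def __init__(self, d_model, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(d_model * 2, d_model // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(d_model // reduction, d_model),
            nn.Sigmoid(),
        )

    def forward(self, x):                                   # x: [B, D, T]
        amp = torch.abs(torch.fft.rfft(x, dim=-1))          # [B, D, F]
        stats = torch.cat([amp.mean(-1), amp.std(-1)], -1)  # [B, 2*D]
        gate = self.fc(stats)                               # [B, D], in (0, 1)
        return x * gate.unsqueeze(-1)


gate = SpectralGating(d_model=8)
x = torch.randn(2, 8, 50)
out = gate(x)
print(out.shape)  # torch.Size([2, 8, 50])
```

Because the sigmoid gate lies strictly in (0, 1), the output is an elementwise attenuation of the input; the time axis is untouched.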
Assistant (Step 2): test
Tool Result (Step 2)
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)

## Seed 42

### EthanolConcentration (scripts/ethanol.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=34 train_loss=1.270 vali_loss=1.459 vali_acc=0.285 test_acc=0.285
TRAIN_METRICS epoch=35 train_loss=1.252 vali_loss=1.504 vali_acc=0.266 test_acc=0.266
TRAIN_METRICS epoch=36 train_loss=1.234 vali_loss=1.464 vali_acc=0.274 test_acc=0.274
TRAIN_METRICS epoch=37 train_loss=1.246 vali_loss=1.504 vali_acc=0.285 test_acc=0.285
TRAIN_METRICS epoch=38 train_loss=1.220 vali_loss=1.503 vali_acc=0.266 test_acc=0.266
Test results (EthanolConcentration): Accuracy: 0.31179

### FaceDetection (scripts/facedetection.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=13 train_loss=0.247 vali_loss=1.497 vali_acc=0.567 test_acc=0.567
TRAIN_METRICS epoch=14 train_loss=0.209 vali_loss=1.503 vali_acc=0.593 test_acc=0.593
TRAIN_METRICS epoch=15 train_loss=0.171 vali_loss=1.806 vali_acc=0.574 test_acc=0.574
TRAIN_METRICS e
Assistant (Step 3): edit
Tool Input
1{
2 "op": "replace",
3 "filename": "Time-Series-Library/models/Custom.py",
4 "start_line": 1,
5 "end_line": 203,
6 "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport math\n\n\nclass SpectralGating(nn.Module):\n \"\"\"\n Spectral Channel Gating: computes per-channel importance gates from\n frequency domain amplitude statistics (mean + std). Unlike SE-Net which\n uses temporal statistics, this captures the spectral characteristics\n of each feature channel, providing frequency-aware channel recalibration.\n \"\"\"\n def __init__(self, d_model, reduction=2):\n super().__init__()\n self.fc = nn.Sequential(\n nn.Linear(d_model * 2, d_model // reduction),\n nn.ReLU(inplace=True),\n nn.Linear(d_model // reduction, d_model),\n nn.Sigmoid()\n )\n\n def forward(self, x):\n # x: [B, D, T]\n x_fft = torch.fft.rfft(x, dim=-1)\n amp = torch.abs(x_fft) # [B, D, F]\n amp_mean = amp.mean(dim=-1) # [B, D]\n amp_std = amp.std(dim=-1) # [B, D]\n stats = torch.cat([amp_mean, amp_std], dim=-1) # [B, 2*D]\n gate = self.fc(stats) # [B, D]\n return x * gate.unsqueeze(-1)\n\n\nclass MultiScaleBlock(nn.Module):\n \"\"\"\n Multi-scale depthwise-separable temporal convolution block with\n integrated spectral channel gating.\n \n Uses larger kernel sizes [7,15,31] for expanded receptive field,\n enabling capture of long-range temporal patterns. 
Spectral gating\n is applied after the multi-scale convolution to provide frequency-aware\n channel recalibration at each processing level.\n \"\"\"\n def __init__(self, d_model, kernel_sizes=[7, 15, 31], dropout=0.3):\n super().__init__()\n self.dw_convs = nn.ModuleList()\n for k in kernel_sizes:\n self.dw_convs.append(\n nn.Conv1d(d_model, d_model, kernel_size=k, padding=k // 2,\n groups=d_model)\n )\n n_branches = len(kernel_sizes)\n self.pw_conv = nn.Conv1d(d_model * n_branches, d_model, kernel_size=1)\n self.norm1 = nn.BatchNorm1d(d_model)\n self.drop1 = nn.Dropout(dropout)\n\n # Spectral gating integrated per block\n self.spectral_gate = SpectralGating(d_model, reduction=2)\n\n # Feed-forward network\n self.ff = nn.Sequential(\n nn.Conv1d(d_model, d_model * 4, kernel_size=1),\n nn.GELU(),\n nn.Dropout(dropout),\n nn.Conv1d(d_model * 4, d_model, kernel_size=1),\n nn.Dropout(dropout)\n )\n self.norm2 = nn.BatchNorm1d(d_model)\n\n def forward(self, x):\n # x: [B, D, T]\n residual = x\n branches = [F.gelu(conv(x)) for conv in self.dw_convs]\n out = torch.cat(branches, dim=1) # [B, n*D, T]\n out = self.drop1(self.pw_conv(out)) # [B, D, T]\n x = self.norm1(residual + out)\n\n # Spectral channel gating at this level\n x = self.spectral_gate(x)\n\n residual = x\n x = self.norm2(residual + self.ff(x))\n return x\n\n\nclass Model(nn.Module):\n \"\"\"\n Multi-Scale Spectral-Gated Network (MSSGNet) for Time Series Classification.\n\n Novel algorithmic contributions:\n 1. Multi-scale depthwise-separable convolutions with large kernels [7,15,31]\n capture temporal patterns across a wide receptive field efficiently.\n Depthwise convs keep parameters low; pointwise mixing enables channel interaction.\n 2. Per-block spectral channel gating uses FFT amplitude statistics (mean+std)\n to compute channel importance gates at each processing level. 
This provides\n multi-level frequency-aware feature recalibration, where each block can\n focus on different spectral characteristics.\n 3. Triple pooling aggregation (mean + max + learned attention) provides\n complementary sequence-level representations for robust classification.\n\n Forward signature: forward(x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None)\n \"\"\"\n\n def __init__(self, configs):\n super(Model, self).__init__()\n self.task_name = configs.task_name\n self.seq_len = configs.seq_len\n self.enc_in = configs.enc_in\n self.num_class = configs.num_class\n\n # Hardcoded hyperparameters\n d_model = 64\n n_blocks = 3\n dropout = 0.3\n self.d_model = d_model\n\n # Input projection\n self.input_proj = nn.Sequential(\n nn.Conv1d(self.enc_in, d_model, kernel_size=7, padding=3),\n nn.BatchNorm1d(d_model),\n nn.GELU(),\n nn.Dropout(dropout)\n )\n\n # Multi-scale blocks with per-block spectral gating\n self.blocks = nn.ModuleList([\n MultiScaleBlock(d_model, kernel_sizes=[7, 15, 31], dropout=dropout)\n for _ in range(n_blocks)\n ])\n\n # Learned attention pooling\n self.attn_pool = nn.Linear(d_model, 1)\n\n # Classification head\n self.head = nn.Sequential(\n nn.LayerNorm(d_model * 3),\n nn.Linear(d_model * 3, d_model),\n nn.GELU(),\n nn.Dropout(dropout),\n nn.Linear(d_model, self.num_class)\n )\n\n def classification(self, x_enc, x_mark_enc):\n B, T, C = x_enc.shape\n\n # Instance normalization\n means = x_enc.mean(1, keepdim=True).detach()\n x = x_enc - means\n stdev = torch.sqrt(x.var(1, keepdim=True, unbiased=False) + 1e-5).detach()\n x = x / stdev\n\n # [B, C, T] -> [B, D, T]\n x = self.input_proj(x.permute(0, 2, 1))\n\n # Multi-scale temporal processing with per-block spectral gating\n for block in self.blocks:\n x = block(x)\n\n # Transpose for pooling: [B, T, D]\n x_t = x.permute(0, 2, 1)\n\n if x_mark_enc is not None:\n mask = x_mark_enc.unsqueeze(-1) # [B, T, 1]\n x_masked = x_t * mask\n lengths = mask.sum(1).clamp(min=1) # [B, 1]\n\n # 1. 
Mean pooling\n mean_pool = x_masked.sum(1) / lengths\n\n # 2. Max pooling\n x_for_max = x_t.masked_fill(mask == 0, float('-inf'))\n max_pool = x_for_max.max(1)[0]\n max_pool = torch.nan_to_num(max_pool, nan=0.0, neginf=0.0)\n\n # 3. Attention pooling\n attn_scores = self.attn_pool(x_t).squeeze(-1)\n attn_scores = attn_scores.masked_fill(x_mark_enc == 0, float('-inf'))\n attn_weights = F.softmax(attn_scores, dim=-1)\n attn_weights = torch.nan_to_num(attn_weights, nan=0.0)\n attn_pooled = torch.einsum('bt,btd->bd', attn_weights, x_t)\n else:\n mean_pool = x_t.mean(1)\n max_pool = x_t.max(1)[0]\n attn_scores = self.attn_pool(x_t).squeeze(-1)\n attn_weights = F.softmax(attn_scores, dim=-1)\n attn_pooled = torch.einsum('bt,btd->bd', attn_weights, x_t)\n\n # Triple pooling combination\n combined = torch.cat([mean_pool, max_pool, attn_pooled], dim=-1)\n return self.head(combined)\n\n def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):\n if self.task_name == 'classification':\n dec_out = self.classification(x_enc, x_mark_enc)\n return dec_out\n return None\n"
7}
Tool Result (Step 3)
OK: Replaced lines 1..203 with 196 line(s) in Time-Series-Library/models/Custom.py. Editable range: entire file.
[Current file: Time-Series-Library/models/Custom.py | editable: 1–196 | total: 196 lines]
1 import torch
2 import torch.nn as nn
3 import torch.nn.functional as F
4 import math
5
6
7 class SpectralGating(nn.Module):
8 """
9 Spectral Channel Gating: computes per-channel importance gates from
10 frequency domain amplitude statistics (mean + std). Unlike SE-Net which
11 uses temporal statistics, this captures the spectral characteristics
12 of each feature channel, providing frequency-aware channel recalibration.
13 """
14 def __init__(self, d_model, reduction=2):
15 super().__init__()
16 self.fc = nn.Sequential(
17 nn.Linear(d_model * 2, d_model // reduction),
18 nn.ReLU(inplace=True),
19 nn.L
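The revised model keeps the triple-pooling aggregation (masked mean, masked max, masked attention). That logic can be exercised standalone; the shapes below are arbitrary, and the fixed scoring vector `w` stands in for the model's learned `nn.Linear(d_model, 1)`:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, T, D = 2, 6, 4
x_t = torch.randn(B, T, D)                 # [B, T, D] per-step features
x_mark_enc = torch.tensor([[1., 1., 1., 1., 0., 0.],   # batch 0: 2 padded steps
                           [1., 1., 1., 1., 1., 1.]])  # batch 1: no padding

mask = x_mark_enc.unsqueeze(-1)            # [B, T, 1]
lengths = mask.sum(1).clamp(min=1)         # [B, 1] valid step counts

# 1. Mean over valid positions only.
mean_pool = (x_t * mask).sum(1) / lengths
# 2. Max with padded positions masked to -inf so they never win.
max_pool = x_t.masked_fill(mask == 0, float('-inf')).max(1)[0]
# 3. Attention pooling; padded scores get -inf, so softmax assigns them 0 weight.
w = torch.randn(D)
scores = (x_t @ w).masked_fill(x_mark_enc == 0, float('-inf'))
attn = F.softmax(scores, dim=-1)
attn_pooled = torch.einsum('bt,btd->bd', attn, x_t)

combined = torch.cat([mean_pool, max_pool, attn_pooled], dim=-1)
print(combined.shape)  # torch.Size([2, 12])
```

Note that `mean_pool[0]` equals the plain mean over the first four (valid) steps, which is exactly what the masking is for.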
Assistant (Step 4): test
Tool Result (Step 4)
[Test #2] (1 test remaining; call submit(n=2) to submit this result)

### EthanolConcentration (scripts/ethanol.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=21 train_loss=1.369 vali_loss=1.428 vali_acc=0.259 test_acc=0.259
TRAIN_METRICS epoch=22 train_loss=1.363 vali_loss=1.430 vali_acc=0.240 test_acc=0.240
TRAIN_METRICS epoch=23 train_loss=1.359 vali_loss=1.419 vali_acc=0.251 test_acc=0.251
TRAIN_METRICS epoch=24 train_loss=1.361 vali_loss=1.418 vali_acc=0.251 test_acc=0.251
TRAIN_METRICS epoch=25 train_loss=1.373 vali_loss=1.394 vali_acc=0.270 test_acc=0.270
Test results (EthanolConcentration): Accuracy: 0.28517

### FaceDetection (scripts/facedetection.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=35 train_loss=0.074 vali_loss=1.784 vali_acc=0.636 test_acc=0.636
TRAIN_METRICS epoch=36 train_loss=0.069 vali_loss=1.668 vali_acc=0.637 test_acc=0.637
TRAIN_METRICS epoch=37 train_loss=0.072 vali_loss=2.048 vali_acc=0.638 test_acc=0.638
TRAIN_METRICS epoch=38 train_
Assistant (Step 5): edit
Tool Input
1{
2 "op": "replace",
3 "filename": "Time-Series-Library/models/Custom.py",
4 "start_line": 1,
5 "end_line": 196,
6 "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport math\n\n\nclass MultiScaleBlock(nn.Module):\n \"\"\"\n Multi-scale depthwise-separable temporal convolution block.\n Uses depthwise convolutions at scales 3,5,7 with configurable dilation,\n pointwise channel mixing, and a feed-forward network with residual connections.\n \"\"\"\n def __init__(self, d_model, dilation=1, dropout=0.25):\n super().__init__()\n self.dw_convs = nn.ModuleList()\n for k in [3, 5, 7]:\n pad = ((k - 1) * dilation) // 2\n self.dw_convs.append(\n nn.Conv1d(d_model, d_model, kernel_size=k, padding=pad,\n dilation=dilation, groups=d_model)\n )\n self.pw_conv = nn.Conv1d(d_model * 3, d_model, kernel_size=1)\n self.norm1 = nn.BatchNorm1d(d_model)\n self.drop1 = nn.Dropout(dropout)\n\n self.ff = nn.Sequential(\n nn.Conv1d(d_model, d_model * 4, kernel_size=1),\n nn.GELU(),\n nn.Dropout(dropout),\n nn.Conv1d(d_model * 4, d_model, kernel_size=1),\n nn.Dropout(dropout)\n )\n self.norm2 = nn.BatchNorm1d(d_model)\n\n def forward(self, x):\n # x: [B, D, T]\n residual = x\n branches = [F.gelu(conv(x)) for conv in self.dw_convs]\n out = torch.cat(branches, dim=1) # [B, 3D, T]\n out = self.drop1(self.pw_conv(out)) # [B, D, T]\n x = self.norm1(residual + out)\n\n residual = x\n x = self.norm2(residual + self.ff(x))\n return x\n\n\nclass SpectralGating(nn.Module):\n \"\"\"\n Spectral Channel Gating: computes per-channel importance gates from\n frequency domain amplitude statistics (mean + std). 
Unlike SE-Net which\n uses temporal mean/var, this leverages FFT amplitude distribution to\n identify channels with informative spectral content.\n \"\"\"\n def __init__(self, d_model, reduction=4):\n super().__init__()\n self.fc = nn.Sequential(\n nn.Linear(d_model * 2, d_model // reduction),\n nn.ReLU(inplace=True),\n nn.Linear(d_model // reduction, d_model),\n nn.Sigmoid()\n )\n\n def forward(self, x):\n # x: [B, D, T]\n x_fft = torch.fft.rfft(x, dim=-1)\n amp = torch.abs(x_fft) # [B, D, F]\n amp_mean = amp.mean(dim=-1) # [B, D]\n amp_std = amp.std(dim=-1) # [B, D]\n stats = torch.cat([amp_mean, amp_std], dim=-1) # [B, 2*D]\n gate = self.fc(stats) # [B, D]\n return x * gate.unsqueeze(-1)\n\n\nclass Model(nn.Module):\n \"\"\"\n Multi-Scale Spectral-Gated Network (MSSGNet) for Time Series Classification.\n\n Novel algorithmic contributions:\n 1. Multi-scale depthwise-separable convolutions with exponentially increasing\n dilation (1,2,4) capture temporal patterns at multiple resolutions.\n 2. Data-adaptive regularization: dropout rate is automatically calibrated based\n on input dimensionality (enc_in). High-dimensional inputs (many channels)\n get stronger regularization to prevent overfitting, while low-dimensional\n inputs get lighter regularization for better learning capacity.\n 3. Progressive temporal downsampling: long sequences are progressively\n compressed between blocks, effectively multiplying the receptive field\n and enabling multi-resolution feature extraction.\n 4. Spectral channel gating uses FFT amplitude statistics (mean+std) to\n provide frequency-aware channel recalibration.\n 5. 
Triple pooling (mean + max + learned attention) for robust aggregation.\n \"\"\"\n\n def __init__(self, configs):\n super(Model, self).__init__()\n self.task_name = configs.task_name\n self.seq_len = configs.seq_len\n self.enc_in = configs.enc_in\n self.num_class = configs.num_class\n\n d_model = 128\n n_blocks = 3\n self.d_model = d_model\n\n # Novel: Data-adaptive dropout based on input dimensionality\n # More input channels → higher overfitting risk → stronger regularization\n channel_ratio = min(self.enc_in / 50.0, 1.0)\n dropout = 0.15 + 0.25 * channel_ratio\n dropout = max(0.1, min(0.45, dropout))\n\n # Input projection\n self.input_proj = nn.Sequential(\n nn.Conv1d(self.enc_in, d_model, kernel_size=7, padding=3),\n nn.BatchNorm1d(d_model),\n nn.GELU(),\n nn.Dropout(dropout)\n )\n\n # Multi-scale blocks with increasing dilation\n self.blocks = nn.ModuleList([\n MultiScaleBlock(d_model, dilation=2**i, dropout=dropout)\n for i in range(n_blocks)\n ])\n\n # Spectral channel gating\n self.spectral_gate = SpectralGating(d_model)\n\n # Learned attention pooling\n self.attn_pool = nn.Linear(d_model, 1)\n\n # Classification head\n self.head = nn.Sequential(\n nn.LayerNorm(d_model * 3),\n nn.Linear(d_model * 3, d_model),\n nn.GELU(),\n nn.Dropout(dropout),\n nn.Linear(d_model, self.num_class)\n )\n\n # Progressive downsampling threshold\n self.downsample_threshold = 128\n\n def classification(self, x_enc, x_mark_enc):\n B, T, C = x_enc.shape\n\n # Instance normalization\n means = x_enc.mean(1, keepdim=True).detach()\n x = x_enc - means\n stdev = torch.sqrt(x.var(1, keepdim=True, unbiased=False) + 1e-5).detach()\n x = x / stdev\n\n # [B, C, T] -> [B, D, T]\n x = self.input_proj(x.permute(0, 2, 1))\n\n # Multi-scale temporal processing with progressive downsampling\n for block in self.blocks:\n x = block(x)\n # Downsample long sequences to increase effective receptive field\n if x.shape[-1] > self.downsample_threshold:\n x = F.avg_pool1d(x, kernel_size=2, stride=2)\n\n 
# Spectral channel gating\n x = self.spectral_gate(x)\n\n # Transpose for pooling: [B, T', D]\n x_t = x.permute(0, 2, 1)\n\n # Triple pooling (global, no mask needed after downsampling)\n mean_pool = x_t.mean(1) # [B, D]\n max_pool = x_t.max(1)[0] # [B, D]\n\n attn_scores = self.attn_pool(x_t).squeeze(-1) # [B, T']\n attn_weights = F.softmax(attn_scores, dim=-1)\n attn_pooled = torch.einsum('bt,btd->bd', attn_weights, x_t) # [B, D]\n\n combined = torch.cat([mean_pool, max_pool, attn_pooled], dim=-1)\n return self.head(combined)\n\n def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):\n if self.task_name == 'classification':\n dec_out = self.classification(x_enc, x_mark_enc)\n return dec_out\n return None\n"
7}
Tool Result (Step 5)
OK: Replaced lines 1..196 with 183 line(s) in Time-Series-Library/models/Custom.py. Editable range: entire file.
[Current file: Time-Series-Library/models/Custom.py | editable: 1–183 | total: 183 lines]
1 import torch
2 import torch.nn as nn
3 import torch.nn.functional as F
4 import math
5
6
7 class MultiScaleBlock(nn.Module):
8 """
9 Multi-scale depthwise-separable temporal convolution block.
10 Uses depthwise convolutions at scales 3,5,7 with configurable dilation,
11 pointwise channel mixing, and a feed-forward network with residual connections.
12 """
13 def __init__(self, d_model, dilation=1, dropout=0.25):
14 super().__init__()
15 self.dw_convs = nn.ModuleList()
16 for k in [3, 5, 7]:
17 pad = ((k - 1) * dilation) // 2
18 self.dw_convs.append(
19 nn.Conv1d(d_model, d_model, kernel_size=k
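Step 5's headline change is the data-adaptive dropout rule: dropout = 0.15 + 0.25 * min(enc_in / 50, 1), clamped to [0.1, 0.45]. A worked example of that formula in isolation; the channel counts below are purely illustrative, since the actual enc_in is set from each dataset at runtime:

```python
def adaptive_dropout(enc_in: int) -> float:
    # Data-adaptive dropout from the Step 5 edit: more input channels
    # imply higher overfitting risk, hence stronger regularization.
    channel_ratio = min(enc_in / 50.0, 1.0)
    return max(0.1, min(0.45, 0.15 + 0.25 * channel_ratio))


# Illustrative channel counts (not necessarily the benchmark datasets' values).
for c in (3, 50, 144):
    print(c, round(adaptive_dropout(c), 3))
```

With these constants the rule interpolates from roughly 0.165 at 3 channels up to a cap of 0.4 at 50 or more channels; the 0.45 upper clamp is never reached.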
Assistant (Step 6): test
Tool Result (Step 6)
[Test #3] (0 tests remaining)

## Seed 42

### EthanolConcentration (scripts/ethanol.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=8 train_loss=1.328 vali_loss=1.407 vali_acc=0.281 test_acc=0.281
TRAIN_METRICS epoch=9 train_loss=1.300 vali_loss=1.410 vali_acc=0.266 test_acc=0.266
TRAIN_METRICS epoch=10 train_loss=1.297 vali_loss=1.460 vali_acc=0.270 test_acc=0.270
TRAIN_METRICS epoch=11 train_loss=1.329 vali_loss=1.431 vali_acc=0.259 test_acc=0.259
TRAIN_METRICS epoch=12 train_loss=1.300 vali_loss=1.418 vali_acc=0.266 test_acc=0.266
Test results (EthanolConcentration): Accuracy: 0.28517

### FaceDetection (scripts/facedetection.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=21 train_loss=0.184 vali_loss=1.703 vali_acc=0.552 test_acc=0.552
TRAIN_METRICS epoch=22 train_loss=0.181 vali_loss=1.614 vali_acc=0.556 test_acc=0.556
TRAIN_METRICS epoch=23 train_loss=0.162 vali_loss=1.462 vali_acc=0.571 test_acc=0.571
TRAIN_METRICS epoch=24 train_loss=0.158 vali_loss=1.855 v