Agent Conversation: deepseek-reasoner
Model: deepseek-reasoner
Total steps: 6
Messages: 13
User (Step 0)
Task: ts-classification
Time Series Classification: Custom Model Design
Objective
Design and implement a custom deep learning model for multivariate time series classification. Your code goes in the Model class in models/Custom.py. Three reference implementations (DLinear, TimesNet, PatchTST) are provided as read-only.
Evaluation
Trained and evaluated on three UEA datasets:
- EthanolConcentration — spectral data classification
- FaceDetection — MEG brain imaging classification
- Handwriting — accelerometer-based character recognition
Training uses the RAdam optimizer with CrossEntropyLoss and early stopping (patience=10). Metric: accuracy (higher is better).
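The evaluation recipe above can be exercised end-to-end on synthetic data. This is a minimal sketch of that setup (RAdam, CrossEntropyLoss, patience-based early stopping); the toy model, shapes, and learning rate are assumptions, not the actual harness:

```python
import torch
import torch.nn as nn

# Sketch of the described training setup: RAdam + CrossEntropyLoss with
# patience-based early stopping on validation loss. The flatten+linear
# model and the reuse of the training batch as "validation" are
# placeholders for illustration only.
torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(24 * 3, 4))  # seq_len=24, enc_in=3, 4 classes
optimizer = torch.optim.RAdam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

x = torch.randn(32, 24, 3)          # [batch, seq_len, enc_in]
y = torch.randint(0, 4, (32,))      # class labels

best_vali, patience, bad_epochs = float("inf"), 10, 0
for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        vali_loss = criterion(model(x), y).item()  # stand-in for a held-out split
    if vali_loss < best_vali:
        best_vali, bad_epochs = vali_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # early stopping, as in the harness
            break

with torch.no_grad():
    acc = (model(x).argmax(dim=-1) == y).float().mean().item()
```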
Time-Series-Library/models/Custom.py [EDITABLE — entire file only]
1: import torch
2: import torch.nn as nn
3: import torch.nn.functional as F
4:
5:
6: class Model(nn.Module):
7: """
8: Custom model for time series classification.
9:
10: Forward signature: forward(x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None)
11: - x_enc: [batch, seq_len, enc_in] — input time series
12: - x_mark_enc: [batch, seq_len] — padding mask (1=valid, 0=padding)
13: - x_dec: not used (None)
14: - x_mark_dec: not used (None)
15:
16: Must return: [batch, num_class] — class logits (before softmax)
17:
18: Note: configs.seq_len, configs.enc_in, and configs.num_class are set
19: dynamically from the dataset at runtime.
20: """
21:
22: def __init__(self, configs):
23: super(Model, self).__init__()
24: self.task_name = configs.task_name
25: self.seq_len = configs.seq_len
26: self.enc_in = configs.enc_in
27: self.num_class = configs.num_class
28: # TODO: Define your model architecture here
29:
30: def classification(self, x_enc, x_mark_enc):
31: """
32: Classification: assign a label to the input time series.
33: Input: x_enc [batch, seq_len, enc_in]
34: x_mark_enc: [batch, seq_len] padding mask
35: Output: [batch, num_class] logits
36: """
37: # TODO: Implement your classification logic
38: batch_size = x_enc.shape[0]
39: return torch.zeros(batch_size, self.num_class).to(x_enc.device)
40:
41: def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):
42: if self.task_name == 'classification':
43: dec_out = self.classification(x_enc, x_mark_enc)
44: return dec_out
45: return None
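The skeleton above already satisfies the required forward contract (zero logits of shape `[batch, num_class]`). A self-contained sketch of that contract, using a `SimpleNamespace` as a stand-in for the runtime `configs` object:

```python
import torch
import torch.nn as nn
from types import SimpleNamespace

# Condensed copy of the Custom.py skeleton: a zero-logit classifier that
# demonstrates the required forward(x_enc, x_mark_enc, x_dec, x_mark_dec)
# signature and the [batch, num_class] output shape.
class Model(nn.Module):
    def __init__(self, configs):
        super().__init__()
        self.task_name = configs.task_name
        self.seq_len = configs.seq_len
        self.enc_in = configs.enc_in
        self.num_class = configs.num_class

    def classification(self, x_enc, x_mark_enc):
        return torch.zeros(x_enc.shape[0], self.num_class, device=x_enc.device)

    def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):
        if self.task_name == 'classification':
            return self.classification(x_enc, x_mark_enc)
        return None

# seq_len/enc_in/num_class are set from the dataset at runtime; these
# values are placeholders.
configs = SimpleNamespace(task_name='classification', seq_len=24, enc_in=3, num_class=4)
m = Model(configs)
logits = m(torch.randn(8, 24, 3), torch.ones(8, 24), None, None)
```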
Time-Series-Library/models/DLinear.py [READ-ONLY — do not edit]
1: import torch
2: import torch.nn as nn
3: import torch.nn.functional as F
4: from layers.Autoformer_EncDec import series_decomp
5:
6:
7: class Model(nn.Module):
8: """
9:     Paper l
Assistant (Step 1): edit
Tool Input
1{
2 "op": "replace",
3 "filename": "Time-Series-Library/models/Custom.py",
4 "start_line": 1,
5 "end_line": 45,
6 "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport math\n\n\nclass MultiScaleConv(nn.Module):\n \"\"\"\n Multi-scale 1D convolutional block with residual connection.\n Uses parallel convs with different kernel sizes, then combines.\n \"\"\"\n def __init__(self, in_channels, out_channels, kernel_sizes=[3, 5, 7], dropout=0.1):\n super(MultiScaleConv, self).__init__()\n self.convs = nn.ModuleList()\n for ks in kernel_sizes:\n padding = (ks // 2)\n self.convs.append(\n nn.Sequential(\n nn.Conv1d(in_channels, out_channels, kernel_size=ks, padding=padding),\n nn.BatchNorm1d(out_channels),\n nn.GELU(),\n nn.Dropout(dropout)\n )\n )\n # If in_channels != out_channels, need projection\n if in_channels != out_channels:\n self.residual_proj = nn.Conv1d(in_channels, out_channels, kernel_size=1)\n else:\n self.residual_proj = None\n self.out_channels = out_channels\n \n def forward(self, x):\n # x shape: [batch, channels, length]\n residual = x\n out = 0\n for conv in self.convs:\n out = out + conv(x)\n out = out / len(self.convs)\n if self.residual_proj is not None:\n residual = self.residual_proj(residual)\n # residual and out have same channels now\n return F.gelu(out + residual)\n\n\nclass TemporalSelfAttention(nn.Module):\n \"\"\"\n Simple temporal self-attention without multi-head to keep params low.\n \"\"\"\n def __init__(self, d_model, dropout=0.1):\n super(TemporalSelfAttention, self).__init__()\n self.d_model = d_model\n self.qkv = nn.Linear(d_model, d_model * 3)\n self.dropout = nn.Dropout(dropout)\n self.scale = 1.0 / math.sqrt(d_model)\n \n def forward(self, x):\n # x shape: [batch, length, d_model]\n B, L, D = x.shape\n qkv = self.qkv(x).reshape(B, L, 3, D).permute(2, 0, 1, 3) # [3, B, L, D]\n q, k, v = qkv[0], qkv[1], qkv[2]\n attn = torch.matmul(q, k.transpose(-2, -1)) * self.scale # [B, L, L]\n attn = F.softmax(attn, dim=-1)\n attn = self.dropout(attn)\n out = torch.matmul(attn, v) # [B, L, D]\n return out + x # 
residual\n\n\nclass ChannelAttention(nn.Module):\n \"\"\"\n Channel attention across variables (squeeze-and-excitation style).\n \"\"\"\n def __init__(self, num_channels, reduction=4):\n super(ChannelAttention, self).__init__()\n self.avg_pool = nn.AdaptiveAvgPool1d(1)\n self.fc = nn.Sequential(\n nn.Linear(num_channels, num_channels // reduction, bias=False),\n nn.ReLU(inplace=True),\n nn.Linear(num_channels // reduction, num_channels, bias=False),\n nn.Sigmoid()\n )\n \n def forward(self, x):\n # x shape: [batch, channels, length]\n B, C, L = x.shape\n y = self.avg_pool(x).squeeze(-1) # [B, C]\n y = self.fc(y).unsqueeze(-1) # [B, C, 1]\n return x * y.expand_as(x)\n\n\nclass Model(nn.Module):\n \"\"\"\n Custom model for time series classification.\n Multi-scale convolutional encoder + temporal attention + channel attention.\n \"\"\"\n def __init__(self, configs):\n super(Model, self).__init__()\n self.task_name = configs.task_name\n self.seq_len = configs.seq_len\n self.enc_in = configs.enc_in\n self.num_class = configs.num_class\n \n # Use config parameters if available, else defaults\n d_model = getattr(configs, 'd_model', 128)\n d_ff = getattr(configs, 'd_ff', 256)\n n_heads = getattr(configs, 'n_heads', 4)\n e_layers = getattr(configs, 'e_layers', 2)\n dropout = getattr(configs, 'dropout', 0.1)\n \n # Initial projection to d_model\n self.input_proj = nn.Linear(self.enc_in, d_model)\n \n # Multi-scale convolutional layers\n self.conv_layers = nn.ModuleList()\n for i in range(e_layers):\n self.conv_layers.append(\n MultiScaleConv(d_model, d_model, kernel_sizes=[3, 5, 7], dropout=dropout)\n )\n \n # Temporal self-attention layer\n self.temp_attention = TemporalSelfAttention(d_model, dropout=dropout)\n \n # Channel attention (applied to transposed view)\n self.channel_attention = ChannelAttention(d_model, reduction=4)\n \n # Adaptive pooling\n self.adaptive_pool = nn.AdaptiveAvgPool1d(1)\n \n # Classification head\n self.dropout = nn.Dropout(dropout)\n 
self.classifier = nn.Sequential(\n nn.Linear(d_model, d_ff),\n nn.GELU(),\n nn.Dropout(dropout),\n nn.Linear(d_ff, self.num_class)\n )\n \n def classification(self, x_enc, x_mark_enc):\n \"\"\"\n Classification: assign a label to the input time series.\n Input: x_enc [batch, seq_len, enc_in]\n x_mark_enc: [batch, seq_len] padding mask\n Output: [batch, num_class] logits\n \"\"\"\n batch_size = x_enc.shape[0]\n \n # Apply padding mask if provided (zero out padded positions)\n if x_mark_enc is not None:\n mask = x_mark_enc.unsqueeze(-1) # [batch, seq_len, 1]\n x_enc = x_enc * mask\n \n # Project input [batch, seq_len, enc_in] -> [batch, seq_len, d_model]\n x = self.input_proj(x_enc)\n \n # Transpose for conv layers: [batch, d_model, seq_len]\n x = x.transpose(1, 2)\n \n # Apply multi-scale convolutional layers\n for conv in self.conv_layers:\n x = conv(x)\n \n # Apply channel attention\n x = self.channel_attention(x)\n \n # Transpose back for temporal attention: [batch, seq_len, d_model]\n x = x.transpose(1, 2)\n \n # Apply temporal self-attention\n x = self.temp_attention(x)\n \n # Apply padding mask again before pooling\n if x_mark_enc is not None:\n mask = x_mark_enc.unsqueeze(-1) # [batch, seq_len, 1]\n x = x * mask\n \n # Adaptive average pooling over time dimension\n # Transpose to [batch, d_model, seq_len] for pooling\n x = x.transpose(1, 2)\n x = self.adaptive_pool(x) # [batch, d_model, 1]\n x = x.squeeze(-1) # [batch, d_model]\n \n # Classification\n x = self.dropout(x)\n logits = self.classifier(x)\n return logits\n\n def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):\n if self.task_name == 'classification':\n dec_out = self.classification(x_enc, x_mark_enc)\n return dec_out\n return None"
7}
Tool Result (Step 1)
OK: Replaced lines 1..45 with 190 line(s) in Time-Series-Library/models/Custom.py. Editable range: entire file.
[Current file: Time-Series-Library/models/Custom.py | editable: 1–190 | total: 190 lines]
1 import torch
2 import torch.nn as nn
3 import torch.nn.functional as F
4 import math
5
6
7 class MultiScaleConv(nn.Module):
8 """
9 Multi-scale 1D convolutional block with residual connection.
10 Uses parallel convs with different kernel sizes, then combines.
11 """
12 def __init__(self, in_channels, out_channels, kernel_sizes=[3, 5, 7], dropout=0.1):
13 super(MultiScaleConv, self).__init__()
14 self.convs = nn.ModuleList()
15 for ks in kernel_sizes:
16 padding = (ks // 2)
17 self.convs.append(
18 nn.Sequential(
19 nn.Conv1d(in_channels, out_channels, kernel_size=ks, padding=pad
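The listing above is truncated, but the `MultiScaleConv` idea it defines can be condensed into a standalone sketch: parallel same-padded convolutions with different kernel sizes, branch outputs averaged, then a residual connection (with a 1x1 projection when channel counts differ). Shapes below are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Condensed sketch of the MultiScaleConv block from the edit above.
# Odd kernel sizes with padding=k//2 preserve sequence length, so the
# parallel branches can be averaged elementwise.
class MultiScaleConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_sizes=(3, 5, 7), dropout=0.1):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Sequential(
                nn.Conv1d(in_ch, out_ch, k, padding=k // 2),
                nn.BatchNorm1d(out_ch),
                nn.GELU(),
                nn.Dropout(dropout),
            )
            for k in kernel_sizes
        )
        # Residual projection only when channel counts differ
        self.residual_proj = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):  # x: [batch, channels, length]
        out = sum(conv(x) for conv in self.convs) / len(self.convs)
        return F.gelu(out + self.residual_proj(x))

block = MultiScaleConv(3, 16)
block.eval()  # deterministic: dropout off, BatchNorm uses running stats
y = block(torch.randn(2, 3, 50))
```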
Assistant (Step 2): test
Tool Result (Step 2)
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)
## Seed 42
### EthanolConcentration (scripts/ethanol.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=9 train_loss=1.453 vali_loss=1.425 vali_acc=0.221 test_acc=0.221
TRAIN_METRICS epoch=10 train_loss=1.428 vali_loss=1.428 vali_acc=0.266 test_acc=0.266
TRAIN_METRICS epoch=11 train_loss=1.428 vali_loss=1.427 vali_acc=0.236 test_acc=0.236
TRAIN_METRICS epoch=12 train_loss=1.402 vali_loss=1.419 vali_acc=0.251 test_acc=0.251
TRAIN_METRICS epoch=13 train_loss=1.382 vali_loss=1.447 vali_acc=0.278 test_acc=0.278
Test results (EthanolConcentration): Accuracy: 0.28897
### FaceDetection (scripts/facedetection.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=10 train_loss=0.424 vali_loss=0.884 vali_acc=0.530 test_acc=0.530
TRAIN_METRICS epoch=11 train_loss=0.349 vali_loss=0.975 vali_acc=0.547 test_acc=0.547
TRAIN_METRICS epoch=12 train_loss=0.268 vali_loss=1.361 vali_acc=0.552 test_acc=0.552
TRAIN_METRICS ep
Assistant (Step 3): edit
Tool Input
1{
2 "op": "replace",
3 "filename": "Time-Series-Library/models/Custom.py",
4 "start_line": 1,
5 "end_line": 190,
6 "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport math\n\n\nclass DilatedConvBlock(nn.Module):\n \"\"\"\n Dilated convolutional block with multiple parallel dilation rates.\n Each branch: Conv1d with dilation, BatchNorm, GELU, Dropout.\n \"\"\"\n def __init__(self, in_channels, out_channels, dilation_rates=[1, 2, 4, 8], kernel_size=3, dropout=0.1):\n super(DilatedConvBlock, self).__init__()\n self.branches = nn.ModuleList()\n for dilation in dilation_rates:\n padding = (kernel_size - 1) * dilation // 2\n self.branches.append(\n nn.Sequential(\n nn.Conv1d(in_channels, out_channels, kernel_size=kernel_size, \n padding=padding, dilation=dilation),\n nn.BatchNorm1d(out_channels),\n nn.GELU(),\n nn.Dropout(dropout)\n )\n )\n # Residual projection if needed\n if in_channels != out_channels:\n self.residual_proj = nn.Conv1d(in_channels, out_channels, kernel_size=1)\n else:\n self.residual_proj = None\n self.out_channels = out_channels\n \n def forward(self, x):\n residual = x\n out = 0\n for branch in self.branches:\n out = out + branch(x)\n out = out / len(self.branches)\n if self.residual_proj is not None:\n residual = self.residual_proj(residual)\n return F.gelu(out + residual)\n\n\nclass MultiHeadTemporalAttention(nn.Module):\n \"\"\"\n Multi-head temporal self-attention with residual and layer norm.\n \"\"\"\n def __init__(self, d_model, n_heads, dropout=0.1):\n super(MultiHeadTemporalAttention, self).__init__()\n assert d_model % n_heads == 0\n self.d_model = d_model\n self.n_heads = n_heads\n self.head_dim = d_model // n_heads\n self.scale = 1.0 / math.sqrt(self.head_dim)\n \n self.qkv = nn.Linear(d_model, d_model * 3)\n self.proj = nn.Linear(d_model, d_model)\n self.dropout = nn.Dropout(dropout)\n self.norm = nn.LayerNorm(d_model)\n \n def forward(self, x, mask=None):\n # x shape: [batch, length, d_model]\n B, L, D = x.shape\n residual = x\n \n # Compute Q, K, V\n qkv = self.qkv(x).reshape(B, L, 3, self.n_heads, 
self.head_dim).permute(2, 0, 3, 1, 4)\n q, k, v = qkv[0], qkv[1], qkv[2] # each [B, n_heads, L, head_dim]\n \n # Attention scores\n attn = torch.matmul(q, k.transpose(-2, -1)) * self.scale # [B, n_heads, L, L]\n if mask is not None:\n # mask shape [B, L] -> [B, 1, 1, L] for broadcasting\n mask = mask.unsqueeze(1).unsqueeze(2)\n attn = attn.masked_fill(mask == 0, float('-inf'))\n attn = F.softmax(attn, dim=-1)\n attn = self.dropout(attn)\n \n # Weighted sum\n out = torch.matmul(attn, v) # [B, n_heads, L, head_dim]\n out = out.transpose(1, 2).reshape(B, L, D) # [B, L, D]\n out = self.proj(out)\n out = self.dropout(out)\n out = self.norm(out + residual)\n return out\n\n\nclass CrossChannelAttention(nn.Module):\n \"\"\"\n Attention across channels (variables) using linear projection.\n \"\"\"\n def __init__(self, d_model, reduction=4, dropout=0.1):\n super(CrossChannelAttention, self).__init__()\n self.d_model = d_model\n self.reduction = reduction\n self.avg_pool = nn.AdaptiveAvgPool1d(1)\n \n # Two-layer MLP for channel attention\n self.mlp = nn.Sequential(\n nn.Linear(d_model, d_model // reduction),\n nn.GELU(),\n nn.Dropout(dropout),\n nn.Linear(d_model // reduction, d_model),\n nn.Sigmoid()\n )\n \n def forward(self, x):\n # x shape: [batch, d_model, length]\n B, C, L = x.shape\n # Compute channel-wise statistics\n y = self.avg_pool(x).squeeze(-1) # [B, C]\n y = self.mlp(y).unsqueeze(-1) # [B, C, 1]\n return x * y.expand_as(x)\n\n\nclass Model(nn.Module):\n \"\"\"\n Custom model for time series classification.\n Uses dilated convolutions for multi-scale temporal patterns,\n multi-head temporal attention, and cross-channel attention.\n \"\"\"\n def __init__(self, configs):\n super(Model, self).__init__()\n self.task_name = configs.task_name\n self.seq_len = configs.seq_len\n self.enc_in = configs.enc_in\n self.num_class = configs.num_class\n \n # Use config parameters\n d_model = configs.d_model\n d_ff = configs.d_ff\n n_heads = configs.n_heads\n e_layers = 
configs.e_layers\n dropout = configs.dropout\n \n # Input projection\n self.input_proj = nn.Linear(self.enc_in, d_model)\n \n # Dilated convolutional layers\n self.dilated_convs = nn.ModuleList()\n for i in range(e_layers):\n self.dilated_convs.append(\n DilatedConvBlock(d_model, d_model, dilation_rates=[1, 2, 4, 8], \n kernel_size=3, dropout=dropout)\n )\n \n # Temporal multi-head attention\n self.temp_attention = MultiHeadTemporalAttention(d_model, n_heads, dropout)\n \n # Cross-channel attention\n self.channel_attention = CrossChannelAttention(d_model, reduction=4, dropout=dropout)\n \n # Adaptive pooling\n self.adaptive_pool = nn.AdaptiveAvgPool1d(1)\n \n # Classification head\n self.classifier = nn.Sequential(\n nn.Linear(d_model, d_ff),\n nn.GELU(),\n nn.Dropout(dropout),\n nn.Linear(d_ff, self.num_class)\n )\n \n def classification(self, x_enc, x_mark_enc):\n \"\"\"\n Classification: assign a label to the input time series.\n Input: x_enc [batch, seq_len, enc_in]\n x_mark_enc: [batch, seq_len] padding mask\n Output: [batch, num_class] logits\n \"\"\"\n batch_size = x_enc.shape[0]\n \n # Apply padding mask if provided\n if x_mark_enc is not None:\n mask = x_mark_enc.unsqueeze(-1) # [batch, seq_len, 1]\n x_enc = x_enc * mask\n \n # Project input [batch, seq_len, enc_in] -> [batch, seq_len, d_model]\n x = self.input_proj(x_enc)\n \n # Transpose for conv layers: [batch, d_model, seq_len]\n x = x.transpose(1, 2)\n \n # Apply dilated convolutional layers\n for conv in self.dilated_convs:\n x = conv(x)\n \n # Apply cross-channel attention\n x = self.channel_attention(x)\n \n # Transpose back for temporal attention: [batch, seq_len, d_model]\n x = x.transpose(1, 2)\n \n # Apply temporal attention with mask\n attn_mask = x_mark_enc if x_mark_enc is not None else None\n x = self.temp_attention(x, mask=attn_mask)\n \n # Apply padding mask again before pooling\n if x_mark_enc is not None:\n mask = x_mark_enc.unsqueeze(-1) # [batch, seq_len, 1]\n x = x * mask\n \n # 
Adaptive average pooling over time dimension\n x = x.transpose(1, 2) # [batch, d_model, seq_len]\n x = self.adaptive_pool(x) # [batch, d_model, 1]\n x = x.squeeze(-1) # [batch, d_model]\n \n # Classification\n logits = self.classifier(x)\n return logits\n\n def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):\n if self.task_name == 'classification':\n dec_out = self.classification(x_enc, x_mark_enc)\n return dec_out\n return None"
7}
Tool Result (Step 3)
OK: Replaced lines 1..190 with 216 line(s) in Time-Series-Library/models/Custom.py. Editable range: entire file.
[Current file: Time-Series-Library/models/Custom.py | editable: 1–216 | total: 216 lines]
1 import torch
2 import torch.nn as nn
3 import torch.nn.functional as F
4 import math
5
6
7 class DilatedConvBlock(nn.Module):
8 """
9 Dilated convolutional block with multiple parallel dilation rates.
10 Each branch: Conv1d with dilation, BatchNorm, GELU, Dropout.
11 """
12 def __init__(self, in_channels, out_channels, dilation_rates=[1, 2, 4, 8], kernel_size=3, dropout=0.1):
13 super(DilatedConvBlock, self).__init__()
14 self.branches = nn.ModuleList()
15 for dilation in dilation_rates:
16 padding = (kernel_size - 1) * dilation // 2
17 self.branches.append(
18 nn.Sequential(
19
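The `DilatedConvBlock` in the edit above relies on the padding rule `padding = (kernel_size - 1) * dilation // 2`, which keeps the sequence length unchanged for odd kernel sizes so that branches with different dilation rates can be combined elementwise. A small check of that rule (channel counts and lengths are assumptions):

```python
import torch
import torch.nn as nn

# Verify the "same length" padding rule used by DilatedConvBlock:
# L_out = L + 2*padding - dilation*(kernel_size - 1), which equals L
# when padding = (kernel_size - 1) * dilation // 2 and kernel_size is odd.
k = 3
x = torch.randn(2, 8, 60)  # [batch, channels, length]
outs = []
for d in (1, 2, 4, 8):     # the block's parallel dilation rates
    conv = nn.Conv1d(8, 8, kernel_size=k, padding=(k - 1) * d // 2, dilation=d)
    outs.append(conv(x))
combined = sum(outs) / len(outs)  # elementwise average, as in the block
```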
Assistant (Step 4): test
Tool Result (Step 4)
[Test #2] (1 test remaining; call submit(n=2) to submit this result)
### EthanolConcentration (scripts/ethanol.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=7 train_loss=1.352 vali_loss=1.387 vali_acc=0.243 test_acc=0.243
TRAIN_METRICS epoch=8 train_loss=1.356 vali_loss=1.393 vali_acc=0.274 test_acc=0.274
TRAIN_METRICS epoch=9 train_loss=1.338 vali_loss=1.401 vali_acc=0.278 test_acc=0.278
TRAIN_METRICS epoch=10 train_loss=1.355 vali_loss=1.411 vali_acc=0.259 test_acc=0.259
TRAIN_METRICS epoch=11 train_loss=1.342 vali_loss=1.405 vali_acc=0.270 test_acc=0.270
Test results (EthanolConcentration): Accuracy: 0.28137
### FaceDetection (scripts/facedetection.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=12 train_loss=0.132 vali_loss=1.652 vali_acc=0.591 test_acc=0.591
TRAIN_METRICS epoch=13 train_loss=0.106 vali_loss=1.463 vali_acc=0.568 test_acc=0.568
TRAIN_METRICS epoch=14 train_loss=0.101 vali_loss=1.898 vali_acc=0.573 test_acc=0.573
TRAIN_METRICS epoch=15 train_los
Assistant (Step 5): edit
Tool Input
1{
2 "op": "replace",
3 "filename": "Time-Series-Library/models/Custom.py",
4 "start_line": 1,
5 "end_line": 216,
6 "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport math\nimport torch.fft\n\n\nclass FrequencyAttention(nn.Module):\n \"\"\"\n Frequency-domain attention: FFT -> learnable weighting -> inverse FFT.\n \"\"\"\n def __init__(self, d_model, seq_len, dropout=0.1):\n super(FrequencyAttention, self).__init__()\n self.d_model = d_model\n self.seq_len = seq_len\n # Number of frequency bins (real FFT)\n self.n_freq = seq_len // 2 + 1\n \n # Learnable weight for each frequency bin\n self.freq_weight = nn.Parameter(torch.ones(1, d_model, self.n_freq))\n self.dropout = nn.Dropout(dropout)\n \n def forward(self, x):\n # x shape: [batch, d_model, seq_len]\n B, C, L = x.shape\n # Apply FFT along the time dimension\n x_freq = torch.fft.rfft(x, dim=-1) # [B, C, n_freq] complex\n \n # Separate magnitude and phase\n mag = torch.abs(x_freq)\n phase = torch.angle(x_freq)\n \n # Learnable scaling of magnitude\n weight = self.freq_weight # [1, C, n_freq]\n mag = mag * weight\n \n # Reconstruct complex numbers\n real = mag * torch.cos(phase)\n imag = mag * torch.sin(phase)\n x_freq_weighted = torch.complex(real, imag)\n \n # Inverse FFT\n x_time = torch.fft.irfft(x_freq_weighted, n=L, dim=-1) # [B, C, L]\n \n # Residual connection\n out = x + self.dropout(x_time)\n return out\n\n\nclass DilatedConvBlock(nn.Module):\n \"\"\"\n Dilated convolutional block with multiple parallel dilation rates.\n Each branch: Conv1d with dilation, BatchNorm, GELU, Dropout.\n \"\"\"\n def __init__(self, in_channels, out_channels, dilation_rates=[1, 2, 4, 8], kernel_size=3, dropout=0.1):\n super(DilatedConvBlock, self).__init__()\n self.branches = nn.ModuleList()\n for dilation in dilation_rates:\n padding = (kernel_size - 1) * dilation // 2\n self.branches.append(\n nn.Sequential(\n nn.Conv1d(in_channels, out_channels, kernel_size=kernel_size, \n padding=padding, dilation=dilation),\n nn.BatchNorm1d(out_channels),\n nn.GELU(),\n nn.Dropout(dropout)\n )\n )\n # Residual 
projection if needed\n if in_channels != out_channels:\n self.residual_proj = nn.Conv1d(in_channels, out_channels, kernel_size=1)\n else:\n self.residual_proj = None\n self.out_channels = out_channels\n \n def forward(self, x):\n residual = x\n out = 0\n for branch in self.branches:\n out = out + branch(x)\n out = out / len(self.branches)\n if self.residual_proj is not None:\n residual = self.residual_proj(residual)\n return F.gelu(out + residual)\n\n\nclass MultiHeadTemporalAttention(nn.Module):\n \"\"\"\n Multi-head temporal self-attention with residual and layer norm.\n \"\"\"\n def __init__(self, d_model, n_heads, dropout=0.1):\n super(MultiHeadTemporalAttention, self).__init__()\n assert d_model % n_heads == 0\n self.d_model = d_model\n self.n_heads = n_heads\n self.head_dim = d_model // n_heads\n self.scale = 1.0 / math.sqrt(self.head_dim)\n \n self.qkv = nn.Linear(d_model, d_model * 3)\n self.proj = nn.Linear(d_model, d_model)\n self.dropout = nn.Dropout(dropout)\n self.norm = nn.LayerNorm(d_model)\n \n def forward(self, x, mask=None):\n # x shape: [batch, length, d_model]\n B, L, D = x.shape\n residual = x\n \n # Compute Q, K, V\n qkv = self.qkv(x).reshape(B, L, 3, self.n_heads, self.head_dim).permute(2, 0, 3, 1, 4)\n q, k, v = qkv[0], qkv[1], qkv[2] # each [B, n_heads, L, head_dim]\n \n # Attention scores\n attn = torch.matmul(q, k.transpose(-2, -1)) * self.scale # [B, n_heads, L, L]\n if mask is not None:\n # mask shape [B, L] -> [B, 1, 1, L] for broadcasting\n mask = mask.unsqueeze(1).unsqueeze(2)\n attn = attn.masked_fill(mask == 0, float('-inf'))\n attn = F.softmax(attn, dim=-1)\n attn = self.dropout(attn)\n \n # Weighted sum\n out = torch.matmul(attn, v) # [B, n_heads, L, head_dim]\n out = out.transpose(1, 2).reshape(B, L, D) # [B, L, D]\n out = self.proj(out)\n out = self.dropout(out)\n out = self.norm(out + residual)\n return out\n\n\nclass CrossChannelAttention(nn.Module):\n \"\"\"\n Attention across channels (variables) using linear projection.\n 
\"\"\"\n def __init__(self, d_model, reduction=4, dropout=0.1):\n super(CrossChannelAttention, self).__init__()\n self.d_model = d_model\n self.reduction = reduction\n self.avg_pool = nn.AdaptiveAvgPool1d(1)\n \n # Two-layer MLP for channel attention\n self.mlp = nn.Sequential(\n nn.Linear(d_model, d_model // reduction),\n nn.GELU(),\n nn.Dropout(dropout),\n nn.Linear(d_model // reduction, d_model),\n nn.Sigmoid()\n )\n \n def forward(self, x):\n # x shape: [batch, d_model, length]\n B, C, L = x.shape\n # Compute channel-wise statistics\n y = self.avg_pool(x).squeeze(-1) # [B, C]\n y = self.mlp(y).unsqueeze(-1) # [B, C, 1]\n return x * y.expand_as(x)\n\n\nclass Model(nn.Module):\n \"\"\"\n Custom model for time series classification.\n Combines frequency-domain attention, dilated convolutions for multi-scale temporal patterns,\n multi-head temporal attention, and cross-channel attention.\n \"\"\"\n def __init__(self, configs):\n super(Model, self).__init__()\n self.task_name = configs.task_name\n self.seq_len = configs.seq_len\n self.enc_in = configs.enc_in\n self.num_class = configs.num_class\n \n # Use config parameters\n d_model = configs.d_model\n d_ff = configs.d_ff\n n_heads = configs.n_heads\n e_layers = configs.e_layers\n dropout = configs.dropout\n \n # Input projection\n self.input_proj = nn.Linear(self.enc_in, d_model)\n \n # Frequency attention branch\n self.freq_attention = FrequencyAttention(d_model, self.seq_len, dropout)\n \n # Dilated convolutional layers\n self.dilated_convs = nn.ModuleList()\n for i in range(e_layers):\n self.dilated_convs.append(\n DilatedConvBlock(d_model, d_model, dilation_rates=[1, 2, 4, 8], \n kernel_size=3, dropout=dropout)\n )\n \n # Temporal multi-head attention\n self.temp_attention = MultiHeadTemporalAttention(d_model, n_heads, dropout)\n \n # Cross-channel attention\n self.channel_attention = CrossChannelAttention(d_model, reduction=4, dropout=dropout)\n \n # Adaptive pooling\n self.adaptive_pool = 
nn.AdaptiveAvgPool1d(1)\n \n # Classification head\n self.classifier = nn.Sequential(\n nn.Linear(d_model, d_ff),\n nn.GELU(),\n nn.Dropout(dropout),\n nn.Linear(d_ff, self.num_class)\n )\n \n def classification(self, x_enc, x_mark_enc):\n \"\"\"\n Classification: assign a label to the input time series.\n Input: x_enc [batch, seq_len, enc_in]\n x_mark_enc: [batch, seq_len] padding mask\n Output: [batch, num_class] logits\n \"\"\"\n batch_size = x_enc.shape[0]\n \n # Apply padding mask if provided\n if x_mark_enc is not None:\n mask = x_mark_enc.unsqueeze(-1) # [batch, seq_len, 1]\n x_enc = x_enc * mask\n \n # Project input [batch, seq_len, enc_in] -> [batch, seq_len, d_model]\n x = self.input_proj(x_enc)\n \n # Transpose for conv layers: [batch, d_model, seq_len]\n x = x.transpose(1, 2)\n \n # Apply frequency attention\n x = self.freq_attention(x)\n \n # Apply dilated convolutional layers\n for conv in self.dilated_convs:\n x = conv(x)\n \n # Apply cross-channel attention\n x = self.channel_attention(x)\n \n # Transpose back for temporal attention: [batch, seq_len, d_model]\n x = x.transpose(1, 2)\n \n # Apply temporal attention with mask\n attn_mask = x_mark_enc if x_mark_enc is not None else None\n x = self.temp_attention(x, mask=attn_mask)\n \n # Apply padding mask again before pooling\n if x_mark_enc is not None:\n mask = x_mark_enc.unsqueeze(-1) # [batch, seq_len, 1]\n x = x * mask\n \n # Adaptive average pooling over time dimension\n x = x.transpose(1, 2) # [batch, d_model, seq_len]\n x = self.adaptive_pool(x) # [batch, d_model, 1]\n x = x.squeeze(-1) # [batch, d_model]\n \n # Classification\n logits = self.classifier(x)\n return logits\n\n def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):\n if self.task_name == 'classification':\n dec_out = self.classification(x_enc, x_mark_enc)\n return dec_out\n return None"
7}
Tool Result (Step 5)
OK: Replaced lines 1..216 with 265 line(s) in Time-Series-Library/models/Custom.py. Editable range: entire file.
[Current file: Time-Series-Library/models/Custom.py | editable: 1–265 | total: 265 lines]
1 import torch
2 import torch.nn as nn
3 import torch.nn.functional as F
4 import math
5 import torch.fft
6
7
8 class FrequencyAttention(nn.Module):
9 """
10 Frequency-domain attention: FFT -> learnable weighting -> inverse FFT.
11 """
12 def __init__(self, d_model, seq_len, dropout=0.1):
13 super(FrequencyAttention, self).__init__()
14 self.d_model = d_model
15 self.seq_len = seq_len
16 # Number of frequency bins (real FFT)
17 self.n_freq = seq_len // 2 + 1
18
19 # Learnable weight for each frequency bin
20 self.freq_weight = nn.Parameter(torch.ones(1, d_model, self.n_freq))
21 self.
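The core of `FrequencyAttention` above is rfft along time, a per-bin reweighting of the magnitude, then irfft back, wrapped in a residual connection. With unit weights (their initial value) the transform is numerically the identity, so the residual path simply doubles the input. A sketch of that round-trip, with dropout omitted and shapes assumed:

```python
import torch

# Round-trip check of the FrequencyAttention core: rfft -> split into
# magnitude and phase -> scale magnitude per frequency bin -> recompose
# -> irfft. With weight = ones this recovers x up to float precision.
torch.manual_seed(0)
x = torch.randn(2, 4, 32)               # [batch, d_model, seq_len]
weight = torch.ones(1, 4, 32 // 2 + 1)  # learnable nn.Parameter in the real module

xf = torch.fft.rfft(x, dim=-1)          # [batch, d_model, n_freq] complex
mag, phase = torch.abs(xf), torch.angle(xf)
mag = mag * weight                      # per-bin magnitude reweighting
xf_w = torch.complex(mag * torch.cos(phase), mag * torch.sin(phase))
x_time = torch.fft.irfft(xf_w, n=32, dim=-1)

out = x + x_time                        # residual connection (dropout omitted)
```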
Assistant (Step 6): test
Tool Result (Step 6)
[Test #3] (0 tests remaining)
## Seed 42
### EthanolConcentration (scripts/ethanol.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=28 train_loss=1.275 vali_loss=1.459 vali_acc=0.259 test_acc=0.259
TRAIN_METRICS epoch=29 train_loss=1.260 vali_loss=1.493 vali_acc=0.224 test_acc=0.224
TRAIN_METRICS epoch=30 train_loss=1.259 vali_loss=1.538 vali_acc=0.281 test_acc=0.281
TRAIN_METRICS epoch=31 train_loss=1.293 vali_loss=1.486 vali_acc=0.243 test_acc=0.243
TRAIN_METRICS epoch=32 train_loss=1.250 vali_loss=1.479 vali_acc=0.274 test_acc=0.274
Test results (EthanolConcentration): Accuracy: 0.29658
### FaceDetection (scripts/facedetection.sh)
Training metrics (last epochs):
TRAIN_METRICS epoch=18 train_loss=0.071 vali_loss=2.369 vali_acc=0.565 test_acc=0.565
TRAIN_METRICS epoch=19 train_loss=0.067 vali_loss=2.298 vali_acc=0.581 test_acc=0.581
TRAIN_METRICS epoch=20 train_loss=0.072 vali_loss=2.513 vali_acc=0.575 test_acc=0.575
TRAIN_METRICS epoch=21 train_loss=0.076 vali_loss=1.620