Agent Conversation: anthropic/claude-opus-4.6

Model: anthropic/claude-opus-4.6
Total steps: 6
Messages: 22
User (Step 0)

Task: quant-graph-stock

Graph-Based Multi-Stock Prediction on CSI300

Objective

Design and implement a graph-based stock prediction model that leverages inter-stock relationships through a stock-concept graph. Your code goes in custom_model.py. Three reference implementations (HIST, GATs, LightGBM) are provided as read-only.

Evaluation

Signal quality: IC, ICIR, and Rank IC. Portfolio backtest (TopkDropout strategy, topk=50, n_drop=5): annualized return, max drawdown, information ratio. All metrics are computed automatically via qlib's workflow.
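For readers unfamiliar with these signal metrics, a minimal self-contained sketch (synthetic data; the real computation is done inside qlib's workflow): IC is the mean of the daily cross-sectional Pearson correlations between predicted scores and realized labels, ICIR is that mean divided by the daily standard deviation, and Rank IC uses Spearman (rank) correlation instead.

```python
import numpy as np
import pandas as pd

def daily_ic(pred: pd.Series, label: pd.Series, rank: bool = False) -> pd.Series:
    """Cross-sectional correlation between scores and labels for each day.

    pred/label share a (datetime, instrument) MultiIndex, as in qlib.
    """
    df = pd.concat({"pred": pred, "label": label}, axis=1).dropna()
    method = "spearman" if rank else "pearson"
    return df.groupby(level="datetime").apply(
        lambda day: day["pred"].corr(day["label"], method=method)
    )

# Synthetic example: 60 days x 100 stocks, scores weakly correlated with labels
rng = np.random.default_rng(0)
idx = pd.MultiIndex.from_product(
    [pd.date_range("2020-01-01", periods=60), [f"SH{i:06d}" for i in range(100)]],
    names=["datetime", "instrument"],
)
label = pd.Series(rng.standard_normal(len(idx)), index=idx)
pred = 0.1 * label + pd.Series(rng.standard_normal(len(idx)), index=idx)

ic = daily_ic(pred, label)
rank_ic = daily_ic(pred, label, rank=True)
print(f"IC={ic.mean():.4f}  ICIR={ic.mean() / ic.std():.4f}  Rank_IC={rank_ic.mean():.4f}")
```

With a true correlation of roughly 0.1 injected, the reported IC lands near that level while ICIR also reflects its day-to-day stability.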

Workflow Configuration

Lines 14-26 and 32-45 of workflow_config.yaml are editable; they cover the model plus the dataset adapter/preprocessor configuration. Instruments, date ranges, train/valid/test splits, and evaluation settings are fixed.
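The portfolio evaluation rotates positions with a top-k/drop-n rule. A simplified sketch of one rebalance step, assuming the usual TopkDropout semantics (each day, sell the n_drop worst-scored current holdings and buy the best-scored names not held, refilling to topk positions); the function name and exact tie handling are illustrative, not qlib's implementation:

```python
def topk_dropout_step(holdings, scores, topk=50, n_drop=5):
    """One rebalance step of a top-k/drop-n rotation.

    holdings: set of instrument names currently held
    scores:   dict instrument -> today's predicted score (higher is better)
    Returns (new_holdings, sells, buys).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)  # best first
    held_ranked = [s for s in ranked if s in holdings]
    # Sell the worst-scored current holdings (up to n_drop)
    sells = set(held_ranked[-n_drop:]) if held_ranked else set()
    kept = set(holdings) - sells
    # Buy the top-ranked names not held and not just sold, refilling to topk
    buys = [s for s in ranked if s not in kept and s not in sells][: topk - len(kept)]
    return kept | set(buys), sells, set(buys)

# Toy example: 8 candidates, topk=3, n_drop=1; we currently hold the worst names
scores = {f"S{i}": v for i, v in enumerate([0.9, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0])}
holdings = {"S5", "S6", "S7"}
new_h, sells, buys = topk_dropout_step(holdings, scores, topk=3, n_drop=1)
print(sorted(new_h), sorted(sells), sorted(buys))
```

One step sells the single worst holding (S7) and replaces it with the best non-held name (S0), so turnover is bounded at n_drop names per day.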

qlib/custom_model.py [EDITABLE — lines 58–156 only]

     1: # Custom graph-based stock prediction model for MLS-Bench
     2: #
     3: # EDITABLE section: CustomModel class with fit() and predict() methods.
     4: # FIXED sections: imports and stock-concept graph loading below.
     5: import os
     6: import numpy as np
     7: import pandas as pd
     8: import torch
     9: import torch.nn as nn
    10: import torch.nn.functional as F
    11: from qlib.model.base import Model
    12: from qlib.data.dataset import DatasetH
    13: from qlib.data.dataset.handler import DataHandlerLP
    14: 
    15: DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
    16: 
    17: # =====================================================================
    18: # FIXED: Stock-concept graph data loading utilities
    19: # =====================================================================
    20: # Paths to pre-downloaded graph data
    21: STOCK2CONCEPT_PATH = os.path.expanduser("~/.qlib/qlib_data/qlib_csi300_stock2concept.npy")
    22: STOCK_INDEX_PATH = os.path.expanduser("~/.qlib/qlib_data/qlib_csi300_stock_index.npy")
    23: 
    24: # Load the stock-concept mapping matrix and stock index
    25: # stock2concept_matrix: shape (num_stocks, num_concepts), binary membership
    26: # stock_index_dict: dict mapping instrument name -> integer index
    27: _stock2concept_matrix = np.load(STOCK2CONCEPT_PATH)
    28: _stock_index_dict = np.load(STOCK_INDEX_PATH, allow_pickle=True).item()
    29: 
    30: 
    31: def get_stock_index(instruments, default_index=733):
    32:     """Map instrument names to integer indices for stock2concept lookup.
    33: 
    34:     Args:
    35:         instruments: array-like of instrument name strings
    36:         default_index: fallback index for unknown instruments (733 = padding)
    37: 
    38:     Returns:
    39:         np.ndarray of integer indices
    40:     """
    41:     indices = np.array([_stock_index_dict.get(inst, default_index)
    42:                         for inst in instruments])
    43:     return indices.astype(int)
    44: 
    45: 
    46: def get_concept_matrix(stock_indices):
    47:     """Get the concept membership matrix for gi
Assistant (Step 1, edit)
Tool Input
1{
2  "op": "replace",
3  "filename": "qlib/custom_model.py",
4  "start_line": 58,
5  "end_line": 156,
6  "content": "import copy\nimport torch.optim as optim\n\nclass GRUEnc(nn.Module):\n    def __init__(self, d=6, H=64, L=2, dr=0.0):\n        super().__init__()\n        self.rnn = nn.GRU(d, H, L, batch_first=True, dropout=dr)\n        self.ln = nn.LayerNorm(H); self.d = d\n    def forward(self, x):\n        x = x.reshape(len(x), self.d, -1).permute(0, 2, 1)\n        return self.ln(self.rnn(x)[0][:, -1, :])\n\nclass ConceptRankNet(nn.Module):\n    \"\"\"Three-channel decomposition: explicit concepts + implicit clusters + individual.\"\"\"\n    def __init__(self, d=6, H=64, L=2, dr=0.0):\n        super().__init__()\n        self.enc = GRUEnc(d, H, L, dr); self.H = H\n        self.Wq = nn.Linear(H,H); self.Wk = nn.Linear(H,H)\n        self.Wv = nn.Linear(H,H); self.Wi = nn.Linear(H,H)\n        self.fc_e = nn.Linear(H,1); self.fc_i = nn.Linear(H,1)\n        self.fc_r = nn.Linear(H,1); self.drop = nn.Dropout(dr)\n        for m in [self.Wq,self.Wk,self.Wv,self.Wi]: nn.init.xavier_uniform_(m.weight)\n    def forward(self, x, cm):\n        h = self.enc(x)\n        # Explicit: concept prototypes via membership-weighted avg, then cross-attention\n        cn = cm / cm.sum(0, keepdim=True).clamp(min=1)\n        protos = cn.t() @ h\n        Q,K,V = self.Wq(h), self.Wk(protos), self.Wv(protos)\n        att = (Q @ K.t()) / (self.H**0.5)\n        att = att.masked_fill(cm==0, float('-inf'))\n        hc = (cm.sum(1) > 0); aw = torch.zeros_like(att)\n        if hc.any(): aw[hc] = F.softmax(att[hc], dim=1)\n        he = self.drop(aw @ V); hr = h - he\n        # Implicit: stock-stock cosine attention in residual space\n        hi = torch.zeros_like(hr)\n        if h.shape[0] > 1:\n            hp = F.normalize(self.Wi(hr), dim=1); sim = hp @ hp.t()\n            sim.fill_diagonal_(float('-inf'))\n            hi = self.drop(F.softmax(sim * 10, dim=1) @ hr)\n        hind = hr - hi\n        pred = (self.fc_e(he)+self.fc_i(hi)+self.fc_r(hind)).squeeze(-1)\n        orth = (he*hi).mean()**2 + 
(hi*hind).mean()**2\n        return pred, orth\n\nclass CustomModel(Model):\n    def __init__(self):\n        super().__init__()\n        self.fitted = False; self.net = ConceptRankNet(d=6,H=64,L=2,dr=0.0).to(DEVICE)\n        self.n_ep=50; self.lr=5e-4; self.alpha=0.1; self.beta=0.01\n    @staticmethod\n    def _rl(p, y):\n        \"\"\"Differentiable pairwise rank loss approximating negative rank IC.\"\"\"\n        dp=p.unsqueeze(1)-p.unsqueeze(0); dy=y.unsqueeze(1)-y.unsqueeze(0)\n        return -torch.mean(torch.tanh(dp*5)*torch.sign(dy))\n    def _days(self, df):\n        dc=df.groupby(level=0).size().values; di=np.roll(np.cumsum(dc),1); di[0]=0\n        return di,dc,df[\"feature\"].values,df[\"label\"].values.ravel(),df[\"feature\"].index.get_level_values(\"instrument\")\n    def fit(self, dataset: DatasetH):\n        dft=dataset.prepare(\"train\",col_set=[\"feature\",\"label\"],data_key=DataHandlerLP.DK_L)\n        dfv=dataset.prepare(\"valid\",col_set=[\"feature\",\"label\"],data_key=DataHandlerLP.DK_L)\n        opt=optim.Adam(self.net.parameters(),lr=self.lr,weight_decay=1e-5)\n        best_ic,best_st,wait=-1e9,None,0\n        for ep in range(self.n_ep):\n            self.net.train(); di,dc,ft,lb,ins=self._days(dft)\n            perm=np.random.permutation(len(dc))\n            for idx in perm:\n                si,c=int(di[idx]),int(dc[idx]); sl=slice(si,si+c)\n                f,l=ft[sl],lb[sl]; mk=~(np.isnan(f).any(1)|np.isnan(l))\n                if mk.sum()<5: continue\n                it=ins[sl][mk]; cm=torch.tensor(get_concept_matrix(get_stock_index(it)),device=DEVICE)\n                x=torch.tensor(f[mk],dtype=torch.float32,device=DEVICE)\n                y=torch.tensor(l[mk],dtype=torch.float32,device=DEVICE)\n                pred,orth=self.net(x,cm)\n                loss=F.mse_loss(pred,y)+self.alpha*self._rl(pred,y)+self.beta*orth\n                opt.zero_grad(); loss.backward()\n                
torch.nn.utils.clip_grad_norm_(self.net.parameters(),1.0); opt.step()\n            self.net.eval(); ps,ls=[],[]; di,dc,ft,lb,ins=self._days(dfv)\n            with torch.no_grad():\n                for si,c in zip(di,dc):\n                    sl=slice(si,si+c); f,l=ft[sl],lb[sl]; mk=~(np.isnan(f).any(1)|np.isnan(l))\n                    if mk.sum()<2: continue\n                    it=ins[sl][mk]; cm=torch.tensor(get_concept_matrix(get_stock_index(it)),device=DEVICE)\n                    x=torch.tensor(f[mk],dtype=torch.float32,device=DEVICE)\n                    ps.append(self.net(x,cm)[0].cpu().numpy()); ls.append(l[mk])\n            ic=np.corrcoef(np.concatenate(ps),np.concatenate(ls))[0,1]\n            if ic>best_ic: best_ic,best_st,wait=ic,copy.deepcopy(self.net.state_dict()),0\n            else: wait+=1\n            if wait>=10: break\n        if best_st: self.net.load_state_dict(best_st)\n        self.fitted=True\n    def predict(self, dataset: DatasetH, segment=\"test\"):\n        if not self.fitted: raise ValueError(\"Not fitted\")\n        df=dataset.prepare(segment,col_set=[\"feature\",\"label\"],data_key=DataHandlerLP.DK_I)\n        self.net.eval(); preds=[]; dc=df.groupby(level=0).size().values\n        di=np.roll(np.cumsum(dc),1); di[0]=0\n        ft=df[\"feature\"].values; ins=df[\"feature\"].index.get_level_values(\"instrument\")\n        with torch.no_grad():\n            for si,c in zip(di,dc):\n                sl=slice(si,si+c); it=ins[sl]; cm=torch.tensor(get_concept_matrix(get_stock_index(it)),device=DEVICE)\n                preds.append(self.net(torch.tensor(np.nan_to_num(ft[sl],nan=0.0),dtype=torch.float32,device=DEVICE),cm)[0].cpu().numpy())\n        return pd.Series(np.concatenate(preds),index=df[\"feature\"].index,name=\"score\")"
7}
Tool Result (Step 1)
OK: Replaced lines 58..156 with 101 line(s) in qlib/custom_model.py. Editable range: 58–158.

[Current file: qlib/custom_model.py | editable: 58–158 | total: 158 lines]
    55|     return _stock2concept_matrix[stock_indices].astype(np.float32)
    56| 
    57| 
    58  import copy
    59  import torch.optim as optim
    60  
    61  class GRUEnc(nn.Module):
    62      def __init__(self, d=6, H=64, L=2, dr=0.0):
    63          super().__init__()
    64          self.rnn = nn.GRU(d, H, L, batch_first=True, dropout=dr)
    65          self.ln = nn.LayerNorm(H); self.d = d
    66      def forward(self, x):
    67          x = x.reshape(len(x), self.d, -1).permute(0, 2, 1)
    68          return self.ln(self.rnn(x)[0][:, -1, :])
    69  
    70  class ConceptRankNet(nn.Module):
    71      """Three-channel decomposition: explicit concepts + implicit clusters + individual."""
    72      def __init__(self, d=6, H=64, L=2, dr=0.0):
    73          super().__init__()
    74          self.enc
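The GRUEnc in the edit above assumes the flat d*T feature vector is laid out feature-major, i.e. d contiguous blocks of T timesteps each (the Alpha360-style layout; this layout is an assumption here). The reshape into a batch-first GRU sequence can be sanity-checked in isolation:

```python
import torch

# Flat feature-major vector per sample: d blocks of T timesteps each
# (layout assumed, e.g. [feat0_t0..feat0_t59, feat1_t0..feat1_t59, ...])
N, d, T = 4, 6, 60
flat = torch.arange(N * d * T, dtype=torch.float32).reshape(N, d * T)

# Same reshape as GRUEnc.forward: (N, d*T) -> (N, d, T) -> (N, T, d)
seq = flat.reshape(N, d, -1).permute(0, 2, 1)
print(seq.shape)  # torch.Size([4, 60, 6])
```

Each timestep row gathers one element from every feature block, which is exactly the (batch, seq_len, input_size) shape `nn.GRU(batch_first=True)` expects.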
Assistant (Step 2, test)
Tool Result (Step 2)
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)

### csi300 (scripts/csi300.sh)
Signal analysis (csi300):
SIGNAL_METRIC IC=0.043513
SIGNAL_METRIC ICIR=0.296567
SIGNAL_METRIC Rank_IC=0.059742
SIGNAL_METRIC Rank_ICIR=0.418778
Portfolio analysis (csi300):
PORTFOLIO_METRIC annualized_return=0.044671
PORTFOLIO_METRIC max_drawdown=-0.091236
PORTFOLIO_METRIC information_ratio=0.585033

### csi100 (scripts/csi100.sh)
Signal analysis (csi100):
SIGNAL_METRIC IC=0.043679
SIGNAL_METRIC ICIR=0.222914
SIGNAL_METRIC Rank_IC=0.057033
SIGNAL_METRIC Rank_ICIR=0.300553
Portfolio analysis (csi100):
PORTFOLIO_METRIC annualized_return=-0.013817
PORTFOLIO_METRIC max_drawdown=-0.196462
PORTFOLIO_METRIC information_ratio=-0.228593

[Leaderboard] Results saved: {'ic_csi300': 0.043513, 'icir_csi300': 0.296567, 'rank_ic_csi300': 0.059742, 'rank_icir_csi300': 0.418778, 'annualized_return_csi300': 0.044671, 'max_drawdown_csi300': -0.091236, 'information_ratio_csi300': 0.585033, 'elapsed_csi300
Assistant (Step 3, edit)
Tool Input
1{
2  "op": "replace",
3  "filename": "qlib/custom_model.py",
4  "start_line": 58,
5  "end_line": 158,
6  "content": "import copy\nimport torch.optim as optim\n\nclass GRUEnc(nn.Module):\n    def __init__(self, d=6, H=64, L=2, dr=0.0):\n        super().__init__()\n        self.rnn = nn.GRU(d, H, L, batch_first=True, dropout=dr)\n        self.ln = nn.LayerNorm(H); self.d = d\n    def forward(self, x):\n        x = x.reshape(len(x), self.d, -1).permute(0, 2, 1)\n        return self.ln(self.rnn(x)[0][:, -1, :])\n\nclass ConceptRankNet(nn.Module):\n    \"\"\"Three-channel decomposition: explicit concepts + implicit clusters + individual.\n    Novel: SiLU gating before output heads for smoother gradient flow.\"\"\"\n    def __init__(self, d=6, H=64, L=2, dr=0.0):\n        super().__init__()\n        self.enc = GRUEnc(d, H, L, dr); self.H = H\n        self.Wq = nn.Linear(H,H); self.Wk = nn.Linear(H,H)\n        self.Wv = nn.Linear(H,H); self.Wi = nn.Linear(H,H)\n        self.fc_e = nn.Linear(H,1); self.fc_i = nn.Linear(H,1)\n        self.fc_r = nn.Linear(H,1); self.drop = nn.Dropout(dr)\n        for m in [self.Wq,self.Wk,self.Wv,self.Wi]: nn.init.xavier_uniform_(m.weight)\n    def forward(self, x, cm):\n        h = self.enc(x)\n        # Explicit: concept prototypes via membership-weighted avg, then cross-attention\n        cn = cm / cm.sum(0, keepdim=True).clamp(min=1)\n        protos = cn.t() @ h\n        Q,K,V = self.Wq(h), self.Wk(protos), self.Wv(protos)\n        att = (Q @ K.t()) / (self.H**0.5)\n        att = att.masked_fill(cm==0, float('-inf'))\n        hc = (cm.sum(1) > 0); aw = torch.zeros_like(att)\n        if hc.any(): aw[hc] = F.softmax(att[hc], dim=1)\n        he = self.drop(aw @ V); hr = h - he\n        # Implicit: stock-stock cosine attention in residual space\n        hi = torch.zeros_like(hr)\n        if h.shape[0] > 1:\n            hp = F.normalize(self.Wi(hr), dim=1); sim = hp @ hp.t()\n            sim.fill_diagonal_(float('-inf'))\n            hi = self.drop(F.softmax(sim * 5, dim=1) @ hr)\n        hind = hr - hi\n        pred = 
(self.fc_e(F.silu(he))+self.fc_i(F.silu(hi))+self.fc_r(F.silu(hind))).squeeze(-1)\n        return pred\n\nclass CustomModel(Model):\n    \"\"\"Concept-Residual Decomposition with Concept-Conditioned Rank Loss (CCRL).\n    Novel: pairwise rank loss weighted by concept co-membership so that\n    stocks sharing more concepts contribute more to the ranking objective.\"\"\"\n    def __init__(self):\n        super().__init__()\n        self.fitted = False; self.net = ConceptRankNet(d=6,H=64,L=2,dr=0.0).to(DEVICE)\n        self.n_ep=50; self.lr=5e-4; self.alpha=0.15\n    @staticmethod\n    def _ccrl(p, y, cm):\n        \"\"\"Concept-Conditioned Rank Loss: pairwise rank weighted by concept co-membership.\"\"\"\n        dp = p.unsqueeze(1)-p.unsqueeze(0); dy = y.unsqueeze(1)-y.unsqueeze(0)\n        pairwise = torch.tanh(dp*3)*torch.sign(dy)\n        w = cm @ cm.t(); w = w / (w.sum(1,keepdim=True)+1e-8)\n        return -(pairwise * w).mean()\n    def _days(self, df):\n        dc=df.groupby(level=0).size().values; di=np.roll(np.cumsum(dc),1); di[0]=0\n        return di,dc,df[\"feature\"].values,df[\"label\"].values.ravel(),df[\"feature\"].index.get_level_values(\"instrument\")\n    def fit(self, dataset: DatasetH):\n        dft=dataset.prepare(\"train\",col_set=[\"feature\",\"label\"],data_key=DataHandlerLP.DK_L)\n        dfv=dataset.prepare(\"valid\",col_set=[\"feature\",\"label\"],data_key=DataHandlerLP.DK_L)\n        opt=optim.Adam(self.net.parameters(),lr=self.lr,weight_decay=1e-5)\n        best_ic,best_st,wait=-1e9,None,0\n        for ep in range(self.n_ep):\n            self.net.train(); di,dc,ft,lb,ins=self._days(dft)\n            perm=np.random.permutation(len(dc))\n            for idx in perm:\n                si,c=int(di[idx]),int(dc[idx]); sl=slice(si,si+c)\n                f,l=ft[sl],lb[sl]; mk=~(np.isnan(f).any(1)|np.isnan(l))\n                if mk.sum()<5: continue\n                it=ins[sl][mk]; 
cm=torch.tensor(get_concept_matrix(get_stock_index(it)),device=DEVICE)\n                x=torch.tensor(f[mk],dtype=torch.float32,device=DEVICE)\n                y=torch.tensor(l[mk],dtype=torch.float32,device=DEVICE)\n                pred=self.net(x,cm)\n                loss=F.mse_loss(pred,y)+self.alpha*self._ccrl(pred,y,cm)\n                opt.zero_grad(); loss.backward()\n                torch.nn.utils.clip_grad_norm_(self.net.parameters(),1.0); opt.step()\n            self.net.eval(); ps,ls=[],[]; di,dc,ft,lb,ins=self._days(dfv)\n            with torch.no_grad():\n                for si,c in zip(di,dc):\n                    sl=slice(si,si+c); f,l=ft[sl],lb[sl]; mk=~(np.isnan(f).any(1)|np.isnan(l))\n                    if mk.sum()<2: continue\n                    it=ins[sl][mk]; cm=torch.tensor(get_concept_matrix(get_stock_index(it)),device=DEVICE)\n                    x=torch.tensor(f[mk],dtype=torch.float32,device=DEVICE)\n                    ps.append(self.net(x,cm).cpu().numpy()); ls.append(l[mk])\n            ic=np.corrcoef(np.concatenate(ps),np.concatenate(ls))[0,1]\n            if ic>best_ic: best_ic,best_st,wait=ic,copy.deepcopy(self.net.state_dict()),0\n            else: wait+=1\n            if wait>=10: break\n        if best_st: self.net.load_state_dict(best_st)\n        self.fitted=True\n    def predict(self, dataset: DatasetH, segment=\"test\"):\n        if not self.fitted: raise ValueError(\"Not fitted\")\n        df=dataset.prepare(segment,col_set=[\"feature\",\"label\"],data_key=DataHandlerLP.DK_I)\n        self.net.eval(); preds=[]; dc=df.groupby(level=0).size().values\n        di=np.roll(np.cumsum(dc),1); di[0]=0\n        ft=df[\"feature\"].values; ins=df[\"feature\"].index.get_level_values(\"instrument\")\n        with torch.no_grad():\n            for si,c in zip(di,dc):\n                sl=slice(si,si+c); it=ins[sl]; cm=torch.tensor(get_concept_matrix(get_stock_index(it)),device=DEVICE)\n                
preds.append(self.net(torch.tensor(np.nan_to_num(ft[sl],nan=0.0),dtype=torch.float32,device=DEVICE),cm).cpu().numpy())\n        return pd.Series(np.concatenate(preds),index=df[\"feature\"].index,name=\"score\")"
7}
Tool Result (Step 3)
OK: Replaced lines 58..158 with 106 line(s) in qlib/custom_model.py. Editable range: 58–163.

[Current file: qlib/custom_model.py | editable: 58–163 | total: 163 lines]
    55|     return _stock2concept_matrix[stock_indices].astype(np.float32)
    56| 
    57| 
    58  import copy
    59  import torch.optim as optim
    60  
    61  class GRUEnc(nn.Module):
    62      def __init__(self, d=6, H=64, L=2, dr=0.0):
    63          super().__init__()
    64          self.rnn = nn.GRU(d, H, L, batch_first=True, dropout=dr)
    65          self.ln = nn.LayerNorm(H); self.d = d
    66      def forward(self, x):
    67          x = x.reshape(len(x), self.d, -1).permute(0, 2, 1)
    68          return self.ln(self.rnn(x)[0][:, -1, :])
    69  
    70  class ConceptRankNet(nn.Module):
    71      """Three-channel decomposition: explicit concepts + implicit clusters + individual.
    72      Novel: SiLU gating before output heads for smoother gradient flow."""
    73      def __init__(self, d=6, 
Assistant (Step 4, test)
Tool Result (Step 4)
[Test #2] (1 test remaining; call submit(n=2) to submit this result)

### csi300 (scripts/csi300.sh)
Signal analysis (csi300):
SIGNAL_METRIC IC=0.045688
SIGNAL_METRIC ICIR=0.319126
SIGNAL_METRIC Rank_IC=0.060312
SIGNAL_METRIC Rank_ICIR=0.421226
Portfolio analysis (csi300):
PORTFOLIO_METRIC annualized_return=0.063644
PORTFOLIO_METRIC max_drawdown=-0.075841
PORTFOLIO_METRIC information_ratio=0.850570

### csi100 (scripts/csi100.sh)
Signal analysis (csi100):
SIGNAL_METRIC IC=0.039548
SIGNAL_METRIC ICIR=0.196944
SIGNAL_METRIC Rank_IC=0.054978
SIGNAL_METRIC Rank_ICIR=0.282020
Portfolio analysis (csi100):
PORTFOLIO_METRIC annualized_return=-0.034753
PORTFOLIO_METRIC max_drawdown=-0.197888
PORTFOLIO_METRIC information_ratio=-0.592766

[Leaderboard] Results saved: {'ic_csi300': 0.045688, 'icir_csi300': 0.319126, 'rank_ic_csi300': 0.060312, 'rank_icir_csi300': 0.421226, 'annualized_return_csi300': 0.063644, 'max_drawdown_csi300': -0.075841, 'information_ratio_csi300': 0.85057, 'elapsed_csi300':
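The concept-conditioned rank loss introduced in step 3 can be exercised standalone. A sketch reproducing `_ccrl` (pairwise terms tanh((p_i-p_j)*3)*sign(y_i-y_j), weighted by row-normalized concept co-membership); the toy concept matrix placing all stocks in one shared concept is illustrative only:

```python
import torch

def ccrl(p, y, cm, temp=3.0):
    """Concept-Conditioned Rank Loss: pairwise rank-agreement terms
    tanh((p_i - p_j) * temp) * sign(y_i - y_j), weighted by the
    row-normalized concept co-membership matrix cm @ cm.T, and negated
    so that concordant rankings yield a lower (better) loss."""
    dp = p.unsqueeze(1) - p.unsqueeze(0)
    dy = y.unsqueeze(1) - y.unsqueeze(0)
    pairwise = torch.tanh(dp * temp) * torch.sign(dy)
    w = cm @ cm.t()
    w = w / (w.sum(1, keepdim=True) + 1e-8)
    return -(pairwise * w).mean()

y = torch.tensor([1.0, 0.0, -1.0])
cm = torch.ones(3, 1)      # toy case: all three stocks share one concept
good = ccrl(y, y, cm)      # predictions perfectly concordant with labels
bad = ccrl(-y, y, cm)      # predictions perfectly discordant
print(good.item(), bad.item())
```

Concordant predictions drive the weighted pairwise terms positive, so the negated loss goes below zero, while reversed predictions flip its sign; step 5's variant differs only in mixing this concept weighting 30/70 with a uniform pair weight.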
Assistant (Step 5, edit)
Tool Input
1{
2  "op": "replace",
3  "filename": "qlib/custom_model.py",
4  "start_line": 58,
5  "end_line": 163,
6  "content": "import copy\nimport torch.optim as optim\n\nclass GRUEnc(nn.Module):\n    def __init__(self, d=6, H=64, L=2, dr=0.0):\n        super().__init__()\n        self.rnn = nn.GRU(d, H, L, batch_first=True, dropout=dr)\n        self.ln = nn.LayerNorm(H); self.d = d\n    def forward(self, x):\n        x = x.reshape(len(x), self.d, -1).permute(0, 2, 1)\n        return self.ln(self.rnn(x)[0][:, -1, :])\n\nclass ConceptRankNet(nn.Module):\n    \"\"\"Three-channel decomposition: explicit concepts + implicit clusters + individual.\"\"\"\n    def __init__(self, d=6, H=64, L=2, dr=0.0):\n        super().__init__()\n        self.enc = GRUEnc(d, H, L, dr); self.H = H\n        self.Wq = nn.Linear(H,H); self.Wk = nn.Linear(H,H)\n        self.Wv = nn.Linear(H,H); self.Wi = nn.Linear(H,H)\n        self.fc_e = nn.Linear(H,1); self.fc_i = nn.Linear(H,1)\n        self.fc_r = nn.Linear(H,1); self.drop = nn.Dropout(dr)\n        for m in [self.Wq,self.Wk,self.Wv,self.Wi]: nn.init.xavier_uniform_(m.weight)\n    def forward(self, x, cm):\n        h = self.enc(x)\n        # Explicit: concept prototypes via membership-weighted avg, then cross-attention\n        cn = cm / cm.sum(0, keepdim=True).clamp(min=1)\n        protos = cn.t() @ h\n        Q,K,V = self.Wq(h), self.Wk(protos), self.Wv(protos)\n        att = (Q @ K.t()) / (self.H**0.5)\n        att = att.masked_fill(cm==0, float('-inf'))\n        hc = (cm.sum(1) > 0); aw = torch.zeros_like(att)\n        if hc.any(): aw[hc] = F.softmax(att[hc], dim=1)\n        he = self.drop(aw @ V); hr = h - he\n        # Implicit: stock-stock cosine attention in residual space\n        hi = torch.zeros_like(hr)\n        if h.shape[0] > 1:\n            hp = F.normalize(self.Wi(hr), dim=1); sim = hp @ hp.t()\n            sim.fill_diagonal_(float('-inf'))\n            hi = self.drop(F.softmax(sim * 10, dim=1) @ hr)\n        hind = hr - hi\n        pred = (self.fc_e(he)+self.fc_i(hi)+self.fc_r(hind)).squeeze(-1)\n        orth = (he*hi).mean()**2 + 
(hi*hind).mean()**2\n        return pred, orth\n\nclass CustomModel(Model):\n    \"\"\"Concept-Residual Decomposition with Concept-Modulated Pairwise Rank Loss.\n    Novel: rank loss pairs weighted by concept co-membership, interpolating\n    between global and local ranking for robust generalization.\"\"\"\n    def __init__(self):\n        super().__init__()\n        self.fitted = False; self.net = ConceptRankNet(d=6,H=64,L=2,dr=0.0).to(DEVICE)\n        self.n_ep=50; self.lr=5e-4; self.alpha=0.1; self.beta=0.01\n    @staticmethod\n    def _rl(p, y, cm):\n        \"\"\"Concept-Modulated Pairwise Rank Loss (CMPRL): 70% global + 30% concept.\"\"\"\n        dp=p.unsqueeze(1)-p.unsqueeze(0); dy=y.unsqueeze(1)-y.unsqueeze(0)\n        pw=torch.tanh(dp*5)*torch.sign(dy)\n        wc=cm@cm.t(); wc=wc/(wc.sum(1,keepdim=True)+1e-8)\n        n=p.shape[0]; wu=torch.ones(n,n,device=p.device)/n\n        return -(pw*(0.3*wc+0.7*wu)).mean()\n    def _days(self, df):\n        dc=df.groupby(level=0).size().values; di=np.roll(np.cumsum(dc),1); di[0]=0\n        return di,dc,df[\"feature\"].values,df[\"label\"].values.ravel(),df[\"feature\"].index.get_level_values(\"instrument\")\n    def fit(self, dataset: DatasetH):\n        dft=dataset.prepare(\"train\",col_set=[\"feature\",\"label\"],data_key=DataHandlerLP.DK_L)\n        dfv=dataset.prepare(\"valid\",col_set=[\"feature\",\"label\"],data_key=DataHandlerLP.DK_L)\n        opt=optim.Adam(self.net.parameters(),lr=self.lr,weight_decay=1e-5)\n        best_ic,best_st,wait=-1e9,None,0\n        for ep in range(self.n_ep):\n            self.net.train(); di,dc,ft,lb,ins=self._days(dft)\n            perm=np.random.permutation(len(dc))\n            for idx in perm:\n                si,c=int(di[idx]),int(dc[idx]); sl=slice(si,si+c)\n                f,l=ft[sl],lb[sl]; mk=~(np.isnan(f).any(1)|np.isnan(l))\n                if mk.sum()<5: continue\n                it=ins[sl][mk]; cm=torch.tensor(get_concept_matrix(get_stock_index(it)),device=DEVICE)\n 
               x=torch.tensor(f[mk],dtype=torch.float32,device=DEVICE)\n                y=torch.tensor(l[mk],dtype=torch.float32,device=DEVICE)\n                pred,orth=self.net(x,cm)\n                loss=F.mse_loss(pred,y)+self.alpha*self._rl(pred,y,cm)+self.beta*orth\n                opt.zero_grad(); loss.backward()\n                torch.nn.utils.clip_grad_norm_(self.net.parameters(),1.0); opt.step()\n            self.net.eval(); ps,ls=[],[]; di,dc,ft,lb,ins=self._days(dfv)\n            with torch.no_grad():\n                for si,c in zip(di,dc):\n                    sl=slice(si,si+c); f,l=ft[sl],lb[sl]; mk=~(np.isnan(f).any(1)|np.isnan(l))\n                    if mk.sum()<2: continue\n                    it=ins[sl][mk]; cm=torch.tensor(get_concept_matrix(get_stock_index(it)),device=DEVICE)\n                    x=torch.tensor(f[mk],dtype=torch.float32,device=DEVICE)\n                    ps.append(self.net(x,cm)[0].cpu().numpy()); ls.append(l[mk])\n            ic=np.corrcoef(np.concatenate(ps),np.concatenate(ls))[0,1]\n            if ic>best_ic: best_ic,best_st,wait=ic,copy.deepcopy(self.net.state_dict()),0\n            else: wait+=1\n            if wait>=10: break\n        if best_st: self.net.load_state_dict(best_st)\n        self.fitted=True\n    def predict(self, dataset: DatasetH, segment=\"test\"):\n        if not self.fitted: raise ValueError(\"Not fitted\")\n        df=dataset.prepare(segment,col_set=[\"feature\",\"label\"],data_key=DataHandlerLP.DK_I)\n        self.net.eval(); preds=[]; dc=df.groupby(level=0).size().values\n        di=np.roll(np.cumsum(dc),1); di[0]=0\n        ft=df[\"feature\"].values; ins=df[\"feature\"].index.get_level_values(\"instrument\")\n        with torch.no_grad():\n            for si,c in zip(di,dc):\n                sl=slice(si,si+c); it=ins[sl]; cm=torch.tensor(get_concept_matrix(get_stock_index(it)),device=DEVICE)\n                
preds.append(self.net(torch.tensor(np.nan_to_num(ft[sl],nan=0.0),dtype=torch.float32,device=DEVICE),cm)[0].cpu().numpy())\n        return pd.Series(np.concatenate(preds),index=df[\"feature\"].index,name=\"score\")"
7}
Tool Result (Step 5)
OK: Replaced lines 58..163 with 107 line(s) in qlib/custom_model.py. Editable range: 58–164.

[Current file: qlib/custom_model.py | editable: 58–164 | total: 164 lines]
    55|     return _stock2concept_matrix[stock_indices].astype(np.float32)
    56| 
    57| 
    58  import copy
    59  import torch.optim as optim
    60  
    61  class GRUEnc(nn.Module):
    62      def __init__(self, d=6, H=64, L=2, dr=0.0):
    63          super().__init__()
    64          self.rnn = nn.GRU(d, H, L, batch_first=True, dropout=dr)
    65          self.ln = nn.LayerNorm(H); self.d = d
    66      def forward(self, x):
    67          x = x.reshape(len(x), self.d, -1).permute(0, 2, 1)
    68          return self.ln(self.rnn(x)[0][:, -1, :])
    69  
    70  class ConceptRankNet(nn.Module):
    71      """Three-channel decomposition: explicit concepts + implicit clusters + individual."""
    72      def __init__(self, d=6, H=64, L=2, dr=0.0):
    73          super().__init__()
    74          self.enc
Assistant (Step 6, test)
Assistant (Step 1, edit)
Tool Input
1{
2  "op": "replace",
3  "filename": "qlib/custom_model.py",
4  "start_line": 58,
5  "end_line": 156,
6  "content": "# Concept-Guided Attention Network (CGAN) with orthogonal decomposition and IC loss\nimport copy\nimport torch.optim as optim\nclass GRUEnc(nn.Module):\n    def __init__(self, d=6, h=64, nl=2, dr=0.0):\n        super().__init__()\n        self.rnn = nn.GRU(d, h, nl, batch_first=True, dropout=dr); self.d = d\n    def forward(self, x):\n        x = x.reshape(len(x), self.d, -1).permute(0, 2, 1)\n        return self.rnn(x)[0][:, -1, :]\nclass CGAN(nn.Module):\n    \"\"\"Concept-Guided Attention Net: concept-biased attention + orthogonal decomposition.\"\"\"\n    def __init__(self, d=6, h=64, nl=2, dr=0.0):\n        super().__init__()\n        self.enc = GRUEnc(d, h, nl, dr); self.h = h\n        self.Wq = nn.Linear(h, h); self.Wk = nn.Linear(h, h); self.Wv = nn.Linear(h, h)\n        self.cb = nn.Parameter(torch.ones(1))\n        self.s2c = nn.Linear(h, h); self.c2s = nn.Linear(h, h)\n        self.fc_ind = nn.Linear(h, h); self.gate = nn.Linear(h * 3, h)\n        self.out = nn.Linear(h, 1); self.ln = nn.LayerNorm(h)\n        for m in [self.Wq, self.Wk, self.Wv, self.s2c, self.c2s, self.fc_ind, self.gate, self.out]:\n            nn.init.xavier_uniform_(m.weight); nn.init.zeros_(m.bias)\n    def forward(self, x, cm):\n        h = self.enc(x)\n        csim = F.normalize(cm, dim=1) @ F.normalize(cm, dim=1).t()\n        attn = F.softmax(self.Wq(h) @ self.Wk(h).t() / self.h**0.5 + self.cb * csim, dim=1)\n        h_a = attn @ self.Wv(h)\n        cw = cm / (cm.sum(0, keepdim=True) + 1e-8)\n        h_c = F.relu(self.c2s(cm @ F.relu(self.s2c(cw.t() @ h))))\n        h_i = F.relu(self.fc_ind(h - h_c))\n        g = torch.sigmoid(self.gate(torch.cat([h_i, h_c, h_a], 1)))\n        return self.out(self.ln(g * h_i + (1 - g) * (h_c + h_a) * 0.5)).squeeze(-1), h_i, h_c\nclass CustomModel(Model):\n    def __init__(self):\n        super().__init__()\n        self.net = CGAN().to(DEVICE); self.fitted = False\n    def _iter(self, df):\n        feat, lab = df[\"feature\"], 
df[\"label\"].values.ravel()\n        dc = df.groupby(level=0).size().values\n        di = np.roll(np.cumsum(dc), 1); di[0] = 0\n        for i, c in zip(di, dc):\n            s = slice(i, i + c)\n            f = torch.nan_to_num(torch.tensor(feat.values[s], dtype=torch.float32, device=DEVICE))\n            l = torch.tensor(lab[s], dtype=torch.float32, device=DEVICE)\n            inst = feat.index.get_level_values(\"instrument\")[s]\n            cm = torch.tensor(get_concept_matrix(get_stock_index(inst)), device=DEVICE)\n            m = ~torch.isnan(l)\n            if m.sum() < 3: continue\n            yield f, l, cm, m\n    def _ic(self, p, l):\n        pp, ll = p - p.mean(), l - l.mean()\n        return (pp * ll).sum() / (pp.norm() * ll.norm() + 1e-8)\n    def fit(self, dataset):\n        df_tr = dataset.prepare(\"train\", col_set=[\"feature\", \"label\"], data_key=DataHandlerLP.DK_L)\n        df_va = dataset.prepare(\"valid\", col_set=[\"feature\", \"label\"], data_key=DataHandlerLP.DK_L)\n        opt = optim.Adam(self.net.parameters(), lr=5e-4, weight_decay=1e-5)\n        best_ic, best_sd, wait = -1e9, None, 0\n        for ep in range(100):\n            self.net.train()\n            for f, l, cm, m in self._iter(df_tr):\n                p, hi, hc = self.net(f, cm)\n                loss = F.mse_loss(p[m], l[m])\n                loss = loss + 0.1 * (F.normalize(hi[m], dim=1) * F.normalize(hc[m], dim=1)).sum(1).pow(2).mean()\n                if m.sum() > 10: loss = loss - 0.5 * self._ic(p[m], l[m])\n                opt.zero_grad(); loss.backward()\n                nn.utils.clip_grad_norm_(self.net.parameters(), 1.0); opt.step()\n            self.net.eval(); ics = []\n            with torch.no_grad():\n                for f, l, cm, m in self._iter(df_va):\n                    p = self.net(f, cm)[0]\n                    if m.sum() > 10:\n                        v = self._ic(p[m], l[m])\n                        if not torch.isnan(v): ics.append(v.item())\n            
vic = np.mean(ics) if ics else -1\n            if vic > best_ic:\n                best_ic, best_sd, wait = vic, copy.deepcopy(self.net.state_dict()), 0\n            else:\n                wait += 1\n                if wait >= 15: break\n        if best_sd: self.net.load_state_dict(best_sd)\n        self.fitted = True\n    def predict(self, dataset, segment=\"test\"):\n        if not self.fitted: raise ValueError(\"Not fitted\")\n        df = dataset.prepare(segment, col_set=[\"feature\", \"label\"], data_key=DataHandlerLP.DK_I)\n        feat = df[\"feature\"]; self.net.eval(); preds = []\n        dc = df.groupby(level=0).size().values\n        di = np.roll(np.cumsum(dc), 1); di[0] = 0\n        with torch.no_grad():\n            for i, c in zip(di, dc):\n                s = slice(i, i + c)\n                f = torch.nan_to_num(torch.tensor(feat.values[s], dtype=torch.float32, device=DEVICE))\n                inst = feat.index.get_level_values(\"instrument\")[s]\n                cm = torch.tensor(get_concept_matrix(get_stock_index(inst)), device=DEVICE)\n                preds.append(self.net(f, cm)[0].cpu().numpy())\n        return pd.Series(np.concatenate(preds), index=feat.index, name=\"score\")\n"
7}
Tool Result (Step 1)
OK: Replaced lines 58..156 with 95 line(s) in qlib/custom_model.py. Editable range: 58–152.

[Current file: qlib/custom_model.py | editable: 58–152 | total: 152 lines]
    55|     return _stock2concept_matrix[stock_indices].astype(np.float32)
    56| 
    57| 
    58  # Concept-Guided Attention Network (CGAN) with orthogonal decomposition and IC loss
    59  import copy
    60  import torch.optim as optim
    61  class GRUEnc(nn.Module):
    62      def __init__(self, d=6, h=64, nl=2, dr=0.0):
    63          super().__init__()
    64          self.rnn = nn.GRU(d, h, nl, batch_first=True, dropout=dr); self.d = d
    65      def forward(self, x):
    66          x = x.reshape(len(x), self.d, -1).permute(0, 2, 1)
    67          return self.rnn(x)[0][:, -1, :]
    68  class CGAN(nn.Module):
    69      """Concept-Guided Attention Net: concept-biased attention + orthogonal decomposition."""
    70      def __init__(self, d=6, h=64, nl=2, dr=0.0):
    71          super().__init__()
    72 
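The `_iter` helper in the edit above batches the MultiIndex frame one trading day at a time; the offset arithmetic can be checked in isolation (toy per-day counts, not the CSI300 calendar):

```python
import numpy as np

# Per-day row counts, as produced by df.groupby(level=0).size().values
dc = np.array([3, 2, 4])  # e.g. 3 stocks on day 1, 2 on day 2, 4 on day 3

# Start offsets: cumulative counts rolled right by one, with 0 at the front
di = np.roll(np.cumsum(dc), 1)
di[0] = 0

slices = [slice(int(i), int(i + c)) for i, c in zip(di, dc)]
print(slices)  # [slice(0, 3, None), slice(3, 5, None), slice(5, 9, None)]
```

Each slice covers exactly one day's contiguous block, which is what lets the model attend over the full cross-section of stocks per day.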
Assistant (Step 2): test
Tool Result (Step 2)
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)

### csi300 (scripts/csi300.sh)
Signal analysis (csi300):
SIGNAL_METRIC IC=0.045467
SIGNAL_METRIC ICIR=0.348241
SIGNAL_METRIC Rank_IC=0.057171
SIGNAL_METRIC Rank_ICIR=0.413410
Portfolio analysis (csi300):
PORTFOLIO_METRIC annualized_return=0.037171
PORTFOLIO_METRIC max_drawdown=-0.098496
PORTFOLIO_METRIC information_ratio=0.478468

### csi100 (scripts/csi100.sh)
Signal analysis (csi100):
SIGNAL_METRIC IC=0.037466
SIGNAL_METRIC ICIR=0.210051
SIGNAL_METRIC Rank_IC=0.049303
SIGNAL_METRIC Rank_ICIR=0.259323
Portfolio analysis (csi100):
PORTFOLIO_METRIC annualized_return=0.002680
PORTFOLIO_METRIC max_drawdown=-0.209819
PORTFOLIO_METRIC information_ratio=0.042604

[Leaderboard] Results saved: {'ic_csi300': 0.045467, 'icir_csi300': 0.348241, 'rank_ic_csi300': 0.057171, 'rank_icir_csi300': 0.41341, 'annualized_return_csi300': 0.037171, 'max_drawdown_csi300': -0.098496, 'information_ratio_csi300': 0.478468, 'elapsed_csi300': 
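The portfolio numbers above come from qlib's TopkDropout strategy (hold the top-k scored stocks, replace the n_drop weakest holdings at each rebalance). A deliberately simplified toy of one rebalance step, with hypothetical scores; this is not qlib's actual implementation, which also handles tradability and turnover:

```python
import pandas as pd

topk, n_drop = 3, 1                     # the benchmark uses top 50, drop 5
scores = pd.Series({"s1": 0.9, "s2": 0.8, "s3": 0.1, "s4": 0.7, "s5": 0.6})
held = ["s1", "s2", "s3"]               # current holdings (len == topk)

sell = scores[held].nsmallest(n_drop).index.tolist()     # weakest holdings
buy = scores.drop(held).nlargest(n_drop).index.tolist()  # best names not held
new_held = sorted((set(held) - set(sell)) | set(buy))
print(new_held)  # ['s1', 's2', 's4']
```

Because only n_drop names turn over per rebalance, the strategy rewards score stability across days, which is one reason IC alone does not fully predict the portfolio metrics.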
Assistant (Step 3): edit
Tool Input
1{
2  "op": "replace",
3  "filename": "qlib/custom_model.py",
4  "start_line": 58,
5  "end_line": 152,
6  "content": "# Concept-Residual Decomposition Network with IC-augmented training\nimport copy\nimport torch.optim as optim\nclass CRDNet(nn.Module):\n    \"\"\"Decomposes stock returns into explicit-concept, implicit-peer, and individual components.\n    Novel: iterative residual decomposition + orthogonal regularization + IC loss.\"\"\"\n    def __init__(self, d=6, h=64, nl=2):\n        super().__init__()\n        self.rnn = nn.GRU(d, h, nl, batch_first=True); self.d = d; self.h = h\n        # Explicit concept: stock -> concept -> stock\n        self.fc_s2c = nn.Linear(h, h); self.fc_c2s = nn.Linear(h, h)\n        # Implicit peer: self-attention on residual with separate Q, K, V\n        self.fc_q = nn.Linear(h, h); self.fc_k = nn.Linear(h, h); self.fc_v = nn.Linear(h, h)\n        # Individual component\n        self.fc_ind = nn.Linear(h, h)\n        # Output prediction head\n        self.fc_out = nn.Linear(h, 1)\n        for m in self.modules():\n            if isinstance(m, nn.Linear):\n                nn.init.xavier_uniform_(m.weight); nn.init.zeros_(m.bias)\n    def forward(self, x, cm):\n        x = x.reshape(len(x), self.d, -1).permute(0, 2, 1)\n        h = self.rnn(x)[0][:, -1, :]  # [N, H]\n        # Stage 1: Explicit concept component\n        cw = cm / (cm.sum(0, keepdim=True) + 1e-8)\n        h_es = F.relu(self.fc_c2s(cm @ F.relu(self.fc_s2c(cw.t() @ h))))\n        # Stage 2: Implicit peer component via attention on residual\n        res = h - h_es\n        q, k = self.fc_q(res), self.fc_k(res)\n        attn = F.softmax(q @ k.t() / self.h**0.5, dim=1)\n        h_is = F.relu(self.fc_v(attn @ res))\n        # Stage 3: Individual stock-specific residual\n        h_ind = F.relu(self.fc_ind(h - h_es - h_is))\n        pred = self.fc_out(h_es + h_is + h_ind).squeeze(-1)\n        return pred, h_ind, h_es + h_is\nclass CustomModel(Model):\n    def __init__(self):\n        super().__init__()\n        self.net = CRDNet().to(DEVICE); self.fitted = False\n    def 
_iter(self, df):\n        feat, lab = df[\"feature\"], df[\"label\"].values.ravel()\n        dc = df.groupby(level=0).size().values\n        di = np.roll(np.cumsum(dc), 1); di[0] = 0\n        for i, c in zip(di, dc):\n            s = slice(i, i + c)\n            f = torch.nan_to_num(torch.tensor(feat.values[s], dtype=torch.float32, device=DEVICE))\n            l = torch.tensor(lab[s], dtype=torch.float32, device=DEVICE)\n            inst = feat.index.get_level_values(\"instrument\")[s]\n            cm = torch.tensor(get_concept_matrix(get_stock_index(inst)), device=DEVICE)\n            m = ~torch.isnan(l)\n            if m.sum() < 5: continue\n            yield f, l, cm, m\n    def _ic(self, p, l):\n        pp, ll = p - p.mean(), l - l.mean()\n        return (pp * ll).sum() / (pp.norm() * ll.norm() + 1e-8)\n    def fit(self, dataset):\n        df_tr = dataset.prepare(\"train\", col_set=[\"feature\", \"label\"], data_key=DataHandlerLP.DK_L)\n        df_va = dataset.prepare(\"valid\", col_set=[\"feature\", \"label\"], data_key=DataHandlerLP.DK_L)\n        opt = optim.Adam(self.net.parameters(), lr=2e-4, weight_decay=1e-5)\n        best_ic, best_sd, wait = -1e9, None, 0\n        for ep in range(200):\n            self.net.train()\n            for f, l, cm, m in self._iter(df_tr):\n                p, hi, hg = self.net(f, cm)\n                loss = F.mse_loss(p[m], l[m])\n                # Orthogonal regularization: decorrelate individual vs shared\n                loss += 0.05 * (F.normalize(hi[m], dim=1) * F.normalize(hg[m], dim=1)).sum(1).pow(2).mean()\n                # IC loss: direct ranking quality optimization\n                if m.sum() > 20: loss -= 0.1 * self._ic(p[m], l[m])\n                opt.zero_grad(); loss.backward()\n                nn.utils.clip_grad_norm_(self.net.parameters(), 1.0); opt.step()\n            self.net.eval(); ics = []\n            with torch.no_grad():\n                for f, l, cm, m in self._iter(df_va):\n                    p = 
self.net(f, cm)[0]\n                    if m.sum() > 20:\n                        v = self._ic(p[m], l[m])\n                        if not torch.isnan(v): ics.append(v.item())\n            vic = np.mean(ics) if ics else -1\n            if vic > best_ic:\n                best_ic, best_sd, wait = vic, copy.deepcopy(self.net.state_dict()), 0\n            else:\n                wait += 1\n                if wait >= 20: break\n        if best_sd: self.net.load_state_dict(best_sd)\n        self.fitted = True\n    def predict(self, dataset, segment=\"test\"):\n        if not self.fitted: raise ValueError(\"Not fitted\")\n        df = dataset.prepare(segment, col_set=[\"feature\", \"label\"], data_key=DataHandlerLP.DK_I)\n        feat = df[\"feature\"]; self.net.eval(); preds = []\n        dc = df.groupby(level=0).size().values\n        di = np.roll(np.cumsum(dc), 1); di[0] = 0\n        with torch.no_grad():\n            for i, c in zip(di, dc):\n                s = slice(i, i + c)\n                f = torch.nan_to_num(torch.tensor(feat.values[s], dtype=torch.float32, device=DEVICE))\n                inst = feat.index.get_level_values(\"instrument\")[s]\n                cm = torch.tensor(get_concept_matrix(get_stock_index(inst)), device=DEVICE)\n                preds.append(self.net(f, cm)[0].cpu().numpy())\n        return pd.Series(np.concatenate(preds), index=feat.index, name=\"score\")\n"
7}
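The `_ic` term in the payload above is a differentiable Pearson correlation, subtracted from the loss so gradient steps push predictions toward cross-sectional alignment with the labels. A numpy transcription of the same formula (the model uses torch; numpy here only to keep the sketch dependency-light):

```python
import numpy as np

def ic(p, l, eps=1e-8):
    # Centered dot product over the product of norms: Pearson correlation
    pp, ll = p - p.mean(), l - l.mean()
    return float((pp * ll).sum() / (np.linalg.norm(pp) * np.linalg.norm(ll) + eps))

p = np.array([0.1, 0.2, 0.3, 0.4])   # predictions
l = np.array([1.0, 2.0, 3.0, 4.0])   # labels, perfectly linearly related
print(round(ic(p, l), 4))  # 1.0
```

The eps in the denominator is what makes the term safe to backpropagate through on near-constant prediction batches, and the `m.sum() > 20` guard in the edit keeps it off tiny cross-sections where the correlation estimate is noisy.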
Tool Result (Step 3)
OK: Replaced lines 58..152 with 100 line(s) in qlib/custom_model.py. Editable range: 58–157.

[Current file: qlib/custom_model.py | editable: 58–157 | total: 157 lines]
    55|     return _stock2concept_matrix[stock_indices].astype(np.float32)
    56| 
    57| 
    58  # Concept-Residual Decomposition Network with IC-augmented training
    59  import copy
    60  import torch.optim as optim
    61  class CRDNet(nn.Module):
    62      """Decomposes stock returns into explicit-concept, implicit-peer, and individual components.
    63      Novel: iterative residual decomposition + orthogonal regularization + IC loss."""
    64      def __init__(self, d=6, h=64, nl=2):
    65          super().__init__()
    66          self.rnn = nn.GRU(d, h, nl, batch_first=True); self.d = d; self.h = h
    67          # Explicit concept: stock -> concept -> stock
    68          self.fc_s2c = nn.Linear(h, h); self.fc_c2s = nn.Linear(h, h)
    69          # Implicit peer: self-attention on residual with 
Assistant (Step 4): test
Tool Result (Step 4)
[Test #2] (1 test remaining; call submit(n=2) to submit this result)

### csi300 (scripts/csi300.sh)
Signal analysis (csi300):
SIGNAL_METRIC IC=0.031812
SIGNAL_METRIC ICIR=0.205669
SIGNAL_METRIC Rank_IC=0.043688
SIGNAL_METRIC Rank_ICIR=0.266492
Portfolio analysis (csi300):
PORTFOLIO_METRIC annualized_return=0.005508
PORTFOLIO_METRIC max_drawdown=-0.136220
PORTFOLIO_METRIC information_ratio=0.055631

### csi100 (scripts/csi100.sh)
Signal analysis (csi100):
SIGNAL_METRIC IC=0.043438
SIGNAL_METRIC ICIR=0.222726
SIGNAL_METRIC Rank_IC=0.052947
SIGNAL_METRIC Rank_ICIR=0.269370
Portfolio analysis (csi100):
PORTFOLIO_METRIC annualized_return=-0.000013
PORTFOLIO_METRIC max_drawdown=-0.184543
PORTFOLIO_METRIC information_ratio=-0.000206

[Leaderboard] Results saved: {'ic_csi300': 0.031812, 'icir_csi300': 0.205669, 'rank_ic_csi300': 0.043688, 'rank_icir_csi300': 0.266492, 'annualized_return_csi300': 0.005508, 'max_drawdown_csi300': -0.13622, 'information_ratio_csi300': 0.055631, 'elapsed_csi300':