Agent Conversation: gpt-5.4-pro
Model: gpt-5.4-pro
Total steps: 6
Messages: 13
User (Step 0)
Task: quant-stock-prediction
Quantitative Stock Prediction on CSI300
Objective
Design and implement a stock prediction model that forecasts next-day returns for CSI300 stocks. Your code goes in custom_model.py. Three reference implementations (LightGBM, LSTM, Transformer) are provided as read-only.
Evaluation
Signal quality: IC, ICIR, Rank IC, and Rank ICIR. Portfolio (TopkDropout, top 50, drop 5): Annualized Return, Max Drawdown, Information Ratio. Evaluation runs automatically via qlib's workflow.
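For reference, the IC family of metrics can be reproduced offline; a minimal pandas sketch (the `daily_ic` helper is my own illustration, not part of qlib) over a (date, instrument)-indexed prediction/label pair:

```python
import pandas as pd

def daily_ic(pred: pd.Series, label: pd.Series, rank: bool = False) -> pd.Series:
    # One correlation per trading day; Rank IC is the Spearman variant.
    df = pd.DataFrame({"pred": pred, "label": label})
    method = "spearman" if rank else "pearson"
    return df.groupby(level=0).apply(lambda d: d["pred"].corr(d["label"], method=method))

idx = pd.MultiIndex.from_product(
    [["2020-01-02", "2020-01-03"], list("abc")], names=["datetime", "instrument"]
)
pred = pd.Series([0.3, 0.1, 0.2, 0.5, 0.4, 0.6], index=idx)
label = pd.Series([0.03, 0.01, 0.02, 0.05, 0.06, 0.04], index=idx)
ic = daily_ic(pred, label)  # perfectly right on day 1, perfectly wrong on day 2
```

ICIR then divides the mean daily IC by its standard deviation (`ic.mean() / ic.std()`), which is why a modest but stable signal can still score well.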
Workflow Configuration
workflow_config.yaml lines 13-25 and 31-44 are editable. These cover the model block and the input-adapter/preprocessor block: you may change the dataset class (e.g., to TSDatasetH) or the processors if your model needs a different input view. Instruments, date ranges, train/valid/test splits, and evaluation settings are fixed.
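For illustration, a hedged sketch of what the editable dataset block might look like after switching to TSDatasetH (the `step_len` value here is an assumption for the example, not taken from the task):

```yaml
  dataset:
    class: TSDatasetH              # time-series view for temporal models
    module_path: qlib.data.dataset
    kwargs:
      step_len: 20                 # assumed lookback window; tune to your model
      handler:
        class: Alpha360
        module_path: qlib.contrib.data.handler
```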
qlib/custom_model.py [EDITABLE — lines 16–103 only]
1: # Custom stock prediction model for MLS-Bench
2: #
3: # EDITABLE section: CustomModel class with fit() and predict() methods.
4: # FIXED sections: imports below.
5: import numpy as np
6: import pandas as pd
7: import torch
8: import torch.nn as nn
9: import torch.nn.functional as F
10: from qlib.model.base import Model
11: from qlib.data.dataset import DatasetH
12: from qlib.data.dataset.handler import DataHandlerLP
13:
14: DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
15:
16: # =====================================================================
17: # EDITABLE: CustomModel — implement your stock prediction model here
18: # =====================================================================
19: class CustomModel(Model):
20: """Custom stock prediction model.
21:
22: You must implement:
23: fit(dataset) — train the model on the training data
24: predict(dataset, segment="test") — return predictions as pd.Series
25:
26: The dataset is a qlib DatasetH with Alpha360 features (360 features per
27: stock per day). The 360 features come from 6 base features
28: (open/close/high/low/volume/vwap ratios) x 60 days of history.
29:
30: For temporal models, features can be reshaped:
31: x.reshape(N, 6, 60).permute(0, 2, 1) -> [N, 60, 6]
32: giving 60 time steps of 6 features each.
33:
34: Segments: "train", "valid", "test".
35:
36: Getting data from the dataset:
37: df_train = dataset.prepare("train", col_set=["feature", "label"],
38: data_key=DataHandlerLP.DK_L)
39: features = df_train["feature"] # DataFrame: (n_samples, 360)
40: labels = df_train["label"] # DataFrame: (n_samples, 1)
41:
42: The label is: Ref($close, -2) / Ref($close, -1) - 1
43: (i.e., the return from T+1 to T+2, predicted at time T)
44:
45: predic

Assistant (Step 1): edit
Tool Input
1{
2 "op": "replace",
3 "filename": "qlib/custom_model.py",
4 "content": "class CustomModel(Model):\n \"\"\"RICE: Recency-weighted IC optimization with contrastive extremes.\n\n Linear scorer over Alpha360 features, but trained for the actual downstream\n objective: cross-sectional ranking. The algorithm combines:\n 1) daily IC maximization,\n 2) tail pairwise ranking between top/bottom label buckets,\n 3) recency weighting for regime adaptation,\n 4) feature normalization across both time-within-stock and date cross-section.\n \"\"\"\n\n def __init__(self):\n super().__init__()\n self.device = torch.device(DEVICE)\n self.epochs, self.batch_days, self.lr = 10, 32, 0.03\n self.tail_k, self.tail_w, self.reg = 12, 0.25, 1e-4\n self.coef_, self.bias_, self.fitted = None, 0.0, False\n\n def _transform(self, feat: pd.DataFrame):\n x = np.nan_to_num(\n feat.values.astype(\"float32\"), nan=0.0, posinf=0.0, neginf=0.0\n )\n x = x.reshape(len(x), 6, 60)\n mu = x.mean(axis=2, keepdims=True)\n sd = x.std(axis=2, keepdims=True) + 1e-5\n x = (0.65 * x + 0.35 * (x - mu) / sd).reshape(len(x), -1)\n dates = feat.index.get_level_values(0).values\n starts = np.r_[0, np.flatnonzero(dates[1:] != dates[:-1]) + 1]\n counts = np.diff(np.r_[starts, len(dates)])\n groups = list(zip(starts.tolist(), counts.tolist()))\n for st, ct in groups:\n sl = slice(st, st + ct)\n xm = x[sl].mean(axis=0, keepdims=True)\n xs = x[sl].std(axis=0, keepdims=True) + 1e-4\n x[sl] = (x[sl] - xm) / xs\n return x, feat.index, groups\n\n def _day_loss(self, score, label):\n s = score - score.mean()\n y = label - label.mean()\n corr = (s * y).mean() / (\n torch.sqrt((s.square()).mean() + 1e-6)\n * torch.sqrt((y.square()).mean() + 1e-6)\n )\n sz = s / torch.sqrt((s.square()).mean() + 1e-6)\n yz = y / torch.sqrt((y.square()).mean() + 1e-6)\n point = (F.smooth_l1_loss(sz, yz, reduction=\"none\") * (1.0 + y.abs())).mean()\n if len(label) >= 6:\n k = max(3, min(self.tail_k, len(label) // 8))\n idx = torch.argsort(label)\n gap = score[idx[-k:]].unsqueeze(1) - score[idx[:k]].unsqueeze(0)\n rank = F.softplus(-gap).mean()\n else:\n rank = 0.0 * point\n return -corr + 0.2 * point + self.tail_w * rank\n\n def _eval_ic(self, x, y, groups, coef, bias):\n pred = x @ coef + bias\n vals = []\n for st, ct in groups:\n p = pred[st : st + ct]\n t = y[st : st + ct]\n p = p - p.mean()\n t = t - t.mean()\n vals.append((p * t).mean() / (np.sqrt((p * p).mean() * (t * t).mean()) + 1e-8))\n return float(np.mean(vals))\n\n def fit(self, dataset: DatasetH):\n np.random.seed(0)\n torch.manual_seed(0)\n df_train, df_valid = dataset.prepare(\n [\"train\", \"valid\"], col_set=[\"feature\", \"label\"], data_key=DataHandlerLP.DK_L\n )\n x_tr, _, g_tr = self._transform(df_train[\"feature\"])\n y_tr = df_train[\"label\"].values.reshape(-1).astype(\"float32\")\n x_va, _, g_va = self._transform(df_valid[\"feature\"])\n y_va = df_valid[\"label\"].values.reshape(-1).astype(\"float32\")\n\n w0 = (x_tr * y_tr[:, None]).mean(axis=0)\n w = nn.Parameter(torch.tensor(w0, dtype=torch.float32, device=self.device))\n b = nn.Parameter(torch.zeros(1, dtype=torch.float32, device=self.device))\n opt = torch.optim.Adam([w, b], lr=self.lr)\n day_w = np.exp(np.linspace(-1.0, 0.0, len(g_tr))).astype(\"float32\")\n best_ic, best = -1e9, None\n patience = 0\n\n for _ in range(self.epochs):\n order = np.random.permutation(len(g_tr))\n for i in range(0, len(order), self.batch_days):\n opt.zero_grad()\n loss = torch.zeros((), device=self.device)\n norm = torch.zeros((), device=self.device)\n for d in order[i : i + self.batch_days]:\n st, ct = g_tr[d]\n xb = torch.tensor(x_tr[st : st + ct], device=self.device)\n yb = torch.tensor(y_tr[st : st + ct], device=self.device)\n loss = loss + day_w[d] * self._day_loss(xb @ w + b, yb)\n norm = norm + day_w[d]\n (loss / norm + self.reg * w.square().mean()).backward()\n opt.step()\n coef = w.detach().cpu().numpy()\n vic = self._eval_ic(x_va, y_va, g_va, coef, float(b.item()))\n if vic > best_ic:\n best_ic, best, patience = vic, (coef.copy(), float(b.item())), 0\n else:\n patience += 1\n if patience >= 3:\n break\n\n self.coef_, self.bias_ = best\n self.fitted = True\n\n def predict(self, dataset: DatasetH, segment=\"test\"):\n if not self.fitted:\n raise ValueError(\"Model is not fitted yet!\")\n feat = dataset.prepare(segment, col_set=\"feature\", data_key=DataHandlerLP.DK_I)\n x, index, groups = self._transform(feat)\n pred = x @ self.coef_ + self.bias_\n for st, ct in groups:\n sl = slice(st, st + ct)\n p = pred[sl]\n pred[sl] = (p - p.mean()) / (p.std() + 1e-6)\n return pd.Series(pred, index=index, name=\"score\")",
5 "after_line": 18,
6 "start_line": 19,
7 "end_line": 103
8}

Tool Result (Step 1)
OK: Replaced lines 19..103 with 123 line(s) in qlib/custom_model.py. Editable range: 16–141.
[Current file: qlib/custom_model.py | editable: 16–141 | total: 141 lines]
13|
14| DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
15|
16 # =====================================================================
17 # EDITABLE: CustomModel — implement your stock prediction model here
18 # =====================================================================
19 class CustomModel(Model):
20 """RICE: Recency-weighted IC optimization with contrastive extremes.
21
22 Linear scorer over Alpha360 features, but trained for the actual downstream
23 objective: cross-sectional ranking. The algorithm combines:
24 1) daily IC maximization,
25 2) tail pairwise ranking between top/bottom label buckets,
26 3) recency weighting for regime adaptation,
27 4) feature normalization across both time-within-st
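For temporal models, the Alpha360 reshape described in the original docstring (6 base features x 60 days of history) can be sanity-checked in isolation; a minimal numpy sketch with toy data:

```python
import numpy as np

N = 4  # samples (stock-days)
flat = np.arange(N * 360, dtype="float32").reshape(N, 360)  # flat Alpha360 rows

# 360 columns = 6 base features x 60 days of history; transpose to [N, 60, 6],
# i.e. 60 time steps of 6 features each, as the docstring suggests.
seq = flat.reshape(N, 6, 60).transpose(0, 2, 1)

assert seq.shape == (4, 60, 6)
# feature f at day t of sample n comes from flat column f * 60 + t
assert seq[0, 5, 2] == flat[0, 2 * 60 + 5]
```

The same layout assumption underlies the per-feature normalization in `_transform`, which computes statistics along the 60-day axis before flattening back.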
Assistant (Step 2): test
Tool Result (Step 2)
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)
### csi300 (scripts/csi300.sh)
Signal analysis (csi300):
SIGNAL_METRIC IC=0.006807
SIGNAL_METRIC ICIR=0.034698
SIGNAL_METRIC Rank_IC=0.025524
SIGNAL_METRIC Rank_ICIR=0.131924
Portfolio analysis (csi300):
PORTFOLIO_METRIC annualized_return=-0.055174
PORTFOLIO_METRIC max_drawdown=-0.356952
PORTFOLIO_METRIC information_ratio=-0.491327
### csi100 (scripts/csi100.sh)
Signal analysis (csi100):
SIGNAL_METRIC IC=0.005939
SIGNAL_METRIC ICIR=0.025505
SIGNAL_METRIC Rank_IC=0.017858
SIGNAL_METRIC Rank_ICIR=0.079583
Portfolio analysis (csi100):
PORTFOLIO_METRIC annualized_return=-0.095656
PORTFOLIO_METRIC max_drawdown=-0.340357
PORTFOLIO_METRIC information_ratio=-1.411316
[Leaderboard] Results saved: {'ic_csi300': 0.006807, 'icir_csi300': 0.034698, 'rank_ic_csi300': 0.025524, 'rank_icir_csi300': 0.131924, 'annualized_return_csi300': -0.055174, 'max_drawdown_csi300': -0.356952, 'information_ratio_csi300': -0.491327, 'elapsed_cs
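The portfolio numbers above come from qlib's TopkDropout strategy (top 50, drop 5). As rough intuition only, a toy sketch of the daily selection rule, ignoring costs, trade limits, and qlib's actual implementation details:

```python
def topk_dropout_step(scores: dict, holdings: set, k: int = 50, n_drop: int = 5) -> set:
    """Toy approximation: drop the n_drop worst-ranked holdings, refill to k."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # best score first
    held_ranked = [s for s in ranked if s in holdings]
    keep = set(held_ranked[: max(len(held_ranked) - n_drop, 0)])
    for s in ranked:  # buy best-ranked names until back at k positions
        if len(keep) >= k:
            break
        keep.add(s)
    return keep

scores = {"a": 6, "b": 5, "c": 4, "d": 3, "e": 2, "f": 1}
new_holdings = topk_dropout_step(scores, holdings={"d", "e", "f"}, k=3, n_drop=1)
# -> keeps d and e, drops f, buys a
```

Because a few names turn over every day, a signal that ranks the cross-section well (high Rank IC) matters more here than accurate return magnitudes.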
Assistant (Step 3): edit
Tool Input
1{
2 "op": "replace",
3 "filename": "qlib/workflow_config.yaml",
4 "content": " infer_processors: []",
5 "after_line": 30,
6 "start_line": 31,
7 "end_line": 38
8}

Tool Result (Step 3)
OK: Replaced lines 31..38 with 1 line(s) in qlib/workflow_config.yaml. Editable range: 13–25, 31–37.
[Current file: qlib/workflow_config.yaml | editable: 13–25 | total: 75 lines]
10| - "." # So custom_model.py is importable via module_path
11|
12| task:
13 model:
14 class: CustomModel
15 module_path: custom_model
16 kwargs: {}
17
18 dataset:
19 class: DatasetH
20 module_path: qlib.data.dataset
21 kwargs:
22 handler:
23 class: Alpha360
24 module_path: qlib.contrib.data.handler
25 kwargs:
26| start_time: "2008-01-01"
27| end_time: "2020-08-01"
28| fit_start_time: "2008-01-01"

Assistant (Step 4): edit
Tool Input
1{
2 "op": "replace",
3 "filename": "qlib/custom_model.py",
4 "content": "class CustomModel(Model):\n \"\"\"DARTS-GBM: Dual-Objective Adaptive Rank-Temporal Smoothing.\n\n Novel contribution:\n - pointwise regression tree for return magnitude\n - per-date LambdaRank tree on discretized cross-sectional relevance\n - recency-decayed sample weighting for regime adaptation\n - validation-selected blending of the two objectives\n - causal per-instrument EMA smoothing to reduce turnover noise\n \"\"\"\n\n def __init__(self):\n super().__init__()\n self.num_boost_round = 500\n self.early_stopping_rounds = 50\n self.rank_weight = 0.5\n self.ema_alpha = 1.0\n base = {\n \"colsample_bytree\": 0.8879,\n \"learning_rate\": 0.0421,\n \"subsample\": 0.8789,\n \"lambda_l1\": 205.6999,\n \"lambda_l2\": 580.9768,\n \"max_depth\": 8,\n \"num_leaves\": 210,\n \"num_threads\": 20,\n \"verbosity\": -1,\n \"force_col_wise\": True,\n }\n self.reg_params = {**base, \"objective\": \"mse\", \"metric\": \"l2\"}\n self.rank_params = {\n **base,\n \"objective\": \"lambdarank\",\n \"metric\": \"ndcg\",\n \"eval_at\": [10, 20, 50],\n \"label_gain\": list(range(16)),\n }\n self.reg_model = None\n self.rank_model = None\n\n def _groups(self, index):\n dates = index.get_level_values(0).values\n starts = np.r_[0, np.flatnonzero(dates[1:] != dates[:-1]) + 1]\n counts = np.diff(np.r_[starts, len(index)])\n return list(zip(starts.tolist(), counts.tolist()))\n\n def _prepare(self, df):\n feat = df[\"feature\"]\n label = df[\"label\"]\n y = label.values.reshape(-1).astype(\"float32\")\n mask = np.isfinite(y)\n x = feat.values[mask].astype(\"float32\")\n y = y[mask]\n index = feat.index[mask]\n return x, y, index, self._groups(index)\n\n def _date_weights(self, groups):\n day_w = np.exp(np.linspace(np.log(0.35), 0.0, len(groups))).astype(\"float32\")\n return np.repeat(day_w, [ct for _, ct in groups])\n\n def _rank_labels(self, y, groups):\n rel = np.zeros(len(y), dtype=np.int32)\n bins = np.array([0.08, 0.2, 0.35, 0.5, 0.65, 0.8, 0.92])\n for st, ct in groups:\n order = np.argsort(y[st : st + ct], kind=\"mergesort\")\n q = np.empty(ct, dtype=\"float32\")\n q[order] = (np.arange(ct, dtype=\"float32\") + 0.5) / ct\n rel[st : st + ct] = np.digitize(q, bins)\n return rel\n\n def _cs_zscore(self, pred, groups):\n out = pred.astype(\"float32\", copy=True)\n for st, ct in groups:\n p = out[st : st + ct]\n out[st : st + ct] = (p - p.mean()) / (p.std() + 1e-6)\n return out\n\n def _metric(self, pred, y, groups):\n ic_l, ric_l, spread_l = [], [], []\n for st, ct in groups:\n p = pred[st : st + ct]\n t = y[st : st + ct]\n p0 = p - p.mean()\n t0 = t - t.mean()\n ic_l.append((p0 * t0).mean() / (np.sqrt((p0 * p0).mean() * (t0 * t0).mean()) + 1e-8))\n rp = np.argsort(np.argsort(p, kind=\"mergesort\"), kind=\"mergesort\").astype(\"float32\")\n rt = np.argsort(np.argsort(t, kind=\"mergesort\"), kind=\"mergesort\").astype(\"float32\")\n rp -= rp.mean()\n rt -= rt.mean()\n ric_l.append((rp * rt).mean() / (np.sqrt((rp * rp).mean() * (rt * rt).mean()) + 1e-8))\n k = max(5, min(50, ct // 6))\n order = np.argsort(p)\n spread_l.append(float(t[order[-k:]].mean() - t[order[:k]].mean()))\n return float(np.mean(ic_l) + np.mean(ric_l) + 0.05 * np.mean(spread_l))\n\n def _ema(self, index, pred, alpha):\n if alpha >= 0.999:\n return pred\n s = pd.Series(pred, index=index, dtype=\"float32\")\n s = s.groupby(level=1, sort=False, group_keys=False).apply(\n lambda z: z.sort_index(level=0).ewm(alpha=alpha, adjust=False).mean()\n )\n return s.reindex(index).values.astype(\"float32\")\n\n def fit(self, dataset: DatasetH):\n import lightgbm as lgb\n\n df_train, df_valid = dataset.prepare(\n [\"train\", \"valid\"], col_set=[\"feature\", \"label\"], data_key=DataHandlerLP.DK_L\n )\n x_tr, y_tr, _, g_tr = self._prepare(df_train)\n x_va, y_va, idx_va, g_va = self._prepare(df_valid)\n w_tr = self._date_weights(g_tr)\n w_va = self._date_weights(g_va)\n\n dtr_reg = lgb.Dataset(x_tr, label=y_tr, weight=w_tr, free_raw_data=False)\n dva_reg = lgb.Dataset(x_va, label=y_va, weight=w_va, reference=dtr_reg, free_raw_data=False)\n self.reg_model = lgb.train(\n self.reg_params,\n dtr_reg,\n num_boost_round=self.num_boost_round,\n valid_sets=[dva_reg],\n callbacks=[lgb.early_stopping(self.early_stopping_rounds, verbose=False)],\n )\n\n yr_tr = self._rank_labels(y_tr, g_tr)\n yr_va = self._rank_labels(y_va, g_va)\n dtr_rank = lgb.Dataset(\n x_tr, label=yr_tr, group=[ct for _, ct in g_tr], weight=w_tr, free_raw_data=False\n )\n dva_rank = lgb.Dataset(\n x_va,\n label=yr_va,\n group=[ct for _, ct in g_va],\n weight=w_va,\n reference=dtr_rank,\n free_raw_data=False,\n )\n self.rank_model = lgb.train(\n self.rank_params,\n dtr_rank,\n num_boost_round=self.num_boost_round,\n valid_sets=[dva_rank],\n callbacks=[lgb.early_stopping(self.early_stopping_rounds, verbose=False)],\n )\n\n p_reg = self._cs_zscore(\n self.reg_model.predict(x_va, num_iteration=self.reg_model.best_iteration), g_va\n )\n p_rank = self._cs_zscore(\n self.rank_model.predict(x_va, num_iteration=self.rank_model.best_iteration), g_va\n )\n best_score = -1e18\n for rw in np.linspace(0.0, 1.0, 11):\n base = self._cs_zscore((1.0 - rw) * p_reg + rw * p_rank, g_va)\n for alpha in [1.0, 0.9, 0.8, 0.7]:\n pred = self._cs_zscore(self._ema(idx_va, base, alpha), g_va)\n score = self._metric(pred, y_va, g_va)\n if score > best_score:\n best_score = score\n self.rank_weight = float(rw)\n self.ema_alpha = float(alpha)\n\n def predict(self, dataset: DatasetH, segment=\"test\"):\n if self.reg_model is None or self.rank_model is None:\n raise ValueError(\"model is not fitted yet!\")\n feat = dataset.prepare(segment, col_set=\"feature\", data_key=DataHandlerLP.DK_I)\n x = feat.values.astype(\"float32\")\n index = feat.index\n groups = self._groups(index)\n p_reg = self._cs_zscore(\n self.reg_model.predict(x, num_iteration=self.reg_model.best_iteration), groups\n )\n p_rank = self._cs_zscore(\n self.rank_model.predict(x, num_iteration=self.rank_model.best_iteration), groups\n )\n pred = self._cs_zscore((1.0 - self.rank_weight) * p_reg + self.rank_weight * p_rank, groups)\n pred = self._cs_zscore(self._ema(index, pred, self.ema_alpha), groups)\n return pd.Series(pred, index=index, name=\"score\")",
5 "after_line": 18,
6 "start_line": 19,
7 "end_line": 141
8}

Tool Result (Step 4)
OK: Replaced lines 19..141 with 179 line(s) in qlib/custom_model.py. Editable range: 16–197.
[Current file: qlib/custom_model.py | editable: 16–197 | total: 197 lines]
13|
14| DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
15|
16 # =====================================================================
17 # EDITABLE: CustomModel — implement your stock prediction model here
18 # =====================================================================
19 class CustomModel(Model):
20 """DARTS-GBM: Dual-Objective Adaptive Rank-Temporal Smoothing.
21
22 Novel contribution:
23 - pointwise regression tree for return magnitude
24 - per-date LambdaRank tree on discretized cross-sectional relevance
25 - recency-decayed sample weighting for regime adaptation
26 - validation-selected blending of the two objectives
27 - causal per-instrument EMA smoothing to reduce turnover noise
28
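The recency-decayed sample weighting listed above can be sketched as an exponential ramp over training days, mirroring the `_date_weights` helper in the accompanying tool input (the 0.35 floor is that code's choice, not a universal constant):

```python
import numpy as np

def date_weights(n_days: int, floor: float = 0.35) -> np.ndarray:
    # Exponential ramp from `floor` on the oldest day to 1.0 on the newest,
    # so recent regimes dominate the training loss.
    return np.exp(np.linspace(np.log(floor), 0.0, n_days)).astype("float32")

w = date_weights(5)
# w rises monotonically from roughly 0.35 to 1.0
```

In the model itself these per-day weights are repeated per sample (`np.repeat`) so every stock on a given date shares its date's weight.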
Assistant (Step 5): test
Tool Result (Step 5)
[Test #2] (1 test remaining; call submit(n=2) to submit this result)
### csi300 (scripts/csi300.sh)
Signal analysis (csi300):
SIGNAL_METRIC IC=0.040056
SIGNAL_METRIC ICIR=0.310593
SIGNAL_METRIC Rank_IC=0.050734
SIGNAL_METRIC Rank_ICIR=0.410955
Portfolio analysis (csi300):
PORTFOLIO_METRIC annualized_return=0.032381
PORTFOLIO_METRIC max_drawdown=-0.098395
PORTFOLIO_METRIC information_ratio=0.463793
### csi100 (scripts/csi100.sh)
Signal analysis (csi100):
SIGNAL_METRIC IC=0.033114
SIGNAL_METRIC ICIR=0.195395
SIGNAL_METRIC Rank_IC=0.042880
SIGNAL_METRIC Rank_ICIR=0.255892
Portfolio analysis (csi100):
PORTFOLIO_METRIC annualized_return=-0.028188
PORTFOLIO_METRIC max_drawdown=-0.180497
PORTFOLIO_METRIC information_ratio=-0.518470
[Leaderboard] Results saved: {'ic_csi300': 0.040056, 'icir_csi300': 0.310593, 'rank_ic_csi300': 0.050734, 'rank_icir_csi300': 0.410955, 'annualized_return_csi300': 0.032381, 'max_drawdown_csi300': -0.098395, 'information_ratio_csi300': 0.463793, 'elapsed_csi300'
Assistant (Step 6): submit
Tool Input
1{
2 "n": 2
3}

Tool Result (Step 6)
[submit] Submitting result from test #2 as final.
### csi300 (scripts/csi300.sh)
Signal analysis (csi300):
SIGNAL_METRIC IC=0.040056
SIGNAL_METRIC ICIR=0.310593
SIGNAL_METRIC Rank_IC=0.050734
SIGNAL_METRIC Rank_ICIR=0.410955
Portfolio analysis (csi300):
PORTFOLIO_METRIC annualized_return=0.032381
PORTFOLIO_METRIC max_drawdown=-0.098395
PORTFOLIO_METRIC information_ratio=0.463793
### csi100 (scripts/csi100.sh)
Signal analysis (csi100):
SIGNAL_METRIC IC=0.033114
SIGNAL_METRIC ICIR=0.195395
SIGNAL_METRIC Rank_IC=0.042880
SIGNAL_METRIC Rank_ICIR=0.255892
Portfolio analysis (csi100):
PORTFOLIO_METRIC annualized_return=-0.028188
PORTFOLIO_METRIC max_drawdown=-0.180497
PORTFOLIO_METRIC information_ratio=-0.518470
### csi300_recent (scripts/csi300_recent.sh)
Signal analysis (csi300_recent):
SIGNAL_METRIC IC=0.022980
SIGNAL_METRIC ICIR=0.161000
SIGNAL_METRIC Rank_IC=0.040731
SIGNAL_METRIC Rank_ICIR=0.290944
Portfolio analysis (csi300_recent):
PORTFOLIO_METRIC annualized_return=0.017025
PORTF