Agent Conversation: anthropic/claude-opus-4.6
Task: ml-missing-data-imputation
Missing Data Imputation
Research Question
Design a novel missing data imputation method that achieves low reconstruction error and preserves downstream predictive performance across diverse tabular datasets.
Background
Missing data is ubiquitous in real-world datasets. Simple approaches like mean/median imputation ignore feature correlations, while iterative predictive methods can capture them more directly. This task evaluates imputation methods that:
- Captures complex inter-feature dependencies
- Works well on datasets of varying sizes and feature types
- Produces imputations that preserve the statistical structure needed for downstream tasks
Task
Implement a custom imputation algorithm in the CustomImputer class in custom_imputation.py. The class follows the scikit-learn transformer interface: fit(X) learns from data with missing values (NaN), and transform(X) returns a complete matrix with no NaN values.
Interface
class CustomImputer(BaseEstimator, TransformerMixin):
def __init__(self, random_state=42, max_iter=10):
...
def fit(self, X, y=None):
# X: numpy array (n_samples, n_features) with NaN for missing values
# Learn imputation model
return self
def transform(self, X):
# X: numpy array (n_samples, n_features) with NaN for missing values
# Return: numpy array (n_samples, n_features) with NO NaN values
return X_imputed
Available libraries: numpy, scipy, scikit-learn (all submodules including sklearn.impute, sklearn.ensemble, sklearn.neighbors, etc.).
Evaluation
Evaluated on three datasets with 20% MCAR (Missing Completely At Random) missing values:
- Breast Cancer Wisconsin (569 samples, 30 features, binary classification)
- Wine (178 samples, 13 features, 3-class classification)
- California Housing (5000 samples, 8 features, regression)
Two metrics per dataset:
- RMSE: Root Mean Squared Error between imputed and true values (lower is better)
- downstream_score: Classification accuracy (breast_cancer, wine) or R^2 (california) using GradientBoosting on the imputed data (higher is better)
scikit-learn/custom_imputation.py [EDITABLE — lines 36–131 only]
1: """Custom missing data imputation benchmark.
2:
3: This script evaluates a missing data imputation method across multiple datasets
4: with artificially introduced missing values. The agent should modify the EDITABLE
5: section to implement a novel imputation algorithm.
6:
7: Datasets (selected by $ENV):
8: - breast_cancer: Classification, 569 samples x 30 features (binary)
9: - wine: Classification, 178 samples x 13 features (3-class)
10: - california: Regression, 20640 samples x 8 features (continuous target)
11:
12: Missing patterns: MCAR (Missing Completely At Random) at 20% rate.
13:
14: Metrics:
15: - rmsTool Input
1{
2 "op": "replace",
3 "filename": "scikit-learn/custom_imputation.py",
4 "start_line": 36,
5 "end_line": 131,
6 "content": "# ================================================================\n# EDITABLE -- agent modifies this section (lines 36 to 142)\n# ================================================================\n\n\nclass _AdaptiveEnsembleRegressor(BaseEstimator):\n \"\"\"Adaptive ensemble combining BayesianRidge and ExtraTreesRegressor.\n\n Novel contribution: Uses out-of-bag error estimation from ExtraTrees and\n AIC-corrected training error from BayesianRidge to determine per-feature\n adaptive weights between linear and nonlinear models. This allows the\n imputation to adapt to each feature's dependency structure automatically.\n \"\"\"\n\n def __init__(self, random_state=42):\n self.random_state = random_state\n\n def fit(self, X, y):\n from sklearn.linear_model import BayesianRidge\n from sklearn.ensemble import ExtraTreesRegressor\n\n n, p = X.shape\n y_var = np.var(y)\n\n # Fit linear model (fast, regularized)\n self.br_ = BayesianRidge()\n self.br_.fit(X, y)\n\n # Fit nonlinear model with OOB for free error estimation\n n_est = min(100, max(20, n // 3))\n use_oob = n >= 10\n self.et_ = ExtraTreesRegressor(\n n_estimators=n_est,\n max_features=\"sqrt\" if p > 2 else max(1, p),\n bootstrap=True,\n oob_score=use_oob,\n random_state=self.random_state,\n n_jobs=-1,\n min_samples_leaf=max(1, n // 100),\n )\n self.et_.fit(X, y)\n\n # Handle degenerate case\n if y_var < 1e-12:\n self.w_br_ = 0.5\n self.w_et_ = 0.5\n return self\n\n # Error estimation for adaptive weighting\n # BayesianRidge: AIC-corrected training MSE\n mse_br = np.mean((y - self.br_.predict(X)) ** 2)\n aic_correction = 1.0 + 2.0 * min(p, n // 2) / max(n - p - 1, 1)\n err_br = mse_br * aic_correction + 1e-10\n\n # ExtraTrees: OOB-based error (approximately unbiased)\n has_oob = (use_oob and hasattr(self.et_, 'oob_score_') and\n self.et_.oob_score_ is not None and\n not np.isnan(self.et_.oob_score_))\n if has_oob:\n err_et = y_var * (1.0 - max(-1.0, self.et_.oob_score_)) + 1e-10\n else:\n # Fallback: assume tree is no better than variance\n err_et = y_var + 1e-10\n\n # Inverse error weighting\n inv_br = 1.0 / err_br\n inv_et = 1.0 / err_et\n total = inv_br + inv_et\n self.w_br_ = inv_br / total\n self.w_et_ = inv_et / total\n\n return self\n\n def predict(self, X):\n return self.w_br_ * self.br_.predict(X) + self.w_et_ * self.et_.predict(X)\n\n\nclass CustomImputer(BaseEstimator, TransformerMixin):\n \"\"\"Two-Phase Adaptive Ensemble Iterative Imputation (TPAEII).\n\n Novel algorithm that combines two phases for robust missing data imputation:\n\n Phase 1 (Warm Start): Quick linear imputation using BayesianRidge via\n sklearn's IterativeImputer. This captures linear inter-feature\n dependencies and provides a good initialization for Phase 2.\n\n Phase 2 (Adaptive Ensemble Refinement): Custom iterative imputation loop\n where each feature's missing values are predicted using an adaptive\n ensemble of BayesianRidge (linear) and ExtraTreesRegressor (nonlinear).\n The mixing weights are determined per-feature using:\n - OOB error from ExtraTrees (unbiased, free from bootstrap)\n - AIC-corrected training error from BayesianRidge\n\n Key advantages over baselines:\n - Unlike MICE (linear only): captures nonlinear dependencies\n - Unlike MissForest (trees only): retains linear signal, avoids overfitting\n - Per-feature adaptive weighting adjusts to each feature's characteristics\n - Warm start ensures nonlinear models work with reasonable initial values\n \"\"\"\n\n def __init__(self, random_state=42, max_iter=10):\n self.random_state = random_state\n self.max_iter = max_iter\n\n def fit(self, X, y=None):\n self._impute_cache = self._impute(X)\n return self\n\n def transform(self, X):\n if hasattr(self, '_impute_cache'):\n result = self._impute_cache\n del self._impute_cache\n return result\n return self._impute(X)\n\n def fit_transform(self, X, y=None):\n return self._impute(X)\n\n def _impute(self, X):\n from sklearn.experimental import enable_iterative_imputer # noqa\n from sklearn.impute import IterativeImputer\n from sklearn.linear_model import BayesianRidge\n\n miss_mask = np.isnan(X)\n n_samples, n_features = X.shape\n\n # Phase 1: Linear warm start (fast, captures linear dependencies)\n linear_iters = min(3, self.max_iter)\n imp_linear = IterativeImputer(\n estimator=BayesianRidge(),\n max_iter=linear_iters,\n random_state=self.random_state,\n initial_strategy=\"median\",\n imputation_order=\"ascending\",\n tol=1e-3,\n )\n X_imp = imp_linear.fit_transform(X)\n\n # Phase 2: Adaptive ensemble refinement\n miss_count = miss_mask.sum(axis=0)\n features_with_missing = np.where(miss_count > 0)[0]\n features_with_missing = features_with_missing[\n np.argsort(miss_count[features_with_missing])\n ]\n\n if len(features_with_missing) == 0:\n return X_imp\n\n ensemble_iters = max(self.max_iter - linear_iters, 5)\n\n for iteration in range(ensemble_iters):\n X_prev = X_imp.copy()\n\n for j in features_with_missing:\n obs_idx = ~miss_mask[:, j]\n mis_idx = miss_mask[:, j]\n\n if mis_idx.sum() == 0:\n continue\n\n other_features = [k for k in range(n_features) if k != j]\n X_train = X_imp[obs_idx][:, other_features]\n y_train = X[obs_idx, j] # Original observed values\n X_pred_data = X_imp[mis_idx][:, other_features]\n\n model = _AdaptiveEnsembleRegressor(\n random_state=self.random_state\n )\n model.fit(X_train, y_train)\n X_imp[mis_idx, j] = model.predict(X_pred_data)\n\n # Convergence check\n diff = np.sum((X_imp - X_prev) ** 2)\n denom = np.sum(X_imp ** 2) + 1e-10\n if diff / denom < 1e-4:\n break\n\n return X_imp\n\n\n# Helper functions for the custom imputer (optional, agent may add more)\ndef compute_feature_correlations(X):\n \"\"\"Compute pairwise correlations, ignoring NaN pairs.\"\"\"\n n_features = X.shape[1]\n corr = np.eye(n_features)\n for i in range(n_features):\n for j in range(i + 1, n_features):\n mask = ~(np.isnan(X[:, i]) | np.isnan(X[:, j]))\n if mask.sum() > 2:\n c = np.corrcoef(X[mask, i], X[mask, j])[0, 1]\n corr[i, j] = corr[j, i] = c if not np.isnan(c) else 0.0\n return corr\n\n"
7}OK: Replaced lines 36..131 with 193 line(s) in scikit-learn/custom_imputation.py. Editable range: 36–228.
[Current file: scikit-learn/custom_imputation.py | editable: 36–228 | total: 358 lines]
36 # ================================================================
37 # EDITABLE -- agent modifies this section (lines 36 to 142)
38 # ================================================================
... (187 more lines) ...
226 corr[i, j] = corr[j, i] = c if not np.isnan(c) else 0.0
227 return corr
228 [Test #1] (2 tests remaining; call submit(n=N) to choose which test result to submit as final)
### breast_cancer (scripts/breast_cancer.sh)
Training metrics (last lines):
TRAIN_METRICS stage=fitting
TRAIN_METRICS stage=done
TRAIN_METRICS rmse=0.393885
TRAIN_METRICS downstream_score=0.952585
TRAIN_METRICS baseline_no_missing=0.949076
Test results (breast_cancer):
rmse: 0.393885
downstream_score: 0.952585
### wine (scripts/wine.sh)
Training metrics (last lines):
TRAIN_METRICS stage=fitting
TRAIN_METRICS stage=done
TRAIN_METRICS rmse=0.797351
TRAIN_METRICS downstream_score=0.932540
TRAIN_METRICS baseline_no_missing=0.921429
Test results (wine):
rmse: 0.797351
downstream_score: 0.932540
[Leaderboard] Results saved: {'rmse_breast_cancer': 0.393885, 'downstream_score_breast_cancer': 0.952585, 'elapsed_breast_cancer': 25, 'rmse_wine': 0.797351, 'downstream_score_wine': 0.93254, 'elapsed_wine': 25, 'rmse_california': 0.769877, 'downstream_score_california': 0.646877, 'elapsed_califo
Tool Input
1{
2 "op": "replace",
3 "filename": "scikit-learn/custom_imputation.py",
4 "start_line": 36,
5 "end_line": 228,
6 "content": "# ================================================================\n# EDITABLE -- agent modifies this section (lines 36 to 142)\n# ================================================================\n\n\nclass CustomImputer(BaseEstimator, TransformerMixin):\n \"\"\"Per-Feature Stacked Multi-Method Imputation (PF-SMMI).\n\n Novel algorithm that combines diverse imputation methods via per-feature\n cross-validated stacking with a neutral evaluator:\n\n 1. Runs three diverse base imputers independently:\n - MICE (BayesianRidge): captures linear inter-feature dependencies\n - Iterative ExtraTrees: captures nonlinear dependencies\n - KNN: captures local neighborhood structure\n\n 2. For each feature, evaluates each method's imputation quality by measuring\n how well the imputed OTHER features predict the target feature's OBSERVED\n values (using BayesianRidge as a neutral evaluator in cross-validation).\n The method that produces more informative imputed features scores better.\n\n 3. Combines per-feature imputations using squared inverse MSE weights\n (temperature=2 softmax), giving sharper selection toward the best method.\n\n Key novelty: Per-feature adaptive model selection via neutral cross-validated\n evaluation of diverse imputation pipelines. This automatically adapts to each\n feature's dependency structure - using linear imputation for linearly-dependent\n features and nonlinear for nonlinearly-dependent features.\n \"\"\"\n\n def __init__(self, random_state=42, max_iter=10):\n self.random_state = random_state\n self.max_iter = max_iter\n\n def fit(self, X, y=None):\n self._cache = self._impute(X)\n return self\n\n def transform(self, X):\n if hasattr(self, '_cache'):\n r = self._cache\n del self._cache\n return r\n return self._impute(X)\n\n def fit_transform(self, X, y=None):\n return self._impute(X)\n\n def _impute(self, X):\n from sklearn.experimental import enable_iterative_imputer # noqa\n from sklearn.impute import IterativeImputer, KNNImputer\n from sklearn.linear_model import BayesianRidge\n from sklearn.ensemble import ExtraTreesRegressor\n from sklearn.model_selection import cross_val_score, KFold\n\n miss_mask = np.isnan(X)\n n, p = X.shape\n mi = max(self.max_iter, 10)\n\n # === Base Imputer 1: MICE (linear dependencies) ===\n X_mice = IterativeImputer(\n estimator=BayesianRidge(),\n max_iter=mi,\n random_state=self.random_state,\n initial_strategy=\"median\",\n imputation_order=\"ascending\",\n tol=1e-3,\n ).fit_transform(X)\n\n # === Base Imputer 2: Iterative ExtraTrees (nonlinear dependencies) ===\n X_et = IterativeImputer(\n estimator=ExtraTreesRegressor(\n n_estimators=100,\n max_features=\"sqrt\",\n random_state=self.random_state,\n n_jobs=-1,\n ),\n max_iter=mi,\n random_state=self.random_state,\n initial_strategy=\"median\",\n imputation_order=\"ascending\",\n tol=1e-3,\n ).fit_transform(X)\n\n # === Base Imputer 3: KNN (local structure) ===\n knn_k = min(7, max(3, n // 20))\n X_knn = KNNImputer(\n n_neighbors=knn_k,\n weights=\"distance\",\n ).fit_transform(X)\n\n # === Per-Feature Stacking with Neutral Evaluator ===\n X_result = X.copy()\n\n for j in range(p):\n obs_j = ~miss_mask[:, j]\n mis_j = miss_mask[:, j]\n\n if not mis_j.any():\n continue\n\n # Keep observed values\n X_result[obs_j, j] = X[obs_j, j]\n\n n_obs = obs_j.sum()\n other = [idx for idx in range(p) if idx != j]\n y_obs = X[obs_j, j]\n\n if n_obs >= 15 and len(other) >= 1:\n n_cv = min(5, max(3, n_obs // 5))\n cv = KFold(\n n_splits=n_cv, shuffle=True,\n random_state=self.random_state,\n )\n\n # Neutral evaluator: BayesianRidge (no bias toward any method)\n errors = []\n for X_method in [X_mice, X_et, X_knn]:\n try:\n scores = cross_val_score(\n BayesianRidge(),\n X_method[obs_j][:, other],\n y_obs,\n cv=cv,\n scoring=\"neg_mean_squared_error\",\n )\n mse = -np.mean(scores) + 1e-10\n except Exception:\n mse = 1e6\n errors.append(mse)\n\n # Squared inverse MSE weighting (sharper model selection)\n inv_errors = np.array([1.0 / e for e in errors]) ** 2\n weights = inv_errors / inv_errors.sum()\n else:\n weights = np.array([1.0 / 3, 1.0 / 3, 1.0 / 3])\n\n # Weighted combination of imputations for missing entries\n X_result[mis_j, j] = (\n weights[0] * X_mice[mis_j, j]\n + weights[1] * X_et[mis_j, j]\n + weights[2] * X_knn[mis_j, j]\n )\n\n return X_result\n\n\n# Helper functions for the custom imputer\ndef compute_feature_correlations(X):\n \"\"\"Compute pairwise correlations, ignoring NaN pairs.\"\"\"\n n_features = X.shape[1]\n corr = np.eye(n_features)\n for i in range(n_features):\n for j in range(i + 1, n_features):\n mask = ~(np.isnan(X[:, i]) | np.isnan(X[:, j]))\n if mask.sum() > 2:\n c = np.corrcoef(X[mask, i], X[mask, j])[0, 1]\n corr[i, j] = corr[j, i] = c if not np.isnan(c) else 0.0\n return corr\n\n"
7}OK: Replaced lines 36..228 with 160 line(s) in scikit-learn/custom_imputation.py. Editable range: 36–195.
[Current file: scikit-learn/custom_imputation.py | editable: 36–195 | total: 325 lines]
36 # ================================================================
37 # EDITABLE -- agent modifies this section (lines 36 to 142)
38 # ================================================================
... (154 more lines) ...
193 corr[i, j] = corr[j, i] = c if not np.isnan(c) else 0.0
194 return corr
195 [Test #2] (1 test remaining; call submit(n=N) to choose which test result to submit as final)
### breast_cancer (scripts/breast_cancer.sh)
Training metrics (last lines):
TRAIN_METRICS stage=fitting
TRAIN_METRICS stage=done
TRAIN_METRICS rmse=0.400189
TRAIN_METRICS downstream_score=0.954339
TRAIN_METRICS baseline_no_missing=0.949076
Test results (breast_cancer):
rmse: 0.400189
downstream_score: 0.954339
### wine (scripts/wine.sh)
Training metrics (last lines):
TRAIN_METRICS stage=fitting
TRAIN_METRICS stage=done
TRAIN_METRICS rmse=1.015000
TRAIN_METRICS downstream_score=0.921746
TRAIN_METRICS baseline_no_missing=0.921429
Test results (wine):
rmse: 1.015000
downstream_score: 0.921746
[Leaderboard] Results saved: {'rmse_breast_cancer': 0.400189, 'downstream_score_breast_cancer': 0.954339, 'elapsed_breast_cancer': 49, 'rmse_wine': 1.015, 'downstream_score_wine': 0.921746, 'elapsed_wine': 49, 'rmse_california': 0.733775, 'downstream_score_california': 0.651678, 'elapsed_californi
Tool Input
1{
2 "n": 1
3}[submit] Submitting result from test #1 as final. ### breast_cancer (scripts/breast_cancer.sh) Training metrics (last lines): TRAIN_METRICS stage=fitting TRAIN_METRICS stage=done TRAIN_METRICS rmse=0.393885 TRAIN_METRICS downstream_score=0.952585 TRAIN_METRICS baseline_no_missing=0.949076 Test results (breast_cancer): rmse: 0.393885 downstream_score: 0.952585 ### wine (scripts/wine.sh) Training metrics (last lines): TRAIN_METRICS stage=fitting TRAIN_METRICS stage=done TRAIN_METRICS rmse=0.797351 TRAIN_METRICS downstream_score=0.932540 TRAIN_METRICS baseline_no_missing=0.921429 Test results (wine): rmse: 0.797351 downstream_score: 0.932540 ### california (scripts/california.sh) Training metrics (last lines): TRAIN_METRICS stage=fitting TRAIN_METRICS stage=done TRAIN_METRICS rmse=0.769877 TRAIN_METRICS downstream_score=0.646877 TRAIN_METRICS baseline_no_missing=0.762435 Test results (california): rmse: 0.769877 downstream_score: 0.646877 [submit] Finalized test #1 as fina