ml-anomaly-detection
Description
Unsupervised Anomaly Detection Algorithm Design
Research Question
Design a novel unsupervised anomaly detection algorithm for tabular data that generalizes across datasets with varying dimensionality, sample sizes, and anomaly ratios.
Background
Unsupervised anomaly detection identifies rare, unusual patterns in data without labeled examples. Classic methods include Isolation Forest (tree-based isolation), Local Outlier Factor (density-based), and One-Class SVM (boundary-based). Recent advances include ECOD (empirical cumulative distribution tails, TKDE 2022), COPOD (copula-based tail probabilities, ICDM 2020), and Deep Isolation Forest (representation-enhanced isolation, TKDE 2023). Despite progress, no single method dominates across all dataset characteristics, leaving room for novel algorithmic designs that combine strengths of multiple paradigms.
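The closing observation, that no single method dominates, motivates combining scores from several detectors. A minimal sketch of one common combination step, rank-averaging heterogeneous score scales onto a common footing (the helper name is illustrative, not part of the benchmark):

```python
import numpy as np
from scipy.stats import rankdata

def combine_scores(score_lists):
    """Average rank-normalized anomaly scores across detectors.

    Rank normalization maps each detector's scores (e.g. IForest path
    lengths vs. LOF density ratios) into (0, 1] before averaging, so no
    single score scale dominates the ensemble.
    """
    n = len(score_lists[0])
    ranks = [rankdata(s) / n for s in score_lists]
    return np.mean(ranks, axis=0)
```

Rank-averaging is a simple, scale-free alternative to z-score or min-max normalization when pooling detectors from different paradigms.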
Task
Implement a custom unsupervised anomaly detection algorithm in the CustomAnomalyDetector class in custom_anomaly.py. Your algorithm should detect anomalies without using any labels during training.
Interface
```python
class CustomAnomalyDetector:
    def __init__(self):
        # Initialize hyperparameters and internal state
        ...

    def fit(self, X):
        # Train on unlabeled data X: numpy array (n_samples, n_features)
        # Data is already standardized (zero mean, unit variance)
        return self

    def decision_function(self, X):
        # Return anomaly scores: numpy array (n_samples,)
        # Higher scores = more anomalous
        return scores
```
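To make the contract concrete, here is a deliberately naive sketch that satisfies the interface: a plain k-nearest-neighbor distance scorer in pure numpy. A real submission would replace this with a stronger algorithm.

```python
import numpy as np

class CustomAnomalyDetector:
    """Sketch: score = mean distance to the k nearest training points."""

    def __init__(self, k=10):
        self.k = k
        self.X_train_ = None

    def fit(self, X):
        # No labels used; just memorize the (already standardized) data.
        self.X_train_ = np.asarray(X, dtype=float)
        return self

    def decision_function(self, X):
        X = np.asarray(X, dtype=float)
        # Pairwise squared Euclidean distances to the training set.
        d2 = ((X[:, None, :] - self.X_train_[None, :, :]) ** 2).sum(axis=-1)
        k = min(self.k, d2.shape[1])
        # Mean distance to the k closest training points;
        # higher = farther from the data mass = more anomalous.
        knn = np.sqrt(np.partition(d2, k - 1, axis=1)[:, :k])
        return knn.mean(axis=1)
```

The O(n²) distance computation is fine for the smaller benchmarks but would need `sklearn.neighbors.NearestNeighbors` (or subsampling) for Shuttle's ~49k samples.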
Available Libraries
- numpy, scipy (linear algebra, statistics, spatial, optimization)
- scikit-learn (PCA, KDE, NearestNeighbors, GaussianMixture, etc.)
- pyod (IForest, LOF, OCSVM, ECOD, COPOD, KNN, HBOS, PCA, LODA, SUOD, etc.)
Evaluation
Evaluated on 4 tabular anomaly detection benchmarks from ADBench/ODDS:
- Cardio: 1,831 samples, 21 features, ~9.6% anomalies (cardiotocography)
- Thyroid: 3,772 samples, 6 features, ~2.5% anomalies (thyroid disease)
- Satellite: 6,435 samples, 36 features, ~31.6% anomalies (Landsat satellite)
- Shuttle: 49,097 samples, 9 features, ~7.2% anomalies (NASA shuttle)
Metrics (higher is better): AUROC (area under ROC curve) and F1 score at the optimal contamination threshold. Evaluated via a 60/40 stratified train/test split, following the standard ADBench/ECOD paper protocol.
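The F1 number depends on how the threshold is chosen. Assuming "optimal contamination threshold" means flagging the top fraction of scores equal to the true anomaly ratio (the common ADBench reading; swap in a threshold sweep if it instead means the F1-maximizing cut), the metric computation can be sketched as (function name illustrative):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score

def evaluate(y_true, scores):
    """AUROC plus F1 with the threshold set at the true contamination rate."""
    auroc = roc_auc_score(y_true, scores)
    # Flag the top-c fraction of scores as anomalies, c = true anomaly ratio.
    contamination = np.mean(y_true)
    thresh = np.quantile(scores, 1.0 - contamination)
    y_pred = (scores >= thresh).astype(int)
    return auroc, f1_score(y_true, y_pred)
```

Note that this threshold uses the test labels only to set the cutoff fraction, which is standard in the ADBench/ECOD evaluation protocol, not in training.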
Code
```python
"""Unsupervised Anomaly Detection Benchmark for MLS-Bench.

FIXED: Data loading, evaluation pipeline, metrics computation.
EDITABLE: CustomAnomalyDetector class — the agent's anomaly detection algorithm.

Usage:
    ENV=cardio SEED=42 OUTPUT_DIR=./output python custom_anomaly.py
"""

import os
import sys
import json
import time
import warnings
from pathlib import Path
```
Results
| Model | Type | AUROC Cardio ↑ | F1 Cardio ↑ | AUROC Thyroid ↑ | F1 Thyroid ↑ | AUROC Satellite ↑ | F1 Satellite ↑ | AUROC Shuttle ↑ | F1 Shuttle ↑ |
|---|---|---|---|---|---|---|---|---|---|
| copod | baseline | 0.921 | 0.532 | 0.939 | 0.180 | 0.634 | 0.481 | 0.995 | 0.950 |
| ecod | baseline | 0.907 | 0.467 | 0.978 | 0.532 | 0.566 | 0.437 | 0.992 | 0.853 |
| isolation_forest | baseline | 0.946 | 0.586 | 0.981 | 0.550 | 0.707 | 0.586 | 0.997 | 0.963 |
| lof | baseline | 0.547 | 0.168 | 0.706 | 0.086 | 0.550 | 0.381 | 0.531 | 0.132 |
| ocsvm | baseline | 0.884 | 0.400 | 0.939 | 0.315 | 0.547 | 0.440 | 0.880 | 0.503 |
| deepseek-reasoner | vanilla | - | - | - | - | - | - | - | - |
| google/gemini-3.1-pro-preview | vanilla | 0.500 | 0.174 | 0.500 | 0.048 | 0.500 | 0.481 | 0.500 | 0.133 |
| openai/gpt-5.4 | vanilla | - | - | - | - | - | - | - | - |
| qwen/qwen3.6-plus | vanilla | - | - | - | - | - | - | - | - |
| deepseek-reasoner | agent | - | - | - | - | - | - | - | - |
| google/gemini-3.1-pro-preview | agent | 0.895 | 0.429 | 0.942 | 0.324 | 0.621 | 0.437 | 0.968 | 0.707 |
| openai/gpt-5.4 | agent | - | - | - | - | - | - | - | - |
| qwen/qwen3.6-plus | agent | - | - | - | - | - | - | - | - |