Agent Conversation: qwen/qwen3.6-plus

Model: qwen/qwen3.6-plus
Total steps: 20
Messages: 42
User (Step 0)

Task: ml-symbolic-regression

Symbolic Regression: GP Search Strategy

Objective

Design and implement a better genetic programming search strategy for symbolic regression. Your code goes in custom_sr.py. Three reference implementations (Standard GP, Parsimony GP, Lexicase GP) demonstrate different approaches.

Background

Symbolic regression discovers mathematical expressions that fit data. Genetic programming evolves a population of expression trees through selection, crossover, and mutation. Key challenges include balancing exploration vs exploitation, controlling expression complexity (bloat), and escaping local optima. Different approaches address these through fitness shaping, novel selection mechanisms, or improved genetic operators.
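To make the tree-evolution idea concrete, here is a minimal, hypothetical sketch of evaluating one expression tree over a dataset. Nested `(op, children)` tuples stand in for the framework's actual `Node` class, which is not fully shown in this transcript:

```python
import numpy as np

# Tiny operator table; terminals are integer feature indices.
OPS = {'add': np.add, 'mul': np.multiply, 'sin': np.sin}

def evaluate(node, X):
    """Recursively evaluate an expression tree over all rows of X."""
    if isinstance(node, int):          # terminal: column of X
        return X[:, node]
    op, children = node                # internal node: apply operator
    return OPS[op](*(evaluate(c, X) for c in children))

# sin(x0) + x0 * x0, evaluated at x0 = 0 and x0 = 1
tree = ('add', [('sin', [0]), ('mul', [0, 0])])
X = np.array([[0.0], [1.0]])
y = evaluate(tree, X)
```

GP then scores each such tree against the target values and breeds the next population from the fitter ones.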

Evaluation

Tested on three standard symbolic regression benchmarks: Nguyen-7 (univariate transcendental), Nguyen-10 (bivariate trigonometric), Koza-3 (univariate polynomial). Metric: R² on held-out test set (higher is better).
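The R² metric can be sketched as follows. This is the standard coefficient-of-determination formula, not code taken from the task's evaluation harness:

```python
import numpy as np

def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot; 1.0 is a perfect fit,
    0.0 matches always predicting the mean of y_true."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])
perfect = r2_score(y_true, y_true)            # 1.0
baseline = r2_score(y_true, np.full(4, 2.5))  # 0.0 (mean predictor)
```

Negative R² is possible for models worse than the mean predictor, which is why "higher is better" is the right reading of the metric.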

gplearn/custom_sr.py [EDITABLE — lines 228–306 only]

     1: #!/usr/bin/env python3
     2: """Symbolic Regression via Genetic Programming.
     3: 
     4: A self-contained GP framework for symbolic regression benchmarks.
     5: The editable section contains the search strategy: fitness function,
     6: selection, crossover, mutation, and per-generation evolution logic.
     7: """
     8: 
     9: import argparse
    10: import math
    11: import random
    12: import sys
    13: import os
    14: import numpy as np
    15: 
    16: 
    17: # ============================================================
    18: # Operator Definitions (FIXED)
    19: # ============================================================
    20: 
    21: def protected_div(a, b):
    22:     """Protected division: returns 1.0 when divisor is near zero."""
    23:     return np.where(np.abs(b) > 1e-10, a / b, 1.0)
    24: 
    25: 
    26: def protected_log(a):
    27:     """Protected log: log(|a|); returns 0.0 when |a| is near zero."""
    28:     return np.where(np.abs(a) > 1e-10, np.log(np.abs(a)), 0.0)
    29: 
    30: 
    31: def protected_exp(a):
    32:     """Protected exp: clips input to prevent overflow."""
    33:     return np.exp(np.clip(a, -10, 10))
    34: 
    35: 
    36: OPERATORS = {
    37:     'add': (np.add, 2),
    38:     'sub': (np.subtract, 2),
    39:     'mul': (np.multiply, 2),
    40:     'div': (protected_div, 2),
    41:     'sin': (np.sin, 1),
    42:     'cos': (np.cos, 1),
    43:     'log': (protected_log, 1),
    44:     'exp': (protected_exp, 1),
    45: }
    46: 
    47: OPERATOR_NAMES = list(OPERATORS.keys())
    48: 
    49: 
    50: # ============================================================
    51: # Tree Representation (FIXED)
    52: # ============================================================
    53: 
    54: class Node:
    55:     """A node in the GP expression tree."""
    56:     __slots__ = ('value', 'children')
    57: 
    58:   
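The protected operators above trade mathematical fidelity for numerical safety. A small self-contained demonstration (definitions copied from the excerpt; the `np.errstate` guards are an addition here, since `np.where` evaluates both branches eagerly and would otherwise emit runtime warnings on division by zero or log of zero):

```python
import numpy as np

def protected_div(a, b):
    """Protected division: returns 1.0 when divisor is near zero."""
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(np.abs(b) > 1e-10, a / b, 1.0)

def protected_log(a):
    """Protected log: log(|a|); returns 0.0 when |a| is near zero."""
    with np.errstate(divide='ignore'):
        return np.where(np.abs(a) > 1e-10, np.log(np.abs(a)), 0.0)

d = protected_div(np.array([6.0, 1.0]), np.array([2.0, 0.0]))
l = protected_log(np.array([np.e, 0.0, -np.e]))
```

Note that `protected_log` maps negative inputs to `log(|a|)` rather than failing, so evolved expressions stay total functions over the training data.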
Assistant (Step 1, edit)
Tool Input
1{
2  "op": "replace",
3  "filename": "LLaMA-Factory/gplearn/custom_sr.py",
4  "start_line": 228,
5  "end_line": 306,
6  "content": "def fitness_function(tree, X, y):\n    \"\"\"Evaluate fitness of a candidate program. Lower is better.\"\"\"\n    y_pred = safe_evaluate(tree, X)\n    return float(np.mean((y - y_pred) ** 2))\n\n\ndef _hash_fitness(fitness, num_bins=15):\n    \"\"\"Hash a tree's behavioral profile into a niche ID.\n    \n    Uses log-scale MSE quantization to group similar-fit individuals\n    into niches for fitness sharing.\n    \"\"\"\n    if fitness <= 0:\n        return num_bins\n    log_fit = np.log1p(fitness)\n    return min(int(np.log(1 + 1e-6 + fitness) * num_bins / np.log(1 + 1e6 + max(fitness, 1e-6)) * num_bins), num_bins)\n\n\ndef _compute_niche_fitness(fitnesses, population, X):\n    \"\"\"Compute shared fitness using behavioral niche hashing.\n    \n    Fitness is shared among individuals in the same niche, penalizing\n    overcrowding and encouraging exploration of underrepresented regions.\n    \"\"\"\n    n = len(population)\n    shared = np.zeros(n)\n    \n    # Hash each tree into a niche based on its prediction profile\n    niches = {}\n    # Use binned absolute errors per case as behavioral signature\n    for i, tree in enumerate(population):\n        y_pred = safe_evaluate(tree, X)\n        errors = np.abs(y_pred - X.dot(np.zeros(X.shape[1])) if False else y_pred)\n        # Behavioral hash: quantized error sum\n        error_sum = float(np.sum(np.abs(errors)))\n        niche_id = _hash_fitness(fitnesses[i], num_bins=20)\n        if niche_id not in niches:\n            niches[niche_id] = []\n        niches[niche_id].append(i)\n    \n    # Apply fitness sharing within niches\n    for niche_id, indices in niches.items():\n        niche_size = len(indices)\n        sharing_factor = niche_size ** 0.7  # sublinear penalty\n        for idx in indices:\n            shared[idx] = fitnesses[idx] * sharing_factor\n    \n    return shared.tolist()\n\n\ndef selection(population, fitnesses, n_select, niche_fitnesses=None, \n              
per_case_errors=None):\n    \"\"\"Hybrid selection combining fitness sharing with epsilon-lexicase.\n    \n    Uses two-stage selection:\n    1. Fitness sharing niche preservation (50% of selections)\n    2. Adaptive epsilon-lexicase on per-case errors (50% of selections)\n    \"\"\"\n    selected = []\n    pop_size = len(population)\n    if pop_size == 0:\n        return selected\n    \n    n_share = n_select // 2\n    n_lex = n_select - n_share\n    \n    # --- Stage 1: Niche-preserved tournament selection ---\n    used_fitnesses = niche_fitnesses if niche_fitnesses is not None else fitnesses\n    for _ in range(n_share):\n        tourn_size = min(5, pop_size)\n        candidates = random.sample(range(pop_size), tourn_size)\n        best = min(candidates, key=lambda i: used_fitnesses[i])\n        selected.append(population[best].copy())\n    \n    # --- Stage 2: Adaptive epsilon-lexicase selection ---\n    if per_case_errors is not None and n_lex > 0:\n        n_cases = per_case_errors.shape[1]\n        for _ in range(n_lex):\n            candidates = list(range(pop_size))\n            cases = list(range(n_cases))\n            random.shuffle(cases)\n            \n            # Adaptive epsilon: tighter for easy cases, looser for hard cases\n            for case in cases:\n                if len(candidates) <= 1:\n                    break\n                case_errors = per_case_errors[candidates, case]\n                median_err = float(np.median(case_errors))\n                # Case-specific epsilon based on difficulty\n                easy_threshold = float(np.percentile(case_errors, 25))\n                hard_threshold = float(np.percentile(case_errors, 75))\n                epsilon = easy_threshold + 0.5 * (hard_threshold - easy_threshold)\n                epsilon = max(epsilon, 1e-10)\n                \n                candidates = [c for c, e in zip(candidates, case_errors) \n                              if e <= median_err + epsilon]\n                if 
len(candidates) == 0:\n                    candidates = list(range(pop_size))\n            \n            winner = random.choice(candidates) if candidates else candidates[0]\n            selected.append(population[winner].copy())\n    \n    return selected\n\n\ndef _subtree_semantic_hash(subtree, hash_seed=42):\n    \"\"\"Compute structural hash of a subtree for fast similarity check.\"\"\"\n    if subtree.is_terminal:\n        return hash((subtree.value, hash_seed))\n    child_hashes = tuple(_subtree_semantic_hash(c, hash_seed) for c in subtree.children)\n    return hash((subtree.value, child_hashes, hash_seed))\n\n\ndef crossover(parent1, parent2, n_features, max_depth=17):\n    \"\"\"Semantic-aware subtree crossover with similarity avoidance.\n    \n    Avoids swapping semantically identical subtrees (wasteful crossover)\n    and prefers swaps between moderately different subtrees.\n    \"\"\"\n    offspring = parent1.copy()\n    donor = parent2.copy()\n    \n    off_size = offspring.size()\n    don_size = donor.size()\n    if off_size <= 1 or don_size <= 1:\n        return offspring\n    \n    # Try multiple crossover points to find a semantically different swap\n    max_attempts = 5\n    for attempt in range(max_attempts):\n        off_point = random.randint(1, off_size - 1)\n        don_point = random.randint(0, don_size - 1)\n        \n        # Get subtrees at chosen points\n        donor_nodes = donor.get_all_nodes()\n        donor_sub = donor_nodes[don_point][0]\n        off_nodes = offspring.get_all_nodes()\n        off_node, parent_ref, child_idx = off_nodes[off_point]\n        \n        # Quick structural hash check - skip if subtrees are identical\n        off_hash = _subtree_semantic_hash(off_node, hash_seed=off_point % 1000)\n        don_hash = _subtree_semantic_hash(donor_sub, hash_seed=don_point % 1000)\n        if off_hash == don_hash:\n            # Try structurally different subtrees if possible\n            if attempt < max_attempts - 1:\n      
          continue\n        \n        # Perform the swap\n        donor_sub_copy = donor_sub.copy()\n        if parent_ref is not None:\n            parent_ref.children[child_idx] = donor_sub_copy\n        else:\n            offspring = donor_sub_copy\n        \n        # Depth check\n        if offspring.depth() > max_depth:\n            if attempt < max_attempts - 1:\n                offspring = parent1.copy()\n                continue\n            return parent1.copy()\n        \n        return offspring\n    \n    return parent1.copy()\n\n\ndef mutation(parent, n_features, max_depth=17):\n    \"\"\"Subtree mutation with adaptive depth based on parent depth.\"\"\"\n    offspring = parent.copy()\n    tree_size = offspring.size()\n    if tree_size <= 1:\n        return generate_tree('grow', 3, n_features)\n    \n    # Adaptive mutation depth: deeper parents get smaller random subtrees\n    parent_depth = offspring.depth()\n    mut_depth = max(2, min(4, 6 - parent_depth // 3))\n    \n    mut_point = random.randint(1, tree_size - 1)\n    new_subtree = generate_tree('grow', mut_depth, n_features)\n    \n    nodes = offspring.get_all_nodes()\n    node, par, child_idx = nodes[mut_point]\n    if par is not None:\n        par.children[child_idx] = new_subtree\n    else:\n        offspring = new_subtree\n    \n    if offspring.depth() > max_depth:\n        return parent.copy()\n    \n    return offspring\n\n\ndef evolve_one_generation(population, fitnesses, X_train, y_train,\n                          n_features, pop_size,\n                          crossover_rate=0.9, mutation_rate=0.05,\n                          max_depth=17):\n    \"\"\"Generation with hybrid fitness-sharing + adaptive lexicase selection\n    and semantic-aware genetic operators.\n    \n    Features:\n    1. Adaptive parsimony pressure for bloat control\n    2. Fitness sharing niche preservation\n    3. Hybrid selection (fitness sharing + lexicase)\n    4. Semantic-aware crossover\n    5. 
Adaptive diversity-based mutation boost\n    \"\"\"\n    new_population = []\n    \n    # --- Adaptive parsimony coefficient ---\n    lengths = np.array([tree.size() for tree in population], dtype=float)\n    raw_fit = np.array(fitnesses, dtype=float)\n    len_var = float(np.var(lengths))\n    if len_var > 1e-15:\n        parsimony_coeff = float(np.cov(lengths, raw_fit)[1, 0]) / len_var\n        parsimony_coeff = max(parsimony_coeff, 0.0)\n        parsimony_coeff = min(parsimony_coeff, 0.001)\n    else:\n        parsimony_coeff = 0.0\n    \n    # --- Compute niche fitnesses ---\n    niche_fitnesses = _compute_niche_fitness(fitnesses, population, X_train)\n    \n    # --- Penalized fitnesses for parsimony ---\n    penalized = [f + parsimony_coeff * l for f, l in zip(niche_fitnesses, lengths)]\n    \n    # --- Pre-compute per-case errors for lexicase ---\n    per_case_errors = np.empty((len(population), y_train.size))\n    for i, tree in enumerate(population):\n        y_pred = safe_evaluate(tree, X_train)\n        per_case_errors[i] = np.abs(y_train - y_pred)\n    \n    # --- Elitism: preserve best by raw fitness ---\n    elite_idx = int(np.argmin(fitnesses))\n    new_population.append(population[elite_idx].copy())\n    \n    # --- Adaptive mutation rate based on fitness diversity ---\n    if len(fitnesses) > 1:\n        fit_std = float(np.std(fitnesses))\n        fit_mean = float(np.mean(fitnesses)) + 1e-15\n        cv = fit_std / fit_mean  # coefficient of variation\n        # Low diversity -> increase mutation\n        if cv < 0.1:\n            effective_mut = min(mutation_rate * 3, 0.3)\n        elif cv < 0.3:\n            effective_mut = mutation_rate * 1.5\n        else:\n            effective_mut = mutation_rate\n    else:\n        effective_mut = mutation_rate\n    \n    # --- Generate offspring ---\n    while len(new_population) < pop_size:\n        r = random.random()\n        if r < crossover_rate:\n            parents = selection(population, fitnesses, 
2, \n                              niche_fitnesses=penalized,\n                              per_case_errors=per_case_errors)\n            child = crossover(parents[0], parents[1], n_features, max_depth)\n        elif r < crossover_rate + effective_mut:\n            parents = selection(population, fitnesses, 1,\n                              niche_fitnesses=pe\n... (truncated)"
7}
Tool Result (Step 1)
ERROR: Package 'LLaMA-Factory' is not in allowed packages
Assistant (Step 2, edit)
Tool Input
1{
2  "content": "def fitness_function(tree, X, y):\n    \"\"\"Evaluate fitness of a candidate program. Lower is better.\"\"\"\n    y_pred = safe_evaluate(tree, X)\n    return float(np.mean((y - y_pred) ** 2))\n\n\ndef _compute_niche_fitness(fitnesses, population, X):\n    \"\"\"Compute shared fitness using behavioral niche hashing.\n    \n    Fitness is shared among individuals in the same niche, penalizing\n    overcrowding and encouraging exploration of underrepresented regions.\n    \"\"\"\n    n = len(population)\n    shared = np.zeros(n)\n    \n    # Hash each tree into a niche based on fitness quantiles\n    sorted_idx = np.argsort(fitnesses)\n    n_bins = min(20, n)\n    bin_edges = np.linspace(0, n - 1, n_bins + 1, dtype=int)\n    \n    niche_membership = np.zeros(n, dtype=int)\n    for b in range(n_bins):\n        start, end = bin_edges[b], bin_edges[b + 1]\n        for i in sorted_idx[start:end]:\n            niche_membership[i] = b\n    \n    # Compute niche sizes and apply sharing\n    for niche_id in range(n_bins):\n        indices = np.where(niche_membership == niche_id)[0]\n        niche_size = len(indices)\n        # Sublinear sharing penalty - moderate overcrowding penalty\n        sharing_factor = niche_size ** 0.5\n        for idx in indices:\n            shared[idx] = fitnesses[idx] * sharing_factor\n    \n    return shared.tolist()\n\n\ndef selection(population, fitnesses, n_select, niche_fitnesses=None, \n              per_case_errors=None):\n    \"\"\"Hybrid selection combining fitness sharing with epsilon-lexicase.\n    \n    Uses two-stage selection:\n    1. Fitness sharing niche preservation (50% of selections)\n    2. 
Adaptive epsilon-lexicase on per-case errors (50% of selections)\n    \"\"\"\n    selected = []\n    pop_size = len(population)\n    if pop_size == 0:\n        return selected\n    \n    n_share = n_select // 2\n    n_lex = n_select - n_share\n    \n    # --- Stage 1: Niche-preserved tournament selection ---\n    used_fitnesses = niche_fitnesses if niche_fitnesses is not None else fitnesses\n    for _ in range(n_share):\n        tourn_size = min(5, pop_size)\n        candidates = random.sample(range(pop_size), tourn_size)\n        best = min(candidates, key=lambda i: used_fitnesses[i])\n        selected.append(population[best].copy())\n    \n    # --- Stage 2: Adaptive epsilon-lexicase selection ---\n    if per_case_errors is not None and n_lex > 0:\n        n_cases = per_case_errors.shape[1]\n        for _ in range(n_lex):\n            candidates = list(range(pop_size))\n            cases = list(range(n_cases))\n            random.shuffle(cases)\n            \n            for case in cases:\n                if len(candidates) <= 1:\n                    break\n                case_errors = per_case_errors[candidates, case]\n                median_err = float(np.median(case_errors))\n                # Adaptive epsilon based on error spread\n                easy_threshold = float(np.percentile(case_errors, 25))\n                hard_threshold = float(np.percentile(case_errors, 75))\n                epsilon = easy_threshold + 0.5 * (hard_threshold - easy_threshold)\n                epsilon = max(epsilon, 1e-10)\n                \n                candidates = [c for c, e in zip(candidates, case_errors) \n                              if e <= median_err + epsilon]\n                if len(candidates) == 0:\n                    candidates = list(range(pop_size))\n            \n            winner = random.choice(candidates) if candidates else 0\n            selected.append(population[winner].copy())\n    \n    return selected\n\n\ndef _semantic_hash(subtree):\n    
\"\"\"Compute structural hash of a subtree for fast similarity check.\"\"\"\n    if subtree.is_terminal:\n        return hash((0, subtree.value))\n    child_hashes = tuple(_semantic_hash(c) for c in subtree.children)\n    return hash((1, subtree.value, child_hashes))\n\n\ndef crossover(parent1, parent2, n_features, max_depth=17):\n    \"\"\"Semantic-aware subtree crossover with similarity avoidance.\n    \n    Avoids swapping structurally identical subtrees and tries multiple\n    crossover points to find semantically different swaps.\n    \"\"\"\n    offspring = parent1.copy()\n    donor = parent2.copy()\n    \n    off_size = offspring.size()\n    don_size = donor.size()\n    if off_size <= 1 or don_size <= 1:\n        return offspring\n    \n    max_attempts = 5\n    for attempt in range(max_attempts):\n        off_point = random.randint(1, off_size - 1)\n        don_point = random.randint(0, don_size - 1)\n        \n        donor_nodes = donor.get_all_nodes()\n        donor_sub = donor_nodes[don_point][0]\n        off_nodes = offspring.get_all_nodes()\n        off_node, parent_ref, child_idx = off_nodes[off_point]\n        \n        # Structural hash check\n        off_hash = _semantic_hash(off_node)\n        don_hash = _semantic_hash(donor_sub)\n        if off_hash == don_hash:\n            if attempt < max_attempts - 1:\n                continue\n        \n        # Perform the swap\n        donor_sub_copy = donor_sub.copy()\n        if parent_ref is not None:\n            parent_ref.children[child_idx] = donor_sub_copy\n        else:\n            offspring = donor_sub_copy\n        \n        # Depth check\n        if offspring.depth() > max_depth:\n            if attempt < max_attempts - 1:\n                offspring = parent1.copy()\n                continue\n            return parent1.copy()\n        \n        return offspring\n    \n    return parent1.copy()\n\n\ndef mutation(parent, n_features, max_depth=17):\n    \"\"\"Subtree mutation with adaptive 
depth based on parent depth.\"\"\"\n    offspring = parent.copy()\n    tree_size = offspring.size()\n    if tree_size <= 1:\n        return generate_tree('grow', 3, n_features)\n    \n    # Adaptive mutation depth\n    parent_depth = offspring.depth()\n    mut_depth = max(2, min(4, 6 - parent_depth // 3))\n    \n    mut_point = random.randint(1, tree_size - 1)\n    new_subtree = generate_tree('grow', mut_depth, n_features)\n    \n    nodes = offspring.get_all_nodes()\n    node, par, child_idx = nodes[mut_point]\n    if par is not None:\n        par.children[child_idx] = new_subtree\n    else:\n        offspring = new_subtree\n    \n    if offspring.depth() > max_depth:\n        return parent.copy()\n    \n    return offspring\n\n\ndef evolve_one_generation(population, fitnesses, X_train, y_train,\n                          n_features, pop_size,\n                          crossover_rate=0.9, mutation_rate=0.05,\n                          max_depth=17):\n    \"\"\"Generation with hybrid fitness-sharing + adaptive lexicase selection\n    and semantic-aware genetic operators.\n    \n    Features:\n    1. Adaptive parsimony pressure for bloat control\n    2. Fitness sharing niche preservation\n    3. Hybrid selection (fitness sharing + lexicase)\n    4. Semantic-aware crossover\n    5. 
Adaptive diversity-based mutation boost\n    \"\"\"\n    new_population = []\n    \n    # --- Adaptive parsimony coefficient ---\n    lengths = np.array([tree.size() for tree in population], dtype=float)\n    raw_fit = np.array(fitnesses, dtype=float)\n    len_var = float(np.var(lengths))\n    if len_var > 1e-15:\n        parsimony_coeff = float(np.cov(lengths, raw_fit)[1, 0]) / len_var\n        parsimony_coeff = max(parsimony_coeff, 0.0)\n        parsimony_coeff = min(parsimony_coeff, 0.001)\n    else:\n        parsimony_coeff = 0.0\n    \n    # --- Compute niche fitnesses ---\n    niche_fitnesses = _compute_niche_fitness(fitnesses, population, X_train)\n    \n    # --- Penalized fitnesses for parsimony ---\n    penalized = [f + parsimony_coeff * l for f, l in zip(niche_fitnesses, lengths)]\n    \n    # --- Pre-compute per-case errors for lexicase ---\n    errors = np.empty((len(population), y_train.size))\n    for i, tree in enumerate(population):\n        y_pred = safe_evaluate(tree, X_train)\n        errors[i] = np.abs(y_train - y_pred)\n    \n    # --- Elitism: preserve best by raw fitness ---\n    elite_idx = int(np.argmin(fitnesses))\n    new_population.append(population[elite_idx].copy())\n    \n    # --- Adaptive mutation rate based on fitness diversity ---\n    if len(fitnesses) > 1:\n        fit_std = float(np.std(fitnesses))\n        fit_mean = float(np.mean(fitnesses)) + 1e-15\n        cv = fit_std / fit_mean\n        if cv < 0.1:\n            effective_mut = min(mutation_rate * 3, 0.3)\n        elif cv < 0.3:\n            effective_mut = mutation_rate * 1.5\n        else:\n            effective_mut = mutation_rate\n    else:\n        effective_mut = mutation_rate\n    \n    # --- Generate offspring ---\n    while len(new_population) < pop_size:\n        r = random.random()\n        if r < crossover_rate:\n            parents = selection(population, fitnesses, 2, \n                              niche_fitnesses=penalized,\n                              
per_case_errors=errors)\n            child = crossover(parents[0], parents[1], n_features, max_depth)\n        elif r < crossover_rate + effective_mut:\n            parents = selection(population, fitnesses, 1,\n                              niche_fitnesses=penalized,\n                              per_case_errors=errors)\n            child = mutation(parents[0], n_features, max_depth)\n        else:\n            parents = selection(population, fitnesses, 1,\n                              niche_fitnesses=penalized,\n                              per_case_errors=errors)\n            child = parents[0]\n        new_population.append(child)\n    \n    return new_population[:pop_size]",
3  "end_line": 306,
4  "filename": "custom_sr.py",
5  "start_line": 228
6}
Tool Result (Step 2)
ERROR: Invalid arguments for 'edit': WorkspaceTools.edit() missing 1 required positional argument: 'op'
Assistant (Step 3, edit)
Tool Input
1{
2  "filename": "gplearn/custom_sr.py",
3  "op": "replace",
4  "content": "def fitness_function(tree, X, y):\n    \"\"\"Evaluate fitness of a candidate program. Lower is better.\"\"\"\n    y_pred = safe_evaluate(tree, X)\n    return float(np.mean((y - y_pred) ** 2))\n\n\ndef _compute_niche_fitness(fitnesses, population, X):\n    \"\"\"Compute shared fitness using behavioral niche hashing.\n    \n    Fitness is shared among individuals in the same niche, penalizing\n    overcrowding and encouraging exploration of underrepresented regions.\n    \"\"\"\n    n = len(population)\n    shared = np.zeros(n)\n    \n    # Hash each tree into a niche based on fitness quantiles\n    sorted_idx = np.argsort(fitnesses)\n    n_bins = min(20, n)\n    bin_edges = np.linspace(0, n - 1, n_bins + 1, dtype=int)\n    \n    niche_membership = np.zeros(n, dtype=int)\n    for b in range(n_bins):\n        start, end = bin_edges[b], bin_edges[b + 1]\n        for i in sorted_idx[start:end]:\n            niche_membership[i] = b\n    \n    # Compute niche sizes and apply sharing\n    for niche_id in range(n_bins):\n        indices = np.where(niche_membership == niche_id)[0]\n        niche_size = len(indices)\n        # Sublinear sharing penalty - moderate overcrowding penalty\n        sharing_factor = niche_size ** 0.5\n        for idx in indices:\n            shared[idx] = fitnesses[idx] * sharing_factor\n    \n    return shared.tolist()\n\n\ndef selection(population, fitnesses, n_select, niche_fitnesses=None, \n              per_case_errors=None):\n    \"\"\"Hybrid selection combining fitness sharing with epsilon-lexicase.\n    \n    Uses two-stage selection:\n    1. Fitness sharing niche preservation (50% of selections)\n    2. 
Adaptive epsilon-lexicase on per-case errors (50% of selections)\n    \"\"\"\n    selected = []\n    pop_size = len(population)\n    if pop_size == 0:\n        return selected\n    \n    n_share = n_select // 2\n    n_lex = n_select - n_share\n    \n    # --- Stage 1: Niche-preserved tournament selection ---\n    used_fitnesses = niche_fitnesses if niche_fitnesses is not None else fitnesses\n    for _ in range(n_share):\n        tourn_size = min(5, pop_size)\n        candidates = random.sample(range(pop_size), tourn_size)\n        best = min(candidates, key=lambda i: used_fitnesses[i])\n        selected.append(population[best].copy())\n    \n    # --- Stage 2: Adaptive epsilon-lexicase selection ---\n    if per_case_errors is not None and n_lex > 0:\n        n_cases = per_case_errors.shape[1]\n        for _ in range(n_lex):\n            candidates = list(range(pop_size))\n            cases = list(range(n_cases))\n            random.shuffle(cases)\n            \n            for case in cases:\n                if len(candidates) <= 1:\n                    break\n                case_errors = per_case_errors[candidates, case]\n                median_err = float(np.median(case_errors))\n                # Adaptive epsilon based on error spread\n                easy_threshold = float(np.percentile(case_errors, 25))\n                hard_threshold = float(np.percentile(case_errors, 75))\n                epsilon = easy_threshold + 0.5 * (hard_threshold - easy_threshold)\n                epsilon = max(epsilon, 1e-10)\n                \n                candidates = [c for c, e in zip(candidates, case_errors) \n                              if e <= median_err + epsilon]\n                if len(candidates) == 0:\n                    candidates = list(range(pop_size))\n            \n            winner = random.choice(candidates) if candidates else 0\n            selected.append(population[winger].copy())\n    \n    return selected\n\n\ndef _semantic_hash(subtree):\n    
\"\"\"Compute structural hash of a subtree for fast similarity check.\"\"\"\n    if subtree.is_terminal:\n        return hash((0, subtree.value))\n    child_hashes = tuple(_semantic_hash(c) for c in subtree.children)\n    return hash((1, subtree.value, child_hashes))\n\n\ndef crossover(parent1, parent2, n_features, max_depth=17):\n    \"\"\"Semantic-aware subtree crossover with similarity avoidance.\n    \n    Avoids swapping structurally identical subtrees and tries multiple\n    crossover points to find semantically different swaps.\n    \"\"\"\n    offspring = parent1.copy()\n    donor = parent2.copy()\n    \n    off_size = offspring.size()\n    don_size = donor.size()\n    if off_size <= 1 or don_size <= 1:\n        return offspring\n    \n    max_attempts = 5\n    for attempt in range(max_attempts):\n        off_point = random.randint(1, off_size - 1)\n        don_point = random.randint(0, don_size - 1)\n        \n        donor_nodes = donor.get_all_nodes()\n        donor_sub = donor_nodes[don_point][0]\n        off_nodes = offspring.get_all_nodes()\n        off_node, parent_ref, child_idx = off_nodes[off_point]\n        \n        # Structural hash check\n        off_hash = _semantic_hash(off_node)\n        don_hash = _semantic_hash(donor_sub)\n        if off_hash == don_hash:\n            if attempt < max_attempts - 1:\n                continue\n        \n        # Perform the swap\n        donor_sub_copy = donor_sub.copy()\n        if parent_ref is not None:\n            parent_ref.children[child_idx] = donor_sub_copy\n        else:\n            offspring = donor_sub_copy\n        \n        # Depth check\n        if offspring.depth() > max_depth:\n            if attempt < max_attempts - 1:\n                offspring = parent1.copy()\n                continue\n            return parent1.copy()\n        \n        return offspring\n    \n    return parent1.copy()\n\n\ndef mutation(parent, n_features, max_depth=17):\n    \"\"\"Subtree mutation with adaptive 
depth based on parent depth.\"\"\"\n    offspring = parent.copy()\n    tree_size = offspring.size()\n    if tree_size <= 1:\n        return generate_tree('grow', 3, n_features)\n    \n
    # Adaptive mutation depth\n    parent_depth = offspring.depth()\n    mut_depth = max(2, min(4, 6 - parent_depth // 3))\n    \n
    mut_point = random.randint(1, tree_size - 1)\n    new_subtree = generate_tree('grow', mut_depth, n_features)\n    \n
    nodes = offspring.get_all_nodes()\n    node, par, child_idx = nodes[mut_point]\n    if par is not None:\n        par.children[child_idx] = new_subtree\n    else:\n        offspring = new_subtree\n    \n
    if offspring.depth() > max_depth:\n        return parent.copy()\n    \n    return offspring\n\n\n
def evolve_one_generation(population, fitnesses, X_train, y_train,\n                          n_features, pop_size,\n                          crossover_rate=0.9, mutation_rate=0.05,\n                          max_depth=17):\n
    \"\"\"Generation with hybrid fitness-sharing + adaptive lexicase selection\n    and semantic-aware genetic operators.\n    \n
    Features:\n    1. Adaptive parsimony pressure for bloat control\n    2. Fitness sharing niche preservation\n    3. Hybrid selection (fitness sharing + lexicase)\n    4. Semantic-aware crossover\n    5. Adaptive diversity-based mutation boost\n
    95% crossover, variable mutation, rest is copy.\n    \"\"\"\n    new_population = []\n    \n
    # --- Adaptive parsimony coefficient ---\n    lengths = np.array([tree.size() for tree in population], dtype=float)\n    raw_fit = np.array(fitnesses, dtype=float)\n    len_var = float(np.var(lengths))\n
    if len_var > 1e-15:\n        parsimony_coeff = float(np.cov(lengths, raw_fit)[1, 0]) / len_var\n        parsimony_coeff = max(parsimony_coeff, 0.0)\n        parsimony_coeff = min(parsimony_coeff, 0.001)\n    else:\n        parsimony_coeff = 0.0\n    \n
    # --- Compute niche fitnesses ---\n    niche_fitnesses = _compute_niche_fitness(fitnesses, population, X_train)\n    \n
    # --- Penalized fitnesses for parsimony ---\n    penalized = [f + parsimony_coeff * l for f, l in zip(niche_fitnesses, lengths)]\n    \n
    # --- Pre-compute per-case errors for lexicase ---\n    errors = np.empty((len(population), y_train.size))\n    for i, tree in enumerate(population):\n        y_pred = safe_evaluate(tree, X_train)\n        errors[i] = np.abs(y_train - y_pred)\n    \n
    # --- Elitism: preserve best by raw fitness ---\n    elite_idx = int(np.argmin(fitnesses))\n    new_population.append(population[elite_idx].copy())\n    \n
    # --- Adaptive mutation rate based on fitness diversity ---\n    if len(fitnesses) > 1:\n        fit_std = float(np.std(fitnesses))\n        fit_mean = float(np.mean(fitnesses)) + 1e-15\n        cv = fit_std / fit_mean\n
        if cv < 0.1:\n            effective_mut = min(mutation_rate * 3, 0.3)\n        elif cv < 0.3:\n            effective_mut = mutation_rate * 1.5\n        else:\n            effective_mut = mutation_rate\n    else:\n        effective_mut = mutation_rate\n    \n
    # --- Generate offspring ---\n    while len(new_population) < pop_size:\n        r = random.random()\n        if r < crossover_rate:\n            parents = selection(population, fitnesses, 2, \n                              niche_fitnesses=penalized,\n                              per_case_errors=errors)\n            child = crossover(parents[0], parents[1], n_features, max_depth)\n
        elif r < crossover_rate + effective_mut:\n            parents = selection(population, fitnesses, 1,\n                              niche_fitnesses=penalized,\n                              per_case_errors=errors)\n            child = mutation(parents[0], n_features, max_depth)\n
        else:\n            parents = selection(population, fitnesses, 1,\n                              niche_fitnesses=penalized,\n                              per_case_errors=errors)\n            child = parents[0]\n        new_population.append(child)\n    \n    return new_population[:pop_size]",
5  "start_line": 228,
6  "end_line": 306
7}
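The "adaptive parsimony coefficient" in the edit above is covariant parsimony pressure: the per-node penalty is Cov(length, fitness) / Var(length), clipped to [0, 0.001]. A minimal standalone sketch of that rule (the function name is ours, not from the file):

```python
import numpy as np

def adaptive_parsimony_coeff(lengths, fitnesses, cap=0.001):
    """Covariant parsimony pressure: penalty per node is
    Cov(length, fitness) / Var(length), clipped to [0, cap]."""
    lengths = np.asarray(lengths, dtype=float)
    fitnesses = np.asarray(fitnesses, dtype=float)
    len_var = float(np.var(lengths))
    if len_var <= 1e-15:          # all trees the same size: no pressure
        return 0.0
    coeff = float(np.cov(lengths, fitnesses)[1, 0]) / len_var
    return min(max(coeff, 0.0), cap)  # clip: never reward bloat, cap penalty
```

Note that, as in the transcript's code, `np.cov` defaults to `ddof=1` while `np.var` uses `ddof=0`, so the ratio is slightly inflated for small populations; the 0.001 cap makes this harmless.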
Tool ResultStep 3
OK: Replaced lines 228..306 with 256 line(s) in gplearn/custom_sr.py. Editable range: 228–483.

[Current file: gplearn/custom_sr.py | editable: 228–483 | total: 560 lines]
   228  def fitness_function(tree, X, y):
   229      """Evaluate fitness of a candidate program. Lower is better."""
   230      y_pred = safe_evaluate(tree, X)
       ... (250 more lines) ...
   481          new_population.append(child)
   482      
   483      return new_population[:pop_size]
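One piece of the Step 3 edit worth isolating is its diversity trigger: when the coefficient of variation (std/mean) of population fitness collapses, the mutation rate is boosted. A self-contained sketch of that rule under the same thresholds (the helper name is ours):

```python
import numpy as np

def adaptive_mutation_rate(fitnesses, base_rate=0.05):
    """Boost mutation when fitness diversity collapses: the coefficient
    of variation of fitness selects the multiplier."""
    fitnesses = np.asarray(fitnesses, dtype=float)
    if fitnesses.size <= 1:
        return base_rate
    cv = float(np.std(fitnesses)) / (float(np.mean(fitnesses)) + 1e-15)
    if cv < 0.1:                        # converged: triple mutation, capped
        return min(base_rate * 3, 0.3)
    if cv < 0.3:                        # moderately diverse: mild boost
        return base_rate * 1.5
    return base_rate                    # diverse: leave the rate alone
```

With minimization fitness that is non-negative, a near-zero CV means the population has converged, which is exactly when extra mutation helps escape a local optimum.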
AssistantStep 4edit
Tool Input
1{
2  "filename": "gplearn/custom_sr.py",
3  "op": "replace",
4  "content": "            winner = random.choice(candidates) if candidates else 0\n            selected.append(population[winner].copy())",
5  "start_line": 287,
6  "end_line": 288
7}
Tool ResultStep 4
OK: Replaced lines 287..288 with 2 line(s) in gplearn/custom_sr.py. Editable range: 228–483.

[Current file: gplearn/custom_sr.py | editable: 228–483 | total: 560 lines]
   228  def fitness_function(tree, X, y):
   229      """Evaluate fitness of a candidate program. Lower is better."""
   230      y_pred = safe_evaluate(tree, X)
       ... (250 more lines) ...
   481          new_population.append(child)
   482      
   483      return new_population[:pop_size]
AssistantStep 5edit
Tool Input
1{
2  "filename": "gplearn/custom_sr.py",
3  "op": "replace",
4  "content": "\"\"\"\n        1. Adaptive parsimony pressure for bloat control\n        2. Fitness sharing niche preservation\n        3. Hybrid selection (fitness sharing + lexicase)\n        4. Semantic-aware crossover\n        5. Adaptive diversity-based mutation boost\n        6. Random exploration injection\n        \"\"\"",
5  "start_line": 372,
6  "end_line": 372
7}
Tool ResultStep 5
OK: Replaced lines 372..372 with 8 line(s) in gplearn/custom_sr.py. Editable range: 228–490.

[Current file: gplearn/custom_sr.py | editable: 228–490 | total: 567 lines]
   228  def fitness_function(tree, X, y):
   229      """Evaluate fitness of a candidate program. Lower is better."""
   230      y_pred = safe_evaluate(tree, X)
       ... (257 more lines) ...
   488          new_population.append(child)
   489      
   490      return new_population[:pop_size]
AssistantStep 6edit
Tool Input
1{
2  "filename": "gplearn/custom_sr.py",
3  "op": "replace",
4  "content": "\"\"\"\n        1. Adaptive parsimony pressure for bloat control\n        2. Fitness sharing niche preservation\n        3. Hybrid selection (fitness sharing + lexicase)\n        4. Semantic-aware crossover\n        5. Adaptive diversity-based mutation boost\n        6. Random exploration injection\n    \"\"\"",
5  "start_line": 372,
6  "end_line": 372
7}
Tool ResultStep 6
OK: Replaced lines 372..372 with 8 line(s) in gplearn/custom_sr.py. Editable range: 228–497.

[Current file: gplearn/custom_sr.py | editable: 228–497 | total: 574 lines]
   228  def fitness_function(tree, X, y):
   229      """Evaluate fitness of a candidate program. Lower is better."""
   230      y_pred = safe_evaluate(tree, X)
       ... (264 more lines) ...
   495          new_population.append(child)
   496      
   497      return new_population[:pop_size]
AssistantStep 7test
Tool ResultStep 7
[Test #1] (2 tests remaining; call submit(n=N) to choose which test result to submit as final)


## Seed 42

### nguyen7 (scripts/nguyen7.sh)
[COMMAND FAILED exit=1]
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-symbolic-regression/qwen_qwen3.6-plus_20260414_110948/gplearn/custom_sr.py", line 287
    winner = random.choice(candidates) if candidates else 0
IndentationError: unexpected indent
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-symbolic-regression/qwen_qwen3.6-plus_20260414_110948/.local_scripts/scripts/nguyen7.sh` failed. (See above for error)


### nguyen10 (scripts/nguyen10.sh)
[COMMAND FAILED exit=1]
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-symbolic-regression/qwen_qwen3.6-plus_20260414_110948/gplearn/custom_sr.py", line 287
    winner = random.choice(candidates) if candidates else 0
IndentationError: unexpected indent
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanly
AssistantStep 8edit
Tool Input
1{
2  "content": "            for case in cases:\n                if len(candidates) <= 1:\n                    break\n                case_errors = per_case_errors[candidates, case]\n                median_err = float(np.median(case_errors))\n                # Adaptive epsilon based on error spread\n                easy_threshold = float(np.percentile(case_errors, 25))\n                hard_threshold = float(np.percentile(case_errors, 75))\n                epsilon = easy_threshold + 0.5 * (hard_threshold - easy_threshold)\n                epsilon = max(epsilon, 1e-10)\n                \n                candidates = [c for c, e in zip(candidates, case_errors) \n                              if e <= median_err + epsilon]\n                if len(candidates) == 0:\n                    candidates = list(range(pop_size))\n            \n            winner = random.choice(candidates) if candidates else random.randint(0, pop_size - 1)\n            selected.append(population[winner].copy())",
3  "end_line": 288,
4  "filename": "gplearn/custom_sr.py",
5  "start_line": 270,
6  "op": "replace"
7}
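The replacement above is a median-plus-adaptive-epsilon variant of epsilon-lexicase selection. A standalone sketch of one selection event under the same filtering rule (the function name and the `rng` parameter are ours; the transcript's version runs inside a larger `selection` routine):

```python
import random
import numpy as np

def epsilon_lexicase_select(per_case_errors, rng=random):
    """Pick one individual: visit cases in random order and, on each case,
    keep only candidates whose error is within median + epsilon, where
    epsilon = q25 + 0.5 * (q75 - q25) over the surviving candidates."""
    pop_size, n_cases = per_case_errors.shape
    candidates = list(range(pop_size))
    cases = list(range(n_cases))
    rng.shuffle(cases)
    for case in cases:
        if len(candidates) <= 1:
            break
        errs = per_case_errors[candidates, case]
        median_err = float(np.median(errs))
        q25 = float(np.percentile(errs, 25))
        q75 = float(np.percentile(errs, 75))
        epsilon = max(q25 + 0.5 * (q75 - q25), 1e-10)
        survivors = [c for c, e in zip(candidates, errs)
                     if e <= median_err + epsilon]
        # Fall back to the whole population if the filter empties the pool
        candidates = survivors if survivors else list(range(pop_size))
    return rng.choice(candidates)
```

Because epsilon is derived from the interquartile spread of the surviving candidates' errors, the filter adapts per case: tight when the population agrees, permissive when errors are spread out.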
Tool ResultStep 8
OK: Replaced lines 270..288 with 18 line(s) in gplearn/custom_sr.py. Editable range: 228–496.

[Current file: gplearn/custom_sr.py | editable: 228–496 | total: 573 lines]
   228  def fitness_function(tree, X, y):
   229      """Evaluate fitness of a candidate program. Lower is better."""
   230      y_pred = safe_evaluate(tree, X)
       ... (263 more lines) ...
   494          new_population.append(child)
   495      
   496      return new_population[:pop_size]
AssistantStep 9edit
Tool Input
1{
2  "filename": "gplearn/custom_sr.py",
3  "op": "replace",
4  "content": "        1. Adaptive parsimony pressure for bloat control\n        2. Fitness sharing niche preservation\n        3. Hybrid selection (fitness sharing + lexicase)\n        4. Semantic-aware crossover\n        5. Adaptive diversity-based mutation boost\n    \"\"\"\n
    new_population = []\n    \n
    # --- Adaptive parsimony coefficient ---\n    lengths = np.array([tree.size() for tree in population], dtype=float)\n    raw_fit = np.array(fitnesses, dtype=float)\n    len_var = float(np.var(lengths))\n
    if len_var > 1e-15:\n        parsimony_coeff = float(np.cov(lengths, raw_fit)[1, 0]) / len_var\n        parsimony_coeff = max(parsimony_coeff, 0.0)\n        parsimony_coeff = min(parsimony_coeff, 0.001)\n    else:\n        parsimony_coeff = 0.0\n    \n
    # --- Compute niche fitnesses ---\n    niche_fitnesses = _compute_niche_fitness(fitnesses, population, X_train)\n    \n
    # --- Penalized fitnesses for parsimony ---\n    penalized = [f + parsimony_coeff * l for f, l in zip(niche_fitnesses, lengths)]\n    \n
    # --- Pre-compute per-case errors for lexicase ---\n    pre_case_errors = np.empty((len(population), y_train.size))\n    for i, tree in enumerate(population):\n        y_pred = safe_evaluate(tree, X_train)\n        pre_case_errors[i] = np.abs(y_train - y_pred)\n    \n
    # --- Elitism: preserve best by raw fitness ---\n    elite_idx = int(np.argmin(fitnesses))\n    new_population.append(population[elite_idx].copy())\n    \n
    # --- Adaptive mutation rate based on fitness diversity ---\n    if len(fitnesses) > 1:\n        fit_std = float(np.std(fitnesses))\n        fit_mean = float(np.mean(fitnesses)) + 1e-15\n        cv = fit_std / fit_mean\n
        if cv < 0.1:\n            effective_mut = min(mutation_rate * 3, 0.3)\n        elif cv < 0.3:\n            effective_mut = mutation_rate * 1.5\n        else:\n            effective_mut = mutation_rate\n    else:\n        effective_mut = mutation_rate\n    \n
    # --- Generate offspring ---\n    while len(new_population) < pop_size:\n        r = random.random()\n        if r < crossover_rate:\n            parents = selection(population, fitnesses, 2, \n                              niche_fitnesses=penalized,\n                              per_case_errors=pre_case_errors)\n            child = crossover(parents[0], parents[1], n_features, max_depth)\n
        elif r < crossover_rate + effective_mut:\n            parents = selection(population, fitnesses, 1,\n                              niche_fitnesses=penalized,\n                              per_case_errors=pre_case_errors)\n            child = mutation(parents[0], n_features, max_depth)\n
        else:\n            parents = selection(population, fitnesses, 1,\n                              niche_fitnesses=penalized,\n                              per_case_errors=pre_case_errors)\n            child = parents[0]\n        new_population.append(child)\n    \n    return new_population[:pop_size]",
5  "end_line": 496,
6  "start_line": 358
7}
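The edit above again calls `_compute_niche_fitness(fitnesses, population, X_train)`, whose body never appears in this excerpt. Purely as a hypothetical sketch of what semantic fitness sharing consistent with that call could look like (the triangular kernel, `sigma`, and taking a precomputed prediction matrix instead of trees are all our assumptions, not the file's actual helper):

```python
import numpy as np

def compute_niche_fitness(fitnesses, semantics, sigma=0.5):
    """Hypothetical fitness sharing (lower fitness is better): inflate an
    individual's fitness by its niche count, i.e. how many individuals
    produce similar predictions. `semantics` is (pop, n_samples)."""
    semantics = np.asarray(semantics, dtype=float)
    shared = []
    for i in range(semantics.shape[0]):
        # Mean absolute semantic distance to every individual (incl. self)
        d = np.mean(np.abs(semantics - semantics[i]), axis=1)
        # Triangular sharing kernel: 1 at distance 0, 0 beyond sigma
        niche_count = float(np.sum(np.maximum(0.0, 1.0 - d / sigma)))
        shared.append(fitnesses[i] * niche_count)
    return shared
```

Shared fitness grows (worsens) in crowded niches, so selection pressure spreads across behaviourally distinct individuals instead of piling onto one basin, which is the "niche preservation" the docstring advertises.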
Tool ResultStep 9
OK: Replaced lines 358..496 with 70 line(s) in gplearn/custom_sr.py. Editable range: 228–427.

[Current file: gplearn/custom_sr.py | editable: 228–427 | total: 504 lines]
   228  def fitness_function(tree, X, y):
   229      """Evaluate fitness of a candidate program. Lower is better."""
   230      y_pred = safe_evaluate(tree, X)
       ... (194 more lines) ...
   425          new_population.append(child)
   426      
   427      return new_population[:pop_size]