Agent Conversation: google/gemini-3.1-pro-preview
Task: ml-symbolic-regression
Symbolic Regression: GP Search Strategy
Objective
Design and implement a better genetic programming search strategy for symbolic regression. Your code goes in custom_sr.py. Three reference implementations (Standard GP, Parsimony GP, Lexicase GP) demonstrate different approaches.
Background
Symbolic regression discovers mathematical expressions that fit data. Genetic programming evolves a population of expression trees through selection, crossover, and mutation. Key challenges include balancing exploration vs exploitation, controlling expression complexity (bloat), and escaping local optima. Different approaches address these through fitness shaping, novel selection mechanisms, or improved genetic operators.
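The evaluate/select/vary loop described above can be illustrated with a minimal, self-contained sketch. Everything below (the list-based tree encoding, operator set, and target function) is invented for illustration and is unrelated to the repo's `Node` class:

```python
import random, copy

random.seed(1)

# Toy expression trees: [op, left, right], or a leaf ('x' or a float constant).
OPS = {'add': lambda a, b: a + b, 'sub': lambda a, b: a - b,
       'mul': lambda a, b: a * b}

def evaluate(node, x):
    if node == 'x':
        return x
    if isinstance(node, float):
        return node
    op, l, r = node
    return OPS[op](evaluate(l, x), evaluate(r, x))

def random_tree(depth):
    if depth == 0 or random.random() < 0.3:
        return 'x' if random.random() < 0.6 else random.uniform(-2.0, 2.0)
    return [random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1)]

def paths(node, prefix=()):
    """Yield the index path to every node, root included."""
    yield prefix
    if isinstance(node, list):
        yield from paths(node[1], prefix + (1,))
        yield from paths(node[2], prefix + (2,))

def get(node, path):
    for i in path:
        node = node[i]
    return node

def set_at(tree, path, sub):
    if not path:
        return sub
    get(tree, path[:-1])[path[-1]] = sub
    return tree

def crossover(a, b):
    # Subtree crossover: graft a random subtree of b into a random point of a.
    # (No depth cap here; a real GP would also guard against bloat.)
    a, b = copy.deepcopy(a), copy.deepcopy(b)
    return set_at(a, random.choice(list(paths(a))), get(b, random.choice(list(paths(b)))))

def mutate(a):
    # Subtree mutation: replace a random node with a fresh random subtree.
    a = copy.deepcopy(a)
    return set_at(a, random.choice(list(paths(a))), random_tree(2))

def mse(tree, xs, ys):
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def tournament(pop, fits, k=3):
    return min(random.sample(range(len(pop)), k), key=lambda i: fits[i])

xs = [i / 10 for i in range(-10, 11)]
ys = [x * x + x for x in xs]          # toy target: x^2 + x

pop = [random_tree(3) for _ in range(60)]
for _ in range(30):
    fits = [mse(t, xs, ys) for t in pop]
    elite = copy.deepcopy(pop[min(range(len(pop)), key=lambda i: fits[i])])
    nxt = [elite]                     # elitism: carry the best forward unchanged
    while len(nxt) < len(pop):
        if random.random() < 0.9:
            nxt.append(crossover(pop[tournament(pop, fits)], pop[tournament(pop, fits)]))
        else:
            nxt.append(mutate(pop[tournament(pop, fits)]))
    pop = nxt

best = min(pop, key=lambda t: mse(t, xs, ys))
print(mse(best, xs, ys))
```

Because of the elitism line, the best fitness in the population is non-increasing across generations, which is the same guarantee the reference implementations rely on.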
Evaluation
Tested on three standard symbolic regression benchmarks: Nguyen-7 (univariate transcendental), Nguyen-10 (bivariate trigonometric), Koza-3 (univariate polynomial). Metric: R² on held-out test set (higher is better).
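Since held-out R² is the reported metric, here is a minimal sketch of its computation (the function name is ours, not part of the harness):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot. 1.0 is a perfect fit, 0.0 matches
    predicting the mean, and it goes negative for worse-than-mean fits."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r_squared(y, y))                       # -> 1.0
print(r_squared(y, np.full(4, np.mean(y))))  # -> 0.0
```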
gplearn/custom_sr.py [EDITABLE — lines 228–306 only]
1: #!/usr/bin/env python3
2: """Symbolic Regression via Genetic Programming.
3:
4: A self-contained GP framework for symbolic regression benchmarks.
5: The editable section contains the search strategy: fitness function,
6: selection, crossover, mutation, and per-generation evolution logic.
7: """
8:
9: import argparse
10: import math
11: import random
12: import sys
13: import os
14: import numpy as np
15:
16:
17: # ============================================================
18: # Operator Definitions (FIXED)
19: # ============================================================
20:
21: def protected_div(a, b):
22:     """Protected division: returns 1.0 when divisor is near zero."""
23:     return np.where(np.abs(b) > 1e-10, a / b, 1.0)
24:
25:
26: def protected_log(a):
27:     """Protected log: log of |a|; returns 0.0 when |a| is near zero."""
28:     return np.where(np.abs(a) > 1e-10, np.log(np.abs(a)), 0.0)
29:
30:
31: def protected_exp(a):
32:     """Protected exp: clips input to prevent overflow."""
33:     return np.exp(np.clip(a, -10, 10))
34:
35:
36: OPERATORS = {
37:     'add': (np.add, 2),
38:     'sub': (np.subtract, 2),
39:     'mul': (np.multiply, 2),
40:     'div': (protected_div, 2),
41:     'sin': (np.sin, 1),
42:     'cos': (np.cos, 1),
43:     'log': (protected_log, 1),
44:     'exp': (protected_exp, 1),
45: }
46:
47: OPERATOR_NAMES = list(OPERATORS.keys())
48:
49:
50: # ============================================================
51: # Tree Representation (FIXED)
52: # ============================================================
53:
54: class Node:
55:     """A node in the GP expression tree."""
56:     __slots__ = ('value', 'children')
57:
58: Tool Input
1{
2 "op": "replace",
3 "filename": "gplearn/custom_sr.py",
4 "content": "def fitness_function(tree, X, y):\n \"\"\"Evaluate fitness of a candidate program. Lower is better.\"\"\"\n y_pred = safe_evaluate(tree, X)\n return float(np.mean((y - y_pred) ** 2))\n\n\ndef selection(population, fitnesses, n_select, tournament_size=7):\n \"\"\"Tournament selection on penalized fitnesses.\"\"\"\n selected = []\n pop_size = len(population)\n for _ in range(n_select):\n candidates = random.sample(range(pop_size), min(tournament_size, pop_size))\n best = min(candidates, key=lambda i: fitnesses[i])\n selected.append(population[best].copy())\n return selected\n\n\ndef crossover(parent1, parent2, n_features, max_depth=17):\n \"\"\"Standard subtree crossover.\"\"\"\n offspring = parent1.copy()\n donor = parent2.copy()\n off_size = offspring.size()\n don_size = donor.size()\n if off_size <= 1 or don_size <= 1:\n return offspring\n\n off_point = random.randint(1, off_size - 1)\n don_point = random.randint(0, don_size - 1)\n\n donor_nodes = donor.get_all_nodes()\n donor_subtree = donor_nodes[don_point][0].copy()\n\n off_nodes = offspring.get_all_nodes()\n node, parent, child_idx = off_nodes[off_point]\n if parent is not None:\n parent.children[child_idx] = donor_subtree\n else:\n offspring = donor_subtree\n\n if offspring.depth() > max_depth:\n return parent1.copy()\n return offspring\n\n\ndef mutate_subtree(parent, n_features, max_depth):\n offspring = parent.copy()\n tree_size = offspring.size()\n if tree_size <= 1:\n return generate_tree('grow', 3, n_features)\n mut_point = random.randint(1, tree_size - 1)\n new_subtree = generate_tree('grow', 3, n_features)\n nodes = offspring.get_all_nodes()\n node, par, child_idx = nodes[mut_point]\n if par is not None:\n par.children[child_idx] = new_subtree\n else:\n offspring = new_subtree\n if offspring.depth() > max_depth:\n return parent.copy()\n return offspring\n\n\ndef mutate_point(parent, n_features, max_depth):\n offspring = parent.copy()\n nodes = offspring.get_all_nodes()\n for node, _, _ in 
nodes:\n if random.random() < 0.1:\n if node.is_terminal:\n if random.random() < 0.5:\n node.value = f'x{random.randint(0, n_features - 1)}'\n else:\n node.value = str(round(random.uniform(-5.0, 5.0), 2))\n else:\n arity = len(node.children)\n valid_ops = [op for op, n_args in OPERATORS.items() if n_args == arity]\n if valid_ops:\n node.value = random.choice(valid_ops)\n return offspring\n\n\ndef mutate_hoist(parent, n_features, max_depth):\n offspring = parent.copy()\n nodes = offspring.get_all_nodes()\n if len(nodes) <= 1:\n return offspring\n mut_point = random.randint(1, len(nodes) - 1)\n node, par, child_idx = nodes[mut_point]\n \n sub_nodes = node.get_all_nodes()\n if len(sub_nodes) <= 1:\n return offspring\n sub_mut_point = random.randint(1, len(sub_nodes) - 1)\n new_subtree = sub_nodes[sub_mut_point][0].copy()\n \n if par is not None:\n par.children[child_idx] = new_subtree\n else:\n offspring = new_subtree\n return offspring\n\n\ndef mutate_collapse(parent, n_features, max_depth):\n offspring = parent.copy()\n nodes = offspring.get_all_nodes()\n if len(nodes) <= 1:\n return offspring\n mut_point = random.randint(1, len(nodes) - 1)\n node, par, child_idx = nodes[mut_point]\n \n if par is not None:\n par.children[child_idx] = random_terminal(n_features)\n else:\n offspring = random_terminal(n_features)\n return offspring\n\n\ndef mutate_constants(parent, n_features, max_depth):\n offspring = parent.copy()\n nodes = offspring.get_all_nodes()\n for node, _, _ in nodes:\n if node.is_terminal and isinstance(node.value, str) and not node.value.startswith('x'):\n try:\n val = float(node.value)\n val += random.gauss(0, 0.5)\n node.value = str(round(val, 4))\n except ValueError:\n pass\n return offspring\n\n\ndef mutation(parent, n_features, max_depth=17):\n \"\"\"Randomly apply one of several mutation strategies.\"\"\"\n r = random.random()\n if r < 0.3:\n return mutate_subtree(parent, n_features, max_depth)\n elif r < 0.5:\n return mutate_point(parent, n_features, 
max_depth)\n elif r < 0.7:\n return mutate_hoist(parent, n_features, max_depth)\n elif r < 0.8:\n return mutate_collapse(parent, n_features, max_depth)\n else:\n return mutate_constants(parent, n_features, max_depth)\n\n\ndef evolve_one_generation(population, fitnesses, X_train, y_train,\n n_features, pop_size,\n crossover_rate=0.9, mutation_rate=0.05,\n max_depth=17):\n \"\"\"Evolution step with adaptive parsimony, multiple mutators, and random immigrants.\"\"\"\n new_population = []\n \n lengths = np.array([tree.size() for tree in population], dtype=float)\n raw_fit = np.array(fitnesses, dtype=float)\n len_var = float(np.var(lengths))\n if len_var > 1e-15:\n parsimony_coeff = float(np.cov(lengths, raw_fit)[1, 0]) / len_var\n parsimony_coeff = max(parsimony_coeff, 0.0)\n parsimony_coeff = min(parsimony_coeff, 0.005)\n else:\n parsimony_coeff = 0.0\n\n penalized = [f + parsimony_coeff * l for f, l in zip(fitnesses, lengths)]\n\n # Elitism\n sorted_indices = np.argsort(fitnesses)\n elite_count = max(1, pop_size // 100)\n for i in range(elite_count):\n new_population.append(population[sorted_indices[i]].copy())\n \n # Immigrants\n n_random = max(1, pop_size // 50)\n for _ in range(n_random):\n depth = random.randint(2, max_depth - 2) if max_depth > 4 else 3\n new_population.append(generate_tree('grow', depth, n_features))\n \n crossover_p = 0.6\n mutation_p = 0.4\n \n while len(new_population) < pop_size:\n r = random.random()\n if r < crossover_p:\n parents = selection(population, penalized, 2)\n child = crossover(parents[0], parents[1], n_features, max_depth)\n elif r < crossover_p + mutation_p:\n parents = selection(population, penalized, 1)\n child = mutation(parents[0], n_features, max_depth)\n else:\n parents = selection(population, penalized, 1)\n child = parents[0]\n new_population.append(child)\n\n return new_population[:pop_size]",
5 "start_line": 228,
6 "end_line": 306
7}

OK: Replaced lines 228..306 with 193 line(s) in gplearn/custom_sr.py. Editable range: 228–420.
[Current file: gplearn/custom_sr.py | editable: 228–420 | total: 497 lines]
228 def fitness_function(tree, X, y):
229     """Evaluate fitness of a candidate program. Lower is better."""
230     y_pred = safe_evaluate(tree, X)
... (187 more lines) ...
418         new_population.append(child)
419
420     return new_population[:pop_size]

[Test #1] (2 tests remaining; call submit(n=N) to choose which test result to submit as final)

## Seed 42

### nguyen7 (scripts/nguyen7.sh)

Training metrics (last generations):
TRAIN_METRICS generation=45 best_fitness=0.001051 avg_fitness=2841.410662 best_size=21 train_r2=0.998555
TRAIN_METRICS generation=46 best_fitness=0.001051 avg_fitness=4491.783897 best_size=21 train_r2=0.998555
TRAIN_METRICS generation=47 best_fitness=0.001051 avg_fitness=18.078825 best_size=21 train_r2=0.998555
TRAIN_METRICS generation=48 best_fitness=0.001051 avg_fitness=94.755806 best_size=21 train_r2=0.998555
TRAIN_METRICS generation=49 best_fitness=0.001051 avg_fitness=184298.511555 best_size=21 train_r2=0.998555

Test results (nguyen7):
TEST_METRICS r2=0.997338 rmse=0.059497 train_r2=0.998555 size=21 expression="(x0 sub (sin((cos(exp(-2.6215)) mul x0)) mul ((x0 div (sin(sin(3.75)) sub (log(4.4) div x0))) div x0)))"
R² = 0.997338
RMSE = 0.059497
Expression: (x0 sub (sin((cos(exp(-2.6215)) mul x0)) m
Tool Input
1{
2 "content": "def fitness_function(tree, X, y):\n \"\"\"Evaluate fitness of a candidate program. Lower is better.\"\"\"\n y_pred = safe_evaluate(tree, X)\n return float(np.mean((y - y_pred) ** 2))\n\ndef selection(population, fitnesses, n_select, tournament_size=7):\n \"\"\"Tournament selection on penalized fitnesses.\"\"\"\n selected = []\n pop_size = len(population)\n for _ in range(n_select):\n candidates = random.sample(range(pop_size), min(tournament_size, pop_size))\n best = min(candidates, key=lambda i: fitnesses[i])\n selected.append(population[best].copy())\n return selected\n\ndef crossover(parent1, parent2, n_features, max_depth=17):\n \"\"\"Standard subtree crossover.\"\"\"\n offspring = parent1.copy()\n donor = parent2.copy()\n off_size = offspring.size()\n don_size = donor.size()\n if off_size <= 1 or don_size <= 1:\n return offspring\n\n off_point = random.randint(1, off_size - 1)\n don_point = random.randint(0, don_size - 1)\n\n donor_nodes = donor.get_all_nodes()\n donor_subtree = donor_nodes[don_point][0].copy()\n\n off_nodes = offspring.get_all_nodes()\n node, parent, child_idx = off_nodes[off_point]\n if parent is not None:\n parent.children[child_idx] = donor_subtree\n else:\n offspring = donor_subtree\n\n if offspring.depth() > max_depth:\n return parent1.copy()\n return offspring\n\ndef mutate_subtree(parent, n_features, max_depth):\n offspring = parent.copy()\n tree_size = offspring.size()\n if tree_size <= 1:\n return generate_tree('grow', 3, n_features)\n\n mut_point = random.randint(1, tree_size - 1)\n new_subtree = generate_tree('grow', 3, n_features)\n\n nodes = offspring.get_all_nodes()\n node, par, child_idx = nodes[mut_point]\n if par is not None:\n par.children[child_idx] = new_subtree\n else:\n offspring = new_subtree\n\n if offspring.depth() > max_depth:\n return parent.copy()\n return offspring\n\ndef mutate_point(parent, n_features, max_depth):\n offspring = parent.copy()\n nodes = offspring.get_all_nodes()\n for node, _, _ in 
nodes:\n if random.random() < 0.1:\n if node.is_terminal:\n if random.random() < 0.5:\n node.value = f'x{random.randint(0, n_features - 1)}'\n else:\n if random.random() < 0.2:\n node.value = str(float(random.randint(-5, 5)))\n else:\n node.value = str(round(random.uniform(-5.0, 5.0), 2))\n else:\n arity = len(node.children)\n valid_ops = [op for op, n_args in OPERATORS.items() if n_args == arity]\n if valid_ops:\n node.value = random.choice(valid_ops)\n return offspring\n\ndef mutate_hoist(parent, n_features, max_depth):\n offspring = parent.copy()\n nodes = offspring.get_all_nodes()\n if len(nodes) <= 1:\n return offspring\n mut_point = random.randint(1, len(nodes) - 1)\n node, par, child_idx = nodes[mut_point]\n \n sub_nodes = node.get_all_nodes()\n if len(sub_nodes) <= 1:\n return offspring\n sub_mut_point = random.randint(1, len(sub_nodes) - 1)\n new_subtree = sub_nodes[sub_mut_point][0].copy()\n \n if par is not None:\n par.children[child_idx] = new_subtree\n else:\n offspring = new_subtree\n return offspring\n\ndef mutation(parent, n_features, max_depth=17):\n r = random.random()\n if r < 0.4:\n return mutate_subtree(parent, n_features, max_depth)\n elif r < 0.8:\n return mutate_point(parent, n_features, max_depth)\n else:\n return mutate_hoist(parent, n_features, max_depth)\n\ndef evolve_one_generation(population, fitnesses, X_train, y_train,\n n_features, pop_size,\n crossover_rate=0.9, mutation_rate=0.05,\n max_depth=17):\n new_population = []\n\n lengths = np.array([tree.size() for tree in population], dtype=float)\n raw_fit = np.array(fitnesses, dtype=float)\n len_var = float(np.var(lengths))\n if len_var > 1e-15:\n parsimony_coeff = float(np.cov(lengths, raw_fit)[1, 0]) / len_var\n parsimony_coeff = max(parsimony_coeff, 0.0)\n parsimony_coeff = min(parsimony_coeff, 0.001)\n else:\n parsimony_coeff = 0.0\n\n penalized = [f + parsimony_coeff * l for f, l in zip(fitnesses, lengths)]\n\n # Elitism\n sorted_indices = np.argsort(fitnesses)\n elite_count = 
3\n for i in range(elite_count):\n new_population.append(population[sorted_indices[i]].copy())\n\n # Random Immigrants\n new_population.append(generate_tree('grow', max(2, max_depth - 1), n_features))\n new_population.append(generate_tree('full', max(2, max_depth - 1), n_features))\n\n crossover_p = 0.80\n mutation_p = 0.15\n\n while len(new_population) < pop_size:\n r = random.random()\n if r < crossover_p:\n parents = selection(population, penalized, 2)\n child = crossover(parents[0], parents[1], n_features, max_depth)\n elif r < crossover_p + mutation_p:\n parents = selection(population, penalized, 1)\n child = mutation(parents[0], n_features, max_depth)\n else:\n parents = selection(population, penalized, 1)\n child = parents[0]\n new_population.append(child)\n\n return new_population[:pop_size]",
3 "end_line": 420,
4 "op": "replace",
5 "filename": "gplearn/custom_sr.py",
6 "start_line": 228
7}

OK: Replaced lines 228..420 with 155 line(s) in gplearn/custom_sr.py. Editable range: 228–382.
[Current file: gplearn/custom_sr.py | editable: 228–382 | total: 459 lines]
228 def fitness_function(tree, X, y):
229     """Evaluate fitness of a candidate program. Lower is better."""
230     y_pred = safe_evaluate(tree, X)
... (149 more lines) ...
380         new_population.append(child)
381
382     return new_population[:pop_size]

[Test #2] (1 test remaining; call submit(n=N) to choose which test result to submit as final)

### nguyen7 (scripts/nguyen7.sh)

Training metrics (last generations):
TRAIN_METRICS generation=45 best_fitness=0.000214 avg_fitness=125922.461933 best_size=16 train_r2=0.999705
TRAIN_METRICS generation=46 best_fitness=0.000214 avg_fitness=2723.619864 best_size=16 train_r2=0.999705
TRAIN_METRICS generation=47 best_fitness=0.000214 avg_fitness=65.147968 best_size=16 train_r2=0.999705
TRAIN_METRICS generation=48 best_fitness=0.000214 avg_fitness=8.469239 best_size=16 train_r2=0.999705
TRAIN_METRICS generation=49 best_fitness=0.000214 avg_fitness=4794.705884 best_size=16 train_r2=0.999705

Test results (nguyen7):
TEST_METRICS r2=0.999392 rmse=0.028443 train_r2=0.999705 size=16 expression="(x0 sub (-1.61 mul (exp(-1.9) mul (x0 add (sin((x0 sub 0.2)) mul sin(x0))))))"
R² = 0.999392
RMSE = 0.028443
Expression: (x0 sub (-1.61 mul (exp(-1.9) mul (x0 add (sin((x0 sub 0.2)) mul sin(x0))))))

###
Tool Input
1{
2 "start_line": 228,
3 "op": "replace",
4 "content": "def fitness_function(tree, X, y):\n \"\"\"Evaluate fitness of a candidate program. Lower is better.\"\"\"\n y_pred = safe_evaluate(tree, X)\n return float(np.mean((y - y_pred) ** 2))\n\ndef selection(population, penalized_scores, n_select, tournament_size=7):\n \"\"\"Tournament selection on penalized scores.\"\"\"\n selected = []\n pop_size = len(population)\n for _ in range(n_select):\n candidates = random.sample(range(pop_size), min(tournament_size, pop_size))\n best = min(candidates, key=lambda i: penalized_scores[i])\n selected.append(population[best].copy())\n return selected\n\ndef crossover(parent1, parent2, n_features, max_depth=17):\n \"\"\"Standard subtree crossover.\"\"\"\n offspring = parent1.copy()\n donor = parent2.copy()\n off_size = offspring.size()\n don_size = donor.size()\n if off_size <= 1 or don_size <= 1:\n return offspring\n\n off_point = random.randint(1, off_size - 1)\n don_point = random.randint(0, don_size - 1)\n\n donor_nodes = donor.get_all_nodes()\n donor_subtree = donor_nodes[don_point][0].copy()\n\n off_nodes = offspring.get_all_nodes()\n node, parent, child_idx = off_nodes[off_point]\n if parent is not None:\n parent.children[child_idx] = donor_subtree\n else:\n offspring = donor_subtree\n\n if offspring.depth() > max_depth:\n return parent1.copy()\n return offspring\n\ndef mutation(parent, n_features, max_depth=17):\n \"\"\"Standard subtree mutation.\"\"\"\n offspring = parent.copy()\n tree_size = offspring.size()\n if tree_size <= 1:\n return generate_tree('grow', 3, n_features)\n\n mut_point = random.randint(1, tree_size - 1)\n new_subtree = generate_tree('grow', 3, n_features)\n\n nodes = offspring.get_all_nodes()\n node, par, child_idx = nodes[mut_point]\n if par is not None:\n par.children[child_idx] = new_subtree\n else:\n offspring = new_subtree\n\n if offspring.depth() > max_depth:\n return parent.copy()\n return offspring\n\ndef evolve_one_generation(population, fitnesses, X_train, y_train,\n n_features, 
pop_size,\n crossover_rate=0.9, mutation_rate=0.05,\n max_depth=17):\n \"\"\"Rank-based Parsimony Pressure GP generation.\n \n Replaces scale-variant MSE covariance parsimony with an elegant \n Rank-based penalty. This resolves issues with massive MSE outliers\n destroying the covariance metric, naturally balancing performance\n improvements against size increases.\n \"\"\"\n new_population = []\n \n ranks = np.empty(pop_size, dtype=float)\n sorted_idx = np.argsort(fitnesses)\n for rank, idx in enumerate(sorted_idx):\n ranks[idx] = float(rank)\n \n lengths = np.array([tree.size() for tree in population], dtype=float)\n \n # 1.0 means an excess node requires beating 1 extra individual\n # in the population to be considered of equal value.\n penalty_coef = 1.0\n penalized_scores = ranks + penalty_coef * lengths\n\n # Elitism uses raw fitness to preserve the absolute best fit\n elite_idx = int(np.argmin(fitnesses))\n new_population.append(population[elite_idx].copy())\n\n while len(new_population) < pop_size:\n r = random.random()\n if r < crossover_rate:\n parents = selection(population, penalized_scores, 2)\n child = crossover(parents[0], parents[1], n_features, max_depth)\n elif r < crossover_rate + mutation_rate:\n parents = selection(population, penalized_scores, 1)\n child = mutation(parents[0], n_features, max_depth)\n else:\n parents = selection(population, penalized_scores, 1)\n child = parents[0]\n new_population.append(child)\n\n return new_population[:pop_size]",
5 "end_line": 382,
6 "filename": "gplearn/custom_sr.py"
7}

OK: Replaced lines 228..382 with 105 line(s) in gplearn/custom_sr.py. Editable range: 228–332.
[Current file: gplearn/custom_sr.py | editable: 228–332 | total: 409 lines]
228 def fitness_function(tree, X, y):
229     """Evaluate fitness of a candidate program. Lower is better."""
230     y_pred = safe_evaluate(tree, X)
... (99 more lines) ...
330         new_population.append(child)
331
332     return new_population[:pop_size]

[Test #3] (0 tests remaining; call submit(n=N) to choose which test result to submit as final)

[NOTE] This was your last test. You MUST now call submit(n=X) to choose which test result to submit as your final answer.

## Seed 42

### nguyen7 (scripts/nguyen7.sh)

Training metrics (last generations):
TRAIN_METRICS generation=45 best_fitness=0.004400 avg_fitness=681332.172087 best_size=6 train_r2=0.993954
TRAIN_METRICS generation=46 best_fitness=0.004400 avg_fitness=1.082981 best_size=6 train_r2=0.993954
TRAIN_METRICS generation=47 best_fitness=0.004400 avg_fitness=3.944345 best_size=6 train_r2=0.993954
TRAIN_METRICS generation=48 best_fitness=0.004400 avg_fitness=348.396990 best_size=6 train_r2=0.993954
TRAIN_METRICS generation=49 best_fitness=0.004400 avg_fitness=1.122496 best_size=6 train_r2=0.993954

Test results (nguyen7):
TEST_METRICS r2=0.977326 rmse=0.173633 train_r2=0.993954 size=6 expression="(x0 sub log(cos(sin(x0))))"
R² = 0.977326
RMSE = 0.173633
Expression: (x0 sub
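The rank-based parsimony scoring this attempt switched to can be sketched in isolation. The helper name and toy numbers below are illustrative only; a stable sort keeps tie-breaking deterministic:

```python
import numpy as np

def rank_parsimony_scores(fitnesses, sizes, penalty_coef=0.5):
    """Selection score = fitness rank (0 = best) + penalty_coef * tree size.
    Ranking makes the size penalty immune to huge MSE outliers, which can
    dominate a covariance-based parsimony coefficient estimate."""
    fitnesses = np.asarray(fitnesses, dtype=float)
    ranks = np.empty(len(fitnesses))
    ranks[np.argsort(fitnesses, kind='stable')] = np.arange(len(fitnesses), dtype=float)
    return ranks + penalty_coef * np.asarray(sizes, dtype=float)

# Two equally fit individuals: the smaller tree (index 0) gets the better score,
# and the outlier at index 1 cannot distort anyone else's penalty.
fit = [0.01, 1e9, 0.02, 0.01]
size = [5, 3, 7, 25]
print(rank_parsimony_scores(fit, size))
```

Tournament selection then minimizes this score, exactly as `selection` does over `penalized_scores` in the submitted version.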
Tool Input
1{
2 "content": "def fitness_function(tree, X, y):\n \"\"\"Evaluate fitness of a candidate program. Lower is better.\"\"\"\n y_pred = safe_evaluate(tree, X)\n return float(np.mean((y - y_pred) ** 2))\n\ndef selection(population, penalized_scores, n_select, tournament_size=7):\n \"\"\"Tournament selection on penalized scores.\"\"\"\n selected = []\n pop_size = len(population)\n for _ in range(n_select):\n candidates = random.sample(range(pop_size), min(tournament_size, pop_size))\n best = min(candidates, key=lambda i: penalized_scores[i])\n selected.append(population[best].copy())\n return selected\n\ndef crossover(parent1, parent2, n_features, max_depth=17):\n \"\"\"Standard subtree crossover.\"\"\"\n offspring = parent1.copy()\n donor = parent2.copy()\n off_size = offspring.size()\n don_size = donor.size()\n if off_size <= 1 or don_size <= 1:\n return offspring\n\n off_point = random.randint(1, off_size - 1)\n don_point = random.randint(0, don_size - 1)\n\n donor_nodes = donor.get_all_nodes()\n donor_subtree = donor_nodes[don_point][0].copy()\n\n off_nodes = offspring.get_all_nodes()\n node, parent, child_idx = off_nodes[off_point]\n if parent is not None:\n parent.children[child_idx] = donor_subtree\n else:\n offspring = donor_subtree\n\n if offspring.depth() > max_depth:\n return parent1.copy()\n return offspring\n\ndef mutation(parent, n_features, max_depth=17):\n \"\"\"Standard subtree mutation.\"\"\"\n offspring = parent.copy()\n tree_size = offspring.size()\n if tree_size <= 1:\n return generate_tree('grow', 3, n_features)\n\n mut_point = random.randint(1, tree_size - 1)\n new_subtree = generate_tree('grow', 3, n_features)\n\n nodes = offspring.get_all_nodes()\n node, par, child_idx = nodes[mut_point]\n if par is not None:\n par.children[child_idx] = new_subtree\n else:\n offspring = new_subtree\n\n if offspring.depth() > max_depth:\n return parent.copy()\n return offspring\n\ndef evolve_one_generation(population, fitnesses, X_train, y_train,\n n_features, 
pop_size,\n crossover_rate=0.9, mutation_rate=0.05,\n max_depth=17):\n \"\"\"Rank-based Parsimony Pressure GP generation.\n \n Replaces scale-variant MSE covariance parsimony with an elegant \n Rank-based penalty. This resolves issues with massive MSE outliers\n destroying the covariance metric, naturally balancing performance\n improvements against size increases.\n \"\"\"\n new_population = []\n \n ranks = np.empty(pop_size, dtype=float)\n sorted_idx = np.argsort(fitnesses)\n for rank, idx in enumerate(sorted_idx):\n ranks[idx] = float(rank)\n \n lengths = np.array([tree.size() for tree in population], dtype=float)\n \n # 0.5 means a 2-node increase requires beating 1 extra individual\n # in the population's fitness rank to be considered equal.\n penalty_coef = 0.5\n penalized_scores = ranks + penalty_coef * lengths\n\n # Elitism uses raw fitness to preserve the absolute best fit\n elite_count = 2\n for i in range(min(elite_count, pop_size)):\n new_population.append(population[sorted_idx[i]].copy())\n\n while len(new_population) < pop_size:\n r = random.random()\n if r < crossover_rate:\n parents = selection(population, penalized_scores, 2)\n child = crossover(parents[0], parents[1], n_features, max_depth)\n elif r < crossover_rate + mutation_rate:\n parents = selection(population, penalized_scores, 1)\n child = mutation(parents[0], n_features, max_depth)\n else:\n parents = selection(population, penalized_scores, 1)\n child = parents[0]\n new_population.append(child)\n\n return new_population[:pop_size]",
3 "filename": "gplearn/custom_sr.py",
4 "start_line": 228,
5 "op": "replace",
6 "end_line": 332
7}

OK: Replaced lines 228..332 with 106 line(s) in gplearn/custom_sr.py. Editable range: 228–333.
[Current file: gplearn/custom_sr.py | editable: 228–333 | total: 410 lines]
228 def fitness_function(tree, X, y):
229     """Evaluate fitness of a candidate program. Lower is better."""
230     y_pred = safe_evaluate(tree, X)
... (100 more lines) ...
331         new_population.append(child)
332
333     return new_population[:pop_size]

ERROR: Test budget exhausted (3/3). You MUST call submit(n=N) to choose which test result to submit as final.
Tool Input
1{
2 "n": 3
3}

[submit] Submitting result from test #3 as final.

## Seed 42

### nguyen7 (scripts/nguyen7.sh)

Training metrics (last generations):
TRAIN_METRICS generation=45 best_fitness=0.004400 avg_fitness=681332.172087 best_size=6 train_r2=0.993954
TRAIN_METRICS generation=46 best_fitness=0.004400 avg_fitness=1.082981 best_size=6 train_r2=0.993954
TRAIN_METRICS generation=47 best_fitness=0.004400 avg_fitness=3.944345 best_size=6 train_r2=0.993954
TRAIN_METRICS generation=48 best_fitness=0.004400 avg_fitness=348.396990 best_size=6 train_r2=0.993954
TRAIN_METRICS generation=49 best_fitness=0.004400 avg_fitness=1.122496 best_size=6 train_r2=0.993954

Test results (nguyen7):
TEST_METRICS r2=0.977326 rmse=0.173633 train_r2=0.993954 size=6 expression="(x0 sub log(cos(sin(x0))))"
R² = 0.977326
RMSE = 0.173633
Expression: (x0 sub log(cos(sin(x0))))

### nguyen10 (scripts/nguyen10.sh)

Training metrics (last generations):
TRAIN_METRICS generation=45 best_fitness=0.000000 avg_fitness=17.754313 best