Agent Conversation: anthropic/claude-opus-4.6
Task: optimization-pac-bayes-bound
Task: PAC-Bayes Generalization Bound Optimization
Research Question
Design a tighter PAC-Bayes generalization bound by optimizing the bound formulation, prior/posterior parameterization, and KL divergence estimation for stochastic neural networks.
Background
PAC-Bayes theory provides non-vacuous generalization bounds for stochastic classifiers. Given a prior distribution P over hypotheses (chosen before seeing data) and a posterior Q (learned from data), PAC-Bayes bounds certify that, with probability at least 1-delta over the draw of the training sample, the true risk of a stochastic classifier drawn from Q is bounded by a function of its empirical risk and KL(Q||P).
The key components of a PAC-Bayes bound are:
- Empirical risk: estimated loss of the stochastic predictor on training data
- KL divergence: KL(Q||P) measuring complexity of the posterior relative to the prior
- Bound formula: how these terms combine to yield the final certificate
Standard bounds include:
- McAllester/Maurer: risk + sqrt(KL_term / (2n)) -- simple but loose
- Catoni/Lambda: risk/(1-lam/2) + KL_term/(n*lam*(1-lam/2)) -- tighter with tuned lambda
- Quadratic: (sqrt(risk + KL_term) + sqrt(KL_term))^2 -- better at low risk
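The three formulas above can be compared on concrete numbers. A minimal sketch, assuming the complexity term is C = KL + log(2*sqrt(n)/delta) throughout and that the quadratic bound additionally divides C by 2n (a common convention; the task's exact definition of KL_term may differ). All numeric values are illustrative.

```python
import math

# Hedged sketch of the three standard PAC-Bayes bounds listed above.
# Assumption: complexity term C = KL + log(2*sqrt(n)/delta); the
# quadratic bound uses C/(2n) inside the square roots.

def complexity(kl, n, delta):
    return kl + math.log(2 * math.sqrt(n) / delta)

def mcallester_bound(risk, kl, n, delta):
    # risk + sqrt(C / (2n)) -- simple but loose
    return risk + math.sqrt(complexity(kl, n, delta) / (2 * n))

def catoni_bound(risk, kl, n, delta, lam=0.5):
    # risk/(1-lam/2) + C/(n*lam*(1-lam/2)) -- tighter with tuned lambda
    c = complexity(kl, n, delta)
    return risk / (1 - lam / 2) + c / (n * lam * (1 - lam / 2))

def quadratic_bound(risk, kl, n, delta):
    # (sqrt(risk + C/(2n)) + sqrt(C/(2n)))^2 -- better at low risk
    c = complexity(kl, n, delta) / (2 * n)
    return (math.sqrt(risk + c) + math.sqrt(c)) ** 2
```

At low empirical risk and moderate KL, the quadratic and Catoni forms typically come out well below the square-root McAllester form, which is why the choice of bound formula matters for tightness.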
The bound can be further tightened through:
- Optimizing the bound functional form (beyond classical inequalities)
- Better training objectives that minimize the bound directly
- Improved risk certificate evaluation (e.g., PAC-Bayes-kl inversion)
- Data-dependent prior construction
- Tighter KL estimation or alternative divergence measures
What to Implement
Implement the BoundOptimizer class in custom_pac_bayes.py. You must implement:
- compute_bound(empirical_risk, kl, n, delta): the PAC-Bayes bound formula
- train_step(model, data, target, device, n_bound, delta): the training objective
- compute_risk_certificate(model, bound_loader, device, delta, mc_samples): final certificate evaluation
Interface
- model(x, sample=True/False): stochastic forward pass (sample=True) or posterior mean (sample=False)
- get_total_kl(model): sum of KL divergences across all probabilistic layers
- inv_kl(q, c): binary KL inversion -- find p such that KL(Ber(q)||Ber(p)) = c
- compute_01_risk(model, loader, device, mc_samples): MC estimate of the 0-1 risk
- Available losses: F.nll_loss, F.cross_entropy on log_softmax outputs
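The inv_kl helper in the interface is typically implemented by bisection, since kl(Ber(q)||Ber(p)) is increasing in p for p >= q. A minimal sketch under that assumption (the task's actual helper may differ in details such as tolerance):

```python
import math

def binary_kl(q, p):
    # KL divergence between Bernoulli(q) and Bernoulli(p), clamped away
    # from 0 and 1 to avoid log(0).
    eps = 1e-12
    q = min(max(q, eps), 1 - eps)
    p = min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def inv_kl(q, c, tol=1e-9):
    # Find the largest p >= q with kl(Ber(q)||Ber(p)) <= c by bisection.
    # binary_kl(q, .) is 0 at p = q and increasing on [q, 1), so the
    # invariant binary_kl(q, lo) <= c < binary_kl(q, hi) is maintained.
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_kl(q, mid) > c:
            hi = mid
        else:
            lo = mid
    return lo
```

This inversion is what turns an empirical 0-1 risk plus a KL budget into the final certificate value.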
Evaluation
The bound optimizer is tested on three settings:
- MNIST-FCN: 4-layer fully connected network (784-600-600-600-10) on MNIST
- MNIST-CNN: 4-layer CNN (2 conv + 2 fc) on MNIST
- FashionMNIST-CNN: Same CNN architecture on FashionMNIST
Primary metric: risk_certificate (0-1 loss PAC-Bayes bound) -- lower is better (tighter bound).
Training uses data-dependent priors: 50% of training data trains a deterministic prior, 50% evaluates the bound.
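The 50/50 prior/bound protocol amounts to a disjoint index split: one half trains the deterministic prior, the other half is the held-out set on which the bound is evaluated. A plain-Python sketch (function name and seed handling are hypothetical, not part of the task interface):

```python
import random

def split_prior_bound(n_train, seed=0):
    # Shuffle indices deterministically, then give half to prior
    # training and the disjoint other half to bound evaluation.
    idx = list(range(n_train))
    random.Random(seed).shuffle(idx)
    half = n_train // 2
    return idx[:half], idx[half:]
```

Disjointness matters: the prior must not see the bound-evaluation data, otherwise the PAC-Bayes guarantee no longer holds.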
Hints
- The PAC-Bayes-kl bound (Langford-Seeger) is often tighter than closed-form alternatives for certificate evaluation
- The training objective need not be the sa
Tool Input
1{
2 "op": "replace",
3 "filename": "PBB/custom_pac_bayes.py",
4 "start_line": 460,
5 "end_line": 604,
6 "content": "class BoundOptimizer:\n \"\"\"Adaptive Catoni PAC-Bayes bound with analytically optimal lambda\n and multi-sample gradient estimation.\n\n Novel contribution: At each training step, we compute the closed-form\n optimal lambda that minimizes the Catoni bound for the current empirical\n risk and KL divergence:\n\n B(lambda) = R/(1-lambda/2) + C/(n*lambda*(1-lambda/2))\n\n The unique minimizer satisfies: 2Rn*t^2 + 2C*t - C = 0 (t = lambda/2)\n giving lambda* = (-C + sqrt(C^2 + 2RnC)) / (Rn).\n\n This avoids the instability of learned-lambda approaches (which require\n a separate optimizer and careful tuning) while ensuring the tightest\n Catoni bound is always used as the training objective.\n\n Additionally, we use multi-sample forward passes (2 MC samples) during\n training for variance-reduced gradient estimation of the expected loss\n under the posterior distribution.\n\n The certificate uses PAC-Bayes-kl inversion (Langford-Seeger), which is\n the tightest known bound for given empirical risk and KL divergence.\n \"\"\"\n\n def __init__(self, learning_rate=0.001, momentum=0.95, prior_sigma=0.1,\n pmin=1e-5):\n self.learning_rate = learning_rate\n self.momentum = momentum\n self.prior_sigma = prior_sigma\n self.pmin = pmin\n self.n_mc_train = 2 # MC samples per training step\n\n def _optimal_lambda(self, R, C, n):\n \"\"\"Closed-form optimal lambda for the Catoni bound.\n\n Minimizes B(lam) = R/(1-lam/2) + C/(n*lam*(1-lam/2))\n by solving the first-order condition: 2Rn*t^2 + 2C*t - C = 0\n where t = lam/2.\n \"\"\"\n R = max(float(R), 1e-10)\n C = max(float(C), 1e-10)\n Rn = R * n\n disc = C * C + 2.0 * Rn * C\n t_star = (-C + math.sqrt(disc)) / (2.0 * Rn)\n lam_star = 2.0 * t_star\n return max(0.01, min(1.99, lam_star))\n\n def compute_bound(self, empirical_risk, kl, n, delta):\n \"\"\"Catoni bound with analytically optimal lambda.\n\n Computes the tightest Catoni-style PAC-Bayes bound by finding\n the optimal lambda in closed form for the current 
risk and KL.\n \"\"\"\n D = math.log(2.0 * math.sqrt(n) / delta)\n\n # Extract scalar values for lambda computation (no gradient needed)\n R_val = empirical_risk.item() if isinstance(\n empirical_risk, torch.Tensor) else float(empirical_risk)\n kl_val = kl.item() if isinstance(kl, torch.Tensor) else float(kl)\n C_val = kl_val + D\n\n lam = self._optimal_lambda(R_val, C_val, n)\n\n # Compute bound with gradient flow through empirical_risk and kl\n C = kl + D\n fac = 1.0 / (1.0 - lam / 2.0)\n bound = empirical_risk * fac + C * fac / (n * lam)\n return bound\n\n def train_step(self, model, data, target, device, n_bound, delta):\n \"\"\"Multi-sample training with adaptive Catoni bound.\n\n Averages NLL over multiple MC forward passes for reduced gradient\n variance in the stochastic posterior optimization, while using the\n analytically optimal lambda for the tightest Catoni bound.\n \"\"\"\n total_nll = 0.0\n for _ in range(self.n_mc_train):\n output = model(data, sample=True)\n log_probs = F.log_softmax(output, dim=1)\n log_probs = torch.clamp(log_probs, min=math.log(self.pmin))\n total_nll = total_nll + F.nll_loss(log_probs, target)\n\n nll = total_nll / self.n_mc_train\n kl = get_total_kl(model) # Analytic KL, same for all MC samples\n return self.compute_bound(nll, kl, n_bound, delta)\n\n def compute_risk_certificate(self, model, bound_loader, device,\n delta=0.025, mc_samples=1000):\n \"\"\"PAC-Bayes-kl certificate (Langford-Seeger, tightest known).\n\n Uses binary KL inversion for the tightest possible certificate:\n find p s.t. kl(Ber(emp_risk) || Ber(p)) = (KL + D) / n\n \"\"\"\n model.eval()\n n_bound = len(bound_loader.dataset)\n\n # 1. Empirical 0-1 risk via MC majority voting\n emp_risk_01 = compute_01_risk(model, bound_loader, device,\n mc_samples=mc_samples)\n\n # 2. KL divergence (analytic, exact)\n with torch.no_grad():\n dummy_data = next(iter(bound_loader))[0][:1].to(device)\n model(dummy_data, sample=True)\n kl = get_total_kl(model).item()\n\n # 3. 
PAC-Bayes-kl inversion for tightest certificate\n D = math.log(2.0 * math.sqrt(n_bound) / delta)\n c = (kl + D) / n_bound\n risk_cert_01 = inv_kl(emp_risk_01, c)\n\n # 4. NLL-based metrics for diagnostics\n total_nll = 0.0\n total_samples = 0\n with torch.no_grad():\n for data, target in bound_loader:\n data, target = data.to(device), target.to(device)\n output = model(data, sample=True)\n log_probs = F.log_softmax(output, dim=1)\n log_probs = torch.clamp(log_probs, min=math.log(self.pmin))\n nll = F.nll_loss(log_probs, target, reduction=\"sum\")\n total_nll += nll.item()\n total_samples += target.size(0)\n emp_nll = total_nll / total_samples\n\n emp_nll_t = torch.tensor(emp_nll)\n kl_t = torch.tensor(kl)\n ce_bound = self.compute_bound(emp_nll_t, kl_t, n_bound, delta).item()\n\n metrics = {\n \"empirical_01_risk\": emp_risk_01,\n \"empirical_nll\": emp_nll,\n \"kl_divergence\": kl,\n \"ce_bound\": ce_bound,\n }\n\n return risk_cert_01, metrics"
7}
OK: Replaced lines 460..604 with 139 line(s) in PBB/custom_pac_bayes.py. Editable range: 460–598. [Current file: PBB/custom_pac_bayes.py | editable: 460–598 | total: 788 lines]
457| # ================================================================
458|
459|
460 class BoundOptimizer:
461     """Adaptive Catoni PAC-Bayes bound with analytically optimal lambda
462     and multi-sample gradient estimation.
463
464     Novel contribution: At each training step, we compute the closed-form
465     optimal lambda that minimizes the Catoni bound for the current empirical
466     risk and KL divergence:
467
468         B(lambda) = R/(1-lambda/2) + C/(n*lambda*(1-lambda/2))
469
470     The unique minimizer satisfies: 2Rn*t^2 + 2C*t - C = 0  (t = lambda/2)
471     giving lambda* = (-C + sqrt(C^2 + 2RnC)) / (Rn).
472
473     This avoids the instability of learned-lambda approaches (which require
474     a separate optimizer
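The closed-form minimizer in the docstring can be sanity-checked numerically: lam* should yield a Catoni bound no larger than nearby lambda values. A standalone sketch with illustrative values of R, C, and n:

```python
import math

# Numerical check of the closed-form optimum from the docstring:
# B(lam) = R/(1-lam/2) + C/(n*lam*(1-lam/2)) is minimized at
# lam* = (-C + sqrt(C^2 + 2*R*n*C)) / (R*n), derived from the
# first-order condition 2Rn*t^2 + 2C*t - C = 0 with t = lam/2.

def catoni_objective(R, C, n, lam):
    return R / (1 - lam / 2) + C / (n * lam * (1 - lam / 2))

def optimal_lambda(R, C, n):
    Rn = R * n
    return (-C + math.sqrt(C * C + 2.0 * Rn * C)) / Rn

# Illustrative values roughly matching the training logs (not exact)
R, C, n = 0.16, 115.0, 30000
lam_star = optimal_lambda(R, C, n)
```

Because lambda is recomputed from scalar values at each step, no gradient flows through it, so the objective stays differentiable in the risk and KL terms alone.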
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)
### mnist-fcn (scripts/mnist_fcn.sh)
[output file not found: /srv/home/bohanlyu/MLS-Bench/logs/optimization-pac-bayes-bound/anthropic_claude-opus-4.6_20260405_234106/20260405_234941/group_1/mnist-fcn_s42.out]
### mnist-cnn (scripts/mnist_cnn.sh)
[output file not found: /srv/home/bohanlyu/MLS-Bench/logs/optimization-pac-bayes-bound/anthropic_claude-opus-4.6_20260405_234106/20260405_234941/group_1/mnist-cnn_s42.out]
### fmnist-cnn (scripts/fmnist_cnn.sh)
[output file not found: /srv/home/bohanlyu/MLS-Bench/logs/optimization-pac-bayes-bound/anthropic_claude-opus-4.6_20260405_234106/20260405_234941/group_1/fmnist-cnn_s42.out]
Tool Input
1{
2 "op": "replace",
3 "filename": "PBB/custom_pac_bayes.py",
4 "start_line": 460,
5 "end_line": 598,
6 "content": "class BoundOptimizer:\n \"\"\"Adaptive Catoni PAC-Bayes bound with analytically optimal lambda\n and multi-sample gradient estimation.\n\n Novel contribution: closed-form optimal lambda for the Catoni bound\n at each training step, combined with multi-sample MC forward passes\n for variance-reduced gradient estimation. The Catoni bound:\n\n B(lam) = R/(1-lam/2) + C/(n*lam*(1-lam/2))\n\n is minimized at lam* = (-C + sqrt(C^2 + 2RnC)) / (Rn).\n Certificate uses PAC-Bayes-kl inversion (tightest known).\n \"\"\"\n\n def __init__(self, learning_rate=0.001, momentum=0.95, prior_sigma=0.1,\n pmin=1e-5):\n self.learning_rate = learning_rate\n self.momentum = momentum\n self.prior_sigma = prior_sigma\n self.pmin = pmin\n self.n_mc_train = 2\n\n def _optimal_lambda(self, R, C, n):\n \"\"\"Closed-form optimal lambda for Catoni bound.\"\"\"\n R = max(float(R), 1e-10)\n C = max(float(C), 1e-10)\n Rn = R * n\n disc = C * C + 2.0 * Rn * C\n t_star = (-C + math.sqrt(disc)) / (2.0 * Rn)\n lam_star = 2.0 * t_star\n return max(0.01, min(1.99, lam_star))\n\n def compute_bound(self, empirical_risk, kl, n, delta):\n \"\"\"Catoni bound with analytically optimal lambda.\"\"\"\n D = math.log(2.0 * math.sqrt(n) / delta)\n if isinstance(empirical_risk, torch.Tensor):\n R_val = empirical_risk.item()\n else:\n R_val = float(empirical_risk)\n if isinstance(kl, torch.Tensor):\n kl_val = kl.item()\n else:\n kl_val = float(kl)\n\n C_val = kl_val + D\n lam = self._optimal_lambda(R_val, C_val, n)\n\n C_tensor = kl + D\n fac = 1.0 / (1.0 - lam / 2.0)\n bound = empirical_risk * fac + C_tensor * fac / (n * lam)\n return bound\n\n def train_step(self, model, data, target, device, n_bound, delta):\n \"\"\"Multi-sample training with adaptive Catoni bound.\"\"\"\n total_nll = 0.0\n for _ in range(self.n_mc_train):\n output = model(data, sample=True)\n log_probs = F.log_softmax(output, dim=1)\n log_probs = torch.clamp(log_probs, min=math.log(self.pmin))\n total_nll = total_nll + 
F.nll_loss(log_probs, target)\n\n nll = total_nll / self.n_mc_train\n kl = get_total_kl(model)\n bound = self.compute_bound(nll, kl, n_bound, delta)\n return bound\n\n def compute_risk_certificate(self, model, bound_loader, device,\n delta=0.025, mc_samples=1000):\n \"\"\"PAC-Bayes-kl certificate (Langford-Seeger).\"\"\"\n model.eval()\n n_bound = len(bound_loader.dataset)\n\n # 1. Empirical 0-1 risk via MC majority voting\n emp_risk_01 = compute_01_risk(model, bound_loader, device,\n mc_samples=mc_samples)\n\n # 2. NLL-based empirical risk\n total_nll = 0.0\n total_samples = 0\n kl_total = None\n with torch.no_grad():\n for data, target in bound_loader:\n data, target = data.to(device), target.to(device)\n output = model(data, sample=True)\n log_probs = F.log_softmax(output, dim=1)\n log_probs = torch.clamp(log_probs, min=math.log(self.pmin))\n nll = F.nll_loss(log_probs, target, reduction=\"sum\")\n total_nll += nll.item()\n total_samples += target.size(0)\n if kl_total is None:\n kl_total = get_total_kl(model)\n\n emp_nll = total_nll / total_samples\n\n # 3. Get KL from forward pass\n with torch.no_grad():\n dummy_data = next(iter(bound_loader))[0][:1].to(device)\n model(dummy_data, sample=True)\n kl = get_total_kl(model).item()\n\n # 4. PAC-Bayes-kl inversion for 0-1 loss certificate\n c = (kl + math.log(2.0 * math.sqrt(n_bound) / delta)) / n_bound\n risk_cert_01 = inv_kl(emp_risk_01, c)\n\n # 5. Compute CE bound for metrics\n emp_nll_t = torch.tensor(emp_nll)\n kl_t = torch.tensor(kl)\n ce_bound = self.compute_bound(emp_nll_t, kl_t, n_bound, delta).item()\n\n metrics = {\n \"empirical_01_risk\": emp_risk_01,\n \"empirical_nll\": emp_nll,\n \"kl_divergence\": kl,\n \"ce_bound\": ce_bound,\n }\n\n return risk_cert_01, metrics"
7}
OK: Replaced lines 460..598 with 117 line(s) in PBB/custom_pac_bayes.py. Editable range: 460–576. [Current file: PBB/custom_pac_bayes.py | editable: 460–576 | total: 766 lines]
457| # ================================================================
458|
459|
460 class BoundOptimizer:
461     """Adaptive Catoni PAC-Bayes bound with analytically optimal lambda
462     and multi-sample gradient estimation.
463
464     Novel contribution: closed-form optimal lambda for the Catoni bound
465     at each training step, combined with multi-sample MC forward passes
466     for variance-reduced gradient estimation. The Catoni bound:
467
468         B(lam) = R/(1-lam/2) + C/(n*lam*(1-lam/2))
469
470     is minimized at lam* = (-C + sqrt(C^2 + 2RnC)) / (Rn).
471     Certificate uses PAC-Bayes-kl inversion (tightest known).
472     """
473
474     def __init__(self, learning_rate=0.001, momentum=0.95, prior_sigma=0.1,
47
[Test #2] (1 test remaining; call submit(n=2) to submit this result)
### mnist-fcn (scripts/mnist_fcn.sh)
Training progress (last 5 steps):
TRAIN_METRICS prior_epoch=10 loss=0.010002 accuracy=0.9971
TRAIN_METRICS posterior_epoch=5 train_obj=0.162034 kl=84.41
TRAIN_METRICS posterior_epoch=10 train_obj=0.160994 kl=105.15
TRAIN_METRICS posterior_epoch=15 train_obj=0.156592 kl=109.80
TRAIN_METRICS posterior_epoch=20 train_obj=0.160625 kl=113.57
Results (mnist-fcn):
risk_certificate: 0.039442
test_error: 0.021900
kl_divergence: 113.570000
ce_bound: 0.161298
empirical_01_risk: 0.023133
### mnist-cnn (scripts/mnist_cnn.sh)
Training progress (last 5 steps):
TRAIN_METRICS prior_epoch=10 loss=0.004920 accuracy=0.9988
TRAIN_METRICS posterior_epoch=5 train_obj=0.070160 kl=29.82
TRAIN_METRICS posterior_epoch=10 train_obj=0.069118 kl=35.52
TRAIN_METRICS posterior_epoch=15 train_obj=0.067587 kl=37.68
TRAIN_METRICS posterior_epoch=20 train_obj=0.067688 kl=38.20
Results (mnist-cnn):
risk_
Tool Input
1{
2 "op": "replace",
3 "filename": "PBB/custom_pac_bayes.py",
4 "start_line": 460,
5 "end_line": 576,
6 "content": "class BoundOptimizer:\n \"\"\"Adaptive Catoni PAC-Bayes bound with analytically optimal lambda,\n multi-sample gradient estimation, and KL-aware training amplification.\n\n Novel contributions:\n 1. Closed-form optimal lambda for the Catoni bound at each step:\n B(lam) = R/(1-lam/2) + C/(n*lam*(1-lam/2))\n minimized at lam* = (-C + sqrt(C^2 + 2RnC)) / (Rn).\n\n 2. KL amplification during training: the NLL surrogate overestimates\n the 0-1 loss, so the standard Catoni trade-off under-penalizes KL\n relative to what the PAC-Bayes-kl certificate needs. We apply a\n mild KL boost (1.2x) during training to shift the posterior toward\n lower KL, which directly tightens the certificate via inv_kl.\n\n 3. Multi-sample (3) MC forward passes for variance-reduced gradients.\n\n 4. PAC-Bayes-kl inversion (Langford-Seeger) for the tightest certificate.\n \"\"\"\n\n def __init__(self, learning_rate=0.001, momentum=0.95, prior_sigma=0.1,\n pmin=1e-5):\n self.learning_rate = learning_rate\n self.momentum = momentum\n self.prior_sigma = prior_sigma\n self.pmin = pmin\n self.n_mc_train = 3 # MC samples per training step\n self.kl_boost = 1.2 # KL amplification for training\n\n def _optimal_lambda(self, R, C, n):\n \"\"\"Closed-form optimal lambda for Catoni bound.\"\"\"\n R = max(float(R), 1e-10)\n C = max(float(C), 1e-10)\n Rn = R * n\n disc = C * C + 2.0 * Rn * C\n t_star = (-C + math.sqrt(disc)) / (2.0 * Rn)\n lam_star = 2.0 * t_star\n return max(0.01, min(1.99, lam_star))\n\n def compute_bound(self, empirical_risk, kl, n, delta):\n \"\"\"Catoni bound with analytically optimal lambda.\"\"\"\n D = math.log(2.0 * math.sqrt(n) / delta)\n if isinstance(empirical_risk, torch.Tensor):\n R_val = empirical_risk.item()\n else:\n R_val = float(empirical_risk)\n if isinstance(kl, torch.Tensor):\n kl_val = kl.item()\n else:\n kl_val = float(kl)\n\n C_val = kl_val + D\n lam = self._optimal_lambda(R_val, C_val, n)\n\n C_tensor = kl + D\n fac = 1.0 / (1.0 - lam / 2.0)\n bound = 
empirical_risk * fac + C_tensor * fac / (n * lam)\n return bound\n\n def train_step(self, model, data, target, device, n_bound, delta):\n \"\"\"Multi-sample training with adaptive Catoni bound and KL boost.\"\"\"\n total_nll = 0.0\n for _ in range(self.n_mc_train):\n output = model(data, sample=True)\n log_probs = F.log_softmax(output, dim=1)\n log_probs = torch.clamp(log_probs, min=math.log(self.pmin))\n total_nll = total_nll + F.nll_loss(log_probs, target)\n\n nll = total_nll / self.n_mc_train\n kl = get_total_kl(model)\n # Boost KL during training to encourage tighter posteriors\n # This compensates for NLL overestimating 0-1 loss, which makes\n # the standard Catoni trade-off under-penalize KL\n kl_train = kl * self.kl_boost\n bound = self.compute_bound(nll, kl_train, n_bound, delta)\n return bound\n\n def compute_risk_certificate(self, model, bound_loader, device,\n delta=0.025, mc_samples=1000):\n \"\"\"PAC-Bayes-kl certificate (Langford-Seeger).\"\"\"\n model.eval()\n n_bound = len(bound_loader.dataset)\n\n # 1. Empirical 0-1 risk via MC majority voting\n emp_risk_01 = compute_01_risk(model, bound_loader, device,\n mc_samples=mc_samples)\n\n # 2. NLL-based empirical risk\n total_nll = 0.0\n total_samples = 0\n kl_total = None\n with torch.no_grad():\n for data, target in bound_loader:\n data, target = data.to(device), target.to(device)\n output = model(data, sample=True)\n log_probs = F.log_softmax(output, dim=1)\n log_probs = torch.clamp(log_probs, min=math.log(self.pmin))\n nll = F.nll_loss(log_probs, target, reduction=\"sum\")\n total_nll += nll.item()\n total_samples += target.size(0)\n if kl_total is None:\n kl_total = get_total_kl(model)\n\n emp_nll = total_nll / total_samples\n\n # 3. Get KL from forward pass (exact analytic KL)\n with torch.no_grad():\n dummy_data = next(iter(bound_loader))[0][:1].to(device)\n model(dummy_data, sample=True)\n kl = get_total_kl(model).item()\n\n # 4. 
PAC-Bayes-kl inversion for 0-1 loss certificate\n c = (kl + math.log(2.0 * math.sqrt(n_bound) / delta)) / n_bound\n risk_cert_01 = inv_kl(emp_risk_01, c)\n\n # 5. Compute CE bound for metrics (using actual KL, not boosted)\n emp_nll_t = torch.tensor(emp_nll)\n kl_t = torch.tensor(kl)\n ce_bound = self.compute_bound(emp_nll_t, kl_t, n_bound, delta).item()\n\n metrics = {\n \"empirical_01_risk\": emp_risk_01,\n \"empirical_nll\": emp_nll,\n \"kl_divergence\": kl,\n \"ce_bound\": ce_bound,\n }\n\n return risk_cert_01, metrics"
7}
OK: Replaced lines 460..576 with 128 line(s) in PBB/custom_pac_bayes.py. Editable range: 460–587. [Current file: PBB/custom_pac_bayes.py | editable: 460–587 | total: 777 lines]
457| # ================================================================
458|
459|
460 class BoundOptimizer:
461     """Adaptive Catoni PAC-Bayes bound with analytically optimal lambda,
462     multi-sample gradient estimation, and KL-aware training amplification.
463
464     Novel contributions:
465     1. Closed-form optimal lambda for the Catoni bound at each step:
466            B(lam) = R/(1-lam/2) + C/(n*lam*(1-lam/2))
467        minimized at lam* = (-C + sqrt(C^2 + 2RnC)) / (Rn).
468
469     2. KL amplification during training: the NLL surrogate overestimates
470        the 0-1 loss, so the standard Catoni trade-off under-penalizes KL
471        relative to what the PAC-Bayes-kl certificate needs. We apply a
472        mild KL boost (1.2x) durin
[Test #3] (0 tests remaining)
### mnist-fcn (scripts/mnist_fcn.sh)
Training progress (last 5 steps):
TRAIN_METRICS prior_epoch=10 loss=0.010002 accuracy=0.9971
TRAIN_METRICS posterior_epoch=5 train_obj=0.163569 kl=64.13
TRAIN_METRICS posterior_epoch=10 train_obj=0.160103 kl=74.61
TRAIN_METRICS posterior_epoch=15 train_obj=0.163201 kl=73.73
TRAIN_METRICS posterior_epoch=20 train_obj=0.161281 kl=76.23
Results (mnist-fcn):
risk_certificate: 0.036202
test_error: 0.021900
kl_divergence: 76.230000
ce_bound: 0.158567
empirical_01_risk: 0.023000
### mnist-cnn (scripts/mnist_cnn.sh)
Training progress (last 5 steps):
TRAIN_METRICS prior_epoch=10 loss=0.004800 accuracy=0.9989
TRAIN_METRICS posterior_epoch=5 train_obj=0.069410 kl=24.66
TRAIN_METRICS posterior_epoch=10 train_obj=0.070019 kl=28.75
TRAIN_METRICS posterior_epoch=15 train_obj=0.069149 kl=31.13
TRAIN_METRICS posterior_epoch=20 train_obj=0.068295 kl=28.77
Results (mnist-cnn):
risk_certificate: 0.017048
test_error: 0.00980