ai4bio-antibody-binding-scoring
Description
Task: Antibody Binding Affinity Scoring
Research Question
Design a novel scoring function that predicts antibody-antigen binding affinity from pretrained protein language model features. The goal is to develop a scoring head on top of ESM-2 representations whose scores rank variants in the same order as experimental binding measurements across diverse antigen targets.
Background
Antibody engineering requires computational methods to predict how sequence mutations affect binding affinity to a target antigen. A common approach uses pretrained protein language models (PLMs) like ESM-2 to compute pseudo-log-likelihood (PLL) scores as a proxy for binding fitness. However, PLL is a generic sequence fitness metric and does not directly model the antibody-antigen interaction.
Key challenges:
- Mutation sensitivity: The scoring function must detect the effect of single or few amino acid changes in the antibody heavy/light chains on binding.
- Generalization across antigens: The function should work across structurally different antibody-antigen complexes (in the evaluation set, two influenza hemagglutinin H1 complexes and HER2).
- Limited supervision: Experimental binding data is available for training, but datasets are relatively small (hundreds to thousands of variants per antigen).
Existing approaches include:
- Mean PLL: Average token log-probability of the full complex under a masked LM — simple zero-shot baseline.
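- Masked marginal scoring: At each mutated position, compare the log-probability of the mutant residue with that of the wild-type residue (with the position masked), and sum these log-ratios over the mutated positions. This captures mutation-specific effects.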
- Supervised MLP heads: Learn a mapping from PLM embedding differences (mutant vs wild-type) to binding affinity.
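As a rough illustration of the masked-marginal baseline, the sketch below scores a substitution variant from a precomputed table of wild-type masked log-probabilities. It assumes substitution-only variants of the same length as the wild type and a hypothetical [L, 20] array wt_masked_logprobs whose row i holds log p(amino acid | wild type with position i masked). Producing that table from ESM-2 (one masked forward pass per position) is not shown, and the benchmark's own baseline may differ, for example by masking all mutated positions jointly.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical residues; column order is an assumption
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def masked_marginal_score(wt: str, mut: str, wt_masked_logprobs: np.ndarray) -> float:
    """Sum of log p(mutant aa) - log p(wild-type aa) over mutated positions.

    wt_masked_logprobs: hypothetical [L, 20] array; row i holds the log-probs at
    position i when position i of the wild-type sequence is masked.
    """
    if len(wt) != len(mut):
        raise ValueError("substitution-only variants of equal length are assumed")
    score = 0.0
    for i, (w, m) in enumerate(zip(wt, mut)):
        if w != m:  # only mutated positions contribute
            score += wt_masked_logprobs[i, AA_INDEX[m]] - wt_masked_logprobs[i, AA_INDEX[w]]
    return float(score)


# Toy example: uniform log-probs except position 1, where W is favored over the wild-type C.
lp = np.full((3, 20), np.log(1 / 20))
lp[1, AA_INDEX["W"]] = -1.0
lp[1, AA_INDEX["C"]] = -3.0
print(masked_marginal_score("ACD", "AWD", lp))  # 2.0
```

A positive score means the model prefers the mutant residues at those positions. The mean-PLL baseline, by contrast, needs no wild-type reference: it would simply use each variant's average token log-probability as its score.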
What to Implement
Implement the ScoringHead class in custom_abscore.py. You must implement:
- __init__(self, embed_dim): Initialize your scoring model.
- forward(self, heavy_feats, light_feats, antigen_feats, wt_heavy_feats, wt_light_feats) -> Tensor [B]: Return one binding affinity score per variant.
- compute_loss(self, pred_scores, target_scores) -> Tensor: Training loss (can be a no-op for zero-shot heads).
- is_zero_shot property: Return True to skip training, False for supervised training.
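One possible supervised head, sketched below under stated assumptions, combines a zero-shot prior (the change in average token log-probability between mutant and wild-type chains) with an MLP over mutant-minus-wild-type embedding deltas. It trains with a pairwise ranking loss because the evaluation metric is rank-based. The ESMFeatures class here is a local stand-in for the one defined in the FIXED section; the hidden width, dropout, zero initialization, and RankNet-style loss are illustrative choices, not the template's starter head (which the file header describes as a linear projection).

```python
from dataclasses import dataclass
from typing import List

import torch
import torch.nn as nn
import torch.nn.functional as F


@dataclass
class ESMFeatures:
    # Local stand-in with the documented fields; the real class is in the FIXED section.
    embeddings: torch.Tensor     # [B, D] mean-pooled ESM-2 hidden states
    token_logprobs: List[float]  # [B] average per-token log-probability
    sequences: List[str]         # [B] amino acid sequences


class ScoringHead(nn.Module):
    """Hypothetical head: zero-shot log-prob delta plus an MLP over embedding deltas."""

    def __init__(self, embed_dim: int, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.LayerNorm(3 * embed_dim),
            nn.Linear(3 * embed_dim, hidden),
            nn.GELU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, 1),
        )
        # Zero-init the last layer so the untrained head returns exactly the
        # weighted zero-shot log-prob delta.
        nn.init.zeros_(self.mlp[-1].weight)
        nn.init.zeros_(self.mlp[-1].bias)
        self.prior_weight = nn.Parameter(torch.tensor(1.0))  # learnable weight on the prior

    @property
    def is_zero_shot(self) -> bool:
        return False  # ask the FIXED training loop to train this head

    def forward(self, heavy_feats, light_feats, antigen_feats, wt_heavy_feats, wt_light_feats):
        emb = heavy_feats.embeddings
        batch = emb.shape[0]

        def logprob(f: ESMFeatures) -> torch.Tensor:
            return torch.tensor(f.token_logprobs, dtype=emb.dtype, device=emb.device)

        # Mutant-minus-wild-type deltas isolate what the mutations changed;
        # broadcasting also covers wild-type features given as a single row.
        x = torch.cat(
            [
                heavy_feats.embeddings - wt_heavy_feats.embeddings,
                light_feats.embeddings - wt_light_feats.embeddings,
                antigen_feats.embeddings.expand(batch, -1),
            ],
            dim=-1,
        )
        prior = (logprob(heavy_feats) - logprob(wt_heavy_feats)) + (
            logprob(light_feats) - logprob(wt_light_feats)
        )
        return self.prior_weight * prior + self.mlp(x).squeeze(-1)

    def compute_loss(self, pred_scores: torch.Tensor, target_scores: torch.Tensor) -> torch.Tensor:
        # RankNet-style pairwise logistic loss: for every pair with target_i > target_j,
        # penalize softplus(-(pred_i - pred_j)) = -log sigmoid(pred_i - pred_j).
        target_scores = target_scores.to(pred_scores)
        diff_pred = pred_scores[:, None] - pred_scores[None, :]
        diff_true = target_scores[:, None] - target_scores[None, :]
        mask = diff_true > 0
        if not mask.any():  # batch of size 1 or all-equal targets: fall back to MSE
            return F.mse_loss(pred_scores, target_scores)
        return F.softplus(-diff_pred[mask]).mean()
```

Starting from the zero-shot delta means an untrained head already reproduces a PLL-style ordering, and the MLP only has to learn a correction on top of it. The pairwise loss ignores the absolute scale of the scores, which matches a Spearman-based evaluation but will not by itself minimize MSE.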
Input Features (ESMFeatures dataclass)
Each *_feats argument is an ESMFeatures object with:
```python
from dataclasses import dataclass
from typing import List

from torch import Tensor


@dataclass
class ESMFeatures:
    embeddings: Tensor           # [B, 1280] mean-pooled ESM-2 hidden states
    token_logprobs: List[float]  # [B] average per-token log-probability
    sequences: List[str]         # [B] original amino acid sequences
```
The ESM-2 model (facebook/esm2_t33_650M_UR50D, 650M params) is loaded in the FIXED section with both the MLM head (for log-probs) and hidden state output (for embeddings). It is frozen — only the ScoringHead parameters are trainable.
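The two numeric ESMFeatures fields can be illustrated by the sketch below, which pools one forward pass into a mean-pooled embedding and an average per-token log-probability. It is an assumption about what the FIXED section does: it excludes special tokens (CLS/EOS/padding) through hypothetical boolean masks and reads log-probabilities from a single unmasked pass, whereas a strict pseudo-log-likelihood would mask each position in turn (one forward pass per residue).

```python
import torch
import torch.nn.functional as F


def pool_features(hidden, logits, input_ids, attention_mask, special_mask):
    """Pool one forward pass into ESMFeatures-style per-sequence values (a sketch).

    hidden:          [B, L, D] final-layer hidden states
    logits:          [B, L, V] MLM-head logits from the same unmasked pass
    input_ids:       [B, L] token ids
    attention_mask:  [B, L] bool, True for real (non-padding) tokens
    special_mask:    [B, L] bool, True for special tokens such as CLS/EOS
    Returns ([B, D] mean-pooled embeddings, [B] average per-token log-probability).
    """
    keep = attention_mask & ~special_mask              # residue tokens only
    w = keep.to(hidden.dtype).unsqueeze(-1)            # [B, L, 1]
    embeddings = (hidden * w).sum(dim=1) / w.sum(dim=1).clamp(min=1.0)

    # Log-probability the model assigns to the observed token at each position.
    tok_lp = F.log_softmax(logits, dim=-1).gather(-1, input_ids.unsqueeze(-1)).squeeze(-1)
    k = keep.to(tok_lp.dtype)
    avg_logprob = (tok_lp * k).sum(dim=1) / k.sum(dim=1).clamp(min=1.0)
    return embeddings, avg_logprob
```

With Hugging Face transformers, hidden and logits would come from a masked-LM forward pass with output_hidden_states=True (taking the last entry of hidden_states). How the FIXED section actually wires this up is not shown in this excerpt.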
Evaluation
The model is evaluated on three antibody-antigen datasets from AbBiBench:
| Dataset | AbBiBench complex | Antigen |
|---|---|---|
| influenza | 3gbn_h1 | Influenza hemagglutinin H1 |
| sars | 4fqi_h1 | Influenza hemagglutinin H1 (4fqi complex) |
| her2 | 4d5_her2 | HER2 (breast cancer target) |
Metric: Spearman rank correlation between predicted scores and experimental binding affinity measurements. Higher is better.
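For reference, Spearman correlation is the Pearson correlation of the two rank vectors, with tied values given their average rank. The NumPy sketch below computes it from scratch (scipy.stats.spearmanr should give the same value). It also shows that any strictly increasing transform of the predicted scores leaves the metric unchanged, so a scoring head only needs to get the ordering right.

```python
import numpy as np


def average_ranks(x) -> np.ndarray:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1  # extend the run of tied values
        ranks[order[i : j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(pred, target) -> float:
    """Pearson correlation of average ranks; NaN if either input is constant."""
    rp = average_ranks(pred)
    rt = average_ranks(target)
    rp -= rp.mean()
    rt -= rt.mean()
    denom = np.sqrt((rp ** 2).sum() * (rt ** 2).sum())
    return float("nan") if denom == 0 else float((rp * rt).sum() / denom)


pred = np.array([0.3, -1.2, 2.5, 0.9])
target = np.array([1.0, 0.2, 3.1, 0.8])
print(spearman(pred, target), spearman(np.exp(pred), target))  # 0.8 0.8
```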
Split protocol (position-disjoint): The datasets are deep mutational scans (DMS) in which many rows share the same mutated positions. A naive random split leaks positional context across train/val/test and inflates supervised Spearman to 0.9+ even for trivial heads; this does not reflect real generalization. Instead, we use an 80/10/10 split by mutated-position signature on the heavy chain: every mutation at a given position lands entirely in one split, so positions appearing in the test set are never observed during training. The wild-type row is always in train. Under this protocol, supervised heads drop to the range reported by AbBiBench for zero-shot/few-shot methods (~0.2–0.6 on influenza/sars, lower on her2); substantial gains require models that actually generalize across positions rather than memorizing per-position effects.
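A minimal sketch of such a split follows. It assumes substitution-only heavy chains of wild-type length and that the 80/10/10 fractions apply to signature groups rather than rows; the template's actual implementation is not shown here, and the function names are hypothetical. For a single-mutant scan each signature is one position, so the split is position-disjoint. For multi-mutant rows, grouping by the whole signature tuple does not by itself stop a position shared by two different signatures from appearing in two splits.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def mutation_signature(wt: str, mut: str) -> Tuple[int, ...]:
    """Sorted tuple of heavy-chain positions where mut differs from wt (substitutions only)."""
    if len(wt) != len(mut):
        raise ValueError("expected equal-length heavy chains (substitution-only DMS)")
    return tuple(i for i, (a, b) in enumerate(zip(wt, mut)) if a != b)


def position_disjoint_split(
    wt_heavy: str,
    heavy_seqs: Sequence[str],
    fracs: Tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 0,
) -> Dict[str, List[int]]:
    """Assign whole signature groups to train/val/test; the wild type always goes to train."""
    groups: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for idx, seq in enumerate(heavy_seqs):
        groups[mutation_signature(wt_heavy, seq)].append(idx)

    splits: Dict[str, List[int]] = {"train": [], "val": [], "test": []}
    splits["train"].extend(groups.pop((), []))  # empty signature == wild-type row(s)

    sigs = sorted(groups)  # deterministic order before shuffling
    random.Random(seed).shuffle(sigs)
    n_train = round(fracs[0] * len(sigs))
    n_val = round(fracs[1] * len(sigs))  # test receives the remaining groups
    for k, sig in enumerate(sigs):
        name = "train" if k < n_train else ("val" if k < n_train + n_val else "test")
        splits[name].extend(groups[sig])
    return splits
```

Because whole groups move together, every variant sharing a signature (for example, all substitutions at one position in a single-mutant scan) ends up in the same split.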
Editable Region
Lines 150-227 of custom_abscore.py (between EDITABLE SECTION START and EDITABLE SECTION END). The region must contain a ScoringHead class with the specified interface.
Code
```python
"""
Antibody Binding Affinity Scoring — Self-contained template.
Predicts antibody-antigen binding affinity from pretrained protein
language model embeddings. Evaluates via Spearman correlation against
experimental binding measurements on AbBiBench datasets.

Structure:
Lines 1-115: FIXED — Imports, data loading, ESM-2 embedding extraction
Lines 116-172: EDITABLE — ScoringHead class (starter: linear projection)
Lines 173+: FIXED — Training loop, evaluation, CLI
"""
import os
import sys
import json
import argparse
```
Results
| Model | Type | Spearman (influenza) ↑ | MSE (influenza) ↓ | Spearman (sars) ↑ | MSE (sars) ↓ | Spearman (her2) ↑ | MSE (her2) ↓ |
|---|---|---|---|---|---|---|---|
| delta_mlp | baseline | 0.942 | 0.213 | 0.934 | 0.045 | 0.462 | 0.387 |
| esm2_pll | baseline | 0.930 | 0.174 | 0.859 | 0.144 | 0.480 | 0.688 |
| masked_marginal | baseline | 0.938 | 0.202 | 0.940 | 0.031 | 0.466 | 0.333 |