ai4bio-antibody-binding-scoring

AI for Biology, AbBiBench, rigorous codebase

Description

Task: Antibody Binding Affinity Scoring

Research Question

Design a novel scoring function that predicts antibody-antigen binding affinity from pretrained protein language model features. The goal is to develop a scoring head that effectively leverages ESM-2 representations to produce scores that correlate with experimental binding measurements across diverse antigen targets.

Background

Antibody engineering requires computational methods to predict how sequence mutations affect binding affinity to a target antigen. A common approach uses pretrained protein language models (PLMs) like ESM-2 to compute pseudo-log-likelihood (PLL) scores as a proxy for binding fitness. However, PLL is a generic sequence fitness metric and does not directly model the antibody-antigen interaction.

Key challenges:

Mutation sensitivity: The scoring function must detect the effect of single or few amino acid changes in the antibody heavy/light chains on binding.
Generalization across antigens: The function should work across structurally different antigen targets (influenza, SARS-related, HER2).
Limited supervision: Experimental binding data is available for training, but datasets are relatively small (hundreds to thousands of variants per antigen).

Existing approaches include:

Mean PLL: Average token log-probability of the full complex under a masked LM — simple zero-shot baseline.
Masked marginal scoring: Compute log-probability ratios at mutated positions relative to wild-type — captures mutation-specific effects.
Supervised MLP heads: Learn a mapping from PLM embedding differences (mutant vs wild-type) to binding affinity.
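
As an illustration of the masked marginal idea listed above, here is a minimal sketch in plain Python. It assumes a hypothetical per-position table `logprobs[i][aa]` holding the log-probability of residue `aa` at position `i` when that position is masked in the wild-type sequence. The `ESMFeatures` object described in this task does not expose such a table, so the table and both function names are illustrative assumptions rather than part of the template.

```python
import math
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutated_positions(wt: str, mut: str) -> List[int]:
    """Return 0-based indices where the variant differs from wild-type."""
    if len(wt) != len(mut):
        raise ValueError("this sketch only handles substitution variants")
    return [i for i, (a, b) in enumerate(zip(wt, mut)) if a != b]


def masked_marginal_score(wt: str, mut: str,
                          logprobs: List[Dict[str, float]]) -> float:
    """Sum of log p(mutant residue) - log p(wild-type residue) over mutated sites.

    logprobs[i][aa] is assumed to be the model's log-probability of residue
    `aa` at position i with that position masked in the wild-type sequence.
    """
    return sum(logprobs[i][mut[i]] - logprobs[i][wt[i]]
               for i in mutated_positions(wt, mut))


# Toy table: uniform everywhere except position 1, where the model
# prefers the wild-type residue C over the mutant residue K.
wt_seq, mut_seq = "ACD", "AKD"
toy = [{aa: math.log(1 / 20) for aa in AMINO_ACIDS} for _ in wt_seq]
toy[1]["C"] = math.log(0.4)
toy[1]["K"] = math.log(0.1)
score = masked_marginal_score(wt_seq, mut_seq, toy)
```

A negative score means the model finds the mutant residue less likely than the wild-type residue at the mutated site; the wild-type sequence itself scores zero by construction.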

What to Implement

Implement the ScoringHead class in custom_abscore.py. You must implement:

__init__(self, embed_dim): Initialize your scoring model.
forward(self, heavy_feats, light_feats, antigen_feats, wt_heavy_feats, wt_light_feats) -> Tensor [B]: Return one binding affinity score per variant.
compute_loss(self, pred_scores, target_scores) -> Tensor: Training loss (can be no-op for zero-shot).
is_zero_shot property: Return True to skip training, False for supervised training.
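
To make the interface above concrete, the following is one possible supervised head, sketched with PyTorch. It is not the template's starter (which the file header describes as a linear projection) and not a recommended solution. The `hidden_dim` argument, the choice to concatenate heavy-chain delta, light-chain delta, and antigen embedding, and the MSE loss are all assumptions made for illustration. The local `ESMFeatures` class is a stand-in that mirrors the fields documented in this task so that the block runs on its own.

```python
from dataclasses import dataclass
from typing import List

import torch
import torch.nn as nn
from torch import Tensor


@dataclass
class ESMFeatures:
    """Stand-in mirroring the documented fields; the real class lives in the template."""
    embeddings: Tensor
    token_logprobs: List[float]
    sequences: List[str]


class ScoringHead(nn.Module):
    def __init__(self, embed_dim: int, hidden_dim: int = 256):
        super().__init__()
        # Input is [heavy delta | light delta | antigen embedding].
        self.mlp = nn.Sequential(
            nn.Linear(3 * embed_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, 1),
        )

    @property
    def is_zero_shot(self) -> bool:
        # This head has trainable weights, so it needs supervised training.
        return False

    def forward(self, heavy_feats, light_feats, antigen_feats,
                wt_heavy_feats, wt_light_feats) -> Tensor:
        # Mutant-minus-wild-type differences isolate the effect of the mutation.
        # Broadcasting covers wild-type embeddings of shape [B, D] or [1, D].
        d_heavy = heavy_feats.embeddings - wt_heavy_feats.embeddings
        d_light = light_feats.embeddings - wt_light_feats.embeddings
        x = torch.cat([d_heavy, d_light, antigen_feats.embeddings], dim=-1)
        return self.mlp(x).squeeze(-1)  # one score per variant, shape [B]

    def compute_loss(self, pred_scores: Tensor, target_scores: Tensor) -> Tensor:
        return nn.functional.mse_loss(pred_scores, target_scores)


# Smoke test with random features (batch of 4, embedding size 8).
torch.manual_seed(0)
feats = [ESMFeatures(torch.randn(4, 8), [0.0] * 4, ["A"] * 4) for _ in range(5)]
head = ScoringHead(embed_dim=8)
scores = head(*feats)
loss = head.compute_loss(scores, torch.zeros(4))
```

Because the evaluation metric is rank-based, a ranking loss could be substituted for MSE in `compute_loss`; the sketch uses MSE only because it is the simplest choice.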

Input Features (ESMFeatures dataclass)

Each *_feats argument is an ESMFeatures object with:

@dataclass
class ESMFeatures:
    embeddings: Tensor      # [B, 1280] mean-pooled ESM-2 hidden states
    token_logprobs: List[float]  # [B] average per-token log-probability
    sequences: List[str]    # [B] original amino acid sequences

The ESM-2 model (facebook/esm2_t33_650M_UR50D, 650M params) is loaded in the FIXED section with both the MLM head (for log-probs) and hidden state output (for embeddings). It is frozen — only the ScoringHead parameters are trainable.

Evaluation

The model is tested on 3 antigen targets from AbBiBench:

Dataset	Antigen	Description
influenza	3gbn_h1	Influenza hemagglutinin H1
sars	4fqi_h1	Influenza hemagglutinin H1 (4fqi complex)
her2	4d5_her2	HER2 (breast cancer target)

Metric: Spearman rank correlation between predicted scores and experimental binding affinity measurements. Higher is better.
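
For reference, Spearman rank correlation is the Pearson correlation of the ranks of the two score lists, with tied values sharing their average rank. The sketch below computes it in plain Python. The function names are illustrative, and the benchmark's own evaluation code may well use a library routine instead.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(pred: Sequence[float], target: Sequence[float]) -> float:
    """Pearson correlation of the rank vectors; NaN if either side is constant."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(pred), average_ranks(target)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)
```

Only the ordering of the predicted scores matters for this metric, so any monotonic rescaling of a head's output leaves the result unchanged.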

Split protocol (position-disjoint): The datasets are deep mutational scans (DMS) where many rows share the same mutated positions. A naive random split leaks positional context across train/val/test and inflates supervised Spearman to 0.9+ even for trivial heads — this does not reflect real generalization. Instead, we use an 80/10/10 split by mutated-position signature on the heavy chain: every mutation at a given position lands entirely in one split, so positions appearing in the test set are never observed during training. The wildtype row is always in train. Under this protocol, supervised heads drop to the range reported by AbBiBench for zero-shot/few-shot methods (~0.2–0.6 on influenza/sars, lower on her2); substantial gains require models that actually generalize across positions rather than memorizing per-position effects.
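
The split described above can be sketched as follows. This is an illustration under stated assumptions, not the benchmark's actual code: it groups variants by the exact tuple of mutated heavy-chain positions, applies the 80/10/10 fractions to the number of signatures rather than rows, and uses hypothetical function names. For data with multi-mutants, signatures that share a position would additionally need to be merged to keep positions strictly disjoint; the sketch does not do this.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def position_signature(wt: str, mut: str) -> Tuple[int, ...]:
    """Tuple of 0-based heavy-chain positions where the variant differs from wild-type."""
    return tuple(i for i, (a, b) in enumerate(zip(wt, mut)) if a != b)


def position_disjoint_split(wt_heavy: str, heavy_seqs: Sequence[str],
                            fractions=(0.8, 0.1, 0.1),
                            seed: int = 0) -> Dict[str, List[int]]:
    """Assign row indices to train/val/test so each signature lands in one split."""
    groups = defaultdict(list)
    for idx, seq in enumerate(heavy_seqs):
        groups[position_signature(wt_heavy, seq)].append(idx)

    # Rows identical to wild-type (empty signature) always go to train.
    split = {"train": list(groups.pop((), [])), "val": [], "test": []}

    signatures = sorted(groups)            # deterministic order before shuffling
    random.Random(seed).shuffle(signatures)
    n_train = int(fractions[0] * len(signatures))
    n_val = int(fractions[1] * len(signatures))
    for k, sig in enumerate(signatures):
        if k < n_train:
            name = "train"
        elif k < n_train + n_val:
            name = "val"
        else:
            name = "test"
        split[name].extend(groups[sig])
    return split


# Toy single-mutant scan: two substitutions at each of 10 positions, plus wild-type.
wt = "ACDEFGHIKL"
seqs = [wt] + [wt[:i] + aa + wt[i + 1:] for i in range(len(wt)) for aa in "WY"]
split = position_disjoint_split(wt, seqs, seed=1)
```

In the toy scan every variant is a single mutant, so no position seen in `split["test"]` occurs in `split["train"]`.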

Editable Region

Lines 150-227 of custom_abscore.py (between EDITABLE SECTION START and EDITABLE SECTION END). The region must contain a ScoringHead class with the specified interface.

Code

custom_abscore.py


"""
Antibody Binding Affinity Scoring — Self-contained template.
Predicts antibody-antigen binding affinity from pretrained protein
language model embeddings. Evaluates via Spearman correlation against
experimental binding measurements on AbBiBench datasets.

Structure:
  Lines 1-115:   FIXED — Imports, data loading, ESM-2 embedding extraction
  Lines 116-172: EDITABLE — ScoringHead class (starter: linear projection)
  Lines 173+:    FIXED — Training loop, evaluation, CLI
"""
import os
import sys
import json
import argparse

Results

Model	Type	spearman influenza ↑	mse influenza ↓	spearman sars ↑	mse sars ↓	spearman her2 ↑	mse her2 ↓
delta_mlp	baseline	0.942	0.213	0.934	0.045	0.462	0.387
esm2_pll	baseline	0.930	0.174	0.859	0.144	0.480	0.688
masked_marginal	baseline	0.938	0.202	0.940	0.031	0.466	0.333