Poster
Protein Discovery with Discrete Walk-Jump Sampling
Nathan Frey · Dan Berenberg · Karina Zadorozhny · Joseph Kleinhenz · Julien Lafrance-Vanasse · Isidro Hotzel · Yan Wu · Stephen Ra · Richard Bonneau · Kyunghyun Cho · Andreas Loukas · Vladimir Gligorijevic · Saeed Saremi
Halle B
Oral presentation: Oral 1A, Tue 7 May 1 a.m. PDT to 1:45 a.m. PDT
Tue 7 May 1:45 a.m. PDT to 3:45 a.m. PDT
Abstract:
We resolve difficulties in training and sampling from a discrete generative model by learning a smoothed energy function, sampling from the smoothed data manifold with Langevin Markov chain Monte Carlo (MCMC), and projecting back to the true data manifold with one-step denoising. Our $\textit{Discrete Walk-Jump Sampling}$ formalism combines the contrastive divergence training of an energy-based model and improved sample quality of a score-based model, while simplifying training and sampling by requiring only a single noise level. We evaluate the robustness of our approach on generative modeling of antibody proteins and introduce the $\textit{distributional conformity score}$ to benchmark protein generative models. By optimizing and sampling from our models for the proposed distributional conformity score, 97-100\% of generated samples are successfully expressed and purified and 70\% of functional designs show equal or improved binding affinity compared to known functional antibodies on the first attempt in a single round of laboratory experiments. We also report the first demonstration of long-run fast-mixing MCMC chains where diverse antibody protein classes are visited in a single MCMC chain.
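The walk-jump procedure described in the abstract (Langevin MCMC on the smoothed manifold, followed by one-step denoising) can be illustrated with a minimal toy sketch. This is not the authors' implementation: it assumes a tiny hypothetical dataset of one-hot sequences, replaces the learned energy or score model with the closed-form score of a Gaussian-smoothed empirical distribution, and uses plain unadjusted (overdamped) Langevin dynamics. All function names, the noise level `sigma`, and the step size are illustrative assumptions. The jump uses the standard Tweedie identity $\hat{x}(y) = y + \sigma^2 \nabla_y \log p(y)$, followed by an argmax over the vocabulary to return to discrete tokens.

```python
import numpy as np

def one_hot(seqs, vocab_size):
    """Encode integer sequences (n, L) as one-hot arrays (n, L, vocab_size)."""
    return np.eye(vocab_size)[np.asarray(seqs)]

def posterior_weights(y, data, sigma):
    """Posterior probability of each toy training point given the noisy point y."""
    sq_dist = ((data - y) ** 2).sum(axis=(1, 2))
    logits = -sq_dist / (2 * sigma**2)
    logits -= logits.max()  # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

def denoise(y, data, sigma):
    """Least-squares denoiser E[x | y] for the toy empirical distribution."""
    w = posterior_weights(y, data, sigma)
    return np.tensordot(w, data, axes=1)  # weighted mean, shape (L, vocab_size)

def score(y, data, sigma):
    """Score of the smoothed density: grad log p(y) = (E[x | y] - y) / sigma^2."""
    return (denoise(y, data, sigma) - y) / sigma**2

def walk_jump_sample(data, sigma, n_steps, step_size, rng):
    """Walk with Langevin MCMC in noisy space, then jump back by denoising."""
    # Assumption: start the chain from a noised training point.
    y = data[rng.integers(len(data))] + sigma * rng.standard_normal(data.shape[1:])
    for _ in range(n_steps):
        # Walk: one unadjusted Langevin step at the single noise level sigma.
        noise = rng.standard_normal(y.shape)
        y = y + step_size * score(y, data, sigma) + np.sqrt(2 * step_size) * noise
    # Jump: one-step denoising, then argmax over the vocabulary axis.
    x_hat = denoise(y, data, sigma)
    return x_hat.argmax(axis=-1)

# Hypothetical toy data: three sequences of length 4 over a 5-token vocabulary.
rng = np.random.default_rng(0)
data = one_hot([[0, 1, 2, 3], [4, 3, 2, 1], [1, 1, 0, 0]], vocab_size=5)
sample = walk_jump_sample(data, sigma=0.5, n_steps=50, step_size=0.01, rng=rng)
print(sample)  # an integer token sequence of length 4
```

In practice the analytic `score` and `denoise` functions would be replaced by trained networks; the point of the sketch is only that walking and jumping both operate at one fixed noise level.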