Probabilistic Experimental Design for Petascale DNA Synthesis
I introduce experimental design methods to efficiently manufacture samples from generative models of biomolecules in the real world. The algorithms merge statistical methods for approximate sampling with randomness from the physical world. I also develop tools to rigorously evaluate the quality of manufactured samples, including nonparametric Bayesian two-sample tests with strong theoretical guarantees and scalable algorithms. I demonstrate the synthesis of ~10^17 samples from a generative model of human antibodies, with sample quality comparable to that of state-of-the-art protein language models, at a cost of ~$10^3. The library yields hundreds of therapeutic candidates against "undruggable" tumor antigens. Using previous methods, a library of the same size and quality would cost ~$10^15.
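
To make the scale of the cost comparison concrete, the following is a minimal sketch of the arithmetic implied by the figures above. It uses only the order-of-magnitude values quoted in the abstract. The variable names are illustrative, and the results are order-of-magnitude estimates, not exact costs.

```python
# Order-of-magnitude figures quoted in the abstract (approximate values).
library_size = 1e17          # ~10^17 manufactured samples
cost_new_usd = 1e3           # ~$10^3 with the methods introduced here
cost_previous_usd = 1e15     # ~$10^15 for the same library with previous methods

# Implied cost per sample under each approach.
cost_per_sample_new = cost_new_usd / library_size
cost_per_sample_previous = cost_previous_usd / library_size

# Implied reduction in total cost.
fold_reduction = cost_previous_usd / cost_new_usd

print(f"cost per sample (new):      ~${cost_per_sample_new:.0e}")
print(f"cost per sample (previous): ~${cost_per_sample_previous:.0e}")
print(f"fold reduction in cost:     ~{fold_reduction:.0e}")
```

Under these figures, the implied saving is about twelve orders of magnitude, which corresponds to moving from roughly a cent per sample to roughly 10^-14 dollars per sample.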