I am a research scientist and cofounder at Black Forest
Labs. I obtained my PhD and Master's degree from the University of Waterloo, and a Bachelor of
Science from the Technical University of Munich.
My main research interest is generative modeling, with a focus on diffusion models.
We present a novel distillation method called Latent Adversarial Diffusion Distillation (LADD) and
distill Stable Diffusion 3 down to one to four inference steps.
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi,
Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion
English, Kyle Lacey, Alex Goodwin, Yannik Marek, Robin Rombach
ICML, 2024 (Best Paper Award) arXiv
We present Stable Diffusion 3, an improved version of Stable Diffusion using a novel multimodal
Diffusion Transformer architecture.
We present Stable Video Diffusion, a latent video diffusion model for high-resolution,
state-of-the-art text-to-video and image-to-video generation.
We demonstrate the necessity of a well-curated pretraining dataset for generating high-quality
videos and present a systematic curation process to train a strong base model, including captioning
and filtering strategies.
We present SDXL, an improved version of Stable Diffusion using several conditioning tricks and
multi-resolution training. We also release a refiner model to further improve visual fidelity.
We show that proximal maps can serve as a natural family of quantizers that is both easy to design
and analyze, and we propose ProxConnect as a generalization of the widely-used BinaryConnect.
I give a comprehensive and self-contained introduction to continuous normalizing flows, and show
that their training can be accelerated using tolerance schedulers.