Image-to-Image Translation with Conditional Adversarial Networks
We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping.
Efros, Alexei A.+3 more
High-Resolution Image Synthesis with Latent Diffusion Models
By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a guiding mechanism ...
Robin Rombach+4 more
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models. This paper proposes BLIP-2, a generic and efficient pre-training strategy that bootstraps vision-language pre-training from ...
Junnan Li+3 more
Hierarchical Text-Conditional Image Generation with CLIP Latents
Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a two-stage model: a prior that generates a CLIP image ...
A. Ramesh+4 more
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Imagen builds on the power of large transformer language models in understanding text and hinges on the strength ...
Chitwan Saharia+13 more
Adding Conditional Control to Text-to-Image Diffusion Models
We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers ...
Lvmin Zhang, Anyi Rao, Maneesh Agrawala
Analyzing and Improving the Image Quality of StyleGAN
The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling. We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training ...
Tero Karras+5 more
TIAToolbox as an end-to-end library for advanced tissue image analytics
Pocock, Graham et al. present TIAToolbox, a Python toolbox for computational pathology. The extendable library can be used for data loading, pre-processing, model inference, post-processing, and visualization.
Johnathan Pocock+13 more
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt.
Nataniel Ruiz+5 more
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at ...
C. Ledig+8 more