BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models [PDF]
The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models. This paper proposes BLIP-2, a generic and efficient pre-training strategy that bootstraps vision-language pre-training from ...
Junnan Li+3 more
semanticscholar +1 more source
Adding Conditional Control to Text-to-Image Diffusion Models [PDF]
We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers ...
Lvmin Zhang, Anyi Rao, Maneesh Agrawala
semanticscholar +1 more source
High-Resolution Image Synthesis with Latent Diffusion Models [PDF]
By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a guiding mechanism ...
Robin Rombach+4 more
semanticscholar +1 more source
Hierarchical Text-Conditional Image Generation with CLIP Latents [PDF]
Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a two-stage model: a prior that generates a CLIP image ...
A. Ramesh+4 more
semanticscholar +1 more source
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding [PDF]
We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Imagen builds on the power of large transformer language models in understanding text and hinges on the strength ...
Chitwan Saharia+13 more
semanticscholar +1 more source
TIAToolbox as an end-to-end library for advanced tissue image analytics
Pocock, Graham et al. present TIAToolbox, a Python toolbox for computational pathology. The extendable library can be used for data loading, pre-processing, model inference, post-processing, and visualization.
Johnathan Pocock+13 more
doaj +1 more source
Analyzing and Improving the Image Quality of StyleGAN [PDF]
The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling. We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training ...
Tero Karras+5 more
semanticscholar +1 more source
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation [PDF]
Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt.
Nataniel Ruiz+5 more
semanticscholar +1 more source
Foldover Features for Dynamic Object Behaviour Description in Microscopic Videos
A behavior description helps analyze tiny objects, similar objects, objects with weak visual information, and objects with similar visual information. It plays a fundamental role in the identification and classification of dynamic objects in microscopic ...
Xialin Li+11 more
doaj +1 more source
The main obstacle to image augmentation with Generative Adversarial Networks (GANs) is the need for a large amount of training data, which is hard to obtain for small datasets such as those of Environmental Microorganisms (EMs). EM image analysis plays a vital role in ...
Hao Xu+9 more
doaj +1 more source