
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models [PDF]

open access: yes | International Conference on Machine Learning, 2023
The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models. This paper proposes BLIP-2, a generic and efficient pre-training strategy that bootstraps vision-language pre-training from ...
Junnan Li   +3 more
semanticscholar   +1 more source

Adding Conditional Control to Text-to-Image Diffusion Models [PDF]

open access: yes | IEEE International Conference on Computer Vision, 2023
We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers ...
Lvmin Zhang, Anyi Rao, Maneesh Agrawala
semanticscholar   +1 more source

High-Resolution Image Synthesis with Latent Diffusion Models [PDF]

open access: yes | Computer Vision and Pattern Recognition, 2021
By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a guiding mechanism ...
Robin Rombach   +4 more
semanticscholar   +1 more source

Hierarchical Text-Conditional Image Generation with CLIP Latents [PDF]

open access: yes | arXiv.org, 2022
Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a two-stage model: a prior that generates a CLIP image ...
A. Ramesh   +4 more
semanticscholar   +1 more source

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding [PDF]

open access: yes | Neural Information Processing Systems, 2022
We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Imagen builds on the power of large transformer language models in understanding text and hinges on the strength ...
Chitwan Saharia   +13 more
semanticscholar   +1 more source

TIAToolbox as an end-to-end library for advanced tissue image analytics

open access: yes | Communications Medicine, 2022
Pocock, Graham et al. present TIAToolbox, a Python toolbox for computational pathology. The extendable library can be used for data loading, pre-processing, model inference, post-processing, and visualization.
Johnathan Pocock   +13 more
doaj   +1 more source

Analyzing and Improving the Image Quality of StyleGAN [PDF]

open access: yes | Computer Vision and Pattern Recognition, 2019
The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling. We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training ...
Tero Karras   +5 more
semanticscholar   +1 more source

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation [PDF]

open access: yes | Computer Vision and Pattern Recognition, 2022
Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt.
Nataniel Ruiz   +5 more
semanticscholar   +1 more source

Foldover Features for Dynamic Object Behaviour Description in Microscopic Videos

open access: yes | IEEE Access, 2020
A behavior description helps analyze tiny objects, similar objects, objects with weak visual information, and objects with similar visual information. It plays a fundamental role in the identification and classification of dynamic objects in microscopic ...
Xialin Li   +11 more
doaj   +1 more source

An Enhanced Framework of Generative Adversarial Networks (EF-GANs) for Environmental Microorganism Image Augmentation With Limited Rotation-Invariant Training Data

open access: yes | IEEE Access, 2020
The main obstacle to image augmentation with Generative Adversarial Networks (GANs) is the need for a large amount of training data, which is difficult to meet for small datasets such as those of Environmental Microorganisms (EMs). EM image analysis plays a vital role in ...
Hao Xu   +9 more
doaj   +1 more source
