BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing
Subject-driven text-to-image generation models create novel renditions of an input subject based on text prompts. Existing models suffer from lengthy fine-tuning and difficulty preserving subject fidelity.
Dongxu Li, Junnan Li, Steven C. H. Hoi
semanticscholar
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Large language models have demonstrated impressive universal capabilities across a wide range of open-ended tasks and have extended their utility to encompass multi-modal conversations.
Peng Jin +4 more
semanticscholar
A Survey of Orthogonal Moments for Image Representation: Theory, Implementation, and Evaluation
Image representation is an important topic in computer vision and pattern recognition. It plays a fundamental role in a range of applications toward understanding visual contents.
Shuren Qi +4 more
semanticscholar
Learning Continuous Image Representation with Local Implicit Image Function
How to represent an image? While the visual world is presented in a continuous manner, machines store and see the images in a discrete way with 2D arrays of pixels. In this paper, we seek to learn a continuous representation for images.
Yinbo Chen, Sifei Liu, Xiaolong Wang
semanticscholar
A Principled Design of Image Representation: Towards Forensic Tasks
Image forensics is a rising topic, as trustworthy multimedia content is critical for modern society. Like other vision-related applications, forensic analysis relies heavily on proper image representation.
Shuren Qi +4 more
semanticscholar
Dual Space Latent Representation Learning for Image Representation
Semi-supervised non-negative matrix factorization (NMF) has achieved successful results because it can recognize images well using only a small quantity of labeled information.
Yulei Huang +3 more
doaj
Exploring Simple Siamese Representation Learning
Siamese networks have become a common structure in various recent models for unsupervised visual representation learning. These models maximize the similarity between two augmentations of one image, subject to certain conditions for avoiding collapsing ...
Xinlei Chen, Kaiming He
semanticscholar
Documentary narrative for a new understanding of a stigmatized public space
Following on from an audiovisual project carried out over fifteen years in cities in the South of France, the tourist sites are now being filmed as part of a new documentary series, in an attempt to better understand daily life in these easily ...
Natacha Cyrulnik
doaj
DSGEM: Dual scene graph enhancement module‐based visual question answering
Visual Question Answering (VQA) aims to appropriately answer a text question by understanding the image content. Attention‐based VQA models mine the implicit relationships between objects according to the feature similarity, which neglects the explicit ...
Boyue Wang +5 more
doaj
Underwater image enhancement via a channel‐wise transmission estimation network
Underwater image enhancement for image processing and underwater robotic vision has recently attracted much academic attention. However, in most existing methods, underwater image enhancement is completed with a simple assumption: the attenuation ...
Qiang Wang, Bo Fu, Huijie Fan
doaj

