GLaMM: Pixel Grounding Large Multimodal Model [PDF]
Large Multimodal Models (LMMs) extend Large Language Models to the vision domain. Initial LMMs used holistic images and text prompts to generate ungrounded textual responses.
H. Rasheed +9 more
semanticscholar +1 more source
Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization [PDF]
Diffusion models have demonstrated impressive performance in various image generation, editing, enhancement and translation tasks. In particular, the pre-trained text-to-image stable diffusion models provide a potential solution to the challenging ...
Tao Yang +3 more
PixelLM: Pixel Reasoning with Large Multimodal Model [PDF]
While large multimodal models (LMMs) have achieved remarkable progress, generating pixel-level masks for image reasoning tasks involving multiple open-world targets remains a challenge. To bridge this gap, we introduce PixelLM, an effective and efficient ...
Zhongwei Ren +6 more
Generalized Decoding for Pixel, Image, and Language [PDF]
We present X-Decoder, a generalized decoding model that can predict pixel-level segmentation and language tokens seamlessly. X-Decoder takes as input two types of queries: (i) generic non-semantic queries and (ii) semantic queries induced from text ...
Xueyan Zou +13 more
Pixel Difference Networks for Efficient Edge Detection [PDF]
Recently, deep Convolutional Neural Networks (CNNs) have achieved human-level performance in edge detection thanks to their rich and abstract edge representation capacities.
Z. Su +7 more
Exploring Cross-Image Pixel Contrast for Semantic Segmentation [PDF]
Current semantic segmentation methods focus only on mining "local" context, i.e., dependencies between pixels within individual images, by context-aggregation modules (e.g., dilated convolution, neural attention) or structure-aware optimization criteria ...
Wenguan Wang +5 more
Osprey: Pixel Understanding with Visual Instruction Tuning [PDF]
Multimodal large language models (MLLMs) have recently achieved impressive general-purpose vision-language capabilities through visual instruction tuning.
Yuqian Yuan +7 more
Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network [PDF]
Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled ...
Wenzhe Shi +7 more
One Pixel Attack for Fooling Deep Neural Networks [PDF]
Recent research has revealed that the output of deep neural networks (DNNs) can be easily altered by adding relatively small perturbations to the input vector.
Jiawei Su +2 more
PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization [PDF]
We introduce Pixel-aligned Implicit Function (PIFu), an implicit representation that locally aligns pixels of 2D images with the global context of their corresponding 3D object.
Shunsuke Saito +5 more

