GLaMM: Pixel Grounding Large Multimodal Model [PDF]

open access: yes | Computer Vision and Pattern Recognition, 2023
Large Multimodal Models (LMMs) extend Large Language Models to the vision domain. Initial LMMs used holistic images and text prompts to generate ungrounded textual responses.
H. Rasheed   +9 more
semanticscholar   +1 more source

Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization [PDF]

open access: yes | European Conference on Computer Vision, 2023
Diffusion models have demonstrated impressive performance in various image generation, editing, enhancement and translation tasks. In particular, the pre-trained text-to-image stable diffusion models provide a potential solution to the challenging ...
Tao Yang   +3 more
semanticscholar   +1 more source

PixelLM: Pixel Reasoning with Large Multimodal Model [PDF]

open access: yes | Computer Vision and Pattern Recognition, 2023
While large multimodal models (LMMs) have achieved remarkable progress, generating pixel-level masks for image reasoning tasks involving multiple open-world targets remains a challenge. To bridge this gap, we introduce PixelLM, an effective and efficient ...
Zhongwei Ren   +6 more
semanticscholar   +1 more source

Generalized Decoding for Pixel, Image, and Language [PDF]

open access: yes | Computer Vision and Pattern Recognition, 2022
We present X-Decoder, a generalized decoding model that can predict pixel-level segmentation and language tokens seamlessly. X-Decoder takes as input two types of queries: (i) generic non-semantic queries and (ii) semantic queries induced from text ...
Xueyan Zou   +13 more
semanticscholar   +1 more source

Pixel Difference Networks for Efficient Edge Detection [PDF]

open access: yes | IEEE International Conference on Computer Vision, 2021
Recently, deep Convolutional Neural Networks (CNNs) can achieve human-level performance in edge detection with the rich and abstract edge representation capacities.
Z. Su   +7 more
semanticscholar   +1 more source

Exploring Cross-Image Pixel Contrast for Semantic Segmentation [PDF]

open access: yes | IEEE International Conference on Computer Vision, 2021
Current semantic segmentation methods focus only on mining "local" context, i.e., dependencies between pixels within individual images, by context-aggregation modules (e.g., dilated convolution, neural attention) or structure-aware optimization criteria ...
Wenguan Wang   +5 more
semanticscholar   +1 more source

Osprey: Pixel Understanding with Visual Instruction Tuning [PDF]

open access: yes | Computer Vision and Pattern Recognition, 2023
Multimodal large language models (MLLMs) have recently achieved impressive general-purpose vision-language capabilities through visual instruction tuning.
Yuqian Yuan   +7 more
semanticscholar   +1 more source

Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network [PDF]

open access: yes | Computer Vision and Pattern Recognition, 2016
Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled ...
Wenzhe Shi   +7 more
semanticscholar   +1 more source

One Pixel Attack for Fooling Deep Neural Networks [PDF]

open access: yes | IEEE Transactions on Evolutionary Computation, 2017
Recent research has revealed that the output of deep neural networks (DNNs) can be easily altered by adding relatively small perturbations to the input vector.
Jiawei Su   +2 more
semanticscholar   +1 more source

PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization [PDF]

open access: yes | IEEE International Conference on Computer Vision, 2019
We introduce Pixel-aligned Implicit Function (PIFu), an implicit representation that locally aligns pixels of 2D images with the global context of their corresponding 3D object.
Shunsuke Saito   +5 more
semanticscholar   +1 more source
