GLaMM: Pixel Grounding Large Multimodal Model [PDF]
Large Multimodal Models (LMMs) extend Large Language Models to the vision domain. Initial LMMs used holistic images and text prompts to generate ungrounded textual responses.
H. Rasheed +9 more
Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization [PDF]
Diffusion models have demonstrated impressive performance in various image generation, editing, enhancement and translation tasks. In particular, the pre-trained text-to-image stable diffusion models provide a potential solution to the challenging ...
Tao Yang +3 more
PixelLM: Pixel Reasoning with Large Multimodal Model [PDF]
While large multimodal models (LMMs) have achieved remarkable progress, generating pixel-level masks for image reasoning tasks involving multiple open-world targets remains a challenge. To bridge this gap, we introduce PixelLM, an effective and efficient
Zhongwei Ren +6 more
Osprey: Pixel Understanding with Visual Instruction Tuning [PDF]
Multimodal large language models (MLLMs) have recently achieved impressive general-purpose vision-language capabilities through visual instruction tuning.
Yuqian Yuan +7 more
Generalized Decoding for Pixel, Image, and Language [PDF]
We present X-Decoder, a generalized decoding model that can predict pixel-level segmentation and language tokens seamlessly. X-Decoder takes as input two types of queries: (i) generic non-semantic queries and (ii) semantic queries induced from text ...
Xueyan Zou +13 more
Seed Pixel Selection via Wavelets for Fuzzy Region Growing (Selección de Píxel Semilla mediante Wavelets para Crecimiento por Regiones Difuso)
Analyzing masses and tumors in mammography is a difficult problem because the signs of cancer can be minimal or superimposed on the tissue. Image processing techniques can improve diagnosis while reducing costs.
Damian Valdés Santiago +2 more
Exploring Cross-Image Pixel Contrast for Semantic Segmentation [PDF]
Current semantic segmentation methods focus only on mining "local" context, i.e., dependencies between pixels within individual images, by context-aggregation modules (e.g., dilated convolution, neural attention) or structure-aware optimization criteria (
Wenguan Wang +5 more
Pixel Difference Networks for Efficient Edge Detection [PDF]
Recently, deep Convolutional Neural Networks (CNNs) can achieve human-level performance in edge detection with the rich and abstract edge representation capacities.
Z. Su +7 more
SePiCo: Semantic-Guided Pixel Contrast for Domain Adaptive Semantic Segmentation [PDF]
Domain adaptive semantic segmentation attempts to make satisfactory dense predictions on an unlabeled target domain by utilizing the supervised model trained on a labeled source domain. One popular solution is self-training, which retrains the model with
Binhui Xie +5 more
Pixel-Grounded Prototypical Part Networks [PDF]
Prototypical part neural networks (ProtoPartNNs), namely ProtoPNet and its derivatives, are an intrinsically interpretable approach to machine learning.
Zachariah Carmichael +5 more

