Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [PDF]

open access: yes
Computer Vision and Pattern Recognition, 2023
We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation. Text-to-image diffusion models have the remarkable ability
Jiarui Xu   +5 more
semanticscholar   +1 more source

Side Adapter Network for Open-Vocabulary Semantic Segmentation [PDF]

open access: yes
Computer Vision and Pattern Recognition, 2023
This paper presents a new framework for open-vocabulary semantic segmentation with the pre-trained vision-language model, named Side Adapter Network (SAN). Our approach models the semantic segmentation task as a region recognition problem. A side network
Mengde Xu   +4 more
semanticscholar   +1 more source

OpenMask3D: Open-Vocabulary 3D Instance Segmentation [PDF]

open access: yes
Neural Information Processing Systems, 2023
We introduce the task of open-vocabulary 3D instance segmentation. Current approaches for 3D instance segmentation can typically only recognize object categories from a pre-defined closed set of classes that are annotated in the training datasets.
Ayca Takmaz   +5 more
semanticscholar   +1 more source

Scaling Open-Vocabulary Object Detection [PDF]

open access: yes
Neural Information Processing Systems, 2023
Open-vocabulary object detection has benefited greatly from pretrained vision-language models, but is still limited by the amount of available detection training data.
Matthias Minderer   +2 more
semanticscholar   +1 more source

Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP [PDF]

open access: yes
Computer Vision and Pattern Recognition, 2022
Open-vocabulary semantic segmentation aims to segment an image into semantic regions according to text descriptions, which may not have been seen during training. Recent two-stage methods first generate class-agnostic mask proposals and then leverage pre-
Feng Liang   +8 more
semanticscholar   +1 more source

Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space [PDF]

open access: yes
Conference on Empirical Methods in Natural Language Processing, 2022
Transformer-based language models (LMs) are at the core of modern NLP, but their internal prediction construction process is opaque and largely not understood. In this work, we make a substantial step towards unveiling this underlying prediction process,
Mor Geva   +3 more
semanticscholar   +1 more source

Towards Open Vocabulary Learning: A Survey [PDF]

open access: yes
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023
In the field of visual scene understanding, deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection.
Jianzong Wu   +11 more
semanticscholar   +1 more source

Simple Open-Vocabulary Object Detection with Vision Transformers [PDF]

open access: yes
arXiv.org, 2022
Combining simple architectures with large-scale pre-training has led to massive improvements in image classification. For object detection, pre-training and scaling approaches are less well established, especially in the long-tailed and open-vocabulary ...
Matthias Minderer   +13 more
semanticscholar   +1 more source

XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models [PDF]

open access: yes
Conference on Empirical Methods in Natural Language Processing, 2023
Large multilingual language models typically rely on a single vocabulary shared across 100+ languages. As these models have increased in parameter count and depth, vocabulary size has remained largely unchanged. This vocabulary bottleneck limits
Davis Liang   +7 more
semanticscholar   +1 more source

PLA: Language-Driven Open-Vocabulary 3D Scene Understanding [PDF]

open access: yes
Computer Vision and Pattern Recognition, 2022
Open-vocabulary scene understanding aims to localize and recognize unseen categories beyond the annotated label space. The recent breakthrough of 2D open-vocabulary perception is largely driven by Internet-scale paired image-text data with rich ...
Runyu Ding   +5 more
semanticscholar   +1 more source
