Results 21 to 30 of about 1,234,147

One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale [PDF]

open access: yes | International Conference on Machine Learning, 2023
This paper proposes a unified diffusion framework (dubbed UniDiffuser) to fit all distributions relevant to a set of multi-modal data in one model. Our key insight is that learning diffusion models for marginal, conditional, and joint distributions can be ...
Fan Bao   +9 more
semanticscholar   +1 more source

Multi-Modal Self-Supervised Learning for Recommendation [PDF]

open access: yes | The Web Conference, 2023
The online emergence of multi-modal sharing platforms (e.g., TikTok, YouTube) is powering personalized recommender systems to incorporate various modalities (e.g., visual, textual and acoustic) into the latent user representations.
Wei Wei   +3 more
semanticscholar   +1 more source

Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning [PDF]

open access: yes | Neural Information Processing Systems, 2022
We present the modality gap, an intriguing geometric phenomenon of the representation space of multi-modal models. Specifically, we show that different data modalities (e.g. ...
Weixin Liang   +4 more
semanticscholar   +1 more source

Cross-modal Memory Networks for Radiology Report Generation [PDF]

open access: yes | Annual Meeting of the Association for Computational Linguistics, 2022
Medical imaging plays a significant role in clinical diagnosis, where the text reports of the images are essential for understanding them and facilitating later treatment.
Zhihong Chen   +3 more
semanticscholar   +1 more source

CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation With Transformers [PDF]

open access: yes | IEEE Transactions on Intelligent Transportation Systems (Print), 2022
Scene understanding based on image segmentation is a crucial component of autonomous vehicles. Pixel-wise semantic segmentation of RGB images can be advanced by exploiting complementary features from the supplementary modality (X-modality). However, ...
Huayao Liu   +4 more
semanticscholar   +1 more source

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding [PDF]

open access: yes | Computer Vision and Pattern Recognition, 2022
Manual annotation of large-scale point cloud datasets for varying tasks such as 3D object classification, segmentation and detection is often laborious owing to the irregular structure of point clouds.
Mohamed Afham   +5 more
semanticscholar   +1 more source

Delivering Arbitrary-Modal Semantic Segmentation [PDF]

open access: yes | Computer Vision and Pattern Recognition, 2023
Multimodal fusion can make semantic segmentation more robust. However, fusing an arbitrary number of modalities remains underexplored. To delve into this problem, we create the DeLiVER arbitrary-modal segmentation benchmark, covering Depth, LiDAR ...
Jiaming Zhang   +8 more
semanticscholar   +1 more source

Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models [PDF]

open access: yes | Computer Vision and Pattern Recognition, 2023
The ability to quickly learn a new task with minimal instruction - known as few-shot learning - is a central aspect of intelligent agents. Classical few-shot benchmarks make use of few-shot samples from a single modality, but such samples may not be ...
Zhiqiu Lin   +4 more
semanticscholar   +1 more source

Collaborative Diffusion for Multi-Modal Face Generation and Editing [PDF]

open access: yes | Computer Vision and Pattern Recognition, 2023
Diffusion models have recently arisen as a powerful generative tool. Despite the great progress, existing diffusion models mainly focus on uni-modal control, i.e., the diffusion process is driven by only one conditioning modality. To further unleash the users' ...
Ziqi Huang   +3 more
semanticscholar   +1 more source

Multi-Modal Learning with Missing Modality via Shared-Specific Feature Modelling [PDF]

open access: yes | Computer Vision and Pattern Recognition, 2023
The missing modality issue is critical but non-trivial for multi-modal models to solve. Current methods aiming to handle the missing modality problem in multi-modal tasks either deal with missing modalities only during evaluation or train separate ...
Hu Wang   +5 more
semanticscholar   +1 more source
