Results 21 to 30 of about 2,484,152

CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation With Transformers [PDF]

open access: yes; IEEE Transactions on Intelligent Transportation Systems, 2022
Scene understanding based on image segmentation is a crucial component of autonomous vehicles. Pixel-wise semantic segmentation of RGB images can be advanced by exploiting complementary features from the supplementary modality (X-modality). However,
Huayao Liu   +4 more
semanticscholar   +1 more source

Cross-modal Memory Networks for Radiology Report Generation [PDF]

open access: yes; Annual Meeting of the Association for Computational Linguistics, 2022
Medical imaging plays a significant role in the clinical practice of medical diagnosis, where text reports of the images are essential in understanding them and facilitating later treatments.
Zhihong Chen   +3 more
semanticscholar   +1 more source

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding [PDF]

open access: yes; Computer Vision and Pattern Recognition, 2022
Manual annotation of large-scale point cloud dataset for varying tasks such as 3D object classification, segmentation and detection is often laborious owing to the irregular structure of point clouds.
Mohamed Afham   +5 more
semanticscholar   +1 more source

Delivering Arbitrary-Modal Semantic Segmentation [PDF]

open access: yes; Computer Vision and Pattern Recognition, 2023
Multimodal fusion can make semantic segmentation more robust. However, fusing an arbitrary number of modalities remains underexplored. To delve into this problem, we create the DeLiVER arbitrary-modal segmentation benchmark, covering Depth, LiDAR ...
Jiaming Zhang   +8 more
semanticscholar   +1 more source

Multi-Modal Self-Supervised Learning for Recommendation [PDF]

open access: yes; The Web Conference, 2023
The online emergence of multi-modal sharing platforms (e.g., TikTok, YouTube) is powering personalized recommender systems to incorporate various modalities (e.g., visual, textual and acoustic) into the latent user representations.
Wei Wei   +3 more
semanticscholar   +1 more source

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video [PDF]

open access: yes; International Conference on Machine Learning, 2023
Recent years have witnessed a big convergence of language, vision, and multi-modal pretraining. In this work, we present mPLUG-2, a new unified paradigm with modularized design for multi-modal pretraining, which can benefit from modality collaboration ...
Haiyang Xu   +14 more
semanticscholar   +1 more source

One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale [PDF]

open access: yes; International Conference on Machine Learning, 2023
This paper proposes a unified diffusion framework (dubbed UniDiffuser) to fit all distributions relevant to a set of multi-modal data in one model. Our key insight is -- learning diffusion models for marginal, conditional, and joint distributions can be ...
Fan Bao   +9 more
semanticscholar   +1 more source

Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration [PDF]

open access: yes; arXiv.org, 2023
Although instruction-tuned large language models (LLMs) have exhibited remarkable capabilities across various NLP tasks, their effectiveness on other data modalities beyond text has not been fully studied.
Chenyang Lyu   +7 more
semanticscholar   +1 more source

LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark [PDF]

open access: yes; Neural Information Processing Systems, 2023
Large language models have become a potential pathway toward achieving artificial general intelligence. Recent works on multi-modal large language models have demonstrated their effectiveness in handling visual modalities.
Zhen-fei Yin   +11 more
semanticscholar   +1 more source

Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [PDF]

open access: yes; Computer Vision and Pattern Recognition, 2021
How should representations from complementary sensors be integrated for autonomous driving? Geometry-based sensor fusion has shown great promise for perception tasks such as object detection and motion forecasting.
Aditya Prakash   +2 more
semanticscholar   +1 more source