Multimodal Transformer for Unaligned Multimodal Language Sequences [PDF]
Human language is often multimodal, comprising a mixture of natural language, facial gestures, and acoustic behaviors. However, two major challenges arise in modeling such multimodal human language time-series data: 1) inherent data non-alignment ...
Yao-Hung Hubert Tsai +5 more
semanticscholar +5 more sources
PaLM-E: An Embodied Multimodal Language Model [PDF]
Large language models excel at a wide range of complex tasks. However, enabling general inference in the real world, e.g., for robotics problems, raises the challenge of grounding.
Danny Driess +21 more
openalex +3 more sources
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering [PDF]
When answering a question, humans utilize the information available across different modalities to synthesize a consistent and complete chain of thought (CoT).
Pan Lu +8 more
semanticscholar +1 more source
MMMU: A Massive Multi-Discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI [PDF]
We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning.
Xiang Yue +21 more
semanticscholar +1 more source
Acknowledgment to Reviewers of Multimodal Technologies and Interaction in 2021
Rigorous peer-reviews are the basis of high-quality academic publishing [...]
Multimodal Technologies and Interaction Editorial Office
doaj +1 more source
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models [PDF]
Multimodal Large Language Model (MLLM) relies on the powerful LLM to perform multimodal tasks, showing amazing emergent abilities in recent studies, such as writing poems based on an image. However, it is difficult for these case studies to fully reflect ...
Chaoyou Fu +12 more
semanticscholar +1 more source
Acknowledgment to Reviewers of Multimodal Technologies and Interaction in 2020
Peer review is the driving force of journal development, and reviewers are gatekeepers who ensure that Multimodal Technologies and Interaction maintains its standards for the high quality of its published papers [...]
Multimodal Technologies and Interaction Editorial Office
doaj +1 more source
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity [PDF]
This paper proposes a framework for quantitatively evaluating interactive LLMs such as ChatGPT using publicly available data sets. We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application ...
Yejin Bang +12 more
semanticscholar +1 more source
Multimode Mamyshev Oscillator [PDF]
Spatiotemporal mode-locking (STML) is demonstrated in a Mamyshev Oscillator. We observe a variety of STML states with different degrees of spatiotemporal coupling. The design allows some control over the multimode output beam profile.
Henry Haig +6 more
openaire +3 more sources
nuScenes: A Multimodal Dataset for Autonomous Driving [PDF]
Robust detection and tracking of objects is crucial for the deployment of autonomous vehicle technology. Image-based benchmark datasets have driven development in computer vision tasks such as object detection, tracking and segmentation of agents in the ...
Holger Caesar +9 more
semanticscholar +1 more source

