Some of the following articles may not be open access.
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
European Conference on Computer Vision, 2023
In the realm of large multi-modal models (LMMs), efficient modality alignment is crucial yet often constrained by the scarcity of high-quality image-text data.
Lin Chen+7 more
semanticscholar +1 more source
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Computer Vision and Pattern Recognition, 2023
Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction abilities across various open-ended tasks. However, previous methods primarily focus on enhancing multi-modal capabilities.
Qinghao Ye+9 more
semanticscholar +1 more source
Modal Persistence and Modal Travel
We argue that there is an interesting modal analogue of temporal persistence, namely modal persistence, and an interesting modal analogue of time travel, namely modal travel. We explicate each of these notions and then argue that there are plausible conditions under which some ordinary objects modally persist. We go on to consider whether it is ...
Michael Duncan, Kristie Miller
openaire +1 more source
PointAugmenting: Cross-Modal Augmentation for 3D Object Detection
Computer Vision and Pattern Recognition, 2021
Camera and LiDAR are two complementary sensors for 3D object detection in the autonomous driving context. Camera provides rich texture and color cues while LiDAR specializes in relative distance sensing.
Chunwei Wang+3 more
semanticscholar +1 more source
1991
In part I of this essay, we attempt to articulate Husserl's phenomenological descriptions of the genesis of the primitive logical connectives, negation and disjunction. In part II, we describe possible-worlds models for the use of disjunction and negation in epistemic contexts and in contexts relating to the analysis of meanings. Finally, in part III, we ...
Jaakko Hintikka, Charles W. Harvey
openaire +2 more sources
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
arXiv.org
In the quest for artificial general intelligence, Multi-modal Large Language Models (MLLMs) have emerged as a focal point in recent advancements. However, the predominant focus remains on developing their capabilities in static image understanding.
Chaoyou Fu+19 more
semanticscholar +1 more source
Cross-modal Ambiguity Learning for Multimodal Fake News Detection
The Web Conference, 2022
Cross-modal learning is essential to enable accurate fake news detection due to the fast-growing multimodal contents in online social communities. A fundamental challenge of multimodal fake news detection lies in the inherent ambiguity across different ...
Yixuan Chen+6 more
semanticscholar +1 more source
Chameleon: Mixed-Modal Early-Fusion Foundation Models
arXiv.org
We present Chameleon, a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence.
Chameleon Team, Jacob Kahn
semanticscholar +1 more source
Multi-modal Transformer for Video Retrieval
European Conference on Computer Vision, 2020
The task of retrieving video content relevant to natural language queries plays a critical role in effectively handling internet-scale datasets. Most of the existing methods for this caption-to-video retrieval problem do not fully exploit cross-modal ...
Valentin Gabeur+3 more
semanticscholar +1 more source