Visual representation - Open Access .click

Explainable Malware Detection System Using Transformers-Based Transfer Learning and Multi-Model Visual Representation

Sensors (Basel), 2022
Android has become the leading mobile ecosystem because of its accessibility and adaptability. It has also become the primary target of widespread malicious apps. This situation needs the immediate implementation of an effective malware detection system.
Ullah F +5 more

EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

Computer Vision and Pattern Recognition, 2022
We launch EVA, a vision-centric foundation model to Explore the limits of Visual representation at scAle using only publicly accessible data. EVA is a vanilla ViT pre-trained to reconstruct the masked out image-text aligned vision features conditioned on
Yuxin Fang +8 more

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Computer Vision and Pattern Recognition, 2023
Large language models have demonstrated impressive universal capabilities across a wide range of open-ended tasks and have extended their utility to encompass multi-modal conversations.
Peng Jin +4 more

R3M: A Universal Visual Representation for Robot Manipulation

Conference on Robot Learning, 2022
We study how visual representations pre-trained on diverse human video data can enable data-efficient learning of downstream robotic manipulation tasks.
Suraj Nair +4 more

Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning

Neural Information Processing Systems, 2022
Learning medical visual representations directly from paired radiology reports has become an emerging topic in representation learning. However, existing medical image-text joint learning methods are limited by instance or local supervision analysis ...
Fuying Wang +4 more

Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation

Computer Vision and Pattern Recognition, 2021
While accurate lip synchronization has been achieved for arbitrary-subject audio-driven talking face generation, the problem of how to efficiently drive the head pose remains.
Hang Zhou +5 more

Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning

Computer Vision and Pattern Recognition, 2023
Self-supervised learning (SSL) has made remarkable progress in visual representation learning. Some studies combine SSL with knowledge distillation (SSL-KD) to boost the representation learning performance of small models.
Kaiyou Song, Jin Xie, Shanyi Zhang, Zimeng Luo +3 more

REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering

Neural Information Processing Systems, 2022
This paper revisits visual representation in knowledge-based visual question answering (VQA) and demonstrates that using regional information in a better way can significantly improve the performance. While visual representation is extensively studied in
Yuanze Lin +5 more

Offline Visual Representation Learning for Embodied Navigation

arXiv.org, 2022
How should we learn visual representations for embodied agents that must see and move? The status quo is tabula rasa in vivo, i.e. learning visual representations from scratch while also learning to move, potentially augmented with auxiliary tasks (e.g ...
Karmesh Yadav +7 more

Momentum Contrast for Unsupervised Visual Representation Learning

Computer Vision and Pattern Recognition, 2019
We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large
Kaiming He +4 more
