Explainable Malware Detection System Using Transformers-Based Transfer Learning and Multi-Model Visual Representation.
Android has become the leading mobile ecosystem because of its accessibility and adaptability. It has also become the primary target of widespread malicious apps. This situation needs the immediate implementation of an effective malware detection system.
Ullah F +5 more
europepmc +2 more sources
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
We launch EVA, a vision-centric foundation model to Explore the limits of Visual representation at scAle using only publicly accessible data. EVA is a vanilla ViT pre-trained to reconstruct the masked out image-text aligned vision features conditioned on
Yuxin Fang +8 more
semanticscholar +1 more source
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Large language models have demonstrated impressive universal capabilities across a wide range of open-ended tasks and have extended their utility to encompass multi-modal conversations.
Peng Jin +4 more
semanticscholar +1 more source
R3M: A Universal Visual Representation for Robot Manipulation
We study how visual representations pre-trained on diverse human video data can enable data-efficient learning of downstream robotic manipulation tasks.
Suraj Nair +4 more
semanticscholar +1 more source
Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning
Learning medical visual representations directly from paired radiology reports has become an emerging topic in representation learning. However, existing medical image-text joint learning methods are limited by instance or local supervision analysis ...
Fuying Wang +4 more
semanticscholar +1 more source
Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation
While accurate lip synchronization has been achieved for arbitrary-subject audio-driven talking face generation, the problem of how to efficiently drive the head pose remains.
Hang Zhou +5 more
semanticscholar +1 more source
Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning
Self-supervised learning (SSL) has made remarkable progress in visual representation learning. Some studies combine SSL with knowledge distillation (SSL-KD) to boost the representation learning performance of small models.
Kaiyou Song +3 more
semanticscholar +1 more source
REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
This paper revisits visual representation in knowledge-based visual question answering (VQA) and demonstrates that using regional information in a better way can significantly improve the performance. While visual representation is extensively studied in
Yuanze Lin +5 more
semanticscholar +1 more source
Offline Visual Representation Learning for Embodied Navigation
How should we learn visual representations for embodied agents that must see and move? The status quo is tabula rasa in vivo, i.e. learning visual representations from scratch while also learning to move, potentially augmented with auxiliary tasks (e.g ...
Karmesh Yadav +7 more
semanticscholar +1 more source
Momentum Contrast for Unsupervised Visual Representation Learning
We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large
Kaiming He +4 more
semanticscholar +1 more source
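The MoCo snippet describes two mechanisms: a queue that serves as a dynamic dictionary of encoded keys, and a key encoder whose weights are a moving average of the query encoder's weights. The Python sketch below illustrates both on toy vectors, together with a softmax-based contrastive (InfoNCE-style) loss. It is not the authors' implementation. The helper names (`momentum_update`, `info_nce`, `KeyQueue`), the use of flat lists as "parameters", and the defaults `m=0.999` and `temperature=0.07` are illustrative assumptions.

```python
import math
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """Moving-average update (hypothetical helper): theta_k <- m * theta_k + (1 - m) * theta_q."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def dot(a, b):
    """Similarity between two encoded vectors (plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """Contrastive loss: -log of the softmax probability of the positive key.

    The positive key competes against every negative key taken from the queue.
    """
    logits = [dot(query, positive_key) / temperature]
    logits += [dot(query, k) / temperature for k in negative_keys]
    max_logit = max(logits)  # subtract the max before exp() for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - logits[0]


class KeyQueue:
    """Fixed-size FIFO dictionary of encoded keys (hypothetical helper)."""

    def __init__(self, size):
        # A deque with maxlen evicts the oldest entries automatically.
        self.keys = deque(maxlen=size)

    def enqueue(self, batch_keys):
        """Append the newest batch of keys; the oldest keys drop out first."""
        for k in batch_keys:
            self.keys.append(k)

    def negatives(self):
        """Return the current dictionary contents, used as negative keys."""
        return list(self.keys)


if __name__ == "__main__":
    queue = KeyQueue(size=4)
    queue.enqueue([[0.0, 1.0], [-1.0, 0.0]])  # keys left over from earlier batches
    query, positive = [1.0, 0.0], [0.9, 0.1]  # two "views" of the same sample
    print("loss:", info_nce(query, positive, queue.negatives()))
    key_weights = momentum_update([0.5, 0.5], [1.0, 0.0])
    print("key encoder weights:", key_weights)
```

In a full training loop, only the query encoder would receive gradient updates; the key encoder would be refreshed with a moving-average step after each iteration, and the current batch's keys would be enqueued so they serve as negatives for later batches. Because the queue decouples the dictionary size from the batch size, it can hold many more negatives than a single batch provides.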

