Vision transformer - Open Access .click

Proceedings of the AAAI Conference on Artificial Intelligence, 2022
Transformers, composed of multiple self-attention layers, hold strong promises toward a generic learning primitive applicable to different data modalities, including the recent breakthroughs in computer vision achieving state-of-the-art (SOTA) standard accuracy. What remains largely unexplored is their robustness evaluation and attribution.
Paul, Sayak, Chen, Pin-Yu

The Multiscale Surface Vision Transformer

ArXiv, 2023
Accepted for publication at MIDL 2024, 17 pages, 6 ...
Dahan, Simon +3 more

Privacy-Preserving Semantic Segmentation Using Vision Transformer

Journal of Imaging, 2022
In this paper, we propose a privacy-preserving semantic segmentation method that uses encrypted images and models with the vision transformer (ViT), called the segmentation transformer (SETR).
Hitoshi Kiya +3 more

Recurrent Attentional Networks for Saliency Detection

2016
Convolutional-deconvolution networks can be adopted to perform end-to-end saliency detection. But, they do not work well with objects of multiple scales.
Kuen, Jason, Wang, Gang, Wang, Zhenhua

Long Short-Term Memory Spatial Transformer Network

2019
Spatial transformer network has been used in a layered form in conjunction with a convolutional network to enable the model to transform data spatially.
Chen, Tianyue, Feng, Shiyang, Sun, Hao

Sign language recognition with transformer networks

2020
Sign languages are complex languages. Research into them is ongoing, supported by large video corpora of which only small parts are annotated. Sign language recognition can be used to speed up the annotation process of these corpora, in order to aid ...
Dambre, Joni +2 more

Peripheral Vision Transformer

2022
Human vision possesses a special type of visual processing systems called peripheral vision. Partitioning the entire visual field into multiple contour regions based on the distance to the center of our gaze, the peripheral vision provides us the ability to perceive various visual features at different regions.
Min, Juhong, Zhao, Yucheng, Luo, Chong, Cho, Minsu +3 more

V-LTCS: Backbone exploration for Multimodal Misogynous Meme detection

Natural Language Processing Journal
Memes have become a fundamental part of online communication and humour, reflecting and shaping the culture of today’s digital age. The amplified Meme culture is inadvertently endorsing and propagating casual Misogyny. This study proposes V-LTCS (Vision-
Sneha Chinivar +3 more

GaitTriViT and GaitVViT: Transformer-based methods emphasizing spatial or temporal aspects in gait recognition

PeerJ Computer Science
In image recognition tasks, subjects with long distances and low resolution remain a challenge, whereas gait recognition, identifying subjects by walking patterns, is considered one of the most promising biometric technologies due to its stability and ...
Hongyun Sheng

Modeling Image Virality with Pairwise Spatial Transformer Networks

2017
The study of virality and information diffusion online is a topic gaining traction rapidly in the computational social sciences. Computer vision and social network analysis research have also focused on understanding the impact of content and information
Agarwal, Sumeet, Dubey, Abhimanyu
