Transformers for Vision: A Survey on Innovative Methods for Computer Vision
Transformers have emerged as a groundbreaking architecture in the field of computer vision, offering a compelling alternative to traditional convolutional neural networks (CNNs) by enabling the modeling of long-range dependencies and global context ...
Balamurugan Palanisamy +7 more
doaj +3 more sources
A Survey on Vision Transformer
The transformer, first applied to the field of natural language processing, is a type of deep neural network mainly based on the self-attention mechanism. Thanks to its strong representation capabilities, researchers are looking at ways to apply transformers to computer vision tasks.
Kai Han +12 more
openaire +2 more sources
Person Re-Identification is an essential task in computer vision, particularly in surveillance applications. The aim is to identify a person based on an input image from surveillance photographs in various scenarios.
Muhammad Tahir, Saeed Anwar
doaj +1 more source
Multiscale Vision Transformers
Technical ...
Haoqi Fan +6 more
openaire +2 more sources
code: https://github.com/OpenNLPLab/Vicinity-Vision ...
Weixuan Sun +9 more
openaire +3 more sources
Multi-Manifold Attention for Vision Transformers
Vision Transformers are very popular nowadays due to their state-of-the-art performance in several computer vision tasks, such as image classification and action recognition. Although their performance has been greatly enhanced through highly descriptive
Dimitrios Konstantinidis +3 more
doaj +1 more source
Prior works have proposed several strategies to reduce the computational cost of the self-attention mechanism. Many of these works decompose the self-attention procedure into regional and local feature extraction procedures, each of which incurs much lower computational complexity.
Ting Yao +5 more
openaire +3 more sources
Transformers in Remote Sensing: A Survey
Deep learning-based algorithms have seen massive popularity in different areas of remote sensing image analysis over the past decade. Recently, transformer-based architectures, originally introduced in natural language processing, have pervaded ...
Abdulaziz Amer Aleissaee +6 more
doaj +1 more source
We attempt to reduce the computational costs of vision transformers (ViTs), which increase quadratically with the number of tokens. We present a novel training paradigm that trains only one ViT model at a time, but is capable of providing improved image recognition performance at various computational costs. Here, the trained ViT model, termed super vision
Mingbao Lin +6 more
openaire +2 more sources
Attention-based neural networks such as the Vision Transformer (ViT) have recently attained state-of-the-art results on many computer vision benchmarks. Scale is a primary ingredient in attaining excellent results; therefore, understanding a model's scaling properties is key to designing future generations effectively.
Xiaohua Zhai +3 more
openaire +2 more sources

