Multiscale Vision Transformers
Technical ...
Haoqi Fan +6 more
code: https://github.com/OpenNLPLab/Vicinity-Vision ...
Weixuan Sun +9 more
Transformers in Vision: A Survey
Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems. Among their salient benefits, Transformers enable modeling long dependencies between input sequence elements and support parallel processing of sequence as compared to recurrent networks, e.g.,
Salman H. Khan +5 more
Prior works have proposed several strategies to reduce the computational cost of the self-attention mechanism. Many of these works consider decomposing the self-attention procedure into regional and local feature extraction procedures, each of which incurs a much smaller computational complexity.
Ting Yao +5 more
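The abstract above describes cutting the cost of self-attention by splitting it into regional and local procedures. The saving can be made concrete by counting query-key pairs. This is a minimal sketch, assuming non-overlapping square windows as the "local" scheme (chosen only for illustration, not the decomposition used in that paper); the function names are hypothetical.

```python
def global_attention_pairs(h: int, w: int) -> int:
    """Query-key pairs in full self-attention over an h x w token grid.

    Every token attends to every token, so the count is N^2 with N = h * w.
    """
    n = h * w
    return n * n


def window_attention_pairs(h: int, w: int, win: int) -> int:
    """Query-key pairs when attention is restricted to non-overlapping
    win x win windows (illustrative assumption: h and w divisible by win).
    """
    if h % win or w % win:
        raise ValueError("grid size must be divisible by the window size")
    num_windows = (h // win) * (w // win)
    tokens_per_window = win * win
    # Attention is quadratic only inside each window.
    return num_windows * tokens_per_window ** 2


if __name__ == "__main__":
    # A 224x224 image cut into 16x16 patches gives a 14x14 token grid.
    print(global_attention_pairs(14, 14))     # 38416
    print(window_attention_pairs(14, 14, 7))  # 9604
    # Doubling the grid side to 28x28:
    print(global_attention_pairs(28, 28))     # 614656
    print(window_attention_pairs(28, 28, 7))  # 38416
```

Doubling the grid side multiplies the global count by 16 but the windowed count only by 4. The catch is that isolated windows never exchange information, which a separate regional procedure like the one the abstract mentions would presumably help compensate for.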
We attempt to reduce the computational costs in vision transformers (ViTs), which increase quadratically with the number of tokens. We present a novel training paradigm that trains only one ViT model at a time, but is capable of providing improved image recognition performance with various computational costs. Here, the trained ViT model, termed super vision
Mingbao Lin +6 more
Attention-based neural networks such as the Vision Transformer (ViT) have recently attained state-of-the-art results on many computer vision benchmarks. Scale is a primary ingredient in attaining excellent results; therefore, understanding a model's scaling properties is key to designing future generations effectively.
Xiaohua Zhai +3 more
Building Extraction With Vision Transformer
Submitted to ...
Libo Wang +3 more
Human vision possesses a special type of visual processing system called peripheral vision. By partitioning the entire visual field into multiple contour regions based on the distance to the center of our gaze, peripheral vision provides us with the ability to perceive various visual features in different regions.
Juhong Min +3 more
Reversible Vision Transformers
We present Reversible Vision Transformers, a memory efficient architecture design for visual recognition. By decoupling the GPU memory requirement from the depth of the model, Reversible Vision Transformers enable scaling up architectures with efficient memory usage.
Karttikeya Mangalam +6 more
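The memory claim in the abstract above rests on each block's inputs being recomputable from its outputs, so intermediate activations need not be cached for every layer. A minimal sketch, assuming the RevNet-style two-stream coupling (y1 = x1 + F(x2), y2 = x2 + G(y1)); `toy_f` and `toy_g` are hypothetical stand-ins for the attention and MLP sub-blocks, and plain Python lists replace tensors. It illustrates the invertibility idea only, not the paper's implementation.

```python
import math
from typing import Callable, List, Tuple

Vec = List[float]
Fn = Callable[[Vec], Vec]


def _add(a: Vec, b: Vec) -> Vec:
    return [p + q for p, q in zip(a, b)]


def _sub(a: Vec, b: Vec) -> Vec:
    return [p - q for p, q in zip(a, b)]


def rev_forward(x1: Vec, x2: Vec, f: Fn, g: Fn) -> Tuple[Vec, Vec]:
    """Reversible coupling: y1 = x1 + F(x2), y2 = x2 + G(y1)."""
    y1 = _add(x1, f(x2))
    y2 = _add(x2, g(y1))
    return y1, y2


def rev_inverse(y1: Vec, y2: Vec, f: Fn, g: Fn) -> Tuple[Vec, Vec]:
    """Recover the block inputs from its outputs alone."""
    x2 = _sub(y2, g(y1))
    x1 = _sub(y1, f(x2))
    return x1, x2


# Hypothetical toy sub-blocks; any deterministic functions work here.
def toy_f(v: Vec) -> Vec:
    return [math.tanh(t) for t in v]


def toy_g(v: Vec) -> Vec:
    return [0.5 * t * t for t in v]


def run_stack(x1: Vec, x2: Vec, layers: List[Tuple[Fn, Fn]]) -> Tuple[Vec, Vec]:
    for f, g in layers:
        x1, x2 = rev_forward(x1, x2, f, g)
    return x1, x2


def invert_stack(y1: Vec, y2: Vec, layers: List[Tuple[Fn, Fn]]) -> Tuple[Vec, Vec]:
    # Undo the blocks in reverse order, starting from the final outputs only.
    for f, g in reversed(layers):
        y1, y2 = rev_inverse(y1, y2, f, g)
    return y1, y2


if __name__ == "__main__":
    layers = [(toy_f, toy_g)] * 4
    x1, x2 = [0.1, -0.2], [0.3, 0.05]
    y1, y2 = run_stack(x1, x2, layers)
    r1, r2 = invert_stack(y1, y2, layers)
    # Inputs are recovered up to floating-point rounding.
    print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(r1 + r2, x1 + x2)))  # True
```

Because only the final outputs are needed to rebuild every earlier activation, stored memory stops growing with depth, at the price of recomputing F and G during the backward pass.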
Vision Transformer with Progressive Sampling
Accepted to ICCV ...
Xiaoyu Yue +6 more

