A Survey on Vision Transformer [PDF]
Transformer, first applied to the field of natural language processing, is a type of deep neural network mainly based on the self-attention mechanism. Thanks to its strong representation capabilities, researchers are looking at ways to apply transformer to computer vision tasks.
Kai Han +12 more
openaire +2 more sources
Understanding actions in videos remains a significant challenge in computer vision, which has been the subject of several pieces of research in the last decades.
Oumaima Moutik +6 more
doaj +1 more source
Transformer-based ripeness segmentation for tomatoes
With the recent development of computer vision technology, various computer vision techniques have been applied to agriculture. Recently, the Transformer network has been introduced to image recognition, which allows a different approach to extracting ...
Risa Shinoda +3 more
doaj +1 more source
Reversible Vision Transformers
We present Reversible Vision Transformers, a memory efficient architecture design for visual recognition. By decoupling the GPU memory requirement from the depth of the model, Reversible Vision Transformers enable scaling up architectures with efficient memory usage.
Mangalam, Karttikeya +6 more
openaire +2 more sources
Building Extraction With Vision Transformer [PDF]
Submitted to ...
Libo Wang +3 more
openaire +2 more sources
Transformer architectures for computer vision: A comprehensive review and future research directions [PDF]
In the past, long-range dependencies and contextual relationships in videos were captured using Convolutional Neural Networks (CNNs). Recently, Transformers have started to be used for capturing the long-range dependencies and contextual relationships in ...
Ugile Tukaram, Uke Nilesh
doaj +1 more source
Vision Transformer with Progressive Sampling [PDF]
Accepted to ICCV ...
Yue, X +6 more
openaire +3 more sources
Semi-supervised Vision Transformers
We study the training of Vision Transformers for semi-supervised image classification. Transformers have recently demonstrated impressive performance on a multitude of supervised learning tasks. Surprisingly, we show Vision Transformers perform significantly worse than Convolutional Neural Networks when only a small set of labeled data is available ...
Zejia Weng +4 more
openaire +2 more sources
Measuring miniature eye movements by means of a SQUID magnetometer [PDF]
A new technique to measure small eye movements is reported. The precise recording of human eye movements is necessary for research on visual fatigue induced by visual display units. So far all methods used have disadvantages: especially those which are ...
Breukink, E.W. +4 more
core +4 more sources
Attention-based neural networks such as the Vision Transformer (ViT) have recently attained state-of-the-art results on many computer vision benchmarks. Scale is a primary ingredient in attaining excellent results; therefore, understanding a model's scaling properties is key to designing future generations effectively.
Zhai, Xiaohua +3 more
openaire +2 more sources

