Swin Transformer: Hierarchical Vision Transformer using Shifted Windows [PDF]
This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such ...
Ze Liu +7 more
semanticscholar +1 more source
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions [PDF]
Although convolutional neural networks (CNNs) have achieved great success in computer vision, this work investigates a simpler, convolution-free backbone network useful for many dense prediction tasks.
Wenhai Wang +8 more
semanticscholar +1 more source
SwinIR: Image Restoration Using Swin Transformer [PDF]
Image restoration is a long-standing low-level vision problem that aims to restore high-quality images from low-quality images (e.g., downscaled, noisy and compressed images).
Jingyun Liang +5 more
semanticscholar +1 more source
Restormer: Efficient Transformer for High-Resolution Image Restoration [PDF]
Since convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data, these models have been extensively applied to image restoration and related tasks.
Syed Waqas Zamir +5 more
semanticscholar +1 more source
Masked-attention Mask Transformer for Universal Image Segmentation [PDF]
Image segmentation groups pixels with different semantics, e.g., category or instance membership. Each choice of semantics defines a task. While only the semantics of each task differ, current research focuses on designing specialized architectures for ...
Bowen Cheng +4 more
semanticscholar +1 more source
ViViT: A Video Vision Transformer [PDF]
We present pure-transformer based models for video classification, drawing upon the recent success of such models in image classification. Our model extracts spatiotemporal tokens from the input video, which are then encoded by a series of transformer ...
Anurag Arnab +5 more
semanticscholar +1 more source
Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting [PDF]
Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a high prediction capacity of the model, which is the ability to capture ...
Haoyi Zhou +6 more
semanticscholar +1 more source
Swin Transformer V2: Scaling Up Capacity and Resolution [PDF]
We present techniques for scaling Swin Transformer [35] up to 3 billion parameters and making it capable of training with images of up to 1,536x1,536 resolution.
Ze Liu +11 more
semanticscholar +1 more source
PVT v2: Improved baselines with Pyramid Vision Transformer [PDF]
Transformers have recently led to encouraging progress in computer vision. In this work, we present new baselines by improving the original Pyramid Vision Transformer (PVT v1) by adding three designs: (i) a linear complexity attention layer, (ii) an ...
Wenhai Wang +8 more
semanticscholar +1 more source
Conformer: Convolution-augmented Transformer for Speech Recognition [PDF]
Recently, Transformer and convolutional neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming recurrent neural networks (RNNs).
Anmol Gulati +10 more
semanticscholar +1 more source

