BMPCQA: Bioinspired Metaverse Point Cloud Quality Assessment Based on Large Multimodal Models
This study presents a bioinspired metaverse point cloud quality assessment metric, which simulates the human visual evaluation process to perform the point cloud quality assessment task. It first extracts rendering projection video features, normal image features, and point cloud patch features, which are then fed into a large multimodal model to ...
Huiyu Duan +7 more
wiley +1 more source
Easy and efficient acquisition of high-resolution remote sensing images is important in geographic information systems. Previously, deep neural networks composed of convolutional layers have achieved impressive progress in super-resolution ...
Jingzhi Tu +3 more
doaj +1 more source
Semantic-Aware Local-Global Vision Transformer
Vision Transformers have achieved remarkable progress, among which Swin Transformer has demonstrated the tremendous potential of Transformers for vision tasks.
Chen, Fanglin +4 more
core
Swin-FER: Swin Transformer for Facial Expression Recognition
The ability of transformers to capture global context information is highly beneficial for recognizing subtle differences in facial expressions. However, compared to convolutional neural networks, transformers require the computation of dependencies between each element and all other elements, leading to high computational complexity. Additionally, the
Mei Bie +4 more
openaire +2 more sources
Source Microphone Identification Using Swin Transformer
Microphone identification is a crucial challenge in the field of digital audio forensics. The ability to accurately identify the type of microphone used to record a piece of audio can provide important information for forensic analysis and crime investigations.
Mustafa Qamhan +2 more
openaire +2 more sources
Variational Autoencoder+Deep Deterministic Policy Gradient addresses low‐light failures of infrared depth sensing for indoor robot navigation. Stage 1 pretrains an attention‐enhanced Variational Autoencoder (Convolutional Block Attention Module+Feature Pyramid Network) to map dark depth frames to a well‐lit reconstruction, yielding a 128‐D latent code ...
Uiseok Lee +7 more
wiley +1 more source
Multi-Focus Microscopy Image Fusion Based on Swin Transformer Architecture
In this study, we introduce the U-Swin fusion model, an effective and efficient transformer-based architecture designed for the fusion of multi-focus microscope images.
Han Hank Xia +4 more
doaj +1 more source
Pattern Attention Transformer with Doughnut Kernel
In this paper, we present a new architecture, the Pattern Attention Transformer (PAT), which is built on a new doughnut kernel. Compared with tokens in the NLP field, Transformers in computer vision face the problem of handling the high resolution of ...
Sheng, WenYuan
core
KDLM: Lightweight Brain Tumor Segmentation via Knowledge Distillation
A lightweight student network is designed, which is based on multiscale and multilevel feature fusion and combined with the residual channel attention mechanism to achieve efficient feature extraction and fusion with very few parameters. A dual‐teacher collaborative knowledge distillation framework is proposed.
Baotian Li +4 more
wiley +1 more source
HEAL-SWIN: A Vision Transformer On The Sphere
High-resolution wide-angle fisheye images are becoming increasingly important for robotics applications such as autonomous driving. However, using ordinary convolutional neural networks or vision transformers on this data is problematic due to ...
Carlsson, Oscar +6 more
core

