Results 121 to 130 of about 14,679 (231)

Speech Swin-Transformer: Exploring a Hierarchical Transformer with Shifted Windows for Speech Emotion Recognition

open access: yes
Swin-Transformer has demonstrated remarkable success in computer vision by leveraging its hierarchical feature representation based on Transformer. In speech signals, emotional information is distributed across different scales of speech features, e.g.,
Lian, Hailun   +6 more
core  

SwinV2DNet: Pyramid and Self-Supervision Compounded Feature Learning for Remote Sensing Images Change Detection

open access: yes, 2023
Among the current mainstream change detection networks, transformer is deficient in the ability to capture accurate low-level details, while convolutional neural network (CNN) is wanting in the capacity to understand global information and establish ...
Liu, Jia   +3 more
core  

LLVMs4Protest: Harnessing the Power of Large Language and Vision Models for Deciphering Protests in the News

open access: yes, 2023
Large language and vision models have transformed how social movements scholars identify protest and extract key protest attributes from multi-modal data such as texts, images, and videos.
Zhang, Yongjun
core  

Efficient Wheat Disease Identification Using Hybrid Swin-SHARP Vision Model

open access: yes, IEEE Access
Accurate identification of wheat diseases is an essential component for increasing crop yields and guaranteeing global food security. However, subjective opinions, errors, and laborious procedures frequently limit traditional approaches, which are based ...
Waqar Khalid   +3 more
doaj   +1 more source

YotoR-You Only Transform One Representation

open access: yes
This paper introduces YotoR (You Only Transform One Representation), a novel deep learning model for object detection that combines Swin Transformers and YoloR architectures.
Loncomilla, Patricio   +2 more
core  

Sounds like gambling: detection of gambling venue visitation from sounds in gamblers’ environments using a transformer [PDF]

open access: yes
Objective digital measurement of gamblers visiting gambling venues is conducted using cashless cards and facial recognition systems, but these methods are confined within a single gambling venue.
304901/profile-ja.html   +12 more
core  

Swin Transformer Fusion Network for Image Quality Assessment

open access: yes, IEEE Access
This paper presents an efficient deep-learning model named Swin Transformer fusion network (STFN) for full-reference image quality assessment (FR-IQA). The STFN model uses the first and second stages of the Swin Transformer for feature extraction.
Hyeongmyeon Kim, Changhoon Yim
doaj   +1 more source

DCHT: Deep Complex Hybrid Transformer for Speech Enhancement

open access: yes, 2023
Most of the current deep learning-based approaches for speech enhancement only operate in the spectrogram or waveform domain. Although a cross-domain transformer combining waveform- and spectrogram-domain inputs has been proposed, its performance can be ...
Li, Jialu   +3 more
core  

PPLA-Transformer: An Efficient Transformer for Defect Detection with Linear Attention Based on Pyramid Pooling

open access: yes, Sensors
Defect detection is crucial for quality control in industrial products. The defects in industrial products are typically subtle, leading to reduced accuracy in detection.
Xiaona Song   +4 more
doaj   +1 more source

Railway Signal Relay Voiceprint Fault Diagnosis Method Based on Swin-Transformer and Fusion of Gaussian-Laplacian Pyramid

open access: yes, Mathematics
Fault diagnosis of railway signal relays is crucial for the operational safety and efficiency of railway systems. With the continuous advancement of deep learning techniques in various applications, voiceprint-based fault diagnosis has emerged as a ...
Yi Liu   +4 more
doaj   +1 more source
