Large language and vision models have transformed how social movements scholars identify protest and extract key protest attributes from multi-modal data such as texts, images, and videos.
Zhang, Yongjun
core
Swin Transformer-Based Dynamic Semantic Communication for Multi-User with Different Computing Capacity [PDF]
Loc X. Nguyen +6 more
openalex +1 more source
The Swin‐Transformer is a variant of the Vision Transformer, which constructs a hierarchical Transformer that computes representations with shifted windows and window multi‐head self‐attention.
Yixuan Xu +3 more
doaj +1 more source
Transformers meet CNNs for insights into breast mass classification from histopathological images
Introduction: Breast cancer remains one of the leading causes of cancer-related deaths among women worldwide, highlighting the critical need for accurate histopathological diagnosis and reliable decision-support systems to improve diagnostic sensitivity ...
Vatsala Anand, Ajay Khajuria
doaj +1 more source
YotoR-You Only Transform One Representation
This paper introduces YotoR (You Only Transform One Representation), a novel deep learning model for object detection that combines Swin Transformers and YoloR architectures.
Loncomilla, Patricio +2 more
core
Among current mainstream change detection networks, transformers are deficient in capturing accurate low-level details, while convolutional neural networks (CNNs) are wanting in the capacity to understand global information and establish ...
Liu, Jia +3 more
core
Semantic segmentation of remote sensing images is extensively used in crop cover and type analysis, and environmental monitoring. In the semantic segmentation of remote sensing images, owing to the specificity of remote sensing images, not only the ...
Rong-Xing Ding +4 more
doaj +1 more source
Efficient Wheat Disease Identification Using Hybrid Swin-SHARP Vision Model
Accurate identification of wheat diseases is an essential component for increasing crop yields and guaranteeing global food security. However, subjective opinions, errors, and laborious procedures frequently limit traditional approaches, which are based ...
Waqar Khalid +3 more
doaj +1 more source
B-Cos Aligned Transformers Learn Human-Interpretable Features
Vision Transformers (ViTs) and Swin Transformers (Swin) are currently state-of-the-art in computational pathology. However, domain experts are still reluctant to use these models due to their lack of interpretability.
Boxberg, Melanie +9 more
core

