Shape retrieval by using multi-scale angle-based representation and dynamic label propagation
To improve the robustness and discrimination power of the triangle-area representation, a novel shape matching method based on multi-scale angle representation is proposed in this study.
Yanxia Yu +6 more
doaj +1 more source
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models. This paper proposes BLIP-2, a generic and efficient pre-training strategy that bootstraps vision-language pre-training from
Junnan Li +3 more
semanticscholar +1 more source
Skeleton Image Representation for 3D Action Recognition Based on Tree Structure and Reference Joints
In recent years, the computer vision research community has studied how to model temporal dynamics in videos for 3D human action recognition.
C. Caetano, F. Brémond, W. R. Schwartz
semanticscholar +1 more source
Learning the representation of instrument images in laparoscopy videos
Automatic recognition of instruments in laparoscopy videos poses many challenges that need to be addressed, like identifying multiple instruments appearing in various representations and in different lighting conditions, which in turn may be occluded by ...
Sabrina Kletz +2 more
doaj +1 more source
Parallax‐based second‐order mixed attention for stereo image super‐resolution
Stereo image pairs can effectively enhance the performance of super‐resolution (SR) since both intra‐view and cross‐view information can be used. However, exploiting cross‐view information accurately is extremely challenging.
Chenyang Duan, Nanfeng Xiao
doaj +1 more source
MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis
Generative modeling and representation learning are two key tasks in computer vision. However, these models are typically trained independently, which ignores the potential for each task to help the other, and leads to training and model maintenance ...
Tianhong Li +5 more
semanticscholar +1 more source
Extended IMD2020: a large‐scale annotated dataset tailored for detecting manipulated images
Image forensic datasets need to accommodate a complex diversity of systematic noise and intrinsic image artefacts to prevent any overfitting of learning methods to a small set of camera types or manipulation techniques.
Adam Novozámský +2 more
doaj +1 more source
Deep High-Resolution Representation Learning for Visual Recognition
High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection.
Jingdong Wang +11 more
semanticscholar +1 more source
Advances in image super-resolution technology have led to its widespread use in remote sensing applications. However, there is currently no general solution for reconstructing satellite images at arbitrary resolutions.
Tai An +3 more
doaj +1 more source
Application research on improved CGAN in image raindrop removal
Rainy weather can greatly reduce image quality and hinder subsequent image processing. To remove raindrops from rainy images, the single image raindrop removal method based on conditional generative adversarial networks ...
Min Zhu +4 more
doaj +1 more source

