Multi‐granularity re‐ranking for visible‐infrared person re‐identification
Visible‐infrared person re‐identification (VI‐ReID) complements single‐modality re‐identification, compensating for the failure of conventional re‐identification under insufficient illumination. It is more challenging than single‐modality
Yadi Wang +3 more
Bridging Modality Gap for Visual Grounding with Effective Cross-Modal Distillation
Visual grounding aims to align visual information of specific regions of images with corresponding natural language expressions. Current visual grounding methods leverage pre-trained visual and language backbones independently to obtain visual features and linguistic features.
Wang, Jiaxi +5 more
Cross-Modal Retrieval via Similarity-Preserving Learning and Semantic Average Embedding
Cross-modal retrieval takes data of one modality as the query to search for related data in other modalities (e.g. images vs. texts). Since a heterogeneous gap exists between different media data, mainstream methods focus on reducing the modality gap using ...
Tao Zhi, Yingchun Fan, Hong Han
Survey of Cross-Modal Person Re-Identification from a Mathematical Perspective
Person re-identification (Re-ID) aims to retrieve a particular pedestrian’s identification from a surveillance system consisting of non-overlapping cameras.
Minghui Liu, Yafei Zhang, Huafeng Li
Cross-Modality Person Re-Identification via Local Paired Graph Attention Network
Cross-modality person re-identification (ReID) aims at searching a pedestrian image of RGB modality from infrared (IR) pedestrian images and vice versa.
Jianglin Zhou +4 more
Cross-modal Retrieval and Synthesis (X-MRS): Closing the Modality Gap in Shared Subspace Learning
Computational food analysis (CFA) naturally requires multi-modal evidence of a particular food, e.g., images, recipe text, etc. A key to making CFA possible is multi-modal shared representation learning, which aims to create a joint representation of the multiple views (text and image) of the data. In this work we propose a method for food domain cross-
Ricardo Guerrero +2 more
With the widespread success of deep learning in biomedical image segmentation, domain shift becomes a critical and challenging problem, as the gap between two domains can severely affect model performance when deployed to unseen data with heterogeneous ...
Ping Gong +4 more
MCEN: Bridging Cross-Modal Gap between Cooking Recipes and Dish Images with Latent Variable Model
Nowadays, driven by increasing concern about diet and health, food computing has attracted enormous attention from both industry and the research community. One of the most popular research topics in this domain is Food Retrieval, due to its profound influence on health-oriented applications.
Fu, Han +3 more
Exploring latent weight factors and global information for food-oriented cross-modal retrieval
Food-oriented cross-modal retrieval aims to retrieve relevant recipes given food images or vice versa. The modality semantic gap between recipes and food images (text and image modalities) is the main challenge.
Wenyu Zhao +4 more
Multi‐level cross‐modality learning framework for text‐based person re‐identification
The goal of text‐based person re‐identification (Re‐ID) is to retrieve the corresponding image of a person from given text information. However, due to intra‐modality variety and modality heterogeneity, it is challenging to simultaneously learn
Tinghui Wu +3 more

