Multi‐granularity re‐ranking for visible‐infrared person re‐identification
Visible‐infrared person re‐identification (VI‐ReID) complements single‐modality re‐identification, compensating for the failure of conventional re‐identification under insufficient illumination. It is more challenging than single‐modality
Yadi Wang +3 more
Bridging Modality Gap for Visual Grounding with Effective Cross-Modal Distillation
Visual grounding aims to align visual information of specific regions of images with corresponding natural language expressions. Current visual grounding methods leverage pre-trained visual and language backbones independently to obtain visual features and linguistic features.
Wang, Jiaxi +5 more
Cross-Modal Retrieval via Similarity-Preserving Learning and Semantic Average Embedding
Cross-modal retrieval takes data of one modality as the query to search for related data in other modalities (e.g. images vs. texts). Since a heterogeneous gap exists between different media data, mainstream methods focus on reducing the modality gap using ...
Tao Zhi, Yingchun Fan, Hong Han
Survey of Cross-Modal Person Re-Identification from a Mathematical Perspective
Person re-identification (Re-ID) aims to retrieve a particular pedestrian’s identification from a surveillance system consisting of non-overlapping cameras.
Minghui Liu, Yafei Zhang, Huafeng Li
Cross-Modality Person Re-Identification via Local Paired Graph Attention Network
Cross-modality person re-identification (ReID) aims at searching a pedestrian image of RGB modality from infrared (IR) pedestrian images and vice versa.
Jianglin Zhou +4 more
Cross-modal Retrieval and Synthesis (X-MRS): Closing the Modality Gap in Shared Subspace Learning
Computational food analysis (CFA) naturally requires multi-modal evidence of a particular food, e.g., images, recipe text, etc. A key to making CFA possible is multi-modal shared representation learning, which aims to create a joint representation of the multiple views (text and image) of the data. In this work we propose a method for food domain cross-
Ricardo Guerrero +2 more
With the widespread success of deep learning in biomedical image segmentation, domain shift becomes a critical and challenging problem, as the gap between two domains can severely affect model performance when deployed to unseen data with heterogeneous ...
Ping Gong +4 more
MCEN: Bridging Cross-Modal Gap between Cooking Recipes and Dish Images with Latent Variable Model
Nowadays, driven by increasing concern about diet and health, food computing has attracted enormous attention from both industry and the research community. One of the most popular research topics in this domain is Food Retrieval, due to its profound influence on health-oriented applications.
Fu, Han +3 more
Exploring latent weight factors and global information for food-oriented cross-modal retrieval
Food-oriented cross-modal retrieval aims to retrieve relevant recipes given food images or vice versa. The modality semantic gap between recipes and food images (text and image modalities) is the main challenge.
Wenyu Zhao +4 more
Multi‐level cross‐modality learning framework for text‐based person re‐identification
The goal of text‐based person re‐identification (Re‐ID) is to retrieve the corresponding image of a person from given text information. However, due to intra‐modality variety and modality heterogeneity, it is challenging to simultaneously learn
Tinghui Wu +3 more

