Some of the following articles may not be open access.
M3R: Masked Token Mixup and Cross-Modal Reconstruction for Zero-Shot Learning
ACM Multimedia, 2023
In the zero-shot learning (ZSL), learned representation spaces are often biased toward seen classes, thus limiting the ability to predict previously unseen classes. In this paper, we propose Masked token Mixup and cross-Modal Reconstruction for zero-shot
Peng Zhao, Qiangchang Wang, Yilong Yin
semanticscholar +1 more source
Image Tagging via Cross-Modal Semantic Mapping
Proceedings of the 23rd ACM international conference on Multimedia, 2015
Images without annotations are ubiquitous on the Internet, and recommending tags for them has become a challenging open task in image understanding. A common bottleneck of related work is the semantic gap between the image and text representations.
Zhi-Hong Deng, Hongliang Yu, Yunlun Yang
openaire +1 more source
Deep Semantic Mapping for Cross-Modal Retrieval
2015 IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI), 2015
Cross-Modal mapping plays an essential role in multimedia information retrieval systems. However, most of existing work paid much attention on learning mapping functions but neglected the exploration of high-level semantic representation of modalities.
Cheng Wang +2 more
openaire +1 more source
Audio/visual mapping with cross-modal hidden Markov models
IEEE Transactions on Multimedia, 2005
The audio/visual mapping problem of speech-driven facial animation has intrigued researchers for years. Recent research efforts have demonstrated that hidden Markov model (HMM) techniques, which have been applied successfully to the problem of speech recognition, could achieve a similar level of success in audio/visual mapping problems. A number of HMM-
FU S. +4 more
openaire +2 more sources
Persistent Stereo Visual Localization on Cross-Modal Invariant Map
IEEE Transactions on Intelligent Transportation Systems, 2020
Autonomous mobile vehicles are expected to perform persistent and accurate localization with low-cost equipment. To achieve this goal, we propose a stereo camera based visual localization method using a modified laser map, which takes the advantage of both the low cost of camera, and high geometric precision of laser data to achieve long-term ...
Xiaqing Ding +6 more
openaire +2 more sources
Disparity Map Estimation from Cross-Modal Stereo
2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2018
Mono-modal stereo matching problem has been studied for decades. The introduction of cross-modal stereo systems in industrial scene increases the interest in cross-modal stereo matching. The existing algorithms mostly consider mono-modal setting so they do not translate well in cross-modal setting.
Thapanapong Rukkanchanunt +3 more
openaire +1 more source
IEEE Transactions on Medical Imaging, 2021
Cell or nucleus detection is a fundamental task in microscopy image analysis and has recently achieved state-of-the-art performance by using deep neural networks. However, training supervised deep models such as convolutional neural networks (CNNs) usually requires sufficient annotated image data, which is prohibitively expensive or unavailable in some
Fuyong Xing +3 more
openaire +2 more sources
Coupled dictionary learning and feature mapping for cross-modal retrieval
2015 IEEE International Conference on Multimedia and Expo (ICME), 2015
In this paper, we investigate the problem of modeling images and associated text for cross-modal retrieval tasks such as text-to-image search and image-to-text search. To make the data from image and text modalities comparable, previous cross-modal retrieval methods directly learn two projection matrices to map the raw features of the two modalities ...
Xing Xu +3 more
openaire +1 more source
Music2Palette: Emotion-aligned Color Palette Generation via Cross-Modal Representation Learning
ACM Multimedia
Emotion alignment between music and palettes is crucial for effective multimedia content, yet misalignment creates confusion that weakens the intended message.
Jiayun Hu +4 more
semanticscholar +1 more source
Cross-Modal Dual Learning for Sentence-to-Video Generation
ACM Multimedia, 2019
Automatic content generation has become an attractive while challenging topic in the past decade. Generating videos from sentences particularly poses great challenges to the multimedia community due to its multi-modal characteristics in essence, e.g ...
Yue Liu +3 more
semanticscholar +1 more source

