AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models [PDF]
Audio-visual representation learning aims to develop systems with human-like perception by utilizing correlation between auditory and visual information.
Yuan Tseng +18 more
semanticscholar +1 more source
MaskPlace: Fast Chip Placement via Reinforced Visual Representation Learning [PDF]
Placement is an essential task in modern chip design, aiming at placing millions of circuit modules on a 2D chip canvas. Unlike the human-centric solution, which requires months of intense effort by hardware engineers to produce a layout to minimize ...
Yao Lai, Yao Mu, Ping Luo
semanticscholar +1 more source
Causal Reasoning Meets Visual Representation Learning: A Prospective Study [PDF]
Visual representation learning is ubiquitous in various real-world applications, including visual comprehension, video understanding, multi-modal analysis, human-computer interaction, and urban computing.
Liu Y, Wei Y, Yan H, Li G, Lin L.
europepmc +3 more sources
Reading-Strategy Inspired Visual Representation Learning for Text-to-Video Retrieval [PDF]
This paper addresses text-to-video retrieval: given a query in the form of a natural-language sentence, the task is to retrieve videos that are semantically relevant to the query from a large collection of unlabeled videos.
Jianfeng Dong +6 more
semanticscholar +1 more source
Deep High-Resolution Representation Learning for Visual Recognition [PDF]
High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection.
Jingdong Wang +11 more
semanticscholar +1 more source
Exploring Simple Siamese Representation Learning [PDF]
Siamese networks have become a common structure in various recent models for unsupervised visual representation learning. These models maximize the similarity between two augmentations of one image, subject to certain conditions for avoiding collapsing ...
Xinlei Chen, Kaiming He
semanticscholar +1 more source
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning [PDF]
Multi-scale Vision Transformer (ViT) has emerged as a powerful backbone for computer vision tasks, while the self-attention computation in Transformer scales quadratically w.r.t. the input patch number.
Ting Yao +4 more
semanticscholar +1 more source
Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning [PDF]
Contrastive learning methods for unsupervised visual representation learning have reached remarkable levels of transfer performance. We argue that the power of contrastive learning has yet to be fully unleashed, as current methods are trained only on ...
Zhenda Xie +5 more
semanticscholar +1 more source
Self-Supervised Visual Representation Learning with Semantic Grouping [PDF]
In this paper, we tackle the problem of learning visual representations from unlabeled scene-centric data. Existing works have demonstrated the potential of utilizing the underlying complex structure within scene-centric data; still, they commonly rely ...
Xin Wen +4 more
semanticscholar +1 more source
Collaborative Unsupervised Visual Representation Learning from Decentralized Data [PDF]
Unsupervised representation learning has achieved outstanding performances using centralized data available on the Internet. However, the increasing awareness of privacy protection limits sharing of decentralized unlabeled image data that grows ...
Weiming Zhuang +4 more
semanticscholar +1 more source

