InfoMSD: an information-maximization self-distillation framework for parameter-efficient fine-tuning on artwork images
In recent years, despite the remarkable performance of large-scale vision language models across various visual classification tasks, their substantial parameter counts and high fine-tuning costs have hindered deployment in resource-constrained cultural ...
Feng Guan +3 more
Toward Generalized Multistage Clustering: Multiview Self-Distillation
Existing multi-stage clustering methods independently learn salient features from multiple views and then perform the clustering task. In particular, multi-view clustering (MVC) has attracted considerable attention in multi-view and multi-modal scenarios.
Jiatai Wang, Zhiwei Xu, Xin Wang, Tao Li
SkillFactory: Self-Distillation For Learning Cognitive Behaviors
Reasoning models leveraging long chains of thought employ various cognitive skills, such as verifying their answers, backtracking, retrying with an alternate method, and more. Previous work has shown that when a base language model exhibits these skills, further training with reinforcement learning (RL) can teach it to leverage them.
Zayne Sprague +5 more
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Vision-Language Models (VLMs) trained with contrastive loss have achieved significant advancements in various vision and language tasks. However, the global nature of the contrastive loss makes VLMs focus predominantly on foreground objects, neglecting ...
Sanghwan Kim +4 more
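For context, the contrastive objective this abstract refers to is the standard image-text matching loss used to pre-train VLMs such as CLIP. The sketch below is a generic version of that loss (the function name, temperature value, and tensor shapes are illustrative assumptions), not the COSMOS cross-modality self-distillation itself; it shows why the objective is "global": each image and caption is matched through a single pooled embedding.

    import torch
    import torch.nn.functional as F

    def contrastive_loss(img_emb, txt_emb, temperature=0.07):
        """Generic CLIP-style image-text contrastive loss (illustrative sketch)."""
        # One pooled embedding per image and per caption -> a single "global" match score.
        img_emb = F.normalize(img_emb, dim=-1)
        txt_emb = F.normalize(txt_emb, dim=-1)
        logits = img_emb @ txt_emb.t() / temperature          # [batch, batch] similarity matrix
        targets = torch.arange(logits.size(0), device=logits.device)
        # Symmetric cross-entropy: each image must retrieve its caption and vice versa.
        return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2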
Leave No One Behind: Online Self-Supervised Self-Distillation for Sequential Recommendation
Sequential recommendation methods play a pivotal role in modern recommendation systems. A key challenge lies in accurately modeling user preferences in the face of data sparsity. To tackle this challenge, recent methods leverage contrastive learning (CL)
Shaowei Wei +7 more
Self-Distillation Improves DNA Sequence Inference
Self-supervised pretraining (SSP) has been recognized as a method to enhance prediction accuracy in various downstream tasks. However, its efficacy for DNA sequences remains somewhat constrained. This limitation stems primarily from the fact that most existing SSP approaches in genomics focus on masked language modeling of individual sequences ...
Tong Yu +4 more
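As a point of reference for the masked-language-modeling setup the abstract mentions, the sketch below masks random positions of a single DNA sequence so a model can be trained to reconstruct them. The 4-letter vocabulary, [MASK] token, and 15% mask rate are assumptions for illustration; this is the generic SSP objective being discussed, not the paper's proposed method.

    import torch

    VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "[MASK]": 4}

    def mask_dna(seq, mask_prob=0.15):
        """Masked language modeling over one DNA sequence (illustrative sketch)."""
        tokens = torch.tensor([VOCAB[base] for base in seq])
        labels = tokens.clone()
        mask = torch.rand(len(tokens)) < mask_prob
        tokens[mask] = VOCAB["[MASK]"]   # corrupt the selected positions
        labels[~mask] = -100             # standard "ignore" index: loss only on masked positions
        return tokens, labels

    # tokens, labels = mask_dna("ACGTACGTTGCAAGCT")
    # loss = F.cross_entropy(model(tokens).view(-1, len(VOCAB)), labels)   # hypothetical model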
Self-Distillation for Unsupervised 3D Domain Adaptation
Adriano Cardace +4 more
A Teacher-Free Graph Knowledge Distillation Framework With Dual Self-Distillation
Recent years have witnessed great success in handling graph-related tasks with Graph Neural Networks (GNNs). Despite the academic success of GNNs, however, Multi-Layer Perceptrons (MLPs) remain the primary workhorse for practical industrial applications.
Lirong Wu +4 more
Person re-identification based on multi-branch visual transformer and self-distillation.
Chen W, Yin K, Wu Y, Hu Y.
Understanding the Gains from Repeated Self-Distillation
Self-Distillation is a special type of knowledge distillation where the student model has the same architecture as the teacher model. Despite using the same architecture and the same training data, self-distillation has been empirically observed to ...
Divyansh Pareek, S. S. Du, Sewoong Oh
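The definition above reduces to a simple recipe: train a fresh copy of the model on the previous model's soft predictions over the same data, and repeat. The sketch below is a minimal, generic version of that loop (the temperature, optimizer, and KL objective are assumed choices), not the analysis developed in the paper.

    import copy
    import torch
    import torch.nn.functional as F

    def self_distill_round(teacher, loader, epochs=1, lr=1e-3, T=2.0):
        """One round of self-distillation: same architecture, same data, soft targets."""
        student = copy.deepcopy(teacher)
        for m in student.modules():
            if hasattr(m, "reset_parameters"):
                m.reset_parameters()          # fresh random init, identical architecture
        opt = torch.optim.Adam(student.parameters(), lr=lr)
        teacher.eval()
        for _ in range(epochs):
            for x, _ in loader:               # same training data; ground-truth labels unused here
                with torch.no_grad():
                    soft = F.softmax(teacher(x) / T, dim=-1)
                loss = F.kl_div(F.log_softmax(student(x) / T, dim=-1), soft, reduction="batchmean")
                opt.zero_grad()
                loss.backward()
                opt.step()
        return student

    # Repeated self-distillation: each round's student becomes the next round's teacher.
    # model = self_distill_round(model, train_loader)   # round 1
    # model = self_distill_round(model, train_loader)   # round 2, and so on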

