Results 321 to 330 of about 4,930,132 (374)
Some of the following articles may not be open access.
Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning
Annual Meeting of the Association for Computational Linguistics
The surge in Large Language Models (LLMs) has revolutionized natural language processing, but fine-tuning them for specific tasks often encounters challenges in balancing performance and preserving general instruction-following abilities.
Zhaorui Yang +6 more
semanticscholar +1 more source
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation
Annual Meeting of the Association for Computational Linguistics
The upscaling of Large Language Models (LLMs) has yielded impressive advances in natural language processing, yet it also poses significant deployment challenges.
Dayou Du +6 more
semanticscholar +1 more source
Self-Distillation for Gaussian Process Models [PDF]
We propose two approaches to extend the notion of knowledge distillation to Gaussian Process Regression (GPR) and Gaussian Process Classification (GPC); data-centric and distribution-centric. The data-centric approach resembles most current distillation techniques for machine learning, and refits a model on deterministic predictions from the teacher ...
Borup, Kenneth, Andersen, Lars Nørvang
openaire +1 more source
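The "data-centric" approach in the entry above refits a student model on the teacher's deterministic predictions. A minimal, self-contained sketch of that refitting step (the `LinearModel` class and `distill` helper are illustrative stand-ins, not from the paper, which uses Gaussian Processes):

```python
# Hypothetical sketch of data-centric distillation: fit a teacher on the
# raw targets, then refit a student on the teacher's deterministic
# predictions over the same inputs. Model and function names are ours.

class LinearModel:
    """Ordinary least squares for 1-D inputs (closed form)."""

    def fit(self, xs, ys):
        n = len(xs)
        mx = sum(xs) / n
        my = sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        var = sum((x - mx) ** 2 for x in xs)
        self.slope = cov / var
        self.intercept = my - self.slope * mx
        return self

    def predict(self, xs):
        return [self.intercept + self.slope * x for x in xs]


def distill(teacher, student, xs):
    """Data-centric distillation: the student is trained not on the
    original labels but on the teacher's point predictions."""
    return student.fit(xs, teacher.predict(xs))


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.2, 2.9]
teacher = LinearModel().fit(xs, ys)
student = distill(teacher, LinearModel(), xs)
```

Because the student here refits on exactly linear pseudo-labels, it recovers the teacher's parameters; with a richer teacher (e.g. a GP posterior mean), the student would instead smooth toward the teacher's predictive function.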
Diffusion Self-Distillation for Zero-Shot Customized Image Generation
Computer Vision and Pattern Recognition
Text-to-image diffusion models produce impressive results but are frustrating tools for artists who desire fine-grained control. For example, a common use case is to create images of a specific concept in novel contexts, i.e., "identity-preserving ...
Shengqu Cai +5 more
semanticscholar +1 more source
IEEE Transactions on Industrial Informatics
Multivariate time series forecasting estimates future development by capturing relationships among variables and modeling temporal regularities, and is widely used in many scenarios, including industrial production, economic development, and disease prediction.
Xiang Li +6 more
semanticscholar +1 more source
Slot Attention with Re-Initialization and Self-Distillation
ACM Multimedia
Unlike popular solutions based on dense feature maps, Object-Centric Learning (OCL) represents visual scenes as sub-symbolic object-level feature vectors, termed slots, which are highly versatile for tasks involving visual modalities.
Rongzhen Zhao +3 more
semanticscholar +1 more source
Towards One-step Causal Video Generation via Adversarial Self-Distillation
arXiv.org
Recent hybrid video generation models combine autoregressive temporal dynamics with diffusion-based spatial denoising, but their sequential, iterative nature leads to error accumulation and long inference times.
Yongqi Yang +7 more
semanticscholar +1 more source
Learning Critically: Selective Self-Distillation in Federated Learning on Non-IID Data
IEEE Transactions on Big Data
Federated learning (FL) enables multiple clients to collaboratively train a global model while keeping local data decentralized. Data heterogeneity (non-IID) across clients has imposed significant challenges to FL, which makes local models re-optimize ...
Yuting He +5 more
semanticscholar +1 more source
Masked Self-Distillation Domain Adaptation for Hyperspectral Image Classification
IEEE Transactions on Geoscience and Remote Sensing
Deep learning-based unsupervised domain adaptation (UDA) has shown potential in cross-scene hyperspectral image (HSI) classification. However, existing methods often experience reduced feature discriminability during domain alignment due to the ...
Zhuoqun Fang +4 more
semanticscholar +1 more source
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time
International Conference on Learning Representations
Autoregressive (AR) Large Language Models (LLMs) have demonstrated significant success across numerous tasks. However, the AR modeling paradigm presents certain limitations; for instance, contemporary autoregressive LLMs are trained to generate one token ...
Justin Deschenaux, Caglar Gulcehre
semanticscholar +1 more source

