Results 321 to 330 of about 4,930,132 (374)
Some of the following articles may not be open access.
Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning
Annual Meeting of the Association for Computational Linguistics
The surge in Large Language Models (LLMs) has revolutionized natural language processing, but fine-tuning them for specific tasks often encounters challenges in balancing performance and preserving general instruction-following abilities.
Zhaorui Yang +6 more
semanticscholar +1 more source
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation
Annual Meeting of the Association for Computational Linguistics
The upscaling of Large Language Models (LLMs) has yielded impressive advances in natural language processing, yet it also poses significant deployment challenges.
Dayou Du +6 more
semanticscholar +1 more source
Self-Distillation for Gaussian Process Models [PDF]
We propose two approaches to extend the notion of knowledge distillation to Gaussian Process Regression (GPR) and Gaussian Process Classification (GPC); data-centric and distribution-centric. The data-centric approach resembles most current distillation techniques for machine learning, and refits a model on deterministic predictions from the teacher ...
Borup, Kenneth, Andersen, Lars Nørvang
openaire +1 more source
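The "data-centric" approach in the entry above refits a student model on the teacher's deterministic predictions. A minimal, self-contained sketch of that refitting step (the `LinearModel` class and `distill` helper are illustrative stand-ins, not from the paper, which uses Gaussian Processes):

```python
# Hypothetical sketch of data-centric distillation: fit a teacher on the
# raw targets, then refit a student on the teacher's deterministic
# predictions over the same inputs. Model and function names are ours.

class LinearModel:
    """Ordinary least squares for 1-D inputs (closed form)."""

    def fit(self, xs, ys):
        n = len(xs)
        mx = sum(xs) / n
        my = sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        var = sum((x - mx) ** 2 for x in xs)
        self.slope = cov / var
        self.intercept = my - self.slope * mx
        return self

    def predict(self, xs):
        return [self.intercept + self.slope * x for x in xs]


def distill(teacher, student, xs):
    """Data-centric distillation: the student is trained not on the
    original labels but on the teacher's point predictions."""
    return student.fit(xs, teacher.predict(xs))


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.2, 2.9]
teacher = LinearModel().fit(xs, ys)
student = distill(teacher, LinearModel(), xs)
```

Because the student here refits on exactly linear pseudo-labels, it recovers the teacher's parameters; with a richer teacher (e.g. a GP posterior mean), the student would instead smooth toward the teacher's predictive function.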
Diffusion Self-Distillation for Zero-Shot Customized Image Generation
Computer Vision and Pattern Recognition
Text-to-image diffusion models produce impressive results but are frustrating tools for artists who desire fine-grained control. For example, a common use case is to create images of a specific concept in novel contexts, i.e., "identity-preserving ...
Shengqu Cai +5 more
semanticscholar +1 more source
IEEE Transactions on Industrial Informatics
Multivariate time series forecasting estimates future development by capturing relationships among variables and modeling temporal regularities, and is widely used in many scenarios, including industrial production, economic development, and disease prediction.
Xiang Li +6 more
semanticscholar +1 more source
Slot Attention with Re-Initialization and Self-Distillation
ACM Multimedia
Unlike popular solutions based on dense feature maps, Object-Centric Learning (OCL) represents visual scenes as sub-symbolic object-level feature vectors, termed slots, which are highly versatile for tasks involving visual modalities.
Rongzhen Zhao +3 more
semanticscholar +1 more source
Towards One-step Causal Video Generation via Adversarial Self-Distillation
arXiv.org
Recent hybrid video generation models combine autoregressive temporal dynamics with diffusion-based spatial denoising, but their sequential, iterative nature leads to error accumulation and long inference times.
Yongqi Yang +7 more
semanticscholar +1 more source
Learning Critically: Selective Self-Distillation in Federated Learning on Non-IID Data
IEEE Transactions on Big Data
Federated learning (FL) enables multiple clients to collaboratively train a global model while keeping local data decentralized. Data heterogeneity (non-IID) across clients has imposed significant challenges to FL, which makes local models re-optimize ...
Yuting He +5 more
semanticscholar +1 more source
Masked Self-Distillation Domain Adaptation for Hyperspectral Image Classification
IEEE Transactions on Geoscience and Remote Sensing
Deep learning-based unsupervised domain adaptation (UDA) has shown potential in cross-scene hyperspectral image (HSI) classification. However, existing methods often experience reduced feature discriminability during domain alignment due to the ...
Zhuoqun Fang +4 more
semanticscholar +1 more source
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time
International Conference on Learning Representations
Autoregressive (AR) Large Language Models (LLMs) have demonstrated significant success across numerous tasks. However, the AR modeling paradigm presents certain limitations; for instance, contemporary autoregressive LLMs are trained to generate one token ...
Justin Deschenaux, Caglar Gulcehre
semanticscholar +1 more source

