
Decomposition Methods for Large Scale LP Decoding [PDF]

open access: green, 2013
Siddharth Barman   +3 more
openalex   +1 more source

Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding [PDF]

open access: yes, arXiv
To mitigate the high inference latency stemming from autoregressive decoding in Large Language Models (LLMs), Speculative Decoding has emerged as a novel decoding paradigm for LLM inference. In each decoding step, this method first drafts several future tokens efficiently and then verifies them in parallel.
arxiv  

Temperature-Centric Investigation of Speculative Decoding with Knowledge Distillation [PDF]

open access: yes, arXiv
Speculative decoding stands as a pivotal technique to expedite inference in autoregressive (large) language models. This method employs a smaller draft model to speculate a block of tokens, which the target model then evaluates for acceptance. Despite a wealth of studies aimed at increasing the efficiency of speculative decoding, the influence of ...
arxiv  

Two Methods for Decreasing the Computational Complexity of the MIMO ML Decoder

open access: green, 2004
Takayuki Fukatani   +2 more
openalex   +2 more sources
