IEEE International Symposium on Workload Characterization, 2016
Heterogeneous systems are ubiquitous in the field of High-Performance Computing (HPC). Graphics processing units (GPUs) are widely used as accelerators for their enormous computing potential and energy efficiency; furthermore, on-die integration of GPUs ...
Victor Garcia +5 more
semanticscholar +1 more source
Cache-efficient implementation and batching of tridiagonalization on manycore CPUs
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2019
We herein propose an efficient implementation of tridiagonalization (TRD) for small matrices on manycore CPUs. Tridiagonalization is a matrix decomposition that is used as a preprocessor for eigenvalue computations. Further, TRD for such small matrices appears even in the HPC environment as a subproblem of large computations. To utilize the large cache ...
Shuhei Kudo, Toshiyuki Imamura
openaire +1 more source
A CMOS RISC CPU with on-chip parallel cache
Proceedings of IEEE International Solid-State Circuits Conference - ISSCC '94, 2002
This CMOS CPU in a 0.55 μm, 3-metal process integrates over 1.2 M transistors on a single chip. All circuitry on-chip operates at 140 MHz under typical conditions. All off-chip interfaces are cycled at the same frequency (with the exception of the system bus interface, which is cycled at 120 MHz). Chip parameters are given.
E. Rashid +27 more
openaire +1 more source
Interrupt Triggered Software Prefetching for Embedded CPU Instruction Cache
12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS'06), 2006
In embedded systems, handling time-critical real-time tasks is a challenge. The software may not only multi-task to improve response time, but also support events and interrupts, forcing the system to balance multiple priorities. Further, pre-emptive task switching hampers efficient interrupt processing, leading to instruction cache misses.
Ken W. Batcher, Robert A. Walker
openaire +1 more source
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
International Conference on Machine Learning
With the widespread deployment of long-context large language models (LLMs), there has been a growing demand for efficient support of high-throughput inference.
Hanshi Sun +8 more
semanticscholar +1 more source
NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference
Conference on Machine Learning and Systems
Online LLM inference powers many exciting applications such as intelligent chatbots and autonomous agents. Modern LLM inference engines widely rely on request batching to improve inference throughput, aiming to make it cost-efficient when running on ...
Xu Jiang +4 more
semanticscholar +1 more source
Scaling your Hybrid CPU-GPU DBMS to Multiple GPUs
Proceedings of the VLDB Endowment
GPU-accelerated databases have been gaining popularity in recent years due to their massive parallelism and high memory bandwidth. The limited GPU memory capacity, however, is still a major bottleneck for GPU databases.
B. Yogatama, Weiwei Gong, Xiangyao Yu
semanticscholar +1 more source
KVPR: Efficient LLM Inference with I/O-Aware KV Cache Partial Recomputation
Annual Meeting of the Association for Computational Linguistics
Inference for Large Language Models (LLMs) is computationally demanding. To reduce the cost of auto-regressive decoding, a Key-Value (KV) cache is used to store intermediate activations, which significantly lowers the computational overhead for token ...
Chaoyi Jiang +3 more
semanticscholar +1 more source
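The abstract above relies on the idea of a KV cache in auto-regressive decoding: keys and values for already-decoded tokens are stored so each new token attends over cached entries instead of recomputing them. A minimal single-head sketch of that idea, not the paper's method; all names and dimensions (`d_model`, `Wk`, `Wv`) are illustrative:

```python
import numpy as np

d_model = 4
rng = np.random.default_rng(0)
Wk = rng.standard_normal((d_model, d_model))  # key projection (illustrative)
Wv = rng.standard_normal((d_model, d_model))  # value projection (illustrative)

k_cache, v_cache = [], []  # grows by one entry per decoded token

def decode_step(x):
    """Project the new token once, cache its K/V, attend over all cached pairs."""
    k_cache.append(x @ Wk)
    v_cache.append(x @ Wv)
    K = np.stack(k_cache)              # (t, d): keys for all tokens so far
    V = np.stack(v_cache)              # (t, d): values for all tokens so far
    scores = K @ x / np.sqrt(d_model)  # scaled dot-product attention scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()           # softmax over past tokens
    return weights @ V                 # attention output for the new token

for _ in range(3):
    out = decode_step(rng.standard_normal(d_model))
print(len(k_cache))  # → 3: one cached K/V pair per decoded token
```

Each step does O(t) work against the cache rather than re-projecting all t tokens, which is exactly the memory-for-compute trade-off the KV-offloading papers in this list optimize.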
Enzian: an open, general, CPU/FPGA platform for systems software research
International Conference on Architectural Support for Programming Languages and Operating Systems, 2022
David A. Cock +12 more
semanticscholar +1 more source
On the Mitigation of Cache Hostile Memory Access Patterns on Many-Core CPU Architectures
ISC Workshops, 2017
Tom Deakin +2 more
semanticscholar +1 more source