IEEE International Symposium on Workload Characterization, 2016
Heterogeneous systems are ubiquitous in the field of High-Performance Computing (HPC). Graphics processing units (GPUs) are widely used as accelerators for their enormous computing potential and energy efficiency; furthermore, on-die integration of GPUs ...
Victor Garcia +5 more
semanticscholar +1 more source
Cache-efficient implementation and batching of tridiagonalization on manycore CPUs
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2019
We herein propose an efficient implementation of tridiagonalization (TRD) for small matrices on manycore CPUs. Tridiagonalization is a matrix decomposition that is used as a preprocessor for eigenvalue computations. Further, TRD for such small matrices appears even in the HPC environment as a subproblem of large computations. To utilize the large cache ...
Shuhei Kudo, Toshiyuki Imamura
openaire +1 more source
A CMOS RISC CPU with on-chip parallel cache
Proceedings of IEEE International Solid-State Circuits Conference - ISSCC '94, 2002
This CMOS CPU in a 0.55 μm, 3-metal process integrates over 1.2 M transistors on a single chip. All circuitry on-chip operates at 140 MHz under typical conditions. All off-chip interfaces are cycled at the same frequency (with the exception of the system bus interface, which is cycled at 120 MHz). Chip parameters are given.
E. Rashid +27 more
openaire +1 more source
Interrupt Triggered Software Prefetching for Embedded CPU Instruction Cache
12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS'06), 2006
In embedded systems, handling time-critical real-time tasks is a challenge. The software may not only multi-task to improve response time, but also support events and interrupts, forcing the system to balance multiple priorities. Further, pre-emptive task switching hampers efficient interrupt processing, leading to instruction cache misses.
Ken W. Batcher, Robert A. Walker
openaire +1 more source
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
International Conference on Machine Learning
With the widespread deployment of long-context large language models (LLMs), there has been a growing demand for efficient support of high-throughput inference.
Hanshi Sun +8 more
semanticscholar +1 more source
NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference
Conference on Machine Learning and Systems
Online LLM inference powers many exciting applications such as intelligent chatbots and autonomous agents. Modern LLM inference engines widely rely on request batching to improve inference throughput, aiming to make it cost-efficient when running on ...
Xu Jiang +4 more
semanticscholar +1 more source
Scaling your Hybrid CPU-GPU DBMS to Multiple GPUs
Proceedings of the VLDB Endowment
GPU-accelerated databases have been gaining popularity in recent years due to their massive parallelism and high memory bandwidth. The limited GPU memory capacity, however, is still a major bottleneck for GPU databases.
B. Yogatama, Weiwei Gong, Xiangyao Yu
semanticscholar +1 more source
KVPR: Efficient LLM Inference with I/O-Aware KV Cache Partial Recomputation
Annual Meeting of the Association for Computational Linguistics
Inference for Large Language Models (LLMs) is computationally demanding. To reduce the cost of auto-regressive decoding, a Key-Value (KV) cache is used to store intermediate activations, which significantly lowers the computational overhead for token ...
Chaoyi Jiang +3 more
semanticscholar +1 more source
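The abstract above relies on the idea of a KV cache in auto-regressive decoding: keys and values for already-decoded tokens are stored so each new token attends over cached entries instead of recomputing them. A minimal single-head sketch of that idea, not the paper's method; all names and dimensions (`d_model`, `Wk`, `Wv`) are illustrative:

```python
import numpy as np

d_model = 4
rng = np.random.default_rng(0)
Wk = rng.standard_normal((d_model, d_model))  # key projection (illustrative)
Wv = rng.standard_normal((d_model, d_model))  # value projection (illustrative)

k_cache, v_cache = [], []  # grows by one entry per decoded token

def decode_step(x):
    """Project the new token once, cache its K/V, attend over all cached pairs."""
    k_cache.append(x @ Wk)
    v_cache.append(x @ Wv)
    K = np.stack(k_cache)              # (t, d): keys for all tokens so far
    V = np.stack(v_cache)              # (t, d): values for all tokens so far
    scores = K @ x / np.sqrt(d_model)  # scaled dot-product attention scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()           # softmax over past tokens
    return weights @ V                 # attention output for the new token

for _ in range(3):
    out = decode_step(rng.standard_normal(d_model))
print(len(k_cache))  # → 3: one cached K/V pair per decoded token
```

Each step does O(t) work against the cache rather than re-projecting all t tokens, which is exactly the memory-for-compute trade-off the KV-offloading papers in this list optimize.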
Enzian: an open, general, CPU/FPGA platform for systems software research
International Conference on Architectural Support for Programming Languages and Operating Systems, 2022
David A. Cock +12 more
semanticscholar +1 more source
On the Mitigation of Cache Hostile Memory Access Patterns on Many-Core CPU Architectures
ISC Workshops, 2017
Tom Deakin +2 more
semanticscholar +1 more source