Results 171 to 180 of about 263,258
Some of the following articles may not be open access.

Extending a CPU Cache for Efficient IPv6 Lookup

2018 IEEE 61st International Midwest Symposium on Circuits and Systems (MWSCAS), 2018
Increasing throughput requirements for Internet routers and growing routing table sizes have emphasized the need for fast and scalable packet forwarding systems. This paper presents a hardware cache-based IPv6 lookup system. Our goal is to study how much performance can be achieved with a lookup system that is implemented by modifying a processor cache.
Benjamin Wolff   +3 more
openaire   +2 more sources
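
The abstract above gives the goal but not the mechanism. As a rough software illustration of the general idea, caching lookup results so that most packets avoid a full longest-prefix match, here is a minimal sketch in C. The direct-mapped layout, the FNV hash, and the `lpm_slow_lookup` stub are illustrative assumptions, not details from the paper.

```c
#include <stdint.h>
#include <string.h>

#define CACHE_SETS 4096                /* direct-mapped, power of two */

struct ipv6_addr { uint8_t b[16]; };

struct cache_line {
    struct ipv6_addr tag;              /* full address as tag (a simplification) */
    uint32_t next_hop;
    int valid;
};

static struct cache_line cache[CACHE_SETS];

/* Hypothetical slow path standing in for a full longest-prefix-match walk. */
static uint32_t lpm_slow_lookup(const struct ipv6_addr *a)
{
    return a->b[0];                    /* dummy next hop for illustration */
}

static uint32_t hash_addr(const struct ipv6_addr *a)
{
    uint32_t h = 2166136261u;          /* FNV-1a over the 16 address bytes */
    for (int i = 0; i < 16; i++)
        h = (h ^ a->b[i]) * 16777619u;
    return h & (CACHE_SETS - 1);
}

uint32_t lookup(const struct ipv6_addr *a)
{
    struct cache_line *l = &cache[hash_addr(a)];
    if (l->valid && memcmp(&l->tag, a, sizeof *a) == 0)
        return l->next_hop;            /* hit: served at cache speed */
    l->tag = *a;                       /* miss: fill from the slow path */
    l->next_hop = lpm_slow_lookup(a);
    l->valid = 1;
    return l->next_hop;
}
```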

ShowTime: Amplifying Arbitrary CPU Timing Side Channels

ACM Asia Conference on Computer and Communications Security, 2023
Microarchitectural attacks typically rely on precise timing sources to uncover short-lived secret-dependent activity in the processor. In response, many browsers and even CPU vendors restrict access to fine-grained timers.
Antoon Purnal   +3 more
semanticscholar   +1 more source
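
For readers unfamiliar with the "precise timing sources" the abstract refers to: the textbook probe times a single memory load with the x86 timestamp counter, which is exactly the kind of primitive that browser and vendor timer restrictions target. The sketch below is that generic probe (GCC/Clang on x86-64 assumed), not ShowTime's amplification technique.

```c
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>                 /* __rdtscp, _mm_lfence, _mm_clflush */

/* Time one load of *p in cycles: the classic probe that fine-grained
 * timer restrictions are meant to blunt. */
static uint64_t time_load(volatile uint8_t *p)
{
    unsigned aux;
    _mm_lfence();
    uint64_t t0 = __rdtscp(&aux);      /* timestamp before the access */
    (void)*p;                          /* the memory access being timed */
    uint64_t t1 = __rdtscp(&aux);
    _mm_lfence();
    return t1 - t0;
}

int main(void)
{
    static uint8_t buf[4096];
    volatile uint8_t *p = buf;

    (void)*p;                          /* warm the line: expect a fast (hit) time */
    printf("hit:  %llu cycles\n", (unsigned long long)time_load(p));

    _mm_clflush(buf);                  /* evict the line: expect a slow (miss) time */
    _mm_lfence();
    printf("miss: %llu cycles\n", (unsigned long long)time_load(p));
    return 0;
}
```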

μManycore: A Cloud-Native CPU for Tail at Scale

International Symposium on Computer Architecture, 2023
Microservices are emerging as a popular cloud-computing paradigm. Microservice environments execute typically short service requests that interact with one another via remote procedure calls (often across machines) and are subject to stringent tail ...
Jovan Stojkovic   +3 more
semanticscholar   +1 more source

Cache in Hand: Expander-Driven CXL Prefetcher for Next Generation CXL-SSD

USENIX Workshop on Hot Topics in Storage and File Systems, 2023
Integrating compute express link (CXL) with SSDs allows scalable access to large memory but is slower than DRAM. We present ExPAND, an expander-driven CXL prefetcher that offloads last-level cache (LLC) prefetching from the host CPU to CXL-SSDs ...
Miryeong Kwon   +2 more
semanticscholar   +1 more source
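
The snippet doesn't say how ExPAND predicts addresses; any device-side prefetcher, though, needs some predictor over the observed access stream, and the simplest textbook choice is stride detection. The sketch below shows only that generic baseline predictor and says nothing about the paper's actual design.

```c
#include <stdint.h>

/* Textbook stride predictor: if the last two deltas match, assume the
 * stream is strided and suggest prefetching one step ahead. */
struct stride_pred {
    uint64_t last_addr;
    int64_t  last_delta;
};

/* Feed one observed address; returns an address worth prefetching,
 * or 0 if no stable stride has been seen yet. */
uint64_t observe(struct stride_pred *p, uint64_t addr)
{
    int64_t delta = (int64_t)(addr - p->last_addr);
    uint64_t pf = (delta != 0 && delta == p->last_delta) ? addr + delta : 0;
    p->last_delta = delta;
    p->last_addr  = addr;
    return pf;
}
```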

Fetch Me If You Can: Evaluating CPU Cache Prefetching and Its Reliability on High Latency Memory

International Workshop on Data Management on New Hardware
Memory can be located close to a CPU, at remote sockets, or on devices connected via interconnects such as CXL or NVLink. A larger distance between memory and a core accessing the memory usually results in higher access latency.
Fabian Mahling, Marcel Weisgut, T. Rabl
semanticscholar   +1 more source
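
The distance-to-latency relationship the abstract mentions is conventionally measured with a pointer-chasing microbenchmark, in which each load depends on the previous one so that neither out-of-order execution nor hardware prefetching can hide the latency. A minimal sketch of that standard methodology follows; the buffer size and loop structure are illustrative, not the paper's harness.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N ((size_t)1 << 22)            /* ~32 MiB of pointers, past most LLCs */

int main(void)
{
    size_t *next = malloc(N * sizeof *next);
    for (size_t i = 0; i < N; i++) next[i] = i;

    /* Sattolo's algorithm: a single-cycle random permutation, so the
     * chase visits every element instead of settling into a short loop
     * that fits in cache. */
    srand(1);
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t p = 0;
    for (size_t i = 0; i < N; i++) p = next[p];   /* dependent load chain */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("avg load-to-use latency: %.1f ns (p=%zu)\n", ns / N, p);
    free(next);
    return 0;
}
```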

How to Be Fast and Not Furious: Looking Under the Hood of CPU Cache Prefetching

International Workshop on Data Management on New Hardware
Software-based prefetching is a powerful method for tolerating an access penalty frequently encountered by data processing systems: memory latency. Although the idea appears straightforward---simply informing the CPU about upcoming data accesses---the ...
Roland Kühn, J. Mühlig, Jens Teubner
semanticscholar   +1 more source
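
On GCC and Clang, "informing the CPU about upcoming data accesses" is typically done with the `__builtin_prefetch` intrinsic. A minimal sketch of the common pattern, prefetching a fixed distance ahead of the current iteration, is shown below; the distance of 16 is an illustrative guess rather than a value tuned or reported by the paper.

```c
#include <stddef.h>

#define PF_DIST 16                     /* prefetch distance; workload-dependent */

/* Sum rows gathered through an index array, prefetching the row that
 * will be needed PF_DIST iterations from now so its latency overlaps
 * with useful work. */
long sum_gather(const long *rows, const size_t *idx, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PF_DIST < n)
            /* args: address, rw=0 (read), locality=0 (no reuse expected) */
            __builtin_prefetch(&rows[idx[i + PF_DIST]], 0, 0);
        sum += rows[idx[i]];
    }
    return sum;
}
```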

Logic synthesis and verification of the CPU and caches of a mainframe system

Proceedings of European Design and Test Conference EDAC-ETC-EUROASIC, 2002
This paper describes the large scale application of logic synthesis and formal verification using the BONSAI system to the design of the CPU and caches of a high-end mainframe system. The key feature of this application is the methodology that integrates a set of logic synthesis and formal verification techniques to build an effective logic-design ...
Huy Nam Nguyen   +4 more
openaire   +1 more source

An Evaluation of Vectorization and Cache Reuse Tradeoffs on Modern CPUs

Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores, 2018
Emerging high-performance processor architectures show two key trends: longer vector units and deeper memory hierarchies. It is not always possible to exploit both vectorization and locality. Prior optimization techniques have focused on either vectorization for data parallelism or cache reuse for low latency, ignoring the interference between the two ...
Du Shen, Milind Chabbi, Xu Liu
openaire   +1 more source
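
As a toy view of the tension the abstract describes: in a blocked kernel, the tile sizes that maximize cache reuse also bound the trip count of the unit-stride inner loop that the compiler vectorizes. The sketch below is a generic cache-blocked matrix multiply with a vector-friendly inner loop; N and the tile sizes are illustrative assumptions, not values from the paper.

```c
#include <stddef.h>

#define N  512                         /* matrix dimension (divisible by tiles) */
#define BJ 64                          /* tile sizes: illustrative, not tuned */
#define BK 64

/* C += A * B, row-major. The i-k-j order makes the innermost loop
 * unit-stride over B and C, so the compiler can vectorize it; the
 * jj/kk tiling adds cache reuse of the B tile. Shrinking the tiles
 * improves reuse but shortens the vectorizable inner trip count,
 * which is the interference the paper evaluates. */
void matmul_blocked(const float *A, const float *B, float *C)
{
    for (size_t jj = 0; jj < N; jj += BJ)
        for (size_t kk = 0; kk < N; kk += BK)
            for (size_t i = 0; i < N; i++)
                for (size_t k = kk; k < kk + BK; k++) {
                    float a = A[i * N + k];
                    for (size_t j = jj; j < jj + BJ; j++)
                        C[i * N + j] += a * B[k * N + j];
                }
}
```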

Iterative cache simulation of embedded CPUs with trace stripping

Proceedings of the seventh international workshop on Hardware/software codesign - CODES '99, 1999
Trace-driven cache simulation is a time-consuming yet valuable procedure for evaluating the performance of embedded memory systems. In this paper we present a novel technique, called iterative cache simulation, to produce a variety of performance metrics for several different cache configurations. Compared with previous work in this field, our approach ...
Zhao Wu, Wayne H. Wolf
openaire   +1 more source
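
For context on what trace-driven cache simulation involves: replay a recorded address stream against a modeled cache and count hits and misses. The sketch below is the simplest possible baseline, a direct-mapped model over an inline toy trace; the paper's iterative simulation and trace stripping are refinements on top of this kind of loop and are not shown.

```c
#include <stdint.h>
#include <stdio.h>

#define LINE_BITS 6                    /* 64-byte lines */
#define SETS      1024                 /* 64 KiB direct-mapped cache */

static uint64_t tags[SETS];
static int      valid[SETS];

/* Replay one address against the simulated cache; returns 1 on a miss. */
static int access_addr(uint64_t addr)
{
    uint64_t line = addr >> LINE_BITS;
    uint64_t set  = line % SETS;
    uint64_t tag  = line / SETS;
    if (valid[set] && tags[set] == tag)
        return 0;                      /* hit */
    valid[set] = 1;
    tags[set]  = tag;                  /* fill on miss (direct-mapped eviction) */
    return 1;
}

int main(void)
{
    /* A stand-in trace; a real run would stream addresses from a file.
     * 0x42000 maps to the same set as 0x2000, demonstrating a conflict miss. */
    uint64_t trace[] = { 0x1000, 0x1004, 0x2000, 0x1000, 0x42000, 0x2000 };
    size_t n = sizeof trace / sizeof trace[0], misses = 0;

    for (size_t i = 0; i < n; i++)
        misses += access_addr(trace[i]);
    printf("%zu accesses, %zu misses\n", n, misses);
    return 0;
}
```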

HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference

Design Automation Conference
The Mixture of Experts (MoE) architecture has demonstrated significant advantages, as it increases model capacity without a proportional increase in computation.
Shuzhang Zhong   +5 more
semanticscholar   +1 more source
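
The capacity-without-proportional-compute property of MoE comes from sparse gating: every expert adds parameters, but only the top-k experts run for a given token. The sketch below shows that gating step alone, with illustrative sizes and precomputed gate scores; it is not HybriMoE's CPU-GPU scheduler.

```c
#include <stddef.h>

#define E 8                            /* experts (capacity grows with E) */
#define K 2                            /* experts actually run per token */
#define D 4                            /* feature dimension */

/* One MoE layer step for a single token: score all E experts cheaply,
 * then evaluate only the K best. Compute per token scales with K,
 * not E, which is the property the abstract refers to. */
void moe_forward(const float gate[E],              /* precomputed gate scores */
                 void (*expert[E])(const float *, float *),
                 const float x[D], float y[D])
{
    int top[K];
    /* select indices of the K largest gate scores (simple selection) */
    for (int k = 0; k < K; k++) {
        int best = -1;
        for (int e = 0; e < E; e++) {
            int taken = 0;
            for (int j = 0; j < k; j++) taken |= (top[j] == e);
            if (!taken && (best < 0 || gate[e] > gate[best])) best = e;
        }
        top[k] = best;
    }
    for (int d = 0; d < D; d++) y[d] = 0.0f;
    for (int k = 0; k < K; k++) {      /* run only the selected experts */
        float out[D];
        expert[top[k]](x, out);
        for (int d = 0; d < D; d++) y[d] += gate[top[k]] * out[d];
    }
}
```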
