Hardware architecture cs.ar - Open Access .click

Accelerating Time Series Analysis via Processing using Non-Volatile Memories [PDF]

2022
Time Series Analysis (TSA) is a critical workload for consumer-facing devices. Accelerating TSA is vital for many domains as it enables the extraction of valuable information and the prediction of future events.
Fernandez, Ivan +8 more
core +2 more sources

DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks

2021
Data movement between the CPU and main memory is a first-order obstacle against improving performance, scalability, and energy efficiency in modern systems. Computer systems employ a range of techniques to reduce overheads tied to data movement, spanning
Fernandez, Ivan +7 more
core +1 more source

End-to-end QoS for the open source safety-relevant RISC-V SELENE platform [PDF]

2022
This paper presents the end-to-end QoS approach followed in the SELENE platform, a high-performance RISC-V based heterogeneous SoC for safety-related real-time systems, to provide performance guarantees.
Abella Ferrer, Jaume +10 more
core +1 more source

Highly accelerated simulations of glassy dynamics using GPUs: caveats on limited floating-point precision

2011
Modern graphics processing units (GPUs) provide impressive computing resources, which can be accessed conveniently through the CUDA programming interface.
Anderson +28 more
core +1 more source

Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

2022
Neural networks (NNs) are growing in importance and complexity. A neural network's performance (and energy efficiency) can be bound either by computation or memory resources.
Boroumand, Amirali +4 more
core

An efficient use of virtualization in grid/cloud environments [PDF]

2011
Grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational resources. Grid enables access to these resources, but it does not guarantee any quality of service.
Choudhury, Arindam +4 more
core +1 more source

An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System

2023
Training machine learning (ML) algorithms is a computationally intensive process, which is frequently memory-bound due to repeatedly accessing large training datasets.
Brocard, Sylvan +7 more
core

Stella Nera: Achieving 161 TOp/s/W with Multiplier-free DNN Acceleration based on Approximate Matrix Multiplication

2023
From classical HPC to deep learning, MatMul is at the heart of today's computing. The recent Maddness method approximates MatMul without the need for multiplication by using a hash-based version of product quantization (PQ) indexing into a look-up table (
Andri, Renzo +4 more
core

An evaluation of a microprocessor with two independent hardware execution threads coupled through a shared cache

2023
We investigate the utility of augmenting a microprocessor with a single execution pipeline by adding a second copy of the execution pipeline in parallel with the existing one.
Desai, Madhav P.
core

SafeTI Traffic Injector Enhancement for Effective Interference Testing in Critical Real-Time Systems

2023
Safety-critical domains, such as automotive, space, and robotics, are adopting increasingly powerful multicores with abundant hardware shared resources for higher performance and efficiency.
Abella, Jaume +3 more
core