Results 11 to 20 of about 126,905

Accelerating Time Series Analysis via Processing using Non-Volatile Memories [PDF]

open access: yes, 2022
Time Series Analysis (TSA) is a critical workload for consumer-facing devices. Accelerating TSA is vital for many domains, as it enables the extraction of valuable information and the prediction of future events.
Fernandez, Ivan   +8 more
core   +2 more sources

DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks

open access: yes, 2021
Data movement between the CPU and main memory is a first-order obstacle to improving performance, scalability, and energy efficiency in modern systems. Computer systems employ a range of techniques to reduce the overheads tied to data movement.
Fernandez, Ivan   +7 more
core   +1 more source

End-to-end QoS for the open source safety-relevant RISC-V SELENE platform [PDF]

open access: yes, 2022
This paper presents the end-to-end QoS approach used to provide performance guarantees in the SELENE platform, a high-performance RISC-V-based heterogeneous SoC for safety-related real-time systems.
Abella Ferrer, Jaume   +10 more
core   +1 more source

Dalorex: A Data-Local Program Execution and Architecture for Memory-bound Applications [PDF]

open access: yesInternational Symposium on High-Performance Computer Architecture, 2022
Applications with low data reuse and frequent irregular memory accesses, such as graph or sparse linear algebra workloads, fail to scale well due to memory bottlenecks and poor core utilization.
Marcelo Orenes-Vera   +3 more
semanticscholar   +1 more source
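
The scaling problem this abstract describes stems from indirect, data-dependent addressing. As a rough illustration (not taken from the Dalorex paper), the C++ sketch below shows a sparse matrix-vector multiply in CSR format: the access x[col_idx[j]] jumps around memory with essentially no reuse, so cores stall on DRAM rather than doing useful arithmetic.

    // Illustrative sketch (not from the Dalorex paper): sparse matrix-vector
    // multiply in CSR format. The indirect access x[col_idx[j]] has little
    // locality, so runtime is dominated by memory latency, not arithmetic.
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    void spmv_csr(const std::vector<double>& vals,
                  const std::vector<int>& col_idx,
                  const std::vector<int>& row_ptr,
                  const std::vector<double>& x,
                  std::vector<double>& y) {
        for (std::size_t row = 0; row + 1 < row_ptr.size(); ++row) {
            double sum = 0.0;
            for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
                sum += vals[j] * x[col_idx[j]];  // irregular, data-dependent access
            y[row] = sum;
        }
    }

    int main() {
        // Tiny 2x3 example matrix: [[1 0 2], [0 3 0]]
        std::vector<double> vals{1.0, 2.0, 3.0};
        std::vector<int>    col_idx{0, 2, 1};
        std::vector<int>    row_ptr{0, 2, 3};
        std::vector<double> x{1.0, 1.0, 1.0}, y(2);
        spmv_csr(vals, col_idx, row_ptr, x, y);
        std::printf("y = [%g, %g]\n", y[0], y[1]);  // expected: [3, 3]
        return 0;
    }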

RoSÉ: A Hardware-Software Co-Simulation Infrastructure Enabling Pre-Silicon Full-Stack Robotics SoC Evaluation

open access: yesInternational Symposium on Computer Architecture, 2023
Robotic systems, such as autonomous unmanned aerial vehicles (UAVs) and self-driving cars, have been widely deployed in many scenarios and have the potential to revolutionize the future generation of computing.
Dima Nikiforov   +5 more
semanticscholar   +1 more source

V10: Hardware-Assisted NPU Multi-tenancy for Improved Resource Utilization and Fairness

open access: yesInternational Symposium on Computer Architecture, 2023
Modern cloud platforms have deployed neural processing units (NPUs) like Google Cloud TPUs to accelerate online machine learning (ML) inference services.
Yuqi Xue, Yiqi Liu, Lifeng Nai, Jian Huang
semanticscholar   +1 more source

Scaling Qubit Readout with Hardware Efficient Machine Learning Architectures [PDF]

open access: yesInternational Symposium on Computer Architecture, 2022
Reading a qubit is a fundamental operation in quantum computing. It translates quantum information into classical information, enabling subsequent classification to assign the qubit states '0' or '1'.
Satvik Maurya   +4 more
semanticscholar   +1 more source
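
For context on the classification step mentioned in this abstract, a common baseline (not the paper's hardware-efficient ML architecture) assigns a state by comparing a demodulated readout point against calibrated centroids for the '0' and '1' states. The C++ sketch below uses hypothetical calibration values purely for illustration.

    // Generic nearest-centroid baseline (not the paper's ML architecture):
    // classify a demodulated readout point (I, Q) as state 0 or 1 by which
    // calibrated centroid it lies closer to.
    #include <cstdio>

    struct IQ { double i, q; };

    static double dist2(IQ a, IQ b) {
        const double di = a.i - b.i, dq = a.q - b.q;
        return di * di + dq * dq;
    }

    int classify(IQ sample, IQ centroid0, IQ centroid1) {
        return dist2(sample, centroid0) <= dist2(sample, centroid1) ? 0 : 1;
    }

    int main() {
        const IQ centroid0{1.0, 0.2};    // hypothetical calibration for state 0
        const IQ centroid1{-0.8, 0.5};   // hypothetical calibration for state 1
        const IQ sample{-0.6, 0.4};      // hypothetical measurement
        std::printf("assigned state: %d\n", classify(sample, centroid0, centroid1));
        return 0;
    }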

FACT: FFN-Attention Co-optimized Transformer Architecture with Eager Correlation Prediction

open access: yesInternational Symposium on Computer Architecture, 2023
The Transformer model is becoming prevalent in various AI applications thanks to its outstanding performance. However, its high computation cost and memory footprint make inference inefficient.
Yubin Qin   +8 more
semanticscholar   +1 more source

Highly accelerated simulations of glassy dynamics using GPUs: caveats on limited floating-point precision

open access: yes, 2011
Modern graphics processing units (GPUs) provide impressive computing resources, which can be accessed conveniently through the CUDA programming interface.
Anderson   +28 more
core   +1 more source
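
The caveat in this title concerns accumulated rounding error in single precision. The minimal C++ sketch below (plain host code, not CUDA, and not taken from the paper) shows the effect: repeatedly adding a small time step in float drifts visibly from the double-precision result, the kind of error that can distort long simulations.

    // Minimal sketch (plain C++, not from the paper): accumulating many small
    // time steps in single precision drifts away from the double-precision
    // result because each addition rounds to the nearest representable float.
    #include <cstdio>

    int main() {
        const long   steps = 10000000;   // 1e7 steps
        const float  dt_f  = 1.0e-7f;
        const double dt_d  = 1.0e-7;
        float  t_f = 0.0f;
        double t_d = 0.0;
        for (long i = 0; i < steps; ++i) {
            t_f += dt_f;   // float accumulator: rounding error builds up
            t_d += dt_d;   // double accumulator: error stays negligible
        }
        std::printf("float total:  %.9f\n", t_f);   // noticeably off from 1.0
        std::printf("double total: %.9f\n", t_d);   // very close to 1.0
        return 0;
    }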

Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

open access: yes, 2022
Neural networks (NNs) are growing in importance and complexity. A neural network's performance (and energy efficiency) can be bound either by computation or memory resources.
Boroumand, Amirali   +4 more
core  
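
A quick way to see whether a layer is bound by computation or by memory is a roofline-style comparison of its arithmetic intensity against the machine balance. The C++ sketch below uses hypothetical hardware numbers (4 TFLOP/s, 100 GB/s) and a hypothetical fully connected layer; it illustrates the reasoning only and does not reproduce figures from the paper.

    // Back-of-the-envelope roofline check with hypothetical numbers (not from
    // the paper): a layer is memory-bound when its arithmetic intensity
    // (FLOPs per byte moved) is below the machine balance (FLOP/s per byte/s).
    #include <cstdio>

    int main() {
        const double peak_flops = 4.0e12;   // assumed 4 TFLOP/s of compute
        const double peak_bw    = 100.0e9;  // assumed 100 GB/s DRAM bandwidth
        const double machine_balance = peak_flops / peak_bw;   // FLOPs per byte

        // Hypothetical fully connected layer: M x K weights, batch size N, fp32.
        const double M = 1024, K = 1024, N = 1;
        const double flops = 2.0 * M * K * N;                 // multiply + add
        const double bytes = 4.0 * (M * K + K * N + M * N);   // weights + in + out
        const double intensity = flops / bytes;

        std::printf("machine balance: %.1f FLOP/byte\n", machine_balance);
        std::printf("layer intensity: %.2f FLOP/byte -> %s-bound\n",
                    intensity, intensity < machine_balance ? "memory" : "compute");
        return 0;
    }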
