Accelerating Time Series Analysis via Processing using Non-Volatile Memories [PDF]
Time Series Analysis (TSA) is a critical workload for consumer-facing devices. Accelerating TSA is vital for many domains as it enables the extraction of valuable information and the prediction of future events.
Fernandez, Ivan +8 more
core +2 more sources
DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks
Data movement between the CPU and main memory is a first-order obstacle against improving performance, scalability, and energy efficiency in modern systems. Computer systems employ a range of techniques to reduce overheads tied to data movement, spanning
Fernandez, Ivan +7 more
core +1 more source
End-to-end QoS for the open source safety-relevant RISC-V SELENE platform [PDF]
This paper presents the end-to-end QoS approach followed in the SELENE platform to provide performance guarantees. SELENE is a high-performance RISC-V based heterogeneous SoC for safety-related real-time systems.
Abella Ferrer, Jaume +10 more
core +1 more source
Dalorex: A Data-Local Program Execution and Architecture for Memory-bound Applications [PDF]
Applications with low data reuse and frequent irregular memory accesses, such as graph or sparse linear algebra workloads, fail to scale well due to memory bottlenecks and poor core utilization.
Marcelo Orenes-Vera +3 more
semanticscholar +1 more source
Robotic systems, such as autonomous unmanned aerial vehicles (UAVs) and self-driving cars, have been widely deployed in many scenarios and have the potential to revolutionize the future generation of computing.
Dima Nikiforov +5 more
semanticscholar +1 more source
V10: Hardware-Assisted NPU Multi-tenancy for Improved Resource Utilization and Fairness
Modern cloud platforms have deployed neural processing units (NPUs) like Google Cloud TPUs to accelerate online machine learning (ML) inference services.
Yu Xue, Yiqi Liu, Lifeng Nai, Jian Huang
semanticscholar +1 more source
Scaling Qubit Readout with Hardware Efficient Machine Learning Architectures [PDF]
Reading a qubit is a fundamental operation in quantum computing. It translates quantum information into classical information, enabling subsequent classification to assign the qubit states '0' or '1'.
Satvik Maurya +4 more
semanticscholar +1 more source
FACT: FFN-Attention Co-optimized Transformer Architecture with Eager Correlation Prediction
The Transformer model is becoming prevalent in various AI applications thanks to its outstanding performance. However, its high computation cost and memory footprint make inference inefficient. We discover that among the three main computation modules in a
Yubin Qin +8 more
semanticscholar +1 more source
Modern graphics processing units (GPUs) provide impressive computing resources, which can be accessed conveniently through the CUDA programming interface.
Anderson +28 more
core +1 more source
Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud
Neural networks (NNs) are growing in importance and complexity. A neural network's performance (and energy efficiency) can be bound either by computation or memory resources.
Boroumand, Amirali +4 more
core

