Results 121 to 130 of about 589,245
Some of the following articles may not be open access.
IEEE International Solid-State Circuits Conference, 2021
Low-precision computation is the key enabling factor to achieve high compute densities (TOPS/W and TOPS/mm2) in AI hardware accelerators across cloud and edge platforms.
A. Agrawal +43 more
semanticscholar +1 more source
MemPol: Policing Core Memory Bandwidth from Outside of the Cores
IEEE Real Time Technology and Applications Symposium, 2023
In today’s multiprocessor systems-on-a-chip (MPSoC), the shared memory subsystem is a known source of temporal interference. The problem causes logically independent cores to affect each other’s performance, leading to pessimistic worst-case execution ...
Alexander Zuepke +4 more
semanticscholar +1 more source
Instruction Profiling Based Predictive Throttling for Power and Performance
IEEE Transactions on Computers, 2023
Technology scaling has long been the driving force for reducing power consumption in microprocessor design. As scaling has reached its limits, new techniques are being adopted to address the power problem.
A. Owahid, E. John
semanticscholar +1 more source
MTFT: Multi-Tenant Fair Throttling
International Conference on Big Data and Smart Computing, 2023
Distributing resources fairly among tenants is important in cloud data centers. However, traditional fair I/O schedulers are not suitable for container environments because of unfairness and performance degradation.
Il-Bing Song, Sang-Won Lee
semanticscholar +1 more source
IEEE Journal of Solid-State Circuits, 2022
Reduced precision computation is a key enabling factor for energy-efficient acceleration of deep learning (DL) applications. This article presents a 7-nm four-core mixed-precision artificial intelligence (AI) chip that supports four compute precisions ...
Sae Kyu Lee +43 more
semanticscholar +1 more source
Design Automation Conference, 2023
Embedded systems are increasingly adopting heterogeneous templates integrating hardware accelerators and application-specific processors, which poses novel challenges.
Gianluca Brilli +6 more
semanticscholar +1 more source
Glint: Decentralized Federated Graph Learning with Traffic Throttling and Flow Scheduling
International Workshop on Quality of Service, 2021
Federated learning has been proposed as a promising distributed machine learning paradigm with strong privacy protection on training data. Existing work mainly focuses on training convolutional neural network (CNN) models good at learning on image/voice ...
Tao Liu, Pengjie Li, Yu-Lei Gu
semanticscholar +1 more source
LIBRA: Clearing the Cloud Through Dynamic Memory Bandwidth Management
International Symposium on High-Performance Computer Architecture, 2021
Modern Cloud Service Providers (CSPs) heavily co-schedule tasks with different priorities on the same computing node to increase server utilization. To ensure the performance of high-priority jobs, CSPs usually employ Quality-of-Service (QoS) mechanisms ...
Ying Zhang +9 more
semanticscholar +1 more source
Addressing Thermal Throttling in HBM
2025 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
High Bandwidth Memory (HBM) is utilized in HPC and AI/ML systems, as it provides a higher data rate and a high memory capacity by stacking DRAM dies. In the quest for higher HBM capacity, as the number of stacked DRAM dies is increased, the number ...
Gaurav Kothari, Kanad Ghose
semanticscholar +1 more source
LLaMCAT: Optimizing Large Language Model Inference with Cache Arbitration and Throttling
International Conference on Parallel Processing
Large Language Models (LLMs) have achieved unprecedented success across various applications, but their substantial memory requirements pose significant challenges to current memory system designs, especially during inference. Our work targets last-level ...
Zhongchun Zhou, Chengtao Lai, Wei Zhang
semanticscholar +1 more source