Some of the following articles may not be open access.

A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware Throttling

IEEE International Solid-State Circuits Conference, 2021
Low-precision computation is the key enabling factor to achieve high compute densities (TOPS/W and TOPS/mm²) in AI hardware accelerators across cloud and edge platforms.
A. Agrawal   +43 more
semanticscholar   +1 more source

MemPol: Policing Core Memory Bandwidth from Outside of the Cores

IEEE Real Time Technology and Applications Symposium, 2023
In today’s multiprocessor systems-on-a-chip (MPSoC), the shared memory subsystem is a known source of temporal interference. The problem causes logically independent cores to affect each other’s performance, leading to pessimistic worst-case execution ...
Alexander Zuepke   +4 more
semanticscholar   +1 more source

Instruction Profiling Based Predictive Throttling for Power and Performance

IEEE Transactions on Computers, 2023
Technology scaling has long been the driving force for reducing power consumption in microprocessor design. As scaling has reached its limits, new techniques are being adopted to address the power problem.
A. Owahid, E. John
semanticscholar   +1 more source

MTFT: Multi-Tenant Fair Throttling

International Conference on Big Data and Smart Computing, 2023
Distributing resources fairly among tenants is important in cloud data centers. However, traditional fair I/O schedulers are not suitable for container environments because of unfairness and performance degradation.
Il-Bing Song, Sang-Won Lee
semanticscholar   +1 more source

A 7-nm Four-Core Mixed-Precision AI Chip With 26.2-TFLOPS Hybrid-FP8 Training, 104.9-TOPS INT4 Inference, and Workload-Aware Throttling

IEEE Journal of Solid-State Circuits, 2022
Reduced precision computation is a key enabling factor for energy-efficient acceleration of deep learning (DL) applications. This article presents a 7-nm four-core mixed-precision artificial intelligence (AI) chip that supports four compute precisions ...
Sae Kyu Lee   +43 more
semanticscholar   +1 more source

Fine-Grained QoS Control via Tightly-Coupled Bandwidth Monitoring and Regulation for FPGA-based Heterogeneous SoCs

Design Automation Conference, 2023
Embedded systems are increasingly adopting heterogeneous templates integrating hardware accelerators and application-specific processors, which poses novel challenges.
Gianluca Brilli   +6 more
semanticscholar   +1 more source

Glint: Decentralized Federated Graph Learning with Traffic Throttling and Flow Scheduling

International Workshop on Quality of Service, 2021
Federated learning has been proposed as a promising distributed machine learning paradigm with strong privacy protection on training data. Existing work mainly focuses on training convolutional neural network (CNN) models good at learning on image/voice ...
Tao Liu, Pengjie Li, Yu-Lei Gu
semanticscholar   +1 more source

LIBRA: Clearing the Cloud Through Dynamic Memory Bandwidth Management

International Symposium on High-Performance Computer Architecture, 2021
Modern Cloud Service Providers (CSP) heavily co-schedule tasks with different priorities on the same computing node to increase server utilization. To ensure the performance of high priority jobs, CSPs usually employ Quality-of-Service (QoS) mechanisms ...
Ying Zhang   +9 more
semanticscholar   +1 more source

Addressing Thermal Throttling in HBM

IEEE/ACM International Conference on Computer Aided Design (ICCAD), 2025
High Bandwidth Memory (HBM) is utilized in HPC and AI/ML systems, as it provides a higher data rate and also a high memory capacity by stacking DRAM dies. In the quest for a higher HBM capacity, as the number of stacked DRAM dies is increased, the number ...
Gaurav Kothari, Kanad Ghose
semanticscholar   +1 more source

LLaMCAT: Optimizing Large Language Model Inference with Cache Arbitration and Throttling

International Conference on Parallel Processing
Large Language Models (LLMs) have achieved unprecedented success across various applications, but their substantial memory requirements pose significant challenges to current memory system designs, especially during inference. Our work targets last-level ...
Zhongchun Zhou, Chengtao Lai, Wei Zhang
semanticscholar   +1 more source
