Hardware Acceleration of Neural Graphics [PDF]
Rendering and inverse rendering techniques have recently attained powerful new capabilities and building blocks in the form of neural representations (NR), with derived rendering techniques quickly becoming indispensable tools next to classic computer ...
Muhammad Husnain Mubarik +3 more
semanticscholar +1 more source
TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings [PDF]
In response to innovations in machine learning (ML) models, production workloads changed radically and rapidly. TPU v4 is the fifth Google domain specific architecture (DSA) and its third supercomputer for such ML models. Optical circuit switches (OCSes) ...
N. Jouppi +13 more
semanticscholar +1 more source
Gen-NeRF: Efficient and Generalizable Neural Radiance Fields via Algorithm-Hardware Co-Design [PDF]
Novel view synthesis is an essential functionality for enabling immersive experiences in various Augmented- and Virtual-Reality (AR/VR) applications, for which Neural Radiance Field (NeRF) has emerged as the state-of-the-art (SOTA) technique.
Y. Fu +6 more
semanticscholar +1 more source
Deep learning (DL) models such as convolutional neural networks (ConvNets) are being deployed to solve various computer vision and natural language processing tasks at the edge.
Hadjer Benmeziane +3 more
semanticscholar +1 more source
OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization [PDF]
Transformer-based large language models (LLMs) have achieved great success with the growing model size. LLMs' size grows by 240× every two years, which outpaces the hardware progress and makes model inference increasingly costly.
Cong Guo +8 more
semanticscholar +1 more source
ECSSD: Hardware/Data Layout Co-Designed In-Storage-Computing Architecture for Extreme Classification
With the rapid growth of classification scale in deep learning systems, the final classification layer becomes extreme classification with a memory footprint exceeding the main memory capacity of the CPU or GPU.
Siqi Li +7 more
semanticscholar +1 more source
ULEEN: A Novel Architecture for Ultra-low-energy Edge Neural Networks [PDF]
“Extreme edge” devices, such as smart sensors, are a uniquely challenging environment for the deployment of machine learning. The tiny energy budgets of these devices lie beyond what is feasible for conventional deep neural networks, particularly in ...
Zachary Susskind +11 more
semanticscholar +1 more source
Software-Hardware Co-Optimization for Computational Chemistry on Superconducting Quantum Processors [PDF]
Computational chemistry is the leading application to demonstrate the advantage of quantum computing in the near term. However, large-scale simulation of chemical systems on quantum computers is currently hindered due to a mismatch between the ...
Gushu Li, Yunong Shi, Ali Javadi-Abhari
semanticscholar +1 more source
HeapCheck: Low-cost Hardware Support for Memory Safety
Programs written in C/C++ are vulnerable to memory-safety errors like buffer-overflows and use-after-free. While several mechanisms to detect such errors have been previously proposed, they suffer from a variety of drawbacks, including poor performance ...
Gururaj Saileshwar +4 more
semanticscholar +1 more source
HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers [PDF]
While vision transformers (ViTs) have continuously achieved new milestones in the field of computer vision, their sophisticated network architectures with high computation and memory costs have impeded their deployment on resource-limited edge devices ...
Peiyan Dong +10 more
semanticscholar +1 more source

