Results 21 to 30 of about 197 (65)
Benchmarking GPUs on SVBRDF Extractor Model
As deep learning matures, it is being adopted in nearly every field. At the same time, as more types of GPUs become available on the market, choosing among them becomes a difficult decision for users. How can users select GPUs to achieve optimal performance for a ...
Kandel, Narayan, Lambert, Melanie
core
Performance Modeling and Prediction for Dense Linear Algebra
This dissertation introduces measurement-based performance modeling and prediction techniques for dense linear algebra algorithms. As a core principle, these techniques avoid executing such algorithms entirely and instead predict their performance ...
Peise, Elmar
core +1 more source
In-Situ Techniques on GPU-Accelerated Data-Intensive Applications
The computational power of High-Performance Computing (HPC) systems is constantly increasing; however, their input/output (IO) performance grows relatively slowly, and their storage capacity is also limited. This imbalance presents significant challenges ...
Bellentani, Laura +7 more
core +1 more source
Polynomial-time Solver of Tridiagonal QUBO and QUDO problems with Tensor Networks
We present an algorithm for solving tridiagonal Quadratic Unconstrained Binary Optimization (QUBO) problems and Quadratic Unconstrained Discrete Optimization (QUDO) problems with one-neighbor interactions using the quantum-inspired technology of tensor ...
Ali, Alejandro Mata +3 more
core
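The snippet above rests on the fact that a chain of one-neighbor interactions can be minimized in polynomial time. As a rough illustration, here is a minimal Python sketch, assuming the energy is written as E(x) = sum_i diag[i]*x_i^2 + sum_i off[i]*x_i*x_{i+1} over x_i in {0, ..., d-1}, where d = 2 gives a tridiagonal QUBO (the paper's exact QUDO formulation may differ). It swaps in plain dynamic programming along the chain instead of the tensor-network method the authors describe; the names `solve_chain_qudo` and `brute_force` are hypothetical.

```python
from itertools import product


def solve_chain_qudo(diag, off, d=2):
    """Minimize E(x) = sum_i diag[i]*x_i**2 + sum_i off[i]*x_i*x_{i+1}
    over x in {0, ..., d-1}^n by dynamic programming along the chain.
    Runs in O(n * d**2) time. For d == 2 this is a tridiagonal QUBO."""
    n = len(diag)
    assert len(off) == n - 1, "a chain of n variables has n - 1 couplings"
    values = range(d)
    # best[v]: minimal energy of x_0..x_k given x_k == v
    best = [diag[0] * v * v for v in values]
    back = []  # back[k-1][v]: optimal x_{k-1} given x_k == v
    for k in range(1, n):
        new_best, choice = [], []
        for v in values:
            u = min(values, key=lambda u: best[u] + off[k - 1] * u * v)
            new_best.append(best[u] + off[k - 1] * u * v + diag[k] * v * v)
            choice.append(u)
        best = new_best
        back.append(choice)
    # Pick the best final value, then follow the back-pointers.
    v = min(values, key=lambda v: best[v])
    energy = best[v]
    x = [v]
    for choice in reversed(back):
        v = choice[v]
        x.append(v)
    x.reverse()
    return energy, x


def brute_force(diag, off, d=2):
    """Exponential-time reference: evaluate E(x) for every assignment."""
    n = len(diag)

    def energy(x):
        return (sum(diag[i] * x[i] * x[i] for i in range(n))
                + sum(off[i] * x[i] * x[i + 1] for i in range(n - 1)))

    return min(energy(x) for x in product(range(d), repeat=n))


if __name__ == "__main__":
    print(solve_chain_qudo([-1, -1, -1], [2, 2]))  # (-2, [1, 0, 1])
```

The dynamic program keeps one table of d entries per site, so memory and time grow linearly in the chain length; the brute-force function is only there to check small cases.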
A Test for FLOPs as a Discriminant for Linear Algebra Algorithms
Linear algebra expressions, which play a central role in countless scientific computations, are often computed via a sequence of calls to existing libraries of building blocks (such as those provided by BLAS and LAPACK). A sequence identifies a computing ...
Bientinesi, Paolo, Sankaran, Aravind
core +1 more source
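To see why different call sequences for the same expression can differ in FLOP count, consider y = A B x with A and B of size n x n and x a vector. The sketch below assumes the common convention of 2*m*k*n FLOPs for an (m x k) by (k x n) product (one multiply and one add per term); the function names are hypothetical and it says nothing about measured runtime, which is the question the paper's title raises.

```python
def gemm_flops(m, k, n):
    """Approximate FLOPs of an (m x k) by (k x n) product:
    one multiply and one add per term (convention 2*m*k*n)."""
    return 2 * m * k * n


def flops_left_to_right(n):
    """y = (A @ B) @ x: a matrix-matrix product (GEMM) then a matrix-vector product (GEMV)."""
    return gemm_flops(n, n, n) + gemm_flops(n, n, 1)


def flops_right_to_left(n):
    """y = A @ (B @ x): two matrix-vector products (GEMV)."""
    return 2 * gemm_flops(n, n, 1)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, flops_left_to_right(n), flops_right_to_left(n))
```

Both sequences compute the same y in exact arithmetic, yet the first costs about 2n^3 FLOPs and the second only 4n^2, so FLOPs separate these two sequences sharply; whether FLOPs also rank sequences correctly when their counts are close is a separate, empirical matter.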
Look-Up mAI GeMM: Increasing AI GeMMs Performance by Nearly 2.5x via msGeMM
AI models are increasing in size, and recent advances in the community have shown that, unlike HPC applications where double-precision datatypes are required, lower-precision datatypes such as fp8 or int4 are sufficient to bring the same model quality ...
Maleki, Saeed
core
Updates on the Low-Level Abstraction of Memory Access
Choosing the best memory layout for each hardware architecture is increasingly important as more and more programs become memory bound. For portable codes that run across heterogeneous hardware architectures, the choice of the memory layout for data ...
Gruber, Bernhard Manfred
core
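To make "memory layout" concrete, the following Python sketch maps a (record, field) pair to a flat memory offset under three common layouts: array of structs (AoS), struct of arrays (SoA), and array of structs of arrays (AoSoA) with a fixed lane count. It only illustrates the idea; it is not the API of the library the title refers to, and all function names here are hypothetical.

```python
def aos_offset(i, field, num_fields, num_records):
    # Array of Structs: all fields of record i are stored next to each other.
    return i * num_fields + field


def soa_offset(i, field, num_fields, num_records):
    # Struct of Arrays: each field has its own contiguous array of num_records values.
    return field * num_records + i


def make_aosoa_offset(lanes):
    # Array of Structs of Arrays: blocks of `lanes` records, laid out SoA inside
    # each block. Assumes num_records is a multiple of `lanes` (no padding handled).
    def aosoa_offset(i, field, num_fields, num_records):
        block, lane = divmod(i, lanes)
        return block * num_fields * lanes + field * lanes + lane
    return aosoa_offset


def field_offsets(mapping, field, num_fields, num_records):
    """Offsets touched when a loop reads one field of every record."""
    return [mapping(i, field, num_fields, num_records) for i in range(num_records)]


if __name__ == "__main__":
    n, f = 4, 3  # 4 records, 3 fields each
    print("AoS  ", field_offsets(aos_offset, 1, f, n))            # [1, 4, 7, 10]
    print("SoA  ", field_offsets(soa_offset, 1, f, n))            # [4, 5, 6, 7]
    print("AoSoA", field_offsets(make_aosoa_offset(2), 1, f, n))  # [2, 3, 8, 9]
```

A loop over one field walks memory with stride num_fields under AoS but with unit stride under SoA, which is why the better layout depends on the access pattern and the hardware.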
A note on integrating products of linear forms over the unit simplex
Integrating a product of linear forms over the unit simplex can be done in polynomial time if the number of variables n is fixed (V. Baldoni et al., 2011).
Casale, G
core
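For small cases, such an integral can be checked exactly by expanding the product into monomials and using the classical formula for the simplex {x : x_i >= 0, x_1 + ... + x_n <= 1}: the integral of x_1^a_1 ... x_n^a_n equals a_1! ... a_n! / (a_1 + ... + a_n + n)!. The sketch below assumes "unit simplex" means that full-dimensional simplex and that the linear forms are homogeneous with integer or rational coefficients. It is a naive baseline whose number of monomials can grow exponentially, not the polynomial-time method of Baldoni et al. or of this note; the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def expand_product(forms, n):
    """Expand prod_j (sum_i forms[j][i] * x_i) into {exponent tuple: coefficient}."""
    poly = {(0,) * n: Fraction(1)}
    for coeffs in forms:
        assert len(coeffs) == n, "each linear form needs n coefficients"
        nxt = {}
        for alpha, a in poly.items():
            for i, c in enumerate(coeffs):
                if c == 0:
                    continue
                beta = alpha[:i] + (alpha[i] + 1,) + alpha[i + 1:]
                nxt[beta] = nxt.get(beta, 0) + a * Fraction(c)
        poly = nxt
    return poly


def monomial_integral(alpha):
    """Integral of x^alpha over {x >= 0, sum(x) <= 1}: prod(alpha_i!) / (|alpha| + n)!."""
    num = 1
    for a in alpha:
        num *= factorial(a)
    return Fraction(num, factorial(sum(alpha) + len(alpha)))


def integrate_product_of_linear_forms(forms, n):
    """Exact integral of a product of homogeneous linear forms over the unit simplex in R^n."""
    poly = expand_product(forms, n)
    return sum((c * monomial_integral(alpha) for alpha, c in poly.items()), Fraction(0))


if __name__ == "__main__":
    # (x + y)^2 over the triangle with vertices (0,0), (1,0), (0,1)
    print(integrate_product_of_linear_forms([[1, 1], [1, 1]], n=2))  # 1/4
```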
Modeling and Design of the Communication Sensing and Control Coupled Closed-Loop Industrial System
With the advent of the 5G era, factories are transitioning to wireless networks to break free from the limitations of wired networks. In 5G-enabled factories, unmanned automatic devices such as automated guided vehicles and robotic arms complete ...
Feng, Zhiyong +4 more
core
Predictability of just in time compilation
The productivity of embedded software development is limited by the high fragmentation of hardware platforms. To alleviate this problem, virtualization has become an important tool in computer science, and virtual machines are used in a number of ...
Bouakaz, Adnan
core

