Performance cs.pf - Open Access .click

Benchmarking GPUs on SVBRDF Extractor Model

2023
As deep learning matures, its use is emerging in every field. As more types of GPUs become available on the market, users face a difficult decision. How can users select GPUs to achieve optimal performance for a ...
Kandel, Narayan; Lambert, Melanie
core

Performance Modeling and Prediction for Dense Linear Algebra

2017
This dissertation introduces measurement-based performance modeling and prediction techniques for dense linear algebra algorithms. As a core principle, these techniques avoid executions of such algorithms entirely, and instead predict their performance ...
Peise, Elmar
core +1 more source

In-Situ Techniques on GPU-Accelerated Data-Intensive Applications

2023
The computational power of High-Performance Computing (HPC) systems is constantly increasing; however, their input/output (IO) performance grows relatively slowly, and their storage capacity is also limited. This imbalance presents significant challenges ...
Bellentani, Laura +7 more
core +1 more source

Polynomial-time Solver of Tridiagonal QUBO and QUDO problems with Tensor Networks

2023
We present an algorithm for solving tridiagonal Quadratic Unconstrained Binary Optimization (QUBO) problems and Quadratic Unconstrained Discrete Optimization (QUDO) problems with one-neighbor interactions using the quantum-inspired technology of tensor ...
Ali, Alejandro Mata +3 more
core

A Test for FLOPs as a Discriminant for Linear Algebra Algorithms

2022
Linear algebra expressions, which play a central role in countless scientific computations, are often computed via a sequence of calls to existing libraries of building blocks (such as those provided by BLAS and LAPACK). A sequence identifies a computing ...
Bientinesi, Paolo; Sankaran, Aravind
core +1 more source

Look-Up mAI GeMM: Increasing AI GeMMs Performance by Nearly 2.5x via msGeMM

2023
AI models are increasing in size, and recent advances in the community have shown that, unlike HPC applications where double-precision datatypes are required, lower-precision datatypes such as fp8 or int4 are sufficient to bring the same model quality ...
Maleki, Saeed
core

Updates on the Low-Level Abstraction of Memory Access

2023
Choosing the best memory layout for each hardware architecture is increasingly important as more and more programs become memory bound. For portable codes that run across heterogeneous hardware architectures, the choice of the memory layout for data ...
Gruber, Bernhard Manfred
core

A note on integrating products of linear forms over the unit simplex

2018
Integrating a product of linear forms over the unit simplex can be done in polynomial time if the number of variables n is fixed (V. Baldoni et al., 2011).
Casale, G
core

Modeling and Design of the Communication Sensing and Control Coupled Closed-Loop Industrial System

2023
With the advent of the 5G era, factories are transitioning towards wireless networks to break free from the limitations of wired networks. In 5G-enabled factories, unmanned automatic devices such as automated guided vehicles and robotic arms complete ...
Feng, Zhiyong +4 more
core

Predictability of just in time compilation

2010
The productivity of embedded software development is limited by the high fragmentation of hardware platforms. To alleviate this problem, virtualization has become an important tool in computer science, and virtual machines are used in a number of ...
Bouakaz, Adnan
core