Fast matrix multiplication via compiler‐only layered data reorganization and intrinsic lowering
The resurgence of machine learning has increased the demand for high‐performance basic linear algebra subroutines (BLAS), which have long depended on libraries to achieve peak performance on commodity hardware. High‐performance BLAS implementations rely on a layered approach that consists of tiling and packing layers, for data (re)organization ...
Braedy Kuzma +6 more
wiley +1 more source
Applying AVX512 vectorization to improve the performance of a random number generator
The generation of uniformly distributed random numbers is necessary for computer simulation by Monte Carlo methods and molecular dynamics. Pseudo-random number generators (PRNGs) are used to produce these numbers.
M. S. Guskova +2 more
doaj +1 more source
A method of entropy coding of video based on the AVX-512 extended SIMD instruction set
The purpose of this work is to reduce video entropy-coding time on processors with an AVX-512-type extended instruction set, through parallelization and the use of SIMD instructions that are additional compared to AVX2 and ...
Русанова, О.В. +1 more
core +1 more source
The article is devoted to a bitsliced software implementation of the Kalyna cipher using SSE, AVX, and AVX-512 vector instructions for x86-64 processors. The advantages and disadvantages of different approaches to efficient and secure software implementation of block ...
Yaroslav Sovyn, Volodymyr Khoma
doaj +1 more source
The article is devoted to the vectorization of calculations for the Intel Xeon Phi Knights Landing (KNL) processor. Small-dimensional matrices are considered as the objects of optimization. Operations on them are widely used in computational codes in various fields
Leonid A. Benderskiy +2 more
doaj +1 more source
Vectorization of CMSSW offline software [PDF]
The CMS experiment has been utilizing vectorization, or SIMD, in parts of its data processing applications for over a decade. On x86 platforms the vectorization level is still SSE3. In the past attempts to use wider vector instruction sets such as AVX or
Gartung Patrick
doaj +1 more source
Experiments on Speeding Up the Recursive Fast Fourier Transform by using AVX-512 SIMD instructions
The Fast Fourier Transform is probably one of the most studied algorithms of all time. New techniques regarding hardware and software are often applied and tested on it, but the interest in FFT is still large because of its applications - signal and ...
Giacomo Sansone, Marco Cococcioni
core +1 more source
Optimization of the N-body Simulation on Intel’s Architectures Based on AVX-512 Instruction Set [PDF]
N-body simulations have become a powerful tool to test the gravitational interaction among particles, ranging from a few bodies to complete galaxies.
Chichizola, Franco +7 more
core +1 more source
Remote AVX Overhead: Detection and Mitigation
Due to power constraints, recent Intel CPUs reduce their frequency when executing AVX2 and AVX-512 instructions. Often, this frequency reduction affects other applications as well, which reduces overall performance and prevents contemporary operating ...
Gottschlag, Mathias
core +1 more source
Converting Binary Floating‐Point Numbers to Shortest Decimal Strings: An Experimental Review
Background: When sharing or logging numerical data, we must convert binary floating‐point numbers into their decimal string representations. For example, the number π might become 3.1415927. Engineers have perfected many algorithms for producing such accurate, short strings.
Jaël Champagne Gareau, Daniel Lemire
wiley +1 more source

