Transcoding Unicode Characters with AVX-512 Instructions
Abstract Intel includes in its recent processors a powerful set of instructions capable of processing 512‐bit registers with a single instruction (AVX‐512). Some of these instructions have no equivalent in earlier instruction sets. We leverage these instructions to efficiently transcode strings between the most common formats: UTF‐8 and UTF‐16. With our
Daniel Lemire, Robert Clausecker
wiley +5 more sources
Scalability analysis of AVX-512 extensions [PDF]
Energy efficiency below a specific thermal design power (TDP) has become the main design goal for microprocessors across all market segments. Optimizing the usage of the available transistors within the TDP is a pending topic. Parallelism is the basic foundation for achieving the exascale level.
Juan M. Cebrian +2 more
core +4 more sources
String searching with mismatches using AVX2 and AVX-512 instructions
Tamanna Chhabra +2 more
openaire +4 more sources
VECTORIZATION OF SMALL-SIZED SPECIAL-TYPE MATRICES MULTIPLICATION USING INSTRUCTIONS AVX-512
Modern software packages for supercomputer calculations require a large amount of computing resources. At the same time, new hardware architectures open up new opportunities for optimizing program code.
Leonid A. Benderskiy +2 more
doaj +3 more sources
Gem5-AVX: Extension of the Gem5 Simulator to Support AVX Instruction Sets
Recent commodity x86 CPUs still dominate the majority of supercomputers and most of them implement vector architectures to support single instruction multiple data (SIMD).
Seungmin Lee +3 more
doaj +2 more sources
Acceleration of Particle Swarm Optimization with AVX Instructions [PDF]
Parallel implementations of algorithms are usually compared with single-core CPU performance. The advantage of multicore vector processors decreases the performance gap between GPU and CPU computation, as shown in many recent pieces of research. With the
Jakub Safarik, Vaclav Snasel
doaj +2 more sources
Implementation of a vectorized Quicksort using AVX-512 intrinsics [PDF]
For decades, improvements in computing speed were achieved by increasing the CPU clock frequency. In recent years, this mechanism has been slowed by physical limitations. Modern single-threaded applications must therefore make greater use of CPU features in order to benefit from the advances of new processor generations ...
Thiemicke, Frank +2 more
openaire +2 more sources
SeqMatcher: efficient genome sequence matching with AVX-512 extensions [PDF]
Abstract The recent emergence of long-read sequencing technologies has enabled substantial improvements in accuracy and reduced computational costs. Nonetheless, pairwise sequence alignment remains a time-consuming step in common bioinformatics pipelines, becoming a bottleneck in de novo whole-genome assembly.
Elena Espinosa +3 more
openaire +4 more sources
Vectorized Falcon-Sign Implementations using SSE2, AVX2, AVX-512F, NEON, and RVV
Falcon, an NTRU-based digital signature algorithm, has been selected by NIST as one of the post-quantum cryptography (PQC) standards. Compared to verification, the signature generation of Falcon is relatively slow. One of the core operations in signature
Jipeng Zhang, Jiaheng Zhang
doaj +2 more sources
Efficient Parallel Implementations of PIPO Block Cipher on CPU and GPU
Data encryption is essential for securely managing clients’ data on servers in a data-centric ICT environment. Clients must encrypt the data before transmitting it to servers or other clients. Encrypting a large volume of data requires a lot of time.
Hojin Choi, Seog Chung Seo
doaj +1 more source

