AVX2-optimized Kvazaar HEVC intra encoder
This paper presents efficient SIMD optimizations for the open-source Kvazaar HEVC intra encoder. The C implementation of Kvazaar is accelerated by Intel AVX2 instructions whose effect on Kvazaar ultrafast preset is profiled. According to our profiling results, C functions of SATD, DCT, quantization, and intra prediction account for over 60% of the ...
Ari Lemmetti +2 more
exaly +4 more sources
Tailored AVX2 Transform Kernels for Versatile Video Coding
Peer ...
Joose Sainio +2 more
exaly +4 more sources
Some of the following articles may not be open access.
Optimizing Dilithium Implementation with AVX2/-512
Transactions on Embedded Computing Systems
Dilithium is a signature scheme that is currently being standardized to the Module-Lattice-Based Digital Signature Standard by NIST. It is believed to be secure even against attacks from large-scale quantum computers based on lattice problems. The implementation efficiency is important for promoting the migration of current cryptography algorithms to ...
Runqing Xu, Debiao He, Min Luo
exaly +2 more sources
Accelerating stereo vision algorithm using SSE3, AVX2, and CUDA
Stereo vision is widely used in robotics, unmanned cars, aerial surveys, and many real-time applications. It also requires computationally expensive calculations because of stereo matching. In real-time applications, the execution time of the stereo vision depth detection algorithm is very important.
M. Kokhazadeh +3 more
openaire +2 more sources
Fair Scheduling for AVX2 and AVX-512 Workloads
CPU schedulers such as the Linux Completely Fair Scheduler try to allocate equal shares of the CPU performance to tasks of equal priority by allocating equal CPU time as a technique to improve quality of service for individual tasks. Recently, CPUs have, however, become power-limited to the point where different subsets of the instruction set allow for
Gottschlag, Mathias +3 more
openaire +2 more sources
Research on Accelerating the Performance of SpMV Based on AVX2 Instruction Set
Proceedings of the 2020 4th High Performance Computing and Cluster Technologies Conference & 2020 3rd International Conference on Big Data and Artificial Intelligence, 2020
SpMV (Sparse Matrix-Vector Multiplication) has been widely used in various computing fields. Its performance requirements grow with the amount of data. On CPUs, multi-threaded programming can substantially speed up SpMV. Besides, for CPU processors with SIMD
Haodong Bian, Jianqiang Huang
exaly +2 more sources
High performance implementation of 2-D convolution using AVX2
2017 19th International Symposium on Computer Architecture and Digital Systems (CADS), 2017
Convolution is the most important and fundamental concept in multimedia processing. The 2-D convolution is used for different filtering operations such as sharpening, smoothing, and edge detection. It performs many mathematical operations on all image pixels. Therefore, it is almost a compute-intensive kernel.
Hossein Amiri, Asadollah Shahbahrami
exaly +2 more sources
AVX2 Programming – Extended Instructions
In this chapter, you learn how to use some of the instruction set extensions that were introduced in Chapter 8. The first section contains a couple of source code examples that exemplify use of the scalar and packed fused-multiply-add (FMA) instructions. The second section covers instructions that involve the general-purpose registers.
Daniel Kusswurm
openaire +2 more sources
On Improving the Speedup of Slice and Tile Level Parallelism in HEVC Using AVX2
HEVC has emerged as the new video coding standard promising improved compression ratios (for the same quality) by up to 50% compared to H.264/AVC. To achieve this performance HEVC requires increased computational overhead compared to its predecessor. For this reason parallelism is used, usually at a coarse grained level, e.g., per slice or tile.
Dimitris Skoumpourdis +5 more
openaire +2 more sources
Fast Implementation of Simeck Family Block Ciphers Using AVX2
2018 International Conference on Platform Technology and Service (PlatCon), 2018
In CHES 2015, the Simeck family of lightweight block ciphers was proposed, with an architecture similar to SIMON and SPECK. Previous work on implementing the Simeck family has focused on embedded device environments. In this paper, we propose fast implementation methods for the Simeck family of lightweight block ciphers by ...
Taehwan Park, Hwajeong Seo, Howon Kim
exaly +2 more sources

