Exploiting potential parallelism is a major source of code optimization. This chapter therefore focuses on DSP-specific techniques that aim to parallelize the generated vertical machine code. In the first part, we consider memory address generation.
openaire +1 more source
Modern AI systems can now synthesize coherent multimedia experiences, generating video and audio directly from text prompts. These unified frameworks represent a rapid shift toward controllable and synchronized content creation. From early neural architectures to transformer and diffusion paradigms, this paper contextualizes the ongoing evolution of ...
Charles Ding, Rohan Bhowmik
wiley +1 more source
Vectorized Highly Parallel Density-Based Clustering for Applications With Noise
Clustering in data mining involves grouping similar objects into categories based on their characteristics. As the volume of data continues to grow and advancements in high-performance computing evolve, a critical need has emerged for algorithms that can
Joseph Arnold Xavier +7 more
doaj +1 more source
Implementing OpenMP 4.0 for the NVIDIA PTX architecture in GCC compiler
The paper describes the approach used in implementing OpenMP offloading to NVIDIA accelerators in GCC. Offloading refers to a new capability in OpenMP 4.0 specification update that allows the programmer to specify regions of code that should be executed ...
A. V. Monakov, V. A. Ivanishin
doaj +1 more source
Instruction-level parallelism (ILP) is a set of processor and compiler design techniques that speed up program execution via the parallel execution of individual RISC-style operations, such as memory loads and stores, integer additions, and floating-point multiplications.
openaire +1 more source
Converting Binary Floating‐Point Numbers to Shortest Decimal Strings: An Experimental Review
Abstract. Background: When sharing or logging numerical data, we must convert binary floating‐point numbers into their decimal string representations. For example, the number π might become 3.1415927. Engineers have perfected many algorithms for producing such accurate, short strings.
Jaël Champagne Gareau, Daniel Lemire
wiley +1 more source
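The π example in the abstract above can be reproduced with a small brute-force sketch. It assumes the value is stored as an IEEE 754 single-precision float and tries increasing digit counts until the decimal string parses back to the same float. The helper names `to_float32` and `shortest_float32_str` are hypothetical, and this naive search is only an illustration, not one of the algorithms the review evaluates.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def shortest_float32_str(x: float) -> str:
    """Return the shortest decimal string that round-trips through binary32.

    Brute force: try 1..9 significant digits and stop at the first string
    that converts back to the same binary32 value. Nine significant digits
    are always enough for binary32.
    """
    target = to_float32(x)
    for digits in range(1, 10):
        candidate = f"{target:.{digits}g}"
        if to_float32(float(candidate)) == target:
            return candidate
    return repr(target)  # not expected to be reached for finite inputs


if __name__ == "__main__":
    print(shortest_float32_str(math.pi))  # prints 3.1415927
    print(repr(math.pi))                  # prints 3.141592653589793
```

For double-precision values, Python's built-in `repr` already returns a shortest round-tripping string, which is why the second line prints more digits than the single-precision result.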
Memory and Parallelism Analysis Using a Platform-Independent Approach
Emerging computing architectures such as near-memory computing (NMC) promise improved performance for applications by reducing the data movement between CPU and memory. However, detecting such applications is not a trivial task.
Awan, Ahsan Javed +5 more
core +1 more source
Freedom of Scientific Inquiry and Democracy. A Systems‐Theoretical Approach
Abstract. The article examines the relationship between democracy and one of its inherent features: freedom of scientific inquiry—a multi‐layered concept closely intertwined with the broader notion of academic freedom—both of which are increasingly under threat worldwide. The paper advocates for the use of Luhmann's theoretical framework to analyse this
Krešimir Žažar, Steffen Roth
wiley +1 more source
Exploiting superword level parallelism with multimedia instruction sets
S. Larsen, Saman Amarasinghe
openalex +3 more sources
Performance Debugging and Tuning using an Instruction-Set Simulator
Instruction-set simulators allow programmers a detailed level of insight into, and control over, the execution of a program, including parallel programs and operating systems.
Magnusson, Peter S., Montelius, Johan
core +3 more sources

