Analysis and development tools for efficient programs on parallel architectures
The article proposes methods to support the development of efficient programs for modern parallel architectures, including hybrid systems, among them specialized profiling methods for programmers tasked with parallelizing existing code.
Alexander Monakov +3 more
doaj +1 more source
As embedded devices start supporting heterogeneous processing cores (Central Processing Unit [CPU]–Graphics Processing Unit [GPU] based cores), performance-aware task allocation becomes a major issue. Use of Open Computing Language (OpenCL) applications ...
Rakesh Kumar, Bibhas Ghoshal
doaj +1 more source
CLBlast: A Tuned OpenCL BLAS Library
This work introduces CLBlast, an open-source BLAS library providing optimized OpenCL routines to accelerate dense linear algebra for a wide variety of devices.
Nugteren, Cedric
core +1 more source
Enabling GPU Support for the COMPSs-Mobile Framework
Using the GPUs embedded in mobile devices makes it possible to increase the performance of the applications running on them while reducing the energy those applications consume.
Badia Sala, Rosa Maria +2 more
core +1 more source
OpenCLIPER: an OpenCL-based C++ Framework for Overhead-Reduced Medical Image Processing and Reconstruction on Heterogeneous Devices
Medical image processing is often limited by the computational cost of the involved algorithms. Whereas dedicated computing devices (GPUs in particular) exist and do provide significant efficiency boosts, they have an extra cost of use in terms of ...
Alberola-López, Carlos +6 more
core +2 more sources
GPU-Accelerated PSO for High-Performance American Option Valuation
Using artificial intelligence tools to evaluate financial derivatives has become increasingly popular. PSO (particle swarm optimization) is one such tool. We present a comprehensive study of PSO for pricing American options on GPUs using OpenCL.
Leon Xing Li, Ren-Raw Chen
doaj +1 more source
Task‐based programming interfaces introduce a paradigm in which computations are decomposed into fine‐grained units of work known as “tasks”. StarPU is a runtime system originally developed to support task‐based parallelism on on‐premise heterogeneous architectures by abstracting low‐level hardware details and efficiently managing resource ...
Vanderlei Munhoz +5 more
wiley +1 more source
Implementation of the parallel mean shift-based image segmentation algorithm on a GPU cluster
The mean shift image segmentation algorithm is very computation-intensive. To address the need to deal with a large number of remote sensing (RS) image segmentations in real-world applications, this study has investigated the parallelization of the mean ...
Fang Huang +6 more
doaj +1 more source
OpenCL realization of some many-body potentials
Modeling carbon nanostructures with classical molecular dynamics is computationally demanding. One way to improve the performance of the basic algorithms is to transform them to run on SIMD-type computing systems such as systems with ...
A. S. Minkin +2 more
doaj +1 more source
We report on our implementation of Lattice QCD applications using OpenCL. We focus on the general concept and on distributing different parts of the computation across hybrid systems consisting of both CPUs (Central Processing Units) and GPUs (Graphics Processing Units).
Philipsen, Owe +4 more
openaire +3 more sources

