A Comparison of some recent Task-based Parallel Programming Models [PDF]
The need for parallel programming models that are simple to use and at the same time efficient for current and future parallel platforms has led to recent attention to task-based models such as Cilk++, Intel TBB and the task concept in OpenMP version 3.0.
Brorsson, Mats+2 more
core +1 more source
The CPU has insufficient resources for efficient computation of convolutional neural networks (CNNs), especially in embedded applications. Therefore, heterogeneous computing platforms are widely used to accelerate CNN tasks, such as GPU, FPGA ...
Li Luo+9 more
doaj +1 more source
Extending the Nested Parallel Model to the Nested Dataflow Model with Provably Efficient Schedulers
The nested parallel (a.k.a. fork-join) model is widely used for writing parallel programs. However, the two composition constructs, i.e. "$\parallel$" (parallel) and "$;$" (serial), are insufficient in expressing "partial dependencies" or "partial ...
Dinh, David+2 more
core +1 more source
Factoring out ordered sections to expose thread-level parallelism [PDF]
With the rise of multi-core processors, researchers are taking a new look at extending the applicability of auto-parallelization techniques. In this paper, we identify a dependence pattern on which auto-parallelization currently fails.
De Bosschere, Koen+2 more
core +1 more source
Learning weakly supervised multimodal phoneme embeddings
Recent works have explored deep architectures for learning multimodal speech representation (e.g. audio and images, articulation and audio) in a supervised way. Here we investigate the role of combining different speech modalities, i.e.
Chaabouni, Rahma+3 more
core +2 more sources
Scalable deep text comprehension for Cancer surveillance on high-performance computing
Background: Deep learning (DL) has advanced the state of the art in bioinformatics applications, which has led to increasingly sophisticated and computationally demanding models trained on larger and larger data sets.
John X. Qiu+8 more
doaj +1 more source
Asynchronous Processing for Latent Fingerprint Identification on Heterogeneous CPU-GPU Systems
Latent fingerprint identification is one of the most essential identification procedures in criminal investigations. Addressing this task is challenging as (i) it requires analyzing massive databases in reasonable periods and (ii) it is commonly solved ...
Andres J. Sanchez-Fernandez+6 more
doaj +1 more source
Distributed Memory Implementation of Bron-Kerbosch Algorithm
This paper proposes a parallel implementation of the Bron-Kerbosch algorithm, which finds all maximal cliques in large, complex graphs using CPU and thread-level parallelism and distributed memory with multiple cores. With the growing size and complexity
Tejas Ravindra Rote+4 more
doaj +1 more source
Towards an Adaptive Skeleton Framework for Performance Portability [PDF]
The proliferation of widely available, but very different, parallel architectures makes the ability to deliver good parallel performance on a range of architectures, or performance portability, highly desirable.
Maier, Patrick+2 more
core
Transactional Tasks: Parallelism in Software Transactions
Many programming languages, such as Clojure, Scala, and Haskell, support different concurrency models. In practice these models are often combined; however, the semantics of the combinations are not always well-defined. In this paper, we study the combination of futures and Software Transactional Memory.
Swalens, Janwillem+2 more
openaire +6 more sources