In this work, we present the design and implementation of an ultra-low latency Deep Reinforcement Learning (DRL) FPGA-based accelerator for addressing hard real-time Mixed Integer Programming problems.
Gerasimos Gerogiannis+5 more
doaj +1 more source
MPI+X: task-based parallelization and dynamic load balance of finite element assembly [PDF]
The main computing tasks of a finite element (FE) code for solving partial differential equations (PDEs) are the algebraic system assembly and the iterative solver. This work focuses on the first task, in the context of a hybrid MPI+X paradigm.
Artigues, Antoni+6 more
core +4 more sources
Basic principles of odontopreparation for non-removable orthopedic structures (literature review)
Introduction. A pressing problem for orthopedic dentists today is violation of the odontopreparation protocol, which results in poor-quality fixation of various non-removable orthopedic structures.
D. I. Dmitriev+2 more
doaj +1 more source
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS
GROMACS is a widely used package for biomolecular simulation, and over the last two decades it has evolved from small-scale efficiency to advanced heterogeneous acceleration and multi-level parallelism targeting some of the largest supercomputers in the ...
A Arnold+21 more
core +1 more source
Performance monitoring and analysis of task-based OpenMP. [PDF]
OpenMP, a typical shared-memory programming paradigm, has been extensively applied in the high-performance computing community due to the popularity of multicore architectures in recent years.
Yi Ding, Kai Hu, Kai Wu, Zhenlong Zhao
doaj +1 more source
A Fast Causal Profiler for Task Parallel Programs
This paper proposes TASKPROF, a profiler that identifies parallelism bottlenecks in task parallel programs. It leverages the structure of a task parallel execution to perform fine-grained attribution of work to various parts of the program.
Nagarakatte, Santosh, Yoga, Adarsh
core +1 more source
How Many Cooks Spoil the Soup? [PDF]
In this work, we study the following basic question: "How much parallelism does a distributed task permit?" Our definition of parallelism (or symmetry) here is not in terms of speed, but in terms of identical roles that processes have at the same time in
D Alistarh+19 more
core +2 more sources
An asynchronous and task-based implementation of peridynamics utilizing HPX—the C++ standard library for parallelism and concurrency [PDF]
On modern supercomputers, asynchronous many task systems are emerging to address the new architecture of computational nodes. Through this shift of increasing cores per node, a new programming model with focus on handling of the fine-grain parallelism ...
Patrick Diehl+4 more
semanticscholar +1 more source
GPU Acceleration of Melody Accurate Matching in Query-by-Humming
With the increasing scale of the melody database, the query-by-humming system faces a trade-off between response speed and retrieval accuracy. Melody accurate matching is the key factor limiting the response speed.
Limin Xiao+4 more
doaj +1 more source
Reservation-Based Federated Scheduling for Parallel Real-Time Tasks
This paper considers the scheduling of parallel real-time tasks with arbitrary deadlines. Each job of a parallel task is described as a directed acyclic graph (DAG).
Agrawal, Kunal+4 more
core +1 more source