Parallel Training of GRU Networks with a Multi-Grid Solver for Long Sequences [PDF]
Parallelizing Gated Recurrent Unit (GRU) networks is a challenging task, as the training procedure of GRU is inherently sequential. Prior efforts to parallelize GRU have largely focused on conventional parallelization strategies such as data-parallel and model-parallel training algorithms.
arxiv
Effective Parallelism for Equation and Jacobian Evaluation in Power Flow Calculation [PDF]
This letter investigates parallelism approaches for equation and Jacobian evaluations in large-scale power flow calculation. Two levels of parallelism are proposed and analyzed: inter-model parallelism, which evaluates models in parallel, and intra-model parallelism, which evaluates calculations within each model in parallel.
arxiv +1 more source
Parallel algorithms for series parallel graphs [PDF]
In this paper, a parallel algorithm is given that, given a graph G=(V, E), decides whether G is a series parallel graph, and if so, builds a decomposition tree for G of series and parallel composition rules. The algorithm uses O(log|E| log*|E|) time and O(|E|) operations on an EREW PRAM, and O(log|E|) time and O(|E|) operations on a CRCW PRAM (note that ...
Babette de Fluiter, Hans L. Bodlaender
openaire +5 more sources
Tesseract: Parallelize the Tensor Parallelism Efficiently
Together with the improvements in state-of-the-art accuracies of various tasks, deep learning models are getting significantly larger. However, it is extremely difficult to implement these large models because limited GPU memory makes it impossible to fit large models into a single GPU or even a GPU server. Besides, it is highly necessary to reduce the
Wang, Boxiang +3 more
openaire +2 more sources
This work analyzes Roman Jakobson's phonological survey, which was inspired by poetic texts, verse structure and direct observation of the work of certain poets.
Tanja Rusimović, Dragan Miladinović
doaj +1 more source
Approaches for Stereo Matching [PDF]
This review focuses on the last decade's development of the computational stereopsis for recovering three-dimensional information. The main components of the stereo analysis are exposed: image acquisition and camera modeling, feature selection, feature ...
Takouhi Ozanian
doaj +1 more source
Parallel Implementation of a Two-level Algebraic ILU(k)-based Domain Decomposition Preconditioner
We discuss the parallel implementation of a two-level algebraic ILU(k)-based domain decomposition preconditioner using the PETSc library. We present strategies to improve performance and minimize communication among processes during setup and application
Italo Cristiano L. Nievinski +5 more
doaj +1 more source
HetExchange: Encapsulating heterogeneous CPU-GPU parallelism in JIT compiled engines
Modern server hardware is increasingly heterogeneous as hardware accelerators, such as GPUs, are used together with multicore CPUs to meet the computational demands of modern data analytics workloads. Unfortunately, query parallelization techniques used ...
Periklis Chrysogelos +3 more
semanticscholar +1 more source
Automatic Time‐Resolved Cardiovascular Segmentation of 4D Flow MRI Using Deep Learning
Background: Segmenting the whole heart over the cardiac cycle in 4D flow MRI is a challenging and time‐consuming process, as there is considerable motion and limited contrast between blood and tissue. Purpose: To develop and evaluate a deep learning‐based segmentation method to automatically segment the cardiac chambers and great thoracic vessels from 4D
Mariana Bustamante +4 more
wiley +1 more source
High Speed and Less Area Efficient Montgomery Modular Multiplication for VLSI Applications [PDF]
A high speed and area efficient Montgomery modular multiplication algorithm is implemented. In the proposed multiplier high speed is achieved by using the carry save adder (CSA) to reduce the carry propagation at the addition operation stage due to this ...
Sreenivasa Murthy K.E. +2 more
doaj +1 more source