
Model Parallelism With Subnetwork Data Parallelism

open access: yes
Distributed pre-training of large models at scale often imposes heavy memory demands on individual nodes and incurs significant intra-node communication costs. We propose a novel alternative approach that reduces the memory requirements by training small, structured subnetworks of the model on separate workers.
Singh, Vaibhav   +3 more
openaire   +2 more sources
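
The subnetwork idea in the result above (each worker trains only a small, structured slice of the full model on its own data shard) can be sketched in a few lines of NumPy. The layer sizes, per-slice toy loss, and channel partition below are illustrative assumptions, not the paper's actual method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: a one-hidden-layer net whose hidden units are
# partitioned into structured subnetworks, one per worker. Each worker
# trains only its slice on its own data, so no worker ever needs the
# full model in memory.
D_IN, D_HID, N_WORKERS = 8, 16, 4
W = rng.normal(scale=0.1, size=(D_HID, D_IN))          # full hidden layer
slices = np.array_split(np.arange(D_HID), N_WORKERS)   # structured split

def local_step(w_slice, x, t=0.5, lr=0.01):
    """One SGD step on a worker's subnetwork (toy target-fitting loss)."""
    h = np.tanh(w_slice @ x)
    grad = np.outer((h - t) * (1 - h**2), x)   # d(0.5*||h - t||^2)/dW
    return w_slice - lr * grad

for step in range(100):
    for idx in slices:                  # conceptually runs in parallel
        x = rng.normal(size=D_IN)       # this worker's data shard
        W[idx] = local_step(W[idx], x)
```

In the paper's setting the slices would live on different workers; the inner loop here just makes the partitioning explicit.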

Energy‐Efficient Knapsack Optimization Using Probabilistic Memristor Crossbars

open access: yes, Advanced Intelligent Systems, EarlyView.
The knapsack problem, a nondeterministic polynomial‐time (NP)‐hard combinatorial optimization problem, is solved energy‐efficiently. This work presents an algorithm‐hardware co‐design and implementation for practical (non‐ideal) NP‐hard problems with destabilizing self‐feedback (non‐zero diagonal) and non‐binary Hamiltonian representations under analog ...
Jinzhan Li, Suhas Kumar, Su‐in Yi
wiley   +1 more source
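
The abstract casts knapsack as a Hamiltonian to be minimized by stochastic hardware. As a software stand-in, here is a minimal simulated-annealing sketch over a penalty Hamiltonian; the instance data, penalty weight, and cooling schedule are all assumptions, and the probabilistic crossbar dynamics are not simulated:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy knapsack instance (values, weights, capacity) -- assumed data.
v = np.array([6, 5, 8, 9, 6, 7, 3])
w = np.array([2, 3, 6, 7, 5, 9, 4])
C = 15

def energy(x, penalty=10.0):
    """Penalty Hamiltonian: reward value, punish capacity overflow."""
    over = max(0.0, w @ x - C)
    return -(v @ x) + penalty * over**2

# Simulated annealing as a software stand-in for the stochastic
# search the crossbar hardware performs physically.
x = rng.integers(0, 2, size=len(v))
T = 5.0
for step in range(5000):
    i = rng.integers(len(v))
    y = x.copy(); y[i] ^= 1                    # flip one item in/out
    dE = energy(y) - energy(x)
    if dE < 0 or rng.random() < np.exp(-dE / T):
        x = y
    T *= 0.999                                 # cool down
print("items:", np.flatnonzero(x), "value:", v @ x, "weight:", w @ x)
```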

A Pipeline-Based ODE Solving Framework

open access: yesIEEE Access
The traditional parallel solving methods of ordinary differential equations (ODE) are mainly classified into task-parallelism, data-parallelism, and instruction-level parallelism.
Ruixia Cao, Shangjun Hou, Lin Ma
doaj   +1 more source
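
Of the three parallelism classes the abstract names, data-parallelism is the easiest to show concretely: a vectorized step of a classic solver integrates many independent initial conditions at once. The ODE, step size, and step count below are assumptions for illustration, not the paper's pipeline framework:

```python
import numpy as np

# Data-parallel ODE solving: one RK4 step applied to 10,000
# independent initial conditions simultaneously via array arithmetic.
def f(t, y):
    return -y                       # dy/dt = -y (assumed toy ODE)

def rk4_step(t, y, h):
    k1 = f(t, y)
    k2 = f(t + h/2, y + h/2 * k1)
    k3 = f(t + h/2, y + h/2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + (h/6) * (k1 + 2*k2 + 2*k3 + k4)

y = np.linspace(0.1, 1.0, 10_000)   # 10k initial conditions, one array
t, h = 0.0, 0.01
for _ in range(100):                # integrate to t = 1
    y = rk4_step(t, y, h)
    t += h
# y now approximates y0 * exp(-1) for every initial condition y0
```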

Study on Distributed Training Optimization Based on Hybrid Parallel [PDF]

open access: yes, Jisuanji kexue (Computer Science)
Large-scale neural network training is a hot topic in the field of deep learning, and distributed training stands out as one of the most effective methods for training large neural networks across multiple nodes. Distributed training typically involves ...
XU Jinlong, LI Pengfei, LI Jianan, CHEN Biaoyuan, GAO Wei, HAN Lin
doaj   +1 more source
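
A hybrid of the two main strategies (splitting the model into stages while replicating each stage over data shards) can be mimicked in miniature. The two-stage linear model, replica count, and plain gradient averaging below are assumptions, not the paper's optimization scheme:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hybrid parallelism in miniature: the model is split into two stages
# (model parallelism) and each stage is replicated twice (data
# parallelism), giving 4 logical workers. All sizes are assumed.
W1 = [rng.normal(scale=0.1, size=(4, 8)) for _ in range(2)]  # stage-1 replicas
W2 = [rng.normal(scale=0.1, size=(1, 4)) for _ in range(2)]  # stage-2 replicas

for step in range(200):
    g1, g2 = [], []
    for r in range(2):                        # each data-parallel replica
        x = rng.normal(size=8); y = x.sum()   # replica's own mini-batch
        h = W1[r] @ x                         # stage 1 (one worker)
        out = (W2[r] @ h).item()              # stage 2 (another worker)
        err = out - y                         # squared-error residual
        g2.append(err * h[None, :])           # grad wrt W2
        g1.append(err * np.outer(W2[r].ravel(), x))  # grad wrt W1
    # all-reduce: average gradients across replicas, per stage
    G1, G2 = sum(g1) / 2, sum(g2) / 2
    for r in range(2):
        W1[r] -= 0.01 * G1
        W2[r] -= 0.01 * G2
```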

Configurable Kernel Map Implementation in Memristor Crossbar for Convolution Neural Network

open access: yes, Advanced Intelligent Systems, EarlyView.
A configurable kernel map implementation using a memristor crossbar array is presented. The crossbar array area can be configured based on the number of read cycles per inference, which directly affects the inference speed. The algorithm underlying this scheme is described, and convolutional neural network operations are experimentally validated using ...
Gyeonghae Kim   +3 more
wiley   +1 more source
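
The standard way to run a convolution on a crossbar (unroll each kernel into a column of conductances, each input patch into a voltage vector, one read cycle per patch) can be emulated as a matrix product. The shapes below are assumed, and the paper's configurable-area scheme is not reproduced:

```python
import numpy as np

rng = np.random.default_rng(3)

# Crossbar mapping of a convolution: each kernel becomes one crossbar
# column, each input patch becomes the row voltages, and one analog
# read cycle per patch yields all output channels at once.
K, C_OUT = 3, 4                               # kernel size, output channels
kernels = rng.normal(size=(C_OUT, K, K))
crossbar = kernels.reshape(C_OUT, -1).T       # (K*K rows) x (C_OUT cols)

img = rng.normal(size=(8, 8))
out = np.zeros((C_OUT, 6, 6))                 # valid 3x3 conv of 8x8 input
for i in range(6):                            # one "read cycle" per patch
    for j in range(6):
        patch = img[i:i+K, j:j+K].ravel()     # input voltages
        out[:, i, j] = patch @ crossbar       # column currents = conv
```

Trading crossbar area against read cycles, as the abstract describes, amounts to choosing how many of these patch reads are served by replicated columns in parallel.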

Bioinspired Fully On‐Chip Learning Implemented on Memristive Neural Networks

open access: yes, Advanced Intelligent Systems, EarlyView.
This work proposes a memristive neural network based on van der Waals ferroelectric memristors and contrastive Hebbian learning, enabling fully on‐chip learning. The system achieves over 98% accuracy in pattern recognition with low power consumption (0.321 nJ/image) and high robustness, paving the way for efficient, bioinspired neuromorphic computing ...
Zhixing Wen   +9 more
wiley   +1 more source
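
Contrastive Hebbian learning itself is compact enough to sketch: weights move toward the activity correlations of a clamped (target-nudged) phase and away from those of the free-running phase. This is a simplified textbook-style toy, not the memristor implementation; the single-layer nudging rule and all sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def chl_update(W, x, target, lr=0.1, beta=1.0):
    """One contrastive Hebbian step: clamped minus free correlations."""
    a = W @ x
    h_free = np.tanh(a)                        # free phase
    h_clamp = np.tanh(a + beta * (target - h_free))  # nudged phase
    return W + lr * (np.outer(h_clamp, x) - np.outer(h_free, x))

W = rng.normal(scale=0.1, size=(2, 4))
x = np.array([1.0, -1.0, 0.5, 0.0])
t = np.array([0.8, -0.8])
for _ in range(100):
    W = chl_update(W, x, t)
print(np.tanh(W @ x))                          # approaches the target
```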

goSLP: Globally Optimized Superword Level Parallelism Framework

open access: yes, 2018
Modern microprocessors are equipped with single instruction multiple data (SIMD) or vector instruction sets which allow compilers to exploit superword level parallelism (SLP), a type of fine-grained parallelism.
Amarasinghe, Saman, Mendis, Charith
core   +1 more source
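
Superword level parallelism is easiest to see by example: several isomorphic scalar statements over adjacent data get packed into one vector statement. The snippet below performs the transformation by hand for illustration; goSLP's contribution is doing this automatically, and globally, inside the compiler:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])
c = np.empty(4)

# Scalar form: four independent, isomorphic statements, exactly the
# pattern an SLP vectorizer looks for.
c[0] = a[0] + b[0]
c[1] = a[1] + b[1]
c[2] = a[2] + b[2]
c[3] = a[3] + b[3]

# Packed form: one SIMD-width statement over the whole superword.
c_packed = a + b
assert np.allclose(c, c_packed)
```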

Data Parallel Skeletons in Java

open access: yes, Procedia Computer Science, 2012
In the past years, multi-core processors and clusters of multi-core processors have emerged as promising approaches to meeting the growing demand for computing performance. They deliver scalable performance, albeit at the cost of tedious and complex parallel programming.
Steffen Ernsting, Herbert Kuchen
openaire   +2 more sources
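
A data parallel skeleton hides partitioning and parallel execution behind a higher-order function, leaving the user to supply only the per-element operation. The paper targets Java; the sketch below shows the same shape in Python, with the skeleton name and worker count chosen for illustration:

```python
from multiprocessing import Pool

def map_skeleton(f, data, workers=4):
    """A minimal data-parallel 'map' skeleton: the skeleton owns data
    distribution and parallel execution, the caller owns only f."""
    with Pool(workers) as pool:
        return pool.map(f, data)

def square(x):
    return x * x

if __name__ == "__main__":
    print(map_skeleton(square, range(10)))   # [0, 1, 4, 9, ...]
```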

Selective Update for Hardware‐Friendly On‐Chip Training in Distributed Analog In‐Memory Computing Systems

open access: yes, Advanced Intelligent Systems, EarlyView.
This study proposes a hardware‐efficient training methodology for crossbar arrays mapped with convolutional kernels in distributed computing systems. The approach is robust to variations in analog devices, minimizes disturbances during parallel writing operations, reduces stress on hardware, and accelerates the training process.
Jaehyeon Kang   +3 more
wiley   +1 more source
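
The gist of a selective update is to write only the most significant weight changes to the analog array each step, cutting write operations and device stress. The top-k gradient-magnitude criterion below is an assumed stand-in for the paper's selection rule:

```python
import numpy as np

rng = np.random.default_rng(5)

def selective_update(W, grad, lr=0.05, k=4):
    """Apply only the k largest-magnitude weight updates; all other
    cells are left untouched (no analog write issued for them)."""
    flat = np.abs(grad).ravel()
    keep = np.argsort(flat)[-k:]              # indices of top-k grads
    mask = np.zeros_like(flat)
    mask[keep] = 1.0
    return W - lr * grad * mask.reshape(grad.shape)

W = rng.normal(scale=0.1, size=(4, 8))
grad = rng.normal(size=(4, 8))                # stand-in gradient
W = selective_update(W, grad)                 # only 4 cells written
```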

Improving Long‐Term Glucose Prediction Accuracy with Uncertainty‐Estimated ProbSparse‐Transformer

open access: yes, Advanced Intelligent Systems, EarlyView.
Wearable devices collect blood glucose and other physiological data, which serve as inputs to the prediction model. After data embedding, a structure utilizing ProbSparse self‐attention and a one‐step generative head within a Transformer‐based model is introduced, which is concurrently designed for deployment on edge devices, enabling real‐time ...
Wei Huang   +5 more
wiley   +1 more source
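
ProbSparse self-attention keeps full attention only for the few queries whose logit distribution is far from uniform and gives the remaining queries a cheap fallback. The sketch below scores queries by max-minus-mean of their logits; note the logits are computed exactly here for clarity, whereas the sub-quadratic method estimates this measure from sampled keys. Sizes and the fallback choice are assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)

L, D, U = 32, 16, 8                           # sequence len, dim, kept queries
Q, K, V = (rng.normal(size=(L, D)) for _ in range(3))

logits = Q @ K.T / np.sqrt(D)                 # (L, L) attention logits
sparsity = logits.max(axis=1) - logits.mean(axis=1)
top = np.argsort(sparsity)[-U:]               # most "active" queries

out = np.tile(V.mean(axis=0), (L, 1))         # lazy queries: mean of V
w = np.exp(logits[top] - logits[top].max(axis=1, keepdims=True))
out[top] = (w / w.sum(axis=1, keepdims=True)) @ V   # full attention for top-U
```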
