Results 1 to 10 of about 172,822 (307)
To Parallelize or Not to Parallelize, Speed Up Issue [PDF]
Running parallel applications requires special and expensive processing resources to obtain the required results within a reasonable time. Before parallelizing a serial application, it is recommended to carry out some analysis to decide whether the application will benefit from parallelization or not (a rough speed-up estimate is sketched after this entry).
arxiv +4 more sources
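The speed-up question raised in this abstract is often framed with Amdahl's law. The sketch below is not taken from the paper; it is a minimal Python illustration, and the 2x threshold and function names are assumptions chosen for the example.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Amdahl's law: upper bound on speed-up when a fraction of the
    work is inherently serial and the rest scales across workers."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


def worth_parallelizing(serial_fraction: float, workers: int,
                        min_speedup: float = 2.0) -> bool:
    """Illustrative decision rule (assumed, not from the paper):
    parallelize only if the predicted speed-up clears a threshold."""
    return amdahl_speedup(serial_fraction, workers) >= min_speedup


if __name__ == "__main__":
    # 20% serial work on 8 cores -> ~3.3x speed-up: worth it.
    print(amdahl_speedup(0.2, 8), worth_parallelizing(0.2, 8))
    # 60% serial work is capped below 1.7x no matter how many cores.
    print(worth_parallelizing(0.6, 64))
```

The point of the rule is that the serial fraction, not the core count, dominates the decision: once it is large, extra cores cannot buy the required speed-up.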
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning [PDF]
Scaling Transformers to longer sequence lengths has been a major problem in the last several years, promising to improve performance in language modeling and high-resolution image understanding, as well as to unlock new applications in code, audio, and ...
Tri Dao
semanticscholar +1 more source
The Parallelism Tradeoff: Limitations of Log-Precision Transformers [PDF]
Despite their omnipresence in modern NLP, characterizing the computational power of transformer neural nets remains an interesting open question. We prove that transformers whose arithmetic precision is logarithmic in the number of input tokens (and ...
William Merrill, Ashish Sabharwal
semanticscholar +1 more source
Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism [PDF]
Transformer models have achieved state-of-the-art performance across various application domains and are gradually becoming the foundation of advanced large deep learning (DL) models.
Xupeng Miao+6 more
semanticscholar +1 more source
Merak: An Efficient Distributed DNN Training Framework With Automated 3D Parallelism for Giant Foundation Models [PDF]
Foundation models are in the process of becoming the dominant deep learning technology. Pretraining a foundation model is always time-consuming due to the large scale of both the model parameters and the training dataset.
Zhiquan Lai+7 more
semanticscholar +1 more source
Sequence Parallelism: Long Sequence Training from System Perspective [PDF]
Transformer achieves promising results on various tasks. However, self-attention suffers from quadratic memory requirements with respect to the sequence length (a back-of-envelope illustration follows this entry). Existing work focuses on reducing time and space complexity from an algorithm perspective ...
Shenggui Li+3 more
semanticscholar +1 more source
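To make the quadratic memory claim concrete, the sketch below (not code from the paper) counts the bytes of the attention score matrix; batch size, head count, and fp16 storage are assumptions for the example.

```python
def attention_score_bytes(seq_len: int, num_heads: int = 1,
                          batch: int = 1, bytes_per_elem: int = 2) -> int:
    """Memory for the (seq_len x seq_len) attention score matrices.

    Grows quadratically in sequence length; head count, batch size,
    and element size (fp16 here) are illustrative assumptions.
    """
    return batch * num_heads * seq_len * seq_len * bytes_per_elem


if __name__ == "__main__":
    for n in (1_024, 8_192, 65_536):
        gib = attention_score_bytes(n) / 2**30
        print(f"seq_len={n:>6}: {gib:8.3f} GiB per head")
    # 1k tokens -> ~0.002 GiB; 64k tokens -> ~8 GiB (4096x more),
    # which is why splitting along the sequence axis across devices helps.
```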
On interval number in cycle convexity [PDF]
Recently, Araujo et al. [Manuscript in preparation, 2017] introduced the notion of Cycle Convexity of graphs. In their seminal work, they studied the hull number parameter for this new graph convexity they proposed, and they ...
Julio Araujo+3 more
doaj +1 more source
Climate change, especially weather extremes such as extreme cold or heat, is a major challenge for global livestock. One of the animal breeding goals for sustainable livestock production should be to breed animals with excellent climate adaptability.
Nai-Yi Xu+5 more
doaj +1 more source
Parallel Locality and Parallelization Quality [PDF]
This paper presents a new distributed computation model adapted to manycore processors. In this model, the run is spread over the available cores by fork machine instructions emitted by the compiler, for example at function calls and loop iterations (a loose software analogue is sketched after this entry). This approach contrasts with the current model of computation based on caches and predictors.
Goossens, Bernard+3 more
openaire +5 more sources
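The paper describes compiler-emitted, machine-level fork instructions; the sketch below is only a loose software analogue of the same idea (forking at a function call and at loop iterations), written with Python's standard library rather than the paper's hardware model.

```python
# Loose analogue: spread a function call and loop iterations across cores.
from concurrent.futures import ProcessPoolExecutor


def heavy(x: int) -> int:
    """Stand-in for a callee whose body can run on another core."""
    return sum(i * i for i in range(x))


def main() -> None:
    with ProcessPoolExecutor() as pool:
        # "Fork" at a function call: the caller keeps running while the
        # callee executes elsewhere.
        pending = pool.submit(heavy, 100_000)

        # "Fork" at loop iterations: each iteration becomes a task.
        results = list(pool.map(heavy, range(1_000, 1_008)))

        print(pending.result(), len(results))


if __name__ == "__main__":
    main()
```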
Lexical and Structural Cues to Discourse Processing in First and Second Language
Discourse connectives are lexical items like “but” and “so” that are well-known to influence the online processing of the discourse relations they convey.
Ludivine Crible+2 more
doaj +1 more source