Some of the following articles may not be open access.
Proceedings of the ACM international conference companion on Object oriented programming systems languages and applications companion, 2010
Whereas the fastest supercomputer of 1998 could compute 1.34 trillion double precision floating point operations per second (TFLOPS) [7], today's consumer-level (sub-$500) graphics cards such as the NVidia GeForce GTX 480 can compute 1.35 TFLOPS (single precision) [8].
Nate Anderson +2 more
openaire +1 more source
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming, 2014
Parallel programs consist of a series of code sections with different thread-level parallelism (TLP). As a result, it is rather common that a thread in a parallel program, such as a GPU kernel in CUDA programs, still contains both sequential code and parallel loops.
Yi Yang, Huiyang Zhou
openaire +1 more source
Proceedings of the 13th annual conference companion on Genetic and evolutionary computation, 2011
During six months of intensive nVidia CUDA C programming many bugs were created. We pass on the software engineering lessons learnt, particularly those relevant to parallel general-purpose computation on graphics hardware (GPGPU).
openaire +1 more source
CUDA-MAFFT: Accelerating MAFFT on CUDA-enabled graphics hardware
2013 IEEE International Conference on Bioinformatics and Biomedicine, 2013
Multiple sequence alignment (MSA) constitutes an extremely powerful tool for many biological applications including phylogenetic tree estimation, secondary structure prediction, and critical residue identification. However, aligning large biological sequences with popular tools such as MAFFT requires long runtimes on sequential architectures.
Xiangyuan Zhu, Kenli Li
openaire +1 more source
CUDA Flux: A Lightweight Instruction Profiler for CUDA Applications
2019 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), 2019
GPUs are powerful, massively parallel processors, which require a vast amount of thread parallelism to keep their thousands of execution units busy, and to tolerate latency when accessing their high-throughput memory system. Understanding the behavior of massively threaded GPU programs can be difficult, even though recent GPUs provide an abundance of ...
Lorenz Braun, Holger Froning
openaire +1 more source
2015
[Abstract unreadable: the original non-Latin text was lost to a character-encoding error. The only legible term is "CUDA".]
openaire +1 more source
2014
Searching for arbitrarily shaped image fragments with full-search template matching on CUDA is examined. Different approaches to caching the search area in a multiprocessor's memory are proposed and analyzed. The speedup on the GPU relative to the CPU is evaluated. The proposed algorithms can be used to accelerate object tracking in video.
openaire +1 more source
2017
[Abstract unreadable: the original non-Latin text was lost to a character-encoding error. The only legible terms are "CUDA" and "Hex 6.1".]
openaire +1 more source

