Some of the following articles may not be open access.
NAS Parallel Benchmarks with CUDA and beyond
Software: Practice & Experience, 2021
NAS Parallel Benchmarks (NPB) is a standard benchmark suite used in the evaluation of parallel hardware and software. Several research efforts from academia have made these benchmarks available with different parallel programming models beyond the ...
G. Araujo +4 more
semanticscholar +1 more source
Performance Study of GPU applications using SYCL and CUDA on Tesla V100 GPU
IEEE Conference on High Performance Extreme Computing, 2021
The SYCL standard enables single-source programs to run on heterogeneous platforms consisting of CPUs, GPUs, and FPGAs across different hardware vendors. SYCL combines modern C++ features with OpenCL's portability. The SYCL runtime is also capable of targeting ...
Goutham Kalikrishna Reddy Kuncham +2 more
semanticscholar +1 more source
Impact of CUDA and OpenCL on Parallel and Distributed Computing
2021 8th International Conference on Electrical and Electronics Engineering (ICEEE), 2021
Along with high-performance computer systems, the Application Programming Interface (API) used is crucial for developing efficient solutions for modern parallel and distributed computing. Compute Unified Device Architecture (CUDA) and Open Computing Language ...
A. Asaduzzaman +4 more
semanticscholar +1 more source
CUDA-BLASTP: Accelerating BLASTP on CUDA-Enabled Graphics Hardware
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2011
Scanning a protein sequence database is a frequently repeated task in computational biology and bioinformatics. However, scanning large protein databases, such as GenBank, with popular tools such as BLASTP requires long runtimes on sequential architectures.
Weiguo Liu +2 more
openaire +2 more sources
Dissecting the CUDA scheduling hierarchy: a Performance and Predictability Perspective
IEEE Real-Time Technology and Applications Symposium, 2020
Over the last few years, the ever-increasing use of Graphics Processing Units (GPUs) in safety-related domains has opened up many research problems in the real-time community.
Ignacio Sañudo Olmedo +4 more
semanticscholar +1 more source
ACM SIGGRAPH ASIA 2009 Sketches, 2009
Modern GPUs provide gradually increasing programmability on vertex shader, geometry shader and fragment shader in the past decade. However, many classical problems such as order-independent transparency (OIT), occlusion culling have not yet been efficiently solved using the traditional graphics pipeline.
Fang Liu +3 more
openaire +1 more source
Kevin: Multi-Turn RL for Generating CUDA Kernels
arXiv.org
Writing GPU kernels is a challenging task and critical for AI systems' efficiency. It is also highly iterative: domain experts write code and improve performance through execution feedback.
Carlo Baronio +4 more
semanticscholar +1 more source
Proceedings of the ACM international conference companion on Object oriented programming systems languages and applications companion, 2011
The rise of multi-core computer hardware has introduced new urgency to learning parallel programming. In this presentation, we again focus on CUDA exercises suitable for undergraduate students. Trying to appeal to a wide audience of today's learners, we have developed a "Game of Life" exercise and an introductory CUDA summary.
Christopher T. Mitchell +2 more
openaire +1 more source
Towards Robust Agentic CUDA Kernel Benchmarking, Verification, and Optimization
arXiv.org
Recent advances in large language models (LLMs) demonstrate their effectiveness in scaling test-time compute for software engineering tasks. However, these approaches often focus on high-level solutions, with limited attention to optimizing low-level ...
Robert Lange +5 more
semanticscholar +1 more source
Lessons learned from comparing C-CUDA and Python-Numba for GPU-Computing
International Euromicro Conference on Parallel, Distributed and Network-Based Processing, 2020
Python as a programming language is increasingly gaining importance, especially in data science, scientific computing, and parallel programming. It is easier and faster to learn than classical programming languages such as C.
Lena Oden
semanticscholar +1 more source