Implementing implicit OpenMP data sharing on GPUs
OpenMP is a shared memory programming model which supports the offloading of target regions to accelerators such as NVIDIA GPUs. The implementation in Clang/LLVM aims to deliver a generic GPU compilation toolchain that supports both the native CUDA C/C++
Bataev, Alexey +8 more
core +1 more source
We introduce AutomataGPT, a generative pretrained transformer (GPT) trained on synthetic spatiotemporal data from 2D cellular automata to learn symbolic rules. Demonstrating strong performance on both forward and inverse tasks, AutomataGPT establishes a scalable, domain‐agnostic framework for interpretable modeling, paving the way for future ...
Jaime A. Berkovich +2 more
wiley +1 more source
Benchmarking the cost of thread divergence in CUDA
All modern processors include a set of vector instructions. While these give a tremendous performance boost, they require vectorized code that can take advantage of them.
Bialas, Piotr, Strzelecki, Adam
core +1 more source
Skeleton‐oriented object segmentation (SKOOTS) introduces a new strategy for 3D mitochondrial instance segmentation by predicting explicit skeletons rather than relying on boundary cues. This approach enables robust analysis of densely packed organelles in large FIB‐SEM datasets.
Christopher J. Buswinka +3 more
wiley +1 more source
Instruction-Efficient and Parallelized AES using CUDA and PTX for Data Encryption
This research presents an instruction-efficient and parallelized implementation of the AES-256 encryption algorithm using NVIDIA CUDA with inline PTX to optimize instruction usage and execution performance on GPUs. Conventional AES implementation on CUDA
Raditya Hakim Daniswara +1 more
doaj +1 more source
GPGPUs (General Purpose Graphics Processing Units) have become a whole new area of research owing to the fast development of GPU hardware and programming tools such as CUDA (Compute Unified Device Architecture).
Yimu Ji +5 more
doaj +1 more source
Deep Feature-based Face Detection on Mobile Devices
We propose a deep feature-based face detector for mobile devices to detect the user's face as captured by the front-facing camera. The proposed method is able to detect faces in images containing extreme pose and illumination variations as well as partial faces.
Chellappa, Rama +2 more
core +1 more source
Long‐Tea‐CLIP (Contrastive Language‐Image Pre‐training) presents a multimodal AI framework that integrates visual, metabolomic, and sensory knowledge to grade green tea across appearance, soup color, aroma, taste, and infused leaf. By combining expert‐guided modeling with CLIP‐supervised learning, the system delivers fine‐grained quality evaluation and
Yanqun Xu +9 more
wiley +1 more source
Analysis and development tools for efficient programs on parallel architectures
The article proposes methods for supporting development of efficient programs for modern parallel architectures, including hybrid systems. Specialized profiling methods designed for programmers tasked with parallelizing existing code are proposed.
Alexander Monakov +3 more
doaj +1 more source
Programming GPUs with CUDA
This document contains the material for a tutorial given at the conference. It is not a scientific article in the traditional format. We analyze the performance and features of the successive generations of graphics processors developed by Nvidia ...
Ujaldon-Martinez, Manuel
core

