Cpu cache - Open Access .click


Fast Query Processing by Distributing an Index over CPU Caches [PDF]

2005 IEEE International Conference on Cluster Computing, 2005
New version published at IEEE Cluster Computing ...
Xiaoqin Ma, Gene Cooperman
openaire +2 more sources

Converting an Integer to a Decimal String in Under Two Nanoseconds

Software: Practice and Experience, EarlyView.
Abstract. Objective: Converting binary integers to variable‐length decimal strings is a fundamental operation in computing. Conventional fast approaches rely on recursive division and small lookup tables. The goal of this work is to develop a significantly faster method for this task.
Jaël Champagne Gareau, Daniel Lemire
wiley +1 more source

Highly efficient parallel direct solver for solving dense complex matrix equations from method of moments

The Journal of Engineering, 2017
Based on a vectorised and cache-optimised kernel, a parallel lower-upper (LU) decomposition with a novel communication-avoiding pivoting scheme is developed to solve dense complex matrix equations generated by the method of moments.
Yan Chen +4 more
doaj +1 more source

Casper: Accelerating Stencil Computations Using Near-Cache Processing

IEEE Access, 2023
Stencil computations are commonly used in a wide variety of scientific applications, ranging from large-scale weather prediction to solving partial differential equations.
Alain Denzler +6 more
doaj +1 more source

Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper [PDF]

International Conference on Parallel Processing
Memory management across discrete CPU and GPU physical memory is traditionally achieved through explicit GPU allocations and data copy or unified virtual memory.
Gabin Schieffer +4 more
semanticscholar +1 more source

Multitype Game Optimisation: A Two‐Stage Fine‐Tuning Framework for Multi‑Game Optimisation With Large Language Models

CAAI Transactions on Intelligence Technology, EarlyView.
Abstract. Large language models (LLMs) have made remarkable advances in natural language processing, demonstrating great potential in modelling structured sequences. However, adapting these capabilities to machine gaming tasks such as Go remains challenging due to limitations in strategy generalisation and optimisation efficiency.
Xiali Li +5 more
wiley +1 more source

A Simple Cache Emulator for Evaluating Cache Behavior for SMP Systems

Acta Polytechnica, 2006
Every modern CPU uses a complex memory hierarchy, which consists of multiple cache memory levels. It is very difficult to predict the behavior of this hierarchy for a given program (for details see [1, 2]).
I. Šimeček
doaj

Cache-optimized BFS on multi-core CPUs

Proceedings of the 1st FastCode Programming Challenge
Breadth-First Search (BFS) performance on shared-memory systems is often limited by irregular memory access and cache inefficiencies. This work presents two optimizations for BFS graph traversal: a bitmap-based algorithm designed for small-diameter graphs and MergedCSR, a graph storage format that improves cache locality for large-scale graphs ...
Salvatore Domenico Andaloro, Thomas Pasquali, Flavio Vella +2 more
openaire +1 more source

GEMM-ArchProfiler: A simulation framework for hardware-level profiling and performance analysis of General Matrix Multiplication in real CNN workloads on heterogeneous CPU architectures

SoftwareX
In this paper, the authors present GEMM-ArchProfiler, a simulation framework for evaluating General Matrix Multiplication performance in convolutional neural networks.
Binu Ayyappan, G. Santhosh Kumar
doaj +1 more source

NuMagSANS: a GPU‐accelerated open‐source software package for the generic computation of nuclear and magnetic small‐angle neutron scattering observables of complex systems

Journal of Applied Crystallography, EarlyView.
NuMagSANS, a GPU‐accelerated software package for the computation of nuclear and magnetic small‐angle neutron scattering cross sections and correlation functions of complex systems, is presented. We present NuMagSANS, a GPU‐accelerated software package for calculating nuclear and magnetic small‐angle neutron scattering (SANS) cross sections and ...
Michael P. Adams, Andreas Michels
wiley +1 more source
