A framework for FPGA functional units in high performance computing [PDF]
FPGAs make it practical to speed up a program by defining hardware functional units that perform calculations faster than can be achieved in software. Specialised digital circuits avoid the overhead of executing sequences of instructions, and they make
Koltes, A., O'Donnell, J.T.
core +1 more source
FinRL Contests: Data‐Driven Financial Reinforcement Learning Agents for Stock and Crypto Trading
FinRL Contests 2023–2025 explore the application of reinforcement learning in financial tasks, which are modelled as the Markov decision process (MDP). Participants specify state, reward and action to train the FinRL agents in stable market environments, advancing the development of RL‐based trading strategies in real‐world financial markets.
Keyi Wang +7 more
wiley +1 more source
Exploitation of potential parallelism is obviously a major source of code optimization. This chapter therefore focusses on DSP-specific techniques, which aim at parallelization of generated vertical machine code. In the first part, we consider the area of memory address generation.
openaire +2 more sources
Performance Debugging and Tuning using an Instruction-Set Simulator [PDF]
Instruction-set simulators allow programmers a detailed level of insight into, and control over, the execution of a program, including parallel programs and operating systems.
Magnusson, Peter S., Montelius, Johan
core +3 more sources
Memory and Parallelism Analysis Using a Platform-Independent Approach
Emerging computing architectures such as near-memory computing (NMC) promise improved performance for applications by reducing the data movement between CPU and memory. However, detecting such applications is not a trivial task.
Awan, Ahsan Javed +5 more
core +1 more source
Exploiting superword level parallelism with multimedia instruction sets [PDF]
S. Larsen, Saman Amarasinghe
openalex +3 more sources
Instruction scheduling and software pipelining for modern architectures
We describe the approach for instruction scheduling and software pipelining based on a two-stage extensible architecture of detecting and using the available instruction level parallelism.
Arutyun Avetisyan
doaj
Array languages and the N-body problem [PDF]
This paper is a description of the contributions to the SICSA multicore challenge on many body planetary simulation made by a compiler group at the University of Glasgow.
Cockshott, P., Gdura, Y., Keir, P.
core +1 more source
In this article we apply Wacquant's conceptualization of the ghetto to an analysis of interviews conducted with Roma people living in the state‐enforced camps of Turin, Italy. We illustrate how the elements characterizing a ghetto according to Wacquant (i.e.
Vincenzo Romania, Tommaso Bertazzo
wiley +1 more source
A high-performance tensor computing unit for deep learning acceleration
The increasing complexity of neural network applications has led to a demand for higher computational parallelism and more efficient synchronization in artificial intelligence (AI) chips. To achieve higher performance and lower power, a comprehensive and
Qiang Zhou +3 more
doaj +1 more source