Optimizing Irregular-Shaped Matrix-Matrix Multiplication on Multi-Core DSPs [PDF]
General Matrix Multiplication (GEMM) has a wide range of applications in scientific simulation and artificial intelligence. Although traditional libraries can achieve high performance on large regular-shaped GEMMs, they often perform poorly on irregular-shaped GEMMs, which are often found in new algorithms and applications of high-performance ...
arxiv
RIoTBench: A Real-time IoT Benchmark for Distributed Stream Processing Platforms [PDF]
The Internet of Things (IoT) is an emerging technology paradigm where millions of sensors and actuators help monitor and manage physical, environmental, and human systems in real time. The inherent closed-loop responsiveness and decision making of IoT applications make them ideal candidates for using low-latency and scalable stream processing platforms.
arxiv +1 more source
Guided Lock of a Suspended Optical Cavity Enhanced by a Higher Order Extrapolation [PDF]
Lock acquisition of a suspended optical cavity can be a highly stochastic process and is therefore nontrivial. Guided lock is a method to make lock acquisition less stochastic by decelerating the motion of the cavity length based on an extrapolation of ...
Arai, Koji +5 more
core +3 more sources
Xampling: Signal Acquisition and Processing in Union of Subspaces
We introduce Xampling, a unified framework for signal acquisition and processing of signals in a union of subspaces. This framework has two main functions.
Eldar, Yonina C. +2 more
core +1 more source
WinoCNN: Kernel Sharing Winograd Systolic Array for Efficient Convolutional Neural Network Acceleration on FPGAs [PDF]
The combination of Winograd's algorithm and systolic array architecture has demonstrated the capability of improving DSP efficiency in accelerating convolutional neural networks (CNNs) on FPGA platforms. However, handling arbitrary convolution kernel sizes in FPGA-based Winograd processing elements and supporting efficient data access remain ...
arxiv
A DSP shared is a DSP earned: HLS Task-Level Multi-Pumping for High-Performance Low-Resource Designs [PDF]
High-level synthesis (HLS) enhances digital hardware design productivity through a high abstraction level. Even if the HLS abstraction prevents fine-grained manual register-transfer level (RTL) optimizations, it also enables automatable optimizations that would be unfeasible or hard to automate at RTL.
arxiv +1 more source
Optoelectronic Devices for In‐Sensor Computing
The raw data obtained directly from sensors in the noisy analogue domain is often unstructured, which lacks a predefined format or organization and does not conform to a specific data model. Optoelectronic devices for in‐sensor visual processing can integrate perception, memory, and processing functions in the same physical units, which can compress ...
Qinqi Ren +7 more
wiley +1 more source
A new partitioning approach for layout synthesis from register-transfer netlists [PDF]
Most ICs today are described and documented using hierarchical netlists. In addition to gates, latches, and flip-flops, these netlists include sliceable register-transfer components such as registers, counters, adders, ALUs, shifters, register files,
Gajski, Daniel, Wu, Allen C.H.
core
The Imaging Magnetograph eXperiment (IMaX) for the Sunrise balloon-borne solar observatory [PDF]
The Imaging Magnetograph eXperiment (IMaX) is a spectropolarimeter built by four institutions in Spain that flew on board the Sunrise balloon-borne telescope in June 2009 for almost six days over the Arctic Circle.
A. Feller +78 more
core +2 more sources
Complexity Analysis and DSP Implementation of the Fractional-Order Lorenz Hyperchaotic System
The fractional-order hyperchaotic Lorenz system is solved as a discrete map by applying the Adomian decomposition method (ADM). Lyapunov Characteristic Exponents (LCEs) of this system are calculated according to this deduced discrete map.
Shaobo He, K. Sun, Huihai Wang
semanticscholar +1 more source