SPARQL2Flink: Evaluation of SPARQL Queries on Apache Flink
Existing SPARQL query engines and triple stores are continuously improved to handle more massive datasets. Several approaches have been developed in this context proposing the storage and querying of RDF data in a distributed fashion, mainly using the ...
Oscar Ceballos +2 more
exaly +3 more sources
Continuous outlier mining of streaming data in flink [PDF]
In this work, we focus on distance-based outliers in a metric space, where the status of an entity as to whether it is an outlier is based on the number of other entities in its neighborhood. In recent years, several solutions have tackled the problem of distance-based outliers in data streams, where outliers must be mined continuously as new elements ...
Theodoros Toliopoulos +2 more
exaly +4 more sources
Apache Flink and clustering-based framework for fast anonymization of IoT stream data
In this paper, we present a novel framework that considers the expiration period of an Internet of Things (IoT) data stream when anonymizing it. IoT is among the fastest-growing technologies in the world. Also, anonymity is one of the safeguards
Alireza Sadeghi-Nasab +2 more
exaly +3 more sources
FlinkCheck: Property-Based Testing for Apache Flink
Apache Flink is an open-source, soft real-time stream processing framework underlying many modern systems dealing with cloud and real-time computing, data analytics, and the Internet of Things, among others. As the complexity of stream-processing systems
Enrique Martín-Martín +2 more
exaly +3 more sources
DPASF: a flink library for streaming data preprocessing [PDF]
Background: Data preprocessing techniques are devoted to correcting or alleviating errors in data. Discretization and feature selection are two of the most widely used data preprocessing techniques.
Alejandro Alcalde-Barros +3 more
doaj +3 more sources
As monitoring technologies and data collection methodologies advance, landslide disaster data reflects attributes such as diverse sources, heterogeneity, substantial volumes, and stringent real-time requirements.
Haibo Yang, Yingchun Cai
exaly +3 more sources
Node Priority Scheduling Strategy Based on Heterogeneous Flink Cluster [PDF]
The default task scheduling strategy of the Flink stream processing system ignores the cluster heterogeneity and available resources of nodes to a certain extent, resulting in an unbalanced overall cluster load. This study investigates the real-time ...
WANG Wenhao, SHI Xuerong
doaj +1 more source
Explainable Distance-Based Outlier Detection in Data Streams
Explaining outliers is a topic that attracts a lot of interest; however, existing proposals focus on the identification of the relevant dimensions. We extend this rationale for unsupervised distance-based outlier detection, and through investigating ...
Theodoros Toliopoulos +1 more
doaj +1 more source
Benchmarking Distributed Stream Data Processing Systems [PDF]
The need for scalable and efficient stream analysis has led to the development of many open-source streaming data processing systems (SDPSs) with highly diverging capabilities and performance characteristics.
Heiskanen, Henri +5 more
core +7 more sources
s2p: Provenance Research for Stream Processing System
The main purpose of our provenance research for DSP (distributed stream processing) systems is to analyze abnormal results. Provenance for these systems is nontrivial because of the ephemerality of stream data and the instant data processing mode in ...
Qian Ye, Minyan Lu
doaj +1 more source

