Comparative Study of Record Linkage Approaches for Big Data
Record linkage is a challenging task for Big Data. This paper therefore attempts to shed light on record linkage approaches for Big Data by comparing three dimensions involving record linkage phases, dataset properties, and parallel processing approach ...
Randa MOHAMED +3 more
doaj +3 more sources
Optimization for Large-Scale Dimension Table Connection Technology in Distributed Environment
The large-scale dimension table connection technology in the distributed environment is one of the key technologies in online big data analysis, which is widely used in real-time recommendation, real-time analysis and other fields.
ZHAO Hengtai, ZHAO Yuhai, YUAN Ye, JI Hangxu, QIAO Baiyou, WANG Guoren
doaj +1 more source
GeoFlink: An Efficient and Scalable Spatial Data Stream Management System
This era is witnessing exponential growth in spatial data due to the increase in GPS-enabled devices. Spatial data can be extremely useful to commercial businesses, governments and NGOs if processed in a timely manner.
Salman Ahmed Shaikh +4 more
doaj +1 more source
VeilGraph: incremental graph stream processing
Graphs are found in a plethora of domains, including online social networks, the World Wide Web and the study of epidemics, to name a few. With the advent of greater volumes of information and the need for continuously updated results under temporal ...
Miguel E. Coimbra +3 more
doaj +1 more source
DDoS attacks and machine‐learning‐based detection methods: A survey and taxonomy
This review paper discusses Distributed Denial of Service (DDoS) attacks, machine learning‐based methods for detecting them, and the existing challenges. Some of the most commonly used public datasets are also compared, and their strengths and shortcomings are discussed.
Mohammad Najafimehr +2 more
wiley +1 more source
Scalable multi‐site photovoltaic power forecasting based on stream computing
This work proposes a multi‐site photovoltaic forecasting system that contains a message queue and a stream engine, where a forecasting model is continuously updated using real‐time data. A benchmark serving 60 sites was performed to verify the scalability of the system.
Yuxi Sun +4 more
wiley +1 more source
An investigation of distributed computing for combinatorial testing
Combinatorial test generation is the process of generating sets of input parameters for a system under test, by considering interactions between t values of multiple parameters; the paper investigates the use of distributed algorithms to generate such test suites.
Edmond La Chance, Sylvain Hallé
wiley +1 more source
s2p: Provenance Research for Stream Processing System
The main purpose of our provenance research for DSP (distributed stream processing) systems is to analyze abnormal results. Provenance for these systems is nontrivial because of the ephemerality of stream data and the instant data processing mode in ...
Qian Ye, Minyan Lu
doaj +1 more source
Automated issue assignment using topic modelling on Jira issue tracking data
In this work, we provide a methodology for automated issue assignment, designed using Jira issue tracking data extracted from the Apache Software Foundation that describe both features and bugs. Our methodology employs topic modelling to extract the semantics of text features, while optimising the LDA algorithm (number of topics) using the assignment ...
Themistoklis Diamantopoulos +2 more
wiley +1 more source
Influencing Factors in the Scalability of Distributed Stream Processing Jobs
More and more use cases require fast, accurate, and reliable processing of large volumes of data. To do this, a distributed stream processing framework is needed which can distribute the load over several machines.
Giselle Van Dongen, Dirk Van Den Poel
doaj +1 more source

