Understanding Apache Spark

2021
Apache Spark is a data analytics platform that has made big data accessible and brings large-scale data processing into the reach of every developer. With Apache Spark, it is as easy to read from a single CSV file on your local machine as it is to read from a million CSV files in a data lake.

Performance comparison of Apache Hadoop and Apache Spark

Proceedings of the Third International Conference on Advanced Informatics for Computing Research, 2019
'Big Data' is a broad term for data sets so enormous that traditional data processing applications find them hard to process. Both Apache Spark and Apache Hadoop are significant parts of the big data family. Some researchers view the two frameworks as rivals, but it is not that easy to compare them, as they ...
Amritpal Singh   +2 more

Partitioning in Apache Spark

2019
Apache Spark performs in-memory computation. The data structure used is Resilient Distributed Datasets (RDDs). These RDDs are partitioned using inbuilt Hash and Range Partitioning. We propose a partition scheme which uses modular division on keys of elements with numbers from 2 to 10.
H. S. Sreeyuktha, J. Geetha Reddy
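
The partitioning ideas named in this abstract can be sketched in plain Python, without Spark itself. This is an illustrative sketch only: the function names (`hash_partition`, `range_partition`, `modulo_partition`, `partition_sizes`) are hypothetical, and `modulo_partition` is just one possible reading of "modular division on keys ... with numbers from 2 to 10", not the authors' actual scheme.

```python
from bisect import bisect_left
from collections import defaultdict


def hash_partition(key, num_partitions):
    """Hash-style partitioning: partition index is hash(key) mod partition count."""
    return hash(key) % num_partitions


def range_partition(key, upper_bounds):
    """Range-style partitioning: index of the first sorted upper bound >= key.

    Keys larger than every bound fall into a final, extra partition.
    """
    return bisect_left(upper_bounds, key)


def modulo_partition(key, divisor):
    """Assumed reading of the proposed scheme: integer key modulo a divisor in 2..10."""
    if not 2 <= divisor <= 10:
        raise ValueError("divisor is assumed to lie between 2 and 10")
    return key % divisor


def partition_sizes(keys, assign):
    """Count how many keys land in each partition under the given assignment function."""
    sizes = defaultdict(int)
    for key in keys:
        sizes[assign(key)] += 1
    return dict(sizes)
```

Comparing `partition_sizes` across assignment functions for the same keys gives a rough view of how evenly each approach spreads the data, which is the usual concern when choosing a partitioner.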

The Engine: Apache Spark

2016
If our stack were a vehicle, now we have reached the engine. As an engine, we will disarm it, analyze it, master it, improve it, and run it to the limit.
Raul Estrada, Isaac Ruiz

Balanced Graph Partitioning with Apache Spark

2014
A significant part of the data produced every day by online services is structured as a graph. Therefore, there is a need for efficient processing and analysis solutions for large-scale graphs. Among these problems, balanced graph partitioning is a well-known NP-complete problem with a wide range of applications.
Carlini, Emanuele   +4 more

Accelerating Apache Spark with FPGAs

Concurrency and Computation: Practice and Experience, 2017
Summary: Apache Spark has become one of the most popular engines for big data processing. Spark provides a platform‐independent, high‐abstraction programming paradigm for large‐scale data processing by leveraging the Java framework. Though it provides software portability across various machines, Java also limits the performance of distributed environments,
Ehsan Ghasemi, Paul Chow

Hello Apache Spark

2019
Doesn't it feel good when you are in the vicinity of your envisioned and cherished destination? When you see in retrospect that you've been through a long journey and the milestone that you once dreamed of is in your reach? You must have the same feeling as you start this chapter, because this last chapter of the book is all about how you can put the ...

Using Apache Spark

2016
Apache Spark is a data processing engine for large data sets. Apache Spark is much faster (up to 100 times faster in memory) than Apache Hadoop MapReduce. In cluster mode, Spark applications run as independent processes coordinated by the SparkContext object in the driver program, which is the main program. The SparkContext may connect to several types

Fraud Detection Using Apache Spark

2019 5th International Conference on Optimization and Applications (ICOA), 2019
Fraud detection methods are continuously developed to defend against criminals. They allow us to identify fraud quickly and easily. In this work, we will focus on the problem of fraud detection in banking transactions. A single algorithm may not be suitable for every problem.
Abdelkbir ARMEL, Dounia ZAIDOUNI
