Apache spark - Open Access .click

Framing Apache Spark in life sciences [PDF]

Heliyon, 2023
Advances in high-throughput and digital technologies have required the adoption of big data for handling complex tasks in life sciences. However, the drift to big data led researchers to face technical and infrastructural challenges for storing, sharing,
Andrea Manconi +4 more
doaj +5 more sources

Large-scale digital forensic investigation for Windows registry on Apache Spark [PDF]

PLoS ONE, 2022
In this study, we investigate large-scale digital forensic investigation on Apache Spark using a Windows registry. Because the Windows registry depends on the system on which it operates, the existing forensic methods on the Windows registry have been ...
Jun-Ha Lee, Hyuk-Yoon Kwon
doaj +3 more sources

Implementing Apache Spark jobs execution and Apache Spark cluster creation for Openstack Sahara[1] [PDF]

Proceedings of the Institute for System Programming of the RAS (Труды Института системного программирования РАН), 2018
In this paper the problem of creating virtual clusters in clouds for big data analysis with Apache Hadoop and Apache Spark is discussed. Existing methods for Apache Spark cluster creation are described in this work.
A. Aleksiyants +4 more
doaj +4 more sources

Bioinformatics applications on Apache Spark. [PDF]

Gigascience, 2018
With the rapid development of next-generation sequencing technology, ever-increasing quantities of genomic data pose a tremendous challenge to data processing. Therefore, there is an urgent need for highly scalable and powerful computational systems. Among the state-of-the-art parallel computing platforms, Apache Spark is a fast, general-purpose, in ...
Guo R, Zhao Y, Zou Q, Fang X, Peng S.
europepmc +4 more sources

Big Data in metagenomics: Apache Spark vs MPI. [PDF]

PLoS ONE, 2020
The progress of next-generation sequencing has led to the availability of massive data sets used by a wide range of applications in biology and medicine.
José M Abuín +4 more
doaj +2 more sources

A Parallel Multiobjective PSO Weighted Average Clustering Algorithm Based on Apache Spark [PDF]

Entropy, 2023
Multiobjective clustering algorithm using particle swarm optimization has been applied successfully in some applications. However, existing algorithms are implemented on a single machine and cannot be directly parallelized on a cluster, which makes it ...
Huidong Ling +5 more
doaj +2 more sources

HRV-Spark: Computing Heart Rate Variability Measures Using Apache Spark. [PDF]

Proceedings (IEEE Int Conf Bioinformatics Biomed), 2020
Heart rate variability (HRV) analysis has served as a promising marker in clinical research over the last few decades. The rapidly growing volume of heart rate data generated by various devices, particularly the electrocardiograph (ECG), needs to be stored properly and processed in a timely manner.
Qu X, Wu Y, Liu J, Cui L.
europepmc +4 more sources

DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark [PDF]

BMC Bioinformatics, 2019
Background: XHMM is a widely used tool for copy-number variant (CNV) discovery from whole exome sequencing data but can require hours to days to run for large cohorts.
Michael D. Linderman +3 more
doaj +2 more sources

Adding data provenance support to Apache Spark. [PDF]

VLDB J, 2018
Debugging data processing logic in data-intensive scalable computing (DISC) systems is a difficult and time-consuming effort. Today's DISC systems offer very little tooling for debugging programs, and as a result, programmers spend countless hours collecting evidence (e.g., from log files) and performing trial-and-error debugging.
Interlandi M +7 more
europepmc +6 more sources

DNA short read alignment on Apache Spark [PDF]

Applied Computing and Informatics, 2023
The evolution of technologies has unleashed a wealth of challenges by generating massive amounts of data. Recently, biological data has increased exponentially, which has introduced several computational challenges.
Maryam AlJame, Imtiaz Ahmad
doaj +1 more source
