Framing Apache Spark in life sciences [PDF]
Advances in high-throughput and digital technologies have required the adoption of big data for handling complex tasks in life sciences. However, the drift to big data led researchers to face technical and infrastructural challenges for storing, sharing,
Andrea Manconi +4 more
doaj +5 more sources
Large-scale digital forensic investigation for Windows registry on Apache Spark [PDF]
In this study, we investigate large-scale digital forensic investigation on Apache Spark using a Windows registry. Because the Windows registry depends on the system on which it operates, the existing forensic methods on the Windows registry have been ...
Jun-Ha Lee, Hyuk-Yoon Kwon
doaj +3 more sources
Implementing Apache Spark jobs execution and Apache Spark cluster creation for Openstack Sahara [PDF]
This paper discusses the problem of creating virtual clusters in clouds for big data analysis with Apache Hadoop and Apache Spark. Existing methods for creating Apache Spark clusters are described in this work.
A. Aleksiyants +4 more
doaj +4 more sources
Bioinformatics applications on Apache Spark. [PDF]
With the rapid development of next-generation sequencing technology, ever-increasing quantities of genomic data pose a tremendous challenge to data processing. Therefore, there is an urgent need for highly scalable and powerful computational systems. Among the state-of-the-art parallel computing platforms, Apache Spark is a fast, general-purpose, in ...
Guo R, Zhao Y, Zou Q, Fang X, Peng S.
europepmc +4 more sources
Big Data in metagenomics: Apache Spark vs MPI. [PDF]
The progress of next-generation sequencing has led to the availability of massive data sets used by a wide range of applications in biology and medicine.
José M Abuín +4 more
doaj +2 more sources
A Parallel Multiobjective PSO Weighted Average Clustering Algorithm Based on Apache Spark [PDF]
Multiobjective clustering algorithm using particle swarm optimization has been applied successfully in some applications. However, existing algorithms are implemented on a single machine and cannot be directly parallelized on a cluster, which makes it ...
Huidong Ling +5 more
doaj +2 more sources
HRV-Spark: Computing Heart Rate Variability Measures Using Apache Spark. [PDF]
Heart rate variability (HRV) analysis has served as a promising marker in clinical research over the last few decades. The rapidly growing volume of heart rate data generated by various devices, particularly the electrocardiograph (ECG), needs to be stored properly and processed in a timely manner.
Qu X, Wu Y, Liu J, Cui L.
europepmc +4 more sources
DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark [PDF]
Background XHMM is a widely used tool for copy-number variant (CNV) discovery from whole exome sequencing data but can require hours to days to run for large cohorts.
Michael D. Linderman +3 more
doaj +2 more sources
Adding data provenance support to Apache Spark. [PDF]
Debugging data processing logic in data-intensive scalable computing (DISC) systems is a difficult and time-consuming effort. Today's DISC systems offer very little tooling for debugging programs, and as a result, programmers spend countless hours collecting evidence (e.g., from log files) and performing trial-and-error debugging.
Interlandi M +7 more
europepmc +6 more sources
DNA short read alignment on Apache Spark [PDF]
The evolution of technologies has unleashed a wealth of challenges by generating massive amounts of data. Recently, biological data has increased exponentially, which has introduced several computational challenges.
Maryam AlJame, Imtiaz Ahmad
doaj +1 more source

