Apache Spark - Open Access .click


Hi-LASSO: High-performance Python and Apache Spark packages for feature selection with high-dimensional data. [PDF]

PLoS One, 2022
Jo J, Jung S, Park J, Kim Y, Kang M.
europepmc +2 more sources

DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark [PDF]

BMC Bioinformatics, 2019
Background: XHMM is a widely used tool for copy-number variant (CNV) discovery from whole exome sequencing data but can require hours to days to run for large cohorts.
Michael D. Linderman +3 more
doaj +2 more sources

pmTM-align: scalable pairwise and multiple structure alignment with Apache Spark and OpenMP. [PDF]

BMC Bioinformatics, 2020
Chen W, Yao C, Guo Y, Wang Y, Xue Z.
europepmc +3 more sources

Adding data provenance support to Apache Spark. [PDF]

VLDB J, 2018
Debugging data processing logic in data-intensive scalable computing (DISC) systems is a difficult and time-consuming effort. Today's DISC systems offer very little tooling for debugging programs, and as a result, programmers spend countless hours collecting evidence (e.g., from log files) and performing trial-and-error debugging.
Interlandi M +7 more
europepmc +6 more sources

DNA short read alignment on Apache Spark [PDF]

Applied Computing and Informatics, 2023
The evolution of technologies has unleashed a wealth of challenges by generating massive amount of data. Recently, biological data has increased exponentially, which has introduced several computational challenges.
Maryam AlJame, Imtiaz Ahmad
doaj +1 more source

TRANSMUT‐Spark: Transformation mutation for Apache Spark [PDF]

Software Testing, Verification and Reliability, 2022
Summary: This paper proposes TRANSMUT‐Spark for automating mutation testing of big data processing code within Spark programs. Apache Spark is an engine for big data analytics/processing that hides the inherent complexity of parallel big data programming. Nonetheless, programmers must cleverly combine Spark built‐in functions within programs and guide the ...
João Batista de Souza Neto +3 more
openaire +4 more sources

Efficient processing of complex XSD using Hive and Spark [PDF]

PeerJ Computer Science, 2021
The eXtensible Markup Language (XML) files are widely used by the industry due to their flexibility in representing numerous kinds of data. Multiple applications such as financial records, social networks, and mobile networks use complex XML schemas with
Diana Martinez-Mosquera, Rosa Navarrete, Sergio Luján-Mora +2 more
doaj +2 more sources

Large-scale virtual screening on public cloud resources with Apache Spark. [PDF]

J Cheminform, 2017
Capuccini M, Ahmed L, Schaal W, Laure E, Spjuth O. +4 more
europepmc +3 more sources

Efficient Group K Nearest-Neighbor Spatial Query Processing in Apache Spark

ISPRS International Journal of Geo-Information, 2021
Aiming at the problem of spatial query processing in distributed computing systems, the design and implementation of new distributed spatial query algorithms is a current challenge.
Panagiotis Moutafis +3 more
doaj +1 more source

Mining Frequency of Drug Side Effects Over a Large Twitter Dataset Using Apache Spark [PDF]

, 2017
Despite clinical trials by pharmaceutical companies as well as current FDA reporting systems, there are still drug side effects that have not been caught. To find a larger sample of reports, a possible way is to mine online social media. With its current
Dennis Hsu
openalex +5 more sources
