Privacy-Preserving Machine Learning on Apache Spark
The adoption of third-party machine learning (ML) cloud services is highly dependent on the security guarantees and the performance penalty they incur on workloads for model training and inference.
Claudia V. Brito +4 more
doaj +1 more source
Dynamic Multi-Objective Optimization With jMetal and Spark: a Case Study [PDF]
Technologies for Big Data and Data Science are receiving increasing research interest. This paper introduces the prototyping architecture of a tool aimed at solving Big Data Optimization problems.
C Coello +9 more
core +1 more source
Alchemist: An Apache Spark ⇔ MPI interface [PDF]
Summary: The Apache Spark framework for distributed computation is popular in the data analytics community due to its ease of use, but its MapReduce-style programming model can incur significant overheads when performing computations that do not map directly onto this model. One way to mitigate these costs is to off-load computations onto MPI codes.
Alex Gittens +8 more
openaire +2 more sources
Combining Terrier with Apache Spark to Create Agile Experimental Information Retrieval Pipelines [PDF]
Experimentation using IR systems has traditionally been a procedural and laborious process. Queries must be run on an index, with any parameters of the retrieval models suitably tuned.
Macdonald, Craig
core +1 more source
StreamApprox: Approximate Computing for Stream Analytics [PDF]
Approximate computing aims for efficient execution of workflows where an approximate output is sufficient instead of the exact output. The idea behind approximate computing is to compute over a representative sample instead of the entire input dataset ...
Bhatotia, Pramod +5 more
core +1 more source
Performance Evaluation of Apache Spark MLlib for Fake News Classification in Indonesian (original title: Evaluasi Kinerja MLLIB APACHE SPARK pada Klasifikasi Berita Palsu dalam Bahasa Indonesia)
Machine learning is used to analyze, classify, or predict data. Carrying out machine learning tasks requires tools with strong performance and a robust environment in order to achieve good accuracy and time efficiency.
Antonius Angga Kurniawan +1 more
doaj +1 more source
A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures [PDF]
Scientific problems that depend on processing large amounts of data require overcoming challenges in multiple areas: managing large-scale data distribution, co-placement and scheduling of data with compute resources, and storing and transferring large ...
Fox, Geoffrey C. +4 more
core +1 more source
Time series analysis with Apache Spark and its applications to energy informatics
In the energy economy, forecasts of different time series are rudimentary. In this study, a prediction for the German day-ahead spot market is created with Apache Spark and R.
Cornelia Krome, Volker Sander
doaj +1 more source
CLASSIFICATION OF BIG POINT CLOUD DATA USING CLOUD COMPUTING [PDF]
Point cloud data plays a significant role in various geospatial applications, as it conveys plentiful information that can be used for different types of analysis.
K. Liu, J. Boehm
doaj +1 more source
SpaRC: scalable sequence clustering using Apache Spark [PDF]
Motivation: Whole genome shotgun based next-generation transcriptomics and metagenomics studies often generate 100–1000 GB sequence data derived from tens of thousands of different genes or microbial species.
Lizhen Shi +4 more
openaire +4 more sources

