Apache hadoop - Open Access .click


Apache Mahout’s k-Means vs. fuzzy k-Means performance evaluation [PDF]

2016
(c) 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or ...
Barolli, Leonard +3 more
core +1 more source

Apache Hadoop-MapReduce on YARN framework latency

Procedia Computer Science, 2021
Abstract: Big Data is currently a fertile field for researchers and scientific companies around the world, owing to the emergence of new technologies, the Internet of Things (IoT), and means of communication such as social networking sites, which has led to a notable increase in the amount of data produced each day.
Abdelaziz EL YAZIDI +3 more
openaire +1 more source

BigData Analysis in Healthcare: Apache Hadoop, Apache Spark and Apache Flink

Frontiers in Health Informatics, 2019
Introduction: Health care data is increasing. The correct analysis of such data will improve the quality of care and reduce costs. This kind of data has certain features, such as high volume, variety, and high-speed production, which make it impossible to analyze with ordinary hardware and software platforms. Choosing the right platform for managing this
Elham Nazari +2 more
openaire +2 more sources

A Fuzzy TOPSIS Approach for Big Data Analytics Platform Selection [PDF]

Journal of Advances in Computer Engineering and Technology, 2019
Big data sizes are constantly increasing. Big data analytics is where advanced analytic techniques are applied to big data sets. Analytics based on large data samples reveals and leverages business change.
Salah Uddin +4 more
doaj

A comparison of HDFS compact data formats: Avro versus Parquet / HDFS glaustųjų duomenų formatų palyginimas: Avro prieš Parquet

Mokslas: Lietuvos Ateitis, 2017
In this paper, file formats like Avro and Parquet are compared with text formats to evaluate the performance of the data queries. Different data query patterns have been evaluated. Cloudera’s open-source Apache Hadoop distribution CDH 5.4 has been chosen
Daiga Plase, Laila Niedrite, Romans Taranovs +2 more
doaj +1 more source

Advancing Organic Chemistry Using High‐Throughput Experimentation

Angewandte Chemie, Volume 137, Issue 40, September 26, 2025.
This review outlines major advances in the design, execution, analysis, and data management phases of high‐throughput experimentation (HTE). The limitations and potential opportunities of applying modern HTE to organic synthesis are highlighted. Abstract: High‐throughput experimentation (HTE), the miniaturization and parallelization of reactions, is a ...
Reem Nsouli +2 more
wiley +2 more sources

A Game-Theoretic Approach for Runtime Capacity Allocation in MapReduce [PDF]

2017
Nowadays, many companies have large amounts of raw, unstructured data available. Among Big Data enabling technologies, a central place is held by the MapReduce framework and, in particular, by its open source implementation, Apache Hadoop.
Ardagna, Danilo +3 more
core +3 more sources

Challenging SQL-on-Hadoop Performance with Apache Druid [PDF]

2019
In Big Data, SQL-on-Hadoop tools usually provide satisfactory performance for processing vast amounts of data, although new emerging tools may be an alternative. This paper evaluates whether Apache Druid, an innovative column-oriented data store suited for online analytical processing workloads, is an alternative to some of the well-known SQL-on-Hadoop ...
José Correia, Carlos Costa, Maribel Yasmina Santos +2 more
openaire +2 more sources

Real-time Twitter data analysis using Hadoop ecosystem

Cogent Engineering, 2018
In the era of the Internet, social media has become an integral part of modern society. People use social media to share their opinions and to keep up to date with current trends on a daily basis.
Anisha P. Rodrigues, Niranjan N. Chiplunkar +1 more
doaj +1 more source

Deploying Apache Spark virtual clusters in cloud environments using orchestration technologies

Труды Института системного программирования РАН, 2018
Apache Spark is a framework providing fast computations on Big Data using the MapReduce model. With cloud environments, Big Data processing becomes more flexible, since they allow virtual clusters to be created on demand. One of the most powerful open-source cloud
O. Borisenko, R. Pastukhov, S. Kuznetsov +2 more
doaj +1 more source
