
Performance Evaluation of Virtualized Hadoop Clusters

open access: yes | arXiv, 2014
In this report we investigate the performance of Hadoop clusters, deployed with separated storage and compute layers, on top of a hypervisor managing a single physical host. We have analyzed and evaluated the different Hadoop cluster configurations by running CPU bound and I/O bound workloads.
arxiv  

Natural Language Processing using Hadoop and KOSHIK

open access: yes | arXiv, 2016
Natural language processing, as a data-analytics-related technology, is widely used in many research areas such as artificial intelligence, human language processing, and translation. At present, due to the explosive growth of data, natural language processing faces many challenges.
arxiv  

Building and Installing a Hadoop/MapReduce Cluster from Commodity Components

open access: yes | arXiv, 2009
This tutorial presents a recipe for the construction of a compute cluster for processing large volumes of data, using cheap, easily available personal computer hardware (Intel/AMD based PCs) and freely available open source software (Ubuntu Linux, Apache Hadoop).
arxiv  

OS-Assisted Task Preemption for Hadoop

open access: yes | arXiv, 2014
This work introduces a new task preemption primitive for Hadoop that allows tasks to be suspended and resumed by exploiting memory management mechanisms readily available in modern operating systems. Our technique fills the gap between the two extreme cases of killing tasks (which wastes work) or waiting for their completion (which ...
arxiv  
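The suspend/resume idea in the abstract above can be illustrated with POSIX job-control signals. This is a minimal sketch under stated assumptions, not the paper's implementation: it uses SIGSTOP/SIGCONT on an ordinary child process standing in for a Hadoop task (the abstract does not say which signals or hooks the authors use), and `start_task`, `suspend`, `resume`, and `process_state` are hypothetical helpers. The state check reads `/proc`, so it assumes Linux.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a long-running Hadoop task: a child Python
# process that keeps incrementing a counter held in its own memory.
TASK_CODE = "import time\ni = 0\nwhile True:\n    i += 1\n    time.sleep(0.01)\n"


def start_task():
    return subprocess.Popen([sys.executable, "-c", TASK_CODE])


def suspend(proc):
    # SIGSTOP cannot be caught or ignored: the kernel stops scheduling the
    # process, but its address space (including the counter) is kept, so
    # the OS may page it out under memory pressure instead of losing work.
    proc.send_signal(signal.SIGSTOP)


def resume(proc):
    # SIGCONT makes the stopped process runnable again; it continues from
    # exactly where it was stopped.
    proc.send_signal(signal.SIGCONT)


def process_state(pid):
    # Linux-specific: the state letter follows the ")" that closes the
    # command name in /proc/<pid>/stat ("T" means stopped).
    with open(f"/proc/{pid}/stat") as f:
        return f.read().rsplit(")", 1)[1].split()[0]
```

Usage: start a task, call `suspend` to free the CPU for a higher-priority job, then `resume` later; unlike killing the task, no completed work is discarded. Always `kill()` and `wait()` the child when done.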

Finding Top-k Dominance on Incomplete Big Data Using MapReduce Framework

open access: yes | IEEE Access, 2018
Incomplete data is one major kind of multi-dimensional dataset, with randomly distributed missing values in its dimensions. Retrieving information from this type of dataset becomes very difficult when it grows large.
Payam Ezatpoor +3 more
doaj +1 more source
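The top-k dominance query named above can be sketched as follows. This is a minimal, in-process illustration, not the paper's algorithm: it assumes the dominance rule commonly used for incomplete data (compare only the dimensions observed in both records, smaller values are better) and simulates the map and reduce phases with plain Python functions rather than the Hadoop API. `Record`, `dominates`, `map_phase`, `reduce_phase`, and `top_k_dominating` are hypothetical names.

```python
from collections import Counter
from itertools import chain
from typing import Iterator, Optional, Sequence, Tuple

# Hypothetical record type: one value per dimension, None where missing.
Record = Sequence[Optional[float]]


def dominates(a: Record, b: Record) -> bool:
    """Assumed rule (smaller is better): on the dimensions observed in both
    records, a must be <= b everywhere and strictly < somewhere."""
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return (bool(common)
            and all(x <= y for x, y in common)
            and any(x < y for x, y in common))


def map_phase(partition, dataset) -> Iterator[Tuple[int, int]]:
    # Each "mapper" scores the candidates in its partition against the
    # whole dataset and emits (record_index, dominance_count) pairs.
    for i, a in partition:
        yield i, sum(1 for j, b in enumerate(dataset)
                     if j != i and dominates(a, b))


def reduce_phase(pairs, k):
    # The "reducer" merges counts and keeps the k highest scores,
    # breaking ties by record index so the result is deterministic.
    scores: Counter = Counter()
    for i, count in pairs:
        scores[i] += count
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def top_k_dominating(dataset, k, num_partitions=2):
    indexed = list(enumerate(dataset))
    partitions = [indexed[p::num_partitions] for p in range(num_partitions)]
    pairs = chain.from_iterable(map_phase(part, dataset) for part in partitions)
    return reduce_phase(pairs, k)
```

For example, with `data = [(1, None, 3), (2, 2, None), (None, 1, 4), (3, 3, 3)]`, `top_k_dominating(data, 2)` ranks record 0 first because it dominates the other three. The brute-force pairwise count is deliberate: with missing values, dominance is no longer transitive (and can even be cyclic), so the pruning shortcuts used for complete-data skylines do not carry over directly.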

Early Accurate Results for Advanced Analytics on MapReduce

open access: yes | Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 10, pp. 1028-1039 (2012)
Approximate results based on samples often provide the only way in which advanced analytical applications on massive data sets can satisfy their time and resource constraints. Unfortunately, methods and tools for computing accurate early results are currently not supported in MapReduce-oriented systems, although these are intended for "big
arxiv  
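The idea of an early, sample-based result with a known accuracy can be sketched for the simplest aggregate, a mean. This is a hedged illustration only: it uses a normal-approximation confidence interval as a stand-in for whatever error-estimation technique the paper itself uses, and `estimate_mean` and `early_mean` (with their `rel_error`, `start`, `growth`, and `seed` parameters) are hypothetical. The sample grows geometrically until the interval is tight enough relative to the estimate.

```python
import math
import random
import statistics


def estimate_mean(sample, z=1.96):
    # Normal-approximation confidence interval for the mean: returns the
    # point estimate and the half-width z * s / sqrt(n). Needs >= 2 values.
    mean = statistics.fmean(sample)
    half_width = z * statistics.stdev(sample) / math.sqrt(len(sample))
    return mean, half_width


def early_mean(data, rel_error=0.01, start=100, growth=2, seed=0):
    """Grow a uniform random sample until the ~95% interval half-width is
    within rel_error of the estimate, or the data is exhausted."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # every prefix is a uniform sample
    n = min(start, len(shuffled))
    while True:
        mean, half_width = estimate_mean(shuffled[:n])
        if half_width <= rel_error * abs(mean) or n == len(shuffled):
            return mean, half_width, n
        n = min(n * growth, len(shuffled))
```

Usage: `early_mean(list(range(1, 100_001)))` stops after a small fraction of the 100,000 values while reporting an interval of at most 1% of the estimate. The interval ignores the finite-population correction, which makes it slightly conservative when sampling without replacement.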

M3R: Increased performance for in-memory Hadoop jobs

open access: yes | Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 12, pp. 1736-1747 (2012)
Main Memory Map Reduce (M3R) is a new implementation of the Hadoop Map Reduce (HMR) API targeted at online analytics on high mean-time-to-failure clusters. It does not support resilience, and supports only those workloads which can fit into cluster memory. In return, it can run HMR jobs unchanged -- including jobs produced by compilers for higher-level
arxiv  
