Performance Evaluation of Virtualized Hadoop Clusters [PDF]
In this report we investigate the performance of Hadoop clusters, deployed with separated storage and compute layers, on top of a hypervisor managing a single physical host. We have analyzed and evaluated the different Hadoop cluster configurations by running CPU bound and I/O bound workloads.
arxiv
Natural Language Processing using Hadoop and KOSHIK [PDF]
Natural language processing, as a data analytics related technology, is used widely in many research areas such as artificial intelligence, human language processing, and translation. At present, due to explosive growth of data, there are many challenges for natural language processing.
arxiv
Building and Installing a Hadoop/MapReduce Cluster from Commodity Components [PDF]
This tutorial presents a recipe for the construction of a compute cluster for processing large volumes of data, using cheap, easily available personal computer hardware (Intel/AMD based PCs) and freely available open source software (Ubuntu Linux, Apache Hadoop).
arxiv
OS-Assisted Task Preemption for Hadoop [PDF]
This work introduces a new task preemption primitive for Hadoop that allows tasks to be suspended and resumed by exploiting existing memory management mechanisms readily available in modern operating systems. Our technique fills the gap that exists between the two extreme cases of killing tasks (which wastes work) or waiting for their completion (which ...
arxiv
An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics [PDF]
Ronald C. Taylor
openalex +1 more source
Finding Top-$k$ Dominance on Incomplete Big Data Using MapReduce Framework
Incomplete data is one major kind of multi-dimensional dataset, with randomly distributed missing nodes in its dimensions. Retrieving information from this type of dataset becomes very difficult as it grows large.
Payam Ezatpoor +3 more
doaj +1 more source
Early Accurate Results for Advanced Analytics on MapReduce [PDF]
Approximate results based on samples often provide the only way in which advanced analytical applications on very massive data sets can satisfy their time and resource constraints. Unfortunately, methods and tools for the computation of accurate early results are currently not supported in MapReduce-oriented systems although these are intended for `big
arxiv
Decoupling storage and computation in Hadoop with SuperDataNodes [PDF]
George Porter
openalex +1 more source
M3R: Increased performance for in-memory Hadoop jobs [PDF]
Main Memory Map Reduce (M3R) is a new implementation of the Hadoop Map Reduce (HMR) API targeted at online analytics on high mean-time-to-failure clusters. It does not support resilience, and supports only those workloads which can fit into cluster memory. In return, it can run HMR jobs unchanged -- including jobs produced by compilers for higher-level
arxiv