Some of the following articles may not be open access.

Beyond Hadoop

Communications of the ACM, 2013
The leading open source system for processing big data continues to evolve, but new approaches with added features are on the rise.

Securing Hadoop in cloud

Proceedings of the 2014 Symposium and Bootcamp on the Science of Security, 2014
Hadoop is a map-reduce implementation that rapidly processes data in parallel. Cloud provides reliability, flexibility, scalability, elasticity and cost saving to customers. Moving Hadoop into Cloud can be beneficial to Hadoop users. However, Hadoop has two vulnerabilities that can dramatically impact its security in a Cloud.
Xianqing Yu, Peng Ning, Mladen A. Vouk

Clustering with Apache Hadoop

Proceedings of the International Conference & Workshop on Emerging Trends in Technology - ICWET '11, 2011
The self-organizing map (SOM) is an unsupervised neural network which projects high-dimensional data onto a low-dimensional grid and visually reveals the topological order of the original data. Thus, SOM is an excellent tool in the exploratory phase of data mining.
Sindhu Nair, Jalpa D. Mehta

Autoscaling for Hadoop Clusters

2016 IEEE International Conference on Cloud Engineering (IC2E), 2016
Unforeseen events such as node failures and resource contention can have a severe impact on the performance of data processing frameworks, such as Hadoop, especially in cloud environments where such incidents are common. SLA compliance in the presence of such events requires the ability to quickly and dynamically resize infrastructure resources ...
Anshul Gandhi   +4 more

Experiments on Networking of Hadoop

2014 IEEE 22nd International Conference on Network Protocols, 2014
Hadoop is a popular application for processing big data problems on networked distributed computers. Investigations of performance for networking have been of interest with the networking paradigm through on-demand an enforcements. Network usage characterization can further help in understanding what policy information is needed during application use cases.
Abdul Navaz   +2 more

Biodoop: Bioinformatics on Hadoop

2009 International Conference on Parallel Processing Workshops, 2009
Bioinformatics applications currently require both processing of huge amounts of data and heavy computation. Fulfilling these requirements calls for simple ways to implement parallel computing. MapReduce is a general-purpose parallelization model that seems particularly well-suited to this task and for which an open source implementation (Hadoop) is ...
Simone Leo   +2 more

Hadoop

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, 2013
From its beginnings as a framework for building web crawlers for small-scale search engines to being one of the most promising technologies for building datacenter-scale distributed computing and storage platforms, Apache Hadoop has come far in the last seven years.

Accelerating BigBench on Hadoop

2016
Benchmarking Big Data systems is an open challenge. Existing micro-benchmarks (e.g. TeraSort) do not represent a real-world end-to-end scenario. To address this, BigBench, a new benchmark for Big Data analytics that aims to become an industry standard, has been proposed.
Yan Tang   +6 more

Optimization Analysis of Hadoop

2016
Hadoop is a distributed data processing platform that supports the MapReduce parallel computing framework. To handle general problems, Hadoop often needs to be accelerated under certain circumstances, such as Hive jobs. By writing the current time to logs at specially selected points, we traced the workflow of a typical MapReduce job ...
Jinglun Li   +2 more

ST-Hadoop

Proceedings of the 2017 ACM International Conference on Management of Data, 2017
This paper presents ST-Hadoop, the first full-fledged open-source MapReduce framework with native support for spatio-temporal data. ST-Hadoop is a comprehensive extension to Hadoop and SpatialHadoop that injects spatio-temporal data awareness into each of their layers, mainly the language, indexing, and operations layers.
