Hadoop++ and HAIL

Hadoop++: Working with very large data sets (petabytes of information) is now a common reality for many enterprises. In this context, query processing is both challenging and crucial. Many well-known companies, such as Yahoo! and Facebook, have adopted the Apache Hadoop project to query their petabytes of information. Recently, some researchers from the database community indicated that Hadoop may suffer from performance issues when running analytical queries. We believe this is not an inherent problem of the MapReduce paradigm but rather a consequence of some implementation choices made in Hadoop. Therefore, the overall goal of the Hadoop++ project is to improve Hadoop's performance for analytical queries. Our preliminary results already show that Hadoop++ improves over Hadoop by up to a factor of 20. In addition, we are currently investigating the impact of a number of other optimization techniques.
paper

HAIL (Hadoop Aggressive Indexing Library) is an
enhancement of HDFS and Hadoop MapReduce that dramatically
improves runtimes of several classes of MapReduce jobs. HAIL
changes the upload pipeline of HDFS in order to create different
clustered indexes on each data block replica. An interesting feature
of HAIL is that we typically create a win-win situation: we improve
both data upload to HDFS and the runtime of the actual Hadoop
MapReduce job. In terms of data upload, HAIL improves over
HDFS by up to 60% with the default replication factor of three.
In terms of query execution, we demonstrate that HAIL runs up
to 68x faster than Hadoop and even outperforms Hadoop++.
initial paper follow-up paper
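
To make the idea of per-replica clustered indexes more concrete, here is a minimal in-memory sketch in Python. It is not HAIL's implementation and does not use any Hadoop or HDFS API; the function names `build_replicas` and `range_query`, the row format, and the column names are all hypothetical. It only illustrates the principle: each of the three replicas of a block is kept sorted on a different attribute, and a query is routed to the replica whose sort order matches its filter attribute, falling back to a full scan otherwise.

```python
from bisect import bisect_left, bisect_right


def build_replicas(rows, index_columns):
    """Build one replica of the block per column, each sorted on that column.

    Returns a dict mapping column name -> (sorted keys, rows in key order).
    This mimics "a different clustered index on each replica" in memory only.
    """
    replicas = {}
    for col in index_columns:
        ordered = sorted(rows, key=lambda r: r[col])
        replicas[col] = ([r[col] for r in ordered], ordered)
    return replicas


def range_query(replicas, column, low, high):
    """Return all rows with low <= row[column] <= high."""
    if column in replicas:
        # A replica is clustered on this column: binary search its key list.
        keys, ordered = replicas[column]
        return ordered[bisect_left(keys, low):bisect_right(keys, high)]
    # No replica is clustered on this column: scan any replica in full.
    _, ordered = next(iter(replicas.values()))
    return [r for r in ordered if low <= r[column] <= high]


# Hypothetical block with three indexed columns (replication factor three)
# and one column ("score") that no replica is sorted on.
rows = [
    {"id": 1, "age": 42, "visits": 7, "score": 10},
    {"id": 2, "age": 25, "visits": 3, "score": 2},
    {"id": 3, "age": 31, "visits": 9, "score": 6},
    {"id": 4, "age": 25, "visits": 1, "score": 8},
]
replicas = build_replicas(rows, ["id", "age", "visits"])
young = range_query(replicas, "age", 25, 31)        # uses the "age" replica
high_score = range_query(replicas, "score", 5, 10)  # falls back to a scan
```

In this sketch a filter on `age` touches only the matching slice of one replica, while a filter on `score` has to read the whole block, which is the contrast the indexing is meant to exploit.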
Current Team
- Prof. Jens Dittrich
- Dr. Jorge Quiane
- Stefan Schuh
- Stefan Richter
- Felix Martin Schuhknecht
Publications
- Jens Dittrich, Stefan Richter, Stefan Schuh
  Efficient OR Hadoop: Why Not Both?
  Datenbank Spektrum, January 2013
- Stefan Richter, Jorge-Arnulfo Quiane-Ruiz, Stefan Schuh, Jens Dittrich
  Towards Zero-Overhead Adaptive Indexing in Hadoop
  TR, arXiv:1212.3480 [cs.DB], 2012
- Alekh Jindal, Jorge-Arnulfo Quiane-Ruiz, Jens Dittrich
  WWHow! Freeing Data Storage from Cages
  CIDR 2013, Outrageous Ideas and Vision Track, Asilomar, USA.
- Jens Dittrich, Jorge-Arnulfo Quiane-Ruiz
  Efficient Big Data Processing in Hadoop MapReduce
  VLDB 2012/PVLDB, Istanbul, Turkey. (Tutorial) slides
- Jens Dittrich, Jorge-Arnulfo Quiane-Ruiz, Stefan Richter, Stefan Schuh, Alekh Jindal, Jörg Schad
  Only Aggressive Elephants are Fast Elephants
  VLDB 2012/PVLDB, Istanbul, Turkey. slides
- Alekh Jindal, Jorge-Arnulfo Quiane-Ruiz, Jens Dittrich
  Trojan Data Layouts: Right Shoes for a Running Elephant
  ACM SOCC 2011, Cascais, Portugal.
- Jorge-Arnulfo Quiane-Ruiz, Christoph Pinkel, Jörg Schad, Jens Dittrich
  RAFT at Work: Speeding-Up MapReduce Applications under Task and Node Failures
  SIGMOD 2011, Athens. (Demo paper) poster
- Jorge-Arnulfo Quiane-Ruiz, Christoph Pinkel, Jörg Schad, Jens Dittrich
  RAFTing MapReduce: Fast Recovery on the Raft
  ICDE 2011, Hannover. TR
- Jörg Schad
  Flying Yellow Elephant: Predictable and Efficient MapReduce in the Cloud
  VLDB 2010 PhD Workshop, Singapore.
- Jens Dittrich, Jorge-Arnulfo Quiane-Ruiz, Alekh Jindal, Yagiz Kargin, Vinay Setty, Jörg Schad
  Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing)
  VLDB 2010/PVLDB, Singapore. correction slides
- Jörg Schad, Jens Dittrich, Jorge-Arnulfo Quiane-Ruiz
  Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance
  VLDB 2010/PVLDB, Singapore. slides