Apache Hadoop 2.0 (Alpha) Released

ojilles · on May 27, 2012

Does anyone know if this improves error reporting at all? That's the one major pain I've had dealing with Hadoop: something goes wrong, you get some super obscure error message, and consequently you end up doing a major binary search to find out what caused it.

Rickasaurus · on May 27, 2012

Disappointed to see no progress being made toward including iterative map reduce. Longer term I think we'll need to look to Mesos/Spark for real innovation.

fruchtose · on May 27, 2012

Frankly I think Hadoop suffers from too much flexibility. Notice just how many projects are built on top of Hadoop, but more importantly what they do. I'm not going to say that projects like Hive or Pig should be integrated, because these things are too far from the core philosophy of Hadoop. Iterative map-reduce, though, is a feature set one would expect from a map-reduce library. Twister has had iterative map-reduce for years.

If Spark gets anywhere near 30x the performance of Hadoop when it comes to iterative map-reduce, that should raise serious questions: questions like, "What makes Spark so successful?" and, "Can we integrate Spark into the core library?" There is a need for iterative Hadoop that Spark addresses, and I would also consider it to be major enough to deserve inclusion in the core library. I realize that it is a Scala library, so this could take a lot of time. Even so, I would really want to catch up with that 30x speedup. That is too big to ignore.

What's interesting is that YARN, the new Hadoop engine, and Mesos have similar goals (source: http://www.quora.com/How-does-NextGen-MapReduce-compare-to-M...). Maybe this will make built-in iterative map-reduce a possibility in the future?

res0nat0r · on May 27, 2012

Official release notes page from Apache: http://hadoop.apache.org/common/docs/r2.0.0-alpha/