A very interesting observation!
Apache Hadoop is a platform modeled after papers Google published about its own internal architecture (notably the Google File System and MapReduce papers). At its core it provides two things:
- A relatively cheap storage alternative (HDFS)
- An MPP-style processing platform that runs on commodity hardware
While at its core Apache Hadoop has only a few components (its common utilities, the HDFS file system and the MapReduce framework), there are many related projects such as HBase, Pig, Chukwa, Ambari, Hive, Cassandra, Spark, Tez and Sqoop, and more keep getting added. Not only does each of these projects have its own purpose and use cases; they also tend to compete with one another at times. Just look at the proliferation of SQL-on-Hadoop options.
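To make the MapReduce part concrete, the classic word-count example can be sketched in plain Python. This is only a simulation of the map/shuffle/reduce phases in a single process, not actual Hadoop code; all function names here are illustrative:

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in an input line.
    for word in line.lower().split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle: group values by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: sum the counts for one word.
    return (key, sum(values))

lines = ["hadoop stores data", "hadoop processes data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

On a real cluster the map and reduce phases run in parallel across many machines, and the shuffle moves data over the network, but the programming model is the same.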
… So, can we move our complete Data Warehouse to the Apache Hadoop platform?
And, the answer as usual is … “It depends”.
It really depends on your use cases. You can certainly use Hadoop as a platform to store your data, and I would bet that for over 90% of use cases it would work.
However, you would still need to do the actual data modeling, organize your subject areas, and handle everything else involved in data warehousing.
Reference: Apache Hadoop is Just a Platform!