Thursday, May 15, 2014

Hadoop: Questions I Am Asking

I have close to 14 years’ experience with SQL Server for ETL around Data Warehousing. I lead a team of very talented Data Warehouse Developers who developed and now maintain a multi-terabyte data warehouse. Every day we ETL and dimensionally model data describing tens of thousands of orders, millions of dollars of sales, millions of web site visitor metrics and tens of millions of web page views. We do all this, and more, each night and have it ready for the C-suite execs to drink in with their morning coffee! I’m not saying this to brag (well, maybe a little), but because despite that experience, Hadoop puts me in an alien world where the normal tools of my trade don’t seem to make sense.

At this point, the questions I am asking myself are:

  • How much of my Data Warehouse environment and processes will eventually be replaced by Hadoop-related technologies and processes?
  • What ETL processes are best done in Hadoop and which in SQL/SSIS?
  • How much of my storage will transfer to Hadoop: archive, raw staged, operational stores and modeled data?
  • How big a Hadoop environment do I need to surpass the power of my current SQL environment?
  • Does Hadoop mean adapting new technology to the existing BI strategy or do we need a new BI strategy?

I am tenacious, so it is not a matter of “if” but “when” I’ll know which of my old tools will work and how to use new tools and new strategies to conquer the next generation of data challenges.