Over the last few years, you have invested heavily in building your Data Warehouse and Business Intelligence capabilities. If you have done it right, you have been reaping meaningful benefits from that investment. But we all know that the warehouse handles only a reasonable amount of your structured data. We are now in the Big Data era, and you need to draw insights from large data sets, structured and unstructured, internal as well as external.
The current tools and technologies cannot handle the load that new data sources and workloads bring to your existing data warehouse. At the same time, there are a lot of opportunities (and, to some extent, hype) around creating Data Lakes. While most of us appreciate the value that data lakes can bring to the enterprise, a lot of questions remain:
- How do we build a true data lake?
- What is the ideal reference architecture for a data lake?
- How do we enable end users to access the lake?
- What tools are best for end-user adoption?
- How do we address the security concerns?
- How is governance enforced in a data lake?
- How do we manage metadata?
As we peel the onion, the number of such questions keeps growing.
A good way to get started on your Big Data journey is to consider an Augmented Data Warehouse. You augment your existing warehouse with next-generation capabilities such as Hadoop and Spark for the new use cases that bring value to your business. This way, you do not disrupt any of your existing capabilities, yet you take the first step towards becoming a true Data-Driven Enterprise.
Below I list some high-level benefits and best practices to consider when you start working on your architecture.
1. Continue to store dimensions, hierarchies, and subject-oriented aggregates in the Enterprise Data Warehouse (EDW)
2. Move lower-level, granular, infrequently used, and additional history data from the EDW to the augmentation module, thus lowering costs (dramatically)
3. Move more reporting (where regular authoring is not required), interactive visualization, and analytics to the augmentation module, again lowering costs (dramatically)
4. Store unstructured data that does not fit nicely into “tables” in the augmentation module. This means all the communication with your customers, from phone logs, customer feedback, GPS locations, photos, tweets, emails, text messages, etc., can be stored. You can store this a lot more cost-effectively in the augmentation module
5. Correlate data in your EDW with the data in the augmentation module to get better insight into your customers, products, equipment, etc. You can now use this data for analytics that are computation-intensive, such as clustering and targeting
6. Run ad hoc analytics in the augmentation module
7. Store only the critical data needed for fast analytics and BI in the data warehouse
8. Avoid any disruption to existing business analysts and reporting functions
9. Collect and stream real-time data
10. Execute real-time interactive queries
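To make the placement guidance in points 1, 2, 4 and 7 more concrete, here is a minimal Python sketch of how such a rule might be written down. It is an illustration under stated assumptions, not a prescribed design: the `Dataset` fields, the `placement` function, the dataset names, and the `HOT_QUERIES_PER_MONTH` threshold are all hypothetical and would need to be tuned to your own workloads.

```python
from dataclasses import dataclass

# Hypothetical threshold, chosen only for illustration: data queried at
# least this often is treated as "critical for fast analytics and BI".
HOT_QUERIES_PER_MONTH = 30

# Kinds of data that point 1 says should stay in the EDW.
EDW_KINDS = {"dimension", "hierarchy", "aggregate"}


@dataclass
class Dataset:
    name: str
    structured: bool        # True if it fits cleanly into relational tables
    kind: str               # e.g. "dimension", "aggregate", "detail", "history", "raw"
    queries_per_month: int  # rough access frequency


def placement(ds: Dataset) -> str:
    """Return "edw" or "augmentation" for a dataset."""
    if not ds.structured:
        # Point 4: unstructured data goes to the augmentation module.
        return "augmentation"
    if ds.kind in EDW_KINDS:
        # Point 1: dimensions, hierarchies and aggregates stay in the EDW.
        return "edw"
    if ds.queries_per_month >= HOT_QUERIES_PER_MONTH:
        # Point 7: frequently used data needed for fast BI stays in the EDW.
        return "edw"
    # Point 2: granular, infrequently used and historical data moves out.
    return "augmentation"


# Hypothetical catalog entries, for illustration only.
catalog = [
    Dataset("customer_dim", True, "dimension", 500),
    Dataset("call_center_audio", False, "raw", 2),
    Dataset("orders_2012_detail", True, "history", 1),
    Dataset("daily_sales_detail", True, "detail", 120),
]

plan = {ds.name: placement(ds) for ds in catalog}
```

In practice the access frequency would come from your warehouse's query logs, and the rule would likely weigh storage cost and data sensitivity as well.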
We at Datalakes offer consulting services to help you on this journey. Please feel free to contact me at [email protected]