Monthly archive for February 2016

RDDs Simplified



Spark RDDs are a simple yet very important concept in Apache Spark. Most of you probably already know that RDD stands for Resilient Distributed Dataset. It is Resilient because RDDs are fault-tolerant: they are immutable (they can't be modified once created), so Spark can recompute any lost partition from the record of transformations that produced it. It is Distributed because its data is split into partitions spread across the nodes of a cluster, and it is a Dataset because it holds data.
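To make the three parts of the name concrete, here is a minimal single-process sketch in Python. `ToyRDD` and its methods are hypothetical names chosen for illustration; this is not the Spark API. It only mimics the ideas: each transformation returns a new object instead of changing the old one (immutability), the data is split into partitions (distribution), and a lost partition is rebuilt by replaying its lineage (resilience). Real Spark spreads partitions over executors on many machines and does far more.

```python
from typing import Callable, Dict, List, Optional, Sequence


class ToyRDD:
    """Hypothetical single-process toy model of an RDD (NOT Spark's API).

    The data an instance represents never changes: transformations return
    a new ToyRDD that records its parent and a per-partition function.
    That parent/function chain is the lineage.
    """

    def __init__(self, num_partitions: int,
                 source: Optional[List[tuple]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[tuple], tuple]] = None) -> None:
        self.num_partitions = num_partitions
        self._source = source   # set only on the root dataset
        self._parent = parent   # lineage: which dataset this one derives from
        self._fn = fn           # lineage: how each partition is derived
        self._cache: Dict[int, tuple] = {}  # stands in for partitions held by workers

    @classmethod
    def parallelize(cls, data: Sequence[int], num_partitions: int) -> "ToyRDD":
        # "Distributed": split the data into contiguous partitions.
        items = list(data)
        size = -(-len(items) // num_partitions)  # ceiling division
        parts = [tuple(items[i * size:(i + 1) * size]) for i in range(num_partitions)]
        return cls(num_partitions, source=parts)

    def map(self, f: Callable[[int], int]) -> "ToyRDD":
        # Transformation: returns a new dataset; self is left untouched.
        return ToyRDD(self.num_partitions, parent=self,
                      fn=lambda part: tuple(f(x) for x in part))

    def filter(self, pred: Callable[[int], bool]) -> "ToyRDD":
        # Transformation: returns a new dataset; self is left untouched.
        return ToyRDD(self.num_partitions, parent=self,
                      fn=lambda part: tuple(x for x in part if pred(x)))

    def partition(self, i: int) -> tuple:
        # "Resilient": a missing partition is rebuilt by replaying the lineage.
        if i not in self._cache:
            if self._source is not None:
                self._cache[i] = self._source[i]
            else:
                self._cache[i] = self._fn(self._parent.partition(i))
        return self._cache[i]

    def lose_partition(self, i: int) -> None:
        # Simulate a worker failure by discarding one computed partition.
        self._cache.pop(i, None)

    def collect(self) -> List[int]:
        # Gather every partition, in order, into one list.
        return [x for i in range(self.num_partitions) for x in self.partition(i)]


nums = ToyRDD.parallelize(range(1, 11), num_partitions=3)
evens_squared = nums.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(evens_squared.collect())   # [4, 16, 36, 64, 100]

evens_squared.lose_partition(1)  # the "worker" holding partition 1 fails
print(evens_squared.collect())   # [4, 16, 36, 64, 100], partition 1 recomputed
print(nums.collect())            # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]: original unchanged
```

In actual PySpark, the analogous pipeline would start from `sc.parallelize(range(1, 11), 3)` and chain `filter`, `map` and `collect`; Spark records the lineage and handles recovery on its own.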


It has to be a story, and it should be told



The Art of Storytelling in Data Science

I was watching a television interview with a famous regional film actor from India who had turned down the lead role in a film that apparently became a big hit with another, second-choice actor in the lead. The reason he gave for turning it down: the director couldn't articulate the story well enough for him to visualize the final product. So the problem was not that the story was bad, not that the director lacked expertise, and not that the hero was unenthusiastic; the real issue was the way the story was told.





222 S Church St
Charlotte, NC 28202
Phone: (+1) 704 804 1090


Important: Datalakes is committed to protecting the privacy of our subscribers and prospective subscribers. We want to provide a safe, secure user experience. Please review our Privacy Policy and Terms of Use.