Blogs

Getting Started With Apache Flink 1.0

Flink

Apache Flink is the new star in town. It is stealing the thunder from Apache Spark, which has been creating buzz for some time now, at least in stream processing. This is because Spark Streaming is built on top of RDDs, which are essentially collections rather than streams: incoming data is processed as a series of small batches. So now would be the right time to try your hand at Flink, even more so since Flink 1.0 was released last week.

Flink Streaming – Tumbling and Sliding Windows

Flink Streaming

Flink offers two commonly used types of windows: tumbling windows and sliding windows. The main difference between them is that tumbling windows never overlap, whereas sliding windows can overlap.
In this article, I will explain these two windows and show how to write a Scala program for each. The code used in this post is also available on my GitHub.
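The difference between the two window types comes down to how an event's timestamp is assigned to windows. Below is a minimal sketch in plain Python, not the Flink API; the function names `tumbling_windows` and `sliding_windows` are hypothetical. It assumes integer, non-negative timestamps and no window offset, and its start-time arithmetic is intended to mirror what Flink's built-in event-time window assigners do. A tumbling window of length `size` puts each timestamp in exactly one window, while a sliding window of length `size` that advances every `slide` units puts it in every window that covers it.

```python
from typing import List, Tuple

# A window is [start, end), in the same time unit as the timestamps.
Window = Tuple[int, int]


def tumbling_windows(timestamp: int, size: int) -> List[Window]:
    """Return the single non-overlapping window that contains `timestamp`."""
    start = timestamp - (timestamp % size)
    return [(start, start + size)]


def sliding_windows(timestamp: int, size: int, slide: int) -> List[Window]:
    """Return every window of length `size` (a new one starting every `slide`
    units) that contains `timestamp`; these overlap when slide < size."""
    windows: List[Window] = []
    # Latest window start at or before the timestamp.
    start = timestamp - (timestamp % slide)
    # Walk backwards while [start, start + size) still covers the timestamp.
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= slide
    return windows


if __name__ == "__main__":
    # An event at t = 7 with 10-unit windows:
    print(tumbling_windows(7, 10))    # [(0, 10)]
    print(sliding_windows(7, 10, 5))  # [(5, 15), (0, 10)]
```

With `slide` equal to `size`, the sliding version returns the same single window as the tumbling one, which is a handy way to see a tumbling window as the non-overlapping special case of a sliding window.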

Spark RDDs Simplified – Part 2

Spark Rdd

This is Part 2 of the blog Spark RDDs Simplified. In this part, I cover persistence, broadcast variables and accumulators. You can read the first part here, where I talked about partitions, actions/transformations and caching.

RDDs Simplified

Spark simplified

RDDs are a simple yet very important concept in Apache Spark. Most of you probably know that RDD stands for Resilient Distributed Dataset. Resilient because RDDs are fault tolerant: they are immutable (they can't be modified once created), so a lost partition can be rebuilt from the transformations that produced it; Distributed because the data is spread across the cluster; and Dataset because it holds data.
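To make the immutability point concrete, here is a toy, single-process model in plain Python. It is not Spark: the `ToyRDD` class is hypothetical and only borrows the names of the real `map` and `collect` operations. Data is held in partitions (standing in for the cluster), and a transformation such as `map` returns a new object instead of modifying the existing one.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ToyRDD:
    """Hypothetical stand-in for an RDD: immutable data split into partitions."""
    partitions: Tuple[Tuple[int, ...], ...]

    def map(self, f: Callable[[int], int]) -> "ToyRDD":
        # A transformation never touches self; it builds a brand-new ToyRDD.
        return ToyRDD(tuple(tuple(f(x) for x in part) for part in self.partitions))

    def collect(self) -> List[int]:
        # Gather the elements of every partition into one local list.
        return [x for part in self.partitions for x in part]


if __name__ == "__main__":
    nums = ToyRDD(((1, 2), (3, 4)))  # two "partitions"
    doubled = nums.map(lambda x: x * 2)
    print(nums.collect())     # [1, 2, 3, 4]  (the original is unchanged)
    print(doubled.collect())  # [2, 4, 6, 8]
    try:
        nums.partitions = ()  # frozen dataclass: assignment is rejected
    except FrozenInstanceError:
        print("cannot modify a ToyRDD once created")
```

This toy only illustrates immutability; it does not model distribution across machines or recovery of lost partitions.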

It has to be a Story and should be told

The Art of Story Telling in Data Science

I was watching a television interview with a famous regional film actor from India who had declined the lead role in a film that went on to become a big hit with another, second-choice actor. The reason he gave for turning down the film was that the director couldn't articulate the story well enough for him to visualize the final product. So it was not that the story was bad, not that the director lacked expertise, and not that the hero was not keen; the real issue was the way the story was told.

Should machines replace Data Scientists?

We are all hearing the question “Can machines replace Data Scientists?” I thought I would tweak it a bit: “Should machines replace Data Scientists?”

We heard a similar promise about Business Intelligence tools a few years back: if you used their stack of BI tools, anybody could deliver insights regardless of their data and domain knowledge. We all know where that promise is now! And we can all now readily accept that it will never be so.

ADDRESS

222 S Church St
Charlotte, NC 28202
Phone: (+1) 704 804 1090
Website: http://datalakes.com
Email: info@datalakes.com

PRIVACY POLICY

Important: Datalakes is committed to protecting the privacy of our subscribers and prospective subscribers. We want to provide a safe, secure user experience. Please review our Privacy Policy and Terms of Use.