What is big data and how big is it?

Yesterday, I was chatting with my younger brother, who works as a petrophysicist. He suddenly stopped me: “Hey, wait. I keep hearing about this big data. What makes data big data, and how is it different from the data we deal with?” Clients ask me this question quite often as well. As I started my five-minute lecture on big data, I decided to write a post with a few useful pointers to give beginners a head start. And here I am…

From a layman’s point of view, big data is a collection of massive data sets that are too large or complex for traditional (existing) computer systems to store and/or analyze within acceptable limits, both economically and in terms of time. To make it simpler, consider the user-generated content on social media. According to a report from last year (I am sure the numbers are different today!), Facebook alone has to deal with the following on a daily basis:

• 2.7 billion likes made daily on and off of the Facebook site
• 70,000 queries executed by people and automated systems
• 500+ terabytes of new data “ingested”

So you can’t deal with such massive data sets using traditional computing systems; what you need is big data processing. You might now be thinking, “We are not Facebook, so we will never end up with such data sets.” Or, “We have data in my company; how do I know whether it is big data or not?”

I suggest thinking in terms of the 3Vs to find out whether it is big data or not. Fundamentally, big data is characterized by these 3Vs: Volume, Velocity and Variety.

Volume

Many people think of big data only in terms of volume. Volume, or the size of the data, is not the only thing that defines big data, but it is important. When we talk about the size of big data, we mean terabytes (10^3 GB), petabytes (10^6 GB), exabytes (10^9 GB) and more… Going back to the previous example, think of the data volume for a month’s worth of Facebook data.
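To put a rough number on that month of data, here is a minimal back-of-the-envelope sketch in Python. It uses the 500 TB per day figure quoted earlier in this post and assumes a 30-day month and decimal units (1 TB = 10^3 GB); the function and constant names are my own, chosen only for illustration.

```python
# Decimal (SI) units expressed in gigabytes, as in the text above.
GB_PER_TB = 10**3
GB_PER_PB = 10**6
GB_PER_EB = 10**9

DAILY_INGEST_TB = 500  # "500+ terabytes" per day, from the report cited above
DAYS_IN_MONTH = 30     # assumption: a 30-day month


def monthly_volume_gb(daily_tb: int, days: int) -> int:
    """Return the data volume in GB accumulated over `days` days."""
    return daily_tb * GB_PER_TB * days


monthly_gb = monthly_volume_gb(DAILY_INGEST_TB, DAYS_IN_MONTH)
monthly_pb = monthly_gb / GB_PER_PB

print(f"{monthly_gb:,} GB per month, i.e. {monthly_pb:g} PB")
```

Under these assumptions, a single month comes to about 15 petabytes of new data, before counting anything that was already stored.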

Velocity

“It’s not the size of the boat but the motion in the ocean…” Well, that certainly applies to big data. It grows really fast. Take the same Facebook example: 500 TB of new data EVERY DAY. Velocity, the speed at which data grows, is another characteristic of big data, though it varies from case to case.
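To get a feel for what that velocity means, the daily figure can be turned into a per-second rate. This is only a rough sketch: it assumes the 500 TB arrives evenly across the day (real traffic surely has peaks and troughs) and uses decimal units; the names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
GB_PER_TB = 10**3               # decimal units
DAILY_INGEST_TB = 500           # figure from the Facebook example above


def ingest_rate_gb_per_second(daily_tb: float) -> float:
    """Average ingest rate in GB/s, assuming data arrives evenly all day."""
    return daily_tb * GB_PER_TB / SECONDS_PER_DAY


rate = ingest_rate_gb_per_second(DAILY_INGEST_TB)
print(f"Average ingest rate: {rate:.2f} GB/s")
```

That works out to roughly 5.8 GB of new data every second, on average.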

Variety

Next, and the most important characteristic in terms of challenges, is variety. Big data involves many different kinds of data that need to be processed differently and then aggregated. Let me go back to the same Facebook example again. As you can see, there are 2.7 billion likes, which are more or less structured data, since you can represent them in a tabular format such as userID, ContentID (the content on which the like is made). But that’s not all: you also have images, raw text in status updates, comments, video posts and so on. Now that represents variety! Beyond this example, variety may also come from the different sources the data arrives from.
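The difference between these kinds of data can be illustrated with a small Python sketch. Everything here is hypothetical: the class names, fields and the summarize function are made up for illustration and have nothing to do with how Facebook actually stores its data. The point is only that a like fits neatly into fixed columns, while text and images need different handling before they can be aggregated.

```python
from dataclasses import dataclass


@dataclass
class Like:
    """Structured: fits a table with columns userID, ContentID."""
    user_id: int
    content_id: int


@dataclass
class StatusUpdate:
    """Unstructured: free-form raw text."""
    user_id: int
    text: str


@dataclass
class Photo:
    """Unstructured: binary image data."""
    user_id: int
    image_bytes: bytes


def summarize(item) -> str:
    """Each kind of data needs its own processing step."""
    if isinstance(item, Like):
        return f"like: user {item.user_id} -> content {item.content_id}"
    if isinstance(item, StatusUpdate):
        return f"status: {len(item.text.split())} words"
    if isinstance(item, Photo):
        return f"photo: {len(item.image_bytes)} bytes"
    raise TypeError(f"unsupported item type: {type(item).__name__}")


# A mixed feed: three items, three different kinds of data.
feed = [
    Like(user_id=1, content_id=42),
    StatusUpdate(user_id=1, text="Learning about big data today"),
    Photo(user_id=2, image_bytes=b"\x89PNG"),
]

summaries = [summarize(item) for item in feed]
for line in summaries:
    print(line)
```

Even in this toy feed, one code path per data type is needed before the results can be combined; at real scale, that is where much of the effort goes.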

Next time you are wondering whether something counts as big data, try to describe it in terms of the 3Vs: Volume, Velocity and Variety. You will have your answer…

The following video, though it is marketing material from EMC Corporation, explains big data brilliantly and is certainly useful for a newbie:

Big data is not only big; it brings big opportunity too! In research published this month, McKinsey pinpoints five catalysts that can quickly create jobs and deliver a substantial boost to US GDP by 2020. Not so surprisingly, big data is one of them, along with energy, trade, infrastructure, and talent. Until now, big data processing has been the realm of web 2.0 companies and start-ups, but not any more. Every industry, governments included, is now keen to make the best of the opportunity brought by big data processing.

You might be interested in one or more of the following links, which show how big data is driving change in different contexts.

OBAMA ADMINISTRATION UNVEILS “BIG DATA” INITIATIVE
http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf

How GE is taking on Big Data
http://allthingsd.com/20130529/ge-ceo-jeff-immelts-big-data-bet/

How banks use big data to manage human risks
http://pro.gigaom.com/report/proactive-compliance-using-big-data-analytics-to-manage-human-risks/

How innovative oil and gas companies are using big data to outmaneuver the competition – A Microsoft White paper
http://t.co/VIhBOwMWrk

It’s good that charities are interested in data, but why only now? – The Guardian Article
http://www.guardian.co.uk/voluntary-sector-network/2013/may/17/charities-data-why-now

How big data analysis helped President Obama defeat Romney in 2012 Elections
http://bosmol.com/2013/02/how-big-data-analysis-helped-president-obama-defeat-romney-in-2012-elections.html

If you still fancy more, here is a romantic documentary by the BBC on big data in use:

Sumit Mund

Sumit Mund is an Artificial Intelligence Consultant with more than 12 years of experience. He has an MSc by Research degree and B.Tech degree in Information Technology. He is also a part-time PhD scholar at University of Huddersfield where his research area includes applications of Deep Reinforcement Learning and uses Google Tensorflow extensively.