Data is everywhere: on the notes on your fridge, in my phone, in NASA's databases. Some of it can be used for business and some cannot. Around 80% of the information an enterprise gathers is chaotic and unstructured, and only a very small amount of it is actually used.
What is Big Data
Big data may seem as hard to explain and understand as the infinity of the Universe or the moment when time began. It is simpler to grasp through four basic characteristics: the 4 V’s.
1. Volume
The amount of data all over the world.
2. Velocity
The speed at which data streams in and has to be handled.
3. Variety
Different forms of data.
4. Veracity
The uncertainty of data: how ambiguous or trustworthy it is.
Fast and Furious
The amount of data grows by up to 60% per year. 90% of the world’s data was created in the last two years.
While you are reading this article, people around the world will take more photos than were taken during the entire 19th century. Perhaps that is because 6 billion people now have cellphones, out of a world population of 7 billion.
Data world in 2017
At this pace, the data world will change dramatically within two years. By 2017 the volume of data will have grown enormously.
Business analysts spend about 80% of their time looking for information and only 20% using it. That is why automated data analysis is crucial for companies that want to grow.
Analyse it
We have to learn how to deal with this flood of information so as not to drown in it. So far, 4.4 million IT jobs have been created around the world to support big data.
More and more analytics services are appearing. Their basic function is to collect chaotic data, structure it and deliver it in the form of clear charts. For instance, the basic information that can be collected from a site and analysed includes user hits, views and signups.
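To make the "collect and structure" step a little more concrete, here is a minimal sketch in Python. The event log, the user ids and the `summarize` helper are all made up for illustration; a real analytics service would read from its own storage and do far more than count.

```python
from collections import Counter

# Hypothetical raw event log: (user_id, event_type) pairs as a site might record them.
events = [
    ("u1", "hit"), ("u1", "view"), ("u2", "hit"),
    ("u2", "view"), ("u2", "signup"), ("u3", "hit"),
]

def summarize(events):
    """Count how many times each event type occurs in the log."""
    return Counter(event_type for _, event_type in events)

summary = summarize(events)
for event_type, count in sorted(summary.items()):
    print(f"{event_type}: {count}")
```

Counts like these are the kind of structured numbers that a chart can then be drawn from.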
Tools and services
The first and most obvious choice is Google Analytics, because it is simple and free. The only problem arises when you deal with really big data: then Google is forced to use sampling, that is, analysing only part of the data.
If this approach does not suit your enterprise, it is worth trying other services that do not apply sampling: t.onthe.io, Librato, StatHat or Sumologic. All of them can also be tried for free.
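The general idea of sampling can be sketched in a few lines of Python. This is only an illustration of the principle, not a description of how Google Analytics actually does it; the `estimate_matches` function, the data and the 10% rate are all assumptions made for the example.

```python
import random

def estimate_matches(records, predicate, sample_rate, seed=0):
    """Estimate how many records satisfy `predicate` by inspecting only a random sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    # Keep each record with probability `sample_rate`.
    sample = [r for r in records if rng.random() < sample_rate]
    matches = sum(1 for r in sample if predicate(r))
    # Scale the sampled count back up to the size of the full data set.
    return matches / sample_rate

# Hypothetical data: 100,000 page views, every fifth one from a returning visitor.
page_views = range(100_000)
is_returning = lambda view_id: view_id % 5 == 0

exact = sum(1 for v in page_views if is_returning(v))
estimate = estimate_matches(page_views, is_returning, sample_rate=0.1)
print(f"exact: {exact}, estimated from a 10% sample: {estimate:.0f}")
```

The estimate is cheaper to compute but only approximately right, which is the trade-off to keep in mind when a report is based on sampled data.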
Created thanks to IBM research.