Data – Data is everywhere: where to store it, where to process it?
We live in a world that is densely populated with data. Earlier, only trees, plants, animals, and humans surrounded us. Now data is everywhere, and we live in it. The atmosphere is turning into a datasphere. According to one report, the amount of data that humans and machines produce is mind-boggling.
About 2.5 quintillion bytes of data are generated daily.
The primary sources of data are:
- Machine data
- Transactional data
- Social data
A very large amount of data is generated by these sources daily.
- The stock market generates about 1 TB of data on a daily basis.
- YouTube users upload more than 40 hours of video every minute.
- Social networking sites like Twitter, Facebook, LinkedIn, and Instagram capture 10 TB of data.
- There are about 30 million network sensors in the world.
It’s not just data now, it’s Big Data!
The term big data is a buzzword nowadays. It is self-explanatory: data is generated in such large and complex quantities that our traditional systems are inadequate to deal with it. Large datasets are characterised by volume, variety, veracity, volatility, and many other V’s. Since data is intrinsic, it won’t stop now. It will only grow. We are living in a tsunami of data.
Use cases of big data include:
- Healthcare
- Finance
- Industry
- Media and entertainment
- Education
- E-commerce
- Agriculture, and many more
Big data is generated in 3 forms:
- Structured data
- Semi-structured data
- Unstructured data
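As a rough illustration of these three forms, the sketch below parses one made-up record of each kind in Python. The sample values and variable names are hypothetical and are not taken from any real dataset.

```python
import csv
import io
import json

# Structured: fixed columns that fit a relational table (hypothetical sample).
structured = "user_id,song,plays\n1,Song A,12\n2,Song B,7\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing keys, and fields may vary per record.
semi_structured = '{"user_id": 1, "playlist": {"name": "Focus", "tracks": ["Song A"]}}'
doc = json.loads(semi_structured)

# Unstructured: free text with no schema; here we only count the words.
unstructured = "Loved the new album, but the app crashed twice today."
word_count = len(unstructured.split())

print(rows[0]["song"], doc["playlist"]["name"], word_count)
```

The structured sample maps directly onto rows and columns, the semi-structured one needs its keys to be walked, and the unstructured one needs some processing before it yields anything countable.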
Earlier, big data was considered a problem, but now industries have adopted it. Companies work on big data with tools to store, process, analyse, and update data (CRUD operations on data).
Big data analytics comes into the picture now.
It is the use of advanced techniques to make decisions from data by analysing patterns, insights, correlations, and market trends in raw data. Data scientist, data analyst, and statistician are career options for anyone who wants to learn and work with data.
They collect -> process -> clean -> analyse data.
- Let’s take an example to make big data clearer. Many of us use Spotify for music and podcast streaming. It has more than 96M users currently. All these users produce a tremendous amount of data: songs played repeatedly, likes, shares, playlists created, search history, and so on.
Do you know what Spotify does with this data?
- It analyses the data to provide users with song recommendations. You have probably seen the Top Recommendation for You list of songs and podcasts. This list is different for everyone, based on each user’s taste in songs, singers, etc. This is called a recommendation system or engine. It is basically a data filtering tool that takes the data, analyses it, and then predicts what users will like. It is a big data analytics tool.
- Rolls-Royce uses big data analytics in making its engines.
- Starbucks uses big data for important decisions, such as deciding whether a particular location would be suitable for a new outlet.
- Delta Air Lines uses analytics to improve customer experience. By monitoring customers’ tweets, it analyses their experiences, and if negative tweets are found, it responds, for example by upgrading customers’ tickets.
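The recommendation engine described in the Spotify example can be sketched as simple user-based collaborative filtering. This is a minimal illustration and not Spotify's actual method; the users, songs, play counts, and the `cosine` and `recommend` functions are all hypothetical.

```python
from math import sqrt

# Hypothetical play counts: user -> {song: number of plays}
plays = {
    "ana":  {"Song A": 5, "Song B": 3, "Song C": 0},
    "ben":  {"Song A": 4, "Song B": 0, "Song C": 4},
    "cara": {"Song A": 5, "Song B": 4, "Song D": 2},
}

def cosine(u, v):
    """Cosine similarity between two sparse play-count vectors."""
    shared = set(u) & set(v)
    dot = sum(u[s] * v[s] for s in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(user, data, top_n=1):
    """Rank songs the user has not played, weighted by how similar other users are."""
    scores = {}
    for other, their_plays in data.items():
        if other == user:
            continue
        sim = cosine(data[user], their_plays)
        for song, count in their_plays.items():
            if data[user].get(song, 0) == 0 and count > 0:
                scores[song] = scores.get(song, 0.0) + sim * count
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("ana", plays))
```

The idea is that listeners with similar histories vote for songs the target user has not heard yet, and closer listeners get a stronger vote. A production system would work on far more signals than play counts.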
Lifecycle of big data
Business case evaluation -> Identification of data -> Data filtering -> Data extraction -> Data aggregation -> Data analytics -> Visualization of data -> Final analysis of data.
Types of Big data analytics
- Descriptive Analytics (what has happened?)
- Diagnostic Analytics (why did it happen?)
- Predictive Analytics (what will happen?)
- Prescriptive Analytics (what is the solution?)
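To make the four questions concrete, here is a small sketch on made-up monthly figures. The numbers, the straight-line forecast, and the restocking rule are illustrative assumptions rather than a standard method.

```python
from statistics import mean

# Hypothetical monthly data: advertising spend and units sold.
ad_spend = [10, 12, 15, 18, 20, 24]
units = [100, 110, 130, 150, 160, 185]

# Descriptive: what has happened?
average_units = mean(units)

# Diagnostic: why did it happen? Check how strongly sales track ad spend.
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

correlation = pearson(ad_spend, units)

# Predictive: what will happen? Fit a least-squares line to the data.
mx, my = mean(ad_spend), mean(units)
slope = sum((a - mx) * (b - my) for a, b in zip(ad_spend, units)) / sum(
    (a - mx) ** 2 for a in ad_spend
)
intercept = my - slope * mx
forecast = intercept + slope * 28  # assumed ad spend for next month

# Prescriptive: what should be done? A toy rule on top of the forecast.
stock_on_hand = 180  # hypothetical inventory level
action = "restock" if forecast > stock_on_hand else "hold"

print(round(average_units, 1), round(correlation, 3), round(forecast, 1), action)
```

Each step builds on the previous one: the summary describes the past, the correlation suggests a cause, the fitted line projects forward, and the rule turns the projection into a decision.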
Tools for Big data Analytics
- Apache Hadoop: In big data, Hadoop is the first framework used for storing and analysing data in a distributed form on commodity hardware. Hadoop uses Map-Reduce, Pig, and Hive for processing data.
- MongoDB: A cross-platform, document-oriented NoSQL database used for storing and processing large amounts of unstructured data. It suits data that changes frequently.
- Talend: A tool for data integration, management, and cloud storage. Talend Studio is an open-source tool for data integration.
- Cassandra: A distributed database used for handling large volumes of data. Like Hadoop, it is fault tolerant, as data is automatically replicated to multiple nodes.
- Spark: Used for real-time processing and analysis of large amounts of data. It can be much faster than Hadoop’s Map-Reduce because Spark processes data in main memory.
- Kafka: A distributed streaming platform developed by LinkedIn and later donated to the Apache Software Foundation. It is used to deliver data for real-time analytics.
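The Map-Reduce model that Hadoop uses can be illustrated in a few lines of plain Python. This is a single-process sketch of the idea only; it does not use Hadoop's API, and the function names and sample documents are made up for illustration.

```python
from collections import defaultdict

documents = ["big data is big", "data is everywhere"]  # hypothetical input splits

def map_phase(doc):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    """Group all emitted values by key, as a framework does between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

In a real cluster the map and reduce calls run in parallel on different nodes, which is what lets the same pattern scale to very large inputs.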
The global datasphere is expected to double by 2026. It reached 33 ZB in 2018 and, according to IDC's forecast, will reach 175 ZB by 2025. A Datasphere Initiative has also been launched, with a mission to govern data through agile and scalable frameworks.
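The growth implied by these figures can be checked with a quick calculation. The sketch assumes the 175 ZB figure refers to 2025 (the target year of IDC's forecast); treat that year as an assumption.

```python
start_zb, end_zb = 33, 175          # figures quoted in the text, in zettabytes
start_year, end_year = 2018, 2025   # the end year is an assumption

years = end_year - start_year
growth_factor = end_zb / start_zb
cagr = growth_factor ** (1 / years) - 1  # compound annual growth rate

print(f"{growth_factor:.1f}x over {years} years, about {cagr:.1%} per year")
```

Under that assumption the datasphere grows a little more than fivefold, which works out to roughly a quarter more data each year.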
Must visit: https://www.thedatasphere.org/
Asiana Times Technology window: https://tdznkwjt9mxt6p1p8657.cleaver.live/category/technology/