Vikas Mujumdar, February 27, 2020
Part 1 will talk about Data, Part 2 about Data Storage, and Part 3 about Data Analytics.
Data is data, not technology. So what's it doing in this series on emerging technology?
As it always has been, data is what makes any application useful or valuable. It helps to distinguish the terms data and information (or insights) in this context. Data is a set of connected or disconnected values in multiple formats. By itself, data may or may not be of any value to an end user, whether human or machine. But when that data is processed or analyzed, it is connected and placed in a specific context, and it then provides information or insights that enable the end user to make decisions. When this processing and analysis incorporates artificial intelligence and machine learning, the insights can be interpreted by computers and machines directly, and actions can be taken without any human intervention.
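The data-to-insight-to-action chain can be sketched in a few lines. This is a minimal illustration, not a real pipeline; the sensor names, temperature values and the 35-degree threshold are all hypothetical:

```python
# Raw data: a set of values that mean little on their own (hypothetical readings).
readings = [
    {"sensor": "s1", "temp_c": 21.5},
    {"sensor": "s1", "temp_c": 22.0},
    {"sensor": "s1", "temp_c": 38.4},  # anomalous reading
]

# Processing puts the values in context, producing information (an insight).
avg = sum(r["temp_c"] for r in readings) / len(readings)
insight = f"Average temperature: {avg:.1f} C"

# A machine can interpret the insight and act on it without human intervention.
overheating = any(r["temp_c"] > 35 for r in readings)
action = "shut_down_equipment" if overheating else "no_action"
```

The same three values that were just "data" become actionable the moment the context (an overheating threshold) is applied.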
Therefore, data is the foundation of all things digital, and that is why it is not only in this series but the subject of its very first technical article.
All the emerging applications that we will imagine and build will produce and consume data in hitherto unseen, possibly unimaginable volumes. Capturing, transmitting and storing data is expensive. To make this investment worthwhile, all of this data will have to be analyzed and put into context to derive value from it. This need gave rise to the term Big Data, which covers many new technologies and solutions for storing and analyzing data in ways that earlier technologies could not.
There are three characteristics of Big Data:
Volume: The size of data generated through various sources. When the world started going digital, the volume of data was very small; storage devices had capacities measured in kilobytes. As digitization became mainstream, the volume of data produced started to grow, and from megabytes and terabytes we are now in the region of petabytes and exabytes. One of the early contributors to the increase in data volume was the digitization of multimedia content (images, music and videos). And as we move towards the Internet of Things, the second contributor to this increase is going to be the sheer number of connected things and the number of data values they will be capturing.
Velocity: The speed or frequency at which data is captured, transmitted and required to be stored. Early sources of data capture were largely limited to business transactions. With the proliferation of the World Wide Web and smartphone access, user transactions have extended into everyday life, increasing the frequency at which transactions are executed and therefore at which data is generated. Again, as we move towards the Internet of Things, connected things capturing and transmitting data almost continuously will increase the velocity of data manifold.
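One common way of coping with such velocity is to process readings as they arrive and keep only an aggregate, rather than retaining every raw value. A minimal sketch of this idea, using an incremental running mean over hypothetical values:

```python
def running_average():
    """Consume a stream of readings one at a time, keeping only a
    count and a running mean instead of the full history."""
    count, mean = 0, 0.0
    while True:
        value = yield mean
        count += 1
        mean += (value - mean) / count  # incremental mean update

stream = running_average()
next(stream)  # prime the generator
for v in [10.0, 20.0, 30.0]:
    current = stream.send(v)
# current now holds the mean of everything seen so far,
# computed without ever storing the raw stream
```

The memory used stays constant no matter how fast, or for how long, the readings keep arriving.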
Variety: The types and structures of data that are required to be stored and retrieved. Early data storage solutions (databases) permitted the storage and retrieval of only alphanumeric data, stored in a predefined (relational) structure. All applications that used that data had to follow the same structure, and additional data values could not be introduced without changes to the database design. As data types evolved, multimedia content could be stored, but still in a predefined structure. Today, with the huge number of applications available to end users and the wide range of transactions they support, the data to be stored and retrieved cannot be predefined. For example, data may come from text feeds, images, video feeds and ambient readings from multiple sensors, and all these different types of data will need to be stored, correlated and analyzed to derive information.
Two more Vs have emerged over the past few years: value and veracity. Data has intrinsic value, but it is of no use until that value is realized. And the data must be true and accurate, or the value apparently realized may be illusory.
Go on to read Part 2 about Data Storage.