STEMVentor LMS

Data, Storage and Analytics - Part 3

Vikas Mujumdar, February 27, 2020

Part 1 talked about Data, Part 2 about Data Storage. This last part is about Data Analytics and Data Science.

Data analytics and data science are broadly similar: both employ techniques and theories drawn from many fields, including mathematics, statistics and computer science, and both use scientific methods, processes, algorithms and systems to uncover hidden patterns, correlations and other insights from structured and unstructured data. As we surely agree, data by itself is of limited use; its true value is realized only when it is processed and analyzed.

Although the terms are often used interchangeably, there is a difference. Data science reviews large volumes of data to give general guidance on trends and predictions, and raises questions that need to be answered (for example: how likely is a cohort of customers to default on their loan repayments?). Data analytics parses known sets of data to answer specific questions (for example: which age group had the highest loan repayment default rate in the last year?). For this discussion, we will not focus too much on the differences.
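
The data analytics question above (which age group had the highest default rate) can be sketched in a few lines of Python. This is only an illustration: the loan records, the ten-year age buckets and the function names are all made up for the example and do not come from any real dataset.

```python
from collections import defaultdict

# Hypothetical loan records: (borrower age, defaulted in the last year?)
loans = [
    (23, True), (27, False), (29, True),
    (34, False), (38, False), (36, True),
    (45, False), (41, False), (49, False),
    (52, True), (58, False), (55, False),
]

def age_group(age):
    """Bucket an age into a ten-year label such as '20-29'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

def default_rate_by_group(records):
    """Return {age group: fraction of loans in that group that defaulted}."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for age, defaulted in records:
        group = age_group(age)
        totals[group] += 1
        if defaulted:
            defaults[group] += 1
    return {group: defaults[group] / totals[group] for group in totals}

rates = default_rate_by_group(loans)
worst_group = max(rates, key=rates.get)
print(worst_group, round(rates[worst_group], 2))
```

With this sample data the answer is the 20-29 group. A data science question, such as how likely a new cohort is to default, would need a predictive model rather than a simple aggregation like this one.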

Data and its analysis have many useful applications in just about every facet of industry:

- Data captured from e-commerce and payment transactions can be used to determine customer purchase and spending patterns, allowing merchants to target specific buyers with products or to adjust product pricing dynamically, in real time, based on supply and demand.

- Data captured from users posting on social media can be used to analyze the sentiments, emotions and behavior patterns of individuals or cohorts. These insights can help determine the way forward for anyone from a business selling products to a politician campaigning for an election.

- Data captured from airplanes can help airlines and manufacturers fine-tune aircraft design, flight paths, and cargo and passenger loading patterns.

- Data from machines in a factory can help manufacturers to plan preventive maintenance, predict possible failure and improve overall equipment efficiency.

- Data about a person's health and fitness, captured using smart wearables, can be continuously monitored and analyzed to predict health or fitness issues. Data across a cross-section of the population, adjusted for parameters such as age, gender, height and weight, can be aggregated to predict health trends. Healthcare providers can use these predictions to be prepared with remedies, and insurance providers can use them to adjust their products to the trends they foresee, giving their customers maximum benefit while still making a profit.
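
As a rough illustration of the factory example in the list above, the sketch below flags machine readings that drift well above a baseline, which could be one input to a preventive maintenance plan. The vibration values, the function name `flag_anomalies` and the three-standard-deviation limit are assumptions chosen for the example; real predictive maintenance systems use far richer models.

```python
from statistics import mean, stdev

def flag_anomalies(readings, baseline_size=5, k=3.0):
    """Return the indexes of readings that exceed the baseline mean
    by more than k standard deviations.

    The first `baseline_size` readings are treated as normal operation
    and are never flagged themselves.
    """
    baseline = readings[:baseline_size]
    limit = mean(baseline) + k * stdev(baseline)
    return [
        index
        for index, value in enumerate(readings)
        if index >= baseline_size and value > limit
    ]

# Hypothetical vibration readings from one machine, oldest first.
vibration = [0.50, 0.52, 0.49, 0.51, 0.50, 0.53, 0.51, 0.90, 0.95]
suspect = flag_anomalies(vibration)
print(suspect)
```

Here the last two readings are flagged, suggesting the machine should be inspected before it fails.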

As with many other so-called emerging technologies and applications, the concept of data analysis has been around for years, probably since the day business transactions were first carried out. Most organizations have always known that they can analyze sales, production, customer and other data about their business to help them take decisions on how to produce and sell better and how to reduce operating costs and maximize revenues and profits.

What's new is speed. Earlier, a business would gather data, run analytics and derive insights that could be used for future planning and decisions, days or months later. Today, the speed at which technology can derive insights allows businesses to take immediate decisions, sometimes even in real time. And as data volumes increase, traditional business intelligence solutions cannot keep pace, either in processing speed or in the breadth, depth and accuracy of insights.

Some of the newer approaches to data analytics are:

- Machine Learning: A science, with its supporting tools and technologies, that teaches a computer to learn by building models from data and previously observed outcomes. As the systems analyze increasing volumes of data, they learn from the growing number of outcomes and can therefore handle more complex data and deliver more accurate results and deeper insights. All this is achieved without a human having to write explicit instructions for each case, which would be quite limiting. Machine Learning is therefore considered a branch of Artificial Intelligence (which we will discuss later).

- Data mining: This technology helps you examine large amounts of data to discover patterns that can be used for further analysis. With data mining solutions, you can separate the noise (data that is of no value) from the good data, which helps derive more accurate outcomes. This is especially useful as data sources are becoming more diverse and unstructured, and cannot always be trusted.

- Text mining: Similar to data mining but focused on text, both written and spoken. With text mining technology, you can analyze text data from the many sources on the internet (especially social media), which now include spoken text as voice-controlled systems (such as Alexa and Google Home) become more pervasive. Text mining uses natural language processing technology to make sense of the data and help businesses derive insights, mainly into user or consumer behavior.
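
To make the machine learning and text mining ideas above a little more concrete, here is a toy sketch in which the program is never told which words are positive or negative; it works that out from labelled example posts. The example posts, labels and function names are invented for illustration, and the word-count scoring is a deliberate simplification of what real natural language processing tools do.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case a post and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

def train(labelled_posts):
    """Count how often each word appears under each label."""
    counts = {}
    for text, label in labelled_posts:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return counts

def classify(counts, text):
    """Pick the label whose training words overlap most with the post."""
    words = tokenize(text)
    scores = {
        label: sum(word_counts[word] for word in words)
        for label, word_counts in counts.items()
    }
    return max(scores, key=scores.get)

# Hypothetical labelled posts used as training data.
training = [
    ("love this phone, great battery", "positive"),
    ("great service and fast delivery", "positive"),
    ("terrible battery, very slow", "negative"),
    ("slow delivery and terrible support", "negative"),
]

model = train(training)
print(classify(model, "great phone"))
print(classify(model, "terrible and slow"))
```

Adding more labelled posts changes the word counts and therefore the predictions, without anyone rewriting the rules by hand. That is the sense in which the system "learns" from data.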

The age-old paradigm of "Garbage In, Garbage Out (GIGO)" still holds true. None of these powerful analytics solutions will provide any value unless the quality, integrity and veracity of the data sources are of a high standard. Thus, a strong data management program with the right solutions, tools and best practices is necessary for data analytics to be useful.
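
One small piece of such a data management program might be a validation step that rejects bad records before they reach any analytics. The sketch below is a minimal example under assumed rules: the field names (`customer_id`, `age`, `amount`) and the plausibility limits are hypothetical, and a real program would define its own.

```python
def validate_record(record):
    """Return a list of data quality problems found in one record."""
    problems = []
    if not record.get("customer_id"):
        problems.append("missing customer_id")
    age = record.get("age")
    if not isinstance(age, int) or not 0 < age < 120:
        problems.append("implausible age")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        problems.append("invalid amount")
    return problems

# Hypothetical incoming records; only the first one is usable as-is.
records = [
    {"customer_id": "C1", "age": 34, "amount": 120.0},
    {"customer_id": "", "age": 29, "amount": 75.5},
    {"customer_id": "C3", "age": 240, "amount": -10},
]

clean = [r for r in records if not validate_record(r)]
rejected = [(r, validate_record(r)) for r in records if validate_record(r)]

print(len(clean), "clean record(s)")
for record, problems in rejected:
    print(record.get("customer_id") or "<no id>", problems)
```

Keeping the reasons for each rejection, rather than silently dropping records, makes it easier to trace quality problems back to their source.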

Finally, one of the biggest challenges in using data for analytics and defining business outcomes is data privacy. The European Union's General Data Protection Regulation (GDPR), in effect since May 2018, protects the personal data of people in the EU. Because it also applies to organizations outside the EU that handle such data, it needs to be implemented in some form globally. Its aim is to reduce the severity and frequency of security breaches and the mishandling of personal data on the web.

GDPR will not be discussed at length here, but before any data-oriented program is considered, GDPR and legal experts must be consulted to make sure the program stays on the right side of regulators.

Now that you are capturing whatever data is possible through your existing infrastructure and processes and have designed data stores for all kinds of data, it's time to think of the Internet of Things, which is what the next article will be about.
