Why is an apple an apple and not a monkey? Quite a metaphysical question, isn’t it? Well, since I’m quite sure none of us here is Immanuel Kant, let’s go with a more materialistic answer: because your brain (intelligence) processes the information (data) gathered by your senses (sensors) by continuously learning from, interpreting and reasoning about that data (science). Humans have been doing this since…well, forever, and nobody pays us for making sense of daily life. For us, this is trivial.
In fact, one often-quoted estimate puts the amount of information a human brain receives at roughly 400 billion bits per second. It’s a bit like downloading documents onto your computer.
However, we have cognitive limitations when it comes to fetching information, processing it, storing it in memory, recognizing objects and people, predicting things and so on. This is where computers come to our aid. With recent advances in technology, hardware and computing resources, we can now gather ample data from almost everything. For example, your smartphone leaves traces of your movements and locations, your online search behaviour is stored, your television usage history can be tracked, your credit card transactions are recorded, and even your stroll through the city can be monitored. Our data is being captured and analysed everywhere and all the time, mostly without our consent.
You might ask – Why? Why is it being collected?
Because it is worth something and people are willing to pay money for it. Ask advertisers who want to target their ads to grow their business, ask people who want health and fitness applications, ask superstore owners who want to know what people want to buy and when, ask stock market traders who want to maximize their gains, ask law-enforcement officers who want to spot terrorist activity in crowded places, and so on.
Thus, you have enormous amounts of data that can potentially be used for the benefit of people, customers, business owners, government agencies and many other stakeholders. Sure, it can also be used to damage, blackmail and ruin people, but this is not what we’re talking about today.
This is the “data” side of things. Machine Learning, Artificial Intelligence, Data Mining and allied fields supply the systematic, scientific methods for drawing useful insights, predictions, recognitions and analytical solutions out of this gigantic pile of data. Combine the two ideas and you get “data science”.
Data: a set of values of qualitative or quantitative variables (numbers, words, measurements, observations, etc.) that has been translated into a form computers can process.
Importance of Data
We rely on data more than we may think, from stepping on the bathroom scale in the morning to checking the weather forecast on our phones. Data is what keeps us in control of our world. Like a map, it tells us where we are in relation to where we need to be. If you think about it, this is exactly what life is about.
Data is what gives us the ability to create dynamic tension for driving change, which may not always be positive. Dynamic tension is created when we see our current condition versus our desired condition. It shows us the gap that we must close between the two. It also tells us if we are on the right track and gives us confidence and hope for our next step.
Our decisions are only as good as the information we base them on. Poor data leads to poor information and therefore to poor decision making (I’ll get back to the difference between data and information in a jiffy). That is why we should always try to rely on facts. Not every piece of information is a fact, but every fact is a piece of information. Generally, what we mean by a fact is accurate information. So, yeah, be factual.
Now, data by itself is of little use to us. It needs to be processed and interpreted. This process is called data analysis. Basically, it is a way of gaining objective, factual information, which can be used for several things:
- Problem identification & prioritization
- Monitoring processes or equipment
- Process Control & Continuous Improvement
Data is objective and, if correctly gathered, factual. For example, imagine you have a terrible headache and decide to visit a doctor. What is the first thing the doctor does? Right, the doctor collects data: blood pressure, body weight, temperature, etc. Doctors don’t ask for the patient’s opinion on these. In an ideal scenario no guessing should be involved in this process; the doctor should gather only factual data to guide the diagnosis.
Data vs Information
Before we move on to more serious uses of data, we need to get one thing straight. Data and information are not the same things.
Data is raw facts and statistics, whereas information is interpreted data. Or, in fancier words, information is data that is accurate and timely; specific and organised for a purpose; presented within a context that gives it meaning and relevance; and able to increase understanding and decrease uncertainty.
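To make the distinction concrete, here is a minimal Python sketch. The weight readings are invented values, and `summarize` is a hypothetical helper written for this illustration, not part of any library. The list of numbers is the data; the average and the direction of change are the information.

```python
from statistics import mean

# Raw data: hypothetical morning weight readings in kilograms (invented values).
readings_kg = [82.4, 82.1, 81.9, 81.6, 81.2]


def summarize(readings):
    """Turn raw readings into information: an average and a direction of change."""
    change = readings[-1] - readings[0]
    if change < 0:
        trend = "down"
    elif change > 0:
        trend = "up"
    else:
        trend = "flat"
    return {
        "average": round(mean(readings), 2),
        "change": round(change, 2),
        "trend": trend,
    }


info = summarize(readings_kg)
print(info)
```

The same five numbers mean nothing on their own; organised for a purpose, they answer a question such as "is my weight going down?".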
Data for Business
Information lets people and businesses make informed decisions, because it presents data in a form that can be interpreted. In a business context, customer information can feed metrics on client engagement, which in turn show better ways to reach and work with your clients.
Even one-person startups generate data. Any business that has a website or a social media presence, or that accepts electronic payments of some form, has data about customers, user experience, web traffic, and more. All that data is filled with potential if you can learn to access it and use it to improve your company.
Forbes says 2.5 quintillion bytes of data are created each day, and I recall reading somewhere that only 0.5% of the data being generated is ever analysed! Now, that is one mind-boggling statistic.
Types of Data and Their Usage
It doesn’t really matter which industry you work in or what your interests are: “data” can certainly be of great use to you, although you might not realize it yet.
In computing and business (most of what you read about in the news when it comes to data, especially if it’s about Big Data), data usually means machine-readable information, as opposed to human-readable information.
Humans vs Machines
Human-readable data (also known as unstructured data) refers to stuff that can be read, interpreted and reasoned about by humans. Think of reading a book and making sense of it, deriving ideas from it and such. A machine can’t do this. It can process the text according to a set of instructions, analyze it and give us some insights, but even for that the text must have some kind of uniform structure. This, basically, is what software is: a set of programs applied to data.
That being said, there are several types of data, and each of them is gathered and used differently.
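As a rough illustration of why uniform structure matters, the Python sketch below contrasts a free-text sentence with the same kind of facts laid out as comma-separated rows. The customer names, items and amounts are invented for the example; only the standard `csv` and `io` modules are used.

```python
import csv
import io

# Human-readable text: easy for a person, but it has no fixed structure
# that a simple program could rely on.
note = "Anna bought two coffees on Monday and paid seven dollars."

# Machine-readable text: similar facts in a uniform layout (invented sample).
structured = io.StringIO(
    "customer,item,quantity,total_usd\n"
    "Anna,coffee,2,7.00\n"
    "Ben,tea,1,2.50\n"
)

# Because every row follows the same layout, a few lines of code can read it.
rows = list(csv.DictReader(structured))
total_revenue = sum(float(row["total_usd"]) for row in rows)
print(total_revenue)
```

Getting the same total out of sentences like `note` would require a program that understands language, which is a much harder problem.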
Personal data is anything that is specific to you: basically, any kind of personal information. This includes everything from your email address to your demographics. It’s usually in the news when it gets “leaked” (like a sex tape of Kim Kardashian) or is used in a controversial way (as when Uber worked out who was having an affair).
Nowadays, almost every website (especially social media sites) collects your personal data. Google, Facebook, Instagram, Pinterest, Amazon: they all do this in order to provide you with personalized suggestions that keep you engaged.
Generally, your personal data is protected by law, which doesn’t allow corporations or individuals to use it in certain unpleasant ways. For instance, they are not allowed to sell your credentials to anyone. But as our world and the people who live in it aren’t perfect or all good, data abuses happen (*cough* Facebook’s Cambridge Analytica Data Scandal *cough*).
Transactional data is anything that requires an action in order to be collected: visiting a webpage, clicking on an ad, making an online purchase, and so on.
Pretty much every website you visit collects transactional data of some kind, either through Google Analytics, another third-party system or its own internal data capture system.
Transactional data is immensely important for businesses. By examining large amounts of data, it is possible to uncover hidden patterns and correlations. These patterns can create competitive advantages, and result in business benefits like more effective marketing and increased revenue.
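One very simple way to look for such patterns is to count which items show up together in the same purchase. The sketch below is only an illustration under stated assumptions: the baskets are invented, and real analyses use far larger datasets and more careful statistics than raw counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per purchase (invented data).
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

pair_counts = Counter()
for basket in baskets:
    # Sort the items so each pair is always counted under the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

most_common_pair, count = pair_counts.most_common(1)[0]
print(most_common_pair, count)
```

A shop that sees bread and butter bought together this often might place them side by side or promote them as a bundle.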
Any data that is publicly available on the internet is web data. When you do some online research, check the score of your favorite football team, or even read this article, you are viewing web data.
This type of data is important because it is one of the main ways for businesses to get data that they did not generate themselves. Naturally, businesses need information on what is happening inside their organization, outside it, and in the wider market.
What can it be used for? Its uses are still being discovered as the technology for turning unstructured data into structured data evolves, but to name a few: monitoring competitors, tracking potential customers, keeping track of channel partners, generating leads, building apps, and much more.
Sensor data is produced by objects and is often associated with the Internet of Things. It covers everything from the fingerprint lock on your smartphone or a smartwatch measuring your heart rate to so-called “smart cities”.
Sensors can detect just about any physical quantity. So, yes, the applications are numerous. For example, the Spanish city of Santander installed 12,000 sensors, from under the asphalt to street lamps and the tops of buses, which are used for things like street signs that display real-time parking information, information about road closures, bus delays and so on.
Data for Solving Social Problems
Poisoning in children, police misconduct, the tight link between mental health disorders, homelessness and incarceration, sexual abuse, child abuse: these issues can be tackled using a combination of data analysis, problem-solving, communication and social sciences. The idea here is to combine interventions with information, which gives us some insight into how to act…or even when to act. Data is not limited to telling us how many or how much. Smart organizations are realizing that data can provide answers to the who, what, when, where, and why. It can provide the insight needed to solve social problems in ways we’ve never thought of before.
Let’s take homelessness, for example. How can anyone ever hope to address homelessness if they aren’t fully aware of where it happens? By collecting this data and visualising it in a compelling and easily understandable way, we get one step closer to solving the problem, because it helps people recognize that the problem exists.
The definition of homelessness is much broader than many people realize. A person or family may be considered homeless if they:
- Will be without housing in 14 days or less (i.e., about to be evicted)
- Are victims of domestic violence (their home situation is life-threatening)
- Share accommodations with more people than is reasonable for the size of the structure
- Rely on shelters or missions to keep a roof over their heads
- Inhabit an abandoned building or vehicle
- Move more than twice in 60 days
So yeah, it’s not just that drunk guy you see sleeping on the sidewalk (who might not actually be homeless, just a really passionate alcohol enthusiast). Anyway, when we learn more about who the homeless are and what is depriving them of reliable shelter, we move closer to solving this social problem.
This is just an example of one social problem, while there are thousands and thousands more out there.
The sexiest job of the 21st century?
Once data has been collected, it needs to be processed, researched, and interpreted by someone before it can yield insights. No matter what kind of data you’re talking about, that someone is usually a data scientist. In short, this is what they do:
- Identifying the data-analytics problems that offer the greatest opportunities to the organization
- Determining the correct data sets and variables
- Collecting large sets of structured and unstructured data from disparate sources
- Cleaning and validating the data to ensure accuracy, completeness, and uniformity
- Devising and applying models and algorithms to mine the stores of big data
- Analyzing the data to identify patterns and trends
- Interpreting the data to discover solutions and opportunities
- Communicating findings to stakeholders using visualization and other means
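A toy version of the cleaning and analysis steps in this list might look like the Python sketch below. Everything in it is an assumption made for illustration: the survey records are invented, and `clean` and `average_age_by_city` are hypothetical helper names, not functions from any library. Real projects typically rely on dedicated tools and much larger datasets.

```python
from statistics import mean

# Hypothetical raw survey records (invented); some are incomplete or invalid.
raw_records = [
    {"city": "Leeds ", "age": "34"},
    {"city": "leeds", "age": "41"},
    {"city": "York", "age": ""},      # missing age
    {"city": "York", "age": "29"},
    {"city": "", "age": "52"},        # missing city
    {"city": "York", "age": "abc"},   # age is not a number
]


def clean(records):
    """Keep only complete records and bring them to a uniform format."""
    cleaned = []
    for record in records:
        city = record["city"].strip().title()
        age = record["age"].strip()
        if city and age.isdigit():
            cleaned.append({"city": city, "age": int(age)})
    return cleaned


def average_age_by_city(records):
    """Group cleaned records by city and compute the mean age in each."""
    ages = {}
    for record in records:
        ages.setdefault(record["city"], []).append(record["age"])
    return {city: mean(values) for city, values in ages.items()}


cleaned = clean(raw_records)
summary = average_age_by_city(cleaned)
print(summary)
```

Half of the raw records are dropped during cleaning, which is a small reminder of why validation comes before any modelling or interpretation.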
Data science jobs are among the highest paid in the IT industry. Data scientist was also ranked the most promising job of 2019 by LinkedIn in January 2019, and has been named the best job in America for three years in a row.