“It matters not what the data says, but what it is that you do with the data.” - Master Oogway, probably.
Data holds a zillion possibilities. But you can only realize the potential of these letters and numbers once you process them and turn them into something meaningful. This is where data mining, data analysis, and data exploration come into play.
Are data mining and data analysis the same thing? And what is data exploration? They all sound similar.
In this article, we will compare data mining and data analysis, and then explore a dataset. Let's get started!
When you read the term ‘mining’, you probably picture someone in a hard hat, chipping away at rocks in search of something valuable. Data mining isn’t that different. When you mine data, you aim to discover hidden patterns and relationships within a large dataset. In short, data mining is the process of collecting raw data and turning it into something useful.
Today, data mining is an essential part of business decision-making. Industries such as retail, finance, healthcare, transportation, telecommunications, and e-commerce use automated data mining techniques to generate insights from heaps of data. The data is stored in bulk and then processed with different data mining techniques to extract those insights.
In classification analysis, data is assigned to predefined categories, or classes. In outlier detection, unusual values are identified. In cluster analysis, similar data points are grouped together. In association rule learning, relationships between variables (for example, products that are frequently bought together) are detected. Finally, statistical methods such as regression analysis can be applied to model the relationship between data fields.
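To make one of these techniques concrete, here is a minimal Python sketch of outlier detection using the interquartile range (IQR) rule, which is only one of several common approaches. The function name and the sample values are hypothetical, and the 1.5 × IQR cutoff is a widely used convention rather than a fixed standard.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points [Q1, Q2, Q3]
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical purchase amounts; 95 sits far away from the rest
amounts = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(amounts))  # [95]
```

The IQR rule is used here because it relies on quartiles, so a single extreme value does not distort the cutoffs the way it would distort a mean-based rule.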
In the retail industry, data mining helps in customer segmentation. Data mining tools can identify the characteristics of target customers and segment them into distinct groups. Then, companies can devise sales and marketing strategies for each segment.
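Segmentation like this is often done with a clustering algorithm such as k-means. The sketch below is a deliberately tiny, one-dimensional k-means run on hypothetical annual spending figures; the function name, the data, and the simple deterministic initialization are all assumptions for illustration, and real segmentation tools typically use many customer attributes at once.

```python
def kmeans_1d(values, k=2, max_iter=100):
    """Cluster numbers into k groups; returns (centroids, labels)."""
    s = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted values
    if k == 1:
        centroids = [float(s[0])]
    else:
        centroids = [float(s[i * (len(s) - 1) // (k - 1)]) for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assign each value to its nearest centroid
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Move each centroid to the mean of its members (keep it if the cluster is empty)
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels

# Hypothetical annual spend per customer
spend = [100, 120, 110, 900, 950, 1000]
centroids, labels = kmeans_1d(spend)
print(centroids, labels)  # [110.0, 950.0] [0, 0, 0, 1, 1, 1]
```

Here the two clusters could be read as a "low spend" and a "high spend" segment, each of which could receive its own marketing strategy.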
Data mining also helps banks build predictive models for fraud detection. Businesses can run data mining algorithms on large samples of transactions already labeled as fraudulent or legitimate, and build models that flag new transactions that are likely to be fraudulent.
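The idea of learning from labeled samples can be illustrated with a nearest-centroid classifier, one of the simplest supervised models. This is a toy sketch: the two features (amount in thousands of dollars and number of transactions in the past hour), the labels, and the numbers are all hypothetical, and production fraud models are far more sophisticated.

```python
import math

def train_centroids(samples):
    """samples: list of (features, label). Returns {label: mean feature vector}."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda lab: math.dist(features, centroids[lab]))

# Hypothetical labeled history: (amount in $1000s, transactions in past hour)
training = [
    ((0.05, 1), "legit"),
    ((0.12, 2), "legit"),
    ((0.08, 1), "legit"),
    ((2.5, 6), "fraud"),
    ((3.1, 8), "fraud"),
]
centroids = train_centroids(training)
print(predict(centroids, (0.1, 1)))  # legit
print(predict(centroids, (2.9, 7)))  # fraud
```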
Isn’t data analysis the same as data mining? The two are related, and the terms are sometimes used interchangeably, but they are not the same thing. Data mining can be considered a precursor to data analysis: once the data has been mined, it can serve as a source for a data analysis study.
Data analysis is an extensive process that moves through several iterative phases.
Based on the business needs and the initial problem statement, data analysis can be:
Data exploration, also known as exploratory data analysis (EDA), is the first step in data analysis.
Before data analysts dive deep into the patterns, trends, and anomalies in data, they perform data exploration as an initial review. Data exploration is more superficial than data analysis and can be done manually or with simple tools like MS Excel or Gigasheet. Analysts may even conduct data exploration as part of data mining operations.
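For the curious, an initial review of a single column often boils down to a handful of summary numbers. Here is a minimal Python sketch of that idea; the function name and the income values are hypothetical, and blank cells are simply counted as missing.

```python
import statistics

def summarize(column):
    """Quick first-look statistics for a numeric column, skipping blank cells."""
    values = [float(x) for x in column if x not in ("", None)]
    return {
        "count": len(values),
        "missing": len(column) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

# Hypothetical income column as read from a CSV (one blank cell)
incomes = ["50000", "42000", "", "71000", "30000"]
print(summarize(incomes))
# {'count': 4, 'missing': 1, 'min': 30000.0, 'max': 71000.0, 'mean': 48250.0, 'median': 46000.0}
```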
Everyone can benefit from analyzing data. However, not everyone knows how to code or use sophisticated tools.
Not being a coder doesn’t mean you have to miss out on the power of data.
Gigasheet is an easy-to-use, no-code data analysis solution. If you know how to use MS Excel or Google Sheets, you can start analyzing datasets with Gigasheet right now! All you need is a free account and some data.
This dataset from Kaggle is perfect for data exploration. It contains records of 2,206 customers of a company, with data on their customer profiles, product preferences, and campaign performance.
To upload this CSV file to Gigasheet, all you need to do is click New and select your file. That's it! This is how the file looks in Gigasheet:
Now comes the fun part. Let us use Gigasheet to answer some questions.
The dataset contains campaign performance data for 5 campaigns, denoted by the columns AcceptedCmp1 through AcceptedCmp5. If the customer accepted the offer, the corresponding value is 1. And if they didn’t, the value is 0.
The column AcceptedCmpOverall is the sum of the values in AcceptedCmp1 through AcceptedCmp5. So, if a customer did not accept any offer, this value is zero.
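If you want to see what this derived column amounts to, here is a minimal Python sketch that computes it row by row. The three-row CSV excerpt and the ID column are hypothetical; only the AcceptedCmp1 to AcceptedCmp5 column names come from the dataset described above.

```python
import csv
import io

# Hypothetical excerpt; the real Kaggle file has many more columns and rows
SAMPLE = """ID,AcceptedCmp1,AcceptedCmp2,AcceptedCmp3,AcceptedCmp4,AcceptedCmp5
1,0,0,0,0,0
2,1,0,1,0,0
3,0,0,0,0,1
"""

CAMPAIGNS = [f"AcceptedCmp{i}" for i in range(1, 6)]

def overall_acceptances(row):
    """Sum of the five 0/1 campaign flags for one customer."""
    return sum(int(row[col]) for col in CAMPAIGNS)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    row["AcceptedCmpOverall"] = overall_acceptances(row)
print([r["AcceptedCmpOverall"] for r in rows])  # [0, 2, 1]
```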
Let us now group the filtered dataset by the value of the column AcceptedCmpOverall.
This is what the result looks like. To simplify the results, we have unchecked several columns in the right pane. Only the checked columns are displayed.
For the income column, we have used the aggregate function 'row count'.
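Under the hood, grouping by a column with a 'row count' aggregate simply counts how many rows share each distinct value. Here is a minimal Python sketch of that operation; the function name and the seven sample rows are hypothetical and do not reproduce the real results.

```python
from collections import Counter

def group_row_counts(rows, key):
    """Mimic a group-by with a 'row count' aggregate: rows per distinct value."""
    counts = Counter(row[key] for row in rows)
    return dict(sorted(counts.items()))

# Hypothetical rows that already carry an AcceptedCmpOverall value
rows = [{"AcceptedCmpOverall": v} for v in [0, 0, 1, 0, 2, 1, 0]]
print(group_row_counts(rows, "AcceptedCmpOverall"))  # {0: 4, 1: 2, 2: 1}
```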
Here is what we can conclude:
1. 1,747 customers did not accept any offers!
2. Out of all the customers, 322 have accepted at least one offer.
3. Only 11 customers have accepted 4 offers.
4. And there are no customers who have accepted all 5 offers.
Let us visualize this data with a bar chart. Select all the grouped rows and right-click. Then select 'chart range' and the type of chart you like.
Creating a bar graph
Here is what the data looks like when visualized as a bar chart. ('Income' in the legend is actually the row count for each group.)
Let us now look at those 1,747 customers who did not accept any offers, i.e., the records for which the value of AcceptedCmpOverall is zero. We will use this filter:
And here are the results:
Is it possible that these customers did not accept any offers because they recently made a purchase, say, in the last month or so?
Let us filter by the column Recency, which is the number of days since the last purchase. We want the list of customers who made a purchase within the last month.
And we are left with 552 rows. This means that out of the 1,747 customers who did not accept any campaign offer, only 552 had made a purchase in the last month. That is about 32%, so a recent purchase could account for at most about a third of these customers.
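The same two-step filter and the percentage can be sketched in a few lines of Python. Note two assumptions: "within the last month" is taken to mean Recency of 30 days or less (the exact filter value appears only in the screenshot), and the four sample rows are hypothetical.

```python
def recent_share(rows, max_days=30):
    """Among customers who accepted no offers, count those with Recency <= max_days."""
    non_responders = [r for r in rows if r["AcceptedCmpOverall"] == 0]
    recent = [r for r in non_responders if r["Recency"] <= max_days]
    share = len(recent) / len(non_responders) if non_responders else 0.0
    return len(recent), len(non_responders), share

# Hypothetical rows for illustration
rows = [
    {"AcceptedCmpOverall": 0, "Recency": 12},
    {"AcceptedCmpOverall": 0, "Recency": 45},
    {"AcceptedCmpOverall": 2, "Recency": 5},
    {"AcceptedCmpOverall": 0, "Recency": 80},
]
print(recent_share(rows))  # (1, 3, 0.3333333333333333)

# The same arithmetic with the counts reported above: 552 out of 1,747
print(f"{552 / 1747:.1%}")  # 31.6%
```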
What if the customers who did not accept any offers were simply dissatisfied? Let us remove the 'Recency' filter and check the data for complaints.
For this purpose, we are adding the following filter:
And we get 18 rows:
To compare this with the number of customers who complained across the entire dataset, let us delete the previous condition. That leaves us with this filter:
And the result is 20 rows!
From our previous operations, we know that 18 of these 20 customers rejected every campaign offer. Let us verify this.
We will now group these 20 rows by the column AcceptedCmpOverall. Here are the results:
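This filter-then-group step can also be sketched in Python. We assume here that the complaint column is named Complain and holds 1 for customers who complained (the actual filter is shown only in the screenshot); the sample rows are hypothetical.

```python
from collections import Counter

def complaints_by_acceptance(rows):
    """Count complaining customers (Complain == 1) per AcceptedCmpOverall value."""
    counts = Counter(r["AcceptedCmpOverall"] for r in rows if r["Complain"] == 1)
    return dict(sorted(counts.items()))

# Hypothetical rows for illustration
rows = [
    {"Complain": 1, "AcceptedCmpOverall": 0},
    {"Complain": 1, "AcceptedCmpOverall": 0},
    {"Complain": 0, "AcceptedCmpOverall": 2},
    {"Complain": 1, "AcceptedCmpOverall": 1},
]
print(complaints_by_acceptance(rows))  # {0: 2, 1: 1}
```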
From this data, we can conclude:
Here are some more data exploration blogs to get you started with Gigasheet.
Or, check out our data community for FREE!
Gigasheet is FREE FOREVER. Sign up today!