Big Data
Dec 15, 2022

Data Mining vs. Data Analysis

“It matters not what the data says, but what it is that you do with the data.” - Master Oogway, probably.

Data holds a zillion possibilities. But you can only realize the potential of these letters and numbers once you process them and turn them into something meaningful. This is where data mining, data analysis, and even data exploration come into play.

Are data mining and data analysis the same thing? And what is data exploration? They all sound similar.

In this article we will explore data mining vs data analysis and even explore a dataset. Let's get started!

Data Mining and Data Analysis

What Is Data Mining?

When you read the term ‘mining’, you probably think of someone in a hard hat, chipping away at rocks in a mine, searching for something. Data mining isn’t that different. When you mine data, you aim to ‘discover’ hidden patterns and data within a large dataset. Data mining is the process of collecting raw data and turning it into something useful.

Today, data mining is an essential part of business decision-making. Industries such as retail, finance, healthcare, transportation, telecommunication, and e-commerce use automated data mining techniques to generate insights from heaps of data. The data is stored in bulk and then processed with a range of techniques.

In classification analysis, data is analyzed and sorted into categories, which creates metadata (data about data). Strange values are identified in outlier detection. Similar data is grouped in a process called cluster analysis. Relationships between data are detected in association rule learning. Finally, statistical methodologies such as regression analysis can be applied to determine the relationship between data fields.
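To make one of these techniques concrete, here is a minimal Python sketch of outlier detection using z-scores. The sample values, the `find_outliers` helper, and the threshold of 2 standard deviations are all illustrative assumptions; real data mining tools may use quite different methods.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Toy data (made up for illustration): seven typical values and one strange one.
daily_orders = [52, 48, 50, 51, 49, 47, 53, 120]
outliers = find_outliers(daily_orders)
print(outliers)
```

On small samples a z-score can never get very large, which is why the sketch uses a threshold of 2 rather than the more common 3.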

Examples of Data Mining

In the retail industry, data mining helps in customer segmentation. Data mining tools can identify the characteristics of target customers and segment them into distinct groups. Then, companies can devise sales and marketing strategies for each segment.

Data mining also helps in building predictive intelligence models for fraud detection in the banking sector. Businesses can run data mining algorithms through vast samples of fraudulent and non-fraudulent reports, and build models that can identify fraudulent and non-fraudulent transactions.

Data Mining Vs Data Analysis

Isn’t data analysis the same as data mining? While they are similar and the terms are sometimes used interchangeably, data analysis is not the same as data mining. Data mining can be considered a precursor to data analysis. Once all of the data is mined, then it can be a source for a data analysis study.

Data analysis is an extensive process that moves through several iterative phases.

  • First, data analysts identify a problem statement or a question they want to answer. Then, they start collecting data and building up datasets.
  • Before these datasets can be analyzed, they need to be cleaned. Empty and incomplete fields are removed, data is verified and validated, and the data structure is formatted and standardized.
  • Then, analysts perform a data exploration before starting the actual data analysis process. Within this stage, data mining tools come into the picture. They are used to discover patterns within databases.
  • Data visualization software is also used to transform the processed data into an easy-to-understand graphical format.
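The cleaning step in the list above can be sketched in a few lines of Python. This is only a rough illustration: the sample rows are made up, the `Education` field and the `clean` helper are hypothetical, and a real project would usually need more validation rules than this.

```python
# Toy rows (made up for illustration), as they might arrive from a CSV file.
raw_rows = [
    {"Income": "58138", "Education": " Graduation "},
    {"Income": "", "Education": "PhD"},  # incomplete: income is missing
    {"Income": "71613", "Education": "phd"},
]

def clean(rows):
    """Drop incomplete rows, then validate and standardize the remaining fields."""
    cleaned = []
    for row in rows:
        # Remove records with empty or incomplete fields.
        if any(value.strip() == "" for value in row.values()):
            continue
        cleaned.append({
            "Income": int(row["Income"]),                   # validate: must parse as a number
            "Education": row["Education"].strip().lower(),  # standardize the formatting
        })
    return cleaned

cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```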

Based on the business needs and the initial problem statement, data analysis can be:

  1. Descriptive analysis: To understand what happened.
  2. Diagnostic analysis: To understand what happened, and why.
  3. Predictive analysis: To predict what is likely to happen in the future.
  4. Prescriptive analysis: To decide what to do in order to achieve a specific outcome.

Data Exploration vs Data Analysis

Data exploration, also called exploratory data analysis, is the first step in data analysis.

Before data analysts can make a deep dive and understand patterns, trends, and anomalies in data, they perform data exploration as an ‘initial review.’ Data exploration is more superficial than data analysis, and can be done manually or with simple tools like MS Excel or Gigasheet. Analysts may even conduct data exploration in data mining operations.

Exploring Data With Gigasheet

Everyone can benefit from analyzing data. However, not everyone knows how to code or use sophisticated tools.

If you are not a coder, it doesn’t mean you should miss out on the power of data.

Gigasheet is an easy-to-use, no-code data analysis solution. If you know how to use MS Excel or Google Sheets, you can start analyzing datasets with Gigasheet right now! All you need is a free account and some data.

Let Us Explore A Marketing Dataset

This dataset from Kaggle is perfect for data exploration. It contains records of 2206 customers of a company with data on their customer profiles, product preferences, and campaign performance.

Data dictionary for the Kaggle dataset


To upload this CSV file to Gigasheet, click on New and select your file. That's it. This is how the file looks in Gigasheet:

How the dataset looks in Gigasheet

Now comes the fun part. Let us use Gigasheet to answer some questions.

How do customers respond to campaigns?

The dataset contains campaign performance data for 5 campaigns, denoted by the columns AcceptedCmp1 through AcceptedCmp5. If the customer accepted the offer, the corresponding value is 1. And if they didn’t, the value is 0.

The column AcceptedCmpOverall is the sum of all the values in AcceptedCmp1 to AcceptedCmp5. So, if a customer does not accept any offer, this value is zero.
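Gigasheet needs no code for this, but for readers who are curious, the way the overall column is derived can be sketched in Python. The column names come from the dataset; the `accepted_overall` helper and the sample customer are made up for illustration.

```python
# The five campaign columns: AcceptedCmp1 through AcceptedCmp5.
CAMPAIGN_COLUMNS = [f"AcceptedCmp{i}" for i in range(1, 6)]

def accepted_overall(row):
    """Sum the 0/1 campaign flags to get the number of offers accepted."""
    return sum(row[column] for column in CAMPAIGN_COLUMNS)

# A made-up customer who accepted the offers in campaigns 2 and 4.
customer = {
    "AcceptedCmp1": 0,
    "AcceptedCmp2": 1,
    "AcceptedCmp3": 0,
    "AcceptedCmp4": 1,
    "AcceptedCmp5": 0,
}
overall = accepted_overall(customer)
print(overall)
```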

Let us now group the dataset by the value of the column AcceptedCmpOverall.

Group by AcceptedCmpOverall

This is what the result looks like. To simplify the results, we have unchecked several columns in the right pane. Only the checked columns are displayed.

For the income column, we have used the aggregate function 'row count'.  


Here is what we can conclude:

1. 1,747 customers did not accept any offers!

2. 322 customers accepted exactly one offer.

3. Only 11 customers accepted 4 offers.

4. No customers accepted all 5 offers.
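For comparison, the same "group by with row count" operation can be approximated in Python with a counter. This is a sketch on a handful of made-up rows, not the real dataset, so the counts it prints will not match the figures above.

```python
from collections import Counter

# Toy rows (made up for illustration); the real dataset has one row per customer.
customers = [
    {"Income": 58138, "AcceptedCmpOverall": 0},
    {"Income": 46344, "AcceptedCmpOverall": 0},
    {"Income": 71613, "AcceptedCmpOverall": 1},
    {"Income": 26646, "AcceptedCmpOverall": 0},
    {"Income": 62513, "AcceptedCmpOverall": 4},
]

# Group by AcceptedCmpOverall and count the rows in each group.
group_sizes = Counter(row["AcceptedCmpOverall"] for row in customers)

for offers_accepted, row_count in sorted(group_sizes.items()):
    print(f"{offers_accepted} offers accepted: {row_count} customers")
```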

Let us visualize this data with a bar chart. Select all 4 rows, right-click, select 'chart range', and pick the type of chart you like.


Creating a bar graph

Here is what the data looks like when visualized as a bar chart. ('Income' in the legend is actually the row count for each group.)

Distribution of the number of people who accepted offers

What about the customers who didn’t accept any offers?

Let us now look at those 1,747 customers who did not accept any offers, i.e., the records for which the value of AcceptedCmpOverall is zero. We will use this filter:

Filter to identify the people who haven't accepted any offers

And here are the results:

Results of the filter

Is it possible that these customers did not accept any offers because they recently made a purchase, say, in the last month or so?

Let us filter by the column Recency, which is the number of days since the last purchase. We want the list of customers who made a purchase within the last month.  

Filtering by recency

And we are left with 552 rows. This means that out of the 1,747 customers who did not accept any offers, only 552 had made a purchase in the last month. That is about 32%.
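As a rough Python equivalent, the Recency filter and the percentage can be sketched as follows. The `recent_buyers` helper, the 30-day cutoff for "the last month", and the toy rows are assumptions made for illustration; only the figures 552 and 1,747 come from the results above.

```python
def recent_buyers(rows, max_days=30):
    """Keep rows whose last purchase was at most `max_days` days ago."""
    return [row for row in rows if row["Recency"] <= max_days]

# Toy rows (made up for illustration).
no_offer_rows = [{"Recency": 4}, {"Recency": 58}, {"Recency": 30}, {"Recency": 91}]
recent = recent_buyers(no_offer_rows)
print(len(recent))

# Share of recent buyers among the customers who accepted no offers,
# using the two figures reported above.
share = 552 / 1747
print(f"{share:.1%}")
```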

Results of the filter

What if the customers who did not accept any offers were dissatisfied customers? Let us remove the 'Recency' filter and check the data for complaints.

For this purpose, we are adding the following filter:

Filtering by complain

And we get 18 rows.

To see how many people filed a complaint in the entire dataset, let us delete the AcceptedCmpOverall condition. So, we are left with this filter:

Removing the older filter

And the result is 20 rows!

The resulting dataset filtered by the column 'Complain'

How Do Dissatisfied Customers Respond to Offers?

From our previous operations, we know that 18 of these 20 people rejected all campaign offers. Let us verify this.

We will now group these 20 rows by the column AcceptedCmpOverall. Here are the results:

Group by acceptedCmpOverall

From this data, we can conclude:

  1. 20 customers filed a complaint, presumably because they were dissatisfied with the service.
  2. Of those 20 customers, 18 did not accept any offers!
  3. 1 of them accepted 2 campaign offers.
  4. 1 of them accepted 1 campaign offer.
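The two steps used here, filtering on Complain and then grouping by AcceptedCmpOverall, can be chained in a short Python sketch. The toy rows are made up for illustration, and the sketch assumes Complain is 1 when a customer has complained; the counts in `article_counts` are the ones listed above.

```python
from collections import Counter

# Toy rows (made up for illustration).
customers = [
    {"Complain": 1, "AcceptedCmpOverall": 0},
    {"Complain": 0, "AcceptedCmpOverall": 1},
    {"Complain": 1, "AcceptedCmpOverall": 2},
    {"Complain": 0, "AcceptedCmpOverall": 0},
]

# Step 1: filter, keeping only customers who complained.
complainers = [row for row in customers if row["Complain"] == 1]

# Step 2: group the filtered rows by the number of offers accepted.
by_offers = Counter(row["AcceptedCmpOverall"] for row in complainers)
print(dict(by_offers))

# The grouped counts listed above: offers accepted -> number of customers.
article_counts = {0: 18, 1: 1, 2: 1}
rejected_all_share = article_counts[0] / sum(article_counts.values())
print(f"{rejected_all_share:.0%} of complaining customers accepted no offers")
```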

Want To Give Data Exploration A Try?

Here are some more data exploration blogs to get you started with Gigasheet.

Or, check out our data community for FREE!

Gigasheet is FREE FOREVER. Sign up today!
