top of page
horizontal lines
Gigasheet Primary logo
  • Syed Hasan

Shopify Fraud Analysis

Shopify’s a name you can’t miss if you’re in the e-Commerce business. After all, the B2B e-Commerce platform has a whopping 1.7 million customers and is used to automate much of the electronic shopping experience for customers and businesses alike. If you haven't already be sure to check out our other post on Analyzing Shopify Data Exports.

If you’re looking to upscale your shop, learn trends and patterns from your customers, and analyze your Shopify big data, Gigasheet can greatly help you out! In this article, I’ll explore a sample sales dataset that I’ve acquired from Shopify using standard data analysis techniques. Let’s see what we find! 🚀

If you're not already familiar with Gigasheet, it's a free big data analysis spreadsheet. You can use it as an online CSV viewer of course, or to do things like merge tons of CSV files. But perhaps most relevant for this post, we'll be using it for big data analysis. The great thing about Gigasheet is that it works much like a spreadsheet, so no training is required.

Shopify Fraud Analysis using Gigasheet

Analyzing Shopify’s Data

The dataset we’ll be analyzing was part of Shopify’s recruitment challenges from 2021. If you’d like to follow along or get a sampler, you can acquire the dataset from Kaggle here. Once done, head over to Gigasheet, upload the dataset, and you’re good to follow along – starting with Data Exploration.

Upload file to analyze fraud data

Shopify Data Exploration

Let’s first take a quick look at the data and see what we’ve got here. From the UI itself, we can see it has 7 columns and 5,000 rows. Out of the 7 columns, three of them are ID fields (shop, user, and order) and the remaining 4 show more information about the orders themselves.

As part of exploration, we can also look for missing values which can throw off our analysis moving forward.

There are several ways to look for missing values in Gigasheet:

  • You can apply filters to your columns to see if the value is empty

  • You can group values for the column and see if any grouping has an empty label

We’ll look at missing values in the Data Cleanup section in a little more detail.

Shopify Data Cleanup

If you’ve had any hits against the filter or grouping, you can now delete all the matching rows or use a substitute value. Say you’d like to use the average value – you can group the values against that column and calculate the average. Here’s an example of the data grouped by “shop_id” and the average of “order_amount” (click the down arrow for more options).

Grouped Shopify data in Gigasheet

Luckily, this dataset doesn’t have any missing value. Neither does the rest of the dataset need cleanup. For a bonus or if you’d like to map the dataset against days, months, or year, you can explode the date. Here’s an example of how the “exploded” date looks:

Explode data breaks Shopify dates into its components

No more Python functions, memorizing datetime functions, or using the tz library to modify the time zones – it’s super simple to work on dates using Gigasheet!

Shopify Data Analysis

Grouping the shop_id column, we see the dataset contains sales data from 100 stores (the “Rows” count shows the number of grouped values). Using the “Row Count” aggregation on the order_id column, we can retrieve the number of orders against each of these shops. Here are the top five stores:

Shopify Fraud Data grouped by the stores

However, there’s no anomaly against the number of orders from a store. They’re all averaging the same figures. We can pivot from this field for now.

Next, let’s sort the data against the average of the order_amount column. Two shops stick out of the dataset and are clear outliers – shop 42 and 78 with an order amount of 235,101 and 49,213. How about we check which users account for these insanel