top of page
horizontal lines
Gigasheet Primary logo
  • Will Andrews

Remove Duplicate Rows in Excel Based on Two Columns, and more!

The power of deduplication of rows in a spreadsheet and how it helps with big data sets cannot be overstated! Whether it’s a spreadsheet of customer data or a large dataset with millions of records, data deduplication can help you save time and reduce errors while ensuring that you have accurate, reliable data.

Data Deduplication is the process of removing duplicate entries from a dataset based on a single column, on two columns, or even multiple columns. It can help to reduce the size of a dataset, improve data integrity and accuracy, and improve the performance of analytics. It can also reduce the cost of storage and help you gain insights more quickly by eliminating redundant data.

Removing duplicate rows can be easily accomplished in most spreadsheets, and has been a top requested feature from Gigasheet users, who rely on our product to manage data that has too many rows for Microsoft Excel or over the cell limit in Google Sheets. Gigasheet is a free big data spreadsheet that makes analyzing big data easy, no code required.

In this blog, we will explore how deduplication can help you get the most out of your data by removing duplicates! We will cover:

  1. How Does Removing Duplicates Work?

  2. How Sorting Your Data affects Deduplication

  3. How to Deduplicate based on Multiple Columns

  4. How to Remove Duplicates in Excel based on One Column

  5. Remove Duplicate Rows in Excel based on Two Columns

  6. Google Sheets Remove Duplicate Rows based on One Column

  7. Remove Duplicate Rows in Google Sheets based on Two Columns

  8. Remove Duplicates in Gigasheet

Remove Duplicates: Remove Duplicate Rows in Excel Based on Two Columns

How Does Removing Duplicates Work?

In its simplest form, removing duplicates works by checking the values of a single column for duplicates, and then deletes the rows containing duplicate instances of the value.

To illustrate this, we are going to look at a very simple file with columns containing Letters, Numbers, and Colors. For a simple deduplication, let's use the Letter column to remove duplicates.

How to Remove Duplicates in excel, 1 column example

We can immediately see that there are 3 unique values in the letter column:

  1. A - 3 instances

  2. B - 2 Instances

  3. C - 5 Instances

When we run a deduplication of rows, the first instance of each value will be kept, along with the rest of the data in the row. The repeated instances of the value (highlighted in red) will be deleted, along with the rest of the data in the row.

Here are the results of running deduplication on the Letter column. As you can see, only the rows containing first instance of each letter remain:

Excel Remove Duplicate Rows based on one column

Pro Tip: Sort to Desired Order Before Removing Duplicates

In the example above, we demonstrated that the first row containing a duplicate value is retained, while the remaining rows with duplicate values are deleted. What if you want a particular value to remain?

You can control the first row by applying a sort to your data prior to removing duplicates from the file!

In our file above, A is paired with the colors Red and Yellow. Let's say Yellow is more important than Red. We can first sort the file on the column Color, with Yellow appearing above Red, which would look like this:

Sorting before remove duplicates google sheets

And removing duplicates on the Letter column would yield the following results:

Different Data Duplication Results based on Sorting

As you can see, as result of sorting, the letters A and C both now have the color Yellow in the Color column, whereas before the value Red was retained.