Big Data Analysis with a Spreadsheet Database

We live in an age of unprecedented access to vast datasets.

But it's not just about the volume of data. It's also about variety.

Managers expect analysts to process multiple sources of data, ranging from conventional relational databases to semi-structured and unstructured sources such as JSON files, XML files, and NoSQL databases. However, unlike data scientists, most analysts don't need to understand the internal workings of a JSON dataset. They just need to analyze it.
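To make "analyzing a JSON dataset" concrete, here is a minimal Python sketch of flattening a nested JSON record into spreadsheet-style columns. The sample record and the `flatten` helper are invented for illustration; this is one simple way to do it, not a description of how Gigasheet works internally.

```python
import json

def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts/lists into a single-level dict of column -> value."""
    flat = {}
    if isinstance(record, dict):
        items = record.items()
    else:  # list: use each element's position as its key
        items = enumerate(record)
    for key, value in items:
        column = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, (dict, list)) and value:
            # Non-empty container: recurse and merge its columns
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat

# Hypothetical sample record, made up for this example.
raw = '{"user": {"name": "Ada", "tags": ["admin", "ops"]}, "ip": "10.0.0.1"}'
row = flatten(json.loads(raw))
print(row)
```

Each nested field becomes its own column (for example `user.name` or `user.tags.0`), which is the shape a spreadsheet-like front end needs in order to filter and group on it.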

For most applications, a spreadsheet-like front end makes it easier for knowledge workers to analyze large and varied datasets without investing years to develop technical skills.

That's why we created Gigasheet: It gives analysts the ability to work with large datasets in various formats without learning how to code.

What is a Spreadsheet Database?

A spreadsheet database aims to combine the benefits of both a database and a spreadsheet: the data storage capabilities of a database with the flexibility and ease of use of a spreadsheet.

Is a spreadsheet a database or even comparable to one? If we take this question literally, the answer is no. They are different animals designed for different purposes.

However, the question of spreadsheet vs. database isn’t really about whether they are the same thing—it’s about whether they can be used for the same purpose.

Since databases require more time, effort, and skill to build and use effectively, many analysts prefer to work with spreadsheets. For many years (including much of this writer’s early career), spreadsheet applications like Excel could, in a pinch, be used in place of databases, or in conjunction with them, to ease the analysis and insight-gathering process.

Unfortunately, today this is generally not feasible. Datasets have grown too large for spreadsheet applications, so using a spreadsheet as a database simply doesn’t work. Even moving data from a database into Excel or Google Sheets is unwieldy, and often impossible, because today’s vast datasets exceed the maximum size these applications allow (an Excel worksheet, for example, tops out at 1,048,576 rows).

Why All the Hate for Databases?

First off, it’s important to recognize that databases are indispensable tools.

You simply can’t maintain huge datasets in real time while protecting data integrity without using some form of (probably relational) database. With that said, there are several reasons why many people find databases difficult to work with, particularly when it comes to analyzing data.

Today, data analysis is a crucial part of many roles, including marketing, product management, sales, finance, project management, and more. Many people in these roles simply don’t know how to analyze database data—it requires skills that aren’t common outside technical fields.

Even for those who do have the skills, analyzing data inside a database is cumbersome and potentially time-consuming. Databases were designed to manage large, persistent datasets—they weren’t intended for easy analysis of static point-in-time data. This is where we get into the spreadsheet vs. database debate.

For years, non-technical people relied on spreadsheets like Excel with data analysis capabilities to “fill in the blanks” left by databases.

Rather than learn complex query languages, they would simply extract a snapshot of data from a database and use a spreadsheet to manipulate and analyze it. Spreadsheets with data analysis capabilities are far more convenient than databases, demand fewer hard-to-acquire skills, are faster to create and replace, and generally make for a more streamlined data analysis process.
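The "extract a snapshot" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for a production system, and the `orders` table, its rows, and the `export_snapshot` helper are all hypothetical.

```python
import csv
import io
import sqlite3

# In-memory stand-in for a production database (hypothetical table and rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "East", 120.0), (2, "West", 75.5), (3, "East", 42.0)],
)

def export_snapshot(connection, query, out_file):
    """Run a read-only query and write the result as CSV with a header row."""
    cursor = connection.execute(query)
    writer = csv.writer(out_file)
    # cursor.description holds one tuple per column; item 0 is the column name
    writer.writerow([col[0] for col in cursor.description])
    rows_written = 0
    for row in cursor:
        writer.writerow(row)
        rows_written += 1
    return rows_written

# A StringIO buffer keeps the example self-contained; to write a real file,
# pass open("snapshot.csv", "w", newline="") instead.
buffer = io.StringIO()
count = export_snapshot(
    conn, "SELECT id, region, amount FROM orders ORDER BY id", buffer
)
snapshot_csv = buffer.getvalue()
print(count, "rows exported")
conn.close()
```

The resulting CSV is a static, point-in-time copy: it can be opened in any spreadsheet tool, but it no longer reflects changes made in the source database.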

However, in today’s big data world, spreadsheets present new challenges:

Row limits. Spreadsheets like Excel and Google Sheets can only cope with so much data before they fail—either by running so slowly they are unusable or by reaching the maximum rows allowed by the software.
Limited analysis. We love spreadsheets—but they weren’t designed for serious data analysis. Sooner or later, analysts run up against the limitations of spreadsheet software and can’t obtain the insights they need.
Data types. Most spreadsheets can only cope with specific data formats, such as dates, currencies, numbers, and text. If you’ve tried analyzing data with IP addresses or hash values in Excel or Google Sheets, you’ll know how poorly they cope with less common data formats.
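The data type problem is easy to reproduce. The short Python sketch below uses made-up IP addresses to show what happens when they are treated as plain text, which is roughly what a tool does when it has no dedicated IP type, compared with parsing them using the standard `ipaddress` module.

```python
import ipaddress

# Hypothetical sample values, invented for this example.
ips = ["10.0.0.12", "10.0.0.2", "192.168.1.1", "9.255.0.1"]

# Text sort compares character by character, so "10.0.0.12" lands before
# "10.0.0.2" and "9.255.0.1" lands last.
text_sorted = sorted(ips)

# Parsing each value as an address sorts by its numeric value instead.
numeric_sorted = sorted(ips, key=ipaddress.ip_address)

# Typed values also allow questions a text column cannot answer directly,
# such as "is this address inside a given subnet?"
subnet = ipaddress.ip_network("10.0.0.0/24")
in_subnet = [ip for ip in ips if ipaddress.ip_address(ip) in subnet]

print(text_sorted)
print(numeric_sorted)
print(in_subnet)
```

The same reasoning applies to other less common formats: once a value is understood as its real type rather than as text, sorting, filtering, and grouping start to behave the way an analyst expects.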

Given the inherent drawbacks of both systems, it’s natural to want a solution that straddles the gap between spreadsheets and databases, providing their essential functionality while addressing major shortcomings. In a phrase: a spreadsheet database.

Gigasheet: The Ultimate Spreadsheet Database

This is where Gigasheet comes in.

Gigasheet is a big data spreadsheet that allows anyone to manipulate, enrich, and analyze datasets of up to 1 billion rows—with no IT infrastructure, SQL, Python, or other technical skills required. Unlike other spreadsheets, Gigasheet enables analysts to:

Work with huge datasets without crashing or hitting row limits.
Easily analyze any data type, including IP addresses and hash values.
Enrich data via free and premium external APIs.
Instantly convert or combine data files.
Import data directly from SaaS platforms, databases, CRMs, data lakes, and more.

While visually similar to other spreadsheets, Gigasheet provides a host of additional features that make data analysis and manipulation faster, easier, and more effective. These include:

Groups and Filters

Use groups to achieve the same result as pivot tables but with WAY less frustration, and faster to boot. (Relax, there’s a pivot mode as well. You just probably won’t use it as much as you expected.)

Above is part of a table of NBA player data from the 2021-22 season. In a couple of clicks, we can use groups to see how different teams compare in different categories. For example, we can see the L.A. Lakers had the highest average player age, salary, and points scored—but the lowest average player height of any team.

Another click and we see line-by-line data for the team—or any other. This speed is part of what differentiates Gigasheet from typical spreadsheets.

Expanding Groups to Conduct Spreadsheet Data Analysis

Note: the above table is also a perfect example of why mean averages can be misleading: just check the team’s average salary and then scan down the column for a dose of reality. Gigasheet allows the user to display a wide range of values in the groups view, including mean, median, mode, min, max, sum, count, and range, so you can more easily understand a dataset and pick out real insights.
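To see why the mean can mislead, consider a small grouped summary in Python. The salary figures below are invented for illustration (they are not the NBA data discussed above), and the grouping is a plain stdlib sketch rather than Gigasheet's own implementation.

```python
from collections import defaultdict
from statistics import mean, median

# Made-up salaries in $ millions; one very large contract skews Team A.
players = [
    ("Team A", 40.0), ("Team A", 2.0), ("Team A", 1.5), ("Team A", 1.5),
    ("Team B", 12.0), ("Team B", 10.0), ("Team B", 9.0), ("Team B", 9.0),
]

# Group salaries by team
groups = defaultdict(list)
for team, salary in players:
    groups[team].append(salary)

# Compute several aggregates per group, similar to a groups view
summary = {
    team: {
        "count": len(vals),
        "mean": mean(vals),
        "median": median(vals),
        "min": min(vals),
        "max": max(vals),
    }
    for team, vals in groups.items()
}

for team, stats in summary.items():
    print(team, stats)
```

Team A has the higher mean (11.25 versus 10.0) but a far lower median (1.75 versus 9.5), because a single large salary pulls the mean upward. Looking at median, min, and max alongside the mean exposes that skew.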

Raw data obtained here, and you can play with it in Gigasheet here—no account required. Have fun!

Combining groups with traditional filter functionality makes exploring and analyzing even the largest datasets easy, fast, and enjoyable. Pivot capabilities are available too if you need them.

Data Cleaning

Gigasheet provides advanced data cleaning capabilities such as changing data types, splitting and combining columns, and exploding date/time columns into more easily filterable components. In the image below, the date and time field in the first column has been split into its component parts: year, month, day, time zone, etc.
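As a rough illustration of what "exploding" a date/time column means, the Python sketch below splits one timestamp into separately filterable parts. The `explode_timestamp` helper, the sample value, and the chosen output columns are assumptions made for this example; the columns Gigasheet actually produces may differ.

```python
from datetime import datetime

def explode_timestamp(value):
    """Split an ISO 8601 timestamp string into separately filterable parts."""
    ts = datetime.fromisoformat(value)
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "minute": ts.minute,
        "second": ts.second,
        "iso_weekday": ts.isoweekday(),   # 1 = Monday ... 7 = Sunday
        "utc_offset": ts.strftime("%z"),  # e.g. "+0000"; empty if no zone given
    }

# Hypothetical sample value.
parts = explode_timestamp("2023-02-14T09:30:00+00:00")
print(parts)
```

Once the components are in their own columns, a filter such as "month equals 2" or "hour between 9 and 17" becomes a simple comparison instead of a string-matching exercise.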

Data Enrichment

Finally (well, not really: there’s plenty more that Gigasheet can do, we just don’t have space to discuss it all here), Gigasheet enables automatic enrichment of certain data types, much like a database hooked up to an API. As of February 2023, Gigasheet supports automatic enrichment of email addresses, IP addresses, and hash values via free and premium API sources.

The video below shows how easily you can use Gigasheet to enrich email addresses (note, these emails aren’t real):

https://www.loom.com/share/ae69df7236da4ac3bcf966ca0e396342

And this one shows how you can enrich IP addresses with open source or premium intelligence:

https://www.loom.com/share/874f8ed2240a4421acdfc27a16eff81c

Best Used in Conjunction With a Database (or Other Storage)

In general, databases are excellent for storing live data, but not always for analyzing it—at least, not quickly and easily. On the other hand, while Gigasheet fulfills the role of a spreadsheet database, its main function is to make analyzing very large but static datasets as easy as possible.

So how can you get the best of all worlds? Simple: create your own true spreadsheet database capability by using Gigasheet in conjunction with a more traditional database or tool. Many of our customers use Gigasheet to quickly and easily analyze data from common tools including:

Log management tools
CRMs
SQL databases
SaaS platforms
Data lakes

You can find a full list of data sources currently supported by Gigasheet here.

While many of these have built-in reporting capabilities, they generally don’t provide much flexibility. Using them in conjunction with Gigasheet will allow you to retain their benefits while unlocking previously undiscovered insights via powerful and flexible data analysis capabilities.

Give Gigasheet a Try

On the surface, Gigasheet is a web-based, billion-row spreadsheet. Behind the scenes, it provides a high-performance big data analytics platform built specifically for business analysts.

So, if you’re in need of a spreadsheet database solution, give Gigasheet a try today—for free.


Pete Hugh
