For years now, analysts have relied on spreadsheets to work with all kinds of data and analyze large CSV files.
Sure, there are other tools out there. Databases, BI solutions, and other tools (security-related and otherwise) promise to help analysts process their data. But as valuable as these tools are, it’s hard to beat the simplicity and flexibility of a good ol’ fashioned spreadsheet.
Unfortunately, there’s a problem. Analysts and business users rely on spreadsheets for a lot of their work... but the sheer volume of data has risen to the point where standard office software can no longer cope, turning analysts who can no longer rely on spreadsheets into Citizen Data Scientists.
What does an analyst do when their preferred method no longer works? They look for a workaround. (Spoiler: you can create a free Gigasheet account to try your own file!)
We built Gigasheet to make working with large files as easy as using a spreadsheet. Here's how to view your large CSV in 3 easy steps:
That's it! Now you can open any valid CSV file up to 1 GB (enough for tens of millions of rows) with the free Community version of Gigasheet. You can even zip your files before uploading to save time. You can also load CSVs from publicly accessible URLs using the Link option in the upload popup. The Premium version of Gigasheet supports CSVs of more than 1 billion rows or 50 GB. Best of all, your data stays private – we don't share or sell any data.
15 years ago, Excel spreadsheets were limited to 65,536 rows. Then, with the release of MS Office 2007, the number of rows supported rose to 1,048,576.
And then… nothing.
For the last 12 years, as the volume of data that data analysts work with has risen exponentially, the number of rows supported by Excel hasn’t budged an inch. So, for practically every data analyst on the planet, opening large data CSVs in Excel has become a huge headache.
So, how do you open large CSV files in Excel? Essentially, there are two options:
Split the CSV file into multiple smaller files that do fit within the 1,048,576 row limit or, find an Excel add-in that supports CSV files with a higher number of rows.
Not a great choice, is it?
Some years ago, Microsoft maintained a product called Power Query, which made it possible to open CSV or XLS files of up to 50 million rows. Unfortunately, the add-in came with its own limitations:
Finally, and perhaps most damningly of all, it still only allowed users to open CSV files of up to 50 million rows. In today’s big data society, that just isn’t enough.
If your CSV is too big for Excel, and you don't want to use Gigasheet, what’s the next step? Naturally, to use a CSV database program — most likely, MS Access — to open big CSV files.
To open large CSV files in MS Access there are a number of steps. First, you'll need to create a new database file. Next, name the database and save it somewhere appropriate. From the File menu, Choose File → Get External Data → Import. Then select your CSV file and click import.
And finally, your CSV file is open; albeit in an antiquated database. But here’s the real problem.
Opening a CSV file in MS Access doesn’t allow an analyst to work freely with data as they would in Excel. It’s much harder to ‘eyeball’ data for insights and trends, and you don’t have access to simple spreadsheet functions like filters and lookups. Instead, you’re forced to use a more complicated set of features and formula syntax to create queries.
Not only do these skills take time to learn, they are also prone to errors. And, unlike a spreadsheet, it’s often hard to tell when you’ve made a mistake in a query when working with a database. In the worst cases, an incorrectly written query may give you the ‘wrong’ answer — and you’ll never even realize you made a mistake.
And, yet again, we run into a size limitation. MS Access can open much larger CSVs than a spreadsheet can, but its capacity isn’t unlimited. The maximum size of an individual ACCDB file is 2 GB.
For data analysts working with datasets that run tens to hundreds of millions of rows, this may not be enough.
If you don’t like using office-style databases — or don’t have access to them — another option is to import large CSV files to a relational SQL database.
SQL stands for Structured Query Language, a standard programming language used to query data across one or more databases. SQL databases are tremendously powerful and boast impressive performance even at a very large scale. However, they come with their own set of challenges.
Building your own SQL database requires a specific set of skills that are far from easy to develop. Unless you are a programmer on the side, undertaking this type of project is often out of reach for a busy analyst. You could wait for a data science or other engineering team to help with the project, but that’s also not ideal - especially if you’re responding to a security incident. You could also hire a development firm to build the SQL database for you, but it’s likely to be a costly project.
And here’s something else to consider. Even if you did build a perfect SQL database, you would still run into performance issues once you start working with seriously big CSV files. You could solve that problem by hosting your database using a business cloud service, but then the project becomes even more costly.
And, really, none of this solves the original problem. Analysts use spreadsheets to open large CSV files because it affords them a lot of flexibility. Ultimately, no matter how powerful an SQL database you build, it will never be as easy to work with as a spreadsheet.
Often, SQL databases are used in conjunction with a business intelligence (BI) tool. These tools can make the process of querying and visualizing large datasets much simpler, and some can even rival spreadsheets in their flexibility.
Combining a well-constructed database with a BI tool is certainly a viable option for an analyst who has the necessary support resources and needs to view large CSV files. However, there are still two considerations:
Some BI tools allow analysts to import CSV files directly. But, yet again, performance issues and size limitations abound when working with very large CSV files.
At this point, it has likely become clear that — for our long-suffering analyst — workarounds don’t get the job done. If you want to open big CSV files (potentially running to hundreds of millions of rows) you need to take a completely different approach.
One such option is to use Python, or another similarly powerful coding or scripting language.
Python is a general-purpose programming language that, among other uses, has historically seen a lot of uptake in the scientific and mathematical communities. Its high-performance nature and built-in library of useful modules make Python an extremely powerful tool for interrogating and visualizing huge datasets.
And Python is far from the only option. As you can see from discussions like this one, there are many ways to use programming languages and custom scripts to interrogate even the largest CSV files. However, they all run into the same problems: time and complexity.
As an analyst, do you have the capacity (or inclination) to learn one or more complex programming languages just to analyze large CSV files? Even if you do, will you ever be completely confident that your custom-written scripts and tools are watertight?
Ultimately, while Python and other scripting languages are undoubtedly an option, they don’t fulfill our criteria.
Most analysts don’t need to be full-on data scientists or programming experts - they need to be experts in their business! They need a simple, powerful solutions that allow them to work with huge CSV files just as easily as they would with a smaller file, using a universal spreadsheet-like application.
So, now that we've covered the gamut of options, hopefully you'll agree that Gigasheet is the best option! It's a no-code, analyst workbench that allows anyone to work efficiently with even the largest CSV files and other datasets.
No longer will you be pressured to be a ‘unicorn analyst’ who can code, manage databases, and perform data science tasks. With Gigasheet, you can open CSV files of up to a BILLION rows, and work with them just as easily as you’d work with a much smaller file in Excel or Google Sheets.
Even better, Gigasheet is specifically designed for business data. It understands IP addresses, time and date fields, and other common data — so you won’t be stuck trying to reformat, split, or concatenate columns just to answer basic questions.
On the surface, Gigasheet is a web-based, billion-row spreadsheet. Behind the scenes, it provides a high-performance big data analytics platform built specifically for business analysts.
We’re looking for beta testers right now to help us make Gigasheet the best it can be. We want to know exactly what problems you face as an analyst so we can make sure Gigasheet is equipped to solve as many of them as possible.
The Big Data party has started and you're invited. Create a free account and get started now.