Syed Hasan

Simplify ETL with Gigasheet



Look around: we’re living in the realm of big data. There’s so much data available to analyze and derive insights from that small companies can rarely excel at it. This is where Gigasheet comes in.


We aim to help democratize big data so everyone has a fair shot at making business decisions, identifying trends, and extracting insights from the vast amounts of data their digital systems produce. So, how exactly do we do that? Keep reading!



What are ETL Operations?


If you’re just getting started with Data Warehousing, you might be wondering: what’s ETL?


Short for "Extract, Transform, & Load", ETL is a process by which we extract data from a system, transform it based on our needs, and later load it into another system (possibly a Data Warehouse). It is, by far, the most commonly used process for ingesting data into large data warehouses for big data processing.
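To make the three phases concrete, here’s a minimal, illustrative sketch in Python using only the standard library. The source data, threshold, and column names are all made up for the example:

```python
import csv
import io
import json

# Extract: raw records pulled from a source system (a made-up JSON export).
raw = '[{"store": "Boston", "sales": 120}, {"store": "Austin", "sales": 80}]'
records = json.loads(raw)

# Transform: keep only stores at or above an arbitrary sales threshold.
transformed = [r for r in records if r["sales"] >= 100]

# Load: write the cleaned records to CSV for the destination to ingest.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["store", "sales"])
writer.writeheader()
writer.writerows(transformed)
csv_output = buffer.getvalue()
```

Real pipelines add scheduling, error handling, and schema checks, but the shape is the same: pull, reshape, push.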


Gigasheet excels at transform operations which may include:

  • Filtering data

  • Cleaning data – performing translations or data consistency operations like modifying rows or columns, editing data, performing summarizations, etc.

  • De-duplicating data

  • Validating or authenticating data

  • Formatting data and converting it from one format to another (perhaps the schema of the warehouse)

You might be thinking: why not just use good ol’ Excel for this?


Excel was meant to be a simple spreadsheet processor, assisting people with traditional data operations. That’s a thing of the past.


It can’t process large files and usually fails to open them at all. Not to mention, it doesn’t handle JSON data out of the box, and converting JSON into a spreadsheet requires significant effort.


Gigasheet has an automated parser to convert your JSON data into a tidy spreadsheet for quick transform (ETL) operations. I’ll demonstrate that in a bit.
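To give a rough idea of what such a parser does under the hood, here’s a simplified, hypothetical `flatten` function in Python that turns nested JSON objects into flat, dotted column names (Gigasheet’s actual parser is far more sophisticated; the store record is invented):

```python
def flatten(record, parent_key=""):
    """Recursively flatten nested dicts into dotted column names."""
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}.{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, column))
        else:
            flat[column] = value
    return flat

# A made-up store record with one level of nesting.
store = {"name": "Dunkin #42", "address": {"city": "Boston", "state": "MA"}}
row = flatten(store)
```

Each flattened key becomes a spreadsheet column, which is why deeply nested JSON can produce very wide sheets.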


What’s a Data Warehouse?


To briefly touch on what a Data Warehouse is: it’s a repository or system that holds data from all of your digital systems, acting as a central source of cleansed and processed information that can later be used to make business decisions, generate insights, and more.


For example, as a retail store, a few source systems for the data would be:

  • Point-of-sale (POS) systems

  • Website (front-end, internal systems, etc.)

  • Cash registers

ETL operations are used to process raw data into information, making it ready for consumption by a warehouse. Next, we’ll discuss how you can use Gigasheet for these operations.


ETL Operations with Gigasheet


Let’s break down ETL into its three phases:


Extract


Extraction means exporting data from a source system or location to another system where transformation operations can be performed. The source could be any internal system, and depending on its exporting capabilities, you can dump that information in a variety of formats.


Gigasheet supports over 100 data connectors and readily integrates data from these solutions into our data processor. Some examples include:


  • AWS Redshift

  • AWS S3

  • MySQL

  • PostgreSQL

  • ActiveCampaign

Available Data Connectors

Don’t wish to use integrations and want to use flat files to import data into Gigasheet for processing? You can use these file formats to do so:

  • CSV

  • LOG

  • EVTX

  • JSON

  • TSV

  • XLSX

  • ZIP/GZIP (and a supported file format within the archive)

Upload Screen

For a quick test run, you can use a dataset from Gigasheet’s newly launched Data Community. Just open a dataset, press Open in Gigasheet, and copy the sheet over to your account to get started with transforming the data.


Gold Prices Data Community Article

I’ve got a sample JSON file containing data related to Dunkin’ stores in the United States which I’ll be uploading to Gigasheet.


All I’ve done is use the File Upload functionality to upload that JSON file – and that’s it – the parser did the rest of the work converting the data. JSON data can be quite variable, and it can quickly get out of hand if it’s not parsed and managed properly. If you were to manually convert the data from JSON into a spreadsheet-like format, it’d take a lot of effort!


Using Gigasheet, we went from this:

Dunkin JSON File

To this spreadsheet with nicely categorized data in a matter of minutes:

Dunkin File Neatly in Gigasheet

Transform

Now that we’ve got the extraction part covered, let’s take a look at Transformation operations in Gigasheet.


First up, I don’t want to export all of this data to my warehouse. I’d rather select a few columns to be loaded. For that, I can choose one of two options:

  • Hide columns in Gigasheet

  • Delete columns from Gigasheet

If you hide a column, it won’t be included in the final export (which we’ll cover later). Deleting a column, however, removes it from the sheet entirely, and it can’t be recovered.
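For readers who script their pipelines, selecting a subset of columns before export is conceptually similar to this Python sketch (the column names and values are invented for illustration):

```python
# Illustrative rows with an internal column we don't want to load.
rows = [{"ADDRESS": "1 Main St", "FRI_HOURS": "05:00-22:00", "INTERNAL_ID": 991}]

# Keep only the columns destined for the warehouse; the rest are "hidden".
keep = {"ADDRESS", "FRI_HOURS"}
exported = [{k: v for k, v in row.items() if k in keep} for row in rows]
```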


To hide a column, open the Columns menu on the right side of the screen. Press the blue tick-marks to hide the corresponding columns from your exportable data.

Hide Columns in Gigasheet

To remove a column, select the column using the hamburger icon and press Delete to remove it from your dataset permanently.

Remove Column from Gigasheet

Perhaps you’d like to split a few columns to better categorize your data? Take a look at the FRI_HOURS column. How about we split it, using ‘-’ as the separator, into two columns: opening and closing times on Friday? That might help compare them against times on other days.


Select the column, press Apply Function, and select Split column. Use the hyphen as the separator, and this is how our data looks:

Split Column in Gigasheet

We can hide the original column from our dataset now as it’s split into two.
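A split-on-separator transform like this is conceptually equivalent to the following Python snippet (the sample value is illustrative):

```python
# Hypothetical FRI_HOURS value; "-" separates opening and closing times.
fri_hours = "05:00-22:00"

# Split once on the hyphen, mirroring the Split column function.
fri_open, fri_close = fri_hours.split("-", 1)
```

Splitting only once matters: if a value ever contained a second hyphen, it would stay in the closing-time half instead of producing a third column.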


Well, this is good, but I see the closing time for some stores is empty because they’re open 24 hours. How about we apply a filter to remove such stores from our dataset? Simply select the row and choose the Filter to exclude option from the menu.

Filter to Exclude Columns in Gigasheet
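The same exclusion filter, expressed as a short Python sketch with made-up store names and a hypothetical FRI_CLOSE column:

```python
# Illustrative rows; an empty FRI_CLOSE means the store is open 24 hours.
rows = [
    {"store": "Dunkin #101", "FRI_CLOSE": "22:00"},
    {"store": "Dunkin #202", "FRI_CLOSE": ""},
]

# "Filter to exclude": keep only rows whose closing time is non-empty.
filtered = [row for row in rows if row["FRI_CLOSE"] != ""]
```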

There’s a lot more that we can do depending on the data we have. For example, you might want to:

  • Convert your dates into a specific format so datetime objects from all systems are consistent in the warehouse (or wherever you wish to load the data)

  • Explode your dates into day, month, and year

  • Change the data type of a particular column – integer to string, IP address to string, string to IP address, etc.

  • Introduce a character count – perhaps for a feedback column to see how engaged your users are
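
As a rough illustration, the first two transforms above, normalizing a date format and exploding it into parts, might look like this in Python (the source format is assumed to be US-style month/day/year):

```python
from datetime import datetime

raw_date = "03/14/2022"  # assumed US-style month/day/year from the source

# Normalize to ISO 8601 so all systems agree on one format.
parsed = datetime.strptime(raw_date, "%m/%d/%Y")
normalized = parsed.strftime("%Y-%m-%d")

# "Explode" the date into its component parts.
day, month, year = parsed.day, parsed.month, parsed.year
```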


Load


Done transforming your data? It’s time to export it out of Gigasheet and load it into a tool or system of your choice. To export the data, simply press the Export button in the navigation bar at the top.

Export Your File

Once done, a prompt will show up saying Find your export in the Library. Head over to the Library or My Files page and you should see your data exported as a ZIP archive (containing a CSV of your data), ready to be downloaded (the export might take some time depending on the size of your data).

Ready to Download

Click on it and you should be able to download it. That’s it – it’s that easy to convert a JSON dataset into CSV using Gigasheet, with the freedom to apply transformations of your choice!
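For the curious, packaging a CSV inside a ZIP archive, the way the export download is delivered, can be sketched in a few lines of standard-library Python (the file and column names here are illustrative):

```python
import csv
import io
import zipfile

# Illustrative transformed rows, ready to be loaded.
rows = [{"store": "Dunkin #101", "city": "Boston"}]

# Write the rows to an in-memory CSV.
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=["store", "city"])
writer.writeheader()
writer.writerows(rows)

# Wrap the CSV in a ZIP archive, as the exported download is packaged.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("export.csv", csv_buffer.getvalue())

# Reading the archive back lists the CSV inside.
names = zipfile.ZipFile(io.BytesIO(zip_buffer.getvalue())).namelist()
```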


What's Next?

We’re simplifying big data processing, data warehousing, and data science operations by removing complexities from the entire process. No-code data science is the future and we’re happy to be working on this front.


Excited to try Gigasheet out? Sign up for a free account today and test your heart out.


Wish to be a step closer to the team behind all this greatness? Join our Slack to get in touch with the team and with people from different markets using Gigasheet for so many cool things!