
Sep 9, 2022

Crime in Denver

The data was collected here, and became available here. With this information, we can better understand which crimes are occurring, and where. That lets us prepare for these crimes and try to prevent them. Keep in mind that this information covers only Denver, and crime patterns likely vary significantly from place to place. However, we could combine this data with other datasets to understand how crime varies.

Rows:

496,402

Sep 2, 2022

Fantasy Draft '22 Cheat Sheet

Rows:

655

Sep 1, 2022

Wordle Answers

The New York Times maintains an updated list of the potential five-letter solves. This list can be found here.

Rows:

2,309

Aug 19, 2022

League of Legends

This dataset contains information on over 100,000 games played in the Masters rank. This information includes the match’s game ID, duration, and both teams’ kills and objectives, such as first blood, first tower, first baron, total number of dragon kills, total wards placed and killed, total kills, assists, and deaths, total gold gained, and more. This dataset was collected from Kaggle and can be found here. The data can be used to visualize how games at high ranks play out and to look at the many factors that could go into a team’s win or loss, such as which objectives correlate most strongly with a win.

Rows:

107,125

Aug 4, 2022

IMDB Movie Reviews

This dataset includes over seven million TV shows and movies from the IMDB dataset. The data includes the primary title, original title, title type (TV show, movie, etc.), start year, end year, runtime, genre, and more. With this information, you can find the perfect TV show or movie for any audience. It is a great place to find new films, especially if you know broadly what you are looking for. We can also look at this dataset analytically and try to understand how the titles of movies are changing over time. This data was provided by IMDB here.

Rows:

7,357,888

Aug 2, 2022

Motorcycle Data

This dataset includes bike models, old and new, as well as stats about each model. This dataset was scraped from Bikez.com, and became available here. With this dataset, individuals interested in motorcycles, or the specs surrounding them, can research whatever they want. Whether you are looking for a new bike, or just want to know more, this dataset provides the tools for an extremely thorough look at each motorcycle, and at all bikes as a whole.

Rows:

38,624

Jul 26, 2022

Cost of College

Compiled from the National Center for Education Statistics Annual Digest, this data became available here. Specifically, it comes from Table 330.20: Average undergraduate tuition and fees and room and board rates charged for full-time students in degree-granting postsecondary institutions, by control and level of institution and state or jurisdiction.


This dataset allows us to have a significantly better understanding of the costs involved in a college degree.


Further analysis we could do with such a dataset includes: finding the state with the lowest average room and board price; comparing our findings with another dataset that includes other information about schools in specific states; and determining the state with the best return on investment by comparing average first-year salary to costs.

Rows:

3,203

Jul 21, 2022

Anime Analytics

This data was scraped from MAL (MyAnimeList), and then became available here. MyAnimeList is an anime and manga social networking and social cataloging application website run by volunteers. The site provides its users with a list-like system to organize and score anime and manga. With this information, users can find out more about their favorite anime, look for new anime to watch in the future, or even use an analytical lens and try to understand what makes successful shows successful. This dataset includes the title, type, mean rating, number of scoring users, status, number of episodes, start date, end date, source, and much more.

Rows:

24,012

Jul 19, 2022

Air Quality

AQI or Air Quality Index is the primary way to measure the current quality of the air. AQI values range from 0-500 with 0 being perfectly healthy and 500 being extremely hazardous. AQI values are derived from moving averages/current values of PM2.5 (particulate matter), PM10, Ozone, Carbon Monoxide, Sulfur Dioxide, and Nitrogen Dioxide levels. This dataset was created using Locational Data from the United States Cities Database and AQI Data from the United States Environmental Protection Agency. The dataset then became available here.
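To make the 0-500 scale easier to read, an AQI value can be mapped to a named band. The sketch below assumes the EPA's published AQI categories; those bands are not part of the dataset description above, and the function name `aqi_category` is hypothetical.

```python
# Upper bound (inclusive) of each band and its label.
# Assumption: bands follow the EPA's published AQI categories.
AQI_CATEGORIES = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (500, "Hazardous"),
]


def aqi_category(aqi: float) -> str:
    """Return the category label for an AQI value between 0 and 500."""
    if aqi < 0 or aqi > 500:
        raise ValueError(f"AQI must be between 0 and 500, got {aqi}")
    for upper_bound, label in AQI_CATEGORIES:
        if aqi <= upper_bound:
            return label
    # Unreachable for valid input, kept so the function always returns or raises.
    raise ValueError(f"No category found for AQI {aqi}")
```

Applied to a column of AQI values, a helper like this would let you count how many days each location spent in each band.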


With this dataset, we can look at what areas are the most dangerous, how the air quality changes over time, and how an area’s pollution levels change over time.

Rows:

5,617,325

Jul 15, 2022

Health Insurance Prices

The file comes from Aetna's Machine Readable Transparency in Coverage website.


We downloaded the first Life Insurance file:

Plan Name: Aetna CVS Bronze: Low-Cost MinuteClinic Visits- Telehealth- Roanoke- Ped Dental

File Name: 2022-07-01_f42d21fd-3576-4569-b0c7-20253bccc7fe_Aetna-Life-insurance-Company.json.gz

File Type: In Network Rates

Plan ID: 38234VA0180009

Rows:

5,446,583

Jul 15, 2022

NFL Player Data

The data was scraped using Python code, which can be found on GitHub: NFL Statistics Scrape. The data then became available here.


With this dataset, we can look at the characteristics of existing (and past) NFL players to have a better understanding of the sport. With this information, we can also predict the positioning or quality of new players. Lastly, this information could be used for fantasy football, by having a better understanding of how players rank in comparison to other players in the same position.

Rows:

17,172

Jun 27, 2022

NHL Stats

Thanks to Kevin Sidwar, who began documenting the still-undocumented NHL stats API that was used to gather this data. The data became available here.


This dataset provides an in-depth look at the performance of teams in the NHL during the 2020-2021 season. With this information, we can try to predict the outcome of games, or the performance of teams.


Team names and other information are held in a separate sheet here.

Rows:

26,305

Jun 15, 2022

Avocado Prices

This data was downloaded from the Hass Avocado Board website in May of 2018 and compiled into a single CSV, which then became available here. Here's how the Hass Avocado Board describes the data on their website:


The table below represents weekly 2018 retail scan data for National retail volume (units) and price. Retail scan data comes directly from retailers’ cash registers based on actual retail sales of Hass avocados. Starting in 2013, the table below reflects an expanded, multi-outlet retail data set. Multi-outlet reporting includes an aggregation of the following channels: grocery, mass, club, drug, dollar and military. The Average Price (of avocados) in the table reflects a per unit (per avocado) cost, even when multiple units (avocados) are sold in bags. The Product Lookup codes (PLU’s) in the table are only for Hass avocados. Other varieties of avocados (e.g. greenskins) are not included in this table.


The data includes the following columns:

  • Date - The date of the observation

  • AveragePrice - the average price of a single avocado

  • type - conventional or organic

  • year - the year

  • Region - the city or region of the observation

  • Total Volume - Total number of avocados sold

  • 4046 - Total number of avocados with PLU 4046 sold

  • 4225 - Total number of avocados with PLU 4225 sold

  • 4770 - Total number of avocados with PLU 4770 sold


With this information, one can better understand the avocado market and the fluctuations within it. This data also details the region in which these avocados are being sold. This allows us to better understand where to get the cheapest avocados.
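As a minimal sketch of the "cheapest avocados" question, the code below averages `AveragePrice` per `Region` for one `type` and picks the lowest. The column names come from the list above; the function names and the sample rows are hypothetical and are not values from the dataset. Note that this is an unweighted mean of the observations, so it ignores `Total Volume`.

```python
from collections import defaultdict


def mean_price_by_region(rows, avocado_type="conventional"):
    """Average the AveragePrice column per Region for one avocado type."""
    totals = defaultdict(lambda: [0.0, 0])  # region -> [price sum, row count]
    for row in rows:
        if row["type"] != avocado_type:
            continue
        entry = totals[row["Region"]]
        entry[0] += float(row["AveragePrice"])
        entry[1] += 1
    return {region: price_sum / count
            for region, (price_sum, count) in totals.items()}


def cheapest_region(rows, avocado_type="conventional"):
    """Return (region, mean price) for the region with the lowest mean price."""
    means = mean_price_by_region(rows, avocado_type)
    return min(means.items(), key=lambda item: item[1])


# Made-up rows for illustration only; not taken from the dataset.
SAMPLE_ROWS = [
    {"Region": "RegionA", "type": "conventional", "AveragePrice": "1.50"},
    {"Region": "RegionA", "type": "conventional", "AveragePrice": "1.25"},
    {"Region": "RegionB", "type": "conventional", "AveragePrice": "1.00"},
    {"Region": "RegionB", "type": "conventional", "AveragePrice": "1.25"},
    {"Region": "RegionB", "type": "organic", "AveragePrice": "0.50"},
]
```

With the real file, the rows could come from `csv.DictReader` on the downloaded CSV, since it yields one dictionary per row keyed by column name.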

Rows:

18,249

Jun 15, 2022

Bigfoot Sightings

This dataset includes the type of the report, the ID, class, submission date, headline, year, season, month, state, county, location details, nearest town, nearest road, and other information for each sighting. With all of this, you can get a great sense of where these sightings occur, and more importantly, what is being sighted. This data originated from The Bigfoot Field Researchers Organization, and became available on Kaggle here.

Rows:

5,467

Jun 15, 2022

Collectible Sneakers

This data became available for the StockX sneaker data contest in 2019 and was sourced here. Unfortunately, this data only includes the sales of Yeezys and Off-White footwear. That being said, we can apply the knowledge we gain from this dataset to a broader one if we need to. This dataset includes data from 9/1/17 to 2/13/19. It includes order date, brand, sneaker name, retail price, sale price, release date, shoe size, and buyer region.

Rows:

99,956

Jun 15, 2022

Crypto Prices

All historic open, high, low, close, trading volume and market cap info for all cryptocurrencies as of the 21st of May, 2018. This dataset includes 1,584 unique cryptocurrencies, and over 900,000 observations. With this information, one can get a much deeper understanding of the crypto market. While this is not financial advice, one could certainly use this dataset for more informed trading decisions. We can also use this dataset with others to determine how world events affect the prices of crypto.

Rows:

99,956

Jun 15, 2022

DC Hero Appearances

This data includes name, number of appearances, page ID, wiki URL, ID (secret or not), Align (good or bad character), eye color, hair color, sex, first appearance, and more for each character in the DC universe. With this information, you can really understand a lot about any character, or even groups of characters. We could look at only the older characters, or even compare the older ones to the newer ones. We can look at how physical attributes like eye and hair color are used to signify whether a character is good or bad. We can also look at how these attributes change over time. We gathered this data here.

Rows:

6,896

Jun 15, 2022

Data Science Jobs

This dataset includes position title, company name, job description, number of reviews for the company, and location of job. It was sourced from here. With this data, you can answer questions like:

  1. Who gets hired? What kind of talent do employers want when they are hiring a data scientist?

  2. Which location has the most opportunities?

  3. What skills, tools, degrees or majors do employers want the most for data scientists?

  4. What's the difference between data scientist, data engineer and data analyst?

  5. Can you develop an efficient classification algorithm to differentiate the three job types above?

Rows:

6,964

Jun 15, 2022

Elden Ring Weapons

The Elden Ring Weapons dataset includes information about all of the available weapons in Elden Ring. The data includes the following breakdown for each weapon:

Name - name of weapon

Type - type of weapon

Phy - physical damage

Mag - magical damage

Fir - fire damage

Lit - lightning damage

Hol - holy damage

Cri - critical damage

Sta - stamina usage

Str - strength scaling

Dex - dexterity scaling

Int - intelligence scaling

Fai - faith scaling

Arc - arcane scaling

Any - special effect damage

Phy - physical blocking damage

Mag - magical blocking damage

Fir - fire blocking damage

Lit - lightning blocking damage

Hol - holy blocking damage

Bst - boost

Wgt - weight of weapon

Upgrade - which stone should be used to upgrade the weapon


This data was collected from Kaggle, an online community of data scientists and machine learning practitioners, and can be found here.


This data is extremely useful for players looking to maximize the quality of their builds. Having the ability to understand all aspects of a weapon, and more importantly to compare weapons to other weapons can really make the difference, especially in such a difficult game. Before going out to search for a new weapon, check this table to ensure the weapon fits your build perfectly.

Rows:

307

Jun 15, 2022

Gold Prices

This data includes daily gold rates from 1st Jan 1985 to 11th Feb 2022. This data was collected from gold.org and then cleaned, becoming publicly available here. With this information, one can have a much better understanding of what is going on in the gold market. We can use this data with other information to understand what events may cause the price of gold to increase or decrease. We can also look at the other currencies/prices to determine the effect of other countries on the gold market. Lastly, we can analyze the rises and falls in price to determine an optimal time to purchase.

Rows:

9,824

Jun 15, 2022

Historical Recessions

This dataset was built using the Philadelphia Federal Reserve's State Coincident Indices and the Bry-Boschan Method for business cycle dating. It then became available here. In the tradition of Owyang, Piger, et al., business cycles are calculated at the state level, which provides interesting opportunities for analyzing recession timing for different regions or for sectors present in different states. With this information, we could look to predict future recessions, or try to understand why they occurred in the past.

Rows:

21,800

Jun 15, 2022

NBA Player Data

This data includes name, team, position, age, height, weight, college, and salary of all current NBA players from the 2021-2022 season. This data was found here.


With this information, we can look at all sorts of things. We can look at how height or weight can affect a player's salary. We can look at how different teams select players differently based on physical attributes. We can also look at which schools produce the best players, which schools produce the tallest players, etc. This information can also be compared to that of previous years to understand how the league is changing.

Rows:

558

Jun 15, 2022

NFT Projects

This dataset was created via Python using the requests, json, and pandas libraries. The information was pulled on January 16, 2022, and represents all time information for the top NFT collections. As an example, the Sales column represents all sales under a specified NFT collection from its creation up until January 16, 2022.

This data was scraped from the top NFT Collections on Coin Market Cap. The data became available here.


The dataset consists of the following information:

  • Index: The index of the file.

  • Name: The name of the NFT collection.

  • Volume: The volume of sales from the NFT collection in Solana (SOL).

  • Volume_USD: The volume of sales from the NFT collection in United States Dollar (USD).

  • Market_Cap: The market capitalization—total value of the collection's items in circulation—in Solana (SOL).

  • MarketCapUSD: The market capitalization—total value of the collection's items in circulation—in United States Dollar (USD).

  • Sales: The number of sales from the NFT collection.

  • Floor_Price: The lowest price of any NFT in the collection in Solana (SOL).

  • FloorPriceUSD: The lowest price of any NFT in the collection in United States Dollar (USD).

  • Average_Price: The average price of an NFT in the collection in Solana (SOL).

  • AveragePriceUSD: The average price of an NFT in the collection in United States Dollar (USD).

  • Owners: The number of owners of NFTs in the collection.

  • Assets: The number of items in the collection.

  • OwnerAssetRatio: The ownership percentage of all items in the collection.

  • Category: The category of the NFT collection.

  • Website: The associated website of the NFT collection.

  • Logo: The associated image of the NFT collection.

With this information, we hope to better understand what is going on in the NFT world: what projects are doing, how much they are selling for, how many people are buying them, etc. Understanding aspects of the crypto market can also help us understand where people’s money is going and why. From this, you can learn more about where to invest your money, or even what NFT project to create next!

Rows:

592

Jun 15, 2022

Redfin Home Prices

This data was downloaded here from Redfin. It has weekly data on housing sales by region and includes region ID, region name, region type, period begin and end, duration, total homes sold, median sale price, homes sold year over year, and much more. With all of this information, we can get a better understanding of what happens in the housing market: where homes are being sold, and which areas have homes selling for the highest and lowest prices. With this, we can determine where the cheapest place to live is. We can also start looking at possible investments, or look for places where a decrease in price may occur.

Rows:

2,226,075

Data Community
