Sep 9, 2022
Crime in Denver
The data was collected here, and became available here. With this information, we can better understand which crimes are occurring, and where. That lets us prepare for these crimes and try to prevent them. Keep in mind that this data covers only Denver, and crime patterns likely vary significantly between cities and states. However, we could combine this data with other datasets to understand how crimes vary.
Sep 2, 2022
Fantasy Draft '22 Cheat Sheet
Sep 1, 2022
Aug 19, 2022
League of Legends
This dataset contains information on over 100,000 games played in the Masters rank. This information includes the match’s game id, duration, and both teams’ kills and objectives, such as first blood, first tower, first baron, total number of dragon kills, total wards placed and killed, total kills, assists, and deaths, total gold gained, and more. This dataset was collected from Kaggle and can be found here. The data can be used to visualize how games at high ranks play out and to examine the many factors that could contribute to a team’s win or loss, such as which objectives correlate most strongly with winning.
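As a rough sketch of the objective question raised above, the snippet below estimates how often a team that takes a given first objective goes on to win. The column names (blue_win, blue_first_blood, and so on) and the sample rows are hypothetical stand-ins, not the actual Kaggle schema.

```python
import csv
import io

# Hypothetical sample rows; the real dataset's column names may differ.
SAMPLE = """blue_win,blue_first_blood,blue_first_tower,blue_first_baron
1,1,1,1
0,1,0,0
1,0,1,1
0,0,0,0
1,1,1,0
"""


def win_rate_when(rows, objective):
    """Share of games the blue team won among games where it took `objective`."""
    games = [row for row in rows if row[objective] == "1"]
    if not games:
        return None  # the objective was never taken in this sample
    wins = sum(1 for row in games if row["blue_win"] == "1")
    return wins / len(games)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for objective in ("blue_first_blood", "blue_first_tower", "blue_first_baron"):
    print(objective, win_rate_when(rows, objective))
```

A conditional win rate like this only shows association; it does not tell us that taking the objective caused the win.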
Aug 4, 2022
IMDB Movie Reviews
This dataset includes over seven million TV shows and movies from IMDB. The data includes the primary title, original title, title type (TV show, movie, etc.), start year, end year, runtime, genre, and more. With this information, you can find the perfect TV show or movie for any audience. It is a great place to find new films, especially if you know broadly what you are looking for. We can also look at this dataset through an analytical lens and try to understand how the titles of movies are changing through time. This data was provided by IMDB here.
Aug 2, 2022
This dataset includes bike models, old and new, as well as stats about each model. This dataset was scraped from Bikez.com, and became available here. With this dataset, individuals interested in motorcycles, or the specs surrounding them, can research whatever they want. Whether you are looking for a new bike, or just want to know more, this dataset provides the tools for an extremely thorough look at each motorcycle, and at all bikes as a whole.
Jul 26, 2022
Cost of College
This data was compiled from the National Center for Education Statistics Annual Digest and became available here. Specifically, it comes from Table 330.20: Average undergraduate tuition and fees and room and board rates charged for full-time students in degree-granting postsecondary institutions, by control and level of institution and state or jurisdiction.
This dataset allows us to have a significantly better understanding of the costs involved in a college degree.
Further analysis we could do with such a dataset includes finding the state with the lowest average room and board price, comparing our findings with another dataset that includes other information about schools in specific states, and determining the state with the best return on investment by comparing average first-year salary to costs.
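The first of these analyses could be sketched as follows, using only the Python standard library. The column names (state, year, room_and_board) and the sample figures are hypothetical placeholders, not values from Table 330.20.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical rows; the published table uses different headers and real figures.
SAMPLE = """state,year,room_and_board
Idaho,2020,7000
Idaho,2021,7400
Utah,2020,6800
Utah,2021,7000
Ohio,2020,11000
"""


def cheapest_state(rows):
    """Return (state, average room and board) for the lowest average."""
    by_state = defaultdict(list)
    for row in rows:
        by_state[row["state"]].append(float(row["room_and_board"]))
    averages = {state: mean(values) for state, values in by_state.items()}
    state = min(averages, key=averages.get)
    return state, averages[state]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(cheapest_state(rows))
```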
Jul 21, 2022
This data was scraped from MAL (MyAnimeList), and then became available here. MyAnimeList is an anime and manga social networking and cataloging website run by volunteers. The site provides its users with a list-like system to organize and score anime and manga. With this information, users can find out more about their favorite anime, look for new anime to watch in the future, or even use an analytical lens and try to understand what makes successful shows successful. This dataset includes the title, type, mean rating, number of scoring users, status, number of episodes, start date, end date, source, and so much more.
Jul 19, 2022
AQI or Air Quality Index is the primary way to measure the current quality of the air. AQI values range from 0-500 with 0 being perfectly healthy and 500 being extremely hazardous. AQI values are derived from moving averages/current values of PM2.5 (particulate matter), PM10, Ozone, Carbon Monoxide, Sulfur Dioxide, and Nitrogen Dioxide levels. This dataset was created using Locational Data from the United States Cities Database and AQI Data from the United States Environmental Protection Agency. The dataset then became available here.
With this dataset, we can look at what areas are the most dangerous, how the air quality changes over time, and how an area’s pollution levels change over time.
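To make the 0-500 scale concrete, here is a minimal sketch that maps an AQI value to a category label. The band boundaries follow the EPA's published AQI categories; the function name is our own and is not part of the dataset.

```python
# Upper bound of each AQI band, paired with the EPA category label.
AQI_BANDS = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (500, "Hazardous"),
]


def aqi_category(aqi):
    """Return the category label for an AQI value between 0 and 500."""
    if not 0 <= aqi <= 500:
        raise ValueError("AQI must be between 0 and 500")
    for upper_bound, label in AQI_BANDS:
        if aqi <= upper_bound:
            return label


for value in (42, 151, 500):
    print(value, aqi_category(value))
```

Grouping readings by category like this is one simple way to compare how often different areas reach unhealthy levels.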
Jul 15, 2022
Health Insurance Prices
The file comes from Aetna's Machine Readable Transparency in Coverage website.
We downloaded the first Aetna Life Insurance Company file:
Plan Name: Aetna CVS Bronze: Low-Cost MinuteClinic Visits- Telehealth- Roanoke- Ped Dental
File Name: 2022-07-01_f42d21fd-3576-4569-b0c7-20253bccc7fe_Aetna-Life-insurance-Company.json.gz
File Type: In Network Rates
Plan ID: 38234VA0180009
Jul 15, 2022
NFL Player Data
With this dataset, we can look at the characteristics of existing (and past) NFL players to have a better understanding of the sport. With this information, we can also predict the positioning or quality of new players. Lastly, this information could be used for fantasy football, by having a better understanding of how players rank in comparison to other players in the same position.
Jun 27, 2022
This dataset provides an in-depth look at the performance of teams in the NHL during the 2020-2021 season. With this information, we can try to predict the outcome of games, or the performance of teams.
Team names and other information are held in a separate sheet here.
Jun 15, 2022
This data was downloaded from the Hass Avocado Board website in May of 2018, compiled into a single CSV, and then became available here. Here's how the Hass Avocado Board describes the data on their website:
The table below represents weekly 2018 retail scan data for National retail volume (units) and price. Retail scan data comes directly from retailers’ cash registers based on actual retail sales of Hass avocados. Starting in 2013, the table below reflects an expanded, multi-outlet retail data set. Multi-outlet reporting includes an aggregation of the following channels: grocery, mass, club, drug, dollar and military. The Average Price (of avocados) in the table reflects a per unit (per avocado) cost, even when multiple units (avocados) are sold in bags. The Product Lookup codes (PLU’s) in the table are only for Hass avocados. Other varieties of avocados (e.g. greenskins) are not included in this table.
The data includes the following columns:
Date - the date of the observation
AveragePrice - the average price of a single avocado
type - conventional or organic
year - the year
Region - the city or region of the observation
Total Volume - total number of avocados sold
4046 - total number of avocados with PLU 4046 sold
4225 - total number of avocados with PLU 4225 sold
4770 - total number of avocados with PLU 4770 sold
With this information, one can better understand the avocado market and the fluctuations within it. This data also details the region in which these avocados are being sold. This allows us to better understand where to get the cheapest avocados.
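One way to look for the cheapest region is sketched below. Because AveragePrice is reported per week, we weight each week by Total Volume rather than taking a plain mean; that weighting is our own choice, not something the Hass Avocado Board prescribes. The column names come from the list above, and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict

# Made-up rows using the column names listed above.
SAMPLE = """Date,AveragePrice,Total Volume,type,Region
2018-01-07,1.00,1000,conventional,Houston
2018-01-14,2.00,3000,conventional,Houston
2018-01-07,1.50,2000,conventional,Boston
"""


def weighted_price_by_region(rows, avocado_type="conventional"):
    """Volume-weighted average price per region for one avocado type."""
    revenue = defaultdict(float)
    volume = defaultdict(float)
    for row in rows:
        if row["type"] != avocado_type:
            continue
        units = float(row["Total Volume"])
        revenue[row["Region"]] += float(row["AveragePrice"]) * units
        volume[row["Region"]] += units
    # Skip regions with no recorded volume to avoid dividing by zero.
    return {
        region: revenue[region] / volume[region]
        for region in volume
        if volume[region] > 0
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
prices = weighted_price_by_region(rows)
cheapest = min(prices, key=prices.get)
print(cheapest, round(prices[cheapest], 2))
```

In this sample a plain mean would tie the two regions at 1.50, while the weighted figure reflects that most Houston sales happened in the expensive week.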
Jun 15, 2022
This dataset includes the type of the report, the ID, class, submission date, headline, year, season, month, state, county, location details, nearest town, nearest road, and other information for each sighting. With all of this, you can get a great sense of where these sightings occur, and more importantly, what is being sighted. This data originated from The Bigfoot Field Researchers Organization, and became available on Kaggle here.
Jun 15, 2022
This data became available for the StockX sneaker data contest in 2019 and was sourced here. Unfortunately, this data only includes the sales of Yeezys and Off-White footwear. That being said, we can apply the knowledge we gained from this dataset to a broader one if we need to. This dataset includes data from 9/1/17 to 2/13/19. It includes order date, brand, sneaker name, retail price, sale price, release date, shoe size, and buyer region.
Jun 15, 2022
This dataset contains all historic open, high, low, close, trading volume, and market cap info for all cryptocurrencies as of the 21st of May, 2018. It includes 1,584 unique cryptocurrencies and over 900,000 observations. With this information, one can get a much deeper understanding of the crypto market. While this is not financial advice, one could certainly use this dataset for more informed trading decisions. We can also use this dataset with others to determine how what's happening in the world affects the prices of crypto.
Jun 15, 2022
DC Hero Appearances
This data includes name, number of appearances, page ID, wiki URL, ID (secret or not), Align (good or bad character), eye color, hair color, sex, first appearance, and more for each character in the DC universe. With this information, you can really understand a lot about any character, or even groups of characters. We could look at only the older characters, or even compare the older ones to the newer ones. We can look at how physical attributes like eye and hair color are used to signify whether a character is good or bad. We can also look at how these attributes change over time. We gathered this data here.
Jun 15, 2022
Data Science Jobs
This dataset includes position title, company name, job description, number of reviews for the company, and location of job. It was sourced from here. With this data, you can answer questions like:
Who gets hired? What kind of talent do employers want when they are hiring a data scientist?
Which location has the most opportunities?
What skills, tools, degrees or majors do employers want the most for data scientists?
What's the difference between data scientist, data engineer and data analyst?
Can you develop an efficient classification algorithm to differentiate the three job types above?
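Before training anything, a keyword rule on the position title gives a baseline to beat for the last question. This is only a sketch: the function name and the example titles are hypothetical, and a real classifier would likely also use the job description.

```python
def label_title(title):
    """Assign one of the three job types from the position title alone."""
    lowered = title.lower()
    for job_type in ("data scientist", "data engineer", "data analyst"):
        if job_type in lowered:
            return job_type
    return "other"  # the title does not mention any of the three roles


# Hypothetical titles for illustration only.
for title in ("Senior Data Scientist", "Data Engineer II", "Marketing Manager"):
    print(title, "->", label_title(title))
```

Any learned model should classify more titles correctly than this rule, otherwise the extra complexity is not paying off.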
Jun 15, 2022
Elden Ring Weapons
The Elden Ring Weapons dataset includes information about all of the available weapons in Elden Ring. The data includes the following breakdown for each weapon:
Name - name of weapon
Type - type of weapon
Phy - physical damage
Mag - magical damage
Fir - fire damage
Lit - lightning damage
Hol - holy damage
Cri - critical damage
Sta - stamina usage
Str - strength scaling
Dex - dexterity scaling
Int - intelligence scaling
Fai - faith scaling
Arc - arcane scaling
Any - special effect damage
Phy - physical blocking damage
Mag - magical blocking damage
Fir - fire blocking damage
Lit - lightning blocking damage
Hol - holy blocking damage
Bst - boost
Wgt - weight of weapon
Upgrade - which stone should be used to upgrade the weapon
This data was collected from Kaggle, an online community of data scientists and machine learning practitioners, and can be found here.
This data is extremely useful for players looking to maximize the quality of their builds. Having the ability to understand all aspects of a weapon, and more importantly to compare weapons to other weapons can really make the difference, especially in such a difficult game. Before going out to search for a new weapon, check this table to ensure the weapon fits your build perfectly.
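As a sketch of that kind of comparison, the snippet below keeps weapons that meet a minimum strength scaling and ranks them by physical damage per unit of weight. It assumes the Str column holds letter grades (S down to E), which may not match the actual file, and the weapon rows are invented examples rather than real game data.

```python
import csv
import io

# Assumed ordering of scaling grades; "-" stands for no scaling.
GRADE_ORDER = {"S": 5, "A": 4, "B": 3, "C": 2, "D": 1, "E": 0, "-": -1}

# Invented rows using the Name, Phy, Wgt, and Str columns described above.
SAMPLE = """Name,Phy,Wgt,Str
Example Greatsword,160,10.0,C
Example Dagger,70,1.5,D
Example Club,100,5.0,B
"""


def best_for_build(rows, min_str_grade):
    """Weapons at or above a strength grade, best damage-to-weight first."""
    threshold = GRADE_ORDER[min_str_grade]
    candidates = [
        row for row in rows
        if GRADE_ORDER.get(row["Str"], -1) >= threshold
        and float(row["Wgt"]) > 0  # skip weightless entries to avoid dividing by zero
    ]
    return sorted(
        candidates,
        key=lambda row: float(row["Phy"]) / float(row["Wgt"]),
        reverse=True,
    )


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for weapon in best_for_build(rows, "C"):
    print(weapon["Name"])
```

Damage per weight is just one possible ranking; swapping the sort key lets you rank by any other column in the table.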
Jun 15, 2022
This data includes daily gold rates from 1st Jan 1985 to 11th Feb 2022. This data was collected from gold.org and then cleaned, becoming publicly available here. With this information, one can have a much better understanding of what is going on in the gold market. We can use this data with other information to understand what events may cause the price of gold to increase or decrease. We can also look at the other currencies/prices to determine the effect of other countries on the gold market. Lastly, we can analyze the rises and falls in price to determine an optimal time to purchase.
Jun 15, 2022
This dataset was built using the Philadelphia Federal Reserve's State Coincident Indices and the Bry-Boschan Method for business cycle dating. It then became available here. In the tradition of Owyang, Piger, et al., business cycles are calculated at the state level, which provides interesting opportunities for analyzing recession timing across different regions or the sectors present in different states. With this information, we could look to predict future recessions, or try to understand why they occurred in the past.
Jun 15, 2022
NBA Player Data
This data includes name, team, position, age, height, weight, college, and salary of all current NBA players from the 2021-2022 season. This data was found here.
With this information, we can look at all sorts of things. We can look at how height or weight can affect a player's salary. We can look at how different teams select players based on physical attributes. We can also look at which schools produce the best players, which schools produce the tallest players, etc. This information can also be compared to that of previous years to understand how the league is changing.
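A first pass at the height and salary question could be a Pearson correlation, sketched here with only the standard library. The heights and salaries are invented numbers for illustration, not values from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / math.sqrt(spread_x * spread_y)


# Invented values: heights in inches, salaries in millions of dollars.
heights_in = [75, 78, 80, 83, 85]
salaries_musd = [2.1, 9.5, 4.0, 12.3, 8.8]
r = pearson(heights_in, salaries_musd)
print(round(r, 3))
```

A value near 1 or -1 would suggest a strong linear relationship, while a value near 0 would suggest height alone says little about salary.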
Jun 15, 2022
This dataset was created via Python using the requests, json, and pandas libraries. The information was pulled on January 16, 2022, and represents all-time information for the top NFT collections. As an example, the Sales column represents all sales under a specified NFT collection from its creation up until January 16, 2022.
The dataset consists of the following information:
Index: The index of the file.
Name: The name of the NFT collection.
Volume: The volume of sales from the NFT collection in Solana (SOL).
Volume_USD: The volume of sales from the NFT collection in United States Dollar (USD).
Market_Cap: The market capitalization—total value of the collection's items in circulation—in Solana (SOL).
MarketCapUSD: The market capitalization—total value of the collection's items in circulation—in United States Dollar (USD).
Sales: The number of sales from the NFT collection.
Floor_Price: The lowest price of any NFT in the collection in Solana (SOL).
FloorPriceUSD: The lowest price of any NFT in the collection in United States Dollar (USD).
Average_Price: The average price of an NFT in the collection in Solana (SOL).
AveragePriceUSD: The average price of an NFT in the collection in United States Dollar (USD).
Owners: The number of owners of NFTs in the collection.
Assets: The number of items in the collection.
OwnerAssetRatio: The ownership percentage of all items in the collection.
Category: The category of the NFT collection.
Website: The associated website of the NFT collection.
Logo: The associated image of the NFT collection.
With this information, we hope to better understand what is going on in the NFT world: what projects are doing, how much they are selling for, how many people are buying them, etc. Understanding aspects of the crypto market can also help us understand where people’s money is going and why. From this, you can learn more about where to invest your money, or even what NFT project to create next!
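The description above says the dataset was built in Python from JSON data. The sketch below shows the general shape of that step, turning a JSON payload into rows with a few of the columns listed above and writing them out as CSV. The payload structure, key names, and collection names are hypothetical, the source API is not specified here, and the sketch uses the standard json and csv modules instead of requests and pandas so that it runs without a network call.

```python
import csv
import io
import json

# Hypothetical payload; the real API response format is not documented here.
PAYLOAD = """
{"collections": [
  {"name": "Example Apes", "volume": 1200.5, "sales": 300, "owners": 150, "assets": 1000},
  {"name": "Example Cats", "volume": 800.0, "sales": 450, "owners": 400, "assets": 500}
]}
"""

# Maps assumed JSON keys to column names used in the dataset.
FIELD_MAP = {
    "name": "Name",
    "volume": "Volume",
    "sales": "Sales",
    "owners": "Owners",
    "assets": "Assets",
}


def to_rows(payload_text):
    """Parse the JSON text and rename keys to the dataset's column names."""
    data = json.loads(payload_text)
    return [
        {FIELD_MAP[key]: item[key] for key in FIELD_MAP}
        for item in data["collections"]
    ]


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(FIELD_MAP.values()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = to_rows(PAYLOAD)
print(to_csv(rows))
```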
Jun 15, 2022
Redfin Home Prices
This data was downloaded here from Redfin. It has weekly data on housing sales by region and includes region ID, region name, region type, period begin and end, duration, total homes sold, median sale price, homes sold year over year, and so much more. With all of this information, we can get a better understanding of what happens in the housing market: where homes are being sold, and which areas have homes selling for the highest and lowest prices. With this, we can determine where the cheapest place to live is. We can also start looking at possible investments, or look for places where a decrease in price may occur.