A few weeks ago, a client approached us for advice on how to accurately determine his company’s level of exposure to COVID-19-related web scams. As it turns out, a large number of his employees had fallen prey to a COVID-19 phishing campaign that purported to originate from the Centers for Disease Control and Prevention (CDC). The phishing email notified recipients that someone they had been in contact with had tested positive for COVID-19 and urged them to click on a link to get recommendations on how to self-isolate and get tested. While some employees quickly realized that the email was fraudulent, others fell victim to the phishing scam. Luckily for this company, and thanks to the scammer’s lack of attention to detail, the link inside the email was misspelled and returned a “page not found” error when clicked.
We quickly started thinking of different ways to help our client address his request. We began our quest at the web proxy server used at the client’s site to control and monitor all end-user Internet traffic. Our initial thought was to check the proxy server’s URL category database to determine if a URL category for COVID-related websites existed. But this would have been too easy, wouldn’t it? Most of the COVID-related websites were categorized as Health, Placeholder, or Security Risk. We therefore focused our attention on websites in the Placeholder and Security Risk categories, since most of the websites categorized as Health were probably legitimate. We then extracted six months’ worth of logs from the web proxy server and ran into the first issue: the CSV file exported from the web proxy was 11.77 GB, and it crashed every computer on which we tried to open it in Excel.
We abandoned Excel and tried our luck with a Linux terminal. Using a variety of Linux commands, we opened the CSV file, filtered on COVID-related keywords using grep, and saved the output to a new file in hopes of getting a small enough sample to open in Excel. We quickly realized that this was not going to work for various (somewhat obvious) reasons:
Realizing that our approach was leading to utter frustration, we decided to obtain a list of ~1,800 fraudulent COVID websites from a popular threat intelligence provider (in this case, ThreatConnect). Our idea was to extract the past thirty days’ worth of logs from the web proxy server to get a small enough CSV file that could be opened in Excel and then write a search formula that would look for the 1,800 fraudulent COVID websites in the web proxy logs. We were now getting somewhere, but the results were only for the past thirty days, which did not paint an accurate picture of the level of exposure to COVID-related scams. We repeated the process five more times, each time extracting thirty days’ worth of logs from the web proxy until we covered the past six months.
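For readers who would rather script this correlation than repeat it in Excel, the matching step can be sketched in Python. This is a minimal sketch, not the process we actually followed: it assumes the proxy export has a header row with a column named `url` (a hypothetical name; use whatever your proxy emits) and that the threat intelligence list is plain text with one domain per line. Because rows are read one at a time, the size of the export does not matter.

```python
import csv
from urllib.parse import urlsplit


def load_indicators(lines):
    """Build a set of lower-cased domains from an iterable of text lines.

    Blank lines and lines starting with '#' are skipped.
    """
    cleaned = (line.strip().lower() for line in lines)
    return {domain for domain in cleaned if domain and not domain.startswith("#")}


def host_of(url):
    """Return the lower-cased hostname of a URL, tolerating a missing scheme."""
    if "://" not in url:
        url = "//" + url  # lets urlsplit treat the leading part as the host
    return (urlsplit(url).hostname or "").lower()


def find_hits(rows, indicators, url_field="url"):
    """Yield rows whose hostname, or any parent domain of it, is an indicator.

    `rows` is any iterable of dicts, e.g. a csv.DictReader.
    `url_field` is an assumed column name, not a proxy standard.
    """
    for row in rows:
        host = host_of(row.get(url_field) or "")
        labels = host.split(".")
        # "a.b.example" is checked as "a.b.example", "b.example", "example".
        candidates = {".".join(labels[i:]) for i in range(len(labels))}
        if candidates & indicators:
            yield row


def scan_file(log_path, indicators, url_field="url"):
    """Stream a CSV log from disk and yield matching rows one at a time."""
    with open(log_path, newline="", encoding="utf-8", errors="replace") as handle:
        yield from find_hits(csv.DictReader(handle), indicators, url_field)
```

Calling `scan_file` on the full six-month export with the set returned by `load_indicators` would yield every request to a listed domain or one of its subdomains in a single pass. The set lookup keeps the cost per row roughly constant, whether the list holds 1,800 entries or many more.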
The process of extracting and correlating log data with threat intelligence could have been a lot simpler and quicker had the client deployed a Security Information and Event Management (SIEM) platform with threat intelligence capabilities. But having worked with small to mid-sized clients for a while now, I’ve realized that it is not always possible for small companies to invest in enterprise-grade security tools and applications, because they would rarely see a return on that investment.
Now, imagine there was a tool that allowed you to upload large CSV files, quickly separate text into columns, and integrate with your favorite threat intelligence feed to enrich and contextualize your data, all while being affordable. Lucky for us, there is!
Gigasheet is more than just a billion-row spreadsheet. It is a billion-row spreadsheet purpose-built for security practitioners to solve everyday security problems. Gigasheet’s threat intelligence integration capability allows you to connect your spreadsheet to your favorite threat intelligence provider. All you have to do is get an API token from your provider and select the columns in your spreadsheet that you want to enrich with threat intelligence data. Gigasheet does the rest, producing fast and accurate results.