  • Luciana Obregon

Large PCAP File Analysis 101 with Gigasheet, GreyNoise, and Google

Imagine it's your first day on the job as a junior security analyst and your assignment is to analyze a large packet capture (PCAP) file that was collected from a monitoring port configured on one of the core switches at a remote site. The company you work for has not made significant investments in security technology, so you don't have a lot of enterprise-grade tools in your arsenal to begin your assignment. All you know is that an employee’s machine has been behaving abnormally for a few days but the antivirus software running on the employee’s computer has not detected any malicious files or programs. The employee does not want to have his computer re-imaged because he is concerned about losing important files, so your boss decided to collect network traffic from the local area network where the employee works to try to identify the root cause of the problem.

Your boss asks you to begin analyzing the packet capture, which he stored in your team’s network file share. You scan the file share, find the PCAP file, move it to your laptop, and open it with Wireshark. You are ready to put your packet analysis skills to work when suddenly Wireshark crashes. You try several more times to open the file, but your computer keeps freezing and crashing. You conclude that the file is too large to open with Wireshark and make your way to Google to look for alternatives. You learn about Tshark, the command-line version of Wireshark.

You could hunt around for the right commands to read the PCAP file using Tshark, but the results will be endless lines of text on your screen. Tshark filters can help you make sense of the data, but by now you've spent several hours just trying to open and analyze a large PCAP file.

Luckily for all you junior security analysts out there, there is a simpler way to analyze packet captures, one that does not require learning complicated command-line tools and syntax. Any security analyst, regardless of their level of experience, can apply these techniques.

In this blog post, we will show you how to analyze large PCAP files using Gigasheet, the big data spreadsheet built for cybersecurity, and GreyNoise, a provider of internet-wide scan and attack data. Here we illustrate the power of Gigasheet by analyzing a sample packet capture file from Stratosphere Lab, which contains network traffic associated with malware.

Step 1: Convert PCAP file to CSV

UPDATE: Gigasheet now supports raw PCAP file analysis! Upload a big PCAP and Gigasheet will extract some standard fields from it into a clean sheet.

Gigasheet allows you to upload and analyze huge CSV and log files (you can request your free account here). The first step in the process is to convert the PCAP file to CSV format. In this example, we use Tshark to export all packets in a 274 MB PCAP file, gigasheet.pcap, into a CSV file.

The Tshark commands below read the gigasheet.pcap file and write the packet number, timestamp, source and destination IP addresses, protocol, length, and other application-layer (OSI Layer 7) information to the gigasheet-csv.csv file.
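The exact commands from the original post are not reproduced here, so the following is only a minimal sketch of what such an export could look like, wrapped in Python so the argument list is easy to inspect. The field list is an assumption chosen to match the seven columns described in this post; the `-r`, `-T fields`, `-e`, and `-E` options are standard Tshark options, but the `_ws.col.*` column field names can differ between Wireshark versions, so check them against your installation. The helper names `build_tshark_command` and `export_csv` are hypothetical.

```python
import shlex
import subprocess

# Assumed field list, picked to mirror the seven columns described in the post.
# The _ws.col.* names refer to Wireshark's Protocol and Info columns and may
# be spelled differently depending on the Wireshark/Tshark version.
FIELDS = [
    "frame.number",         # packet number
    "frame.time_relative",  # seconds since the beginning of the capture
    "ip.src",               # source IP address
    "ip.dst",               # destination IP address
    "_ws.col.Protocol",     # protocol column
    "frame.len",            # packet length
    "_ws.col.Info",         # info column
]


def build_tshark_command(pcap_path):
    """Return the Tshark argument list that prints the chosen fields as CSV."""
    cmd = ["tshark", "-r", pcap_path, "-T", "fields"]
    for field in FIELDS:
        cmd += ["-e", field]
    # Header row, comma separator, double-quoted values, first occurrence only.
    cmd += ["-E", "header=y", "-E", "separator=,", "-E", "quote=d", "-E", "occurrence=f"]
    return cmd


def export_csv(pcap_path, csv_path):
    """Run Tshark (must be installed and on PATH) and write its output to csv_path."""
    with open(csv_path, "wb") as out:
        subprocess.run(build_tshark_command(pcap_path), stdout=out, check=True)


if __name__ == "__main__":
    # Print the equivalent shell command instead of running it.
    print(shlex.join(build_tshark_command("gigasheet.pcap")) + " > gigasheet-csv.csv")
```

Calling `export_csv("gigasheet.pcap", "gigasheet-csv.csv")` would perform the conversion on a machine where Tshark is installed.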

The resulting CSV file is 208 MB and contains over 2 million rows!

Step 2: Upload CSV File to Gigasheet

The next step is to log in to Gigasheet, upload the CSV file, and begin analyzing the data. We do not know much about the specific malware contained within the PCAP file. All we know is that the file contains traffic associated with malware, but we don’t know the malware type, ports, or protocols used to communicate outbound, or the IP address(es) of the infected system(s).

Upon uploading and processing the CSV file, Gigasheet displays seven columns:

- Column A: Packet number

- Column B: Timestamp

- Column C: Source IP address

- Column D: Destination IP address

- Column E: Protocol

- Column F: Packet length

- Column G: Information

Gigasheet makes it easy to convert time to different formats, such as Coordinated Universal Time (UTC). By default, Wireshark displays each timestamp as the number of seconds elapsed since the beginning of the capture. We therefore need to normalize the time displayed in Column B, Timestamp, which Gigasheet can do using the Time Cleanup function.
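To make the normalization concrete, here is a small sketch of the arithmetic involved: a relative timestamp becomes an absolute UTC time once you add it to the capture's start time. This is not how Gigasheet's Time Cleanup function is implemented; the function name `relative_to_utc` and the start time used below are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def relative_to_utc(capture_start, seconds_since_start):
    """Convert a 'seconds since start of capture' value to an absolute UTC time."""
    if capture_start.tzinfo is None:
        # A naive datetime would leave the time zone ambiguous, so refuse it.
        raise ValueError("capture_start must be timezone-aware")
    absolute = capture_start + timedelta(seconds=seconds_since_start)
    return absolute.astimezone(timezone.utc)


# Hypothetical capture start time; in practice it comes from the capture itself.
start = datetime(2021, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(relative_to_utc(start, 90.5).isoformat())
```

The same idea applies to every row in Column B: one known start time plus the per-packet offset.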

Gigasheet allows you to enrich data with intelligence from popular threat intelligence providers. In this example, we'll use GreyNoise (you can sign up for a free API token here). We'll run the enrichment feature on Column D, the destination IP addresses, to identify any IPs that GreyNoise has previously observed as malicious, noisy, or suspicious.
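Gigasheet runs the enrichment for you, but it can help to see how many distinct addresses are actually involved. The sketch below is an illustration only, not part of the Gigasheet workflow: it collects the unique, publicly routable destination addresses from CSV rows. The column name `ip.dst` and the function name `unique_public_ips` are assumptions, and the sample rows are made up.

```python
import csv
import io
import ipaddress


def unique_public_ips(rows, column="ip.dst"):
    """Return unique, globally routable IPs from the given column, in first-seen order."""
    seen = set()
    result = []
    for row in rows:
        value = (row.get(column) or "").strip()
        try:
            addr = ipaddress.ip_address(value)
        except ValueError:
            continue  # empty or malformed value (e.g. a non-IP packet)
        if addr.is_global and value not in seen:
            seen.add(value)
            result.append(value)
    return result


# Made-up sample rows; a real export would have millions of them.
sample = io.StringIO(
    "ip.src,ip.dst\n"
    "192.168.1.10,8.8.8.8\n"
    "192.168.1.10,192.168.1.1\n"
    "192.168.1.10,8.8.8.8\n"
    "192.168.1.10,1.1.1.1\n"
    "192.168.1.10,\n"
)
print(unique_public_ips(csv.DictReader(sample)))
```

Private addresses such as 192.168.1.1 are skipped here because an internet-wide data source has no visibility into them.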

After the enrichment finishes, Gigasheet creates a new column, H, which contains the GreyNoise response. We'll use the Group feature on Column H to bucket the data. Here we see three unique values:

- Invalid

- Never_observed

- Noise_ol
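Grouping a column amounts to counting how many rows share each distinct value. As a rough sketch of that idea (not Gigasheet's implementation), the snippet below buckets a list of responses with `collections.Counter`. The labels are copied as they appear in the list above; the rows and counts are made up, and `group_counts` is a hypothetical helper.

```python
from collections import Counter


def group_counts(values):
    """Return (value, row_count) pairs, most frequent first."""
    return Counter(values).most_common()


# Made-up sample of Column H values; the real sheet has over 2 million rows.
sample = ["Never_observed", "Invalid", "Never_observed", "Noise_ol", "Never_observed"]
for label, count in group_counts(sample):
    print(f"{label}: {count}")
```

Sorting by count makes the small, interesting buckets easy to spot next to the dominant one.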