This post demonstrates how to analyze network security data using a dataset from the 2019 TrendMicro CTF Wildcard 400 challenge. We have shared this file publicly in Gigasheet here so that you can follow along.
This TrendMicro Capture The Flag exercise involves using network flow data to uncover anomalous security events. The data presented here is synthetic and does not represent typical network protocols or behavior. As a result, an extensive understanding of network protocols is not required to answer these questions.
Network flow datasets are often large enough to overwhelm many tools, but Gigasheet can handle up to 1 billion rows of data in a single file.
The gowiththeflow_20190826.csv file does not originally come with header names. If your file does not have header names, you can add them in Gigasheet for easier analysis. Headers help keep your data organized. We can add header names by clicking the three lines on the far right and selecting rename.
Adding Header Names
I named the headers Time Frame, IP.SRC, IP.DST, PORTS, and BYTES SENT.
Added Header Names
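For readers who want to reproduce this step outside Gigasheet, here is a minimal Python sketch of attaching the same header names to a headerless CSV. The sample rows are made up for illustration and are not taken from the real file, and the `read_flows` helper is a hypothetical name.

```python
import csv
import io

# Header names used in this post; the raw CSV has no header row.
HEADERS = ["Time Frame", "IP.SRC", "IP.DST", "PORTS", "BYTES SENT"]

# Made-up sample rows standing in for gowiththeflow_20190826.csv.
SAMPLE = """\
1566777600000,10.0.0.5,192.0.2.10,443,1200
1566781200000,10.0.0.7,198.51.100.3,80,560
"""

def read_flows(handle):
    """Read a headerless flow CSV into a list of dicts keyed by HEADERS."""
    rows = []
    for row in csv.DictReader(handle, fieldnames=HEADERS):
        # Convert the numeric columns so they can be sorted and summed later.
        row["PORTS"] = int(row["PORTS"])
        row["BYTES SENT"] = int(row["BYTES SENT"])
        rows.append(row)
    return rows

flows = read_flows(io.StringIO(SAMPLE))
print(flows[0]["IP.SRC"], flows[0]["BYTES SENT"])
```

To run this against the real file, you would pass an open file handle, for example `open("gowiththeflow_20190826.csv", newline="")`, instead of the in-memory sample.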
Now that we have the headers named, we can answer some questions from TrendMicro's 2019 Capture the Flag exercise.
Q1: Our intellectual property is leaving the building in large chunks. A machine inside is being used to send out all of our widget designs. One host sends out much more data from the enterprise than the others. What is its IP?
To find the IP address, I used the grouping feature on the IP.SRC column. You can do this by clicking the three lines on the right of the column and selecting group.
I identified 18.104.22.168 as the bad IP. Looking at the distribution of outbound traffic, this host stands out: it is the only one whose total stayed under 50 GB, apparently in an attempt not to seem suspicious.
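The grouping step amounts to summing bytes sent per source IP and looking at the largest totals. A minimal sketch of that idea in Python follows; the records are made up for illustration and `bytes_by_source` is a hypothetical helper, not part of Gigasheet.

```python
from collections import Counter

# Made-up (IP.SRC, BYTES SENT) pairs for illustration only.
flows = [
    ("10.0.0.5", 1200),
    ("10.0.0.7", 560),
    ("10.0.0.5", 9000),
    ("10.0.0.9", 300),
]

def bytes_by_source(records):
    """Sum bytes sent per source IP, like grouping on IP.SRC."""
    totals = Counter()
    for src, sent in records:
        totals[src] += sent
    return totals

totals = bytes_by_source(flows)

# most_common(1) returns the source with the largest total.
top_ip, top_bytes = totals.most_common(1)[0]
print(top_ip, top_bytes)
```

Looking at the top few entries of `totals.most_common()` rather than just the first gives a rough view of the distribution, which is what the grouped view shows.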
Q2: Another attacker has a job scheduled that exports the contents of our internal wiki. One host is sending out much more data during off-hours from the enterprise than the others. Office hours are between 16:00 and 23:00. What is its IP?
I first sorted the BYTES SENT column in descending order by selecting Sort Sheet - 9 to 1.
Sort Sheet 9 to 1
This showed me the highest BYTES SENT value. I also used the aggregations feature at the bottom right, under the BYTES SENT column, and selected max to confirm the highest number of bytes sent.
It is clear that the host with IP 22.214.171.124 is exfiltrating data during off-business hours, and the highest number of bytes sent is 843091.
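One way to cross-check an answer like this is to keep only off-hours flows and then total bytes per source. The sketch below assumes the Time Frame column holds Unix timestamps in milliseconds interpreted as UTC, which is an assumption and not something stated in this post. It also treats 23:00 itself as off-hours, which is a boundary choice. The records and helper names are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

OFFICE_START, OFFICE_END = 16, 23  # office hours from the question: 16:00 to 23:00
BASE_MS = 1566777600000            # an arbitrary midnight (UTC) used to build sample data

def at_hour(hour):
    """Build a made-up millisecond timestamp at the given UTC hour."""
    return BASE_MS + hour * 3_600_000

def is_off_hours(ts_ms):
    """True if the timestamp is outside office hours (assumes epoch ms, UTC)."""
    hour = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).hour
    return not (OFFICE_START <= hour < OFFICE_END)

# Made-up (Time Frame, IP.SRC, BYTES SENT) records for illustration only.
flows = [
    (at_hour(2), "10.0.0.5", 5000),    # off-hours
    (at_hour(17), "10.0.0.5", 90000),  # office hours
    (at_hour(3), "10.0.0.7", 40000),   # off-hours
    (at_hour(23), "10.0.0.7", 2000),   # off-hours (23:00 is past closing)
    (at_hour(18), "10.0.0.9", 70000),  # office hours
]

off_hours_totals = Counter()
for ts_ms, src, sent in flows:
    if is_off_hours(ts_ms):
        off_hours_totals[src] += sent

top_ip, top_bytes = off_hours_totals.most_common(1)[0]
print(top_ip, top_bytes)
```

Filtering before summing matters here: in the sample, 10.0.0.5 sends the most data overall, but 10.0.0.7 sends the most outside office hours.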
Q3: We're always running a low-grade infection; some internal machines will always have some sort of malware. Some of these infected hosts phone home to C&C on a private channel. What unique port is used by external malware C&C to marshal its bots?
Using the filter feature, I filtered the PORTS column to port 113.
I found that only one external IP address (126.96.36.199) communicates with internal machines on port 113.
This is most likely the port used by external malware C&C (command and control) to marshal its bots.
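If you do not already know which port to filter on, one rough way to find candidates is to count the distinct peer addresses seen on each port and keep the ports with exactly one. The sketch below assumes the external C&C address appears in the IP.DST column, which is an assumption about the data layout. The records are made up and `single_peer_ports` is a hypothetical helper.

```python
from collections import defaultdict

# Made-up (IP.DST, PORTS) pairs for illustration only.
flows = [
    ("192.0.2.10", 443),
    ("198.51.100.3", 443),
    ("192.0.2.10", 80),
    ("203.0.113.8", 80),
    ("203.0.113.66", 113),
    ("203.0.113.66", 113),
]

def single_peer_ports(records):
    """Return {port: ip} for ports where only one destination IP appears."""
    peers = defaultdict(set)
    for dst, port in records:
        peers[port].add(dst)
    return {port: next(iter(ips)) for port, ips in peers.items() if len(ips) == 1}

candidates = single_peer_ports(flows)
print(candidates)
```

On real data this may return several ports, so each candidate would still need a manual look like the port 113 filter above.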
Gigasheet makes this kind of analysis easy on any CSV, JSON, PCAP, or EVTX file.
Sign up at gigasheet.co so you can try this on your own too.