  • Luciana Obregon

Insider Threat Hunt Series: Part Two, Finding the D-Grunt

This post is part two of Insider Threat Hunt: The Series, a collection of posts that demonstrate how to analyze large synthetic datasets for insider threat patterns. In part one, we showed you how to identify users who attempted to access company information assets after being offboarded. In this post, we analyze another dataset from Carnegie Mellon University's Insider Threat Dataset (available for public download at KiltHub) to identify a user who, before terminating employment with the organization, logged in after hours and uploaded data to wikileaks.org from removable media.



If you would like to follow along, create a free Gigasheet account, then either download a copy of the dataset or access the files directly in Gigasheet via the shared links below, and get hunting.


The Dataset


The dataset used in this demonstration is approximately 8 GB compressed, containing seven (7) data sources. However, for this demonstration, we only use the following four (4):

  • device.csv (view in gigasheet here): records device activity, including power on and off and removable media events

  • http.csv (view in gigasheet here): includes URLs accessed by users during and outside business hours. As indicated in the readme.txt accompanying the dataset, some of the URLs in this file may lead to malicious websites, so be mindful when accessing them (or don't access them at all)

  • logon.csv (view in gigasheet here): contains user logon and logoff activity

  • ldap.csv (view 2010-03 in gigasheet here, and 2010-04 here): holds employee records, including on and offboarding events


The Analysis


We uploaded the device.csv, http.csv, logon.csv, and the multiple (18) LDAP files to Gigasheet. Each LDAP file is named YYYY-MM.csv, where YYYY indicates the year and MM the month the file was generated.


Let's revisit the insider threat scenario: we are looking for a user who, after business hours, uploaded data to wikileaks.org from a USB thumb drive before terminating employment with the company. Hence, the first natural step would be to look for any mention of wikileaks.org in the http.csv file.


In the http.csv file, we can use the built-in search function to see if any rows contain the string wikileaks.org. After a few seconds, Gigasheet returns three (3) matches, which we can step through using the up/down arrows next to the search box or isolate by filtering the URL column for values containing wikileaks. The filter returns three records containing two unique user and PC names:

  • Users: ONS0995 and HCH0089

  • PCs: PC-3585 and PC-2597
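For readers who prefer to script the same search outside Gigasheet, here is a minimal sketch using only the Python standard library. The column names (date, user, pc, url) and the URL values in the sample are assumptions made for illustration; the users, PCs, and timestamps are the ones reported in this post. Swap the in-memory sample for an open file handle to run it against the real http.csv.

```python
import csv
import io

# Tiny in-memory stand-in for http.csv. Column names and URL values are
# assumptions; the last row is invented filler that should not match.
SAMPLE_HTTP = """date,user,pc,url
2010-03-06 01:47:22,ONS0995,PC-3585,http://wikileaks.org/
2010-03-20 01:59:32,ONS0995,PC-3585,http://wikileaks.org/
2010-06-07 14:58:34,HCH0089,PC-2597,http://wikileaks.org/
2010-03-08 09:15:00,EXA0001,PC-0001,http://example.com/news
"""


def find_url_hits(csv_file, needle):
    """Return rows whose url column contains needle (case-insensitive)."""
    needle = needle.lower()
    reader = csv.DictReader(csv_file)
    return [row for row in reader if needle in row["url"].lower()]


hits = find_url_hits(io.StringIO(SAMPLE_HTTP), "wikileaks.org")

# Collapse the matching rows into the unique users and PCs involved.
users = sorted({row["user"] for row in hits})
pcs = sorted({row["pc"] for row in hits})

print(f"{len(hits)} matching rows")
print("Users:", users)
print("PCs:", pcs)
```

Because csv.DictReader streams one row at a time, the same function should cope with a multi-gigabyte file without loading it all into memory, although it will be much slower than an indexed search.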


The scenario notes that the user accessed wikileaks.org after hours, automatically ruling out HCH0089 because this user's access to wikileaks.org was at 14:58 on June 7, 2010.
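The after-hours check can also be expressed in a few lines of Python. This is only a sketch: the dataset does not tell us what the company's business hours are, so the 08:00 to 18:00 window below is an assumption, as is the timestamp format. The three access times are the ones found in http.csv.

```python
from datetime import datetime, time

# Assumed business hours; adjust to match your own definition.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)


def is_after_hours(timestamp):
    """Return True if the timestamp falls outside the assumed business hours."""
    t = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").time()
    return not (BUSINESS_START <= t < BUSINESS_END)


# (user, access time) pairs for the three wikileaks.org hits.
accesses = [
    ("ONS0995", "2010-03-06 01:47:22"),
    ("ONS0995", "2010-03-20 01:59:32"),
    ("HCH0089", "2010-06-07 14:58:34"),
]

# Keep only the users with at least one access outside business hours.
after_hours_users = sorted({user for user, ts in accesses if is_after_hours(ts)})
print(after_hours_users)
```

With this window, only ONS0995 remains, which matches the manual reasoning above.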

Next, we can take two different analysis paths:

  1. Analyze the device.csv file to correlate the time of access to wikileaks.org to a USB device event

  2. Analyze the LDAP files to identify which of the two users terminated employment with the company in the same month or months after accessing wikileaks.org.

Let's start by analyzing the device.csv file for any USB device event that may indicate one of the two users connected or accessed a USB thumb drive around the time wikileaks.org was accessed.

Upon opening the device.csv file, we can filter the PC column for any values containing PC-3585 or PC-2597, resulting in 3,456 matches.




Subsequently, we can filter the DATE column for events dated before wikileaks.org was accessed. From the http.csv file, we learned that the two users accessed wikileaks.org between March 6 and June 7, 2010:

  • User ONS0995 accessed wikileaks.org on two different dates: 2010-03-06 01:47:22 and 2010-03-20 01:59:32

  • User HCH0089 accessed wikileaks.org once on 2010-06-07 14:58:34

Therefore, our DATE filter must return device events before 2010-03-06 and 2010-06-07. The filter returns 654 results, which are still too many to review one by one, so let's look at each user individually, starting with ONS0995.




Grouping by the USER and PC columns returns twelve (12) unique device events for ONS0995 and PC-3585.
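The filter-then-group step can be sketched in Python as follows. The field names (date, user, pc, activity) are assumptions, and the sample is deliberately tiny: the two ONS0995 rows use the insert timestamps discussed in this post, while the other two rows are invented filler, so the counts will not reproduce the numbers Gigasheet reports for the full file.

```python
from collections import Counter

# The two PCs tied to the wikileaks.org hits.
TARGET_PCS = {"PC-3585", "PC-2597"}

# Stand-in for rows read from device.csv (field names are assumptions).
device_rows = [
    {"date": "2010-03-06 01:47:08", "user": "ONS0995", "pc": "PC-3585", "activity": "insert"},
    {"date": "2010-03-20 01:47:28", "user": "ONS0995", "pc": "PC-3585", "activity": "insert"},
    # Invented filler rows, for illustration only.
    {"date": "2010-03-09 10:02:11", "user": "HCH0089", "pc": "PC-2597", "activity": "insert"},
    {"date": "2010-03-09 11:30:00", "user": "EXA0001", "pc": "PC-0001", "activity": "insert"},
]

# Step 1: keep only events on the PCs of interest.
matching = [row for row in device_rows if row["pc"] in TARGET_PCS]

# Step 2: count events per (user, pc) pair, like a group-by with a row count.
events_per_pair = Counter((row["user"], row["pc"]) for row in matching)

for (user, pc), count in sorted(events_per_pair.items()):
    print(user, pc, count)
```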


In the http.csv file, we could see that user ONS0995 first accessed wikileaks.org on March 6, 2010, at 01:47:22. The device.csv file shows USB activity of type "insert" on March 6, 2010, at 01:47:08, a few seconds before the user accessed wikileaks.org. Similarly, the second (and last) USB device event of type "insert" took place on March 20, 2010, at 01:47:28, approximately 12 minutes before the user accessed wikileaks.org for the second time.
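To make the timing relationship explicit, the short sketch below pairs each wikileaks.org access with the most recent USB insert that preceded it and prints the gap between the two. The timestamps are the ones quoted above; the helper name and the timestamp format are our own assumptions.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def parse(ts):
    return datetime.strptime(ts, FMT)


# USB "insert" events and wikileaks.org accesses for ONS0995 on PC-3585.
inserts = [parse("2010-03-06 01:47:08"), parse("2010-03-20 01:47:28")]
accesses = [parse("2010-03-06 01:47:22"), parse("2010-03-20 01:59:32")]


def gap_to_preceding_insert(access, insert_times):
    """Return the time elapsed since the latest insert at or before access."""
    earlier = [t for t in insert_times if t <= access]
    if not earlier:
        return None  # no insert happened before this access
    return access - max(earlier)


gaps = [gap_to_preceding_insert(a, inserts) for a in accesses]

for access, gap in zip(accesses, gaps):
    print(access, "came", gap, "after the preceding USB insert")
```

The first gap works out to 14 seconds and the second to 12 minutes and 4 seconds, consistent with the figures in the paragraph above.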



The absence of USB device events for ONS0995 after March 20, 2010, may suggest that the user left the company in or after March, but we cannot draw that conclusion until we analyze the LDAP dataset.


In the first post of Insider Threat Hunt: The Series, we indicated that each LDAP file contains a list of active users at the end of the particular month. We also mentioned that the LDAP file for a specific month is generated at the end of that month; therefore, users who end employment in the middle of a month will be included in the previous month's LDAP file but not in the LDAP file for the month the employment ended. For example, users departing in June will appear in May's LDAP file but not in June's.
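This month-over-month rule amounts to a set difference: anyone listed in one month's LDAP file but missing from the next month's file left during that next month. Here is a minimal sketch with made-up records; the column names (employee_name, user_id, role) and every user in the sample are hypothetical.

```python
import csv
import io

# Hypothetical LDAP snapshots for two consecutive months.
MAY_LDAP = """employee_name,user_id,role
Example Employee One,EXA0001,Engineer
Example Employee Two,EXA0002,Director
Example Employee Three,EXA0003,Analyst
"""

JUNE_LDAP = """employee_name,user_id,role
Example Employee One,EXA0001,Engineer
Example Employee Three,EXA0003,Analyst
"""


def user_ids(csv_file):
    """Return the set of user IDs listed in one LDAP snapshot."""
    return {row["user_id"] for row in csv.DictReader(csv_file)}


may_users = user_ids(io.StringIO(MAY_LDAP))
june_users = user_ids(io.StringIO(JUNE_LDAP))

# Present at the end of May, absent at the end of June: departed in June.
departed_in_june = may_users - june_users
print(sorted(departed_in_june))
```

Running the same comparison across each pair of consecutive YYYY-MM.csv files would give a departure month for every user who left, which can then be checked against the users of interest.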


Let's start by analyzing the LDAP file for March 2010. Searching for ONS0995 in the 2010-03.csv file returns one match, a director named Otto Nero Schwartz, revealing that the user was still with the company in March.



If our hypothesis is correct and the user did leave the company in March 2010, a search for ONS0995 in the 2010-04.csv file (April LDAP file) should not return a match.


As illustrated below, the user does not show up in the LDAP report for April 2010, confirming our hypothesis.