  • Syed Hasan

Incident Response in the AWS Cloud

Amazon Web Services (AWS) is one of the most dominant public cloud platforms to date. As cloud migration continues, traditional incident response operations also migrate to the cloud. Today, we’ll explore a security incident in the AWS cloud by utilizing Gigasheet for log aggregation and data analysis.


Although Cloud Incident Response poses its own set of challenges, AWS has several services to assist responders during operations. Let’s dive right into our simulated incident, where we’ll see most of these services in action, and then put Gigasheet to good use.


Performing Incident Response in AWS Infrastructure

Here, I’ve simulated an environment in my test AWS infrastructure: a single web server running in the North Virginia (us-east-1) region. It runs in the default VPC with a route out to the internet and hosts a WordPress instance set up by the account’s administrator.


Simple, right? It was deployed in seconds using a CloudFormation template (you can find it here).


Now, the administrator appears to have tweaked the server to make it a bit easier to manage. On the bright side, the administrator has good logging enabled: VPC flow logs and CloudTrail logs are routed to an S3 bucket, and a CloudWatch alarm with SNS notifications monitors new instances being launched in North Virginia. Whew, that’s a relief.
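The article doesn’t show how the new-instance alert is wired up. One common approach, and only an assumption here, is an Amazon EventBridge (formerly CloudWatch Events) rule that matches EC2 state-change events and sends them to an SNS topic. The sketch below builds such an event pattern and adds a tiny local matcher for trying it against sample events. The matcher only understands the exact-value lists used here, not EventBridge’s full pattern syntax, and the instance ID is hypothetical.

```python
import json

# Event pattern for EC2 instances entering the "running" state.
# Assumption: the alert is an EventBridge rule; the article only mentions CloudWatch + SNS.
NEW_INSTANCE_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running"]},
}


def matches(pattern, event):
    """Simplified matcher: every pattern key must exist in the event, nested dicts
    recurse, and leaf lists mean 'the event value must be one of these'.
    Real EventBridge also supports prefix, numeric, anything-but, and more."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "region": "us-east-1",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "running"},  # hypothetical ID
}

# This JSON is what you would paste into the rule's event pattern.
print(json.dumps(NEW_INSTANCE_PATTERN, indent=2))
print(matches(NEW_INSTANCE_PATTERN, sample_event))
```

EventBridge rules are regional, so a rule created in us-east-1 only sees launches in North Virginia; you would need a similar rule in every region you want to watch.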


Identification

One not-so-fine evening, the administrator sees an email pop up that doesn’t look right. It’s the CloudWatch alarm he set up to monitor newly launched instances: a new instance has been launched in North Virginia (us-east-1). The only problem is, he didn’t launch it. How did it happen?!

SNS Notification via Email - A New Instance is Launched

Springing into action, the administrator dials in someone who could help - the “responder”.


Containment

At this point, the responder knows only that an instance was launched in the North Virginia region at 15:48:02 UTC on 2022-03-22. Whether this is actually an incident depends on further analysis. The administrator also had several handy alarms in place to monitor suspicious activity, but none of them triggered.


Assuming no other suspicious activity happened in the victim account, let’s scope our incident to the instance itself. There are several things we can do with it:


● Hibernate the instance to preserve the contents of its RAM (only possible if hibernation was enabled when the instance was launched)

Hibernate Instances in AWS

● Snapshot the instance’s EBS (root) volume

Snapshot EBS Volumes in AWS

● Isolate the instance using security groups to prevent further damage (NACLs aren’t recommended, as they apply to an entire subnet and would also affect other instances in the same subnet as the suspect instance)
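The containment steps above can be scripted. Below is a minimal sketch, assuming boto3’s EC2 client is passed in as `ec2` (for example `boto3.client("ec2", region_name="us-east-1")`) and that `isolation_sg_id` is a hypothetical, pre-created security group with no inbound or outbound rules. It isolates first to cut off access quickly, then snapshots every attached EBS volume, then hibernates if the instance supports it.

```python
def contain_instance(ec2, instance_id, isolation_sg_id, hibernate=True):
    """Isolate, snapshot, and optionally hibernate a single EC2 instance.

    Assumptions: `ec2` is a boto3 EC2 client (or anything exposing the same methods),
    and `isolation_sg_id` is a hypothetical empty security group in the instance's VPC.
    """
    instance = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"][0]["Instances"][0]

    # 1. Isolate first: replace every security group on the instance with the empty one.
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[isolation_sg_id])

    # 2. Preserve disk evidence: snapshot every attached EBS volume, root included.
    snapshot_ids = []
    for mapping in instance.get("BlockDeviceMappings", []):
        if "Ebs" not in mapping:
            continue
        snapshot = ec2.create_snapshot(
            VolumeId=mapping["Ebs"]["VolumeId"],
            Description=f"IR evidence: {instance_id} {mapping['DeviceName']}",
        )
        snapshot_ids.append(snapshot["SnapshotId"])

    # 3. Preserve memory: hibernation writes RAM to the root volume, but it only
    #    works if hibernation was enabled when the instance was launched.
    can_hibernate = instance.get("HibernationOptions", {}).get("Configured", False)
    hibernated = hibernate and can_hibernate
    if hibernated:
        ec2.stop_instances(InstanceIds=[instance_id], Hibernate=True)

    return {"snapshots": snapshot_ids, "hibernated": hibernated}
```

Because these snapshots are taken before hibernation, they won’t contain the RAM contents that hibernation writes to the root volume; snapshot the root volume again once the instance has stopped if you want that. Also note that swapping security groups may not immediately drop connections that are already being tracked, so verify that existing sessions are actually cut off.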


Now that we’ve taken initial actions to scope and isolate the incident, let’s quickly move on to identifying the root cause of the event.


Root Cause Analysis

There are several possible explanations for what led to the event. Here are a few initial hypotheses we can use to pivot further into this potential compromise:



● Was the root user compromised?

● Was the administrator’s account compromised?

● Were there misconfigurations in the account which led to the creation of the instance?

● Were any auto-scaling groups actively used to spawn instances for scaling?


Some of these are fairly easy to answer. To check whether the root or administrator user launched the instance, we can simply look up their last activity in the Identity and Access Management (IAM) service. Luckily, the last activity of both accounts is accounted for.
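The same check can be done offline with the IAM credential report, a CSV you can download from the IAM console or via the GenerateCredentialReport and GetCredentialReport APIs; the root user appears in it as `<root_account>`. The sketch below, assuming the standard report columns, lists credentials whose last recorded use falls near the launch time.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Launch time from the SNS alert (2022-03-22 15:48:02 UTC).
INCIDENT_TIME = datetime(2022, 3, 22, 15, 48, 2, tzinfo=timezone.utc)

# Credential-report columns that record when a credential was last used.
LAST_USED_COLUMNS = (
    "password_last_used",
    "access_key_1_last_used_date",
    "access_key_2_last_used_date",
)


def parse_report_time(value):
    """Return an aware UTC datetime, or None for placeholders like 'N/A' or 'no_information'."""
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except (AttributeError, ValueError):
        return None
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def activity_near(report_csv, incident_time, window):
    """List (user, column, timestamp) for credentials last used within `window` of the incident."""
    hits = []
    for row in csv.DictReader(io.StringIO(report_csv)):
        for column in LAST_USED_COLUMNS:
            used_at = parse_report_time(row.get(column))
            if used_at is not None and abs(used_at - incident_time) <= window:
                hits.append((row["user"], column, used_at))
    return hits
```

The report only stores the most recent use of each credential, so a later legitimate login can hide an earlier malicious one. Treat an empty result as a hint, not proof, and confirm it in CloudTrail.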


Lastly, there seems to be no auto-scaling group actively launching instances in the North Virginia region. Well, that’s a bummer.
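To rule out auto scaling for a specific instance, one option (a sketch, not the author’s documented method) is to check whether the instance appears in any group returned by `aws autoscaling describe-auto-scaling-groups`, and whether it carries the `aws:autoscaling:groupName` tag that Auto Scaling adds to the instances it launches. Both functions below work on the JSON output of those calls; the instance IDs in the example are hypothetical.

```python
def find_asg_for_instance(asg_response, instance_id):
    """Return the name of the Auto Scaling group containing `instance_id`, or None.

    `asg_response` is the parsed JSON of describe-auto-scaling-groups.
    """
    for group in asg_response.get("AutoScalingGroups", []):
        if any(member.get("InstanceId") == instance_id for member in group.get("Instances", [])):
            return group["AutoScalingGroupName"]
    return None


def asg_tag(instance):
    """Return the aws:autoscaling:groupName tag value from a describe-instances
    Instance entry, or None if the instance was not launched by Auto Scaling."""
    for tag in instance.get("Tags", []):
        if tag.get("Key") == "aws:autoscaling:groupName":
            return tag.get("Value")
    return None
```

In larger accounts, describe-auto-scaling-groups is paginated, so make sure you collect every page before concluding that no group owns the instance.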


This points to a misconfiguration in a service or account that likely led to the compromise. How can we confirm that? CloudTrail is your best bud here (if you’re looking for details on what the service is, I’ve got you covered in the Continuous Monitoring section below).


Continuing with CloudTrail, let’s fetch the logs from the S3 bucket and download them all to disk. Next, we’ll merge them into a single file that we can upload to Gigasheet. You can use this script to do so:



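import glob
import gzip
import json

OUTPUT_FILE = "CloudTrailLogs.json"

# CloudTrail delivers gzip-compressed files (*.json.gz) to S3; already-extracted *.json files work too.
# recursive=True also picks up files inside the AWSLogs/... folders created by a bucket download.
ctLogFiles = [
    f
    for f in glob.glob("**/*.json", recursive=True) + glob.glob("**/*.json.gz", recursive=True)
    if f != OUTPUT_FILE  # don't re-ingest the merged output on a second run
]

allRecords = []
for f in ctLogFiles:
    opener = gzip.open if f.endswith(".gz") else open
    with opener(f, "rt", encoding="utf-8") as infile:
        # Each log file holds a top-level "Records" list; files without one (e.g. digests) add nothing
        allRecords.extend(json.load(infile).get("Records", []))

# json.dump writes straight to the file (it returns None, so nothing else needs writing)
with open(OUTPUT_FILE, "w", encoding="utf-8") as finalJsonFile:
    json.dump({"Records": allRecords}, finalJsonFile)

print(f"Merged {len(allRecords)} records from {len(ctLogFiles)} files into {OUTPUT_FILE}")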


Let’s upload the JSON file to Gigasheet. Simply log in to your account, head over to the Your Files page, and click Upload. Select your JSON file and that’s it. Sit back and let Gigasheet parse the file so you can analyze it.

Upload Files to Gigasheet
Fun Fact: Gigasheet can handle datasets with billions of rows of data without breaking a sweat. Don’t believe me? Take a look at this video where Steve gives a demo of analyzing a huge dataset!

Let’s jump into the analysis. To get started, we know three things:


● The timestamp at which the instance was created (or entered the running state)

● The region in which the instance was created (we’ll restrict our analysis to CloudTrail logs from the same region, i.e., us-east-1)

● The fact that an instance was created: since CloudTrail records management API calls, it will have logged the RunInstances event, which is typically used to launch instances (from the CLI or the Management Console)
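If you want to reproduce this filter outside Gigasheet, the sketch below does the same thing in plain Python over the merged CloudTrailLogs.json file: it keeps RunInstances events from us-east-1 within a window around the launch time (30 minutes is an arbitrary choice), then counts them by source IP and outcome. It relies on CloudTrail’s `eventName`, `awsRegion`, `eventTime`, `sourceIPAddress`, and `errorCode` record fields; only failed calls carry an `errorCode`.

```python
import json
from collections import Counter
from datetime import datetime, timedelta, timezone

# Launch time from the SNS alert (2022-03-22 15:48:02 UTC).
LAUNCH_TIME = datetime(2022, 3, 22, 15, 48, 2, tzinfo=timezone.utc)


def load_records(path="CloudTrailLogs.json"):
    """Load the merged CloudTrail file (a JSON object with a "Records" list)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["Records"]


def parse_event_time(value):
    """CloudTrail eventTime values look like 2022-03-22T15:48:02Z and are always UTC."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def run_instances_near(records, launch_time, region="us-east-1", window=timedelta(minutes=30)):
    """Yield RunInstances events from `region` within `window` of `launch_time`."""
    for record in records:
        if record.get("eventName") != "RunInstances" or record.get("awsRegion") != region:
            continue
        if abs(parse_event_time(record["eventTime"]) - launch_time) <= window:
            yield record


def summarize(events):
    """Count events by (sourceIPAddress, outcome); failed calls carry an errorCode field."""
    return Counter(
        (e.get("sourceIPAddress"), "failed" if "errorCode" in e else "succeeded")
        for e in events
    )
```

Usage: `summarize(run_instances_near(load_records(), LAUNCH_TIME))` returns the per-IP success and failure counts.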


Filtering on the RunInstances event leaves just 7 rows, a fairly small dataset. If you check the EventTime field, you’ll see these events are also quite close to the actual instance launch.


Four of these seven rows have the ErrorMessage field populated, because CloudTrail logs both successful and unsuccessful attempts to use the AWS API.

CloudTrail also records the SourceIPAddress from which each request originated. Scrolling to the right, we see that 2 of these requests have AWS Internal as the source IP; these calls were likely made by the administrator from the console. The remaining five API calls came from 44.202.228.109. What’s strange is that this IP address belongs to the instance that was already running in the North Virginia region, i.e., our web server (the user-agent string also shows that the AWS CLI was used to launch the instance).
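Recognizing that a source IP belongs to one of your own instances is easy to automate. This sketch maps public IPs from `aws ec2 describe-instances` output to instance IDs and labels each CloudTrail event by where it came from and which client made it (an `aws-cli/` prefix in `userAgent` indicates the AWS CLI). The instance IDs in the example are hypothetical.

```python
def public_ip_index(describe_instances_response):
    """Map public IPv4 addresses to instance IDs from parsed describe-instances JSON."""
    index = {}
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            ip = instance.get("PublicIpAddress")
            if ip:
                index[ip] = instance["InstanceId"]
    return index


def classify_source(event, ip_index):
    """Describe where a CloudTrail event came from and which client sent it."""
    source = event.get("sourceIPAddress", "")
    if source == "AWS Internal":
        return "AWS Internal"
    agent = event.get("userAgent", "")
    tool = "AWS CLI" if agent.startswith("aws-cli/") else (agent or "unknown client")
    owner = ip_index.get(source)
    if owner:
        return f"own instance {owner} via {tool}"
    return f"external {source} via {tool}"
```

Public IPs are reassigned when instances stop and start, so take the describe-instances snapshot as close to the incident time as possible.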


Filtering on that SourceIPAddress, we see just 12 requests originating from the compromised instance. Only two stand out: RunInstances and ListBuckets. It seems we stopped the attacker before they could do anything else with the compromised instance.
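The pivot on SourceIPAddress can be sketched the same way: gather every call made from one IP, count the event names, and note the first and last time the IP was seen. Comparing `eventTime` values as plain strings works here because CloudTrail writes them in a fixed ISO 8601 UTC format.

```python
from collections import Counter


def pivot_on_ip(records, ip):
    """Summarize every CloudTrail call made from `ip`."""
    hits = [r for r in records if r.get("sourceIPAddress") == ip]
    # Fixed-format ISO 8601 UTC strings sort chronologically as text.
    times = sorted(r["eventTime"] for r in hits)
    return {
        "count": len(hits),
        "events": Counter(r.get("eventName") for r in hits),
        "first_seen": times[0] if times else None,
        "last_seen": times[-1] if times else None,
    }
```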

How was the victim instance compromised though? That’s where we’ll pivot next.


Let’s turn to the system logs. We’ve collected the entire /var/log/ directory, which we can upload to Gigasheet. First, let’s open the authentication log (the secure log file). Although its fields aren’t parsed, we can apply Functions to split the column into multiple values (for example, using a space as the delimiter; feel free to explore all the functions!).

Quick Tip: You can transform your data in Gigasheet however you like. Want to split it? Want to combine a few columns together? Rename them? Have at it!

My initial guess is a password compromise. Searching for “Failed password” in the logs, we see several failed logins from the IP address 123.214.118.119.


Lastly, filtering on that IP address reveals an actual successful login from it as the root user.
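For a quick local pass over the authentication log, the sketch below parses sshd’s `Failed` and `Accepted` login lines (password or publickey), counts failures per IP, and lists successful logins. It assumes the standard syslog line format seen in /var/log/secure and handles IPv4 addresses only; the sample lines in the example are made up.

```python
import re
from collections import Counter

# Matches e.g. "sshd[1234]: Failed password for invalid user admin from 1.2.3.4 port 22 ssh2"
SSHD_RE = re.compile(
    r"sshd\[\d+\]: (?P<result>Failed|Accepted) (?P<method>password|publickey) for "
    r"(?:invalid user )?(?P<user>\S+) from (?P<ip>[\d.]+) port \d+"
)


def summarize_auth_log(lines):
    """Return (failed-login counts per IP, list of successful logins)."""
    failures = Counter()
    successes = []
    for line in lines:
        match = SSHD_RE.search(line)
        if not match:
            continue
        if match["result"] == "Failed":
            failures[match["ip"]] += 1
        else:
            # The first 15 characters are the syslog timestamp, e.g. "Mar 22 15:31:40".
            successes.append((match["user"], match["ip"], match["method"], line[:15]))
    return failures, successes
```

Usage: `summarize_auth_log(open("secure", encoding="utf-8", errors="replace"))` works directly on the log file, since it iterates over lines.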

Of course, this isn’t the end of the incident. What else did the attacker do on the system? Did they establish persistence? Were any commands executed? Was data exfiltrated? There are several questions we can ask and try to answer before concluding this incident.



We’ll skip past that to share that we did, in fact, respond to the incident in time. The attacker only launched a new instance in the region but never connected to it, and no data was stolen from the compromised system. It could have been worse: had the attacker passed in user data (scripts that run at instance start-up), a custom payload would have executed on the new instance:



aws ec2 run-instances --image-id ami-0c02fb55956c7d316 --instance-type t2.micro --iam-instance-profile Name=MyInstanceRole --key-name myKeyPair --security-group-ids sg-08bb08a0f76ec0ec --user-data file://my-reverse-shell.sh
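To check whether that happened on a suspicious instance, you can pull its user data. Below is a minimal sketch, assuming a boto3 EC2 client: DescribeInstanceAttribute with `Attribute="userData"` returns the script base64-encoded under `UserData.Value`, or an empty `UserData` when none was set.

```python
import base64


def get_user_data(ec2, instance_id):
    """Return the raw (decoded-from-base64) user data bytes, or None if none was set.

    Assumption: `ec2` is a boto3 EC2 client or an object with the same method.
    """
    response = ec2.describe_instance_attribute(InstanceId=instance_id, Attribute="userData")
    encoded = response.get("UserData", {}).get("Value")
    if not encoded:
        return None
    return base64.b64decode(encoded)
```

The function returns bytes on purpose: user data may itself be gzip-compressed, so decide whether to `.decode()` it or decompress it first after looking at the first few bytes.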

With that said, let’s go over a few security misconfigurations which need our urgent attention.


Incident Remediation in the Cloud

Roles are similar to users in the AWS cloud. The key difference is that a role isn’t permanently tied to a single identity: its permissions can be assumed by any service or identity that the role’s trust policy allows. As such, roles need to be created carefully, following the principle of least privilege.



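In our case, the role attached to the web server allowed whoever held its credentials to launch instances and pass the role to them (ec2:RunInstances and iam:PassRole). This combination is fairly dangerous, since anyone who compromises the instance can launch new instances with that role attached, and it should be granted with great caution.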
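The role’s actual policy isn’t shown in the article, but the dangerous combination is easy to screen for. This sketch flags a policy document whose Allow statements grant both `ec2:RunInstances` and `iam:PassRole`, including via wildcards such as `ec2:Run*` or `*` (IAM action names are case-insensitive). It deliberately ignores Resource, Condition, NotAction, and Deny statements, so treat a hit as “review this role”, not as proof of exploitability.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM allows single values or lists for Statement and Action."""
    return value if isinstance(value, list) else [value]


def action_allowed(policy, action):
    """True if any Allow statement's Action (wildcards included) covers `action`."""
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        for pattern in _as_list(statement.get("Action", [])):
            if fnmatchcase(action.lower(), pattern.lower()):
                return True
    return False


def can_escalate_via_passrole(policy):
    """Flag the ec2:RunInstances + iam:PassRole combination."""
    return action_allowed(policy, "ec2:RunInstances") and action_allowed(policy, "iam:PassRole")
```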


Root logins over SSH should be disabled on the system, and password-based logins should be disabled to reduce the chance of a successful brute-force attack. We can also restrict SSH in the security group so that only known IP addresses can reach the service.
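A quick way to verify this hardening is to audit sshd_config. The sketch below reports the effective global values of PermitRootLogin and PasswordAuthentication, falling back to modern OpenSSH’s defaults (prohibit-password and yes) when a keyword is absent, and keeping the first value seen for each keyword, as sshd does. It is simplified: it stops at the first Match block and does not follow Include directives.

```python
import re

# Values sshd assumes when a keyword is absent (modern OpenSSH, per sshd_config(5)).
DEFAULTS = {"permitrootlogin": "prohibit-password", "passwordauthentication": "yes"}


def effective_sshd_settings(config_text):
    """Return the effective global values of the keywords in DEFAULTS."""
    found = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Accept both "Keyword value" and "Keyword=value" forms.
        parts = re.split(r"\s*=\s*|\s+", line, maxsplit=1)
        keyword = parts[0].lower()
        if keyword == "match":
            break  # simplification: ignore conditional Match blocks entirely
        # sshd keeps the first value it reads for each keyword.
        if keyword in DEFAULTS and keyword not in found and len(parts) == 2:
            found[keyword] = parts[1].strip().lower()
    return {key: found.get(key, default) for key, default in DEFAULTS.items()}


def sshd_findings(config_text):
    """Flag settings that still allow root or password-based SSH logins."""
    return [
        f"{key} = {value} (expected: no)"
        for key, value in effective_sshd_settings(config_text).items()
        if value != "no"
    ]
```

Run it against /etc/ssh/sshd_config or, better, against the output of `sshd -T`, which prints the fully resolved configuration and sidesteps the Include and Match simplifications.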

Quick Tip: Gigasheet allows you to chart and visualize your data. What's better than visually rich and detailed incident reports? Try Gigasheet today!

Continuous Monitoring in the AWS Cloud

As I mentioned earlier, AWS offers several great services that can aid incident responders and SOC teams with continuous monitoring. We’ve seen some of them in action in the previous sections. Now, let’s discuss these services in a little more detail:


AWS CloudTrail (CT)

AWS CloudTrail is a service that lets you audit, log, and monitor user activity in your AWS account. Each activity is recorded as an event, and a trail is the configuration that continuously delivers these events as log files to an S3 bucket. The service logs actions taken through the AWS Management Console, the AWS Command Line Interface, and the AWS SDKs and APIs.


Luckily, this handy service is enabled by default on all new AWS accounts. However, the default Event history only retains the last 90 days of management events. For long-term retention, you need to configure a trail that sends logs to an S3 bucket. Later, these log files can be downloaded, organized, and analyzed.
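When fetching logs for a specific day, it helps to know how a trail lays out its files in S3. Assuming the standard layout, a sketch for building the key prefix looks like this; the account ID in the example is hypothetical, and organization trails, which may add the organization ID to the path, are not handled.

```python
from datetime import date

# Day of the incident in this walkthrough.
INCIDENT_DAY = date(2022, 3, 22)


def cloudtrail_prefix(account_id, region, day, key_prefix=""):
    """S3 key prefix for one day of one region's CloudTrail logs.

    Standard layout: [key_prefix/]AWSLogs/<account-id>/CloudTrail/<region>/YYYY/MM/DD/
    """
    path = f"AWSLogs/{account_id}/CloudTrail/{region}/{day:%Y/%m/%d}/"
    prefix = key_prefix.strip("/")
    return f"{prefix}/{path}" if prefix else path
```

With the prefix in hand, something like `aws s3 sync s3://<bucket>/<prefix> ./cloudtrail-logs` downloads just that day’s logs for the merge script.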


Gigasheet’s out-of-the-box JSON integration can help you work through CloudTrail logs with ease. Although you can use Amazon Athena or AWS CloudTrail Lake to achieve a similar outcome, those services can be fairly costly and restrict you to SQL for querying the data.


AWS GuardDuty

AWS GuardDuty is a continuous threat-monitoring service that helps keep an AWS account safe and secure. Its data sources include the following services:


● VPC flow logs