Cross Account AWS Athena for SecOps (Security Operations/Incident Response)

Athena enables us to write SQL-like queries and search CloudTrail and other logs in S3 without any pesky databases. This can be an excellent tool for security operations when you aren't using a big expensive external tool.

CloudSLAW is, and always will be, free. But to help cover costs and keep the content up to date we have an optional Patreon. For $10 per month you get access to our support Discord, office hours, exclusive subscriber content (not labs — those are free — just extras) and more. Check it out!

Prerequisites

The Lesson

Everyone likes to think their job is cool. And if you have “security” in your profession, apparently the more you can use terms out of a James Bond, Jason Bourne, or other special forces movie… the better.

So it’s time to introduce you to SecOps: security operations. I put it in bold so you know how important it is!

The term Security Operations is a bit of a loaded one — it means different things to different people. It can cover anything operational in security, which is a huge scope that spans from running tools and fixing things, through domains including security architecture (design), to security engineering (installing and maintaining security tools). That said, if you go to a conference and see “SecOps” on a banner, they probably mean everything we wrap around incident detection and response.

Look, don’t get too hung up on this stuff. I’ve been in the industry nearly 30 years and try not to get emotionally attached to, I dunno, dictionaries?

But SecOps sounds cool and makes good clickbait for your friendly neighborhood cloud security newsletter/blog, so that’s what I’ll use for our new series of labs exploring the different domains of Security Operations and Cloud Incident Response.

Security Monitoring: Your Feeds and Speeds

When I wrote the Cloud Security Maturity Model, I placed Security Monitoring in the Foundational domain. It’s one of the most important cloud security activities, one I slot just behind IAM. We’ve already set up nearly all our core monitoring, but I never really gave you the full context for how we tie this all together for incident detection, analysis and response.

This is one of those areas I’ve been writing, teaching, and speaking about since before CloudTrail even existed. I call it the Feeds and Speeds talk because effective security monitoring relies on knowing what security telemetry sources to collect, how to collect them, and the time it takes from something happening to someone seeing it. This is a huge deal in the cloud since attackers use automated tools which operate lightning fast, while many traditional methods of security monitoring lag 15-60 minutes behind an attack.

Today I’ll limit myself to reviewing the 3 main categories of security feeds:

  • Logs are activity streams saved into files stored someplace. So far we’ve set up CloudTrail feeding S3 as our primary log for management plane activity. Another example of logs, which we haven’t used yet, is VPC Flow Logs.

  • Events are the real-time stream of activity, alerts, and… uh… events issued by various services. In our labs we have used CloudTrail events (when we need real time, instead of the slower S3 logs), and GuardDuty and Access Analyzer via Security Hub. 

  • Configuration(s) are detected misconfigurations. We haven’t set this up yet and will get to it, but there is an entire category of tools called Cloud Security Posture Management (CSPM) to look for these misconfigurations. Many security professionals treat these more like a vulnerability scanner (scan, get report, complain to someone to go fix their sh-t), but savvy incident responders know that most cloud native attackers create misconfigurations to exfiltrate data and do bad things, and we should treat some of these as security alerts.

Each of these provides a security pro with a different piece of the puzzle. Many many services generate logs, and these are our bread and butter for monitoring and detection (even for you gluten-free types). They support everything from fast detection, to deep investigation, to meeting compliance requirements.

Events, as you’ve seen if you’ve completed our previous labs, offer near-real-time security visibility. Logs always have a delay before they reach analysis tools because they must be generated, batched, and transferred, while events come in a real-time stream of information (which isn’t saved unless we take action to save it).

Configuration is a bit special, and I don’t want to get distracted by it today. The TL;DR is that posture tells us the outcome of actions, and can identify risky or malicious outcomes such as sharing an S3 bucket to an unknown account.

Log Analysis with Athena

We’ve been collecting CloudTrail logs since our earliest labs, and today we’ll set up a service we can use for threat detection and analysis. Threat detection is the process of identifying potential attacker activity, while analysis is the process of determining what happened and tracing attacks. A threat detector is an automated alert for “oh crap, someone may have broken into the safe,” while analysis helps you figure out whether it was an attacker or just a legit employee who forgot to tell someone they were going into the safe.

In previous labs we showed how to alert directly off a CloudTrail event in EventBridge. That’s great for simple triggers, but analysis and more complex triggers require searching and analyzing logs. That’s where AWS Athena comes in.

Athena is a service which enables you to run SQL-like queries on data in S3. What’s cool is that it works with different types of data — it doesn’t need to be in a structured database. Under the hood Athena uses Presto, which gets quickly into data science above my pay grade. To use it you define a schema. The schema is essentially a logical mapping layer which tells Athena, "when you read these JSON files in S3, interpret field X as a timestamp, field Y as a string, etc." without actually modifying or preprocessing the underlying files.

Athena isn’t free, and there is a lot of nuance to working with it at scale, but we can get into those details later (hello, partitions!). Our costs should be extremely low, especially since we configured S3 so we only keep 3 months of CloudTrail logs.

But there’s one major nuance for our labs: our logs are in LogArchive, but we plan to run security investigations from our SecurityAudit account. The good news is that it’s pretty simple to configure Athena in one account to query a database in a different account, with just a permissions update.

What the heck is a SIEM, and where does Athena fit in?

SIEM stands for Security Information and Event Management. It’s a category of tools purpose-built for security monitoring and alerting. There are many commercial SIEMs on the market. These tools collect and normalize security-relevant logs and enable you to build alerts, run investigations, and more.

Athena is just a query tool we can use to analyze any kind of data. When I teach incident response and detection engineering for cloud, I use Athena a lot because it’s native to AWS and cost-effective. It is not a replacement for a commercial SIEM without a lot of work. What’s nice is that I can teach you on Athena, the skills will translate to whatever SIEM your employer uses, and you’ll also understand how to work with native AWS data.
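To give you a taste of what that looks like, here’s a sketch of a simple detection-style query against the CloudTrail table we’ll create in the lab below (the table and column names match the lab; treat it as illustrative rather than something to run right now). It surfaces failed console logins, a classic early warning sign:

-- Sketch only: a SIEM-style detection written as Athena SQL, using the
-- cloudtrail_logs.organization_trail table we create later in this lab.
SELECT eventtime, useridentity.arn, sourceipaddress, errormessage
FROM cloudtrail_logs.organization_trail
WHERE eventname = 'ConsoleLogin'
  AND responseelements LIKE '%Failure%'
ORDER BY eventtime DESC
LIMIT 50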

Key Lesson Points

  • Different security telemetry feeds have different latency.

  • There are three major categories of feeds:

    • Logs are saved to files, and are the most common and detailed.

    • Events are real-time, but aren’t saved unless we capture them when they happen.

    • Configuration “alerts” come from Cloud Security Posture Management tools and tell us when something is misconfigured, which can indicate an attack.

  • AWS Athena enables us to directly query logs saved in S3, including CloudTrail, for threat detection and incident analysis.

The Lab

For our lab we will add a permission to our CloudTrail bucket in S3, so we can run Athena in our SecurityAudit account. Then we’ll configure Athena to work with an Organizations CloudTrail, which is a bit different than setting it up in a single account. We’ll finish by running a test query, seeing how crappy the performance is; then I’ll say “partitions later”, since we want to keep things simple for now.

Video Walkthrough

Step-by-Step

First up: for this lab there is a lot of copying and pasting, so open up a blank text file.

Then go to your Sign-in portal > SecurityAudit > IAM > Roles > copy and paste the ARN of AWSReservedSSO_SecurityFullAdmin_xxx into your text file.

We also need our Organizations ID. Go to Organizations > copy and paste the Org ID:

Really, you want these in a text file for reference later:

NOW CLOSE THAT TAB AND > Sign-in portal > LogArchive > AdministratorAccess > S3 > Click your CloudTrail bucket:

Time for more copy and paste! Properties > Copy and paste the bucket ARN and the bucket name:

My text file now looks like this:

Permissions > Bucket policy > Edit:

We need to add a statement to the policy to allow our SecurityFullAdmin role from SecurityAudit access to this bucket so it can run Athena queries. First up, copy this JSON and paste your role ARN and the bucket ARN in the indicated spots (***replace all this***). You should copy and paste a total of 3 times!

{
    "Sid": "SecurityFullAdminCloudTrailAccess",
    "Effect": "Allow",
    "Principal": {
        "AWS": "***arn of the AWSReservedSSO_SecurityFullAdmin role***"
    },
    "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
    ],
    "Resource": [
        "***bucket arn***",
        "***bucket arn***/*"
    ]
},

Then copy your updated JSON and paste it into the bucket policy right before the first existing statement, so it looks like this:

Now it looks like this, and I highlighted the text I replaced:

Scroll down and click Save Changes.

See? Now I’m pretty!

Normally this is where we would set up an S3 bucket to hold our Athena query results. That works, but it can add to our costs and, as you know, I’m nearly as cheap as all of you. Our costs would have been low, but hey, now we have a free option! AWS just launched (as in, I learned about it yesterday) a new managed query results capability which stores queries and results for 24 hours for free. This is a great new option for security operations since we tend to either run ad hoc queries for incident investigations or automated queries for threat detection, and don’t necessarily need to store those results long-term.

There's one exception, and it's a big one. When analyzing a real security incident, you absolutely need to save all your query results. But you can export or save them selectively — you don't need to keep every query ever written until the dawn of time (or until someone complains about your storage costs).

NOW CLOSE THAT TAB AND > Sign-in portal > SecurityAudit > SecurityFullAdmin:

Then go to Athena > click the "hamburger" menu (three lines in the upper-left corner) > Workgroups > Create workgroup:

Name it security and leave all the defaults, scroll to the bottom, and click Create workgroup. The main defaults we are using are Athena SQL for the query engine, and the brand new, very shiny Athena managed query results configuration. This keeps your queries and results for 24 hours, for free. Why would you want it longer? That’s more for ongoing data analysis stuff we security grunts try to avoid.

Click Query editor:

Okay, we have our data in S3 and our permissions all ready to go, but we need to set Athena up to talk to our bucket ‘o CloudTrail logs, which is hanging out in LogArchive. If you’ve worked with databases before, you are probably used to running SQL queries to do everything — including creating databases and tables. That’s how Athena works; we start by creating our database, then we create our table. When we “create the database”, we aren’t actually creating a new database; we’re creating the metadata and mapping that point to our files in S3. Don’t think about it too hard — data stuff is weird, and beyond our scope as security plebes. You’ll paste in each of these commands and then click Run.

First Workgroup > security:

Copy and paste, then Run:

CREATE DATABASE IF NOT EXISTS cloudtrail_logs

Easy, right? Now we need to create our table, which maps our expected JSON fields into table columns. This is the magic of this distributed database stuff. For this one you need to copy this SQL, then swap in your S3 bucket name AND your organization’s ID at the bottom (look for *** on the last line):

CREATE EXTERNAL TABLE IF NOT EXISTS cloudtrail_logs.organization_trail (
    eventversion STRING,
    useridentity STRUCT<
        type: STRING,
        principalid: STRING,
        arn: STRING,
        accountid: STRING,
        invokedby: STRING,
        accesskeyid: STRING,
        userName: STRING,
        sessioncontext: STRUCT<
            attributes: STRUCT<
                mfaauthenticated: STRING,
                creationdate: STRING>,
            sessionissuer: STRUCT<
                type: STRING,
                principalId: STRING,
                arn: STRING,
                accountId: STRING,
                userName: STRING>>>,
    eventtime STRING,
    eventsource STRING,
    eventname STRING,
    awsregion STRING,
    sourceipaddress STRING,
    useragent STRING,
    errorcode STRING,
    errormessage STRING,
    requestparameters STRING,
    responseelements STRING,
    additionaleventdata STRING,
    requestid STRING,
    eventid STRING,
    resources ARRAY<STRUCT<
        ARN: STRING,
        accountId: STRING,
        type: STRING>>,
    eventtype STRING,
    apiversion STRING,
    readonly STRING,
    recipientaccountid STRING,
    serviceeventdetails STRING,
    sharedeventid STRING,
    vpcendpointid STRING
)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://***bucket name***/AWSLogs/***org-id***/'

Take a minute to look at that. You can see that we defined all the potential CloudTrail fields you see in the JSON, then mapped them to strings and stuff. This is the schema.
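Those nested STRUCT fields matter when you write queries: you reach into them with dot notation. As a quick sketch (using only columns defined above), this pulls the role that issued each assumed-role session:

-- Sketch: dot notation reaches into the nested useridentity STRUCT defined above.
SELECT eventtime, eventname,
    useridentity.arn,
    useridentity.sessioncontext.sessionissuer.arn AS issuing_role
FROM cloudtrail_logs.organization_trail
WHERE useridentity.type = 'AssumedRole'
LIMIT 10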

There's one big flaw in what we just did. As I mentioned, Athena supports partitioning to improve performance and reduce query costs. I went back and forth on implementing partitioning in this first lab, but my wife reminded me that when you are first learning the queries, it's easier not to deal with the added complexity. Partitioning comes with some big complexity demands in our scenario, because once you partition you need to specify more criteria in your queries. Since we only keep 3 months of logs for a relatively small number of accounts, we should be fine on costs and can get by with simpler queries. I'll cover partitioning in a future lab, since it's important to understand.
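To make that tradeoff concrete, here’s a rough sketch of what a query looks like once a table is partitioned. The table name and partition columns (account, region, and day) are hypothetical, since they depend entirely on how you define the partitions, but the idea is that every query needs extra predicates so Athena only scans the matching S3 prefixes:

-- Hypothetical example: assumes a partitioned table with account, region, and day
-- partition columns. We haven't created this table; it just shows the shape.
SELECT eventtime, eventname, useridentity.arn
FROM cloudtrail_logs.partitioned_trail
WHERE account = '123456789012'      -- partition predicates limit what Athena scans
  AND region = 'us-east-1'
  AND day >= '2025/01/01'
  AND eventname = 'DeleteTrail'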

With all that out of the way, let's run our first query to make sure everything works:

SELECT eventtime, eventname, useridentity.accountid, awsregion
FROM cloudtrail_logs.organization_trail
ORDER BY eventtime DESC
LIMIT 100

Notice how long that took? That's the big downside of not using partitions: Athena needs to scan everything. Costs aren't bad, $5 per terabyte scanned (per query) for our region, but you can see that could add up quickly. Queries also take a very long time to run (about a minute for mine). You tend not to see these issues in a single account setup, but organizations run into them quickly.
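While we’re living without partitions, we can at least make those full scans more useful. Here’s a sketch (same table, nothing new to set up) of a more security-flavored query that surfaces access-denied errors, which are often the first breadcrumb of someone poking around where they shouldn’t. Note that the WHERE clause doesn’t reduce the data scanned; without partitions Athena still reads everything:

-- Sketch: still a full scan without partitions, but the results are more interesting.
SELECT eventtime, useridentity.arn, eventsource, eventname, errorcode, sourceipaddress
FROM cloudtrail_logs.organization_trail
WHERE errorcode LIKE '%AccessDenied%'
   OR errorcode LIKE '%UnauthorizedOperation%'
ORDER BY eventtime DESC
LIMIT 100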

Don’t worry — once we cover all the query basics we will partition, and then I’ll show you how to use Lambda functions to automate these queries and turn them into threat detectors.

Key Lab Points

  • To query S3 in one account using Athena in another account, start by setting up permissions for the role(s) which will access S3.

  • Athena now supports managed query results, which stores your queries and results for 24 hours at no charge. You configure it in the workgroup settings.

  • Working with Athena is like working with any other database. You use SQL queries for everything.

  • Without partitioning Athena can be slow and expensive. We will get to partitioning later.

-Rich
