Own (or PWN) the Org with CloudFormation StackSets

CloudFormation StackSets let you deploy anything everywhere, almost all at once. In Part 2 of our Epic Automation series we'll learn how to use them more securely, and why you need to lock them down.

Prerequisites

The Lesson

I have, you might say, a love/hate relationship with CloudFormation StackSets. The kind of relationship that has resulted in my children learning some colorful metaphors a bit younger than society commonly approves of.

What are StackSets? Here’s the short version: they enable you to deploy CloudFormation templates across multiple accounts and regions in your AWS Organization. StackSets solve problems like “I need to deploy this IAM role for my Ops team into every account and region we use. And I need it to automatically deploy the role as I add new accounts.”

You really shouldn’t need StackSets for a ton of things, since you can achieve the same outcome using an infrastructure as code deployment pipeline hooked into all your accounts. But StackSets are the tool to hook those pipelines into the accounts in the first place. I like them for consistently deploying core components of shared services. But I prefer to keep them simple, and really limit who can use them.

Here’s how it works:

  • StackSets start in your Management Account and are always enabled, but you can also delegate administration into other accounts (WHICH WE WILL ALWAYS DO BECAUSE RICH SAID SO, AND THAT’S ALL YOUR CIO NEEDS TO KNOW!).

  • A StackSet is just CloudFormation at heart. There’s no special formatting or anything, but you should write them in ways that work across regions.

  • There are manual and automated modes. We’ll focus on automation for this lab, because that’s what we need. I tend to not use manually deployed StackSets often.

    • What I mean by manual is that deployment is one and done. It’s a one-time push operation to deploy the stack where you want it.

    • In automated mode, when you create a StackSet you assign it to Organizational Units and Regions.

      • It deploys into all of those OUs and regions.

      • If an account is moved into the OU, the StackSet will automatically deploy.

      • If you update a StackSet, it tries to update in all the accounts in those OUs/regions.

      • There is an option to delete the stack from an account when you remove that account from the OU. Imagine a scenario where you deploy something you want in all Dev workloads, but not necessarily when you move the account into Prod.

  • When you start using StackSets it creates a service-linked role with full administrative privileges in all the accounts in your organization.

    • In researching this post I learned there is also the option to use self-managed permissions for StackSets, which means that instead of full admin everywhere you can define specific permissions in specific accounts. This is a cool update (kudos to AWS), but for reasons you are about to see, we will use the service-linked role option today.

  • There are more advanced options and capabilities we won’t cover today. For example, you can use a Lambda function to validate something in the account is the way you want before pushing the StackSet — this is called a target account gate.
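
To make the automated mode above concrete, here is a minimal sketch of those settings written as an AWS::CloudFormation::StackSet resource. It is an illustration only, not what the lab uses (the lab goes through the console). The StackSet name, the OU ID, and the template URL are hypothetical placeholders.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: "Sketch: a service-managed StackSet that auto-deploys to one OU"

Resources:
  DevBaselineStackSet:
    Type: AWS::CloudFormation::StackSet
    Properties:
      StackSetName: dev-baseline              # hypothetical name
      Description: "Baseline resources for every account in the Dev OU"
      PermissionModel: SERVICE_MANAGED        # needed for automatic deployment
      AutoDeployment:
        Enabled: true                         # deploy when an account joins the OU
        RetainStacksOnAccountRemoval: false   # delete the stack when the account leaves
      StackInstancesGroup:
        - DeploymentTargets:
            OrganizationalUnitIds:
              - ou-abcd-11111111              # hypothetical OU ID
          Regions:
            - us-east-1
            - us-west-2
      # Hypothetical template location
      TemplateURL: https://example-bucket.s3.amazonaws.com/dev-baseline.template
```

Setting RetainStacksOnAccountRemoval to true instead would leave the stack in place when an account moves out of the OU, which is the other half of the Dev versus Prod scenario described in the list.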

Here’s a great diagram from AWS, superior to what I can create:

The biggest issue I’ve had with StackSets is that they aren’t resilient to drift. If you aren’t familiar with the concept, drift is the term we use when some IaC has been deployed but then someone changes something specified in the template. Then if you push updates or delete the template it may fail (in my case, it usually fails).

Here’s a simple example: if you create a new VPC with a StackSet and then, in one of the accounts, someone creates a security group by hand, you can’t delete that VPC via StackSets until you delete that new security group. Sometimes things break, sometimes they don’t, but when managing hundreds or thousands of accounts it doesn’t take much to run into trouble. I will show you how to use Service Control Policies to minimize drift on critical security resources.

I’ve frequently broken StackSets or had them fail to deploy. As we work with them I’ll try and steer you around the conditions which cause this.

Using StackSets for (or Against) Security

StackSets are extremely valuable for security because they enable us to consistently deploy security hooks and controls into accounts. For example we will use them to deploy the IAM roles we will use for security auditing and operations. If you use third-party tools, odds are you’ll use StackSets to deploy them into your accounts.

The danger is that, by default, they can do nearly anything and everything to all your accounts. If someone gets into your management account, or any account with delegated administration for StackSets, they can push CloudFormation throughout the org and build in backdoors, enable unwanted services, or even delete other stacks using your security tooling.

StackSets are a tool for privileged administration across multiple AWS accounts. Here are some rules of thumb to help reduce the security risks.

  • Use delegated administration, but carefully. Ideally to only a couple privileged accounts. These will likely be Ops and SecurityOps.

  • Within any account which has StackSet delegated admin, tightly control IAM.

    • I don’t tend to use self-managed permissions for StackSets, but I also haven’t been in a situation where I needed to limit what an ops (or other) team could do with them that I couldn’t do better with an SCP. That said, these situations exist, so keep that option in your back pocket.

  • StackSets can’t override your security invariants like Service Control Policies. So protect all critical shared resources, such as security IAM roles, using SCPs.

  • For ops teams using StackSets, restrict who on the team can deploy them, and ideally force those deployments to go through a deployment pipeline with security controls.

    • We haven’t really covered deployment pipelines yet, but that really should be how any IaC is managed. Darn, I think I need to add some labs on them sooner, rather than later.

I just think of StackSets as full admin for the org. To be honest, I haven’t seen self-managed permissions used very often, but those should be strongly considered, especially if you allow multiple teams to use them.
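
As a sketch of the SCP rule of thumb above, the policy below denies changes to one protected role unless the caller is the role StackSets itself uses in member accounts. Everything here is an assumption for illustration: the role ARN matches the role this lab's template creates, TargetId is a hypothetical parameter, and the stacksets-exec-* pattern is an assumption about how the execution role is named under service-managed permissions, so verify it in your own accounts before relying on it. A template containing AWS::Organizations::Policy is typically deployed from the management account.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: "Sketch: SCP that protects a security role from manual changes"

Parameters:
  TargetId:
    Type: String
    Description: "Root, OU, or account ID to attach the SCP to (hypothetical parameter)"

Resources:
  ProtectSecurityRole:
    Type: AWS::Organizations::Policy
    Properties:
      Name: ProtectSecurityEventForwarder
      Description: "Deny changes to the security event forwarder role"
      Type: SERVICE_CONTROL_POLICY
      TargetIds:
        - !Ref TargetId
      Content:
        Version: "2012-10-17"
        Statement:
          - Sid: DenyChangesToSecurityRole
            Effect: Deny
            Action:
              - iam:DeleteRole
              - iam:UpdateRole
              - iam:UpdateAssumeRolePolicy
              - iam:PutRolePolicy
              - iam:DeleteRolePolicy
              - iam:AttachRolePolicy
              - iam:DetachRolePolicy
            Resource: "arn:aws:iam::*:role/SecurityOperations/SecurityEventForwarder"
            Condition:
              ArnNotLike:
                # Assumption: StackSets acts through a role named like this
                "aws:PrincipalArn": "arn:aws:iam::*:role/stacksets-exec-*"
```

The exception in the Condition block matters: without it, the SCP would also stop StackSets from updating or deleting the role it deployed. Keep in mind that SCPs do not apply to the management account itself.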

Key Lesson Points

  • CloudFormation StackSets enable you to deploy CloudFormation across multiple accounts and regions.

  • StackSets are a privileged administration tool, and by default they can perform any create action in any account which isn’t restricted using an SCP.

  • StackSets are very valuable for deploying security capabilities across your entire organization.

The Lab

This week we will enable delegated administration for StackSets to our SecurityOperations account. Then we’ll push our first Stack, which creates a role we will use later when we wire in our EventBridge rules.

So why are we delegating admin to SecurityOperations? Some of this is personal preference. For me SecurityOperations is where I run security tools and procedures which change things, including incident response support. This is my primary read/write AWS account, and SecurityAudit is my primary read/analyze account. Some orgs prefer to have a security tools account, an IR/SecOps account, an audit account, and maybe even an automation account.

For what we are doing with CloudSLAW I don’t need to show you every option, but I do want you to understand why I choose my options, so you can think through the problem in your own environment, with your own needs.

Video Walkthrough

Step-by-Step

Go to your Sign In Portal > Copy the account ID for SecurityOperations. You’ll need it to set up delegated administration:

Then: CloudSLAW > AdministratorAccess > CloudFormation > StackSets > Activate trusted access:

Then Register delegated administrator:

Enter your SecurityOperations account ID and Register delegated administrator again:

Now CLOSE THE TAB, and in your Sign in portal > SecurityOperations, click AdministratorAccess:

This next step will shock you: CloudFormation > StackSets > Create StackSet:

We’ll stick with the defaults on the first page but there’s one setting to highlight. Service-managed permissions will use a Service Linked Role in the management account to push the stack into accounts. This is the one that has full admin all the time. To use the automation capabilities (adding and deleting stacks when an account moves into and out of an OU) you need this option. For my most-trusted admins (e.g., high-level operations and security) this is what I use.

Self-service permissions (the console’s label for the self-managed permissions option mentioned earlier) means you have a role in the current account with cross-account access to whatever other accounts you are pushing the stack into. You need to have all those permissions set up, but you don’t always need full admin. This is great for empowering a team which might need to deploy some limited shared resources to a subset of your footprint, for example to push some kind of shared tooling like a CI/CD pipeline.
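
For contrast, here is a rough sketch of the target-account half of that model: an execution role that trusts one administrator account and carries a limited policy instead of full admin. AWSCloudFormationStackSetExecutionRole is the default role name StackSets looks for in this mode. The parameter name and the specific permissions listed are illustrative assumptions, and what you actually need depends on the resources in your template.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: "Sketch: limited execution role for self-managed StackSets"

Parameters:
  AdministratorAccountId:
    Type: String
    Description: "Account ID allowed to push StackSets here (hypothetical parameter)"

Resources:
  ExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: AWSCloudFormationStackSetExecutionRole
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              AWS: !Sub arn:aws:iam::${AdministratorAccountId}:root
            Action: sts:AssumeRole
      Policies:
        - PolicyName: LimitedStackSetPermissions
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              # StackSets needs to manage the stack itself
              - Effect: Allow
                Action: cloudformation:*
                Resource: "*"
              # Illustrative only: the services this team may deploy
              - Effect: Allow
                Action:
                  - codepipeline:*
                  - codebuild:*
                Resource: "*"
```

Trusting the whole administrator account keeps the sketch short. A tighter version would name the specific administration role in that account as the principal. Depending on your template and notification settings, the role may also need access to S3 and SNS.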

Paste in this URL for your template: https://cloudslaw.s3-us-west-2.amazonaws.com/slaw50.template > Next:

Yes, I messed up the name and it should be Lab50, but since I already shot the video you get to live with the pain of my inattention.

For the name use SecOpsEventForwaderRole and put in a good description like Role to forward events to the security operations team. Remember, this stack will be pushed into an account and will be visible in CloudFormation. Good names and descriptions will help the administrators of that account know what’s going on and who to contact if they have any questions.

Now a little trick. We are in our SecurityOperations account, and I built a parameter into the template so you can add your account ID. If you recall, in our last lab we created our Event Hub for security automation inside SecurityOperations. This template is the first part of pointing other accounts to feed their events here. To send events across accounts you need an IAM role, and that’s what our template is pushing. This is what it looks like:

AWSTemplateFormatVersion: '2010-09-09'
Description: "Creates IAM role for EventBridge cross-account event forwarding"

Parameters:
  SecurityEventBusAccountId:
    Type: String
    Description: "AWS Account ID where the central event bus is located"

Resources:
  EventBridgeForwardRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: SecurityEventForwarder
      Path: /SecurityOperations/
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: events.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: ForwardToEventBus
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action: events:PutEvents
                Resource: 
                  - !Sub arn:aws:events:us-east-1:${SecurityEventBusAccountId}:event-bus/central-security-bus
                  - !Sub arn:aws:events:us-west-2:${SecurityEventBusAccountId}:event-bus/central-security-bus

Outputs:
  RoleArn:
    Description: "ARN of the EventBridge forwarder role"
    Value: !GetAtt EventBridgeForwardRole.Arn

Notice the Parameters section at the top and both “!Sub” entries at the bottom? This is the first time we are using a CloudFormation Parameter. All we are doing is a basic variable substitution. This enables me to publish a template you all can use, even though you have different account IDs. You can get much fancier with these, but we are sticking to the basics.
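
As one small example of getting fancier, a parameter can validate its input before anything deploys. The fragment below is an optional variation on the lab's parameter, not part of the published template. It rejects any value that is not a 12-digit account ID.

```yaml
Parameters:
  SecurityEventBusAccountId:
    Type: String
    Description: "AWS Account ID where the central event bus is located"
    # Accept exactly 12 digits and nothing else
    AllowedPattern: '^[0-9]{12}$'
    ConstraintDescription: "Must be a 12-digit AWS account ID"
```

With a constraint like this, a pasted value with a stray space or a missing digit fails validation up front, before a malformed ARN ends up in the role's policy.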

Click in the upper-right corner and copy your account ID (you are in the SecurityOperations account where the Event Bus lives) > Paste into SecurityEventBusAccountId > Next:

Click the checkbox > Next:

On the next page we start with defaults, but I want to explain how they work. We are deploying an entirely new stack (a new CloudFormation template). You can also import a stack you already deployed and add it to the set, which brings that existing stack under the StackSet’s management.

In terms of target we want this across our organization. This is not what we will do for our next round in a future lab. This role is for automation, and we want that capability everywhere. But the individual automation we will push out later, we only want in our Workloads > Production OU. This way you’ll get to see both options. Since we aren’t limiting our OUs, the auto-deployment options don’t matter — this is going everywhere. But you can see the options allow you to push a stack into an account when it goes into the OU, and/or delete the stack when it moves out. I used to use this a lot in my Pragmatic Cloud Incident Response class, until I exceeded the capabilities of StackSets. I’d move accounts into an OU to prep them for class, and remove them to clean things up.

But we DO need to specify regions and a couple other options. Under Specify regions, set US East (N. Virginia); also set Maximum concurrent accounts = 100, and Failure tolerance = 99, then Next.

What do these mean? We are deploying an IAM role, so we only need to push it once in one region (because roles are global). Then we are saying “deploy to up to 100 accounts at once” (we only have 6 now, but it’s good to be ambitious) and “allow the stack to keep running if up to 99 of them fail.” That setting is… specific to this lab. Usually I set this lower to stop problems before I create a big mess, but that isn’t really a concern today.
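
For reference, the console choices in this lab map roughly onto the following AWS::CloudFormation::StackSet resource. Treat it as a sketch, not a replacement for the steps above: the root ID is a hypothetical placeholder, the AutoDeployment values are assumed to match the console defaults, and CallAs: DELEGATED_ADMIN assumes you deploy it from the delegated administrator account.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: "Sketch: this lab's StackSet expressed as a template"

Parameters:
  SecurityEventBusAccountId:
    Type: String
    Description: "AWS Account ID where the central event bus is located"

Resources:
  EventForwarderStackSet:
    Type: AWS::CloudFormation::StackSet
    Properties:
      StackSetName: SecOpsEventForwaderRole
      Description: "Role to forward events to the security operations team"
      PermissionModel: SERVICE_MANAGED
      CallAs: DELEGATED_ADMIN
      Capabilities:
        - CAPABILITY_NAMED_IAM            # the template gives its IAM role a fixed name
      AutoDeployment:
        Enabled: true                     # assumed console default
        RetainStacksOnAccountRemoval: false
      TemplateURL: https://cloudslaw.s3-us-west-2.amazonaws.com/slaw50.template
      Parameters:
        - ParameterKey: SecurityEventBusAccountId
          ParameterValue: !Ref SecurityEventBusAccountId
      OperationPreferences:
        MaxConcurrentCount: 100           # deploy to up to 100 accounts at once
        FailureToleranceCount: 99         # keep going if up to 99 accounts fail
      StackInstancesGroup:
        - DeploymentTargets:
            OrganizationalUnitIds:
              - r-abcd                    # hypothetical organization root ID
          Regions:
            - us-east-1                   # IAM roles are global, so one region is enough
```

For a production StackSet you would usually lower FailureToleranceCount so a bad template stops after a handful of failures instead of rolling through most of the organization.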

Then Submit:

This next page shows the progress/status of the StackSet. If you click the radio button it will swap you into a view where you can see the status of the StackSet in each account — a weird UI convention I’ve not seen used anywhere else in AWS:

This deploys very quickly, within a few minutes, but don’t get too excited since this is the fastest you’ll ever see. If you use lower concurrency and larger, more complicated stacks it can take much longer.

But that’s it — we now have our role in all our accounts, except the management account.

If you want, you can see what the stack looks like in a target account by going to CloudFormation > Stacks, and making sure you are in the region we deployed into (us-east-1).

That’s it! Next lab we’ll start getting into building and pushing out alert rules.

-Rich
