Stage Check: Org and IAM Foundation

It's time to step back again and provide some context about WHY we've built what we have before we move on to the next training block.

Rich Mogull
July 04, 2024 • Estimated Reading Time: 13 minutes

Prerequisites

Every lab to date. (Just kidding! It’s time to zoom out, and understand why we built things this way.)

The Lesson

This Stage Check will be a bit different. Yes, I’ll review a bunch of what we’ve done, but I want to focus on providing context and the bigger picture of why we built things this way. I also want to talk about why we haven’t done certain things I would normally do by now, because omission decisions are every bit as important as inclusion decisions.

Video Version (18 Minutes):

Our Objectives and Roadmap(s)

To be honest, this project evolved differently than planned. Originally I intended to bounce around topics more, but I received a ton of feedback that it would be more valuable to build out the fundamentals in basically the same order as I would if I were building a new AWS Organization from scratch.

CloudSLAW thus has two complementary objectives:

Teach people in-depth cloud and security, even if they don’t have any cloud or security background, and do it for free. I want more people in the world who are better at securing things; and I egotistically think I can break down the barriers and gatekeepers in the way.
Build a detailed roadmap for enterprise-scale AWS security best practices, with in-depth implementation guidance.

We are hopefully achieving these goals together, by building out the structure for a massive, enterprise-scale AWS organization. And doing it cheap. 😀

I obviously come to the table with a ton of experience and my own perspective. In the decade-plus I’ve been focused on cloud security, I’ve helped a lot of organizations through their cloud transitions, but there are four additional resources I check constantly to make sure we are on track:

Chris Farris’s Org Kickstart project on GitHub. Chris is a good friend and collaborator, and the first person I check with to stay honest. At this point we’ve implemented most of what he has in this Terraform, albeit with my own spin and some purposeful omissions.
Scott Piper’s AWS Security Maturity Roadmap 2021. I deviated more from this because, for lab purposes, we need to work in a different order. Also, Scott is full time over at Wiz and can’t maintain it the way he used to, but it’s still incredibly useful. (Also from 2021 is Rami’s Cloud Security Orienteering, which is exceptional but a bit more advanced.)
The AWS documentation. No single link — I pull from a lot of different areas.
The Cloud Security Maturity Model (CSMM). Okay, this is a cheat because I wrote it. But that model encapsulates nearly everything I’ve learned about cloud security.

Why these labs in this order? The CSMM.

When Mike Rothman and I wrote the first version of the CSMM, we designed it to tell a story. What does the path to ‘good’ cloud security look like? After using it in multiple consulting/advisory engagements, we realized we had actually written a security framework. The CSMM is how I think about organizing a cloud security program.

So it’s time to tell the story of CloudSLAW, to date, using the CSMM.

You don’t need to read the CSMM, but at a high level it comprises 3 domains, each with 4 categories. The Foundational domain has categories for Governance, Organization Management, IAM, and Security Monitoring. The Structural domain covers Network, Workload, Data, and Application security. The Procedural domain includes Incident Response, Resilience, Risk Management, and Compliance.

Look familiar?

Before I get into details, there are some deliberate omissions in our labs. Not because these topics and technologies don’t matter, but because:

We won’t see how to use them until we have more running in our organization/accounts. For instance CSPM (for finding misconfigurations) and cross-account incident response access.
Cost. In a real job or engagement, I would enable Access Analyzer and just leave it running all the time. But that definitely increases costs, so I’m holding off.

Building the Foundation

Nearly every lab to date has focused on building a solid foundation for growth. To be honest, I nearly never get to build things this way, and spend more time going into companies (and government agencies) with messes that need fixing.

Let’s look at the foundational domain and why we set things up this way:

Governance

There are two sides to governance: technical and non-technical. We haven’t really covered the non-technical, since CloudSLAW is about hands-on labs. But when I get air-dropped into a company, this is where I start: what’s the org chart? Who is responsible for what? How are control objectives and controls defined and documented? If you think you might want a CloudSLAW-style governance training, let me know.

On the technical side we use Organization Management to enforce much of our governance, and that’s how I set things up in these labs. SCPs and IAM are the two core governance technologies we implemented, because they keep things operating the way we want.

Organization Management

Org management includes how we structure the hierarchy of our cloud footprint, and the aligned technical controls.

We built an AWS Organizations hierarchy with multiple Organizational Units (OUs) aligned with business needs, for several reasons:
- We can use Service Control Policies (SCPs) to enforce different rules based on the business needs and functions of the accounts. A term I skipped over is security invariants. These are rules enforced with technology, which define the boundaries of what can be done. We’ve locked out root accounts, restricted what regions are in use, and locked accounts so even an administrator can’t remove them from the org. As we progress we’ll enforce different policies on development and production workloads, and even on different applications, based on what they do and the data they hold.
- OUs can be used as IAM and SCP conditions. This enables us to write rules like “let accounts in the same business unit access this data.”
- Thanks to Identity Center, we can align IAM permissions with our org structure.
We introduced IaC, which we will use a lot more for creating accounts in the future. Using automation to create account factories is a very important way to ensure new accounts all share the same security baseline.
We set up relatively isolated accounts, using delegated administration for our shared services. We minimized use of the management account because it’s so powerful if someone compromises it, and broke out functions to better align with IAM and job roles. That’s why logging is in its own account, Identity Center is delegated to an IAM account, and Security Hub is delegated to the Security Audit account. This, my friends, is org management.
We turned on billing alerts for cost management. Yep, managing costs is part of governance.
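To make "security invariants" a little more concrete, here is a minimal sketch of an SCP that covers the three guardrails mentioned above: locking out root, restricting regions, and blocking accounts from leaving the org. This is an illustration, not the exact policy from the labs. The region list, the function name, and the short list of exempted global services are all assumptions you would adjust for your own organization.

```python
import json

# Hypothetical list of approved regions; substitute your own.
ALLOWED_REGIONS = ["us-east-1", "us-west-2"]

# Illustrative (and incomplete) set of global services exempted from the
# region restriction, because their API calls are not tied to one region.
GLOBAL_SERVICE_ACTIONS = ["iam:*", "organizations:*", "sts:*", "support:*"]


def build_guardrail_scp(allowed_regions):
    """Return an SCP document (as a dict) with three Deny statements."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Deny every action when the caller is an account's root user.
                "Sid": "DenyRootUser",
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {
                    "StringLike": {"aws:PrincipalArn": "arn:aws:iam::*:root"}
                },
            },
            {
                # Deny everything except the exempted global services
                # when the request targets a region outside the approved list.
                "Sid": "DenyUnapprovedRegions",
                "Effect": "Deny",
                "NotAction": GLOBAL_SERVICE_ACTIONS,
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": allowed_regions}
                },
            },
            {
                # Stop even an account administrator from leaving the org.
                "Sid": "DenyLeavingOrg",
                "Effect": "Deny",
                "Action": "organizations:LeaveOrganization",
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_guardrail_scp(ALLOWED_REGIONS), indent=2))
```

Every statement is a Deny, which is the point: SCPs never grant permissions, they only set the outer boundary of what IAM in the member accounts is able to allow.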

As a reminder, we skipped CSPM. We’ll get to that after we deploy some workloads, and even automate creating misconfigurations to evaluate.

We have a solid hierarchy with minimal use of the management account, a good OU structure for aligning SCPs, and some shared central security controls.

Identity and Access Management

Most of our labs are on IAM. All our labs have some IAM. In cloud, it’s IAM all the way down. Thus my favorite quote:

All cloud security failures are IAM failures, and all IAM failures are governance failures.

-Me

We focused on implementing ‘good’ IAM from the start, and on learning foundational IAM principles:

We really hammered on securing the management account, especially the root user account. We also created a full admin user we can use in case anything goes wrong with federated identity.
We learned how to write IAM policies, including conditionals. This is a lesson we will come back to time and time again.
We created an IAM Role and learned how awesome roles are, because they get rid of static credentials and allow us to change personas (and thus entitlements) when we move between accounts and change what we are doing.
We learned about policy evaluation, and how organization-based, user-based, and resource-based policies all fit together (default deny; any explicit deny overrides any allow; and within a single account, a direct allow in a resource policy can sometimes grant access even when no identity policy allows it).
We moved to federated identity and single sign on by turning on IAM Identity Center. This is exactly how you manage access in the real world, because it avoids static IAM Users. Why? In AWS, IAM Users have long-lived static credentials, which you now know are the single largest source of cloud breaches.
- And we learned about permission sets and how they are translated into IAM roles when we deploy access to an account.
We enabled Multifactor Authentication (MFA) for everyone! Don’t you dare ever let someone access the cloud without MFA after reading my content!
We learned about permissions boundaries, which we don’t often need, but which help out in some thorny use cases. This also helped us implement some least privilege, instead of giving everyone full administration rights.
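The policy evaluation order from the list above is easier to remember as code. What follows is a deliberately simplified model of a request made inside a single account, not AWS's actual evaluation engine. It ignores session policies, cross-account access, and several edge cases, and the `Request` class and its field names are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical summary of how each policy type treats one request."""
    explicit_deny: bool = False           # any policy contains a matching Deny
    scp_allows: bool = True               # SCPs in the OU path allow the action
    resource_policy_allows: bool = False  # resource policy directly allows the caller
    identity_policy_allows: bool = False  # policy attached to the user/role allows it
    boundary_allows: bool = True          # True when no permissions boundary is set


def evaluate(req: Request) -> str:
    """Walk the checks in order and return the decision with its reason."""
    if req.explicit_deny:
        return "Deny (explicit deny)"
    if not req.scp_allows:
        return "Deny (not allowed by SCP)"
    if req.resource_policy_allows:
        # Same-account shortcut: a direct resource-policy allow is enough.
        return "Allow (resource policy)"
    if not req.identity_policy_allows:
        return "Deny (no identity allow)"
    if not req.boundary_allows:
        return "Deny (outside permissions boundary)"
    return "Allow (identity policy)"


if __name__ == "__main__":
    # Nothing allows the request, so the default deny applies.
    print(evaluate(Request()))
    # An explicit deny wins even when other policies allow.
    print(evaluate(Request(explicit_deny=True, identity_policy_allows=True)))
    # An SCP that does not allow the action blocks an identity allow.
    print(evaluate(Request(scp_allows=False, identity_policy_allows=True)))
```

The order of the `if` statements is the whole lesson: an explicit deny is checked first, SCPs cap everything underneath them, and only then do allows get a chance.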

I’m very happy with our IAM foundation. It’s very real, and hopefully I’ve managed to teach you some foundational aspects of IAM which are skipped in most security training (such as entities, identities, and personas) but extremely useful to understand. Especially when you start looking at other cloud providers and IT systems.

Security Monitoring

We’ve barely scratched the surface of this one, but already set up some great core monitoring:

We enabled CloudTrail, the single most important log source in AWS, and learned how to automate and centralize it across an AWS organization. This provides visibility into all the management plane activity in our organization.
We turned on GuardDuty for threat detection, then learned how to make sure it’s deployed across all accounts and regions, even when we activate new ones. We even used SCPs to ensure our GuardDuty region coverage is aligned with allowed regions. This has been a problem in real breaches. With GuardDuty we can potentially detect many common attacks.
We learned how to use Security Hub to consolidate cloud security events, and used SNS to forward those events to ourselves, including GuardDuty events. Alerting without alerts is like having an invisible, silent alarm. Why would you do that? I have, for real, assessed organizations which didn’t implement effective notifications, and missed things like GuardDuty alerts for active breaches. Is email how I handle real alerting in a big company? Nope, but it works well for labs.
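One common way to wire Security Hub to SNS is an EventBridge rule whose event pattern selects the findings worth a notification. The sketch below shows such a pattern, plus a tiny local matcher so you can see which events it would select. The severity filter is an assumption for illustration (the labs may forward everything), and `matches` is a simplified stand-in written for this example that handles only exact values and nested objects, not EventBridge's full matching rules.

```python
# Event pattern for imported Security Hub findings, limited to the
# HIGH and CRITICAL severity labels (the severity filter is an assumption).
FINDING_PATTERN = {
    "source": ["aws.securityhub"],
    "detail-type": ["Security Hub Findings - Imported"],
    "detail": {"findings": {"Severity": {"Label": ["HIGH", "CRITICAL"]}}},
}


def matches(pattern, event):
    """Simplified local matcher: exact-value lists and nested objects only."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: an array of objects matches if any element does.
            candidates = actual if isinstance(actual, list) else [actual]
            if not any(isinstance(c, dict) and matches(expected, c) for c in candidates):
                return False
        else:
            # Leaf: `expected` is a list of acceptable values.
            values = actual if isinstance(actual, list) else [actual]
            if not any(v in expected for v in values):
                return False
    return True


def sample_event(severity_label):
    """Build a trimmed-down, made-up event carrying one finding."""
    return {
        "source": "aws.securityhub",
        "detail-type": "Security Hub Findings - Imported",
        "detail": {"findings": [{"Severity": {"Label": severity_label}}]},
    }


if __name__ == "__main__":
    for label in ("LOW", "HIGH"):
        print(label, matches(FINDING_PATTERN, sample_event(label)))
```

In a lab, emailing every match through an SNS topic is fine. In a larger environment the same pattern would more likely feed a ticketing or chat integration.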

Tying the room together

I like focusing on desired security outcomes. In our case:

We built an AWS Organizations hierarchy to support effective technical governance.
- We turned on billing alerts to manage costs, and also detect common abuses such as cryptomining.
We implemented organization-based policies (SCPs) to control what can happen in our accounts.
We consistently used delegated administration to minimize the risk of using our management account.
We built a strong foundation for federated identity management, with MFA to prevent credential abuse, using different roles (personas) to support least privilege. That holds even when one person uses several roles and doesn’t need full admin rights all the time.
- As I keep saying, static credentials (IAM Users) are the single biggest source of cloud security breaches. We minimized them to the bare essentials.
We enabled organization-wide central security monitoring, which will auto-deploy as we add accounts. This is essential for detecting and investigating incidents.
We turned on threat detection for our entire org, and forwarded events to email so we know when to investigate and respond.

This is awesome. Is it everything I set up at the start of a new org? Duh, I already said it isn’t, but this is most of it. And we really homed in on the foundational aspects of AWS, rather than all the additional tools we could enable or buy. This is a very solid starting point, especially considering that we are spending pennies per month.

What’s next?

We will run through more CSMM categories.

Our first big block will be on network security. It’s such a huge topic that we cannot cover it all, but we’ll hit all the foundational elements and set ourselves up to discuss more complex architectures.
Then we’ll hit workloads — specifically instances (virtual machines). We need to understand how they work in AWS before we can get into containers and serverless.
We covered a bit of data security, and will sprinkle more in, but I’m saving most of it for a big training block, after we get through network and workload.

That’s right — we are finally ready to start running things in our accounts!

I hope this overview helps. I try to avoid long narrative posts, but this is a major transition point in our journey. I think it’s important to show how we build these security programs within bigger frameworks, and that we aren’t just winging it.

And I hate to say it, but most places just wing it. They adopted cloud organically and didn’t establish this kind of secure foundation. If you work in one of those places, you know how hard it is to set this all up after the fact, and why a foundation like ours is essential for scaling security.

-Rich
