Keep Private Subnets Private with VPC Endpoints

In this lab we will connect to an instance on a totally private subnet, without inbound or outbound Internet access, using a VPC Endpoint.

Prerequisites

The Lesson

In Run Our First Instance (FINALLY!) we set up a VPC, launched an instance in a private subnet, and then logged into the instance via the web console using Session Manager.

I mentioned that the way Session Manager works is that there is a software agent pre-installed in the instance, which connects with the service. That agent constantly polls the service to see if there is a connection request. If there is, and all the permissions line up, the little terminal window opens up in your browser.

I raced past a lot of the networking, so this week we’ll learn why that private subnet wasn’t totally private, and how we can use a cool capability known as a VPC endpoint to bend reality like a warp drive.

Connecting to AWS Services: NAT Gateways

In that instance lab our CloudFormation template configured a NAT Gateway for our private subnet. As a reminder, a NAT Gateway allows outbound Internet access but not inbound access. Our instance was also in a security group which blocked all inbound access but allowed all outbound access.

For Session Manager to work, our instance connects to its API endpoint. An API endpoint is an Internet destination for one or more API calls. Because AWS is a public cloud platform — available on the Internet to anyone with a credit card — all its API endpoints are on the Internet. Even when you use the web console, it’s making those API calls.

Our lab worked because the traffic was routed out through the NAT Gateway, onto the Internet, and to the Session Manager API endpoint (technically Systems Manager, the larger service where Session Manager lives). It looked like this:

Remember that NAT Gateways still use Internet Gateways to connect to the Internet

That’s all great, but some resources you don’t want to ever access the Internet. Especially if you deal with those pesky compliance auditors. In those cases we use a VPC Endpoint.

Private Connections with VPC Endpoints

I made a joke about auditors, but there are many good reasons you might want a private subnet. If a resource can’t talk to the Internet, that makes it much harder for attackers to do anything nefarious, even if they get access. Take the example of cryptomining: even if an attacker got you to run the mining software, it won’t work if it can’t talk to the Internet. But as you see above, with a fully private subnet without NAT or other connection to the Internet, you can’t talk to any AWS services — even S3.

AWS fixed this (for a fee) with VPC Endpoints — generically called service endpoints across cloud providers.

A VPC Endpoint is like adding a private network connection on your subnet to route inside AWS — without touching the Internet — to an AWS service’s API endpoint. Heck, through a capability called PrivateLink you can even publish your own services! (We’ll get there.) There are two flavors of VPC endpoints:

  • Interface endpoints add an Elastic Network Interface to your subnet. You may recall we discussed them a couple times in other labs; they are the software equivalent of the Network Interface Card in your computer. The ENI gets an IP address on your subnet, so the service is reachable directly from inside it.

  • Gateway endpoints do their magic in the route table rather than with a network interface, and they only support S3 and DynamoDB. Basically they add a route for the service’s address ranges, so if you try to access S3 the traffic routes to S3 within AWS, rather than traversing the Internet.
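To make the "routing table magic" a little less magical, here is a toy sketch of how a route table picks a next hop. Everything in it is an assumption for illustration: the CIDRs are made up, `vpce-EXAMPLE` is a placeholder ID, and a real gateway endpoint route points at an AWS-managed prefix list rather than a single hand-written range.

```python
import ipaddress

# Hypothetical route table for a private subnet with an S3 gateway endpoint.
# In AWS the S3 destination is a managed prefix list; 192.0.2.0/24 is only a
# documentation range standing in for it.
ROUTES = [
    ("10.0.0.0/16", "local"),
    ("192.0.2.0/24", "vpce-EXAMPLE"),
]


def next_hop(dest_ip, routes=ROUTES):
    """Return the target of the most specific matching route, or None."""
    addr = ipaddress.ip_address(dest_ip)
    best = None
    for cidr, target in routes:
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, target)
    return best[1] if best else None
```

With no 0.0.0.0/0 route in the table, any destination outside those two ranges has nowhere to go, which is exactly what we want from a private subnet.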

No Internet needed!!

You may ask, “How does Session Manager know to connect to the new IP address for the ENI instead of the one on the Internet?” Well, magic! By default we use Amazon’s domain name servers; when private DNS is enabled for the endpoint, they point the service’s regional hostname (such as ssm.us-west-2.amazonaws.com) to the endpoint’s private address.
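If you want to see the DNS trick for yourself, a rough check is to resolve the service hostname from inside the VPC and see whether the answer is a private address. This is only a sketch: the `resolves_privately` helper is my own name, and it assumes you run it on a machine that uses the VPC's DNS.

```python
import ipaddress
import socket


def resolves_privately(hostname, resolver=socket.gethostbyname):
    """True if hostname resolves to a private (non-public) IP address.

    The resolver is injectable so the check can be tried without a network.
    """
    ip = resolver(hostname)
    return ipaddress.ip_address(ip).is_private


# From an instance in the VPC, once the endpoint exists, something like:
#   resolves_privately("ssm.us-west-2.amazonaws.com")
# should come back True; from a laptop at home it should come back False.
```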

Service endpoints aren’t free (shocker). Interface endpoints cost $0.01 per hour and $0.01 per GB for the first petabyte of data processed. This is cheaper than a NAT Gateway, but remember that VPC endpoints are specific to a service, so if you use a lot of services the costs can add up.

Today we will skip some of the nuance and complexity, but here’s a diagram from the AWS documentation showing how to use multiple endpoints for higher availability.

Lesson Key Points

  • All AWS services have API endpoints so people can connect to them over the Internet.

    • This means even private subnets need Internet access to communicate with AWS services.

  • VPC Endpoints create private connections to AWS services so you don’t need to allow Internet access.

    • Interface endpoints (the kind used for nearly everything other than S3 and DynamoDB) create a network interface (ENI) on your subnet.

    • This network interface connects directly to the AWS service in the same region, keeping your traffic private and off the Internet.

    • If you don’t use the default Amazon DNS, you also need to update your DNS entries so API calls for the service go to the endpoint’s network address.

The Lab

We will build on the past few labs, using the same basic VPC structure with two public subnets and two private subnets. However this week we won’t deploy a NAT Gateway or create a route in the routing table for our private subnet to access 0.0.0.0/0.

In other words our subnet will be totally private. No Internet access in or out.

To save time I also set the CloudFormation template to launch the instance we need for testing and assigned it the SSMClient IAM Role (via the instance profile — you remember that part, right?).

After we deploy the template, which should only take a few minutes, we will create the 3 VPC endpoints required for Session Manager. Yes, it’s a bit annoying and triples our costs, but that little SSM software agent needs to connect to all 3 of those API endpoints to work.

Then we’ll wait a few minutes for everything to settle, and connect to our private instance, hiding in a totally private subnet, via the AWS Console.

Video Walkthrough

Step-by-Step

Alrighty, here’s the big test! I’ll skip most of the CloudFormation screenshots since this should now be familiar. Log into your Sign-In Portal > TestAccount1 > AdministratorAccess. Then go to CloudFormation > Create stack. Use the following settings:

Then wait a few (3-5) minutes until it’s complete:

This created your VPC without a NAT Gateway, and launched an instance into private-subnet-1. Now go to VPC > Your VPCs > Resource map. Then click the route table connected to the private subnets:

Notice that it only has a route for local traffic. There is no route to the Internet:
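The same check can be sketched in code. This assumes a route table shaped like the entries boto3's `describe_route_tables` returns (a `Routes` list with `DestinationCidrBlock` keys); the helper name and the sample CIDR in the usage comment are mine, not something from the template.

```python
def has_internet_route(route_table):
    """True if any route covers the default destination (0.0.0.0/0 or ::/0)."""
    default_destinations = {"0.0.0.0/0", "::/0"}
    for route in route_table.get("Routes", []):
        dest = route.get("DestinationCidrBlock") or route.get("DestinationIpv6CidrBlock")
        if dest in default_destinations:
            return True
    return False


# Hypothetical private route table: only the local route, like in the lab.
private_table = {"Routes": [{"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"}]}
print(has_internet_route(private_table))
```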

I won’t waste your time trying to connect to your instance in that private subnet. Let’s jump right to Endpoints > Create endpoint (important note: endpoint services are something else — don’t go there):

We will follow this exact process three times in a row! I’ll show all the screenshots the first time, then just give you the settings for the next two, and show what it all looks like at the end. This is because the SSM Agent running in the instance needs to connect to 3 different API endpoints. As mentioned in the Lesson, we need a different VPC endpoint for each API endpoint.

Start with a name of ssm-1, keep the default radio button selecting AWS services, then under Services search for ssm and select com.amazonaws.us-west-2.ssm:

Then click the radio button to select the service and select the CloudSLAW VPC. I expanded the Additional settings in this screenshot to show you how the console will set up our DNS entries, so when anything in the VPC tries to connect to the ssm URL, it will redirect to the IP address of the VPC endpoint:

Now select the slaw-private-1 subnet from the dropdown in the us-west-2a Availability Zone. We are only placing this endpoint in a single subnet and Availability Zone, which is bad for resiliency, but more than enough for running our cheapskate lab:

Remember how I said anything on a VPC requires a security group? Because a VPC Endpoint is based on an Elastic Network Interface (ENI) sitting on a subnet, we need to assign it a security group. I pre-created one with our template which allows inbound TCP port 443 (HTTPS), which is the protocol used by all AWS APIs. I also set it to allow all outbound access. It’s called SLAW-SLAWSecurityGroup, and you can double check by confirming the description says “inbound HTTPS”:
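If you would rather confirm the rule than trust the description, here is a small sketch that looks for inbound 443 in a security group. It assumes the dictionary shape boto3's `describe_security_groups` returns (`IpPermissions` with `IpProtocol`, `FromPort`, `ToPort`); the function name is hypothetical, and it ignores which sources the rule allows.

```python
def allows_inbound_https(security_group):
    """True if any inbound rule covers TCP port 443."""
    for perm in security_group.get("IpPermissions", []):
        protocol = perm.get("IpProtocol")
        if protocol == "-1":  # "-1" means all protocols and all ports
            return True
        if protocol == "tcp" and perm.get("FromPort", 0) <= 443 <= perm.get("ToPort", -1):
            return True
    return False
```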

Skip the rest (it’s for a future lab), and Create endpoint:

Now CREATE TWO MORE ENDPOINTS! We need these for those two other parts of the ssm service (as a reminder ssm is Systems Manager, and Session Manager is part of it — ssm includes a bunch of other capabilities).

Use the following settings:

  • Create ssm-2 for the ssmmessages service.

  • Create ssm-3 for the ec2messages service.
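For anyone who would rather script this than click through the console three times, the settings for all three endpoints can be sketched as data. This is a sketch under assumptions: the keys follow the keyword arguments of boto3's EC2 `create_vpc_endpoint` call as I understand them, the IDs in the usage line are placeholders, and the lab itself only uses the console.

```python
REGION = "us-west-2"

# Endpoint name -> service suffix, matching the three endpoints in this lab.
SSM_SERVICES = {"ssm-1": "ssm", "ssm-2": "ssmmessages", "ssm-3": "ec2messages"}


def endpoint_requests(vpc_id, subnet_id, security_group_id, region=REGION):
    """Build one parameter set per endpoint needed by Session Manager."""
    requests = []
    for name, service in SSM_SERVICES.items():
        requests.append({
            "VpcEndpointType": "Interface",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.{service}",
            "SubnetIds": [subnet_id],
            "SecurityGroupIds": [security_group_id],
            "PrivateDnsEnabled": True,  # lets the normal hostname resolve to the endpoint
            "TagSpecifications": [{
                "ResourceType": "vpc-endpoint",
                "Tags": [{"Key": "Name", "Value": name}],
            }],
        })
    return requests


# Placeholder IDs; substitute the ones from your own stack.
for request in endpoint_requests("vpc-EXAMPLE", "subnet-EXAMPLE", "sg-EXAMPLE"):
    print(request["ServiceName"])
    # With boto3 installed and credentials configured, each request could be
    # sent with: boto3.client("ec2").create_vpc_endpoint(**request)
```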

When you are all done it should look like this. If you see Pending status, just keep hitting refresh until it’s Available: 
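The "keep hitting refresh" step is just polling, which could be sketched like this. The `fetch_states` callable is an assumption: it stands in for whatever returns the current state of each endpoint (for example, a wrapper around boto3's `describe_vpc_endpoints`), and the attempt count and delay are arbitrary defaults.

```python
import time


def wait_until_available(fetch_states, attempts=30, delay=10, sleep=time.sleep):
    """Poll until every endpoint reports 'available'; False if we give up."""
    for _ in range(attempts):
        states = fetch_states()
        # Compare case-insensitively so "Available" and "available" both match.
        if states and all(state.lower() == "available" for state in states):
            return True
        sleep(delay)
    return False
```

Passing `sleep` in as a parameter keeps the function easy to try out without actually waiting.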

Now it’s time to test things out. Go to EC2 > Instances. Don’t try to connect yet!

Not only does it take a few minutes for all the changes to propagate, but sometimes we can help things along by rebooting the instance, which clears its DNS cache so the SSM software agent knows to connect to the new IP address. In my experience it catches up on its own, but not always right away.

Go to Instance state in the upper-right and Reboot instance:

Give it about 3 minutes, then Connect > Session Manager > Connect:

If you get an error or the Connect button is greyed out, just wait a minute and refresh the page. I ran through this half a dozen times prepping the lab, and sometimes it took a little longer but it always worked.

Pretty cool, eh? We have an instance in a totally private subnet without any Internet access, yet can still connect to it from our computer sitting at home. Not all AWS services support VPC endpoints but many do, and these days I rarely run into a service I need that doesn’t.

Cleanup

These things cost $0.01/hr per endpoint, which means $0.03/hr for our 3, which is $0.72/day and, as you can see, could really blow up our bill over the course of a week or month.
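Here is that arithmetic as a quick sketch, using only the hourly rate quoted above. It deliberately leaves out data processing charges, and the 30-day month is my own simplification.

```python
from decimal import Decimal

HOURLY_RATE = Decimal("0.01")  # dollars per endpoint per hour, from the text
ENDPOINTS = 3


def endpoint_cost(hours, endpoints=ENDPOINTS, rate=HOURLY_RATE):
    """Hourly charges only, for the given number of hours."""
    return rate * endpoints * hours


per_day = endpoint_cost(24)
per_week = endpoint_cost(24 * 7)
per_30_days = endpoint_cost(24 * 30)
print(per_day, per_week, per_30_days)
```

Using `Decimal` avoids the floating point fuzz you would get from multiplying 0.01 directly.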

The steps are:

  • Delete the 3 VPC endpoints

  • Delete the CloudFormation stack

If you try to delete a VPC before deleting everything in it, that will fail. Our instance and all the other parts were created using CloudFormation, which is smart enough to delete things in the right order, but we need to get rid of those endpoints first:

Wait until they are totally deleted:

Now delete the CloudFormation stack:
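The ordering matters more than the clicks, so here it is as a sketch. The `ec2` and `cloudformation` arguments are assumed to behave like boto3 clients (`delete_vpc_endpoints` and `delete_stack` are the calls I have in mind), while `wait_until_deleted`, the `DryRun` class, and the IDs and stack name are all placeholders for illustration.

```python
def cleanup(ec2, cloudformation, endpoint_ids, stack_name, wait_until_deleted):
    """Delete the endpoints first, wait for them to go away, then the stack."""
    ec2.delete_vpc_endpoints(VpcEndpointIds=endpoint_ids)
    wait_until_deleted(endpoint_ids)
    cloudformation.delete_stack(StackName=stack_name)


class DryRun:
    """Stand-in client that records call names instead of touching AWS."""

    def __init__(self, log):
        self.log = log

    def __getattr__(self, name):
        def call(**kwargs):
            self.log.append(name)
        return call


log = []
cleanup(
    DryRun(log),
    DryRun(log),
    ["vpce-EXAMPLE1", "vpce-EXAMPLE2", "vpce-EXAMPLE3"],
    "EXAMPLE-STACK",
    lambda ids: log.append("wait_until_deleted"),
)
print(log)
```

The dry run only shows the order of operations; swapping in real clients is left to you, and only after you are sure which stack you are pointing it at.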

Fini!

Lab Key Points

  • Interface VPC Endpoints add an Elastic Network Interface (ENI) to a subnet and assign it an IP address.

    • By default (when using the console), AWS creates a DNS entry to route traffic for the selected service to the endpoint.

  • Some services require multiple VPC endpoints, since the underlying service uses multiple API endpoints.

    • You don’t get a discount, FYI. 😀 

  • Software agents and other tools may need a refresh to update DNS so they correctly route to the endpoint.

    • This happens when the TTL (Time To Live) of the DNS entry expires, but you can speed things up with a reboot.

-Rich
