Networking in AWS is one of the most critical points, and we often do not give it enough importance. In the end, it is still something derived from the OnPrem world, and often, we see it as something Legacy. It is also true that many solutions applied at the networking level are derived from interconnectivity with the OnPrem world and can become limiting in our evolution towards being Cloud natives.

But sometimes, we can add certain features to these solutions that can help us, and today, we will see one of these solutions. Before we begin, we must put a small disclaimer:

This solution is only valid for some use cases. Implementing this solution can add more complexity and cost than a decentralized solution.

In this post, we will see how to centralize the Egress of our VPCs so that every account reaches the Internet in a homogeneous way, without having to deploy a NAT Gateway in each account.

Why Centralize the Egress?

There are several reasons why we may be interested in centralizing the Egress. Sometimes the goal is to gain control, but it can also be an interesting way to save costs.

If we centralize the egress in a single account and VPC, we have only one control point, so we can implement controls to secure our access to the Internet and thus limit data exfiltration or even crypto mining from our accounts.

We can also do this in a decentralized way, implementing controls with GuardDuty and triggering automatic remediation based on the events we detect. However, controlling multiple egress points can be complicated and generate additional costs.

The other important point may be cost savings, although I warn you that this one is open to discussion.

In a decentralized egress, each account has to exit through its own NAT Gateway, which has a cost. If Internet access is essential to our architecture, we should have a NAT Gateway per AZ, since NAT Gateways are zonal resources.

Region Cost/Hour Cost/Month Cost/Month HA
us-east-1 $0,045 $32,85 $98,55
eu-west-1 $0,048 $35,04 $105,12
sa-east-1 $0,093 $67,89 $203,67

Note: sa-east-1 is probably the most expensive region in AWS, so it appears in this table alongside eu-west-1 (the most common region in Europe) and us-east-1 (the top region in the USA).

We also have to add the NAT Gateway data processing cost of around $0.05 per GB (the exact rate varies by region).

If we opt for a single NAT Gateway in one AZ, we must consider the cross-AZ traffic cost. Since some instances are in a different AZ than the NAT Gateway they use, we have an extra charge of $0.02 per GB.
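
The arithmetic behind these figures can be sketched in a few lines of Python. This is a minimal sketch, not the AWS calculator: the function name and its parameters are illustrative, the 730 hours per month is inferred from the table above ($0,045 x 730 = $32,85), and `cross_az_fraction` is an assumption about how much traffic leaves its own AZ.

```python
HOURS_PER_MONTH = 730  # inferred from the table: 0.045 * 730 = 32.85


def nat_gateway_monthly_cost(
    hourly_rate: float,
    gb_processed: float,
    per_gb_rate: float = 0.05,       # data processing rate quoted in the text
    gateways: int = 1,               # 1 = single AZ, 3 = one per AZ (HA)
    cross_az_fraction: float = 0.0,  # assumed share of traffic crossing AZs
    cross_az_rate: float = 0.02,     # cross-AZ charge quoted in the text
) -> float:
    """Estimate the monthly NAT Gateway cost of one VPC, in USD."""
    fixed = hourly_rate * HOURS_PER_MONTH * gateways
    processing = gb_processed * per_gb_rate
    cross_az = gb_processed * cross_az_fraction * cross_az_rate
    return round(fixed + processing + cross_az, 2)


# us-east-1 hourly rate with no traffic: only the fixed part of the bill.
single_az = nat_gateway_monthly_cost(0.045, gb_processed=0)
high_availability = nat_gateway_monthly_cost(0.045, gb_processed=0, gateways=3)
```

With HA there is no cross-AZ charge, so `cross_az_fraction` only matters for the single NAT Gateway case.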

NAT Gateway can be expensive, but if we want to centralize the egress, we have to use Transit Gateway, which also has a cost. Still, it gives us other functionalities without additional cost per AZ.

Region Cost/Hour Cost/Month
us-east-1 $0,05 $36,50
eu-west-1 $0,05 $36,50
sa-east-1 $0,093 $67,89

Also, we need to add the Transit Gateway data processing costs, which are $0.02 per GB processed, and the cost of having a VPC with a centralized NAT Gateway.
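
The centralized side can be estimated the same way. The sketch below is only an approximation under stated assumptions: the function and parameter names are illustrative, a month is taken as 730 hours, only the spoke VPC attachments are counted, and the Egress VPC is assumed to run one NAT Gateway per AZ (three in total).

```python
HOURS_PER_MONTH = 730  # same month length the tables appear to use


def centralized_egress_monthly_cost(
    spoke_vpcs: int,
    gb_processed: float,
    tgw_attachment_hourly: float,
    nat_hourly: float,
    nat_gateways: int = 3,     # assumption: one NAT Gateway per AZ
    tgw_per_gb: float = 0.02,  # Transit Gateway data processing, from the text
    nat_per_gb: float = 0.05,  # NAT Gateway data processing, from the text
) -> dict:
    """Break down the estimated monthly cost of a centralized egress, in USD."""
    attachments = spoke_vpcs * tgw_attachment_hourly * HOURS_PER_MONTH
    nat_fixed = nat_gateways * nat_hourly * HOURS_PER_MONTH
    # Each GB is charged by both the Transit Gateway and the NAT Gateway.
    data = gb_processed * (tgw_per_gb + nat_per_gb)
    return {
        "attachments": round(attachments, 2),
        "nat_fixed": round(nat_fixed, 2),
        "data": round(data, 2),
        "total": round(attachments + nat_fixed + data, 2),
    }


# 10 spoke VPCs and 1 TB (1024 GB) with the us-east-1 hourly rates.
estimate = centralized_egress_monthly_cost(10, 1024, 0.05, 0.045)
```

The fixed part grows with the number of attachments, while the NAT Gateway cost stays constant no matter how many VPCs share it.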

We will see several cost examples, using the AWS calculator, with different numbers of accounts and GB processed to analyze it.

With 10 VPCs and 1 TB processed:

Region Decentralized Decentralized HA Centralized
us-east-1 $388,25 $1.031,70 $530,13
eu-west-1 $413,25 $1.100,40 $539,76
sa-east-1 $787,75 $2.131,80 $976,39

With 10 VPCs and 100 TB processed:

Region Decentralized Decentralized HA Centralized
us-east-1 $6.301,83 $5.593,50 $7.119,55
eu-west-1 $6.630,93 $5.966,40 $7.433,32
sa-east-1 $11.567,43 $11.559,90 $12.431,87

With 100 VPCs and 1 TB processed:

Region Decentralized Decentralized HA Centralized
us-east-1 $3.344,65 $9.900,00 $3.796,63
eu-west-1 $3.566,65 $10.560,00 $3.806,26
sa-east-1 $6.897,65 $20.463,00 $6.870,89

With 100 VPCs and 100 TB processed:

Region Decentralized Decentralized HA Centralized
us-east-1 $9.258,33 $14.463,00 $10.404,55
eu-west-1 $9.784,33 $15.426,00 $10.718,32
sa-east-1 $17.677,33 $29.889,00 $18.344,87

With 500 VPCs and 1 TB processed:

Region Decentralized Decentralized HA Centralized
us-east-1 $16.483,65 $49.320,00 $18.414,63
eu-west-1 $17.583,65 $52.605,00 $18.424,26
sa-east-1 $34.053,65 $101.925,00 $33.168,89

With 500 VPCs and 100 TB processed:

Region Decentralized Decentralized HA Centralized
us-east-1 $22.400,33 $53.880,00 $25.006,55
eu-west-1 $23.800,33 $57.480,00 $25.320,32
sa-east-1 $44.835,33 $111.360,00 $44.626,87

As we can see in the different use cases, the price depends on whether we use HA in the decentralized model and on the volume of data processed, both of which can significantly penalize our costs.

A Decentralized solution without NAT Gateways in HA is a little cheaper than Centralizing. Still, when we need HA at the NAT Gateway level, the difference skyrockets in favor of the centralized solution.

Hence the small disclaimer at the beginning of the post: this solution is not valid for every use case.

If your use case is simple, without many accounts, without the need for HA, you probably do not need to use this type of solution.

But if we already have to deploy a Transit Gateway because we require connectivity with On-Prem in our VPCs or interconnectivity between them, the Transit Gateway attachments must exist anyway. Their cost therefore does not count in our analysis, and this solution wins a lot of points.

On the other hand, NAT Gateway is a great solution, but it penalizes high network consumption. Deploying an alternative is not always recommended, since the operations team must maintain it. But if we have a large number of accounts and VPCs, we can study other solutions based on NAT Instances, like alterNAT, or a Vendor appliance if we require one for other reasons. Because these solutions use an Internet Gateway directly, they have a much lower networking cost (less than half).

We usually do not recommend these solutions because they are more complex to manage and have a lot of operational load, which penalizes us significantly in decentralized models. Still, in a centralized solution like this, these restrictions change. In this case, we can prioritize cheaper solutions, even if they are more complex, because they will serve many projects, and we distribute the operational cost over many projects.

If we also need to filter outgoing traffic to block specific content or block connectivity with malicious addresses, we can use a service like Network Firewall.

Deploying this service in a decentralized model has drawbacks: it is more expensive (it requires one endpoint per Egress in each AZ), and management is more complicated. In a centralized model, it is cheaper and easier to operate.

The architecture of Centralized Egress

Figure: Centralized Egress architecture

The architecture consists of using a Transit Gateway connected to all the VPCs in such a way that it routes the exit to the internet through an Egress VPC.
In this Egress VPC, we deploy a Network Firewall so that all traffic to the Internet goes through this Firewall and thus can be filtered.

It is a solution in which we can swap some parts if we want: we can replace the Network Firewall and NAT Gateway with a firewall appliance from any Vendor, or replace the NAT Gateway with a NAT Instance solution like alterNAT.

If you are unfamiliar with alterNAT, take a look at it: it is a solution developed by the Chime engineering team that replaces NAT Gateway with NAT Instances deployed unattended, without requiring additional management.

Now, let's look at the configuration in a little more detail.

VPCs

The configuration within each VPC is quite simple: we need to deploy a Transit Gateway Attachment in the AZs where we have our Subnets.

The Attachment does not have to be deployed in the subnets whose traffic we will route. Any subnet works, but for organization and coherence we will deploy it in one of those subnets or in a dedicated subnet.

At the routing level, we add a default route (0.0.0.0/0) pointing to the Transit Gateway so that all internet traffic is routed through it. We can add more specific routes to other destinations if we need them.

The following example would be valid for a VPC with private and public subnets, routing through the Transit Gateway only the private subnets (since the public ones have to use an Internet Gateway):

Figure: VPC with private and public subnets
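
The routing change in each spoke VPC can be scripted. The helper below is a hedged sketch: the function name is illustrative, and `ec2_client` is assumed to be a boto3 EC2 client (for example `boto3.client("ec2")`), whose `create_route` call accepts `RouteTableId`, `DestinationCidrBlock`, and `TransitGatewayId`. Only the private route tables are touched, since the public ones keep their route to the Internet Gateway.

```python
DEFAULT_ROUTE = "0.0.0.0/0"


def route_private_subnets_to_tgw(ec2_client, private_route_table_ids, transit_gateway_id):
    """Add a default route to the Transit Gateway in each private route table.

    ec2_client is expected to behave like a boto3 EC2 client.
    Returns the number of route tables that were updated.
    """
    updated = 0
    for route_table_id in private_route_table_ids:
        ec2_client.create_route(
            RouteTableId=route_table_id,
            DestinationCidrBlock=DEFAULT_ROUTE,
            TransitGatewayId=transit_gateway_id,
        )
        updated += 1
    return updated
```

The Transit Gateway Attachment of the VPC must already exist when this runs, and the route table IDs and Transit Gateway ID passed in are whatever your own deployment produces.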

Transit Gateway

Transit Gateway is responsible for routing traffic to the Egress VPC, so we need to create the Transit Gateway Route Tables.

If we want to isolate the environments, we can create a route table for each VPC or a route table for each environment that includes several VPCs or even use only one table to have routing between all our VPCs.

Additionally, there must be a route table with all the VPCs so that the Egress VPC can reach the VPCs that will use the centralized Egress.

Figure: Route table with all the VPCs

In this diagram, we can see an example in which VPC A and VPC B share a routing table, having routes to themselves, to OnPrem using Direct Connect and to the Egress VPC, but have a BlackHole that blocks the routing to the VPCs C and D.

VPC C and D also route to each other, to OnPrem and the Egress VPC, but also have a BlackHole with VPC A and B.

The Egress VPC has routes to all VPCs. This configuration is necessary because return traffic will use this route table, which requires visibility of all VPCs.

This way, we have two environments isolated from each other that use the Egress VPC to access the Internet.
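
The behavior of these route tables can be checked with a small simulation. This is a sketch under assumptions: all CIDRs and attachment names are hypothetical (the post does not give any addressing), and the lookup simply applies longest-prefix matching, which is how the most specific route is chosen.

```python
import ipaddress

# Hypothetical addressing for the four VPCs and the OnPrem network.
VPC_A, VPC_B = "10.1.0.0/16", "10.2.0.0/16"
VPC_C, VPC_D = "10.3.0.0/16", "10.4.0.0/16"
ON_PREM = "192.168.0.0/16"
BLACKHOLE = "blackhole"

# One Transit Gateway route table per environment, plus one for the Egress VPC.
TGW_ROUTE_TABLES = {
    "env-ab": {
        VPC_A: "vpc-a", VPC_B: "vpc-b", ON_PREM: "direct-connect",
        VPC_C: BLACKHOLE, VPC_D: BLACKHOLE,
        "0.0.0.0/0": "egress-vpc",
    },
    "env-cd": {
        VPC_C: "vpc-c", VPC_D: "vpc-d", ON_PREM: "direct-connect",
        VPC_A: BLACKHOLE, VPC_B: BLACKHOLE,
        "0.0.0.0/0": "egress-vpc",
    },
    # Return traffic from the Internet needs a route back to every VPC.
    "egress": {VPC_A: "vpc-a", VPC_B: "vpc-b", VPC_C: "vpc-c", VPC_D: "vpc-d"},
}

# Route table associated with each attachment.
ASSOCIATIONS = {
    "vpc-a": "env-ab", "vpc-b": "env-ab",
    "vpc-c": "env-cd", "vpc-d": "env-cd",
    "egress-vpc": "egress",
}


def next_hop(source_attachment, destination_ip):
    """Return the target attachment, or None if the traffic is dropped."""
    table = TGW_ROUTE_TABLES[ASSOCIATIONS[source_attachment]]
    address = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in table.items()
        if address in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no route at all
    # The most specific (longest) prefix wins.
    _, target = max(matches, key=lambda pair: pair[0].prefixlen)
    return None if target == BLACKHOLE else target
```

Without the blackhole entries, traffic from VPC A to VPC C would fall through to the default route and be sent to the Egress VPC, so the blackholes are what actually enforce the isolation in this model.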

Egress VPC

Within the Egress VPC, we will deploy the NAT Gateway and the Network Firewall.

In this case, we have a Subnet dedicated to the Transit Gateway Attachment. We need this because AWS does not recommend deploying anything else in the Network Firewall Subnet: the firewall only inspects traffic that crosses the subnet, not traffic that originates in or is destined for the firewall subnet itself.

Each part will be deployed in its subnet with a routing table in each AZ to direct the traffic from the Transit Gateway Attachments to the Network Firewall Endpoints and from these to the NAT Gateways and perform the reverse path for responses from the Internet.

The following diagram shows this configuration:

Figure: Egress VPC diagram
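
The per-AZ routing of the Egress VPC can be sketched as three small route tables and a function that follows a packet hop by hop. The subnet names, target names, and the 10.0.0.0/8 range covering the spoke VPCs are all assumptions made for this illustration.

```python
import ipaddress

SPOKE_SUPERNET = "10.0.0.0/8"  # hypothetical range covering all spoke VPCs

# Route table of each subnet in one AZ of the Egress VPC: {destination: target}.
ROUTES = {
    "tgw-attachment-subnet": {"0.0.0.0/0": "firewall-endpoint"},
    "firewall-subnet": {
        "0.0.0.0/0": "nat-gateway",
        SPOKE_SUPERNET: "transit-gateway",   # return path towards the VPCs
    },
    "nat-subnet": {
        "0.0.0.0/0": "internet-gateway",
        SPOKE_SUPERNET: "firewall-endpoint",  # responses are inspected too
    },
}

# Subnet in which each intermediate hop lives.
HOP_SUBNET = {"firewall-endpoint": "firewall-subnet", "nat-gateway": "nat-subnet"}


def lookup(subnet, destination_ip):
    """Pick the most specific route of a subnet for the destination."""
    address = ipaddress.ip_address(destination_ip)
    matching = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in ROUTES[subnet].items()
        if address in ipaddress.ip_network(cidr)
    ]
    return max(matching, key=lambda pair: pair[0].prefixlen)[1]


def trace(entry_subnet, destination_ip):
    """List the hops a packet takes from the subnet where it enters."""
    hops = []
    subnet = entry_subnet
    while subnet is not None:
        target = lookup(subnet, destination_ip)
        hops.append(target)
        subnet = HOP_SUBNET.get(target)  # None once the packet leaves the VPC
    return hops


outbound = trace("tgw-attachment-subnet", "93.184.216.34")
inbound = trace("nat-subnet", "10.1.0.5")
```

Here `outbound` follows a request arriving from the Transit Gateway, and `inbound` follows a response after the NAT Gateway has translated it back to the address of an instance in a spoke VPC.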

We can deploy the Network Firewall rules to filter the traffic.

Network Firewall supports many kinds of rules, both managed and custom, and we can even import Suricata-compatible stateful rules.
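
As an illustration of what a custom stateful rule can look like, the sketch below builds a Suricata-compatible rule string that drops TLS connections to a given domain. The helper name, the domain, and the `sid` value are hypothetical, and the rule assumes that `$HOME_NET` has been set to cover the spoke VPC ranges.

```python
def deny_domain_rule(domain: str, sid: int) -> str:
    """Build a Suricata-compatible rule that drops TLS traffic to a domain.

    The rule matches the TLS SNI field, so it also covers subdomains
    when the domain is given with a leading dot (".example.com").
    """
    return (
        "drop tls $HOME_NET any -> $EXTERNAL_NET any "
        f'(tls.sni; content:"{domain}"; nocase; endswith; '
        f'msg:"Blocked egress to {domain}"; sid:{sid}; rev:1;)'
    )


# Hypothetical domain and signature ID, for illustration only.
rule = deny_domain_rule(".malicious.example", 1000001)
```

The resulting string is the kind of text that goes into a stateful rule group. Test it against your own traffic before relying on it.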

Suricata is an open-source cybersecurity project maintained by the OISF (Open Information Security Foundation).

If we use Appliances, each Vendor has a different deployment model, but they are all similar. In that case, we need to enable appliance mode on the Transit Gateway Attachment of the Egress VPC.
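
Appliance mode is a single option on the attachment. A minimal sketch, assuming `ec2_client` is a boto3 EC2 client and the attachment already exists (the function name is illustrative):

```python
def enable_appliance_mode(ec2_client, attachment_id):
    """Enable appliance mode on an existing Transit Gateway VPC attachment.

    ec2_client is expected to behave like a boto3 EC2 client.
    """
    return ec2_client.modify_transit_gateway_vpc_attachment(
        TransitGatewayAttachmentId=attachment_id,
        Options={"ApplianceModeSupport": "enable"},
    )
```

With appliance mode enabled, the Transit Gateway keeps both directions of a flow in the same AZ, so a stateful appliance sees the request and its response.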

Conclusions

As mentioned, this solution is only valid for some use cases. Still, it is attractive in models that require more centralized solutions, or when we already have Transit Gateway deployed for interconnectivity between VPCs or with OnPrem.

It can be deployed via IaC without problems, including the VPC deployment in each account, so that the attachments are generated automatically.

With this solution, we centralize all outbound traffic to the internet, and we could also add a security layer that we don't have with a NAT Gateway.

At a cost level, it may not be competitive in small multi-account environments. Still, in large multi-account environments, it can reduce our bills considerably.

It is a solution that adds complexity by centralizing resources and requiring dedicated operation of something that each project would manage on its own in a decentralized model. But in medium or large multi-account environments, where control of egress to the Internet can be critical, this solution improves the level of control and security, even though it adds complexity.
