In this article, we focus on advanced troubleshooting scenarios with middlebox appliances and SD-WAN in a global network on AWS.
Middlebox appliances and SD-WAN
A customer building a global network may wish to inspect traffic with their own network security appliances. This requirement is frequently met by introducing a transparent middlebox appliance. AWS Transit Gateway allows administrators to steer traffic towards a Security VPC and through a network appliance within that VPC. Once inspected, traffic is forwarded to the destination, which could be in another VPC or an on-premises network. Hence the term middlebox: a box that sits in the middle and transparently inspects traffic. We illustrate a middlebox appliance placement in Figure 1 below.
SD-WAN is a popular option to connect remote sites over a private network. In many implementations, the private network is overlaid on top of the Internet through encrypted tunnels between SD-WAN appliances at remote sites and centralized SD-WAN hubs. AWS Transit Gateway can act as a hub, and it natively integrates with a number of SD-WAN partners. Another option is to run SD-WAN headends on EC2. With this option, AWS Transit Gateway becomes a router that moves traffic between the networks behind the SD-WAN headends and the rest of the global network. Read more about possible architectures here.
With these two use cases in mind, we built a global network. It covers a variety of remote-site connectivity options and spans multiple AWS Regions.
Figure 1: Logical diagram of a global network on AWS
Topology of our global network on AWS
A company that uses an AWS global network is likely to have a presence in multiple geographic locations. In our case, we assumed offices in North America, Australia, and Southeast Asia. Each remote office is connected via AWS Direct Connect (DX), AWS Site-to-Site VPN (AWS S2S VPN), or an AWS partner SD-WAN offering. We enabled Accelerated AWS S2S VPN with AWS Global Accelerator where needed. Connections to remote sites terminate at the regional network hub, an AWS Region. We also use AWS Transit Gateway’s ability to integrate network security appliances (middleboxes) to meet network security requirements.
To understand how routing is configured, we allocated hypothetical IP ranges to each network as follows:
Regional IP allocations:
- Australia: 10.0.0.0/12 and 192.168.0.0/16
- Southeast Asia: 10.16.0.0/12
- North America: 10.32.0.0/12
We use these IP ranges to allocate VPC CIDRs as well as CIDRs used by remote sites. We also make use of the 100.64.0.0/10 range for middlebox appliance VPCs. This range is not routed throughout our global network. Check a blog post on “How to integrate third-party firewall appliances into an AWS environment” for details on how to configure a middlebox appliance and associated VPC.
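As a quick sanity check of an addressing plan like this one, the following minimal Python sketch uses the standard `ipaddress` module to confirm that the allocations do not overlap and to map an address to its region. The ranges are the ones listed above; the helper names `region_for` and `find_overlaps` are our own and purely illustrative.

```python
import ipaddress

# Regional allocations copied from the list above.
REGIONAL_RANGES = {
    "Australia": ["10.0.0.0/12", "192.168.0.0/16"],
    "Southeast Asia": ["10.16.0.0/12"],
    "North America": ["10.32.0.0/12"],
}

# Range used for middlebox appliance VPCs; not routed across the global network.
MIDDLEBOX_RANGE = "100.64.0.0/10"


def region_for(address):
    """Return the region whose allocation contains `address`, or None."""
    ip = ipaddress.ip_address(address)
    for region, cidrs in REGIONAL_RANGES.items():
        if any(ip in ipaddress.ip_network(cidr) for cidr in cidrs):
            return region
    return None


def find_overlaps():
    """Return every pair of allocated ranges that overlap (ideally none)."""
    cidrs = [c for group in REGIONAL_RANGES.values() for c in group]
    cidrs.append(MIDDLEBOX_RANGE)
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [
        (str(a), str(b))
        for i, a in enumerate(nets)
        for b in nets[i + 1:]
        if a.overlaps(b)
    ]
```

For example, `region_for("10.20.1.5")` resolves to Southeast Asia, while an address in the middlebox range such as `100.64.0.10` resolves to no region because that range is deliberately kept out of the regional allocations.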
The following diagram shows the final network configuration with all subnets allocated, along with a sample AWS Transit Gateway route table configuration.
Figure 2: Network diagram of the hypothetical global network on AWS
The next two figures show the final result as viewed in AWS Transit Gateway Network Manager.
Figure 3: Geographic view within AWS Transit Gateway Network Manager
Figure 4: Topology view within AWS Transit Gateway Network Manager
Route Analyzer in AWS Transit Gateway Network Manager
To troubleshoot misconfigurations and other issues with our global network, we will use AWS Transit Gateway Network Manager events and Route Analyzer.
To get started with Route Analyzer, you must first create a global network in AWS Transit Gateway Network Manager. Next, register all AWS Transit Gateways, then define remote sites and devices. You are now able to start troubleshooting. Route Analyzer considerations:
- Route Analyzer considers routes in AWS Transit Gateway route tables only
- Route Analyzer does not analyze routes in VPC route tables or in customer gateway devices
- Route Analyzer supports both IPv4 and IPv6 addresses
- Route Analyzer is free to use as part of AWS Transit Gateway Network Manager
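The use cases in this article run Route Analyzer from the console, but the same analysis can also be requested programmatically. The sketch below only assembles the request parameters; we believe the field names match the Network Manager `StartRouteAnalysis` API, but treat them as an assumption and check the current API reference. The function name `build_route_analysis_request` and every identifier passed to it are hypothetical placeholders.

```python
import ipaddress


def build_route_analysis_request(
    global_network_id,
    source_attachment_arn,
    source_ip,
    destination_attachment_arn,
    destination_ip,
    include_return_path=True,
    use_middleboxes=True,
):
    """Assemble parameters for a route analysis between two attachments."""
    # ip_address() accepts IPv4 and IPv6 and raises ValueError on bad input.
    ipaddress.ip_address(source_ip)
    ipaddress.ip_address(destination_ip)
    return {
        "GlobalNetworkId": global_network_id,
        "Source": {
            "TransitGatewayAttachmentArn": source_attachment_arn,
            "IpAddress": source_ip,
        },
        "Destination": {
            "TransitGatewayAttachmentArn": destination_attachment_arn,
            "IpAddress": destination_ip,
        },
        # Also analyze the path from destination back to source.
        "IncludeReturnPath": include_return_path,
        # Treat appliance attachments on the path as middleboxes.
        "UseMiddleboxes": use_middleboxes,
    }
```

With boto3 installed and credentials configured, the resulting dictionary could be passed as `boto3.client("networkmanager").start_route_analysis(**request)`. Keeping the builder separate from the call makes it easy to review the parameters before anything is sent.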
Now that we have our global network built and AWS Transit Gateway Network Manager is configured, let’s dive into the advanced use cases we’ve prepared.
Use case 1: On-premises connectivity to a VPC in remote AWS Region using a middlebox appliance
We have an AWS Transit Gateway connected to an on-premises network using AWS Direct Connect. We are using a middlebox appliance to inspect all the traffic. In this use case, we validate traffic going from an on-premises network in Sydney to an application VPC in Oregon. Traffic crosses the AWS Transit Gateway peering connection, passing through a middlebox appliance in Sydney and another in Oregon.
Figure 5: Route Analyzer Console
Once you click “Run route analysis”, Route Analyzer interactively prompts you to confirm potential middlebox appliances. You then select “Yes” or “No” from the drop-down list.
Figure 6: Results of route analysis, prompting to confirm middlebox presence
Once route analysis is complete, the connection status, including the return path and middlebox appliance, is shown.
Figure 7: The path from source to destination and back
This use case shows how to use Route Analyzer to walk through a global network and get a true picture of the traffic path. In this example, traffic goes through a middlebox appliance four times to complete a round trip.
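To see where the count of four comes from, the following minimal sketch models the path as a list of hops and counts the middlebox traversals. The hop order is our simplified assumption based on the description of this use case (one appliance in Sydney, one in Oregon, and a symmetric return path); it is not output from Route Analyzer.

```python
# Simplified, assumed forward path from Sydney on-premises to the Oregon VPC.
FORWARD_PATH = [
    "on-premises network (Sydney)",
    "transit gateway (Sydney)",
    "middlebox (Sydney)",
    "transit gateway (Sydney)",
    "transit gateway (Oregon)",
    "middlebox (Oregon)",
    "transit gateway (Oregon)",
    "application VPC (Oregon)",
]


def round_trip(forward_path):
    """Append the reversed path, without repeating the destination hop."""
    return forward_path + forward_path[-2::-1]


def count_middlebox_traversals(path):
    """Count how many hops on the path are middlebox appliances."""
    return sum(1 for hop in path if hop.startswith("middlebox"))
```

Each direction crosses both appliances once, so `count_middlebox_traversals(round_trip(FORWARD_PATH))` gives four.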
Use case 2: Remote Site connectivity via SD-WAN
In this use case, we analyze traffic originating from a remote site that is connected via SD-WAN to Singapore Region, where applications are hosted.
For remote sites where additional bandwidth is required and AWS Direct Connect is not an option, we scale throughput horizontally. This is done with Equal Cost Multi Path (ECMP) routing across multiple VPN connections. If multiple AWS S2S VPN connections are configured, Route Analyzer detects this configuration and displays it accordingly. As shown below, the return traffic path shows an ECMP group.
Figure 7: ECMP is detected with AWS S2S VPN
Clicking on “ECMP group” provides further details on which VPNs (and tunnels) are used:
Figure 8: ECMP group details
If ECMP is misconfigured, or one of the VPN tunnels is down, only a single VPN attachment is listed. In the example that follows, we’ve shut down one VPN tunnel, and the console lists only a single attachment for the return traffic.
Figure 9: A single VPN tunnel as opposed to an ECMP Group
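To illustrate the general idea behind ECMP, the sketch below hashes a flow's 5-tuple to pick one of several equal-cost paths, so that every packet of a flow stays on the same path. This is a conceptual illustration only: the hash function, the `pick_path` helper, and the path names are our assumptions and do not describe how AWS Transit Gateway distributes flows internally.

```python
import hashlib


def pick_path(flow, paths):
    """Choose one equal-cost path for a flow.

    flow  -- (source IP, destination IP, protocol, source port, destination port)
    paths -- list of currently available equal-cost next hops
    """
    if not paths:
        raise ValueError("no path available for this destination")
    # Hash the 5-tuple so the same flow always maps to the same path.
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(paths)
    return paths[index]
```

With two healthy VPN paths, different flows are spread across both. If one path is removed from `paths`, as in the shut-down tunnel example above, every flow maps to the single remaining path.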
Let’s take a look at a scenario where one of the remote sites is offline. For the purpose of this test, we’ve manually shut down the VPN connection to our SD-WAN headend. This simulates a complete failure. We analyzed traffic from the Applications VPC in Singapore towards the remote site.
Figure 10: Route analysis for an offline site office
The status is “Not connected.” A message confirms that no route matching the destination is available in the AWS Transit Gateway route tables.
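The "no matching route" condition can be pictured as a longest-prefix-match lookup that finds nothing. The sketch below is a minimal model of such a lookup, not Route Analyzer's actual implementation. The route table contents, attachment names, and the remote site prefix `10.17.0.0/24` are hypothetical values chosen from the Southeast Asia allocation.

```python
import ipaddress


def lookup(route_table, destination):
    """Return the target of the most specific matching route, or None."""
    ip = ipaddress.ip_address(destination)
    matches = []
    for cidr, target in route_table.items():
        network = ipaddress.ip_network(cidr)
        if ip in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None  # corresponds to "no matching route"
    # Longest prefix wins.
    return max(matches, key=lambda match: match[0])[1]


# Hypothetical route table while the remote site is online.
ONLINE = {
    "10.16.0.0/16": "attachment-applications-vpc",
    "10.17.0.0/24": "attachment-sdwan-vpn",
}

# Same table after the VPN is shut down and the site's route disappears.
OFFLINE = {
    "10.16.0.0/16": "attachment-applications-vpc",
}
```

Looking up `10.17.0.5` in `ONLINE` returns the VPN attachment, while the same lookup in `OFFLINE` returns `None`, which is the situation reported as “Not connected.”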
Use case 3: Monitoring and observability
During the life cycle of our global network, many events will impact connectivity. These events could be due to hardware or link failures, power disruptions, and other environmental conditions.
AWS Transit Gateway Network Manager offers monitoring and events consoles that examine the current status of the global network. We can view, at a glance, the health of networks directly connected to our AWS Transit Gateways globally. We can also examine events with timestamps in the past. This makes it easier to triage and conduct root cause analysis.
Figure 11: Example of events detected by AWS Transit Gateway Network Manager