Creating a highly available IPsec tunnel to Amazon EC2/VPC using open source tools.
A while back I created a highly available IPsec tunnel from our colocation facility at Rackspace to Amazon EC2/VPC using open source utilities such as Openswan and Heartbeat/Pacemaker.
Amazon provides an HA solution out of the box with VPC, but it requires that the IPsec terminator on the colo side support BGP. Unfortunately our Cisco ASAs do not support BGP, so the choice was either to spend a bunch of money on something that did or to come up with an open source solution. Spending a bunch of money was not an option for us, so I built a solution using our Cisco ASAs on the colo end and an Openswan instance on the Amazon side. Other people have done this, but I have yet to see anyone do it in a highly available way. We find that our EC2 instances will sometimes reboot themselves for no clear reason, so relying on a single instance to terminate the tunnel was not acceptable. A single instance is also a problem whenever it has to be taken down for an upgrade or other maintenance. We could not afford any downtime, so a second instance that could take over when needed was essential.
The trick was to use Amazon's new (released in late 2011) elastic network interface feature, which allows you to attach and detach interfaces and their associated IPs (internal and elastic) at will. We can then use Heartbeat to monitor the servers and promote one of them based on the reachability of the other. Promotion means detaching the interface from the other instance, attaching it to this instance, and adjusting the routing tables.
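As a rough illustration of the promotion step (not the original scripts), the sketch below builds the two commands that move an interface to the local instance. It assumes the current `aws` CLI rather than the older EC2 API tools, and the function name, the IDs, and the device index of 1 (for eth1) are hypothetical placeholders.

```python
def promotion_commands(eni_id, local_instance_id,
                       current_attachment_id=None, device_index=1):
    """Return the AWS CLI commands that move an interface to this instance.

    Assumption: the modern `aws ec2` CLI is used in place of the legacy
    EC2 command line tools. Nothing is executed here; the caller decides
    how and when to run each command.
    """
    commands = []
    if current_attachment_id is not None:
        # Force the detach, since the peer may be hung or unreachable.
        commands.append([
            "aws", "ec2", "detach-network-interface",
            "--attachment-id", current_attachment_id,
            "--force",
        ])
    # Attach the interface locally; device index 1 is assumed to map to eth1.
    commands.append([
        "aws", "ec2", "attach-network-interface",
        "--network-interface-id", eni_id,
        "--instance-id", local_instance_id,
        "--device-index", str(device_index),
    ])
    return commands


if __name__ == "__main__":
    for cmd in promotion_commands("eni-0example", "i-0local", "eni-attach-0old"):
        print(" ".join(cmd))
```

A real failover script would also need to wait for the detach to complete before attaching, and would be triggered by Heartbeat/Pacemaker rather than run by hand.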
Each instance needed multiple routing tables so that it always remained reachable via eth0, regardless of whether it currently held eth1, the interface that all other NAT and IPsec traffic went through. This is done by scoping the routing tables according to the interface the traffic came in on.
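One common way to get this behaviour on Linux is source-based policy routing with `ip rule` and `ip route`. The sketch below assumes that approach (the original setup may have differed in detail) and only builds the commands. The addresses, gateway, and table number are made-up examples.

```python
import ipaddress


def policy_routing_commands(dev, address, gateway, table):
    """Build iproute2 commands that give one interface its own routing table.

    Assumption: replies are steered by source address, so traffic that
    arrived on `dev` (and is answered from `address`) leaves via `dev`
    even when the main table's default route points elsewhere.
    """
    # Validate the inputs early; raises ValueError on malformed addresses.
    addr = ipaddress.ip_address(address)
    gw = ipaddress.ip_address(gateway)
    return [
        # Default route for this interface, kept in a dedicated table.
        ["ip", "route", "replace", "default", "via", str(gw),
         "dev", dev, "table", str(table)],
        # Packets sourced from this interface's address use that table.
        ["ip", "rule", "add", "from", f"{addr}/32", "lookup", str(table)],
    ]


if __name__ == "__main__":
    # Hypothetical values: eth0 stays reachable through its own table 100.
    for cmd in policy_routing_commands("eth0", "10.0.0.10", "10.0.0.1", 100):
        print(" ".join(cmd))
```

With a table like this in place for eth0, the main table's default route can be switched to eth1 when the instance is promoted without cutting off management access.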
I had to write a couple of custom scripts that use the EC2 command line tools to query and change which instance holds the interface, and to change the routing tables on each box depending on whether or not it has eth1.
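To give a feel for the query side, here is a minimal sketch (not the original scripts) of two checks such a script might make: which instance currently holds the interface, and whether eth1 is present locally. It assumes the JSON that `aws ec2 describe-network-interfaces` prints has already been captured as a string. The function names are hypothetical.

```python
import json
import os


def current_owner(describe_output):
    """Return (instance_id, attachment_id) for the first interface, or None.

    `describe_output` is assumed to be the JSON text printed by
    `aws ec2 describe-network-interfaces` for a single interface.
    """
    data = json.loads(describe_output)
    interfaces = data.get("NetworkInterfaces", [])
    if not interfaces:
        return None
    attachment = interfaces[0].get("Attachment")
    # An unattached interface has no usable attachment record.
    if not attachment or attachment.get("Status") != "attached":
        return None
    return attachment["InstanceId"], attachment["AttachmentId"]


def has_interface(name, sysfs_root="/sys/class/net"):
    """Report whether a network device such as eth1 exists on this box."""
    return os.path.exists(os.path.join(sysfs_root, name))


if __name__ == "__main__":
    # The routing script would branch on this result.
    print("eth1 present:", has_interface("eth1"))
```

The routing script can then pick the table layout for the "has eth1" or "eth0 only" case, while the failover script uses the owner lookup to decide whether a detach is needed first.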
Someday maybe I'll detail more of the specifics of how this was done, but it is possible, if anyone out there is looking.
