VNS3 HA Plugin

The VNS3 HA Plugin is a powerful solution that enhances the high availability capabilities of VNS3 appliances in cloud environments. It provides automatic failover via cloud provider route table management to ensure the continuity of network services in the event of a VNS3 controller failure.

Key Features

  • Failover: The plugin monitors the health of VNS3 HA pair controllers and automatically updates the cloud provider’s route tables to redirect traffic to the healthy controller in case of a failure.
  • Automation: Leveraging the cloud provider’s API and assigned IAM roles, the plugin can retrieve information about the VNS3 controllers and make necessary route table changes without manual intervention.
  • Health Checks: The plugin uses a combination of ICMP ping and API-based health checks to accurately determine the health status of the VNS3 controllers, reducing the chances of false positives.
  • Flexibility: The plugin can be configured to support various HA pair topologies, including primary/secondary, hybrid, public, and private VNS3 HA pairs.
  • Logging and Monitoring: Comprehensive logging and monitoring capabilities allow for effective troubleshooting and integration with existing monitoring systems.

Benefits

  • Minimize Downtime: By automatically redirecting traffic to the healthy VNS3 controller, the plugin minimizes downtime and ensures the seamless continuity of network services.
  • Improved Reliability: The automatic failover and self-healing capabilities provided by the plugin enhance the overall reliability of the network infrastructure.
  • Reduced Operational Overhead: The plugin eliminates the need for architecting topologies with additional high-margin services or for manual intervention during failover scenarios, reducing the mean time to recovery (MTTR) and operational overhead.
  • Increased Efficiency: With the ability to manage route tables programmatically, the plugin streamlines the failover process and improves operational efficiency.

The VNS3 HA Container Plugin is a robust and scalable solution that ensures high availability and resilience for VNS3 appliances in cloud environments, providing peace of mind and enhanced network performance.

Functionality

VNS3 appliances are assigned IAM roles with a specific set of API permissions that grant access to cloud information and route update actions. When a VNS3 controller (the term used for the appliance instances) detects that its HA pair controller is unresponsive (via both ICMP ping and an API check on TCP 8000, using AND logic), the cloud route tables are automatically updated to point to the healthy VNS3 controller.

This functionality ensures that network traffic is seamlessly redirected to the active and responsive VNS3 controller in the event of a failure or unresponsiveness of one of the controllers. By leveraging the cloud provider’s API and the assigned IAM roles, the VNS3 HA container plugin can make the necessary route table updates without manual intervention.

The use of both ICMP ping and API 8000 ping checks provides a more robust determination of the health status of the VNS3 controllers. The AND logic ensures that both checks must fail before triggering the failover process, reducing the chances of false positives.
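
As an illustration only (the plugin performs these checks internally; the peer IP, timeouts, and use of curl here are placeholder assumptions), the AND logic is equivalent to the following shell test:

# Both checks must fail before failover is triggered (illustrative sketch, not plugin code)
PEER_IP=10.0.1.253   # placeholder: HA pair controller's IP
if ! ping -c 3 -W 2 "$PEER_IP" > /dev/null 2>&1 && \
   ! curl -sk --max-time 5 "https://$PEER_IP:8000/" > /dev/null 2>&1; then
    echo "peer failed both ICMP and TCP 8000 checks - failover would be initiated"
fi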

This HA functionality aims to minimize downtime and ensure the continuity of network services by automatically adapting to failures in the VNS3 controller pair. It provides a resilient and self-healing network architecture for subnets and address ranges managed by the VNS3 appliances.

Technical Details

The HA container plugin leverages the cloud provider’s SDK and the assigned IAM role to retrieve information about the host VNS3 controller and its HA pair controller. The plugin periodically checks and stores the route table information that includes either the host VNS3 controller or the HA pair controller, using identifiers such as the AWS instance-id or eni-id.
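
For example, the route table lookup is equivalent to an AWS CLI query like the following (illustrative only; the plugin uses the AWS SDK directly, and the instance-id shown is a placeholder):

aws ec2 describe-route-tables \
    --filters "Name=route.instance-id,Values=i-0123456789abcdef0" \
    --query "RouteTables[].{Id:RouteTableId,Routes:Routes}"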

The periodicity of these checks is configurable, allowing administrators to adjust the frequency based on their specific requirements and network stability. During each check, the plugin compares the current route table information with the previously stashed state.

If changes are detected in a healthy topology, where both the host and HA pair controllers are accessible, the plugin interprets these changes as planned or approved modifications. It logs the changes and updates the route stash and state files accordingly. This ensures that the plugin maintains an accurate representation of the intended network configuration.

On the other hand, if changes are detected in an unhealthy topology, where one of the VNS3 controllers is unresponsive, the plugin takes corrective action. It overrides the changes and updates the route tables to point to the healthy VNS3 controller. This automatic failover mechanism ensures that network traffic is redirected to the functioning controller, minimizing disruption to the network services.
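
The corrective action corresponds to a route replacement in the cloud provider’s API. As an AWS CLI illustration (placeholder IDs and CIDR; the plugin issues the equivalent SDK call):

aws ec2 replace-route \
    --route-table-id rtb-0123456789abcdef0 \
    --destination-cidr-block 192.168.100.0/24 \
    --instance-id i-0fedcba9876543210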

The plugin’s ability to distinguish between planned changes in a healthy topology and unexpected changes in an unhealthy topology is crucial for maintaining the integrity and stability of the network. By automatically overriding changes in an unhealthy scenario, the plugin prevents misconfigurations and ensures that the network continues to operate according to the defined HA setup.

The stashing and periodic checking of route table information, along with the differentiated handling of changes based on the health status of the VNS3 controllers, form the core technical functionality of the HA container plugin. This enables the plugin to provide robust and automated failover capabilities for VNS3 appliances in various network topologies.

Important Note

As of the latest version of the VNS3 HA container plugin, the functionality of updating VNS3 routes has been removed. This change reflects the fact that the VNS3 Routes functionality itself has evolved and matured to provide robust support for various HA pair configurations, regardless of the underlying network topology. With the enhanced VNS3 Routes capabilities, there is no longer a need for the HA container plugin to actively update the VNS3 routes. The VNS3 appliances themselves can now effectively handle the route management and synchronization between the HA pair controllers. This simplification in the plugin’s functionality brings several benefits:

  1. Reduced Complexity: By eliminating the need for the plugin to update VNS3 routes, the overall complexity of the HA setup is reduced. The VNS3 appliances can autonomously manage the routes, making the HA configuration more streamlined and easier to maintain.

  2. Improved Security: With the removal of VNS3 route updating functionality, the plugin no longer requires access to the host and HA pair controller API passwords or tokens. This enhances the security posture of the HA setup by minimizing the exposure of sensitive credentials and reducing the attack surface.

  3. Simplified Configuration: The elimination of VNS3 route updating responsibilities simplifies the configuration process for the HA container plugin. Administrators no longer need to provide the plugin with API credentials or worry about potential conflicts between the plugin’s route updates and the VNS3 Routes functionality.

  4. Seamless Compatibility: The removal of VNS3 route updating from the plugin ensures seamless compatibility with the latest VNS3 Routes capabilities. Administrators can leverage the full potential of VNS3 Routes without any interference or duplication of effort from the HA container plugin.

It is important to note that while the HA container plugin no longer updates VNS3 routes, it still plays a crucial role in monitoring the health of the VNS3 controllers and initiating the failover process by updating the cloud provider’s route tables when necessary.

Administrators should rely on the VNS3 Routes functionality to define and manage the desired routing configuration for their HA pair setup. The VNS3 documentation provides detailed information on how to leverage VNS3 Routes effectively in various HA pair configurations.

This update to the VNS3 HA container plugin simplifies the HA setup, enhances security, and aligns with the advancements in the VNS3 Routes capabilities, ultimately providing a more streamlined and robust high availability solution for VNS3 appliances.

Version 3 Requirements and Specifications

Supported Clouds

Version 3 of the HA Plugin is launched with support for AWS. Azure, Google and OCI support will be offered in later 3.x versions.

Supported VNS3 Versions

The plugin is compatible with VNS3 versions 5.x and later.

Prerequisites

VNS3 HA pair Controllers Setup

  • Deploying VNS3 controllers in the same VPC/VNET: The VNS3 HA pair controllers must be deployed within the same Virtual Private Cloud (VPC) or Virtual Network (VNET) in the cloud environment. This is necessary because the cloud route tables are specific to a VPC or VNET.

  • Configuring VNS3 Routes for HA pair configurations: VNS3 Routes should be properly configured to support the desired HA pair configuration. Refer to the VNS3 documentation for detailed instructions on setting up VNS3 Routes for your specific HA pair topology.

  • Access to cloud metadata: Each VNS3 controller requires access to the cloud provider’s metadata service to retrieve information about the instance, network interfaces, and other relevant details. Ensure that the necessary permissions and network access are in place.

  • API/IAM permissions for route table management: The VNS3 controllers must be assigned the appropriate API and IAM permissions to interact with the cloud provider’s services. The specific permissions required may vary depending on the cloud provider. Refer to the cloud provider’s documentation for details on configuring the necessary permissions.

  • Inbound access for health checks (ICMP and TCP port 8000): The VNS3 HA pair controllers must allow inbound access from each other on ICMP for ping-based health checks and TCP port 8000 for API-based health checks. Configure the necessary security group rules or network access controls to permit this traffic.

  • Network connectivity between VNS3 controllers and managed subnets: Ensure that the VNS3 HA pair controllers have reliable network connectivity with each other and with the subnets or address ranges they are managing. This connectivity is essential for the controllers to communicate, perform health checks, and handle the routing of network traffic effectively.

Architecture / Use-cases

Primary/Secondary Configuration:

In this use case, the VNS3 HA pair is configured to have the cloud route tables point all the relevant routes to a single VNS3 controller, designated as the primary controller. The other controller acts as a secondary or standby unit.

  • If the primary controller becomes unresponsive, the HA container plugin automatically updates the route tables to redirect traffic to the secondary controller.

  • When the primary controller recovers and becomes responsive again, the plugin can be configured to either maintain the current state with the secondary controller active or revert back to the original configuration with the primary controller handling the traffic.

This setup ensures that there is always a designated controller handling the network traffic, with the secondary controller ready to take over in case of a failure.

Hybrid Configuration:

In a hybrid configuration, the VNS3 HA pair is set up to have the cloud route tables distribute the routes between both controllers. Some routes point to the primary controller, while others point to the secondary controller.

  • If one of the controllers becomes unresponsive, the HA container plugin updates the route tables to redirect the affected routes to the healthy controller.

  • When the unresponsive controller recovers, the plugin redistributes the routes according to the original hybrid configuration.

This use case allows traffic during normal operations to use the “local”/“closest” VNS3 controller, reducing latency and costs (avoiding cross-AZ traversal), while still providing failover capabilities.

Public Or Private:

The VNS3 HA Container Plugin is designed to support both public and private VNS3 controller configurations. In a public VNS3 HA pair setup, the plugin monitors the health of the controllers with public IP addresses and manages the public route tables accordingly. Alternatively, in a private VNS3 HA pair configuration, the controllers are assigned private IP addresses and are not directly accessible from the public internet. In this case, the plugin monitors the health of the controllers via their private IP addresses, and the failover process updates the private route tables within the cloud environment, ensuring that internal network traffic is redirected to the healthy controller.

Health Check

The VNS3 HA Plugin employs a robust health check mechanism to accurately determine the status of the VNS3 controllers and initiate failover actions when necessary. The health check process combines ICMP ping and API-based checks, ensuring a comprehensive assessment of the controllers' responsiveness.

Required Health Checks

  • TCP 8000 VNS3 API Call: The plugin performs a TCP connection attempt on port 8000 to the VNS3 controller’s API endpoint. This check verifies the availability of the VNS3 API and the controller’s ability to respond to API requests. The API call is made to the controller’s public or private IP address, depending on the plugin configuration file.

  • ICMP Ping: The plugin sends ICMP ping requests to the VNS3 controller’s public or private IP address. This check ensures that the controller is reachable on the network and can respond to basic network traffic.

The plugin uses AND logic to evaluate the results of the required health checks: both the TCP 8000 API call and the ICMP ping must fail before the HA pair controller is declared unhealthy and the failover process is initiated. Requiring both checks to fail reduces the chance of a false positive triggering an unnecessary failover.

Optional Health Check

In addition to the required health checks, the plugin supports an optional health check mechanism:

Overlay IP TCP 8000 and Ping: If configured, the plugin can perform TCP 8000 API calls and ICMP pings to the VNS3 controller’s overlay IP address. This additional check can provide enhanced visibility into the controller’s health, particularly in scenarios where the overlay network connectivity may be impacted.

Configuration

Step 1: VNS3 Instance/VM Permissions

The VNS3 controller will require certain cloud permissions to be able to update the cloud route tables.

AWS permissions

VNS3 Controllers running in AWS will need an IAM Role assigned (AWS Documentation) with an appropriate IAM Policy to grant permissions to update the AWS VPC Route Tables. Below are some example policies: Allow-All, VPC-ID Limited, Route Table-ID Limited, and Tag Limited.

NOTE: While the ec2:ReplaceRoute action can be narrowed down by specifying conditions (limit action to a specific vpc-id, a set of route-table-ids, or tag key:value combination), the ec2:Describe* actions cannot be limited by a resource ARN and cannot be conditionally controlled.

Allow-All

No edits required: simply copy/paste into a policy, attach it to a role, and assign the role to the VNS3 controller running the HA Plugin.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:DescribeRouteTables",
                "ec2:ReplaceRoute"
            ],
            "Resource": "*"
        }
    ]
}
VPC-ID Limited

Replace REGION, ACCOUNT-ID, and VPC-ID with the relevant details for your deployment.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeRouteTables",
                "ec2:DescribeInstances"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "ec2:ReplaceRoute",
            "Resource": "arn:aws:ec2:REGION:ACCOUNT-ID:route-table/*",
            "Condition": {
                "ArnEquals": {
                    "ec2:Vpc": "arn:aws:ec2:REGION:ACCOUNT-ID:vpc/VPC-ID"
                }
            }
        }
    ]
}
Route Table-ID Limited

Replace REGION and ACCOUNT-ID with the relevant details for your deployment. Also replace ROUTE-TABLE-1-ID and ROUTE-TABLE-2-ID with the rtb-ids associated with the VNS3 controllers in your deployment.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeRouteTables",
                "ec2:DescribeInstances"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "ec2:ReplaceRoute",
            "Resource": "arn:aws:ec2:REGION:ACCOUNT-ID:route-table/*",
            "Condition": {
                "ForAnyValue:StringEquals": {
                    "ec2:RouteTableID": [
                        "ROUTE-TABLE-1-ID",
                        "ROUTE-TABLE-2-ID"
                    ]
                }
            }
        }
    ]
}
Tag Limited

Replace REGION and ACCOUNT-ID with the relevant details for your deployment, and replace TAG-KEY and TAG-VALUE with the tag key and value, respectively, that are applied to the route tables associated with the VNS3 controllers in your deployment.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeRouteTables",
                "ec2:DescribeInstances"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "ec2:ReplaceRoute",
            "Resource": "arn:aws:ec2:REGION:ACCOUNT-ID:route-table/*",
            "Condition": {
                "StringEquals": {
                    "ec2:ResourceTag/TAG-KEY": "TAG-VALUE"
                }
            }
        }
    ]
}
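
If you prefer the CLI to the console workflow, the role and instance-profile wiring for whichever policy you choose can be done with commands along these lines (a sketch: the policy file name, role name, trust policy file, and instance-id are placeholders; see the AWS Documentation referenced above for the authoritative steps):

aws iam create-policy --policy-name vns3-ha-plugin --policy-document file://ha-policy.json
aws iam create-role --role-name vns3-ha-plugin --assume-role-policy-document file://ec2-trust.json
aws iam attach-role-policy --role-name vns3-ha-plugin \
    --policy-arn arn:aws:iam::ACCOUNT-ID:policy/vns3-ha-plugin
aws iam create-instance-profile --instance-profile-name vns3-ha-plugin
aws iam add-role-to-instance-profile --instance-profile-name vns3-ha-plugin --role-name vns3-ha-plugin
aws ec2 associate-iam-instance-profile --instance-id i-0123456789abcdef0 \
    --iam-instance-profile Name=vns3-ha-plugin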

Azure permissions (not currently supported in version 3.0.0)

Permissions (actions):

  • Microsoft.Compute/virtualMachines/read
  • Microsoft.Network/networkInterfaces/read
  • Microsoft.Network/virtualNetworks/read
  • Microsoft.Network/publicIPAddresses/read
  • Microsoft.Network/routeTables/read
  • Microsoft.Network/routeTables/routes/read
  • Microsoft.Network/routeTables/routes/write

Scope:

  • Resource group for the VNS3 controllers

Step 2: Plugin Network Access

The HA Plugin requires network access so that it can send requests to the cloud provider’s API and communicate with the HA pair VNS3 controller via the VNS3 API (TCP 8000) and ICMP ping (echo request/reply).

2.1: Cloud Network Security Groups

Configure the Cloud Provider Network Security Groups to allow the following between the VNS3 controllers in the HA Pair.

Required Traffic

  • TCP traffic on port 8000 (VNS3 API) via Public IPs or Private IPs depending on your architecture.
  • ICMP traffic via Private IPs

NOTE:

  • Optional health checks via overlay network IPs are allowed by default when the VNS3 controllers are peered.
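
In AWS, for example, the required traffic described above can be permitted with security group rules like the following (a sketch: the security group IDs are placeholders and assume each controller has its own group; for communication via Public IPs, use --cidr with each controller's public address instead of --source-group):

# Allow the peer controller's security group to reach the VNS3 API (TCP 8000)
aws ec2 authorize-security-group-ingress --group-id sg-0aaa1111bbbb2222c \
    --protocol tcp --port 8000 --source-group sg-0ddd3333eeee4444f

# Allow ICMP (echo request/reply) from the peer controller's security group
aws ec2 authorize-security-group-ingress --group-id sg-0aaa1111bbbb2222c \
    --protocol icmp --port -1 --source-group sg-0ddd3333eeee4444f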

2.2: VNS3 Firewall Rules

The following rules need to be added to both the Primary and Secondary VNS3 Controllers to provide the HA Plugin with the appropriate network access. The following rules assume the HA Plugin is allocated using the 198.51.100.2 Plugin Network IP:

POSTROUTING -o eth0 -s 198.51.100.2/32 -j MASQUERADE-ONCE
FORWARD -s 198.51.100.2/32 -j ACCEPT
FORWARD -d 198.51.100.2/32 -j ACCEPT
INPUT -i plugin0 -d 198.51.100.2/32 -j ACCEPT
OUTPUT -o plugin0 -s 198.51.100.2/32 -j ACCEPT

Step 3: Upload and Allocate HA Plugin

The HA Plugin is available via the Plugin Catalog on all version 5 and later VNS3 Controllers with network access to the publicly available Cohesive Networks Plugin storage site.

Install the HA Plugin via the Plugin Catalog. Once the HA Plugin is installed and available, allocate an instance of the HA Plugin.

Step 4: HA Plugin Configuration File

You can configure the HA plugin via a configuration file or via environment variables. If using environment variables, prepend each variable name with HAENV_. If using the configuration file (edited via the Plugin Manager or directly on the plugin at /opt/hacontainer/conf/vars.yml), use YAML format without the HAENV_ prefix.

Variables

  • cloud: aws (azure, google and oci support will be offered in later 3.x versions.)
  • sleep_time: number of seconds to wait in between health checks to the HA pair/peer controller. Default is 15s.
  • peer_underlay_ip: private IP of the peer/pair VNS3 controller
  • my_underlay_ip: private IP of the host VNS3 controller
  • peer_public_ip: public IP of the peer/pair VNS3 controller
  • my_public_ip: public IP of the host VNS3 controller
  • log_level: level of detail in the logs generated by the plugin. This variable uses Linux log level values - info, debug, and error. Default is info.
  • log: mnt

NOTE: You only need to specify host and peer/pair public IPs OR underlay/private IPs, depending on the deployment architecture.

Configuring via Config file

Configuring the HA Plugin via Configuration File can be done by accessing the Plugin directly (via SSH and port forward VNS3 firewall rule) OR via the Plugin Manager available on VNS3 version 5 and later (recommended).

The configuration file is located at /opt/hacontainer/conf/vars.yml and should be in YAML format. Here is an example config file:
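
The following is a sketch of such a file; the IP addresses are placeholders and mirror the environment-variable example in the next subsection:

cloud: aws
my_underlay_ip: 10.0.0.253
peer_underlay_ip: 10.0.1.253
peer_overlay_ip: 100.127.255.253
sleep_time: 15
log_level: info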

Configuring via the Environment

You can also configure the HA plugin via the environment by capitalizing the variable and prepending with HAENV_. Here is an example using environment variables:

HAENV_CLOUD=aws
HAENV_MY_UNDERLAY_IP=10.0.0.253
HAENV_PEER_UNDERLAY_IP=10.0.1.253
HAENV_PEER_OVERLAY_IP=100.127.255.253
HAENV_SLEEP_TIME=15
HAENV_LOG_LEVEL=debug

Tip: For more verbose logging, you can set the log level to debug with the log_level variable, or via the environment with HAENV_LOG_LEVEL=debug.

Step 5: Restart the HA Plugin

Once Steps 1-4 are complete, restart the HA Plugin process via the Plugin Manager’s executable actions, or restart the plugin instance, so that the updated configuration file is used.

Automating Your Configuration

And that’s it! But you should only ever have to do this once. This configuration can be totally automated when deploying new network segments.

The API endpoints you would use are the following:

  1. Update your VNS3 firewall allowing plugin network access with POST /firewall/rules
  2. Update your VNS3 route table with POST /routes
  3. Upload the Plugin Image with POST /api/container_system/images
  4. Start the Plugin with POST /api/container_system/containers (can pass environment variables)
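
A minimal sketch of that sequence with curl (illustrative only: the authentication scheme and request payloads depend on your VNS3 version's API documentation, and the address, credentials, and payload files below are placeholders):

VNS3="https://10.0.0.253:8000"          # placeholder controller address
AUTH="api:YOUR-API-PASSWORD"            # placeholder API credentials

# 1. Firewall rule for plugin network access (payload per the VNS3 API reference)
curl -sk -u "$AUTH" -X POST "$VNS3/firewall/rules" -d @firewall-rule.json
# 2. VNS3 route update
curl -sk -u "$AUTH" -X POST "$VNS3/routes" -d @route.json
# 3. Upload the plugin image
curl -sk -u "$AUTH" -X POST "$VNS3/api/container_system/images" -d @image.json
# 4. Start the plugin (environment variables can be passed in the payload)
curl -sk -u "$AUTH" -X POST "$VNS3/api/container_system/containers" -d @container.json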

Version 2.0.0 Plugin Modes

While it is recommended you use the latest version 3.x or later HA plugin, 2.x versions of the plugin require slightly different configuration. The HA Plugin has two modes with the following operations depending on configuration parameters:

Primary Mode

  • During normal operation an HA Plugin in primary mode captures the Cloud Provider Route Table settings and stores any routes where the Primary VNS3 controller is the “target”/“next hop ip”.
  • When Primary recovers after an outage, the HA Plugin looks at the routes stored during normal operations and replaces any of those routes on the Cloud Provider Route Table that are pointing at the Secondary VNS3 Controller.

Secondary Mode

  • During normal operation an HA Plugin in secondary mode:
    • Captures the Cloud Provider Route Table settings and stores any routes where the Primary VNS3 controller is the “target”/“next hop ip”.
    • Captures the Primary VNS3 controller’s VNS3 routes and stashes them every N seconds depending on configuration settings.
    • Sends heartbeats on a specific and configurable periodicity to determine whether the Primary VNS3 Controller is down/unresponsive. The following are the configurable heartbeat checks:
      • Public IP ping
      • Private IP ping
      • Overlay IP ping
      • Private IP VNS3 API call (get/config)
      • Overlay IP VNS3 API call (get/config)

Supported Connectivity

The HA Plugin is designed for specific highly available hybrid connectivity architectures. If you don’t see your specific use-case listed below, please contact our support team.

  • Active-Active BGP-over-IPsec (dynamic route-based IPsec VPN)
  • Active-Passive IPsec (static route-based IPsec VPN)
  • VNS3 Peering Mesh
  • VNS3 SecLink (federated multicloud network solution)

Architecture / Use-case

Below is an example Hybrid Cloud Connectivity architecture that can leverage the HA Plugin.

HA diagram

  1. Two VNS3 Controllers are running in a Cloud (e.g. AWS, Azure, etc.) and are connected via active-active BGP-over-IPsec VPN.
  2. During normal operations the Primary VNS3 controller is the route to the remote on-premises data center subnet/network.
  3. In the event the Primary fails, the HA Plugin running on the Secondary VNS3 Controller will update the Cloud Route Tables.
  4. The Secondary VNS3 Controller is the route to the remote network until the Primary VNS3 Controller recovers.

Variables

Primary Mode Variables

Primary mode accepts the following variables:

  • mode: primary
  • cloud: aws or azure
  • peer_public_ip: Public IP address of secondary VNS3 controller
  • sleep_time: number of seconds to wait in between checking to see if it is the primary and assuming all routes. Default is 15.

Secondary Mode Variables

Secondary mode accepts the following variables:

  • mode: secondary
  • cloud: aws, azure or overlay (if overlay, only updates VNS3 routes)
  • peer_public_ip: Public IP address of primary VNS3 controller
  • my_underlay_ip: primary or secondary IP of secondary VNS3 controller
  • peer_underlay_ip: primary or secondary IP of primary VNS3 controller
  • peer_overlay_ip (optional): overlay IP address of the primary controller
  • my_api_password: secondary controller’s API password
  • peer_api_password: primary controller’s API password
  • sleep_time: number of seconds to wait in between checking to see if primary is down and assuming all routes. Default is 15.
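
For reference, a sketch of a secondary-mode configuration built from the variables above (all values are placeholders; consult the version 2 plugin documentation for the exact file location and format):

mode: secondary
cloud: aws
peer_public_ip: 203.0.113.10
my_underlay_ip: 10.0.1.253
peer_underlay_ip: 10.0.0.253
peer_overlay_ip: 100.127.255.253
my_api_password: SECONDARY-API-PASSWORD
peer_api_password: PRIMARY-API-PASSWORD
sleep_time: 15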

Have any questions? Contact Cohesive Networks support. We take pride in our speedy and high quality support.