Manual Chapter: Fail-Safe and Fast Failover

Two features of high availability are fail-safe and HA groups. The remainder of this chapter describes these features.
Important: For information on implementing configuration synchronization, failover, and connection mirroring, see the BIG-IP® Redundant Systems Configuration Guide and BIG-IP® TMOS®: Implementations.
Fail-safe is the ability of a BIG-IP® system to monitor certain aspects of the system or network, detect interruptions, and consequently take some action. More specifically:
In the case of a redundant system, a device can detect a problem with a service or VLAN, and initiate failover to a peer device.
In the case of a single device (non-redundant system), the system can detect a problem with a service or VLAN, and take some action, such as restarting the service.
The BIG-IP system supports the following types of fail-safe:
System fail-safe: Monitors the switch board component and a set of key system services.
VLAN fail-safe: Monitors traffic on a VLAN.
Note: Only users with either the Administrator or Resource Administrator user role can configure fail-safe.
To configure and manage fail-safe, log in to the BIG-IP Configuration utility, and on the Main tab, expand System, and click High Availability.
When you configure system fail-safe, the BIG-IP system monitors various hardware components, as well as the heartbeat of various system services, and can take action if the system detects a heartbeat failure.
You can configure the BIG-IP system to monitor the switch board component and then take some action if the BIG-IP system detects a failure.
Using the Configuration utility, you can specify the action that you want the BIG-IP system to take when the component fails. Possible actions that the BIG-IP system can take are:
You can specify the particular action that you want the BIG-IP system to take when the heartbeat of a system service fails. The following table lists each system service, and shows the possible actions that the BIG-IP system can take in the event of a heartbeat failure.
TMROUTED
(Single device only)
For maximum reliability, the BIG-IP system supports failure detection on all VLANs. When you configure the fail-safe option on a VLAN, the BIG-IP system monitors network traffic going through that VLAN. If the BIG-IP system detects a loss of traffic on the VLAN and the fail-safe timeout period has elapsed, the BIG-IP system attempts to generate traffic by issuing ARP requests to nodes accessible through the VLAN. The BIG-IP system also generates an ARP request for the default route, if the default router is accessible from the VLAN. Failover is averted if the BIG-IP system is able to send and receive any traffic on the VLAN, including a response to its ARP request.
For a redundant system configuration, if the BIG-IP system does not receive traffic on the VLAN before the timeout period expires, the system can initiate failover and switch control to the standby device, reboot, or restart all system services. The default action is Reboot.
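The VLAN fail-safe check described above can be modeled as a small decision function. This is a minimal sketch under stated assumptions, not the BIG-IP implementation: the function name, its parameters, the action strings, and the `send_arp_and_wait` callback (a stand-in for issuing an ARP request and waiting for any traffic on the VLAN) are all hypothetical, and the timeout is assumed to be in seconds.

```python
from typing import Callable, Iterable, Optional


def vlan_failsafe_action(
    idle_seconds: float,
    timeout_seconds: float,
    arp_targets: Iterable[str],
    send_arp_and_wait: Callable[[str], bool],
    action: str = "reboot",
) -> Optional[str]:
    """Hypothetical model of VLAN fail-safe.

    Returns the configured action (for example "failover", "reboot", or
    "restart-all") if the VLAN is judged to have failed, or None if it is healthy.
    """
    # Traffic has not been missing for the full fail-safe timeout: do nothing.
    if idle_seconds < timeout_seconds:
        return None
    # Try to generate traffic by ARPing nodes on the VLAN (and the default
    # router, if it is reachable from the VLAN). Any traffic received,
    # including an ARP reply, averts the action.
    for target in arp_targets:
        if send_arp_and_wait(target):
            return None
    # Still no traffic: take the configured action (Reboot by default).
    return action


# Usage with hypothetical addresses: no node answers, so "reboot" is returned.
print(vlan_failsafe_action(45.0, 30.0, ["10.1.10.20", "10.1.10.1"], lambda ip: False))
```

The callback keeps the decision logic separate from packet I/O, so the rule "any traffic averts the action" can be read and tested on its own.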
Warning: You should configure the fail-safe option on a VLAN only after the BIG-IP system is in a stable production environment. Otherwise, routine network changes might cause failover unnecessarily.
Each interface card installed on the BIG-IP system is typically mapped to a different VLAN. Thus, when you set the fail-safe option on a particular VLAN, you need to know the interface to which the VLAN is mapped. You can use the Configuration utility to view VLAN names and their associated interfaces.
The BIG-IP system includes a feature known as fast failover. Fast failover is a feature that is based on the concept of an HA group. An HA group is a set of trunks, pools, or clusters (or any combination of these) that you want the BIG-IP system to use to calculate an overall health score for a device in a redundant system configuration. A health score is based on the number of members that are currently available for any trunks, pools, and clusters in the HA group, combined with a weight that you assign to each trunk, pool, and cluster. The device that has the best overall score at any given time becomes or remains the active device.
To configure and manage fast failover, log in to the BIG-IP Configuration utility, and on the Main tab, expand System, and click High Availability.
The fast failover feature is designed for a redundant configuration that contains a maximum of two devices in a device group.
Note: Only VIPRION® systems can have a cluster as an object in an HA group. For all other platforms, HA group members consist of pools and trunks only.
An HA group is typically configured to fail over based on trunk health in particular. Trunk configurations are not synchronized between units, which means that the number of trunk members on the two units often differs whenever a trunk loses or gains members. The HA group feature allows failover to occur based on changes to trunk health instead of on system or VLAN failure.
When you configure an HA group, failover from one BIG-IP device to the other based on HA scores is noticeably faster than failover caused by a hardware or daemon failure.
A weight is a health value that you assign to each object in the HA group (that is, pool, trunk, and cluster). The weight that you assign to each object must be in the range of 10 through 100. The maximum overall score that the BIG-IP system can potentially calculate for a device is the sum of the individual weights for the HA group objects, plus the active bonus value. (For information on the Active Bonus setting, see Specifying an active bonus.)
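The weight range and the upper bound on a device score can be expressed as a short check. This is an illustrative sketch only: `max_device_score` is a hypothetical helper, and the weights below are example values, not values read from a real configuration. The active bonus of 10 is the default value that this chapter gives.

```python
def max_device_score(weights: dict, active_bonus: int) -> int:
    """Highest score a device can reach: sum of HA group object weights plus the active bonus."""
    for name, weight in weights.items():
        # Each HA group object (pool, trunk, or cluster) takes a weight from 10 through 100.
        if not 10 <= weight <= 100:
            raise ValueError(f"weight for {name!r} must be 10-100, got {weight}")
    return sum(weights.values()) + active_bonus


# Hypothetical HA group: two pools and one trunk, plus the default active bonus of 10.
weights = {"my_http_pool": 50, "my_ftp_pool": 20, "my_trunk1": 30}
print(max_device_score(weights, active_bonus=10))  # 110
```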
Table 8.2 shows an example of how the system calculates a score for the device, based solely on the weight of objects in the HA group. In this example, the HA group contains two pools (my_http_pool and my_ftp_pool) and one trunk (my_trunk1). A user has assigned a weight to each object.
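Table 8.2 Device score calculated from HA group object weights

HA group object | Weight | Available members | Object score
my_http_pool | 50 | 60% | 30 (60% x 50)
my_ftp_pool | 20 | 100% | 20 (100% x 20)
my_trunk1 | 30 | 75% | 23 (75% x 30, rounded)
Total device score = 73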
On each device, the system multiplies each object's weight by a percentage that it derives for that object (the percentage of the object's members that are available) to calculate a score for each object.
The system then adds the scores to determine a total score for the device. The device with the highest score becomes or remains the active device in the redundant system configuration.
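The calculation above can be sketched in a few lines. This is a minimal sketch, assuming that each object score is rounded half up: the chapter does not state a rounding rule, and this one is inferred from the 75% x 30 = 23 example. The member counts are hypothetical values chosen to produce 60%, 100%, and 75% availability; with them, the sketch yields 30 + 20 + 23 = 73.

```python
def object_score(available: int, total: int, weight: int) -> int:
    """Weight scaled by the fraction of available members, rounded half up (assumed)."""
    if total <= 0:
        raise ValueError("an HA group object needs at least one member")
    # Integer form of round(available / total * weight) with halves rounded up,
    # which avoids floating-point surprises such as 22.4999...
    return (2 * available * weight + total) // (2 * total)


def device_score(objects: list) -> int:
    """Sum of object scores; each entry is (available, total, weight)."""
    return sum(object_score(a, t, w) for a, t, w in objects)


# Hypothetical member counts giving 60%, 100%, and 75% availability.
group = [
    (3, 5, 50),  # pool: 3 of 5 members up, weight 50 -> 30
    (4, 4, 20),  # pool: 4 of 4 members up, weight 20 -> 20
    (3, 4, 30),  # trunk: 3 of 4 members up, weight 30 -> 23
]
print(device_score(group))  # 73
```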
Note that if you have configured VLAN fail-safe, and the VLAN fails on an active device, the device goes offline regardless of its score, and its peer becomes active.
For each object in an HA group, you can specify an optional setting known as a threshold. A threshold is a value that specifies the number of object members that must be available to prevent failover. If the number of available members dips below the threshold, the BIG-IP system assigns a score of 0 to the object, so that the score of that object no longer contributes to the overall score of the device.
For example, if a trunk in the HA group has four members and you specify a threshold value of 3, and the number of available trunk members falls to 2, then the trunk contributes a score of 0 to the total device score.
If the number of available object members equals or exceeds the threshold value, or you do not specify a threshold, the BIG-IP system calculates the score as described previously, by multiplying the percentage of available object members by the weight for each object and then adding the scores to determine the overall device score.
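Adding a threshold changes only the first step of the per-object calculation. The following sketch extends the hypothetical `object_score` helper with an optional threshold. The trunk weight of 30 is an assumption, because the threshold example above does not give one, and the rounding rule is the same inferred round-half-up rule.

```python
from typing import Optional


def object_score(available: int, total: int, weight: int,
                 threshold: Optional[int] = None) -> int:
    """Score for one HA group object, honoring an optional threshold."""
    # Fewer available members than the threshold: the object contributes nothing.
    if threshold is not None and available < threshold:
        return 0
    # Otherwise: percentage available x weight, rounded half up (assumed rounding).
    return (2 * available * weight + total) // (2 * total)


# Four-member trunk with threshold 3 and an assumed weight of 30.
print(object_score(2, 4, 30, threshold=3))  # 0: below the threshold
print(object_score(3, 4, 30, threshold=3))  # 23: meets the threshold
print(object_score(2, 4, 30))               # 15: no threshold set
```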
Tip: Do not configure the tmsh attribute min-up-members on any pool that you intend to include in the HA group.
An active bonus is an amount that the BIG-IP system automatically adds to the overall score of the active device. An active bonus ensures that the active device remains active when the device's score would otherwise temporarily fall below the score of the standby device.
A common reason to specify an active bonus is to prevent failover due to flapping, the condition where failover occurs frequently as a trunk member toggles between availability and unavailability. In this case, you might want to prevent the HA scoring feature from triggering failover each time a trunk member is lost. You might also want to prevent the HA scoring feature from triggering failover when you make minor changes to the BIG-IP system configuration, such as adding or removing a trunk member.
Suppose that the HA group on each device contains a trunk with four members, and you assign a weight of 30 to each trunk. Without an active bonus defined, if the trunk on device 1 loses some number of members, failover occurs because the overall calculated score for device 1 becomes lower than that of device 2. Table 8.3 shows the scores that could result if the trunk on device 1 loses one trunk member and no active bonus is specified.
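Table 8.3 Scores with no active bonus

Score | Device 1 (active) | Device 2 (standby)
Trunk weight | 30 | 30
Device score, all trunk members available | 30 | 30
Trunk score after device 1 loses one member | 23 (75% of 30) | 30 (100% of 30)
Device score after device 1 loses one member | 23 | 30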
You can prevent this failover from occurring by specifying an active bonus value. In our example, if we specify an active bonus of 10 (the default value), the score of the active device changes from 23 to 33, thereby ensuring that the score of the active device remains equal to or higher than that of the standby device (30).
Although you specify an active bonus value on each device, the BIG-IP system uses the active bonus specified on the active device only, to contribute to the score of the active device. The BIG-IP system never uses the active bonus on the standby device to contribute to the score of the standby device.
Important: An exception to this behavior is when the active device score is 0. If the score of the active device is 0, the system does not add the active bonus to the active device score.
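The active-bonus rules above, including the exception for a score of 0, can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, and the rule that failover happens only when the standby score is strictly higher is inferred from this chapter's statement that an equal or higher score keeps the active device active.

```python
def scores_with_bonus(active: int, standby: int, active_bonus: int) -> tuple:
    """Return (active, standby) scores after applying the active bonus.

    Only the active device's bonus is used, and it is not added when the
    active device's own score is 0.
    """
    bonus = active_bonus if active > 0 else 0
    return active + bonus, standby


def fails_over(active: int, standby: int, active_bonus: int) -> bool:
    """Assumed rule: fail over only when the standby device scores strictly higher."""
    active_total, standby_total = scores_with_bonus(active, standby, active_bonus)
    return standby_total > active_total


print(fails_over(23, 30, 0))   # True: the Table 8.3 case, no active bonus
print(fails_over(23, 30, 10))  # False: 33 versus 30
print(fails_over(0, 30, 10))   # True: no bonus is added to a score of 0
```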
To decide on an active bonus value, calculate the trunk score for some number of failed members (such as one of four members), and then specify an active bonus that, when added to that trunk score, produces a total that is greater than or equal to the weight that you assigned to the trunk.
For example, if you assigned a weight of 30 to the trunk, and one of the four trunk members fails, the trunk score becomes 23 (75% of 30), putting the device at risk for failover. However, if you specified an active bonus of 7 or higher, failover would not actually occur, because a score of 7 or higher, when added to the score of 23, is greater than or equal to 30.
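The sizing rule above reduces to one subtraction: the weight minus the trunk score after the chosen number of failures. The sketch below uses a hypothetical helper name and the same inferred round-half-up rule, and it reproduces the value of 7 from the example.

```python
def min_active_bonus(weight: int, members: int, failed: int) -> int:
    """Smallest active bonus that keeps trunk score + bonus >= weight after `failed` members go down."""
    available = members - failed
    # Percentage available x weight, rounded half up (assumed; matches 75% x 30 = 23).
    trunk_score = (2 * available * weight + members) // (2 * members)
    return max(0, weight - trunk_score)


print(min_active_bonus(weight=30, members=4, failed=1))  # 7
print(min_active_bonus(weight=30, members=4, failed=2))  # 15
```

Sizing the bonus for a larger number of failures makes the active device more resistant to flapping, but it also delays failover when a trunk genuinely degrades.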
Table 8.4, and the list that follows it, show how an active bonus on the active device affects failover as trunk members are lost and regained.
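Table 8.4 Effect of an active bonus on failover

Row | Event | Trunk score (device 1 / device 2) | Active bonus (device 1 / device 2) | Device score = trunk score + active bonus (device 1 / device 2) | Fail over?
1 | Device 1 is active (initial state) | 30 (100% of 30) / 30 (100% of 30) | 10 / 0 | 40 / 30 | No
2 | Device 1 loses a trunk member | 23 (75% of 30) / 30 (100% of 30) | 10 / 0 | 33 / 30 | No
3 | Device 1 loses another trunk member | 15 (50% of 30) / 30 (100% of 30) | 10 / 0 | 25 / 30 | Yes
4 | Device 1 goes to standby. Device 2 goes active. | 15 (50% of 30) / 30 (100% of 30) | 0 / 10 | 15 / 40 | No
5 | Device 2 loses a trunk member | 15 (50% of 30) / 23 (75% of 30) | 0 / 10 | 15 / 33 | No
6 | Device 1 regains two trunk members | 30 (100% of 30) / 23 (75% of 30) | 0 / 10 | 30 / 33 | No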
To help understand Table 8.4, the row numbers in the left column of the table correspond to the explanations below:
1 Device 1 is active (initial state)
With all trunk members available on both units, and the active bonus configured, the active device (device 1) retains the higher device score and therefore remains active.
2 Device 1 loses a trunk member
The device score for device 1 is still higher than the score for device 2, due to an active bonus value of 10.
3 Device 1 loses another trunk member
With an active bonus of 10, failover occurs when device 1 has lost 50% of its trunk members: its device score (15 + 10 = 25) falls below the score of device 2 (30).
4 Device 1 switches to standby mode and device 2 becomes active
Once the active device (device 1) has failed over to device 2, the active bonus on device 1 no longer applies, reducing its score from 25 to 15. The active bonus on device 2 is then applied, increasing device 2's score from 30 to 40.
5 Device 2 loses a trunk member
If the active device (device 2) loses a trunk member, its score (33) is still higher than the score of device 1 (15, with two unavailable members), due to the active bonus.
6 Device 1 regains two trunk members
Even after device 1 regains both trunk members (score 30), device 2 remains the active device with one trunk member unavailable, because the active bonus raises its score to 33.
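The sequence in Table 8.4 can be replayed with a small simulation. This is a sketch under assumptions, not BIG-IP behavior verified against a device: the helper names are hypothetical, scores use the inferred round-half-up rule, and failover is assumed to occur only when the standby device's score is strictly higher. Each device has a four-member trunk with weight 30 and an active bonus of 10, as in the table.

```python
WEIGHT = 30        # trunk weight on each device, as in Table 8.4
MEMBERS = 4        # trunk members per device
ACTIVE_BONUS = 10  # default active bonus


def trunk_score(available: int) -> int:
    # Percentage of available members x weight, rounded half up (assumed).
    return (2 * available * WEIGHT + MEMBERS) // (2 * MEMBERS)


def device_scores(active: int, available: dict) -> dict:
    scores = {dev: trunk_score(n) for dev, n in available.items()}
    # Only the active device gets the bonus, and not when its score is 0.
    if scores[active] > 0:
        scores[active] += ACTIVE_BONUS
    return scores


def reevaluate(active: int, available: dict):
    """Return (active device, device scores), failing over if the standby scores higher."""
    scores = device_scores(active, available)
    standby = 2 if active == 1 else 1
    if scores[standby] > scores[active]:
        active = standby
        # After failover, the bonus moves to the new active device (row 4).
        scores = device_scores(active, available)
    return active, scores


# Available trunk members on devices 1 and 2 after each event.
events = [
    ("Device 1 is active (initial state)", {1: 4, 2: 4}),
    ("Device 1 loses a trunk member", {1: 3, 2: 4}),
    ("Device 1 loses another trunk member", {1: 2, 2: 4}),
    ("Device 2 loses a trunk member", {1: 2, 2: 3}),
    ("Device 1 regains two trunk members", {1: 4, 2: 3}),
]

active = 1
history = []
for label, available in events:
    active, scores = reevaluate(active, available)
    history.append((label, active, scores))
    print(f"{label}: active = device {active}, scores = {scores}")
```

The third event reports the post-failover scores of row 4 (15 and 40), because the sketch prints the state after re-evaluation; the 25 versus 30 comparison of row 3 happens inside `reevaluate`.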