Manual Chapter : Redundant Pairs HA

Applies To:


ARX

  • 6.3.0
heartbeat (optional) clears only the heartbeat-related counters. This applies to heartbeats over the redundant-pair link (from show redundancy peer) as well as quorum-disk heartbeats (from show redundancy quorum-disk).
transition (optional) clears the counters associated with state/status transitions. This applies to all transition counters in all show redundancy ... commands.
network (optional) clears only the network counters, from the show redundancy network output. See clear counters redundancy network for details.
critical-services (optional) clears the counters associated with critical-services (from show redundancy critical-services).
any except ARX-VE
bstnA# clear counters redundancy
bstnA# clear counters redundancy heartbeat
clears only the heartbeat-related counters. This does not clear any transition-related counters.
bstnA# clear counters redundancy transition
Use the no form of the command to make a subnet non-critical.
critical route subnet mask
subnet (0.0.0.0-255.255.255.255) is the IP address of the subnet. This must be reachable through at least one static route; use show ip route to view all static routes.
mask (0.0.0.0-255.255.255.255) is the netmask, which identifies the network part of the subnet address.
any except ARX-VE
If a critical route fails on the current peer and the other peer has no failures, control fails over to the other peer. If the other peer has any failures that would ordinarily cause a failover (such as a major hardware fault), no failover occurs. This prevents unnecessary failovers.
The ARX tests for failure with regular ARP requests. Every 20 seconds, the ARX sends an ARP to the route's gateway. (The gateway is configured with the ip route command.) If the gateway fails to respond, the ARX waits an additional 20 seconds before asking the peer if it is possible to fail over. The ARPs continue indefinitely at 20-second intervals. If the gateway responds before the failover is initiated, the failover does not occur.
Critical routes (unlike critical shares) are not shared between redundant switches. This is because the switches may have different visibility into the network. To duplicate a critical route, issue this command on both peers.
bstnA(cfg-redundancy)# critical route 172.16.54.0 255.255.255.0
designates a class-C subnet as a critical route. If the ARX loses all routes to that subnet and its redundant peer has no serious issues, a failover occurs.
bstnA(cfg-redundancy)# no critical route 172.16.0.0 255.255.0.0
any except ARX-VE
2. For ARX-500 only: configure a ron tunnel between the switches to carry the redundancy heartbeats.
4. Define the quorum disk by using the quorum-disk command on both switches. Each peer writes its heartbeats to the same external-filer share, called a quorum disk, and reads the heartbeats from its peer. The quorum disk configuration must be identical on both switches.
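For illustration, a hedged sketch of matching quorum-disk configuration on both peers (the hostnames, filer address, and export path are hypothetical; see the quorum-disk command for the full syntax):
bstnA(cfg-redundancy)# quorum-disk 192.168.77.10:/vol/qdisk nfs3
prtlndA(cfg-redundancy)# quorum-disk 192.168.77.10:/vol/qdisk nfs3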
The enable command then starts the initial rendezvous of the switches. Once this command is invoked on the peer switch, the rendezvous can proceed. The pair is joined after the rendezvous is complete.
The enable command invokes the initial-rendezvous process, which then starts the metalog (namespace log) resilvering process. Metalog resilvering is duplicating the metalog data from the active peer to the backup peer. The metalog data must be mirrored on both peers to ensure that the namespace software can fail over. You may want to monitor the resilvering process if the peers are separated by a long distance: you can use the show redundancy metalog command for this. If resilvering times out due to excessive latency, you can use the resilver-timeout command to reset the timeout value.
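For example, a hedged sketch of monitoring resilvering and, if necessary, lengthening its timeout (the timeout value shown is hypothetical):
bstnA# show redundancy metalog
bstnA(cfg-redundancy)# resilver-timeout 15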
The rendezvous also fails if the private subnets of the peers match, or if one peer's private subnet matches any private subnet in the other's RON. The show redundancy history command indicates this problem if it occurs. To recover, use no enable to undo the redundancy configuration, use ip private subnet reassign on one of the switches, then try enabling redundancy again.
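A hedged sketch of that recovery sequence (the prompts are illustrative, and the exact ip private subnet reassign invocation may differ on your release):
bstnA(cfg-redundancy)# no enable
bstnA(cfg)# ip private subnet reassign
bstnA(cfg-redundancy)# enable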
The switches continue to retry the rendezvous indefinitely. If you decide to abandon the pairing, you can use no enable to stop the switches from retrying. The no enable feature is disabled after the pair successfully forms.
The nsm binary-core-files command re-instates detailed core files from NSM processors. The core file is useful for diagnosing processor failures.
An ARX-500 requires nsm recovery before you can enable this feature.
Use the show nsm command to see the current setting for binary-core files.
bstnA(cfg)# nsm binary-core-files
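On an ARX-500, recall that NSM recovery must be enabled first. A hedged sketch of that ordering (the prompt is illustrative):
prtlndA(cfg)# nsm recovery
prtlndA(cfg)# nsm binary-core-files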
Each NSM-processor core has a redundant peer that takes over in case of a failure. A recovered processor goes into a Standby state while its peer processor manages all network traffic for both. If the peer processor fails, all network traffic fails back to the peer that failed first. On the advice of F5 Support, you can use no nsm recovery to prevent NSM-processor recoveries on this ARX.
The nsm recovery command re-enables this ARX's NSM processors to recover from failures.
The show processors command shows all NSM processors that are in the standby state.
This command cannot run on an ARX-2000 chassis. An ARX-2000 contains four cores on a single chip, so it cannot benefit from the nsm recovery command. If a core fails on the ARX-2000, it remains in the Failed state while its peer core processes all of its traffic.
On an ARX-500, this must be enabled before you can enable nsm binary-core-files. The nsm binary-core-files command enhances the diagnostic files produced by an NSM processor when it fails. It is enabled by default.
You can use the show nsm command to see the current setting for NSM-processor recovery and/or binary-core files.
prtlndA(cfg)# no nsm recovery
bstnA(cfg)# nsm recovery
The no nsm warm-restart command returns to the default: if any core fails on an NSM processor, the entire processor reboots.
A warm restart produces a core-memory file, visible with the show cores command. The core-memory files produced by this failure are smaller than those produced by a full NSM restart, but they contain enough information for F5 Engineering to analyze the failure. The size of the core-memory file is unaffected by the nsm binary-core-files command.
There are two internal counters, restart count and restart limit, that prevent repetitive warm restarts. The restart limit is 3, and the restart count applies to an entire NSM processor (or CPU). If core 1 fails on a four-core CPU, and then core 3 fails on the same CPU, the NSM CPU has a restart count of 2. If the restart count reaches the restart limit (3), the entire NSM CPU reboots and behaves as dictated by the nsm recovery setting.
You can use the show nsm command to see the current settings for NSM warm restarts, NSM-processor recovery, and NSM binary-core files. To see the history of warm restarts on this system, use the show nsm warm-restart history command.
bstnA(cfg)# nsm warm-restart
Use the no form of the command to remove the redundant-peer configuration.
peer peer-address
peer-address (0.0.0.0-255.255.255.255) is one of the peer switch's management-IP addresses. This can be the out-of-band (MGMT) interface or one of the switch's inband (VLAN) management interfaces. For the ARX-1500 and ARX-2500, you must use the inband management interface defined for the redundancy link: define a special VLAN for the link (with vlan and members (cfg-vlan) for a standalone link, or with vlan (cfg-channel) for a channel), and use the interface vlan and ip address (cfg-if-vlan) commands to create the inband-management IP.
any except ARX-VE
A rendezvous occurs after you issue enable (cfg-redundancy) at the second switch. Each switch uses its peer-address to contact the other switch and exchange information for the join operation. It also uses this address for regular heartbeat exchanges.
Use the show redundancy peer command to see the current configuration for the peer, as well as heartbeat counters.
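For example, a hedged invocation (the address is hypothetical; on an ARX-1500 or ARX-2500, it would be the peer's in-band VLAN address on the redundancy link):
bstnA(cfg-redundancy)# peer 10.53.2.2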
quorum-disk nfs-server:/export[path] {nfs2 | nfs3 | nfs3tcp}
nfs-server:/export[path] (1-1024 characters) selects an NFS export:
nfs-server is the IP address for the filer (for example, 192.168.70.65). This address must be on a server (proxy-IP) subnet (see ip proxy-address) or reachable through a gateway on that subnet (via static route: see ip route to create a static route).
export is the path to an NFS export on the server.
path (optional) is the specific directory to use.
nfs2 | nfs3 | nfs3tcp is a required choice; this is the NFS protocol to use for accessing the quorum-disk share.
quorum-disk \\cifs-server\share[\path] cifs [DOMAIN/]username spn spn
\\cifs-server\share[\path] (1-1024 characters) is the syntax for a CIFS share.
cifs-server is the IP address for the filer (for example, 192.168.23.23). This address must be reachable, as described above for an NFS filer.
share is the specific share to use.
path (optional) is a path within the share.
cifs is a required keyword.
[DOMAIN/]username (1-1024 characters) is the username that the redundancy software can use to write to the CIFS share. If you use a short DOMAIN name, like medarch, you authenticate with NTLM or NTLMv2. If you use an FQDN for the domain, like medarch.org, you use Kerberos authentication.
spn spn (required for a Windows 2008 cluster, optional for other servers; 1-255 characters) is the Service-Principal Name (SPN) for the back-end server. You require an SPN to connect to a CIFS service on any Windows 2008 cluster.
path defaults to the top of the NFS export or CIFS share.
any except ARX-VE
If the quorum disk is an NFS export, it must be configured (at the filer) for synchronous writes. Use the sync option. We also recommend that you specify the no_wdelay option. CIFS shares do not have this configuration issue; they perform synchronous writes on request.
Each peer uses one of its in-band (VLAN) management addresses to communicate with the quorum disk's filer. Use the interface vlan command to create an in-band management address for a particular VLAN. The switch can reach the quorum-disk filer through this interface if the filer is reachable through a static ip route that goes through a gateway on the same VLAN as the management interface.
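For example, a hedged sketch of creating such an in-band management interface (the prompts, VLAN number, and addresses are all illustrative):
bstnA(cfg)# interface vlan 25
bstnA(cfg-if-vlan[25])# ip address 10.53.2.10 255.255.255.0
bstnA(cfg-if-vlan[25])# no shutdown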
Use the show redundancy quorum-disk command to view the current configuration for the quorum disk, as well as some counters.
bstnA(cfg-redundancy)# quorum-disk 172.16.4.98:/lhome/qdisk1 nfs3
provB(cfg)# quorum-disk \\10.10.201.8\qd cifs BOSTONCIFS/juser spn svcA@BOSTONCIFS
Password: jpasswd
Confirm: jpasswd
any except ARX-VE
For a channel, use the vlan (cfg-channel) command to assign the channel to the VLAN.
For a single port, use the vlan command to create a new VLAN, then use members (cfg-vlan) to assign the single port to that VLAN.
Use the interface vlan command to create a management-IP interface on the VLAN; this puts you into cfg-if-vlan mode.
From cfg-if-vlan mode, use the ip address (cfg-if-vlan) command to establish an in-band (VLAN) IP address. You later use this VLAN-management IP address to identify this ARX to its peer, as described below.
From the same mode, use redundancy (cfg-if-vlan) to designate the interface for exchanging metalog data and heartbeats.
From the same mode, use no shutdown (cfg-if-vlan) to enable the management interface.
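Taken together, a hedged end-to-end sketch of these steps for a single-port link (the prompts, port number, VLAN number, and addresses are all illustrative, and the exact members syntax may differ on your platform):
bstnA(cfg)# vlan 25
bstnA(cfg-vlan[25])# members 2/1
bstnA(cfg-vlan[25])# exit
bstnA(cfg)# interface vlan 25
bstnA(cfg-if-vlan[25])# ip address 169.254.10.1 255.255.255.0
bstnA(cfg-if-vlan[25])# redundancy
bstnA(cfg-if-vlan[25])# no shutdown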
Other platforms use a layer-2 connection for their redundancy link. After cabling the peers together, you use the redundancy protocol command on the link's interface to designate it for use as this link. If you use multiple links in a channel, as recommended, you use the redundancy protocol (cfg-channel) command instead.
The cfg redundancy command brings you to cfg-redundancy mode, where you configure the parameters for creating a redundant pair. From this mode, you use the peer command to identify the redundant peer; for the ARX-1500 and ARX-2500, you must use the other peer's in-band (VLAN) IP address at the other end of the redundancy link. You also use quorum-disk to identify an external-filer share to be used as a quorum disk. Repeat these steps at the peer switch, which must have redundancy parameters that agree. Once the parameters match on both switches, you enable the redundant pair with enable (cfg-redundancy) at each peer.
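For orientation, a hedged sketch of that sequence on one peer (the address and export path are hypothetical; repeat equivalent commands on the other switch):
bstnA(cfg)# redundancy
bstnA(cfg-redundancy)# peer 10.53.2.2
bstnA(cfg-redundancy)# quorum-disk 192.168.77.10:/vol/qdisk nfs3
bstnA(cfg-redundancy)# enable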
The enable command invokes the initial-rendezvous process, which then starts the metalog (namespace log) resilvering process. Metalog resilvering is duplicating the metalog data from the active peer to the backup peer. The metalog data must be mirrored on both peers to ensure that the namespace software can fail over. You may want to monitor the resilvering process if the peers are separated by a long distance: you can use the show redundancy metalog command for this. If resilvering times out due to excessive latency, you can use the resilver-timeout command to reset the timeout value.
For rare situations where network maintenance may cause unwanted failovers, you can use the suspend-failover command to suspend failovers for a short time. Use the no form of the command to lift the suspension when the maintenance is finished.
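For example, a hedged maintenance sequence:
bstnA(cfg-redundancy)# suspend-failover
... perform the network maintenance ...
bstnA(cfg-redundancy)# no suspend-failover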
the copy command (copy ftp, copy {nfs|cifs}, copy scp, or copy tftp) to copy a full release file to the disk, and
the boot system command to unpack the release file on the disk.
bstnA(cfg)# redundancy
any except ARX-VE
bstnA(cfg)# redundancy force-active
The redundant peers exchange their metalog (namespace-transaction) data during initial rendezvous, or after a failover. This is called resilvering the metalog data. At some sites, the latency between peers is high enough that the resilvering process times out before the pair can form. On the advice of F5 Support, you can use this command to increase the time allowed for resilvering.
Use the no form of the command to reset the resilvering timeout to its default.
minutes (6-60) is the maximum number of minutes for resilvering. If this time expires before resilvering is complete, the redundant pair cannot form.
any except ARX-VE
A rendezvous occurs after you issue enable (cfg-redundancy) at the second switch. A failover occurs whenever the active switch fails and the backup switch takes control. You can use this command to increase the time allotted for resilvering metalog data during a rendezvous or a failover.
If resilvering times out, the redundancy software retries until it succeeds. This severely impacts system performance. You can use the show redundancy history command to see the results of resilvering (referenced as synchronization in that output), and determine whether or not it is repetitively timing out. You can use show redundancy metalog to monitor the resilvering process as it occurs.
Use the show redundancy resilver-timeout command to see the current timeout value for resilvering.
provA(cfg-redundancy)# resilver-timeout 10
The show nsm command shows the current state of the NSM-maintenance features: processor recovery and binary-core files. For each of these features, this command shows the administrative setting as well as whether or not the setting is operational.
recovery | binary-core-files | warm-restart (optional) focuses the output on the state of a single NSM-maintenance feature. If you omit these, the output shows the state of all NSM features.
The output contains a separate table for each NSM-maintenance feature: Recovery, Binary Core Files, and Warm-Restart. Each table has one row per NSM, where each row has the following fields:
Proc identifies an NSM processor, in slot.processor format.
Status - Admin is the status that was last set for this feature.
Status - Operational is the feature's actual status.
You can use [no] nsm recovery to disable or re-enable NSM-processor recovery. You can also use [no] nsm binary-core-files to change NSM-processor core files to an ASCII format or a larger binary format. NSM recovery must be enabled before you can enable NSM binary-core files. Only change these settings on the advice of F5 Support. The final table concerns the [no] nsm warm-restart command; this allows a processor core to fail and recover independently, without rebooting the entire processor.
bstnA# show nsm binary-core-files
processor slot.processor (optional) specifies one NSM processor for which to view this history. If you omit this option, the output includes warm-restart history for all NSM processors.
slot (2 for ARX-4000; 1 for any other) is the slot number.
processor is the processor number. Use the show processors command to show all processors and their associated slot.processor IDs.
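For example, a hedged invocation that focuses on one processor (the slot.processor ID is hypothetical):
bstnA# show nsm warm-restart history processor 1.2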
An NSM warm restart occurs when an NSM-processor core encounters a catastrophic software failure and reboots, without causing any other cores to reboot with it. A warm restart is only possible on a system where nsm warm-restart is enabled.
The output contains two tables, one that shows the most-recent warm restart and another that shows the current values for restart count and restart limit. The restart count is the sum of all restarts for a given CPU's cores, and the restart limit is the maximum number of restarts for each CPU. If a core fails after its CPU's restart count reaches the restart limit, the entire CPU reboots. The full-CPU reboot follows the rules set by the nsm recovery command.
Proc identifies the NSM processor, in slot.processor format. This identifies a specific core on the CPU.
CPU identifies the NSM-processor CPU, where each CPU typically contains multiple cores. This field identifies the CPUs with a letter, such as A, B, or C.
Restart Number is the most-recent restart number for this core. The full restart count for the CPU is the sum of all its cores' Restart Numbers. As mentioned above, you use the nsm warm-restart command to make warm restarts possible for NSM cores.
Date/Time (UTC) is the time stamp (if any) for the most-recent warm restart.
Slot is the slot number for the NSM processor.
CPU identifies the NSM-processor CPU, where each CPU typically contains multiple cores. This maps to the CPU field in the table above.
Restart Remaining is the number of warm restarts remaining for this NSM CPU. This count decreases by one every time one of the CPU's cores has a warm restart. It increases by one every time a warm restart ages by 24 hours.
Restart Limit is the total number of restarts allowed for each CPU.
bstnA# show nsm warm-restart history
prtlndA# show nsm warm-restart history
shows the warm-restart history for all NSM processors in the prtlndA chassis, where two warm restarts occurred in the last 24 hours. See 15.2 for sample output.
any except ARX-VE
Node is 1 for the initial-senior switch, 2 for the initial-junior switch, or QD for the quorum disk. The asterisk (*) indicates the local node.
Switch/Quorum Disk identifies each node with a hostname or IP address.
Status is Up, Up,NoFovr, Down, Suspended, or Unknown for a redundant peer. The Up,NoFovr status indicates that someone used suspend-failover to temporarily freeze the Active/Backup status of the peers and suspend failovers.
The status is Up, Up,NoHb, Pending, or Down for the quorum disk. Up,NoHb means that the quorum disk is up but not showing any heartbeats from the peer yet.
Role is Active, Backup, or Quorum. The senior peer is always Active, meaning it can run namespace software and virtual servers. The junior peer is Backup; it is a hot standby for the active peer.
Transitions: Total is the number of times that the Status has changed. This increments for each failover. If there has never been a failover, the Total is Never.
Transitions: Last (UTC) is the timestamp for the last status change, in Universal Coordinated Time (UTC).
prtlndA> show redundancy
any except ARX-VE
This command shows all flavors of show redundancy ... output, in the following order:
For details about the show redundancy all output, refer to the command descriptions for these individual commands.
prtlndA> show redundancy all
any except ARX-VE
Node is 1 for the initial-senior switch, 2 for the initial-junior switch, or QD for the quorum disk. The asterisk (*) indicates the local node.
Switch/Quorum Disk identifies each peer with a hostname or IP address.
Ballot Cast shows the most-recent seniority vote from each node:
Senior Switch is the switch that the node believes should be senior, and
Epoch is the epoch number that the node last recorded. All three quorum members keep a common epoch number that increments with each failover. If one node has a lower epoch number than the others, its vote for Senior Switch is discounted.
Each switch, as well as the quorum disk, stores an epoch number and the identity of the senior switch. After a switch failure, this information is exchanged as election ballots to determine which peer should have seniority. Ballots with higher epoch numbers carry more weight in the election. If two nodes agree but disagree with the third, the majority rules. The results determine which peer is senior.
None of the ballots are authoritative; use the show redundancy command to see which peer is currently active. (The senior switch is always active; the junior switch is active after the initial rendezvous but backup after a failover.)
prtlndA> show redundancy ballots
any except ARX-VE
Type is quorum share, route, or meta-only. The quorum disk is a critical resource by default, and you cannot remove it from this list. To declare a namespace share as critical, use critical in gbl-ns-vol-shr mode. To create a critical route, use critical route in cfg-redundancy mode. A meta-only resource is a dedicated metadata share that is critical: use metadata share to configure a dedicated metadata share, and use metadata critical to make it a critical resource.
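A hedged sketch of declaring each kind of critical resource (the prompts, namespace names, and subnet are illustrative; see each command's own documentation for exact syntax):
bstnA(gbl-ns-vol-shr[medarcv~/rcrds~rx])# critical
bstnA(cfg-redundancy)# critical route 172.16.54.0 255.255.255.0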
Service specifies the exact route or share. For a quorum disk, this shows the machine and path to the share. For shares, this shows the share in namespace~volume~share-name format. For a critical route, this shows the critical subnet in ip-address/subnet-length (CIDR) format.
Status is Up, Down, or Config. A Config status indicates that the critical service is configured but the Up or Down status has not been determined yet.
Transitions: Count is the number of times that the Status has changed.
Transitions: Last (UTC) is the timestamp for the last status change, in Universal Coordinated Time (UTC). If there has never been a failover, this shows Never.
Counters Last Cleared is the timestamp for the last time someone ran clear counters redundancy [critical-services].
prtlndA> show redundancy critical-services
any except ARX-VE
Recent History messages include:
prtlndB# show redundancy history
any except ARX-VE
In the case of a failure, you need to determine which peer has the desired license, and then activate or re-activate the license at the other peer. Run show active-license at each peer to see details about the licenses there. If the peer can connect to the Internet, you can use the license activate command to automatically activate the license there. Otherwise, you can use a manual activation method, as described in the documentation for the license create license-dossier command.
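A hedged sketch of that check (arguments to license activate are omitted here; see its documentation for specifics):
prtlndA> show active-license
prtlndB> show active-license
prtlndB# license activate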
prtlndA> show redundancy license
Metalog (namespace transaction) data is mirrored between redundant peers during their initial rendezvous, and it is duplicated between the peers during normal operation. The duplication process is called resilvering. You can use the show redundancy metalog command to see the current state of the metalog-resilvering process between peers. This command is especially useful for monitoring the connection between redundant peers that are separated by long distances.
any except ARX-VE
On all platforms, the namespace software keeps metalog read/write statistics. You can use the show statistics metalog command to see these metalog-usage statistics from a namespace-software perspective.
Resilvering - the active peer is copying all of its metalog data to the backup peer during a rendezvous.
Peer Online - the metalog data was 100% duplicated after rendezvous, and now the active peer is sending metalog updates as they occur.
The next three fields only appear for the Resilvering state, which occurs during rendezvous:
Started is the start time for the rendezvous process.
Timeout indicates the total time allowed for resilvering. The resilvering process times out, sends an SNMP trap, and restarts if it exceeds this time limit. You can use the resilver-timeout command to reset this time limit.
Time Remaining is the estimated time left for resilvering. This estimate is based on the current data-transfer rate and the amount of data left to transfer.
Byte Count is the number of bytes of metalog data transferred to the backup peer. If the initial Resilvering process is still underway, this also shows the total size of the metalog data to be copied to the backup peer.
Retransmits counts the retransmissions of individual metalog packets. A retransmit occurs if an internal timeout passes before the packet is acknowledged by the backup peer. This field counts the total retransmits that occurred since the most recent start of the resilvering operation. A long latency between peers may increase the number of retransmits. This always displays 0 (zero) for the ARX-1500 or ARX-2500 when they are resilvering. These platforms use a different transmission mechanism for their metalog packets.
Latency shows the minimum, maximum, and average latency for sending packets of metalog data to the backup peer. These are measured in micro-seconds (us).
Data Rate shows the average megabits per second for the transfer of metalog data. This field only appears while resilvering is occurring.
nyc15> show redundancy metalog
prtlndA> show redundancy metalog
any except ARX-VE
Peer is the heading for peer-identification parameters:
Name is set by the hostname command at the peer's CLI.
IP Address is the peer's management IP that is chosen for the rendezvous. You can change this with the peer command.
Port is the peer port used for rendezvous. You can use an option in the peer command to change the port number.
Status is Active, Backup, Down, or Unknown.
Heartbeats are counters for the number of redundancy heartbeats sent and received. You can use the clear counters redundancy command to clear this counter.
Transitions are the number of changes in redundancy Status for this peer. You can use the clear counters redundancy command to clear this counter, too.
prtlndA> show redundancy peer
any except ARX-VE
Path is in machine:/path format (for an NFS export) or \\machine\path format (for a CIFS share). This is the external-filer share used as the quorum disk. From cfg-redundancy mode, use quorum-disk to reset this.
Protocol is the file-access protocol (nfs2, nfs3, nfs3tcp, or cifs) used to access the quorum-disk share. This flag is also set with the quorum-disk command.
If the protocol is cifs, the following fields appear to describe the CIFS options used to access the quorum disk. These are all options from the quorum-disk command:
QD User is the Windows username that the ARX uses as its identity when accessing the quorum disk.
User Domain is the Windows domain for the above username.
QD SPN is the Service-Principal Name (SPN) for the quorum disk's host server.
Status is Up, Pending, or Down. Pending indicates that the quorum disk is functional but the redundant pair is in the process of forming.
Heartbeats are counters for the number of redundancy heartbeats sent to the quorum disk by the current node, and received from the quorum disk by the current node. You can use the clear counters redundancy command to clear this counter.
Transitions shows the changes in Status for the quorum disk:
Count is the number of changes. This should be a low number; the only valid reason for a quorum-disk transition is a planned outage of the quorum-disk filer (such as a hardware upgrade).
Last is the time stamp for the most-recent transition.
Reason is the log message associated with the Status change.
You can use the clear counters redundancy command to clear the transitions counter, too.
Heartbeat Latency is a chart of latency measures over the past 24 hours. This is the latency (round-trip time) for heartbeat packets between the current peer and the quorum disk. Each row shows the latency measures in one hour:
Time Interval shows the start and end time for the hour, in local time.
[0-499] is a count of heartbeats with a latency of 0-499 milliseconds. This column should have by far the highest count of all of them.
[500-999] is the number of heartbeats that took 500-999 milliseconds to make a round trip. This is a long latency and should be uncommon.
[1-3999] shows how many heartbeats took from 1,000 to 3,999 milliseconds (1 second to almost 4 seconds). This latency should be extremely rare, if it ever occurs.
[No Response] is the number of heartbeats that were lost. This number should be 0, unless there is a planned outage for the quorum disk. If either peer reboots while one of them is disconnected from the quorum disk, they both may reboot simultaneously.
Heartbeat Latency Summary shows the percentage of heartbeats in each of the above time intervals.
If the Heartbeat Latency and Heartbeat Latency Summary tables indicate long latencies, choose a faster quorum disk. You can run the quorum-disk command on both peers to change the quorum disk.
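For example, a hedged re-pointing of both peers to a faster filer (the address and export path are hypothetical):
prtlndA(cfg-redundancy)# quorum-disk 192.168.77.20:/vol/qdisk2 nfs3
prtlndB(cfg-redundancy)# quorum-disk 192.168.77.20:/vol/qdisk2 nfs3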
prtlndA> show redundancy quorum-disk
any except ARX-VE
Version is the software version running at the time the reboot was issued.
Time of reboot is a timestamp from the beginning of the reboot.
Message is the message that appeared on the Console to announce the reboot.
prtlndA# show redundancy reboot-history
Initial rendezvous and standard managed-volume processing involve duplicating the metalog (namespace transaction) data from the active peer to the backup. This process is called resilvering. Use this command to show the maximum time allowed for resilvering. If this time expires before resilvering is complete, the redundant pair cannot form.
any except ARX-VE
The output is a single value, the timeout allowed for resilvering. You can use the resilver-timeout command to reset this. The show redundancy metalog command shows the current state of the resilvering process.
prtlndA# show redundancy resilver-timeout
force (optional) is only necessary when the peer switch is disabled or otherwise unreachable. This causes the current switch to assume the active role in the pair, whether or not it previously had the active role. This is similar to the behavior from the redundancy force-active command.
Important: If you use the force option on the backup peer when the other peer is still active and connected, this command creates a split-brain condition.
any except ARX-VE
During a period of failover suspension, both peers continue to record conditions that would typically cause failovers and/or failover-related reboots. These appear in the syslog (use show logs syslog to view the syslog), and some result in SNMP traps (see the ARX SNMP Reference for a full list of traps). Traps and logs also appear for any suppressed failovers or reboots.
As stated above, you should only use the force option if the peer is disabled or otherwise unreachable. If the peer is reachable and active, the force option may cause both peers to take the active role and work at cross purposes. This is called a split-brain situation, and it can create serious issues that are very difficult to repair. The CLI prompts for special confirmation before lifting suspension in this way.
bstnA(cfg-redundancy)# no suspend-failover
bstnA(cfg-redundancy)# no suspend-failover force