sol13946: Troubleshooting ConfigSync and device service clustering issues (11.x)

Original Publication Date: 12/28/2012
Updated Date: 04/01/2014

This article applies to BIG-IP 11.x. For information about other versions, refer to the following article:

Purpose

You should consider using this procedure under the following conditions:

  • The BIG-IP system experiences device service clustering (DSC) issues.
  • The BIG-IP system experiences configuration synchronization (ConfigSync) issues.

Description

F5 introduced the DSC architecture in BIG-IP 11.x. DSC provides the framework for ConfigSync and other high-availability features, such as failover, for BIG-IP device groups.

Note: The DSC technology is also referred to as centralized management infrastructure (CMI).

This article provides steps to troubleshoot ConfigSync and the underlying DSC components. DSC and ConfigSync include the following elements:

DSC components

DSC provides the foundation for centralized management and high-availability features in BIG-IP 11.x, including the following components:

  • Device trust and trust domains

    Device trust establishes trust relationships between BIG-IP devices through certificate-based authentication. Each device generates a device ID key and Secure Sockets Layer (SSL) certificate upon upgrade or installation. A trust domain is a collection of BIG-IP devices that trust each other, can synchronize and fail over their BIG-IP configuration data, and regularly exchange status and failover messages.

    When the local BIG-IP device attempts to join a device trust with a remote BIG-IP device, the following applies:

    • If the local BIG-IP device is added as a peer authority device, the remote BIG-IP device presents a certificate signing request (CSR) to the local device, which then signs the CSR and returns the certificate along with its CA certificate and key.
    • If the local BIG-IP device is added as a subordinate (non-authority) device, the remote BIG-IP device presents a CSR to the local device, which then signs the CSR and returns the certificate. The CA certificate and key are not presented to the remote BIG-IP device. The subordinate device is unable to request other devices to join the device trust.
  • Device groups

    A device group is a collection of BIG-IP devices that reside in the same trust domain and that are configured to securely synchronize their BIG-IP configuration and fail over when needed. Device groups can initiate a ConfigSync operation from the device group member with the desired configuration change. You can create two types of device groups:

    • A Sync-Failover device group contains devices that synchronize configuration data and support traffic groups for failover purposes.
    • A Sync-Only device group contains devices that synchronize configuration data, but do not synchronize failover objects and do not fail over to other members of the device group.
  • Traffic groups

    A traffic group represents a collection of related configuration objects that are configured on a BIG-IP device. When a BIG-IP device becomes unavailable, a traffic group can float to another device in a device group.
  • Folders

    A folder is a container for BIG-IP configuration objects. You can use folders to set up synchronization and failover of configuration data in a device group. You can sync all configuration data on a BIG-IP device, or you can sync and fail over objects within a specific folder only.

CMI communication channel

When the DSC components are properly defined, the device group members establish a communication channel to accommodate device group communication and synchronization. The CMI communication channel allows the mcpd process that runs on the device group member to exchange Master Configuration Process (MCP) messages and commit ID updates to determine which device has the latest configuration and is eligible to synchronize its configuration to the group. After the ConfigSync IP addresses are defined on each device, and the device group is created, the devices establish the communication channel, as follows:

  1. The local mcpd process connects to the local Traffic Management Microkernel (TMM) process over port 6699.
  2. The local TMM uses an SSL certificate (/config/ssl/ssl.crt/dtca.crt) to create a secure connection to the peer TMM using the ConfigSync IP address and TCP port 4353.
  3. The peer TMM translates the port to 6699 and passes the connection to the peer mcpd.
  4. Once the connections are established, a full mesh exists between mcpd processes for devices in the trust domain.
  5. If a device fails to establish the connection for any reason, the local mcpd process attempts to re-establish the connection every 5 seconds.

ConfigSync operation

ConfigSync is a high-availability feature that synchronizes configuration changes from one BIG-IP device to other devices in a device group. This feature ensures that the BIG-IP device group members maintain the same configuration data and work in tandem to more efficiently process application traffic. The ConfigSync operation is dependent on the DSC architecture and the resulting communication channel.

The BIG-IP system uses commit ID updates to determine which device group member has the latest configuration and is eligible to initiate a ConfigSync operation. The configuration transfers between devices as an MCP transaction. The process works as follows:

  1. A user updates the configuration of a BIG-IP device group member using the Configuration utility or the tmsh utility.
  2. The configuration change is communicated to the local mcpd process.
  3. The mcpd process communicates the new configuration and commit ID to the local TMM process.
  4. The local TMM process sends the configuration and commit ID update to remote TMM processes over the communication channel.
  5. The remote TMM process translates the port to 6699 and connects to its mcpd process.

Automatic Sync

If you enable the Automatic Sync feature for a device group, the BIG-IP system automatically synchronizes changes to a remote peer system's running configuration, but does not save the changes to the configuration files on the peer device. This behavior is by design and is recommended for larger configurations, for which saving the configuration would lengthen each ConfigSync operation.

In some cases, you may want to configure Automatic Sync to update the running configuration and save the configuration to the configuration files on the remote peer devices. For information, refer to SOL14624: Configuring the Automatic Sync feature to save the configuration on the remote devices.

Beginning in BIG-IP 11.4.0, the Automatic Sync feature is available for both Sync-Only and Sync-Failover device groups. In addition, the automatic sync behavior can be configured to be either full or incremental. For more information, refer to SOL14809: Auto Sync is possible for Sync-Failover device group.

Symptoms

DSC and ConfigSync issues may result in the following symptoms:

  • Device group members have configuration discrepancies.
  • The system displays status messages that indicate a synchronization or device trust issue.
  • The BIG-IP system logs error messages related to device trust or the ConfigSync process.

Procedures

When you investigate a possible device service clustering/ConfigSync issue, you should first verify that the required configuration elements are set for all device group members. If the required elements are set, then attempt a ConfigSync operation. If ConfigSync fails, the BIG-IP system generates Sync status messages that you can use to diagnose the issue. Use the following procedures to troubleshoot DSC and ConfigSync:

Troubleshooting a ConfigSync operation

Attempt a ConfigSync operation to gather diagnostic information to help you troubleshoot ConfigSync/DSC issues. To troubleshoot the ConfigSync operation, perform the following procedures:

Verifying the required elements for ConfigSync/DSC

For DSC and ConfigSync to function properly, you must verify that required configuration elements are set. To do so, review the following requirement information:

Requirement | Description | Configuration utility location | Traffic Management Shell (tmsh) location
Licensing and provisioning | Device group members must have the same product licensing and module provisioning. | System > License | tmsh show /sys license; tmsh show /sys provision
Software versions | Device group members must run the same BIG-IP software version. | System > Software Management | tmsh show /sys software
Management IP | Each device must have a unique management IP address, netmask, and management route. | System > Platform | list /sys management-ip; list /sys management-route
NTP | Network Time Protocol (NTP) is required for all device group members. | System > Configuration > Device > NTP | tmsh list /sys ntp servers
ConfigSync IP | Self IP addresses for ConfigSync must be defined and routable between device group members. F5 recommends that the addresses reside on a dedicated HA VLAN. | Device Management > Devices | tmsh list /cm device <device> configsync-ip
Failover IP | Self IP addresses for failover must be defined and routable between device group members (for Sync-Failover device groups). | Device Management > Devices | tmsh list /cm device <device> unicast-address
Ports | Device group members should be able to communicate over ports 443, 4353, 1026 (UDP), and 22 (recommended). | Not Applicable | Not Applicable
Device trust | Device trust must be established for device group members. | Device Management > Device Trust | tmsh show /cm device-group device_trust_group
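
The read-only tmsh commands in the table can be gathered into one pass per device so that the results are easier to compare between members. The following is a minimal sketch, not an F5-provided tool. The function names prereq_commands and collect_prereqs are hypothetical, and the sketch leaves out the unicast-address check because the table shows it with a per-device argument.

```shell
# Hypothetical helper: prints the read-only commands from the
# requirements table, one per line.
prereq_commands() {
    cat <<'EOF'
tmsh show /sys license
tmsh show /sys provision
tmsh show /sys software
tmsh list /sys management-ip
tmsh list /sys management-route
tmsh list /sys ntp servers
tmsh list /cm device configsync-ip
tmsh show /cm device-group device_trust_group
EOF
}

# Hypothetical helper: runs each command and labels its output so that
# the results from two devices can be compared side by side.
collect_prereqs() {
    prereq_commands | while IFS= read -r cmd; do
        printf '===== %s =====\n' "$cmd"
        # Word splitting of $cmd is intentional; stdin is redirected so
        # the command cannot consume the remaining list.
        $cmd </dev/null 2>&1
    done
}
```

One way to use the sketch is to run the following on each device group member, copy the files to one place, and compare them with diff:

    collect_prereqs > /var/tmp/prereqs_$(hostname).txt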

Viewing the commit ID updates

When you troubleshoot a ConfigSync issue, it is helpful to determine which device group member has the latest commit ID update and contains the most recent configuration. You can then decide whether to replicate the newer configuration to the group, or perform a ConfigSync operation that replicates an older configuration to the group, thus overwriting a newer configuration.  

To display the commit ID and the commit ID time stamps for the device group, perform the following procedures:

Impact of procedure: Performing the following procedure should not have a negative impact on your system.

  1. Log in to the BIG-IP command line.
  2. To display the commit IDs for the device group, type one of the following commands:

    BIG-IP 11.2.1 and later:

    tmsh run /cm watch-devicegroup-device

    BIG-IP 11.0.0 through 11.2.0:

    watch_devicegroup_device
  3. Locate the relevant device group and review the cid.id and cid.time columns.

    For example, the following output shows that the sync_test device group has three members, and device bigip_a has the latest configuration as indicated by the cid.id (commit ID number) and cid.time (commit ID timestamp) columns:

    devices  <devgroup           [device   cid.id   cid.orig               cid.time  last_sync
    20 21    sync_test           bigip_a   32731   bigip_a.pslab.local     14:27:00      :  :
    20 21    sync_test           bigip_b   1745    bigip_a.pslab.local     13:39:24    13:42:04
    20 21    sync_test           bigip_c   1745    bigip_a.pslab.local     13:39:24    13:42:04


    Note: Multiple devices with identical information are collapsed into a single row that displays in green.
  4. Perform steps 1 through 3 on all devices in the device group.
  5. Compare the commit ID updates for each device with each device group member. If the commit ID updates are different between devices, or a device is missing from the list, proceed to the Troubleshooting DSC section.
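
The comparison in step 5 can be partly automated. The following is a minimal sketch that assumes a snapshot of the output has been saved to a file and that the columns match the example above (device group in field 3, device in field 4, cid.id in field 5). The function name out_of_sync_groups is hypothetical.

```shell
# Hypothetical helper: reads watch-devicegroup-device output on stdin and
# prints each device group whose members report more than one commit ID.
# Lines whose fifth field is not a number (such as the header) are skipped.
out_of_sync_groups() {
    awk '
        $5 ~ /^[0-9]+$/ {
            key = $3 SUBSEP $5
            if (!(key in seen)) {
                seen[key] = 1
                distinct[$3]++
            }
        }
        END {
            for (g in distinct)
                if (distinct[g] > 1)
                    print g
        }
    '
}
```

For the example output shown in step 3, the sketch prints sync_test, because bigip_a reports a different commit ID than bigip_b and bigip_c. A device that is missing from the list still has to be spotted by eye.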

Verifying a ConfigSync operation

When troubleshooting a ConfigSync issue, attempt a ConfigSync operation and verify the sync status message. If the ConfigSync operation fails, the BIG-IP system generates a sync status message that you can use to diagnose the issue. To attempt a ConfigSync operation, perform one of the following three procedures:

Configuration utility (11.2.1 and later)

Impact of procedure: Performing the following procedure should not have a negative impact on your system.

  1. Log in to the Configuration utility.
  2. Navigate to Device Management > Overview.
  3. In the Device Groups section, click the name of the device group you want to synchronize.
  4. In the Devices section, click the appropriate device.
  5. From the Sync menu, select the desired synchronization operation.
  6. Click Sync.

Configuration utility (11.0.0 through 11.2.0)

Impact of procedure: Performing the following procedure should not have a negative impact on your system.

  1. Log in to the Configuration utility.
  2. Navigate to Device Management > Device Groups.
  3. Click the name of the device group to synchronize.
  4. Click the ConfigSync tab.
  5. Click Sync Device to Group.

Traffic Management Shell (tmsh)

Impact of procedure: Performing the following procedure should not have a negative impact on your system.

  1. Log in to the Traffic Management Shell (tmsh) by typing the following command:

    tmsh

  2. To synchronize the configuration to the device group, use the following command syntax:

    run /cm config-sync <option> <device_group>

    For example, to synchronize the local device configuration to the device group, use the following syntax:

    run /cm config-sync to-group <device_group>

Verifying the sync status

After you attempt the ConfigSync operation, you can verify the synchronization status messages and begin to troubleshoot the issue. To verify the synchronization status, refer to the following utilities:

Configuration utility

Version | Configuration utility page | Description
11.2.1 and later | Device Management > Overview | The Device Groups section displays the ConfigSync status for device groups. The Devices section displays the ConfigSync status for devices.
11.0.0 - 11.2.0 | Device Management > Device Groups > Group_Name > ConfigSync | The ConfigSync Properties section displays the ConfigSync status for the device.

tmsh utility

Version | tmsh command | Description
11.x | tmsh show /cm sync-status | Displays the ConfigSync status of the local device.

Understanding sync status messages

The BIG-IP system displays ConfigSync status messages for device groups and specific devices. Common synchronization status messages are displayed in the following tables.

Synchronization status messages for device groups

The BIG-IP system displays a number of specific synchronization status messages for each device group. Use the following table to help you troubleshoot messages that you might encounter:

Sync status | Summary | Details | Recommendation
Awaiting Initial Sync | The device group is awaiting the initial ConfigSync | The device group was recently created and has either not yet made an initial sync, or the device has no configuration changes to be synced. | Sync one of the devices to the device group.
Awaiting Initial Sync | hostname-1, hostname-2, etc. awaiting the initial config sync | One or more device group members have not yet synced their data to the other device group members, or a device group member has not yet received a synchronization from another member. | Sync the device that has the most current configuration to the device group.
Changes Pending | Changes Pending | One or more device group members have recent configuration changes that have not been synchronized to the other device group members. | Sync the device that has the most current configuration to the device group.
Changes Pending | There is a possible change conflict between hostname-1, hostname-2, etc. | There is a possible conflict among two or more devices because more than one device contains changes that have not been synchronized to the device group. | View the individual synchronization status of each device group member, and then sync the device that has the most current configuration to the device group.
Not All Devices Synced | hostname-1, hostname-2, etc. did not receive last sync successfully | One or more of the devices in the device group does not contain the most current configuration. | View the individual synchronization status of each device group member, and then sync the device that has the most current configuration to the device group.
Sync Failure | A validation error occurred while syncing to a remote device | The remote device was unable to sync due to a validation error. | Review the /var/log/ltm log file on the affected device.
Unknown | The local device is not a member of the selected device group | The device that you are logged in to is not a member of the selected device group. | Add the local device to the device group.
Unknown | Not logged in to the primary cluster member | The system cannot determine the synchronization status of the device group because you are logged in to a secondary cluster member instead of the primary cluster member. This status pertains to VIPRION systems only. | Use the primary cluster IP address to log in to the primary cluster member.
Unknown | Error in trust domain | The trust relationships among devices in the device group are not properly established. | On the local device, reset device trust and then re-add all relevant devices to the local trust domain.
None | X devices with Y different configurations | The configuration time for two or more devices in the device group differs from the configuration time of the other device group members. This condition causes one of the following synchronization status messages to appear for each relevant device: "Device_name awaiting initial config sync" or "Device_name made last configuration change on date_time". | Sync the device that has the most current configuration to the device group.

Synchronization status messages for devices

The BIG-IP system displays a number of specific synchronization status messages for individual devices. Use the following table to help you troubleshoot messages that you might encounter:

Sync status | Summary | Recommendation
Awaiting Initial Sync | The local device is waiting for the initial ConfigSync. The device has not received a sync from another device and has no configuration changes to be synced to other members of the device group. | Determine what device has the latest/desired configuration and perform a ConfigSync from the device.
Changes Pending | The device has recent configuration changes that have not been synced to other device group members. | Sync the device to the device group.
Awaiting Initial Sync with Changes Pending | The configuration on the device has changed since joining the device group, or the device has not received a sync from another device but has configuration changes to be synced to other device group members. | Determine the device with the latest/desired configuration and perform a ConfigSync operation from the device.
Does not have the last synced configuration, and has changes pending | The device received at least one synchronization previously, but did not receive the last synchronized configuration, and the configuration on the device has changed since the last sync. | Determine the device with the latest/desired configuration and perform a ConfigSync operation from the device.
Disconnected | The iQuery communication channel between the devices was terminated or disrupted. This may be a result of one of the following: the disconnected device is not a member of the local trust domain; the disconnected device does not have network access to one or more device group members. | Join the disconnected device to the local trust domain. Verify that the devices have network access using the ConfigSync IP addresses.
Device does not recognize membership in this group | The local device does not recognize that it is a member of the device group. | Add the device to the device group.
No config sync address has been specified for this device | The device does not have a ConfigSync IP address. | Configure a ConfigSync IP address for the device.
Does not have the last synced configuration | The device previously received the configuration from other device group members, but did not receive the last synced configuration. | Perform a ConfigSync operation to sync the device group to the local device.

Reviewing the log files for ConfigSync error messages

The BIG-IP system logs messages related to ConfigSync and DSC to the /var/log/ltm file. To review log files related to ConfigSync and DSC issues, refer to the following commands:

  • To display the /var/log/ltm file, use a Linux command similar to the following example:

    cat /var/log/ltm
  • To display log messages related to DSC or CMI, use a command similar to the following example:

    grep -i cmi /var/log/ltm
  • To display log messages related to ConfigSync, use a command similar to the following example:

    grep -i configsync /var/log/ltm
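
The two filters above can be combined into a single pass over the log. The following is a minimal sketch; the function name dsc_log_lines is hypothetical, and the pattern is only as selective as the original grep commands, so it also matches any other word that contains "cmi".

```shell
# Hypothetical helper: prints lines that mention CMI or ConfigSync
# (case-insensitive) from the given log file, /var/log/ltm by default.
dsc_log_lines() {
    grep -iE 'cmi|configsync' "${1:-/var/log/ltm}"
}
```

For example, to show only the most recent matches:

    dsc_log_lines | tail -n 20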

Troubleshooting DSC

ConfigSync failure is often a result of DSC issues. Perform the following steps to troubleshoot DSC:

Verifying the device trust status

BIG-IP devices must be members of the same local trust domain before you can add them to a device group. When a BIG-IP device joins the local trust domain, it establishes a trust relationship with peer BIG-IP devices that are members of the same trust domain. If the trust relationship among all device group members is not properly established, the synchronization functionality does not work as expected. To verify the device trust status, perform the following procedure:

Impact of procedure: Performing the following procedure should not have a negative impact on your system.

  1. Log in to the Traffic Management Shell (tmsh) by typing the following command:

    tmsh
  2. List the devices in the local trust domain by typing the following command:

    show /cm device-group device_trust_group
  3. Verify that all device group members are joined to the local trust domain.

    For example, the following output indicates that three devices are joined to the trust domain of the local BIG-IP system:

    ------------------------------------------------------------------------
    CM::Device-Group
    Group Name          Member Name          Time since Last Sync (HH:MM:SS)
    device_trust_group  bigip_a               -
    device_trust_group  bigip_b              00:15:12
    device_trust_group  bigip_c              18:29:59
  4. Perform steps 1 through 3 on all devices in the device group.
  5. Compare the list for each device with each device group member.

Verifying time synchronization for device group members

If you experience device trust issues, ensure that all device group members are configured with an NTP server, and that the time across all devices is in synchronization. To verify time synchronization and NTP configuration, perform the following procedure:

Impact of procedure: Performing the following procedure should not have a negative impact on your system.

  1. Log in to the command line.
  2. Run the following command:

    date; tmsh list /sys ntp
  3. Perform steps 1 and 2 on all devices in the device group.
  4. Verify that each device group member is configured to use an NTP server, and that the times are in synchronization.
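
Comparing clock readings by eye is easier when the times are reduced to one number. The following is a minimal sketch that assumes you have collected one "<device> <epoch_seconds>" line per device, for example from the output of date +%s on each member. The function name max_clock_skew is hypothetical, and the sketch does not define how much skew is acceptable.

```shell
# Hypothetical helper: reads "<device> <epoch_seconds>" lines on stdin
# and prints the largest difference, in seconds, between any two devices.
max_clock_skew() {
    awk '
        NR == 1 { min = $2 + 0; max = $2 + 0 }
        ($2 + 0) < min { min = $2 + 0 }
        ($2 + 0) > max { max = $2 + 0 }
        END { if (NR > 0) print max - min }
    '
}
```

Readings taken at slightly different moments inflate the result, so collect them as close together as practical.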

To configure NTP on a BIG-IP system, refer to SOL3122: Using the BIG-IP Configuration utility to add an NTP server.

Troubleshooting the CMI communication channel

When the device clustering components are properly defined, the device group members establish a communication channel to accommodate device group communication and synchronization. To troubleshoot the CMI communication channel, perform the following procedures:

Verifying network access to the ConfigSync IP address

The self IP addresses used for ConfigSync must be defined and routable between device group members. To test network access to another device's ConfigSync IP address, perform the following procedure:

Impact of procedure: Performing the following procedure should not have a negative impact on your system.

  1. Obtain the ConfigSync IP address of each device group member.

    For example:

    tmsh list /cm device configsync-ip
  2. Log in to the command line of one of the device group members.
  3. To test network access to each of the other device's ConfigSync IP addresses, use the following command syntax:

    ping <remote_configsync-ip>
  4. Perform steps 1 through 3 on all devices in the device group.
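
A successful ping confirms only that the address is reachable. Because the communication channel uses TCP port 4353 on the ConfigSync IP address, a TCP connection test can be a useful follow-up. The following is a minimal sketch; the function name tcp_port_open is hypothetical, and it assumes that the bash /dev/tcp feature and the timeout utility are available, which may not be the case on every BIG-IP version.

```shell
# Hypothetical helper: returns 0 if a TCP connection to host ($1) and
# port ($2) opens within 3 seconds, and a nonzero status otherwise.
tcp_port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

For example:

    tcp_port_open <remote_configsync-ip> 4353 && echo "port 4353 reachable"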

Verifying the communication channel using netstat

The communication channel allows device group members to exchange MCP messages and commit ID timestamps. When the device trust is working properly, the devices establish a full mesh channel. You can verify this behavior by checking the connection table on each device and ensuring that each device has a connection entry to all other devices in device_trust_group. To do so, perform the following procedure on each device:

Impact of procedure: Performing the following procedure should not have a negative impact on your system.

  1. Log in to the command line.
  2. Type the following command:

    netstat -pan | grep -E 6699
  3. Perform steps 1 and 2 on all devices in the device group, and then confirm that the list looks the same on each device.
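
The grep pattern above matches 6699 anywhere on the line, including process IDs and longer port numbers. The following is a minimal sketch of a stricter filter that assumes the standard Linux netstat column order (local address in field 4, foreign address in field 5, state in field 6). The function name cmi_connections is hypothetical, and the addresses that appear in the output depend on the system.

```shell
# Hypothetical helper: reads "netstat -pan" output on stdin and prints the
# foreign address of each ESTABLISHED TCP connection that uses port 6699
# on either side, sorted so that lists from two devices can be compared.
cmi_connections() {
    awk '
        $1 ~ /^tcp/ && $6 == "ESTABLISHED" &&
        ($4 ~ /:6699$/ || $5 ~ /:6699$/) { print $5 }
    ' | sort
}
```

For example:

    netstat -pan | cmi_connections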

Monitoring the CMI communications channel for commit ID updates

The sniff-updates program monitors the internal CMI communications channel for commit ID updates. Each commit ID update is linked to an MCP transaction. The system displays each commit ID update as it arrives, one per line. To monitor the CMI communication channel for commit ID updates, perform the following procedure:

Impact of procedure: Performing the following procedure should not have a negative impact on your system.

  1. Log in to the Traffic Management Shell (tmsh) by typing the following command:

    tmsh
  2. To run sniff-updates (add -v for verbose output), type the following command:

    run /cm sniff-updates


    For example, the following command output shows a single commit ID update:

    # run /cm sniff-updates
    [12:48:01] bigip_b (v11.2.1) -> sync_test: CID 32 (bigip_b) at 12:48:01 FORCE_SYNC


    The fields are defined as follows:

    Example output | Description
    [12:48:01] | Time that the update arrived from the network
    bigip_b | Source device
    (v11.2.1) | Version of source device
    sync_test | Destination device group
    CID 32 | Commit ID
    (bigip_b) | Commit ID originator
    12:48:01 | Commit ID timestamp
    FORCE_SYNC | FORCE_SYNC if set (nothing if not)
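
When many updates arrive, it can help to split each line into the fields described above. The following is a minimal sketch that assumes the non-verbose line format of the example output; the function name parse_cid_updates and the key names it prints are hypothetical.

```shell
# Hypothetical helper: reads sniff-updates lines on stdin and prints the
# fields of each commit ID update as key=value pairs, one update per line.
# Lines that do not match the example format are ignored.
parse_cid_updates() {
    awk '
        $4 == "->" && $6 == "CID" {
            arrived = $1
            sub(/^\[/, "", arrived); sub(/\]$/, "", arrived)
            version = $3; gsub(/[()]/, "", version)
            grp = $5;     sub(/:$/, "", grp)
            origin = $8;  gsub(/[()]/, "", origin)
            force = ($11 == "FORCE_SYNC") ? "yes" : "no"
            printf "arrived=%s source=%s version=%s group=%s cid=%s origin=%s cid_time=%s force_sync=%s\n",
                arrived, $2, version, grp, $7, origin, $10, force
        }
    '
}
```

For the example update shown above, the sketch reports source bigip_b, group sync_test, commit ID 32, and force_sync=yes.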

Verifying required processes

DSC requires that the following processes are running on all device group members:

  • devmgmtd: Establishes and maintains device group functionality
  • mcpd: Allows userland processes to communicate with TMM
  • sod: Provides failover and restart capability
  • tmm: Performs traffic management for the system

To verify that the processes are running, perform the following procedure:

  1. Log in to the command line.
  2. Run the following command:

    bigstart status devmgmtd mcpd sod tmm
  3. Verify that the processes are all running and that the output appears similar to the following example:

    devmgmtd     run (pid 7277) 330 hours, 2 starts
    mcpd         run (pid 7309) 330 hours, 1 start
    sod          run (pid 7339) 330 hours, 1 start
    tmm          run (pid 7352) 330 hours, 1 start
  4. If one or more processes are not running, use the following command syntax to start the service:

    bigstart start <daemon>
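
The check in step 3 can be reduced to a list of the processes that need attention. The following is a minimal sketch that assumes the output layout of the example above, with the process name in the first column and its state in the second; the function name not_running is hypothetical.

```shell
# Hypothetical helper: reads "bigstart status" output on stdin and prints
# the name of each listed process whose state is not "run".
not_running() {
    awk 'NF >= 2 && $2 != "run" { print $1 }'
}
```

For example:

    bigstart status devmgmtd mcpd sod tmm | not_running

No output means that all four processes report the run state.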

Resetting the device trust and re-adding a device to the trust domain

If you verify the elements in the previous section and still have issues, you can reset the device trust and re-add a device to the trust domain by performing the following procedure:

Impact of procedure: F5 recommends that you perform this procedure during a maintenance window. This procedure causes the current device to lose connectivity with all other BIG-IP devices. Depending on the device group and traffic group configuration, the connectivity loss may result in an unintentional Active-Active condition that causes a traffic disruption. To prevent a standby device from going active, set the standby device in the device group to Force Offline before performing the procedure. Standby devices that were set to Force Offline should be set to Release Offline after performing the procedure.

  1. Log in to the Configuration utility of the affected BIG-IP device.
  2. Navigate to Device Management > Device Trust > Local Domain.
  3. Click Reset Device Trust.
  4. Select the Generate new self-signed authority option.
  5. Log in to the Configuration utility of the authority BIG-IP device.
  6. Navigate to Device Management > Device Trust > Local Domain.
  7. Click the Peer List or Subordinate List menu (in BIG-IP 11.0.0 through 11.1.0, scroll to the Peer Authority Devices area or the Subordinate Non-Authority Devices area of the screen).
  8. Click Add.
  9. Type the management IP address, administrator user name, and administrator password for the affected BIG-IP device, and then click Retrieve Device Information (in BIG-IP 11.0.0 through 11.1.0, click Next).
  10. Verify that the certificate and the remote device name are correct, and then click Finished (in BIG-IP 11.0.0 through 11.1.0, click Next and advance through the remaining screens).

Resetting device trust across all devices

Warning: Attempt this procedure only if none of the devices are communicating.

If procedures in the previous sections do not alleviate the issue, then completely reset device trust across all devices. To reset the device trust across all devices, perform the following procedure on all devices in the device group:

Impact of procedure: F5 recommends that you perform this procedure during a maintenance window. This procedure causes the current device to lose connectivity with all other BIG-IP devices. Depending on the device group and traffic group configuration, the connectivity loss may result in an unintentional Active-Active condition that causes a traffic disruption. To prevent a standby device from going active, set the standby device in the device group to Force Offline before performing the procedure. Standby devices that were set to Force Offline should be set to Release Offline after performing the procedure.

  1. Log in to the Configuration utility.
  2. Navigate to Device Management > Device Trust > Local Domain.
  3. Click Reset Device Trust.
  4. Select the Generate new self-signed authority option.
  5. Click Update (or Next).
  6. Click Finished.
  7. Repeat this procedure for each device in the device group.

After you complete the device trust reset on all devices, set up the device trust by performing the procedures described in the following articles:

Supplemental Information
