Manual Chapter: Monitoring with SNMP

Simple Network Management Protocol (SNMP) is an industry-standard protocol that gives a standard SNMP management system the ability to remotely manage a device on the network. One of the devices that an SNMP management system can manage is a WebAccelerator system.
The SNMP versions that the BIG-IP system supports are: SNMP v1, SNMP v2c, and SNMP v3. The WebAccelerator system implementation of SNMP is based on a well-known SNMP package called Net-SNMP, which was formerly known as UCD-SNMP.
A standard SNMP implementation consists of an SNMP manager, which runs on a management system and makes requests to a device, and an SNMP agent, which runs on the managed device and fulfills those requests. SNMP device management is based on the standard management information base (MIB) known as MIB-II, as well as object IDs and MIB files.
The MIB defines the standard objects that you can manage for a device, presenting those objects in a hierarchical, tree structure. Each object defined in the MIB has a unique object ID (OID), written as a series of integers. The OID indicates the location of the object within the MIB tree.
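As a rough illustration of how an OID locates an object in the MIB tree, the following Python sketch treats an OID as a tuple of integers and tests whether one OID sits beneath another. The helper names parse_oid and is_in_subtree are hypothetical and are not part of any SNMP library; the numeric prefixes shown are the standard MIB-II and enterprises subtrees.

```python
from typing import Tuple

Oid = Tuple[int, ...]


def parse_oid(text: str) -> Oid:
    """Convert a dotted OID string such as '1.3.6.1.2.1' into a tuple of integers."""
    return tuple(int(part) for part in text.strip(".").split("."))


def is_in_subtree(oid: Oid, root: Oid) -> bool:
    """Return True when oid lies at or below root in the MIB tree."""
    return len(oid) >= len(root) and oid[:len(root)] == root


MIB_2 = parse_oid("1.3.6.1.2.1")              # standard MIB-II subtree
ENTERPRISES = parse_oid("1.3.6.1.4.1")        # subtree that holds enterprise (vendor) MIBs
SYS_DESCR_0 = parse_oid("1.3.6.1.2.1.1.1.0")  # sysDescr.0, a MIB-II object instance
```

Here is_in_subtree(SYS_DESCR_0, MIB_2) is true, while the same OID is not under ENTERPRISES, which is where objects from enterprise MIB files would appear.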
A set of MIB files resides on both the SNMP manager system and the WebAccelerator system. MIB files specify values for the data objects defined in the MIB. This set of MIB files consists of standard SNMP MIB files and enterprise MIB files. Enterprise MIB files are those MIB files that pertain to a particular company, such as F5 Networks, Inc.
Typical SNMP tasks that an SNMP manager performs include polling for data about a device, receiving notifications from a device about specific events, and modifying writable object data.
Note: This chapter assumes that you already have a network management system (NMS) in place for the site's infrastructure. Due to the wide range of NMS products on the market today, this guide cannot describe how to integrate the MIBs into your particular NMS. For information, see the documentation that came with your specific NMS product.
2. In the /config/wa/pvsystem.conf file, locate the enableMonitor parameter and change it to false.

2. In the /config/wa/pvsystem.conf file, locate the enableMonitor parameter and change it to true.
3. Optionally, locate the monitorCommunity parameter and change its value to meet your site's requirements. The default value is Public.
Table 6.1 contains an overview of the WebAccelerator system's SNMP MIB objects that we recommend you monitor to see how the WebAccelerator system is managing the traffic for the applications that are currently in a production environment.
WA::numReqRejected
WA::numPendingReq
WA::imcHitCount
WA::hdsHitCount
WA::lruLength
WA::lruItemBulk
WA::lruPops
WA::msgQueueDepth
WA::postQueueDepth
WA::numDupRecv
WA::numExpired
Connection throughput reflects how the WebAccelerator system is managing the rate of requests. You can use the following SNMP objects to monitor the specific details of the WebAccelerator system's connection throughput.
WA::numReqRejected
This SNMP object identifies the number of connection requests that the WebAccelerator system has rejected. The WebAccelerator system rejects additional connections when it reaches the maximum number of connections it can service. This maximum is defined by the maxPendingRequests variable in the pvsystem.conf file.
WA::numPendingReq
This SNMP object identifies the number of requests that the WebAccelerator system has accepted, but is not currently processing. A high number of pending requests indicates that the WebAccelerator system is not keeping up with the traffic. There are two likely causes:
Slow disk access performance
Disk access issues can occur if you are using a Network File System (NFS) to access the on-disk cache, and the NFS server cannot keep up with the number of times that the WebAccelerator system reads the on-disk cache. For specific information about monitoring on-disk cache, see Hit counts.
Large numbers of assembly events (not caused by ESI includes)
Assembly events are prompted by an acceleration policy's assembly rules for parameter substitution and parameter value randomizers, as well as by ESI operations that do not use ESI include statements. The WebAccelerator system uses Smart Cache to process any requests that require special assembly, and for ESI operations that do not use ESI includes. For more information about monitoring Smart Cache, see Monitoring Smart Cache.
The WebAccelerator system uses the Smart Cache feature to store objects that require special assembly processing, such as objects for which variation rules apply. When the WebAccelerator system serves requests from Smart Cache, it increases the speed at which traffic to your applications is processed, and it reduces the load on your origin web servers.
To monitor the number of requests that the WebAccelerator system is serving from Smart Cache, you view SNMP objects for hit counts and in-memory cache size.
When the WebAccelerator system receives a request, it counts that request as a hit. You monitor hit counts by viewing the following SNMP objects:
WA::imcHitCount
This SNMP object identifies the number of cache hits against the in-memory cache.
 
WA::hdsHitCount
This SNMP object identifies the number of hits against the on-disk cache.
These numbers fluctuate over time, as the content expires and the WebAccelerator system sends requests to the origin web servers for fresh content. Cache invalidation events can also lower these numbers, and any time the WebAccelerator system is restarted, it resets the hit counts to zero.
If both the WA::imcHitCount and WA::hdsHitCount objects are zero, the WebAccelerator system is not serving content from Smart Cache. Assuming normal traffic loads for the site, this condition usually indicates that there are issues with your acceleration policies. For information about acceleration policies, see the Policy Management Guide for the BIG-IP® WebAccelerator System.
 
If the WA::hdsHitCount is zero and the WA::imcHitCount is something other than zero, it indicates an internal error condition. If this occurs, contact F5 Networks Technical Support for assistance.
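The two hit-count conditions described above can be expressed as a small check. This is a minimal Python sketch, assuming the two values have already been retrieved by your NMS or polling tool; the function name and the returned labels are hypothetical.

```python
def classify_hit_counts(imc_hit_count: int, hds_hit_count: int) -> str:
    """Interpret WA::imcHitCount and WA::hdsHitCount as described in the text."""
    if imc_hit_count == 0 and hds_hit_count == 0:
        # Nothing is being served from Smart Cache; review the acceleration policies.
        return "not-serving-from-cache"
    if hds_hit_count == 0 and imc_hit_count != 0:
        # Memory hits without any disk hits indicate an internal error condition.
        return "internal-error"
    return "ok"
```

Because both counters reset to zero when the WebAccelerator system restarts, a result of "not-serving-from-cache" immediately after a restart is expected and is only meaningful under normal traffic loads.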
The WA::lruLength SNMP object represents the number of objects that are stored in the in-memory cache. Assuming that you are not exceeding the memory available to the in-memory cache, you should see this value fluctuate as objects expire from cache in accordance with the TTL settings, or in relation to object invalidation.
The WA::lruItemBulk SNMP object represents the current size of the in-memory cache, which should never be larger than the value of the imcMaxFootprint parameter in the pvsystem.conf file. The WebAccelerator system continuously puts new objects into the in-memory cache until it either has no more unique objects to cache there, or it reaches the size limit set by the imcMaxFootprint variable. If the WebAccelerator system meets the maximum size limit for in-memory cache, it removes (or pops) cached objects that have not been accessed for the longest period of time.
If the WebAccelerator system receives a request for an object that it has popped from its in-memory cache, it services that request from the on-disk cache. Obtaining data from disk is much slower than obtaining it from memory, so it is important to monitor the WA::lruPops object. If you see a large number of pops reported, you can improve the WebAccelerator system's performance by increasing the size of the in-memory cache (if local system resources allow it).
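One way to watch these values is to compare the in-memory cache size against its limit and to track how many pops occur between polls. The Python sketch below assumes that WA::lruPops is a cumulative count that returns to zero on restart and that WA::lruItemBulk and imcMaxFootprint are expressed in the same unit; neither assumption is confirmed by this chapter, and the function names are hypothetical.

```python
def cache_fill_ratio(lru_item_bulk: int, imc_max_footprint: int) -> float:
    """Return the fraction of the in-memory cache limit currently in use."""
    if imc_max_footprint <= 0:
        raise ValueError("imc_max_footprint must be positive")
    return lru_item_bulk / imc_max_footprint


def pops_since_last_poll(previous: int, current: int) -> int:
    """Return the number of pops between two polls of WA::lruPops.

    Assumption: the value is cumulative. If the current value is lower than
    the previous one, treat it as a restart and report the current value.
    """
    if current < previous:
        return current
    return current - previous
```

A fill ratio that stays close to 1.0 together with a steadily growing pop count suggests that the cache is at its limit and is evicting objects that are still being requested.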
If you cannot increase the size of the in-memory cache footprint, you may be able to reduce the number of pops by reducing the number of unique objects that the WebAccelerator system is caching. You can accomplish this by modifying the acceleration policy's content variation rules, as follows.
Examine your site to see if there are legacy query parameters or query parameter values that do not affect content. If there are, identify those parameters in the acceleration policy's content variation rules.
 
Examine your acceleration policy's content variation rules to see whether each parameter that you have identified as affecting content, such as cookies or user agent values, actually does affect content. Remove any existing parameters that do not affect content.
Note: For specific information about configuring variation rules, see the Configuring Variation Rules chapter in the Policy Management Guide for the BIG-IP® WebAccelerator System.
To establish a maximum time limit for content, limit the size of the in-memory cache using the imcMaxFootprint parameter in the pvsystem.conf file. For information about modifying the pvsystem.conf file, see Modifying the pvsystem.conf file.
If the size of a specific queue is rising significantly, it could be an indication that the WebAccelerator system is not managing requests as effectively as it could be.
The WebAccelerator system places each response that it receives from the origin web server, and that does not contain a surrogate-control header, into the compile queue. You can monitor the number of objects in the compile queue using the WA::sizeCompileQueue SNMP object.
Compilation is a very efficient process, so the compile queue is typically small. Most sites normally see a compile queue size of 0 or 1. If the compile queue is consistently large (greater than 5), it indicates that the WebAccelerator system is not keeping up with the workload. If this occurs, you can attempt to resolve the issue by editing the maxParserThreads parameter in the pvsystem.conf file to change the number of threads assigned to the compile queue to 2 (by default, only 1 thread is used).
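To distinguish a consistently large compile queue from a brief spike, you can look at several consecutive polls rather than a single reading. The following Python sketch uses the threshold of 5 from the text; the number of consecutive samples required (three by default) is an arbitrary illustrative choice, and the function name is hypothetical.

```python
from typing import Sequence


def is_consistently_large(samples: Sequence[int],
                          threshold: int = 5,
                          min_samples: int = 3) -> bool:
    """Return True when the most recent min_samples readings all exceed threshold.

    samples holds successive WA::sizeCompileQueue readings, oldest first.
    """
    recent = list(samples)[-min_samples:]
    return len(recent) == min_samples and all(size > threshold for size in recent)
```

With this check, readings of 0, 1, 6, 7, 9 are flagged, while a single spike such as 6, 7, 1 is not.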
The compile queue is also affected by the size of the pages that the WebAccelerator system is compiling. If a site responds to a request with an erroneously large page, the compile queue grows as it attempts to handle the larger page. Therefore, if you experience unexpected spikes in the compile queue, examine your site's code to verify that it is building pages correctly.
The maximum number of responses allowed in the compile queue is defined by the maxParserTasks parameter. By default, this parameter is set to 1000. If the compile queue is at this limit, the WebAccelerator system fails to cache some responses. This can result in a greater than anticipated load on the origin web servers. In this situation, reducing the size of the queue also reduces the load on the origin web servers.
See Changing default values for the compile queues for information about modifying parameters.
A surrogate-control header indicates that a response uses ESI markup. When the WebAccelerator system sends a request to the origin web servers and receives a response that uses the surrogate-control header, it places the response in the ESI compile queue for compilation.
You can use the WA::sizeESICompileQueue SNMP object to monitor the number of objects in the ESI compile queue. The size of this queue is dependent on the size and number of the fragments (that is, the number of ESI includes), and the size of the templates that you are compiling.
The maxESIParserTasks parameter in the pvsystem.conf file defines the maximum number of responses allowed in the ESI compile queue. By default, this parameter is set to 1000. If the ESI compile queue reaches this limit, the WebAccelerator system fails to cache some responses. The result can be a greater than anticipated load on the origin web servers.
As with the compile queue, a sudden spike in the size of the ESI compile queue can indicate a problem with the code that generates your site. In this situation, reducing the size of the queue reduces the load on the origin web servers. Use the following tips to reduce the size of the ESI compile queue.
Increase the number of threads used to process the ESI compile queue to 2 (by default, only 1 thread is used), by editing the maxESIParserThreads parameter in the pvsystem.conf file.
See Changing default values for the compile queues for information about modifying parameters.
The WebAccelerator system places into an assembly queue all HTTP requests that it can service from its cache and applies special threads to those requests to create the compiled responses. You can monitor the size of the assembly queue using the WA::sizeAssembleQueue SNMP object. Assembly is a very efficient process. For sites that are not using ESI, the size of the assembly queue should be minimal. Most sites normally have a queue size of 0 or 1.
If you are using ESI, the size of the assembly queue can grow in direct proportion to the complexity of the ESI instructions. If the assembly queue grows too large, users will perceive a reduction in your site's responsiveness. For this reason, we recommend that you keep the assembly queue as small as possible. If your site seems to be responding slowly and the assembly queue is persistently large, reduce the size of the queue.
The maxAssembleTasks parameter in the pvsystem.conf file determines the maximum number of requests allowed in the assembly queue. By default, this parameter is set to 1000. If the assembly queue reaches this limit, clients will receive an HTTP Service Not Responding error. In this situation, use the following tips to reduce the size of the assembly queue:
 
Try increasing the number of threads used to process the assembly queue to 2 (by default, only 1 thread is used), by editing the maxAssembleThreads parameter in the pvsystem.conf file.
See Changing default values for the compile queues for more information about modifying parameters.
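The compile, ESI compile, and assembly queues each have a maximum size (1000 by default), so a single check can report which queues have reached their limit. This Python sketch assumes the queue sizes have already been polled into a dictionary keyed by object name; the function name and the dictionary layout are hypothetical, and the limits should be adjusted if you have changed the defaults in the pvsystem.conf file.

```python
from typing import List, Mapping

# Default limits from the text: maxParserTasks, maxESIParserTasks, maxAssembleTasks.
DEFAULT_LIMITS = {
    "sizeCompileQueue": 1000,
    "sizeESICompileQueue": 1000,
    "sizeAssembleQueue": 1000,
}


def queues_at_limit(sizes: Mapping[str, int],
                    limits: Mapping[str, int] = DEFAULT_LIMITS) -> List[str]:
    """Return the names of the queues whose size has reached the configured limit."""
    return sorted(name for name, size in sizes.items()
                  if name in limits and size >= limits[name])
```

A full compile or ESI compile queue means that some responses are not being cached, while a full assembly queue means that clients are receiving errors, so the assembly queue usually deserves the more urgent alert.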
The WebAccelerator system uses the communications manager to exchange information between processes. It is critical that these interrelated processes are working correctly; any failure can mean that acceleration policy changes, or any invalidated objects, did not propagate. An issue with the communications system can also cause hit log and change log files to build up in the WebAccelerator system's archive directory.
Each WebAccelerator system process uses a specific SNMP port to monitor the two types of communication managers, as specified in Table 6.2.
Table 6.2 SNMP ports
WA::msgQueueDepth
This object identifies the number of messages in the message queue. A message queue that is always increasing indicates that the WebAccelerator system is not properly processing messages.
WA::postQueueDepth
This object identifies the number of messages in the retry queue. A retry queue of more than 10 to 20 messages can indicate that the WebAccelerator system failed to receive acknowledgements.
WA::numDupRecv
This object identifies the number of messages in the duplicate message queue. A duplicate message queue of more than 10 or 12 messages can indicate that the WebAccelerator system is failing to send acknowledgements.
WA::numExpired
This object identifies the number of messages in the expired messages queue. A large expired messages queue can indicate that the system clocks are not synchronized, or that there is a high latency in the network.
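The guidance for the message, retry, and duplicate queues can be combined into one check. The Python sketch below uses the upper ends of the ranges given above (20 and 12) as default limits, which is an illustrative choice, and interprets "always increasing" as every reading being higher than the one before it. The function name is hypothetical, and WA::numExpired is left out because the text gives no numeric threshold for it.

```python
from typing import List, Sequence


def check_comm_queues(msg_depth_history: Sequence[int],
                      post_queue_depth: int,
                      num_dup_recv: int,
                      retry_limit: int = 20,
                      dup_limit: int = 12) -> List[str]:
    """Return warnings for the communications manager queues."""
    warnings = []
    history = list(msg_depth_history)  # WA::msgQueueDepth readings, oldest first
    if len(history) >= 2 and all(b > a for a, b in zip(history, history[1:])):
        warnings.append("msgQueueDepth keeps increasing: messages may not be processed")
    if post_queue_depth > retry_limit:
        warnings.append("postQueueDepth is high: acknowledgements may not be received")
    if num_dup_recv > dup_limit:
        warnings.append("numDupRecv is high: acknowledgements may not be sent")
    return warnings
```

An empty list means that none of the three conditions was detected for the readings supplied.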
Port definitions
A process was unable to obtain the port that is defined for it in the pvsystem.conf file. This most likely means that a process that was previously using that port did not shut down properly. Use the netstat(8) command to see what process is currently using the port.
Upstream processes
A process was unable to connect to its upstream (parent) process. Possible reasons are:
Upstream process is not running
Check the system processes (see Checking the WebAccelerator system processes) and restart them as required.
Upstream process is not using the expected port
Check the port assignments for the upstream and downstream system processes (see Monitoring the communications channels), and correct them in the pvsystem.conf file as required.
Network misconfiguration
Verify that the host names identified in the pvsystem.conf file are configured in DNS, and that they resolve to the correct IP addresses, and then ping the systems over the network.
Firewall misconfiguration
Verify that the firewall is open for both incoming and outgoing TCP/IP traffic for all relevant hosts and ports.
SSL certificates
The communications system uses certificate-based encrypted traffic (SSL version 3). If the certificates are corrupted or expired, or not yet valid in certain pathological cases, then communications cannot be established. If you suspect a corrupted or invalid certificate, check the validity of the certificate on the upstream and downstream systems, using the following openssl command:
If there is an SSL certificate issue, the last line of output does not display OK. Note that Error 18 (self-signed certificate) is reported for all default WebAccelerator system installations. This is considered an acceptable error if the openssl command reports OK.
Time is not synchronized
The WebAccelerator system uses NTP to keep all system clocks synchronized. The communications system is sensitive to time synchronization issues, and the clocks must be within 1000 seconds of each other for NTP to be able to correct the differential. This limit can be exceeded if there is a hardware clock failure on a host. Check the time on the hosts in your installation. If they are not synchronized, verify that NTP is running on each host and check for NTP errors in the /var/log/wa/messages file. For information, see Defining an NTP server.
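If you collect the current time from each host (for example, as seconds since the epoch), the 1000-second limit can be checked directly. This Python sketch is a minimal illustration; the function names and host names are hypothetical, and how the timestamps are gathered is left to your own tooling.

```python
from typing import Mapping


def max_clock_skew(host_times: Mapping[str, float]) -> float:
    """Return the largest difference, in seconds, between any two host clocks."""
    values = list(host_times.values())
    if not values:
        return 0.0
    return max(values) - min(values)


def ntp_can_correct(host_times: Mapping[str, float],
                    limit_seconds: float = 1000.0) -> bool:
    """Return True when all clocks are within limit_seconds of each other."""
    return max_clock_skew(host_times) <= limit_seconds
```

For example, readings of 1700000000.0 and 1700001500.0 from two hosts give a skew of 1500 seconds, which is outside the range that NTP can correct.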
The objects you use to monitor these operations are located in the /usr/local/wa/wa.mib file. In addition to the various WebAccelerator system processes, you should also monitor the hosts on which those processes are running. F5 Networks recommends that you monitor the objects listed in Table 6.3.
Identifies the amount of data transmitted out of the network interface.
If this value stops increasing, it indicates that there is either a problem with the WebAccelerator system processes or with your network in general.
Identifies the available space on disk.
Available disk space can become too low because (1) the on-disk cache is filling up its partition, or (2) the log files are filling up their partition.
If the on-disk cache is growing too large for its partition because the WebAccelerator system is caching more objects than there is available disk space, you can resolve it by increasing the size of the on-disk cache partition.
If the on-disk cache is growing too large because it is not being pruned correctly (that is, the WebAccelerator system is not deleting objects from disk), verify that the hds_prune script is operating correctly.
Identifies the amount of space used on disk as a percentage of the total size of the disk. You can use this as another way of identifying disk space issues described for the ucd-snmp::dskAvail object.
Identifies the amount of unused memory available on the host.
It is especially important to make sure that the WebAccelerator system is not moving data from the swap space into real memory (RAM), because this slows it down. If you see this object approaching 0, consider reducing the size of the in-memory cache.
Note that for Linux systems, available RAM is almost always identified as 0. For this reason, it is better to monitor the amount of available swap space (ucd-snmp::memAvailSwap, described next) to see if Linux systems are swapping.
Identifies the amount of unused swap space available on the host.
Swap space is an area on the disk that is used as virtual memory to temporarily hold the least used files from real memory (RAM). Sufficient swap space is important to ensure that some RAM is free at all times.
Identifies whether a process is reporting an error.
Use this object to verify that the various WebAccelerator processes are operating without issue. If any processes are reporting a prErrorFlag value other than 0, investigate that process to see if there is a problem that requires your attention.
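The prErrorFlag check amounts to filtering for any process whose flag is not 0. The following Python sketch assumes the flags have been polled into a dictionary keyed by process name; the function name and the process names in the example are hypothetical.

```python
from typing import List, Mapping


def processes_with_errors(pr_error_flags: Mapping[str, int]) -> List[str]:
    """Return the names of processes whose prErrorFlag value is other than 0."""
    return sorted(name for name, flag in pr_error_flags.items() if flag != 0)
```

Calling processes_with_errors({"procA": 0, "procB": 1}) returns ["procB"], which identifies the process to investigate.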