Configuration Guide for the BIG-IP® Application Security Module


Chapter 5: Working with the Crawler Tool


What is the Crawler tool?

The Crawler tool is one of the Policy Builder tools available in the Application Security Module. The Crawler tool is most beneficial when you are configuring a high security (APC) policy. The Crawler tool scans the web application that you want to secure, and populates the security policy with the components of the web application. Examples of web application components are: HTML files, picture files, form fields, links, and the flows that lead from the defined start points to other objects in the application.

When you run the Crawler tool for the first time on a policy, it populates the policy with the current objects (application components). The next time you run the Crawler tool, it collects only the objects that were added to the web application since the last time you ran the Crawler tool. You can configure the Crawler tool either to add the newly-found objects directly to the security policy, or to list the newly-found objects in the Crawler Learning screen. You can then examine the Crawler Learning screen entries, and decide whether the objects actually belong in the security policy.
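Conceptually, the incremental behavior works like a set difference between what the latest scan finds and what the policy already contains. The following Python sketch is illustrative only; names such as collect_new_objects and the auto_update flag are invented, not part of the product:

    # Hypothetical sketch of the Crawler's incremental collection logic.
    def collect_new_objects(policy_objects: set[str], scanned_objects: set[str],
                            auto_update: bool) -> set[str]:
        """Return objects found in the latest scan but absent from the policy."""
        new_objects = scanned_objects - policy_objects
        if auto_update:
            # "Add to policy" mode: merge new objects straight into the policy.
            policy_objects |= new_objects
        # Otherwise the new objects would be staged on the Crawler Learning
        # screen for manual review before they enter the policy.
        return new_objects

    # Example: the second run reports only /checkout.html as new.
    policy = {"/index.html", "/login.html"}
    second_scan = {"/index.html", "/login.html", "/checkout.html"}
    print(collect_new_objects(policy, second_scan, auto_update=False))
    # -> {'/checkout.html'}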

Enhancing the Crawler tool's data collection process

You can enhance the accuracy of the data that the Crawler tool collects by using the Policy Browser. The Policy Browser is another Policy Builder tool designed to help streamline the policy creation process. For details on the Policy Browser, refer to Using the Policy Browser to collect web application data.

Refining security policies using the Crawler tool and the Learning process

The Crawler tool provides a good foundation for a high security policy. The initial security policy is never fully accurate, however, because the Crawler tool collects its information only from the web application itself. For the security policy to be truly accurate, and thus effective, it must also know the details of the traffic that the web application processes. For example, while the Crawler tool can determine values for static parameters, such as options in list boxes, it cannot always provide reasonable value ranges for dynamic parameters, such as account numbers or user names. To collect that information, and further refine the policy, you can use the Learning process. See Chapter 4, Refining the Security Policy with Learning Tools, for more information.

Configuring the Crawler tool settings

Before you use the Crawler tool to populate a security policy, you must first configure the settings for the tool itself. This process notifies the Crawler tool of some of the web application objects the tool may encounter as it scans the web application. This process also notifies the Crawler tool of the actions to take when it encounters certain types of web objects.

You configure all of the Crawler tool settings on the Crawler settings screen. Once you have set up the Crawler tool and run it for the first time, you need only confirm the settings for the subsequent times that you run it.

Tip


When you are configuring the Crawler tool settings, it is helpful to collect input from the web application designer, or someone else who knows the web application architecture well.

To configure the Crawler tool settings

  1. On the Main tab of the navigation pane, expand the Application Security section, and then click Web Applications.
    The Web Applications list screen opens.
  2. In the Active Policy column, click the name of the security policy for which you are running the Crawler tool.
    The Policy Properties screen opens.
  3. In the Build Tools section, in the Crawler row, click the Settings button.
    The Crawler Settings screen opens, where you configure the applicable settings.

The following sections of this chapter describe the Crawler tool settings, and how to use them. Note that the settings may not all be relevant for your web application. You may wish to confer with the web application designer, to ensure that the Crawler tool is effective in recognizing the components of the web application.

Configuring a Crawler domain

The first Crawler setting that you configure is the Crawler domain, which is the host and address information for the web server that hosts the web application. The Crawler domain configuration enables communication between the Crawler tool itself and the web servers.

The Crawler domain settings should match the client SSL and server SSL settings for the local traffic virtual server with which the Application Security Class is associated. Otherwise the Crawler tool cannot gain direct access to the web server that is hosting the web application. You can configure the Crawler domain settings to use any combination of HTTP and HTTPS. Table 5.1 shows the mapping.

 

Table 5.1 Mapping virtual server settings to Crawler domain settings
If the virtual server uses    Then the Crawler domain setting is    Enable server-side encryption?
No client SSL                 Use HTTP                              No
Client SSL                    Use HTTPS                             No
Server SSL                    Use HTTP or Use HTTPS                 Yes
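As a quick cross-check, the Table 5.1 mapping can be expressed as a small lookup. The following Python sketch simply encodes the table; in the product these are settings on the Crawler Settings screen, not code, and the function and key names here are invented:

    def crawler_domain_settings(client_ssl: bool, server_ssl: bool) -> dict:
        """Map virtual server SSL usage to Crawler domain options (Table 5.1)."""
        if server_ssl:
            # Server SSL: either scheme works, with server-side encryption on.
            return {"schemes": ["HTTP", "HTTPS"], "use_encryption": True}
        if client_ssl:
            return {"schemes": ["HTTPS"], "use_encryption": False}
        return {"schemes": ["HTTP"], "use_encryption": False}

    print(crawler_domain_settings(client_ssl=True, server_ssl=False))
    # -> {'schemes': ['HTTPS'], 'use_encryption': False}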

 

To configure a Crawler domain

  1. On the Crawler Settings screen, in the Crawler Domains section, click Create.
    The Create New Crawler Domain popup screen opens.
  2. In the Host box, type the fully-qualified domain name of the web server.
  3. In the HTTP Settings section, configure the following options:
    1. If the web application accepts HTTP traffic, check the Use HTTP box.
    2. In the IP box, type the address of the web server.
    3. In the Port box, type the port for the HTTP service, typically 80.
    4. If the system should encrypt traffic from the web server, check the Use Encryption box.
  4. In the HTTPS Settings section, configure the following options:
    1. If the web application accepts HTTPS traffic, check the Use HTTPS box.
    2. In the IP box, type the address of the web server.
    3. In the Port box, type the port for the HTTPS service, typically 443.
    4. If the system should encrypt traffic from the web server, check the Use Encryption box.
  5. Click OK.
    The system adds the new Crawler domain to the configuration.

Configuring the Start Points component

The Crawler tool starts the data collection process for a web application from a URL. This URL is known as the start point. The start point is usually the web application's home page. If the web application has several start points, you can instruct the Crawler tool to scan the application from each start point separately. For example, an online banking site has a public area, which anyone can access, and a secure area that requires a unique login. When a customer logs in to the secure area, the customer may be redirected to a different web application. To successfully map the entire web application, the Crawler tool must know about all of these start points.
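The per-start-point scanning can be pictured as a breadth-first traversal that is seeded once for each configured start point. This Python sketch is purely illustrative; fetch_links is a hypothetical placeholder for the Crawler tool's real page analysis:

    from collections import deque

    def fetch_links(url: str) -> list[str]:
        """Placeholder: return the links discovered on a page."""
        return []

    def crawl(start_points: list[str]) -> set[str]:
        seen: set[str] = set()
        for start in start_points:      # scan from each start point separately
            queue = deque([start])
            while queue:
                url = queue.popleft()
                if url in seen:
                    continue
                seen.add(url)
                queue.extend(fetch_links(url))
        return seen

    # Public and secure areas each get their own start point.
    crawl(["http://myapp.example.com/index.html",
           "https://myapp.example.com/secure/home.html"])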

Tip


If your web application has more than one start point, we recommend that you run the Crawler tool one or more times to scan the public access areas of the web application, and then run the Crawler tool with the login information configured, to scan the secure areas of the web application.

To create a Crawler start point

  1. On the Crawler Settings screen, above the Start Points section, click Create.
    The Create New Crawler Start Point popup screen opens.
  2. In the Domain box, select the Crawler domain for which you are creating a start point.
  3. In the Start Point box, type the address of the default start page of the application, for example, http://myapp.example.com/index.html.
    Important: Every start point must reference a Crawler domain.
  4. Click OK.
    The system adds the new start point to the Crawler settings.

Configuring the Form Fillers component

Since the Crawler tool emulates user behavior, it submits data to the web application pages in the same way users do. Each time you run the Crawler tool, it populates any form fields in the application with the values that are defined in the Form Fillers component, for example, username or password. When you run the Crawler tool against a web application for the first time, the Crawler tool may generate incomplete results when it finds a form field that it cannot fill in with the correct information. However, this process identifies the form fields so that you can provide the correct information, and run the Crawler tool again.

Sometimes the parameter names are not self-explanatory, and you may need to consult with the web application programmer. If it is available to you, you can also search the HTML source code for this information.
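Conceptually, the Form Fillers list behaves like a lookup table of parameter names to values that is consulted for every form the tool encounters. Here is a hedged Python sketch of that idea; FORM_FILLERS and fill_form are hypothetical names, not the module's implementation:

    from html.parser import HTMLParser

    # Mirrors the Parameter Name / Parameter Value pairs you configure.
    FORM_FILLERS = {"username": "crawler_user", "password": "s3cret"}

    class FormFieldFinder(HTMLParser):
        def __init__(self):
            super().__init__()
            self.fields: list[str] = []

        def handle_starttag(self, tag, attrs):
            if tag == "input":
                name = dict(attrs).get("name")
                if name:
                    self.fields.append(name)

    def fill_form(html: str) -> dict:
        """Return the form data a crawler would submit for this page."""
        finder = FormFieldFinder()
        finder.feed(html)
        # Unknown fields are left blank, which may yield incomplete results;
        # they surface so you can add values and run the Crawler tool again.
        return {f: FORM_FILLERS.get(f, "") for f in finder.fields}

    print(fill_form('<form><input name="username"><input name="account_id"></form>'))
    # -> {'username': 'crawler_user', 'account_id': ''}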

To create a form filler component

  1. On the Crawler Settings screen, above the Form Fillers section, click Create.
    The Create New Crawler Parameter popup screen opens.
  2. In the Parameter Name box, type the name of the parameter, for example, username.
  3. In the Parameter Type box, select the appropriate type. Note that if you select password, the system displays asterisks instead of clear text in the Form Fillers list.
  4. In the Parameter Value box, type the value that you want the Crawler tool to enter when it reaches the specified parameter. If the parameter type is password, the system requires you to confirm the value.
  5. Click OK.
    The system adds the new form filler entry to the Crawler tool settings.

Configuring the Page Not Found Criteria component

When a request arrives for a page that does not exist, web applications typically return a standard HTTP 404 response page, with a page not found error message. Attackers may exploit this response page to stage attacks. To prevent such attacks, some web applications use customized error pages that return the HTTP 200 status code in the response, instead of the HTTP 404 status code. Web application designers do this so that they can control and verify the error content.

By default, the Crawler tool adds pages that return the HTTP 200 OK status to the security policy, and ignores pages that return the HTTP 404 status. If you do not define the Page Not Found Criteria component, the Crawler tool attempts to identify error pages on its own. If your web application uses customized error pages (those that do not return the 404 status code), you need to supply a text string that those pages contain, so that the Crawler tool can identify them as error pages, and avoid adding them to the policy. The Crawler tool can recognize an error page by its file name, or by text strings found in the <TITLE> or <BODY> HTML tags.
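The matching logic amounts to checking each configured criterion against the file name, <TITLE> text, or <BODY> text of a 200-status response. A minimal Python sketch of that idea, with invented criteria values:

    import re

    # Each criterion applies either to the file name or to text inside
    # the <TITLE> or <BODY> tags; these entries are examples only.
    CRITERIA = [
        ("filename", "notfound.html"),
        ("title", "Page Not Found"),
        ("body", "The page you requested does not exist"),
    ]

    def is_error_page(url: str, html: str) -> bool:
        """Classify a 200-status response as a customized error page."""
        title = re.search(r"<title>(.*?)</title>", html, re.I | re.S)
        body = re.search(r"<body.*?>(.*)</body>", html, re.I | re.S)
        for apply_to, item in CRITERIA:
            if apply_to == "filename" and url.rsplit("/", 1)[-1] == item:
                return True
            if apply_to == "title" and title and item in title.group(1):
                return True
            if apply_to == "body" and body and item in body.group(1):
                return True
        return False

    print(is_error_page("http://myapp.example.com/oops",
                        "<html><title>Page Not Found</title><body>...</body></html>"))
    # -> True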

Tip


The Crawler tool always follows the redirect link, if configured. The Crawler tool identifies the page behind the link, and avoids the link if the identified page is included in the Page Not Found list.

To create a customized error page component

  1. On the Crawler Settings screen, above the Page Not Found Criteria section, click Create.
    The Create New Page Not Found Criteria popup screen opens.
  2. In the Apply to box, select the object that the Crawler tool searches to identify the error page.
  3. In the Search Item box, type the header or string that the Crawler tool searches for.
  4. Click OK.
    The system adds the new page not found criteria for the custom error page to the Crawler tool settings.

Configuring the Logout Pages component

If the web application contains a page designed to log a visitor out of the web application, you need to instruct the Crawler tool not to follow the logout link. Otherwise, the Crawler tool logs out of the application before it has fully scanned the application. For example, many web applications have an Exit or Logout link right on the home page, which would cause the Crawler tool to exit the application as soon as it enters. You can prevent this behavior by using the Logout Pages settings to identify the logout points that the Crawler tool should avoid.

Note

If you configure the Crawler tool to recognize (and ignore) a logout page in a web application, the system adds this page to the security policy.
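In effect, the Logout Pages list acts as a filter on which discovered links the Crawler tool follows. A short illustrative Python sketch; the pattern values are examples only:

    from urllib.parse import urlparse

    # Links whose relative path matches a configured logout pattern are
    # recorded (and added to the policy) but never followed.
    LOGOUT_PATTERNS = ["/logout.php", "/exit.html"]

    def should_follow(url: str) -> bool:
        path = urlparse(url).path
        return not any(path.endswith(pattern) for pattern in LOGOUT_PATTERNS)

    for link in ["http://myapp.example.com/account.html",
                 "http://myapp.example.com/logout.php"]:
        print(link, "->", "follow" if should_follow(link) else "skip (logout)")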

To create a logout page component

  1. On the Crawler Settings screen, above the Logout Pages section, click Create.
    The Create New Logout Page popup screen opens.
  2. In the Logout Pattern (URL) box, type the relative path of the logout page.
  3. Click OK.
    The system adds the new logout page component to the Crawler tool settings.

Configuring the Properties components

The Properties section provides additional ways to customize the Crawler tool. For example, you can instruct the Crawler tool to analyze JavaScript code included in the web application, or to emulate the Microsoft® Internet Explorer™ web browser.

To specify the properties options

  1. On the Crawler Settings screen, in the Properties section, enable or modify the properties as required. Each property is described in Table 5.2.
  2. Click Save.
    The system saves the updated properties in the Crawler tool settings.

Table 5.2 The Properties options in the Crawler tool settings

Analyze JavaScript
    Specifies whether the Crawler tool analyzes or ignores JavaScript code. The default setting is checked.
    Check this box to instruct the Crawler tool to analyze the JavaScript code included in the web application. This is useful if the scripts contain references to links that can be followed, or if they include form fields that need to be filled.
    Clear this box if JavaScript analysis is not necessary.

Accept untrusted SSL certificates
    Specifies whether the Crawler tool accepts untrusted SSL certificates. The default setting is checked.
    Check this box to instruct the Crawler tool to accept any untrusted SSL certificates within the web application.
    Clear this box to instruct the Crawler tool to accept only trusted SSL certificates.

Create back flows
    Specifies whether the Crawler tool creates back flows in the security policy for referrer objects. The default setting is checked.
    Check this box to instruct the Crawler tool to register in the policy all flows in the reverse direction. You can then use the back flow information to impose rules on navigating backwards, which occurs when the visitor uses the Back button.
    Clear this box if the Crawler tool should not create back flow objects in the security policy.

Create cache flows
    Specifies whether the Crawler tool creates flows in the security policy for objects that a web browser can cache, such as image files. The default setting is checked.
    Check this box to instruct the Crawler tool to create a cache flow for cacheable objects in the web application. The Crawler tool creates the flow from the first non-cacheable referrer object around the cacheable object, and adds the parameters of the incoming flow to the cache flow. When the Crawler tool cannot find a non-cacheable referrer object for a cacheable object, the cacheable object itself becomes an entry point, and the Crawler tool adds the corresponding cache flow to the policy.
    Clear this box to instruct the Crawler tool not to create cache flows.

Minimal delay between worm requests to web application (milliseconds)
    The Crawler tool works like a central unit that sends out multiple probes to different areas of the web application, registering web application components simultaneously. Each probe behaves as if it were a real user, following links and filling in forms, and therefore increases traffic.
    The probes can be sent in quick or slow succession; quicker bursts create more traffic. This property specifies the number of milliseconds to wait before sending the next probe. If your web application is active and currently serving visitors, consider increasing this value to slow down the Crawler tool.

Number of threads to be used by the Crawler
    This property also relates to simultaneous probe activity. A smaller number decreases the Crawler tool's bandwidth consumption, leaving more bandwidth for actual visitors.

Number of times the Crawler fetches requests with the same structure
    Applications usually have many identical structures where only the parameter values differ. The following examples illustrate identical links passing different parameter values:
    http://myapp.example.com/index.html?par=111
    http://myapp.example.com/index.html?par=222
    http://myapp.example.com/index.html?par=333
    To reduce crawling time and traffic, you can instruct the Crawler tool to scan only a few of these identical structures, rather than all of them, on the assumption that the others behave in the same way.
    For this property, specify the number of samples that are sufficient for the Crawler tool to scan. A higher value yields a more accurate policy, but longer crawling times.

Maximum number of requests generated for each form by the form iterator
    When the Crawler tool encounters a form, it processes the form as many times as there are pre-defined parameter values in it. For example, a list containing ten objects causes the Crawler tool to process the form ten times. You can reduce crawling time and traffic, however, by instructing the Crawler tool to process only a few of the objects instead of all of them.
    For this property, specify the number of samples that you deem sufficient for the Crawler tool to process from the same form with different values. A higher value yields a more accurate policy, but longer crawling times.

Emulate browser
    If the web application works only with a particular Internet browser, select the relevant browser name from the list. The Crawler tool uses this property to select the User-Agent header data when it scans the web application.

Default charset for user input fields
    Select the character set in which data is normally entered in the form fields of the scanned application. The Crawler tool uses this property as the default setting for all new fields it adds to the security policy.
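To make the same-structure sampling from Table 5.2 concrete, here is a hedged Python sketch: URLs that differ only in parameter values map to one structure key, and fetching stops once the configured sample count is reached. All names here are illustrative:

    from collections import Counter
    from urllib.parse import urlparse, parse_qsl

    MAX_SAMPLES = 2  # the configured number of samples per structure

    def structure_key(url: str) -> tuple:
        parts = urlparse(url)
        param_names = tuple(sorted(name for name, _ in parse_qsl(parts.query)))
        return (parts.path, param_names)   # parameter values intentionally ignored

    seen = Counter()
    for url in ["http://myapp.example.com/index.html?par=111",
                "http://myapp.example.com/index.html?par=222",
                "http://myapp.example.com/index.html?par=333"]:
        key = structure_key(url)
        seen[key] += 1
        if seen[key] > MAX_SAMPLES:
            print("skip", url)    # quota reached; assume identical behavior
        else:
            print("fetch", url)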

     

Configuring the HTTP Authentication component

If the web application uses HTTP authentication, then you can use the HTTP Authentication component of the Crawler tool settings to configure the login criteria. The Crawler tool accepts all RFC 2617 authentication formats, as well as the Microsoft NTLM authentication format.
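As an illustration of the RFC 2617 schemes, the Python standard library can attach Basic and Digest credentials to requests as shown below. This is a generic sketch, not the Crawler tool's internal code; NTLM is not covered by the standard library, which is why the <domain>\<user_name> format matters when you configure NTLM credentials here:

    import urllib.request

    # Register one set of credentials for the web server hosting the application.
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, "http://myapp.example.com/",
                              "crawler_user", "s3cret")

    # Handlers for the two RFC 2617 schemes: Basic and Digest.
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(password_mgr),
        urllib.request.HTTPDigestAuthHandler(password_mgr),
    )
    # opener.open("http://myapp.example.com/secure/") would now authenticate.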

To configure the HTTP authentication component

  1. On the Crawler Settings screen, in the HTTP Authentication section, type the user name and password that the Crawler tool should supply to access the server where the web application resides. For Microsoft NTLM authentication, type the user name in the following format:
    <domain>\<user_name>
  2. Click Save.
    The system updates the HTTP authentication component for the Crawler tool settings.

Configuring the Object Types Associations component

The Object Types Associations component provides a list of file types frequently used in web applications, and their most common usage in the web application. In this list, you can configure how the Crawler tool processes a certain file type or object, thus saving tedious manual configuration in the security policy. For example, you can instruct the Crawler tool to define all BMP file types as files that are not referrers.

If an object type already exists in the security policy, then the Crawler tool uses the policy settings instead of the settings you define in the Object Type Associations component. However, when the Crawler tool discovers an object type that does not yet exist in the security policy, the Crawler tool applies the object type associations that you define in the Object Type Associations component.
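The precedence rule can be pictured as a two-level lookup: policy settings first, association defaults second. A hedged Python sketch; the flag names and the BMP/HTML defaults are illustrative, not the product's actual values:

    import os

    ASSOCIATIONS = {
        ".bmp":  {"is_entry_point": False, "is_referrer": False,
                  "dont_check_flow": True, "dont_check_object": False},
        ".html": {"is_entry_point": True, "is_referrer": True,
                  "dont_check_flow": False, "dont_check_object": False},
    }

    def settings_for(url: str, policy_types: dict) -> dict:
        ext = os.path.splitext(url)[1].lower()
        if ext in policy_types:
            return policy_types[ext]        # existing policy settings win
        return ASSOCIATIONS.get(ext, {})    # newly discovered types use the defaults

    print(settings_for("http://myapp.example.com/logo.bmp", policy_types={}))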

Note

The Crawler tool applies the object type associations on a per-policy basis.

The default settings provided in the object type associations list cover the most common file types and associations; however, you can adapt them to your needs by checking or clearing boxes. Table 5.3 provides a description of the file type associations.

 

Table 5.3 File type associations

Is Entry Point
    If all files of this type can be entry points to the web application, check this box.

Is Referrer
    If objects of this type may refer to other files, check this box. For example, HTML pages that contain a link, or CGI files that call another file, are referrers. Pictures and sound files cannot be referrers, because these objects never contain links to other objects and are not web pages.

Don't Check Flow
    If you want the system to ignore the flows to objects of this file type, check this box.

Don't Check Object
    If you want the system to ignore requests that refer to files (objects) of this type, check this box.
    Note: This association is also applied to files that do not exist in the application.

 

Running the Crawler tool

Once you have configured the settings for the Crawler tool, you run the Crawler tool against the web application. This is an iterative process, so that you can continue to refine the security policy, based on the Crawler tool's results.

To run the Crawler tool

  1. On the Main tab of the navigation pane, expand the Application Security section, and then click Web Applications.
    The Web Applications list screen opens.
  2. In the Active Policy column, click the name of the security policy for which you are running the Crawler tool.
    The Policy Properties screen opens.
  3. In the Build Tools section, in the Crawler row, click the Start button.
    The Run Crawler popup screen opens.
  4. Select the appropriate option from the following:
    • Run Crawler
      Select this option to run the Crawler tool as is, without the additional information supplied by the Policy Browser.
    • Run Crawler with Browser output file
      Select this option to run the Crawler tool, and also use web application details pre-recorded by the Policy Browser. Click the Browse button, and select the Policy Browser's output file. For additional information on the Policy Browser tool, see Using the Policy Browser to collect web application data.
  5. If you do not want the Crawler tool to automatically update the security policy, check the Store results in Crawler Learning (No policy update) box. For additional information, refer to Running the Crawler tool in Learning mode.
  6. Click the Run Crawler button.
    The Crawler tool starts collecting data.
  7. Click the Status button to open a window where you can review the progress of the Crawler tool.
    • The message Running displays while the Crawler tool is running. During this time, the dialog box displays the number of objects and flows that have been scanned and identified.
    • The message Finished displays when the operation ends.

Running the Crawler tool in Learning mode

Running the Crawler tool in Learning mode enables you to scan the web application, collect data about the web application, and review the data before changing the security policy.

  • When you use the Crawler tool in a regular mode, the Crawler tool automatically populates the policy with any new items that it finds.
  • When you run the Crawler tool in Learning mode, it populates the Crawler Learning screen with the new items, instead of directly populating the security policy. You can then review the data and update the security policy as required.
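The difference between the two modes reduces to where a newly found object is written. A schematic Python sketch, with illustrative names only:

    def process_found_object(obj: str, policy: set, learning_queue: list,
                             learning_mode: bool) -> None:
        if learning_mode:
            learning_queue.append(obj)   # appears on the Crawler Learning screen
        else:
            policy.add(obj)              # regular mode: policy updated directly

    policy, queue = set(), []
    process_found_object("/new-page.html", policy, queue, learning_mode=True)
    print(policy, queue)   # -> set() ['/new-page.html']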

Using the Crawler in Learning mode to initialize new security policies

You can use the Crawler tool in Learning mode to initialize a new security policy. When you use the Crawler tool to scan a web application, the tool populates the Crawler Learning screen with all of the data it finds. You can then go through each item and accept it, which populates the security policy with the accepted objects.

Tip


We recommend that you run the Crawler tool at least twice when you use it in Learning mode to populate a new security policy. The first time you run the Crawler tool, it identifies form fields and other objects in the application. You can then provide the required information before running the Crawler tool again.

Using the Crawler in Learning mode to update existing security policies

You can also use the Crawler tool in Learning mode to update an existing security policy. When updating a security policy, the Crawler tool populates the Crawler Learning screen with only the new objects and items that it finds in the web application. You can then review the entries, and accept or reject them as required.

Important

If you permanently reject an item from the Learning process, the system adds it to the Ignored Items list.

Configuring the Crawler tool to run in Learning mode

You configure Learning mode at the time that you actually start the Crawler tool.

To run the Crawler tool in Learning mode

  1. On the Main tab of the navigation pane, expand the Application Security section, and then click Web Applications.
    The Web Applications list screen opens.
  2. In the Active Policy column, click the security policy name for which you are running the Crawler tool.
    The Policy Properties screen opens.
  3. In the Build Tools section, in the Crawler row, click the Start button.
    The Run Crawler popup screen opens.
  4. Check the Store results in Crawler Learning (No policy update) box.
  5. Click the Run Crawler button.
    The Crawler tool starts.
  6. Click the Status button to open a window where you can review the progress of the Crawler tool.
    • The message Running displays while the Crawler tool is running. During this time, the dialog box displays the number of objects and flows that have been scanned and identified.
    • The message Finished displays when the operation ends.

Using the Policy Browser to collect web application data

The Policy Browser is a Policy Builder tool that collects data about the web application. You can use the information that the Policy Browser collects to enhance the security policy updates that the Crawler tool makes. You can also use the Policy Browser to overcome browsing obstacles that the Crawler tool encounters.

The Policy Browser browses the application as a client or user would browse the application. The Policy Browser records the user actions and application responses in a file, which you can then configure the Crawler tool to use.
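Conceptually, this is a record-then-replay workflow: the Policy Browser writes what it observed to an output file, and the Crawler tool later reads that file to supplement its own scan. The JSON layout in this Python sketch is invented for illustration; the real output file format is internal to the product:

    import json

    def record_session(actions: list[dict], path: str) -> None:
        """Record observed user actions and responses to an output file."""
        with open(path, "w") as f:
            json.dump(actions, f)

    def load_recorded_urls(path: str) -> list[str]:
        """Replay side: read the recorded requests back as crawl input."""
        with open(path) as f:
            return [a["url"] for a in json.load(f)]

    record_session([{"url": "http://myapp.example.com/wizard/step2",
                     "method": "POST"}], "browser_output.json")
    print(load_recorded_urls("browser_output.json"))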

Important

We encourage you to use the Policy Browser extensively, and let it collect as much data as possible about the web application, to later help the Crawler tool create a more accurate security policy.

Downloading and installing the Policy Browser software

Before you can use the Policy Browser tool, you must download and install the Policy Browser software onto your workstation. There are two installation options for the Policy Browser: one for Microsoft® Windows® platforms, and one for Linux platforms. You can also choose to download the Policy Browser software with Java Virtual Machine (JVM) software included, if your current configuration does not include this functionality.

To download and install the Policy Browser software

  1. On the Main tab in the Configuration utility session for the Local Traffic Manager, expand the Overview section, and click Welcome.
    The Welcome screen opens.
  2. In the Downloads section, in the Application Security Policy Browser list, click the link for the download option you want to install.
    A download popup screen opens.
  3. Click Save to save the installation file to your workstation.
  4. When the file transfer is complete, on your workstation, double-click the installation file to start the installation, and follow the on-screen instructions.

Note

You can run the Policy Browser from a temporary location, or you can install it locally.

Running the Policy Browser

Once you have installed the Policy Browser software, you can use it to browse the web application. The Policy Browser simulates user actions within the web application, and then records the results of those actions in a data file.

Important

Before you run the Policy Browser, you must configure the Crawler domain and start point for the web application. For configuration details, refer to Configuring a Crawler domain, and Configuring the Start Points component.

To run the Policy Browser

  1. On the Main tab of the navigation pane, expand the Application Security section, and then click Web Applications.
    The Web Applications list screen opens.
  2. In the Active Policy column, click the name of the security policy for which you are running the Policy Browser.
    The Policy Properties screen opens.
  3. In the Build Tools section, in the Browser Recording row, click the Load button.
    The Run Browser Recording popup screen opens.
  4. In the Use Browser output file box, type a path where you want to store the output file. Alternatively, click the Browse button, and navigate to the location where you want to store the output file.
  5. Click OK.
    The Policy Browser starts collecting data, and you see status information in the popup screen.
  6. When you see the Finished message, click Close to close the Run Browser Recording popup screen.

Configuring the Crawler tool to use the Policy Browser output file

After you run the Policy Browser, you can run the Crawler tool and instruct it to use the output file that the Policy Browser generated.

To run the Crawler tool using the Policy Browser output file

  1. On the Main tab of the navigation pane, expand the Application Security section, and then click Web Applications.
    The Web Applications list screen opens.
  2. In the Active Policy column, click the name of the security policy for which you are running the Crawler tool.
    The Policy Properties screen opens.
  3. In the Build Tools section, in the Crawler row, click the Start button.
    The Run Crawler popup screen opens.
  4. Select the Run Crawler with Browser output file option.
  5. Optionally, check the Store results in Crawler Learning (No policy update) box if you do not want the Crawler tool to automatically update the security policy. For additional information, refer to Running the Crawler tool in Learning mode.
  6. Click the Run Crawler button.
    The Crawler starts collecting data.
  7. Click the Status button to open a window where you can review the progress of the Crawler tool.
    • The message Running displays while the Crawler tool is running. During this time, the dialog box displays the number of objects and flows that have been scanned and identified.
    • The message Finished displays when the operation ends.


