Web scraping is a technique for extracting information from web sites, often by using automated programs, or bots (short for web robots), that open many sessions or initiate many transactions. You can configure Application Security Manager™ (ASM) to detect and prevent various web scraping activities on the web sites that it protects.
ASM™ provides the following methods to address web scraping attacks. These methods can work independently of each other, or work together to detect and prevent web scraping attacks.
The BIG-IP® system can accurately detect such anomalies only when response caching is turned off.
For web scraping detection to work properly, you should understand the following prerequisites:
|Rate Limiting||When enabled, the system drops sessions from suspicious IP addresses after determining that the client is an illegal script. If you select this option, the screen displays an option for dropping requests from IP addresses with a bad reputation.|
|Drop IP Addresses with bad reputation||When enabled, the system drops requests originating from IP addresses that are in the system’s IP address intelligence database when the attack is detected; no rate limiting will occur. (Attacking IP addresses that do not have a bad reputation undergo rate limiting, as usual.) This option is available only if you have enabled rate limiting. You also need to set up IP address intelligence, and at least one of the IP intelligence categories must have its Alarm flag enabled.|
|Sessions opened per second increased by||The system considers traffic to be an attack if the number of sessions opened per second increased by this percentage. The default value is 500%.|
|Sessions opened per second reached||The system considers traffic to be an attack if the number of sessions opened per second is equal to or greater than this number. The default value is 50 sessions opened per second.|
|Minimum sessions opened per second threshold for detection||The system considers traffic to be an attack only if the number of sessions opened per second exceeds this value and at least one of the other sessions opened values is also exceeded. The default value is 25 sessions opened per second.|
|Session transactions increased by||The system considers traffic to be an attack if the number of transactions per session increased by this percentage. The default value is 500%.|
|Sessions transactions reached||The system considers traffic to be an attack if the number of transactions per session is equal to or greater than this number. The default value is 400 transactions.|
|Minimum session transactions threshold for detection||The system considers traffic to be an attack only if the number of transactions per session is equal to or greater than this number, and at least one of the sessions transactions numbers was exceeded. The default value is 200 transactions.|
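The way the three thresholds in each group combine can be sketched in a few lines of Python. This is only an illustration of the rules described in the table, not the ASM implementation: the names `AnomalyThresholds` and `is_anomaly` are hypothetical, the increase is assumed to be measured against a previously observed baseline rate, and all comparisons are assumed to be inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnomalyThresholds:
    """Hypothetical container for one group of detection settings."""
    increased_by_pct: float  # relative increase over the baseline, in percent
    reached: float           # absolute value that signals an attack
    minimum: float           # floor below which traffic is never flagged


# Default values taken from the table above.
SESSION_OPENING = AnomalyThresholds(increased_by_pct=500.0, reached=50.0, minimum=25.0)
SESSION_TRANSACTIONS = AnomalyThresholds(increased_by_pct=500.0, reached=400.0, minimum=200.0)


def is_anomaly(current: float, baseline: float, t: AnomalyThresholds) -> bool:
    """Return True if `current` would count as an attack under thresholds `t`.

    Assumption: the minimum threshold must be met, and in addition either
    the absolute "reached" value or the relative "increased by" value.
    """
    if current < t.minimum:
        return False
    if current >= t.reached:
        return True
    if baseline <= 0:
        # No usable baseline, so the relative increase cannot be computed.
        return False
    increase_pct = (current - baseline) / baseline * 100.0
    return increase_pct >= t.increased_by_pct
```

For example, under these assumptions 30 sessions opened per second against a baseline of 5 is flagged (a 500% increase above the minimum of 25), while 20 sessions per second is never flagged because it is below the minimum.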
This figure shows a Web Scraping Statistics event log where a persistent storage violation occurred. You can click the attack type to view additional details about the cause of the attack (as shown in the figure).
The figure shows that a web scraping attack occurred on 2-19-2013 from 10:08 to 10:13. It was caused by too many session resets (more than 10 in 93 seconds) and inconsistencies (more than 3 in 60 seconds).
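A condition such as "more than 10 resets in 93 seconds" can be pictured as a count of events inside a time window. The following Python sketch shows one common way to evaluate such a condition; the class name `WindowCounter` is hypothetical, and the sliding-window approach is an assumption made for illustration, since the source does not describe how ASM counts these events internally.

```python
from collections import deque


class WindowCounter:
    """Counts events in a sliding time window (illustrative only)."""

    def __init__(self, max_events: int, window_seconds: float) -> None:
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._times = deque()  # timestamps of events still inside the window

    def record(self, timestamp: float) -> bool:
        """Record one event; return True if the window now holds
        more than `max_events` events."""
        self._times.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Discard events that have aged out of the window.
        while self._times and self._times[0] <= cutoff:
            self._times.popleft()
        return len(self._times) > self.max_events


# Values from the example in the figure: more than 10 resets in 93 seconds.
resets = WindowCounter(max_events=10, window_seconds=93)
flags = [resets.record(float(t)) for t in range(11)]  # 11 resets, one per second
```

In this usage, only the eleventh reset pushes the count past the limit, so only the last entry in `flags` is `True`.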
Web scraping statistics specify the attack type so that you have more information about why the attack occurred. This table shows the web scraping attack types that can appear in the web scraping event log.
|Bot Detection||Indicates that the system suspects that the web scraping attack was caused by a web robot.|
|Session Opening Anomaly by IP||Indicates that the web scraping attack was caused by too many sessions being opened from one IP address. Click the attack type link to display the number of sessions opened per second from the IP address, the number of legitimate sessions, and the attack prevention state.|
|Session Resets by Persistent Client Identification||Indicates that the web scraping attack was caused by too many session resets or inconsistencies occurring within a specified time. Click the attack type link to display the number of resets or inconsistencies that occurred within a number of seconds.|
|Transactions per session anomaly||Indicates that the web scraping attack was caused by too many transactions being opened during one session. Click the attack type link to display the number of transactions detected on the session.|
Click the attack type link to display the detected injection ratio and the injection
Note: You cannot configure the Transparent Mode CS injection ratio values. This attack type can occur only when the security policy is in Transparent mode.
When you have completed the steps in this implementation, you have configured the Application Security Manager™ to protect against web scraping. Depending on your configuration, the system detects web scraping attacks based on bot detection, session opening violations, or session transaction violations.
After traffic is flowing to the system, you can check whether web scraping attacks are being logged or prevented, and investigate them by viewing web scraping event logs and statistics.
If you chose alarm and block for the web scraping configuration and the security policy is in the blocking operation mode, the system drops requests that cause the Web scraping detected violation. If you chose alarm only (or the policy is in the transparent mode), web scraping attacks are logged but not blocked.
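The enforcement rule above reduces to a small decision function. The following Python sketch is only a restatement of that rule; the function name `enforcement_action` and the string values for the mode and the result are hypothetical and do not correspond to ASM configuration keywords.

```python
def enforcement_action(alarm_and_block: bool, policy_mode: str) -> str:
    """Return "drop" or "log" for a request that triggers the
    Web scraping detected violation.

    alarm_and_block: True for "alarm and block", False for "alarm only".
    policy_mode: "blocking" or "transparent" (illustrative labels).
    """
    if alarm_and_block and policy_mode == "blocking":
        return "drop"
    # Alarm only, or a policy in transparent mode: the attack is logged only.
    return "log"
```

Both conditions must hold for a request to be dropped: selecting alarm and block has no blocking effect while the policy is in transparent mode.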