The Learner builds a complete profile of the web site, including static requests, web applications and input parameters, by analyzing incoming requests.
To avoid learning from worms, attacks and other unauthorized access the Learner employs a combination of heuristic attack classification, statistics and server responses.
When learning is enabled for the website, the Learner keeps analyzing requests until the resulting policy stops changing. For every 10,000 requests, the Learner builds a trial policy, compares it to the previous trial policy and records the number of changes. A policy is built once a configurable number of consecutive trial policies (default 30) each differ from their predecessor by no more than a configurable threshold (default 0 changes).
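The stopping rule above can be sketched as a small state machine. This is a minimal illustration, not the product's implementation: the class name `VerificationTracker` is hypothetical, and it assumes that a run whose changes exceed the threshold resets the consecutive count to zero (our reading of "in a row"). The `percent` property follows the Verification calculation described for the progress bars (consecutive runs relative to the required number).

```python
from dataclasses import dataclass


@dataclass
class VerificationTracker:
    """Hypothetical tracker for the Learner's trial-policy stopping rule.

    One trial policy is built per chunk of ~10,000 requests. A run counts
    toward verification only if its number of changes (compared with the
    previous trial policy) does not exceed change_threshold; any run above
    the threshold resets the consecutive count.
    """

    required_runs: int = 30    # default number of consecutive runs
    change_threshold: int = 0  # default maximum changes per run
    consecutive: int = 0

    def record_run(self, changes: int) -> bool:
        """Record one sample run; return True once the policy is verified."""
        if changes <= self.change_threshold:
            self.consecutive += 1
        else:
            self.consecutive = 0
        return self.verified

    @property
    def verified(self) -> bool:
        return self.consecutive >= self.required_runs

    @property
    def percent(self) -> float:
        """Verification progress in percent, capped at 100."""
        return min(100.0, 100.0 * self.consecutive / self.required_runs)


if __name__ == "__main__":
    tracker = VerificationTracker(required_runs=3)
    for changes in [5, 0, 0, 2, 0, 0, 0]:
        done = tracker.record_run(changes)
        print(changes, tracker.consecutive, f"{tracker.percent:.0f}%", done)
```

With `required_runs=3`, the run with 2 changes resets progress to 0%, and only the third consecutive zero-change run afterwards reports the policy as verified.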
By default, the Learner is configured to generate a short yet fine-grained policy. It does this by identifying global characteristics of the web site and generating global patterns that match those characteristics. The global patterns typically account for most of the web system's content and applications, leaving only the "real" web applications to be covered by specific web application policy entries.
The Applications learned section is shown as a three-level expandable table.
| Applications learned
Expandable: Click to expand. Expands 2 levels. |
Group, URL path and details.
|
| Paths |
Number of unique URL Paths in the group. Applies to: Group level (1). |
| Param |
Number of parameters the application takes as input. If a blue number in parentheses is shown to the left of the number, it indicates how many of the learned parameters are approved under the configurable Learner thresholds. Parameters that do not yet exceed one or more threshold values are colored blue, while the names of trusted parameters are black. Applies to: URL path level (2). |
| Class |
Name of input validation class mapped to a parameter. If the parameter is not trusted yet, the class name is blue. Applies to: Detail level (3). |
| Source |
Number of unique IP-addresses requesting the resource. Applies to: Group (1), URL path (2) and Detail level (3). |
| Time |
Number of unique timestamps in requests for the resource. Applies to: Group (1), URL path (2) and Detail level (3). |
| ΔTime (delta time) |
Time difference between the first and last observed request for the resource. Applies to: Group (1), URL path (2) and Detail level (3). |
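The Source, Time and ΔTime columns are simple aggregates over the sampled requests. The sketch below shows one way to compute them. The `Request` record and `occurrence_stats` function are hypothetical, because the product does not document its internal data model. Timestamps are truncated to whole seconds to match the one-second granularity stated for the timestamp thresholds. The same function applies at every table level; only the set of requests passed in changes (a group, one URL path, or one detail entry).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Request:
    """One sampled request (hypothetical record layout)."""
    path: str
    source_ip: str
    timestamp: datetime


def occurrence_stats(requests: list[Request]) -> dict[str, float]:
    """Compute the Source, Time and ΔTime columns for a set of requests.

    source:     number of unique source IP addresses
    time:       number of unique timestamps, truncated to whole seconds
    delta_time: seconds between the first and last observed request
    """
    if not requests:
        return {"source": 0, "time": 0, "delta_time": 0.0}
    stamps = [r.timestamp.replace(microsecond=0) for r in requests]
    return {
        "source": len({r.source_ip for r in requests}),
        "time": len(set(stamps)),
        "delta_time": (max(stamps) - min(stamps)).total_seconds(),
    }


if __name__ == "__main__":
    reqs = [
        Request("/shop/cart.asp", "10.0.0.1", datetime(2024, 1, 1, 12, 0, 0, 250000)),
        Request("/shop/cart.asp", "10.0.0.1", datetime(2024, 1, 1, 12, 0, 0, 900000)),
        Request("/shop/list.asp", "10.0.0.2", datetime(2024, 1, 1, 12, 5, 30)),
    ]
    print("group:", occurrence_stats(reqs))
    # URL path level: the same statistics per path.
    by_path: dict[str, list[Request]] = {}
    for r in reqs:
        by_path.setdefault(r.path, []).append(r)
    for path, rs in by_path.items():
        print(path, occurrence_stats(rs))
```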
The Global parameters learned section shows all parameters observed on more URL paths than the Learner setting Global parameters Path duplication threshold.
Note that the list also includes observed parameter names that are still pending approval under the Learner threshold settings. The number of approved (trusted) observations is shown as a black number, while a blue number shows the number of non-approved observations.
| Global parameters
Expandable: Click to expand. Expands 1 level. |
Group, URL path and details.
|
| Class |
Name of input validation class mapped to a parameter. Applies to: Parameter name level (1). |
| Paths |
Number of unique URL Paths observed using the parameter. Applies to: Parameter name level (1). |
| Pending |
Number of unique URL Paths using the parameter where the parameter name is not yet approved, that is, where the threshold values have not yet been reached. Applies to: Parameter name level (1). |
| Trusted |
Number of unique URL Paths using the parameter where the parameter name is approved, that is, where the threshold values have been reached. Applies to: Parameter name level (1). |
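The Paths, Pending and Trusted columns can be derived from per-path parameter observations. The sketch below assumes that each observation is keyed by (URL path, parameter name), consistent with the statement that parameter uniqueness is determined by URL path. It also assumes that "exceeds the Path duplication threshold" means strictly greater than. The function name and input shape are hypothetical.

```python
from collections import defaultdict


def global_parameters(
    observations: dict[tuple[str, str], bool],
    path_duplication_threshold: int,
) -> dict[str, dict[str, int]]:
    """Summarize parameter names that appear on many URL paths.

    observations maps (url_path, parameter_name) -> True if that
    path/parameter combination is trusted (approved), False if pending.
    A name is listed only when the number of unique paths using it
    exceeds path_duplication_threshold.
    """
    per_name: dict[str, dict[str, int]] = defaultdict(
        lambda: {"paths": 0, "pending": 0, "trusted": 0}
    )
    for (_path, name), trusted in observations.items():
        entry = per_name[name]
        entry["paths"] += 1  # keys are unique, so each path counts once
        entry["trusted" if trusted else "pending"] += 1
    return {
        name: counts
        for name, counts in per_name.items()
        if counts["paths"] > path_duplication_threshold
    }
```

For example, `print` observed on three paths (two trusted, one pending) is listed with a threshold of 2 but not with a threshold of 3.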
This section shows all URL Paths to static resources learned. URL Paths are grouped by their extension.
| Static content learned
Expandable: Click to expand. Expands 1 level. |
Extension and URL Paths learned.
|
| Paths |
Number of unique URL Paths in the extension group. Applies to: Extension level (1). |
| Source |
Number of unique IP-addresses requesting the resource. Applies to: Extension (1) and URL path level (2). |
| Time |
Number of unique timestamps in requests for the resource. Applies to: Extension (1) and URL path level (2). |
| ΔTime (delta time) |
Time difference between the first and last observed request for the resource. Applies to: Extension (1) and URL path level (2). |
This section contains tools for tidying the learning data set.
| Delete querys by name wildcard
Input field |
Delete learned parameter names using simple wildcard matching.
Displays parameter names matching the wildcard below the input field. Performs deletion of the parameters matching the wildcard. |
| Delete querys by data
Input field |
Delete learned parameter names using matching occurrence data:
Source: number of IP addresses requesting the resource.
Time: number of unique timestamps in requests for the resource.
ΔTime (delta time): time difference between the first and last recorded request for the resource.
Displays parameter names matching the search criteria below the input fields. Performs deletion of the parameters matching the search criteria. |
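Both tools follow a preview-then-delete pattern. The sketch below makes some assumptions that are not documented. It assumes that "simple wildcard matching" means shell-style `*` and `?` (implemented here with Python's `fnmatch.fnmatchcase`). It also assumes that deleting "by data" removes parameters whose Source, Time and ΔTime values are all at or below the entered values. The `LearnedParam` record and all function names are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class LearnedParam:
    """Hypothetical learned-parameter record with occurrence data."""
    path: str
    name: str
    source: int        # unique IP addresses
    time: int          # unique timestamps
    delta_time: float  # seconds between first and last request


def match_by_wildcard(params: list[LearnedParam], pattern: str) -> list[LearnedParam]:
    """Preview: parameters whose name matches a shell-style wildcard."""
    return [p for p in params if fnmatchcase(p.name, pattern)]


def match_by_data(
    params: list[LearnedParam], max_source: int, max_time: int, max_delta_time: float
) -> list[LearnedParam]:
    """Preview: parameters seen from few sources, at few times, over a short span."""
    return [
        p for p in params
        if p.source <= max_source
        and p.time <= max_time
        and p.delta_time <= max_delta_time
    ]


def delete(params: list[LearnedParam], matched: list[LearnedParam]) -> list[LearnedParam]:
    """Delete: return the learning data without the matched parameters."""
    doomed = set(matched)
    return [p for p in params if p not in doomed]
```

Separating the preview functions from `delete` mirrors the two actions offered for each tool: list the matches first, then delete them.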
The lower button bar contains the following buttons.
| Re-analyze data
Button |
To see the effect of deleting selected learning data in the resulting policy section, click this button, wait a few seconds and reload the page. |
| Reset learn data
Button |
Use with caution! When you click this button and accept the confirmation pop-up window, all learning data for that proxy will be deleted! If learning is enabled, the learning and data sampling process will start from scratch. |
The two bars at the top of the page indicate the current state of sampling and verification.
The Learner works in two stages when profiling the website.
Data sampling
This is the process of collecting information about the website: which paths/applications are used, what parameters they take as input, what extensions are used for static content, etc.
Verification
The verification process 1) validates the data samples using statistical methods, such as analyzing the spread in IP sources and time, the number of requests, etc., and 2) verifies that the resulting policy covers the sampled requests.
Because the Web Security Manager Learner extracts characteristics such as extensions, specific directories in paths, global parameters (parameter names that a number of applications take as input, like print=1) and even patterns used in global parameters, the verification process may start before the Data sampling progress has reached 100%.
Verification is calculated as the number of consecutive sample runs with no policy changes, relative to the required number configured in the Learner settings.
When Verification has reached 100% Web Security Manager will either build and commit a new policy or notify the administrator by email that verification has reached 100% and a new policy can be built and committed.
When a new policy is generated and committed, either automatically or manually, it is added to the Policy history list.
| Policy history |
The policy number. |
| Type |
Automatic or manual (requested by administrator user) |
| Changes |
Click link to see resulting policy and changes compared to the former (if any). |
| Sample run |
The sample run number at which the policy was generated. |
| Web Apps |
The number of entries in the |
| Global URLs |
The number of entries in the |
| Global Parms |
The number of entries in |
| Static |
The number of entries in static file types |
This section shows a sample of the policy resulting from the Learner settings currently in effect.
When the settings are changed, the resulting policy sample is rebuilt using the new threshold values. This is done as a background job. Depending on the load on the Web Security Manager node and the complexity of the sample data, building the policy may take anywhere from a few seconds to a minute or two. If the new policy is not visible yet, wait a while and refresh the window.
| Commit to WAF
Button |
Builds a policy which is accessible and editable in the Global Patterns and Web applications windows. When clicked, the policy displayed in the table will be committed to the WAF engine, that is, made active for filtering requests. If policy verification has not reached 100%, a warning message (two, actually) will be displayed asking you to confirm the action. Remember: Web Security Manager is a whitelist-based web application firewall. If the policy put into production does not match real-life requests, those requests will cause false positives, so building the policy prematurely (before it is fully verified) is likely to result in false positives. If verification has not reached 100%, it has not yet been verified that the policy does not generate false positives. Have patience and wait for verification to reach 100%. |
| Web applications
Expandable: Click to expand. |
Learned web applications. Expand the item to get a list of the applications learned. For each application, the following is shown:
|
| Global URL patterns
Expandable: Click to expand. |
Global URL Path Policy built from learned applications. For each application group (see Section 3.1.1, “Applications learned” below), a regular expression is built which matches all samples in that specific group. Most CMS-based web systems have a number of global parameters, like for instance … By making global policies that account for all the static content which is served dynamically, only "real" web applications with a number of private parameters have to be mapped in detail. Thus the global patterns allow for building a condensed, yet fine-grained, policy which also accounts for future standard content added to the web site. |
| Global parameters
Expandable: Click to expand. |
Global Parameters Policy built from learned applications. Displayed in the format: Depending on the Name grouping threshold value the name can either be a literal string or a regular expression matching a number of parameter names with name and value similarities. The value is displayed as a class name. When the policy is built the corresponding regular expression will be used. |
| Static content allowed extensions
Expandable: Click to expand. |
Learned static path extensions which will be allowed. |
| Static content path allowed characters
Expandable: Click to expand. |
Unique characters and character classes (like 'A': all international word characters) learned from static path samples. The regular expression built to match requests for static content is also shown. Note the last set of parentheses preceded by an escaped period |
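The sketch below illustrates how a static-content expression of this shape could be assembled from learned paths. The exact format the product generates is not documented, so the function name and construction rules are assumptions. Word characters (including international letters, which Python's `\w` matches for `str` patterns) collapse into one class, similar to the 'A' class described above. Every other character seen in the path is added literally. The learned extensions become the final parenthesized alternation, preceded by an escaped period.

```python
import re


def build_static_regex(paths: list[str]) -> re.Pattern:
    """Build a regex matching learned static paths (hypothetical sketch).

    Word characters collapse into \\w; other characters found in the path
    stems are added to the character class literally. The extensions form
    the final alternation group, preceded by an escaped period.
    """
    extensions: set[str] = set()
    other_chars: set[str] = set()
    for path in paths:
        stem, dot, ext = path.rpartition(".")
        if not dot:
            continue  # no extension: not treated as static content here
        extensions.add(ext)
        other_chars.update(ch for ch in stem if not re.fullmatch(r"\w", ch))
    if not extensions:
        raise ValueError("no static paths with an extension were supplied")
    char_class = "\\w" + "".join(re.escape(ch) for ch in sorted(other_chars))
    alternation = "|".join(re.escape(ext) for ext in sorted(extensions))
    return re.compile(f"^[{char_class}]+\\.({alternation})$")


if __name__ == "__main__":
    rx = build_static_regex(["/images/logo-1.png", "/css/site.css", "/img/café.jpg"])
    print(rx.pattern)
```

The resulting pattern accepts new files that use only the learned characters and extensions, for example `/images/new-banner.png`. It rejects an unlearned extension such as `.php` and characters that were never sampled, such as a space.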
The Learner analyzes request samples in chunks of approximately 10,000 requests (or more if the system is very busy). For each sample run, an entry is added to the Sample run information table, which shows total and delta values summarizing the learning process.
| Sample run |
The sample run number. |
| Hits total |
The total number of hits processed during the learning process. |
| URL paths |
Total number of unique URL paths identified. |
| Parameters |
Total number of unique parameter names identified. Uniqueness is determined by URL path, so two parameters with the same name that belong to different URL paths are identified as two unique parameters. When the policy is built, Web Security Manager identifies parameters with similar names and input data as global in scope and builds global patterns matching such parameters. |
| Changes |
When the chunk of raw sample data has been processed, Web Security Manager builds a policy based on the total sample population. This policy is compared to the policy built in the last sample run, and the changes are recorded. The number shown is the sum of the changes recorded to the Web Application Policy (… Click the number shown to get a change report detailing the changes. |
| ACL |
The number of changes to the |
| GURL |
The number of changes to the |
| Gparm |
The number of changes to |
| Ext |
The number of changes to the |
Note: The number of policy changes recorded is calculated with the Learner settings in effect when the sample data is analyzed. The resulting policy is recalculated when the Learner settings are changed, but the sample run policy builds are not, so the two sections may show different results. The next sample run uses the new settings.
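The column layout suggests that the Changes value is the sum of per-section counts shown under ACL, GURL, Gparm and Ext. The sketch below counts changes between two sample-run policies under that assumption. Each policy is modeled, hypothetically, as a mapping from section to `{entry key: entry definition}`. An added, removed or modified entry counts as one change. The real change-counting rules are not documented.

```python
SECTIONS = ("acl", "gurl", "gparm", "ext")


def count_changes(
    old: dict[str, dict[str, str]], new: dict[str, dict[str, str]]
) -> dict[str, int]:
    """Count per-section changes between two sample-run policies.

    An entry that was added, removed or whose definition changed counts
    as one change. 'total' is the sum over all sections.
    """
    counts: dict[str, int] = {}
    for section in SECTIONS:
        before = old.get(section, {})
        after = new.get(section, {})
        added = after.keys() - before.keys()
        removed = before.keys() - after.keys()
        modified = {k for k in before.keys() & after.keys() if before[k] != after[k]}
        counts[section] = len(added) + len(removed) + len(modified)
    counts["total"] = sum(counts[s] for s in SECTIONS)
    return counts
```

Counting a redefined entry once, rather than as one removal plus one addition, keeps a single parameter change from doubling the reported figure.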
The lower button bar contains the following buttons.
| Re-analyze data
Button |
To see the effect of deleting selected learning data in the resulting policy section, click this button, wait a few seconds and reload the page. |
| Reset learn data
Button |
Use with caution! When you click this button and accept the confirmation pop-up window, all learning data for that proxy will be deleted! If learning is enabled, the learning and data sampling process will start from scratch. |
| Learning
Drop down list |
Enable/disable learning |
| Develop static extensions list
Check box |
Enable / disable static extensions learning. If enabled, Web Security Manager will treat static content separately and develop a static content policy from learned static content. See Section 1.3.1, “Validate static requests separately” for more information. Default: |
| Enable global parameters generation
Check box |
Enable / disable global parameters generation. If enabled, Web Security Manager will identify parameters which many or all learned applications have in common. If a (configurable) number of applications take a specific parameter as input, the parameter will be learned as a global parameter and added to the Global Parameters Policy (Section 1.3.4, “Query and Cookie validation”). Default: |
| Enable global parameters name grouping
Check box |
Enable / disable global parameters name grouping. If enabled, Web Security Manager™ will analyze the global parameter names to identify name similarities and build parameter groups based on common characteristics. If the number of parameter names in a group exceeds a configurable threshold a parameter name pattern will be built matching all parameter names in the group. Grouped parameter names with corresponding input validation classes are inserted in the Global Parameters Policy (Section 1.3.4, “Query and Cookie validation”). Default: |
| Enable autostart of policy generation
Check box |
Enable / disable autostart of policy generation. If enabled, when a (configurable) number of sample data chunks has been processed without producing policy changes that exceed the thresholds, a policy will be generated and the operating mode will automatically be changed to … Default: |
| Avoid learning from broken bots
Check box |
Enable / disable checks for broken robots. If enabled, Web Security Manager will try to identify requests originating from robots that do not behave correctly. An example is a robot that maps the URL /index.asp?page=8&print=1 but, for some reason, translates the print parameter to &amp;print=1 when requesting it. Because the parameter &amp;print is not generally requested from many different sources, it will never exceed the threshold values and consequently will not be included in the policy, but it is annoying to look at. Default: |
| Learn from hostile sources (IPs)
Check box |
Enable / disable learning from hostile sources. If disabled, requests from sources that have also generated deny log entries classified as attacks will not be included in the sample population used for generating the policy. As all learning samples are validated against negative policy rules, obvious attacks will not be included regardless of this setting. But as attackers often "sneak around" trying different probes to detect vulnerability to, for instance, SQL injection (entering … Default: |
| Auto enable request origin validation (CSRF protection)
Check box |
Enable / disable automatic activation of request origin validation. If Session protection and generation of request form validation tokens (CSRF protection) are enabled (see Section 1.3.7, “Session and CSRF protection”), the Learner will map applications taking input from forms generated by the web system. It does this by detecting the validation token parameter (___pffv___) inserted by Web Security Manager and correlating other input parameters with the presence of the validation token. If parameters are detected that are only present in requests where the validation token is also present (like "amount" or "submit"), the application is created in the web applications policy with one of these parameters as "validation parameter". A validation parameter is a parameter which, when present in requests, triggers validation of the request form origin based on the validation token tied to the current user session. If Auto enable request origin validation (CSRF protection) is enabled, the Learner will map the validation parameter and enable origin validation (CSRF protection) for that application. If disabled, the Learner will only map the validation parameter. Default: |
| Keep validation settings for enabled applications
Check box |
Enable / disable keeping existing request origin validation activation settings. This input is only active when Auto enable request origin validation (CSRF protection) is disabled. Suppose you want the Learner to map validation parameters for request origin validation but only want to activate validation for certain applications. To prevent the Learner from overwriting the activation settings for those applications the next time it develops a policy, enable this control. Default: |
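The token correlation described above can be sketched as a set computation per URL path. The token name `___pffv___` and the example parameters "amount" and "submit" come from the text. The rule that a parameter qualifies only if every sampled request carrying it to that path also carries the token is our reading of "only present in requests where the validation token is also present". The function name is hypothetical, and statistical thresholds are ignored.

```python
from collections import defaultdict

TOKEN = "___pffv___"  # validation token parameter named in the text above


def validation_candidates(requests: list[tuple[str, set[str]]]) -> dict[str, set[str]]:
    """Map each URL path to parameters seen only alongside the token.

    requests: (url_path, parameter_names) pairs for sampled requests.
    A parameter qualifies for a path if no request to that path carries
    the parameter without also carrying the validation token.
    """
    with_token: dict[str, set[str]] = defaultdict(set)
    without_token: dict[str, set[str]] = defaultdict(set)
    for path, names in requests:
        bucket = with_token if TOKEN in names else without_token
        bucket[path].update(names - {TOKEN})
    candidates: dict[str, set[str]] = {}
    for path, names in with_token.items():
        only_with_token = names - without_token.get(path, set())
        if only_with_token:
            candidates[path] = only_with_token
    return candidates
```

Any of the returned parameters could then serve as the application's validation parameter.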
| Path duplication threshold
Input field |
Define how many unique paths (applications) must take the parameter as input for the parameter to be regarded as global.
|
| Name grouping threshold
Input field |
Define how many global parameters with similar name patterns are required before a name pattern is generated.
|
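The grouping rule itself is not documented. The sketch below uses one simple, hypothetical similarity rule to show the mechanics: names that differ only in a trailing run of digits (item1, item2, ...) form a group. A group whose size exceeds the Name grouping threshold (matching "exceeds a configurable threshold" in the Enable global parameters name grouping description) is replaced by a single regular expression. All other names stay literal strings.

```python
import re
from collections import defaultdict


def group_parameter_names(names: list[str], name_grouping_threshold: int) -> list[str]:
    """Replace families of similar global parameter names with a pattern.

    Hypothetical similarity rule: names sharing a non-empty prefix and
    ending in digits form a group. Groups larger than the threshold become
    one anchored pattern; everything else is kept as a literal name.
    """
    literal: list[str] = []
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        m = re.fullmatch(r"(.*?)(\d+)", name)
        if m and m.group(1):
            groups[m.group(1)].append(name)
        else:
            literal.append(name)
    patterns: list[str] = []
    for prefix, members in groups.items():
        if len(members) > name_grouping_threshold:
            patterns.append(f"^{re.escape(prefix)}\\d+$")
        else:
            literal.extend(members)
    return patterns + literal
```

With a threshold of 2, `item1`, `item2` and `item3` collapse into `^item\d+$`, which also matches future names such as `item42`. With a threshold of 3, they remain literal names.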
Policy verification thresholds allow for granular control of when the Learner will generate a policy or notify by email that thresholds are reached.
| Web application policy changes threshold
Input field |
Define the upper threshold of web application policy changes.
|
| Static content policy changes threshold
Input field |
Define the upper threshold of static content policy changes.
|
| Global parameters policy changes threshold
Input field |
Define the upper threshold of global parameters policy changes.
|
| Global URL patterns policy changes threshold
Input field |
Define the upper threshold of global URL patterns policy changes.
|
| Verification runs
Input field |
The Verification runs threshold controls how many consecutive trial policies without changes exceeding the threshold values above are required before a learned policy is considered verified and ready to be committed to WAF. Verifying the policy before committing it to WAF is important because it reduces the risk of false positives.
|
To avoid learning from worms, attacks and other unauthorized access the Learner employs a combination of heuristic attack classification, statistics and server responses.
The statistical analysis is based on aggregates, deltas, minimum values, etc.
The statistics-based approval of request samples is divided into approval of:
URL Paths, based on the URL Path's group membership. This approval only affects the URL Path, not parameters and associated input values.
Parameters (query names), based on unique URL Path/Query combinations.
The approval of URL paths, applications as well as static resources, is based on the URL Path group membership.
The threshold values below control the statistics based approval of groups.
| IP addresses
Input field |
The minimum number of unique IP addresses observed requesting URL paths belonging to a group.
|
| Timestamps
Input field |
The minimum number of unique timestamps observed on requests for URL paths belonging to a group. The timestamp granularity is in seconds.
|
| Time spread
Input field |
The minimum difference in seconds between the first and the last request for URL Paths belonging to the group.
|
The threshold values below control the statistics based approval of query names.
Note that the threshold values apply to unique URL Path/Query combinations.
The term Query name refers to a request parameter name (i.e. the name in name=value).
| IP addresses
Input field |
The minimum number of unique IP addresses observed requesting the URL Path/Query combination.
|
| Timestamps
Input field |
The minimum number of unique timestamps observed requesting the URL Path/Query combination. The timestamp granularity is in seconds.
|
| Time spread
Input field |
The minimum difference in seconds between the first and the last request for the URL Path/Query combination.
|
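Both sets of thresholds (for URL Path groups and for URL Path/Query combinations) apply the same minimum check to a different set of observations. The sketch below assumes that all three minimums must be met for approval, consistent with items staying blue while any threshold is unmet. It also assumes that timestamps are whole seconds such as Unix time. The `Thresholds` record and `approved` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Minimums from the IP addresses / Timestamps / Time spread fields."""
    ip_addresses: int
    timestamps: int
    time_spread: float  # seconds


def approved(source_ips: set[str], timestamps: set[int], thresholds: Thresholds) -> bool:
    """Return True if the observations meet every minimum threshold.

    timestamps are whole seconds (e.g. Unix time), matching the stated
    one-second granularity. The same check serves a URL Path group or a
    single URL Path/Query combination.
    """
    if not timestamps:
        return False
    spread = max(timestamps) - min(timestamps)
    return (
        len(source_ips) >= thresholds.ip_addresses
        and len(timestamps) >= thresholds.timestamps
        and spread >= thresholds.time_spread
    )
```

Requiring spread in sources and in time, not just a high hit count, means a single client replaying one request many times cannot get a path or parameter approved.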
The threshold values below control the statistics-based selection of input validation classes for approved queries (above).
The term Query value refers to a request parameter value (i.e. the value in name=value).
The methods available depend on the license type.
| Determine valid input class using value frequency analysis
Check box + Input fields |
If enabled, input validation class selection will be based on the relative frequency of class samples. This method exploits the fact that in most cases valid samples vastly outnumber invalid samples (for instance attack probes not matching signatures, such as searching for o'neill to test for proper input handling in forms). Input validation classes are ranked according to the possible complexity of their input, with simple classes having the lowest rank. When input values to a parameter are learned, the values are mapped to input validation classes. The higher the rank of the class, the more general the input it accepts. When the policy is built, the class with the highest rank is chosen, provided enough samples of the class have been recorded relative to its weight in the sample population in terms of hits, unique IP sources and unique timestamps. Input fields for relative thresholds:
Source frequency threshold
Timestamp frequency threshold
Hits frequency threshold
|
| Determine valid input class using value counting
Check box + Input field |
If enabled, input validation class selection will be based on value counting. Input validation classes are ranked according to the possible complexity of their input, with simple classes having the lowest rank. When input values to a parameter are learned, the values are mapped to input validation classes. The higher the rank of the class, the more general the input it accepts. When the policy is built, the class with the highest rank is chosen, provided enough samples of the class have been recorded. This threshold is defined by Class samples required for query value.
The lower the threshold, the higher the risk that a few invalid samples will affect the class selection, resulting in a policy that is too loose. |
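The two selection methods can be contrasted in a short sketch. The class names in `CLASS_RANK` are hypothetical placeholders, ordered from simplest to most general; the product's real class list is not given here. The sketch also assumes that the relative thresholds are fractions between 0 and 1 and that "samples" means hits. Both functions walk the ranking from the top and return the first class that qualifies.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical input validation classes, ordered from simplest (lowest rank)
# to most general (highest rank). The product's real class list differs.
CLASS_RANK = ["digits", "word", "text", "anything"]


@dataclass
class ClassStats:
    hits: int
    sources: set[str]
    timestamps: set[int]


def by_value_counting(stats: dict[str, ClassStats], required_samples: int) -> Optional[str]:
    """Highest-ranked class with at least required_samples hits."""
    for cls in reversed(CLASS_RANK):
        if cls in stats and stats[cls].hits >= required_samples:
            return cls
    return None


def by_frequency(
    stats: dict[str, ClassStats],
    source_threshold: float,
    timestamp_threshold: float,
    hits_threshold: float,
) -> Optional[str]:
    """Highest-ranked class whose relative share meets all three thresholds.

    Shares are the class's fraction of all hits, of all unique sources and
    of all unique timestamps recorded for the parameter (0.0 to 1.0).
    """
    total_hits = sum(s.hits for s in stats.values())
    all_sources = set().union(*(s.sources for s in stats.values()))
    all_stamps = set().union(*(s.timestamps for s in stats.values()))
    if not total_hits or not all_sources or not all_stamps:
        return None
    for cls in reversed(CLASS_RANK):
        s = stats.get(cls)
        if s is None:
            continue
        if (s.hits / total_hits >= hits_threshold
                and len(s.sources) / len(all_sources) >= source_threshold
                and len(s.timestamps) / len(all_stamps) >= timestamp_threshold):
            return cls
    return None
```

For example, suppose a numeric parameter receives 950 numeric hits from 40 sources and 3 free-text probes from a single source. Value counting with a threshold of 3 selects the loose "text" class. Frequency analysis with 5% thresholds keeps "digits", which illustrates the risk described for low counting thresholds.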
Learn data sampling settings allow for limiting learn data sampling to specific source IP addresses or specific URL Paths. Similarly it is possible to exclude learning from IP addresses and URL Paths.
| Path - Only learn from the paths below
Check box + Input field |
If enabled the Learner will only record sample data from the URL Paths specified in the input area. In combination with very general global policies it is possible to learn and filter specific applications only.
|
| Path - Do not learn from the paths below
Check box + Input field |
If enabled the Learner will not record sample data from the URL Paths specified in the input area.
|
| IP - Only learn from the IP addresses below
Check box + Input field |
If enabled the Learner will only record sample data from the IP addresses specified in the input area.
|
| IP - Do not learn from the IP addresses below
Check box + Input field |
If enabled the Learner will not record sample data from the IP addresses specified in the input area.
|
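The four sampling filters combine as a simple gate applied before a request is recorded. The sketch assumes that path entries match as prefixes and that IP entries may be single addresses or CIDR networks. Neither matching rule is documented, and `should_sample` is a hypothetical name. An "only" list of `None` stands for an unchecked check box.

```python
import ipaddress
from typing import Optional, Sequence


def should_sample(
    path: str,
    ip: str,
    only_paths: Optional[Sequence[str]] = None,
    skip_paths: Sequence[str] = (),
    only_ips: Optional[Sequence[str]] = None,
    skip_ips: Sequence[str] = (),
) -> bool:
    """Decide whether a request is recorded as learning data.

    None for an "only" list means that check box is disabled. Paths are
    matched as prefixes; IP entries may be addresses or CIDR networks.
    Both matching rules are assumptions for this sketch.
    """
    addr = ipaddress.ip_address(ip)

    def in_networks(entries: Sequence[str]) -> bool:
        return any(addr in ipaddress.ip_network(e, strict=False) for e in entries)

    if only_paths is not None and not any(path.startswith(p) for p in only_paths):
        return False
    if any(path.startswith(p) for p in skip_paths):
        return False
    if only_ips is not None and not in_networks(only_ips):
        return False
    if in_networks(skip_ips):
        return False
    return True
```

Combining an "only" path list with very general global policies corresponds to the use case described above: learning and filtering specific applications only.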
| Build policy
Button |
Builds a policy which is accessible and editable in the Global Patterns and Web applications windows. When clicked, a confirmation dialog is shown with the question: "Disable data sampling and switch to detect mode when a policy is generated?" Select Cancel if the Learner should continue the data sampling and learning process, or OK if you want the Learner to switch to detect mode. If Cancel is selected, the built policy will have no effect, but the editing and reporting tools will be available. |
| Reset learn data
Button |
Use with caution! When you click this button and accept the confirmation pop-up window, all learning data for that proxy will be deleted! If learning is enabled, the learning and data sampling process will start from scratch. |
| Default values |
Revert to default values. |
| Save settings |
Click to save settings. |