3. Learning

The Learner builds a complete profile of the web site, including static requests, web applications and input parameters, by analyzing incoming requests.

To avoid learning from worms, attacks and other unauthorized access the Learner employs a combination of heuristic attack classification, statistics and server responses.

When learning is enabled for the website, the Learner keeps analyzing requests until no changes to the resulting policy are recorded. That is, for every 10,000 requests the Learner builds a trial policy, compares it to the previous trial policy and records the number of changes. When a configurable number of consecutive trial policies (default 30) has been built without the number of changes between builds exceeding a configurable threshold (default 0), a policy is built.
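The stopping rule described above can be sketched as follows. This is a minimal illustration and not the Learner's actual implementation: the function name `runs_until_verified` and its arguments are hypothetical, and the number of changes recorded for each trial policy is assumed to be supplied by the caller.

```python
from typing import Iterable, Optional


def runs_until_verified(changes_per_run: Iterable[int],
                        required_runs: int = 30,
                        change_threshold: int = 0) -> Optional[int]:
    """Return the 1-based trial build at which a policy would be built.

    Returns None if the required streak of unchanged builds is never reached.
    """
    streak = 0
    for run_number, changes in enumerate(changes_per_run, start=1):
        if changes > change_threshold:
            streak = 0          # too many changes: start counting again
        else:
            streak += 1         # one more trial build in a row within the threshold
        if streak >= required_runs:
            return run_number
    return None
```

With `required_runs=3`, the change counts 3, 0, 0, 0 lead to a policy at the fourth trial build, while 0, 0, 1, 0, 0 never reach the required streak.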

By default the Learner is configured to generate a short yet fine-grained policy. This is achieved by identifying global characteristics of the web site and generating global patterns matching those characteristics. The global patterns typically account for the majority of the web system's content and applications, leaving only the "real" web applications to be accounted for by specific web application policy entries.

3.1. Learning data

3.1.1. Applications learned

Applications learned is shown as a 3-level expandable table.

Applications learned

Expandable: Click + to expand.

Expands 2 levels.

Group, URL path and details.

Application group (level 1)

Applications are divided into groups based on path characteristics. The group name reflects the characteristics of the group. The most common grouping criterion is the file extension, but the appearance of special characters like '$' or '.' in the path is also used as a grouping criterion.

Applications URL paths (level 2)

When a group is expanded, the URL paths in that group are listed. Each URL path is an application learned. Note that this list also contains "simple" applications (applications that only take global parameters as input) and can therefore be very long.

Application details (level 3)

When an application URL path is expanded, the details learned about that specific application are shown.

Paths

Number of unique URL Paths in the group.

Applies to: Group level (1).

Param

Number of parameters the application takes as input.

If a blue number in parentheses is shown to the left of the number, it indicates how many of the learned parameters are approved based on the configurable Learner thresholds.

Parameters that do not exceed one or more threshold values are colored blue, while trusted parameter names are black.

Applies to: URL path level (2).

Class

Name of input validation class mapped to a parameter.

If the parameter is not trusted yet, the class name is blue.

Applies to: Detail level (3).

Source

Number of unique IP-addresses requesting the resource.

Applies to: Group (1), URL path (2) and Detail level (3).

Time

Number of unique timestamps in requests for the resource.

Applies to: Group (1), URL path (2) and Detail level (3).

ΔTime (delta time)

Time difference between the first and last observed request for the resource.

Applies to: Group (1), URL path (2) and Detail level (3).

3.1.1.1. Deleting applications or corresponding parameters

To delete a learned application or a corresponding parameter expand to the level desired and click the red X.

3.1.2. Global parameters learned

The Global parameters learned section shows all parameters observed on a number of paths that exceeds the Learner setting Global parameters Path duplication threshold.

Note that the list also includes observed parameter names which are still pending approval based on the Learner threshold settings. The number of approved, or trusted, observations is indicated with a black number, while a blue number shows the number of non-approved observations.

Global parameters

Expandable: Click + to expand.

Expands 1 level.

Group, URL path and details.

Parameter name (level 1)

Name of the parameter

Applications URL paths (level 2)

When a global parameter is expanded, a list of the URL paths observed taking the parameter as input is shown.

Class

Name of input validation class mapped to a parameter.

Applies to: Parameter name level (1).

Paths

Number of unique URL Paths observed using the parameter.

Applies to: Parameter name level (1).

Pending

Number of unique URL Paths using the parameter where the parameter name is not yet approved, that is, where the threshold values have not yet been reached.

Applies to: Parameter name level (1).

Trusted

Number of unique URL Paths using the parameter where the parameter name is approved, that is, where the threshold values have been reached.

Applies to: Parameter name level (1).

3.1.3. Static content learned

This section shows all URL Paths to static resources learned. URL Paths are grouped by their extension.

Static content learned

Expandable: Click + to expand.

Expands 1 level.

Extension and URL Paths learned.

Extension (level 1)

The static content policy is based on allowing extensions and URL Paths based on the characters in the URL path.

To be included in the static content policy, static resources must therefore have a file extension. Cases where natural URLs point to static content are handled by the Learner by building Global URL policies.

Static content URL paths (level 2)

When an extension is expanded, the URL paths in that extension group are listed.

Paths

Number of unique URL Paths in the extension group.

Applies to: Extension level (1).

Source

Number of unique IP-addresses requesting the resource.

Applies to: Extension (1) and URL path level (2).

Time

Number of unique timestamps in requests for the resource.

Applies to: Extension (1) and URL path level (2).

ΔTime (delta time)

Time difference between the first and last observed request for the resource.

Applies to: Extension (1) and URL path level (2).

3.1.3.1. Deleting static content extensions

To delete a static content extension (a group) click the red X in the list.

3.1.4. Tools

This contains tools for tidying the learning data set.

Delete querys by name wildcard

Input field

Delete learned parameter names using simple wildcard matching.

Valid input

A string or a simple wildcard.

Use the following characters to specify wildcards:

* = any string any length.

? = one occurrence of any character.

Input example

http://* - matches all querys (parameter names) beginning with http://

Default value

<none>

Preview displays parameter names matching the wildcard below the input field.

Delete performs deletion of parameters matching wildcard.
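The wildcard syntax above (* and ? only) can be illustrated with a short Python sketch. It is an illustration of the matching rules as described, not the product's implementation; the helper names `wildcard_to_regex` and `matching_names` are hypothetical, and matching against the whole parameter name is an assumption.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a simple wildcard into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # any string, any length
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def matching_names(pattern, names):
    """Return the parameter names a Preview would list (assumed: whole-name match)."""
    regex = wildcard_to_regex(pattern)
    return [name for name in names if regex.fullmatch(name)]
```

For example, `matching_names("http://*", ["http://example.com/x", "print"])` selects only the first name.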

Delete querys by data

Input field

Delete learned parameter names using matching occurrence data.

Source

Number of IP addresses requesting the resource.

Valid input

number in range 0 -

Input example

10 - Querys requested by 10 or less IP addresses.

Default value

<none>

Time

Number of unique timestamps in requests for the resource.

Valid input

number in range 0 -

Input example

10 - Querys requested in a maximum of 10 intervals of 1 second.

Default value

<none>

ΔTime (delta time)

Time difference between the first and last recorded request for the resource.

Valid input

Time interval specified in seconds.

number in range 0 -

Input example

86400 - Querys with a recorded difference between first and last request of maximum 24 hours (24 * 60 * 60).

Default value

<none>

Preview displays parameter names matching search criteria below the input fields.

Delete performs deletion of parameters matching search criteria.
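As a rough sketch of how the three occurrence criteria could select parameters, consider the following. The class `LearnedParameter` and the function `select_for_deletion` are hypothetical names. The sketch also makes two assumptions: that criteria left empty are ignored and that the criteria entered are combined with a logical AND. The manual does not state how the product combines them.

```python
from dataclasses import dataclass


@dataclass
class LearnedParameter:
    name: str
    sources: int      # unique IP addresses requesting the resource
    timestamps: int   # unique one-second timestamps
    delta_time: int   # seconds between first and last recorded request


def select_for_deletion(params, max_sources=None, max_timestamps=None,
                        max_delta_time=None):
    """Return names of parameters at or below every limit that was entered."""
    limits = (max_sources, max_timestamps, max_delta_time)
    if all(limit is None for limit in limits):
        return []  # no criteria entered: select nothing

    def within(value, limit):
        return limit is None or value <= limit

    return [p.name for p in params
            if within(p.sources, max_sources)
            and within(p.timestamps, max_timestamps)
            and within(p.delta_time, max_delta_time)]
```

Entering 10 for Source and 86400 for ΔTime would, under these assumptions, select parameters requested by 10 or fewer IP addresses within a span of at most 24 hours.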

3.1.5. Lower button bar

The lower button bar contains the following buttons.

Re-analyze data

Button

To see the effect of deleting selected learning data in the resulting policy section, click this button. Wait a few seconds and reload the page.

Reset learn data

Button

Use with caution!

When this button is clicked and the confirmation pop-up window is accepted, all learning data for that proxy will be deleted!

If learning is enabled the learning and data sampling process will start from scratch.

3.2. Learning status

3.2.1. Learning progress indicators

The two bars at the top of the page indicate the current state of sampling and verification.

The Learner works in two stages when profiling the website.

  1. Data sampling

    This is the process of collecting information about the website in terms of which paths/applications are used, which parameters they take as input, which extensions are used for static content, etc.

  2. Verification

    The verification process 1) validates the data samples using statistical methods like analyzing spread in IP sources and time, number of requests, etc. and 2) verifies that the resulting policy covers the requests sampled.

    Because the Web Security Manager Learner extracts characteristics like extensions, specific directories in paths and global parameters (parameter names that a number of applications take as input, like print=1), and even patterns used in global parameters, the verification process may start before the Data sampling progress has reached 100%.

    Verification is calculated as the number of sample runs in a row with no policy changes relative to the required number configured in learner settings.

    When Verification has reached 100% Web Security Manager will either build and commit a new policy or notify the administrator by email that verification has reached 100% and a new policy can be built and committed.
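The Verification value described above is a simple ratio. A minimal sketch, assuming the value is expressed as a percentage and capped at 100 (the function name is hypothetical):

```python
def verification_percent(unchanged_runs_in_a_row, required_runs):
    """Sample runs in a row without policy changes, relative to the required number."""
    if required_runs <= 0:
        raise ValueError("required_runs must be positive")
    return min(100.0, 100.0 * unchanged_runs_in_a_row / required_runs)
```

With 30 required runs, 15 sample runs in a row without policy changes correspond to 50%.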

3.2.2. Policy history

When a new policy is generated and committed, either automatically or manually, it is added to the Policy history list.

Policy history

The policy number.

Type

Automatic or manual (requested by an administrator user).

Changes

Click link to see resulting policy and changes compared to the former (if any).

Sample run

The sample run number at which the policy was generated.

Web Apps

The number of entries in the Web Application Policy.

Global URLs

The number of entries in the Global URL Policy.

Global Parms

The number of entries in Global Parameter Policy.

Static

The number of static file type entries in the Static Content Policy.

3.2.3. Resulting policy

This section shows a sample of the policy resulting from the Learner settings currently in effect.

When the settings are changed, the resulting policy sample is rebuilt using the new threshold values. This is done as a background job and, depending on the load on the Web Security Manager node and the complexity of the sample data, it may take anywhere from a few seconds to a minute or two to build the policy. If the new policy is not visible yet, wait a while and refresh the window.

Commit to WAF

Button

Builds a policy which is accessible and editable in the Global Patterns and Web applications windows.

When clicked the policy displayed in the table will be committed to the WAF engine, that is: made active for filtering requests.

If policy verification has not reached 100%, a warning message (two, actually) will be displayed asking you to confirm the action. Remember: Web Security Manager is a white-list based web application firewall. If the policy put into production does not match real-life requests, building the policy prematurely (not fully verified) is likely to result in false positives. If verification has not reached 100%, it has not been verified that the policy does not generate false positives. Have patience and wait for verification to reach 100%.

Web applications

Expandable: Click + to expand.

Learned web applications.

Expand the item to get a list of applications learned. For each application the following is shown:

  • URL path

  • Methods learned

  • Parameters.

    Parameters are shown as name, value pairs where the value is the name of the input validation class learned for that parameter.

    Note that only the application's private parameters are shown here. Parameters which the application has in common with other applications are included in the Global parameters list.

Global URL patterns

Expandable: Click + to expand.

Global URL Path Policy built from learned applications.

For each application group (see Section 3.1.1, “Applications learned”) a regular expression is built which matches all samples in that specific group.

Most CMS based web systems have a number of global parameters, like for instance print=1, which can be appended to most requests. Without the combination of Global URL Path Policy and Global Parameters Policy pages with static content that take global parameters, like index.php?print=1, would be learned as web applications and the URL paths would have to be added to the policy as web applications. This can potentially result in a huge policy which is never up to date because new content is added all the time.

By making global policies that account for all the static content which is served dynamically only "real" web applications with a number of private parameters have to be mapped in detail.

Thus the global patterns allow for building a condensed, yet fine-grained, policy which also accounts for future standard content added to the web site.

Global parameters

Expandable: Click + to expand.

Global Parameters Policy built from learned applications.

Displayed in the format:

name = value

Depending on the Name grouping threshold value the name can either be a literal string or a regular expression matching a number of parameter names with name and value similarities.

The value is displayed as a class name. When the policy is built the corresponding regular expression will be used.

Static content allowed extensions

Expandable: Click + to expand.

Learned static path extensions which will be allowed.

Static content path allowed characters

Expandable: Click + to expand.

Unique characters and character classes (like 'A' - all international word characters) learned from static path samples.

The regular expression built to match requests for static content is also shown. Note the last set of parentheses preceded by an escaped period: \.(\w+). This part will be matched against the list of allowed extensions to determine if the extension is allowed.
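The two-step check described above (match the path, then look up the captured extension) can be sketched in Python. The character class in `STATIC_PATH` and the contents of `ALLOWED_EXTENSIONS` are made-up examples; the expression actually built by the Learner depends on the characters and extensions learned for the web site.

```python
import re

# Hypothetical learned data: allowed extensions and allowed path characters.
ALLOWED_EXTENSIONS = {"gif", "jpg", "css", "js"}
STATIC_PATH = re.compile(r"[\w/-]+\.(\w+)")  # ends with the \.(\w+) extension group


def static_request_allowed(path):
    """Allow a path only if it matches the pattern and its extension is listed."""
    match = STATIC_PATH.fullmatch(path)
    if match is None:
        return False                      # disallowed characters or no extension
    return match.group(1) in ALLOWED_EXTENSIONS
```

Under these example values `/images/logo.gif` is allowed, while `/images/logo.php` (extension not listed) and `/images/logo` (no extension) are not.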

3.2.4. Sample run information

The Learner analyzes request samples in chunks of approximately 10,000 requests (or more if the system is very busy). For each sample run an entry is added to the Sample run information table, which shows total and delta values summarizing the learning process.

Sample run

The sample run number.

Hits total

The total number of hits processed during the learning process.

URL paths

Total number of unique URL paths identified.

Parameters

Total number of unique parameter names identified. Uniqueness is determined by URL path. Two parameters with the same name but mapped as belonging to different URL paths are therefore identified as two unique parameters. When the policy is built, Web Security Manager identifies parameters with similar names and input data as global in scope and builds global patterns matching such parameters.
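The counting rule above amounts to treating each URL path and parameter name combination as one unit. A small illustration, using a hypothetical function name and made-up observations:

```python
def count_unique_parameters(observations):
    """Count parameters as unique (URL path, parameter name) combinations."""
    return len(set(observations))


# Made-up observations: (URL path, parameter name) per request parameter.
observations = [
    ("/shop/cart.php", "id"),
    ("/news/view.php", "id"),   # same name, different path: counted separately
    ("/shop/cart.php", "id"),   # repeat of the first combination: not counted again
    ("/shop/cart.php", "qty"),
]
```

Here the name `id` contributes two unique parameters because it is mapped to two different URL paths.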

Changes

When the chunk of raw sample data has been processed Web Security Manager builds a policy based on the total sample population. This policy is compared to the policy built in the last sample run and changes are recorded.

The number shown is the sum of the changes recorded to the Web Application Policy (ACL), Global URL Policy (GURL), Global Parameter Policy (GParm) and the Static Content Policy (EXT).

Click on the number shown to get a change report detailing the changes.

ACL

The number of changes to the Web Application Policy compared to the sample run before.

GURL

The number of changes to the Global URL Policy compared to the sample run before.

Gparm

The number of changes to Global Parameter Policy compared to the sample run before.

Ext

The number of changes to the Static Content Policy compared to the sample run before.

Note

The number of policy changes recorded is calculated with the Learner settings in effect when the sample data is analyzed. Whereas the resulting policy (below) is recalculated when the Learner settings are changed, this is not the case with the sample run policy builds. It is therefore possible that the two sections show different results. The next sample run is run using the new settings.

3.2.5. Lower button bar

The lower button bar contains the following buttons.

Re-analyze data

Button

To see the effect of deleting selected learning data in the resulting policy section, click this button. Wait a few seconds and reload the page.

Reset learn data

Button

Use with caution!

When this button is clicked and the confirmation pop-up window is accepted, all learning data for that proxy will be deleted!

If learning is enabled the learning and data sampling process will start from scratch.

3.3. Learning settings

3.3.1. Policy generation options

Learning

Drop down list

Enable/disable learning

Develop static extensions list

Check box

Enable / disable static extensions learning.

If enabled, Web Security Manager will treat static content separately and develop a static content policy from learned static content.

Default: <enabled>

See Section 1.3.1, “Validate static requests separately” for more information

Enable global parameters generation

Check box

Enable / disable global parameters generation

If enabled, Web Security Manager will identify parameters which many or all learned applications have in common. If a (configurable) number of applications takes a specific parameter as input the parameter will be learned as a global parameter and added to the Global Parameters Policy (Section 1.3.4, “Query and Cookie validation”).

Default: <enabled>

Enable global parameters name grouping

Check box

Enable / disable global parameters name grouping.

If enabled, Web Security Manager™ will analyze the global parameter names to identify name similarities and build parameter groups based on common characteristics.

If the number of parameter names in a group exceeds a configurable threshold a parameter name pattern will be built matching all parameter names in the group.

Grouped parameter names with corresponding input validation classes are inserted in the Global Parameters Policy (Section 1.3.4, “Query and Cookie validation”).

Default: <enabled>

Enable autostart of policy generation

Check box

Enable / disable autostart of policy generation.

If enabled, a policy will be generated when a (configurable) number of sample data chunks has been processed without the number of policy changes exceeding the thresholds. The operating mode will then automatically be changed to Detect and the Learner will stop collecting data samples.

Default: <enabled>

Avoid learning from broken bots

Check box

Enable / disable checks for broken robots.

If enabled, Web Security Manager will try to identify requests originating from robots that do not behave correctly. An example is a robot that maps the URL /index.asp?page=8&print=1 but for some reason translates the print parameter to &amp;print=1 when requesting it. Because the parameter &amp;print is not generally requested from many different sources, it will never exceed the threshold values and consequently will not be included in the policy, but it is annoying to look at.

Default: <enabled>

Learn from hostile sources (IPs)

Check box

Enable / disable exclusion of sample data from hostile sources.

If disabled, requests from sources that also appear in the deny log with entries classified as attacks will not be included in the sample population used for generating the policy.

As all learning samples are validated against negative policy rules, obvious attacks will not be included regardless of the setting of this option. But as attackers often "sneak around" trying different probes to detect vulnerability to, for instance, SQL injection (entering O'Neill instead of 'or 1=1), chances are that the classes mapped for input validation become looser than they have to be. Disabling learning from hostile sources reduces the likelihood that this will happen.

Default: <disabled>

Auto enable request origin validation (CSRF protection)

Check box

Enable / disable automatic activation of request origin validation.

If Session protection and generation of request form validation tokens (CSRF protection) is enabled (see Section 1.3.7, “Session and CSRF protection”), the Learner will map applications taking input from forms generated by the web system by detecting the validation token parameter (___pffv___) inserted by Web Security Manager and correlating other input parameters to the presence of the validation token.

If parameters are detected that are only present in requests where the validation token is also present (like "amount" or "submit") then the application is created in the web applications policy with one of these parameters as "validation parameter" - that is: a parameter which when present in requests will trigger a validation of the request form origin based on the validation token which is tied to the current user session.

If Auto enable request origin validation (CSRF protection) is enabled the Learner will map the validation parameter and enable origin validation (CSRF protection) for that application. If disabled the Learner will only map the validation parameter.

Default: <disabled>
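The correlation described above, finding parameters that only occur together with the validation token, can be sketched as a set operation. This is an illustration under simplifying assumptions, not the Learner's algorithm: each request is reduced to the set of its parameter names, no statistical thresholds are applied, and the function name is hypothetical.

```python
TOKEN = "___pffv___"  # validation token parameter inserted by Web Security Manager


def validation_parameter_candidates(requests):
    """Return parameter names seen only in requests that also carry the token."""
    seen_with_token = set()
    seen_without_token = set()
    for params in requests:
        names = set(params)
        if TOKEN in names:
            seen_with_token |= names - {TOKEN}
        else:
            seen_without_token |= names
    return seen_with_token - seen_without_token
```

For requests with the parameter sets {page}, {page, amount, submit, ___pffv___} and {page, print}, the candidates are amount and submit, since page also occurs without the token.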

Keep validation settings for enabled applications

Check box

Enable / disable automatic overwriting of request origin validation activation settings.

This input is only active when Auto enable request origin validation (CSRF protection) is disabled.

Suppose you want the Learner to map validation parameters for request origin validation but only want to activate it for certain applications. To prevent the Learner from overwriting the activation settings for the activated applications the next time it develops a policy, activate this control.

Default: <disabled>

3.3.2. Global parameters

Path duplication threshold

Input field

Define how many unique paths (applications) are required to take the parameter as input for the parameter to be regarded as global.

Valid input

Number of paths

Input example

5

Default value

3
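A minimal sketch of how the Path duplication threshold could be applied is shown below. The function name and the input format are hypothetical. The sketch treats the threshold as a minimum number of unique paths; whether the product compares with "at least" or "more than" is an assumption here.

```python
from collections import defaultdict


def global_parameters(observations, path_duplication_threshold=3):
    """Return names of parameters taken as input by enough unique paths.

    observations: iterable of (URL path, parameter name) tuples.
    """
    paths_by_name = defaultdict(set)
    for path, name in observations:
        paths_by_name[name].add(path)
    return {name for name, paths in paths_by_name.items()
            if len(paths) >= path_duplication_threshold}
```

With the default of 3, a parameter such as print that is observed on three different paths is regarded as global, while a parameter seen on a single path stays private to that application.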

Name grouping threshold

Input field

Define how many occurrences of global parameters with similar name patterns are required for a name pattern to be generated.

Valid input

Number of parameters

Input example

5

Default value

3

3.3.3. Policy verification

Policy verification thresholds allow for granular control of when the Learner will generate a policy or notify by email that thresholds are reached.

Web application policy changes threshold

Input field

Define the upper threshold of web application policy changes.

Valid input

Number of changes to the web application policy.

Input example

0

Default value

0

Static content policy changes threshold

Input field

Define the upper threshold of static content policy changes.

Valid input

Number of changes to the static content policy.

Input example

0

Default value

0

Global parameters policy changes threshold

Input field

Define the upper threshold of global parameters policy changes.

Valid input

Number of changes to the global parameters policy.

Input example

0

Default value

0

Global URL patterns policy changes threshold

Input field

Define the upper threshold of global URL patterns policy changes.

Valid input

Number of changes to the global URL patterns policy.

Input example

0

Default value

0

Verification runs

Input field

The Verification runs threshold controls how many consecutive trial policies without changes exceeding the policy change threshold values are required before a learned policy is considered verified and ready to be committed to WAF.

The process of verifying the policy before committing to WAF is important because it reduces the risk of false positives.

Valid input

Number of trial policies built in a row without changes.

Input example

30

Default value

20
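How the four policy change thresholds could decide whether a single trial policy counts towards the Verification runs total is sketched below. The names `PolicyChanges` and `run_counts_as_unchanged` are hypothetical, and the sketch assumes that every one of the four thresholds must be respected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyChanges:
    web_applications: int = 0
    static_content: int = 0
    global_parameters: int = 0
    global_urls: int = 0


def run_counts_as_unchanged(changes, thresholds=PolicyChanges()):
    """True if no policy part changed more than its upper threshold allows."""
    return (changes.web_applications <= thresholds.web_applications
            and changes.static_content <= thresholds.static_content
            and changes.global_parameters <= thresholds.global_parameters
            and changes.global_urls <= thresholds.global_urls)
```

With all thresholds at the default of 0, a single change to any of the four policies means the trial policy does not count as unchanged.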

3.3.4. Learning thresholds

To avoid learning from worms, attacks and other unauthorized access the Learner employs a combination of heuristic attack classification, statistics and server responses.

The statistical analysis is based on aggregates, delta, min values etc.

The statistics-based approval of request samples is divided into approval of:

  1. URL Paths based on the URL Paths group membership.

    This approval only affects the URL Path, not parameters and associated input values.

  2. Parameters

3.3.4.1. Path groups

The approval of URL paths, applications as well as static resources, is based on the URL Path group membership.

The threshold values below control the statistics based approval of groups.

IP addresses

Input field

The minimum number of unique IP addresses observed requesting URL paths belonging to a group.

Valid input

Number

Input example

500

Default value

100

Timestamps

Input field

The minimum number of unique timestamps observed on requests for URL paths belonging to a group.

The timestamp granularity is in seconds.

Valid input

A number in the interval 0 - 9999999999.

Input example

300

Default value

100

Time spread

Input field

The minimum difference in seconds between the first and the last request for URL Paths belonging to the group.

Valid input

A number in the interval 0 - 9999999999.

Input example

259200 (3 days)

Default value

36000 (ten hours)
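The three path group thresholds can be read as a simple approval check. The sketch below uses the default values from this section; the function name is hypothetical, and requiring all three minimums to be met at once is an assumption.

```python
def group_approved(unique_ips, unique_timestamps, first_seen, last_seen,
                   min_ips=100, min_timestamps=100, min_time_spread=36000):
    """Approve a path group when all three minimum thresholds are met.

    first_seen and last_seen are timestamps in seconds.
    """
    time_spread = last_seen - first_seen
    return (unique_ips >= min_ips
            and unique_timestamps >= min_timestamps
            and time_spread >= min_time_spread)
```

A group requested from 150 IP addresses at 120 different seconds would still not be approved if all requests arrived within one hour, because the time spread of 3600 seconds is below the default of 36000.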

3.3.4.2. Query names

The threshold values below control the statistics based approval of query names.

Note that the threshold values apply to unique URL Path/Query combinations.

The term Query name refers to a request parameter name (i.e. the name part of name=value).

IP addresses

Input field

The minimum number of unique IP addresses observed requesting the URL Path/Query combination.

Valid input

Number

Input example

100

Default value

100

Timestamps

Input field

The minimum number of unique timestamps observed requesting the URL Path/Query combination.

The timestamp granularity is in seconds.

Valid input

A number in the interval 0 - 9999999999.

Input example

100

Default value

100

Time spread

Input field

The minimum difference in seconds between the first and the last request for the URL Path/Query combination.

Valid input

A number in the interval 0 - 9999999999.

Input example

36000 (ten hours)

Default value

36000 (ten hours)

3.3.4.3. Input validation class selection for query values

The threshold values below control the statistics-based selection of input validation classes for approved Querys (above).

The term Query value refers to a request parameter value (i.e. the value part of name=value).

The methods available depend on the license type.

Determine valid input class using value frequency analysis

Check box +

Input fields

If enabled, input validation class selection will be based on the relative frequency of class samples.

This method exploits the fact that in most cases valid samples will vastly outnumber invalid samples (for instance attack probes not matching signatures, such as searching for o'neill to test for proper input handling in forms).

Input validation classes are ranked according to possible complexity in input with simple classes having the lowest rank.

When input values to a parameter are learned the values are mapped to input validation classes. The higher the rank of the class the more general input is accepted.

When the policy is built, the class with the highest rank is chosen provided enough samples of the class have been recorded with respect to its relative weight in the sample population in terms of hits, unique IP sources and unique timestamps.

Input fields for relative thresholds:

Source frequency threshold

Valid input

A value in the interval 0.0 - 99.9

Input example

1.5

Default value

1.0

Timestamp frequency threshold

Valid input

A value in the interval 0.0 - 99.9

Input example

1.5

Default value

1.0

Hits frequency threshold

Valid input

A value in the interval 0.0 - 99.9

Input example

1.5

Default value

1.0
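The frequency-based selection can be sketched as follows. Several points are assumptions made for illustration: the three thresholds are read as percentages of the sample population, the totals are taken as simple sums over the classes observed, and the class names, ranks and the names `ClassStats` and `select_class` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClassStats:
    name: str
    rank: int         # higher rank = more general input accepted
    hits: int
    sources: int      # unique IP sources
    timestamps: int   # unique timestamps


def select_class(stats, source_pct=1.0, timestamp_pct=1.0, hits_pct=1.0):
    """Pick the highest-ranked class whose relative frequencies meet all thresholds."""
    def share(part, whole):
        return 100.0 * part / whole if whole else 0.0

    total_hits = sum(s.hits for s in stats)
    total_sources = sum(s.sources for s in stats)
    total_timestamps = sum(s.timestamps for s in stats)
    eligible = [s for s in stats
                if share(s.hits, total_hits) >= hits_pct
                and share(s.sources, total_sources) >= source_pct
                and share(s.timestamps, total_timestamps) >= timestamp_pct]
    return max(eligible, key=lambda s: s.rank).name if eligible else None
```

If a handful of probes map to a general text class while thousands of requests map to a numeric class, the text class stays below the 1.0 thresholds and the stricter numeric class is selected.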

Determine valid input class using value counting

Check box +

Input field

If enabled input validation class selection will be based on value counting.

Class samples required for query value

Input validation classes are ranked according to possible complexity in input with simple classes having the lowest rank.

When input values to a parameter are learned the values are mapped to input validation classes. The higher the rank of the class the more general input is accepted.

When the policy is built, the class with the highest rank is chosen provided enough samples of the class have been recorded. This threshold is defined by Class samples required for query value.

Valid input

A number in the interval 0 - 9999999999.

Input example

10

Default value

1

The lower the threshold selected, the higher the risk that a few invalid samples will affect the class selection, resulting in a policy that is too loose.
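The effect of the Class samples required for query value threshold can be illustrated with a short sketch. The class names, ranks and the function name are hypothetical.

```python
def select_class_by_count(samples_by_class, rank, required_samples=1):
    """Pick the highest-ranked class with at least the required number of samples."""
    eligible = [name for name, count in samples_by_class.items()
                if count >= required_samples]
    return max(eligible, key=lambda name: rank[name]) if eligible else None


# Made-up sample population: 5000 numeric values and 2 free-text probes.
samples = {"numeric": 5000, "text": 2}
rank = {"numeric": 1, "text": 5}
```

With the default of 1, the two free-text samples are enough to select the more general text class; raising the threshold to 10 keeps the stricter numeric class.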

3.3.5. Learn data sampling

Learn data sampling settings allow for limiting learn data sampling to specific source IP addresses or specific URL Paths. Similarly it is possible to exclude learning from IP addresses and URL Paths.

Path - Only learn from the paths below

Check box +

Input field

If enabled the Learner will only record sample data from the URL Paths specified in the input area.

In combination with very general global policies it is possible to learn and filter specific applications only.

Valid input

One or more URL path regular expressions separated by new-line.

Input example

/cgi-bin/.*

Default value

<none>

Path - Do not learn from the paths below

Check box +

Input field

If enabled the Learner will not record sample data from the URL Paths specified in the input area.

Valid input

One or more URL path regular expressions separated by new-line.

Input example

/admin/.*

Default value

<none>
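The two path filters can be combined as in the sketch below. The function name is hypothetical, `None` stands for a disabled check box, and matching each expression against the whole URL path is an assumption; the manual does not state how the expressions are anchored.

```python
import re


def should_sample_path(path, only_learn_from=None, do_not_learn_from=None):
    """Decide whether a request path is recorded as sample data."""
    if only_learn_from is not None:
        if not any(re.fullmatch(rx, path) for rx in only_learn_from):
            return False   # "only learn from" is enabled and nothing matched
    if do_not_learn_from is not None:
        if any(re.fullmatch(rx, path) for rx in do_not_learn_from):
            return False   # path is explicitly excluded
    return True
```

With `only_learn_from=["/cgi-bin/.*"]`, a request for /cgi-bin/form.pl is sampled while /index.html is not.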

IP - Only learn from the IP addresses below

Check box +

Input field

If enabled the Learner will only record sample data from the IP addresses specified in the input area.

Valid input

IP address with net mask (IP/mask) in CIDR notation

Input example

192.168.0.8/32 - the IP address 192.168.0.8

192.168.0.0/24 - IP addresses 192.168.0.0 - 255

192.168.0.8/29 - IP addresses 192.168.0.8-15

Default value

<none>

IP - Do not learn from the IP addresses below

Check box +

Input field

If enabled the Learner will not record sample data from the IP addresses specified in the input area.

Valid input

IP address with net mask (IP/mask) in CIDR notation

Input example

192.168.0.8/32 - the IP address 192.168.0.8

192.168.0.0/24 - IP addresses 192.168.0.0 - 255

192.168.0.8/29 - IP addresses 192.168.0.8-15

Default value

<none>
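The CIDR input examples can be checked with Python's standard `ipaddress` module. The helper `source_in_networks` is a hypothetical name used for illustration only.

```python
import ipaddress


def address_range(cidr):
    """Return the first and last IP address covered by a CIDR block."""
    network = ipaddress.ip_network(cidr)
    return str(network[0]), str(network[-1])


def source_in_networks(ip, cidrs):
    """True if the source IP falls inside any of the listed CIDR blocks."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidrs)
```

For example, `address_range("192.168.0.8/29")` covers 192.168.0.8 through 192.168.0.15, so a source of 192.168.0.16 falls outside that block.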

3.3.6. Lower button bar

Build policy

Button

Builds a policy which is accessible and editable in the Global Patterns and Web applications windows.

When clicked a confirm dialog is shown with the question:

"Disable data sampling and switch to detect mode when a policy is generated?"

Select cancel if the Learner should continue the data sampling and learning process and OK if you want the Learner to switch to detect mode.

If cancel is selected, the built policy will have no effect, but the editing and reporting tools will be available.

Reset learn data

Button

Use with caution!

When this button is clicked and the confirmation pop-up window is accepted, all learning data for that proxy will be deleted!

If learning is enabled the learning and data sampling process will start from scratch.

Default values

Revert to default values.

Save settings

Click Save settings to save settings.

© 2005 - 2012 Alert Logic inc.