Access Control Policies define the requests and parameters that are allowed for a given web system whose access is filtered by Profense™.
An Access Policy or ACL (Access Control List) is defined by a collection of proxy global policies and application specific policies. This mix makes it possible to specify short yet fine-grained access control policies:
Proxy global policies: general rules which specify criteria for allowing requests on a proxy global basis. Rules are specified by extension and by a grammar (regular expressions) for valid URLs and parameters.
Global patterns include Static content policies, Global URL policies and Global parameters policies.
Application specific policies: in access policy terms, a web application is defined as a URL path which takes one or more parameters as input.
The web application policy list consists of one or more URL paths, each with a specific policy: a web application policy entry.
A web application policy entry is defined by its URL path. Valid input for one or more of the URL's parameters is defined using a list of allowed values, a grammar (a regular expression) or a class, which is a predefined regular expression.
Web application policy entries always take precedence over global rules. It is, however, perfectly possible to use a mix of global and specific rules, even for a single application.
Incoming requests are validated in the following order:
1. Static content policy
If the extension and path of the requested file name match the static content policy and the request has no parameters, the request is allowed.
2. Global URL path policy
If the request has no parameters and matches one of the global URL policy patterns, it is allowed.
3. Web applications policy
If the request (including any parameters) matches an entry in the detailed web application policy, it is allowed.
4. Web applications policy + global parameters policy
If a request matches an entry in the web applications policy but one or more parameters are offending, these parameters are checked against the global parameters policy. If there is a combined match, the request is allowed.
5. Global URL policy + global parameters policy
If a requested URL with parameters matches a global URL policy pattern and all supplied parameters match global parameter patterns, the request is allowed.
6. Global URL policy + global parameters + negative check for unknown parameters (Auto mode only)
In Auto mode, if a query (a request parameter) does not match any rules, it is validated using negative, signature-based policy rules. If it is allowed, it is added to the learning sample population, and when enough samples have been recorded the parameter is included in the positive policy.
7. No match
The request is denied.
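The validation order above can be sketched as a single function. This is a minimal illustration under stated assumptions, not Profense™ code: the `Policy` structure and its field names are hypothetical, web application entries are looked up by exact path, parameters listed in an entry but not supplied are ignored, and the Auto-mode negative signature check (step 6) is left out.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical, simplified policy model (field names are illustrative)."""
    static_extensions: set
    static_path_re: str                        # allowed static path characters
    global_urls: list                          # global URL regexes
    global_params: list                        # (name regex, value regex) pairs
    apps: dict = field(default_factory=dict)   # path -> {param: value regex}


def full(pattern, text):
    # Profense™ implies ^...$; re.fullmatch gives the same whole-string semantics.
    return re.fullmatch(pattern, text) is not None


def global_param_ok(p, name, value):
    return any(full(n, name) and full(v, value) for n, v in p.global_params)


def validate(p, method, path, params):
    # 1. Static content: parameterless GET with an allowed extension and path.
    ext = path.rsplit(".", 1)[-1].lower() if "." in path else ""
    if (method == "GET" and not params and ext in p.static_extensions
            and full(p.static_path_re, path)):
        return True
    url_hit = any(full(u, path) for u in p.global_urls)
    # 2. Global URL policy for requests without parameters.
    if not params and url_hit:
        return True
    # 3. + 4. Web application entry; offending parameters may still be
    #    accepted by the global parameters policy.
    entry = p.apps.get(path)
    if entry is not None and all(
            (k in entry and full(entry[k], v)) or global_param_ok(p, k, v)
            for k, v in params.items()):
        return True
    # 5. Global URL policy + global parameters policy.
    if url_hit and all(global_param_ok(p, k, v) for k, v in params.items()):
        return True
    # 6. The Auto-mode negative signature check is not modelled here.
    # 7. No match: deny.
    return False


policy = Policy(
    static_extensions={"css", "png", "gif"},
    static_path_re=r"[\w\- ./]+",
    global_urls=[r"(/[\w\-]+)+\.html"],
    global_params=[(r"usepf", r"true")],
    apps={"/search": {"q": r"\w{1,32}"}},
)
print(validate(policy, "GET", "/img/logo.gif", {}))               # True (step 1)
print(validate(policy, "GET", "/about.html", {"usepf": "true"}))  # True (step 5)
print(validate(policy, "GET", "/search", {"q": "<script>"}))      # False (step 7)
```

The early `return True` at each step mirrors the "first match wins" reading of the list; a request is only denied once every step has failed.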
As mentioned above, an access policy for a proxy is composed of proxy global policies and application specific policies. In addition to these elements, a class element is available. Classes are predefined regular expressions which are defined on a proxy global basis but can be used for specifying ACL entries.
Requests are validated against the policy elements in the order and precedence described in Section 2, “Validation order and precedence”.
The Static content policy allows requests without parameters based on file extension (e.g. .gif) and allowed path characters.
File extensions are defined as a comma-separated list.
When initial policy configuration with rules is selected, the default list is: css,png,ico,jpg,js,jpeg,gif,swf.
Allowed path characters are selected from a list.
The letter A denotes all international alphanumeric characters; other characters are represented by their glyph, their UTF-8 code (in hexadecimal) and a description.
When initial policy configuration with rules is selected, the default allowed path characters are:
Hyphen-minus ("-", UTF-8: 2d)
All international alphanumeric
Space (" ", UTF-8: 20)
The Learner automatically generates a static content policy.
As static content is not supposed to have any parameters (hence the term "static"), only requests without parameters and with the method GET are validated against this rule.
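A minimal sketch of how a static content rule could be compiled from the comma-separated extension list and the default allowed path characters. This is not Profense™'s implementation: treating "/" as the segment separator and a single "." as the extension separator, and approximating "all international alphanumeric" with `[^\W_]` (Unicode `\w` minus underscore), are assumptions. `build_static_rule` and `is_static_request` are hypothetical names.

```python
import re

# Default extension list from the text above (comma-separated).
DEFAULT_EXTENSIONS = "css,png,ico,jpg,js,jpeg,gif,swf"


def build_static_rule(extensions, extra_chars="- "):
    """Compile a hypothetical static-content rule (see assumptions above)."""
    exts = sorted({e.strip().lower() for e in extensions.split(",") if e.strip()})
    # Allowed path characters: alphanumerics plus hyphen-minus and space.
    chars = "".join(re.escape(c) for c in extra_chars)
    segment = rf"(?:[^\W_]|[{chars}])+"
    ext_alt = "|".join(re.escape(e) for e in exts)
    return re.compile(rf"(?:/{segment})+\.(?:{ext_alt})")


def is_static_request(rule, method, path, params):
    # Only parameterless GET requests are validated against the static rule.
    return method == "GET" and not params and rule.fullmatch(path) is not None


rule = build_static_rule(DEFAULT_EXTENSIONS)
print(is_static_request(rule, "GET", "/img/logo.gif", {}))          # True
print(is_static_request(rule, "GET", "/img/logo.gif", {"v": "2"}))  # False: has parameters
print(is_static_request(rule, "GET", "/img/lo_go.gif", {}))         # False: "_" not allowed
```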
To simplify development of the access policy, URL matching criteria can be specified on a proxy global basis.
Global URLs are defined using regular expressions.
For examples, please refer to Table 3.6, “Examples of global URL regular expressions”.
The Learner automatically generates global URL patterns.
In the global parameters section, parameters which all or many URLs have in common can be added. For instance, in many content management (CM) systems a URL can be viewed in a printer-friendly version by adding a specific parameter to the URL.
When adding parameters to the list, both the name and the value of the parameter are interpreted by Profense™ as regular expressions. As with the global URL regular expressions, a full match from start to end is implied.
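As a minimal sketch, a global parameter check could require both regexes to match the whole name and the whole value. The two entries below are copied from Table 3.8; `GLOBAL_PARAMS` and `global_param_allowed` are hypothetical names, and Python's `re.fullmatch` stands in for the implied start-to-end match.

```python
import re

# Hypothetical global parameter list: (name regex, value regex) pairs taken
# from the first two rows of Table 3.8.
GLOBAL_PARAMS = [
    (r"usepf", r"true"),
    (r"parm\d{3}", r"[a-zA-Z\d]{3,32}"),
]


def global_param_allowed(name, value):
    # Name AND value must each match from start to end (implied ^...$).
    return any(
        re.fullmatch(name_re, name) is not None
        and re.fullmatch(value_re, value) is not None
        for name_re, value_re in GLOBAL_PARAMS
    )


print(global_param_allowed("usepf", "true"))     # True
print(global_param_allowed("usepf", "yes"))      # False: value must be exactly "true"
print(global_param_allowed("parm042", "Ab3"))    # True
print(global_param_allowed("myparm042", "Ab3"))  # False: the name is fully matched too
```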
For examples, please refer to Table 3.8, “Examples of global parameters regular expressions”.
The Learner automatically generates global parameter patterns.
Classes are useful when you want to use a predeclared set of criteria for input request validation by Profense™. For example, if you have many HTML forms that use an input field "email", you can define a class with a regular expression which defines what a valid e-mail address is. This class can then be used throughout the entire ACL.
When a class is changed, all affected ACL entries are automatically updated to reflect the change.
Regular expressions are used to define classes. Full match is implied for each regular expression, meaning that each must match from the start to the end of the input (a caret ^ is prepended and a dollar $ appended if not already present).
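A minimal sketch of implied anchoring. The text only states that ^ and $ are added when missing; wrapping the body in a non-capturing group is our own precaution and it is not stated whether Profense™ does the same. Without it, a top-level alternation such as `login|logout` would become `^login|logout$`, which accepts any value that merely starts with "login". `anchor` is a hypothetical helper name.

```python
import re


def anchor(pattern):
    """Force pattern to match the whole input (hypothetical helper).

    The body is wrapped in (?:...) so that "a|b" becomes ^(?:a|b)$.
    Simplification: escaped-backslash corner cases are not handled.
    """
    body = pattern[1:] if pattern.startswith("^") else pattern
    if body.endswith("$") and not body.endswith("\\$"):
        body = body[:-1]
    return f"^(?:{body})$"


print(anchor("john"))           # ^(?:john)$
print(anchor("^john$"))         # ^(?:john)$
print(bool(re.search("^login|logout$", "login-attempt")))        # True: naive anchoring leaks
print(bool(re.search(anchor("login|logout"), "login-attempt")))  # False
```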
The Learner makes extensive use of classes.
The classes are ranked by complexity in ascending order, e.g. num, alphanumeric, text, etc. The Learner uses this order to find a class which matches all samples of, for instance, an input parameter.
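A minimal sketch of class selection by rank, using a subset of the predefined classes from Table 3.9 in their documented order. The text only says the Learner uses the ranking to find a class matching all samples; returning the first (least complex) class that fits is our reading of that and an assumption. `RANKED_CLASSES` and `pick_class` are hypothetical names.

```python
import re

# Subset of the predefined classes (Table 3.9), kept in their documented order.
RANKED_CLASSES = [
    ("num", r"\d{1,32}"),
    ("alphanum", r"\w{1,32}"),
    ("alphanum_long", r"\w{1,256}"),
    ("text_long", r"[\w\x20+.,\-:]{1,256}"),
    ("printable", r"[^\x00-\x08\x0c\x0e-\x1f\x7f\x80-\x9f]+"),
]


def pick_class(samples):
    # Walk the classes from least to most complex and return the first one
    # that fully matches every recorded sample; None if nothing fits.
    for name, regex in RANKED_CLASSES:
        if all(re.fullmatch(regex, s) for s in samples):
            return name
    return None


print(pick_class(["1", "42"]))       # num
print(pick_class(["abc", "42"]))     # alphanum
print(pick_class(["hello world"]))   # text_long: the space rules out alphanum
```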
Note: Classes cannot be changed or deleted when the proxy is running in
Profense™ comes with a list of predefined classes.
Profense™ has full support for standard PCRE (Perl Compatible Regular Expressions).
Below is a brief regular expression "survival guide". For a more thorough explanation of the subject, some links and books are recommended at the end of the section.
A regular expression is a formula for matching strings that follow some pattern.
Regular expressions are made up of normal characters and special characters. Normal characters include upper and lower case letters and digits. The characters with special meanings are described in detail below.
In the simplest case, a regular expression looks like a standard text string. For example, the regular expression "john" contains no special characters. It will match "john" and "john doe" but it will not match "John".
In an input validation context we always want the expression to match the whole string. The expression above would then be written as ^john$, where the characters ^ and $ mean start of line and end of line. Now ^john$ will only match "john" but not "john doe". To also match "john doe", "john smith" etc., we employ a few more simple special characters. In its simplest form, "john <lastname>" could be expressed as "^john.*$", meaning: a string starting with the characters "john" followed by zero or more (the "*") occurrences of any character (the "."). For those familiar with the simple wildcard character "*" in, among others, DOS and Unix, ".*" equals "*", that is: anything.
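The examples above can be checked with Python's `re` module, whose syntax agrees with PCRE for these simple patterns. `re.search` is used on purpose, because unlike Profense™ it does not add anchors implicitly.

```python
import re

cases = [
    (r"john", "john doe"),      # match: unanchored, so "john" may appear anywhere
    (r"john", "John"),          # no match: matching is case-sensitive
    (r"^john$", "john doe"),    # no match: the whole string must be "john"
    (r"^john.*$", "john doe"),  # match: "john" followed by anything
    (r"^john.*$", "john"),      # match: ".*" also allows zero characters
]
for pattern, text in cases:
    print(pattern, repr(text), bool(re.search(pattern, text)))
```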
Specifying anything is not very useful in an input validation context. With regular expressions, much more fine-grained input validation masks can be defined using the rich set of metacharacters, character classes, repetition quantifiers, etc.
A brief explanation with some examples follows below.
| Character | Meaning |
|---|---|
| `^` | Beginning of string (implied in Profense™) |
| `$` | End of string (implied in Profense™) |
| `.` | Any character except newline |
| `*` | Match 0 or more times |
| `+` | Match 1 or more times |
| `?` | Match 0 or 1 times; or, after another quantifier, the shortest-match modifier (e.g. `*?`) |
| `\|` | Alternative (like logical OR) |
| `()` | Grouping |
| `[]` | Set of characters (a list of characters) |
| `{}` | Repetition modifier |
| `\` | Quote or special |
Table 3.1. Metacharacters in regular expressions
To present a metacharacter as a data character standing for itself, precede it with \ (e.g. \. matches the full stop character "." only).
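A quick illustration of escaping, using Python's `re` module as a stand-in for PCRE. `re.escape` is a Python helper for escaping a literal string; it is not a Profense™ feature.

```python
import re

# Unescaped "." is a metacharacter: it matches any character.
print(bool(re.fullmatch(r"index.html", "indexXhtml")))   # True
# Escaped "\." matches only a literal full stop.
print(bool(re.fullmatch(r"index\.html", "indexXhtml")))  # False
print(bool(re.fullmatch(r"index\.html", "index.html")))  # True
# re.escape() backslash-escapes the metacharacters in a literal string.
print(re.escape("a.b*c"))  # a\.b\*c
```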
Note: In Profense™ all regular expressions are forced to match the entire string (URL path or parameter value) by automatically prefixing an expression with "^" and suffixing it with "$".
| Expression | Meaning |
|---|---|
| `a*` | Zero or more a's |
| `a+` | One or more a's |
| `a?` | Zero or one a (i.e., optional a) |
| `a{m}` | Exactly m a's |
| `a{m,}` | At least m a's |
| `a{m,n}` | At least m but at most n a's |
| `repetition?` | Same as repetition but the shortest match is taken |
Table 3.2. Repetition in regular expressions
Read "a's" as "occurrences of strings, each of which matches the pattern a".
Read repetition as any of the repetition expressions listed above it.
Shortest match means that the shortest string matching the pattern is taken. The default is "greedy matching", which finds the longest match.
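The repetition forms and the greedy/lazy difference can be seen with Python's `re` module (used here as a stand-in for PCRE). `re.match` is anchored only at the start, so it shows how much text each quantifier consumes. Because Profense™ anchors both ends, greedy and lazy variants accept exactly the same values; the choice mainly affects how the engine backtracks.

```python
import re

print(bool(re.fullmatch(r"a{3}", "aaa")))     # True: exactly 3
print(bool(re.fullmatch(r"a{2,}", "a")))      # False: at least 2 needed
print(bool(re.fullmatch(r"a{1,3}", "aaaa")))  # False: at most 3

html = "<b>bold</b>"
print(re.match(r"<.*>", html).group())   # <b>bold</b> (greedy: longest match)
print(re.match(r"<.*?>", html).group())  # <b> (lazy: shortest match)
# With a full match, greedy and lazy accept the same strings.
print(bool(re.fullmatch(r"<.*>", html)), bool(re.fullmatch(r"<.*?>", html)))  # True True
```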
| Notation | Meaning |
|---|---|
| `\t` | Tab |
| `\n` | Newline |
| `\r` | Return (CR) |
| `\xhh` | Character with hexadecimal code hh |
| `\b` | "Word" boundary (zero-width assertion) |
| `\B` | Not a "word" boundary |
| `\w` | Any single international character classified as a "word" character (alphanumeric or _). Examples: A, z, 1, 9, Æ, â |
| `\W` | Any non-"word" character |
| `\s` | Any whitespace character (space, tab, newline) |
| `\S` | Any non-whitespace character |
| `\d` | Any digit character, equivalent to [0-9] |
| `\D` | Any non-digit character |
| `\pN` | Any Unicode character classified as numeric |
Table 3.3. Notations with \ in Profense™ regular expressions
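Several of these notations can be tried in Python's `re` module. Two differences from the table are worth knowing before testing patterns this way: with `str` patterns Python's `\d` is Unicode-aware (not strictly [0-9]) unless `re.ASCII` is set, and the stdlib `re` module does not support `\pN`.

```python
import re

# Python 3 str patterns are Unicode-aware by default, so \w covers
# international letters such as Æ and â.
print(bool(re.fullmatch(r"\w+", "Æâz19_")))                  # True
print(bool(re.fullmatch(r"\w+", "Æâ", flags=re.ASCII)))      # False: ASCII-only \w
print(bool(re.fullmatch(r"\d{4}", "2024")))                  # True
print(bool(re.fullmatch(r"\S+", "no spaces")))               # False
print(re.findall(r"\bcat\b", "cat concat cat."))             # ['cat', 'cat']
```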
A character set is denoted by [...]. Different meanings apply inside a character set ("character class") so that, instead of the normal rules given here, the following apply:
| Notation | Meaning |
|---|---|
| `[characters]` | Matches any of the characters in the list (c, h, a, r, a, c, t, e, r, s) |
| `[x-y]` | Matches any character from x to y (inclusive) in ASCII code order |
| `[\-]` | Matches the hyphen character - |
| `[\n]` | Matches the newline; other single-character notations with \ apply normally, too |
| `[^something]` | Negation. Matches any character except those that [something] denotes; that is, immediately after the leading [, the circumflex ^ means "not" applied to all of the rest |
Table 3.4. Character sets in regular expressions
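A few character sets in action, again using Python's `re.fullmatch` to mimic Profense™'s implied full match:

```python
import re

print(bool(re.fullmatch(r"[abc]+", "cab")))         # True: only a, b and c
print(bool(re.fullmatch(r"[a-f0-9]+", "c0ffee")))   # True: two ranges in one set
print(bool(re.fullmatch(r"[\w\-]+", "my-page")))    # True: escaped hyphen is literal
print(bool(re.fullmatch(r"[^<>]+", "a<b")))         # False: negated set excludes < and >
```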
The lookaround construct allows for the creation of regular expressions matching something, but only when it is followed/preceded or not followed/preceded by something else. Note that the lookaround construct is a zero-width assertion: it tests for a match of something else but does not actually consume it, which is why it is called an assertion. Lookaround constructs allow for the creation of expressions that would otherwise be impossible or too complex.
In an input validation context, lookahead could be used to specify an expression that allows the angle brackets < and > but only when they are not closed.
| Construct | Meaning |
|---|---|
| `a(?!expression)` | Negative lookahead. Matches "a" when not followed by expression, where expression is any regular expression. |
| `a(?=expression)` | Positive lookahead. Matches "a" when followed by expression. |
| `(?<!fixed-expression)a` | Negative lookbehind. Matches "a" when not preceded by fixed-expression, where fixed-expression is any regular expression specifying a fixed number of characters. That is, "aaa" will work but "a+" will not. |
| `(?<=fixed-expression)a` | Positive lookbehind. Matches "a" when preceded by fixed-expression. |
Table 3.5. Lookaround in regular expressions
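One possible reading of the angle-bracket example above, sketched with Python's `re` module: "<" is accepted only when no later ">" closes it, while a lone ">" is always fine. `UNCLOSED_LT` is a hypothetical pattern, not a predefined Profense™ class. The block also shows a positive lookbehind, and that Python's `re`, like the restriction described in the table, rejects a variable-width lookbehind.

```python
import re

# "<" is allowed only if it is not followed by a closing ">" later on.
UNCLOSED_LT = r"(?:[^<]|<(?![^<]*>))*"
VALUES = ["a < b", "x > y", "<script>", "1 < 2 and 3 > 1"]
for value in VALUES:
    print(repr(value), bool(re.fullmatch(UNCLOSED_LT, value)))

# Positive lookbehind: digits only when preceded by "$".
print(re.findall(r"(?<=\$)\d+", "cost $42 or 7"))  # ['42']

# Variable-width lookbehind is rejected at compile time.
lookbehind_rejected = False
try:
    re.compile(r"(?<=a+)b")
except re.error:
    lookbehind_rejected = True
print("variable-width lookbehind rejected:", lookbehind_rejected)  # True
```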
The URL regular expressions filter matches URLs without parameters on a proxy global basis. If a request matches any of the defined regular expressions, it will be marked as valid by Profense™ and forwarded to the back-end server.
| Expression | Matches |
|---|---|
| `(/[\w\-]+)+\.html` | URL with the extension html containing any international word characters, digits, _ and -. (\w matches upper and lower case alphanumeric characters plus _.) |
| `/abc(?:/[\w\-]+)*\.html` | Same kind of URL, starting with /abc, including the URL /abc.html. |
| `(/[\w\-]+)+\.html?` | Same kind of URL, matching the extensions html and htm. |
| `(/[\w\-]+)+\.(html\|pdf)` | Same kind of URL, matching the extensions html and pdf. |
| `(/[abcdefgh]+)+\.html` | URL with the extension html containing only the lower case letters abcdefgh. |
| `/index\.html` | Exact match of /index.html |
| `(/[\w\-]+)+/?` | "Natural" URL containing international alphanumeric characters, digits, _ and -, with an optional trailing slash. |
| `/sw[0-9]{0,12}\.asp` | URL with the extension asp starting with /sw followed by 0-12 digits. |
| `/(login\|logout)` | Only the URLs /login or /logout |
| `(/[\w\-]+)+\.(htm\|html\|shtml\|pdf)` | URL of international word characters with one of the extensions htm, html, shtml or pdf. |
Table 3.6. Examples of global URL regular expressions
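A small harness for trying global URL expressions before adding them, using some rows of Table 3.6. The sample URLs are made up for illustration, and `re.fullmatch` imitates the implied ^...$.

```python
import re

# (pattern, URLs expected to pass, URLs expected to fail); patterns from Table 3.6.
EXAMPLES = [
    (r"(/[\w\-]+)+\.html?", ["/docs/intro.htm", "/docs/intro.html"], ["/docs/intro.html5"]),
    (r"/abc(?:/[\w\-]+)*\.html", ["/abc.html", "/abc/x/y.html"], ["/abcd.html"]),
    (r"/sw[0-9]{0,12}\.asp", ["/sw.asp", "/sw123.asp"], ["/sw1234567890123.asp"]),
    (r"/(login|logout)", ["/login", "/logout"], ["/login/admin"]),
]


def check(pattern, good, bad):
    # Every "good" URL must fully match and no "bad" URL may fully match.
    ok = all(re.fullmatch(pattern, s) for s in good)
    return ok and not any(re.fullmatch(pattern, s) for s in bad)


for pattern, good, bad in EXAMPLES:
    print(pattern, check(pattern, good, bad))
```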
| Regular expression | Matches |
|---|---|
| `^[\w \.@()\-]+$` | International alphanumeric characters, underscore, space, dot, @, parentheses and hyphen. |
| `^[0-9a-z. ]+$` | Digits, ASCII letters a-z, dot and space. |
| `^[0-9]+$` | Only digits. [0-9] can also be expressed as \d. |
| `^[\d]{1,5}$` | One to five digits. |
| `^[a-z]+$` | Only lower case ASCII letters a-z. |
| `^[a-z]{0,32}$` | Only lower case ASCII letters a-z, with a total length of at most 32 characters (the empty string also matches). |
Table 3.7. Examples of regular expressions for input validation
When specifying global parameters both the name and the value are defined using regular expressions.
| Name | Value | Matches |
|---|---|---|
| `usepf` | `true` | The specific parameter usepf with the static value true |
| `parm\d{3}` | `[a-zA-Z\d]{3,32}` | All parameters whose name is parm followed by three digits and whose value is any combination of letters a-z, A-Z or digits, with a minimum length of 3 and a maximum length of 32 characters. |
| `\w{1,25}` | `[\w\s_,/:()@$*\.\-]*` | Any parameter whose name consists of 1 to 25 international word characters and whose value contains zero or more "friendly" characters. |
Table 3.8. Examples of global parameters regular expressions
The following classes are predefined in Profense™. The classes are presented in the order in which the Automatic Policy Generator evaluates them when automatically mapping classes to input parameters.
| Class | Regular expression | Description |
|---|---|---|
| empty | | No values allowed |
| num | `\d{1,32}` | Digits, a maximum of 32 digits |
| payment_card | `(?:\d{4}[\-\x20]?){2}\d{4,5}[\-\x20]?(?:\d{2,4})?` | Payment card numbers; allows spaces and hyphens between number groups. |
| alphanum | `\w{1,32}` | International alphanumeric characters. No spaces. Max. 32 chars. |
| alphanum_long | `\w{1,256}` | International alphanumeric characters. No spaces. Max. 256 chars. |
| ms_ident | `{?[A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{12}}?` | Microsoft identifier with optional preceding and trailing curly brackets. |
| path | `(?!.*(\.\.\|//).*)[\w\-/]{1,512}` | Path containing /, alphanumeric characters and - (hyphen). Consecutive slashes (//) and periods (..) are rejected by the negative lookahead. |
| text_long | `[\w\x20+.,\-:]{1,256}` | Simple international text. Max. 256 chars. |
| text_very_long | `[\w\x20+.,\-:]{1,32000}` | "Simple" international text. Max. 32000 chars. |
| | `[\w.+-]+@(?:[\w-]+\.)+[A-Za-z]{2,4}` | Simple e-mail address. Several subdomains possible. |
| standard | `[\w\x20_:,.@/()\-={}]{1,4096}` | Text input, international, several special characters allowed including newline. Max. 4096 chars. |
| standard_long | `[\w\x20_:,.@/()\-={}]+` | Text input, international, several special characters allowed including newline. Maximum length is limited by the proxy setting for maximum request size. |
| url | `(?:https?://)?(?!.*(\.\.\|//).*)[\w\x20,.@(){}/?=&\-]+` | Simple international URL match, with parameters. Consecutive "/" or "." not allowed (negative lookahead). |
| printable | `[^\x00-\x08\x0c\x0e-\x1f\x7f\x80-\x9f]+` | Any number of printable characters. Defined by negating a character class containing non-printable characters. |
| anything | `.+` | Anything but newline. |
| Anything_multiline | `(.\|\n)+` | Anything including newline. |
Table 3.9. Predefined standard classes in Profense™
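Before assigning a predefined class to a parameter, it can help to test it against realistic values. The sketch below copies four patterns from Table 3.9 and checks made-up sample values with Python's `re.fullmatch`; the samples and the `CLASSES`/`SAMPLES` names are illustrative only.

```python
import re

# A few classes copied from Table 3.9.
CLASSES = {
    "num": r"\d{1,32}",
    "payment_card": r"(?:\d{4}[\-\x20]?){2}\d{4,5}[\-\x20]?(?:\d{2,4})?",
    "path": r"(?!.*(\.\.|//).*)[\w\-/]{1,512}",
    "url": r"(?:https?://)?(?!.*(\.\.|//).*)[\w\x20,.@(){}/?=&\-]+",
}

# (class, sample value, expected result); samples are made up.
SAMPLES = [
    ("num", "0123", True),
    ("num", "12a", False),
    ("payment_card", "4111 1111 1111 1111", True),
    ("payment_card", "4111-1111-1111-1111", True),
    ("payment_card", "4111 1111", False),
    ("path", "/docs/my-page", True),
    ("path", "/docs//my-page", False),   # "//" rejected by the lookahead
    ("url", "https://example.com/a?b=1", True),
    ("url", "http://example.com//x", False),
]

for cls, value, expected in SAMPLES:
    got = re.fullmatch(CLASSES[cls], value) is not None
    print(cls, repr(value), got, "(expected", expected, ")")
```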
A number of web sites and books describe regular expressions in more detail.
A general description
The code project
.NET specific tutorial, includes a software tool for testing.
Excellent web site dealing extensively with the subject.
There are many good books covering regular expressions. Here we mention a few.
Introduction and quick reference from O'Reilly.
Introduction and quick reference from O'Reilly.
Learning to use regular expressions efficiently. Does not pretend to be introductory in any way. Also from O'Reilly.
Sounds appealing. If you are new to regular expressions this is probably a good place to start. From Sams Publishing.
In Profense™ access policy elements can be added and/or altered in a number of ways:
When enabled, the Learner analyzes incoming requests employing a combination of statistics, heuristic attack classification and server responses. The Learner builds a complete profile of the web site, including static requests, web applications and input parameters.
Based on the web site profile a policy is created using a combination of global patterns and specific web application policy entries. In most cases no post adjustment to the automatically built policy is necessary.
When creating a proxy a few very general policy elements can be added.
Access policy rules can be created or modified by allowing rejected requests from the log section.
When adding or modifying ACL entries from the log, classes are used to map input validation patterns to parameter values.
All access policy elements can, of course, be added, modified and deleted manually.
When creating a new proxy, it can be configured either with no policy settings or with a few basic ones.
No policy is configured, as the policy is expected to be configured automatically by the Learner.
With this option selected a few very general policy rules are configured.
| Initial operating mode | Policy configuration |
|---|---|
| Pass through and Learn | Normal policy configuration |
| Detect and Block | Normal policy configuration with basic (loose) policy rules |
If Normal policy configuration with basic (loose) policy rules is selected when adding the new proxy, it is initially configured with the following settings and policies.
| Setting | Value |
|---|---|
| Headers compliance checking | Pragmatic |
| Allowed static extensions | css,png,ico,js,jpg,jpeg,gif,swf |
| Allowed static path characters | Hyphen-minus ("-", UTF-8: 2d), all international alphanumeric, space (" ", UTF-8: 20) |
| Global URL rules | Generally allow URL paths per extension; allow natural URL |
| Global parameters | Allow most parameters with "friendly" content |
| ACL entries | Allow root requests |
Table 3.10. Optional initial policy configuration parameters
The Learner builds a complete profile of the web site, including static requests, web applications and input parameters, by analyzing incoming requests.
To avoid learning from worms, attacks and other unauthorized access, the Learner employs a combination of heuristic attack classification, statistics and server responses.
When the proxy is set to run in Learn mode, the Learner keeps analyzing requests until no changes to the resulting policy are recorded. That is, for every 10,000 requests the Learner builds a trial policy, compares it to the previous trial policy and records the number of changes. When a configurable number of consecutive trial policies (default 10) have not resulted in more changes between successive builds than a configurable threshold (default 0), a policy is built and activated and the proxy mode is promoted to Detect. In this mode the proxy records but does not perform block actions. Promoting the proxy to Block mode should be performed manually.
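The stop criterion described above can be modelled as a counter of consecutive "stable" trial policies. This is a hypothetical sketch, not the Learner's code: `ConvergenceTracker` and `trials_until_detect` are invented names, and the change count per trial is simply supplied as input rather than computed by diffing real policies.

```python
from dataclasses import dataclass


@dataclass
class ConvergenceTracker:
    """Hypothetical model of the Learner's Learn -> Detect criterion."""
    requests_per_trial: int = 10_000   # a trial policy is built every 10,000 requests
    stable_trials_needed: int = 10     # default number of stable trials in a row
    change_threshold: int = 0          # default: a stable trial has at most 0 changes
    stable_run: int = 0

    def record_trial(self, changes):
        """Record the change count of a trial vs. the previous trial.

        Returns True when the policy should be built and activated and the
        proxy promoted to Detect.
        """
        if changes <= self.change_threshold:
            self.stable_run += 1
        else:
            self.stable_run = 0        # any larger change restarts the run
        return self.stable_run >= self.stable_trials_needed


def trials_until_detect(change_counts, tracker=None):
    tracker = tracker or ConvergenceTracker()
    for i, changes in enumerate(change_counts, start=1):
        if tracker.record_trial(changes):
            return i, i * tracker.requests_per_trial   # (trials, requests seen)
    return None


# Early trials still change the policy; ten unchanged trials in a row follow.
print(trials_until_detect([5, 3, 0, 0, 1] + [0] * 10))  # (15, 150000)
```

A single non-zero trial (the `1` at trial 5) resets the run, which is why promotion happens at trial 15 rather than earlier.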
By default the Learner is configured to generate a short yet fine-grained policy. This is achieved by identifying global characteristics of the web site and generating global patterns matching those characteristics. The global patterns typically account for the majority of the web system's content and applications, leaving only the "real" web applications to be covered by specific web application policy entries.