ISAPI Rewrite 非官方中文配置手册_蓝色版本
作者:
ISAPI Rewrite 非官方中文配置手册_蓝色版本
发这个帮助文件是因为在给朋友设置主机的时候发现他们的ISAPI Rewrite的设置都有些不正确,有些甚至阻碍了站点的正常运行。就把官方自带的英文帮助粗略的翻译了一下。大家可以自由使用和转载。但转载时如果你愿意请注明是 蓝色 翻译的版本。
===================
ISAPI Rewrite 配置简介:
在NT/2000/XP和2003平台上,ISAPI Rewrite在系统帐户下射入INETINFO进程与 IIS 以共存模式过滤器运行。所以系统帐户应该给予对ISAPI- REWIRITE DLLS Internet匿名访问组 可读可执行权限 和所有的 HTTPD。INI 文件可读权限,还应该给予系统帐户对于所有包括 HTTPD。INI 文件的文件夹的可写权限,这将允许产生 HTTP。 PARSE。ERRORS 日志文件,来记录配置文件语法错误。对于PROXY模块也需要额外的权限,因为它将运行于连接池或HIGH-ISPLATED 应用模式,IIS帐户共享池和HIGH-ISOLATION池应被给予对RWHELPERE。DLL的可读权限。缺省情况下IWAM-《计算机名》被用于所有的池,在相应的COM+应用设置中应借助COM+ADMINISTRATION MMC SNAP-IN建立池帐户
配置文件格式化:
配置文件分为两种: GLOBAL(服务器全局模式)和 INDIVIDUAL(站点独立设置模式)文件,GLOBAL(服务器端全局模式)配置文件应该被放在ISAPI-REWRITE 安装目录中,名为 httpd.ini 。这个文件可以通过开始菜单的快捷方式访问和配置,这个文件里面的映射规则为全局规则,将针对所有站点起效。INDIVIDUAL(站点独立设置模式)配置文件应该被放在虚拟站点的根目录内,也被命名为httpd.ini文件,这里面应该放置针对站点的映射配置设置,只针对被放置的虚拟站点有效。两种类型的 httpd.ini 都是标准的windows ini 文件。所有的映射规则应该被放置在 [ISAPI_Rewrite] 之后。之前的文件文本将被忽略。
HTTPD.INI文件示例
[ISAPI_Rewrite]
# This is a comment
# 300 = 5 minutes
CacheClockRate 300
RepeatLimit 20
# 设置其他人没有下载 httpd.ini 和 httpd.parse.errors 文件的权限
RewriteRule /httpd(?:\.ini|\.parse\.errors) / [F,I,O]
# Block external access to the Helper ISAPI Extension
RewriteRule .*\.isrwhlp / [F,I,O]
# 配置规则
RewriteCond Host: (.+)
RewriteCond 指令
Syntax:(句法) RewriteCond TestVerb CondPattern [Flags]
这一指令定义一个条件规则,在 RewriteRule 或者 RewriteHeader或 RewriteProxy指令前预行RewriteCond指令,后面的规则只有它的,模式匹配URI的当前状态并且额外的条件也被应用才会被应用。
TestVerb
Specifies verb that will be matched against regular expression.
特别定义的动词匹配规定的表达式
TestVerb=(URL | METHOD | VERSION | HTTPHeaderName: | %ServerVariable) where:
URL - returns Request-URI of client request as described in RFC 2068 (HTTP 1.1);
返回客户端在RFC2068中描述的需求的Request-URI
METHOD - returns HTTP method of client request (OPTIONS, GET, HEAD, POST, PUT, DELETE or TRACE);
返回客户端需求(OPTIONS, GET, HEAD, POST, PUT, DELETE or TRACE)的HTTP方法
VERSION - returns HTTP version;
返回HTTP版本
HTTPHeaderName - returns value of the specified HTTP header. HTTPHeaderName can be any valid HTTP header name. Header names should include the trailing colon ":". If specified header does not exists in a client's request TestVerb is treated as empty string.
返回特定义的HTTP头文件的值
HTTPHeaderName =
Accept:
Accept-Charset:
Accept-Encoding:
Accept-Language:
Authorization:
Cookie:
From:
Host:
If-Modified-Since:
If-Match:
If-None-Match:
If-Range:
If-Unmodified-Since:
Max-Forwards:
Proxy-Authorization:
Range:
Referer:
User-Agent:
Any-Custom-Header
得到更多的关于HTTP头文件的和他们的值的信息参考RFC2068
ServerVariable 返回特定义的服务器变量的值 。例如服务器端口,全部服务器变量列表应在IIS文档中建立,变量名应用%符预定;
CondPattern
The regular expression to match TestVerb
规则表达式匹配TestVerb
[Flags]
Flags is a comma-separated list of the following flags:
O (nOrmalize)
Normalizes string before processing. Normalization includes removing of an URL-encoding, illegal characters, etc. This flag is useful with URLs and URL-encoded headers
RewriteRule 指令
Syntax: RewriteRule Pattern FormatString [Flags]
这个指令可以不止发生一次,每个指令定义一个单独的重写规则,这些规则的定义命令很重要,因为这个命令在应用运行时规则是有用途的
I (ignore case)
不管大小写强行指定字符匹配,这个FLAG影响RewriteRule指令和相应的RewriteCond 指令
F (Forbidden)
对客户端做反应,停止REWRITING进程并且发送403错误,注意在这种情况下FORMATSTRING 是无用的并可以设置为任何非空字符串。
L (last rule)
不应用任何重写规则在此停止重写进程,使用这个FLAG以阻止当前被重写的URI被后面的规则再次重写
N (Next iteration)
强制REWRITINGENGINE调整规则目标并且从头重启规则检查(所有修改将保存),重启次数由RepeatLimit指定的值限制,如果这个数值超过N FLAG将被忽略
NS (Next iteration of the same rule)
以N标记工作不从相同的规则重启规则规则进程(例如强制重复规则应用),通过RepeatLimit指令指定一个反复实行某一规则的最大数目,
P (force proxy)
强制目的URI在内部强制为代理需求并且立即通过ISAPI扩展应付代理需求,必须确认代理字符串是一个有效的URI包括协议主机等等否则代理将返回错误
R (explicit redirect)
强制服务器对客户端发出重定向指示即时应答,提供目的URI的新地址,重定向规则经常是最后规则
RP (permanent redirect)
几乎和[R]标记相同但是发布301HTTP状态而不是302HTTP状态代码
U (Unmangle Log)
当URI是源需求而不是重写需求时记载URI
O (nOrmalize)
在实行之前标准化字符串。标准化包括URL-ENCODING,不合法的字符的再移动等,这个标记对于URLS和URLS-ENDODED头是有用的
CL (Case Lower)
小写
CU (Case Upper)
大写
RewriteHeader directive
Syntax: RewriteHeader HeaderName Pattern FormatString [Flags]
这个指令是RewriteRule的更概括化变种,它不仅重写URL的客户端需求部分,而且重写HTTP头,这个指令不仅用于重写。生成,删除任何HTTP头,甚至改变客户端请求的方法
HeaderName
指定将被重写的客户头,可取的值与 RewriteCond 指令中TestVerb参数相同
Pattern
限定规则表达式以匹配Request-URI,
FormatString
限定将生成新的URI的FormatString
[Flags]
是一个下列FLAGS的命令分隔列表
I (ignore case)
不管大小写强行指定字符匹配,这个FLAG影响RewriteRule指令和相应的RewriteCond 指令
F (Forbidden)
对客户端做反应,停止REWRITING进程并且发送403错误,注意在这种情况下FORMATSTRING 是无用的并可以设置为任何非空字符串。
L (last rule)
不应用任何重写规则在此停止重写进程,使用这个FLAG以阻止当前被重写的URI被后面的规则再次重写
N (Next iteration)
强制REWRITINGENGINE调整规则目标并且从头重启规则检查(所有修改将保存),重启次数由RepeatLimit指定的值限制,如果这个数值超过N FLAG将被忽略
NS (Next iteration of the same rule)
以N标记工作不从相同的规则重启规则规则进程(例如强制重复规则应用),通过RepeatLimit指令指定一个反复实行某一规则的最大数目,
R (explicit redirect)
强制服务器对客户端发出重定向指示即时应答,提供目的URI的新地址,重定向规则经常是最后规则
RP (permanent redirect)
几乎和[R]标记相同但是发布301HTTP状态而不是302HTTP状态代码
U (Unmangle Log)
当URI是源需求而不是重写需求时记载URI
O (nOrmalize)
在实行之前标准化字符串。标准化包括URL-ENCODING,不合法的字符的再移动等,这个标记对于URLS和URLS-ENDODED头是有用的
CL (Case Lower)
小写
CU (Case Upper)
大写
要重移动头,FORMAT STRING模式应该生成一个空字符串,例如这一规则将从客户请求中重移代理信息
RewriteHeader User-Agent: .* $0
并且这一规则将把OLD-URL HEADER 加入请求中。
RewriteCond URL (.*)RewriteHeader Old-URL: ^$ $1
最后一个例子将通过改变请求方法定向所有的WEBDAV请求到/WEBDAV。ASP
RewriteCond METHOD OPTIONS
RewriteRule (.*) /webdav.asp?$1
RewriteHeader METHOD OPTIONS GET
RewriteProxy directive
Syntax: RewriteProxy Pattern FormatString [Flags]
强制目的URI在内部强制为代理需求并且立即通过ISAPI扩展应付代理需求,这将允许IIS作为代理服务器并且重路由到其他站点和服务器
Pattern
限定规则表达式以匹配Request-URI,
FormatString
限定将生成新的URI的FormatString
[Flags]
是一个下列FLAGS的命令分隔列表
D (Delegate security)
代理模式将试图以当前假冒的用户资格登陆远程服务器,
C (use Credentials)
代理模式将试图一在URL或基本授权头文件中指定的资格登陆远程服务器,用这个标记你可以使用http://user:password@host.com/path/ syntax 作为URL
F (Follow redirects)
缺省情况下ISAPI_Rewrite 将试图将MAP远程服务器返回的重定向指令到本地服务器命名空间,如果远程服务器返回重定向点到那台服务器其他的某个位置,ISAPI_Rewrite 将修改这一重定向指令指向本服务器名,这将避免用户看到真实(内部)服务器名称
使用F标记强制代理模式内部跟踪远程服务器返回的重定向指令,使用这个标记如果你根本不需要接受远程服务器的重定向指令,在WINHTTP设置中有重定向限制以避免远程重定向循环
I (ignore case)
不管大小写强行指定字符匹配
U (Unmangle Log)
当URI是源需求而不是重写需求时记载URI
O (nOrmalize)
在实行之前标准化字符串。标准化包括URL-ENCODING,不合法的字符的再移动等,这个标记对于URLS和URLS-ENDODED头是有用的
CacheClockRate directive
Syntax: CacheClockRate Interval
这个指令只在GLOBAL配置内容中出现,如果这个指令在SITE-LEVEL内容中出现将被忽略并把错误信息写入httpd.parse.errors 文件
ISAPI_Rewrite caches每次在第一次加载时配置,使用这个指令你可以限定当一个特定站点从缓存中清理的不活动周期,把这个参数设置的足够大你可以强制 ISAPI_Rewrite 永不清理缓存,记住任何配置文件的改变将在下次请求后立即更新而忽略这个周期
Interval
限定特定配置被清理出缓存的不作为时间(以秒计),缺省值3600(1小时)
EnableConfig and DisableConfig directives
Syntax:
EnableConfig [SiteID|"Site name"]
DisableConfig [SiteID|"Site name"]
对所选站点激活或不激活SITE-LEVEL配置或者改变缺省配置,缺省SITE-LEVEL配置不激活,这个指令只出现在GLOBAL配置内容中
SiteID
Numeric metabase identifier of a site
Site name
Name of the site as it appears in the IIS console
不用参数使用这个命令将改变缺省配置到ENABLE/DISABLE配置进程
例子
下面例子将使配置仅作用于ID=1(典型是缺省站点)名字是MY SITE的站点
DisableConfig
EnableConfig 1
EnableConfig"My site"
下边例子将激活名称为SOMESITE配置因为它分割设置重载了缺省设置
EnableConfig"Some site"
DisableConfig
EnableRewrite and DisableRewrite directives
Syntax:
EnableRewrite [SiteID|"Site name"]
DisableRewrite [SiteID|"Site name"]
对所选站点激活或不激活重写或者改变缺省配置,缺省重写配置激活,这个指令只出现在GLOBAL配置内容中
SiteID
Numeric metabase identifier of a site
Site name
Name of the site as it appears in the IIS console.
不使用参数这个命令将全部激活或者不激活
RepeatLimit directive
Syntax: RepeatLimit Limit
这个指令可以出现在GLOBAL和SITE-LEVEL配置文件中,如果出现在GLOBAL配置文件中竟改变GLOBAL对于所有站点的限制,出现在SITE-LEVEL配置中竟只改变对于这个站点的限制并且这个限制不能超过GLOBAL限制
ISAPI_Rewrite在实行规则时允许循环,这个指令允许限制最大可能循环的数量,可以设置为0或1而不支持循环,
LIMIT
限制最大循环数量,缺省32
RFStyle directive
Syntax: RFStyle Old | New
Configuration Utility
ISAPI_Rewrite Full包括配置功用(可以在 ISAPI_Rewrite 程序组中启动),它允许你浏览测试状态并输入注册码(如果在安装过程中没有注册),并且调整部分与代理模式操作相关的产品功能,UTILITY是由三个页面组成的属性表
Trial page允许你浏览TIRAL状态并输入注册码(如果在安装过程中没有注册)
Settings page
这页包含对下列参数的编辑框
Helper URL
这个参数影响过滤器和代理模块之间的联系方式,它即可以是以点做前缀的文件扩展名(如 .isrwhlp)也可以是绝对路径,
第一种情况下扩展名将追加在初始请求URI上并且代理模块竟通过SCRIPT MAP激活,缺省扩展名isrwhlp在安装进程中加在global script map 中,如果你改变这个扩展名或者你的应用不继承global script map 设置你应该手动添加向script map 所需求的入口。这个应该有如下参数
Executable: An absolute path to the rwhelper.dll in the short form
Extension: Desired extension (.isrwhlp is default)
Verbs radio button: All Verbs
Script engine checkbox: Checked
Check that file exists checkbox: Unchecked
我们已经创建了一个WSH script proxycfg.vbs ,可以简单在一个a script maps中注册,她位于安装文件夹并且可以在命令行一如下方式运行
cscript proxycfg.vbs [-r] [MetabasePath]
Optional -r 强制注册扩展名
Optional MetabasePath parameter allows specification of the first metabase key to process. By default it is "/localhost/W3SVC".
要在所有现存的 script maps 中注册你可以以如下命令行激活 script
cscript proxycfg.vbs -r
第二种情况下你应该提供一个URI作为'Helper URL'的值,你也应该map 一个 ISAPI_Rewrite的安装文件夹作为美意个站点的虚拟文件家
注意:根据顾客反应,IIS5(也许包括IIS4)对长目录名有问题。所以我们强烈推荐使用短目录名
Worker threads limit
这个参数限制在代理扩展线程池中工作线程数,缺省为0意味着这个限制等于处理器数量乘以2
Active threads limit
这个参数限制当前运行线程数,这个数量不可大于"Worker threads limit". 缺省0意思是等于处理器数量
Queue size 这个参数定义最大请求数量,如果你曾经看到Queue timeout expired" 信息在 the Application event log中你可以增加这个参数
Queue timeout
这个参数定义你在内部请求队列中防止新请求的最大等待时间,如果你曾经看到Queue timeout expired" 信息在 the Application event log中你可以增加这个参数
Connect timeout
以毫秒设定代理模块连接超时
Send timeout
以毫秒设定代理模块发送超时
Receive timeout
以毫秒设定代理模块发送超时
About page.
It contains copyright information and a link to the ISAPI_Rewrite's web site.
Regular expression syntax
这一部分覆盖了 ISAPI_Rewrite规定的表达句法
Literals
所有字符都是原意除了 ".", "*", "?", "+", "(", ")", "{", "}", "[", "]", "^" and "$".,这些字符在用“\”处理时是原意,原意指一个字符匹配自身
Wildcard
The dot character "." matches any single character except null character and newline character
以下为句法
Repeats
A repeat is an expression that is repeated an arbitrary number of times. An expression followed by "*" can be repeated any number of times including zero. An expression followed by "+" can be repeated any number of times, but at least once. An expression followed by "?" may be repeated zero or one times only. When it is necessary to specify the minimum and maximum number of repeats explicitly, the bounds operator "{}" may be used, thus "a{2}" is the letter "a" repeated exactly twice, "a{2,4}" represents the letter "a" repeated between 2 and 4 times, and "a{2,}" represents the letter "a" repeated at least twice with no upper limit. Note that there must be no white-space inside the {}, and there is no upper limit on the values of the lower and upper bounds. All repeat expressions refer to the shortest possible previous sub-expression: a single character; a character set, or a sub-expression grouped with "()" for example.
Examples:
"ba*" will match all of "b", "ba", "baaa" etc.
"ba+" will match "ba" or "baaaa" for example but not "b".
"ba?" will match "b" or "ba".
"ba{2,4}" will match "baa", "baaa" and "baaaa".
Non-greedy repeats
Non-greedy repeats are possible by appending a '?' after the repeat; a non-greedy repeat is one which will match the shortest possible string.
For example to match html tag pairs one could use something like:
"<\s*tagname[^>]*>(.*?)<\s*/tagname\s*>"
In this case $1 will contain the text between the tag pairs, and will be the shortest possible matching string.
Parenthesis
Parentheses serve two purposes, to group items together into a sub-expression, and to mark what generated the match. For example the expression "(ab)*" would match all of the string "ababab". All sub matches marked by parenthesis can be back referenced using \N or $N syntax. It is permissible for sub-expressions to match null strings. Sub-expressions are indexed from left to right starting from 1, sub-expression 0 is the whole expression.
Non-Marking Parenthesis
Sometimes you need to group sub-expressions with parenthesis, but don't want the parenthesis to spit out another marked sub-expression, in this case a non-marking parenthesis (?:expression) can be used. For example the following expression creates no sub-expressions:
"(?:abc)*"
Alternatives
Alternatives occur when the expression can match either one sub-expression or another, each alternative is separated by a "|". Each alternative is the largest possible previous sub-expression; this is the opposite behaviour from repetition operators.
Examples:
"a(b|c)" could match "ab" or "ac".
"abc|def" could match "abc" or "def".
Sets
A set is a set of characters that can match any single character that is a member of the set. Sets are delimited by "[" and "]" and can contain literals, character ranges, character classes, collating elements and equivalence classes. Set declarations that start with "^" contain the compliment of the elements that follow.
Examples:
Character literals:
"[abc]" will match either of "a", "b", or "c".
"[^abc] will match any character other than "a", "b", or "c".
Character ranges:
"[a-z]" will match any character in the range "a" to "z".
"[^A-Z]" will match any character other than those in the range "A" to "Z".
Character classes
Character classes are denoted using the syntax "[:classname:]" within a set declaration, for example "[[:space:]]" is the set of all whitespace characters. The available character classes are:
alnum Any alpha numeric character.
alpha Any alphabetical character a-z and A-Z. Other characters may also be included depending upon the locale.
blank Any blank character, either a space or a tab.
cntrl Any control character.
digit Any digit 0-9.
graph Any graphical character.
lower Any lower case character a-z. Other characters may also be included depending upon the locale.
print Any printable character.
punct Any punctuation character.
space Any whitespace character.
upper Any upper case character A-Z. Other characters may also be included depending upon the locale.
xdigit Any hexadecimal digit character, 0-9, a-f and A-F.
word Any word character - all alphanumeric characters plus the underscore.
unicode Any character whose code is greater than 255, this applies to the wide character traits classes only.
There are some shortcuts that can be used in place of the character classes:
\w in place of [:word:]
\s in place of [:space:]
\d in place of [:digit:]
\l in place of [:lower:]
\u in place of [:upper:]
Collating elements
Collating elements take the general form [.tagname.] inside a set declaration, where tagname is either a single character, or a name of a collating element, for example [[.a.]] is equivalent to [a], and [[.comma.]] is equivalent to [,]. ISAPI_Rewrite supports all the standard POSIX collating element names, and in addition the following digraphs: "ae", "ch", "ll", "ss", "nj", "dz", "lj", each in lower, upper and title case variations. Multi-character collating elements can result in the set matching more than one character, for example [[.ae.]] would match two characters, but note that [^[.ae.]] would only match one character.
Equivalence classes
Equivalenceclassestakethegeneralform[=tagname=] inside a set declaration, where tagname is either a single character, or a name of a collating element, and matches any character that is a member of the same primary equivalence class as the collating element [.tagname.]. An equivalence class is a set of characters that collate the same, a primary equivalence class is a set of characters whose primary sort key are all the same (for example strings are typically collated by character, then by accent, and then by case; the primary sort key then relates to the character, the secondary to the accentation, and the tertiary to the case). If there is no equivalence class corresponding to tagname, then [=tagname=] is exactly the same as [.tagname.].
To include a literal "-" in a set declaration then: make it the first character after the opening "[" or "[^", the endpoint of a range, a collating element, or precede it with an escape character as in "[\-]". To include a literal "[" or "]" or "^" in a set then make them the endpoint of a range, a collating element, or precede with an escape character.
Line anchors
An anchor is something that matches the null string at the start or end of a line: "^" matches the null string at the start of a line, "$" matches the null string at the end of a line.
Back references
A back reference is a reference to a previous sub-expression that has already been matched, the reference is to what the sub-expression matched, not to the expression itself. A back reference consists of the escape character "\" followed by a digit "1" to "9", "\1" refers to the first sub-expression, "\2" to the second etc. For example the expression "(.*)\1" matches any string that is repeated about its mid-point for example "abcabc" or "xyzxyz". A back reference to a sub-expression that did not participate in any match, matches the null string. In ISAPI_Rewrite all back references are global for entire RewriteRule and corresponding RewriteCond directives. Sub matches are numbered up to down and left to right beginning from the first RewriteCond directive of the corresponding RewriteRule directive, if there is one.
Forward Lookahead Asserts
There are two forms of these; one for positive forward lookahead asserts, and one for negative lookahead asserts:
"(?=abc)" matches zero characters only if they are followed by the expression "abc".
"(?!abc)" matches zero characters only if they are not followed by the expression "abc".
Word operators
The following operators are provided for compatibility with the GNU regular expression library.
"\w" matches any single character that is a member of the "word" character class, this is identical to the expression "[[:word:]]".
"\W" matches any single character that is not a member of the "word" character class, this is identical to the expression "[^[:word:]]".
"\<" matches the null string at the start of a word.
"\>" matches the null string at the end of the word.
"\b" matches the null string at either the start or the end of a word.
"\B" matches a null string within a word.
Escape operator
The escape character "\" has several meanings.
The escape operator may introduce an operator for example: back references, or a word operator.
The escape operator may make the following character normal, for example "\*" represents a literal "*" rather than the repeat operator.
Single character escape sequences:
The following escape sequences are aliases for single characters:
Escape sequence Character code Meaning
\a 0x07 Bell character.
\t 0x09 Tab character.
\v 0x0B Vertical tab.
\e 0x1B ASCII Escape character.
\0dd 0dd An octal character code, where dd is one or more octal digits.
\xXX 0xXX A hexadecimal character code, where XX is one or more hexadecimal digits.
\x{XX} 0xXX A hexadecimal character code, where XX is one or more hexadecimal digits, optionally a unicode character.
\cZ z-@ An ASCII escape sequence control-Z, where Z is any ASCII character greater than or equal to the character code for '@'.
Miscellaneous escape sequences:
The following are provided mostly for perl compatibility, but note that there are some differences in the meanings of \l \L \u and \U:
Escape sequence Meaning
\w Equivalent to [[:word:]].
\W Equivalent to [^[:word:]].
\s Equivalent to [[:space:]].
\S Equivalent to [^[:space:]].
\d Equivalent to [[:digit:]].
\D Equivalent to [^[:digit:]].
\l Equivalent to [[:lower:]].
\L Equivalent to [^[:lower:]].
\u Equivalent to [[:upper:]].
\U Equivalent to [^[:upper:]].
\C Any single character, equivalent to '.'.
\X Match any Unicode combining character sequence, for example "a\x 0301" (a letter a with an acute).
\Q The begin quote operator, everything that follows is treated as a literal character until a \E end quote operator is found.
\E The end quote operator, terminates a sequence begun with \Q.
What gets matched?
The regular expression will match the first possible matching string, if more than one string starting at a given location can match then it matches the longest possible string. In cases where their are multiple possible matches all starting at the same location, and all of the same length, then the match chosen is the one with the longest first sub-expression, if that is the same for two or more matches, then the second sub-expression will be examined and so on. Note that ISAPI_Rewrite uses MATCH algorithm. The result is matched only if the expression matches the whole input sequence. For example:
RewriteCond URL ^/somedir/.* #will match any request to somedir directory and subdirectories, while
RewriteCond URL ^/somedir/ #will match only request to the root of the somedir.
Special note about "pathological" regular expressions
ISAPI_Rewrite uses a very powerful regular expressions engine Regex++ written by Dr. John Maddock. But as any real thing it's not ideal: There exists some "pathological" expressions which may require exponential time for matching; these all involve nested repetition operators, for example attempting to match the expression "(a*a)*b" against N letter a's requires time proportional to 2N. These expressions can (almost) always be rewritten in such a way as to avoid the problem, for example "(a*a)*b" could be rewritten as "a*b" which requires only time linearly proportional to N to solve. In the general case, non-nested repeat expressions require time proportional to N2, however if the clauses are mutually exclusive then they can be matched in linear time - this is the case with "a*b", for each character the matcher will either match an "a" or a "b" or fail, where as with "a*a" the matcher can't tell which branch to take (the first "a" or the second) and so has to try both.
Boost 1.29.0 Regex++ could detect "pathological" regular expressions and terminate theirs matching. When a rule fails ISAPI_Rewrite sends "500 Internal Server error - Rule Failed" status to a client to indicate configuration error. Also the failed rule is disabled to prevent performance losses
Format string syntax
In format strings, all characters are treated as literals except: "(", ")", "$", "\", "?", ":".
To use any of these as literals you must prefix them with the escape character \
The following special sequences are recognized:
Grouping:
Use the parenthesis characters ( and ) to group sub-expressions within the format string, use \( and \) to represent literal '(' and ')'.
Sub-expression expansions:
The following perl like expressions expand to a particular matched sub-expression:
$` Expands to all the text from the end of the previous match to the start of the current match, if there was no previous match in the current operation, then everything from the start of the input string to the start of the match.
$' Expands to all the text from the end of the match to the end of the input string.
$& Expands to all of the current match.
$0 Expands to all of the current match.
$N Expands to the text that matched sub-expression N.
Conditional expressions:
Conditional expressions allow two different format strings to be selected dependent upon whether a sub-expression participated in the match or not:
?Ntrue_expression:false_expression
Executes true_expression if sub-expression N participated in the match, otherwise executes false_expression.
Example: suppose we search for "(while)|(for)" then the format string "?1WHILE:FOR" would output what matched, but in upper case.
Escape sequences:
The following escape sequences are also allowed:
\a The bell character.
\f The form feed character.
\n The newline character.
\r The carriage return character.
\t The tab character.
\v A vertical tab character.
\x A hexadecimal character - for example \x0D.
\x{} A possible unicode hexadecimal character - for example \x{1A0}
\cx The ASCII escape character x, for example \c@ is equivalent to escape-@.
\e The ASCII escape character.
\dd An octal character constant, for example \10
Examples例子
Emulating host-header-based virtual sites on a single site
例如你在两个域名注册www.site1.com 和 www.site2.com,现在你可以创建两个不同的站点而使用单一的物理站点。把以下规则加入到你的httpd.ini 文件
[ISAPI_Rewrite]
#Fix missing slash char on folders
RewriteCond Host: (.*)
RewriteRule ([^.?]+[^.?/]) http\://$1$2/ [I,R]
#Emulate site1
RewriteCond Host: (?:www\.)?site1\.com
RewriteRule (.*) /site1$1 [I,L]
#Emulate site2
RewriteCond Host: (?:www\.)?site2\.com
RewriteRule (.*) /site2$1 [I,L]
现在你可以把你的站点放在/site1 和 /site2 目录中.
或者你可以应用更多的类规则:
[ISAPI_Rewrite]
#Fix missing slash char on folders
RewriteCond Host: (.*)
RewriteRule ([^.?]+[^.?/]) http\://$1$2/ [I,R]
RewriteCond Host: (www\.)?(.+)
RewriteRule (.*) /$2$3
为站点应该命名目录为 /somesite1.com, /somesite2.info, etc.
Using loops (Next flag) to convert request parameters
假如你希望有物理URL如 http://www.myhost.com/foo.asp?a=A&b=B&c=C 使用请求如 http://www.myhost.com/foo.asp/a/A/b/B/c/C 参数数量可以从两个请求之间变化
至少有两个解决办法。你可以简单的为每一可能的参数数量添加一个分隔规则或者你可以使用一个技术说明如下面的例子
ISAPI_Rewrite]
RewriteRule (.*?\.asp)(\?[^/]*)?/([^/]*)/([^/]*)(.*) $1(?2$2&:\?)$3=$4$5 [NS,I]
这个规则将从请求的URL中抽取一个参数追加在请求字符的末尾并且从头重启规则进程。所以它将循环直到所有参数被移动到适当的位置,或者直到超过RepeatLimit
也存在许多这个规则的变种。但使用不同的分隔字符,例如。使用URLS如http://www.myhost.com/foo.asp~a~A~b~B~c~C 可以应中下面的规则:
ISAPI_Rewrite]
RewriteRule (.*?\.asp)(\?[^~]*)?~([^~]*)~([^~]*)(.*) $1(?2$2&:\?)$3=$4$5 [NS,I]
Running servers behind IIS
假如我们有一个内网服务器运行IIS而几个公司服务器运行其他平台,这些服务器不能从INTERNET直接进入,而只能从我们公司的网络进入,有一个简单的例子可以使用代理标记映射其他服务器到IIS命名空间:
[ISAPI_Rewrite]
RewriteProxy /mappoint(.+) http\://sitedomain$1 [I,U]
Moving sites from UNIX to IIS
这个规则可以帮助你把URL从 /~username 改变到 /username 和从 /file.html 改变到 /file.htm. 这个在你仅仅把你的站从UNIX移动到IIS并且保持搜索引擎和其他外部页面对老页面的连接时是有用的
[ISAPI_Rewrite]
#redirecting to update old links
RewriteRule (.*)\.html $1.htm
RewriteRule /~(.*) http\://myserver/$1 [R]
Moving site location
许多网管问这样的问题:他们要重定向所有的请求到一个新的网络服务器,当你需要建立一个更新的站点取代老的的时候经常出现这样的问题,解决方案是用ISAPI_Rewrite 于老服务器中
[ISAPI_Rewrite]
#redirecting to update old links
RewriteRule (.+) http\://newwebserver$1 [R]
Browser-dependent content
Dynamically generated robots.txt
robots.txt是一个搜索引擎用来发现能不能被索引的文件,但是为一个大站创建一个有许多动态内容的这个文件是很复杂的工作,我们可以写一个robots.asp script
现在使用单一规则生成 robots.txt
[ISAPI_Rewrite]
RewriteRule /robots\.txt /robots.asp
Making search engines to index dynamic pages
站点的内容存储在XML文件中,在服务器上有一个/XMLProcess.asp 文件处理XML文件并返回HTML到最终用户,URLS到文档有如下形式
http://www.mysite.com/XMLProcess.asp?xml=/somdir/somedoc.xml
但是许多公共引擎不能索引此类文档,因为URLS包含问号(文档动态生成),
ISAPI_Rewrite可以完全消除这个问题
[ISAPI_Rewrite]
RewriteRule /doc(.*)\.htm /XMLProcess.asp\?xml=$1.xml
现在使用如同http://www.mysite.com/doc/somedir/somedoc.htm的URL进入文档,搜索引擎将不知道不是somedoc.htm 文件并且内容是动态生成的
Negative expressions (NOT
有时当模式不匹配你需要应用规则,这种情况下你可以使用在规则表达式中称为Forward Lookahead Asserts
例如你需要不使用IE把所有用户移动到别的地点
[ISAPI_Rewrite]
# Redirect all non Internet Explorer users
# to another location
RewriteCond User-Agent: (?!.*MSIE).*
RewriteRule (.*) /nonie$1
Dynamic authentification
例如我们在站点上有一些成员域,我们在这个域上需要密码保护文件而我们不喜欢用BUILT-IN服务器安全,这个情况下可以建立一个ASP脚本(称为proxy.asp),这个脚本将代理所有请求到成员域并且检查请求允许,这里有一个简单的模板你可以放进你自己的授权代码
现在我们要通过配置 ISAPI_Rewrite 通过这个页面代理请求:
[ISAPI_Rewrite]
# Proxy all requests through proxy.asp
RewriteRule /members(.+) /proxy.asp\?http\://mysite.com/members$1
保护图片 防止盗链
Blocking inline-images (stop hot linking
假设我们在http://www.mysite.com/下有些页面调用一些GIF、jpg、png图片,不允许别人盗链引用到他们自己的页面上,因为这样大大增加了服务器流量。
当然我们不能100%保护图片,但我们至少可以在得到浏览器发出的HTTP Referer header的地方限制这种情况,因为这个可以判断是否我们自己的站点调用了我们自己的图片。
[ISAPI_Rewrite]
RewriteCond Host: (.+)
RewriteCond Referer: (?!http://\1.*).*
RewriteRule .*\.(?:gif|jpg|png) /block.gif [I,O]
===================
ISAPI Rewrite 配置简介:
在NT/2000/XP和2003平台上,ISAPI Rewrite在系统帐户下射入INETINFO进程与 IIS 以共存模式过滤器运行。所以系统帐户应该给予对ISAPI- REWIRITE DLLS Internet匿名访问组 可读可执行权限 和所有的 HTTPD。INI 文件可读权限,还应该给予系统帐户对于所有包括 HTTPD。INI 文件的文件夹的可写权限,这将允许产生 HTTP。 PARSE。ERRORS 日志文件,来记录配置文件语法错误。对于PROXY模块也需要额外的权限,因为它将运行于连接池或HIGH-ISPLATED 应用模式,IIS帐户共享池和HIGH-ISOLATION池应被给予对RWHELPERE。DLL的可读权限。缺省情况下IWAM-《计算机名》被用于所有的池,在相应的COM+应用设置中应借助COM+ADMINISTRATION MMC SNAP-IN建立池帐户
配置文件格式化:
配置文件分为两种: GLOBAL(服务器全局模式)和 INDIVIDUAL(站点独立设置模式)文件,GLOBAL(服务器端全局模式)配置文件应该被放在ISAPI-REWRITE 安装目录中,名为 httpd.ini 。这个文件可以通过开始菜单的快捷方式访问和配置,这个文件里面的映射规则为全局规则,将针对所有站点起效。INDIVIDUAL(站点独立设置模式)配置文件应该被放在虚拟站点的根目录内,也被命名为httpd.ini文件,这里面应该放置针对站点的映射配置设置,只针对被放置的虚拟站点有效。两种类型的 httpd.ini 都是标准的windows ini 文件。所有的映射规则应该被放置在 [ISAPI_Rewrite] 之后。之前的文件文本将被忽略。
HTTPD.INI文件示例
[ISAPI_Rewrite]
# This is a comment
# 300 = 5 minutes
CacheClockRate 300
RepeatLimit 20
# 设置其他人没有下载 httpd.ini 和 httpd.parse.errors 文件的权限
RewriteRule /httpd(?:\.ini|\.parse\.errors) / [F,I,O]
# Block external access to the Helper ISAPI Extension
RewriteRule .*\.isrwhlp / [F,I,O]
# 配置规则
RewriteCond Host: (.+)
RewriteCond 指令
Syntax:(句法) RewriteCond TestVerb CondPattern [Flags]
这一指令定义一个条件规则,在 RewriteRule 或者 RewriteHeader或 RewriteProxy指令前预行RewriteCond指令,后面的规则只有它的,模式匹配URI的当前状态并且额外的条件也被应用才会被应用。
TestVerb
Specifies verb that will be matched against regular expression.
特别定义的动词匹配规定的表达式
TestVerb=(URL | METHOD | VERSION | HTTPHeaderName: | %ServerVariable) where:
URL - returns Request-URI of client request as described in RFC 2068 (HTTP 1.1);
返回客户端在RFC2068中描述的需求的Request-URI
METHOD - returns HTTP method of client request (OPTIONS, GET, HEAD, POST, PUT, DELETE or TRACE);
返回客户端需求(OPTIONS, GET, HEAD, POST, PUT, DELETE or TRACE)的HTTP方法
VERSION - returns HTTP version;
返回HTTP版本
HTTPHeaderName - returns value of the specified HTTP header. HTTPHeaderName can be any valid HTTP header name. Header names should include the trailing colon ":". If specified header does not exists in a client's request TestVerb is treated as empty string.
返回特定义的HTTP头文件的值
HTTPHeaderName =
Accept:
Accept-Charset:
Accept-Encoding:
Accept-Language:
Authorization:
Cookie:
From:
Host:
If-Modified-Since:
If-Match:
If-None-Match:
If-Range:
If-Unmodified-Since:
Max-Forwards:
Proxy-Authorization:
Range:
Referer:
User-Agent:
Any-Custom-Header
得到更多的关于HTTP头文件的和他们的值的信息参考RFC2068
ServerVariable 返回特定义的服务器变量的值 。例如服务器端口,全部服务器变量列表应在IIS文档中建立,变量名应用%符预定;
CondPattern
The regular expression to match TestVerb
规则表达式匹配TestVerb
[Flags]
Flags is a comma-separated list of the following flags:
O (nOrmalize)
Normalizes string before processing. Normalization includes removing of an URL-encoding, illegal characters, etc. This flag is useful with URLs and URL-encoded headers
RewriteRule 指令
Syntax: RewriteRule Pattern FormatString [Flags]
这个指令可以不止发生一次,每个指令定义一个单独的重写规则,这些规则的定义命令很重要,因为这个命令在应用运行时规则是有用途的
I (ignore case)
不管大小写强行指定字符匹配,这个FLAG影响RewriteRule指令和相应的RewriteCond 指令
F (Forbidden)
对客户端做反应,停止REWRITING进程并且发送403错误,注意在这种情况下FORMATSTRING 是无用的并可以设置为任何非空字符串。
L (last rule)
不应用任何重写规则在此停止重写进程,使用这个FLAG以阻止当前被重写的URI被后面的规则再次重写
N (Next iteration)
强制REWRITINGENGINE调整规则目标并且从头重启规则检查(所有修改将保存),重启次数由RepeatLimit指定的值限制,如果这个数值超过N FLAG将被忽略
NS (Next iteration of the same rule)
以N标记工作不从相同的规则重启规则规则进程(例如强制重复规则应用),通过RepeatLimit指令指定一个反复实行某一规则的最大数目,
P (force proxy)
强制目的URI在内部强制为代理需求并且立即通过ISAPI扩展应付代理需求,必须确认代理字符串是一个有效的URI包括协议主机等等否则代理将返回错误
R (explicit redirect)
强制服务器对客户端发出重定向指示即时应答,提供目的URI的新地址,重定向规则经常是最后规则
RP (permanent redirect)
几乎和[R]标记相同但是发布301HTTP状态而不是302HTTP状态代码
U (Unmangle Log)
当URI是源需求而不是重写需求时记载URI
O (nOrmalize)
在实行之前标准化字符串。标准化包括URL-ENCODING,不合法的字符的再移动等,这个标记对于URLS和URLS-ENDODED头是有用的
CL (Case Lower)
小写
CU (Case Upper)
大写
RewriteHeader directive
Syntax: RewriteHeader HeaderName Pattern FormatString [Flags]
这个指令是RewriteRule的更概括化变种,它不仅重写URL的客户端需求部分,而且重写HTTP头,这个指令不仅用于重写。生成,删除任何HTTP头,甚至改变客户端请求的方法
HeaderName
指定将被重写的客户头,可取的值与 RewriteCond 指令中TestVerb参数相同
Pattern
限定规则表达式以匹配Request-URI,
FormatString
限定将生成新的URI的FormatString
[Flags]
是一个下列FLAGS的命令分隔列表
I (ignore case)
不管大小写强行指定字符匹配,这个FLAG影响RewriteRule指令和相应的RewriteCond 指令
F (Forbidden)
对客户端做反应,停止REWRITING进程并且发送403错误,注意在这种情况下FORMATSTRING 是无用的并可以设置为任何非空字符串。
L (last rule)
不应用任何重写规则在此停止重写进程,使用这个FLAG以阻止当前被重写的URI被后面的规则再次重写
N (Next iteration)
强制REWRITINGENGINE调整规则目标并且从头重启规则检查(所有修改将保存),重启次数由RepeatLimit指定的值限制,如果这个数值超过N FLAG将被忽略
NS (Next iteration of the same rule)
以N标记工作不从相同的规则重启规则规则进程(例如强制重复规则应用),通过RepeatLimit指令指定一个反复实行某一规则的最大数目,
R (explicit redirect)
强制服务器对客户端发出重定向指示即时应答,提供目的URI的新地址,重定向规则经常是最后规则
RP (permanent redirect)
几乎和[R]标记相同但是发布301HTTP状态而不是302HTTP状态代码
U (Unmangle Log)
当URI是源需求而不是重写需求时记载URI
O (nOrmalize)
在实行之前标准化字符串。标准化包括URL-ENCODING,不合法的字符的再移动等,这个标记对于URLS和URLS-ENDODED头是有用的
CL (Case Lower)
小写
CU (Case Upper)
大写
要重移动头,FORMAT STRING模式应该生成一个空字符串,例如这一规则将从客户请求中重移代理信息
RewriteHeader User-Agent: .* $0
并且这一规则将把OLD-URL HEADER 加入请求中。
RewriteCond URL (.*)RewriteHeader Old-URL: ^$ $1
最后一个例子将通过改变请求方法定向所有的WEBDAV请求到/WEBDAV。ASP
RewriteCond METHOD OPTIONS
RewriteRule (.*) /webdav.asp?$1
RewriteHeader METHOD OPTIONS GET
RewriteProxy directive
Syntax: RewriteProxy Pattern FormatString [Flags]
强制目的URI在内部强制为代理需求并且立即通过ISAPI扩展应付代理需求,这将允许IIS作为代理服务器并且重路由到其他站点和服务器
Pattern
限定规则表达式以匹配Request-URI,
FormatString
限定将生成新的URI的FormatString
[Flags]
是一个下列FLAGS的命令分隔列表
D (Delegate security)
代理模式将试图以当前假冒的用户资格登陆远程服务器,
C (use Credentials)
代理模式将试图一在URL或基本授权头文件中指定的资格登陆远程服务器,用这个标记你可以使用http://user:password@host.com/path/ syntax 作为URL
F (Follow redirects)
缺省情况下ISAPI_Rewrite 将试图将MAP远程服务器返回的重定向指令到本地服务器命名空间,如果远程服务器返回重定向点到那台服务器其他的某个位置,ISAPI_Rewrite 将修改这一重定向指令指向本服务器名,这将避免用户看到真实(内部)服务器名称
使用F标记强制代理模式内部跟踪远程服务器返回的重定向指令,使用这个标记如果你根本不需要接受远程服务器的重定向指令,在WINHTTP设置中有重定向限制以避免远程重定向循环
I (ignore case)
不管大小写强行指定字符匹配
U (Unmangle Log)
当URI是源需求而不是重写需求时记载URI
O (nOrmalize)
在实行之前标准化字符串。标准化包括URL-ENCODING,不合法的字符的再移动等,这个标记对于URLS和URLS-ENDODED头是有用的
CacheClockRate directive
Syntax: CacheClockRate Interval
这个指令只在GLOBAL配置内容中出现,如果这个指令在SITE-LEVEL内容中出现将被忽略并把错误信息写入httpd.parse.errors 文件
ISAPI_Rewrite caches每次在第一次加载时配置,使用这个指令你可以限定当一个特定站点从缓存中清理的不活动周期,把这个参数设置的足够大你可以强制 ISAPI_Rewrite 永不清理缓存,记住任何配置文件的改变将在下次请求后立即更新而忽略这个周期
Interval
限定特定配置被清理出缓存的不作为时间(以秒计),缺省值3600(1小时)
EnableConfig and DisableConfig directives
Syntax:
EnableConfig [SiteID|"Site name"]
DisableConfig [SiteID|"Site name"]
对所选站点激活或不激活SITE-LEVEL配置或者改变缺省配置,缺省SITE-LEVEL配置不激活,这个指令只出现在GLOBAL配置内容中
SiteID
Numeric metabase identifier of a site
Site name
Name of the site as it appears in the IIS console
不用参数使用这个命令将改变缺省配置到ENABLE/DISABLE配置进程
例子
下面例子将使配置仅作用于ID=1(典型是缺省站点)名字是MY SITE的站点
DisableConfig
EnableConfig 1
EnableConfig"My site"
下边例子将激活名称为SOMESITE配置因为它分割设置重载了缺省设置
EnableConfig"Some site"
DisableConfig
EnableRewrite and DisableRewrite directives
Syntax:
EnableRewrite [SiteID|"Site name"]
DisableRewrite [SiteID|"Site name"]
对所选站点激活或不激活重写或者改变缺省配置,缺省重写配置激活,这个指令只出现在GLOBAL配置内容中
SiteID
Numeric metabase identifier of a site
Site name
Name of the site as it appears in the IIS console.
不使用参数这个命令将全部激活或者不激活
RepeatLimit directive
Syntax: RepeatLimit Limit
这个指令可以出现在GLOBAL和SITE-LEVEL配置文件中,如果出现在GLOBAL配置文件中竟改变GLOBAL对于所有站点的限制,出现在SITE-LEVEL配置中竟只改变对于这个站点的限制并且这个限制不能超过GLOBAL限制
ISAPI_Rewrite在实行规则时允许循环,这个指令允许限制最大可能循环的数量,可以设置为0或1而不支持循环,
LIMIT
限制最大循环数量,缺省32
RFStyle directive
Syntax: RFStyle Old | New
Configuration Utility
ISAPI_Rewrite Full包括配置功用(可以在 ISAPI_Rewrite 程序组中启动),它允许你浏览测试状态并输入注册码(如果在安装过程中没有注册),并且调整部分与代理模式操作相关的产品功能,UTILITY是由三个页面组成的属性表
Trial page允许你浏览TIRAL状态并输入注册码(如果在安装过程中没有注册)
Settings page
这页包含对下列参数的编辑框
Helper URL
这个参数影响过滤器和代理模块之间的联系方式,它即可以是以点做前缀的文件扩展名(如 .isrwhlp)也可以是绝对路径,
第一种情况下扩展名将追加在初始请求URI上并且代理模块竟通过SCRIPT MAP激活,缺省扩展名isrwhlp在安装进程中加在global script map 中,如果你改变这个扩展名或者你的应用不继承global script map 设置你应该手动添加向script map 所需求的入口。这个应该有如下参数
Executable: An absolute path to the rwhelper.dll in the short form
Extension: Desired extension (.isrwhlp is default)
Verbs radio button: All Verbs
Script engine checkbox: Checked
Check that file exists checkbox: Unchecked
我们已经创建了一个WSH script proxycfg.vbs ,可以简单在一个a script maps中注册,她位于安装文件夹并且可以在命令行一如下方式运行
cscript proxycfg.vbs [-r] [MetabasePath]
Optional -r 强制注册扩展名
Optional MetabasePath parameter allows specification of the first metabase key to process. By default it is "/localhost/W3SVC".
要在所有现存的 script maps 中注册你可以以如下命令行激活 script
cscript proxycfg.vbs -r
第二种情况下你应该提供一个URI作为'Helper URL'的值,你也应该map 一个 ISAPI_Rewrite的安装文件夹作为美意个站点的虚拟文件家
注意:根据顾客反应,IIS5(也许包括IIS4)对长目录名有问题。所以我们强烈推荐使用短目录名
Worker threads limit
这个参数限制在代理扩展线程池中工作线程数,缺省为0意味着这个限制等于处理器数量乘以2
Active threads limit
这个参数限制当前运行线程数,这个数量不可大于"Worker threads limit". 缺省0意思是等于处理器数量
Queue size 这个参数定义最大请求数量,如果你曾经看到Queue timeout expired" 信息在 the Application event log中你可以增加这个参数
Queue timeout
这个参数定义你在内部请求队列中防止新请求的最大等待时间,如果你曾经看到Queue timeout expired" 信息在 the Application event log中你可以增加这个参数
Connect timeout
以毫秒设定代理模块连接超时
Send timeout
以毫秒设定代理模块发送超时
Receive timeout
以毫秒设定代理模块发送超时
About page.
It contains copyright information and a link to the ISAPI_Rewrite's web site.
Regular expression syntax
这一部分覆盖了 ISAPI_Rewrite规定的表达句法
Literals
所有字符都是原意除了 ".", "*", "?", "+", "(", ")", "{", "}", "[", "]", "^" and "$".,这些字符在用“\”处理时是原意,原意指一个字符匹配自身
Wildcard
The dot character "." matches any single character except null character and newline character
以下为句法
Repeats
A repeat is an expression that is repeated an arbitrary number of times. An expression followed by "*" can be repeated any number of times including zero. An expression followed by "+" can be repeated any number of times, but at least once. An expression followed by "?" may be repeated zero or one times only. When it is necessary to specify the minimum and maximum number of repeats explicitly, the bounds operator "{}" may be used, thus "a{2}" is the letter "a" repeated exactly twice, "a{2,4}" represents the letter "a" repeated between 2 and 4 times, and "a{2,}" represents the letter "a" repeated at least twice with no upper limit. Note that there must be no white-space inside the {}, and there is no upper limit on the values of the lower and upper bounds. All repeat expressions refer to the shortest possible previous sub-expression: a single character; a character set, or a sub-expression grouped with "()" for example.
Examples:
"ba*" will match all of "b", "ba", "baaa" etc.
"ba+" will match "ba" or "baaaa" for example but not "b".
"ba?" will match "b" or "ba".
"ba{2,4}" will match "baa", "baaa" and "baaaa".
Non-greedy repeats
Non-greedy repeats are possible by appending a '?' after the repeat; a non-greedy repeat is one which will match the shortest possible string.
For example to match html tag pairs one could use something like:
"<\s*tagname[^>]*>(.*?)<\s*/tagname\s*>"
In this case $1 will contain the text between the tag pairs, and will be the shortest possible matching string.
Parenthesis
Parentheses serve two purposes, to group items together into a sub-expression, and to mark what generated the match. For example the expression "(ab)*" would match all of the string "ababab". All sub matches marked by parenthesis can be back referenced using \N or $N syntax. It is permissible for sub-expressions to match null strings. Sub-expressions are indexed from left to right starting from 1, sub-expression 0 is the whole expression.
Non-Marking Parenthesis
Sometimes you need to group sub-expressions with parenthesis, but don't want the parenthesis to spit out another marked sub-expression, in this case a non-marking parenthesis (?:expression) can be used. For example the following expression creates no sub-expressions:
"(?:abc)*"
Alternatives
Alternatives occur when the expression can match either one sub-expression or another, each alternative is separated by a "|". Each alternative is the largest possible previous sub-expression; this is the opposite behaviour from repetition operators.
Examples:
"a(b|c)" could match "ab" or "ac".
"abc|def" could match "abc" or "def".
Sets
A set is a set of characters that can match any single character that is a member of the set. Sets are delimited by "[" and "]" and can contain literals, character ranges, character classes, collating elements and equivalence classes. Set declarations that start with "^" contain the compliment of the elements that follow.
Examples:
Character literals:
"[abc]" will match either of "a", "b", or "c".
"[^abc] will match any character other than "a", "b", or "c".
Character ranges:
"[a-z]" will match any character in the range "a" to "z".
"[^A-Z]" will match any character other than those in the range "A" to "Z".
Character classes
Character classes are denoted using the syntax "[:classname:]" within a set declaration, for example "[[:space:]]" is the set of all whitespace characters. The available character classes are:
alnum Any alpha numeric character.
alpha Any alphabetical character a-z and A-Z. Other characters may also be included depending upon the locale.
blank Any blank character, either a space or a tab.
cntrl Any control character.
digit Any digit 0-9.
graph Any graphical character.
lower Any lower case character a-z. Other characters may also be included depending upon the locale.
print Any printable character.
punct Any punctuation character.
space Any whitespace character.
upper Any upper case character A-Z. Other characters may also be included depending upon the locale.
xdigit Any hexadecimal digit character, 0-9, a-f and A-F.
word Any word character - all alphanumeric characters plus the underscore.
unicode Any character whose code is greater than 255, this applies to the wide character traits classes only.
There are some shortcuts that can be used in place of the character classes:
\w in place of [:word:]
\s in place of [:space:]
\d in place of [:digit:]
\l in place of [:lower:]
\u in place of [:upper:]
Collating elements
Collating elements take the general form [.tagname.] inside a set declaration, where tagname is either a single character, or a name of a collating element, for example [[.a.]] is equivalent to [a], and [[.comma.]] is equivalent to [,]. ISAPI_Rewrite supports all the standard POSIX collating element names, and in addition the following digraphs: "ae", "ch", "ll", "ss", "nj", "dz", "lj", each in lower, upper and title case variations. Multi-character collating elements can result in the set matching more than one character, for example [[.ae.]] would match two characters, but note that [^[.ae.]] would only match one character.
Equivalence classes
Equivalenceclassestakethegeneralform[=tagname=] inside a set declaration, where tagname is either a single character, or a name of a collating element, and matches any character that is a member of the same primary equivalence class as the collating element [.tagname.]. An equivalence class is a set of characters that collate the same, a primary equivalence class is a set of characters whose primary sort key are all the same (for example strings are typically collated by character, then by accent, and then by case; the primary sort key then relates to the character, the secondary to the accentation, and the tertiary to the case). If there is no equivalence class corresponding to tagname, then [=tagname=] is exactly the same as [.tagname.].
To include a literal "-" in a set declaration then: make it the first character after the opening "[" or "[^", the endpoint of a range, a collating element, or precede it with an escape character as in "[\-]". To include a literal "[" or "]" or "^" in a set then make them the endpoint of a range, a collating element, or precede with an escape character.
Line anchors
An anchor is something that matches the null string at the start or end of a line: "^" matches the null string at the start of a line, "$" matches the null string at the end of a line.
Back references
A back reference is a reference to a previous sub-expression that has already been matched, the reference is to what the sub-expression matched, not to the expression itself. A back reference consists of the escape character "\" followed by a digit "1" to "9", "\1" refers to the first sub-expression, "\2" to the second etc. For example the expression "(.*)\1" matches any string that is repeated about its mid-point for example "abcabc" or "xyzxyz". A back reference to a sub-expression that did not participate in any match, matches the null string. In ISAPI_Rewrite all back references are global for entire RewriteRule and corresponding RewriteCond directives. Sub matches are numbered up to down and left to right beginning from the first RewriteCond directive of the corresponding RewriteRule directive, if there is one.
Forward Lookahead Asserts
There are two forms of these; one for positive forward lookahead asserts, and one for negative lookahead asserts:
"(?=abc)" matches zero characters only if they are followed by the expression "abc".
"(?!abc)" matches zero characters only if they are not followed by the expression "abc".
Word operators
The following operators are provided for compatibility with the GNU regular expression library.
"\w" matches any single character that is a member of the "word" character class, this is identical to the expression "[[:word:]]".
"\W" matches any single character that is not a member of the "word" character class, this is identical to the expression "[^[:word:]]".
"\<" matches the null string at the start of a word.
"\>" matches the null string at the end of the word.
"\b" matches the null string at either the start or the end of a word.
"\B" matches a null string within a word.
Escape operator
The escape character "\" has several meanings.
The escape operator may introduce an operator for example: back references, or a word operator.
The escape operator may make the following character normal, for example "\*" represents a literal "*" rather than the repeat operator.
Single character escape sequences:
The following escape sequences are aliases for single characters:
Escape sequence Character code Meaning
\a 0x07 Bell character.
\t 0x09 Tab character.
\v 0x0B Vertical tab.
\e 0x1B ASCII Escape character.
\0dd 0dd An octal character code, where dd is one or more octal digits.
\xXX 0xXX A hexadecimal character code, where XX is one or more hexadecimal digits.
\x{XX} 0xXX A hexadecimal character code, where XX is one or more hexadecimal digits, optionally a unicode character.
\cZ z-@ An ASCII escape sequence control-Z, where Z is any ASCII character greater than or equal to the character code for '@'.
Miscellaneous escape sequences:
The following are provided mostly for perl compatibility, but note that there are some differences in the meanings of \l \L \u and \U:
Escape sequence Meaning
\w Equivalent to [[:word:]].
\W Equivalent to [^[:word:]].
\s Equivalent to [[:space:]].
\S Equivalent to [^[:space:]].
\d Equivalent to [[:digit:]].
\D Equivalent to [^[:digit:]].
\l Equivalent to [[:lower:]].
\L Equivalent to [^[:lower:]].
\u Equivalent to [[:upper:]].
\U Equivalent to [^[:upper:]].
\C Any single character, equivalent to '.'.
\X Match any Unicode combining character sequence, for example "a\x 0301" (a letter a with an acute).
\Q The begin quote operator, everything that follows is treated as a literal character until a \E end quote operator is found.
\E The end quote operator, terminates a sequence begun with \Q.
What gets matched?
The regular expression will match the first possible matching string, if more than one string starting at a given location can match then it matches the longest possible string. In cases where their are multiple possible matches all starting at the same location, and all of the same length, then the match chosen is the one with the longest first sub-expression, if that is the same for two or more matches, then the second sub-expression will be examined and so on. Note that ISAPI_Rewrite uses MATCH algorithm. The result is matched only if the expression matches the whole input sequence. For example:
RewriteCond URL ^/somedir/.* #will match any request to somedir directory and subdirectories, while
RewriteCond URL ^/somedir/ #will match only request to the root of the somedir.
Special note about "pathological" regular expressions
ISAPI_Rewrite uses a very powerful regular expressions engine Regex++ written by Dr. John Maddock. But as any real thing it's not ideal: There exists some "pathological" expressions which may require exponential time for matching; these all involve nested repetition operators, for example attempting to match the expression "(a*a)*b" against N letter a's requires time proportional to 2N. These expressions can (almost) always be rewritten in such a way as to avoid the problem, for example "(a*a)*b" could be rewritten as "a*b" which requires only time linearly proportional to N to solve. In the general case, non-nested repeat expressions require time proportional to N2, however if the clauses are mutually exclusive then they can be matched in linear time - this is the case with "a*b", for each character the matcher will either match an "a" or a "b" or fail, where as with "a*a" the matcher can't tell which branch to take (the first "a" or the second) and so has to try both.
Boost 1.29.0 Regex++ could detect "pathological" regular expressions and terminate theirs matching. When a rule fails ISAPI_Rewrite sends "500 Internal Server error - Rule Failed" status to a client to indicate configuration error. Also the failed rule is disabled to prevent performance losses
Format string syntax
In format strings, all characters are treated as literals except: "(", ")", "$", "\", "?", ":".
To use any of these as literals you must prefix them with the escape character \
The following special sequences are recognized:
Grouping:
Use the parenthesis characters ( and ) to group sub-expressions within the format string, use \( and \) to represent literal '(' and ')'.
Sub-expression expansions:
The following perl like expressions expand to a particular matched sub-expression:
$` Expands to all the text from the end of the previous match to the start of the current match, if there was no previous match in the current operation, then everything from the start of the input string to the start of the match.
$' Expands to all the text from the end of the match to the end of the input string.
$& Expands to all of the current match.
$0 Expands to all of the current match.
$N Expands to the text that matched sub-expression N.
Conditional expressions:
Conditional expressions allow two different format strings to be selected dependent upon whether a sub-expression participated in the match or not:
?Ntrue_expression:false_expression
Executes true_expression if sub-expression N participated in the match, otherwise executes false_expression.
Example: suppose we search for "(while)|(for)" then the format string "?1WHILE:FOR" would output what matched, but in upper case.
Escape sequences:
The following escape sequences are also allowed:
\a The bell character.
\f The form feed character.
\n The newline character.
\r The carriage return character.
\t The tab character.
\v A vertical tab character.
\x A hexadecimal character - for example \x0D.
\x{} A possible unicode hexadecimal character - for example \x{1A0}
\cx The ASCII escape character x, for example \c@ is equivalent to escape-@.
\e The ASCII escape character.
\dd An octal character constant, for example \10
Examples例子
Emulating host-header-based virtual sites on a single site
例如你在两个域名注册www.site1.com 和 www.site2.com,现在你可以创建两个不同的站点而使用单一的物理站点。把以下规则加入到你的httpd.ini 文件
[ISAPI_Rewrite]
#Fix missing slash char on folders
RewriteCond Host: (.*)
RewriteRule ([^.?]+[^.?/]) http\://$1$2/ [I,R]
#Emulate site1
RewriteCond Host: (?:www\.)?site1\.com
RewriteRule (.*) /site1$1 [I,L]
#Emulate site2
RewriteCond Host: (?:www\.)?site2\.com
RewriteRule (.*) /site2$1 [I,L]
现在你可以把你的站点放在/site1 和 /site2 目录中.
或者你可以应用更多的类规则:
[ISAPI_Rewrite]
#Fix missing slash char on folders
RewriteCond Host: (.*)
RewriteRule ([^.?]+[^.?/]) http\://$1$2/ [I,R]
RewriteCond Host: (www\.)?(.+)
RewriteRule (.*) /$2$3
为站点应该命名目录为 /somesite1.com, /somesite2.info, etc.
Using loops (Next flag) to convert request parameters
假如你希望有物理URL如 http://www.myhost.com/foo.asp?a=A&b=B&c=C 使用请求如 http://www.myhost.com/foo.asp/a/A/b/B/c/C 参数数量可以从两个请求之间变化
至少有两个解决办法。你可以简单的为每一可能的参数数量添加一个分隔规则或者你可以使用一个技术说明如下面的例子
ISAPI_Rewrite]
RewriteRule (.*?\.asp)(\?[^/]*)?/([^/]*)/([^/]*)(.*) $1(?2$2&:\?)$3=$4$5 [NS,I]
这个规则将从请求的URL中抽取一个参数追加在请求字符的末尾并且从头重启规则进程。所以它将循环直到所有参数被移动到适当的位置,或者直到超过RepeatLimit
也存在许多这个规则的变种。但使用不同的分隔字符,例如。使用URLS如http://www.myhost.com/foo.asp~a~A~b~B~c~C 可以应中下面的规则:
ISAPI_Rewrite]
RewriteRule (.*?\.asp)(\?[^~]*)?~([^~]*)~([^~]*)(.*) $1(?2$2&:\?)$3=$4$5 [NS,I]
Running servers behind IIS
假如我们有一个内网服务器运行IIS而几个公司服务器运行其他平台,这些服务器不能从INTERNET直接进入,而只能从我们公司的网络进入,有一个简单的例子可以使用代理标记映射其他服务器到IIS命名空间:
[ISAPI_Rewrite]
RewriteProxy /mappoint(.+) http\://sitedomain$1 [I,U]
Moving sites from UNIX to IIS
这个规则可以帮助你把URL从 /~username 改变到 /username 和从 /file.html 改变到 /file.htm. 这个在你仅仅把你的站从UNIX移动到IIS并且保持搜索引擎和其他外部页面对老页面的连接时是有用的
[ISAPI_Rewrite]
#redirecting to update old links
RewriteRule (.*)\.html $1.htm
RewriteRule /~(.*) http\://myserver/$1 [R]
Moving site location
许多网管问这样的问题:他们要重定向所有的请求到一个新的网络服务器,当你需要建立一个更新的站点取代老的的时候经常出现这样的问题,解决方案是用ISAPI_Rewrite 于老服务器中
[ISAPI_Rewrite]
#redirecting to update old links
RewriteRule (.+) http\://newwebserver$1 [R]
Browser-dependent content
Dynamically generated robots.txt
robots.txt是一个搜索引擎用来发现能不能被索引的文件,但是为一个大站创建一个有许多动态内容的这个文件是很复杂的工作,我们可以写一个robots.asp script
现在使用单一规则生成 robots.txt
[ISAPI_Rewrite]
RewriteRule /robots\.txt /robots.asp
Making search engines to index dynamic pages
站点的内容存储在XML文件中,在服务器上有一个/XMLProcess.asp 文件处理XML文件并返回HTML到最终用户,URLS到文档有如下形式
http://www.mysite.com/XMLProcess.asp?xml=/somdir/somedoc.xml
但是许多公共引擎不能索引此类文档,因为URLS包含问号(文档动态生成),
ISAPI_Rewrite可以完全消除这个问题
[ISAPI_Rewrite]
RewriteRule /doc(.*)\.htm /XMLProcess.asp\?xml=$1.xml
现在使用如同http://www.mysite.com/doc/somedir/somedoc.htm的URL进入文档,搜索引擎将不知道不是somedoc.htm 文件并且内容是动态生成的
Negative expressions (NOT
有时当模式不匹配你需要应用规则,这种情况下你可以使用在规则表达式中称为Forward Lookahead Asserts
例如你需要不使用IE把所有用户移动到别的地点
[ISAPI_Rewrite]
# Redirect all non Internet Explorer users
# to another location
RewriteCond User-Agent: (?!.*MSIE).*
RewriteRule (.*) /nonie$1
Dynamic authentification
例如我们在站点上有一些成员域,我们在这个域上需要密码保护文件而我们不喜欢用BUILT-IN服务器安全,这个情况下可以建立一个ASP脚本(称为proxy.asp),这个脚本将代理所有请求到成员域并且检查请求允许,这里有一个简单的模板你可以放进你自己的授权代码
现在我们要通过配置 ISAPI_Rewrite 通过这个页面代理请求:
[ISAPI_Rewrite]
# Proxy all requests through proxy.asp
RewriteRule /members(.+) /proxy.asp\?http\://mysite.com/members$1
保护图片 防止盗链
Blocking inline-images (stop hot linking
假设我们在http://www.mysite.com/下有些页面调用一些GIF、jpg、png图片,不允许别人盗链引用到他们自己的页面上,因为这样大大增加了服务器流量。
当然我们不能100%保护图片,但我们至少可以在得到浏览器发出的HTTP Referer header的地方限制这种情况,因为这个可以判断是否我们自己的站点调用了我们自己的图片。
[ISAPI_Rewrite]
RewriteCond Host: (.+)
RewriteCond Referer: (?!http://\1.*).*
RewriteRule .*\.(?:gif|jpg|png) /block.gif [I,O]