Pi-hole regular expressions tutorial
We provide a short but thorough introduction to our regular expressions implementation. This may come in handy if you are designing rules to deny or allow domains (see also our cheat sheet below!). In our implementation, all characters match themselves except for the following special characters: .[]{}()\*+?|^$. To match one of these literally, escape it with a backslash, e.g. \. for a literal period. The one exception to this rule is inside bracket expressions (see character groups below for further details).
Anchors (^ and $)
First of all, we look at anchors that can be used to indicate the start or the end of a domain, respectively. If you don't specify anchors, the match may be partial (see examples below).
| Example | Interpretation | 
|---|---|
| domain | partial match. Without anchors, a text may appear anywhere in the domain. This matches some.domain.com, domain.com, verylongdomain.com and more |
| ^localhost$ | exact match, matching only localhost but neither a.localhost nor localhost.com |
| ^abc | matches any domain starting (^) with "abc" like abcdomain.com or abc.domain.com, but not def.abc.com |
| com$ | matches any domain ending ($) in "com" such as domain.com, but not domain.com.co.uk |
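The anchor examples above can be tried out with a small script. This is a minimal sketch that uses Python's re module as a stand-in; it is not the regex engine FTL uses, and the helper name matches is our own invention for illustration. re.search is used because, like the partial matching described above, it accepts a match anywhere in the string unless anchors say otherwise.

```python
import re

def matches(pattern: str, domain: str) -> bool:
    """Return True if the pattern matches anywhere in the domain."""
    # re.search scans the whole string, so only ^ and $ pin the match down.
    return re.search(pattern, domain, re.IGNORECASE) is not None

# Without anchors, the text may appear anywhere in the domain.
print(matches(r"domain", "some.domain.com"))     # True
print(matches(r"domain", "verylongdomain.com"))  # True

# Both anchors together: exact match only.
print(matches(r"^localhost$", "localhost"))      # True
print(matches(r"^localhost$", "a.localhost"))    # False
print(matches(r"^localhost$", "localhost.com"))  # False

# Start anchor and end anchor on their own.
print(matches(r"^abc", "abc.domain.com"))        # True
print(matches(r"^abc", "def.abc.com"))           # False
print(matches(r"com$", "domain.com"))            # True
print(matches(r"com$", "domain.com.co.uk"))      # False
```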
Wildcard (.)
An unescaped period stands for any single character.
| Example | Interpretation | 
|---|---|
| ^domain.$ | matches domaina, domainb, domainc, but not domain |
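Forgetting to escape a period is an easy mistake to make, so it is worth seeing the difference side by side. The following is a sketch only, assuming Python's re module behaves like our implementation for these simple patterns; the helper matches and the test string googlexcom are made up for the demonstration.

```python
import re

def matches(pattern: str, domain: str) -> bool:
    """Return True if the pattern matches anywhere in the domain."""
    return re.search(pattern, domain, re.IGNORECASE) is not None

# The unescaped period requires exactly one (arbitrary) character.
print(matches(r"^domain.$", "domaina"))  # True
print(matches(r"^domain.$", "domain"))   # False

# An unescaped period also accepts characters other than a literal dot ...
print(matches(r"^google.com$", "googlexcom"))   # True
# ... whereas the escaped period only accepts a real dot.
print(matches(r"^google\.com$", "googlexcom"))  # False
print(matches(r"^google\.com$", "google.com"))  # True
```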
Bounds and multipliers ({}, *, +, and ?)
With bounds, one can denote the number of times something has to occur:
| Bound | Meaning | 
|---|---|
| ab{4} | matches a domain that contains a single a followed by four b (matching only abbbb) |
| ab{4,} | matches a domain that contains a single a followed by at least four b (matching also abbbbbbbb) |
| ab{3,5} | matches a domain that contains a single a followed by three to five b (matching only abbb, abbbb, and abbbbb) |
Multipliers are shortcuts for some of the bounds that are needed most often:
| Multipliers | Bounds equivalent | Meaning | 
|---|---|---|
| ? | {0,1} | zero times or once (optional) |
| * | {0,} | zero or more times (optional) |
| + | {1,} | once or more (mandatory) | 
To illustrate the usefulness of multipliers (and bounds), we provide a few examples:
| Example | Interpretation | 
|---|---|
| ^r-*movie | matches a domain like r------movie.com where the number of dashes can be arbitrary (also none) |
| ^r-?movie | matches only the domains rmovie.com and r-movie.com but not those with more than one dash |
| ^r-+movie | matches only the domains with at least one dash, i.e., not rmovie.com |
| ^a?b+ | matches domains like abbbb.com (zero or one a at the beginning followed by one or more b) |
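To see how the three multipliers differ, we can run them against the same list of domains. This is an illustrative sketch using Python's re module rather than FTL itself; the names DOMAINS and matching are hypothetical and exist only for this example.

```python
import re

DOMAINS = ["rmovie.com", "r-movie.com", "r------movie.com"]

def matching(pattern: str, domains: list) -> list:
    """Return the domains in which the pattern matches somewhere."""
    return [d for d in domains if re.search(pattern, d, re.IGNORECASE)]

print(matching(r"^r-*movie", DOMAINS))
# ['rmovie.com', 'r-movie.com', 'r------movie.com']
print(matching(r"^r-?movie", DOMAINS))
# ['rmovie.com', 'r-movie.com']
print(matching(r"^r-+movie", DOMAINS))
# ['r-movie.com', 'r------movie.com']

# A multiplier and its bound equivalent select the same domains.
print(matching(r"^r-{1,}movie", DOMAINS) == matching(r"^r-+movie", DOMAINS))  # True
print(matching(r"^r-{0,1}movie", DOMAINS) == matching(r"^r-?movie", DOMAINS))  # True
```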
Character groups ([])
With character groups, a set of characters can be matched:
| Character group | Interpretation | 
|---|---|
| [abc] | matches a, b, or c (using explicitly specified characters) |
| [a-c] | matches a, b, or c (using a range) |
| [a-c]+ | matches any non-zero number of a, b, c |
| [a-z] | matches any single lowercase letter | 
| [a-zA-Z] | matches any single letter | 
| [a-z0-9] | matches any single lowercase letter or any single digit | 
| [^a-z] | Negation matching any single character except lowercase letters | 
| abc[0-9]+ | matches the string abc followed by a number of arbitrary length |
Bracket expressions are an exception to the character escape rule. Inside them, all special characters, including the backslash (\), lose their special powers, i.e. they match themselves exactly. Furthermore, to include a literal ] in the list, make it the first character (like []] or [^]] if negated). To include a literal -, make it the first or last character, or the second endpoint of a range (e.g. [a-z-] to match a to z and -).
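The placement rules for ] and - can be checked with a few one-liners. Note the assumptions in this sketch: it uses Python's re module, which agrees with the rules above for ] and - but, unlike our implementation, still treats a backslash inside brackets as an escape character. The examples therefore avoid backslashes inside groups. The helper matches is our own name.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text, re.IGNORECASE) is not None

# A literal - placed last in the group.
print(matches(r"^[a-z-]+$", "my-domain"))  # True
print(matches(r"^[a-z-]+$", "my_domain"))  # False

# A literal ] placed first in the group (also after a negating ^).
print(matches(r"^[]]$", "]"))   # True
print(matches(r"^[^]]$", "]"))  # False
print(matches(r"^[^]]$", "a"))  # True

# Inside brackets, . * and + are plain characters, not operators.
print(matches(r"^[.*+]$", "*"))  # True
print(matches(r"^[.*+]$", "a"))  # False
```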
Groups (())
Using groups, we can enclose regular expressions. They are most powerful when combined with bounds or multipliers (see also alternations below).
| Example | Interpretation | 
|---|---|
| (abc) | matches abc (trivial example) |
| (abc)* | matches zero or more copies of abc like abcabc but not abcdefabc |
| (abc){1,3} | matches one, two or three copies of abc: abc, abcabc, abcabcabc but nothing else |
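Groups with bounds are easiest to verify when the expression is anchored, so that the whole string has to consist of the repeated group. The sketch below adds ^ and $ to the expressions from the table for exactly this reason (without anchors, partial matching applies as described in the anchors section). It assumes Python's re module as a stand-in for FTL, and matches is a hypothetical helper.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text, re.IGNORECASE) is not None

# One to three copies of the whole group abc.
print(matches(r"^(abc){1,3}$", "abc"))           # True
print(matches(r"^(abc){1,3}$", "abcabcabc"))     # True
print(matches(r"^(abc){1,3}$", "abcabcabcabc"))  # False (four copies)

# Zero or more copies; anything other than abc breaks the sequence.
print(matches(r"^(abc)*$", "abcabc"))     # True
print(matches(r"^(abc)*$", "abcdefabc"))  # False

# Without the parentheses, the multiplier applies to the last character only.
print(matches(r"^abc{1,3}$", "abccc"))   # True
print(matches(r"^abc{1,3}$", "abcabc"))  # False
```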
Alternations (|)
Alternations can be used as an "or" operator in regular expressions.
| Example | Interpretation | 
|---|---|
| (abc)|(def) | matches abc and def |
| domain(a|b)\.com | matches domaina.com and domainb.com but not domain.com or domainx.com |
| domain(a|b)*\.com | matches domain.com, domainaaaa.com, domainbbb.com and also mixed sequences such as domainab.com, but not domainx.com (any number of a or b in between domain and .com) |
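The alternation examples can be reproduced with a short script. As before, this is a sketch under the assumption that Python's re module behaves like our implementation for these patterns; matches is a made-up helper name.

```python
import re

def matches(pattern: str, domain: str) -> bool:
    """Return True if the pattern matches anywhere in the domain."""
    return re.search(pattern, domain, re.IGNORECASE) is not None

# Exactly one of the two alternatives is required.
print(matches(r"domain(a|b)\.com", "domaina.com"))  # True
print(matches(r"domain(a|b)\.com", "domainb.com"))  # True
print(matches(r"domain(a|b)\.com", "domain.com"))   # False
print(matches(r"domain(a|b)\.com", "domainx.com"))  # False

# With *, the alternation may occur any number of times, including never.
print(matches(r"domain(a|b)*\.com", "domain.com"))      # True
print(matches(r"domain(a|b)*\.com", "domainaaaa.com"))  # True
print(matches(r"domain(a|b)*\.com", "domainbbb.com"))   # True
print(matches(r"domain(a|b)*\.com", "domainx.com"))     # False
```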
Character classes ([:class:])
In addition to character groups, there are also some special character classes available, such as
| Character class | Equivalent | Pi-hole specific | Interpretation | 
|---|---|---|---|
| [[:digit:]] | [0-9] | No | digits | 
| [[:lower:]] | [a-z] | No | lowercase letters* | 
| [[:upper:]] | [A-Z] | No | uppercase letters* | 
| [[:alpha:]] | [A-Za-z] | No | alphabetic characters* | 
| [[:alnum:]] | [A-Za-z0-9] | No | alphabetic characters* and digits | 
| [[:blank:]] | [ \t] | Yes | blank characters | 
| [[:cntrl:]] | N/A | Yes | control characters | 
| [[:graph:]] | N/A | Yes | all printable characters except space | 
| [[:print:]] | N/A | Yes | printable characters including space | 
| [[:punct:]] | N/A | Yes | printable characters not space or alphanumeric | 
| [[:space:]] | [ \f\n\r\t\v] | Yes | white-space characters | 
| [[:xdigit:]] | [0-9a-fA-F] | Yes | hexadecimal digits | 
* FTL matches case-insensitively by default, as case does not matter in domain names.
Note that character classes are abbreviations and must be used inside character groups, i.e., enclosed in []. As such, the equivalent of [0-9] is [[:digit:]], not [:digit:]. Character classes may be mixed with classical character groups. For example, [a-z0-9] is identical to [a-z[:digit:]].
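The "abbreviation" idea can be made concrete by expanding class names into their equivalents from the table. The sketch below does this with plain string replacement so that the result can be tested with Python's re module, which does not understand [:class:] names itself. The names POSIX_CLASSES and expand_classes are hypothetical and not part of Pi-hole, only a subset of the classes is covered, and the replacement is deliberately naive (it does not check that a class name really sits inside a character group).

```python
import re

# Equivalents taken from the table above (subset only).
POSIX_CLASSES = {
    "[:digit:]": "0-9",
    "[:lower:]": "a-z",
    "[:upper:]": "A-Z",
    "[:alpha:]": "A-Za-z",
    "[:alnum:]": "A-Za-z0-9",
    "[:xdigit:]": "0-9a-fA-F",
}

def expand_classes(pattern: str) -> str:
    """Replace each known class name by its plain-range equivalent."""
    for name, equivalent in POSIX_CLASSES.items():
        pattern = pattern.replace(name, equivalent)
    return pattern

print(expand_classes("[[:digit:]]"))     # [0-9]
print(expand_classes("[a-z[:digit:]]"))  # [a-z0-9]

# The expanded expression can then be tried out as usual.
expanded = expand_classes("^abc[[:digit:]]+$")
print(expanded)                                    # ^abc[0-9]+$
print(re.search(expanded, "abc123") is not None)   # True
print(re.search(expanded, "abcdef") is not None)   # False
```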
Advanced examples
After going through our quick tutorial, we provide some more advanced examples so you can test your knowledge.
Block domain with only numbers
^[0-9][^a-z]+\.((com)|(edu))$
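Blocks domains containing only numbers (no letters) and ending in .com or .edu. This blocks 555661.com and 456.edu, but not 555g555.com. Strictly speaking, [^a-z] excludes only letters, so other non-letter characters such as - are accepted as well.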
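We can test this expression against the domains mentioned above. The script is a sketch that assumes Python's re module as a stand-in for FTL; the constant NUMBERS_ONLY and the helper is_blocked are names we made up. The last check illustrates that [^a-z] rejects letters only, so a dash between the digits still matches.

```python
import re

# The expression from above, compiled case-insensitively like FTL's default.
NUMBERS_ONLY = re.compile(r"^[0-9][^a-z]+\.((com)|(edu))$", re.IGNORECASE)

def is_blocked(domain: str) -> bool:
    """Return True if the expression matches the domain."""
    return NUMBERS_ONLY.search(domain) is not None

print(is_blocked("555661.com"))   # True
print(is_blocked("456.edu"))      # True
print(is_blocked("555g555.com"))  # False (contains a letter)
print(is_blocked("555661.org"))   # False (neither .com nor .edu)
print(is_blocked("5-5.com"))      # True (a dash is not a letter)
```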
Block domains without subdomains
^[a-z0-9]+([-]{1}[a-z0-9]+)*\.[a-z]{2,7}$
A domain name shall not start or end with a dash but can contain any number of them (this expression does not accept two dashes in a row). It must be followed by a TLD (we assume a valid TLD length of two to seven characters).
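The behaviour of this expression is easier to follow with a few sample domains. The following sketch again uses Python's re module instead of FTL; NO_SUBDOMAIN and is_blocked are hypothetical names, and the sample domains are invented for illustration.

```python
import re

# The expression from above, compiled case-insensitively like FTL's default.
NO_SUBDOMAIN = re.compile(
    r"^[a-z0-9]+([-]{1}[a-z0-9]+)*\.[a-z]{2,7}$", re.IGNORECASE
)

def is_blocked(domain: str) -> bool:
    """Return True if the expression matches the domain."""
    return NO_SUBDOMAIN.search(domain) is not None

print(is_blocked("example.com"))         # True
print(is_blocked("my-site.org"))         # True
print(is_blocked("sub.example.com"))     # False (has a subdomain)
print(is_blocked("-bad.com"))            # False (starts with a dash)
print(is_blocked("bad-.com"))            # False (ends with a dash)
print(is_blocked("example.technology"))  # False (TLD longer than seven characters)
```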
Cheatsheet
| Expression | Meaning | Example | 
|---|---|---|
| ^ | Beginning of string | ^client matches strings that begin with client, such as client.server.com but not more.client.server.com (exception: within a character range ([]) ^ means negation) |
| $ | End of string | ing$ matches exciting but not ingenious |
| * | Match zero or more of the previous | ah* matches ahhhhh or a |
| ? | Match zero or one of the previous | ah? matches a or ah |
| + | Match one or more of the previous | ah+ matches ah or ahhh but not a |
| . | Wildcard character, matches any character | do.* matches do, dog, door, dot, etc.; do.+ matches dog, door, dot, etc. but not do (wildcard with + requires at least one extra character for matching) |
| ( ) | Group | Encloses regular expressions; see the example for alternation in the next row |
| | | Alternation | (mon|tues)day matches monday or tuesday but not friday or mondiag |
| [ ] | Matches a range of characters | [cbf]ar matches car, bar, or far |
| [^] | Negation | [^0-9] matches any character except 0 to 9 |
| { } | Matches a specified number of occurrences of the previous | [0-9]{3} matches any three-digit number like 315 but not 31; [0-9]{2,4} matches two- to four-digit numbers like 12, 123, and 1234 but not 1 or 12345; [0-9]{2,} matches any number with two or more digits like 1234567, 123456789, but not 1 |
| \ | Used to escape a special character not inside [] | google\.com matches google.com |