=head1 NAME

perlreref - Perl Regular Expressions Reference

=head1 DESCRIPTION

This is a quick reference to Perl's regular expressions.
For full information see L<perlre> and L<perlop>, as well
as the L</"SEE ALSO"> section in this document.

=head2 OPERATORS

  =~ determines to which variable the regex is applied.
     In its absence, $_ is used.

        $var =~ /foo/;

  !~ determines to which variable the regex is applied,
     and negates the result of the match; it returns
     false if the match succeeds, and true if it fails.

        $var !~ /foo/;

  m/pattern/igmsoxc searches a string for a pattern match,
     applying the given options.

        i  case-Insensitive
        g  Global - all occurrences
        m  Multiline mode - ^ and $ match internal lines
        s  match as a Single line - . matches \n
        o  compile pattern Once
        x  eXtended legibility - free whitespace and comments
        c  don't reset pos on failed matches when using /g

     If 'pattern' is an empty string, the last I<successfully> matched
     regex is used. Delimiters other than '/' may be used for both this
     operator and the following ones.

  qr/pattern/imsox lets you store a regex in a variable,
     or pass one around. Modifiers are as for m// and are stored
     within the regex.

  s/pattern/replacement/igmsoxe substitutes matches of
     'pattern' with 'replacement'. Modifiers are as for m//,
     with one addition:

        e  Evaluate replacement as an expression

     'e' may be specified multiple times. 'replacement' is interpreted
     as a double-quoted string unless a single quote (') is the delimiter.

  ?pattern? is like m/pattern/ but matches only once. No alternate
     delimiters can be used. Must be reset with L<reset|perlfunc/reset>.

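As a rough illustration of the operators above, the following sketch
applies C<m//>, C<!~>, C<qr//> and C<s///> to a made-up string. The
variable names and sample data are assumptions for the example only.

```perl
use strict;
use warnings;

my $text = "Foo bar\nfoo baz";

# m// with /i and /g in list context returns every match
my @hits = $text =~ /foo/ig;                  # ("Foo", "foo")

# /m lets ^ match after the embedded newline as well
my $line_starts = () = $text =~ /^foo/mig;    # 2

# !~ is true when the match fails
my $no_qux = ($text !~ /qux/) ? 1 : 0;        # 1

# qr// stores a compiled regex, modifiers included
my $word  = qr/ba[rz]/i;
my @words = $text =~ /($word)/g;              # ("bar", "baz")

# s///e evaluates the replacement as an expression
(my $shout = $text) =~ s/(ba\w)/uc($1)/ge;    # "Foo BAR\nfoo BAZ"
```

Copying into C<$shout> before substituting leaves C<$text> unchanged.
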
=head2 SYNTAX

   \       Escapes the character immediately following it
   .       Matches any single character except a newline (unless /s is used)
   ^       Matches at the beginning of the string (or line, if /m is used)
   $       Matches at the end of the string (or line, if /m is used)
   *       Matches the preceding element 0 or more times
   +       Matches the preceding element 1 or more times
   ?       Matches the preceding element 0 or 1 times
   {...}   Specifies a range of occurrences for the element preceding it
   [...]   Matches any one of the characters contained within the brackets
   (...)   Groups subexpressions for capturing to $1, $2...
   (?:...) Groups subexpressions without capturing (cluster)
   |       Matches either the subexpression preceding or following it
   \1, \2 ...  The text from the Nth group

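A minimal sketch of how grouping, alternation and backreferences from
the table above combine. The sample strings and variable names are
invented for illustration.

```perl
use strict;
use warnings;

my $date = "2004-07-15";

# (...) captures into $1, $2, $3; {n} fixes the repeat count
my ($y, $m, $d) = $date =~ /^(\d{4})-(\d{2})-(\d{2})$/;

# (?:...) groups the alternation without creating a capture
my ($animal) = "two dogs" =~ /(?:one|two) (cat|dog)s?/;    # "dog"

# \1 refers back to the text matched by the first group
my ($doubled) = "this is is a test" =~ /\b(\w+) \1\b/;     # "is"
```

In list context a successful match returns the captured groups, which is
why each result can be assigned directly.
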
=head2 ESCAPE SEQUENCES

These work as in normal strings.

   \a       Alarm (beep)
   \e       Escape
   \f       Formfeed
   \n       Newline
   \r       Carriage return
   \t       Tab
   \037     Any octal ASCII value
   \x7f     Any hexadecimal ASCII value
   \x{263a} A wide hexadecimal value
   \cx      Control-x
   \N{name} A named character

   \l  Lowercase next character
   \u  Titlecase next character
   \L  Lowercase until \E
   \U  Uppercase until \E
   \Q  Disable pattern metacharacters until \E
   \E  End case modification or metacharacter quoting

For Titlecase, see L</Titlecase>.

This one works differently from normal strings:

   \b  An assertion, not backspace, except in a character class

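The case-modifying and quoting escapes are easiest to see with
interpolated variables. The sketch below is illustrative only; the
variable names and strings are assumptions.

```perl
use strict;
use warnings;

my $name = "pERL";

# \u titlecases the next character, \L lowercases until \E
my $nice = "\u\L$name\E rocks";                         # "Perl rocks"

# \Q...\E quotes metacharacters in an interpolated variable
my $literal = "a.b";
my $plain   = ("axb" =~ /^$literal$/)     ? 1 : 0;      # 1: . matches any char
my $quoted  = ("axb" =~ /^\Q$literal\E$/) ? 1 : 0;      # 0: . is literal
my $exact   = ("a.b" =~ /^\Q$literal\E$/) ? 1 : 0;      # 1

# \t and \x{...} behave as they do in double-quoted strings
my $tabbed = ("a\tb" =~ /a\x{09}b/) ? 1 : 0;            # 1
```
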
=head2 CHARACTER CLASSES

   [amy]    Match 'a', 'm' or 'y'
   [f-j]    Dash specifies "range"
   [f-j-]   Dash escaped or at start or end means 'dash'
   [^f-j]   Caret indicates "match any character _except_ these"

The following sequences work inside or outside a character class.
The first six are locale aware; all are Unicode aware. The default
character class equivalents are given. See L<perllocale> and
L<perlunicode> for details.

   \d      A digit                     [0-9]
   \D      A nondigit                  [^0-9]
   \w      A word character            [a-zA-Z0-9_]
   \W      A non-word character        [^a-zA-Z0-9_]
   \s      A whitespace character      [ \t\n\r\f]
   \S      A non-whitespace character  [^ \t\n\r\f]

   \C      Match a byte (with Unicode, '.' matches a character)
   \pP     Match P-named (Unicode) property
   \p{...} Match Unicode property with long name
   \PP     Match non-P
   \P{...} Match lack of Unicode property with long name
   \X      Match extended Unicode sequence

POSIX character classes and their Unicode and Perl equivalents:

   alnum   IsAlnum                   Alphanumeric
   alpha   IsAlpha                   Alphabetic
   ascii   IsASCII                   Any ASCII char
   blank   IsSpace      [ \t]        Horizontal whitespace (GNU extension)
   cntrl   IsCntrl                   Control characters
   digit   IsDigit      \d           Digits
   graph   IsGraph                   Alphanumeric and punctuation
   lower   IsLower                   Lowercase chars (locale and Unicode aware)
   print   IsPrint                   Alphanumeric, punct, and space
   punct   IsPunct                   Punctuation
   space   IsSpace      [\s\ck]      Whitespace
           IsSpacePerl  \s           Perl's whitespace definition
   upper   IsUpper                   Uppercase chars (locale and Unicode aware)
   word    IsWord       \w           Alphanumeric plus _ (Perl extension)
   xdigit  IsXDigit     [0-9A-Fa-f]  Hexadecimal digit

Within a character class:

    POSIX       traditional   Unicode
    [:digit:]       \d        \p{IsDigit}
    [:^digit:]      \D        \P{IsDigit}

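A short sketch of bracketed classes, POSIX classes and Unicode
properties in use. The input strings are made up for the example.

```perl
use strict;
use warnings;

# POSIX classes are written inside a bracketed class
my @hex = "0x1F, 0xzz" =~ /0x([[:xdigit:]]+)/g;      # ("1F")

# A range plus a trailing dash, which is taken literally
my ($slug) = "my-post_title!" =~ /^([a-z-]+)/;       # "my-post"

# A negated class: everything up to the first '='
my ($key) = "key=value" =~ /^([^=]+)/;               # "key"

# \p{...} matches by Unicode property
my $upper = () = "Perl Regex" =~ /\p{IsUpper}/g;     # 2
```
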
=head2 ANCHORS

All are zero-width assertions.

   ^  Match string start (or line, if /m is used)
   $  Match string end (or line, if /m is used) or before newline
   \b Match word boundary (between \w and \W)
   \B Match except at word boundary (between \w and \w or \W and \W)
   \A Match string start (regardless of /m)
   \Z Match string end (before optional newline)
   \z Match absolute string end
   \G Match where previous m//g left off

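The differences between the end-of-string anchors, and the use of
C<\G>, can be sketched as follows. The strings and the tiny tokenizer
loop are assumptions made for illustration.

```perl
use strict;
use warnings;

my $s = "one\ntwo\n";

my $dollar = ($s =~ /two$/)   ? 1 : 0;   # 1: $ matches before a final newline
my $big_z  = ($s =~ /two\Z/)  ? 1 : 0;   # 1: so does \Z
my $z      = ($s =~ /two\z/)  ? 1 : 0;   # 0: \z is the absolute end
my $a_m    = ($s =~ /\Atwo/m) ? 1 : 0;   # 0: \A ignores /m
my $caret  = ($s =~ /^two/m)  ? 1 : 0;   # 1: ^ honours /m

# \G with /gc continues where the previous match left off
my $input = "12ab";
my @tokens;
while (1) {
    if    ($input =~ /\G(\d)/gc)    { push @tokens, "digit:$1" }
    elsif ($input =~ /\G([a-z])/gc) { push @tokens, "letter:$1" }
    else                            { last }
}
```

The C</c> modifier matters in the loop: without it, a failed C<\d> match
would reset C<pos> and the letter branch would start over from the
beginning of the string.
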
=head2 QUANTIFIERS

Quantifiers are greedy by default -- match the B<longest> leftmost.

   Maximal Minimal Allowed range
   ------- ------- -------------
   {n,m}   {n,m}?  Must occur at least n times but no more than m times
   {n,}    {n,}?   Must occur at least n times
   {n}     {n}?    Must occur exactly n times
   *       *?      0 or more times (same as {0,})
   +       +?      1 or more times (same as {1,})
   ?       ??      0 or 1 time (same as {0,1})

There is no quantifier {,n} -- that gets understood as a literal string.

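To make the maximal/minimal distinction concrete, here is a small
sketch that runs both forms against the same made-up input.

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

# Greedy: .+ runs to the last '>' in the string
my ($greedy) = $html =~ /<(.+)>/;     # "b>bold</b> and <i>italic</i"

# Minimal: .+? stops at the first '>' that lets the match succeed
my ($lazy) = $html =~ /<(.+?)>/;      # "b"

# {n,m} bounds the repeat count in the same two flavours
my ($run)   = "aaaaa" =~ /(a{2,3})/;    # "aaa"
my ($short) = "aaaaa" =~ /(a{2,3}?)/;   # "aa"
```
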
=head2 EXTENDED CONSTRUCTS

   (?#text)         A comment
   (?imxs-imsx:...) Enable/disable option (as per m// modifiers)
   (?=...)          Zero-width positive lookahead assertion
   (?!...)          Zero-width negative lookahead assertion
   (?<=...)         Zero-width positive lookbehind assertion
   (?<!...)         Zero-width negative lookbehind assertion
   (?>...)          Grab what we can, prohibit backtracking
   (?{ code })      Embedded code, return value becomes $^R
   (??{ code })     Dynamic regex, return value used as regex
   (?(cond)yes|no)  cond being integer corresponding to capturing parens
   (?(cond)yes)        or a lookaround/eval zero-width assertion

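A few of these constructs in action, as a hedged sketch. The inputs
and variable names are invented, and the comma-insertion pattern is
just one way to write it.

```perl
use strict;
use warnings;

# Lookbehind and lookahead match positions without consuming text
(my $n = "1234567") =~ s/(?<=\d)(?=(?:\d{3})+$)/,/g;    # "1,234,567"

# (?!...) negative lookahead: "foo" not followed by "bar"
my $str = "foobar foobaz";
my @starts;
push @starts, $-[0] while $str =~ /foo(?!bar)/g;        # (7)

# (?i:...) turns on case-insensitivity for part of the pattern
my $partial = ("HELLO world" =~ /(?i:hello) world/) ? 1 : 0;   # 1
my $strict  = ("HELLO WORLD" =~ /(?i:hello) world/) ? 1 : 0;   # 0

# (?(1)yes): require a closing paren only if group 1 matched
my $cond = qr/^(\()?\w+(?(1)\))$/;
my @ok = map { /$cond/ ? 1 : 0 } "(abc)", "abc", "(abc";       # (1, 1, 0)
```
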
=head2 VARIABLES

   $_    Default variable for operators to use
   $*    Enable multiline matching (deprecated; not in 5.9.0 or later)

   $&    Entire matched string
   $`    Everything prior to the matched string
   $'    Everything after the matched string

The use of those last three will slow down B<all> regex use
within your program. Consult L<perlvar> for C<@LAST_MATCH_START>
to see equivalent expressions that won't cause a slowdown.
See also L<Devel::SawAmpersand>.

   $1, $2 ...  hold the Nth captured expr
   $+    Last parenthesized pattern match
   $^N   Holds the most recently closed capture
   $^R   Holds the result of the last (?{...}) expr
   @-    Offsets of starts of groups. $-[0] holds start of whole match
   @+    Offsets of ends of groups. $+[0] holds end of whole match

Captured groups are numbered according to their I<opening> paren.

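The C<substr> equivalents mentioned above can be sketched with C<@->
and C<@+>. The sample string and variable names are assumptions for
the example.

```perl
use strict;
use warnings;

my $str = "The quick brown fox";
my ($pre, $match, $post, $first, $last);

if ($str =~ /(qu\w+) (br\w+)/) {
    # substr with @- and @+ stands in for $`, $& and $'
    $pre   = substr($str, 0, $-[0]);                # "The "
    $match = substr($str, $-[0], $+[0] - $-[0]);    # "quick brown"
    $post  = substr($str, $+[0]);                   # " fox"

    $first = $1;    # "quick"
    $last  = $+;    # "brown", the last parenthesized match
}
```
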
=head2 FUNCTIONS

   lc          Lowercase a string
   lcfirst     Lowercase first char of a string
   uc          Uppercase a string
   ucfirst     Titlecase first char of a string

   pos         Return or set current match position
   quotemeta   Quote metacharacters
   reset       Reset ?pattern? status
   study       Analyze string for optimizing matching

   split       Use regex to split a string into parts

The first four of these are like the escape sequences C<\L>, C<\l>,
C<\U>, and C<\u>.  For Titlecase, see L</Titlecase>.

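A brief sketch of several of these functions working alongside a
regex. All strings and names are illustrative assumptions.

```perl
use strict;
use warnings;

# split with a regex separator
my @fields = split /\s*,\s*/, "a , b,c";    # ("a", "b", "c")

# quotemeta escapes non-word characters, like \Q...\E
my $safe = quotemeta("1+1=2");              # 1\+1\=2

# pos reports where the last m//g match on the string ended
my $s = "aXbXc";
$s =~ /X/g;
my $after_first = pos($s);                  # 2

# ucfirst and lc mirror \u and \L
my $name = ucfirst(lc("pERL"));             # "Perl"
```
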
=head2 TERMINOLOGY

=head3 Titlecase

A Unicode concept that is most often equal to uppercase, but differs
for certain characters such as the German "sharp s".

=head1 AUTHOR

Iain Truskett.

This document may be distributed under the same terms as Perl itself.

=head1 SEE ALSO

=over 4

=item *

L<perlretut> for a tutorial on regular expressions.

=item *

L<perlrequick> for a rapid tutorial.

=item *

L<perlre> for more details.

=item *

L<perlvar> for details on the variables.

=item *

L<perlop> for details on the operators.

=item *

L<perlfunc> for details on the functions.

=item *

L<perlfaq6> for FAQs on regular expressions.

=item *

The L<re> module to alter behaviour and aid
debugging.

=item *

L<perldebug/"Debugging regular expressions">

=item *

L<perluniintro>, L<perlunicode>, L<charnames> and L<locale>
for details on regexes and internationalisation.

=item *

I<Mastering Regular Expressions> by Jeffrey Friedl
(F<http://regex.info/>) for a thorough grounding and
reference on the topic.

=back

=head1 THANKS

David P.C. Wollmann,
Richard Soderberg,
Sean M. Burke,
Tom Christiansen,
Jim Cromie,
and
Jeffrey Goff
for useful advice.

=cut