153 167 2002 Processing Mechanism on Regular Expressions Yoshiyuki SAKAMOTO and Hiroyuki EDO awk NFA DFA DFANFA POSIX NFA DFA NFA NFA NFATcl, Perl, Python, GNU Emacs, ed, sed, vi, grep egrep, awk DFA egrep, awk lex, flex DFA POSIX NFA POSIX NFA 153
2002 awk Aho,Weinberger,Kernighan DFA awk Brian Kernighan DFA GNU awk Arnold Robbins DFA NFA MKS awk Mortice Kern Systems POSIX NFA mawk Mike Brennan POSIX NFA egrep Alfred Aho DFA MKS egrep Mortice Kern Systems POSIX NFA GNU Emacs Richard Stallman NFA(POSIX NFA ) Expect Don Libes NFA expr Dick Haight NFA grep Ken Thompson NFA GNU grep Mike Haertel 2.0 DFA NFA GNU find GNU NFA lex Mike Lesk DFA flex Vern Pazon DFA lex Mortice Kern Systems POSIX NFA more Eric Schienbrood NFA less Mark Nudelman (NFA) Perl Larry Wall NFA Python Guido van Rossum NFA sed Lee McMahon NFA Tcl John Ousterhout NFA vi Bill Joy NFA 2.1 Perl 2.2 cat The dragging belly indicates your cat is too fat cat indicates fat cat belly ourfat belly 154
The dragging belly indicates your cat is too fat 2.3 usa u s a \ w \ W \ d \ D \ s \ S 2.4? + {min, max} \ < \ w+s \ > regexes s \ w+ s \ w+ regexess \ > [0-9]+March_1998 998 Subject Subject ^Subject:_ ^Subject:_(. ) Perl $1 Subject.. $line Subject: Re: happy birthday Perl if ($line = ~ m/^subject: (. )/ ) if { print "The subject is: $1 \ n"; if } The subject is: Re: happy birthday ^(Re:_)?. ^(Re:_)? Re:_. ^(Re:_) Re: 155
2002 if ($line = ~ m/^subject: (Re: )?(. )/ ) if {print "the subject is: $2 \ n"; if } The subject is: happy birthday ^Subject: ( (Re:_)?. ) Re:_ ^Subject:_(. ).. ^. ([0-9][0-9]) about_24_characters_long $1 24 ^. ([0-9]+) [0-9]+ [0-9] [0-9]. NFAregex directeddfa text-directed 3.1 NFA to(nite knight night) tonight t to(nite knight night) to nite knight night (nite knight night) NFA NFA 3.2 DFA knight g h t NFA DFA NFA DFA t onight t o(nite knight night) 156
toni ght to(ni te knight night) 3.3 Nondeterministic Finite Automaton:NFA Deterministic Finite Automaton:DFA NFA tonight to(ni(ght te) knight) tonite toknight tonight to(k?night nite) DFA DFA DFA DFA NFA NFA 4.1 NFA hot_tonic_tonight! to(nite knight night) t h tonic to. nite n + i + t toni c knight tonight! 4.2 LIFO last in first out: 157
2002 4.3 abc abc a a bc a b?c a bc ab? c b ab c ab? c ac b a c ab? c abx a bx ab? c a bx ab? c 1 ab x abx 4.4? [0-9]+ a_1234_num a_1 234_num a_12 34_num a_123 4_num a_1234 _num a_ 1234_num ^. ([0-9][0-9]) CA_95472,_usa. 12 0-9 0-9 0-9 158
McDonald s 0-9 0-9 20-9 CA_95472,_USA $172 NFADFA DFA NFA NFA 5.1. The name "McDonald s" is said "makudonarudo" in Japanese The name "McDonald s" is said "mkudonarudo" in Japanese.. McDonald s. [^"] [^"] McDonald s [^"] The name "McDonald s" is said "makudonarudo" in Japanese 5.2 <B>...</B> <B> </B> <B>Billions</B>and<B>Ziillions</B> of suns <B>. </B> <B> </B> 5.3 159
2002 <B>. </B> <B>Billions</B> and <B>Zillions</B > of suns <B>Billions</B> and <B>Zillions</B > of suns Perl 5.4 1.625 300 1.625000000 02828 3.0000000002882 $price $price = ~ s/( \. \ d \ d[1-9]?)\ d /$1/ \. \ d \ [-9]? 1 \ \ d+ $price = ~ s /( \. \ d \ d[1-9]?)\ d+/$1/ 5.5 NFA ^(Subject Date):_ Subject :_ tour to tournament three_tournaments_ won tour NFA tournament POSIX NFA DFA 5.6 5.4( \.\ d \ d[1-9]?)\ d ( \.\ d \ d \.\ d \ d[1-9])\ d ( \.\ d \ d[1-9]?)\ d ) Jan 31 Jan_[0123][0-9] Jan_00 Jan_39 Jan_7 Jan_(0?[1-9] [12][0-9] 3[01]) 160
Jan 31 is my dad s birthday Jan 3 Jan_(31 [123]0 [012]? [1-9]) Jan_(0[1-9] [12][0-9]? 3[01]? [4-9]) 5.7 NFA 5.8 [abc] a b c NFA DFA NFA,DFA POSIX 6.1 Longest-Leftmost oneselfsufficient one(self)?(selfsufficient)? NFA onseselfsufficient 158 161
2002 DFA oneselfsufficient DFA SRC=array.c builtin.c eval.c field.c gaw kmisc.c io.c main.c \ Missing.c msg.c no de.c re.c version.c ( \ \ \ n. ) NFA. 6.2 POSIX POSIX POSOIX longest of the leftmost DFA DFA DFANFA 6.3 DFA NFA DFA NFA NFA NFA POSIX NFA NFA NFA NFADFA 6.4 DFA NFA 7.1 NFA 162
7.2 NFA^ \ w+=. ( \ \ \ n. ) SRC=array.c builtin.c eval.c field.c gaw kmisc.c io.c main.c \ missing.c msg.c no de.c re.c version.c ^ \ w+=[^ \ n \\] ( \\\ n[^ \ n \\] ) IP IP ^[0-9]+ \.[0-9]+ \.[0-9]+ \.[0-9]$ ^ \ d \ d \ d \.\ d \ d \ d \.\ d \ d \ d \.\ d \ d \ d \ $ \ d d \ d \ [01] \ d \ d 2[0-4] \ d 25[0-5] [01]? \ d \ d? 2[0-4] \ d 25[0-5] ^([01]? \ d \ d? 2[0-4] \ d 25[0-5]) \. ([01]? \ d \ d? 2[0-4] \ d 25[0-5]) \.([01 ]? \ d \ d? 2[0-4] \ d 25[0-5]) \.([01]? \ d \ d? 2[0-4] \ d 25[0-5])$ 7.3 ". " "[^"]. " \ (.\ \ ) \ ([^)] \ ) \ ([^( )] \ ) (this) foo 7.4 163
2002 -?([0-9]+(1 \.[0-9] )? \.[0-9]+) 7.5 / / HTML <CODE><...> <A_HREF="...">anchor_text</A> anchor_text HTML.mailrc alias jeff jfriedl@ora.com alias for your passport, you need a "2 \"x3 \ "likeness" of yourself \\" "([^"] \\") " DFA POSIX "( \ \?" [^"]) " "someone has?" forgotten?" the closing quote 164
"( \\" [^ \\"]) " DFA POSIX NFA NFA 7.6 7.7 /usr/local/bin/gcc gcc $filename $Wholepath= ~ m!([^/] )$!; #$path $FileName = $1; # /usr/local/bin/prel 40 Tcl $1 $2 if($wholepath = ~m!^(. )/(. )$!){ $LeadingPath = $1; $FileName = $2; }else{ $LeadingPath = "."; # file.txt "./file.txt" $FileName = $WholePath; } Tcl if [regexp -indices. /$WholePath Match] Perl Tcl Python $filename = ~ S!^. /!!; regsub"^. /" $filename " "filename filename = regsub.sub("^. /"," ", filename) 165
2002 { # set LeadingPath [string range $Whole Path 0 [expr [lindex $ Match 1] -1]] set File Name [string range $ Whole path [expr [Index $ Match 1]+1] end] } { # set LeadingPath. set FileName $Whole Path } rindex 8.1 NFADFA NFA POSIX NFA DFA POSIX POSIX DFA NFA NFA POSIX NFA NFA 8.2 NFA NFA NFA NFA DFA Jeffrey E.F.Friedl 1999 166
I 2000 II 2001 2001 167