2015/10/26

[Repost] Regular Expression Options (Update: 20151027)

http://kiros33.blog.me/130179104643


Revision History
2015/10/27 15:29:32 - Tidied up the formatting and added a link to the regular expression explanation

Reference Page
Regular Expression Usage

Original: Regular Expression Options


I have only reorganized the original post so that it is easier for me to read. The English is simple, so I'm skipping any explanation -_-;;;

Regular Expression Options

Regular expression matches and substitutions have a whole set of options. You turn them on by appending one or more of the i, m, s, g, e, o or x modifiers to the end of the operation. See Programming Perl, page 153, for more information. For example:


$string = 'Big Bad WOLF!';
print "There's a wolf in the closet!" if $string =~ /wolf/i;
# i is used for a case insensitive match

Options

Option  Meaning
i       Case-insensitive match.
g       Global match (see below).
e       Evaluate the right-hand side of s/// as an expression.
o       Compile patterns that contain variables only once (see below).
m       Treat the string as multiple lines. ^ and $ match at the start and end of each internal line, as well as at the beginning and end of the whole string. When this is on, use \A and \z (or \Z, which also allows a trailing newline) to match the beginning and end of the whole string.
s       Treat the string as a single line. "." matches any character at all, including newline.
x       Allow extra whitespace and comments in the pattern.
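The m, s, x and e modifiers are easiest to understand side by side. The following is a minimal sketch. The sample strings and variable names are made up for illustration and do not come from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative input (hypothetical): two lines of text.
my $text = "first line\nsecond line\n";

# m: ^ also matches right after each internal newline.
my @starts     = $text =~ /^(\w+)/mg;   # first word of every line
my @start_only = $text =~ /^(\w+)/g;    # without m: start of string only

# s: "." also matches "\n", so this match can cross the line break.
my ($across) = $text =~ /first(.+)second/s;

# x: whitespace and comments inside the pattern are ignored.
my ($word) = $text =~ /
    (second)   # the word we want
    \s         # followed by one whitespace character
/x;

# e: the replacement part of s/// is run as Perl code.
(my $doubled = "3 apples and 4 pears") =~ s/(\d+)/$1 * 2/ge;

print "m:    @starts\n";
print "no m: @start_only\n";
print "s:    [$across]\n";
print "x:    $word\n";
print "e:    $doubled\n";
```

Without the s modifier, the first(.+)second match would fail, because "." stops at the newline.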
 
Global Matches


Adding the g modifier to the pattern makes the match global. In scalar context (such as the condition of an if or while statement), each evaluation finds the next match, starting where the previous one ended. As a result, a while loop visits every match in turn.

This will match all codons in a DNA sequence, printing them out on separate lines:

Code:

$sequence = 'GTTGCCTGAAATGGCGGAACCTTGAA';
while ( $sequence =~ /(.{3})/g ) {
    print $1, "\n";
}

Output:

GTT
GCC
TGA
AAT
GGC
GGA
ACC
TTG

If you perform a global match in a list context (e.g. assign its result to an array), then you get a list of all the subpatterns that matched from left to right. This code fragment gets arrays of codons in three reading frames:


@frame1 = $sequence =~ /(.{3})/g;
@frame2 = substr($sequence,1) =~ /(.{3})/g;
@frame3 = substr($sequence,2) =~ /(.{3})/g;
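The fragment above can be wrapped into a small script to check the three reading frames. This sketch reuses the post's sequence. Printing the codon count for each frame is an addition for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same sequence as the while-loop example above.
my $sequence = 'GTTGCCTGAAATGGCGGAACCTTGAA';

# List-context //g returns every captured codon, left to right.
my @frame1 = $sequence =~ /(.{3})/g;
my @frame2 = substr($sequence, 1) =~ /(.{3})/g;
my @frame3 = substr($sequence, 2) =~ /(.{3})/g;

# Trailing bases that do not fill a whole codon are simply left unmatched.
for my $frame (\@frame1, \@frame2, \@frame3) {
    print scalar(@$frame), " codons: @$frame\n";
}
```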

The pos function returns the offset (counting from 0) just past the end of the most recent //g match on a variable.

Code:


#!/usr/bin/perl
# file: pos.pl
use strict;
use warnings;

my $seq = "XXGGATCCXX";

if ( $seq =~ /(GGATCC)/gi ) {
    # pos() is the 0-based offset just past the end of the match.
    my $pos = pos($seq);
    print "Our Sequence: $seq\n";
    print '$pos = ', "1st position after the match: $pos\n";
    print '$pos - length($1) = 1st position of the match: ', ($pos - length($1)), "\n";
    print '($pos - length($1))-1 = 1st position before the match: ', ($pos - length($1) - 1), "\n";
}

Output:


~]$ ./pos.pl
Our Sequence: XXGGATCCXX
$pos = 1st position after the match: 8
$pos - length($1) = 1st position of the match: 2
($pos - length($1))-1 = 1st position before the match: 1
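A //g match in scalar context resumes at pos(), so the same idea can report every occurrence of a site rather than just the first. This is a minimal sketch. The input string is invented for illustration and contains one upper-case and one lower-case GGATCC.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input with two GGATCC sites (the second in lower case).
my $seq = "XXGGATCCXXggatccX";
my @starts;

# Each pass through the loop resumes the search at pos($seq).
while ( $seq =~ /(GGATCC)/gi ) {
    my $end   = pos($seq);           # offset just past this match
    my $start = $end - length($1);   # offset of the first matched base
    push @starts, $start;
    print "site at $start, pos() = $end\n";
}
print "found ", scalar(@starts), " sites\n";
```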

Variable Interpolation and the "o" Modifier

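If you use a variable inside a pattern, as in /$pattern/, be aware that there is a small performance penalty each time Perl encounters a pattern it hasn't seen before. If $pattern doesn't change over the life of the program, you can use the o ("once") modifier to tell Perl that the variable won't change, which lets the program run faster. Note that with /o the pattern is compiled only the first time it is used, so any later change to the variable is ignored. Modern versions of Perl also avoid recompiling when the interpolated value has not changed, so the speed-up is usually small. For example: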


$codon = '.{3}';
@frame1 = $sequence =~ /($codon)/og;
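The flip side of /o is that the pattern is frozen after its first use. This small sketch makes the difference visible by counting matches with and without /o. The loop and the second pattern '.{2}' are hypothetical and were added only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sequence = 'GTTGCCTGAAATGGCGGAACCTTGAA';   # 26 bases
my (@with_o, @without_o);

for my $codon ('.{3}', '.{2}') {
    # With /o this match op is compiled the first time it runs,
    # so the second pass still uses '.{3}'.
    my @once = $sequence =~ /($codon)/og;
    # Without /o the new value of $codon is picked up.
    my @each = $sequence =~ /($codon)/g;
    push @with_o,    scalar @once;
    push @without_o, scalar @each;
}

print "with /o:    @with_o\n";
print "without /o: @without_o\n";
```

With /o, both passes report 8 three-base chunks. Without it, the second pass finds 13 two-base chunks.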

Testing Your Regular Expressions

To be sure that you are getting what you think you are, you can use Perl's "magic" automatic match variables: $& holds the text that matched, $` the text before the match, and $' the text after it.

Code: 

#file:matchTest.pl

if ("Hello there, neighbor" =~ /\s(\w+),/) {
    print "That actually matched '$&'.\n";
    print "That was ($`) ($&) ($').\n";
}

Output:


That actually matched ' there,'.
That was (Hello) ( there,) ( neighbor).
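The same three pieces can also be recovered from the match offsets stored in @- and @+. This matters mainly for older Perls: before Perl 5.20, using $&, $` or $' anywhere in a program slowed down every successful match. The following is a sketch using substr() on the same string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = "Hello there, neighbor";
my ($before, $match, $after);

if ($str =~ /\s(\w+),/) {
    # $-[0] and $+[0] are the start and end offsets of the whole match,
    # so substr() can rebuild the text before, at, and after the match.
    $before = substr($str, 0, $-[0]);
    $match  = substr($str, $-[0], $+[0] - $-[0]);
    $after  = substr($str, $+[0]);
    print "That was ($before) ($match) ($after).\n";
}
```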


Search terms: Regular Expression, RegEx, RegExp, 정규식, 정규표현식, 정규 표현식, Option, Options, 옵션
