넓고 얕은 지식을 지향하는 무식한 개발자 (An Ignorant Developer Aiming for Broad, Shallow Knowledge): [Repost] Regular Expression Options (Update: 20151027)

http://kiros33.blog.me/130179104643

Revision History
2015/10/27 15:29:32 - Cleaned up the formatting and added links in the regular expression explanations

Reference Page
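How to use regular expressions

Original: Regular Expression Options

I only rearranged the original article so that it is easier for me to read. The English is simple, so I skipped adding explanations.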

Regular Expression Options

Regular expression matches and substitutions have a whole set of options which you can toggle on by appending one or more of the i, g, e, o, m, s or x modifiers to the end of the operation. See Programming Perl, page 153, for more information. An example:

$string = 'Big Bad WOLF!';

print "There's a wolf in the closet!" if $string =~ /wolf/i;

# i is used for a case insensitive match

Options

i: Case insensitive match.

g: Global match (see below).

e: Evaluate right side of s/// as an expression.

o: Only compile variable patterns once (see below).

m: Treat string as multiple lines. ^ and $ will match at start and end of internal lines, as well as at beginning and end of whole string. Use \A and \Z to match beginning and end of whole string when this is turned on.

s: Treat string as a single line. "." will match any character at all, including newline.

x: Allow extra whitespace and comments in pattern.
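
The m, s, x and e modifiers are easier to see side by side. The following is a minimal sketch, not part of the original article; the sample strings and variable names are made up for illustration.

Code:

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# /m: ^ and $ also match at the start and end of each internal line.
my @line_starts = $text =~ /^(\w+)/mg;

# /s: "." matches a newline too, so the capture can span both lines.
my ($spanned) = $text =~ /first(.+)second/s;

# /x: whitespace and comments inside the pattern are ignored.
my ($hour, $minute) = '12:34' =~ /
    (\d{2})   # hour
    :         # separator
    (\d{2})   # minute
/x;

# /e: the right side of s/// is evaluated as Perl code.
(my $doubled = 'a1 b22') =~ s/(\d+)/$1 * 2/ge;

print "@line_starts\n";      # first second
print "$hour $minute\n";     # 12 34
print "$doubled\n";          # a2 b44
```

Without /m the first pattern would only find "first", and without /s the second pattern would not match at all, because "." would stop at the newline.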

Global Matches

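Adding the g modifier to the pattern causes the match to be global. Called in a scalar context (such as the condition of an if or while statement), each evaluation finds the next match and remembers where it ended, so a while loop steps through every match in the string.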

This will match all codons in a DNA sequence, printing them out on separate lines:

Code:

$sequence = 'GTTGCCTGAAATGGCGGAACCTTGAA';

while ( $sequence =~ /(.{3})/g ) {

print $1,"\n";

}

Output:

GTT

GCC

TGA

AAT

GGC

GGA

ACC

TTG

If you perform a global match in a list context (e.g. assign its result to an array), then you get a list of all the subpatterns that matched from left to right. This code fragment gets arrays of codons in three reading frames:

@frame1 = $sequence =~ /(.{3})/g;

@frame2 = substr($sequence,1) =~ /(.{3})/g;

@frame3 = substr($sequence,2) =~ /(.{3})/g;
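
As a self-contained sketch of the fragment above, the three frames can be produced by one small helper. The name codons_in_frame is hypothetical and does not appear in the original article.

Code:

```perl
use strict;
use warnings;

my $sequence = 'GTTGCCTGAAATGGCGGAACCTTGAA';

# Return the complete codons for a reading frame offset of 0, 1 or 2.
sub codons_in_frame {
    my ($seq, $offset) = @_;
    # In list context, /g returns every captured triplet at once.
    return substr($seq, $offset) =~ /(.{3})/g;
}

for my $offset (0 .. 2) {
    my @codons = codons_in_frame($sequence, $offset);
    print "frame ", $offset + 1, ": @codons\n";
}
```

Trailing bases that do not fill a whole codon are dropped, because .{3} needs exactly three characters to match.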

The pos function returns the offset just past the end of the most recent g match on a string, so the start of the match can be found by subtracting the length of the matched text.

Code:

#file:pos.pl

my $seq = "XXGGATCCXX";

if ( $seq =~ /(GGATCC)/gi ){

my $pos = pos($seq);

print "Our Sequence: $seq\n";

print '$pos = ', "1st position after the match: $pos\n";

print '$pos - length($1) = 1st position of the match: ',($pos-length($1)),"\n";

print '($pos - length($1))-1 = 1st position before the match: ',($pos-length($1)-1),"\n";

}

Output:

~]$ ./pos.pl

Our Sequence: XXGGATCCXX

$pos = 1st position after the match: 8

$pos - length($1) = 1st position of the match: 2

($pos - length($1))-1 = 1st position before the match: 1
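
The same arithmetic works inside a while loop to collect the start offset of every match. This is a minimal sketch with an invented test sequence; it is not taken from the original article.

Code:

```perl
use strict;
use warnings;

my $seq = 'GGATCCAAGGATCCTT';
my @starts;

# Each successful /g match in scalar context advances pos($seq).
while ($seq =~ /(GGATCC)/g) {
    my $end = pos($seq);               # offset just past the match
    push @starts, $end - length($1);   # offset where the match begins
}

print "matches start at: @starts\n";   # matches start at: 0 8
```

When the loop ends because the match fails, pos($seq) is reset, so a later g match on the same string starts again from the beginning.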

Variable Interpolation and the "o" Modifier

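If you use a variable inside a pattern template, as in /$pattern/, be aware that there is a small performance penalty each time Perl encounters a pattern it hasn't seen before. If $pattern doesn't change over the life of the program, you can use the o ("once") modifier to tell Perl that the variable won't change, so the pattern is compiled only the first time. Note that the pattern then stays fixed even if $pattern is changed later, and that Perl 5.6 and later already skip recompilation when the interpolated string is unchanged, so the speed gain on current versions is usually small: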

$codon = '.{3}';

@frame1 = $sequence =~ /($codon)/og;
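
A commonly used alternative to the o modifier is to precompile the pattern with the qr// operator. This swaps in a different technique from the one in the original article; the sketch below only assumes the same example sequence.

Code:

```perl
use strict;
use warnings;

my $sequence = 'GTTGCCTGAAATGGCGGAACCTTGAA';

# qr// compiles the pattern once and returns a reusable regex object.
my $codon = qr/.{3}/;

# The compiled object can be interpolated into several matches.
my @frame1 = $sequence =~ /($codon)/g;
my @frame2 = substr($sequence, 1) =~ /($codon)/g;

print scalar(@frame1), " codons in frame 1\n";
print scalar(@frame2), " codons in frame 2\n";
```

Unlike /o, a qr// object does not freeze the match operator itself: assigning a new qr// value to $codon takes effect the next time it is used.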

Testing Your Regular Expressions

To be sure that you are getting what you think you want, you can use the following "magic" Perl automatic match variables: $& (the text that matched), $` (the text before the match) and $' (the text after the match).

Code:

#file:matchTest.pl

if ("Hello there, neighbor" =~ /\s(\w+),/){

print "That actually matched '$&'.\n";

print "That was ($`) ($&) ($').\n";

}

Output:

That actually matched ' there,'.

That was (Hello) ( there,) ( neighbor).
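
The same three pieces can also be derived from the match offsets that Perl stores in the built-in @- and @+ arrays. The helper name match_parts is hypothetical; this is a sketch for checking a pattern, not the method of the original article.

Code:

```perl
use strict;
use warnings;

# Split a string into (before, matched, after) for a given pattern.
# Returns an empty list when the pattern does not match.
sub match_parts {
    my ($string, $pattern) = @_;
    return unless $string =~ $pattern;
    # $-[0] is the start offset of the whole match, $+[0] its end offset.
    my ($start, $end) = ($-[0], $+[0]);
    return (
        substr($string, 0, $start),
        substr($string, $start, $end - $start),
        substr($string, $end),
    );
}

my @parts = match_parts('Hello there, neighbor', qr/\s(\w+),/);
print "That was ($parts[0]) ($parts[1]) ($parts[2]).\n";
```

This prints the same line as matchTest.pl above, and it lets the check live in one reusable function.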

2015/10/26