Filtering and using regex in the Linux shell
1: The Unix Operating System - Filters & Regular Expressions

Selecting Fields with 'cut'

• The cut command expects exactly one delimiter between two fields
• Runs of whitespace confuse it, because every single space counts as a delimiter

Example: Try to print only file size and name

    $ ls -l gnasl
    -rw-r--r-- 1 hugo staff 2894 Feb 12 14:14 gnasl
    $ ls -l | cut -d ' ' -f 5,9
    staff 12
    $ _

The columns of a real listing are padded with several spaces, so fields 5 and 9 are not the size and the name.

The 'awk' Filter

• Strictly speaking, not just a filter but a programming language
• Without knowing the language, it's still useful for some tasks

Example: Select fields from ls -l output with awk

    $ ls -l gnasl | awk '{ print $5, $9 }'
    2894 gnasl
    $ ls -l gnasl | awk '{ print $5, "\t", $9 }'
    2894     gnasl
    $ _
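The difference between cut and awk can be reproduced without a real directory listing. The following is a minimal sketch: the padded sample line is an assumption standing in for actual ls -l output, and the variable names are made up for illustration. It also shows one common workaround for cut, squeezing repeated spaces with tr -s first.

```shell
# Assumed sample line; real `ls -l` output pads columns in a similar way.
line='-rw-r--r--   1 hugo     staff        2894 Feb 12 14:14 gnasl'

# cut counts every single space as a delimiter, so padding creates empty fields.
naive=$(printf '%s\n' "$line" | cut -d ' ' -f 5,9)

# Squeezing runs of spaces first leaves exactly one delimiter between fields.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 5,9)

# awk splits on runs of whitespace by default, so no preprocessing is needed.
with_awk=$(printf '%s\n' "$line" | awk '{ print $5, $9 }')

printf 'cut alone:       [%s]\n' "$naive"
printf 'cut after tr -s: [%s]\n' "$squeezed"
printf 'awk:             [%s]\n' "$with_awk"
```

Both the tr -s variant and awk yield "2894 gnasl" for this line, while plain cut returns the wrong fields.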
Regular Expressions

• Regular expressions can be used for describing text patterns
• Example: ^g matches text lines starting with a lowercase "g"
• Dialects differ, depending on the tools used

Basic Operators

These are understood by most tools supporting regular expressions:

    \             activates or deactivates an operator; example: "\\" produces a backslash
    [AaBbCc]      matches one character from the set {A, a, B, b, C, c}
    [a-z]         matches a range, here between "a" and "z"
    [^a-z]        matches one character that is not within the range specified here
    .             matches one character (any)
    *             matches zero to infinity occurrences of the preceding expression;
                  example: " *" matches any number of space characters
    ^             matches the beginning of the current line
    $             matches the current line's end
    \< and \>     match the beginning and the end of a word;
                  example: "\<Hugo\>" matches "Hugo" as a whole word
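A few of these operators can be tried out with grep on a small piece of text. This is a sketch only: the sample lines and variable names are invented for illustration, LC_ALL=C is set so that [a-z] means plain ASCII lowercase, and \< and \> are assumed to be supported by the installed grep (they are a common extension, as in GNU grep, rather than part of the POSIX basic syntax).

```shell
export LC_ALL=C   # make character ranges behave as plain ASCII ranges

# Hypothetical sample text.
sample='gnasl
Hugo was here
Hugos file
abc'

starts_with_g=$(printf '%s\n' "$sample" | grep '^g')          # ^ anchors at line start
ends_with_c=$(printf '%s\n' "$sample" | grep 'c$')            # $ anchors at line end
whole_word=$(printf '%s\n' "$sample" | grep '\<Hugo\>')       # Hugo as a whole word only
not_lowercase=$(printf '%s\n' "$sample" | grep -c '^[^a-z]')  # count lines not starting with a-z

printf '%s\n' "$starts_with_g" "$ends_with_c" "$whole_word" "$not_lowercase"
```

The line "Hugos file" is not selected by the whole-word pattern, because the "s" directly after "Hugo" means no word ends there.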
Example: 'ls' Output

Display only symbolic links:

    $ ls -l | grep "^l"
    lrwxrwxrwx 1 hugo staff 17 Jul 26 2001 foo -> bar
    lrwxrwxrwx 1 hugo staff 17 Sep 13 2001 x -> ../y
    $ _

Example: Log File

Select only the requests for .html pages made on the 28th and 29th of March 2001 from the Apache log file. Here's the format from which we want to get the information:

    $ tail -1 access_log
    myhost [28/Mar/2001:16:19:07 +0200] "GET /a.html"
    $ _

This is the regular expression used for getting the entries:

    $ grep "2[89]/Mar/2001.*/.*\.html" access_log
    myhost [28/Mar/2001:16:19:07 +0200] "GET /a.html"
    [...]
    myhost [29/Mar/2001:17:00:12 +0200] "GET /b.html"
    [...]
    $ _
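To see what each part of the log pattern contributes, it helps to run it against a few known lines. The sketch below assumes a tiny log file whose entries are invented in the format shown above, and it compares a date-only pattern with the full pattern that also requires an .html request.

```shell
# Build a small throwaway log; all entries are made-up sample data.
log=$(mktemp)
cat > "$log" <<'EOF'
myhost [27/Mar/2001:09:00:00 +0200] "GET /a.html"
myhost [28/Mar/2001:16:19:07 +0200] "GET /a.html"
myhost [28/Mar/2001:16:20:00 +0200] "GET /logo.png"
myhost [29/Mar/2001:17:00:12 +0200] "GET /b.html"
EOF

# Date only: 2[89] matches "28" or "29"; \[ is a literal bracket.
by_date=$(grep -c '\[2[89]/Mar/2001:' "$log")

# Date plus a later slash and a literal ".html" (\. disables the "any character" meaning).
html_only=$(grep -c '2[89]/Mar/2001.*/.*\.html' "$log")

rm -f "$log"
printf 'entries on the 28th/29th: %s, of which html requests: %s\n' "$by_date" "$html_only"
```

Single quotes are used here so that the shell passes the backslashes to grep unchanged.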
The 'sed' Filter

• sed stands for Stream Editor
• It can be used to manipulate text in a data stream
• Like grep, sed can use regular expressions
• We concentrate on the substitute command here
• More than one expression can be specified using "-e"

Example: Evaluate a configuration file

    $ cat config.conf
    # Configuration file
    set A b
    set B c
    $ grep -v "^ *#" config.conf | sed 's/^set  *//' \
    > | sed 's/  */=/'
    A=b
    B=c
    $ grep -v "^ *#" config.conf \
    > | sed -e 's/^set  *//' -e 's/  */=/'
    A=b
    B=c
    $ eval `grep -v "^ *#" config.conf \
    > | sed -e 's/^set  *//' -e 's/  */=/'`
    $ echo $A
    b
    $ _

Note the two spaces before the "*" in the sed patterns: "  *" means one or more spaces. A single " *" also matches the empty string at the start of the line, which would turn "A b" into "=A b".
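One detail worth checking when writing such substitutions is that " *" can match zero characters. The sketch below contrasts it with "  *" (space, space, star) and then evaluates the sample configuration. The temporary file and variable names are assumptions for illustration, and eval should only ever be used on input that is trusted.

```shell
# Recreate the sample configuration file in a temporary location.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# Configuration file
set A b
set B c
EOF

# " *" may match nothing, so the first (empty) match at the line start is replaced.
empty_match=$(printf 'A b\n' | sed 's/ */=/')

# "  *" requires at least one space, so the separator itself is replaced.
one_or_more=$(printf 'A b\n' | sed 's/  */=/')

# Drop comment lines, strip the "set" keyword, turn the separator into "=".
assignments=$(grep -v '^ *#' "$conf" | sed -e 's/^set  *//' -e 's/  */=/')
rm -f "$conf"

# Run the generated assignments in the current shell (trusted input only).
eval "$assignments"
printf 'A=%s B=%s\n' "$A" "$B"
```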
More 'sed'

• The substitute command can take options: ignore case "i" (a GNU sed extension) and global replace "g" (replace not only the first match)
• They get appended to the expression: 's/foo/bar/gi'
• What if the source or destination pattern contains slashes?
• Escape the slashes with backslashes (can be difficult if the pattern is a variable's content) or use a different separator; almost any character is allowed

Example: Remove double slashes in path specs

    $ echo /usr//local/bin:/home/herbert///data \
    > | sed 's|//*|/|g'
    /usr/local/bin:/home/herbert/data
    $ _

• We can also reference matches from the search pattern
• \( and \) address a subpattern in the search field
• \1 selects the first, \2 the second etc. in the replace field

Example:

    $ echo "Hugo <hugo@hotmail.com>" \
    > | sed "s/[^<]*<\([^@]*\)@\([^>]*\)>.*/\1 at \2/"
    hugo at hotmail.com
    $ _
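The alternative separator is most useful when the pattern comes from a variable that holds a path. Here is a small sketch with invented variable names and paths. It assumes the variables contain neither the separator "|" nor regular expression operators, since sed would interpret those. The second command shows \1 and \2 swapping two fields.

```shell
# Hypothetical prefixes and search path.
old_prefix=/usr/local
new_prefix=/opt/tools
path_in=/usr/local/bin:/usr/local/lib

# With "|" as the separator, the slashes in the variables need no escaping.
path_out=$(printf '%s\n' "$path_in" | sed "s|$old_prefix|$new_prefix|g")

# \( \) capture the text before and after the colon; \2:\1 swaps them.
swapped=$(printf 'staff:hugo\n' | sed 's/\([^:]*\):\(.*\)/\2:\1/')

printf '%s\n%s\n' "$path_out" "$swapped"
```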
Extended Regular Expressions

• Some tools understand more than just the basic operators
• Such tools are e.g. perl and egrep (equivalent to grep -E)
• Other tools may support them: use "\" to activate!

    ?             matches none or one occurrence of the preceding pattern
    +             matches one to infinity occurrences of the preceding pattern
    {n}           matches exactly n occurrences of the preceding pattern
    {n,m}         matches n to m occurrences of the preceding pattern
    {n,}          matches at least n occurrences of the preceding pattern
    text1|text2   matches text containing either text1 or text2
    (text)        bundles "text" to a unit for repetition operators ("*", "+" etc.),
                  and it can now be selected by "\1", "\2" etc.

Example:

    $ ls -l | egrep "hugo|harry"
    -rw-r--r-- 1 harry staff 1315 Feb 14 11:05 annab
    -rw-r--r-- 1 hugo staff 2894 Feb 12 14:14 gnasl
    $ _
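The "use \ to activate" rule can be seen by writing the same repetition once in extended and once in basic syntax. This is a sketch with invented sample lines. It uses grep -E as the extended-syntax tool and sets LC_ALL=C so that [0-9] is a plain ASCII range.

```shell
export LC_ALL=C

# Hypothetical listing-like lines.
lines='harry 11:05 annab
hugo 14:14 gnasl
root 9:5 tmp'

# Extended syntax: {2} works unescaped.
ere_count=$(printf '%s\n' "$lines" | grep -Ec '[0-9]{2}:[0-9]{2}')

# Basic syntax: the same interval has to be written as \{2\}.
bre_count=$(printf '%s\n' "$lines" | grep -c '[0-9]\{2\}:[0-9]\{2\}')

# Alternation inside a group, anchored at the line start.
alt=$(printf '%s\n' "$lines" | grep -E '^(hugo|root) ')

printf 'ERE: %s, BRE: %s\n%s\n' "$ere_count" "$bre_count" "$alt"
```

Both counts agree, because the two patterns describe the same two-digit time format in different dialects.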
A better 'sed' using 'perl'

• The perl interpreter can be used like sed
• Advantage: no escaping of extended syntax necessary!
• Also: perl can work on more than one line!
• Syntax: perl -pe 's/source/destination/'

Example:

    $ echo "Hugo <hugo@hotmail.com>" \
    > | sed "s/[^<]*<\([^@]*\)@\([^>]*\)>.*/\1 at \2/"
    hugo at hotmail.com
    $ echo "Hugo <hugo@hotmail.com>" | perl \
    > -pe "s/[^<]*<([^@]*)@([^>]*)>.*/\1 at \2/"
    hugo at hotmail.com

Longer Example: Generate HTML from Inline Comments

The problem:

• It is always nice to keep module descriptions in one place
• So why not generate HTML from the program sources?
• Convention:
  – Extract only comments starting with double hash
  – Ignore other comments and program code
  – Add tags for special elements (function, type, variable, ...)
Example source:

    $ cat example.sh
    #!/bin/sh
    ###############################################
    #
    ## @function hugo
    ## print a friendly message to stdout.
    ##
    ## This function print a "hello world" to
    ## stdout. Quite nice.
    #
    ###############################################
    hugo () {
      echo "hello world"
    }
    #
    ## @function main program
    ##
    ## The main program calls hugo and exits.
    #
    hugo
    $ _
Step 1: Discard unwanted lines

    $ egrep "^ *##" example.sh | egrep -v "^ *###"
    ## @function hugo
    ## print a friendly message to stdout.
    ##
    ## This function print a "hello world" to
    ## stdout. Quite nice.
    ## @function main program
    ##
    ## The main program calls hugo and exits.
    $ _

Step 2: Add HTML tags and remove hashes

    $ egrep "^ *##" example.sh | egrep -v "^ *###" \
    > | perl -pe 's/^ *## *$/<p>/; s/^ *## *//'
    @function hugo
    print a friendly message to stdout.
    <p>
    This function print a "hello world" to
    stdout. Quite nice.
    @function main program
    <p>
    The main program calls hugo and exits.
    $ _
Step 3: Translate pseudo-tags

The "@" is escaped as "\@" so that perl does not treat "@function" as an array to interpolate into the pattern.

    $ egrep "^ *##" example.sh | egrep -v "^ *###" \
    > | perl -pe 's/^ *## *$/<p>/; s/^ *## *//' \
    > -e '; s|\@function *(.*)|<h2>Function \1</h2>|'
    <h2>Function hugo</h2>
    print a friendly message to stdout.
    <p>
    This function print a "hello world" to
    stdout. Quite nice.
    <h2>Function main program</h2>
    <p>
    The main program calls hugo and exits.
    $ _

Last Step: Make it an HTML file

    $ ( echo "<html><head><title>Program Documentation</title></head>
    > <body><h1>Program Documentation</h1>"
    > egrep "^ *##" example.sh | egrep -v "^ *###" \
    > | perl -pe 's/^ *## *$/<p>/; s/^ *## *//' \
    > -e '; s|\@function *(.*)|<h2>Function \1</h2>|'
    > echo "</body></html>")
    <html><head><title>Program Documentation</title></head>
    <body><h1>Program Documentation</h1>
    [...]
    </body></html>
    $ _