Filtering and Using Regex in the Linux Shell


  1: The Unix Operating System — Filters & Regular Expressions
Selecting Fields with ‘cut’
• The cut command treats every single delimiter character as a field boundary
• Runs of several spaces therefore yield empty fields and shift the field numbers
Example: Try to print only file size and name
$ ls -l gnasl
-rw-r--r-- 1 hugo staff 2894 Feb 12 14:14 gnasl
$ ls -l gnasl | cut -d ' ' -f 5,9
staff 12
$ _
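One possible workaround, sketched here on the assumption that squeezing repeated spaces is acceptable for the data, is to pass the input through tr -s before cut. The padded sample line and the function name size_and_name are made up for illustration.

```shell
# Hypothetical ls -l line with column padding, as many ls versions print it.
line='-rw-r--r--   1 hugo  staff   2894 Feb 12 14:14 gnasl'

# Squeeze runs of spaces into one space, then let cut pick fields 5 and 9.
size_and_name() {
    tr -s ' ' | cut -d ' ' -f 5,9
}

printf '%s\n' "$line" | size_and_name
```

This still breaks for file names that contain spaces, because cut only counts delimiters.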
The ‘awk’ Filter
• Strictly speaking, not just a filter but a programming language
• Without knowing the language, it’s still useful for some tasks
Example: Select fields from ls -l output with awk
$ ls -l gnasl | awk '{ print $5, $9 }'
2894 gnasl
$ ls -l gnasl | awk '{ print $5, "\t", $9 }'
2894         gnasl
$ _
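As a small illustration of awk being more than a field selector, the sketch below prints size and name and then adds up the sizes. The listing function stands in for real ls -l output, and the name total_size is hypothetical.

```shell
# Hypothetical ls -l lines with padded columns.
listing() {
    printf '%s\n' \
        '-rw-r--r--   1 harry staff   1315 Feb 14 11:05 annab' \
        '-rw-r--r--   1 hugo  staff   2894 Feb 12 14:14 gnasl'
}

# Print size and name; awk splits on runs of whitespace by default.
listing | awk '{ print $5, $9 }'

# Add up the sizes in field 5 and print the total at the end of input.
total_size() {
    awk '{ sum += $5 } END { print sum }'
}
listing | total_size
```

Unlike cut, awk needs no preprocessing here because its default field splitting ignores the padding.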

Regular Expressions
• Regular Expressions can be used for describing Text Patterns
• Example: ^g matches text lines starting with a lowercase “g”
• Dialects differ, depending on the tools used
Basic Operators
These are understood by most tools supporting regular
expressions:
\            activates or deactivates an operator's special meaning,
             example: "\\" produces a backslash
[AaBbCc]     matches one character from the set {A, a, B, b, C, c}
[a-z]        matches one character from a range, here between “a” and “z”
[^a-z]       matches one character that is not within the range
             specified here
.            matches any one character
*            matches zero or more occurrences of the preceding
             expression, example: " *" matches any number of
             space characters
^            matches the beginning of the current line
$            matches the end of the current line
\< and \>    match the beginning and the end of a word,
             example: "\<Hugo\>" matches “Hugo” as a whole word
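A few of these operators can be tried out with grep, as in the following sketch. The sample lines and function names are made up, and LC_ALL=C is set on the assumption that the range [a-z] should cover only ASCII lowercase letters.

```shell
# Hypothetical sample lines.
sample() {
    printf '%s\n' gnasl Hugo annab g2 'hugo staff'
}

# "^g": lines starting with a lowercase g.
starts_with_g() { sample | LC_ALL=C grep '^g'; }

# "^[a-z]*$": lines made up of lowercase letters only.
only_lowercase() { sample | LC_ALL=C grep '^[a-z]*$'; }

# "^.2$": exactly two characters, the second being "2".
any_char_then_2() { sample | LC_ALL=C grep '^.2$'; }

starts_with_g
only_lowercase
any_char_then_2
```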

Example: ‘ls’ Output
Display only symbolic links:
$ ls -l | grep "^l"
lrwxrwxrwx 1 hugo staff 17 Jul 26 2001 foo -> bar
lrwxrwxrwx 1 hugo staff 17 Sep 13 2001 x -> ../y
$ _
Example: Log File
Select only the entries from the 28th and 29th of March 2001 in
the Apache log file. Here’s the format from which we want to
get the information:
$ tail -1 access_log
myhost [28/Mar/2001:16:19:07 +0200] "GET /a.html"
$ _
This is the regular expression used for getting the entries:
$ grep "2[89]/Mar/2001.*/.*\.html" access_log
myhost [28/Mar/2001:16:19:07 +0200] "GET /a.html"
[...]
myhost [29/Mar/2001:17:00:12 +0200] "GET /b.html"
[...]
$ _
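To see which part of the expression does what, it can be run against a handful of made-up log lines. The function names and the extra entries (27 March, a .png request) are assumptions added for illustration.

```shell
# Hypothetical log entries in the format shown above.
log_sample() {
    printf '%s\n' \
        'myhost [27/Mar/2001:09:00:00 +0200] "GET /a.html"' \
        'myhost [28/Mar/2001:16:19:07 +0200] "GET /a.html"' \
        'myhost [29/Mar/2001:17:00:12 +0200] "GET /b.html"' \
        'myhost [29/Mar/2001:17:05:00 +0200] "GET /logo.png"'
}

# "2[89]" selects the day, ".*/.*\.html" requires a later path ending in .html.
html_hits() { log_sample | grep '2[89]/Mar/2001.*/.*\.html'; }

# Date part only: counts every request on the two days, whatever the file type.
all_hits_count() { log_sample | grep -c '2[89]/Mar/2001'; }

html_hits
all_hits_count
```

The backslash in "\.html" matters: an unescaped dot would match any character, not just a literal dot.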

The ‘sed’ Filter
• sed stands for Stream Editor
• It can be used to manipulate text in a data stream
• Like grep, sed can use regular expressions
• We concentrate on the substitute command here
• More than one expression can be specified using “-e”
Example: Evaluate a configuration file
$ cat config.conf
# Configuration file
set A b
set B c
$ grep -v "^ *#" config.conf | sed 's/^set *//' \
> | sed 's/  */=/'
A=b
B=c
$ grep -v "^ *#" config.conf \
> | sed -e 's/^set *//' -e 's/  */=/'
A=b
B=c
$ eval `grep -v "^ *#" config.conf \
> | sed -e 's/^set *//' -e 's/  */=/'`
$ echo $A
b
$ _
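The same idea can be wrapped in a small function, sketched below with a temporary file standing in for config.conf. The function name conf_to_assignments is hypothetical. Note the two spaces in the second expression: one literal space followed by " *", so that at least one space has to be present.

```shell
# Hypothetical helper: turn "set NAME value" lines into NAME=value lines.
conf_to_assignments() {
    grep -v '^ *#' "$1" | sed -e 's/^set *//' -e 's/  */=/'
}

# Temporary stand-in for config.conf.
conf=$(mktemp)
printf '%s\n' '# Configuration file' 'set A b' 'set B c' > "$conf"

conf_to_assignments "$conf"

# eval runs the generated assignments in the current shell.
eval "$(conf_to_assignments "$conf")"
echo "$A"

rm -f "$conf"
```

Since eval executes whatever the file produces, this is only reasonable for configuration files from a trusted source.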

More ‘sed’
• The substitute command can take options: global replace “g”
(replace every match, not only the first) and, in GNU sed,
ignore case “i”
• They get appended to the expression: 's/foo/bar/gi'
• What if the source or destination pattern contains slashes?
• Escape the slashes with backslashes (can be difficult if the
pattern is a variable’s content) or use a different separator;
any character except backslash and newline is allowed!
Example: Remove double slashes in path specs
$ echo /usr//local/bin:/home/herbert///data \
> | sed 's|//*|/|g'
/usr/local/bin:/home/herbert/data
$ _
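A different separator is particularly handy when the patterns come from variables, as in this sketch. The variable names, their values and the function name replace_prefix are made up for illustration.

```shell
# Hypothetical values; both contain slashes.
old_prefix=/usr/local
new_prefix=/opt

# With "|" as the separator, the slashes in the variables need no escaping.
replace_prefix() {
    sed "s|$old_prefix|$new_prefix|g"
}

echo /usr/local/bin:/usr/local/sbin | replace_prefix
```

The variables are still interpreted as regular expressions, so this assumes they contain neither the separator nor operators such as "." or "*".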
• We can also reference matches from the search pattern
• \( and \) address a subpattern in the search field
• \1 selects the first, \2 the second etc. in the replace field
Example:
$ echo "Hugo <hugo@hotmail.com>" \
> | sed "s/[^<]*<\([^@]*\)@\([^>]*\)>.*/\1 at \2/"
hugo at hotmail.com
$ _
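Subpatterns can also be emitted in a different order. The following sketch swaps "last name, first name" around; the input string and the function name swap_names are hypothetical.

```shell
# \1 holds everything before the comma, \2 everything after it.
swap_names() {
    sed 's/^\([^,]*\), *\(.*\)$/\2 \1/'
}

echo 'Mustermann, Hugo' | swap_names
```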

Extended Regular Expressions
• Some tools understand more than just the basic operators
• Such tools are e.g. perl and egrep
• Other tools may support them in escaped form: prefix the
operator with “\” to activate it
?             matches zero or one occurrence of the preceding pattern
+             matches one or more occurrences of the preceding pattern
{n}           matches exactly n occurrences of the preceding pattern
{n,m}         matches n to m occurrences of the preceding pattern
{n,}          matches at least n occurrences of the preceding pattern
text1|text2   matches text containing either text1 or text2
(text)        groups “text” into a unit for repetition operators
              (“*”, “+” etc.); the group can also be selected
              by “\1”, “\2” etc.
Example:
$ ls -l | egrep "hugo|harry"
-rw-r--r-- 1 harry staff 1315 Feb 14 11:05 annab
-rw-r--r-- 1 hugo staff 2894 Feb 12 14:14 gnasl
$ _
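The extended operators can be tried out in the same way. This sketch uses grep -E, which accepts the same extended syntax as egrep; the sample words and function names are made up.

```shell
# Hypothetical sample words.
ere_sample() {
    printf '%s\n' ac abc abbc abbbbc hugo harry herbert
}

# "b?": zero or one b between a and c.
zero_or_one_b() { ere_sample | grep -E '^ab?c$'; }

# "b{2,4}": two to four b's between a and c.
two_to_four_b() { ere_sample | grep -E '^ab{2,4}c$'; }

# "(hugo|harry)": either alternative, grouped so that ^ and $ apply to both.
hugo_or_harry() { ere_sample | grep -E '^(hugo|harry)$'; }

zero_or_one_b
two_to_four_b
hugo_or_harry
```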

A better ‘sed’ using ‘perl’
• The perl interpreter can be used like sed
• Advantage: no escaping of extended syntax necessary!
• Also: perl can work on more than one line!
• Syntax: perl -pe 's/source/destination/'
Example:
$ echo "Hugo <hugo@hotmail.com>" \
> | sed "s/[^<]*<\([^@]*\)@\([^>]*\)>.*/\1 at \2/"
hugo at hotmail.com
$ echo "Hugo <hugo@hotmail.com>" | perl \
> -pe "s/[^<]*<([^@]*)@([^>]*)>.*/\1 at \2/"
hugo at hotmail.com
Longer Example: Generate HTML from Inline Comments
The problem:
• It is always nice to keep module descriptions in one place
• So why not generate HTML from the program sources?
• Convention:
– Extract only comments starting with double hash
– Ignore other comments and program code
– Add tags for special elements (function, type, variable, ...)

Example source:
$ cat example.sh
#!/bin/sh
###############################################
#
## @function hugo
## print a friendly message to stdout.
##
## This function print a "hello world" to
## stdout. Quite nice.
#
###############################################
hugo () {
    echo "hello world"
}
#
## @function main program
##
## The main program calls hugo and exits.
#
hugo
$ _

Step 1: Discard unwanted lines
$ egrep "^ *##" example.sh | egrep -v "^ *###"
## @function hugo
## print a friendly message to stdout.
##
## This function print a "hello world" to
## stdout. Quite nice.
## @function main program
##
## The main program calls hugo and exits.
$ _
Step 2: Add HTML tags and remove hashes
$ egrep "^ *##" example.sh | egrep -v "^ *###" \
> | perl -pe 's/^ *## *$/<p>/; s/^ *## *//'
@function hugo
print a friendly message to stdout.
<p>
This function print a "hello world" to
stdout. Quite nice.
@function main program
<p>
The main program calls hugo and exits.
$ _

Step 3: Translate pseudo-tags
$ egrep "^ *##" example.sh | egrep -v "^ *###" \
> | perl -pe 's/^ *## *$/<p>/; s/^ *## *//' \
> -e '; s|\@function *(.*)|<h2>Function \1</h2>|'
<h2>Function hugo</h2>
print a friendly message to stdout.
<p>
This function print a "hello world" to
stdout. Quite nice.
<h2>Function main program</h2>
<p>
The main program calls hugo and exits.
$ _
Last Step: Make it an HTML File
$ ( echo "<html><head><title>Program Documentation</title></head>
> <body><h1>Program Documentation</h1>"
> egrep "^ *##" example.sh | egrep -v "^ *###" \
> | perl -pe 's/^ *## *$/<p>/; s/^ *## *//' \
> -e '; s|\@function *(.*)|<h2>Function \1</h2>|'
> echo "</body></html>")
<html><head><title>Program Documentation</title></head>
<body><h1>Program Documentation</h1>
[...]
</body></html>
$ _
