Awk笔记

Why use awk?

Awk is a great tool to handle log and other text file.

Basic syntax

Awk allows you to write two special routines that can be executed before any input is read and after all input is read. These are the procedures associated with the BEGIN and END rules, respectively.In other words, you can do some preprocessing before the main input loop is ever executed and you can do some postprocessing after the main input loop has terminated. The BEGIN and ENDprocedures are optional. like this:

$ awk "BEGIN {do some preprocessing} { do the work, worked on the input file} END {do some postprocessing}"

Some Constants

| Item | Description | | :——– | ——–| |FS | field separator. by default, its value is a single space| |OFS | output field separator | |NF |number of fields for the current input record| |RS |record separator| |ORS |output reacord sepatator| |FILENAME |the name of the current input file| CONVFMT : number-to-string conversions

default value is “%.6g” $ awk ‘BEGIN {str = (5.5 + 3.2) “ is a nice value”; print str; }’ 8.7 is a nice value
change its value to “%d” $ awk ‘BEGIN {CONVFMT=”%d”;str = (5.5 + 3.2) “ is a nice value”;print str; }’ 8 is a nice value |

NOTE: In POSIX awk, assigning a new value to FS has no effect on the current input line; it only affects the next input line.

Escape Sequences

| Sequence |Description | | :——– | ——– | |\a| Alert character, usually ASCII BEL character |\b| Backspace |\f| Formfeed |\n| Newline |\r| Carriage return |\t| Horizontal tab |\v| Vertical tab |\ ddd| Character represented as 1 to 3 digit octal value |\x| hex Character represented as hexadecimal value |\ c| Any literal character c (e.g., " for “ )

Relational Operators

| Operator |Description | | :——– | ——– | |< |Less than |>| Greater than |<=| Less than or equal to |>=| Greater than or equal to |==| Equal to |!=| Not equal to |~| Matches |!~| Does not match

Format Specifiers Used in printf

| Character | Description | | :——– | ——– | |c| ASCII character |d| Decimal integer |i| Decimal integer. (Added in POSIX) |e| Floating-point format ([-] d . precision e [+-] dd ) |E| Floating-point format ([-] d . precision E [+-] dd ) |f| Floating-point format ([-] ddd . precision ) |g| e or f conversion, whichever is shortest, with trailing zeros removed |G| E or f conversion, whichever is shortest, with trailing zeros removed |o| Unsigned octal value |s| String |x| Unsigned hexadecimal number. Uses a - f for 10 to 15 |X| Unsigned hexadecimal number. Uses A - F for 10 to 15 |% |Literal %

Awk’s Built-In String Functions

| Awk Function |Description | | :————– | ——– | |gsub ( r , s , t ) |Globally substitutes s for each match of the regular expression r in the string t . Returns the number of substitutions. If t is not supplied, defaults to $0 . |index ( s , t ) | Returns position of substring t in string s or zero if not present. | length ( s ) | Returns length of string s or length of $0 if no string is supplied. | match ( s , r ) | Returns either the position in s where the regular expression r begins, or 0 if no occurrences are found. Sets the values of RSTARTand RLENGTH . | split ( s , a , sep) | Parses string s into elements of array a using field separator sep ; returns number of elements. If sep is not supplied, FS is used. Array splitting works the same way as field splitting. | sprintf (“ fmt “,expr ) | Uses printf format specification for expr . | sub ( r , s , t ) | Substitutes s for first match of the regular expression r in the string t . Returns 1 if successful; 0 otherwise. If t is not supplied, defaults to $0 . | substr ( s , p , n) | Returns substring of string s at beginning position p up to a maximum length of n . If n is not supplied, the rest of the string from p is used. | tolower ( s ) | Translates all uppercase characters in string s to lowercase and returns the new string. | toupper ( s ) | Translates all lowercase characters in string s to uppercase and returns the new string.

Limitations

| Item | Limit | | :——– | ——– | |Number of fields per record |100 |Characters per input record | 3000 |Characters per output record |3000 |Characters per field | 1024 |Characters per printf string | 3000 |Characters in literal string |400 |Characters in character class |400 |Files open | 15 |Pipes open | 1

Examples

# filesum: list files and total size in bytes
# input: long listing produced by "ls -l"


#1 output column headers
BEGIN {print "BYTES", "\t", "FILE"}



#2 test for 9 fields; files begins with "-"
NF == 9 && /^-/ {
sum += $5 # accumulate size of file
++filenum # count number of files
print $5, "\t", $9
}


#3 test for 9 fields; directory begins with "d"
NF == 9 && /^d/ {
sum += $5
++filenum 
print $5, "\t", $9
}


#4 test for ls -lR line ./dir
$1 ~ /^\..*:$/{
print "\t" $0 # print <dir> and name
}


#5 once all is done
END {
# print total file size and number of files
print "Total: ", sum, "bytes (" filenum " files)"
}