Awk tips and tricks
This article is providing some tips with usage of awk. It’s easy to find examples related to printing fields but sometime task at hand is much complex and it’s difficult to find examples. Purpose of this article is to put together collection of few complex awk tasks.
awk array with multiple fields.
Example truncated input data
04:24:37.380368 W inode 28:1473638384 8 0.572 0 171714558 xx.xx:xxFC4EF cli xx.xx.254.13 Unknown Msg handler mnMsgFileSync
04:24:37.380397 W inode 27:971436648 8 0.600 0 174866765 xx.xx:xxFC4EB cli xx.xx.254.13 Unknown Msg handler mnMsgFileSync
04:24:52.373662 W inode 27:2489962208 8 0.417 0 399522524 xx.xx:xxFC4EB cli xx.xx.254.13 Unknown Msg handler mnMsgFileSync
04:24:53.968406 W inode 40:2714263576 8 0.584 0 400573187 xx.xx:xx.xx3 cli xx.xx.254.14 Unknown Msg handler mnMsgFileSync
04:24:56.991527 W inode 30:2209256552 8 9.198 0 3016843661 xx.xx:xxFC4EA cli xx.xx.254.13 Unknown Msg handler mnMsgFileSync
04:24:58.584002 W inode 39:2834510888 8 0.633 0 401623941 xx.xx:xx.xx0 cli xx.xx.254.14 Unknown Msg handler mnMsgFileSync
04:24:58.584995 W inode 40:2714264080 8 0.500 0 400573250 xx.xx:xx.xx3 cli xx.xx.254.14 Unknown Msg handler mnMsgFileSync
04:25:24.250620 W logData 28:994229720 8 0.554 0 0 xx.xx:xxFC4EF cli xx.xx.254.13 LogData SGExceptionLogBufferFullThread
04:25:27.347644 W inode 40:2897297112 8 0.408 0 6617288667 xx.xx:xx.xx3 cli xx.xx.254.14 MBHandler FsyncHandlerThread
04:25:28.049684 W inode 40:2897297104 8 0.516 0 6617288666 xx.xx:xx.xx3 cli xx.xx.254.14 MBHandler FsyncHandlerThread
04:25:45.288638 W logData 28:994229720 8 0.552 0 0 xx.xx:xxFC4EF cli xx.xx.254.13 LogData SGExceptionLogBufferFullThread
04:26:15.036732 W inode 40:2714263672 8 0.462 0 400573199 xx.xx:xx.xx3 cli xx.xx.254.14 Unknown Msg handler mnMsgFileSync
04:26:15.226701 W logData 28:994229720 8 0.547 0 0 xx.xx:xxFC4EF cli xx.xx.254.13 LogData SGExceptionLogBufferFullThread
04:26:27.407754 W inode 40:2897297112 8 0.465 0 6617288667 xx.xx:xx.xx3 cli xx.xx.254.14 MBHandler FsyncHandlerThread
04:26:28.110642 W inode 40:2897297104 8 0.473 0 6617288666 xx.xx:xx.xx3 cli xx.xx.254.14 MBHandler FsyncHandlerThread
Here first column is Timestamp, second column kind of operation R/W, 6th column is time taken for operation in milisecond and 11th column is IP address.
Requirement: Calculate total and avg amount of time taken for R/W operation for IPaddress. Showing count of operation is bonus.
output should look like
IPAddress R/W TotalTime(ms) Count AvgTime(ms)
--------------------------------------------
xx.xx.254.14 R 4 11 0.38
xx.xx.254.14 W 124 242 0.51
xx.xx.254.24 R 634 17 37.34
xx.xx.254.24 W 0 2 0.44
xx.xx.254.25 R 586 15 39.12
xx.xx.254.13 R 1 6 0.29
xx.xx.254.13 W 114 203 0.56
Assuming output is stored in test.txt file. Expression will become like below.
cat test.txt | grep ^[0-9] | awk '{if ($2 ~ /R|W/) {sum[$11][$2]+=$6;count[$11][$2]++}} END{printf "%s %4s %s %s %s\n","IPAddress","R/W","TotalTime(ms)","Count","AvgTime(ms)" ; for (i=1;i<45;i++) {printf "-"} {printf "\n"}; for (i in sum) for (j in sum[i]) printf "%s %s %8d %8d %8.2f\n", i,j, sum[i][j],count[i][j],sum[i][j]/count[i][j]}'
Explanation:
- grep is for safety to filter only lines starting with digit.
- As we are worried about only R/W operations hence check for same.
- Using two arrays; first one with ipaddress,operation type as key and totaltime taken as value; second one with same key but value will be count of entries.- Printing columns and separator.
- Printing arrays, we need to divide two arrays to get avg. as one array contains total time and second contain number of entries.
Create awk array from command output
Store command output in a variable
# export result=$(df -lPh | grep -v Size | awk '{print $1,$2}')
Print stored output
# echo "$result"
devtmpfs 32G
tmpfs 32G
tmpfs 32G
tmpfs 32G
/dev/mapper/vg_root-lv_root 50G
/dev/sda1 2.0G
/dev/sda2 100M
/dev/mapper/vg_root-lv_tmp 3.9G
/dev/sdb1 2.0G
/dev/sdb2 100M
/dev/mapper/vg_root-lv_var 20G
/var/lib/sss/db 72M
/dev/mapper/vg_root-lv_spare 834G
gpfs0 7.9P
tmpfs 6.3G
To convert this output to awk array
Print all filesystems
# awk -v input="$result" 'BEGIN {split(input,map,"\n") ; for (i in map) {$0 = map[i] ; dfmap[$1]=$2} {for (i in dfmap) {printf "%s %s\n", i,d
fmap[i]}}}'
gpfs0 7.9P
/dev/mapper/vg_root-lv_root 50G
/dev/sda1 2.0G
/dev/mapper/vg_root-lv_spare 834G
/dev/sdb1 2.0G
/dev/sda2 100M
/dev/sdb2 100M
tmpfs 32G
/var/lib/sss/db 72M
/dev/mapper/vg_root-lv_var 20G
/dev/mapper/vg_root-lv_tmp 3.9G
devtmpfs 32G
Print only one filesystem
# awk -v input="$result" 'BEGIN {split(input,map,"\n") ; for (i in map) {$0 = map[i] ; dfmap[$1]=$2} {for (i in dfmap) {if (i == "gpfs0") {print i,dfmap[i]}}}}'
gpfs0 7.9P
Explanation:
- Put output in awk variable input
- Split input based on newline character. map will contain output
- Iterate through map, store the map output in $0 which will be like output coming from stdin for second command. dfmap is a array
- Iterate through array print the output