Awk tips and tricks

bash   awk
By Vikrant
January 9, 2022

This article collects a few tips for using awk. Examples that print fields are easy to find, but real tasks are often more complex, and examples for those are harder to come by. The aim here is to put together a small collection of more complex awk tasks.

awk array with multiple fields.

Example truncated input data

04:24:37.380368  W       inode   28:1473638384       8    0.572         0    171714558  xx.xx:xxFC4EF cli    xx.xx.254.13 Unknown   Msg handler mnMsgFileSync
04:24:37.380397  W       inode   27:971436648        8    0.600         0    174866765  xx.xx:xxFC4EB cli    xx.xx.254.13 Unknown   Msg handler mnMsgFileSync
04:24:52.373662  W       inode   27:2489962208       8    0.417         0    399522524  xx.xx:xxFC4EB cli    xx.xx.254.13 Unknown   Msg handler mnMsgFileSync
04:24:53.968406  W       inode   40:2714263576       8    0.584         0    400573187  xx.xx:xx.xx3 cli    xx.xx.254.14 Unknown   Msg handler mnMsgFileSync
04:24:56.991527  W       inode   30:2209256552       8    9.198         0   3016843661  xx.xx:xxFC4EA cli    xx.xx.254.13 Unknown   Msg handler mnMsgFileSync
04:24:58.584002  W       inode   39:2834510888       8    0.633         0    401623941  xx.xx:xx.xx0 cli    xx.xx.254.14 Unknown   Msg handler mnMsgFileSync
04:24:58.584995  W       inode   40:2714264080       8    0.500         0    400573250  xx.xx:xx.xx3 cli    xx.xx.254.14 Unknown   Msg handler mnMsgFileSync
04:25:24.250620  W     logData   28:994229720        8    0.554         0            0  xx.xx:xxFC4EF cli    xx.xx.254.13 LogData   SGExceptionLogBufferFullThread
04:25:27.347644  W       inode   40:2897297112       8    0.408         0   6617288667  xx.xx:xx.xx3 cli    xx.xx.254.14 MBHandler FsyncHandlerThread   
04:25:28.049684  W       inode   40:2897297104       8    0.516         0   6617288666  xx.xx:xx.xx3 cli    xx.xx.254.14 MBHandler FsyncHandlerThread   
04:25:45.288638  W     logData   28:994229720        8    0.552         0            0  xx.xx:xxFC4EF cli    xx.xx.254.13 LogData   SGExceptionLogBufferFullThread
04:26:15.036732  W       inode   40:2714263672       8    0.462         0    400573199  xx.xx:xx.xx3 cli    xx.xx.254.14 Unknown   Msg handler mnMsgFileSync
04:26:15.226701  W     logData   28:994229720        8    0.547         0            0  xx.xx:xxFC4EF cli    xx.xx.254.13 LogData   SGExceptionLogBufferFullThread
04:26:27.407754  W       inode   40:2897297112       8    0.465         0   6617288667  xx.xx:xx.xx3 cli    xx.xx.254.14 MBHandler FsyncHandlerThread   
04:26:28.110642  W       inode   40:2897297104       8    0.473         0   6617288666  xx.xx:xx.xx3 cli    xx.xx.254.14 MBHandler FsyncHandlerThread   

Here the first column is the timestamp, the second column is the kind of operation (R/W), the 6th column is the time taken by the operation in milliseconds, and the 11th column is the IP address.
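Before writing the full command, it can help to confirm the field positions on a single row. The sketch below takes one row from the sample input and prints only the four fields described above; the variable names line and fields are illustrative.

```shell
# One row copied from the sample input above.
line='04:24:56.991527  W       inode   30:2209256552       8    9.198         0   3016843661  xx.xx:xxFC4EA cli    xx.xx.254.13 Unknown   Msg handler mnMsgFileSync'

# Print timestamp ($1), operation ($2), time in ms ($6) and IP address ($11).
fields=$(printf '%s\n' "$line" | awk '{print $1, $2, $6, $11}')
echo "$fields"
```

awk splits on runs of whitespace by default, so the uneven column padding in the input does not affect the field numbers.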

Requirement: Calculate the total and average time taken by R/W operations for each IP address. Showing the count of operations is a bonus.

The output should look like this:

IPAddress  R/W TotalTime(ms) Count AvgTime(ms)
--------------------------------------------
xx.xx.254.14 R        4       11     0.38
xx.xx.254.14 W      124      242     0.51
xx.xx.254.24 R      634       17    37.34
xx.xx.254.24 W        0        2     0.44
xx.xx.254.25 R      586       15    39.12
xx.xx.254.13 R        1        6     0.29
xx.xx.254.13 W      114      203     0.56

Assuming the input data is stored in a file named test.txt, the command looks like this:

grep '^[0-9]' test.txt | awk '$2 ~ /^[RW]$/ {sum[$11][$2] += $6; count[$11][$2]++} END {printf "%s %4s %s %s %s\n", "IPAddress", "R/W", "TotalTime(ms)", "Count", "AvgTime(ms)"; for (i = 1; i < 45; i++) printf "-"; printf "\n"; for (i in sum) for (j in sum[i]) printf "%s %s %8d %8d %8.2f\n", i, j, sum[i][j], count[i][j], sum[i][j] / count[i][j]}'

Explanation:

  • grep is a safety filter that keeps only lines starting with a digit.
  • Only R/W operations are of interest, so the awk condition checks the second field for them.
  • Two arrays are used. The first has the IP address and operation type as its key and the total time as its value; the second has the same key, with the count of entries as its value. The sum[$11][$2] syntax is a gawk array of arrays and needs gawk 4.0 or later.
  • The header columns and the separator line are printed first.
  • The arrays are then printed. Dividing the total time in the first array by the count in the second gives the average.
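The same aggregation can be sketched without gawk-specific arrays of arrays by joining the two fields into one key with SUBSEP, which POSIX awk supports. This is a minimal variant rather than the command used above: the names data, summary, key and part are illustrative, the three rows are copied from the sample input, and the output is piped through sort because for (key in sum) visits keys in no particular order.

```shell
# Three rows copied from the sample input above.
data='04:24:37.380368  W       inode   28:1473638384       8    0.572         0    171714558  xx.xx:xxFC4EF cli    xx.xx.254.13 Unknown   Msg handler mnMsgFileSync
04:24:37.380397  W       inode   27:971436648        8    0.600         0    174866765  xx.xx:xxFC4EB cli    xx.xx.254.13 Unknown   Msg handler mnMsgFileSync
04:24:53.968406  W       inode   40:2714263576       8    0.584         0    400573187  xx.xx:xx.xx3 cli    xx.xx.254.14 Unknown   Msg handler mnMsgFileSync'

summary=$(printf '%s\n' "$data" | awk '
$2 ~ /^[RW]$/ {                      # anchored: field 2 must be exactly R or W
    key = $11 SUBSEP $2              # one flat key made of IP address and operation
    sum[key] += $6
    count[key]++
}
END {
    for (key in sum) {
        split(key, part, SUBSEP)     # part[1] = IP address, part[2] = operation
        printf "%s %s %.3f %d %.3f\n", part[1], part[2], sum[key], count[key], sum[key] / count[key]
    }
}' | sort)
echo "$summary"
```

Each output line holds the IP address, the operation, the total time, the count and the average, in that order.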

Create awk array from command output

Store command output in a variable

# export result=$(df -lPh | grep -v Size | awk '{print $1,$2}')

Print stored output

# echo "$result"
devtmpfs 32G
tmpfs 32G
tmpfs 32G
tmpfs 32G
/dev/mapper/vg_root-lv_root 50G
/dev/sda1 2.0G
/dev/sda2 100M
/dev/mapper/vg_root-lv_tmp 3.9G
/dev/sdb1 2.0G
/dev/sdb2 100M
/dev/mapper/vg_root-lv_var 20G
/var/lib/sss/db 72M
/dev/mapper/vg_root-lv_spare 834G
gpfs0 7.9P
tmpfs 6.3G
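The double quotes around $result matter here. As a small sketch, assuming a POSIX shell such as bash, the snippet below counts the lines printed with and without quotes; the three rows are copied from the output above, and the names quoted and unquoted are illustrative.

```shell
# Three rows copied from the stored output above.
result='devtmpfs 32G
/dev/sda1 2.0G
gpfs0 7.9P'

# Quoted: the newlines inside the variable are preserved.
quoted=$(echo "$result" | awk 'END {print NR}')

# Unquoted: word splitting turns the newlines into single spaces.
unquoted=$(echo $result | awk 'END {print NR}')

echo "quoted=$quoted unquoted=$unquoted"
```

The later split on "\n" only works because the value passed to awk still contains its newlines.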

To convert this output into an awk array:

Print all filesystems

# awk -v input="$result" 'BEGIN {split(input,map,"\n") ; for (i in map) {$0 = map[i] ; dfmap[$1]=$2} {for (i in dfmap) {printf "%s %s\n", i,dfmap[i]}}}'
gpfs0 7.9P
/dev/mapper/vg_root-lv_root 50G
/dev/sda1 2.0G
/dev/mapper/vg_root-lv_spare 834G
/dev/sdb1 2.0G
/dev/sda2 100M
/dev/sdb2 100M
tmpfs 32G
/var/lib/sss/db 72M
/dev/mapper/vg_root-lv_var 20G
/dev/mapper/vg_root-lv_tmp 3.9G
devtmpfs 32G

Print only one filesystem

# awk -v input="$result" 'BEGIN {split(input,map,"\n") ; for (i in map) {$0 = map[i] ; dfmap[$1]=$2} {for (i in dfmap) {if (i == "gpfs0") {print i,dfmap[i]}}}}'
gpfs0 7.9P

Explanation:

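  • The stored output is passed to awk in the variable input.
  • input is split on the newline character; map then holds one line per element.
  • The first loop iterates over map and assigns each line to $0. awk re-splits the record into fields on assignment, so $1 and $2 behave as if the line had been read from stdin. dfmap is an array keyed on the first field, with the second field as its value.
  • The second loop iterates over dfmap and prints the output. for (i in ...) visits elements in no particular order, and because the filesystem name is the key, the repeated tmpfs rows collapse into a single entry.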
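When the data does not have to be loaded inside BEGIN, a simpler sketch is to pipe the stored text into awk and let it split the records itself. The example also tests for a key directly with the in operator instead of looping over the array. It is an alternative under stated assumptions, not the method used above: the four rows are copied from the stored output, and the names want and lookup are illustrative.

```shell
# Four rows copied from the stored output above; tmpfs appears twice.
result='tmpfs 32G
/dev/sda1 2.0G
gpfs0 7.9P
tmpfs 6.3G'

lookup=$(printf '%s\n' "$result" | awk -v want="gpfs0" '
{ dfmap[$1] = $2 }                           # a later row overwrites an earlier one with the same name
END {
    if (want in dfmap) print want, dfmap[want]   # direct key test, no loop needed
    print "tmpfs", dfmap["tmpfs"]                # only the last tmpfs row is kept
}')
echo "$lookup"
```

Because the rows are read in input order here, the value kept for a repeated key is always the last one seen. If every tmpfs row has to be kept, the key needs something extra to make it unique, such as the mount point.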