Posted: September 30, 2020 | by Ricardo Gerardi (Editorial Team, Sudoer alumni, Red Hat)
gawk is the GNU implementation of the Awk programming language, which was first developed for the UNIX operating system in the 1970s. The Awk programming language specializes in processing and formatting data in text files, particularly text data organized in columns.
Using the Awk programming language, you can manipulate or extract data, generate reports, match patterns, perform calculations, and more, with great flexibility. Awk allows you to accomplish somewhat difficult tasks with a single line of code. To achieve the same results using traditional programming languages such as C or Python would require additional effort and many lines of code.
gawk also refers to the command-line utility available by default with most Linux distributions. Most distributions also provide a symbolic link for awk pointing to gawk. For simplicity, from now on, we'll refer to the utility only as awk.
awk processes data straight from standard input (STDIN). A common pattern is to pipe the output of other programs into awk to extract and print data, but awk can also process data from files.
In this article, you'll use awk to analyze data from a file with space-separated columns. Let's start by reviewing the sample data.
Example data
For the examples in this guide, let's use the output of the command ps ux saved in the file psux.out. Here's a sample of the data in the file:
$ head psux.out
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
ricardo 1446 0.0 0.2 21644 11536 ? Ss Sep10 0:00 /usr/lib/systemd/systemd --user
ricardo 1448 0.0 0.1 49212 5848 ? S Sep10 0:00 (sd-pam)
ricardo 1459 0.0 0.1 447560 7148 ? Sl Sep10 0:00 /usr/bin/gnome-keyring-daemon --daemonize --login
ricardo 1467 0.0 0.1 369144 6080 tty2 Ssl+ Sep10 0:00 /usr/libexec/gdm-wayland-session /usr/bin/gnome-session
ricardo 1469 0.0 0.1 277692 4112 ? Ss Sep10 0:00 /usr/bin/dbus-broker-launch --scope user
ricardo 1471 0.0 0.1 6836 4408 ? S Sep10 0:00 dbus-broker --log 4 --controller 11 --machine-id 16355057c7274843823dd747f8e2978b --max-bytes 100000000000000 --max-fds 25000000000000 --max-matches 5000000000
ricardo 1474 0.0 0.3 467744 14132 tty2 Sl+ Sep10 0:00 /usr/libexec/gnome-session-binary
ricardo 1531 0.0 0.1 297456 4280 ? Ssl Sep10 0:00 /usr/libexec/gnome-session-ctl --monitor
ricardo 1532 0.0 0.3 1230908 12920 ? S<sl Sep10 0:01 /usr/bin/pulseaudio --daemonize=no
You can download the complete file by using this command:
$ curl -o psux.out https://gitlab.com/-/snippets/2013935/raw\?inline\=false
If you decide to use the output of ps ux on your system, adjust the values shown in the examples to match your results.
Next, let's use awk to view data from the sample file.
Basic usage
A basic awk program consists of a pattern followed by an action enclosed in curly braces. You can provide a program to the awk utility inline by enclosing it in single quotation marks, like this:
$ awk 'pattern { action }'
awk processes the input data (standard input or file) line by line, executing the given action for each line, or record, that matches the pattern. If the pattern is omitted, awk executes the action on all records. An action can be as simple as printing data from the line or as complex as a full program. For example, to print all lines from the example file, use this command:
$ awk '{ print }' psux.out
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
ricardo 1446 0.0 0.2 21644 11536 ? Ss Sep10 0:00 /usr/lib/systemd/systemd --user
.... OUTPUT TRUNCATED ....
While this example is not especially useful on its own, it illustrates the basic usage of the awk command.
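As a small, hedged illustration of the two halves of a program, the sketch below runs an action-only program, a pattern-plus-action program, and a pattern-only program against three lines copied from the sample data. The shell variable name sample is only a convenience for this sketch and is not part of the article's files. When the action is omitted, awk falls back to printing each matching record in full.

```shell
# Three lines copied from the sample data; "sample" is an illustrative variable name.
sample='USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
ricardo 1448 0.0 0.1 49212 5848 ? S Sep10 0:00 (sd-pam)
ricardo 1459 0.0 0.1 447560 7148 ? Sl Sep10 0:00 /usr/bin/gnome-keyring-daemon --daemonize --login'

# Action only: the action runs for every record, header included.
all=$(printf '%s\n' "$sample" | awk '{ print }')

# Pattern and action: the action runs only for records matching "keyring".
matched=$(printf '%s\n' "$sample" | awk '/keyring/ { print }')

# Pattern only: the default action prints the matching record.
pattern_only=$(printf '%s\n' "$sample" | awk '/keyring/')

printf '%s\n' "$matched"
```

The last two programs produce the same single line, the record for PID 1459.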
If you're using the command ps ux on your machine, you can pipe its output directly into awk, instead of providing the input file name:
$ ps ux | awk '{ print }'
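To check that both input styles behave the same way, here is a minimal sketch that feeds the same two lines to awk once through a file and once through a pipe. The temporary file created with mktemp stands in for psux.out, and the variable names are assumptions made for this example.

```shell
# Two lines copied from the sample data, standing in for psux.out.
sample='USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
ricardo 1448 0.0 0.1 49212 5848 ? S Sep10 0:00 (sd-pam)'

# Write the lines to a temporary file so awk can read them by name.
tmpfile=$(mktemp)
printf '%s\n' "$sample" > "$tmpfile"

from_file=$(awk '{ print }' "$tmpfile")                  # input given as a file name
from_pipe=$(printf '%s\n' "$sample" | awk '{ print }')   # input piped on STDIN

rm -f "$tmpfile"

# Both invocations see the same records.
[ "$from_file" = "$from_pipe" ] && echo "identical output"
```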
Next, let's use awk's column-processing capabilities to extract part of the data from the sample file.
Printing fields
The power of awk starts to become evident when you use its column-processing features. awk automatically splits each line, or record, into fields. By default, it treats any run of spaces or tabs as the separator between fields, but you can change that by providing the command-line option -F followed by the desired separator.
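A minimal sketch of the difference, using made-up one-line records rather than the sample file: with the default separator, awk treats any run of spaces or tabs as a single delimiter, while -F: splits on colons instead.

```shell
# Default splitting: several spaces and a tab still yield three clean fields.
default_split=$(printf 'ricardo   1446\t0.0\n' | awk '{ print $2 }')
echo "$default_split"    # prints 1446

# -F sets the separator explicitly; this colon-separated record is made up.
colon_split=$(printf 'ricardo:1446:0.0\n' | awk -F: '{ print $2 }')
echo "$colon_split"      # prints 1446
```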
After splitting, awk assigns each field to a numbered variable whose name starts with the $ character. For example, the first field is $1, the second is $2, and so on. The special variable $0 contains the entire record before splitting.
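The following sketch prints a few of these variables for a single record copied from the sample data. It also prints NF, a built-in awk variable that this article does not cover and that holds the number of fields in the current record; the labels in the output are arbitrary.

```shell
# One record copied from the sample data.
record='ricardo 1448 0.0 0.1 49212 5848 ? S Sep10 0:00 (sd-pam)'

fields=$(printf '%s\n' "$record" | awk '{
    print "user:   " $1     # first field
    print "pid:    " $2     # second field
    print "fields: " NF     # number of fields in this record
    print "record: " $0     # the whole line, unsplit
}')
printf '%s\n' "$fields"
```

For this record, the third line of output reads "fields: 11", which matches the eleven columns printed by ps ux.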
By using the field variables, you can extract data from the input. For example, to print only the command name from the sample file, use the variable $11 because the command name is the eleventh column on each line:
$ awk '{ print $11 }' psux.out
COMMAND
/usr/lib/systemd/systemd
(sd-pam)
/usr/bin/gnome-keyring-daemon
.... OUTPUT TRUNCATED ....
You can also print multiple fields by separating them with commas. For example, to print the command name and the CPU utilization in column three, use this command:
$ awk '{ print $11, $3 }' psux.out
COMMAND %CPU
/usr/lib/systemd/systemd 0.0
(sd-pam) 0.0
/usr/bin/gnome-keyring-daemon 0.0
.... OUTPUT TRUNCATED ....
Finally, use the built-in printf statement to format the output and align the columns. Left-align the first column in a field 40 characters wide to accommodate longer command names:
$ awk '{ printf("%-40s %s\n", $11, $3) }' psux.out
COMMAND                                  %CPU
/usr/lib/systemd/systemd                 0.0
(sd-pam)                                 0.0
/usr/bin/gnome-keyring-daemon            0.0
/usr/libexec/gdm-wayland-session         0.0
.... OUTPUT TRUNCATED ....
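To make the padding visible, this sketch wraps each column in square brackets. The widths 12 and 6 and the brackets are arbitrary choices for illustration: a minus sign in the format (%-12s) left-aligns the value, while a plain width (%6s) right-aligns it.

```shell
# Two lines copied from the sample data.
sample='USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
ricardo 1448 0.0 0.1 49212 5848 ? S Sep10 0:00 (sd-pam)'

# %-12s pads on the right (left-aligned); %6s pads on the left (right-aligned).
aligned=$(printf '%s\n' "$sample" | awk '{ printf("[%-12s][%6s]\n", $11, $3) }')
printf '%s\n' "$aligned"
# [COMMAND     ][  %CPU]
# [(sd-pam)    ][   0.0]
```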
Now that you can manipulate and extract individual fields from each record, let's apply the pattern feature to filter the records.
Pattern matching
In addition to manipulating fields, awk allows you to filter which records to execute actions on through a powerful pattern matching feature. In its most basic usage, provide a regular expression enclosed by slash (/) characters to match records. For example, to filter by records that match firefox, use /firefox/:
$ awk '/firefox/ { print $11, $3 }' psux.out
/usr/lib64/firefox/firefox 66.2
/usr/lib64/firefox/firefox 8.3
/usr/lib64/firefox/firefox 15.6
/usr/lib64/firefox/firefox 9.0
/usr/lib64/firefox/firefox 31.5
/usr/lib64/firefox/firefox 20.6
/usr/lib64/firefox/firefox 31.0
/usr/lib64/firefox/firefox 0.0
/usr/lib64/firefox/firefox 0.0
/usr/lib64/firefox/firefox 0.0
/usr/lib64/firefox/firefox 0.0
/usr/lib64/firefox/firefox 0.0
/usr/lib64/firefox/firefox 0.0
You can also use fields and a comparison expression as pattern matching criteria. For example, to print data from the process that matches PID 6685, compare field $2, like this:
$ awk '$2==6685 { print $11, $3 }' psux.out
/usr/lib64/firefox/firefox 0.0
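A bare regular expression such as /firefox/ is tested against the whole record, so it can also match text in a command's arguments. As a hedged sketch, the ~ operator (standard awk, but not covered in this article) restricts the match to a single field. The two records below are copied from the sample data.

```shell
# Two records copied from the sample data.
sample='ricardo 1467 0.0 0.1 369144 6080 tty2 Ssl+ Sep10 0:00 /usr/libexec/gdm-wayland-session /usr/bin/gnome-session
ricardo 1474 0.0 0.3 467744 14132 tty2 Sl+ Sep10 0:00 /usr/libexec/gnome-session-binary'

# Whole-record match: "gnome" also appears in the arguments of PID 1467.
whole=$(printf '%s\n' "$sample" | awk '/gnome/ { print $2 }')

# Field match: only records whose eleventh field contains "gnome".
field=$(printf '%s\n' "$sample" | awk '$11 ~ /gnome/ { print $2 }')

# Variables left unquoted on purpose so the PIDs print on one line.
echo "whole record:" $whole     # whole record: 1467 1474
echo "field 11 only:" $field    # field 11 only: 1474
```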
awk recognizes numeric fields, allowing you to use relative comparisons like greater than or less than. For example, to show all processes that use over 5% CPU, use $3 > 5:
$ awk '$3 > 5 { print $11, $3 }' psux.out
/usr/bin/gnome-shell 5.1
/usr/lib64/firefox/firefox 66.2
/usr/lib64/firefox/firefox 8.3
/usr/lib64/firefox/firefox 15.6
/usr/lib64/firefox/firefox 9.0
/usr/lib64/firefox/firefox 31.5
/usr/lib64/firefox/firefox 20.6
/usr/lib64/firefox/firefox 31.0
You can combine patterns with operators. For example, to show all processes that match firefox and use over 5% CPU, combine both patterns with the && operator for a logical AND:
$ awk '/firefox/ && $3 > 5 { print $11, $3 }' psux.out
/usr/lib64/firefox/firefox 66.2
/usr/lib64/firefox/firefox 8.3
/usr/lib64/firefox/firefox 15.6
/usr/lib64/firefox/firefox 9.0
/usr/lib64/firefox/firefox 31.5
/usr/lib64/firefox/firefox 20.6
/usr/lib64/firefox/firefox 31.0
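Besides &&, awk also accepts || for a logical OR and ! for negation. The sketch below tries all three on four records copied from the sample data. The header row is left out on purpose so that column six is always numeric, and the 10000 KB threshold is an arbitrary value chosen for this example.

```shell
# Four records copied from the sample data; header omitted so $6 is always numeric.
sample='ricardo 1446 0.0 0.2 21644 11536 ? Ss Sep10 0:00 /usr/lib/systemd/systemd --user
ricardo 1448 0.0 0.1 49212 5848 ? S Sep10 0:00 (sd-pam)
ricardo 1459 0.0 0.1 447560 7148 ? Sl Sep10 0:00 /usr/bin/gnome-keyring-daemon --daemonize --login
ricardo 1474 0.0 0.3 467744 14132 tty2 Sl+ Sep10 0:00 /usr/libexec/gnome-session-binary'

# AND: matches "gnome" and uses more than 10000 KB of resident memory.
both=$(printf '%s\n' "$sample" | awk '/gnome/ && $6 > 10000 { print $2 }')

# OR: either condition is enough.
either=$(printf '%s\n' "$sample" | awk '/gnome/ || $6 > 10000 { print $2 }')

# NOT: records that do not match "gnome".
negated=$(printf '%s\n' "$sample" | awk '!/gnome/ { print $2 }')

# Variables left unquoted on purpose so the PIDs print on one line.
echo "AND:" $both       # AND: 1474
echo "OR:" $either      # OR: 1446 1459 1474
echo "NOT:" $negated    # NOT: 1446 1448
```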
Finally, because you're using pattern matching, awk no longer prints the header line. You can add your own header line by using the BEGIN pattern to execute a single action before processing any records:
$ awk 'BEGIN { printf("%-26s %s\n", "Command", "CPU%")} $3 > 10 { print $11, $3 }' psux.out
Command                    CPU%
/usr/lib64/firefox/firefox 66.2
/usr/lib64/firefox/firefox 15.6
/usr/lib64/firefox/firefox 31.5
/usr/lib64/firefox/firefox 20.6
/usr/lib64/firefox/firefox 31.0
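In the command above, the header lines up with the data only because every matching command name happens to be 26 characters long. One possible variation, sketched here on four records copied from the sample data, reuses the same printf format for the header and for each record so the columns stay aligned for names of any length up to the chosen width. The width of 36, the RSS column, and the 10000 KB threshold are arbitrary choices for this example.

```shell
# Four records copied from the sample data; header omitted so $6 is always numeric.
sample='ricardo 1446 0.0 0.2 21644 11536 ? Ss Sep10 0:00 /usr/lib/systemd/systemd --user
ricardo 1448 0.0 0.1 49212 5848 ? S Sep10 0:00 (sd-pam)
ricardo 1459 0.0 0.1 447560 7148 ? Sl Sep10 0:00 /usr/bin/gnome-keyring-daemon --daemonize --login
ricardo 1474 0.0 0.3 467744 14132 tty2 Sl+ Sep10 0:00 /usr/libexec/gnome-session-binary'

report=$(printf '%s\n' "$sample" | awk '
    BEGIN      { printf("%-36s %s\n", "Command", "RSS") }   # runs once, before any input
    $6 > 10000 { printf("%-36s %s\n", $11, $6) }            # runs for each matching record
')
printf '%s\n' "$report"
```

The report contains the header followed by the records for PIDs 1446 and 1474, with the RSS values starting in the same column on every line.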
Next, let's manipulate the data in individual fields.
Field manipulation
As discussed in the previous section, awk understands numeric fields. This allows you to perform data manipulation, including numeric calculations. For example, consider printing the memory utilization in column six for all firefox processes:
$ awk '/firefox/ { print $11, $6 }' psux.out
/usr/lib64/firefox/firefox 301212
/usr/lib64/firefox/firefox 118220
/usr/lib64/firefox/firefox 168468
/usr/lib64/firefox/firefox 101520
/usr/lib64/firefox/firefox 194336
/usr/lib64/firefox/firefox 111864
/usr/lib64/firefox/firefox 163440
/usr/lib64/firefox/firefox 38496
/usr/lib64/firefox/firefox 174636
/usr/lib64/firefox/firefox 37264
/usr/lib64/firefox/firefox 30608
/usr/lib64/firefox/firefox 174636
/usr/lib64/firefox/firefox 174660
The command ps ux displays the memory utilization in kilobytes, which is hard to read. Let's convert it to megabytes by dividing the field value by 1024:
$ awk '/firefox/ { print $11, $6/1024 }' psux.out
/usr/lib64/firefox/firefox 294.152
/usr/lib64/firefox/firefox 115.449
/usr/lib64/firefox/firefox 164.52
/usr/lib64/firefox/firefox 99.1406
/usr/lib64/firefox/firefox 189.781
/usr/lib64/firefox/firefox 109.242
/usr/lib64/firefox/firefox 159.609
/usr/lib64/firefox/firefox 37.5938
/usr/lib64/firefox/firefox 170.543
/usr/lib64/firefox/firefox 36.3906
/usr/lib64/firefox/firefox 29.8906
/usr/lib64/firefox/firefox 170.543
/usr/lib64/firefox/firefox 170.566
You can also round the numbers to the nearest whole number and add the suffix MB using printf to improve readability:
$ awk '/firefox/ { printf("%s %4.0f MB\n", $11, $6/1024) }' psux.out
/usr/lib64/firefox/firefox  294 MB
/usr/lib64/firefox/firefox  115 MB
/usr/lib64/firefox/firefox  165 MB
/usr/lib64/firefox/firefox   99 MB
/usr/lib64/firefox/firefox  190 MB
/usr/lib64/firefox/firefox  109 MB
/usr/lib64/firefox/firefox  160 MB
/usr/lib64/firefox/firefox   38 MB
/usr/lib64/firefox/firefox  171 MB
/usr/lib64/firefox/firefox   36 MB
/usr/lib64/firefox/firefox   30 MB
/usr/lib64/firefox/firefox  171 MB
/usr/lib64/firefox/firefox  171 MB
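Note that %.0f rounds to the nearest whole number, whereas %d simply drops the fractional part. This short sketch compares the two on two of the RSS values shown above; the wording of the output line is made up for this example.

```shell
# Two RSS values (in KB) taken from the firefox output above.
rounded=$(printf '168468\n101520\n' | awk '{
    printf("%d KB is %.0f MB rounded, %d MB truncated\n", $1, $1/1024, $1/1024)
}')
printf '%s\n' "$rounded"
# 168468 KB is 165 MB rounded, 164 MB truncated
# 101520 KB is 99 MB rounded, 99 MB truncated
```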
Finally, combine this idea with the BEGIN and END patterns to perform more advanced data manipulation. For example, let's calculate the total memory utilization for all firefox processes by defining a variable sum in the BEGIN action, adding the value of column six ($6) to the sum variable for each line that matches firefox, and then printing the total in megabytes with the END action:
$ awk 'BEGIN { sum=0 } /firefox/ { sum+=$6 } END { printf("Total Firefox memory: %.0f MB\n", sum/1024) }' psux.out
Total Firefox memory: 1747 MB
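The same accumulate-then-report pattern can track more than one value. As a hedged sketch on four records copied from the sample data, the program below counts the matching processes while summing their memory. It relies on the fact that uninitialized awk variables start at zero, so the BEGIN block is optional here; the variable names sum and count and the gnome filter are choices made for this example.

```shell
# Four records copied from the sample data.
sample='ricardo 1446 0.0 0.2 21644 11536 ? Ss Sep10 0:00 /usr/lib/systemd/systemd --user
ricardo 1448 0.0 0.1 49212 5848 ? S Sep10 0:00 (sd-pam)
ricardo 1459 0.0 0.1 447560 7148 ? Sl Sep10 0:00 /usr/bin/gnome-keyring-daemon --daemonize --login
ricardo 1474 0.0 0.3 467744 14132 tty2 Sl+ Sep10 0:00 /usr/libexec/gnome-session-binary'

summary=$(printf '%s\n' "$sample" | awk '
    /gnome/ { sum += $6; count++ }    # uninitialized variables start at 0
    END     { printf("%d gnome processes, %.1f MB total\n", count, sum/1024) }
')
echo "$summary"    # 2 gnome processes, 20.8 MB total
```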
What's next?
gawk is a powerful and flexible tool to process text data, particularly data arranged in columns. This article provided a few useful examples of using this tool to extract and manipulate data, but gawk can do much more. For additional information about gawk, consult the manual pages in your Linux distribution.
The Awk language has many more features than we explored in this guide. For detailed information about it, consult the official GNU Awk User's Guide.
Topics: Linux Command line utilities