gawk Linux Command {With 10 Examples} | phoenixNAP KB (2024)

Introduction

The gawk command is the GNU version of awk. Gawk is a powerful text-processing and data-manipulating tool with many features and practical uses.

This guide will teach you how to use the Linux gawk command with examples.


Prerequisites

  • A system running Linux.
  • Access to the terminal.
  • A text file. This tutorial uses the file people as an example.

gawk Linux Command Syntax

The basic gawk syntax looks like this:

gawk [options] [actions/filters] input_file

The command cannot run without a program. The options are not mandatory, but the program must contain at least one filter (a pattern that selects input lines) or action (a statement in curly braces that runs on the selected lines). Filters and actions tell gawk which data from the input file to select and how to manipulate it. If no input file is given, gawk reads from standard input.

Note: Enclose the filters and actions in single quotes so that the shell passes characters such as $ and spaces to gawk unchanged.

gawk Options

The gawk command is a versatile tool thanks to its numerous options. Because gawk is the GNU implementation of awk, it also accepts long, GNU-style options. Each long option has a corresponding short one.

Common options are presented below:

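| Option | Description |
|---|---|
| -f program-file, --file program-file | Reads the program from program-file, which serves as a script, instead of from the first command-line argument. |
| -F fs, --field-separator fs | Uses fs as the input field separator (the value of the FS variable). |
| -v var=val, --assign var=val | Assigns the value val to the variable var before the program runs. |
| -b, --characters-as-bytes | Treats all input data as single-byte characters. |
| -c, --traditional | Runs gawk in compatibility mode, without GNU extensions. |
| -C, --copyright | Displays the GNU copyright message. |
| -d[file], --dump-variables[=file] | Writes a list of global variables, their types, and final values to file (awkvars.out by default). |
| -e program-text, --source program-text | Uses program-text as program source code, which allows mixing library functions loaded with -f and source code typed on the command line. |
| -E file, --exec file | Works like -f, but it is the last option processed and it disables command-line variable assignments. |
| -L[value], --lint[=value] | Prints warning messages about dubious code or code not portable to other AWK implementations. |
| -S, --sandbox | Runs gawk in sandbox mode, which disables the system() function, input and output redirection, and loading of dynamic extensions. |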

gawk Built-in Variables

The gawk command offers several built-in variables. gawk sets some of them automatically while reading input (for example, NR and NF) and reads others to control its behavior (for example, FS and OFS). Users can assign values to the control variables inside the program or from the terminal with the -v option. Some important gawk built-in variables are:

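| Variable | Description |
|---|---|
| ARGC | The number of command-line arguments. |
| ARGIND | The index in ARGV of the file currently being processed. |
| ARGV | An array of the command-line arguments. |
| ERRNO | A string describing the error when a system error occurs (for example, during a getline redirection or close()). |
| FIELDWIDTHS | A whitespace-separated list of field widths. When set, gawk splits input into fixed-width fields instead of using FS. |
| FILENAME | The name of the current input file. |
| FNR | The record (line) number in the current input file. |
| FS | The input field separator (whitespace by default). |
| IGNORECASE | When set to a non-zero value, pattern matching and string comparisons ignore case. |
| NF | The number of fields in the current record. |
| NR | The total number of input records read so far. |
| OFS | The output field separator (a space by default). |
| ORS | The output record separator (a newline by default). |
| RS | The input record separator (a newline by default). |
| RSTART | The index of the first character matched by match(), or 0 if there is no match. |
| RLENGTH | The length of the string matched by match(), or -1 if there is no match. |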

gawk Examples

Pattern matching and text processing with gawk have many uses. This article provides practical examples through which users learn to use the gawk utility.

Important: Pattern matching in gawk is case-sensitive. To ignore case, set the IGNORECASE variable to a non-zero value.

Print Files

By default, gawk with a print action displays every line of the specified file. For instance, the gawk command below produces the same output as running the cat command on the people text file:

gawk '{print}' people

Print a Column

By default, gawk splits each line into fields (columns) separated by whitespace. The people file consists of four columns:

  1. Ordinal numbers.
  2. First names.
  3. Last names.
  4. Year of birth.

Use gawk to show only a specific column in the terminal. For instance:

gawk '{print $2}' people

The command prints only the second column. To print multiple columns, like column one (ordinal numbers) and column two (first names), run:

gawk '{print $1, $2}' people

The gawk command also works without the comma between $1 and $2. However, the comma is what inserts the output field separator (a space by default), so without it the columns are printed with no space between them:

gawk '{print $1 $2}' people

Filter Lines by Pattern

The gawk command offers additional filtering options. For instance, print lines containing the capital letter O with:

gawk '/O/ {print}' people

To show only lines containing the letter O or A, use the alternation operator (|) inside the regular expression:

gawk '/O|A/ {print}' people

The command prints every line that contains a capital O or A. To require two patterns at once, use the logical AND operator (&&). When the action is omitted, gawk prints each matching line. For example, show lines that contain both O and the year 1995:

gawk '/O/ && /1995/' people

The filters work with numbers as well. For example, show only people born in the 1990s by matching the fourth column against a regular expression:

gawk '$4 ~ /^199/ {print}' people

The ~ operator applies the regular expression to the fourth column only, and ^ anchors it to the start of the field, so the output shows only lines in which the year of birth starts with 199.

Customize the output even more by combining previously mentioned options. For example, print only the first and last names of people born in 1995 or 2003 with:

gawk '/1995|2003/ {print $2, $3}' people

The command prints columns two and three, as stated in the {print $2, $3} part. The output only shows lines containing 1995 or 2003, even though the column containing those numbers is not printed.

The gawk command also lets users print everything except the lines containing a specified string by using the logical NOT operator (!). For instance, omit lines containing the string 19 from the output:

gawk '!/19/' people

Add Line Numbers

The people file includes line numbers in the first column. If a file lacks line numbers, gawk can add them.

For instance, the humans file doesn't include any ordinal numbers.

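To add line numbers, print the FNR variable (the line number within the current file) before $0 (the whole line). The next statement moves on to the following input line; it is optional here because print is already the last statement:

gawk '{ print FNR, $0; next}' humans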

The command adds a line number before each line. When gawk reads a single file, the NR variable achieves the same result:

gawk '{print NR, $0}' humans

Find Line Count

To count the total number of lines in the file, use the END statement and the NR variable with gawk:

gawk 'END { print NR }' people

The command reads every line first. After the last line, gawk runs the END block and prints NR, which at that point holds the total number of lines. Without END, the same action runs for every line and prints the current value of NR each time, which lists the line numbers:

gawk '{ print NR }' people

Filter Lines Based on Length

Use the length function, which without an argument returns the length of the current line, to print only lines longer than 20 characters:

gawk 'length>20' people

Conditions can also be combined. For instance, show lines longer than 17 but shorter than 20 characters:

gawk 'length<20 && length>17' people

To display lines that are exactly 20 characters long, run:

gawk 'length==20' people

Print Info Based on Conditions

The gawk command supports if-else statements. For instance, another way to show only people born after 1999 is a simple if statement:

gawk '{ if ($4>1999) print }' people

The if statement sets the condition that entries in column four have to be larger than 1999. The output shows only entries that satisfy the condition. Expand the command into an if-else statement to print lines not satisfying the original condition.

gawk '{if ($4>1999) print $0, "==>00s"; else print $0, "==>90s"}' people

The command includes:

  • If statement. If the year in column four is greater than 1999, gawk prints the line followed by the string "==>00s".
  • Else statement. If the line doesn't satisfy the condition, gawk still prints it, followed by the string "==>90s".

Add a Header

Just as the END statement lets users modify the output after the last line, the BEGIN statement formats the output before the first line.

The gawk command always executes BEGIN sections first, before it reads any input lines, and then processes the rest of the program. One way to use the BEGIN statement is to add a header to the output.

Execute the following command to add a header line above the gawk output:

gawk 'BEGIN {print "No/First&Last Name/Year of Birth"} {print $0}' people

Find the Longest Line Length

Combine the if and END statements to find the length of the longest line in the people file:

gawk '{ if (length($0) > max) max = length($0) } END { print max }' people

Find the Number of Fields

The gawk command also allows users to display the number of fields with the NF variable. The simplest approach prints only a bare number per line, which is hard to read:

gawk '{print NF}' people

The command outputs the number of fields per line without any additional info. To customize the output and make it more human-readable, adjust the initial command:

gawk '{print NR, "-->", NF}' people

The command now includes:

  • The NR variable that adds line numbers to each output line.
  • The --> string that separates the line numbers from the field counts.

Another way to show line and field numbers in the people file is to print each whole line ($0) followed by NF. Because the people file already includes ordinal numbers in column one, the NR variable is omitted:

gawk '{print $0, "-->", NF}' people

Finally, to print the total number of fields, execute:

gawk '{num_fields = num_fields + NF} END {print num_fields}' people

The people file has ten lines with four fields each, so the total of 40 fields is correct.

Conclusion

After going through this tutorial, you know how to use gawk for advanced text processing and data manipulation.

Also consider using grep, a powerful Linux tool for searching for strings, words, and patterns.

Article information

Author: Dong Thiel
