Rule Text Processing in Linux using Grep, Sed, and Awk

Are you a Linux user, but not yet a Linux lover? 🫣 Are you using the full power of your Linux system? By the end of this article, you will be!

To get there, you need to learn a few utilities that are used all the time for text processing in Linux: "grep", "sed" and "awk". This article introduces these tools, highlights their key use cases, and provides practical examples to solve common Linux text processing challenges.

Let's jump right in without wasting time…

Grep

The name "grep" stands for Global Regular Expression Print. It's a command-line utility that searches files for lines matching a text pattern (a regular expression). Grep stands out as one of the most versatile and essential tools for any Linux user, especially if you need to search for and extract data from different kinds of text. It's an old Unix utility, first released in the early 1970s, which makes it about 50 years old as of 2024. You usually don't need to install it: it comes pre-installed on virtually every Linux distribution.

# Get help
grep --help

Grep is used for a wide range of text processing tasks, including:

Searching for specific text patterns in files
Filtering outputs of other commands
Extracting useful information from log files
Analyzing large datasets

Basic Syntax:

grep [options] pattern [file_name]

If no file name is provided, "grep" reads from standard input, for example from a pipe.

Use Cases

Grep is mainly used to search plain-text data for lines that match a regular expression.

Note that grep is case sensitive by default.

Let's try the command below…

echo "Hello World!" | grep "World"

Output:

Hello World!

In the above command, grep reads standard input, looks for the text "World", and prints the entire line that contains it. This is grep's default behavior: it searches every line of the input for the pattern, and whenever a line matches, it prints that whole line, not just the matching word.

Let's try it on a text file that contains the following data.
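The line-by-line behavior is easier to see with more than one line of input. The following is a small sketch; the three input lines are made up for illustration and are not part of myfile.txt.

```shell
# Feed three lines to grep on standard input (illustrative sample text).
matches=$(printf '%s\n' "Hello World!" "Goodbye for now" "World of Linux" | grep "World")

# Each matching line is printed whole; the non-matching line is dropped.
printf '%s\n' "$matches"
```

Only the first and third lines should come back, each printed in full.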

File Name: myfile.txt

File Name: myfile.txt
Created At: August 5, 2024
Updated At: August 9, 2024
Secure: True

cat myfile.txt | grep "Name"

Output:

File Name: myfile.txt

We can also pass the file to grep directly if we're in the directory that contains it:

grep "Name" myfile.txt

Or we can provide the absolute path of the input file:

grep "Name" /home/santhos/Desktop/myfile.txt

Now you know that grep finds the specified text in the input stream and prints the entire line if there's a match.

Now, let's update the "myfile.txt" file and search for the same text again.

File Name: myfile.txt

File Name: myfile.txt
Created At: August 5, 2024
Updated At: August 9, 2024
Secure: True

Author Name: Santhos Sunthar
Function name: UtilityTest

Now the output looks like this…

cat myfile.txt | grep "Name"

Output:

File Name: myfile.txt
Author Name: Santhos Sunthar

By now you should have an idea of the basic function of this tool and the shape of its output.

Let's do something different!

cat myfile.txt | grep -i "Name"

Output:

File Name: myfile.txt
Author Name: Santhos Sunthar
Function name: UtilityTest

Have you noticed the difference between the previous command and the last one? We've added the "-i" option to tell grep to perform a case-insensitive search. As a result, we get the lines that match both "Name" and "name".

How can we get the line numbers of the matches in the file? We add another option, "-n". The number before each colon is the line's position in the input, counting blank and comment lines too.

cat myfile.txt | grep -i -n "Name"

Output:

3:File Name: myfile.txt
9:Author Name: Santhos Sunthar
10:Function name: UtilityTest
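To connect these options to the "log files" use case listed earlier, here is a minimal sketch. The log contents are hypothetical and written to a temporary file, so nothing here touches myfile.txt.

```shell
# Hypothetical sample data, written to a temporary file for the demo.
log=$(mktemp)
cat > "$log" <<'EOF'
INFO service started
ERROR disk full
info heartbeat
Error: retry failed
EOF

# Case-sensitive search with line numbers: only the exact spelling "ERROR" matches.
exact=$(grep -n "ERROR" "$log")

# Adding -i also matches "Error"; short options can be combined as -in.
any_case=$(grep -in "error" "$log")

printf 'exact:\n%s\n\nany case:\n%s\n' "$exact" "$any_case"
rm -f "$log"
```

The case-sensitive search should report line 2 only, while the case-insensitive one should report lines 2 and 4.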

Sed

"sed" stands for "Stream Editor." It's used for parsing and transforming text in a data stream or a file using a simple and compact programming language.

Use Cases

Substitute text in a file
Remove lines or text patterns
Insert text at specific locations

# Get help
sed --help

It's a powerful and handy tool to substitute text.

echo "Hello World!" | sed 's/World/Folks/'

Output:

Hello Folks!

What happens when the input contains the word twice?

echo "Hello World! Ah, add another World..." | sed 's/World/Folks/'

Output:

Hello Folks! Ah, add another World...

As you can see, only the first occurrence of "World" was substituted, not the second one. To substitute every occurrence on a line, add the "g" (global) flag to the command…

echo "Hello World! Ah, add another World..." | sed 's/World/Folks/g'

Output:

Hello Folks! Ah, add another Folks...
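The two behaviors can be compared side by side. This is only a sketch; the variable names are chosen for illustration.

```shell
text="Hello World! Ah, add another World..."

# Without a flag, only the first match on each line is replaced.
first_only=$(printf '%s\n' "$text" | sed 's/World/Folks/')

# With the g flag, every match on the line is replaced.
every=$(printf '%s\n' "$text" | sed 's/World/Folks/g')

printf '%s\n%s\n' "$first_only" "$every"
```

Quoting the sed expression in single quotes keeps the shell from interpreting any of its characters, which matters as soon as a pattern contains spaces, "$" or "*".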

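We can substitute text from our myfile.txt too! For the examples from here on, myfile.txt starts with the comment line "# myfile.txt", as the output shows. Note that sed prints the result and leaves the file itself unchanged.

sed 's/name/Name/g' myfile.txt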

Output:

# myfile.txt

File Name: myfile.txt
Created At: August 5, 2024
Updated At: August 9, 2024
Secure: True

Author Name: Santhos Sunthar
Function Name: UtilityTest

How can we remove the lines that start with "#" from our myfile.txt?

sed '/^#/d' myfile.txt

Output:

File Name: myfile.txt
Created At: August 5, 2024
Updated At: August 9, 2024
Secure: True

Author Name: Santhos Sunthar
Function name: UtilityTest

In the above command, every line that starts with "#" was removed from the output. The file myfile.txt itself is left unchanged.

Let's insert a new line before the second line of myfile.txt, using the one-line form of the "i" command that GNU sed accepts…
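Because sed writes its result to standard output, the input file keeps its content unless the output is saved somewhere. A minimal sketch, assuming a made-up notes.txt in a temporary directory:

```shell
# Hypothetical sample file in a throwaway directory.
workdir=$(mktemp -d)
cat > "$workdir/notes.txt" <<'EOF'
# heading comment
first entry
# another comment
second entry
EOF

# sed prints the filtered text; redirect it to a new file to keep the result.
sed '/^#/d' "$workdir/notes.txt" > "$workdir/notes.clean.txt"

# The original still has all four lines; only the copy lost the comments.
original_lines=$(awk 'END {print NR}' "$workdir/notes.txt")
clean=$(cat "$workdir/notes.clean.txt")

printf 'original still has %s lines\n%s\n' "$original_lines" "$clean"
rm -rf "$workdir"
```

Redirecting to a second file is the cautious choice while experimenting, since the original stays available for comparison.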

sed '2i\# Configuration File' myfile.txt

Output:

# myfile.txt
# Configuration File
File Name: myfile.txt
Created At: August 5, 2024
Updated At: August 9, 2024
Secure: True

Author Name: Santhos Sunthar
Function name: UtilityTest

Also, we can remove blank lines from myfile.txt.

sed '/^$/d' myfile.txt

Output:

# myfile.txt
File Name: myfile.txt
Created At: August 5, 2024
Updated At: August 9, 2024
Secure: True
Author Name: Santhos Sunthar
Function name: UtilityTest

Now, if we have multiple files, how can we substitute text in all of them at once? Let's try this command, which processes every .txt file in the current directory and prints the combined result…

sed 's/anything/special/g' *.txt
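The command above only prints the substituted text. If the goal is to change the files themselves, one possible approach is a loop that rewrites each file through a temporary copy. The sketch below uses made-up files in a temporary directory; GNU sed's "-i" option achieves a similar result in one step, but the loop avoids relying on it.

```shell
# Two hypothetical files in a throwaway directory.
workdir=$(mktemp -d)
printf 'anything goes\n' > "$workdir/a.txt"
printf 'nothing here\nanything else\n' > "$workdir/b.txt"

# Write the substituted text to a temporary copy, then replace the original
# only if sed succeeded.
for f in "$workdir"/*.txt; do
    sed 's/anything/special/g' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

a_after=$(cat "$workdir/a.txt")
b_after=$(cat "$workdir/b.txt")
printf '%s\n---\n%s\n' "$a_after" "$b_after"
rm -rf "$workdir"
```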

Awk

"awk" is a powerful programming language designed for text processing. It is typically used as a data extraction and reporting tool.

Use Cases

Select specific fields from structured text files
Perform complex text transformations
Summarize data from files

Let's print the first column of myfile.txt.

awk '{print $1}' myfile.txt

Output:

#
File
Created
Updated
Secure:
Author
Function

Let's print the third column of myfile.txt.

awk '{print $3}' myfile.txt

Output:

myfile.txt
August
August

Santhos
UtilityTest

Now you should have an idea of the "awk" command and how it processes text. It goes through the input line by line, splits each line into fields using whitespace (spaces and tabs) as the default delimiter, and prints the field at the position given in the command. For example, the last command splits each line and prints its third field; a line with fewer than three fields produces an empty line.
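The effect of the delimiter is easiest to see on a single line. Here is a short sketch using one line from myfile.txt; the variable names are illustrative.

```shell
line="Created At: August 5, 2024"

# Default splitting on whitespace: fields are Created | At: | August | 5, | 2024
third_word=$(printf '%s\n' "$line" | awk '{print $3}')

# Splitting on ": " instead: fields are "Created At" | "August 5, 2024"
label=$(printf '%s\n' "$line" | awk -F': ' '{print $1}')
value=$(printf '%s\n' "$line" | awk -F': ' '{print $2}')

printf '%s\n%s\n%s\n' "$third_word" "$label" "$value"
```

With the default delimiter the date is cut into pieces, while "-F': '" keeps the whole value together as the second field.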

Can we count the number of lines that start with "#" in myfile.txt?

awk '/^#/{count++}END{print count}' myfile.txt

Output:

1

The "^" anchors the pattern to the start of the line. The output shows that myfile.txt contains only one comment line, "# myfile.txt".
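The pattern decides what gets counted: "/#/" matches a "#" anywhere in the line, while "/^#/" matches it only at the start. A small sketch with made-up input, where one line carries a trailing "#" note:

```shell
# Illustrative input: one real comment line and one line with a trailing "#".
sample=$(printf '%s\n' "# a comment line" "Secure: True  # trailing note" "Author Name: Santhos Sunthar")

# Counts every line that contains "#" somewhere.
contains_hash=$(printf '%s\n' "$sample" | awk '/#/ {n++} END {print n+0}')

# Counts only the lines that begin with "#".
starts_with_hash=$(printf '%s\n' "$sample" | awk '/^#/ {n++} END {print n+0}')

printf 'contains #: %s\nstarts with #: %s\n' "$contains_hash" "$starts_with_hash"
```

Printing "n+0" instead of "n" makes awk print 0 rather than an empty line when nothing matches.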

Let's filter the "Author Name" and "Function name" values…

awk -F: '/Author Name|Function name/{print $2}' myfile.txt

Output:

 Santhos Sunthar
 UtilityTest

Because the separator is only ":", each value keeps the space that follows the colon.

We can format it in a nicer way!

awk -F': ' '/Author Name|Function name/ {printf "%s: %s\n", $1, $2}' myfile.txt

Output:

Author Name: Santhos Sunthar
Function name: UtilityTest

Let's extract the timeline information of myfile.txt…

awk -F': ' '/Created At|Updated At/{print $2}' myfile.txt

Output:

August 5, 2024
August 9, 2024

Let's check whether the myfile.txt is secure or not…

awk -F: '/Secure/{if($2~/True/) print "myfile.txt is secure"; else print "myfile.txt is not secure"}' myfile.txt

Output:

myfile.txt is secure

You've now learnt enough of these commands to explore text processing in Linux!

Let's explore the benefits of pipelining these commands…

I'm going to extract the timeline information and modify it. So, let's change the updated date from "August 9" to "August 10".

grep 'At:' myfile.txt | sed 's/August 9/August 10/' | awk -F': ' '{print $2}'

Output:

August 5, 2024
August 10, 2024

Let's format the output data nicely using "awk" and "sed".

awk -F': ' '/File Name|Author Name|Secure/ {print $1 ": " $2}' myfile.txt | sed 's/^/-> /'

Output:

-> File Name: myfile.txt
-> Secure: True
-> Author Name: Santhos Sunthar

By combining "grep", "sed" and "awk", you can create powerful and flexible scripts to handle complex text processing tasks in Linux.
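As one more illustration of such a pipeline, the sketch below summarizes a small log. The log format (date, level, component, count) and its contents are hypothetical.

```shell
# Hypothetical log: date, level, component, and a numeric count per line.
log=$(mktemp)
cat > "$log" <<'EOF'
2024-08-05 INFO startup 3
2024-08-05 ERROR disk 12
2024-08-09 ERROR network 7
2024-08-09 INFO shutdown 1
EOF

# grep keeps the ERROR lines, sed strips the leading date,
# awk adds up the last column and counts the lines it saw.
summary=$(grep 'ERROR' "$log" \
    | sed 's/^[^ ]* //' \
    | awk '{total += $3} END {print "errors:", NR, "total:", total}')

printf '%s\n' "$summary"
rm -f "$log"
```

Each tool does the part it is best at: grep selects lines, sed reshapes them, and awk computes over the fields.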

Conclusion

I hope you liked this article. Mastering "grep", "sed" and "awk" can significantly enhance your ability to process and analyze text data in Linux systems. So why wait to get your hands dirty? Go practice these amazing utilities in your favorite Linux distro! 🙂

Santhos Suntharalingam
