Linux/BSD command line wizardry: Learn to think in sed, awk, and grep – Ars Technica


As a relatively isolated junior sysadmin, I remember seeing answers on Experts Exchange and later Stack Exchange that baffled me. Authors and commenters might chain 10 commands together with pipes and angle brackets—something I never did in day-to-day system administration. Honestly, I doubted the real-world value of that. Surely, this was just an exercise in e-braggadocio, right?

Trying to read the man pages for the utilities most frequently seen in these extended command chains didn’t make them seem more approachable, either. For example, the sed man page alone weighs in at around 1,800 words without ever really explaining how regular expressions work or covering the most common uses of sed itself.

If you find yourself in the same boat, grab a beverage and buckle in. Instead of giving you encyclopedic listings of every possible argument and use case for each of these ubiquitous commands, we’re going to teach you how to think about them—and how to easily, productively incorporate them in your own daily command-line use.

Redirection 101

Before we can talk about sed, awk, and grep, we need to talk about something a bit more basic—command-line redirection. Again, we’re going to keep this very simple:

| Operator | Function | Example |
| --- | --- | --- |
| `;` | Process the command on the right after you’re done processing the command on the left. | `echo one ; echo two` |
| `>` | Place the output of the thing on the left in the file named on the right, creating the file if needed and overwriting whatever it held before. | `ls /home/me > myfilesonce.txt ; ls /home/me > myfilesonce.txt` |
| `>>` | Append the output of the thing on the left to the end of the existing file on the right. | `ls /home/me > myfilestwice.txt ; ls /home/me >> myfilestwice.txt` |
| `<` | Use the file on the right as the standard input of the command on the left. | `cat < sourcefile > targetfile` |
| `\|` | Pipe the standard output of the thing on the left into the standard input of the thing on the right. | `echo "test123" \| mail -s "subjectline" emailaddress` |

Understanding these redirection operators is crucial to understanding the kinds of wizardly command lines you’re presumably here to learn. They make it possible to treat individual, simple utilities as part of a greater whole.

And that last concept—breaking one complex task into several simpler tasks—is equally necessary to learning to think in complex command-line invocations in the first place!
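If you’d like to see the operators from the table in action, here is a minimal sketch you can paste into a terminal. The file names once.txt and twice.txt are made up for this illustration, and everything happens in a throwaway directory created by mktemp.

```shell
#!/bin/sh
# Work in a throwaway directory so none of your own files get overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# ">" starts the file over every time, so only the last line survives.
echo "first run" > once.txt
echo "second run" > once.txt

# ">>" appends to the file, so both lines survive.
echo "first run" > twice.txt
echo "second run" >> twice.txt

# "<" feeds a file to a command's standard input; wc -l counts its lines.
once_lines=$(wc -l < once.txt)
twice_lines=$(wc -l < twice.txt)

# "|" hands the output of one command straight to the next one.
piped_lines=$(cat twice.txt | wc -l)

# ";" simply runs one command after the other.
echo "once.txt lines: $once_lines" ; echo "twice.txt lines: $twice_lines"
```

The first file should end up holding a single line and the second one two lines, which is exactly the difference between > and >> described in the table.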

Grep finds strings

When first learning about tools like grep, I find it helps to think of them as far simpler than they truly are. In that vein, grep is the tool you use to find lines that contain a particular string of text.

For example, let’s say you’re interested in finding which ports the Apache web server has open on your system. Many utilities can accomplish this goal; netstat is one of the older and better-known options. Typically, we’d invoke netstat using the -anp arguments: -a for all sockets, -n for numeric display, and -p for displaying the owning PID of each socket.

Unfortunately, this produces a lot of output—frequently, several tens of pages. You could just pipe all that output to a pager, so you can read it one page at a time, with netstat -anp | less. Or, you might instead redirect it to a file to be opened with a text editor: netstat -anp > netstat.txt.

But there’s a better option. Instead, we can use grep to return only the lines we really want. In this case, what we want to know about is the apache webserver. So:

me@banshee:~$ sudo netstat -anp | head -n5
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 192.168.188.1:53        0.0.0.0:*               LISTEN      5128/dnsmasq        
tcp        0      0 192.168.254.1:53        0.0.0.0:*               LISTEN      5057/dnsmasq        
tcp        0      0 192.168.122.1:53        0.0.0.0:*               LISTEN      4893/dnsmasq        

me@banshee:~$ sudo netstat -anp | wc -l
1694

me@banshee:~$ sudo netstat -anp | grep apache
tcp6       0      0 :::80                   :::*                    LISTEN      4011/apache2  

me@banshee:~$ sudo netstat -anp | head -n2 ; sudo netstat -anp | grep apache
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp6       0      0 :::80                   :::*                    LISTEN      4011/apache2  

We introduced two new commands above. The first is head, which passes along only the first n lines of its input and discards the rest. The second is wc, which, with the argument -l, tells you how many lines of text hit its standard input.

So we can translate the four commands above into plain English:

  1. sudo netstat -anp | head -n5 : “Find all the open network sockets, but limit output to the first five lines.”
  2. sudo netstat -anp | wc -l : “Find all the open network sockets, then tell me how many total lines of text you’d have used to tell me.”
  3. sudo netstat -anp | grep apache : “Find all the open network sockets, but only show me the results that include the word ‘apache.’”
  4. sudo netstat -anp | head -n2 ; sudo netstat -anp | grep apache : “Find all the open network sockets, but only show me the two header lines—then do it again, but only show me the ‘apache’ results.”
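The four pipelines above can also be practiced without root access or a busy server. The sketch below assumes a hypothetical shell function, fake_netstat, that stands in for sudo netstat -anp and prints six fixed lines trimmed from the listing earlier in this article. Everything after the function is the same head, wc, and grep usage described in the list.

```shell
#!/bin/sh
# fake_netstat is a hypothetical stand-in for "sudo netstat -anp".
# It prints six fixed lines trimmed from the listing shown earlier.
fake_netstat() {
cat <<'EOF'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address    Foreign Address  State   PID/Program name
tcp   0      0      192.168.188.1:53 0.0.0.0:*        LISTEN  5128/dnsmasq
tcp   0      0      192.168.254.1:53 0.0.0.0:*        LISTEN  5057/dnsmasq
tcp   0      0      192.168.122.1:53 0.0.0.0:*        LISTEN  4893/dnsmasq
tcp6  0      0      :::80            :::*             LISTEN  4011/apache2
EOF
}

# 1. Only the first five lines.
first_five=$(fake_netstat | head -n5)

# 2. How many lines of output are there in total?
total_lines=$(fake_netstat | wc -l)

# 3. Only the lines that contain "apache".
apache_only=$(fake_netstat | grep apache)

# 4. The two header lines, then the "apache" lines.
with_headers=$(fake_netstat | head -n2 ; fake_netstat | grep apache)

printf '%s\n' "$with_headers"
```

Once the pipelines behave the way you expect on canned data, swapping fake_netstat back out for the real command is a one-word change.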

By thinking of grep as something much simpler than it actually is, we can jump immediately to finding productive ways to use it—and we can chain these simple uses together to easily describe more complex tasks!

Once you’re comfortable with using grep to find simple strings as seen above, it can do far more complex tasks. These include but are not limited to: case-insensitive use, more complex patterns (including full regular expressions), exclusion (only show me lines that don’t include the pattern), and much, much more. But don’t worry about that until after you’re familiar with simple grep uses. Once you start, it’s truly hard to imagine life without grep anymore!
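When you do feel ready for those next steps, here is a small sketch of three of them: case-insensitive matching (-i), exclusion (-v), and extended regular expressions (-E). The sample lines are invented for this illustration and are not real output from any tool.

```shell
#!/bin/sh
# Invented sample data, written to a temporary file for grep to search.
sample=$(mktemp)
cat > "$sample" <<'EOF'
4011/apache2 listening on port 80
4012/Apache2 listening on port 443
5128/dnsmasq listening on port 53
6200/sshd listening on port 22
EOF

# -i: ignore case, so "apache" also matches "Apache2".
any_case=$(grep -i apache "$sample")

# -v: invert the match, keeping only lines WITHOUT the pattern.
not_dnsmasq=$(grep -v dnsmasq "$sample")

# -E: extended regular expressions; here, lines ending in "port 80" or "port 443".
web_ports=$(grep -E 'port (80|443)$' "$sample")

printf '%s\n' "$web_ports"
rm -f "$sample"
```

Each option only changes how grep decides whether a line matches; the basic job of printing the matching lines stays the same.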