Composing commands with pipes

Objectives

  • Learn how to compose commands.

  • Learn how to redirect output to a file.

Instructor note

  • Demo/teaching: 15 min

  • Exercise: 15 min

Note

  • First we will demonstrate a couple of commands and concepts and later we will use these in an exercise.

  • The first part can be done as a demo or as a type-along.

  • If you want to type along with the instructors, you can download and extract the example like this:

    cd
    wget https://gitlab.sigma2.no/training/tutorials/unix-for-hpc/-/raw/master/content/episodes/pipes/exercise-pipes.tar.gz
    tar xvf exercise-pipes.tar.gz
    cd exercise-pipes
    

Commands which we will use in this episode

find and grep

We have seen these two commands earlier. Can you recall what they do?

$ find somepath
$ grep "some text" somefile

We introduce two new commands. One sorts the lines of a file alphabetically (it can do much more), the other removes repeated lines that directly follow each other:

$ sort somefile
$ uniq somefile

Actual examples (let’s try them together in the extracted exercise-pipes directory):

$ cd exercise-pipes
$ grep "blue" rgb.txt
$ sort rgb.txt
$ uniq repetitive.txt

One more new command: with the wc command one can count lines and words:

$ wc somefile
$ wc -l somefile
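
To see what the numbers mean, here is a small sketch. It assumes a scratch directory and a made-up two-line file called sample.txt (not part of the exercise archive):

```shell
# Scratch directory and made-up sample file (both are assumptions for this sketch)
workdir=$(mktemp -d)
printf 'red green\nblue\n' > "$workdir/sample.txt"

# Without options, wc prints three counts and then the file name:
# lines, words, bytes
wc "$workdir/sample.txt"

# With -l, wc prints only the line count and the file name
wc -l "$workdir/sample.txt"

# When done, the scratch directory can be removed with: rm -r "$workdir"
```

The sample file has 2 lines, 3 words, and 15 bytes, so these are the counts to look for in the output.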

Composing commands

Now let’s have some fun with pipes! The Unix pipe | sends the output of one command as input to another command.

We can pipe the output from grep right into wc -l for instance:

$ grep "blue" rgb.txt | wc -l

The first command keeps only the lines that contain “blue”. This is piped into wc -l which will count how many lines it got.
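
The same idea can be tried on a tiny file where we can check the answer by eye. This is only a sketch: the scratch directory and the file colors-sample.txt with its four lines are made up here and stand in for rgb.txt:

```shell
# Scratch directory and made-up sample data (assumptions for this sketch)
workdir=$(mktemp -d)
printf '%s\n' "255 0 0 red" "0 0 255 blue" "173 216 230 light blue" "0 128 0 green" > "$workdir/colors-sample.txt"

# grep keeps the two lines containing "blue"; wc -l counts them
grep "blue" "$workdir/colors-sample.txt" | wc -l

# grep can also count matching lines itself with -c;
# both commands should report the same number
grep -c "blue" "$workdir/colors-sample.txt"
```

Both commands should report 2 here. The pipe version is the more general pattern, because any command that prints lines can be placed on the left-hand side.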

Redirecting and appending output to a file

Instead of piping output to another command or letting it appear on the terminal (the default destination of “stdout”, standard output), we can redirect it to a file:

$ grep "blue" rgb.txt > blue-colors.txt

We can also append to an existing file with >>:

$ grep "red" rgb.txt > colors.txt
$ grep "blue" rgb.txt >> colors.txt
$ grep "green" rgb.txt >> colors.txt

The first command creates the file colors.txt (or overwrites it if it already exists). The second and third commands append to the existing file.
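
The difference between > and >> is easy to get wrong, so here is a minimal sketch. It assumes a scratch directory and a made-up file notes.txt, so that nothing in the exercise directory is overwritten:

```shell
# Scratch directory (an assumption for this sketch)
workdir=$(mktemp -d)

echo "first" > "$workdir/notes.txt"     # creates notes.txt with one line
echo "second" >> "$workdir/notes.txt"   # appends: notes.txt now has two lines
wc -l "$workdir/notes.txt"

echo "third" > "$workdir/notes.txt"     # overwrites: the two earlier lines are gone
cat "$workdir/notes.txt"
```

After the last command the file contains only the line "third". If in doubt, use >> or pick a new file name.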

Chaining pipes

We can chain as many commands as we like:

$ grep "line" repetitive.txt | sort | uniq | wc -l
$ grep "line" repetitive.txt | sort | uniq > output
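
Why sort before uniq? Because uniq only compares each line with the line directly before it. A small sketch, assuming a scratch directory and a made-up file sample.txt in which equal lines are not next to each other:

```shell
# Scratch directory and made-up sample data (assumptions for this sketch)
workdir=$(mktemp -d)
printf '%s\n' "line b" "line a" "line b" "line a" > "$workdir/sample.txt"

# uniq alone: no two equal lines are adjacent, so nothing is removed (4 lines)
uniq "$workdir/sample.txt" | wc -l

# sort first: equal lines end up next to each other and uniq collapses them (2 lines)
sort "$workdir/sample.txt" | uniq | wc -l

# sort -u gives the same result as sort | uniq
sort -u "$workdir/sample.txt" | wc -l
```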

History

As commands get more complex, it becomes more important to know that you can check your command history and even grep through it:

$ history
$ history | grep somecommand

Finding files and running some command on each of them

A common use case for pipes is to locate files using find, and to pipe the result into another command:

$ find calculations/ | grep "error"
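
Note that in this pipeline grep filters the list of paths printed by find. It does not look inside the files. A minimal sketch, assuming a scratch directory with a made-up calculations folder and two made-up files:

```shell
# Scratch directory with made-up files (assumptions for this sketch)
workdir=$(mktemp -d)
cd "$workdir"
mkdir calculations
echo "all fine" > calculations/error.log           # "error" in the name only
echo "error: disk full" > calculations/result.txt  # "error" in the content only

# grep sees the paths printed by find, not the file contents,
# so only calculations/error.log is printed
find calculations | grep "error"
```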

But sometimes you want to run a command on each of the files that were found. xargs is great for that: it passes the file names it reads as arguments to another command:

$ find . -type f | xargs wc -l

Discussion

Can you see and understand the difference between these two:

$ find . -type f | xargs wc -l
$ find . -type f | wc -l

xargs can also process results from find in parallel using several processors with -P and much more (man xargs).
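
One thing to watch out for: by default xargs splits its input on spaces and newlines, so a file name containing a space is treated as two names. The sketch below assumes a scratch directory with two made-up files, and it assumes a find and xargs that support -print0, -0, -n and -P (the GNU and BSD versions do):

```shell
# Scratch directory with made-up files, one with a space in its name
# (assumptions for this sketch)
workdir=$(mktemp -d)
printf 'one\ntwo\n' > "$workdir/my notes.txt"
printf 'three\n' > "$workdir/plain.txt"

# -print0 separates the names with a NUL character and -0 tells xargs
# to split on NUL only, so "my notes.txt" stays one argument
find "$workdir" -type f -print0 | xargs -0 wc -l

# Same idea in parallel: -n 1 passes one file per wc call,
# -P 2 runs up to two calls at the same time
find "$workdir" -type f -print0 | xargs -0 -n 1 -P 2 wc -l
```

Without -print0 and -0, wc would be asked to open a file ending in "my" and a file called "notes.txt", and it would complain that neither exists.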

Exercise

Exercise (15 min): Compose commands with pipes to form arbitrarily complex commands/queries

It is perfectly fine to do only some of the steps below. They may be too many for 15 minutes, but some of them can be great homework exercises for later.

  1. Download and extract the exercise folder:

    cd
    wget https://gitlab.sigma2.no/training/tutorials/unix-for-hpc/-/raw/master/content/episodes/pipes/exercise-pipes.tar.gz
    tar xvf exercise-pipes.tar.gz
    cd exercise-pipes
    
  2. Find all files under exercise-pipes that contain “error” in their file name.

  3. From file rgb.txt, create a file sorted-red-colors.txt containing all red colors, sorted alphabetically.

  4. Find all files whose names contain “error” and print the last line of each (the last line of a file can be printed with tail -n 1).

Keypoints

  • The Unix philosophy is that a command should do one thing, and one thing only, and do it well.

  • With perhaps 10-20 Unix commands we can achieve almost anything by composing commands.

  • Unix lets you assemble commands for your use case instead of giving you programs that can do “everything” but would be hard to re-assemble and change.