Composing commands with pipes
Objectives
Learn how to compose commands.
Learn how to redirect output to a file.
Instructor note
Demo/teaching: 15 min
Exercise: 15 min
Note
First we will demonstrate a couple of commands and concepts and later we will use these in an exercise.
The first part can be done as demo or as type-along.
If you want to type-along with the instructors, you can download and extract the example like this:
cd
wget https://gitlab.sigma2.no/training/tutorials/unix-for-hpc/-/raw/master/content/episodes/pipes/exercise-pipes.tar.gz
tar xvf exercise-pipes.tar.gz
cd exercise-pipes
Commands which we will use in this episode
find and grep
We have seen these two commands earlier. Can you recall what they do?
$ find somepath
$ grep "some text" somefile
We introduce two new commands. One sorts the lines of a file alphabetically (it can do much more), the other removes repeated lines that directly follow each other:
$ sort somefile
$ uniq somefile
Actual examples (let’s try them together in the extracted exercise-pipes directory):
$ cd exercise-pipes
$ grep "blue" rgb.txt
$ sort rgb.txt
$ uniq repetitive.txt
One more new command: wc counts the lines, words, and bytes in a file, and wc -l prints only the number of lines:
$ wc somefile
$ wc -l somefile
Composing commands
Now let’s have some fun with pipes! The Unix pipe | sends the output of one command as input to another command.
For instance, we can pipe the output from grep right into wc -l:
$ grep "blue" rgb.txt | wc -l
The first command keeps only the lines that contain “blue”. These lines are piped into wc -l, which counts how many lines it received.
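To make the data flow concrete, here is a minimal sketch on a throwaway file. The file colors-sample.txt and its contents are made up for illustration and are not part of the exercise archive. The $(...) syntax, which the episode has not introduced, simply captures a command's output in a variable.

```shell
# Scratch directory with a made-up sample file (not the real rgb.txt).
workdir=$(mktemp -d)
printf 'navy blue\nforest green\nsky blue\ndark red\n' > "$workdir/colors-sample.txt"

# grep writes the matching lines to stdout; the pipe hands them to wc -l.
blue_count=$(grep "blue" "$workdir/colors-sample.txt" | wc -l)
echo "lines containing blue: $blue_count"

# grep -c counts matching lines by itself, so both numbers should agree.
blue_count_direct=$(grep -c "blue" "$workdir/colors-sample.txt")
echo "same count via grep -c: $blue_count_direct"

# Clean up the scratch directory.
rm -r "$workdir"
```

Both approaches should report the same number here. The pipe version is the more general pattern, because wc -l can count the output of any command, not only grep.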
Redirecting and appending output to a file
Instead of piping output to another command or letting it appear on the screen (standard output, “stdout”, goes to the terminal by default), we can redirect it to a file:
$ grep "blue" rgb.txt > blue-colors.txt
We can also append to an existing file with >>:
$ grep "red" rgb.txt > colors.txt
$ grep "blue" rgb.txt >> colors.txt
$ grep "green" rgb.txt >> colors.txt
The first command creates the file colors.txt (or overwrites it if it already exists). The second and third commands append to the existing file.
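The difference between > and >> is easy to get wrong, so here is a small sketch that can be run safely in a scratch directory. The file name notes.txt is hypothetical, and wc -l < file (reading the file on standard input) is used only to get a bare line count.

```shell
# Work in a scratch directory so no real files are overwritten.
workdir=$(mktemp -d)

echo "first"  >  "$workdir/notes.txt"   # > creates the file (or empties an existing one)
echo "second" >> "$workdir/notes.txt"   # >> adds a line at the end
lines_after_append=$(wc -l < "$workdir/notes.txt")

echo "third"  >  "$workdir/notes.txt"   # > again: the two earlier lines are gone
lines_after_overwrite=$(wc -l < "$workdir/notes.txt")

echo "after >>: $lines_after_append lines"
echo "after a second >: $lines_after_overwrite line"

rm -r "$workdir"
```

If a file holds results you want to keep, prefer >> or redirect to a new file name.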
Chaining pipes
We can chain many commands in a row:
$ grep "line" repetitive.txt | sort | uniq | wc -l
$ grep "line" repetitive.txt | sort | uniq > output
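Why does sort come before uniq in these chains? uniq only collapses repeated lines that are directly next to each other, so the input has to be sorted first. A minimal sketch with a made-up file (sample.txt is hypothetical, not repetitive.txt from the archive):

```shell
# Four lines, two distinct values, but no identical neighbours.
workdir=$(mktemp -d)
printf 'line b\nline a\nline b\nline a\n' > "$workdir/sample.txt"

# Without sort, uniq finds no adjacent duplicates and keeps all four lines.
unsorted_count=$(uniq "$workdir/sample.txt" | wc -l)

# After sort, identical lines sit next to each other and uniq collapses them.
sorted_count=$(sort "$workdir/sample.txt" | uniq | wc -l)

echo "uniq alone: $unsorted_count lines"
echo "sort | uniq: $sorted_count lines"

rm -r "$workdir"
```

For this particular task sort -u gives the same result as sort | uniq, but the two-step version shows more clearly what each command contributes.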
History
As commands get more complex, it becomes more important to know that you can check your command history and even grep through it:
$ history
$ history | grep somecommand
Finding files and running some command on each of them
A common use case for pipes is to locate files using find, and to pipe the result into another command:
$ find calculations/ | grep "error"
But sometimes you want to run a command on the files that were found, rather than on the list of their names. xargs is great for that, because it turns its input into arguments for another command:
$ find . -type f | xargs wc -l
Discussion
Can you see and understand the difference between these two:
$ find . -type f | xargs wc -l
$ find . -type f | wc -l
xargs can also process the results from find in parallel using several processors with -P, and much more (see man xargs).
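To explore the two commands from the discussion on a small, known set of files, the following sketch may help. The directory and file names are made up. It also uses find -print0 together with xargs -0, which both GNU and BSD versions of these tools support; this assumes your system has them. These options separate file names with a NUL byte so that names containing spaces are passed on intact.

```shell
# Two small files; one name contains a space on purpose.
workdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$workdir/first.txt"
printf 'd\n'       > "$workdir/second file.txt"

# Without xargs: wc -l counts the lines printed by find, i.e. the number of files.
file_count=$(find "$workdir" -type f | wc -l)

# With xargs: the file names become arguments to wc -l, which then counts
# the lines inside each file and prints a "total" line at the end.
total_line=$(find "$workdir" -type f -print0 | xargs -0 wc -l | tail -n 1)

echo "files found: $file_count"
echo "last line of wc output: $total_line"

rm -r "$workdir"
```

With a plain find . -type f | xargs wc -l, the name "second file.txt" would be split at the space into two arguments, and wc would complain that neither file exists.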
Exercise
Exercise (15 min): Compose commands with pipes to form arbitrarily complex commands/queries
It is perfectly fine to do only some of the steps below. There may be too many for 15 minutes, but some of them can be great homework exercises for later.
Download and extract the exercise folder:
cd
wget https://gitlab.sigma2.no/training/tutorials/unix-for-hpc/-/raw/master/content/episodes/pipes/exercise-pipes.tar.gz
tar xvf exercise-pipes.tar.gz
cd exercise-pipes
Find all files under exercise-pipes that contain “error” in their file name.
From file rgb.txt, create a file sorted-red-colors.txt containing all red colors, sorted alphabetically.
Find all file names that contain “error” and for each of them print their last line (the last line of a file can be printed with tail -n 1).
Solution
Downloading and extracting the archive should produce the exercise folder. There is nothing more to do for this step.
Finding all files that contain “error” in their file name can be done with a pipe:
$ find calculations/ | grep "error"
calculations/07-error.out
calculations/04-error.out
calculations/03-error.out
It can also be done using find directly:
$ find calculations/ -name "*error*"
calculations/07-error.out
calculations/04-error.out
calculations/03-error.out
From file rgb.txt, create a file sorted-red-colors.txt with all red colors, sorted alphabetically:
$ grep "red" rgb.txt | sort > sorted-red-colors.txt
This is one way to find all file names that contain “error” and to see the last line in each of them (using tail -n 1):
$ find calculations/ | grep "error" | xargs tail -n 1
==> calculations/07-error.out <==
calculation timed out

==> calculations/04-error.out <==
ran out of disk quota

==> calculations/03-error.out <==
ran out of memory
Keypoints
The Unix philosophy is that a command should do one thing only and do it well.
With perhaps 10-20 Unix commands we can achieve almost anything by composing commands.
Unix lets you assemble commands for your use case instead of giving you programs that can do “everything” but would be hard to re-assemble and change.