Composing commands with pipes
Objectives
Learn how to compose commands.
Learn how to run a specific command on a whole bunch of files at once.
Learn how to redirect output to a file.
Instructor note
Demo/teaching: 30 min
How to get the example files to practice with
If you want to type along with the instructors, or try this later on your own by following our steps below, you can download and extract the example files like this:
cd
wget https://gitlab.sigma2.no/training/tutorials/unix-for-hpc/-/raw/master/content/episodes/pipes/exercise-pipes.tar.gz
tar xf exercise-pipes.tar.gz
cd exercise-pipes
The commands above first move you to your home directory, then download the archive and extract it into a new directory called “exercise-pipes”, and finally change into the “exercise-pipes” directory.
Commands which we will use in this episode
find and grep
We have seen these two commands earlier. Can you recall what they do?
$ find SOMEPATH
$ grep "some text" SOMEFILE
Solution
find SOMEPATH
will list all files and directories below SOMEPATH.
grep "some text" SOMEFILE
will list all lines inside SOMEFILE that contain “some text”.
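We introduce two new commands. The first, sort, sorts the lines of a file alphabetically (it can do much more). The second, uniq, removes repeated lines. Note that uniq only removes duplicate lines that are directly next to each other, which is why it is usually combined with sort: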
$ sort SOMEFILE
$ uniq SOMEFILE
Actual examples (let’s try them together in the extracted exercise-pipes directory; skip the cd if you are already there):
$ cd exercise-pipes
$ grep "blue" rgb.txt
$ sort rgb.txt
$ uniq repetitive.txt
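A minimal sketch of why uniq is usually combined with sort. It uses a made-up file in a temporary directory (not the exercise files), where “line a” appears three times but only two of those copies are adjacent:

```shell
# Work in a temporary directory so no real files are touched
tmpdir=$(mktemp -d)

# Hypothetical input: "line a" three times, but not all next to each other
printf 'line a\nline b\nline a\nline a\n' > "$tmpdir/repeated.txt"

# uniq alone: only the two adjacent "line a" at the end are merged
unsorted_result=$(uniq "$tmpdir/repeated.txt")
printf '%s\n' "$unsorted_result"

# Sorting first puts all duplicates next to each other
sort "$tmpdir/repeated.txt" > "$tmpdir/sorted.txt"
sorted_result=$(uniq "$tmpdir/sorted.txt")
printf '%s\n' "$sorted_result"

rm -r "$tmpdir"
```

The first result still has three lines (line a, line b, line a); after sorting, uniq leaves only two.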
One more new command: with wc one can count lines, words, and bytes (with -l it counts only lines):
$ wc SOMEFILE
$ wc -l SOMEFILE
Composing commands
Now let’s have some fun with pipes! The Unix pipe | sends the output from one command as input into another command.
For instance, we can pipe the output from grep right into wc -l:
$ grep "blue" rgb.txt | wc -l
The first command keeps only the lines that contain “blue”. This is piped into wc -l, which counts how many lines it received.
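As a side note, grep can also count matching lines by itself with its -c flag. The sketch below compares both approaches on a small made-up stand-in for rgb.txt in a temporary directory:

```shell
tmpdir=$(mktemp -d)
# Hypothetical stand-in for rgb.txt: three of the five lines contain "blue"
printf 'blue\nlight blue\nred\ngreen\nnavy blue\n' > "$tmpdir/colors.txt"

# Count matching lines with a pipe into wc -l ...
via_pipe=$(grep "blue" "$tmpdir/colors.txt" | wc -l)
# ... or let grep count them itself with -c
via_flag=$(grep -c "blue" "$tmpdir/colors.txt")
echo "pipe: $via_pipe, grep -c: $via_flag"

rm -r "$tmpdir"
```

Both give the same number; the pipe version is more general, since wc -l can count the output of any command.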
Redirecting and appending output to a file
Instead of piping output to another command or to “stdout” (standard output), we can redirect it to a file:
$ grep "blue" rgb.txt > blue-colors.txt
Note that > overwrites the file if it already exists. We can instead append to an existing file with >>:
$ grep "red" rgb.txt > colors.txt
$ grep "blue" rgb.txt >> colors.txt
$ grep "green" rgb.txt >> colors.txt
The first command created the file colors.txt (or overwrote it, if it already existed). The second and third commands appended to the existing file.
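A minimal sketch of the difference between > and >>, using a throwaway file in a temporary directory (the file name is only an example):

```shell
tmpdir=$(mktemp -d)
out="$tmpdir/colors.txt"

echo "red" > "$out"
echo "blue" > "$out"    # > replaces the content: only "blue" is left
after_overwrite=$(cat "$out")

echo "green" >> "$out"  # >> adds to the end: "blue" then "green"
after_append=$(cat "$out")

printf 'after >:\n%s\nafter >>:\n%s\n' "$after_overwrite" "$after_append"
rm -r "$tmpdir"
```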
Chaining pipes
We can chain commands practically without limit:
$ grep "line" repetitive.txt | sort | uniq | wc -l
$ grep "line" repetitive.txt | sort | uniq > output
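A sketch of the same chain on a made-up stand-in for repetitive.txt (its content here is hypothetical, not the exercise file). It also shows uniq -c, which prefixes each distinct line with how often it occurred, followed by a second sort -nr to put the most frequent line first:

```shell
tmpdir=$(mktemp -d)
# Hypothetical stand-in for repetitive.txt
printf 'line 1\nline 2\nline 1\nno match\nline 2\nline 1\n' > "$tmpdir/repetitive.txt"

# Number of distinct lines containing "line"
distinct=$(grep "line" "$tmpdir/repetitive.txt" | sort | uniq | wc -l)
echo "distinct lines: $distinct"

# uniq -c counts each distinct line; sort -nr sorts by that count, largest first
most_frequent=$(grep "line" "$tmpdir/repetitive.txt" | sort | uniq -c | sort -nr | head -n 1)
echo "most frequent: $most_frequent"

rm -r "$tmpdir"
```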
Searching through history
As commands get more complex, it becomes more important to know that you can check your command history and even search (grep) through it:
$ history
$ history | grep SOMECOMMAND
For example, to list all commands in your history that contained “sbatch”:
$ history | grep sbatch
You might find it useful to see the date and time of each command in your command history:
$ export HISTTIMEFORMAT="%Y-%m-%d %T "
$ history
You can add the line export HISTTIMEFORMAT="%Y-%m-%d %T " to your .bashrc if you would like to have this feature always enabled.
Finding files and running some command on each of them
A common use case for pipes is to locate files using find, and to pipe the result into another command.
This command will find all files which contain the word “error” in their file name:
$ find calculations/ -type f | grep "error"
calculations/03-error.out
calculations/07-error.out
calculations/04-error.out
But what if you want to find all files which contain “error” inside the file? Then you need to do this instead:
$ find calculations/ -type f | xargs grep "error"
calculations/06.out:error: ran out of memory
calculations/08.out:error: ran out of time
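One caveat with find | xargs: by default xargs splits its input at spaces, so a file name such as “06 retry.out” would be passed to grep as two separate arguments. A common fix is find -print0 together with xargs -0, which separate names with NUL bytes instead. The sketch below assumes a temporary folder with a hypothetical file name containing a space:

```shell
tmpdir=$(mktemp -d)
mkdir "$tmpdir/calculations"
echo "finished" > "$tmpdir/calculations/01.out"
# Hypothetical file name containing a space
echo "error: ran out of memory" > "$tmpdir/calculations/06 retry.out"

# -print0 and -0 separate names with NUL bytes, so the space is harmless
matches=$(find "$tmpdir/calculations" -type f -print0 | xargs -0 grep "error")
echo "$matches"

rm -r "$tmpdir"
```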
Further below we have a more advanced example with xargs, showing how to make xargs use the output from find as arguments to a command when the arguments are not at the end of the command.
xargs can also process the results from find in parallel using several processors with -P, and much more (see the help text with $ man xargs).
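A minimal sketch of -P, assuming a temporary folder with four small made-up .out files. Here gzip is only a placeholder for whatever per-file command you need; -n 1 hands one file name to each call, and -P 4 runs up to four calls at the same time:

```shell
tmpdir=$(mktemp -d)
for i in 1 2 3 4; do
    echo "result $i" > "$tmpdir/0$i.out"
done

# -n 1: pass one file name per gzip call
# -P 4: run up to 4 gzip calls at the same time
find "$tmpdir" -type f -name "*.out" -print0 | xargs -0 -n 1 -P 4 gzip

compressed=$(find "$tmpdir" -type f -name "*.out.gz" | wc -l)
echo "compressed files: $compressed"

rm -r "$tmpdir"
```

On a shared HPC login node, keep the -P value modest so you do not take over the machine.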
Finding large files/folders
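Useful examples of using pipes to find the largest files and folders in some folder:
Show which subfolder of SOMEFOLDER is largest (add -a to du to include individual files as well; --max-depth is the GNU du option, on macOS/BSD use -d 1 instead):
$ du -h --max-depth=1 SOMEFOLDER | sort -hr
Show which subfolder contains the most files (the folder name is passed to sh as "$1" rather than pasted into the quoted script, so names with spaces or quotes are handled safely; the output also contains SOMEFOLDER itself with its total count):
$ find SOMEFOLDER -maxdepth 1 -type d -exec sh -c 'printf "%s: " "$1"; find "$1" -type f | wc -l' sh {} \; | sort -n -k2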
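To try the file-counting pipeline safely, here is a sketch on a small hypothetical tree in a temporary directory (folder a has three files, folder b has one). It adds -mindepth 1 so that the top folder itself is left out:

```shell
# Hypothetical test tree: folder a has 3 files, folder b has 1
tmpdir=$(mktemp -d)
mkdir "$tmpdir/a" "$tmpdir/b"
touch "$tmpdir/a/1" "$tmpdir/a/2" "$tmpdir/a/3" "$tmpdir/b/1"

# -mindepth 1 leaves out the top folder itself; each subfolder name is
# handed to sh as "$1" instead of being pasted into the quoted script
counts=$(find "$tmpdir" -mindepth 1 -maxdepth 1 -type d \
    -exec sh -c 'printf "%s: " "$1"; find "$1" -type f | wc -l' sh {} \; |
    sort -n -k2)
printf '%s\n' "$counts"

# The last line is the subfolder with the most files
largest=$(printf '%s\n' "$counts" | tail -n 1)
rm -r "$tmpdir"
```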
It is good practice to test xargs commands with “echo” first
Before running commands with xargs, it is a good idea to test them by inserting an echo to see what the command would do. This way you can avoid accidentally renaming or removing files.
For example, try it with the above command:
$ find calculations/ -type f | xargs echo grep "error"
Compare the result with the one without echo.
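The same habit applied to deleting files, as a sketch in a temporary directory with made-up file names: the echo run only prints the rm command, and the real run follows once the printed command looks right.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/old-1.tmp" "$tmpdir/old-2.tmp" "$tmpdir/keep.txt"

# Dry run: echo prints the rm command instead of running it
find "$tmpdir" -type f -name "*.tmp" | xargs echo rm
before=$(find "$tmpdir" -type f | wc -l)   # still 3 files

# Real run, once the printed command looks right
find "$tmpdir" -type f -name "*.tmp" | xargs rm
after=$(find "$tmpdir" -type f | wc -l)    # 1 file left

rm -r "$tmpdir"
```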
Using -exec instead of xargs
Some people prefer to use -exec instead of piping to xargs to achieve the same goal of running a command (in this case grep) on the files that find returns:
$ find calculations/ -type f | xargs grep "error"
calculations/06.out:error: ran out of memory
calculations/08.out:error: ran out of time
$ find calculations/ -type f -exec grep "error" {} \;
error: ran out of memory
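error: ran out of time
The file names are missing from the second output because -exec ... \; runs grep once per file, and grep only prefixes each match with the file name when it searches more than one file. With xargs, many file names are passed to a single grep call.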
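If you prefer -exec but want the file names, two options are ending -exec with {} + (find then passes many files to one grep call, similar to xargs) or asking grep to always print names with -H. A sketch on made-up files in a temporary directory:

```shell
tmpdir=$(mktemp -d)
printf 'finished\n' > "$tmpdir/01.out"
printf 'error: ran out of memory\n' > "$tmpdir/06.out"
printf 'error: ran out of time\n' > "$tmpdir/08.out"

# {} \; : one grep per file, so no file names in the output
per_file=$(find "$tmpdir" -type f -exec grep "error" {} \;)

# {} + : many files per grep call, so file names are shown
batched=$(find "$tmpdir" -type f -exec grep "error" {} +)

# grep -H : always print the file name, even with one file per call
forced=$(find "$tmpdir" -type f -exec grep -H "error" {} \;)

printf '%s\n---\n%s\n---\n%s\n' "$per_file" "$batched" "$forced"
rm -r "$tmpdir"
```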
Solving the above problem with a recursive grep
If you want to search through all files in a directory and its subdirectories, you can use the -r flag with grep:
$ grep -r "error" calculations/
calculations/06.out:error: ran out of memory
calculations/08.out:error: ran out of time
This is a good alternative to using find and xargs in this case.
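Two grep flags that often go with -r: -l prints only the names of matching files, and --include limits the search to file names matching a pattern. A sketch on a hypothetical folder in a temporary directory:

```shell
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/calculations/rerun"
echo "error: ran out of memory" > "$tmpdir/calculations/06.out"
echo "error: ran out of time" > "$tmpdir/calculations/rerun/08.out"
echo "error notes" > "$tmpdir/calculations/notes.txt"

# -l: print only the names of files with at least one match
all_files=$(grep -rl "error" "$tmpdir/calculations")
# --include: only search files whose names match the pattern
out_files=$(grep -rl --include="*.out" "error" "$tmpdir/calculations")

printf '%s\n---\n%s\n' "$all_files" "$out_files"
rm -r "$tmpdir"
```

The first search finds all three files, the second only the two .out files.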
Exercise
Exercise: Solving typical tasks with composition
We can do this together but we also recommend that you try this later on your own.
Download and extract the exercise folder:
cd
wget https://gitlab.sigma2.no/training/tutorials/unix-for-hpc/-/raw/master/content/episodes/pipes/exercise-pipes.tar.gz
tar xvf exercise-pipes.tar.gz
cd exercise-pipes
1. Find all files under exercise-pipes that contain “error” in their file name.
2. From file rgb.txt, create a file sorted-red-colors.txt containing all red colors, sorted alphabetically.
3. Find all files that contain “error” in their file name and, for each of them, print their last line (the last line of a file can be printed with tail -n 1).
4. Find all files that contain “error” inside the file (not just in the file name).
Solution
Downloading and extracting the archive should produce the exercise-pipes folder; there is nothing to solve in this step.
Finding all files that contain “error” in their file name can be done with a pipe:
$ find calculations/ | grep "error"
calculations/07-error.out
calculations/04-error.out
calculations/03-error.out
It can also be done using find directly:
$ find calculations/ -name "*error*"
calculations/07-error.out
calculations/04-error.out
calculations/03-error.out
From file rgb.txt, create a file sorted-red-colors.txt with all red colors, sorted alphabetically:
$ grep "red" rgb.txt | sort > sorted-red-colors.txt
This is one way to find all files that contain “error” in their file name and to see the last line in each of them (using tail -n 1):
$ find calculations/ | grep "error" | xargs tail -n 1
==> calculations/07-error.out <==
calculation timed out
==> calculations/04-error.out <==
ran out of disk quota
==> calculations/03-error.out <==
ran out of memory
To find all files that contain “error” inside the file, we run grep on the files found by find, with the help of xargs:
$ find calculations/ -type f | xargs grep "error"
calculations/06.out:error: ran out of memory
calculations/08.out:error: ran out of time
How to safely rename many files at once (advanced)
What if you want to rename many files at once but you don’t want to do it manually file by file (that would be tedious and error-prone)?
For example, we have these files here:
$ find calculations/ -type f
calculations/05.out
calculations/09.out
calculations/01.out
calculations/03-error.out
calculations/02.out
calculations/07-error.out
calculations/06.out
calculations/04-error.out
calculations/08.out
But what if, instead of “something-error.out”, we would like them to be called “something-problem.out”?
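There are many ways to do this, but one nice command for renaming files is rename. Be aware that there are two common versions of rename: the util-linux version used here takes the text to replace, the replacement, and the files, while the Perl version (the default rename on, for example, Debian and Ubuntu) expects an expression such as 's/error/problem/' instead.
The following command would rename calculations/03-error.out to calculations/03-problem.out: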
$ rename "error" "problem" calculations/03-error.out
Here is one way to couple find with rename, but we added an extra echo so that we can verify what this will do before running the actual command:
$ find calculations/ -type f | grep "error" | xargs -I {} echo rename "error" "problem" {}
rename error problem calculations/03-error.out
rename error problem calculations/07-error.out
rename error problem calculations/04-error.out
The above did not run the rename commands, it only printed them with echo.
Once we are confident that this is what we wanted to do, we can run it without the echo:
$ find calculations/ -type f | grep "error" | xargs -I {} rename "error" "problem" {}
$ find calculations/ -type f
calculations/05.out
calculations/07-problem.out
calculations/09.out
calculations/04-problem.out
calculations/01.out
calculations/02.out
calculations/03-problem.out
calculations/06.out
calculations/08.out
The -I {} part defines how we want to refer to each file name that find (filtered by grep) gave us: xargs runs the command once per input line and replaces {} with that line.
You could do this instead with the same effect:
$ find calculations/ -type f | grep "error" | xargs -I _ rename "error" "problem" _
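A small sketch of what -I changes, using echo and two hypothetical file names so that nothing is renamed. Without -I, xargs appends all names to the end of a single command; with -I {}, it runs one command per input line and places the name wherever {} appears:

```shell
# Hypothetical input: two file names, one per line
names=$(printf '03-error.out\n04-error.out\n')

# Default: all names are appended to the end of a single command
default_out=$(printf '%s\n' "$names" | xargs echo "cmd:")

# -I {}: one command per input line, name placed wherever {} appears
placed_out=$(printf '%s\n' "$names" | xargs -I {} echo "{} -> renamed")

printf '%s\n---\n%s\n' "$default_out" "$placed_out"
```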
As an exercise, try to rename the files back.
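If rename is not available, or behaves differently on your system, a plain shell loop with mv and sed can do the same job. The sketch below is one possible approach, not the method used above: it works on a hypothetical copy of the calculations folder in a temporary directory, and the function name rename_error_files is made up for this example. It first does a dry run with echo, then the real run:

```shell
# Hypothetical test folder mirroring the calculations/ example
tmpdir=$(mktemp -d)
mkdir "$tmpdir/calculations"
touch "$tmpdir/calculations/01.out" \
      "$tmpdir/calculations/03-error.out" \
      "$tmpdir/calculations/04-error.out"

# Hypothetical helper: $1 is "echo" for a dry run, or empty to really run mv
rename_error_files() {
    for f in "$tmpdir"/calculations/*error*; do
        dir=$(dirname "$f")
        base=$(basename "$f")
        # Replace "error" in the file name only, never in the directory part
        new_base=$(printf '%s\n' "$base" | sed 's/error/problem/')
        $1 mv -- "$f" "$dir/$new_base"
    done
}

rename_error_files echo   # dry run: only prints the mv commands
rename_error_files ""     # real run

renamed=$(ls "$tmpdir/calculations")
printf '%s\n' "$renamed"
rm -r "$tmpdir"
```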
Keypoints
The Unix philosophy is that a command should do one thing only, and do it well.
With perhaps 10-20 Unix commands we can achieve almost everything imaginable by composing commands.
Unix lets you assemble commands for your use case instead of giving you programs that can do “everything” but which might be hard to re-assemble and change.