Composing commands with pipes

Objectives

  • Learn how to compose commands.

  • Learn how to run a specific command on a whole bunch of files at once.

  • Learn how to redirect output to a file.

Instructor note

  • Demo/teaching: 30 min

How to get the example files to practice with

If you want to type along with the instructors, or try this later on your own by following the steps below, you can download and extract the example files like this:

cd
wget https://gitlab.sigma2.no/training/tutorials/unix-for-hpc/-/raw/master/content/episodes/pipes/exercise-pipes.tar.gz
tar xf exercise-pipes.tar.gz
cd exercise-pipes

The commands above first move you to your home directory, then download and extract a new directory called “exercise-pipes”, and then change your directory into the “exercise-pipes” directory.

Commands which we will use in this episode

find and grep

We have seen these two commands earlier. Can you recall what they do?

$ find SOMEPATH
$ grep "some text" SOMEFILE

We introduce two new commands. sort sorts the lines of a file alphabetically (it can do much more), and uniq removes repeated lines that directly follow each other:

$ sort SOMEFILE
$ uniq SOMEFILE

Actual examples (let’s try them together in the extracted exercise-pipes directory):

$ cd ~/exercise-pipes
$ grep "blue" rgb.txt
$ sort rgb.txt
$ uniq repetitive.txt

One more new command: wc counts the lines, words, and bytes of a file, and with -l it prints only the number of lines:

$ wc SOMEFILE
$ wc -l SOMEFILE

Composing commands

Now let’s have some fun with pipes! The Unix pipe | sends the output of one command as input to another command.

We can pipe the output from grep right into wc -l for instance:

$ grep "blue" rgb.txt | wc -l

The first command keeps only the lines that contain “blue”. This is piped into wc -l which will count how many lines it got.
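
To see what the pipe does without relying on the exercise files, here is a small sketch that sends made-up lines through the same pipeline (the colour names are invented for illustration and are not the contents of rgb.txt). grep also has a -c option that counts matching lines by itself, which should give the same number:

```shell
# Made-up input lines standing in for rgb.txt.
colors=$(printf 'light blue\ndark red\nnavy blue\ngreen\n')

# grep keeps the lines containing "blue"; wc -l counts the lines it receives.
piped_count=$(echo "$colors" | grep "blue" | wc -l)

# grep -c counts the matching lines directly, without a second command.
direct_count=$(echo "$colors" | grep -c "blue")

echo "pipe: $piped_count, grep -c: $direct_count"
```

The pipe version is longer here, but it generalizes: wc -l can count the output of any command, not only grep.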

Redirecting and appending output to a file

Instead of piping output to another command or letting it print to the terminal (“stdout”, standard output), we can redirect it to a file with >:

$ grep "blue" rgb.txt > blue-colors.txt

We can also append to an existing file with >>:

$ grep "red" rgb.txt > colors.txt
$ grep "blue" rgb.txt >> colors.txt
$ grep "green" rgb.txt >> colors.txt

The first command created the file colors.txt. The second and third commands append to the existing file.
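
Be careful with >: if the file already exists, its previous contents are replaced. The difference between > and >> can be seen in this small sketch, which uses a throwaway file (the name log.txt is made up):

```shell
workdir=$(mktemp -d)
log="$workdir/log.txt"    # hypothetical file name, used only in this sketch

echo "first"  > "$log"    # creates the file
echo "second" >> "$log"   # appends: the file now has two lines
lines_after_append=$(wc -l < "$log")

echo "third"  > "$log"    # overwrites: the earlier two lines are gone
lines_after_overwrite=$(wc -l < "$log")

cat "$log"
```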

Chaining pipes

We can chain commands almost (?) without limits:

$ grep "line" repetitive.txt | sort | uniq | wc -l
$ grep "line" repetitive.txt | sort | uniq > output
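
The sort step before uniq matters, because uniq only merges duplicate lines that are next to each other. A minimal sketch with invented input (not the contents of repetitive.txt) that also shows uniq -c, which prefixes each line with how often it occurred:

```shell
# Invented input with duplicates that are not all adjacent.
lines=$(printf 'line b\nline a\nline b\nline b\n')

# Without sort, the first "line b" is not merged with the later ones.
unsorted_count=$(echo "$lines" | uniq | wc -l)

# With sort, identical lines become neighbours and collapse to one each.
sorted_count=$(echo "$lines" | sort | uniq | wc -l)

echo "without sort: $unsorted_count, with sort: $sorted_count"

# Count occurrences and list the most frequent line first.
echo "$lines" | sort | uniq -c | sort -nr
```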

Searching through history

As commands get more complex, it becomes more important to know that you can check your command history and even search (grep) through it:

$ history
$ history | grep SOMECOMMAND

For example you might find this useful (what were all my commands which contained “sbatch” in them?):

$ history | grep sbatch

You might find it useful to see the date and time of each command in your command history:

$ export HISTTIMEFORMAT="%Y-%m-%d %T "
$ history

You can add the export HISTTIMEFORMAT="%Y-%m-%d %T " line to your .bashrc if you would like to have this feature always enabled.

Finding files and running some command on each of them

A common use case for pipes is to locate files using find, and to pipe the result into another command.

This command will find all files which contain the word “error” in their file name:

$ find calculations/ -type f | grep "error"

calculations/03-error.out
calculations/07-error.out
calculations/04-error.out

But what if you want to find all files which contain “error” inside the file? Then you need to do this instead:

$ find calculations/ -type f | xargs grep "error"

calculations/06.out:error: ran out of memory
calculations/08.out:error: ran out of time

Further below there is a more advanced example which shows how to make xargs insert the file names from find at a chosen position in a command, not only at its end.

xargs can also process the results from find in parallel on several processors with -P, and much more (see the manual page with $ man xargs).
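
One caveat: by default xargs splits its input on spaces and newlines, so file names that contain spaces get broken apart. With the GNU and BSD versions of find and xargs, the options -print0 and -0 avoid this by separating the names with a null character. A sketch using a throwaway directory (all file names here are made up):

```shell
workdir=$(mktemp -d)
echo "error: disk full" > "$workdir/run one.out"   # note the space in the name
echo "all good"         > "$workdir/run2.out"

# -print0 and -0 pass each file name intact; grep -l lists only matching files.
matches=$(find "$workdir" -type f -print0 | xargs -0 grep -l "error")
echo "$matches"
```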

Finding large files/folders

Pipes are also useful for finding out what takes up the most space in some folder.

Example showing which subfolders of SOMEFOLDER are largest:

$ du -h --max-depth=1 SOMEFOLDER | sort -hr

Example showing which folders have the most files:

$ find SOMEFOLDER -maxdepth 1 -type d -exec sh -c 'printf "%s: " "$1"; find "$1" -type f | wc -l' sh {} \; | sort -n -k2
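
If the one-liner above is hard to read, the same idea can be spelled out as a loop whose output is piped into sort. This is only a sketch on a throwaway directory tree (the directory names few and many are invented), and it prints the count before the directory name:

```shell
workdir=$(mktemp -d)
mkdir "$workdir/few" "$workdir/many"
touch "$workdir/few/a" "$workdir/many/a" "$workdir/many/b" "$workdir/many/c"

# For each subdirectory print "<number of files> <directory>",
# then sort numerically with the largest count first.
ranking=$(for dir in "$workdir"/*/; do
    count=$(find "$dir" -type f | wc -l | tr -d ' ')
    echo "$count $dir"
done | sort -nr)
echo "$ranking"
```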

It is good practice to test xargs commands with “echo” first

Before running commands with xargs, it is a good idea to test them by inserting an echo to see what the command would do. This way you can avoid renaming or removing files by accident.

For example, try it with the above command:

$ find calculations/ -type f | xargs echo grep "error"

Compare the result with the one without echo.
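
The same trick works for any command. A small sketch with made-up file names, where echo prints the rm command that xargs would build instead of running it:

```shell
# Made-up file names; nothing is deleted because echo only prints the command.
preview=$(printf 'old1.out\nold2.out\n' | xargs echo rm)
echo "$preview"
```

This also shows that xargs, by default, puts all the names it received at the end of a single command.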

Using -exec instead of xargs

Some people prefer find's -exec option over piping into xargs to run a command (in this case grep) on each result. Compare the two outputs: with -exec ... \; grep is started separately for each file, so it does not print the file names:

$ find calculations/ -type f | xargs grep "error"

calculations/06.out:error: ran out of memory
calculations/08.out:error: ran out of time

$ find calculations/ -type f -exec grep "error" {} \;

error: ran out of memory
error: ran out of time
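
To get the file names back with -exec, one option is to end it with + instead of \;, so that find passes many files to a single grep call, much like xargs does. A sketch on a throwaway directory with invented file names:

```shell
workdir=$(mktemp -d)
echo "error: ran out of memory" > "$workdir/a.out"
echo "error: ran out of time"   > "$workdir/b.out"

# \; runs grep once per file, so grep prints only the matching lines.
per_file=$(find "$workdir" -type f -exec grep "error" {} \; | sort)

# + hands all files to one grep call, so each match is prefixed with its file name.
batched=$(find "$workdir" -type f -exec grep "error" {} + | sort)

echo "$per_file"
echo "$batched"
```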

Solving the above problem with a recursive grep

If you want to search through all files in a directory and its subdirectories you can use the -r flag with grep:

$ grep -r "error" calculations/

calculations/06.out:error: ran out of memory
calculations/08.out:error: ran out of time

This is a good alternative to using find and xargs in this case.
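
If only the names of the matching files are of interest, grep's -l option can be combined with -r. A sketch on a throwaway directory (the file and directory names are invented):

```shell
workdir=$(mktemp -d)
mkdir "$workdir/sub"
echo "error: ran out of time" > "$workdir/sub/a.out"
echo "finished normally"      > "$workdir/b.out"

# -r searches recursively, -l prints the name of each matching file once.
files_with_error=$(grep -rl "error" "$workdir")
echo "$files_with_error"
```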

Exercise

Exercise: Solving typical tasks with composition

We can do this together but we also recommend that you try this later on your own.

  1. Download and extract the exercise folder:

    cd
    wget https://gitlab.sigma2.no/training/tutorials/unix-for-hpc/-/raw/master/content/episodes/pipes/exercise-pipes.tar.gz
    tar xvf exercise-pipes.tar.gz
    cd exercise-pipes
    
  2. Find all files under exercise-pipes that contain “error” in their file name.

  3. From file rgb.txt, create a file sorted-red-colors.txt containing all red colors, sorted alphabetically.

  4. Find all files that contain “error” in their file name and for each of them print its last line (the last line of a file can be printed with tail -n 1).

  5. Find all files that contain “error” inside the file (not just in the file name).


How to safely rename many files at once (advanced)

What if you want to rename many files at once but you don’t want to do it manually file by file (that would be tedious and error-prone)?

For example, we have these files here:

$ find calculations/ -type f

calculations/05.out
calculations/09.out
calculations/01.out
calculations/03-error.out
calculations/02.out
calculations/07-error.out
calculations/06.out
calculations/04-error.out
calculations/08.out

But what if, instead of “something-error.out”, I would like them to be called “something-problem.out”?

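There are many ways to do this but one nice command for renaming files is rename. With the rename from util-linux (some systems instead ship a Perl-based rename with a different syntax), the following command would rename calculations/03-error.out to calculations/03-problem.out: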

$ rename "error" "problem" calculations/03-error.out

Here is one way to couple find with rename but we added an extra echo so that we can verify what this will do, before running the actual command:

$ find calculations/ -type f | grep "error" | xargs -I {} echo rename "error" "problem" {}

rename error problem calculations/03-error.out
rename error problem calculations/07-error.out
rename error problem calculations/04-error.out

The above did not run the rename commands, only printed them with echo. Once we are confident that this is what we wanted to do, we can run it without the echo:

$ find calculations/ -type f | grep "error" | xargs -I {} rename "error" "problem" {}

$ find calculations/ -type f

calculations/05.out
calculations/07-problem.out
calculations/09.out
calculations/04-problem.out
calculations/01.out
calculations/02.out
calculations/03-problem.out
calculations/06.out
calculations/08.out

The -I {} part defines how we want to refer to the files that find gave us. You could do this instead with the same effect:

$ find calculations/ -type f | grep "error" | xargs -I _ rename "error" "problem" _

As an exercise, try to rename the files back.
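
Since rename does not exist in the same form on every system, a loop with mv is a portable alternative. The following is a sketch on a throwaway directory with similar, made-up file names, using the shell's ${f%suffix} expansion to strip the ending. As with xargs, putting an echo in front of mv first lets you preview the commands:

```shell
workdir=$(mktemp -d)
touch "$workdir/03-error.out" "$workdir/04-error.out" "$workdir/05.out"

# ${f%-error.out} is the file name with the suffix "-error.out" removed.
for f in "$workdir"/*-error.out; do
    mv "$f" "${f%-error.out}-problem.out"
done

ls "$workdir"
```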


Keypoints

  • The Unix philosophy is that a command should do one thing only, and do it well.

  • With perhaps 10-20 Unix commands we can achieve almost everything imaginable by composing commands.

  • Unix lets you assemble commands for your use case instead of giving you programs that can do “everything” but which might be hard to re-assemble and change.