# Composing commands with pipes

```{objectives}
- Learn how to compose commands.
- Learn how to run a specific command on a whole bunch of files at once.
- Learn how to redirect output to a file.
```

```{instructor-note}
- Demo/teaching: 30 min
```

````{admonition} How to get the example files to practice with
If you want to type along with the instructors or try this later on your own
following the steps below, you can download and extract the example files like this:

```
cd
wget https://gitlab.sigma2.no/training/tutorials/unix-for-hpc/-/raw/master/content/episodes/pipes/exercise-pipes.tar.gz
tar xf exercise-pipes.tar.gz
cd exercise-pipes
```

The commands above first move you to your home directory, then download and
extract a new directory called "exercise-pipes", and finally change into the
"exercise-pipes" directory.
````

## Commands which we will use in this episode

````{discussion} find and grep
We have seen these two commands earlier. Can you recall what they do?

```console
$ find SOMEPATH
$ grep "some text" SOMEFILE
```

```{solution}
- `find SOMEPATH` will list all files and directories below SOMEPATH.
- `grep "some text" SOMEFILE` will list all lines inside SOMEFILE which contain "some text".
```
````

We introduce two new commands. One sorts the lines of a file alphabetically (it
can do much more); the other removes repeated lines that directly follow each other:

```console
$ sort SOMEFILE
$ uniq SOMEFILE
```

Actual examples (let's try them together in the extracted `exercise-pipes` directory):

```console
$ cd exercise-pipes
$ grep "blue" rgb.txt
$ sort rgb.txt
$ uniq repetitive.txt
```

One more new command: with the `wc` command one can count words and lines:

```console
$ wc SOMEFILE
$ wc -l SOMEFILE
```

## Composing commands

Now let's have some fun with pipes!
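
One detail about `uniq` is worth checking by hand first: it only collapses duplicate lines that are adjacent, which is why it is usually applied to sorted input. A minimal sketch, assuming a hypothetical scratch file `fruits.txt` that is not part of the exercise files:

```bash
# Hypothetical scratch data in a temporary directory (not part of exercise-pipes).
workdir=$(mktemp -d)
printf 'apple\nbanana\napple\napple\n' > "$workdir/fruits.txt"

# uniq alone only merges the two neighbouring "apple" lines at the end;
# the first "apple" is separated from them by "banana" and therefore stays.
uniq_only=$(uniq "$workdir/fruits.txt")

# Sorting first brings identical lines next to each other.
sort "$workdir/fruits.txt" > "$workdir/sorted.txt"
sorted_uniq=$(uniq "$workdir/sorted.txt")

echo "$uniq_only"
echo "---"
echo "$sorted_uniq"
```

The first result still lists `apple` twice, while the second lists each fruit once.
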
The Unix pipe `|` sends the output of one command as input to another command.
For instance, we can pipe the output from `grep` right into `wc -l`:

```console
$ grep "blue" rgb.txt | wc -l
```

The first command keeps only the lines that contain "blue". These lines are piped
into `wc -l`, which counts how many lines it received.

## Redirecting and appending output to a file

Instead of piping output to another command or letting it go to "stdout"
(standard output), we can redirect it to a file:

```console
$ grep "blue" rgb.txt > blue-colors.txt
```

We can also append to an existing file with `>>`:

```console
$ grep "red" rgb.txt > colors.txt
$ grep "blue" rgb.txt >> colors.txt
$ grep "green" rgb.txt >> colors.txt
```

The first command creates the file `colors.txt`. The second and third commands
append to the existing file.

## Chaining pipes

We can chain as many commands as we need:

```console
$ grep "line" repetitive.txt | sort | uniq | wc -l
$ grep "line" repetitive.txt | sort | uniq > output
```

## Searching through history

As commands get more complex, it becomes more important to know that you can
check your command history and even search (grep) through it:

```console
$ history
$ history | grep SOMECOMMAND
```

For example, you might find this useful (what were all my commands which contained "sbatch"?):

```console
$ history | grep sbatch
```

You might also find it useful to see the date and time of each command in your command history:

```console
$ export HISTTIMEFORMAT="%Y-%m-%d %T "
$ history
```

You can add the `export HISTTIMEFORMAT="%Y-%m-%d %T "` line to your `.bashrc` if
you would like to have this feature always enabled.

## Finding files and running some command on each of them

A common use case for pipes is to locate files using `find`, and to pipe the result into another command.
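
When piping `find` output onward, it matters whether the next command reads the file names as text on its standard input or receives them as arguments (which is what `xargs` arranges). A minimal sketch using `wc -l`, assuming a hypothetical scratch directory whose file names and contents are made up for illustration:

```bash
# Hypothetical scratch directory with two small files (made-up names and contents).
workdir=$(mktemp -d)
printf 'one\ntwo\n' > "$workdir/a.txt"
printf 'one\ntwo\nthree\n' > "$workdir/b.txt"

# Piped directly, wc reads the list of file names as text and counts the names.
name_count=$(find "$workdir" -type f | wc -l)

# With xargs, the names become arguments: wc opens each file and counts its lines.
per_file=$(find "$workdir" -type f | xargs wc -l)

echo "number of files: $name_count"
echo "$per_file"
```

The first pipeline reports how many files were found; the second reports the number of lines in each file followed by a total.
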
This command will find all files which contain the word "error" **in their file name**:

```console
$ find calculations/ -type f | grep "error"

calculations/03-error.out
calculations/07-error.out
calculations/04-error.out
```

But what if you want to find all files which contain "error" **inside the file**?
Then you need to do this instead:

```console
$ find calculations/ -type f | xargs grep "error"

calculations/06.out:error: ran out of memory
calculations/08.out:error: ran out of time
```

Further below we have a more advanced example which shows how to make `xargs` use
the output from `find` as arguments to a command when the arguments are not at
the end of the command. `xargs` can also process the results from `find` in
parallel on several processors with `-P`, and much more (see the help text with
`$ man xargs`).

### Finding large files/folders

Pipes are also useful for finding out what takes up space in a folder.

Example showing which folders are largest (add `-a` to `du` to also list individual files):

```console
$ du -h --max-depth=1 SOMEFOLDER | sort -hr
```

Example showing which folders have the most files:

```console
$ find SOMEFOLDER -maxdepth 1 -type d -exec sh -c 'echo -n "{}: "; find "{}" -type f | wc -l' \; | sort -n -k2 -
```

````{admonition} It is good practice to test xargs commands with "echo" first
Before running commands with `xargs`, it is a good idea to test them by
inserting an `echo` to see what the command would do. This way you can
avoid renaming or removing files by accident.

For example, try it with the above command:

```console
$ find calculations/ -type f | xargs echo grep "error"
```

Compare the result with the one without `echo`.
````

````{admonition} Using -exec instead of xargs
Some people prefer to use `-exec` instead of piping to achieve the same goal of
running a command (in this case `grep`) on each result separately:

```console
$ find calculations/ -type f | xargs grep "error"

calculations/06.out:error: ran out of memory
calculations/08.out:error: ran out of time

$ find calculations/ -type f -exec grep "error" {} \;

error: ran out of memory
error: ran out of time
```

Note that with `-exec ... \;`, `grep` runs once per file and therefore does not print the file names.
````

````{admonition} Solving the above problem with a recursive grep
If you want to search through all files in a directory and its subdirectories
you can use the `-r` flag with `grep`:

```console
$ grep -r "error" calculations/

calculations/06.out:error: ran out of memory
calculations/08.out:error: ran out of time
```

This is a good alternative to using `find` and `xargs` in this case.
````

## Exercise

`````{exercise} Exercise: Solving typical tasks with composition
We can do this together but we also recommend that you try this later on your own.

1. Download and extract the exercise folder:
   ```
   cd
   wget https://gitlab.sigma2.no/training/tutorials/unix-for-hpc/-/raw/master/content/episodes/pipes/exercise-pipes.tar.gz
   tar xvf exercise-pipes.tar.gz
   cd exercise-pipes
   ```
1. Find all files under `exercise-pipes` that contain "error" **in their file name**.
1. From file `rgb.txt`, create a file `sorted-red-colors.txt` containing all red colors, **sorted alphabetically**.
1. Find all files that contain "error" in their file name and for each of them
   **print their last line** (the last line of a file can be printed with `tail -n 1`).
1. Find all files that contain "error" **inside the file** (not just in the file name).

````{solution}
1. This step will hopefully produce the exercise folder. Nothing to change here.
1. 
   Finding all files that contain "error" in their file name can be done with a pipe:
   ```console
   $ find calculations/ | grep "error"

   calculations/07-error.out
   calculations/04-error.out
   calculations/03-error.out
   ```
   It can also be done using `find` directly:
   ```console
   $ find calculations/ -name "*error*"

   calculations/07-error.out
   calculations/04-error.out
   calculations/03-error.out
   ```
1. From file `rgb.txt`, create a file `sorted-red-colors.txt` with all red colors, sorted alphabetically:
   ```console
   $ grep "red" rgb.txt | sort > sorted-red-colors.txt
   ```
1. This is one way to find all files that contain "error" in their file name and
   to see the last line of each of them (using `tail -n 1`):
   ```console
   $ find calculations/ | grep "error" | xargs tail -n 1

   ==> calculations/07-error.out <==
   calculation timed out

   ==> calculations/04-error.out <==
   ran out of disk quota

   ==> calculations/03-error.out <==
   ran out of memory
   ```
1. Here we use `grep` on each file with the help of `xargs`:
   ```console
   $ find calculations/ -type f | xargs grep "error"

   calculations/06.out:error: ran out of memory
   calculations/08.out:error: ran out of time
   ```
````
`````

---

## How to safely rename many files at once (advanced)

What if you want to rename many files at once but you don't want to do it
manually file by file (that would be tedious and error-prone)?

For example, we have these files here:

```console
$ find calculations/ -type f

calculations/05.out
calculations/09.out
calculations/01.out
calculations/03-error.out
calculations/02.out
calculations/07-error.out
calculations/06.out
calculations/04-error.out
calculations/08.out
```

But instead of "something-error.out" I would like them to be called "something-problem.out".

There are many ways to do this but one nice command for renaming files is `rename`.
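
Since the `rename` command exists in several variants with different syntax depending on the Linux distribution, it can help to know an alternative that relies only on `mv` and the shell. The following is a minimal sketch under stated assumptions: it works on hypothetical scratch files in a temporary directory (not on the real `calculations/` folder) and uses the Bash parameter expansion `${name%suffix}` to strip the old ending:

```bash
# Hypothetical scratch files that mimic the naming scheme in calculations/.
workdir=$(mktemp -d)
touch "$workdir/03-error.out" "$workdir/04-error.out" "$workdir/05.out"

for old in "$workdir"/*-error.out; do
    # ${old%-error.out} removes the suffix "-error.out" from the end of the name.
    new="${old%-error.out}-problem.out"
    # Print the command first so we can see what is about to happen, then run it.
    echo mv "$old" "$new"
    mv "$old" "$new"
done

ls "$workdir"
```

To turn this into a pure dry run, remove the `mv` line and keep only the `echo`. The rest of this section sticks with `rename`.
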
The following command would rename `calculations/03-error.out` to `calculations/03-problem.out`:

```console
$ rename "error" "problem" calculations/03-error.out
```

Here is one way to couple `find` with `rename`. We added an extra `echo` so that
we can verify what this will do before running the actual command:

```console
$ find calculations/ -type f | grep "error" | xargs -I {} echo rename "error" "problem" {}

rename error problem calculations/03-error.out
rename error problem calculations/07-error.out
rename error problem calculations/04-error.out
```

The above did not run the `rename` commands; it only printed them with `echo`.
Once we are confident that this is what we want, we can run it without the `echo`:

```console
$ find calculations/ -type f | grep "error" | xargs -I {} rename "error" "problem" {}

$ find calculations/ -type f

calculations/05.out
calculations/07-problem.out
calculations/09.out
calculations/04-problem.out
calculations/01.out
calculations/02.out
calculations/03-problem.out
calculations/06.out
calculations/08.out
```

The `-I {}` part defines the placeholder with which we refer to each file name that
`find` gave us. You could use a different placeholder with the same effect:

```console
$ find calculations/ -type f | grep "error" | xargs -I _ rename "error" "problem" _
```

As an exercise, try to rename the files back.

---

```{keypoints}
- The Unix philosophy is that a command should do one thing only, and do it well.
- With perhaps 10-20 Unix commands we can achieve almost everything imaginable by composing commands.
- Unix lets you assemble commands for your use case instead of giving you
  programs that can do "everything" but which might be hard to re-assemble and change.
```