You a fan of xargs? Find yourself constantly using the non-POSIX -P flag that a
lot of implementations support? Check out GNU parallel.
If that quick paragraph made no sense to you, I'd recommend reading about xargs
a little before reading this blog post. To sum up, let's say you have a
bunch of files you need to run a process on. Most programs nowadays will let you
feed all of those into them, but back in the day programs wouldn't let you do
this. They'd only take one as input. So what is one to do? xargs.
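To make that concrete, here's a minimal sketch of xargs feeding a program one
input at a time. The file names are made up, and echo stands in for whatever
single-input program you actually need to run.

```shell
# Hypothetical file names; echo stands in for a program that only
# accepts a single input file.
# -n 1 tells xargs to run the command once per argument.
one_per_call=$(printf '%s\n' a.txt b.txt c.txt | xargs -n 1 echo processing)
printf '%s\n' "$one_per_call"
```

Each input line ends up in its own call, run one after another.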
If you, like me, always call xargs with -P, you'll love parallel. Let's
take a look at a snippet to get the gist of what I mean:

    # Convert all PNG files to WEBP
    find ./ -type f -name '*.png' | sed 's/\.png$//' | \
        xargs -I {} -P $( nproc ) magick {}.png WEBP:{}.webp

Here's equivalent code in parallel:

    # Convert all PNG files to WEBP (using parallel)
    find ./ -type f -name '*.png' | \
        parallel magick {} WEBP:{.}.webp

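Not only are we not calling sed anymore, we've also reduced the number of
flags we need for our parallelization program (parallel defaults to one job per
CPU core, so there's no counterpart to -P $( nproc )). parallel has a very
powerful input manipulation system, which we'll dive into later. A notable
change from xargs is that parallel runs one command per line of input by
default, where xargs packs as many arguments as it can into each command unless
told otherwise.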
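A quick way to see the one-command-per-line behavior is to compare what each
tool would run for the same input. This is only a sketch: the file names are
hypothetical, echo stands in for the real command on the xargs side, and
--dry-run makes parallel print its commands instead of running them.

```shell
# Hypothetical file names, used only for illustration.
files='./a.png
./dir/b.png'

# xargs without -I or -n packs the arguments into a single command.
xargs_cmds=$(printf '%s\n' "$files" | xargs echo magick)

# parallel builds one command per input line.
# --dry-run prints the commands instead of running them; -k keeps input order.
parallel_cmds=$(printf '%s\n' "$files" |
    parallel -k --dry-run magick {} WEBP:{.}.webp)

printf 'xargs:\n%s\n\nparallel:\n%s\n' "$xargs_cmds" "$parallel_cmds"
```

--dry-run is also handy for checking a real pipeline before letting it loose on
your files.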
One thing I would like to note about parallel is that the man page seems
almost designed to confuse. It certainly gives off the vibe that it's written
from the program designer's perspective, not necessarily from a user's. I
suppose it sticks with *nix tradition then.
parallel is very well suited for batch jobs like the conversion above. For
example, here's a script I wrote (abbreviated):

    find ./ -type f -name '*.mp4' | \
        parallel \
            'check_file="$( ffprobe -do-the-thing )"
            if [[ "$check_file" != "" ]]; then
                if [[ "$check_file" == 1 ]]; then
                    echo {}
                fi
            fi' | parallel \
            'ffmpeg -do-some-processing -i {} {.}.mkv'

You'll notice I called parallel twice: first as a filter, to check whether I
need to do the processing at all, and then again to run the processing itself.
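The filter-then-process shape is easier to see with the ffprobe and ffmpeg
details stripped out. In this sketch, numbers stand in for file names and an
even/odd check stands in for the real test; both are placeholders, not part of
the script above.

```shell
# Stage 1 (filter): echo the input only when it passes the check.
# Stage 2 (process): runs only on what stage 1 printed.
# -k keeps output in input order so the result is predictable.
processed=$(seq 1 6 |
    parallel -k 'if [ $(( {} % 2 )) -eq 0 ]; then echo {}; fi' |
    parallel -k echo "processing {}")
printf '%s\n' "$processed"
```

Anything the first stage doesn't print never reaches the second stage, which is
all a filter needs to do.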
Before we get into the funky brace expansions, let's check out another really
killer feature of parallel, --sshlogin. That's right, you can run processes
on other computers with parallel! Let's say you have one computer you take
with you everywhere, and you like downloading lots of YouTube videos for
"backup" (that you'll never ever watch). Rather than tying up that computer,
you can hand the downloads off to another machine:

    cat videoList-urls.txt | parallel --sshlogin downloadMachine yt-dlp

That's it. parallel will invoke yt-dlp on downloadMachine using URLs from
videoList-urls.txt linewise. Here's a breakdown of what it does. Say the file
looks like this:

    $ cat videoList-urls.txt
    https://www.youtube.com/watch?v=M0s4puDX914
    https://www.youtube.com/watch?v=fc4HH4xYKns
    https://www.youtube.com/watch?v=TquNwzLB8Yw

When that's used as input, parallel will execute the following statements on
downloadMachine:

    yt-dlp https://www.youtube.com/watch?v=M0s4puDX914
    yt-dlp https://www.youtube.com/watch?v=fc4HH4xYKns
    yt-dlp https://www.youtube.com/watch?v=TquNwzLB8Yw

Neat, eh? As in the earlier examples, you can also pass arguments to your
command:

    cat videoList-urls.txt | parallel --sshlogin downloadMachine \
        yt-dlp --write-thumbnail --write-info-json --write-subtitles \
        --write-comments --sponsorblock-mark all \
        --sponsorblock-remove sponsor,intro

This will execute yt-dlp with all those fancy fun arguments on downloadMachine.
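If you also want the machine you're sitting at to pitch in, the man page
documents a special sshlogin, a bare colon, that means "run on this computer
without ssh". Here's a hedged sketch: echo stands in for yt-dlp so nothing is
actually downloaded, and the runnable line uses only the colon so it works
without a remote host. Adding downloadMachine to the list assumes you have
working ssh access to it.

```shell
# ':' is parallel's special sshlogin for "this machine, no ssh".
# To spread jobs across both machines you would list them together,
# e.g. --sshlogin downloadMachine,: (needs working ssh access).
hosts=':'

# echo stands in for yt-dlp so nothing is downloaded.
planned=$(printf '%s\n' \
    'https://www.youtube.com/watch?v=M0s4puDX914' \
    'https://www.youtube.com/watch?v=fc4HH4xYKns' |
    parallel -k --sshlogin "$hosts" echo yt-dlp {})
printf '%s\n' "$planned"
```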
The brace expansions (the man page calls them replacement strings) are another
one of those killer features that elevate parallel. Below I list the ones I
find useful; the man page explores these in a helpful way too.
- {} - This is the default expansion. Whatever was given as input will be reflected in output.
- {/} - This expands to the basename of input. Effectively, it's as if you ran basename on the input. Ex: /path/to/file/video.mp4 -> video.mp4
- {.} - This expands to the input filename minus the file extension. Ex: /path/to/file/video.mp4 -> /path/to/file/video
- {//} - This expands to the input filename's directory. Ex: /path/to/file/video.mp4 -> /path/to/file
- {=perl expression=} - Yes. You can run perl. Go check the manpage, this shit is awesome.
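To see the expansions side by side, you can echo them all for one input. This
is a small sketch using the example path from the list; the labels (full, base,
noext, dir) and the .mp4 to .mkv rename are just illustrations.

```shell
# The example path from the list above; it does not need to exist,
# because parallel only manipulates the string.
path='/path/to/file/video.mp4'

expanded=$(printf '%s\n' "$path" |
    parallel echo 'full={} base={/} noext={.} dir={//}')

# {= =} runs Perl on the input held in $_; here it swaps the extension.
renamed=$(printf '%s\n' "$path" |
    parallel echo '{= s/\.mp4$/.mkv/ =}')

printf '%s\n%s\n' "$expanded" "$renamed"
```

The Perl form is the escape hatch for anything the built-in expansions don't
cover.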
parallel has some other cool things too, namely --bar and --eta. Ever
wanted a readout from xargs that'll tell you roughly how much longer it'll
take? Yep. parallel does that. Personally I've found the estimates to be not
excellent, but having a readout on how many input lines it's completed and how
many more are left to do is reason enough to use it.
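Here's a small sketch for trying the progress bar without committing to a real
workload. The sleep is a stand-in for actual work, and the job count is
arbitrary.

```shell
# sleep stands in for real work so the bar has time to move.
# The progress bar is drawn on stderr; job output still goes to stdout,
# so it can be captured or piped as usual. -k keeps input order.
finished=$(seq 1 5 | parallel -k --bar 'sleep 0.2; echo {}')
printf '%s\n' "$finished"
```

Swapping --bar for --eta gives the textual estimate instead of the bar.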
There's a lot more cool shit, but that's about as far as I've gone personally. Please email me if you know some more cool shit.