Consider the following scenario. I have two programs A and B. Program A writes lines of text to stdout, while program B processes lines from stdin. The way to use these two programs is of course:
foo@bar:~$ A | B
Now I've noticed that this eats up only one core; hence I am wondering:
Are programs A and B sharing the same computational resources? If so, is there a way to run A and B concurrently?
Another thing that I've noticed is that A runs much, much faster than B, so I am wondering if I could somehow run several B processes and let them process the lines that A outputs in parallel.
That is, A would output its lines, and there would be N instances of program B that would read these lines (whichever reads them first), process them, and write them to stdout.
So my final question is:
Is there a way to pipe the output of A among several B processes without having to take care of race conditions and other inconsistencies that could potentially arise?
A problem with split --filter is that the output can be mixed up, so you get half a line from process 1 followed by half a line from process 2.
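For reference, a split --filter invocation along these lines (the round-robin mode and the worker count of 4 are only illustrative, not taken from the answer) fans A's output out to several B processes, but all of them inherit the same stdout, which is where the mixing comes from:
# illustrative sketch: round-robin A's lines across 4 B processes;
# each B writes to the shared stdout, so their writes can interleave mid-line
A | split -n r/4 --filter='B'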
GNU Parallel guarantees there will be no mixup.
So assume you want to do:
A | B | C
But that B is terribly slow, and thus you want to parallelize it. Then you can do:
A | parallel --pipe B | C
GNU Parallel by default splits on \n and uses a block size of 1 MB; these can be adjusted with --recend and --block.
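For instance (the numbers here are purely illustrative), to run 8 instances of B and hand each one blocks of roughly 10 MB, still split at newlines:
# illustrative values: 8 B processes, ~10 MB blocks, records separated by newlines
A | parallel --pipe -j 8 --block 10M B | C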
You can find more about GNU Parallel at: http://www.gnu.org/s/parallel/
You can install GNU Parallel in just 10 seconds with:
$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
12345678 883c667e 01eed62f 975ad28b 6d50e22a
$ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
cc21b4c9 43fd03e9 3ae1ae49 e28573c0
$ sha512sum install.sh | grep da012ec113b49a54e705f86d51e784ebced224fdf
79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
$ bash install.sh
Watch the intro video on http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
--block-size will depend on the amount of RAM and how fast you can start a new B. In your situation I would use --block 100M and see how that performed. — Jun 15, 2013 at 13:11
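As a concrete starting point for that suggestion (100M is only a first guess, to be tuned against available RAM and B's startup time):
# the 100M value is just a starting point to benchmark and adjust
A | parallel --pipe --block 100M B | C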
sh - is great. The problem lies in passing it to sh: downloading and running executable code from a site. Mind you, maybe I'm just being too paranoid, since one could object that a custom-made RPM or DEB is basically the same thing, and even posting the code on a page to be copied and pasted would result in people doing so blindly anyway. — Jun 15, 2013 at 13:43