Unix & Linux
linux ssh scp parallelism
Updated Sat, 23 Jul 2022 21:07:31 GMT

How to parallelize a for loop that copies files with scp?


I am running the shell script below on machineA. It copies files from machineB and machineC to machineA. If a file is not on machineB, it should be on machineC.

The script copies the files into the TEST1 and TEST2 directories on machineA.

#!/bin/bash
set -e
readonly TEST1=/data01/test1
readonly TEST2=/data02/test2
readonly SERVER_LOCATION=(machineB machineC)
readonly FILE_LOCATION=/data/snapshot
# The remote command is quoted so that the glob is expanded on the remote host, not on machineA.
dir1=$(ssh -o "StrictHostKeyChecking no" "david@${SERVER_LOCATION[0]}" "ls -dt1 $FILE_LOCATION/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]" | head -n1)
dir2=$(ssh -o "StrictHostKeyChecking no" "david@${SERVER_LOCATION[1]}" "ls -dt1 $FILE_LOCATION/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]" | head -n1)
echo "$dir1"
echo "$dir2"
if [ "$dir1" = "$dir2" ]
then
    rm -rf "$TEST1"/*
    rm -rf "$TEST2"/*
    for el in $test1_partition
    do
        scp "david@${SERVER_LOCATION[0]}:$dir1/pp_monthly_9800_${el}_200003_5.data" "$TEST1"/. || scp "david@${SERVER_LOCATION[1]}:$dir2/pp_monthly_9800_${el}_200003_5.data" "$TEST1"/.
    done
    for sl in $test2_partition
    do
        scp "david@${SERVER_LOCATION[0]}:$dir1/pp_monthly_9800_${sl}_200003_5.data" "$TEST2"/. || scp "david@${SERVER_LOCATION[1]}:$dir2/pp_monthly_9800_${sl}_200003_5.data" "$TEST2"/.
    done
fi

Is there a way to run processes in parallel inside a loop in a bash script?

Currently the script first copies the files from machineB and machineC into the TEST1 directory on machineA, and only when that is done does it copy the files into the TEST2 directory. Is there any way to transfer the files into TEST1 and TEST2 simultaneously?

I am running Ubuntu 12.04.




Solution

In addition to sending each copy to the background with &, use the wait builtin to wait for all background processes to finish before continuing.

for el in $test1_partition
do
    (scp "david@${SERVER_LOCATION[0]}:$dir1/pp_monthly_9800_${el}_200003_5.data" "$TEST1"/. || scp "david@${SERVER_LOCATION[1]}:$dir2/pp_monthly_9800_${el}_200003_5.data" "$TEST1"/.) &
    # Remember the PID of the background subshell.
    WAITPID="$WAITPID $!"
done
for sl in $test2_partition
do
    (scp "david@${SERVER_LOCATION[0]}:$dir1/pp_monthly_9800_${sl}_200003_5.data" "$TEST2"/. || scp "david@${SERVER_LOCATION[1]}:$dir2/pp_monthly_9800_${sl}_200003_5.data" "$TEST2"/.) &
    WAITPID="$WAITPID $!"
done
# Unquoted on purpose: WAITPID holds a space-separated list of PIDs.
wait $WAITPID
echo "All files done copying."
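
One caveat with the loops above: when wait is given several PIDs, its return status is that of the last one waited for, so an earlier failed copy can go unnoticed, and set -e does not react to a failing background job. A minimal sketch of one way to surface every failure to the parent script is to wait on each PID separately. The names run_parallel and fetch_one are hypothetical and not part of the original script; fetch_one is only a stand-in for the scp || scp pair, and the "bad" partition is a made-up failure case for the demo.

```shell
#!/bin/bash
# Sketch: run one background job per item and report every failure to the parent.
# run_parallel and fetch_one are hypothetical names, not part of the original script.

# Stand-in for the real copy. In the script above this would be the
# "scp from machineB || scp from machineC" pair for one partition.
fetch_one() {
    local partition=$1
    # Assumption for the demo: partition "bad" is missing on both servers.
    [ "$partition" != "bad" ]
}

# Usage: run_parallel COMMAND ITEM...
# Starts "COMMAND ITEM" in the background for each item, then waits for each
# PID separately so that no exit status is lost. Returns 1 if any job failed.
run_parallel() {
    local cmd=$1; shift
    local -a pids=() items=()
    local item i failed=0
    for item in "$@"; do
        "$cmd" "$item" &
        pids+=("$!")
        items+=("$item")
    done
    for i in "${!pids[@]}"; do
        if ! wait "${pids[$i]}"; then
            echo "copy failed for: ${items[$i]}" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

With a real fetch_one that runs the two scp commands for one partition, the parent could call run_parallel fetch_one $test1_partition || exit 1, which prints each failed partition to stderr and then stops the script.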




Comments (5)

  • +0 – Thanks for the suggestion. What will happen if there are any errors? Will I be able to identify them if I go with your suggestion? In general, if there is an error, I would like to show it on the console and stop executing the script, which is why I was using set -e. – Feb 16, 2014 at 04:39
  • +0 – Handle the error in the subshell. I.e., whatever you put after the || is what handles the error. – Feb 16, 2014 at 05:31
  • +0 – Suppose that, for whatever reason, the files are not present on either machine. What will happen in that case? It will throw an error, right? – Feb 16, 2014 at 05:34
  • +0 – Yes, but you'll still need to handle the error in the subshell. It won't be surfaced to the parent process. – Feb 16, 2014 at 05:42
  • +0 – Hmm. Can you provide an example of how I would do this here? I would like to clearly signal this error to the parent process. – Feb 16, 2014 at 05:48

