I already posted this question on SO but got no answer, so I am trying it here:
In a number-crunching program, I have a function in three dimensions whose value is either 1 or 0. I do not know the function in advance, but I need to know the total "surface" over which it is equal to zero. As a similar problem, imagine drawing a rectangle over a 2D map of the United Kingdom: the function is equal to 0 at sea and 1 on land, and I need to know the total water surface. I wonder what the best parallel algorithm or method for doing this is.
My first thought was the following approach: a) divide the 2D map area into a rectangular grid and, for the point at the center of each cell, check whether it is land or water. This can be done in parallel, and at the end of the procedure I will have a matrix of ones and zeroes, which gives the area with some precision. To increase that precision, b) choose the cells lying on the border between zeroes and ones (what is the best criterion for doing this?), subdivide them into smaller cells, and repeat the process until the desired accuracy is reached. I guess the critical parameters in this process are the grid size at each new stage and how to store and check the cells belonging to the border region. Finally, the optimal method, from a computational point of view, is the one that performs the minimal number of checks needed to obtain the total surface with the desired accuracy.
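As a rough illustration of the a)/b) scheme described above, here is a sequential sketch (the per-cell checks are independent, so they parallelize trivially). The `indicator` function is a hypothetical stand-in for the unknown 0/1 function, and the border criterion used here (corner and center samples disagreeing) is just one possible choice:

```python
# Sketch of the adaptive-refinement idea, assuming a hypothetical
# indicator(x, y) that returns 1 on "land" and 0 on "water".
# Here: a unit disk of land in a sea covering the rest of [-2, 2]^2.

def indicator(x, y):
    return 1 if x * x + y * y <= 1.0 else 0

def water_area(x0, y0, size, depth, max_depth):
    """Estimate the area where indicator == 0 inside one square cell.

    A cell counts as a "border" cell when its corner and center samples
    disagree; only border cells are subdivided, so the work concentrates
    along the coastline.  (Islands smaller than a cell can be missed.)
    """
    samples = [indicator(x0 + dx * size, y0 + dy * size)
               for dx in (0, 1) for dy in (0, 1)]
    samples.append(indicator(x0 + size / 2, y0 + size / 2))
    if depth == max_depth or len(set(samples)) == 1:
        # Uniform cell (or maximum depth reached): use the center value.
        return size * size if samples[-1] == 0 else 0.0
    half = size / 2
    return sum(water_area(x0 + dx * half, y0 + dy * half,
                          half, depth + 1, max_depth)
               for dx in (0, 1) for dy in (0, 1))

estimate = water_area(-2.0, -2.0, 4.0, 0, 8)
```

For this test function the exact water area is 16 - pi, and the estimate converges to it as `max_depth` grows, while only cells near the coastline are ever subdivided.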
You can partition your domain for subsequent parallel processing in several ways. One family of options is static geometric decomposition, which includes recursive bisection, quad- or oct-trees, and space-filling curves.
Recursive bisection techniques partition a domain (e.g., a finite element grid) into subdomains of approximately equal computational cost while attempting to minimize communication cost, that is, the number of channels crossing task boundaries. A divide-and-conquer approach is taken: the domain is first cut along one dimension to yield two subdomains, and cuts are then made recursively in the new subdomains until there are as many subdomains as required tasks. Notice that this recursive strategy allows the partitioning algorithm itself to be executed in parallel. Advanced variants include recursive graph bisection and recursive spectral bisection.
Here is an example. In the following picture, assume that magenta points require twice as much work as cyan ones.
Then you continue applying recursive bisection, cycling through the axes, until the number of pieces equals the number of processors, obtaining something like the following:
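A minimal sketch of recursive coordinate bisection over weighted points, where the weight models the per-point cost (e.g., 2 for magenta versus 1 for cyan). The function name and point format are illustrative, and for simplicity it works best when the number of parts is a power of two:

```python
# Recursive coordinate bisection sketch: points are (x, y, weight)
# tuples; each cut is made at the weighted median, perpendicular to
# the current axis, and the axes are cycled between levels.

def bisect(points, n_parts, axis=0):
    """Split weighted points into n_parts of ~equal total weight."""
    if n_parts == 1:
        return [points]
    pts = sorted(points, key=lambda p: p[axis])
    total = sum(p[2] for p in pts)
    acc, cut = 0.0, len(pts)
    for i, p in enumerate(pts):
        acc += p[2]
        if acc >= total / 2:
            cut = i + 1
            break
    left, right = pts[:cut], pts[cut:]
    # Recurse on each half, cutting along the other axis next.
    half = n_parts // 2
    return (bisect(left, half, 1 - axis)
            + bisect(right, n_parts - half, 1 - axis))
```

For example, `bisect(points, 4)` first cuts the cloud vertically at the weighted median, then cuts each half horizontally, yielding four pieces of roughly equal total work. Each recursive call is independent, so the partitioning itself can run in parallel, as noted above.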
The idea behind quad-trees is the following: partition all of space into quadrants (octants in 3-space); if any piece contains too much work, partition it into quadrants in turn, and so on. See the next picture for an example.
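A sketch of that rule, assuming a hypothetical cost field `work_at(x0, y0, size)` giving the work inside a cell (here, a uniform background plus a hotspot in the unit square at the origin). The leaves of the tree become the tasks to distribute:

```python
def quadtree(x0, y0, size, work_at, max_work, leaves):
    """Split a square cell into quadrants while it holds too much work."""
    if work_at(x0, y0, size) <= max_work or size <= 1:
        leaves.append((x0, y0, size))   # this cell becomes one task
        return
    half = size / 2
    for dx in (0, 1):
        for dy in (0, 1):
            quadtree(x0 + dx * half, y0 + dy * half, half,
                     work_at, max_work, leaves)

def work_at(x0, y0, size):
    # Uniform background work (1 per unit area) plus a 100x hotspot
    # concentrated in the unit square [0, 1) x [0, 1).
    ox = max(0.0, min(x0 + size, 1.0) - max(x0, 0.0))
    oy = max(0.0, min(y0 + size, 1.0) - max(y0, 0.0))
    return size * size + 100.0 * ox * oy

leaves = []
quadtree(0.0, 0.0, 8.0, work_at, max_work=4.0, leaves=leaves)
# Leaves end up small near the hotspot and large in the quiet regions.
```

The result is exactly the picture described above: coarse cells where work is sparse, finely subdivided cells where it is clustered.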
The best general-purpose geometric load balancing comes from space-filling curves. The order in which points are visited along the space-filling curve determines how the geometric objects are grouped together for assignment to the processors. The following image shows a Hilbert space-filling curve.
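For illustration, here is the classic bit-manipulation conversion from grid coordinates to a position along the Hilbert curve (the well-known algorithm from the literature). Sorting cells by this index and handing each processor a contiguous chunk keeps spatially nearby cells on the same processor:

```python
def xy2d(n, x, y):
    """Index of grid point (x, y) along the Hilbert curve on an
    n x n grid, where n is a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:            # rotate the quadrant so recursion lines up
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d

# Visit order for an 8 x 8 grid of cells; consecutive cells in this
# order are always grid neighbours, so contiguous chunks stay compact.
cells = sorted(((x, y) for x in range(8) for y in range(8)),
               key=lambda c: xy2d(8, *c))
```

Splitting `cells` into equal-length (or equal-weight) chunks then gives each processor a compact, connected patch of the domain.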
Your problem may also be amenable to a solution using what is known as a scattered decomposition. It is used when there is a structured domain space (e.g., an image) and the processing requirements are clustered, such as modeling a crash or processing an image with only a few items of interest. Suppose there are P processors: cover the problem domain with non-overlapping copies of a grid of P cells and assign each processor one cell in each of the copies. Using more, smaller pieces spreads a clustered load more evenly, but at a cost in overhead:
Finding a good tradeoff may require experimentation: running the application and measuring how the parallel time is affected by different scattering strategies.
Here is an example. The first picture shows an image that must be processed: say it contains sea water and an island, and you want to process the vegetation on the island. The image is partitioned using a very simple domain decomposition technique. The second picture shows an example of scattered decomposition.
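A sketch of the cell-to-processor assignment behind a scattered decomposition (the grid shape and the `owner` function are illustrative). Because the domain is tiled with copies of the processor grid, even a clustered hotspot such as the island is spread across all processors:

```python
def owner(i, j, p_rows, p_cols):
    """Processor owning cell (i, j) when the domain is tiled with
    non-overlapping copies of a p_rows x p_cols processor grid."""
    return (i % p_rows) * p_cols + (j % p_cols)

# 8x8 image, 4 processors arranged 2x2: each processor gets one cell
# from every 2x2 tile, so the total load is split evenly ...
counts = [0] * 4
for i in range(8):
    for j in range(8):
        counts[owner(i, j, 2, 2)] += 1

# ... and so is a clustered 4x4 "island" in one corner of the image.
island = [0] * 4
for i in range(4):
    for j in range(4):
        island[owner(i, j, 2, 2)] += 1
```

With a plain block decomposition, by contrast, the island would land entirely on one or two processors while the others sit idle.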
Finally, you can use an advanced technique known as adaptive mesh refinement (AMR): grids are broken into blocks of fixed extents, and when needed a block is refined into children with the same extents. Block sizes are chosen to balance cache efficiency (an optimization possible because loops operate on fixed-size arrays), communication, and load balance. Often balancing the number of blocks per processor suffices; if communication is excessive, use a space-filling curve. Rebalancing requires only simple collective communication operations to decide where blocks go. The technique works in arbitrary dimensions, and if the block size is 1, adaptive blocks reduce to quad- or oct-trees. See the next picture for an example.
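A sketch of block-based refinement, using a unit-circle "coastline" as an illustrative refinement criterion. In a real AMR code every block, whatever its physical size, carries the same fixed-size array of cells (which is what makes the inner loops cache-friendly); here only the block geometry is tracked:

```python
import math

def crosses_circle(x0, y0, size, r=1.0):
    """True when the block [x0, x0+size] x [y0, y0+size] straddles the
    circle of radius r about the origin (our stand-in coastline)."""
    cx = min(max(0.0, x0), x0 + size)   # point of the block nearest origin
    cy = min(max(0.0, y0), y0 + size)
    near = math.hypot(cx, cy)
    far = math.hypot(max(abs(x0), abs(x0 + size)),
                     max(abs(y0), abs(y0 + size)))
    return near <= r <= far

def refine(blocks, needs_refinement):
    """Replace each flagged block with four children of half the extent;
    each child would hold the same fixed-size cell array as its parent."""
    out = []
    for (x0, y0, size) in blocks:
        if needs_refinement(x0, y0, size):
            half = size / 2
            out += [(x0 + dx * half, y0 + dy * half, half)
                    for dx in (0, 1) for dy in (0, 1)]
        else:
            out.append((x0, y0, size))
    return out

blocks = [(-2.0, -2.0, 4.0)]        # one coarse block covering the domain
for _ in range(3):                  # three refinement passes
    blocks = refine(blocks, crosses_circle)
# Small blocks now cluster along the coastline; balancing the number of
# blocks per processor (or ordering them along a space-filling curve)
# then distributes the work.
```

Note how well this matches the original question: refinement happens exactly along the land/water border, which is where the extra accuracy is needed.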
Hope this helps.