Dataflow design pattern for Multicore-enabled programs
0
Multicore technology is the solution chosen by hardware manufacturer to translate the advances in semiconductor manufacturing into more processing power. For years, software developers used to run their code on the newer processor with no (or few) modifications to have it run faster. After the move of multicore this is not the case. Software has to be written in a multithreaded or multiprocess manner to take full advantage of the hardware. Software developers will be forced to develop massively multithreaded programs as a way to better use such processors. Multicore CPUs will make parallelism explicit. After more than 40 years of research in the field of parallelising compilers, the results are still limited.
This shift in software development raises the following challenges:
1-Scalability: how to keep each additional processor busy.
2-Correctness:how to avoid race conditions, deadlocks and other bugs characteristic of multi-processor applications.
3-Ease of programming: How to hide the concurrency control code from the software developer and let him focus on business logic.
4-Testing: How to test such applications while reproducing a bug is not always possible.
Using the low-level imperative ways to handle the concurrency is too low-level to build reliable software. In addition, there is no code reuse. We need to search for design patterns and to build frameworks to deal with concurrency control while keeping the software developer away from this and let him focus on his main job: the business logic.
One of the proposed pattern to handle this problem is the Dataflow pattern. Dataflow is a data-centric architecture that is focused on the flow of data through components of a system. It is a network of concurrently executing processes that can communicate by sending data over channels. It is defined as a directed, acyclic graphs where the shared nothing) executing processes (workers) are the nodes and the dataflow channels connecting them are the edges.
In the Dataflow pattern you first decompose your system into independent operators (workers). These operators will work concurrently. Each operator has a number of input channels and a number of output channels. Two operators are considered parallel if non of their i/o channels are connected. If one (or more) output channel of an operator is connected to one (or more) input channels of another operator then these operators are considered to be pipelined. Both of partitioning and pipelining helps in performance increasing.
Data flow is not the only pattern that may help in solving the architectural problem of multicore enabled programs. You can find many of them here.
Now, let us go with a simple example (extracted from Pervasive Datarush sample code) to show how data flow pattern can help in code re-usability and handling concurrency issues for you.
| UnitPriceSorted.txt ID,CHANNEL,PRICE P001,phone,1.00 P001,retail,1.20 P001,wholesale,0.80 P002,retail,0.95 P004,wholesale,1.00 P005,wholesale,7.25 |
UnitSalesSorted.txt |
Output.txt |
You can download the solution in 3 different ways: 1-Sequential, 2-Using BlockingQueue, 3- Using Pervasive Datarush (download Datarush and find the join sample).
After running these codes on an Intel core duo 2GHz, 2GB RAM , 1 GB JVM heap with 100,000 record per input file, the results are:
Sequential: 1800ms
BlockingQueue: 1260 ms
Datarush 1350: ms
It gives us about 25% speedup. You should note that using 2 parallel processors will not save you half of the running time. This is due to the concurrency control overhead and the inherent sequential nature in your problem.
The analysis of the performance gained by adding more parallel processors is not our focus here. I have listed the numbers just to say we have a good-enough performance gain by utilizing the 2 cores. Our main focus here is the ease of programming.
After reviewing the given code you can notice that we started by decomposing our program into 4 operators as shown in the following graph:

Note that the two operators of ReadDelimitedText are parallel to each other, and they are pipelined to JoinSortedRows which is pipelined to writeDilimitedText. Also note that this program cannot utilize more than 4 cores. If we need to utilize more than 4 cores we can go inside each operator and find out how to break it into a smaller independent operators. This is great! It enables us to divide and concur and design our parallel programs in a top-down methodology. This process ends when we faced with a pure sequential module. Some problems are sequential by nature and cannot be parallelized. For more information about this you can start reading in parallel algorithms and parallel complexity classes.
After the decomposition step, the dataflow step comes. In this step we wire each of the output channel of each operator with one of the input channels of another operator. You can see how this simple pattern helps you to focus on the logic and makes the concurrency control transparent. To appreciate this you can try to write this program without Blocking queues or a framework like datarush. This will make you deal with a number of wait/notify or await/signal statement that are spread here and there in your code which makes the code unreadable, unmaintainable and unreusable.
Post a Comment
eSpace podcast Prodcast
Archive
Latest Comments
- SpectraMind Commented on Egypt Wins UK's National Outsourcing Association Award
- Rofaida Awad Commented on Go Egypt Go!
- Different Mike Commented on Only idiots change their iPhone root password!
- Mike Commented on Only idiots change their iPhone root password!
- smile Commented on Only idiots change their iPhone root password!

