Dataflow design pattern for Multicore-enabled programs

0

Multicore technology is the solution chosen by hardware manufacturer to translate the advances in semiconductor manufacturing into more processing power. For years, software developers used to run their code on the newer processor with no (or few) modifications to have it run faster. After the move of multicore this is not the case. Software has to be written in a multithreaded or multiprocess manner to take full advantage of the hardware. Software developers will be forced to develop massively multithreaded programs as a way to better use such processors. Multicore CPUs will make parallelism explicit. After more than 40 years of research in the field of parallelising compilers, the results are still limited.

This shift in software development raises the following challenges:

1-Scalability: how to keep each additional processor busy.

2-Correctness:how to avoid race conditions, deadlocks and other bugs characteristic of multi-processor applications.

3-Ease of programming: How to hide the concurrency control code from the software developer and let him focus on business logic.

4-Testing: How to test such applications while reproducing a bug is not always possible.

Using the low-level imperative ways to handle the concurrency is too low-level to build reliable software. In addition, there is no code reuse. We need to search for design patterns and to build frameworks to deal with concurrency control while keeping the software developer away from this and let him focus on his main job: the business logic.

One of the proposed pattern to handle this problem is the Dataflow pattern. Dataflow is a data-centric architecture that is focused on the flow of data through components of a system. It is a network of concurrently executing processes that can communicate by sending data over channels. It is defined as a directed, acyclic graphs where the shared nothing) executing processes (workers) are the nodes and the dataflow channels connecting them are the edges.

In the Dataflow pattern you first decompose your system into independent operators (workers). These operators will work concurrently. Each operator has a number of input channels and a number of output channels. Two operators are considered parallel if non of their i/o channels are connected. If one (or more) output channel of an operator is connected to one (or more) input channels of another operator then these operators are considered to be pipelined. Both of partitioning and pipelining helps in performance increasing.

Data flow is not the only pattern that may help in solving the architectural problem of multicore enabled programs. You can find many of them here.

Now, let us go with a simple example (extracted from Pervasive Datarush sample code) to show how data flow pattern can help in code re-usability and handling concurrency issues for you. The problem is to read two text input files each of them contains an ordered list of records and join them using "left join" operation then write the results to an output file. These files have the following data:

 

UnitPriceSorted.txt

ID,CHANNEL,PRICE
P001,phone,1.00
P001,retail,1.20
P001,wholesale,0.80
P002,retail,0.95
P004,wholesale,1.00
P005,wholesale,7.25

 

UnitSalesSorted.txt

ID,CHANNEL,SOLD
P001,phone,123
P001,retail,279
P001,wholesale,400
P002,retail,201
P003,phone,52
P004,retail,13
P004,wholesale,200

 

Output.txt

ID,CHANNEL,PRICE,SOLD
P001,phone,1.00,123
P001,retail,1.20,279
P001,wholesale,0.80,400
P002,retail,0.95,201
P003,phone,,52
P004,retail,,13
P004,wholesale,1.00,200
P005,wholesale,7.25,

You can download the solution in 3 different ways: 1-Sequential, 2-Using BlockingQueue, 3- Using Pervasive Datarush (download Datarush and find the join sample).

After running these codes on an Intel core duo 2GHz, 2GB RAM , 1 GB JVM heap with 100,000 record per input file, the results are:

Sequential: 1800ms

BlockingQueue: 1260 ms

Datarush 1350: ms

It gives us about 25% speedup. You should note that using 2 parallel processors will not save you half of the running time. This is due to the concurrency control overhead and the inherent sequential nature in your problem.

The analysis of the performance gained by adding more parallel processors is not our focus here. I have listed the numbers just to say we have a good-enough performance gain by utilizing the 2 cores. Our main focus here is the ease of programming.

After reviewing the given code you can notice that we started by decomposing our program into 4 operators as shown in the following graph:

JoinSample

Note that the two operators of ReadDelimitedText are parallel to each other, and they are pipelined to JoinSortedRows which is pipelined to writeDilimitedText. Also note that this program cannot utilize more than 4 cores. If we need to utilize more than 4 cores we can go inside each operator and find out how to break it into a smaller independent operators. This is great! It enables us to divide and concur and design our parallel programs in a top-down methodology. This process ends when we faced with a pure sequential module. Some problems are sequential by nature and cannot be parallelized. For more information about this you can start reading in parallel algorithms and parallel complexity classes.

After the decomposition step, the dataflow step comes. In this step we wire each of the output channel of each operator with one of the input channels of another operator. You can see how this simple pattern helps you to focus on the logic and makes the concurrency control transparent. To appreciate this you can try to write this program without Blocking queues or a framework like datarush. This will make you deal with a number of wait/notify or await/signal statement that are spread here and there in your code which makes the code unreadable, unmaintainable and unreusable.

 

 

 

Post a Comment

eSpace podcast Prodcast

RSS iTunes