Sunday, September 14, 2014

Step Partitioning



Time needed to complete: ~75 minutes
Prerequisite for this tutorial: Configure Spring Batch Admin to use MySQL

1. Introduction

When a single-threaded batch job can't finish its work within the given time window, and tuning the JVM didn't help, it is time to scale the batch job with multithreading. In Spring Batch, partitioning is one way to scale a batch job and improve its performance.
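To make the idea concrete, here is a minimal, framework-free sketch of what partitioning means. It assumes the input can be split into independent slices. The input is divided into gridSize slices, and each slice is processed by its own worker thread. It uses only java.util.concurrent, not Spring Batch. The class name PartitionedSum is hypothetical, and summing numbers stands in for real read/process/write work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of partitioning: split the input into gridSize
// independent slices and let each slice be processed by its own thread.
class PartitionedSum {

    static long process(List<Integer> items, int gridSize) {
        ExecutorService pool = Executors.newFixedThreadPool(gridSize);
        try {
            int sliceSize = (items.size() + gridSize - 1) / gridSize; // ceiling division
            List<Future<Long>> results = new ArrayList<>();
            for (int p = 0; p < gridSize; p++) {
                int from = Math.min(p * sliceSize, items.size());
                int to = Math.min(from + sliceSize, items.size());
                List<Integer> slice = items.subList(from, to);
                // Each worker touches only its own slice, so no mutable state is shared.
                results.add(pool.submit(() -> {
                    long sum = 0;
                    for (int value : slice) {
                        sum += value; // stand-in for read/process/write of one item
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // waits for that partition to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Spring Batch expresses the same idea declaratively. A Partitioner describes the slices, and a partition handler runs the worker step once per slice, up to grid-size slices.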

2. What is used in this tutorial:


1- Maven 3
2- JDK 1.7
3- Tomcat 7.0.55
4- Eclipse Luna 4.4
5- Spring Core 3.2.9
6- Spring Batch 2.2.7
7- Spring Batch Admin Manager 1.3.0
8- MySQL 5 Database


3. Project Structure



4. Configuration and Other Files

- The configuration of the database that the data will be written to is in src/main/resources/META-INF/spring/batch/dbConfig/database.xml. This is not the same database that Spring Batch Admin uses to store its job repository data. First create a database named company_db. Then, in database.xml, change db_username and db_password to match the company_db username and password.




- The job definition, which reads from CSV files, processes the records and writes them to a database table, is in src/main/resources/META-INF/spring/batch/jobs/job-order-multithreading.xml (Listing 4.2). The job has two steps. The first step, organizeFilesStep, creates a tmp folder inside the mt_data folder. The tmp folder contains one subfolder per partition, named by partition number (1, 2, 3, ...). The step then copies a share of the CSV files into each subfolder (see DistributeFiles.class). Picture 4.1.
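The listing for DistributeFiles.class is not reproduced here, so the following is only a sketch of what such an "organize files" step could do. It assumes the CSV files are handed out round-robin. The class DistributeFilesSketch and its method are hypothetical. The sketch copies every *.csv file in the data folder into tmp/1 ... tmp/N using java.nio.file.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of an "organize files" step: copy the CSV files of dataDir
// round-robin into dataDir/tmp/1 .. dataDir/tmp/<partitions>.
class DistributeFilesSketch {

    static void distribute(Path dataDir, int partitions) throws IOException {
        List<Path> csvFiles = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dataDir, "*.csv")) {
            for (Path file : stream) {
                csvFiles.add(file);
            }
        }
        Collections.sort(csvFiles); // deterministic assignment of files to partitions

        Path tmp = dataDir.resolve("tmp");
        for (int p = 1; p <= partitions; p++) {
            Files.createDirectories(tmp.resolve(String.valueOf(p))); // tmp/1, tmp/2, ...
        }
        for (int i = 0; i < csvFiles.size(); i++) {
            Path source = csvFiles.get(i);
            String folder = String.valueOf(i % partitions + 1); // round-robin: 1, 2, 3, 1, ...
            Path target = tmp.resolve(folder).resolve(source.getFileName());
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Under this round-robin assumption, the 100 input files would be split 34/33/33 across three partition folders.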





- The second step is step1.boss (Picture 4.2). In this step three threads are created (grid-size = 3). Thread 1 reads CSV files only from the tmp/1 folder, thread 2 only from tmp/2, and thread 3 only from tmp/3. Each thread has its own reader, processor and writer. The reader and writer beans have scope="prototype", which makes this a good place to learn what happens when scope="prototype" is removed. Try it and watch the data race on lineNumber in the reader. Spring beans are singletons by default, so without prototype scope there is one reader instance shared by three threads. All three read and update the next lineNumber to be read, which causes incorrect behavior. Another important parameter is commit-interval, whose value has a large impact on performance. A small value such as 3 degrades performance, because each chunk of commit-interval items is written and committed in its own transaction.
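The data race described above can be reproduced without Spring. In this minimal sketch, LineReader is a hypothetical stand-in for a stateful reader that keeps lineNumber in an unsynchronized field; it is not Spring Batch's reader. Passing three separate instances to readConcurrently mimics prototype scope (one reader per thread). Passing the same instance three times mimics a singleton shared by three threads.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a stateful item reader: like a file reader, it keeps
// the number of the next line to read in a plain (unsynchronized) field.
class LineReader {
    private final List<String> lines;
    private int lineNumber = 0; // mutable state: unsafe to share between threads

    LineReader(List<String> lines) {
        this.lines = lines;
    }

    String read() {
        int n = lineNumber;      // 1. read the counter
        if (n >= lines.size()) {
            return null;         // end of input
        }
        lineNumber = n + 1;      // 2. write it back: not atomic with step 1
        return lines.get(n);
    }
}

class ReaderScopeDemo {

    // Starts one thread per list entry and returns every line the threads read.
    static List<String> readConcurrently(List<LineReader> readers) {
        List<String> out = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (LineReader reader : readers) {
            Thread t = new Thread(() -> {
                String line;
                while ((line = reader.read()) != null) {
                    out.add(line);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return out;
    }
}
```

With three separate readers, every line of every input is read exactly once. With Collections.nCopies(3, sharedReader), the non-atomic read-and-update of lineNumber lets two threads return the same line. The result can therefore contain duplicates, and its size can vary from run to run.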



- The Order class, Listing 4.3.



- MakePartition, Listing 4.5.
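Listing 4.5 itself is not shown here. As a rough sketch of what a partitioner like MakePartition produces, the code below mirrors the shape of Spring Batch's Partitioner#partition(int gridSize). That method returns one ExecutionContext per partition. In this sketch a plain Map stands in for ExecutionContext so it runs without Spring. The class name and the keys "inputFolder" and "partitionNumber" are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch shaped like Spring Batch's Partitioner#partition(int gridSize),
// with a plain Map standing in for ExecutionContext so it runs without Spring.
class MakePartitionSketch {

    static Map<String, Map<String, Object>> partition(int gridSize, String tmpDir) {
        Map<String, Map<String, Object>> partitions = new LinkedHashMap<>();
        for (int i = 1; i <= gridSize; i++) {
            Map<String, Object> context = new LinkedHashMap<>();
            // Each worker step reads only from its own folder: tmp/1, tmp/2, ...
            context.put("inputFolder", tmpDir + "/" + i);
            context.put("partitionNumber", i);
            partitions.put("partition" + i, context);
        }
        return partitions;
    }
}
```

In a real Spring Batch partitioner, these values would go into an org.springframework.batch.item.ExecutionContext instead (for example with putString and putInt). Each worker step's reader then picks up its own folder from that context.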


- SQL for creating the table company_db.tel_order.



5. CSV Files

- There are 100 CSV input files in the mt_data folder, containing 1900 orders in total.
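To see why a small commit-interval hurts, here is a back-of-the-envelope model. It assumes one transaction per chunk of commit-interval items, which matches chunk-oriented processing. It ignores Spring Batch's own bookkeeping, so the commit counts shown in Spring Batch Admin may differ slightly. The even split of the 1900 orders into 634/633/633 is a hypothetical assumption; the real split depends on which files land in which folder.

```java
// Hypothetical back-of-the-envelope model: in chunk-oriented processing every
// chunk of commitInterval items is written and committed in one transaction.
class CommitCount {

    // Chunks for one partition: ceil(items / commitInterval).
    static int chunksFor(int items, int commitInterval) {
        return (items + commitInterval - 1) / commitInterval;
    }

    // Total chunks (and so, roughly, transactions) over all partitions of the step.
    static int totalChunks(int[] itemsPerPartition, int commitInterval) {
        int total = 0;
        for (int items : itemsPerPartition) {
            total += chunksFor(items, commitInterval);
        }
        return total;
    }
}
```

With the assumed 634/633/633 split, a commit-interval of 3 gives 634 chunks in total, while a commit-interval of 100 gives only 21.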


6. Running orderMTJob in Eclipse


- To run the application, use the code in src/main/java/com/web/app/MultithreadingOrderApp.java (Listing 6.1). Changing the job parameter numberOfPartitions (grid-size) changes the execution time of the batch job. Bigger is not always better: the best value depends on the number of hardware threads (logical processors) the CPU provides, for example 16 on an 8-core, 16-thread processor.
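One simple heuristic for picking numberOfPartitions is to cap the requested value at the number of logical processors the JVM reports. This is a heuristic, not a rule: a job dominated by database I/O may still benefit from more threads. The class GridSizeAdvisor is hypothetical and does not come from MultithreadingOrderApp.java.

```java
// Hypothetical helper: cap a requested numberOfPartitions at the number of
// logical processors, and never go below 1.
class GridSizeAdvisor {

    static int gridSize(int requested, int logicalProcessors) {
        return Math.max(1, Math.min(requested, logicalProcessors));
    }

    static int gridSize(int requested) {
        // availableProcessors() reports the logical processors available to the JVM,
        // typically 16 on an 8-core, 16-thread CPU.
        return gridSize(requested, Runtime.getRuntime().availableProcessors());
    }
}
```

The result could then be passed as the numberOfPartitions job parameter, for example with Spring Batch's JobParametersBuilder.addLong("numberOfPartitions", ...). How MultithreadingOrderApp actually builds its parameters is not shown here.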




7. Running orderMTJob From Spring Batch Admin

- In pom.xml, change username_tc and password_tc to match your local Tomcat credentials.
- Start Tomcat.
- Deploy the application from a Maven run configuration: in Goals, type clean install tomcat7:redeploy, then click the Apply and Run buttons.

Note: If these steps are unfamiliar, read steps 4 and 5 (deploying Spring Batch Admin) in the How To link.

- Open localhost:8080/springBatchAdminMysql/jobs in a web browser (Picture 7.1).





- Click orderMTJob, change the dateTime parameter and click the Launch button. The job will finish with the status COMPLETED. Picture 7.2: orderMTJob steps.



- Click Executions -> the ID of the job execution -> COMPLETED under the Status column (Picture 7.3).
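The dateTime parameter is changed before each launch for a reason. Spring Batch identifies a job instance by the job name plus its identifying parameters, and it refuses to rerun an instance that has already COMPLETED. The toy model below shows the effect, assuming dateTime is an identifying parameter. It is not Spring Batch code, and the class name JobInstanceModel is hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model (not Spring Batch code): a job instance is identified by the job name
// plus its identifying parameters, and a COMPLETED instance cannot be run again.
class JobInstanceModel {
    private final Set<String> completedInstances = new HashSet<>();

    // Returns true if the launch is accepted, false if that instance already completed.
    boolean launch(String jobName, Map<String, String> parameters) {
        // TreeMap gives a stable, order-independent key for the parameters.
        String key = jobName + new TreeMap<>(parameters);
        if (completedInstances.contains(key)) {
            return false; // Spring Batch throws JobInstanceAlreadyCompleteException here
        }
        completedInstances.add(key); // assume the run finishes with COMPLETED
        return true;
    }
}
```

Relaunching orderMTJob with an unchanged dateTime is therefore rejected, while a new dateTime value creates a new job instance that can run.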



8. Eclipse Project
