Sunday, September 14, 2014

Step Partitioning



Time needed to complete: ~75 minutes
Prerequisite for this tutorial: Configure Spring Batch Admin to use MySQL

1. Introduction

When a single-threaded batch job cannot finish its work within the available time window and tuning the JVM has not helped, it is time to scale the job with multithreading. In Spring Batch, partitioning is one way to scale a batch job and improve its performance.

2. What is used in this tutorial:


1- Maven 3
2- JDK 1.7
3- Tomcat 7.0.55
4- Eclipse Luna 4.4
5- Spring Core 3.2.9
6- Spring Batch 2.2.7
7- Spring Batch Admin Manager 1.3.0
8- MySQL 5 Database


3. Project Structure



4. Configuration and Other files

- The configuration of the database the data will be written to is in database.xml (src/main/resources/META-INF/spring/batch/dbConfig/database.xml). This is not the database that Spring Batch Admin uses to store job repository data. First create a database named company_db, then in database.xml change db_username and db_password to suit the company_db username and password.




- The job definition that reads the CSV files, processes them, and writes to a database table is in src/main/resources/META-INF/spring/batch/jobs/job-order-multithreading.xml, Listing 4.2. The job has two steps. The first step, organizeFilesStep, creates a tmp folder inside the mt_data folder. The tmp folder contains one folder per partition, named by partition number, and a share of the CSV files is copied into each of them (see DistributeFiles.class), Picture 4.1.
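The idea behind the file distribution can be sketched in plain Java. This is not the tutorial's DistributeFiles class: PartitionPlanner, the round-robin rule, and the file names are assumptions made for illustration, and the sketch only computes which file goes to which folder without copying anything.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not the tutorial's DistributeFiles class.
class PartitionPlanner {

    // Assigns file names to partition folders "1".."n" in round-robin order.
    static Map<String, List<String>> assign(List<String> fileNames, int partitions) {
        if (partitions < 1) {
            throw new IllegalArgumentException("partitions must be >= 1");
        }
        Map<String, List<String>> plan = new LinkedHashMap<String, List<String>>();
        for (int p = 1; p <= partitions; p++) {
            plan.put(String.valueOf(p), new ArrayList<String>());
        }
        for (int i = 0; i < fileNames.size(); i++) {
            String folder = String.valueOf(i % partitions + 1);
            plan.get(folder).add(fileNames.get(i));
        }
        return plan;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<String>();
        for (int i = 1; i <= 100; i++) {
            files.add("orders_" + i + ".csv"); // made-up file names
        }
        // Prints how many files land in each tmp/<partition> folder.
        for (Map.Entry<String, List<String>> e : assign(files, 3).entrySet()) {
            System.out.println("tmp/" + e.getKey() + " -> " + e.getValue().size() + " files");
        }
    }
}
```

With 100 files and 3 partitions, this rule puts 34 files in folder 1 and 33 in each of folders 2 and 3, so the threads get nearly equal shares.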





- The second step is step1.boss, Picture 4.2. This step creates 3 threads (grid-size): thread1 reads CSV files only from the tmp/1 folder, thread2 only from tmp/2, and thread3 only from tmp/3. Each thread has its own reader, processor, and writer. The reader and writer beans have scope="prototype", and this is a good place to learn what happens when scope="prototype" is removed. Try it and watch the data race on lineNumber in the reader. Beans are singletons by default, so without prototype scope there is one reader instance and three threads reading and updating the next lineNumber to be read, which causes incorrect behavior. An important parameter is commit-interval: changing its value has a high impact on performance, e.g. a value of 3 will degrade performance.
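Why each thread needs its own reader can be shown without Spring. In the sketch below, LineReader is a hypothetical stand-in for a stateful reader with an unsynchronized lineNumber, and every task creates its own instance, which is the effect the prototype scope has in the job definition. Sharing one LineReader between the tasks would reintroduce the race on lineNumber.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a stateful reader; not a Spring Batch class.
class LineReader {
    private final List<String> lines;
    private int lineNumber = 0; // unsynchronized position, safe only within one thread

    LineReader(List<String> lines) {
        this.lines = lines;
    }

    // Returns the next line, or null when the input is exhausted.
    String read() {
        if (lineNumber >= lines.size()) {
            return null;
        }
        return lines.get(lineNumber++);
    }
}

class ReaderPerThreadDemo {

    // Runs one task per partition; each task owns its reader instance.
    static List<Integer> countPerThread(List<List<String>> partitions) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        try {
            List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
            for (final List<String> partition : partitions) {
                futures.add(pool.submit(new Callable<Integer>() {
                    public Integer call() {
                        LineReader reader = new LineReader(partition); // one instance per thread
                        int count = 0;
                        while (reader.read() != null) {
                            count++;
                        }
                        return count;
                    }
                }));
            }
            List<Integer> counts = new ArrayList<Integer>();
            for (Future<Integer> future : futures) {
                counts.add(future.get());
            }
            return counts;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because no reader is visible to more than one thread, each task counts exactly the lines of its own partition.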



- The Order class, Listing 4.3.



- MakePartition, Listing 4.5.


- SQL for creating the table company_db.tel_order.



5. CSV files

- There are 100 CSV input files in the mt_data folder, containing 1900 orders.


6. Running orderMTJob in Eclipse


- To run the application, use the code in the class src/main/java/com/web/app/MultithreadingOrderApp.java, Listing 6.1. Changing the numberOfPartitions job parameter (grid-size) changes the execution time of the batch job. Bigger is not always better: it depends on the number of hardware threads (logical processors) the processor provides, for example 16 threads on an 8-core processor.
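One way to pick a starting value for numberOfPartitions is to cap it at the number of logical processors the JVM reports. The helper below is a hypothetical sketch, not part of the tutorial project, and the cap is only a rule of thumb: the best value for a given machine and database should still be found by measuring.

```java
// Hypothetical helper for choosing a starting grid-size.
class GridSizeAdvisor {

    // Caps the requested partition count at the number of logical processors.
    static int cap(int requested, int logicalProcessors) {
        if (requested < 1 || logicalProcessors < 1) {
            throw new IllegalArgumentException("both values must be >= 1");
        }
        return Math.min(requested, logicalProcessors);
    }

    public static void main(String[] args) {
        // availableProcessors() reports logical processors, e.g. 16 on an 8-core CPU with 2 threads per core.
        int logicalProcessors = Runtime.getRuntime().availableProcessors();
        System.out.println("logical processors: " + logicalProcessors);
        System.out.println("suggested numberOfPartitions: " + cap(32, logicalProcessors));
    }
}
```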




7. Running orderJob From Spring Batch Admin

- In pom.xml, change username_tc and password_tc to suit your local Tomcat.
- Start Tomcat.
- Deploy the application: in the run configuration, type clean install tomcat7:redeploy in the Goals field, then click the Apply and Run buttons.

Note: If these steps are unfamiliar, read steps 4 and 5 of deploying Spring Batch Admin (link: How To).

- Open the address localhost:8080/springBatchAdminMysql/jobs in a web browser, Picture 7.1.





- Click orderMTJob, change the dateTime parameter, and click the Launch button. The job will finish with COMPLETED status, Picture 7.2 (orderMTJob steps).



- Click Executions -> the ID number of the job execution -> COMPLETED under the Status column, Picture 7.3.



8. Eclipse Project

Thursday, September 11, 2014

Logging invalid records



Time needed to complete: ~60 minutes
Prerequisite for this tutorial: Configure Spring Batch Admin to use MySQL

1. Introduction

In the post Validate input data, the batch job stops immediately after finding an invalid order record. In a real-life scenario, if 15 out of 1 million records to be processed are invalid and one of them is among the first 100 records, the batch job fails and no work is done. The question is whether the batch job can isolate these records and continue processing, so that developers can deal with the invalid records later. Spring Batch has a mechanism for that, and this post explains it. Invalid data is logged to a file, with a hint for developers about the stage (read, process, write) in which the error occurred, so they know where to look.

2. What is used in this tutorial:


1- Maven 3
2- JDK 1.7
3- Tomcat 7.0.55
4- Eclipse Luna 4.4
5- Spring Core 3.2.9
6- Spring Batch 2.2.7
7- Spring Batch Admin Manager 1.3.0
8- MySQL 5 Database


3. Project Structure



4. Configuration and Other files

- The configuration of the database the data will be written to is in database.xml (src/main/resources/META-INF/spring/batch/dbConfig/database.xml). This is not the database that Spring Batch Admin uses to store job repository data. First create a database named company_db, then in database.xml change db_username and db_password to suit the company_db username and password.




- The job definition that reads, processes, and writes is in src/main/resources/META-INF/spring/batch/jobs/job-orderInvalidLogging.xml, Listing 2. All exceptions are skipped, and the data that caused each exception is logged. It is interesting to see how changing the skip-limit value changes the behavior of the job execution. For example, setting skip-limit="2" causes the job to throw:
 org.springframework.batch.core.step.skip.SkipLimitExceededException: Skip limit of '2' exceeded
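The counting behind skip-limit can be modeled in a few lines of plain Java. SkipLimitModel and its blank-string validity rule are hypothetical and do not use Spring Batch classes. The sketch also ignores chunks and transactions and only shows when the limit is hit: skips are tolerated up to the limit, and the next one fails the run.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of skip counting; hypothetical names, not Spring Batch classes.
class SkipLimitModel {

    static class SkipLimitExceeded extends RuntimeException {
        SkipLimitExceeded(String message) {
            super(message);
        }
    }

    // Returns the items that passed; invalid items are skipped until the limit is used up.
    static List<String> process(List<String> items, int skipLimit) {
        List<String> written = new ArrayList<String>();
        int skips = 0;
        for (String item : items) {
            boolean invalid = item.trim().isEmpty(); // made-up validity rule
            if (invalid) {
                if (skips >= skipLimit) {
                    throw new SkipLimitExceeded("Skip limit of '" + skipLimit + "' exceeded");
                }
                skips++; // tolerated: remember the skip and move on
                continue;
            }
            written.add(item);
        }
        return written;
    }
}
```

With three invalid records, as in the input file of this post, a limit of 3 lets the run finish, while a limit of 2 fails on the third invalid record.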



- The Order class uses Hibernate Validator annotations, Listing 3, and there is a custom @AccptedValues validator, Listing 4 and Listing 5.







- Because of the orderDate property in the Order class, the reader requires a mapper. Without it, the reader does not know the date format.
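The date handling such a mapper needs can be sketched with the JDK alone. OrderDateParser is a hypothetical helper, and the pattern dd-MM-yyyy is an assumption, since the post does not state the format used in the CSV files. A value that does not match the pattern raises a ParseException, which is the kind of error that surfaces in the reading phase.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper; the real mapper in the project may look different.
class OrderDateParser {

    // Assumed pattern; the tutorial does not state the format of orderDate.
    static final String PATTERN = "dd-MM-yyyy";

    static Date parse(String text) throws ParseException {
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        SimpleDateFormat format = new SimpleDateFormat(PATTERN);
        format.setLenient(false); // reject impossible dates such as 32-01-2014
        return format.parse(text);
    }
}
```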

- The validation takes place in the OrderProcessor class, Listing 7.


- Until now, all logging in the previous posts was console output. Here logging to a file is added, and the invalid records are logged to the jobApplication.log file. Listing 8 shows the logging properties file.


5. CSV files

- The input file with invalid data is src/main/resources/cvs/skipInvalidOrderData.csv. The invalid data is at:
 line 3: value fixe and a future order date
 line 5: order date format
 line 7: value prepai

- A file with valid data can be found in src/main/resources/cvs/orders_12082014.csv





6. Running orderValidationJob in Eclipse


- To run the application, use the code in the class src/main/java/com/web/app/LoggingInvalidOrderApp.java, Listing 11.



- The result of the run is in Listing 12.



7. Running orderJob From Spring Batch Admin

- In pom.xml, change username_tc and password_tc to suit your local Tomcat.
- Start Tomcat.
- Deploy the application: in the run configuration, type clean install tomcat7:redeploy in the Goals field, then click the Apply and Run buttons.

Note: If these steps are unfamiliar, read steps 4 and 5 of deploying Spring Batch Admin (link: How To).

- Open the address localhost:8080/springBatchAdminMysql/jobs in a web browser, Picture 2.





- Click orderLoggingJob, change the dateTime parameter, and click the Launch button. The job will finish with COMPLETED status.

- Click Executions -> the ID number of the job execution -> COMPLETED under the Status column to see the 2 skips by the processor and 1 skip by the reader, Picture 3.


- In the jobApplication.log file, Listing 13, 2 records are skipped in the processing phase, while one record is skipped in the reading phase because of a date parsing error.


8. Eclipse Project

Wednesday, September 10, 2014

Validate Input data



Time needed to complete: ~60 minutes

1. Introduction

What money is for bankers, data is for IT developers. Money can be dirty, and data can be dirty too. For this reason, developers need to validate data before sending it on for processing. This article shows how validation can be used in Spring Batch jobs.

2. What is used in this tutorial:


1- Maven 3
2- JDK 1.7
3- Tomcat 7.0.55
4- Eclipse Luna 4.4
5- Spring Core 3.2.9
6- Spring Batch 2.2.7
7- Spring Batch Admin Manager 1.3.0
8- MySQL 5 Database


3. Project Structure



4. Configuration files

- The configuration of the database the data will be written to is in database.xml (src/main/resources/META-INF/spring/batch/dbConfig/database.xml). This is not the database that Spring Batch Admin uses to store job repository data. First create a database named company_db, then in database.xml change db_username and db_password to suit the company_db username and password.




- The job definition that reads, processes, and writes is in src/main/resources/META-INF/spring/batch/jobs/job-orderValidate.xml, Listing 2.



- The Order class uses Hibernate Validator annotations, Listing 3, and there is a custom @AccptedValues validator, Listing 4 and Listing 5.







- Because of the orderDate property in the Order class, the reader requires a mapper. Without it, the reader does not know the date format.

- The validation takes place in the OrderProcessor class, Listing 7.
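The two kinds of checks used in this post, an accepted-values rule and a rule against future order dates, can be modeled in plain Java without the validation framework. OrderChecks, the type field, and the accepted values fixed, mobile and prepaid are assumptions, guessed from the invalid samples in the CSV files. The project itself uses Hibernate Validator annotations for this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of the checks; hypothetical names, not the project's validator.
class OrderChecks {

    // Assumed set of accepted values; the real list lives in the Order class.
    static final Set<String> ACCEPTED_TYPES =
            new HashSet<String>(Arrays.asList("fixed", "mobile", "prepaid"));

    // Returns one message per violated rule; an empty list means the order is valid.
    static List<String> validate(String type, Date orderDate, Date now) {
        List<String> errors = new ArrayList<String>();
        if (type == null || !ACCEPTED_TYPES.contains(type)) {
            errors.add("type is not an accepted value: " + type);
        }
        if (orderDate == null || orderDate.after(now)) {
            errors.add("orderDate must not be in the future");
        }
        return errors;
    }
}
```

A record like the one on line 7 of the invalid input file, with the value mobilex and an order date in the future, would violate both rules at once.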


5. CSV files

- The input file with invalid data is src/main/resources/cvs/invalidOrderData.csv. Line 7 contains the invalid values mobilex and 12-08-2015.
- A file with valid data can be found in src/main/resources/cvs/orders_12082014.csv





6. Running orderValidationJob in Eclipse


- To run the application, use the code in the class src/main/java/com/web/app/OrderValidationApp.java, Listing 10.



- The result of the run is in Listing 11.



7. Running orderJob From Spring Batch Admin

- In pom.xml, change username_tc and password_tc to suit your local Tomcat.
- Start Tomcat.
- Deploy the application: in the run configuration, type clean install tomcat7:redeploy in the Goals field, then click the Apply and Run buttons.

Note: If these steps are unfamiliar, read steps 4 and 5 of deploying Spring Batch Admin (link: How To).

- Open the address localhost:8080/springBatchAdminMysql/jobs in a web browser, Picture 2.





- Click orderValidationJob, change the dateTime parameter, and click the Launch button. The job will finish with FAILED status.

- Click Executions -> the ID number of the job execution -> FAILED under the Status column to see the validation messages, Picture 3.


8. Eclipse Project

Tuesday, September 9, 2014

Dealing with empty inputs

Time needed to complete: ~20 minutes
Prerequisite for this tutorial: Configure Spring Batch Admin to use MySQL

1. Introduction


Errors in programming are the norm, not the exception. Finding and fixing errors is not an easy task, especially when a user reports an error that does not exist and both code and data are correct. A common situation in batch processing is an empty file, i.e. no input. Clients see no output, conclude that the batch job did not finish, and report a bug. To prevent this situation, Spring Batch lets you define a listener that runs after a step finishes and checks the number of items read.

2. What is used in this tutorial:


1- Maven 3
2- JDK 1.7
3- Tomcat 7.0.55
4- Eclipse Luna 4.4
5- Spring Core 3.2.9
6- Spring Batch 2.2.7
7- Spring Batch Admin Manager 1.3.0
8- MySQL 5 Database


3. Project Structure



4. Configuration files

- The configuration of the database the data will be written to is in database.xml (src/main/resources/META-INF/spring/batch/dbConfig/database.xml). This is not the database that Spring Batch Admin uses to store job repository data. First create a database named company_db, then in database.xml change db_username and db_password to suit the company_db username and password.




- The job definition that reads, processes, and writes is in src/main/resources/META-INF/spring/batch/jobs/job-emptyFile.xml, Listing 2.



- A bean afterStepListener is defined and referenced in the orderEmptyFileJob job definition. afterStepListener uses the code in the class com.web.listener.EmptyFileHandler, Listing 3.


- The afterStep method is annotated with @AfterStep, so it is executed after the step. It checks the read count and, if it is not greater than zero, logs a message that the file is empty and returns a failed status.
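The decision the listener makes can be modeled in plain Java. EmptyInputCheck and Outcome are hypothetical names and do not use Spring Batch types. The real EmptyFileHandler reads the count from the step execution, while this sketch takes it as a parameter.

```java
// Plain-Java model of the listener's decision; hypothetical names, not Spring Batch types.
class EmptyInputCheck {

    // Exit code plus a short description, similar in spirit to a step's exit status.
    static class Outcome {
        final String exitCode;
        final String description;

        Outcome(String exitCode, String description) {
            this.exitCode = exitCode;
            this.description = description;
        }
    }

    // A step that read nothing is reported as failed, with a message that explains why.
    static Outcome afterStep(int readCount) {
        if (readCount > 0) {
            return new Outcome("COMPLETED", "");
        }
        return new Outcome("FAILED", "EMPTY FILE");
    }
}
```

Reporting the empty input as an explicit failure gives the client a clear message instead of a job that finished with no visible output.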

- A bean afterJobListener is defined and referenced in the orderEmptyFileJob job definition. afterJobListener uses the code in the class com.web.listener.JobListener, Listing 4.


5. CSV file

- The empty input file is src/main/resources/cvs/emptyFile.csv
- For a test with a non-empty file, use the data in Listing 5.1.



6. Running orderEmptyFileJob in Eclipse

- To run the application, use the code in the class src/main/java/com/web/app/EmptyFileHandlingApp.java, Listing 5.


- The result of the run is in Listing 6.



7. Running orderJob From Spring Batch Admin

- In pom.xml, change username_tc and password_tc to suit your local Tomcat.
- Start Tomcat.
- Deploy the application: in the run configuration, type clean install tomcat7:redeploy in the Goals field, then click the Apply and Run buttons.

Note: If these steps are unfamiliar, read steps 4 and 5 of deploying Spring Batch Admin (link: How To).

- Open the address localhost:8080/springBatchAdminMysql/jobs in a web browser, Picture 2.



- Click orderEmptyFileJob, change the dateTime parameter, and click the Launch button, Picture 3. The job will finish with FAILED status.



- Click Executions, then the ID number of the job execution, to see the exit message EMPTY FILE, Picture 4.



8. Eclipse Project