esProc Assists Jasper in Calculating Loan Payments

Problem source: http://community.jaspersoft.com/questions/851148/loop-jasper

The calculation of loan payments according to loan amount will involve loop operation and inter-row operation. It is difficult to write code for it using stored procedure or Scriptlets. Yet with the help of esProc, the Jasper calculation will become easier. Here is an example.

Database table loan contains loan data including loan amount, terms by the month and yearly interest rate. You need to create a Jasper grouped report in which, under each sum of loan amount, payment details per term such as payment, interest rate, principal and principal balance are listed. Below is a selection from the loan table:

esProc_report_jasper_loan_1

The following esProc code is used for preparing the data:

esProc_report_jasper_loan_2

A1: Execute SQL to retrieve records from loan.

A2: Add two computed columns – mRate (monthly rate) and mPayment (monthly payment) to A1. Below is the result:

esProc_report_jasper_loan_3

A3: Calculate payment details for each term based on loan information. Loop function A2.() calculates A2’s members sequentially, uses “()” to perform computation step by step(steps are separated by commas), and returns the computed result of the expression in the last step. new function is used to create a two-dimensional table. A3’s result is cascade data, as shown below:

esProc_report_jasper_loan_4

A4: Combine the cascade data to create a two-dimensional table containing payment details for each term of every loan amount.

A5: Return A4’s result to the report. A reporting tool will identify esProc with JDBC interface as a normal database.

Then create a simple grouped table with Jasper in the following template:

esProc_report_jasper_loan_5

Below is a preview of the finished report:

esProc_report_jasper_loan_6

Advertisements
Posted in Report | Tagged , , , | Leave a comment

esProc Assists Jasper in Calculating Opening Balance

Problem source: http://community.jaspersoft.com/questions/850400/how-create-report-opening-balance-using-jasper-reports .

The calculation of opening balances according to deposit and withdraw amount involves inter-row operation. Jasper can do the calculation but the code is complicated. You can use esProc to assist the job and make it easy. The following is a simple example.

data.csv contains deposit and withdraw information of different sum of money in an account. You need to calculate the opening balance based on this file. Below is a selection from the original data:

esProc_report_opening_banlance_1

esProc code:

esProc_report_opening_banlance_2

A1: Import the file separated by commas.

A2: Calculate opening balance and return result to the report. The initial value of the account is 43. Operations using parentheses will return result of the expression after the last comma. A2’s result is as follows:

esProc_report_opening_banlance_3

esProc provides JDBC interface. Reporting tools will identify it as a normal database.

Now you can create a simple list table with Jasper in the following template:

esProc_report_opening_banlance_4

Below is a preview of the finished report:

esProc_report_opening_banlance_5

A report calls an esProc script in the same way as it calls the stored procedure. Save the above script as balance.dfx. You can invoke it with call balance() and input parameters into it from Jasper’s SQL designer.

Posted in Report | Tagged , , , | Leave a comment

esProc Assists BIRT in Splitting Fields and Converting Them to Records

Problem source: http://developer.actuate.com/community/forum/index.php?/topic/36204-split-data-row/

To split fields apart and recombine them into records, you can use the stored procedure or a report script. But as the code is complex, an alternative choice is using esProc to help the reporting tool with the job. Let’s look at an example.

Database table data has two fields: ID and ANOMALIES. You need to split ANOMALIES field into strings by the space and combine each string with the corresponding original ID value to generate new records. Below is the original data:

esProc_report_birt_splitfiled_1

esProc code:

 

 

esProc_report_birt_splitfiled_2

 

A1: Query the database.

A2: Split ANOMALIES field to create a new two-dimensional table. Then use conj function to concatenate the records generated from the processing of each of A1’s members. “~” represents the set member under processing. A2’s result is like this:

esProc_report_birt_splitfiled_3

 

A3: Return A2’s result to the reporting tool. esProc provides JDBC interface outward to be identified by a reporting tool as a normal database.

Create a simple list table with BIRT:

esProc_report_birt_splitfiled_4

A preview would be like this:

 

 

esProc_report_birt_splitfiled_5

 

The way a report calls an esProc script is the same as that it calls the stored procedure. Save the above script as BIRTsplitrow.dfx. You can invoke it with call BIRTsplitrow() and input parameters into it from BIRT stored procedure designer.

Posted in Report | Tagged , , , | Leave a comment

esProc Assists BIRT in Handling Intragroup Inter-row Calculations

Problem source:

http://developer.actuate.com/community/forum/index.php?/topic/36160-dealing-with-previous-rows-groups-sorts-and-subtotals/ .

Generally intragroup inter-row calculations are handled with window functions or report scripts. But the code is rather complicated. Instead, you can use esProc to assist reporting tools in dealing with them. Below is such an example.

Database table sample has three fields, among which id is the grouping field. You need to design a grouped table where id is the grouping field and the other fields holding detail data include v1, v2 and the computed field crossline. crossline will be calculated like this: v1+v2+previous v1+previous v2. Below is the original data:

esProc_report_birt_interrow_1

esProc code for data preparation:

 

esProc_report_birt_interrow_2

A1: Query the database and create an additional field with constant values for later use.

A2: Group A1 by id, modify crossline values for each group and combine groups of data together. Result is as follows:

 

esProc_report_birt_interrow_3

 

A3: Return A2’s result to the reporting tool. esProc provides JDBC interface outward to be identified by a reporting tool as a normal database.

Posted in Report | Tagged , , | Leave a comment

esProc File Computing: Parallel Grouping and Aggregation

The article esProc File Computing: Parallel Query and Filter explained how to query and filter data. Here let’s look at the case of combining the article’s topic with grouping and aggregation. Still we only discuss cases with relatively small result sets that can be loaded into memory in one go.

Steps of querying and filtering data and then grouping and aggregating it using multithreads are: Each of the threads queries and filters and groups and aggregates a part of the data; then results of all threads’ are combined and grouped and aggregated; the result will then be filtered in main program. The following example is cited to illustrate this.

As big data is generally stored in files, an Orders.b file will be taken as an example. Its data is as follows:

ORDERID CLIENT     SELLERID AMOUNT ORDERDATE NOTE

1       287  47     5825         2013-05-31       gafcaghafdgie f ci…

2       89     22     8681         2013-05-04       gafcaghafdgie f ci…

3       47     67     7702         2009-11-22       gafcaghafdgie f ci…

4       76     85     8717         2011-12-13       gafcaghafdgie f ci…

5       307  81     8003         2008-06-01       gafcaghafdgie f ci…

6       366  39     6948         2009-09-25       gafcaghafdgie f ci…

7       295  8       1419         2013-11-11       gafcaghafdgie f ci…

8       496  35     6018         2011-02-18       gafcaghafdgie f ci…

9       273  37     9255         2011-05-04       gafcaghafdgie f ci…

10     212  0       2155         2009-03-22       gafcaghafdgie f ci…

Note: In the above data, note field exists only for the purpose of increasing each record’s length, but does not have any practical meaning.

You need to group ordering records in the year of 2013 by client and sellerid and compute the number of orders (ordered) and the sum of order amount (amount); and finally filter the result according to the criterion that the order amount is greater than 500,000.

Since Orders.b contains relatively big data, it needs to be processed by segments. First write the script group.dfx in esProc for multithreaded computing:

2015-05-14_140411

A1: Set the number of parallel threads for data aggregation as 4.

A2: The code from B2 to B5 is executed using multithreads through the keyword fork. There are 4 threads, which get A2’s value as 1, 2, 3, 4 respectively.

B2: Use cursor function to divide the file into 4 segments roughly and get the cursor of A2 (only the desired fields are fetched).

B3: Query and find data in the year of 2013.

B4: Group and aggregate the cursor data.

B5: Return B4, the filtering result of the current thread.

A6: The returned results of the four threads are concatenated in the main thread.

A7: Group and aggregate the concatenation result.

A8: Filter A7’s grouped data according to the criterion that order amount is greater than 500,000

A9: Return the final aggregate result to the external program.

Save the esProc script as group.dfx when it is finished. It is then called by the external program via esProc JDBC. See esProc Tutorial for the calling method.

For the performance test of data query and filter approach using esProc multithreading, see Performance Test of esProc File Traversal Algorithm.

Posted in External Memory | Tagged , , , | Leave a comment

esProc File Computing: Parallel Query and Filter

Raqsoft esProc provides the ability of performing file-based computation. It supports multithreaded parallel processing to deal with problems involving relatively big data. The multiprocessing can make full use of the computational power of the machine’s multi-core CPU to achieve an almost equal or better performance than the conventional database.

Here only the case involving a relatively small result set, that is, the memory can hold the entire result of data computing, is explored.

The following is the architecture of esProc multithreaded parallel processing:

esProc_file_query_filter_1

esProc Multithreaded Parallel Processing Architecture

As shown in the above figure, esProc distributes a task to multiple subscripts through a main script. Each subscript accesses a part of the local data and computes it. After the subscripts finish their computations, they return results to the main script that gets the final result and passes it to the host application, such as a reporting tool.

Each subscript is a thread. Theoretically the number of multithreaded parallel tasks a server allows is determined by the number of CPU cores as well as the performance of a parallel hardware. The more the number of CPU cores a server has and the better the hardware’s parallel access ability is, the more the number of parallel tasks a server can hold and the faster the task is performed. So the multithreaded parallel processing can take the greatest advantage of the machine’s computational power.

Steps of data query and filter using multithreads are: Each thread handles querying a part of data and then results of all thread’s queries are combined. Here is an example. As big data is commonly stored in a file, the Orders.txt file is used to illustrate this, as shown below:

ORDERID CLIENT     SELLERID AMOUNT ORDERDATE NOTE

1       287  47     5825         2013-05-31       gafcaghafdgie f ci…

2       89     22     8681         2013-05-04       gafcaghafdgie f ci…

3       47     67     7702         2009-11-22       gafcaghafdgie f ci…

4       76     85     8717         2011-12-13       gafcaghafdgie f ci…

5       307  81     8003         2008-06-01       gafcaghafdgie f ci…

6       366  39     6948         2009-09-25       gafcaghafdgie f ci…

7       295  8       1419         2013-11-11       gafcaghafdgie f ci…

8       496  35     6018         2011-02-18       gafcaghafdgie f ci…

9       273  37     9255         2011-05-04       gafcaghafdgie f ci…

10     212  0       2155         2009-03-22       gafcaghafdgie f ci…

In the above data, note field exists only for the purpose of increasing each record’s length, but does not have any practical meaning.

You need to query and filter the data according to the criteria “sellerid=1 and client=50 and orderdate>2013” and passes the result to the external Java program.

Because Orders.txt contains a great amount of data, it needs to be divided into multiple segments for being processed. First you use esProc to write the script select.dfx for multithreaded parallel query. The following is the script:

esProc_file_query_filter_2

A1: Set the number of parallel tasks as 4.

A2: The code from B2 to B4 is executed using multithreads through the keyword fork. There are 4 threads, which get A2’s value as 1, 2, 3, 4 respectively.

B2: Use cursor function to divide the file into 4 segments roughly and get the cursor of A2 (only the desired fields are fetched).

B3: Filter data in the cursor.

B4: Return B3, the filtering result of the current thread.

A5: The returned results of the four threads are concatenated in the main thread.

A6: Return the final result to the external program.

Save the esProc script as select.dfx when it is finished. It is then called by the external program via esProc JDBC. See esProc Tutorial for the calling method.

If you converted the text file into the binary format esProc provides, the performance would have been increased more. The conversion code is as follows:

esProc_file_query_filter_3

A1: Create a text file cursor.

A2: Export data in the text file cursor to that of binary format.

Modify select.dfx into this:

esProc_file_query_filter_4

You can see that the script is almost the same except that options used in B2’s cursor function have become @bz for retrieving binary data.

On the hardware of the same standard, it takes 24 seconds to complete querying and filtering data of 3.4G size in a text file, but only 4 seconds in a binary file.

For the performance test of data query and filter approach using esProc multithreading, see esProc Performance Test of File Traversal Algorithm. According to the result of testing and comparison with Oracle, when data volume is less than the available memory space, Oracle has a better performance; while when data volume exceeds the usable memory space, usually esProc outperforms Oracle.

The above explored the case where parallel program is run on a single computer. For situations where even bigger data is involved, you can use a cluster of servers in esProc to further improve performance through multi-computer parallel processing system.

Posted in External Memory | Tagged , , , , | Leave a comment

esProc File Computing: Cascaded Foreign Key Relationships

There are some computational tasks of processing structured data that involves multi-table data sources joined through cascaded foreign key relationships. For example, you want to find particular orders from the order management system, where the following structure of cascaded foreign key relationships is involved. Computational target: Find orders in which the suppliers and customers are from the same region; group these orders by customers and compute total order amount and number of orders in each group.

Data structure is shown in the following figure:

esProc_file_foreign_key_1

To perform the computation with the database, you use the following SQL script:

select cid,count(ordered),sum(price*quantity) from orders

left join customer on orders.cid=customer.cid

left join region a1 on customer.city=a1.city 

left join product on orders.pid=product.pid

left join supplier on product.pid=supplier.sid

left join region a2 on supplier.city=a2.city

where a1.region=a2.region and to_char(orderdate,’YYYY’)=?

Group by cid

Now let’s look at how it is solved using esProc file computing approach.

Define the cellet parameter year and perform related computing with cascaded foreign keys relationships using the following esProc script orders.dfx:

esProc_file_foreign_key_2

A1: Create a file cursor for local Orders table to process data in segments and avoid memory overflow.

A2: Find orders for a certain year. year is a pre-specified cellset parameter.

A3-A6: Import dimension tables from local machine – Customer, Product, Supplier and Region.

A7: Switch values of the foreign key “city” in the table sequence Customer into corresponding records references in the table sequence Region.

A8: Switch values of the foreign key “city” in the table sequence Supplier into corresponding records references in the table sequence Region.

A9: Switch values of the foreign key “sid” in the table sequence Product into corresponding records references in the table sequence Supplier.

A10: Switch values of the foreign key “pid” in the table sequence Orders into corresponding records references in the table sequence Product.

A11: Switch values of the foreign key “cid” in the table sequence Orders into corresponding records references in the table sequence Customer.

A12: Filter records of the table sequence Orders by the criterion. With switches from A7 to A11, you can write the filtering criterion as pid.sid.city.region==cid.city.region.

A13: Perform group and aggregate based on the filtering result.

A14: Return A13 to an external program.

If there is the great volume of data in the fact table, Orders, you can use esProc multithreaded-parallel-processing approach to increase processing performance. As for the dimension tables, they are small enough to be entirely loaded into memory. All dimension tables can be accessed by every thread.

esProc script for doing this:

esProc_file_foreign_key_3

A1-A4: Import dimension tables from local machine – Customer, Product, Supplier and Region.

A5: Switch values of the foreign key “city” in the table sequence Customer into corresponding records references in the table sequence Region.

B5: Switch values of the foreign key “city” in the table sequence Supplier into corresponding records references in the table sequence Region.

C5: Switch values of the foreign key “sid” in the table sequence Product into corresponding records references in the table sequence Supplier.

A6: Set the number of parallel tasks.

A7: Execute parallel processing according to A6.

B7: Create a file cursor for local Orders table, with each thread processing a part of the data.

B8: Switch values of the foreign key “pid” in the table sequence Orders into corresponding records references in the table sequence Product.

B9: Switch values of the foreign key “cid” in the table sequence Orders into corresponding records references in the table sequence Customer.

B10: Filter records of the table sequence Orders by the specified criterion and year. With all those switches, you can write the filtering criterion as pid.sid.city.region==cid.city.region.

B11: Data grouping and aggregation.

B12: Each thread returns its result got in B11.

A13: Merge A7’s results.

A14: Group and aggregate the merged data.

A15: Return A14 to an external program.

One point to note is that you can import data from either a database or a file using esProc script. So you can decide where the fact table and dimension tables should be stored as needed. For example, store tables whose data is manipulated little in the file system and put others with frequent data manipulation in the database. The more the data is stored in the file system, the higher the performance and the less the pressure on the database.

Posted in External Memory | Tagged , , | Leave a comment