Collection of Solutions to Knotty Report Building Problems

There are many knotty problems around report development. These include complex layout, which is difficult to build with a reporting tool alone; complex data sources requiring dynamic access and involving different types of databases or non-database data; as well as the handling of order-related computation, subsets of grouped data, fixed grouping, string splitting and concatenation, date intervals and other complicated computations, all of which you may find hard to deal with in SQL.

Using esProc as the computing middleware will handle those issues with simple scripts. esProc encapsulates rich functions for processing structured data, as well as supports dynamic parsing of expressions, heterogeneous data source handling, order-related computations, set operations, irregular grouping and post-grouping computation. A reporting tool will pass parameters to the esProc script, execute it in the same way as it executes a database stored procedure, and get the returned result set via JDBC interface. You can learn more from How to Use esProc to Assist Reporting Tools.

In the following, there are examples of the typical problems concerning report development, and their esProc solutions.

Data transposition

Sometimes you might want to display data in a cross-tab or grouped report, but you could not use the source data directly. The data needs to be transposed first. With either SQL or the stored procedure, you must write code from the lowest level using a high-level language, which involves substantial amount of work and is really hard. esProc can remedy this failing through its support for order-related computation and dynamic scripting. Here is a simple example of transposing rows and columns.

Below is a selection of the database table SALES that stores order data:

OrderID Client SellerId Amount OrderDate
1 DSG 13 19480 2014-06-20 00:00
2 ERN 18 979 2014-06-13 00:00
3 JFE 19 28972 2014-12-11 00:00
4 OFS 21 4829 2014-02-24 00:00
5 ERN 22 21392 2014-02-01 00:00

You need to calculate the total order amount, the maximum/minimum order amount and the total number of orders each month in a specified year, and transpose the data into 13 columns and 4 rows. That is, the four algorithms constitute the first column named subtotal and each month is a column named 1, 2, 3, 4…

esProc script:

esProc_report_problems_1

Explanation: First perform group and aggregate operations in SQL, and then run a loop to transpose every row to column, and fill in values in sequence to each column of A2. Here is the result:

esProc_report_problems_2

Though the transposition in this example is simple, there are other unusual layouts such as post-grouping multi-layer transposition and dynamic location transposition. You can refer to esProc Arranges Data for Reports with Unconventional Layouts for more.

Multi-column layout

Most reporting tools support vertical layout but they really have difficulties in building unusual layouts such as arranging data in multiple columns horizontally or in an interlocking way. Normally you have to turn to high-level languages to work them out. With order-related computation and dynamic scripting features, esProc can help manage this well. Let’s take a look at horizontal multi-column layout:

The database table emp has three fields. Below is a selection of the table:

EId Name Dept
4 Emily HR
5 Ashley R&D
6 Matthew Sales
7 Alexis Sales
8 Megan Marketing
9 Victoria HR
10 Ryan R&D
11 Jacob Sales

The desired layout is that data is presented in a horizontally sequenced order in 3 columns, as shown below:

esProc_report_problems_3

You can first transform the original 3-field table to a 9-field table in esProc, and then build the above layout with the reporting tool. The code is as follows:

esProc_report_problems_4

Explanation: Divide rows into three parts by their sequence numbers and store them respectively in A2, B2 and C2. And then concatenate the fields in B2 and C2 with those in A2. Here is the result you’ll get:

esProc_report_problems_5

Apart from the horizontal multi-column layout, there are other uncommon report layouts, like interlocking multi-column layout, dynamic vertical multi-column layout based on a single field, duplicating rows in sequence and condition-controlled formats for grouped reports. More information can be found in esProc Arranges Data for Reports with Unconventional Layouts.

Unconventional data sources

In addition to conventional databases, data sources of a reporting tool could be JSON files, MongoDB, txt files, Excel and HDFS files. Some of them do not have any computability and some have not enough. Both cases require writing a lot of code for report development. esProc supports various data sources and is capable of helping a reporting tool out. For example, you might want to display data from a multi-level JSON file in a grouped report:

Cells.json is a multi-level, nested JSON file which you want to present as a grouped report. The grouping fields are name, type and image. “xlink:href”. There is also a field with 3 subdocuments in each document: custom. Identifier, custom. Classifier and custom. Output, which are of the same structure but contain different number of documents.

Source data:

{

“cells”: [

{

“name”: “b”,

“type”: “basic.Sensor”,

“custom”: {

“identifier”: [

{

“name”: “Name1”,

“URI”: “Value1”

},

{

“name”: “Name4”,

“URI”: “Value4”

}

],

“classifier”: [

{

“name”: “Name2”,

“URI”: “Value2”

}

],

“output”: [

{

“name”: “Name3”,

“URI”: “Value3”

}

]

},

“image”: {

“width”: 50,

“height”: 50,

“xlink:href”: “

AAADDPmHLAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAABEJAAARCQBQGfEVAAAABl0RVh0

U29mdHdhcmUAd3Vi8f+k/EREURQtsda2Or/+nFLqP6T5Ecdi0aJFL85msz2Qxyf4JIumMAx/

ClmWt23GmL1kO54CXANAVH+WiN4Sx7EoNVkU3Z41BDHMeXAxjvOxNr7RJjzHX7S/

jAflwBxkJr/RwiOpWZ883Nzd+Wpld7tkBr/SJr7ZHZbHZeuVweSnPfniocMAWYwcGBaf

H0OoPamFGAaY4ZBZjmmFGAaY4ZBZjmmFGAaY4ZBZjmmFGAaY7/B94QnX08zxKL

AAAAAElFTkSuQmCC”

}

},

……

]

}

 

esProc script:

esProc_report_problems_6

Explanation: Merge the three subdocuments into one two-dimensional table, represent them with a new field name cType and append grouping fields to the table. Thus a typical “table with subtables” has been created. The result is as follows:

esProc_report_problems_7

According to this esProc result, it’s easy to create a grouped report with a reporting tool.

There are more similar cases, including querying a multi-level subdocument in a JSON file, subreports using different JSON files, MongoDB collection join, and etc. You can find them in esProc Integrates Heterogeneous Data Sources for Report Development.

Dynamic data sources

Sometimes you might want to reference data sources dynamically through a parameter, merge data sources into one, or dynamically pass data source name to a subreport or Table control. Often reporting tools – especially those supporting single data source, such as BIRT and Jasper – have to use a high-level language to accomplish these computations, or trade security for reduced complexity.

esProc can help to perfect these computations, thanks to its support for dynamic parsing of expressions and for heterogeneous data sources. The following explains esProc way of switching data sources dynamically through parameter:

myDB1 and oraDB are data sources that point to different databases. Each holds a sOrder table with the same structure. The report requires connecting to data sources dynamically via a parameter, querying sOrder for orders whose amounts are greater than 1,000, and displaying them.

Below is a selection of sOrder in myDB1:

OrderID Client SellerId Amount OrderDate
1 WVF 5 440 2009-02-03
2 UFS 13 1863 2009-07-05
4 JFS 27 671 2009-07-08
5 DSG 15 3730 2009-07-09
6 JFE 10 1445 2009-07-10
7 OLF 16 625 2009-07-11
8 PAER 29 2491 2010-07-12
9 DY 20 518 2010-07-15
10 JDR 17 1120 2010-07-16

Below is a selection of sOrder in oraDB:

OrderID Client SellerId Amount OrderDate
101 WAN 22 396 2010-11-07
102 SPL 15 142 2010-11-08
103 LA 23 713 2010-11-11
104 ERN 5 5678 2010-11-11
105 FUR 28 154 2009-11-12
106 BSF 27 10742 2009-11-13
107 RHD 4 569 2009-11-14
108 BDR 12 480 2010-11-15
109 OFS 17 1106 2009-11-18

esProc script:

esProc_report_problems_8

Explanation: Both pSource and pAmount are report parameters. pSource represents the data source name; ${…} indicates parsing a string or a string variable into an expression. pAmount stands for the order amount.

When pSource=“myDB1”, A1 has the following result:

esProc_report_problems_9

When pSource=“oraDB”, A1 gets this result:

esProc_report_problems_10

More similar cases, including multi-data-source pre-join, result sets union, different data sources for main report and subreport, dynamic join between main report and subreports and displaying data by different time units specified by parameter, can be found in esProc Implements Dynamic Data Sources for Reporting Tools.

Heterogeneous data sources

Reporting tools can normally handle a type of single data source. To present the result of hybrid computation involving different types of data sources, the only choice is using a high-level language or ETL tools. The process is extremely tedious. esProc is more suitable for doing this because it supports the hybrid computation. Let’s look at the esProc approach through the join of MongoDB and MySQL:

emp1 is a MongoDB collection, whose CityID field is the logical foreign key pointing to CItyID field of cities, a MySQL table that has two fields – CityID and CityName. You need to query employee records from emp1 according to specified time period and switch its CityID field to CityName of cities.

esProc script:

esProc_report_problems_11

Explanation: Query the two data sources using MongoDB syntax and SQL syntax respectively, and then use esProc switch function to establish foreign key association between them, and finally retrieve the desired fields. Here is the result:

esProc_report_problems_12

Besides, you can perform hybrid computation between JSON, Excel and txt, or between a database and a non-database. More details can be found from esProc Integrates Heterogeneous Data Sources for Report Development.

Parallel data retrieval

Sometimes a considerable amount of data needs to be retrieved from the database for presentation or export. But data retrieval with JDBC is always slow – maybe even slower than other computations involved in the report development. This causes the extreme inefficiency of the development. In contrast, esProc can realize data retrieval with multiple threads in parallel using simple code, which solves the JDBC problem. The following is such an example:

The user status table stores detail data – about 3,600,000 records. You need to present it in report and print it out. The desired layout is like this:

esProc_report_problems_81

esProc script:

esProc_report_problems_13

Explanation: Divide data into 4 parts according to the value range of userid and set 4 threads; each is assigned one part through the parameters and returns the result to the main thread.

Note that only when sufficient database resource is available will the parallel processing increase performance considerably. If the database is already fully loaded, the parallel data retrieval will aggravate its burden instead.

Order-related computations

Order related computations include inter-row comparison, getting relative interval and computing ranks. As a SQL result set doesn’t have explicit sequence numbers, you must generate them through a computed column. This is quite a hassle. For databases that lack sufficient support for window functions, such as MySQL, the code will be complicated.

One of the intrinsic qualities of esProc is its support for order-related computations, making the implementation easier. Let’s look at two cases – computing link relative ratio and making a year-on-year comparison.

The database table sOrder stores orders per day from each seller. The report needs to present results of comparing the sales amount per month during a specified time period with that of the previous month and with that of the same month in the previous year. Below is the source data:

OrderID Client SellerId Amount OrderDate
26 TAS 1 2142 2009-08-05 00:00:00
33 DSGC 1 613 2009-08-14 00:00:00
84 GC 1 89 2009-10-16 00:00:00
133 HU 1 1420 2010-12-12 00:00:00
32 JFS 3 468 2009-08-13 00:00:00
39 NR 3 3016 2010-08-21 00:00:00
43 KT 3 2169 2009-08-27 00:00:00
71 JFE 3 240 2010-10-01 00:00:00
99 RA 3 1731 2009-11-05 00:00:00

Script for computing link relative ratio:

esProc_report_problems_14

Explanation: First group and aggregate data to get sales amount per month per year, and then compute link relative ratio using the formula “sales amount of the current month/sales amount of the previous month”. The result is as follows:

esProc_report_problems_15

Script for comparing sales amount between same months on an annual basis:

esProc_report_problems_16

Explanation: Group data and sort it by months, and then perform the computation with formula “sales amount of the current month/sales amount of the same month in the previous year. The result is as follows:

esProc_report_problems_17

A similar example is computing inventory over the row.

The database table inout stores warehouse-in and -out records per product per day during consecutive days and the initial inventory is zero. You need to display the data in a report and add an extra column to represent stock per day.

product date in out
cola 6/1/2015 100 25
cola 6/2/2015 50 50
cola 6/3/2015 40 0
cola 6/4/2015 20 50
juice 6/1/20 100 50
juice 6/2/2015 10 20
juice 6/3/2015 100 20
juice 6/4/2015 40 35
wine 6/1/2015 50 10
wine 6/2/2015 0 30
wine 6/3/2015 20 10
wine 6/4/2015 30 5

esProc script:

esProc_report_problems_18

Explanation: Compute current day’s stock for each kind of product (through data grouping) with the formula “current day’s warehouse-in – current day’s warehouse-out + warehouse stock of the previous day”. esProc uses [-1] to represent the previous record and produces intuitive expression. Here is the result:

esProc_report_problems_19

If the dates are discontinuous, you need to align data with a sequence of consecutive dates before computing inventory. The script is as follows:

esProc_report_problems_20

Another typical case involving order-related computation is the installment loan.

The loan table stores loan information, including total loan amount, payment terms by month and annual interest rate. You need to build a grouped report in which details of each term of payment for every loan record – payment, interest, principal, principal balance – will be presented. Below is the source data:

LoanID LoanAmt Term Rate
L01 100000 5 4.75
L02 20000 2 5
L03 500000 12 4.5

esProc script:

esProc_report_problems_21

Explanation: First calculate the monthly interest rate and each term’s payment for every loan using a formula; then create a two-dimensional detail table for each loan according to the available information and calculate the current term’s interest, principal and principal balance; finally concatenate these detail tables. Below is the detail table for loan 1:

esProc_report_problems_22

Post-grouping subset handling

Some cases, require non-aggregate operations, such as record query, member modification or set operations, after data is grouped, but it’s hard to do this in SQL as a SQL grouping has a built-in aggregate operation. Being able to handle post-grouping operations and set operations, esProc is better at dealing with those cases.

Compare goal difference

The database table Mytable stores goal differences of each player in each team for every competition. The report needs to present the number of competitions in which a specified player is superior in goal difference compared with every other player in the team. Below is the source data:

Player Team Game Goals
Messi A G1 2
Neymar A G1 0
Messi A G2 2
Neymar A G2 3
Messi A G3 4
Neymar A G3 1
Neymar A G4 5
Messi A G5 2
Neymar B G5 2
Valdes A G1 1

esProc script:

esProc_report_problems_23

Explanation: Use a SQL statement to query the goal difference of the specified player and that of each of his teammates in each competition, and then group data by his teammates and query from each group (set) the competitions (subset) in which the specified player scores more goals than his teammate. You can get the result by counting members of a subset. Here is the result:

esProc_report_problems_24

Tag members of a group

In the database table test, classLev1 and classLev2 are grouping fields of two levels and seqInlev2 represents intragroup sequence numbers. You need to create a list to present data and add a tag field to tag rows in a group. Rules for assigning values to tag field are: Assign “first” to the first row and “last” to the last row, and assign “middle” to those in between; if there is only one row in a group, assign null to it.

Below is the source data:

classLev1 classLev2 name unit seqInlev2
1 1 test1 box 1
1 1 test2 box 2
1 2 test1 box 1
1 2 test2 box 2
1 2 test3 box 3
1 3 test1 box 1
2 1 test1 box 1
2 1 test2 box 2
2 1 test3 box 3
2 1 test4 box 4
2 2 test1 box 1

esProc script:

esProc_report_problems_25

Explanation: Retrieve data with a SQL statement and create an empty field tag; group data and select groups in which there is more than one member; tag all rows with “middle” and then change the first row to “first” and the last row to “last” in each group. Here is the result:

esProc_report_problems_26

Querying branch stores where DVDs are in short supply

The Branch table stores information of branch stores selling DVDs; the DVD table stores titles of DVDs and categories which they belong to; DVDCopy table stores information about DVD copies (the real products), and associates with the Branch table through BranchID field and with DVD table through DVDID field. That the Status field in DVDCopy table is “Miss” means this copy is missing; if LastDateReturned field is empty, the copy has been borrowed but hasn’t returned yet. The report needs to present information of branch stores where the available DVD copies have fewer than 4 categories. Below is the source data:

Branch table:

BID Street City
B001 street1 New York
B002 street2 Houston
B003 street3 LA
B004 street4 Lincoln

DVD table:

DVDID Category Title
D001 science fiction Transformers IV
D002 science fiction Transformers II
D003 science fiction Guardians of the Galaxy
D004 act The Expendables III
D005 sport Need for Speed
D006 feature Grace of Monaco

DVDCopy table:

CopyID DVDID BID Status LastDateRented LastDateReturned MemberID
C000 D001 B001 7/10/2014 7/13/2014 M001
C001 D004 B001 7/10/2014 7/13/2014 M001
C002 D001 B001 7/10/2014 M001
C003 D005 B001 7/10/2014 7/13/2014 M003
C004 D006 B001 7/10/2014 7/13/2014 M003
C009 D004 B002 7/10/2014 7/13/2014 M009
C010 D005 B002 7/10/2014 7/13/2014 M010
C011 D006 B002 Miss 7/10/2014 7/13/2014 M010

esProc script:

esProc_report_problems_27

Explanation: Select available DVD copies possessed by branch stores from DVDCopy table and group them by BID to get copies of each branch store; find DVDs corresponding to the DVD copies each branch store has and calculate how many categories where they fall in; find branch stores that have fewer than 4 categories of available DVD copies and that don’t exist in the DVDCopy table (both types are eligible); then combine data of the two types stores. The result is as follows:

esProc_report_problems_28

Fixed grouping

Sometimes the grouping criteria come from other database tables or files, or are defined by external parameters, and their number is large and there is big difference between each other; other times the grouping criteria are dynamic and overlap each other. The problem is that SQL grouping criterion must be some field in the current database table, so it’s difficult to handle this kind of fixed grouping in SQL. esProc can easily implement this algorithm because it supports alignment grouping, enumeration grouping and the flexible equal grouping.

Grouping data by a parameter list

The sales table stores ordering records, in which CLIENT filed holds client names and AMOUNT field holds order amount. The report requires that CLIENT be grouped by the parameter list argClient and that AMOUNT be aggregated to calculate sums. Below is the source data:

OrderID Client SellerId Amount OrderDate
26 TAS 1 2142.4 2009/8/5
33 DSGC 1 613.2 2009/8/14
84 GC 1 88.5 2009/10/16
133 HU 1 1419.8 2010/12/12
32 JFS 3 468 2009/8/13
39 NR 3 3016 2010/8/21
43 KT 3 2169 2009/8/27

If the parameter list has relatively small number of members, you can compose CLIENT into a dummy table with SQL union statement. If it has a large number of members, you can store data permanently in a database table before composing. This problem, however, has a dynamic, external parameter list with unknown number of members and containing members that even don’t exist in CLIENT field. To solve it in SQL, you must create a temporary table. It’s unnecessary to create such a temporary table in esProc. The following is esProc code:

esProc_report_problems_29

Explanation: Group sales with align function by the parameter list, calculate sum for each group and join the result with the parameter list. Here is the result:

esProc_report_problems_30

Grouping data by overlapped criteria

Suppose you want to group and aggregate the sales table according to criteria that are probably overlapped, such as the order amount is between 1,000 and 4,000, less than 2,000 and greater than 3,000. You cannot implement the algorithm in SQL without using a great many of union/with/minus statements, and this really is difficult when the criteria are specified through a parameter. In contrast, esProc does this with just the following code:

esProc_report_problems_31

Explanation: enum function can perform grouping operation with grouping criteria specified by an external parameter. @r allows duplicate members. Here is the result:

esProc_report_problems_32

Cross-tab reports with rows and columns presented in intervals

You might want to count employees with a cross-tab report. Columns are grouped by age ranges and rows are grouped by salary ranges. For instance, 5 columns –  1-30,31-32,33-35,36-40,41-50, and 3 rows – 1-5000,5001-10000,10001-20000 – would be displayed if you type in [30,32,35,40,50] and [5000,10000,20000] respectively. The following is source data:

EId Name SurName Gender State Birthday Salary HireDate Dept
2 Ashley Wilson F New york 1980-07-19 00:00:00 11000 2008-03-16 00:00:00 2
3 Rachel Johnson F New mexico 1970-12-17 00:00:00 9000 2010-12-01 00:00:00 1
4 Emily Smith F Texas 1985-03-07 00:00:00 7000 2006-08-15 00:00:00 3
5 Ashley Smith F Texas 1975-05-13 00:00:00 16000 2004-07-30 00:00:00 1
6 Matthew Johnson M California 1984-07-07 00:00:00 11000 2005-07-07 00:00:00 4
7 Alexis Smith F Illinois 1972-08-16 00:00:00 9000 2002-08-16 00:00:00 2
8 Megan Wilson F California 1979-04-19 00:00:00 11000 1984-04-19 00:00:00 5
9 Victoria Davis F Texas 1983-12-07 00:00:00 3000 2009-12-07 00:00:00 6
10 Ryan Johnson M Pennsylvania 1976-03-12 00:00:00 13000 2006-03-12 00:00:00 1
11 Jacob Moore M Texas 1974-12-16 00:00:00 12000 2004-12-16 00:00:00 3
12 Jessica Davis F New york 1980-09-11 00:00:00 7000 2008-09-11 00:00:00 4
13 Daniel Davis M Florida 1982-05-14 00:00:00 10000 2010-05-14 00:00:00 5

esProc script:

esProc_report_problems_33

Explanation: Group A1 by parameter intervals argAge and argSalary, calculate the number of employees in each group, and finally change the column and row names to a format permitted by the report. You can also calculate ages with SQL, but the esProc age function can prevent SQL code from being altered during database migration. pseg function will return a member’s sequence number in the interval. This is the result you’ll get:

esProc_report_problems_34

Post-grouping order-related computations

Performing order-related computations based on grouped data makes code writing in SQL even harder. But usually esProc handles it easily.

Calculating the remaining class periods

Every student has the same 10-day long class periods. The database table Course stores each student’s attendance information during these days. The report needs to present the classes that are already attended and the remaining class periods for each student each month. Below is the source data:

ID Name Date
1 Sandy 2015-05-06
2 Candy 2015-05-06
3 Sandy 2015-05-28
4 Candy 2015-05-29
5 Candy 2015-06-01

esProc script:

esProc_report_problems_35

Explanation: Perform group and aggregate operations to calculate the number of attended classes, and add a computed column to get the remaining number of class periods. Here is the result:

esProc_report_problems_36

Getting top N members

The database table sOrder stores orders per day from each seller. The report needs to present each seller’s N biggest orders. N is a parameter. Below is the source data:

OrderID Client SellerId Amount OrderDate
26 TAS 1 2142 2009-08-05 00:00:00
33 DSGC 1 613 2009-08-14 00:00:00
84 GC 1 89 2009-10-16 00:00:00
133 HU 1 1420 2010-12-12 00:00:00
32 JFS 3 468 2009-08-13 00:00:00
39 NR 3 3016 2010-08-21 00:00:00
43 KT 3 2169 2009-08-27 00:00:00
71 JFE 3 240 2010-10-01 00:00:00
99 RA 3 1731 2009-11-05 00:00:00

esProc script:

esProc_report_problems_37

Explanation: Group data by SellerID; get N records that have the biggest Amount values from each group with top function and concatenate them with union function. -Amount means getting records from large to small values. Here is the result:

esProc_report_problems_38

Use A2.(~.maxp@a(Amount)) to retrieve the record, or the records holding the greatest value from each group. To retrieve the first record holding the greatest value from each group, sort the records by Amount in SQL and use A1.group@1(SellerId). Use A2.group(SellerId;~.to(argM,argN)).conj(#2) to get records from the Mth to the Nth from each group. To compare the Amount field values between the first record and the second record in a group, use A1.group(SellerId;~(1).Amount-~.m(2).Amount). The m function can get members by their sequence numbers and judge where a group ends (when a group has only one record).

Modifying data to make even value for each group

Suppose you want to create a grouped report that gets data from the database table stock, which makes Title as the grouping field and has detail fields ID and Number. But you need to modify the smallest Number value in each group to ensure that the sum of Number values is 800. Below is the source data:

ID Title Number
1 A 200
2 A 300
3 B 400
4 C 500

esProc script:

esProc_report_problems_39

Explanation: Sort data by Title and Number and then group data by Title; loop through each group to modify the first Number value so that the sum of Number is 800. Here is the result:

esProc_report_problems_40

String splitting and concatenation

Sometimes you need to handle structured data processing based on strings generated from splitting a larger string. Other times you need to split one row into multiple rows or combine multiple rows into one. It’s difficult to implement these calculations in SQL. esProc, however, is better at handling them thanks to its support for explicit sets.

Splitting one row into multiple rows

The database table data has two fields. You want to split one of the fields – ANOMOALIES – into multiple strings and combine each string with the corresponding ID value in the original table to generate new records. Below is the source data:

ID ANOMALIES
3903 B1 D1 CAT1
3904 D7 D2 B1 CAD4

esProc script:

esProc_report_problems_41

Explanation: Split ANOMOALIES field of each record into a sequence according to spaces, create two-dimensional tables according to the sequences respectively and concatenate these two-dimensional tables. This is the result you’ll get:

esProc_report_problems_42

Concatenating members by grouping fields

In the database table table1, the first three fields are grouping fields and the fourth one is the detail field. You need to build a report where values of the fourth field are combined according to the grouping fields. Below is the source data:

Col1 Col2 Col3 Col4
10 1234 Sam A12G3
10 1234 Sam K78DE
10 1234 Sam MAT12
20 1456 Tom E12F4
20 1456 Tom KAR3R
20 3217 Tom G45G4

esProc script: myDB1.query(“select * from table1”).group(Col1,Col2,Col3;~.(Col4).string():Col4

Explanation: The aim of this case is to implement group_concat. But not every database supports this function. You can use esProc to achieve the same target, regardless of which database you are using. The following is the result:

esProc_report_problems_43

Date handling

A sequence containing datetime data is more complicated than the one consisting of natural numbers. Thus you may found it’s very difficult to handle dates in SQL, such as getting work days, intersection of some dates and combining time intervals into a larger one. Equipped with built-in date functions and ordered sets, esProc is convenient to use in date handling.

Calculating work days

The database table project stores project information including beginning and ending dates. The report needs to present the information and an extra column for displaying the number of work days. Below is the source data:

pid pname manager beginDate endDate
p041-27 project1 steven 2014-01-08 2014-07-29
p042-v-1 project2 steven 2014-04-01 2014-06-05
p03-09 project3 mike 2014-08-12 2015-01-31
p02-04 project4 elisa 2014-09-22 2015-09-22

esProc script:

esProc_report_problems_44

Explanation: With workdays function, it’s easy to calculate the number of work days during a certain time period. Here is the result:

esProc_report_problems_45

Calculating peak travel dates

The database table trip stores business trip information. You need to get the 10 busiest dates and sort them by the number of trips. Below is the source data:

       star_date end_date
2015-01-05 2015-03-02
2015-02-01 2015-02-10
2015-02-05 2015-02-09
2015-02-20 2015-04-05
2015-03-06 2015-03-15
2015-03-08 2015-03-11
2015-03-10 2015-03-22
2015-03-12 2015-03-25
2015-03-15 2015-03-23
2015-03-15 2015-03-19
2015-03-18 2015-03-29

esProc script:

esProc_report_problems_46

Explanation: Split each trip into a sequence of dates according to the beginning and ending dates, concatenate the sequences and perform group and aggregate by dates; and then sort the records to get the first 10 ones. Here is the result:

esProc_report_problems_47

Merging overlapped time intervals

In the database table project, beginDate and endDate represent the beginning date and ending date of each project. The report needs to present the time intervals with or without projects (each is designated by “busy” and “free”) in sequence. Below is the source data:

pname beginDate endDate
project1 2015-01-06 2015-01-31
project2 2015-01-15 2015-02-11
project3 2015-03-10 2015-03-22
project4 2015-03-12 2015-03-25
project5 2015-03-15 2015-03-23
project6 2015-03-25 2015-04-10

esProc script:

esProc_report_problems_48

Explanation: A2-A4 merges smaller time intervals into bigger ones. This is done by first removing dates within larger intervals. That is, if the ending date of a time interval is after the beginning date of the next interval, then merge them as a bigger interval from the beginning date of the first interval. After that, group data by dates and use the ending date of the last record in each group as the bigger interval’s ending date. A5 calculates free time intervals. Finally, concatenate the busy and free time intervals and perform sort. Here is the result:

esProc_report_problems_49

 

Advertisements

About datathinker

a technical consultant on Database performance optimization, Database storage expansion, Off-database computation. personal blog at: datakeywrod, website: raqsoft
This entry was posted in Report. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s