An Incremental Load with Data Archive

I would like to share an incremental load script for QlikView and Qlik Sense that I wrote in 2014. Unlike other solutions you can find today, mine does not require the definition of a primary key, and it provides data archive capability, letting you store the archive in a separate location for security. I hope you like it.

The main idea of the solution is to split your source data into separate data partitions, so each partition is small, easy to load, and safe to back up. You can define your partition rule by date, by part type, or by any field you have in your data. The assumption is that there are no updates to older partitions; new updates exist only in recent partitions. My code partitions data by date, which is the most common scenario in staging, but you can modify it for partition rules not based on dates.

In your environment, you would use this script to load data from your source application, store it in archive QVDs based on your data partition rule, and update your staging QVD from the archive QVDs. The resulting staging QVD can then be loaded into your QlikView / Qlik Sense application or your transformation script.

The main program requires the definition of the following variables:

  • vArchiveQVDPath: Storage path to your archive QVDs.
  • vTableQVDPath: Storage path to your staging QVD.
  • vQVDName: File name prefix of your archive and staging QVDs.
  • vOldPartitionRefreshThresholdInDays: The number of days before today to define which archive QVDs require refresh.

The following code provides example values for these variables.

SET vArchiveQVDPath = '..\Data\Archive QVD\';
SET vTableQVDPath = '..\Data\Staging QVD\';
SET vQVDName = 'Test';
SET vOldPartitionRefreshThresholdInDays = 30;

In the example, all of our archive QVD file names will start with “Test_”, and the staging QVD will be named “Test.qvd”. An archive QVD is refreshed if its partition start date falls within 30 days of the day you run your script.
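To make the threshold rule concrete, here is a minimal Python sketch of the same decision; the function name and dates are illustrative and not part of the Qlik script. It mirrors the test the main script performs: a partition is refreshed when its start date falls within the threshold window of the run date.

```python
from datetime import date

def needs_refresh(partition_start: date, run_date: date, threshold_days: int = 30) -> bool:
    # Mirrors the Qlik test: Today() - PartitionStart < threshold
    return (run_date - partition_start).days < threshold_days

# Started 10 days before the run: inside the 30-day window, refresh it.
print(needs_refresh(date(2014, 3, 11), date(2014, 3, 21)))   # True
# Started 101 days before the run: leave the archive QVD untouched.
print(needs_refresh(date(2013, 12, 1), date(2014, 3, 12)))   # False
```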

Next, the script requires the following two sub-routines:

  • BuildStagingQVD: Build staging QVD by combining all archive QVDs.
  • BuildPartitionQVD: Build an archive QVD based on your definition of a data partition.

The following sections describe these sub-routines in detail.



SUB BuildStagingQVD (vQVDName, vNewQVDName)
  TRACE [Fctl Call] BuildStagingQVD ('$(vQVDName)', '$(vNewQVDName)');
  QVD_File:
  Load '' As _tmp AutoGenerate 0;
  Let vQVDFileMask = '$(vArchiveQVDPath)' & '$(vQVDName)' & '_*.qvd';
  For each vFile in filelist('$(vQVDFileMask)')
    Concatenate (QVD_File) LOAD * From [$(vFile)] (qvd);
  Next vFile
  Drop Field _tmp From QVD_File;
  If IsNull('$(vNewQVDName)') or '$(vNewQVDName)'='' then
    Let vNewQVDName = vQVDName;
  ENDIF
  STORE QVD_File into [$(vTableQVDPath)$(vNewQVDName).qvd] (qvd);
  Drop Tables QVD_File;
END SUB

The BuildStagingQVD sub-routine builds the staging QVD by combining all archive QVDs. It has the following two parameters:

  • vQVDName: File name prefix of your (source) archive QVDs.
  • vNewQVDName: File name of your (target) staging QVD. This parameter is optional. If not provided, it is set to be the same as the value of vQVDName.

It starts by reading the archive QVDs with the name prefix given by vQVDName at the location given by vArchiveQVDPath. It reads the archive QVDs one at a time and concatenates them into a temporary table called QVD_File. Once all archive QVDs with the specified prefix have been read, it writes the resulting temporary table into the staging QVD file named by vNewQVDName at the staging location given by vTableQVDPath.
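Outside Qlik, the concatenation step can be sketched in Python. Here a dict of lists stands in for the archive QVD files (real QVD I/O is omitted), and every entry matching the hypothetical '&lt;prefix&gt;_*.qvd' mask is appended in file-name order; all names below are illustrative.

```python
def build_staging(archive, prefix):
    """archive maps file name -> list of records, standing in for QVD files.
    Concatenate every partition whose name matches '<prefix>_*.qvd',
    in sorted file-name (i.e. partition-date) order."""
    staging = []
    for name in sorted(archive):
        if name.startswith(prefix + "_") and name.endswith(".qvd"):
            staging.extend(archive[name])
    return staging

archive = {
    "Test_20140101.qvd": [{"id": 1}],
    "Test_20140401.qvd": [{"id": 2}, {"id": 3}],
    "Other_20140101.qvd": [{"id": 9}],   # different prefix: ignored
}
print(build_staging(archive, "Test"))    # [{'id': 1}, {'id': 2}, {'id': 3}]
```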



SUB BuildPartitionQVD (vQVDFullName, vLowerBound, vUpperBound)
  // Note: vQVDFullName needs to be a complete file name with file extension, excluding file path.
  TRACE [Fctl Call] BuildPartitionQVD ('$(vQVDFullName)', '$(vLowerBound)', '$(vUpperBound)');
  Let vPartitionQVDFile = '$(vArchiveQVDPath)' & '$(vQVDFullName)';
  TRACE This is $(vQVDFullName).;
  Partition_Table: LOAD *;
  SQL SELECT * FROM RecDisc.dbo.SP_TABLE_ARC
    WHERE "Import Date">=CAST('$(vLowerBound)' AS DATE) AND "Import Date"<CAST('$(vUpperBound)' AS DATE); // SQL Server
  // WHERE TSH.TS_DT>=TO_DATE('$(vLowerBound)','YYYYMMDD') AND TSH.TS_DT<TO_DATE('$(vUpperBound)','YYYYMMDD'); // Oracle
  Rec_Count: NoConcatenate LOAD COUNT(1) As PTAB_CNT Resident Partition_Table;
  If Peek('PTAB_CNT')>0 then
    Store Partition_Table into [$(vPartitionQVDFile)] (qvd);
  ELSE
    TRACE [Info] No records are found. Skip partition build.;
  ENDIF
  DROP Tables Partition_Table, Rec_Count;
END SUB

The BuildPartitionQVD sub-routine builds an archive QVD based on your definition of a data partition (by date in our code here). The sub-routine has three input parameters:

  • vQVDFullName: Full name of an archive QVD to write to. This will be provided by our main script.
  • vLowerBound: Lower bound used in the SQL command to get data from your source application.
  • vUpperBound: Upper bound used in the SQL command to get data from your source application.

It starts by reading data from your source application, limited by the vLowerBound and vUpperBound values provided by the main script. This is the part where you need to replace the following code with a SQL command specific to your source application database.

TRACE This is $(vQVDFullName).;
Partition_Table: LOAD *;
SQL SELECT * FROM RecDisc.dbo.SP_TABLE_ARC
  WHERE "Import Date">=CAST('$(vLowerBound)' AS DATE) AND "Import Date"<CAST('$(vUpperBound)' AS DATE); // SQL Server
// WHERE TSH.TS_DT>=TO_DATE('$(vLowerBound)','YYYYMMDD') AND TSH.TS_DT<TO_DATE('$(vUpperBound)','YYYYMMDD'); // Oracle

It then stores the data in a temporary table, Partition_Table. If the table contains records, it writes it into the archive QVD file named by vQVDFullName at the location given by vArchiveQVDPath; otherwise, it skips the partition build.

Note that I purposely refresh the whole archive partition with data from the source application for performance reasons. If we keep an archive QVD small, it is faster to refresh the whole file (or data partition) than to update it or append data to it. The objective is to keep the process simple and fast.
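The overwrite-or-skip behavior can be sketched in Python; the dict again stands in for the archive folder, and the function name and file names are hypothetical. The point is that a partition file is rebuilt wholesale, and a source query returning no rows leaves any existing file untouched.

```python
def build_partition(source_rows, archive, qvd_full_name):
    """Rebuild one archive file wholesale from the source query's rows.
    When the query returns nothing, skip the write so any existing
    archive file for that partition is left untouched."""
    if source_rows:
        archive[qvd_full_name] = list(source_rows)
    return archive

archive = {"Test_20140101.qvd": [{"id": 1}]}
build_partition([], archive, "Test_20140401.qvd")           # empty: no file written
build_partition([{"id": 2}], archive, "Test_20140401.qvd")  # create/overwrite
print(sorted(archive))   # ['Test_20140101.qvd', 'Test_20140401.qvd']
```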

Now let us look at the main script below to see how the whole piece works together.

Main Script

Our main script starts with the creation of a data partition reference table: Partition_List.


Partition_List:
LOAD PartitionNumber, Num(PartitionStart) As PartitionStart, Num(PartitionEnd) As PartitionEnd;
// SQL Server, first partition
SQL SELECT 1 As PartitionNumber,
  DATEADD(q, DATEDIFF(q, 0, Min("Import Date")), 0) As PartitionStart,
  DATEADD(q, DATEDIFF(q, 0, Min("Import Date"))+1, 0) As PartitionEnd
  FROM RecDisc.dbo.SP_TABLE_ARC;
// Oracle, first partition
// SQL SELECT 1 As "PartitionNumber",
//   TRUNC(Min(ACCRUAL_PROC_DT),'YEAR') As "PartitionStart",
//   ADD_MONTHS(TRUNC(Min(ACCRUAL_PROC_DT),'YEAR'),12) As "PartitionEnd"
Do While Peek('PartitionEnd') < QuarterEnd(Today())
  LOAD Peek('PartitionNumber')+1 As PartitionNumber,
    AddMonths(Peek('PartitionStart'), 3) As PartitionStart,
    AddMonths(Peek('PartitionEnd'), 3) As PartitionEnd
  Autogenerate 1;
Loop

In the example, the main script looks at the minimum of the field “Import Date” in our source application table “RecDisc.dbo.SP_TABLE_ARC” and uses the start of that quarter as the beginning date of our first data partition, “PartitionStart”. Then, assuming a data partition covers one quarter of data by date, it sets the ending date of the partition as “PartitionEnd”. This becomes our first partition; the loop then adds further partition reference records until we reach today’s date. This results in our “Partition_List” reference table with the following fields:

  • PartitionNumber
  • PartitionStart (inclusive)
  • PartitionEnd (exclusive)

This is the part of the code you would replace to match your own source application data and partition rule.
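For illustration only (this is not the author's Qlik code), the same quarterly [PartitionStart, PartitionEnd) list can be generated in Python from a minimum import date; the helper names and dates are made up.

```python
from datetime import date

def quarter_start(d):
    # First day of the quarter containing d.
    return date(d.year, 3 * ((d.month - 1) // 3) + 1, 1)

def add_quarter(d):
    # Same day three months later (safe here: we only pass month starts).
    m = d.month + 3
    return date(d.year + (m - 1) // 12, (m - 1) % 12 + 1, d.day)

def build_partition_list(min_import_date, today):
    # Quarterly [PartitionStart, PartitionEnd) rows, up to the quarter containing today.
    partitions, start, n = [], quarter_start(min_import_date), 1
    while True:
        end = add_quarter(start)
        partitions.append((n, start, end))
        if end > today:
            break
        start, n = end, n + 1
    return partitions

print(build_partition_list(date(2014, 2, 15), date(2014, 8, 1)))
# → [(1, 2014-01-01, 2014-04-01), (2, 2014-04-01, 2014-07-01), (3, 2014-07-01, 2014-10-01)]
```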

Next, we define our staging QVD file name and path.

Let vStagingFile = '$(vTableQVDPath)' & '$(vQVDName)' & '.qvd'; // Staging QVD path and file name.

If our reference table has only one record, we need only one archive QVD. We call our BuildPartitionQVD sub-routine to create the archive QVD and then our BuildStagingQVD sub-routine to create the staging QVD.


If Peek('PartitionNumber')=1 then
  // Refresh vPartitionQVD
  Let vPartitionQVD = '$(vQVDName)' & '_' & Date(Peek('PartitionStart',-1,'Partition_List'),'YYYYMMDD') & '.qvd';
  TRACE [Info] Have only 1 partition. Refresh $(vPartitionQVD).;
  Call BuildPartitionQVD('$(vPartitionQVD)', Date(Peek('PartitionStart',-1,'Partition_List'),'YYYYMMDD'), Date(Peek('PartitionEnd',-1,'Partition_List'),'YYYYMMDD'));
  Call BuildStagingQVD('$(vQVDName)', Null());

Otherwise, we define the last two archive QVDs as vOldPartitionQVD and vNewPartitionQVD. If the PartitionStart of vNewPartitionQVD is more than vOldPartitionRefreshThresholdInDays days old, we refresh only vNewPartitionQVD; otherwise, we refresh both archive QVDs, followed by the creation of the staging QVD.
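That branching can be summarized in a short Python sketch; the function and its return labels are illustrative stand-ins for the Qlik If/ELSEIF logic, not part of the script itself.

```python
from datetime import date

def plan_refresh(new_exists, old_exists, new_partition_start, run_date, threshold_days=30):
    """Return which archive partitions to rebuild before the staging QVD
    is regenerated, mirroring the main script's branching."""
    if new_exists:
        plan = ["new"]
        if (run_date - new_partition_start).days < threshold_days:
            plan.append("old")   # newest partition is young: the old one may still change
        return plan
    if old_exists:
        return ["old", "new"]    # a new quarter just started: create it, refresh the old
    return ["all"]               # nothing archived yet: build every partition

# New partition started 5 days ago: refresh both of the newest partitions.
print(plan_refresh(True, True, date(2014, 6, 28), date(2014, 7, 3)))    # ['new', 'old']
# No archive files exist yet: rebuild everything.
print(plan_refresh(False, False, date(2014, 6, 28), date(2014, 7, 3)))  # ['all']
```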


ELSE // >=2
  Let vNewPartitionQVD = '$(vQVDName)' & '_' & Date(Peek('PartitionStart',-1,'Partition_List'),'YYYYMMDD') & '.qvd';
  Let vOldPartitionQVD = '$(vQVDName)' & '_' & Date(Peek('PartitionStart',-2,'Partition_List'),'YYYYMMDD') & '.qvd';
  If Today()-Peek('PartitionStart',-1,'Partition_List') < $(vOldPartitionRefreshThresholdInDays) then
    Set vRefreshOldPartition = 1;
  ELSE
    Set vRefreshOldPartition = 0;
  ENDIF
  // If vNewPartitionQVD exists, refresh vNewPartitionQVD.
  If Filesize('$(vArchiveQVDPath)' & '$(vNewPartitionQVD)')>0 then
    TRACE [Info] Have multiple partitions. Current partition exists. Refresh $(vNewPartitionQVD).;
    Call BuildPartitionQVD('$(vNewPartitionQVD)', Date(Peek('PartitionStart',-1,'Partition_List'),'YYYYMMDD'), Date(Peek('PartitionEnd',-1,'Partition_List'),'YYYYMMDD'));
    If $(vRefreshOldPartition)=1 then
      TRACE [Info] Old partition end date is within the refresh threshold of $(vOldPartitionRefreshThresholdInDays) days. Refresh $(vOldPartitionQVD).;
      Call BuildPartitionQVD('$(vOldPartitionQVD)', Date(Peek('PartitionStart',-2,'Partition_List'),'YYYYMMDD'), Date(Peek('PartitionEnd',-2,'Partition_List'),'YYYYMMDD'));
    ENDIF
    TRACE [Info] Build staging QVD.;
    Call BuildStagingQVD('$(vQVDName)', Null());
  // If vNewPartitionQVD does not exist but vOldPartitionQVD exists, refresh vOldPartitionQVD and create vNewPartitionQVD.
  ELSEIF Not IsNull(Filesize('$(vArchiveQVDPath)' & '$(vOldPartitionQVD)')) then
    TRACE [Info] Have multiple partitions. Current partition does not exist but old one exists.;
    TRACE [Info] Refresh both $(vNewPartitionQVD) and $(vOldPartitionQVD).;
    Call BuildPartitionQVD('$(vOldPartitionQVD)', Date(Peek('PartitionStart',-2,'Partition_List'),'YYYYMMDD'), Date(Peek('PartitionEnd',-2,'Partition_List'),'YYYYMMDD'));
    Call BuildPartitionQVD('$(vNewPartitionQVD)', Date(Peek('PartitionStart',-1,'Partition_List'),'YYYYMMDD'), Date(Peek('PartitionEnd',-1,'Partition_List'),'YYYYMMDD'));
    TRACE [Info] Rebuild staging QVD.;
    Call BuildStagingQVD('$(vQVDName)', Null());
  // If neither vNewPartitionQVD nor vOldPartitionQVD exists, create all files.
  ELSE
    TRACE [Info] Have multiple partitions but they do not exist yet. Build all partitions.;
    FOR vPartitionStart = 1 to Peek('PartitionNumber')
      Let vPartitionQVD = '$(vQVDName)' & '_' & Date(Peek('PartitionStart',$(vPartitionStart)-1,'Partition_List'),'YYYYMMDD') & '.qvd';
      Call BuildPartitionQVD('$(vPartitionQVD)', Date(Peek('PartitionStart',$(vPartitionStart)-1,'Partition_List'),'YYYYMMDD'), Date(Peek('PartitionEnd',$(vPartitionStart)-1,'Partition_List'),'YYYYMMDD'));
    NEXT vPartitionStart
    TRACE [Info] Build staging QVD.;
    Call BuildStagingQVD('$(vQVDName)', Null());
  ENDIF
ENDIF


What to Say When the Police Tell You to Stop Filming Them

Indeed, it is important to know your rights when you are recording a public servant in the US.

Reference: What to Say When the Police Tell You to Stop Filming Them

The part-time jobs of superheroes illustrated

Saving the world is great, everybody likes you and you get tons of media exposure. Hell, you even get books and movies written about you. But saving the world doesn’t pay the bills.

Illustrator Flying Mouse 365, whose nickname sounds like a superhero’s name, created many designs imagining what part-time jobs superheroes would have to take on when short of money.

Reference: The part-time jobs of superheroes illustrated

Awesome R

A curated list of awesome R frameworks, packages and software. Inspired by awesome-machine-learning.

Reference: Awesome R

Tiny lab devices could attack huge problem of drug-resistant infections

This is another research project involving “microfluidic” testing. This technology is related to the following articles I mentioned previously.

Reference: Tiny lab devices could attack huge problem of drug-resistant infections

Interworks Blog: 10 Questions

Interworks has a series of interviews with people working in data visualization. Check out this blog for their valuable experiences.

Reference: Interworks Blog: 10 Questions

Cheaper Robots, Fewer Workers

This NYT article continues to echo the movement toward robotic manufacturing. It is an interesting special report on the manufacturing trend in China, where the government would like to maintain the country’s world leadership in low-cost manufacturing.

Listed below are my past blog posts about robots and automation: