1. What is a subset?

The Subset is a "knowledge base" covering a section of the DB2/400 database. It represents the database analyst's understanding of the file relationships. The Subset is the "data source" that you use in Jacana when designing reports.

A Subset can be thought of as a map of a part of the DB2/400 database. When the user designs data fields into a report, the product uses this map to "navigate" the data base, and assemble the necessary fields to produce the requested output.

  • A Subset need not reflect an existing application; it can contain data from any files that can be logically associated, either directly or indirectly via other files. This means a new Subset can be created that covers an area already partially covered by an existing Subset.
  • A Subset should contain as much as possible of the information likely to be required for a specific reporting purpose. A Subset should NOT reflect the requirements of a single report. New Subsets should not be created at random, as their proliferation can lead to maintenance and security problems.

Data is accessed according to the file relationships defined using the Retrieval map function and the fields required by the report definition. RPG/400 or RPG/ILE programs are generated to interrogate the database files needed to produce the requested output.


2. What is a Data Group?

Data Groups are used to bring together logically related fields. A group is the 'unit' of information that OPEN/DB works with. Just as conventional programs read and write files, OPEN/DB reads and writes groups. A group is a "virtual record format". It is defined to OPEN/DB implicitly during Retrieval map creation and maintenance, and can be modified by activities performed using the Data dictionary function.

When a database file is linked into the Subset, its relationship to files already defined in the Subset is expressed in terms of the number of matching records expected to be found. Files are linked either as "one-to-one" or as "one-to-many" relationships. Each "one-to-many" relationship defines a new "Data Group". Fields and files that share a one-to-one relationship are defined into the same Data Group.

Each Subset is made up of at least two groups, CTL and G01. 

The CTL (or "control") group contains fields that usually do not relate directly to the database but are required when designing a report. Fields such as Page number, Report name, Job name, User name, etc. are defined into the "CTL" group.

Every field in the Data dictionary exists in one and only one data group. Fields that come into existence as a result of a report definition (e.g. calculations or work fields) are automatically placed into the correct data group during the report definition process. For example, select/omit tests defined by the report designer are placed into the control group. 

Group G01 contains all the fields from the Subset's "Based-On-File" (BOF) and all other data values (eg. "Derived fields") that have a one-to-one relationship with the BOF.
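To make the grouping rule concrete, here is a minimal sketch (Python, not RPG; the file names and link list are hypothetical, and real group assignment happens implicitly inside the Retrieval map function):

```python
# Hypothetical illustration only: file names and the link list are invented;
# inside OPEN/DB the grouping is implicit in the Retrieval map definition.

def assign_groups(links):
    """links: list of (parent_file, child_file, relation), relation '1:1' or '1:N'.
    The based-on-file goes into group G01; each one-to-many link creates a
    new group, while one-to-one links join the parent file's group."""
    groups = {links[0][0]: "G01"}  # the based-on-file (BOF) is group G01
    next_group = 1
    for parent, child, relation in links:
        if relation == "1:1":
            groups[child] = groups[parent]        # same group as the parent
        else:
            next_group += 1
            groups[child] = f"G{next_group:02d}"  # a new Data Group
    return groups

print(assign_groups([("ORDERS", "CUSTOMER", "1:1"),
                     ("ORDERS", "LINES", "1:N")]))
# → {'ORDERS': 'G01', 'CUSTOMER': 'G01', 'LINES': 'G02'}
```

Here the one-to-one link places CUSTOMER fields in G01 alongside the BOF, while the one-to-many link to LINES opens a new group, G02.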


3. What is Extra Logic?

Extra logic is a powerful feature of OPEN/DB. It allows the Subset designer to specify custom program code to perform functions outside the scope of OPEN/DB and to define and populate fields that do not exist in the data base ("Derived fields"). 

Extra logic is introduced to the Subset using the "XLOGIC" function: by adding code at the insert points, by using Macros, and by direct association with a derived field using the Data dictionary function.

A practical example of the application of extra logic to customize a Subset is where a client requires custom security over their data. Consider the following scenario. 

A site operates a proprietary inventory application. The application allows the site to control data access through its on-line maintenance programs. This is achieved via a "security" file that controls whether a given user is permitted to access/maintain inventory data for a given warehouse. 

This provides adequate security within the application, but when a third party product is used for data access (e.g. IBM Query/400) a security problem arises. OS/400 security supports control over access to the file as a whole and to the Query definition. To control which records from within the file a given user may see in a report, a new Query must be written for each user so that record selection is used to only select warehouses to which the user is authorised. Each Query must then have its access rights set such that only the correct user may run it, and no user may change it except the security administrator. Ten users may well mean ten new Query/400 definitions and ten actions to set correct object security. 

Using extra logic can simplify this greatly. A Subset is defined to access the inventory file. Extra logic is added to check the security file to see whether the current user is allowed to see the current record. If the user is not permitted, the record is simply bypassed and the next record read. Now any user can define reports, and other users can run them rather than create new reports that serve essentially the same purpose. No matter which user runs a report, the output will only ever show data to which that user is authorised.

This security is simple to apply. It is done once only, in one place. It is applied automatically to all future reports based on the Subset. It integrates seamlessly with the existing security of the proprietary application. It is invisible to end users. It is maintenance free.
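The record-bypass behaviour described above can be sketched as follows. This is a minimal Python illustration with invented file contents and field names; the real extra logic is code inserted into the generated RPG program:

```python
# Hypothetical data shapes: the security "file" is modelled as a set of
# authorised (user, warehouse) pairs; inventory records are dicts.

def filtered_records(inventory_records, security_rows, current_user):
    """Yield only the records the current user may see; unauthorised records
    are simply bypassed and the next record read, as in the extra logic."""
    for rec in inventory_records:
        if (current_user, rec["warehouse"]) in security_rows:
            yield rec
        # else: bypass the record

security = {("ALICE", "WH1"), ("BOB", "WH2")}
inventory = [{"warehouse": "WH1", "item": "A100"},
             {"warehouse": "WH2", "item": "B200"}]

print(list(filtered_records(inventory, security, "ALICE")))
# → [{'warehouse': 'WH1', 'item': 'A100'}]
```

Because the filter sits in the Subset rather than in any one report, every report built over the Subset inherits it automatically.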


4. When defining one-to-many relationships in my retrieval map, just how many is many?

When creating a Subset, each new file link is defined as either a "one-to-one" or a "one-to-many" relationship. For "one-to-many" relationships, the anticipated number of records must be specified. This number is used to optimize the code generated for the link; it represents the expected number of records on the "many" side of the relationship for each record on the "one" side. It is not a maximum - occurrences of the link that exceed it are still processed correctly - but take care to choose an appropriate value, because an inappropriate one can significantly affect performance.

The reason for its significance lies in the way OPEN/DB performs its processing. For each source record read, OPEN/DB reads all target records and constructs an array of the selected ones. The number of records field determines how big this array will be (and consequently how big the associated report programs will be).

The number is used to allocate working storage in the generated programs to keep track of all the (selected) linked records in the target file for any one record in the source file. 

As a rule of thumb, and taking into account your knowledge of the application and the way it is used in your organisation, you should aim to set this number high enough to cater for 95% or more of cases. Note that unless the distribution of records is almost perfectly even, the number you specify should be somewhat larger than the average.

Taken to extremes, too large a value can lead to performance degradation from excessive paging; in very extreme cases, particularly where multiple nested groups are used, compilation failure is possible. This should not arise provided you give proper consideration to setting an appropriate value. Certainly, avoid settings at the maximum (9,999) except in the simplest two-level hierarchies. If the value is too small, the program will run out of room and will have to store additional records externally to the program, which can also degrade performance.

Remember that this number relates to the records selected by a given report during execution. In some applications, typical record selection tests can be predicted (eg. "Customer Type" or "Product Group" or a Transaction Date range etc.). An understanding of the business need for a given Subset can be helpful in determining the best value for "number of records". Keep in mind that the overall number of records is multiplied at each hierarchical relationship. Where Group 1 goes to Group 2 in a 1:100 relationship, the anticipated number of records is 101 (ie. 1 + 1 * 100). If Group 2 then goes to Group 3 also in a 1:100 relationship, then the equation becomes (1 + 1 * 100 *100 = 10,001). 
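The multiplication described above can be reproduced directly (Python; nothing here is product-specific, it simply restates the formula in the text):

```python
from math import prod

def anticipated_records(chain):
    """Overall anticipated records for one BOF record along a nested chain
    of one-to-many links: 1 (the BOF record) plus the product of the chain."""
    return 1 + prod(chain)

print(anticipated_records([100]))       # 1 + 1*100     = 101
print(anticipated_records([100, 100]))  # 1 + 1*100*100 = 10001
```

The product term is why deep hierarchies with generous link sizes inflate so quickly.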

The possible effects on performance can be summarized as follows: 

  • If the number specified is too small, excess records cause additional processing. If this occurs too frequently during execution, performance degradation will result. 
  • If the number specified is larger than required, the generated program will be larger and will use more storage than necessary. However, this is usually preferable, provided the excess represents no more than a reasonable "margin for error", because the impact on performance is less significant. 

In exceptional cases of imbalance (for example there are 200 records on the source file and 20,000 records on the target file, but 17,000 of the target records are related to a single source record), consider developing the Subset using the current target file as the source file and linking the current source file as a "one-to-one" target file in the new Subset.


5. What happens when the report finds more records than the Subset expects?

The routine "O#XG8071" is invoked when the dimension of a one-to-many relationship (as defined in the Subset) is exceeded by the data encountered during execution. This routine handles the storing of excess records outside the report program. The number of records that must be read before the dimension is exceeded may vary from report to report, depending upon which areas of the Subset are included in the report compilation - i.e. what fields, files (and thus Subset groups) the report requires, and the select/omit rules in force for a given execution.

The record counts accumulate through the hierarchy of the Subset. Given one record for the based-on file (BOF), a one-to-500 relationship to the next file (say group G02) will give a value of 500 as a group record limit when processing group G02, and an overall record limit of 501 for any single record from the BOF.

If the same Subset definition is extended by adding, say, a 1:15 relationship from the second file (G02), thus creating group G03, then the maximum defined expectation for any single BOF record becomes 15 * 500 + 1 = 7501. Record 7502 being found for any single BOF record will therefore cause O#XG8071 to be invoked to handle overflow processing, which adds a processing overhead to the overall report run.

In the above example, if the new relationship were defined as 1:30 (instead of 1:15), the equation would become 30 * 500 + 1 = 15001. A value of 15001 exceeds an internal limit and is therefore automatically reduced to 9999. NOTE: if this third file links from the BOF (G01) instead of G02 (a different hierarchy), the two links are siblings rather than nested, so their counts add instead of multiplying: 30 + 500 + 1 = 531.
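The nested-chain figures above follow one pattern, which can be checked with a short calculation (Python; the 9,999 cap stands in for the internal limit mentioned above):

```python
INTERNAL_LIMIT = 9_999  # the internal limit referred to in the text

def chain_limit(chain):
    """Maximum defined expectation for a single BOF record along a nested
    chain of one-to-many links, reduced to the internal limit if exceeded."""
    product = 1
    for n in chain:
        product *= n
    return min(product + 1, INTERNAL_LIMIT)

print(chain_limit([500]))      # 500 + 1    = 501
print(chain_limit([500, 15]))  # 15*500 + 1 = 7501
print(chain_limit([500, 30]))  # 30*500 + 1 = 15001 -> reduced to 9999
```

Any record found beyond the computed limit for a given BOF record triggers O#XG8071 and its overflow processing.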

"One-to-one" relationships are omitted from this calculation because the Subset fields populated by these links are built into their respective group records. 

Consider a Subset comprising just the first two files from the above example (1:500). For any given record read from the BOF, the potential maximum number of records to be stored when all the relationships defined by the Subset have been resolved is 501. If, during a given execution of the report, data is encountered that requires more than 501 records to be stored for any single BOF record, then O#XG8071 is invoked to store the excess records in a user space.

The user space can expand dynamically to the iSeries limit of 16MB. If this is exceeded during execution, the message "Starting position nnnnnnnn and length nnn caused space overflow" will appear in the job log and the report run will terminate abnormally.

Copyright Momentum Utilities Pty Ltd - July 2016