VSAM Tuning Parameters

KSDS index levels

A KSDS is a direct access file. Access is through a hierarchical index structure, which can have several levels. The lowest level is called the 'sequence set'. By default, VSAM gives you one index buffer, which is used for the highest index level. For random processing with default buffering, several I-O operations will be necessary to retrieve data, one for each subsequent index level, and one for the data CI. When using modern large cache DASD, there is a high probability that all of the index will be in cache, so I-O times are minimised, but it is still worthwhile trying to keep the number of index levels down to avoid I-O altogether. Index levels greater than 4 should be considered a problem.
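The I-O count described above can be sketched as simple arithmetic. This is a minimal illustration only, assuming default buffering (one index buffer holding the highest index level) and no help from the DASD cache; the function name is hypothetical.

```python
def random_read_ios(index_levels: int) -> int:
    """Physical I-Os for one random read with default buffering.

    Assumption: the single default index buffer holds the highest
    index level, and nothing is satisfied from DASD cache.
    """
    if index_levels < 1:
        raise ValueError("a KSDS index has at least one level")
    index_ios = index_levels - 1  # every level below the buffered top level
    data_ios = 1                  # the data CI itself
    return index_ios + data_ios


if __name__ == "__main__":
    for levels in (1, 2, 3, 4):
        print(levels, "index levels:", random_read_ios(levels), "I-Os")
```

Each extra index level adds one I-O to every random read, which is why keeping the level count down matters.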

Index levels can be determined from the index statistics in a LISTCAT, as shown below:

STATISTICS
REC-TOTAL-----------6 - 6 index records
INDEX:
REC-DELETED-------0
LEVELS-----------------2 - 2 level index, 5 record sequence set

The number of records in the sequence set is the same as the number of control areas in the data component. The bigger the sequence set, the more levels you will have in the index. You can reduce the number of control areas by making them bigger, though they will probably be at maximum size anyway. If you allocate a file in cylinders, the CA size is 1 cylinder. If you allocate in tracks, the CA size is the smaller of the primary and secondary space allocations, up to a maximum of 15 tracks. Reducing the amount of freespace will also mean fewer control areas, but see below.
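The relationship between control areas and index levels can be sketched roughly as follows. This is an estimate under a stated assumption, not how VSAM actually builds an index: it assumes every index record can point to a fixed number of lower-level records (`entries_per_index_record`, a hypothetical parameter), whereas the real figure depends on index CI size and key compression.

```python
import math


def estimate_index(control_areas: int, entries_per_index_record: int) -> tuple[int, int]:
    """Return (index levels, total index records) for a KSDS.

    Assumption: a fixed fan-out per index record. The sequence set
    has one record per data control area.
    """
    if control_areas < 1 or entries_per_index_record < 2:
        raise ValueError("need at least 1 CA and a fan-out of 2 or more")
    records_at_level = control_areas  # the sequence set
    levels = 1
    total_records = records_at_level
    # Keep adding higher levels until a single top record remains.
    while records_at_level > 1:
        records_at_level = math.ceil(records_at_level / entries_per_index_record)
        levels += 1
        total_records += records_at_level
    return levels, total_records


if __name__ == "__main__":
    print(estimate_index(5, 50))
    print(estimate_index(3000, 50))
```

With 5 control areas the sketch gives a 2-level index with 6 index records, which agrees with the LISTCAT sample above for any fan-out of 5 or more.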

Freespace in a KSDS

Freespace is specified to avoid CI and CA splits, of which more later. A reasonably common specification is FREESPACE (20 25), which leaves 20% of each CI free and 25% of the CIs in each CA empty, so after initial load about 40% of the file is empty. If large values of freespace are specified, the number of index levels can increase, and fewer actual records are retrieved in each I-O operation. Both of these impact random performance, and the second will also impact sequential performance. The bottom line is that you need to analyse the insert activity to your file to determine which freespace parameters are best. A reasonable rule of thumb is:
few inserts, or most inserts at end of file FREESPACE(0 0)
inserts clustered at points in the file FREESPACE(0 50)
data inserted at random, all over the file FREESPACE(20 25)
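To compare freespace settings, the nominal share of the file left empty after initial load can be worked out as below. This is a sketch that assumes the first FREESPACE value is the percentage of each CI left free and the second is the percentage of CIs in each CA left empty; it ignores the rounding VSAM applies to whole records and whole CIs. The function name is hypothetical.

```python
def nominal_free_fraction(ci_percent: float, ca_percent: float) -> float:
    """Fraction of the data component left empty after initial load.

    Assumption: rounding to whole records and whole CIs is ignored,
    so this is a nominal figure only.
    """
    for value in (ci_percent, ca_percent):
        if not 0 <= value <= 100:
            raise ValueError("freespace percentages must be 0 to 100")
    used = (1 - ci_percent / 100) * (1 - ca_percent / 100)
    return 1 - used


if __name__ == "__main__":
    for ci, ca in ((0, 0), (0, 50), (20, 25)):
        print(f"FREESPACE({ci} {ca}): {nominal_free_fraction(ci, ca):.0%} empty")
```

The two percentages compound rather than add, which is worth remembering when sizing a file.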

CI/CA size

The CI size is the 'blocksize' for a VSAM file. A small CI size will impact sequential processing, while a large CI size can adversely impact random processing. A small CI size will also use more disk space. A reasonable rule of thumb is to use a 4K CI for small files and 8K for large files, with performance assisted by buffering when necessary. Another rule of thumb is to calculate the average record size, then find a CI size that is a reasonable multiple of that average. However, be aware that some programs do not support all possible CI sizes. With concurrent access, VSAM locks out portions of a file at CI level, so a large CI size is usually not appropriate for a VSAM file used by online systems.
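The 'multiple of the average record size' rule can be checked with a quick calculation like the one below. It is a rough sketch, assuming 4 bytes of control information per CI plus 3 bytes per record, which is a pessimistic allowance for variable-length records. Freespace is ignored, and the function name and sample sizes are hypothetical.

```python
CI_CONTROL_BYTES = 4      # assumed per-CI control information
RECORD_CONTROL_BYTES = 3  # assumed per-record control information


def records_per_ci(ci_size: int, average_record_length: int) -> tuple[int, int]:
    """Return (records that fit in one CI, unused bytes left over)."""
    if ci_size <= CI_CONTROL_BYTES or average_record_length < 1:
        raise ValueError("CI size and record length must be positive")
    usable = ci_size - CI_CONTROL_BYTES
    per_record = average_record_length + RECORD_CONTROL_BYTES
    count = usable // per_record
    return count, usable - count * per_record


if __name__ == "__main__":
    for ci_size in (2048, 4096, 8192):
        count, unused = records_per_ci(ci_size, 200)
        print(f"CI {ci_size}: {count} records, {unused} bytes unused")
```

A CI size that leaves a large unused remainder for your average record length is a poor fit, whatever the rule of thumb says.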

You cannot specify a CA size directly, but a big CA size will reduce the frequency of CA splits and reduce the number of index levels. The downside, of course, is that when you do a CA split, you move more data. If a VSAM file is allocated in cylinders, the CA size will be one cylinder. Otherwise, it is the smaller of the primary and the secondary allocation unit.
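The CA size rule can be expressed as a short function. This is a sketch only: it assumes 15 tracks per cylinder (3390 geometry) and a non-zero secondary allocation, and the function names are hypothetical.

```python
TRACKS_PER_CYLINDER = 15  # assumption: 3390 geometry


def ca_size_in_tracks(unit: str, primary: int, secondary: int) -> int:
    """CA size implied by the space allocation, in tracks.

    Assumption: both primary and secondary quantities are at least 1.
    """
    if primary < 1 or secondary < 1:
        raise ValueError("this sketch needs non-zero primary and secondary")
    if unit == "CYLINDERS":
        return TRACKS_PER_CYLINDER
    if unit == "TRACKS":
        # Smaller of primary and secondary, capped at one cylinder.
        return min(primary, secondary, TRACKS_PER_CYLINDER)
    raise ValueError("unit must be 'CYLINDERS' or 'TRACKS'")


def tracks_moved_by_ca_split(ca_tracks: int) -> float:
    """A CA split moves roughly half of the CA."""
    return ca_tracks / 2


if __name__ == "__main__":
    for unit, pri, sec in (("CYLINDERS", 10, 5), ("TRACKS", 30, 5), ("TRACKS", 45, 30)):
        ca = ca_size_in_tracks(unit, pri, sec)
        print(unit, pri, sec, "-> CA of", ca, "tracks, split moves", tracks_moved_by_ca_split(ca))
```

Note how a small secondary quantity on a track allocation quietly shrinks the CA, which increases the number of CAs and so the size of the sequence set.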

CI/CA splits

People tend to get all excited if they see a VSAM file with numerous CI splits and, especially, CA splits. With SLED DASD, splits meant extra seek time during sequential processing, as the VSAM dataset was no longer arranged in sequential order. How relevant is that with RAID boxes and a very large disk subsystem cache? However, when a VSAM file is actually performing a CA split, it has to move half a cylinder of data (assuming the file is defined in cylinders, and so the CA size is one cylinder). This can take an appreciable time, so it is worthwhile trying to stop splits from happening by manipulating CI/CA sizes and freespace. Whether it is worthwhile running regular file reorganisations to remove CI and CA splits is open to question.

CA Reclaim

The CA reclaim function was introduced in z/OS V1R12 and lets you automatically reclaim the space used by empty CAs in KSDS files. Like any function, it should be used with care. It works well with files that have a lot of empty CAs and a lot of CA splits, because the new CAs created by splits can reuse the empty ones. If a file does not have many CA splits, then CA reclaim can actually degrade performance. You can check whether a file is suitable for CA reclaim by using the EXAMINE DATASET command, which shows the number of empty CAs in a KSDS in message IDC01728I.

CA Reclaim can be used on KSDS files whether or not they are SMS managed, and whether they are processed by VSAM or VSAM RLS. It can be enabled in a sysplex with mixed z/OS levels, but toleration PTFs will be required for any LPARs where the z/OS level is prior to z/OS V1R12, and those lower-level systems cannot perform CA reclaim.
CA Reclaim cannot reclaim space for partially empty CAs, empty CAs that already existed when CA reclaim was enabled, CAs with RBA 0, CAs with the highest key of the KSDS, data sets processed with GSR, or KSDS files that were defined with the IMBED option.

Enabling and disabling CA reclaim

CA reclaim is disabled at the system level by default. To enable CA reclaim for the system, run the command SETSMS CA_RECLAIM(DATACLAS) and update the IGDSMSxx member of PARMLIB with CA_RECLAIM(DATACLAS), so the setting is preserved at IPL time.
To disable CA reclaim at the system level, use the command SETSMS CA_RECLAIM(NONE) and change the parameter to CA_RECLAIM(NONE) in IGDSMSxx.

Once CA reclaim is globally activated, all your dataclasses will be enabled for it by default, so every new KSDS file allocation will have its catalog entry set to use CA reclaim. Existing files will not, as the dataclass is only applied at allocation time. So how do you control it so that CA reclaim is only used by those files that will benefit from it?

You can set CA reclaim on or off for each dataclass definition using the CA Reclaim field on page 5 of the ISMF dataclass panels. By default, CA Reclaim is set to 'Y' for on, but you can change it to 'N' to switch it off for that dataclass.

You can change the CA reclaim setting for individual data sets with the IDCAMS ALTER command, so if you want to use CA reclaim on a file that was allocated before CA reclaim was activated, you would use:

IDCAMS ALTER dataset.name RECLAIMCA

and CA reclaim will be enabled once the file is closed then opened again, but remember, CA Reclaim cannot reclaim space from empty CAs that already existed when CA reclaim was enabled, so it might be worth scheduling a reorg.
On the other hand, if you want to disable CA reclaim for a file, you would use:

IDCAMS ALTER dataset.name NORECLAIMCA

If you want to find out whether or not a file is using CA reclaim, you can use the IDCAMS LISTCAT command, and you will see the value set for CA reclaim in the Cluster section, and in the Index section you can see the number of CAs reclaimed and the number of reclaimed CAs that have been reused since the KSDS was created.

TRACING

If you need to trace a VSAM file to work through a problem, the old way was to add an AMP=('TRACE=(parameters)') to the JCL which allocated that file. z/OS 2.1 introduced the IDADVT started task, which lets you control and interact with VSAM traces using the MODIFY command.

This IBM knowledge centre article explains both tracing methods in detail

IMBED & REPLICATE

These parameters were both introduced to speed up access to KSDS files.
IMBED places the sequence set record for each CA on the first track of that CA, and so speeds up sequential performance, as there is then no need to reference the index component.
REPLICATE will replicate each index record as many times as will fit on a track, to avoid rotational delay. On large cache controllers, neither of these parameters will improve read performance, as the index will almost always be held in cache. However, they will adversely affect write performance, as more data needs to be written, so they should be avoided.
These parameters are no longer supported from z/OS 1.4. Your allocation will not fail if you use them, but they will be ignored. The KEYRANGE and ORDERED parameters are also ignored. The recommendation is to remove them the next time you run a reorg.
