A KSDS is a direct access file. Access is through a hierarchical index structure, which can have several levels. The lowest level is called the 'sequence set'. By default, VSAM gives you one index buffer, which is used for the highest index level. For random processing with default buffering, several I-O operations will be necessary to retrieve data, one for each subsequent index level, and one for the data CI. When using modern large cache DASD, there is a high probability that all of the index will be in cache, so I-O times are minimised, but it is still worthwhile trying to keep the number of index levels down to avoid I-O altogether. Index levels greater than 4 should be considered a problem.
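As a rough illustration of the I-O arithmetic above, the sketch below counts the physical reads for one random retrieval. The function name and the `index_levels_buffered` parameter are hypothetical, and the model deliberately ignores DASD cache hits and any reuse of data buffers.

```python
def ios_per_random_read(index_levels, index_levels_buffered=1):
    """Estimate I-Os for one random KSDS read (illustrative model only).

    Assumes every index level that is not held in a buffer costs one I-O,
    plus one I-O for the data CI. The default of 1 buffered level mirrors
    the single default index buffer holding the highest index level.
    """
    if index_levels < 1:
        raise ValueError("a KSDS index has at least one level")
    buffered = min(max(index_levels_buffered, 0), index_levels)
    return (index_levels - buffered) + 1


# Default buffering on a 3-level index: 2 index reads + 1 data read.
print(ios_per_random_read(3))
# All three index levels buffered: only the data CI is read.
print(ios_per_random_read(3, index_levels_buffered=3))
```

Under this simplified model, each index level that can be kept in a buffer saves one I-O on every random read.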
Index levels can be determined from the index statistics in a LISTCAT, as shown below:
REC-TOTAL-----------6 - 6 index records
LEVELS-----------------2 - 2 level index, 5 record sequence set
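If you want to pull these figures out of saved LISTCAT output with a script, something like the following sketch could work. It assumes each statistic appears as a name, a run of dashes, then a number, as in the two lines above; the helper names are hypothetical and real LISTCAT listings contain many other fields that this does not try to interpret.

```python
import re

# A statistic in LISTCAT-style text: a name, a run of dashes, then a number.
STAT_PATTERN = re.compile(r"([A-Z][A-Z-]*?)-{2,}(\d+)")


def index_stats(listcat_text):
    """Return {statistic name: integer value} for every field found."""
    return {name: int(value) for name, value in STAT_PATTERN.findall(listcat_text)}


def sequence_set_records(stats):
    """Sequence set size for a 1- or 2-level index, else None.

    With one level every index record belongs to the sequence set; with two
    levels there is a single top-level record above it. Deeper indexes
    cannot be split out from these two figures alone.
    """
    total, levels = stats["REC-TOTAL"], stats["LEVELS"]
    if levels == 1:
        return total
    if levels == 2:
        return total - 1
    return None


sample = "REC-TOTAL-----------6\nLEVELS-----------------2"
stats = index_stats(sample)
print(stats, sequence_set_records(stats))
```

For the sample above this gives 6 index records in 2 levels, and so a 5 record sequence set.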
The number of records in the sequence set is the same as the number of control areas in the data component. The bigger the sequence set, the more levels you will have in an index. You can reduce the number of control areas by making them bigger, though they will probably be at maximum size anyway. If you allocate a file in Cylinders, the CA size is 1 cylinder. If you allocate in Tracks, the CA size is the smaller of the primary and secondary space allocation, up to a maximum of 15 tracks (or 16 tracks if the file is striped).
If you allocate your file in Records or Kilobytes, VSAM calculates the space needed, converts it to tracks or cylinders, and then works out the CA size. The recommendation is to always allocate files in cylinders, to ensure your CA size is one cylinder. Reducing the amount of freespace will also mean fewer control areas, see below.
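The CA sizing rules described above can be sketched as a small function. This is only an illustration of the rules as stated in the text: the function name and the `striped` flag are hypothetical, 3390 geometry (15 tracks per cylinder) is assumed, and both allocation quantities are assumed to be positive.

```python
TRACKS_PER_CYLINDER = 15  # assumption: 3390 geometry


def ca_size_tracks(unit, primary, secondary, striped=False):
    """Estimate the CA size in tracks from the space allocation (sketch only)."""
    if primary <= 0 or secondary <= 0:
        raise ValueError("this sketch assumes positive primary and secondary")
    limit = 16 if striped else TRACKS_PER_CYLINDER
    if unit == "CYLINDERS":
        # Allocating in cylinders gives the maximum CA size.
        return limit
    if unit == "TRACKS":
        # Smaller of primary and secondary, capped at the maximum.
        return min(primary, secondary, limit)
    raise ValueError("for RECORDS or KILOBYTES, VSAM does its own conversion")


print(ca_size_tracks("CYLINDERS", 10, 1))  # one cylinder
print(ca_size_tracks("TRACKS", 30, 5))     # the small secondary limits the CA
print(ca_size_tracks("TRACKS", 100, 50))   # capped at the maximum
```

The second call shows why a small secondary allocation in tracks is risky: it produces a small CA, and therefore more CAs and a bigger sequence set.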
If you get your buffering correct, then the index records should all be buffered. See the buffering pages for details.
Freespace is specified to avoid CI & CA splits, of which more later. A reasonably common specification is FREESPACE (20 25), which basically means that 20% of each CI and 25% of the CIs in each CA are left empty, so after initial load roughly 40% of the file is empty. If large values of freespace are specified, then the index levels can increase, and fewer actual records are retrieved in an I-O operation. Both of these impact random performance, and the second one will impact sequential performance. The bottom line is that you need to analyse the insert activity to your file to determine which freespace parameters are best. A reasonable rule of thumb is:
few inserts, or most inserts at end of file, use FREESPACE(0 0) as freespace is not needed within the file
inserts clustered at points in the file, use FREESPACE(0 0), initial inserts will cause CA splits, but once the split happens, free space will be in the correct place
data inserted at random, all over the file, use FREESPACE(20 25)
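To see how much of a file a given freespace setting leaves empty after the initial load, a back-of-envelope calculation like the one below can help. It is a sketch only: the function name is hypothetical, and it ignores the rounding VSAM applies when it converts the percentages into bytes per CI and CIs per CA.

```python
def empty_percent_after_load(ci_free_pct, ca_free_pct):
    """Approximate % of the data component left empty by FREESPACE(ci ca)."""
    for pct in (ci_free_pct, ca_free_pct):
        if not 0 <= pct <= 100:
            raise ValueError("freespace percentages must be between 0 and 100")
    # Only (100 - ca)% of the CIs are loaded, each filled to (100 - ci)%.
    used_pct = (100 - ci_free_pct) * (100 - ca_free_pct) / 100
    return 100 - used_pct


print(empty_percent_after_load(20, 25))
print(empty_percent_after_load(0, 0))
```

The two percentages compound, so the empty share of the file is larger than either figure on its own.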
The CI size is the unit of data transfer for a VSAM file. A small CI size will impact sequential processing, because more IOs are needed to read the same data. A large CI size can adversely impact random processing, because more data must be read to extract a single record. A small CI size will also use more disk space. A reasonable rule of thumb for random access files is to use a 4KB CI for small files, and 8KB for large files. Files that are normally accessed sequentially will benefit from a larger CI size, say 16KB to 32KB. If your file is accessed both randomly and sequentially, then use a small CI, with performance assisted by lots of buffers.
With concurrent access, VSAM locks out portions of a file at CI level, so a large CI size is usually not appropriate for a VSAM file used by online systems. However, VSAM RLS addresses this issue by locking at record level.
You cannot specify a CA size directly, but a big CA size will reduce the frequency of CA splits, and reduce the number of index levels. The downside, of course, is that when you do a CA split, you move more data. If a VSAM file is allocated in cylinders, the CA size will be one cylinder (or 16 tracks if the file is striped). Otherwise, it is the smaller of the primary and the secondary allocation unit.
People tend to get all excited if they see a VSAM file with numerous CI splits and, especially, CA splits. With SLED DASD, splits meant extra seek time during sequential processing, as the VSAM dataset was not arranged in sequential order. This seek time is eliminated in RAID boxes with a very large disk subsystem cache, though there will be a small catalog overhead spent navigating round the splits. However, when a VSAM file is actually performing a CA split, it has to move half a cylinder of data (assuming the file is defined in cylinders, and so the CA size is one cylinder). This can take an appreciable time, so it is worthwhile trying to prevent splits by manipulating CI/CA sizes and freespace. However, if you make your CI freespace percentage large to avoid splits, this will affect sequential processing, as more IOs are needed to retrieve a given amount of data.
So is it worthwhile running regular file reorganisations? Look at it this way. If a file is prone to splitting, and you just reorganise it every week, the splits will keep coming back, and this probably affects performance worse than having the splits in the first place. Analyse your file and the access patterns and consider the following rules of thumb:
If your file has few inserts, or if the inserts usually happen at the end of the file, then set FSPC(0 0) and leave the file alone. A reorg will not usually help performance.
If the inserts are localised at a particular key range, then set FSPC(0 0). This will result in many CI and CA splits, but they will be creating the localised free space that the file needs. An occasional reorg may help performance by reducing the catalog search overhead. A regular reorg would be a waste of time.
If your file has frequent inserts that are more or less evenly distributed through the file, set FSPC(20 25) and run the occasional reorg if service times permit.
The CA reclaim function was introduced in z/OS V1R12 and lets you automatically reclaim the space used by empty CAs for KSDS files. Like any function, it should be used with care, as it works well with files that have a lot of empty CAs and a lot of CA splits that can be moved into them. If a file does not have many split CAs, then CA reclaim can actually degrade performance. You can check a file to see if it is suitable for CA reclaim by using the EXAMINE DATASET command which shows the number of empty CAs in a KSDS with message IDC01728I.
CA Reclaim can be used on KSDS files whether or not they are SMS managed, and whether they are processed by VSAM or VSAM RLS. It can be enabled in a sysplex with mixed z/OS levels, but toleration PTFs will be required for any LPARs where the z/OS level is prior to z/OS V1R12, and those lower-level systems cannot perform CA reclaim.
CA Reclaim cannot reclaim space for partially empty CAs, empty CAs that already existed when CA reclaim was enabled, CAs with RBA 0, CAs with the highest key of the KSDS, data sets processed with GSR, or KSDS files that were defined with the IMBED option.
Enabling and disabling CA reclaim
CA reclaim is disabled at the system level by default. To enable CA reclaim for the system, run the command SETSMS CA_RECLAIM(DATACLAS) and update the IGDSMSxx member of PARMLIB with CA_RECLAIM(DATACLAS), so the setting is preserved at IPL time.
To disable CA reclaim at the system level, use the command SETSMS CA_RECLAIM(NONE) and change the parameter to CA_RECLAIM(NONE) in IGDSMSxx.
Once globally activated, by default all your dataclasses will be enabled for CA reclaim, so every new KSDS file allocation will have its catalog entry set to use CA reclaim. Existing files will not, as the dataclass is only applied at allocation time. So how do you control it so that CA reclaim is only used by those files that will benefit from it?
You can set CA reclaim on or off for each dataclass definition using the CA Reclaim definition on page 5 of the ISMF dataclass panels. By default, CA Reclaim is set to Y for ON, but you can change it to 'N' to switch it off for that dataclass.
You can change the CA reclaim setting for individual data sets with the IDCAMS ALTER command, so if you want to use CA reclaim on a file that was allocated before CA reclaim was activated, you would use:
IDCAMS ALTER dataset.name RECLAIMCA
and CA reclaim will be enabled once the file is closed then opened again, but remember, CA Reclaim cannot reclaim space from empty CAs that already existed when CA reclaim was enabled, so it might be worth scheduling a reorg.
On the other hand, if you want to disable CA reclaim for a file, you would use:
IDCAMS ALTER dataset.name NORECLAIMCA
If you want to find out whether or not a file is using CA reclaim, you can use the IDCAMS LISTCAT command, and you will see the value set for CA reclaim in the Cluster section, and in the Index section you can see the number of CAs reclaimed and the number of reclaimed CAs that have been reused since the KSDS was created.
If you need to trace a VSAM file to work through a problem, then the old way was to add an AMP=('TRACE=(parameters)') to the JCL DD statement that allocates that file. z/OS 2.1 introduced the IDAVDT started task, which lets you control and interact with VSAM traces using the MODIFY command.
This IBM knowledge centre article explains both tracing methods in detail.