VSAM Structures

VSAM (Virtual Storage Access Method) files are a means of storing mainframe data in a structured way. To understand VSAM files, you need to start with records.
Data from applications is stored as logical records, which usually consist of fields. One field is a unique primary key; there could then be several secondary keys, and finally a number of general data fields. For example, a customer record could consist of a unique customer number, the customer surname as a secondary key, and then a number of other fields such as address components, bank details and preferences.
Logical records are then grouped together into physical records or blocks, and a block is the amount of data that is transferred in a single I/O operation.

The Control Interval, or CI, is a purely VSAM term: it is the amount of data that VSAM transfers between disk and the VSAM buffers in a single operation. A CI can be between 512 bytes and 32 KB, so it can be bigger than a block, and a single CI operation can then transfer several blocks. In this case, the blocks are 'chained' so that they can all be transferred in one I/O operation. Once the CI is in the VSAM buffer, the required logical record is extracted and passed to the application.
VSAM decides the physical block size, based on the CI size and on whether or not the VSAM file is defined as Extended Format. It picks a size that makes the most effective use of a 3390 track; for example, a CI size of 22528 bytes would use a block size of 5632 bytes, and a CI size of 24576 bytes would use a block size of 24576 bytes.
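The arithmetic behind the two examples above can be sketched in a few lines of Python. This is a minimal sketch, assuming (as the examples suggest) that the block size VSAM picks divides the CI size exactly; the function name `blocks_per_ci` is hypothetical, and VSAM exposes no such call.

```python
def blocks_per_ci(ci_size: int, block_size: int) -> int:
    """Number of physical blocks chained together to transfer one CI."""
    if not 512 <= ci_size <= 32 * 1024:
        raise ValueError("CI size must be between 512 bytes and 32 KB")
    if ci_size % block_size:
        # Assumption of this sketch: a CI is a whole number of blocks
        raise ValueError("block size must divide the CI size exactly")
    return ci_size // block_size


# The two examples quoted above
print(blocks_per_ci(22528, 5632))   # 4 blocks chained into one I/O
print(blocks_per_ci(24576, 24576))  # 1 block per CI
```

So a 22528-byte CI is moved as four chained 5632-byte blocks in one operation, while a 24576-byte CI is a single block.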

Applications might use long records, potentially bigger than a CI. Such records must be allowed to span two or more CIs, and so they are called spanned records. If a VSAM file will contain spanned records, the data set must be defined with the SPANNED attribute on the IDCAMS DEFINE CLUSTER command. A spanned record cannot share its CIs with other records, and cannot be bigger than the Control Area (CA, explained later).
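A rough way to see how many CIs a spanned record occupies is sketched below in Python. It assumes a hypothetical fixed overhead of 10 bytes of control information per CI (the real figure depends on the descriptor fields VSAM writes) and a hypothetical CA of 180 CIs; the function names are illustrative only.

```python
import math

# Hypothetical control-information overhead per CI; real VSAM overhead
# depends on the descriptor fields it stores, so treat this as an assumption.
CI_OVERHEAD = 10


def is_spanned(record_len: int, ci_size: int) -> bool:
    """A record that does not fit in a single CI has to be spanned."""
    return record_len > ci_size - CI_OVERHEAD


def cis_needed(record_len: int, ci_size: int, cis_per_ca: int) -> int:
    """Return how many CIs one record occupies, one segment per CI."""
    usable = ci_size - CI_OVERHEAD          # data bytes available in one CI
    cis = math.ceil(record_len / usable)
    if cis > cis_per_ca:
        # A spanned record cannot be bigger than its Control Area
        raise ValueError("record is too long to fit in one control area")
    return cis


# A 10,000-byte record with 4096-byte CIs and a hypothetical 180-CI CA
print(is_spanned(10_000, 4096))       # True
print(cis_needed(10_000, 4096, 180))  # 3
```

Because the CIs holding a spanned record are not shared, any space left over in the last of those three CIs stays unused.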

This combination of logical records, physical records and CIs is used in different ways by the different kinds of VSAM dataset. There are four types of VSAM dataset, each of which is explained below.

KSDS

The KSDS is one of the most popular types of VSAM dataset. It consists of at least three components: a CLUSTER definition, which is just a catalog entry, a DATA component and an INDEX component. It may also have alternate indexes, which are themselves KSDS files with their own three components, and which are connected to the base cluster by a PATH. The way these fit together is illustrated below.
Figure: KSDS file components
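The relationship between a base cluster, an alternate index and its PATH can be modelled with ordinary dictionaries, as in the Python sketch below. It is an illustration only: the dictionary stands in for the base DATA and INDEX components, the names `build_aix` and `read_via_path` are hypothetical, and the surname key reuses the customer-record example from the start of this page. Surnames are not unique, so each alternate index entry holds a list of primary keys.

```python
from collections import defaultdict

# Base cluster: primary key -> record. The dict stands in for the
# DATA component plus its primary INDEX (illustrative data only).
base_cluster = {
    "C0001": {"surname": "SMITH", "town": "LEEDS"},
    "C0002": {"surname": "JONES", "town": "YORK"},
    "C0003": {"surname": "SMITH", "town": "BATH"},
}


def build_aix(base: dict, field: str) -> dict:
    """Build an alternate index: alternate key -> list of primary keys."""
    aix = defaultdict(list)
    for primary_key in sorted(base):        # scan the base in key order
        aix[base[primary_key][field]].append(primary_key)
    return dict(aix)


def read_via_path(base: dict, aix: dict, alt_key: str) -> list:
    """Follow the 'path': alternate key -> primary keys -> base records."""
    return [base[pk] for pk in aix.get(alt_key, [])]


surname_aix = build_aix(base_cluster, "surname")
print(surname_aix["SMITH"])                                    # ['C0001', 'C0003']
print(len(read_via_path(base_cluster, surname_aix, "SMITH")))  # 2
```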

The KSDS is versatile, because it can be read sequentially through the data component, or individual records can be accessed directly through the index. The way the index works is illustrated below. Data records consist of a primary key, which has to be unique, and data. Records are grouped together into Control Intervals (CIs), the unit of data transferred in a single I/O operation, which can consist of several physical blocks; CIs are in turn grouped into Control Areas (CAs). The space in a control interval includes a 3-byte Record Descriptor Field (RDF) for every record, and one 4-byte Control Interval Descriptor Field (CIDF).
You specify your CI size when you allocate the file, but the CA size is picked by the system. A VSAM data set is composed of a whole number of CAs. In most cases, a CA is the size of a 3390 cylinder (15 tracks). The minimum size of a CA is one track. The maximum size of a CA is 16 tracks when the data set is striped. The CA size is implicitly defined by the space you specify when the data set is defined; there is no keyword to set the CA size directly.
Figure: KSDS index levels
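Using the figures above, a rough estimate of how many fixed-length records fit in a CI is sketched below in Python. It follows the simple model described here (one 3-byte RDF per record plus one 4-byte CIDF per CI) and ignores free space; VSAM may need fewer RDFs when adjacent records have the same length, so treat the result as a conservative estimate. The 180-CIs-per-CA figure in the example is a hypothetical value.

```python
RDF_BYTES = 3    # Record Descriptor Field, one per record in this model
CIDF_BYTES = 4   # Control Interval Descriptor Field, one per CI


def records_per_ci(ci_size: int, record_len: int) -> int:
    """Fixed-length records that fit in one CI under the simple RDF model."""
    return (ci_size - CIDF_BYTES) // (record_len + RDF_BYTES)


def records_per_ca(ci_size: int, record_len: int, cis_per_ca: int) -> int:
    """Records per Control Area, assuming every CI is filled completely."""
    return records_per_ci(ci_size, record_len) * cis_per_ca


print(records_per_ci(4096, 200))        # 20
print(records_per_ca(4096, 200, 180))   # 3600
```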
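The two index levels can be sketched as a pair of sorted lists searched with Python's `bisect` module. This is a simplified model, not VSAM's actual index format (which, among other things, compresses keys): each sequence-set entry holds the highest key in one data CI, and each index-set entry holds the highest key of a hypothetical group of two sequence-set entries. A direct read descends from the index set to the sequence set and then scans a single data CI.

```python
from bisect import bisect_left

# Data component: each CI holds records (just keys here) in key order
data_cis = [["A01", "A05", "A09"], ["B02", "B07"], ["C01", "C03", "C08"], ["D04", "D06"]]

# Sequence set: the highest key in each data CI
sequence_set = [ci[-1] for ci in data_cis]

# Index set: the highest key in each group of sequence-set entries
GROUP = 2  # hypothetical number of sequence-set entries per index-set entry
index_set = [sequence_set[min(i + GROUP, len(sequence_set)) - 1]
             for i in range(0, len(sequence_set), GROUP)]


def find_ci(key: str):
    """Return the number of the data CI that would hold `key`, or None."""
    group = bisect_left(index_set, key)       # first group whose high key >= key
    if group == len(index_set):
        return None                           # key is above every key in the file
    start = group * GROUP
    entries = sequence_set[start:start + GROUP]
    return start + bisect_left(entries, key)  # first CI whose high key >= key


def read(key: str):
    """Direct read: one index search, then one data CI to scan."""
    ci = find_ci(key)
    if ci is None or key not in data_cis[ci]:
        return None                           # record not found
    return key


print(find_ci("C03"))  # 2
print(read("C04"))     # None
```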

KSDS records are inserted into a file based on the value of their key. Records are stored with the lowest key value at the lowest address. When VSAM adds a record to a KSDS file, it must move records around to keep them in key order. To facilitate this, free space is maintained within each CI, and most inserts can happen by shuffling records within the CI. If the CI is full, it is split in half, and the higher-key half is moved into one of the free CIs allocated within the CA. If the CA itself is full, the CA is split in half, with the higher-key records moved to a new CA at the end of the file. This is the origin of the terms CI split and CA split. Generally speaking, the data movement required to perform a CI split, and especially a CA split, has a worse performance impact than the actual presence of splits.
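The CI split described above can be illustrated with a toy Python model of a single CA. It is a sketch under simplifying assumptions: CIs are kept in logical key order in a list, the CI capacity and number of CIs are hypothetical small numbers, and a CA split is only reported rather than performed.

```python
import bisect


class ControlArea:
    """Toy model of one CA: CIs in key order plus a count of free CIs."""

    def __init__(self, ci_capacity: int, total_cis: int):
        if ci_capacity < 2:
            raise ValueError("a CI must hold at least two records to be split")
        self.ci_capacity = ci_capacity     # hypothetical records per CI
        self.cis = [[]]                    # one CI in use, initially empty
        self.free_cis = total_cis - 1      # free CIs available for splits

    def _target(self, key) -> int:
        """Index of the first CI whose highest key is >= key, else the last CI."""
        for i, ci in enumerate(self.cis):
            if ci and ci[-1] >= key:
                return i
        return len(self.cis) - 1

    def insert(self, key) -> str:
        i = self._target(key)
        ci = self.cis[i]
        if len(ci) < self.ci_capacity:
            bisect.insort(ci, key)         # free space in the CI: shuffle records
            return "inserted"
        if self.free_cis == 0:
            return "CA split needed"       # no free CI left in this CA
        # CI split: the upper half of the records moves to a free CI
        half = len(ci) // 2
        upper = ci[half:]
        del ci[half:]
        self.cis.insert(i + 1, upper)
        self.free_cis -= 1
        # The new record goes into whichever half now covers its key
        bisect.insort(ci if key <= ci[-1] else upper, key)
        return "CI split"


ca = ControlArea(ci_capacity=4, total_cis=3)
results = [ca.insert(k) for k in [10, 20, 30, 40, 25, 50, 60, 15, 16, 17]]
print(results)
print(ca.cis)  # [[10, 15, 16, 20], [25, 30], [40, 50, 60]]
```

Inserting 25 into a full CI forces the first CI split; once both free CIs have been used, the final insert can only be satisfied by a CA split, which is where the heavier data movement mentioned above comes from.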

More detail, especially on performance tuning a KSDS, can be found in the VSAM tuning Guide.

ESDS

An ESDS is a straightforward sequential dataset. It has a catalog CLUSTER entry and a DATA component, but nothing else. Records are always inserted at the end of the file, and cannot be physically deleted without rewriting the whole file. However, records can be logically deleted and then replaced by another record, as long as that record is the same length as the original. Records can also be updated in place, as long as their length remains the same.
Each ESDS record is identified by its RBA (Relative Byte Address), its offset from the start of the data set, and can be read directly by RBA. An ESDS can also have an alternate index, which maps an alternate key to the RBA of each record, so that individual records can be located by key.
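The ESDS rules above (append at the end, address by RBA, update only at the same length, delete only logically) can be sketched as a small Python class. This is a toy model, not VSAM: it ignores CI control information, so an RBA here is simply the record's offset in a flat buffer, and the 0xFF "deleted" marker is a hypothetical application convention.

```python
# Hypothetical convention for a logically deleted record; the application,
# not VSAM, decides how a deleted record is marked.
DELETED = 0xFF


class ToyESDS:
    """Append-only store where each record is addressed by its RBA.

    Simplification: CI control information is ignored, so an RBA here is
    just the byte offset of the record in a flat buffer.
    """

    def __init__(self):
        self.data = bytearray()
        self.lengths = {}              # RBA -> record length

    def append(self, record: bytes) -> int:
        rba = len(self.data)           # new records always go at the end
        self.data += record
        self.lengths[rba] = len(record)
        return rba

    def read(self, rba: int) -> bytes:
        return bytes(self.data[rba:rba + self.lengths[rba]])

    def update(self, rba: int, record: bytes) -> None:
        if len(record) != self.lengths[rba]:
            raise ValueError("an ESDS record can only be replaced by one of the same length")
        self.data[rba:rba + len(record)] = record

    def logical_delete(self, rba: int) -> None:
        # Mark the record as deleted in place; its space is not reclaimed
        self.data[rba] = DELETED


esds = ToyESDS()
first = esds.append(b"AAAA")
second = esds.append(b"BBBBBB")
esds.update(second, b"CCCCCC")
print(first, second, esds.read(second))  # 0 4 b'CCCCCC'
```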

LDS

The LDS has no internal record structure defined by VSAM; the layout of the data is determined by the application. It is used for DB2 databases and SMS control files, among others. It has a CLUSTER and a DATA component. If you look at an LDS with LISTCAT, the file will always appear to be empty, as it has no VSAM record or control information.

RRDS

The RRDS is included for completeness, but it is rarely used. It is a direct access dataset, with the data split up into fixed-length slots, each addressed by its relative record number, and it is difficult to extend. You can't replace individual records, as the file is not keyed. This means that you cannot load data into the file with the REPLACE option; you have to specify REUSE, which empties the file first.
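The slot structure of an RRDS can be sketched as a fixed-size array indexed by relative record number (RRN), as in the Python toy below. Everything here is an illustrative assumption: the class and method names are hypothetical, slots are pre-allocated so the file cannot grow, and writing to an occupied slot is simply rejected.

```python
class ToyRRDS:
    """Toy RRDS: slot n holds the record with relative record number n."""

    def __init__(self, slot_size: int, num_slots: int):
        self.slot_size = slot_size
        self.slots = [None] * num_slots    # fixed number of pre-allocated slots

    def _check(self, rrn: int) -> None:
        if not 1 <= rrn <= len(self.slots):
            raise IndexError("RRN is outside the pre-allocated slots")

    def offset(self, rrn: int) -> int:
        """Byte offset of a slot: position follows directly from the RRN."""
        self._check(rrn)
        return (rrn - 1) * self.slot_size

    def write(self, rrn: int, record: bytes) -> None:
        self._check(rrn)
        if len(record) != self.slot_size:
            raise ValueError("records must exactly fill one fixed-length slot")
        if self.slots[rrn - 1] is not None:
            raise KeyError("slot is already occupied")
        self.slots[rrn - 1] = record

    def read(self, rrn: int):
        self._check(rrn)
        return self.slots[rrn - 1]         # None means an empty slot


rrds = ToyRRDS(slot_size=8, num_slots=5)
rrds.write(3, b"ABCDEFGH")
print(rrds.read(3), rrds.read(1), rrds.offset(3))  # b'ABCDEFGH' None 16
```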

VSAM Enhancements

Extending the 4GB limit

It is hard to believe now, in the days of multi-terabyte databases, but for many years z/OS file sizes were restricted to a maximum of 4GB. This 4GB limit was a serious restriction for VSAM files, and quite a problem when it was reached. Extended Format (VSAM EF) files, which add an extra 32 control bytes to each data block, made it possible to lift the limit: an EF file can be defined with Extended Addressability, which uses an 8-byte relative byte address instead of a 4-byte one. The main restriction is that VSAM EF files must be DFSMS managed.
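The 4GB figure follows directly from the width of the relative byte address (RBA) that VSAM uses to locate data: a 4-byte RBA can address 2**32 bytes, while Extended Addressability uses an 8-byte RBA. The short Python calculation below only does that arithmetic; it is not a statement of the real maximum data set size, which depends on other limits as well.

```python
def addressable_bytes(rba_width_bytes: int) -> int:
    """Number of distinct byte offsets an RBA of the given width can hold."""
    return 2 ** (8 * rba_width_bytes)


GIB = 2 ** 30
EIB = 2 ** 60
print(addressable_bytes(4) // GIB)  # 4  (GiB) with a 4-byte RBA
print(addressable_bytes(8) // EIB)  # 16 (EiB) with an 8-byte RBA
```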

Transactional VSAM - Online/Batch file sharing

This is an enhancement to VSAM Record Level Sharing (RLS). RLS allowed batch jobs to read VSAM files while online CICS regions had the files open. This may be adequate for some applications, but other applications require many tasks to update a VSAM file concurrently while completely preserving data integrity.
Transactional VSAM Services became available with z/OS V2.10. It allows VSAM data set sharing between batch and online systems, and between two or more batch jobs, by logging updates and using two-phase commit and backout protocols at file level. Transactional VSAM uses the Automatic Restart Manager (ARM), so if a system fails, ARM restarts the VSAM services on another system.
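The general idea of logging updates so that they can be backed out, combined with a two-phase commit, can be sketched in a few lines of Python. This is a generic illustration of the protocol, not how Transactional VSAM is implemented: the `Participant` class, its undo log and the `fail_prepare` switch are all hypothetical.

```python
_MISSING = object()   # marks a key that did not exist before an update


class Participant:
    """Toy resource manager that keeps an undo log of its updates."""

    def __init__(self, name: str, fail_prepare: bool = False):
        self.name = name
        self.data = {}
        self.undo_log = []                # (key, old value) in update order
        self.fail_prepare = fail_prepare  # hypothetical switch to force a 'no' vote

    def update(self, key, value) -> None:
        self.undo_log.append((key, self.data.get(key, _MISSING)))
        self.data[key] = value

    def prepare(self) -> bool:
        return not self.fail_prepare      # phase 1: vote yes or no

    def commit(self) -> None:
        self.undo_log.clear()             # changes become permanent

    def backout(self) -> None:
        # Undo in reverse order so each key returns to its original value
        while self.undo_log:
            key, old = self.undo_log.pop()
            if old is _MISSING:
                del self.data[key]
            else:
                self.data[key] = old


def two_phase_commit(participants) -> str:
    """Commit only if every participant votes yes; otherwise back out all."""
    if all(p.prepare() for p in participants):
        for p in participants:          # phase 2: commit everywhere
            p.commit()
        return "committed"
    for p in participants:              # phase 2: back out everywhere
        p.backout()
    return "backed out"


file_a = Participant("FILE.A")
file_b = Participant("FILE.B", fail_prepare=True)
file_a.update("CUST1", "new address")
file_b.update("CUST1", "new balance")
print(two_phase_commit([file_a, file_b]))  # backed out
print(file_a.data)                         # {}
```

Because either every participant commits or every participant backs out, no file is left holding half of a unit of work.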
