z/OS Mainframe Dataset Types
z/OS Dataset Types
This page describes some of the z/OS datasets (or files), and how to define them. Datasets can be stored on disk or tape. This page is mainly about disk datasets. One of the 'features' of z/OS data is that you have to decide up front how big your dataset is going to be before you define it. The mechanics of 3390 disks are described in the managing 3390 disks page. This page assumes that you know what a track and a cylinder are.
If you go into ISPF option 3.2, take option 'A' (Allocate new data set), put a suitable dataset name in the 'Data Set Name' field and hit 'Enter', you will see the following fields on the screen.
Data Set Name . . . : data.set.name
Management class . . .
Storage class . . . .
Volume serial . . . .
Device type . . . . .
Data class . . . . . .
Space units . . . . .
Average record unit
Primary quantity . .
Directory blocks . .
Record format . . . .
Record length . . . .
Block size . . . . .
Data set name type :
A dataset name consists of a number of segments, separated by periods. An example would be 'my.prod.file.name'. Segments are composed of up to 8 alphabetic, numeric, and national characters ('@', '#', '$'). Most UK sites use '£' in place of '$'. Segments can also contain a hyphen '-'. Each segment must start with a national character or an alphabetic character. The maximum size of a dataset name is 44 characters, including the periods.
The first segment, called the 'high level qualifier', must be a valid catalog alias. See the ICF Catalog Management section for alias details. A dataset name must consist of 2 or more segments. You can use as many segments as will fit within the 44 character overall size limitation.
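The naming rules above can be expressed as a short check. This is a minimal Python sketch, not a replacement for the system's own validation: the function name is made up for illustration, lower case input is folded to upper case as an assumption, and it does not check that the high level qualifier is a valid catalog alias.

```python
import re

# One segment: 1-8 characters, the first alphabetic or national (@, #, $),
# the rest may also be digits or a hyphen.
SEGMENT = re.compile(r"^[A-Z@#$][A-Z0-9@#$-]{0,7}$")


def is_valid_dsname(name: str, min_segments: int = 2) -> bool:
    """Check a dataset name against the rules described in the text."""
    if len(name) > 44:              # overall limit, periods included
        return False
    segments = name.upper().split(".")
    if len(segments) < min_segments:
        return False
    return all(SEGMENT.match(seg) is not None for seg in segments)


print(is_valid_dsname("my.prod.file.name"))    # True
print(is_valid_dsname("MY.9PROD.FILE"))        # False: segment starts with a digit
print(is_valid_dsname("MY.VERYLONGSEGMENT"))   # False: segment longer than 8
```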
'Management class', 'Data class' and 'Storage class' are discussed in the DFSMS page. If the dataset is SMS managed, leave the Volume Serial and Device Type blank, as they will be ignored anyway.
'Space units' can be one of BLKS, TRKS, CYLS, KB, MB, BYTES or RECORDS. Tracks (TRKS) and Cylinders (CYLS) are legacy units, and are discussed in the managing 3390 disks page. Many of us prefer these legacy units, as we can easily relate them to the amount of space on a disk. The other space units, Blocks, Kilobytes, Megabytes, Bytes or Records, are easier for developers to understand. Say you expect your file to contain 1000 records. It's then much easier to request space for 1000 records than to work out for yourself how many tracks are needed for 1000 records.
'Average record unit' is a multiplier used when defining by records or bytes. It can be U (units, a multiplier of 1), K (thousands, a multiplier of 1,024) or M (millions, a multiplier of 1,048,576). The 'Primary quantity' is the amount of space that you initially reserve for the file, and the 'Secondary quantity' is the amount of space which will be added if the primary quantity fills up. Up to 15 secondary extents will be added for each volume.
A common 'trick' when defining space units is to use a small primary allocation, just enough for the current space requirement, and a very large secondary. The reasoning behind this is that it gives the file plenty of room to grow. The problem with this approach is that it requires lots of free space on disk to get that large secondary when the file does fill up, and that can lead to space errors. My own preference is to keep primary and secondary allocation sizes similar, and rely on SMS and multi-volume to prevent space issues.
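To make the trade-off concrete, here is a minimal Python sketch of the space arithmetic, assuming the limit of 15 secondary extents per volume mentioned above. The function name and the sample quantities are illustrative only, and the units are whatever you chose under 'Space units'.

```python
def max_space_per_volume(primary: int, secondary: int,
                         secondary_extents: int = 15) -> int:
    """Largest size a dataset can reach on one volume: the primary
    allocation plus every secondary extent (15 assumed, as in the text)."""
    return primary + secondary_extents * secondary


# Similar primary and secondary: grows in small steps that are easy to satisfy.
print(max_space_per_volume(50, 10))    # 200

# Small primary, very large secondary: more headroom, but every extension
# needs 500 units of free space on the volume.
print(max_space_per_volume(5, 500))    # 7505
```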
- Directory blocks - used for PDS datasets, see below
- Record format - a dataset contains a set of records. This describes the way the records are defined. Some popular combinations are -
- FB records are fixed length, and are grouped together into blocks
- FBA as above, but the first byte in each record is an ASA print control character
- VB records are variable length, and are grouped together into blocks
- VBS as above, but the records can span blocks.
- There are other options and combinations, 'U' means the record format is undefined, 'M' means the record contains machine code print control characters, and 'T' is the track-overflow feature.
- Record length - The logical record length in bytes. For variable size records this is the maximum length.
- Block size - it's best to leave this blank, and let the system pick the optimum block size for the storage medium you are using. However, if you code it, and your record size is fixed, it must be a multiple of the record size. If your record size is variable, it must be at least 4 bytes bigger than the maximum record size.
- Data set name type - used to select PDS or PDSE datasets. See the relevant sections below.
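The block size rules in the list above are easy to check mechanically. The following Python sketch is an illustration only: the function name is hypothetical, and it deliberately covers just the FB, FBA and VB formats, because the rules quoted here do not describe spanned or undefined records.

```python
def blksize_is_valid(recfm: str, lrecl: int, blksize: int) -> bool:
    """Apply the block size rules from the text to a coded BLKSIZE."""
    recfm = recfm.upper()
    if recfm in ("FB", "FBA"):
        # Fixed length records: block size must be a multiple of the record length.
        return blksize >= lrecl and blksize % lrecl == 0
    if recfm == "VB":
        # Variable length records: at least 4 bytes bigger than the maximum record.
        return blksize >= lrecl + 4
    raise ValueError(f"rules for RECFM={recfm} are not covered by this sketch")


print(blksize_is_valid("FB", 80, 27920))   # True:  27920 = 349 * 80
print(blksize_is_valid("FB", 80, 27998))   # False: not a multiple of 80
print(blksize_is_valid("VB", 255, 259))    # True:  255 + 4
print(blksize_is_valid("VB", 255, 258))    # False: one byte too small
```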
If you are creating a file using JCL, you can specify this lot using the DCB JCL parameter -
An easier way, if you have an existing file with the same characteristics as your new file, is to use the LIKE parameter
The JCL above contains a DSORG parameter. That describes the type of file, common file types are described below.
A Physical Sequential file, DSORG=PS, is a simple file with records stored in the order that they are written. You can think of a record as a line of text. When you insert a new line of text, you start a new record. PS files are typically used for text and logs, and are so simple that there is little to say about them. Large PS files which are only ever required by one task at a time are very suitable for tape; in fact the only suitable file type for tape is PS. It is possible to improve the performance of PS files by striping them over several volumes, but this is only useful if they are used by several tasks concurrently. Extended format PS files must be SMS managed and can consist of 123 extents on each volume, to a maximum of 7,257 extents over 59 volumes.
Large format PS files were introduced in z/OS 1.7 and are not the same as extended format. They do not need to be SMS managed and can grow beyond the 65535 tracks per volume limit for normal PS files, with a maximum of 16,777,215 tracks per volume. They can only consist of 16 extents per volume, with a maximum of 944 extents over 59 volumes.
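The extent totals quoted for the two formats are simply the per-volume limit multiplied by the 59 volume limit. A small Python check, using only the figures given above (the dictionary keys are labels chosen for this sketch):

```python
MAX_VOLUMES = 59

# Extents allowed on each volume, as quoted in the text.
EXTENTS_PER_VOLUME = {"extended": 123, "large": 16}


def max_extents(ps_format: str) -> int:
    """Total extents a multi-volume PS dataset of this format can reach."""
    return EXTENTS_PER_VOLUME[ps_format] * MAX_VOLUMES


print(max_extents("extended"))   # 7257
print(max_extents("large"))      # 944
```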
You allocate a large format PS dataset using the parameter DSNTYPE=LARGE
The minimum size of data allocation under OS/390 is 1 track, or 56,664 bytes. This means that a Physical Sequential file which contains three 80 byte records will hold 240 bytes of data but occupy 56,664 bytes, which is very wasteful. It also means that two datasets cannot exist on the same track. A Partitioned Dataset (PDS) solves this problem by combining a lot of small files into one large container. The individual files are stored as members within the PDS. Each member must have a unique 1-8 character name and the members are located by an index that is called a PDS directory. PDS files have DSORG=PO and DSNTYPE=PDS.
A PDS can also be used to collect a set of related files together into a single 'library'.
To allocate a PDS using ISPF option 3.2, you need to tell it how many directory blocks to reserve. A directory block holds between 3 and 21 members, depending on how the user updatable fields are used. So for a file which will contain a maximum of 60 members you should code
Directory blocks . . 20 Data set name type : PDS
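The figure of 20 comes from assuming the worst case of 3 members per directory block. A minimal Python sketch of that calculation follows; the function name is illustrative, and the 3 to 21 range is the one quoted above.

```python
import math


def directory_blocks_needed(members: int, members_per_block: int = 3) -> int:
    """Directory blocks required for a given number of members.

    members_per_block defaults to 3, the worst case quoted in the text;
    the best case is 21.
    """
    if not 3 <= members_per_block <= 21:
        raise ValueError("a directory block holds between 3 and 21 members")
    return math.ceil(members / members_per_block)


print(directory_blocks_needed(60))       # 20, worst case
print(directory_blocks_needed(60, 21))   # 3, best case
```

Sizing for the worst case costs a little space up front, but avoids running out of directory blocks later.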
To allocate a PDS in JCL, you need to include the number of directory blocks in the SPACE parameter. The example below will create a 50 track file with 20 directory blocks, and scope to add another 15 extents of 10 tracks each.
If you want to allocate a new member in a PDS file using JCL, you simply specify the file name followed by the member name in brackets, like this: DSN=CICP.SOURCE.LOAD(MN001DF)
There are a number of issues with PDS files. A PDS is prone to data corruption: if you write a member to it with the incorrect DCB, this can change the DCB for the whole file and make all the other members unreadable. It is also possible to update part of a PDS simultaneously and overlap members in the same physical space.
The members in a PDS are held in the directory in alphabetical order. This means the whole directory has to be re-organised every time a new member is inserted.
When you delete a member, the space it occupied is not reused. This means that over time, the PDS will grow, and eventually fill up. The easiest way to fix this is to list the file on ISPF option 3.4, then enter a 'Z' against it. This will compress the file, and all the deleted space will be available for use. If the file is open, either for read or write, this will not work. Another option, to be used with care, is to run a compress in batch.
//STEP1 EXEC PGM=IEBCOPY
//SYSPRINT DD SYSOUT=*
//OUT DD DISP=SHR,DSN=file.in.use.be.careful
//SYSIN DD *
  COPY OUTDD=OUT,INDD=OUT
/*
The DISP=SHR means the file will be compressed, even if it is in use, BUT, THIS CAN CORRUPT YOUR DATA, especially if it is being updated by another task. Some products, notably IMS, use absolute track positioning to locate their load and parameter data, not the PDS index. If you compress an IMS file with IMS active, the file members move, and IMS can't find them anymore. This means that IMS will crash. In-line compression can be a very effective way to fix problems, if you are sure of what you are doing. It can also be a very effective way to bring your systems down, if you don't know what you are doing. Use it with care.
You can avoid this problem completely by using PDSEs, or by using a utility like PDSMAN. PDSMAN, now supplied by Computer Associates, will reuse deleted space within a dataset.
As more members are added to the PDS, it will eventually run out of directory blocks. You can fix this with IEBCOPY, by creating a new dataset with more directory blocks, copying all the members over, deleting the old dataset, then renaming the new dataset to the old name. A much better way, if you have the product, is to use PDSMAN. See the Utilities page for details.
PDS backup is done at file level, and so is recovery. This means that if you want to recover a member within the file, you have to recover the whole file to a different name, copy over the missing or corrupt member, then delete the recovered file.
Partitioned Data Set Extended (PDSE) files do not have a fixed size directory and they can reuse deleted space, so they fix many of the PDS problems. They also have better facilities for data sharing. PDSE files have DSORG=PO and DSNTYPE=LIBRARY. The differences can be summarised as
Allowed extents per volume
Max track size
Reuse deleted space
Fixed size directory
Create members simultaneously
I've corrected the number of allowed PDSE extents to 123 after discussion with Klaus Hoelzl of Allianz.de
A PDSE uses a standard 4K block allocation called a page, and individual PDSE members do not share pages. A PDSE will store 12 blocks on a 3390 track.
This can mean that a PDSE is initially larger than a PDS, but the PDS will grow in size as members are updated. A PDSE will reuse deleted space, but it will also try to avoid fragmenting an individual member into blocks that are physically far apart.
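A rough way to see why a PDSE of small members starts out larger is to count pages. The Python sketch below uses only the figures above (4K pages, 12 pages per 3390 track, members do not share pages). It ignores directory pages and any other overhead, which is a simplifying assumption, and the function names are made up for illustration.

```python
import math

PAGE_BYTES = 4096        # standard PDSE page size
PAGES_PER_TRACK = 12     # pages stored on one 3390 track


def pages_for_member(member_bytes: int) -> int:
    """Pages used by one member; members never share a page."""
    return math.ceil(member_bytes / PAGE_BYTES)


def tracks_for_members(member_sizes: list) -> int:
    """Tracks needed to hold the data pages of all members."""
    pages = sum(pages_for_member(size) for size in member_sizes)
    return math.ceil(pages / PAGES_PER_TRACK)


# Thirty members of three 80 byte records each: one page per member.
print(pages_for_member(240))             # 1
print(tracks_for_members([240] * 30))    # 3
```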
A PDSE can contain up to 522,236 members and each member can normally contain up to 15,728,639 records. It is possible for a PDSE member to contain up to 2,146,435,071 records provided some specific access characteristics are met.
So why do people still use PDS files? When PDSEs were first released, they could use excessive CSA (Mainframe memory) and the mainframe needed an IPL to recover. It is not surprising that this led to PDSE files getting a bit of a bad name. In 2002 a new release of z/OS introduced a restartable PDSE address space, which makes them much more stable, and if problems do occur, the resolution is to restart the address space, rather than the whole machine. PDSEs can now be safely used as a better way of storing small files.
However, PDSEs cannot be used to replace all PDS files, as some are not suitable. Unsuitable datasets include: PDS files with a mixture of load modules and non-load modules, files that are read at IPL time, checkpoint datasets, SYS1.PROCLIB, and note lists.
To allocate a PDSE under ISPF option 3.2, use
Data set name type : LIBRARY
To allocate a PDSE using JCL, you either need to use a PDSE dataclass, or use the DSNTYPE JCL parameter.
//SYSIN DD DSN=SEP.OUTPUT.LISTINGS,DISP=(NEW,KEEP),
Before z/OS V2R1, you could not define a PDSE as part of a GDG, but this restriction is now removed. The PDSE must be SMS managed. If you are running a SYSPLEX with some members at a lower z/OS level than V2R1, then the lower level members will still see the PDSE, but will not recognise it as part of a GDG.
If you want to know if a GDG contains a PDSE, run a LISTCAT against it and look for a new DSNTYPE field. If this is set to 'LIBRARY' for any entry, then that entry is a PDSE.
z/OS V2R1 also introduced a new PDSE version that has better performance, better space utilisation and reduced CPU consumption. You can either define individual datasets as PDSE version 2 with a JCL parameter, or decide that new PDSEs will be version 2 by default by adding a PDSE_VERSION parameter in your IGDSMSnn parmlib members. From a user's viewpoint, both old and new versions are identical.
The new JCL DD statement parameter is DSNTYPE=(LIBRARY,2), with DSNTYPE=(LIBRARY,1) being the default.
Version 2 PDSEs support multiple levels, or generations, of members, which works in a similar way to GDGs for data sets. When creating a PDSE V2 Dataset you can decide how many generations to retain for replaced members. This means that it is possible to work with older versions of replaced PDSE members. When editing via ISPF use the SAVE NEWGEN option to create a new member generation. The older generation and its aliases will be retained and when the generation limit is reached the oldest generation will be deleted permanently.
You cannot see older generations with ISPF, but the IBM utility Data Set Commander for z/OS can be used to display and work with old member generations and their aliases.
To initiate member generations you need to set the MAXGENS_LIMIT parameter in IGDSMSxx. The default value is '0' which means no member generations are allowed. The maximum possible value is 2,000,000,000, but think of the space consumption consequence of allowing that many generations! Once you activate DFSMS with the new parameter set, you control member generations for individual datasets with a MAXGENS DD keyword in your JCL when you create a PDSE. For example, DSNTYPE=(LIBRARY,2),MAXGENS=10. The default value for MAXGENS is also '0' which means that PDSEs are not eligible for member generations by default. Once a value greater than 0 is set, member generations will be activated, and the value can be between 1 and the overriding maximum generation value in parmlib.
PDSE address spaces.
Two PDSE address spaces became available when z/OS V1R6 was released: SMSPDSE and SMSPDSE1.
The new SMSPDSE1 address space is restartable and is used for PDSEs that are not accessed through the LNKLST. The original PDSE address space, SMSPDSE, is retained for PDSEs that are in the LNKLST and is not restartable. The restart feature on the PDSE1 address space allows you to recover from a PDSE problem without having to IPL. To use this new address space, you need to define your PDSE parameters in the IGDSMSxx member in SYS1.PARMLIB as:
Some commands and utilities are available to help you to diagnose problems with PDSEs.
The VARY SMS,PDSE,ANALYSIS command helps you identify which PDSE has a problem, and gives you some recovery options. The parameter list includes:
- RESTART command to recycle the SMSPDSE1 address space
- ACTIVATE command to start the SMSPDSE1 address space
- ANALYSIS command to check for latch and lock contention problems.
- FREELATCH command to release problem latches identified by the ANALYSIS command.
- MONITOR command controls the PDSE monitor parameters.
The DISPLAY SMS,PDSE operator command has a LATCH parameter which can assist with PDSE problem determination. It displays the status of a PDSE latch (the mechanism that locks and releases individual PDSE members for secure data sharing) and tells you which address space is associated with a specific PDSE.
If you are experiencing enqueues on PDSEs, the D GRS command will display those PDSEs that have outstanding enqueues.
The D SMS,OPTIONS command will show which PDSE sharing mode is currently in use.
The IBM Redbook, 'Partitioned Dataset Extended Usage Guide', located at http://www.redbooks.ibm.com/redbooks/pdfs/sg246106.pdf explains all these commands and their restrictions.
If you want to convert a lot of PDS files to PDSEs, and you are using DFDSS, then the following input statements should work
COPY DATASET( -
DELETE PURGE -
Generation Data Groups
Generation Data Groups or GDGs are used to automatically manage datasets that are created periodically, for example billing runs or log files. The advantage of GDGs is that the version control management is automatic. The disadvantage is that the GDG name is not necessarily intuitive.
The GDG naming standard appends a version segment to the end of the file name, as GxxxxVyy, where xxxx is the generation number from 0000 to 9999 and yy is the cycle number, from 00 to 99.
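As an illustration of the naming standard, this Python sketch builds a full generation dataset name from a base name and a generation number. The function name is hypothetical. The range checks follow the limits quoted above, and the 44 character check is the general dataset name limit.

```python
def gdg_dataset_name(base: str, generation: int, version: int = 0) -> str:
    """Append the GxxxxVyy segment to a GDG base name."""
    if not 0 <= generation <= 9999:
        raise ValueError("generation number must be 0000 to 9999")
    if not 0 <= version <= 99:
        raise ValueError("version number must be 00 to 99")
    name = f"{base}.G{generation:04d}V{version:02d}"
    if len(name) > 44:
        raise ValueError("full name exceeds the 44 character limit")
    return name


print(gdg_dataset_name("SWP.BILLING.OUTPUT", 135))   # SWP.BILLING.OUTPUT.G0135V00
```

Because the suffix takes 9 characters including its period, the base name itself has to fit in 35 characters.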
If you are looking for a file that was created on a particular day, it is not obvious from the name that SWP.BILLING.OUTPUT.G0135V00 was created on 05/07/2006, whereas SWP.BILLING.OUTPUT.D050706 is obvious. Then again, you need to create a process to generate a file name that is date specific (and the date is not Y2100 compliant). The biggest advantage of using a GDG is that the system keeps track of the generations for you so you don't have to change your JCL each time you create a new version.
So while GDGs cannot be considered a panacea for cyclic datasets, they are very widely used and very useful.
A GDG set consists of a GDG base, which is a catalog entry that describes the GDG structure, and a set of files that are associated with the base. To set up a GDG, you first need to define a GDG base using JCL which looks like
//STEP01 EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//SYSIN DD *
DEFINE GDG (NAME(IBP.IBLTOP05.POLICY) -
LIMIT(30) SCRATCH NOEMPTY)
What this job will do is define a GDG base called IBP.IBLTOP05.POLICY, which can hold up to 30 generations. The maximum number of generations that you can associate with a base is 255.
NOEMPTY means that when the 30 generation limit is reached, the oldest will be deleted, but all the others will be retained. EMPTY means that when the limit is reached, all the 30 current files will be deleted and the process will start again with the new file.
SCRATCH means uncatalog and delete files when they exceed the generation limit. You could code NOSCRATCH which means just uncatalog old generations but do not delete them.
There are two other parameters, FOR(days) and TO(yymmdd), which you can use to specify the number of days to keep a GDG group or the date when it would be deleted, but these are rarely seen.
Before you can create the files that attach to a GDG base, you need to create a model dataset that contains the DCB information for the files. This model data just needs to be a VTOC entry, it does not need to actually use up disk space. It is possible to create a different model dataset for every possible DCB combination, and some sites actually do that, but this needs a lot of effort to maintain. Many sites use a single model dataset, and then override the DCB information to build the correct type of file. To create a model DCB, use a dummy IEFBR14 job like this
//STEP2 EXEC PGM=IEFBR14
//MODEL1 DD DSN=IBP.IBLTOP05.MODEL,
The SPACE=(TRK,0) parameter means that this file will just be created as a VTOC entry, it will not use up disk space.
OK, so now that we have a GDG base and a GDG model, we can allocate a GDG file. All you do is specify the new GDG file with a +1 generation number in your JCL like this
//STEP1 EXEC PGM=xxxxxxxxx
//OUT DD DSN=IBP.IBLTOP05.POLICY(+1),DISP=(NEW,CATLG),
This will create a file called IBP.IBLTOP05.POLICY.G0001V00. When you add the 31st file it will be created as IBP.IBLTOP05.POLICY.G0031V00 and the IBP.IBLTOP05.POLICY.G0001V00 will be deleted, assuming you have SCRATCH coded in the base definition.
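The roll-off behaviour can be modelled in a few lines of Python. This is a simplified sketch, not how the catalog implements it: generations are plain integers, numbering never wraps past 9999, and the function name is made up for illustration.

```python
def add_generation(active, limit, empty=False):
    """Create the next generation and return (active, rolled_off).

    active - generation numbers currently in the GDG, oldest first
    limit  - the LIMIT value from the GDG base
    empty  - True models EMPTY, False models NOEMPTY
    """
    new_gen = active[-1] + 1 if active else 1
    if len(active) < limit:
        return active + [new_gen], []
    if empty:
        # EMPTY: every existing generation rolls off
        return [new_gen], list(active)
    # NOEMPTY: only the oldest generation rolls off
    return active[1:] + [new_gen], [active[0]]


# Fill the GDG up to its limit of 30 generations.
gens = []
for _ in range(30):
    gens, _rolled = add_generation(gens, limit=30)

# The 31st generation with NOEMPTY: G0001 rolls off, G0002 to G0031 remain.
after, rolled = add_generation(gens, limit=30)
print(after[0], after[-1], rolled)     # 2 31 [1]

# The 31st generation with EMPTY: all 30 existing generations roll off.
after, rolled = add_generation(gens, limit=30, empty=True)
print(after, len(rolled))              # [31] 30
```

What happens to the rolled off generations depends on the base definition: with SCRATCH they are uncataloged and deleted, with NOSCRATCH they are only uncataloged.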
If you want to refer to a particular GDG after it has been created, you can either use the full GDG name, or a relative number. For example if the current generation of IBP.IBLTOP05.POLICY is 24, then you can refer to the previous version either as IBP.IBLTOP05.POLICY.G0023V00 or as IBP.IBLTOP05.POLICY(-1).
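The mapping from a relative number to a full name is simple arithmetic on the generation number. Here is a hedged Python sketch, assuming consecutive generation numbers and a version of V00 (neither is guaranteed on a real system); the function name is illustrative.

```python
def resolve_relative(base: str, current_generation: int, relative: int) -> str:
    """Translate a relative reference such as (0) or (-1) into a full name.

    Assumes generations are numbered consecutively and all use V00.
    """
    if relative > 0:
        raise ValueError("positive numbers create new generations")
    generation = current_generation + relative
    if generation < 1:
        raise ValueError("no such generation")
    return f"{base}.G{generation:04d}V00"


print(resolve_relative("IBP.IBLTOP05.POLICY", 24, 0))    # IBP.IBLTOP05.POLICY.G0024V00
print(resolve_relative("IBP.IBLTOP05.POLICY", 24, -1))   # IBP.IBLTOP05.POLICY.G0023V00
```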
A GDG is not catalogued until the end of the creating job, so if you want to refer to it in a subsequent step in the creating job you have to use IBP.IBLTOP05.POLICY(+1), DISP=OLD
GDGs are normally used for physical sequential or 'flat' files but they can be used for partitioned datasets and now PDSEs. You would also normally use the same DCB characteristics for all the members of a GDG, although you don't have to do this. If the DCBs are all identical you can concatenate all the individual files in a GDG set by simply referring to the base entry name. For example, if you had thirty files associated with IBP.IBLTOP05.POLICY, and you specified
//DDIN DD DSN=IBP.IBLTOP05.POLICY,DISP=SHR
in your JCL, all thirty files would be joined together and processed as one. The processing order is normally current to oldest, but from z/OS V2R1 you can decide what processing order to use when defining a GDG, by using the FIFO parameter, which concatenates oldest to newest. The opposite LIFO option is the default.
One last thing. How do you delete a GDG base? If you just use 'D' against a 3.4 file listing the delete will fail with the error 'GDG base or VSAM file'. If you use the IEFBR14 program with your file name and DISP=(OLD,DELETE) coded you will get a 'dataset not found' JCL error. There are two ways to do it
Use 'DEL' against a 3.4 file listing
Use IDCAMS with DELETE file.name PURGE in the control statements
Virtual Storage Access Method is fully discussed in the VSAM section.
However, one question I often see asked is: can you use the LIKE parameter when allocating VSAM files? The answer is no, but there is an equivalent MODEL parameter. Here are some IDCAMS statements that I've used to make a copy of the HSM OCDS. VSAM allocation can be this simple
DEFINE CLUSTER(NAME(HSMT.OCDS.COPY) -
HFS file system.
Hierarchical File System is a UNIX compliant file system that runs under z/OS UNIX System Services. HFS has been functionally stabilized and will eventually be dropped from support. zFS is IBM's preferred file system for Unix System Services files.
HFS files have DSNTYPE=HFS and can be allocated using the IEFBR14 utility
//S1 EXEC PGM=IEFBR14
//SYSPRINT DD SYSOUT=*
//DDNAME1 DD DSN=AX540.TEST1.HFS,
The file information is a bit confusing as it looks like a cross between a PDS and a PDSE, but the difference is that the DSNTYPE is HFS as specified above.
HFS data sets contain a UNIX tree structure that makes sense to OMVS utilities, but not to z/OS utilities in general. For example, HFS files cannot be copied with IEBCOPY. You need the OMVS COPYTREE utility to copy an HFS file. From TSO option 6, enter the following command. N.B. this is a foreground copy, so it will lock your session out until it completes.
Copytree /source/directory/path target/directory/path
Before you can use a new HFS it has to be mounted. To do this -
Use the command TSO ISHELL to get into OMVS
Set up the DIRECTORY structure for your new file system.
Select 'FILE Systems' from the top line menu and select the 'Mount' option.
Update the following fields
MOUNT POINT (UNIX path)
FILE SYSTEM NAME (zOS dataset name)
FILE SYSTEM TYPE - HFS
Hit 'Enter' and your new file system should be mounted. You can use the MOUNT TABLE option from the FILESYSTEMS menu to check.
zFS File System
HFS and zFS are both used with Unix System Services but zFS is IBM's preferred file system for Unix System Services files. From the Unix standpoint there's no difference between HFS and zFS as far as mounting directories, the hierarchical structure, and accessing files or directories are concerned. The principal difference between the two is that zFS resides in a Virtual Storage Access Method (VSAM) linear data set. Each VSAM file can host several zFS file systems. zFS is basically a bit better than HFS as it is faster, uses space more efficiently and also supports file system based transaction logging for point in time recovery. The zFS data and interfaces are DFS compliant. ZFS files have DSORG=PO and DSTYPE=ZFS. There is another type of ZFS file called NFS, which allows you to access file spaces on remote servers.
Unlike HFS which runs in the USS address space, zFS runs in its own address space. zFS files need to be initialized before use whereas HFS files are ready to go once defined. However, zFS has performance limitations for very large directories that are not seen with HFS. For example zFS is limited to 64K subdirectories per directory.
You use IDCAMS to allocate a zFS dataset
DEFINE CLUSTER(NAME(zfs.file.name) -
CYLINDERS(prim sec) LINEAR SHR(2))
This is just a standard VSAM allocation. You turn the file into a zFS by formatting it with the IOEAGFMT utility.
Note that the zfs file name in the format job is case sensitive and the parameters must be in lower case as shown.
//S1 EXEC PGM=IOEAGFMT,REGION=0M,
// PARM=('-aggregate zfs.file.name -compat')
//SYSPRINT DD SYSOUT=*
//CEEDUMP DD SYSOUT=*
//STDERR DD SYSOUT=*
You can mount a ZFS file with the mount command
MOUNT FILESYSTEM('zfs.file.name') MOUNTPOINT('path/name') TYPE(ZFS) MODE(RDWR)
Migrating from HFS to ZFS
'Migration' is simply executed using the copytree command.
Ensure both your SOURCE HFS and TARGET ZFS are mounted.
Using TSO option 6, enter the following command :
Copytree source target
Your TSO session will be locked out until the copy is complete.
You can also use the copytree command to copy data between two ZFS files.