DB2 Performance Tuning

This page discusses some of the problems with DB2 performance from a Storage perspective. It is split into three sections: Unix and Linux file system buffers, z/OS buffer pools, and SMF records.

Performance improvement techniques concentrate on minimising the need to go out to physical spinning disks by keeping data in CPU storage buffers. These days, you cannot buy a disk subsystem that exclusively uses spinning magnetic disks, and even hybrid flash/disk systems are being phased out. Pure Flash systems deliver I/O response times measured in microseconds instead of milliseconds, and they are now fronted by SCM storage that runs even faster. PAV, especially SuperPAV, has all but eliminated IOSQ, the time spent waiting for access to a z/OS 'DASD' device.
The important issue now is to ensure that you have enough communication channels to the disk subsystems, and that they run fast enough not to be a bottleneck.
However, the fastest way to access data is still to have it in CPU buffers, so DB2 performance tuning is still important.

Unix and Linux file system buffers

Unix and Linux use system memory cache to speed up IO operations. A well tuned system cache can service up to 85% of IO requests from memory without having to go out to disk. At first sight, this seems to be a good way to improve application performance, but it might not be best for DB2 databases. If you are having problems with DB2 performance it is worth taking a look at how system buffering is working. There are two ways of handling file system I/O: Cached I/O and Concurrent I/O.

Cached I/O

Data access to disk tends to be localised and repeated, so it is possible to keep frequently accessed data in memory buffers and retain it there. Then when an application requests some data from disk, the system first checks the cache to see if it has the data already. If the data exists in memory cache, it can be retrieved from there, so avoiding a time consuming disk access. If the data is not in cache, then the system goes out to disk for the data. This process is often called 'Cached I/O'. If the I/Os are sequential, then the cache can 'read ahead' and pull the data off disk before it is needed.
Cached I/O also works for writes. Applications can write data to the memory cache and continue processing once the cache write is complete, while the I/O is destaged asynchronously to disk later. Data integrity is maintained by an Inode locking mechanism, which imposes write serialisation to ensure the writes are sequenced correctly.

The problem is that DB2 has its own caching buffer pools, so if it is used in conjunction with Cached I/O, the data is cached twice and passed between the two cache areas. This has an increased CPU and memory overhead and can actually degrade performance. There is another overhead with Write I/Os, because DB2 has its own locking mechanisms to maintain data consistency and integrity.

Concurrent I/O

Direct I/O bypasses cache buffers and copies data directly from disk to the appropriate application buffers. However, direct I/O preserves data integrity by using write serialisation and Inode locking, which conflicts with DB2 locking. Concurrent I/O works just like direct I/O, but without the locking mechanism. Vendors often advise that database performance can be improved by placing data on raw disks without file systems, to eliminate the file system processes. The problem then is that the raw disks are difficult to manage. Concurrent I/O lets you use the management facilities of file systems, without the performance issues caused by caching and Inode locking so that multiple threads can read and write concurrently to the same file.

The problem is that you must have some way of enforcing data consistency between reads and attempts at concurrent writes, so Concurrent I/O is only suitable for applications that have their own methods for enforcing data consistency and data integrity. DB2, like most relational database systems, uses its own application cache, and has sophisticated locking mechanisms to manage data access and consistency. Therefore Concurrent I/O is very suitable for DB2 databases.

Concurrent I/O can be implemented at the file system level by mounting file systems with the -o cio option specified. However, it can also be implemented within DB2 (for versions 9.1 or higher) at database level, if a table space has the NO FILE SYSTEM CACHING option applied. This is the better way to do it, as the DB2 Database Manager will then decide which files are to be opened with Concurrent I/O.
Some DB2 files work better with file system caching, for example, temporary table spaces that use SMS storage or a table containing LOB data that is not stored inline and is queried frequently.
The issue is that you will need to ask your DBAs how they have defined their tablespaces, whereas you can check the status of file system mount points using the 'mount' command.
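As an illustration of the mount-point check, here is a minimal Python sketch that picks out file systems mounted with the cio option from captured 'mount' output. The helper and the sample output are hypothetical, and the real output format of the mount command varies by platform (an AIX-like layout is assumed here), so treat this purely as illustrative:

```python
def cio_mounts(mount_output):
    # Return mounted devices whose comma-separated option list includes 'cio'
    hits = []
    for line in mount_output.splitlines():
        fields = line.split()
        if fields and any("cio" in field.split(",") for field in fields[1:]):
            hits.append(fields[0])
    return hits

# Captured from an (illustrative) AIX-style 'mount' listing:
sample = """/dev/db2datalv /db2/data jfs2 Aug 01 10:00 rw,cio
/dev/db2loglv /db2/logs jfs2 Aug 01 10:00 rw,log=/dev/loglv00"""
print(cio_mounts(sample))  # ['/dev/db2datalv']
```

Splitting each field on commas avoids false matches on paths or device names that happen to contain the letters 'cio'.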


z/OS buffer pools

This section is intended to give Storage people a flavour of how DB2 buffering works. As such, it is too simplistic for DBAs, who will need a lot more detail. In particular, note how the buffers introduce asynchronous IO, which batches IOs up and processes them in bulk, so they appear as long response times. This can look like a performance problem, but in fact it is very efficient. The bottom line is that prefetch buffering gets data into CEC memory before the customer needs it, and that makes applications run a lot faster.

DB2 uses four kinds of buffer processing:

Synchronous read
The application requests a single page of data (between 4K and 32K) then waits for it to arrive in the buffer. IO response times should be 2-4 ms. This is not a preferred option as the I/O time overhead is relatively high.
Sequential prefetch
DB2 uses sequential prefetch for table scans and for sequential access to data in a multi-table segmented table space when index access is not available. The application 'knows' it is going to need a chunk of data, so it requests it in advance, then carries on with its processing while the data is being transferred. Multiple pages are read in one IO operation, using IO chaining. IO response times can be 10-30 ms or so. Sequential prefetch is decided by the DB2 optimiser at program bind time, and is then always used when that program piece is called. DB2 V9 reads up to 250KB in one I/O for SQL queries and up to 512KB for utilities.
Dynamic prefetch
In dynamic prefetch, DB2 uses a runtime sequential detection algorithm to detect whether pages are being read sequentially, trying to distinguish clustered or sequential pages from random pages. DB2 uses multi-page asynchronous prefetch I/Os for the sequential pages, and synchronous I/Os for the random pages. Dynamic prefetch can load up to 128KB in a single I/O, and it can schedule 2 I/Os in parallel. If DB2 detects that 5 out of the last 8 IOs were sequential, it invokes dynamic prefetch; if the rate then drops below 5 out of the last 8, prefetch is cancelled. This can be more efficient than sequential prefetch, but not always: it is possible for DB2 to invoke dynamic prefetch when the data is not being read sequentially, which just wastes buffer space. DB2 uses dynamic prefetch in almost all situations; the main exception is table space scans. Index scans always use dynamic prefetch.
List prefetch
DB2 uses list prefetch to read data that is not stored sequentially, but is read in a known sequence. Examples include data read from an index, from a DB2 log, or LOB (Large OBject) pages whose locations are determined from the LOB map. Sometimes the optimiser might choose a list prefetch access path, and incremental image copies, which by definition do not read all of the database, will use list prefetch.
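The sequential-detection rule behind dynamic prefetch (prefetch while at least 5 of the last 8 page reads were sequential) can be sketched in Python. This is an illustrative model of the idea, not DB2's actual implementation:

```python
from collections import deque

def make_detector(window=8, trigger=5):
    # Sliding window over the last `window` reads; True = page followed
    # its predecessor, i.e. the read was sequential
    recent = deque(maxlen=window)
    last_page = None

    def on_read(page):
        nonlocal last_page
        sequential = last_page is not None and page == last_page + 1
        recent.append(sequential)
        last_page = page
        # Dynamic prefetch is active while `trigger` or more of the
        # recent reads were sequential; it lapses when the count drops
        return sum(recent) >= trigger

    return on_read

detect = make_detector()
flags = [detect(p) for p in [10, 11, 12, 13, 14, 15]]
print(flags[-1])  # True - a sequential run has switched prefetch on
```

Feeding the detector random page numbers afterwards pushes the sequential count back below the trigger, modelling how prefetch is cancelled when access turns random.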
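List prefetch can be sketched in the same spirit: the record IDs harvested from the index are sorted into page-number order, and the distinct pages are batched into multi-page I/Os. A hypothetical Python illustration, not DB2's actual code:

```python
def list_prefetch_order(rids, pages_per_io=32):
    # Record IDs are (page, slot) pairs; sort the distinct page numbers,
    # then batch them into multi-page I/Os in ascending page order so
    # each page is read only once
    pages = sorted({page for page, _slot in rids})
    return [pages[i:i + pages_per_io] for i in range(0, len(pages), pages_per_io)]

# Index entries point at rows scattered across the table space:
rids = [(907, 2), (14, 1), (14, 3), (510, 6), (33, 2)]
print(list_prefetch_order(rids, pages_per_io=2))  # [[14, 33], [510, 907]]
```

Sorting turns a scatter of random single-page reads into a small number of ordered, chained I/Os, which is the whole point of the technique.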

The amount of data transferred in a single IO depends on the buffer pool size and the page size, and will be between 2 and 32 pages. DB2 has several thresholds which control buffering. Two of them are Data Management (DMTH) and Sequential Prefetch (SPTH). If SPTH is exceeded, then prefetch will be cancelled. This is usually a symptom that the bufferpools are too small. Ask your DBAs to check their DB2 PM reports to see if this is happening. A good benchmark to aim for is to make the buffer pool big enough to hold randomly read pages for five minutes.
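That five-minute benchmark translates into a simple back-of-the-envelope sizing estimate: the pool needs roughly the random read rate multiplied by 300 seconds' worth of buffers. The function name and figures below are illustrative, not an official formula:

```python
def min_pool_pages(random_reads_per_sec, residency_secs=300):
    # Buffers needed so a randomly read page stays resident for
    # residency_secs before it becomes a candidate for reuse
    return int(random_reads_per_sec * residency_secs)

# e.g. 200 synchronous reads/sec taken from buffer pool statistics:
pages = min_pool_pages(200)
print(pages)              # 60000 pages
print(pages * 4 // 1024)  # 234 (MB, assuming a 4K page size)
```

This ignores prefetched and updated pages, so treat it as a floor rather than a recommendation.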

Once a page is prefetched into a buffer, it is unlikely to be accessed again by a random request. This means it is best to keep random pages in the buffer for longer than prefetched pages. Prefetched page buffer usage is controlled by the VPSEQT parameter, which defaults to 80%, meaning that up to 80 percent of the virtual pool can be used to cache prefetched pages. In general, a value of 20% to 30% is enough to hold prefetched pages for a few seconds, which is plenty of time for the pages to be used before they are swapped out, and it prevents the buffer pool from being swamped by prefetched pages. You can calculate prefetch page rates from standard buffer pool statistics records. If you are using a hiperpool, HPSEQT should be set to zero so that no prefetched pages are moved to the hiperpool, but VPSEQT may need to be higher than 20-30%. If the buffer pool just contains the work database DSNDB07, VPSEQT should be set to 90% or more, as nearly all of the read activity is prefetch.
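The residency time for prefetched pages follows from the VPSEQT ceiling and the prefetch page rate reported in the buffer pool statistics. A hypothetical sketch of the arithmetic, with illustrative figures:

```python
def prefetch_residency_secs(pool_pages, vpseqt_pct, prefetch_pages_per_sec):
    # Seconds a prefetched page can stay resident before being stolen:
    # the sequential share of the pool divided by the prefetch page rate
    return (pool_pages * vpseqt_pct / 100) / prefetch_pages_per_sec

# 50,000-page pool, VPSEQT at 25, 2,500 prefetched pages/sec:
print(prefetch_residency_secs(50_000, 25, 2_500))  # 5.0
```

A result of a few seconds is in line with the guidance above; a fraction of a second would suggest VPSEQT (or the pool) is too small for the prefetch workload.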

Remember that bigger is not necessarily better. If the buffer pools are too big, the z/OS system may start paging, which will adversely affect response times.

Write IOs

DB2 does asynchronous writes unless it needs to free up buffer space, or it is doing a checkpoint, log switch or system shutdown. The Immediate Write Threshold (IWTH) is set to 97.5% of buffer pool space. If the IWTH is exceeded, DB2 does synchronous writes and the application waits until they are complete. DB2 will then do a write for every table update until the buffer pool usage drops again, which means that if the IWTH is exceeded, write IOs will be excessive.
Asynchronous writes process 4-32 pages at once, which is up to 128K. If the buffer pools are large, then DB2 does a lot of write IO at checkpoint time, and this can cause problems. IBM recommends that databases are tuned at table space level: busy table spaces should be kept in the buffer pool, while table spaces with light usage should not be, to reduce the amount of write IO at checkpoint time. Also, make sure the LOGLOAD parameter is not set too high, as it determines the frequency of checkpoints.

Just to complicate matters a little more, log handling is also controlled within the application program. If a write is specified as NO WAIT, the log record is written to the buffers and written out to disk asynchronously when buffer thresholds are met. If FORCE is specified, the record is committed and the buffer flushed, and the application waits until the disk write completes. Ideally, keep the log buffers as big as possible, and keep the WRITE THRESHOLD at less than 20% of the number of buffers.


SMF Records

A good place to start, if you have DB2 performance problems, is to collect SMF records. Three SMF types specifically apply to DB2. The SMF section describes how to collect SMF data, and also details general SMF Storage records.

SMF 100 records have minimal overhead, and in general, all classes can be turned on. They contain system-wide DB2 statistics.

SMF 101 records contain individual DB2 thread-related information. They are a good place to start when investigating DB2 performance. You will usually need subtypes 1, 2, and 3, and also 7 and 8 if you would like DB2 package-level information.

SMF 102 records contain detailed performance-related information. These records collect lots of data, use lots of resource, and can seriously affect DB2 performance. Use with caution. It could be worth investing in a product like Spectrum SMF Writer that reads and simplifies SMF 102 records.
