Managing the TSM database and log files.
TSM Database and Recovery Log
TSM Database version 6.1 and later
Basic database structure
Database sizing and tuning
Recovery log sizing and tuning
Restoring the TSM Database
Best practice for Database and Storage Pool disks
Using DB2 commands on a TSM server
Automatic Database reorganisation issues
Investigating Problems with the Server Instance
Errors on Database Startup
Problems due to AIX Maximum Number of Processes
Effect of Deduplication on Database size
Moving a TSM database hosted on Windows
TSM Database version 5.5 and earlier
This information has been moved to a separate page as TSM 5.x is now out of support TSM 5.x
Basic database structure
TSM uses a DB2 relational database. This means it can be much bigger than the old legacy database, does not neeed database audits and will automatically re-organise itself if required. Software database mirroring is no longer supported.
One thing to note is that you cannot have any other DB2 applications running on the same server as the one hosting your TSM database, though you can have multiple TSM instances on one host server. A TSM 'instance' is everything required to run a TSM server, including database, logs, storage pools etc.
Several of the commands we used to use to manage the database are different. Some of these are :-
|New Command||Effect||Old command|
|EXTEND DBSPACE||Increase the size of the database||DEFINE DBVOLUME then EXTEND DB|
|QUERY DBSPACE||Check the database size||QUERY DBVOLUME|
|SET DBRECOVERY||Used to define the database backup device class and must be run before the first backup.||DEFINE DBBACKUPTRIGGER|
|SET DBREPORTMODE||A new command used to decide how much diagnostic information to report. Options are; NONE, FULL, PARTIAL.||None|
Database sizing and tuning
The database can exist in up to 128 directories or 'containers' to use the correct DB2 term. The data is striped evenly across the directories and the containers do not require an initial format before they can be used. The Q DBSPACE output below shows a database striped over 3 containers.
LOCATION TOTAL SIZE OF File System(MB) SPACE USED (MB) FREE SPACE AVAILABLE (MB) /tsm/tivoli/dbdir001 102,144 6,919.22 98,224 /tsm/tivoli/dbdir002 102,144 6,919.22 98,224 /tsm/tivoli/dbdir003 102,144 6,919.22 98,224
When a TSM database is initially created on multiple file systems, that database will be spread eaqually over all the file systems. However if you add an extra file system to the database space using the 'extend dbs' command, DB2 will not rebalance the database to spread the data equally. This means that if some of the original file spaces were 100% full, they will still be 100% full after the new filespace is added and this could cause the TSM to stop.
If you are running TSM Server V6.2 or above you can rebalance the database dynamically using DB2 commands. I suggest that you look up the IBM technote about this, and also contact IBM for advice before trying this.
The maximum database size for TSM 6.3 and TSM 7.1 is 4TB, and 2TB for TSM 6.2. The DB2 database is largely self tuning, so there is no requirement for DB2 tuning skills. A new parameter, DBMEMPERCENT, replaces the old BUFFPOOLSIZE. This set of buffers contains much more data than the old buffer so the recommendation is to set its size to unlimited. In fact, TSM/DB2 will try to change it to unlimited on startup.
The database uses DB2 relational consistency rules to prevent incorrect data from entering, and is self auditing. The database will aslo run automatic 'runstats' from time to time. This is a DB2 feature that optimises storage paths through the database to improve performance.
The database also uses relational indices, so it does not require special index tables to speed up SQL queries.
Recovery log sizing and tuning
TSM has three recovery logs.
The Active log contains updates that have not been committed to disk yet and is used for roll-forward or roll-back in case of problems. Once a transaction is committed, the data is moved to the archive log. The default size for the Active log is 2GB and the size can be inceased by increments of 512MB right up to 128GB.
The Archive log contains committed transaction data and is used for PIT recovery of the database. The Archive log is cleared out by a full database backup. However it retains all data updates applied right back to the second last backup, so you need to size your archive log with that in mind.
The Failover Archive log
TSM collectively calls these three logs the 'recovery log', but a DB2 DBA would just call them 'transaction logs'.
The log files form part of the TSM database, and unlike the legacy TSM database there is no need to create and format log volumes. The logmode is equivalent to legacy roll-forward. In DB2 terms, these are archive logs, not circular logs. This means that the log files can fill up, so log file management is still required. You can specify a failover log for the Archive log to help prevent this, but the Active log cannot failover and the size is fixed between 2GB and 128GB, so don't allocate all the space that you have available for the Active log, keep some in reserve for emergencies.
It is highly recommended that FailoverArchiveLog space be set aside for possible emergency use. You can use slower disks for FailoverArchiveLog space.
If the Active log fills up and the server stops, the process to get your TSM server up again is:
DSMSERV DISPLAY LOG - to check the current log status
Update the Active log size parameter in dsmserv.opt
Start the server up
If you get a LOGREAD388 error, then the TSM server will crash. This error happens when a recovery log volume becomes corrupted. If this happens to you, your first option should be to check your operating system error logs. If they indicate that a file system has corruption errors or disk errors, get your system administrator to check out the operating system and see if the error can be fixed there.
Your next option is to check to see if your logs are mirrored, and if so then tell TSM to use the mirror copy. From an OS command line issue the command below and check the output
dsmserv display logv
this will tell you if the logs are mirrored if the command shows more than one copy for the log files. If they are mirrored, then add the option
MIRRORREAD LOG VERIFY
in the dsmserv.opt file, and that will force the TSM server to sync from the other copy. If dsmserv.opt already contains a MIRRORREAD line then comment it out to make sure that your option is used. If the TSM server starts OK, then it has fixed the issue by copying the uncorrupted data over from the copy log, and all is well.
If this fails, then you will probably have to restore your database to the latest backup, but you might want to contact IBM for further advice first.
You may want to change the location of the achive logs, maybe because you have added a bigger file system and this is usually quite easy.
- Take a full database backup, not just because that is the starting point for any change, but also because a full backup of the TSM database will clear the archive logs down.
- Halt the TSM server
- Edit the dsmserv.opt, changing the values for ARCHLOGDirectory and ARCHFAILOVERLOGDIR to your new paths.
- Start the TSM server up again and the directory structure will be automatically generated in the new empty filesystem. There is no need to copy the archive logs over from the original location, because your full backup emptied them out.
- If you want to prove to yourself that this worked, then a 'query log f=d' command should show the new locations.
If you really need to, or even just want to, it is possible to update DB2 directly to change the log location. You should really ask your IBM support rep for advice before you start, but an outline process would go like.
- backup your database (full), devconfig and volhistory files.
- Halt your TSM Server
- Login to your DB2 command line and check the existing log locations with command 'db2 get db cfg for tsmdb1'
- Create your new archive log directory and paths, and make sure that the TSM instance user has read/write access to them.
- Update DB2 to use the new path structure with these 3 commands, substituting your paths as appropriate
db2 update db cfg for tsmdb1 using OVERFLOWLOGPATH /tsm/new_archivelog/path/
db2 update db cfg for tsmdb1 using LOGARCHMETH1 DISK:/tsm/new_archivelog/archmethpath/
db2 update db cfg for tsmdb1 using FAILARCHPATH /tsm/new_archivelog/failarchpath/
- run the 'db2 get db cfg for tsmb1' command again and make sure your changes worked
- Edit your 'dsmserv.opt' and update the archive log path entry to match the changes you made to DB2
- Start your TSM Server up
- Run the 'q log f=d' command and check archive log location.
Restoring the TSM Database
The restore process will depend on what you need to achieve and where you are starting from. Three major scenarios exist;
- The database is broken and requires a simple restore up to the last committed transaction
- The database is corrupt and needs a point in time restore to a known good state
- The hosting server has crashed and a full DR restore is required
Taking these three in turn.
Restoring a server database to its most current state
Assuming the active and archive logs are intact, all backups exist and all the files and directories for the database still exist, the restore process is simply halt the server then from the O/S command line, issue the command
dsmserv restore db
Restoring a server database to a point in time
This is a little more complex. You need the latest full backup and the latest incremental backup from before the point in time. You also need the Server options file and the Volume history file, but note that the volume history file is wiped by the restore process, so take a copy of it before you start. If necessary, re-create your database and recovery log directories.
To restore the database to a backup that was created before 16:00 on August 23, 2014, enter:
dsmserv restore db todate=08/23/2014 totime=16:00
The server will then find the last full backup from the volume history file and restore it, then work its way through any incremental backups create before the date and time specified above. However, the problem is that as you have wound the database back to a previous time, it is now out of step with the data in the storage pools, as migration and expiration might have run after the backup. Your old volume history file has a record of all volume usage, so take a look at it and pull out a list of all volumes that were changed after the point in time restore. What you are looking for is volume statuses of STGREUSE (updates) STGNEW (added volumes) and STGDELETE (removed volumes) - you can do this while you are waiting for the restore to run.
Run an AUDIT VOLUME command with the FIX=YES parameter against all your disk volumes, and all tape volumes marged as reused or deleted. If storage pool volumes were added after the backup you will need to add them again.
If the audit cannot find a backup file, it will be permanently lost and so copied again on the next backup. If the file exists in a copy storage pool, it will be marked as 'damaged', so you will need to run the RESTORE STGPOOL command to get those files restored from the copy. If you don't have any backups, the ultimate cleanup is to run DELETE VOLUME with the DISCARD=YES parameter, but this will delete your backup data.
You will also need to run an AUDIT LIBRARY command to fix any tape volume anomalies.
Disaster Recovery issues
A disaster recovery restore adds another layer of complexity. First you need to rebuild your server, hopefully from a SYSBAK or MKSYSB that puts all the relevant code back exactly as it was. If you do need to re-install TSM then note that the IBM manuals specifically state that 'The restore operation must be done by a Tivoli Storage Manager server at a code level that is the same as that on the server that was backed up - This means that the server code level can not change at all - not even by a patch level'.
You will probably then need to define your tape library, at least one drive, and paths using DEFINE LIBRARY, DEFINE DRIVE and DEFINE PATH commands. At this point you should have a TSM system ready for restore, so you can follow whichever of the 2 options above is appropriate.
Best practice for Database and Storage Pool disks
The following are some of the 'Best Practices' recommendations from IBM for setting up DB disk volumes for TSM Servers
Use fast, low latency disks for the Database, use SSD if you can afford it. Avoid the slower internal disks included by default in most AIX servers, and avoid consumer grade PATA/SATA disks. Use faster disks for the Active Logs too. Do not mix active logs with disks containing the DB, archive logs, or system files such as page or swap space. Slower disks for archive logs and failover archive logs can be used, if needed.
Use multiple database containers. For an average size DB, it is recommended to use at least 4 containers initially for the DB. Larger TSM servers, or TSM servers planning on using data deduplication, should have up to 8 containers or more. You should plan for growth with additional containers up front as adding containers later can result in an imbalance of IO and create hot spots.
Place each database container in a different filesystem. This improves performance; DB2 will stripe the database data across the various containers. Tivoli Storage Manager supports up to 128 containers for the DB. Ideally place each container on a different LUN, though this is not so important for high end disks like XIV or VMAX.
There should be a ratio of one database directory, array, or LUN for each inventory expiration process.
The block size for the DB varies depending on the tablespace, most are 16K, but a few are 32K. Segment/strip sizes on disk subsystems should be 64K or 128K.
If you use RAID, then define all your LUNs with the same size and type. Don't mix 4+1 RAID5 and 4+2 RAID6 together. RAID10 will outperform RAID for heavy write workloads, but costs twice as much. RAID1 is good for active logs.
Smaller capacity disks are better than larger ones if they have the same rotational speed. Have containers on disks that have the same capacity and IO characteristics. don't mix 10K and 15K drives for the DB containers.
Cache subsystem "readahead" is good to use for the active logs; it helps in archiving them faster. Disk subsystems detect readahead on a LUN by LUN basis. If you have multiple reads going against a LUN, then this detection fails. So several smaller LUNs are better than a few large ones, but too many LUNS can be harder to manage.
However it is very difficult to given generic rules about disk configuration as this very much depends on what type of disks you are using.
High end disk subsystems such as the EMC DMX, the HDS VSP and the IBM DS8000 have very large front end cache to speed up performance, and stripe data in a way that makes it difficult to separate data by physical spindle. The IBM XIV takes this virtualisation to a higher level again. To get the best performance from these devices you want enough LUNs to spread the IO and get readahead cahce benefit, but not so many that they become difficlult to manage. For the XIV, consider using a queue depth of 64 per HBA to get best advantage of the parallelism capabilities.
Don't stripe your data using logical volumes, let the hardware do the striping. As a rule of thumb, consider using 50GB volumes for DISK pools and 25GB volumes for file pools. Define the same number of volumes per LUN as the RAID type to make up the LUN, so for example with 4+1 RAID5, define 4*50GB volumes per LUN, then each LUN will use 250GB, with effective capacity of 200GB.
The Unix Tips section contains some detail on how to use VMSTAT and IOSTAT commands to investigate potential disk bottlenecks.
Using DB2 commands on a TSM server
IBM's design model for TSM v6 and upwards is to store TSM metadata in a DB2 database, without the TSM administrators needing to know anything about DB2 and how to manage it. It is now becoming obvious that a TSM Administrator does need to know a bit of DB2, but it would always be wise to contact IBM for advice before running base DB2 cvommands. If you have a test 'sandpit' system that you can try things out on, then here are a few DB2 commands that might be useful.
On Windows, you start a DB2 command line from Start -> All Programs -> IBMDB2 -> Command Line Tools.
AUTHORISING A USERID TO BE ABLE TO START THE TSM SERVER
The TSM DB2 system is 'owned' by the userid that installed it, and normally only that userid has the administration authority needed to manage the DB2 database, including the ability to start the TSM service. However you can give access to another userid using DB2 commands. Open up a command line as the TSM instance owner by right clicking on it and taking the 'run as' option. You will need the instance owner userid and password to do this. Once you have the command line, type the following commands
db2==> connect to tsmdb1 db2==> grant dbadm on database to user TSM_ADMIN
Userid TSM_ADMIN can now be used to stop and start the TSM services
Recovering from a full archive log
Under tsm 6.x, the archive and active log directories can fill up, and if they do, the server will shut down. To prevent this, you need to make sure you trigger a FULL database backup once the archive log hits a threshold, but if the worst happens and the log files do fill up, you need a recovery process.
If this happens then you cannot use TSM commands to move the logs into bigger directories, as you cannot start TSM. What you need to do is create temporary logs elsewhere, then prune the archive log using native DB2 commands. However, remember that the archive log will hold enough information to wind back through the last 2 full backups, so you need to run 2 full backups to clear it down.
- Create a temporary directory large enough to hold your active logs. The dsmserv.opt file may contain the log sizes in the ACTIVELOGSIZE parameter, and if not, it will point to the physical log location.
- Open a DB2 command line and run the commands below to switch the logs to a new location
Set db2instance=SERVER1 db2start db2 update db cfg for tsmdb1 using newlogpath path\to\new\logs db2stop db2start
- 'Activate' the database to copy the log files with the following command, this command does not affect the original logs. This may take a while, and success will be indicated when you see a command prompt again.
db2 activate db tsmdb1
- Now you need to back the database up to clear the logs out, and you need to do this on disk, so identify or create a directory with enough space to take a database backup then run the following DB2 commands.
db2stop db2start db2 backup db tsmdb1 to path\to\database\backup\directoryThe archive logs will start pruning once you see the 'Backup Successful' message, but this could take a while to appear if your database is large. Make a note of the backup timestamp, which will look something like 'The timestamp for this backup image is: 20120412130821'
- Find some more space, and run another full DB2 database backup with the command.
db2 backup db tsmdb1 to path\to\another\database\backup\directoryWhen this second backup completes, the archive log directory and original active log directory are empty of log files. Make a note of the backup timestamp again, let us call this one 20120412150425
- Now you need to delete the first backup using these commands - note how you use the timestamp from step 4.
db2stop db2start db2 connect to tsmdb1 db2 PRUNE HISTORY 20120412130821 WITH FORCE OPTION AND DELETE
- Point DB2 back to the original, empty active log in the original location, you will get this from the ACTIVELOGDIRECTORY parameter in dsmserv.opt
db2 UPDATE DATABASE CONFIG FOR TSMDB1 USING NEWLOGPATH path\to\activelogdir
- Connect to the database again, and that will automatically start moving the active logs from the temporary location to original active log location, and again, this can take a while if the logs are big.
db2 force application all db2stop db2start db2 connect to tsmdb1
- Now you need to start the TSM server up and run a good backup. You need to start the server in the foreground to do this, so open a normal Windows command line and navigate to the server directory and run dsmserv. If you have more than one TSM server on this machine, you may need to use the -k option to get the right server. This will bring you up a TSM server command line. Disable your client sessions then take 2 full database backups. You need to know your backup device classes to be able to do this.
Disable sessions Backup db type=full dev=your_db_devclass
- Delete the second DB2 database backup as follows, using the database timestamp that you recorded in step 5.
db2 PRUNE HISTORY 20120412150425 WITH FORCE OPTION AND DELETE
- Now you can halt your server in the foreground, and start it normally. Remember to enable sessions.
It is possible to query what is happening while a database recovery is in progress with the db2pd utility, a DB2 diagnostic tool that is provided with the TSM server installation code. You simply run this as a command from the shell prompt, like this:
tsm:~ # db2pd db2pd> You are running db2pd in interactive mode. db2pd> If you want command line mode, rerun db2pd with valid options. db2pd> Type -h or -help for help. db2pd> Type q to quit.
To check out what is happening with a database recovery, run
db2pd> -recovery -db tsmdb1
STARTING AND STOPPING AUTO RUNSTATS
Runstats is used to optimise access paths through the TSM tables and should normally be set to run automatically as required. However if runstats starts automatically when the TSM is started up after a database upgrade, it can cause performance problems to the extent that no-one can log into the system.
To temporarily suspend auto runstats, before halting the TSM server for an upgrade, submit the following commands to the DB2 instance that is associated with the TSM server:
db2 connect to tsmdb1 db2 update db cfg for TSMDB1 using AUTO_RUNSTATS OFF
Now runstats will not start automatically when you restart TSM server. However you need runstats to keep your database optimised, so once you are happy that your TSM server is up and running, submit the following commands to the DB2 instance for your TSM server and Runstats will resume normal processing.
db2 connect to tsmdb1 db2 update db cfg for TSMDB1 using AUTO_RUNSTATS ON
SOME OTHER POTENTIALLY USEFUL COMMANDS
You can enter any DB2 command from the DB2 command line, including SQL queries and commands that update or delete the database, so be careful. Some of the query commands could be useful for investigating TSM problems
- get instance - returns the name of the TSM server
- list active databases - will show the TSM database name as known to DB2, and the path to it.
- get dbm config - shows the settings for the database configuration manager
- db2start and db2stop - obvious what these are! They should never be necessary as DB2 should be started automatically as part of the server startup, but if necessary, it can be done manually
- get db cfg show detail - shows the confguration parameters for the database
- list tables - for the connected database
- describe table table_name - lists the columns for the specified table
The following commands require you to be logged in with adminstrator authority and connected to a database
Automatic Database reorganisation issues
TSM should automatically tune and manage its database. However this might not always work. One problem that you might see is that your database grows a lot faster than you might have expected, even though you are running expiration and the number of objects in the database is not growing by much. Apart from consuming space, this growth can also cause performance problems.
DB2 should have a feature called 'reclaimable storage' which according to IBM can 'consolidate in-use extents below the high water mark and return unused extents in your table space to the system for reuse'. A problem can arise if you originally installed TSM 6.1 as that used DB2 9.5 tablespaces and they do not have the reclaimable storage feature. If you then upgraded your TSM to higher levels, you will still be using DB2 9.5. You could get in touch with IBM and obtain the scripts that will allow you to upgrade DB2 to release 9.7. The issue is that the scripts must be run while the server is halted and generally require ten to thirty hours to run, and sometimes much longer.
If you installed TSM 6.2 then things are a bit better, as that used DB2 9.7, and that release did include reclaimable storage. However even then you might still not be able to reuse freed up space, as if your workload is high, the database reorg utility can conflict with your regular housekeeping. Large deduplicated storage pools can be an especial problem.
How does TSM reclaimable storage work?
TSM checks via DB2 to see which tables and indexes need to be reorganised, then it works through the list and processes the tables one at a time. It can take a long time to reorganise a large table, possibly a month or more. The process will be paused at the end of the daily reorg window, then restarted when the next daily window starts. When each table reorg is complete, TSM will run a RUNSTATS agains the table to optimise its performance. Once all the tables and indexes are reorganised, TSM will then wait at least 20 days before it starts the process again.
Database reorganisation cannot run alongside database backup. If a backup is running, the reorg process will not start. If the reorg process is running, then every so often it will pause and allow a backup to run, if one is required or scheduled.
How do I know if TSM reclaimable storage is running?
The easiest way is to check the server log for messages
ANR0293I and ANR0294I are issued when a table reorg starts and completes
ANR0317I and ANR0318I are issued when a table index reorg starts and completes
ANR0336I and ANR0337I are issued when a RUNSTATS starts and completes, when RUNSTATS is triggered at the end of a table or index reorg.
A more direct way is to run the DB2 reorgchk command. Open a DB2 command window and type
db2 reorgchk current statistics on table all >db2reorgchk.out
What control do I have over what reclaimable storage does?
In general, the more current your level of TSM, the better control you have. Some of the options below might not be valid for older releases of TSM.
The default position is that reorganisation can run 24 hours a day, but you can use options REORGBEGINTIME hh:mm and REORGDURATION nn to control when reorganisation starts, and how long it runs for. The 'nn' parameter is hours. You should aim to schedule reorg activity at a time when server utilisation is low. Things to avoid are intense backup and archive processing, and expiration, migration, or reclamation.
ALLOWREORGTABLE and ALLOWREORGINDEX can be set to 'NO' or 'YES'. The default values are YES for Table reorg and NO for Index reorg. Index reorg really needs to run at a quiet time, and if you enable it you must set the server option DB_DB2_KEEPTABLELOCK NO as it can cause database deadlocks (this is the default for TSM v6.3 onwards). Also, unlike Table reorg, Index reorg cannot be paused but if necessary it can be cancelled from a DB2 command line. I suggest you discuss options with IBM before enabling ALLOWREORGINDEX and DB_DB2_KEEPTABLELOCK NO as there are several factors to consider.
Three other options are DISABLEREORGTable, DISABLEREORGIndex and DISABLEREORGCleanupindex. These options can be used to prevent reorganisation from running against problem tables. The first 2 options take a list of tables as a parameter, while DISABLEREORGCleanupindex will only accept BF_BITFILE_EXTENTS.
Investigating Problems with the Server Instance
The first place to start is the TSM Active log, but if you need to go deeper, then the DB2 logs can be useful. However finding those logs can be a challenge as the location can depend on the the OS platform or even the OS release level. The best way to be sure you have the correct log is to check the DIAGPATH variable in DB2.
Start up a DB2 command line, in Windows go to Start->Programs->IBM DB2->Command Line tools->Command Window and in UNIX, su - db2inst1 (db2inst1 is the default instance, if you change the instance name or have multiple instances, you need to su to the correct userid for your instance). You then type 'db2' to open the DB2 command line
From the db2 command line type db2=> get dbm cfg. The command produces a lot of output, look for the line like 'Diagnostic data directory path (DIAGPATH) = /home/E1WT1/e1wt1/sqllib/db2dump ' and this shows the path to the log files. If the DIAGPATH is blank, look for the default PATH directory instead.
'quit' gets you out of that DB2 command line
The DB2diag.log contains information like database backups, table reorganizations, memory management messages, start and stop of TSM server and hardware information logged at instance start time, as well as error and warning messages.
Sometimes when investigation TSM server problems, the DB2 terminology does not quite match TSM so the error messages in the DB2 logs can look a bit strange. For example, Tivoli Storage Manager refers to 'transactions' which DB2 calls to 'units of work' (UOW). Tivoli Storage Manager uses select statements where DB2 uses SQL and are also sometimes referred to as DML, or data manipulation language statements.
Another potentially useful file is the startup trace log dsmupgdx.trc, which is located in c:\program files\tivoli\tsm for Windows or /opt/tivoli/tsm/server for UNIX and Linux servers. If you get database startup problems it's alwayus worth checking the file to see if any useful error messages exist.
Some TSM errors messages are very generic and more investigation is needed to pinpoint the problem. An example is 'ANR8503E A failure occurred in writing to volume'. To investigate this you need to enable a TSM server trace. To do this, enter the following commands from the TSM server command line
trace disable * trace enable pvr addmsg trace begin path/name/trace.out redo the activity that created the error message trace flush trace end trace disable *
The open the trace file and look at the messages that were written out just before the error.
You might sometimes see an error message, 'ANR2981E Database backup terminated due to environment or setup issue related to file access - DB2 sqlcode -2033 sqlerrmc 106.' The messages in the DB2 log file are a bit confusing, and will be something like
MESSAGE : SQL2033N An error occurred while accessing TSM during the processing of a database utility. TSM reason code: "". FUNCTION: DB2 UDB, database utilities, sqlubDeleteVendorImage, probe:558 MESSAGE : SQL2062N An error occurred while accessing media "". Reason code: "". DATA #1 : String, 27 bytes Error returned by sqluvdel. DATA #2 : Vendor RC, PD_DB2_TYPE_VENDOR_RC, 4 bytes Vendor RC=0x0000001A=26 -- see DB2 API Guide for meaning. DATA #3 : Hexdump, 48 bytes 0x0A00060137109070 : 0000 006A 3133 3639 2031 3036 0000 0000 ...j1369 106.... 0x0A00060137109080 : 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0x0A00060137109090 : 0000 0000 0000 0000 0000 0000 0000 0000 ................
A possible cause is that the DSMI variables are not defined correctly in the DB2 user profile. This is found in /sqllib/userprofile in UNIX/Linux systems. The script variables should be set as
export DSMI_CONFIG=/home/tsminst1/tsminst1/tsmdbmgr.opt export DSMI_DIR=/usr/tivoli/tsm/client/api/bin64 export DSMI_LOG=/home/tsminst1/tsminst1
Sometimes after a problem, DB2 goes into crash recovery mode and the TSM server appears to be hung. DB2 crash recovery can take a long time. If your TSM server is hung, check your DB2 db2diag.log for a message like
MESSAGE : DIA2051W Forward phase of crash recovery has completed. Next LSN is "0000259AD0276FD8".
Then log into DB2 as the instance owner and run the command 'db2 list utilities show detail'. If Crash Recovery is active, the message will tell you how much work it needs to do, so you can estimate how much longer it will take. Some of the sample output of this command is shown below.
Type = CRASH RECOVERY Database Name = TSMDB1 ... Estimated Percentage Complete = 100 Phase Number = 1 ... Phase Number [Current] = 2 Total Work = 25937697602 bytes Completed Work = 25769328527 bytes Start Time = 10/17/2013 11:36:29.368810
Errors on Database Startup
Your first point of call should be to raise an incident with TSM support and get advice on your problem. However the following might help.
When trying to start up TSM the following error message can appear "ANR0170E - Error detected, database restart required", and you may see errors in the actlog a bit like
ANR0171I dbieval.c(874): Error detected on 3:2, database in evaluation mode.
ANR0170E dbieval.c(935): Error detected on 3:2, database restart required.
ANR0162W Supplemental database diagnostic information: -1:58031:-1034
([IBM][CLI Driver] SQL1034C The database is damaged. The application has been disconnected from the database. All applications processing the database have been stopped.
The resolution is to restart DB2 manually with the RESTART command. Open up a DB2 command line window as explained above then issue the following
set db2instance=db2inst1 (this is the default instance) db2 force application all db2stop db2 restart database db2inst1
You might have to run the restart command a few times before the issue is resolved. If this does not fix the problem you probably need to contact IBM Support, although you can use the db2dart command to run a database analysis. This generates a report file that would be useful for IBM support.
db2 force application all db2stop db2dart db2inst1 /db
Another possible message is 'ANR0151W Database manager fails to start'. This problem can occur after a database crash, maybe due to a hardware issue. If the database will not start up with the dsmserv command and you see errors like
ANR0151W Database manager fails to start. Improper shutdown of instance produced orphaned IPC resources And you see entries like this in the db2diag.log I779399E357 LEVEL: Severe PID : 18375 TID : 140737134171936PROC : db2star2 INSTANCE: tsmserv NODE : 000 FUNCTION: DB2 UDB, base sys utilities, DB2StartMain, probe:580 MESSAGE : ECF=0x900003DE=-1879047202=ECF_TRCAPI_MUTEX_LOCK_ERROR Trace facility mutex unexpected lock error
The problem is that some Inter Process Communication (IPC) tasks are still active and are locking out resources. You need to cancel the DB2 instance then remove the IPC resoures. Log on with the Tivoli Storage Manager server instance user ID and run the following command to stop the DB2 instance
db2stop or if necessary, db2_kill
Check to see if there are outstanding IPC resources that are associated with the instance
Clean up any IPC resources that are associated with the instance
-bash-3.1$ ipclean ipclean: Removing DB2 engine and client's IPC resources for tsmserv.
Verify whether there are outstanding IPC resources that were not removed by the ipclean command.
If there are outstanding IPC resources associated to the instance, use the following commands to remove them
To remove a shared memory segment resource ipcrm -m shmid To remove a semaphore resource ipcrm -s semid To remove a message queue resource ipcrm -q msgid
Once all outstanding IPC resources have been removed, you should be able to restart the Tivoli Storage Manager server.
On Windows, a TSM server might fail to start after an upgrade with the error
ANR0188E Fail to start the database manager. Database manager license not found
The DB2 License has probably expired. To check this, log onto windows with your TSM Server instance userid, open a command prompt and type 'DB2CLP'. This will open another DB2CLP command window, and from this window enter the command 'db2licm -l'. If the licence has expired you will see
Product name: "DB2 Enterprise Server Edition" Expiry date: "Expired" Product identifier: "db2ese" Version information: "x.x"
The DB2 license file is called db2ese.lic and was shipped with your TSM server package. Its location will depend on where you extracted the server package, but assuming you extracted to a directory d:\tsm_extract\, you will find it in the following directory, where DB2_x.x corresponds to your DB2 version number.
Navigate your command line to that directory, then you can register the DB2 license again with the following command
db2licm -a db2ese.lic LIC1402I License added successfully.
Run the 'db2licm -l' command again and now you should see
Product name: "DB2 Enterprise Server Edition" License type: "Restricted" Expiry date: "Permanent" Product identifier: "db2ese" Version information: "x.x"
and you should be able to start your TSM Server instance normally.
AIX Maximum Number of Processes
You may see a database backup failing on an AIX server with an error like 'ANR2968E Database backup terminated. DB2 sqlcode: -2033. DB2 sqlerrmc: 292' If this error is not corrected the recovery log will fill up and crash the server. You may also see a message like 'Insufficient AIX system resource' in the db2diag log file.
The API error code 192 means that the API was unable to 'fork' or create a process to do its database backup. AIX has a parameter called maxuproc which limits the maximum number of processes that a user is allowed per user, and this value should be increased.
To see what value is set, use the command
lsattr -l sys0 -E | grep maxuproc
and to change the value use the command below, selecting a value that is suitable for your server.
chdev -l sys0 -a maxuproc='2048'
Effect of Deduplication on Database size
Deduplication will save a lot of backend storage, but it does this at the expense of increasing the size of the TSM database because the TSM database has to store and track the metadata that is required to manage the deduplication. The exact amount of extra space required is difficult to calculate up front, as it depends on your average 'deduplication chunk size' and this will vary depending on how well your data deduplicates. IBM suggests an typical chunk size of 100,000 bytes, and provides some scripts that you can run to measure your exact average chunk size once you have deduplication working. Each chunk needs 490 bytes of metadata to describe the data in the primary pool, and another 190 bytes for the data in each copy pool.
A starting point is to estimate your database size without deduplication, and to do this you use the formula
db_size = file_count * number_of_backup_copies * 200
To give you an idea of how many backup files exist, you can find the number of backup files that you are holding with the following SQL query on an existing server
select sum(cast(num_files as bigint)) from occupancy - where node_name is not null - and filespace_id is not null
To calculate the deduplication overhead, use the formula below to get the number of chunks
chunk_count = total_backedup_data_in_GB * 10,000 * 2
The doubling factor at the end of the formula is to cater for 'base deduplication chunks' that is, chunks that must remain even after a file is expired and deleted from TSM. The extra database overhead is then
chunk_count * (490 + 190 * extra_backup_copies)
Running this formula on an existing server with a 135GB database predicted an increase of 105GB with deduplication, which is not a trivial amount.
Moving a TSM database hosted on Windows
If you run your TSM server under Windows, then by default, your TSM database will be installed on the C: This can be an issue as the C: is usually reserved for system data, and system administrators prefer that your application data be stored on a different drive. It is possible to move the database using the DB2 utility 'db2relocatedb'. However db2relocatedb will not actually move your database, you have to do that yourself. What it does is update the pointers within DB2 so it knows where the database has been moved to.
First, just confirm that the database is allocated on the C:. Run the 'db2cmd' command from a Windows command line to open a DB2 command line, then run 'db2 list database directory' and within the command listing you should see a line
Local database directory = C:
Now, just incase something goes wrong, run a full TSM database backup
Stop the TSM Server with the HALT command. This should stop DB2 as well, but check and make sure it is down.
Copy your data to the new drive with the Windows copy command. Assuming you are going to the G:, this would be
copy c:\server1\* g:\server1\*
Yes, you could move it, but if you use copy, the original data is still on the c: incase anything unforseen happens - nothing wrong with having 2 backout plans! Take a look at the copy, and make sure the data and directory structure is as you expected
The db2relocatedb utility needs a configuration file to tell it what you are moving. The configuration file would normally be created in the same directory as db2relocateddb runs from and is usually called reloc.cfg. However you can create the file anywhere, in which case you need to specify the absolute path in the command example below.
Assuming you are using the default TSM database name and database instance then in our case, the config file will contain
DB_NAME=TSMDB1 DB_PATH=C:,G: INSTANCE=SERVER1
Create this file then run the db2relocatedb command pointing it to the configuration file location.
db2relocatedb -f reloc.cfg
Now start your TSM server up again and check that all is well. Rerun that 'db2 list database directory' command and check that the server is actually running from the g: Once you are certain all is well you can deleted the data off the c: