The basic SQL syntax is
SELECT ALL | DISTINCT |columnname(,columnname)
FROM tablename(, tablename)
WHERE 'column selection conditions'
GROUP BY columnname
HAVING 'row selection conditions'
ORDER BY columnname ASC| DESC
The standard SQL operator, '=', is case sensitive when comparing strings. You can use SELECT DISTINCT, to eliminate duplicate records, and get data from more than one table using the JOIN operator. You can also combine WHERE statements using the AND operator.
The first thing you need, is to know what TSM tables are available to query. These are described in the next section. After that, the best way to learn SQL is to try it out. It's a read only query language! The examples below might help explain what the syntax means, otherwise, try the following sites.
TSM SQL query command does not support the full SQL language syntax. The following operations will not work:
UNION
INTERSECT
EXCEPT
subqueries that return multiple values
You cannot use a semicolon as a command terminator
You can do maths in SQL statements, for example
SELECT AVG (total_mb/1024) -
AS "Average Total GB" -
FROM auditocc
You can select several columns, or items from a table by separating them with
commas, like
select platform_name,count from nodes
or you can join items together either of the two examples below will work.
select concat(FILESPACE_NAME,HL_NAME, LL_NAME) from backups
select filespace_name || hl_name || ll_name from backups
You can combine two tables together and select columns from each like this
SELECT nodes.domain_name,summary.activity FROM nodes, summary
or you can simplify the expression by giving the tables an alias
SELECT nn.domain_name,ss.activity FROM nodes nn, summary ss
Note that the aliases have two characters. For some reason TSM does not always like a single character. It seems to really object if you abbreviate 'summary' to 's'.
If you invoke SQL from a script then it may ask for confirmation to proceed, for example if it may check that you are happy to process a lot of output. You can suppress the confirmation messages with the option -noconfirm
To find out what TSM tables exist and what they contain, run the following queries
select * from syscat.tables
select * from syscat.columns
select * from syscat.enumtypes
The SUMMARY table contains a lot of useful entries for general statistics. A couple of fields are SUMMARY.ACTIVITY AND SUMMARY.SUCCESFUL. The activity field currently contains; 'TAPE MOUNT', 'FULL_DBBACKUP', 'INCR_DBBACKUP', 'EXPIRATION', 'RECLAMATION', 'MIGRATION', 'MOVE DATA', 'STGPOOL BACKUP', 'BACKUP', 'RESTORE', 'ARCHIVE' and 'RETRIEVE'. The successful field can be 'YES' or 'NO'. However you cannot rely on the summary table to report on the success of client events like backup and restore as it just reports on progress so far.
The DATE field in the EVENTS table does not support expressions like 'scheduled_start >= current_timestamp-24 hours'. If you issues this query at 14:00 it will return all events that started after midnight today, but not those between 14:00 and midnight yesterday. You can get the correct results by combining the relative time stamp with a constant timestamp, for example if yesterday was the 25th March then this will work.
scheduled_start > '2006-03-25' and scheduled_start >= current_timestamp-24 hours
This is no good if you want to schedule a query, but in that case you can simply use a very early fixed timestamp date, for example
select node_name, schedule_name, scheduled_start, status -
from events -
where scheduled_start >= '1900-01-01' and -
scheduled_start >= current_timestamp - 24 hours
If you enter SQL queries from the command line in the browser, you get the results in tabular format. It is possible to execute SQL from a host command line, and then you can pipe the command to a file and get the results in comma delimited format for importing to an Excel spreadsheet or similar.
The command is
Command output direction can be a bit complicated as it works differently for different operating systems. In general the '>' symbol will direct output to a file, but it is also a valid SQL mathematical operator. If the '>' symbol has spaces on both sides of it it will be considered as output redirection. If it has no space on either side, it will be considered as mathematical greater than.
So for example
select * from summary > summary.txt will direct lots of output text to a summary file, while
select * from summary where date>current_timestamp - 24 hours will look for events that happened today. Of course you can combine these as
select * from summary where date>current_timestamp - 24 hours > summary.out
If you run these commands in batch, the operating system might try to interpret the redirection command as greater than even if it is surrounded by spaces. In UNIX and LINUX you can put a slash before the command /> but the easier way is to put the whole command in quotes. "select * from summary where date>current_timestamp - 24 hours > summary.out"
TSM SQL queries can run for a long time, and use up a lot of resource. This
is usually because you are searching the whole database to get the data
you want. You can reduce the amount of database searching by selecting
specific data from an indexed column using a WHERE statement. To find
out which columns are indexed, use the query
select * from syscat.columns
A partial result looks like
TABSCHEMA: ADSM
TABNAME: MEDIA
COLNAME: LRD
COLNO: 9
INDEX_KEYSEQ:
INDEX_ORDER:
TYPENAME: TIMESTAMP
LENGTH: 0
SCALE: 0
NULLS: TRUE
REMARKS: Last Reference Date
TABSCHEMA: ADSM
TABNAME: MGMTCLASSES
COLNAME: DOMAIN_NAME
COLNO: 1
INDEX_KEYSEQ: 1
INDEX_ORDER: A
TYPENAME: VARCHAR
LENGTH: 30
SCALE: 0
NULLS: FALSE
REMARKS: Policy Domain Name
This tells you that the DOMAIN_NAME column in the MGMTCLASSES table is indexed, but the LRD column in the MEDIA table is not. So if you run a query like
SELECT * FROM MGMTCLASSES -
WHERE DOMAIN_NAME = 'DO_TDP'
'yyyy-mm-dd hh:mm:ss.nnnnnn'
yyyy = year
mm = month
dd = day
hh = hours
mm = minutes
ss = seconds
nnnnnn = fraction of a second
'ss' and 'nnnnnn' are both optional. When referring to a timestamp, put it in single quotes, for example to select records that started after 12:00 on July 21st you would specify start_time <= '2005-07-21 12:00:00'.
You can split the time stamp using a date function or a time function. For example to select records that began on July 21st, use date(start_time) >= '2005-07-21'.
If you just want records that started after 21:00, you would add time(start_time) <= '12:00:00'.
It is possible to combine two TSM tables in one query, but be aware that the TSM database is not really relational, so table joins take ages and use a lot of resource. It may be faster to do two queries, copy out the results then combine the data with an external program.
The key to database joins is
Both tables must have one column that contains the same data
You must give each table an alias name for reference purposes using 'table name alias' in the FROM statement
You select the common column in both tables using aliasname.table name
You then join the data with WHERE alias1.column2=alias2.column2
For example
SELECT pct_utilized, node_name, -
vm.volume_name, vu.volume_name -
FROM volumes vm,volumeusage vu -
WHERE node_name='NODE01' -
AND vm.volume_name=vu.volume_name
This combines the percent volume utilised column from the volumes table with the nodename column in the volumeusage table, combined with the volume column from each. Be aware that this is a really CPU intensive query.
How can I find out which volume contains a specific file?
select volume_name,node_name,filespace_name,file_name -
from contents -
where node_name='nodename' -
and filespace_name='filespace' -
and file_name='filename'
select count(*) as Scratch_count -
from libvolumes -
where status='Scratch'
If you have more than 1 library, you can find all your scratch tapes using the query
select LIBRARY_NAME,count(*)'scratches' from libvolumes where -
upper(status)='SCRATCH' group by LIBRARY_NAME
Thanks to Sven Neirynck of Compu-mark for that tip
TSM has a MAXSCRATCH parameter which is set independently for each storage pool. This defines the maximum number of tapes that each tape pool can contain. The following query will display how close each pool is to its limit.
SELECT STGPOOLS.STGPOOL_NAME, STGPOOLS.MAXSCRATCH, -
Count(STGPOOLS.MAXSCRATCH) as "Allocated_SCRATCH", -
STGPOOLS.MAXSCRATCH-count(STGPOOLS.MAXSCRATCH) as "Remaining_SCRATCH" -
FROM STGPOOLS,VOLUMES -
WHERE (VOLUMES.STGPOOL_NAME = STGPOOLS.STGPOOL_NAME) -
AND ((STGPOOLS.DEVCLASS="3590_CLASS")) -
GROUP BY STGPOOLS.STGPOOL_NAME, STGPOOLS.MAXSCRATCH
How many tapes can I reclaim by changing the reclamation threshold?
select count(*)from volumes -
where stgpool_name='poolname' -
and upper(status)='FULL' -
and pct_utilized <%%
poolname is the name of your tape storage pool, remember the name is case sensitive. Change %% to the level you could set the reclaim threshold to, the result is
As originally requested by Jose Barzaga then improved by Brian Bisson, the following query will list the number of tapes used by each node, by storage pool. The query is a bit resource intensive.
select count(DISTINCT volume_name) as volumes, node_name, stgpool_name -
from volumeusage -
group by node_name, stgpool_name -
order by volumes desc
If a tape is shared by more than one node, either because you do not run collocation, or because you ran out of tapes and TSM switched to partial colocation, then the query will count tapes more than once.
What was the times and data transferred for yesterday?
This select query will get the amount of data backed up in the previous 24 hours, along with the start and end times of the backups.
SELECT entity AS "Node name", -
CAST(sum(bytes/1024/1024) AS decimal(8,2)) AS "MB xfer", -
SUBSTR (CAST(min (start_time) AS char(29)),1,10) AS "start date", -
SUBSTR (CAST(min (start_time) AS char(29)),12,8) AS "start time", -
SUBSTR (CAST(max (end_time) AS char(29)),1,10) AS "end date", -
SUBSTR (CAST(max (end_time) AS char(29)),12,8) AS "end time" -
FROM summary -
WHERE activity='BACKUP' AND start_time>=current_timestamp - 24 hours -
GROUP BY entity
How much data is stored for each node by copy type?
This query assumes that you have three types of storage pools, one called with 'TAPEPOOL' which is used for standard backups, one called 'ARCHPOOL' which is used for Archives and one called 'TDPPOOL' which is used for SQL TDP backups. This query will summarise the amount of backup space used by each node in each pool. This is not the same as the query above, which reports on space usage by filespace.
select node_name as NODENAME, -
sum(case when substr(stgpool_name,1,3) in ('ARC') -
then logical_mb else 0 end) as ARC_OCC, -
sum(case when substr(stgpool_name,1,3) in ('TAP') -
then logical_mb else 0 end) as BCK_OCC, -
sum(case when substr(stgpool_name,1,3) in ('TDP') -
then logical_mb else 0 end) as TDP_OCC -
from occupancy group by node_name
select Entity,Successful,Bytes,Examined,Affected,Failed -
from summary -
where activity='BACKUP' -
and cast((current_timestamp-start_time)hours -
as decimal(8,0)) < 24 -
order by Entity'
To convert this to an exception report, add
and failed>0
at the end of the 'where' statement
If you want to know all the amount of data processed by all events, try
select nodes.domain_name,summary.activity, -
sum(cast(summary.bytes/1024/1024/1024 as decimal(6,2))) as GB -
from nodes, summary -
where (end_time between current_timestamp - 24 hours and current_timestamp) -
and (activity='BACKUP' or activity='RESTORE' -
or activity='ARCHIVE' or activity='RETRIEVE') -
and ((nodes.node_name=summary.entity)) -
group by domain_name,summary.activity -
order by activity,domain_name asc
Query the number of Volumes in use, and available by Device Class
This query will find every storage pool that has a device class of 3590_class, and return the storage pool name, the maxscratch value for the storage pool and how many volumes are in that pool.
SELECT a.stgpool_name,a.maxscratch,count(*) AS Volumes -
FROM stgpools a, volumes b -
WHERE a.stgpool_name = b.stgpool_name and a.devclass = '3590_CLASS' -
GROUP BY a.stgpool_name,a.maxscratch
Total data stored and tapes used per node in all storage pools
This select will show NODE_NAME, TOTAL_MB that is amount of data stored in TSM for this node, TAPES that is the amount of tapes that contain data of this node (of any storage pool), and AVG MB/tape that is the average of MB by tape (TOTAL_MB divided by number of tapes with node data). It's sorted by worst data distribution. This query will pickup data stored on any storage pool, including data on a disk pool pending migration. That can skew the results.
select vu.node_name, ao.total_mb, count(distinct vu.volume_name) -
as tapes, ao.total_mb/count(distinct vu.volume_name) -
as "AVG MB/tape" from volumeusage vu, auditocc ao -
where vu.node_name=ao.node_name -
group by vu.node_name, ao.total_mb order by 4
Typical output
NODE_NAME TOTAL_MB TAPES AVG MB/tape
--------------- --------- --------- -----------
DEC_XL34RT2B 3394 207 16
XLF3LV02 88796 2 44398
XLFFAF01 51080 1 51080
XLF3AF02 544846 9 60538
Total data stored and tapes used per node in one storage pool
This select will show NODE_NAME, TOTAL_MB that is amount of data stored in TSM for this node, TAPES that is the amount of tapes that contain data of this node located in the specified storage pool, and AVG MB/tape that is the average of MB by tape (TOTAL_MB divided by number of tapes with node data). It's sorted by worst data distribution.
select vu.node_name, ao.total_mb, count(distinct vu.volume_name) -
as tapes, ao.total_mb/count(distinct vu.volume_name) -
as "AVG MB/tape" from volumeusage vu, auditocc ao -
where vu.stgpool_name='YOUR_POOL_NAME' and vu.node_name=ao.node_name -
group by vu.node_name, ao.total_mb order by 4
typical output
NODE_NAME TOTAL_MB TAPES AVG MB/tape
------------------ ----------- ------- -----------
DEC_XL34RT2B 0 1 0
NODEL81 92 1 92
NODEL265 294 1 294
XLF3AF03 119524 2 59762
XLF3AF02 544846 9 60538
This select statement will create a script that will move data from low utilized tapes. This process is used to supplement Reclamation as it does not look at expired data and it is multi-streaming. The percent utilized is adjustable.
select 'move data ',volume_name, ' wait=yes', status -
from volumes where stgpool_name='pool_name' -
and pct_utilized>0 and pct_utilized>25 and access='OFFSITE'
typical output
Unnamed[1] VOLUME_NAME Unnamed[3] STATUS
---------- --------------- ---------- ------
move data DZ1963 wait=yes FULL
move data DZ2070 wait=yes FULL
move data DZ2653 wait=yes FULL