What is BMR?
Bare metal recovery (BMR) is the art of rebuilding a machine starting from an empty hard drive, or 'bare metal'. In a commercial environment, rebuilding an operating system by hand takes too long and requires skilled staff. Even if you have a record of all the information needed to build a server, including copies of all the applications that were loaded on it, a manual rebuild can still take days.
There are ways to make this easier. If a server is truly business critical, it may be worth investing in a hot standby (a spare server with all the software loaded and ready to go) to minimise the business impact of a failure. Where it does not make economic sense to keep a farm of spare servers sitting idle, BMR tools can simplify the rebuild instead.
Cristie Bare Machine Recovery (CBMR) can automate a basic system rebuild on Windows, Linux, AIX, Solaris and HP-UX servers. It consists of the following components:
- A backup application that is used to copy critical system files.
- A bootable DR service, supplied on CD or on a network share, which includes Cristie's own file system tools.
- A set of server configuration files that will be used to rebuild the server.
CBMR can be managed from a GUI or a CLI. The CLI lets you script recoveries to minimise manual intervention, and several recoveries can be scheduled to run in parallel.
In a disaster, you boot the server from the bootable DR service, and CBMR partitions and formats the hard disks. It then restores the system drive and all the application data, unless a recovery of the bare O/S alone has been selected. The restored data can come either from a CD-ROM or from a TSM filespace. Finally, you reboot the server. The whole process can be automated with a script. Recovery to dissimilar hardware is supported on Windows and Linux.
TBMR (CBMR with TSM)
TBMR supports Windows, Linux, AIX and HP-UX. It acts as a plug-in to TSM, leveraging regular TSM backups to provide complete disaster recovery protection. You do not need to run separate TBMR backups, because TBMR uses the existing node backup to protect the server.
Install TBMR on the file server. This is a typical point-and-click process.
Create a TBMR client node on the TSM server. Each system should be backed up to a separate dedicated node to avoid a single point of failure.
On installation, TBMR runs 'tbmrcfg.exe' (or 'tbmrcfg' in the Linux version), which saves system configuration information to a folder named TBMRCFG in the root of the system drive. This configuration information is backed up as part of the system's normal TSM backup routine. To make sure changes to the system are recorded regularly, run tbmrcfg.exe or tbmrcfg as a pre-schedule command before the regular TSM backups.
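The pre-schedule step can be wrapped in a small script that also checks the capture actually happened. This is a minimal Python sketch under stated assumptions: the function name `capture_config`, the freshness check on the saved folder and the `/TBMRCFG` path in the usage comment are all hypothetical and not part of TBMR, and it assumes tbmrcfg signals failure with a non-zero exit code. A script like this could then be named in the TSM client's PRESCHEDULECMD option so that a failed capture is visible to the scheduler.

```python
import subprocess
import time
from pathlib import Path


def capture_config(command: list[str], cfg_dir: Path) -> int:
    """Run the configuration-capture command and report the result.

    Returns the command's own exit code if it failed, 1 if it succeeded but
    left no freshly written file under cfg_dir, and 0 otherwise.
    (Hypothetical helper: not part of TBMR itself.)
    """
    # Allow a small margin: some file systems record modification times with
    # coarse (up to 2-second) granularity, so a file written straight after
    # this point could otherwise look older than the start of the run.
    started = time.time() - 2
    result = subprocess.run(command)
    if result.returncode != 0:
        return result.returncode  # pass the tool's own failure code through
    fresh = [
        p for p in cfg_dir.rglob("*")
        if p.is_file() and p.stat().st_mtime >= started
    ]
    return 0 if fresh else 1


# Hypothetical usage from a pre-schedule script (paths are assumptions):
#   sys.exit(capture_config(["tbmrcfg"], Path("/TBMRCFG")))
```

Checking for freshly written files guards against a capture that exits cleanly but saves nothing, which would otherwise leave a stale configuration in the next backup.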
System Recovery Process
On recovery, the TBMR DR media is used to boot the recovery server. The software then connects to the TSM client node and retrieves the configuration information. Using this data, TBMR partitions and formats the disks ready for recovery. System, boot and application data are then recovered from TSM. If required, the user can also inject drivers to enable recovery to dissimilar hardware (this is fully automated on Linux).
As a useful extra, TBMR can also be used to migrate servers onto new hardware.
VERITAS Bare Metal Restore (VBMR) supports Windows and UNIX clients and is tightly coupled with VERITAS NetBackup, which stores the backup data. VBMR requires the following logical servers.
VBMR Main Server
The main server is used to manage the BMR processes and to hold central BMR data. It must be installed on a supported UNIX server. You run the administration GUI on the main VBMR server.
VBMR Boot Server
A boot server is required for UNIX clients but not for Windows clients. It contains the boot images for UNIX clients and must run on the same UNIX platform as those clients, that is, a boot server for AIX clients must be hosted on an AIX machine.
VBMR File Server
A file server holds the files (called shared resource trees, or SRTs, described below) that VBMR uses to recover the clients. UNIX file servers must run on the same UNIX platform as their clients; in fact, Solaris and HP-UX file servers must run on the same physical machine as the boot servers. File servers for Windows clients can run on any platform.
As well as the BMR software, VBMR clients must also run the NetBackup client software.
Shared Resource Tree
An SRT provides a temporary operating system that is used to recover the data needed to rebuild the real operating system. The format of the SRT depends on the type of client that you are protecting; for UNIX clients it is either a network-based or a CD-based root file system.
Taking VBMR Backups
Once you have installed and configured VBMR, you simply run your normal NetBackup backups, and these invoke an extra process called 'bmrsaveconfig'. This records the current system configuration, including the disk layouts and TCP/IP settings. You can also run bmrsaveconfig manually to capture the configuration immediately after you change it.
Restoring with VBMR
Restoration is a simple process. The VBMR GUI runs in a browser window. You select the server you want to recover from a list of registered clients, select a configuration for that client, select an SRT to use for the restore, then click the 'prepare for restore' button. Once VBMR indicates that the restore is ready, you click OK at the prompt and reboot the client. VBMR then rebuilds the client to your specification.
Mainframes need bare metal recovery too, though there it is usually called Stand Alone Recovery (SAR). You may find that all your disk data is trashed and you need to build a z/OS system from scratch. The processes below use FDRSAR from Innovation DP and DFDSS SAR from IBM.
If you use a disaster recovery service such as SunGard or Iron Mountain, they will supply a running starter system for you. Otherwise, the basic principle behind SAR is that you IPL from tape: you mount a SAR tape in a tape drive and specify that drive's address as the load address when you IPL. You then use the SAR program on the tape to recover the set of system volumes required for an IPL. You communicate with the SAR program through a very simple command menu displayed on the system console.
Recovering volumes with SAR is tedious and time consuming. A better approach is to set up a minimal special system that contains just enough z/OS to IPL from, with some network connectivity, your recovery software and some sample JCL libraries. All of that can fit onto three or four disks, which you can back up to a single large-capacity tape. Store your SAR tape and your system tape in a secure location, and make sure you refresh them every time you upgrade z/OS or your backup software.
Check your documentation for full details on using SAR. You need to know where on tape each of your required disk backups is, including the tape number and the label (file sequence) number, and you also need a list of the target UCB addresses to recover to. Once you have recovered your small system with SAR, you can IPL from the recovered SYSRES volume.
At this point you have a very limited z/OS system up and running, with access to TSO and ISPF, but no record of any backups or tapes. Next you need to recover your backup catalog and your tape management catalog; after that, recovery should be reasonably easy.
It should now be clear that you need a tape catalog listing that you can access independently of your mainframe, especially if you use a product like DFDSS that has no native catalog facility. You can either print out your tape catalog every day and store it with your offsite tapes, or file-transfer a catalog listing to a remote file server. A more sophisticated approach is to write a program that extracts just the backups of the disks you need, works out their target addresses, and copies this list to a safe location.
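The program described in the last sentence can be sketched in Python. Everything here is an assumption: real catalog listings from DFSMShsm, FDR or a tape management system have their own layouts, so the CSV columns (volser, date, tape, file_seq), the volume serials, the UCB addresses and the names `build_restore_plan` and `RestoreStep` are all hypothetical. The sketch keeps the newest backup of each volume needed to IPL, pairs it with its target UCB, and refuses to produce a plan if any required volume has no backup, giving the tape number, label and target address that SAR asks for.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class RestoreStep:
    volser: str      # disk volume serial to recover
    tape: str        # tape volume serial that holds the backup
    file_seq: int    # file sequence (label) number on that tape
    target_ucb: str  # device address to restore the volume onto


def build_restore_plan(listing: str, targets: dict[str, str]) -> list[RestoreStep]:
    """Pick the newest backup of each required volume from a catalog listing.

    listing: CSV text with the assumed columns volser,date,tape,file_seq,
             where date is in ISO format (YYYY-MM-DD).
    targets: disk volser -> target UCB address for every volume needed to IPL.
    """
    newest: dict[str, dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(listing)):
        vol = row["volser"]
        if vol not in targets:
            continue  # not needed for the minimal system
        # ISO dates sort correctly as plain strings.
        if vol not in newest or row["date"] > newest[vol]["date"]:
            newest[vol] = row
    missing = sorted(set(targets) - set(newest))
    if missing:
        raise ValueError("no backup found for: " + ", ".join(missing))
    return [
        RestoreStep(vol, newest[vol]["tape"], int(newest[vol]["file_seq"]), targets[vol])
        for vol in sorted(targets)
    ]


# Hypothetical listing and target addresses, for illustration only.
LISTING = """volser,date,tape,file_seq
SYSRS1,2024-05-01,T00101,1
SYSRS1,2024-05-02,T00207,1
SYSCAT,2024-05-02,T00207,2
USER01,2024-05-02,T00207,3
"""

for step in build_restore_plan(LISTING, {"SYSRS1": "0A80", "SYSCAT": "0A81"}):
    print(step.volser, step.tape, step.file_seq, step.target_ucb)
# SYSCAT T00207 2 0A81
# SYSRS1 T00207 1 0A80
```

Keeping only the newest entry per volume mirrors how you would choose a restore point by hand; a real program would also have to cope with backups that span several tapes, which this sketch ignores.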
Creating an ABR SAR tape
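//* Copy the library held as file 10 (DSN=JCL) on tape FDR54T
//* into your own existing library, your.name
//S1       EXEC PGM=IEBCOPY
//SYSUT1   DD DISP=OLD,DSN=JCL,VOL=SER=FDR54T,UNIT=CART,LABEL=(10,SL)
//SYSUT2   DD DSN=your.name,DISP=OLD
//SYSPRINT DD SYSOUT=J,OUTLIM=900000
//SYSIN    DD *
  COPY INDD=SYSUT1,OUTDD=SYSUT2
/*
//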
Creating a DFDSS SAR tape
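//* BUILDSA writes an IPL-able copy of DFSMSdss Stand-Alone Services
//* to an unlabelled tape. UNIT=BUNK is site-specific: replace it with
//* your own tape unit name if different.
//STEP1    EXEC PGM=ADRDSSU,PARM='UTILMSG=YES'
//SAMODS   DD DSN=SYS1.SADRYLIB,DISP=SHR
//TAPEDD   DD DSN=ADRSA.IPLT,UNIT=BUNK,LABEL=(,NL),
//            DISP=(NEW,KEEP),VOL=SER=TAPE01,
//            DCB=(DSORG=PS,RECFM=U,BLKSIZE=32760,LRECL=32760)
//SYSPRINT DD SYSOUT=A
//SYSIN    DD *
  BUILDSA -
    INDD(SAMODS) -
    OUTDD(TAPEDD) -
    IPL(TAPE)
/*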