What is BMR?

Bare metal recovery is the art of recovering a machine from an empty hard drive or 'bare metal'. At some time we must all have had to rebuild a home PC from a recovery disk. This puts your PC back into its 'factory state'. You must then find and re-install drivers for any new kit installed on top of the base version, then install all the applications that were not part of the base system. This can be painful, and the last time I had to do it I went out and bought some anti-virus software.

A server recovery follows essentially the same process. That may be acceptable for a home PC, but in a commercial environment rebuilding an operating system by hand takes too much time and requires skilled staff. Even if you have a record of all the information needed to build a server, including copies of all the applications that were loaded on it, it can still take days to complete a rebuild.

There are ways of making this easier. If a server is really business critical, it may be worth investing in a hot standby (that is, a spare server with all the software loaded on it and ready to go) to minimise the business impact of a failure. If it does not make economic sense to have a farm of spare servers sitting idle, then BMR tools exist to simplify a machine rebuild.

Cristie BMR

CBMR can automate a basic system rebuild and will interface with TSM to store backups on the network. It consists of the following components:

  • Backup and restore software (PC-BaX) - used to back up and restore files in Windows mode.
  • Open File Module (OFM) - enables backup of files that are in use by the Windows system or other applications at the time of backup.
  • Linux operating system - A failed server needs to be booted from the CBMR CD-ROM or a network based copy. This will boot a version of Linux in which the partitioning and formatting of the hard disks will be done.
  • Linux mode restore software - restores the essential operating system files from the TSM server.

CBMR allows you to replace a hard disk with a bigger or a smaller one than the original. It will automatically scale disk partitions to fit the size of the new hard disk.
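
How CBMR scales partitions internally is not described here. As a rough illustration of the idea only, the Python sketch below scales each partition in proportion to its share of the original disk. The function name and the (name, size in MB) layout format are hypothetical and are not part of CBMR.

```python
def scale_partitions(partitions, new_disk_mb):
    """Scale partition sizes proportionally to fit a disk of new_disk_mb.

    partitions: list of (name, size_mb) tuples describing the original layout
                (a hypothetical format, not CBMR's own).
    Returns a list of (name, new_size_mb) tuples whose sizes sum to new_disk_mb.
    """
    old_total = sum(size for _, size in partitions)
    if old_total <= 0 or new_disk_mb <= 0:
        raise ValueError("disk and partition sizes must be positive")

    scaled = []
    allocated = 0
    for name, size in partitions[:-1]:
        # Keep each partition's share of the disk, rounded down to a whole MB.
        new_size = size * new_disk_mb // old_total
        scaled.append((name, new_size))
        allocated += new_size

    # The last partition takes whatever is left, so rounding loses no space.
    last_name, _ = partitions[-1]
    scaled.append((last_name, new_disk_mb - allocated))
    return scaled


if __name__ == "__main__":
    # An 80 GB disk with two partitions, replaced by a 160 GB disk.
    print(scale_partitions([("C", 20000), ("D", 60000)], 160000))
```

When shrinking to a smaller disk, a real tool would also have to confirm that the data held in each partition still fits; this sketch does not check that.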

CBMR can either store system rebuild information on a floppy disk or on a network share. This information includes the number and types of hard disks, their layout, Windows, CBMR and ITSM installation folders and the SCSI, RAID and network adapters installed.

In a disaster situation you boot the server from a provided Linux operating system, then the hard disks will be partitioned and formatted. CBMR then restores the operating system files, either from CD-ROM or from a TSM filespace. You then need to re-boot the server. This whole process can be automated with a script.

Installing CBMR for use with TSM

Install CBMR on the file server. This is a typical point-and-click process.
Create a CBMR client node on the TSM server. You can create one CBMR node for all the machines, and each machine's data will then be stored under a different filespace.
In CBMR, create a storage device to represent the TSM client node. The first time you launch PC-BaX it will ask for the storage device. At this point, you tell it to use TSM.

Hardware or software configuration

You need to save the client's configuration information by running setupbmr.exe. You should do this every time you change the configuration of the client.

Back up the important system files to the TSM server by running PC-BaX and selecting the do_a_dr_backup option.

Both these processes can be automated with the TSM scheduler to run on a regular basis. This would be appropriate for the system files backup, but a manual backup may be a better option after configuration changes.

System Recovery Process

Boot the server from the CBMR CD-ROM. It is possible to hold a copy of the CD-ROM on a remote server, and then boot a failed server from it via a RIB board.
Get the configuration information from the network share. CBMR will now configure the hardware.
Restore the system files from the TSM server.
When prompted, reboot the computer.

This whole process takes 5-10 minutes, and then the server is ready for data recovery. It is possible to recover multiple servers concurrently from one TSM server.

Veritas BMR

VERITAS Bare Metal Restore (VBMR) supports Windows and Unix clients and is tightly coupled with Veritas NetBackup for backup data storage. VBMR requires a number of logical servers.

VBMR Components

VBMR Main Server

The main server is used to manage the BMR processes and to hold central BMR data. It must be installed on a supported UNIX server. You run the administration GUI on the main VBMR server.

VBMR Boot Server

A boot server is required for UNIX clients but not for Windows clients. It contains the boot images for UNIX clients and must run on the same UNIX platform as those clients, that is, a boot server for AIX clients must be hosted on an AIX machine.

VBMR File Server

A file server holds the files (called SRTs, see below) that VBMR uses to recover the clients. UNIX file servers must run on the same UNIX platform as their clients and in fact Solaris and HP-UX file servers must run on the same physical machine as the boot servers. File servers for Windows clients can run on any platform.
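
The placement rules for boot and file servers are easy to get wrong when planning an installation. The Python sketch below restates them as a simple check. It is only an illustration of the rules as described here: the function, the (hostname, platform) tuples and the platform names are assumptions, and nothing in it is part of VBMR.

```python
# Assumption: the UNIX platforms named in this article.
UNIX_PLATFORMS = {"AIX", "Solaris", "HP-UX"}
# Platforms whose file server must share a machine with the boot server.
SAME_HOST_PLATFORMS = {"Solaris", "HP-UX"}


def check_placement(client_platform, boot_server=None, file_server=None):
    """Return a list of placement problems for one client.

    boot_server and file_server are hypothetical (hostname, platform)
    tuples, or None if that server has not been defined.
    """
    problems = []
    if client_platform not in UNIX_PLATFORMS:
        # Windows clients need no boot server, and their file server
        # can run on any platform, so there is nothing to check.
        return problems

    if boot_server is None:
        problems.append("UNIX clients need a boot server")
    elif boot_server[1] != client_platform:
        problems.append("boot server must run on " + client_platform)

    if file_server is not None:
        if file_server[1] != client_platform:
            problems.append("file server must run on " + client_platform)
        elif (client_platform in SAME_HOST_PLATFORMS
              and boot_server is not None
              and file_server[0] != boot_server[0]):
            problems.append("file server must be on the boot server machine")
    return problems


if __name__ == "__main__":
    # A Solaris client whose file server is on a different host to its boot server.
    print(check_placement("Solaris", ("boot1", "Solaris"), ("files1", "Solaris")))
```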

VBMR Client

VBMR clients must run NetBackup software as well as the BMR software.

Shared Resource Tree

A shared resource tree (SRT) provides a temporary operating system that can be used to recover the data needed to rebuild the real operating system. The format of the SRT depends on the type of client that you are protecting. For UNIX clients it is either a network or CD-based root file system.

Taking VBMR Backups

Once you install and configure VBMR, you can just run your normal NetBackup backups and these will invoke an extra process called 'bmrsaveconfig'. This records the current system configuration state, including the disk layouts and TCP/IP configuration. You can also run bmrsaveconfig manually if you want to take a system backup immediately after you change a system configuration.

Restoring with VBMR

Restoration is a very simple process. The VBMR GUI runs in a browser window. You simply select the server you want to recover from a list of registered clients, select a configuration for that client, select an SRT to use for the restore, then select the 'prepare for restore' button. Once VBMR indicates that the restore is ready to go, you click on an OK prompt, then reboot the client. VBMR will then go off and rebuild the client to your specification.

Mainframe BMR

You need bare metal recovery for mainframes too, though it is usually called Stand Alone Recovery (SAR). It is possible that you might find that all your disk data is trashed and you need to build a z/OS system from scratch. Here are a couple of processes using FDRSAR from Innovation DP and DFDSS SAR from IBM.

If you use a disaster recovery service like SunGard or Iron Mountain, they will supply a running starter system for you. Otherwise, the basic principle behind SAR is that you IPL from a tape by mounting an SAR tape in a tape drive and pointing HCD at the drive address. You then use the SAR program on the tape to recover the set of system volumes that are required for an IPL. You communicate with the SAR program using a very simple command menu that is displayed on the system console.

Recovering volumes with SAR is a tedious, time-consuming process. What you want is to set up a minimal special system that contains enough z/OS to be able to IPL from, with some network connectivity, your recovery software and some sample JCL libraries. It is possible to fit all that onto three or four disks, which you can back up to a single large-capacity tape. You would store your SAR tape and your system tape in a secure location, but make sure that you refresh them every time you upgrade z/OS or your backup software.

Check your documentation for full details on using SAR, but you need to know the location on tape of all your required disk backups, including the tape number and the label, and also a list of target UCB addresses that you will recover to. Once you recover your small system with SAR, you can IPL from the recovered SYSRES volume.

At this point, you have a very limited z/OS system up and running, with access to TSO and ISPF, but no record of any backups or tapes. You need to recover your backup catalog and your tape management catalog next, then after that recovery should be reasonably easy.

So, it should now be obvious that you need some kind of tape catalog listing that you can access independent of your mainframe, especially if you use a product like DFDSS that has no native catalog facility. You can either print out your tape catalog every day and store it along with your offsite tapes, or you can file transfer a catalog listing to a remote fileserver. A more sophisticated approach is to write a program that just extracts backups of the disks that you need, works out target addresses, then copies these off to a safe location.
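
As a rough idea of what such a program might do, the Python sketch below reads a tape catalog listing, keeps the most recent backup of each disk that the small recovery system needs, and pairs it with the UCB address it should be restored to. The CSV layout, the column names and the function name are all hypothetical; a real listing from your tape management system will look different and would need its own parsing.

```python
import csv
import io


def build_recovery_list(catalog_csv, targets):
    """Pick the latest backup of each required disk from a catalog listing.

    catalog_csv: text with the (hypothetical) columns
                 disk_volser,tape_volser,file_seq,backup_date
                 where backup_date is in YYYY-MM-DD form.
    targets:     dict mapping each required disk volser to its target UCB address.
    """
    latest = {}
    for row in csv.DictReader(io.StringIO(catalog_csv)):
        vol = row["disk_volser"]
        if vol not in targets:
            continue  # not one of the disks needed for the recovery system
        # YYYY-MM-DD dates sort correctly as plain strings.
        if vol not in latest or row["backup_date"] > latest[vol]["backup_date"]:
            latest[vol] = row

    return [
        {
            "disk": vol,
            "tape": row["tape_volser"],
            "file_seq": int(row["file_seq"]),
            "target_ucb": targets[vol],
        }
        for vol, row in sorted(latest.items())
    ]


if __name__ == "__main__":
    listing = (
        "disk_volser,tape_volser,file_seq,backup_date\n"
        "SYSRES,T00001,1,2024-01-01\n"
        "SYSRES,T00002,3,2024-01-08\n"
        "WORK01,T00003,1,2024-01-08\n"
    )
    for entry in build_recovery_list(listing, {"SYSRES": "0A80"}):
        print(entry)
```

The resulting list is small enough to print or to transfer to a remote file server alongside the offsite tapes.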

Creating an ABR SAR tape

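//* Copy the library JCL (file 10 on tape volume FDR54T) with IEBCOPY
//* into the existing library your.name (DISP=OLD: it must already exist)
//S1       EXEC PGM=IEBCOPY
//SYSUT1   DD DISP=OLD,DSN=JCL,VOL=SER=FDR54T,UNIT=CART,LABEL=(10,SL)
//SYSUT2   DD DSN=your.name,DISP=OLD
//SYSPRINT DD SYSOUT=J,OUTLIM=900000
//SYSIN    DD *
 COPY INDD=SYSUT1,OUTDD=SYSUT2
//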

Creating a DFDSS SAR tape

//* Use the DFDSS BUILDSA command to write an IPL-able Stand Alone
//* program from SYS1.SADRYLIB to the unlabelled tape TAPE01
//STEP1  EXEC PGM=ADRDSSU,PARM='UTILMSG=YES'
//SAMODS DD DSN=SYS1.SADRYLIB,DISP=SHR
//TAPEDD DD DSN=ADRSA.IPLT,UNIT=BUNK,LABEL=(,NL),
//         DISP=(NEW,KEEP),VOL=SER=TAPE01,
//         DCB=(DSORG=PS,RECFM=U,BLKSIZE=32760,LRECL=32760)
//SYSPRINT DD SYSOUT=A
//SYSIN    DD *
  BUILDSA -
        INDD(SAMODS) -
        OUTDD(TAPEDD) -
         IPL(TAPE)
/*
