What is BMR?
Bare metal recovery is the art of recovering a machine from an empty hard drive or 'bare metal'. In a commercial environment it can take too much time to rebuild an operating system by hand, and this also requires skilled staff. Even if you have a record of all the information needed to build a server, including copies of all the applications that were loaded on it, it can still take days to complete a rebuild.
Windows operating systems were always a BMR challenge, as they needed the recovery machine to be pretty much identical to the original, even down to the drivers. However, Microsoft introduced Wbadmin, a new backup utility, in Windows Server 2008, and this has built-in support for BMR. You can even recover your server to a Hyper-V virtual machine. Windows Server 2016 continues to offer this utility. You need to run the Wbadmin command from an elevated command prompt. A typical command to back up the system state and all critical volumes would be:
Wbadmin start backup -backupTarget:[backup-location] -allcritical -systemstate -vssfull
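If you want to script this backup, the command line above can be assembled and launched from Python. The following is only a minimal sketch: the function names are made up for illustration, the flags are exactly those shown in the command above, and the `E:` target is a hypothetical backup volume. Wbadmin may prompt for confirmation, so the sketch leaves the session interactive, and it will only run on Windows from an elevated prompt.

```python
import subprocess


def build_wbadmin_command(backup_target):
    """Return the argument list for the wbadmin backup shown above."""
    if not backup_target:
        raise ValueError("backup_target must not be empty")
    return [
        "wbadmin", "start", "backup",
        f"-backupTarget:{backup_target}",
        "-allCritical",
        "-systemState",
        "-vssFull",
    ]


def run_backup(backup_target):
    """Run the backup (Windows only, elevated prompt required)."""
    # check=True raises CalledProcessError if wbadmin exits with an error code.
    return subprocess.run(build_wbadmin_command(backup_target), check=True)


if __name__ == "__main__":
    # "E:" is a hypothetical backup volume used only for illustration.
    print(" ".join(build_wbadmin_command("E:")))
```

Building the argument list separately from running it means you can log or review the exact command before anything touches the server.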
Of course, many large sites these days use VMware virtualisation, so they don't actually have many physical Windows servers, and so not much need for bare metal restores. It is quick and easy to create a new VM. However, if you do have physical servers, then there are ways of making BMR easier. If a server is really business critical, it may be worth investing in a hot standby (that is, a spare server with all the software loaded on it and ready to go) to minimise the business impact of a failure. If it does not make economic sense to have a farm of spare servers sitting idle, then BMR tools exist to simplify a machine rebuild.
Cristie Bare Machine Recovery (CBMR) can automate a basic system rebuild on Windows, Linux, AIX, Solaris and HP-UX servers. The backups can be encrypted for security, or compressed to save on storage space. You can use snapshots as a backup source, and CBMR has its own scheduling facility. It consists of the following components:
- A backup application that is used to copy critical system files.
- A Bootable DR service that can be on a CD or a Network Share. This includes Cristie's own file system tools.
- A set of server configuration files that will be used to rebuild the server.
CBMR can be managed from a GUI or a CLI. The CLI means you can script recoveries to minimise manual intervention, and it is possible to schedule several recoveries in parallel.
In a disaster situation you boot the server from the bootable DR service, and the hard disks are then partitioned and formatted. CBMR then restores the system drive and all the application data, unless you select a recovery of the bare O/S only. The restored data can come from either a CD-ROM or a TSM filespace. You then need to reboot the server. This whole process can be automated with a script. Dissimilar hardware is supported on Windows and Linux, and recovery can be made to virtual servers or to the Cloud.
TBMR (CBMR with TSM)
TBMR supports Windows, Linux, AIX and HP-UX and acts as a plug-in to TSM, using TSM as a backend datastore. You do not need to run separate TBMR backups, as TBMR just uses the existing TSM node backup to protect the server. TBMR supports TSM incremental backups, image backups and online backupsets, but if you decide to use image backups, then you need to back up the TBMRCFG folder separately as a file backup, as it must be read to obtain the server build information. The install process is:
- Install TBMR on the file server. This is a typical point and click process.
- Create a TBMR client node on the TSM server. Each system should be backed up to a separate dedicated node to avoid a single point of failure.
On installation TBMR runs 'tbmrcfg.exe', or 'tbmrcfg' in the Linux version, which saves system configuration information to a folder in the root of the system drive named /TBMRCFG. This configuration information is backed up as part of the system's normal TSM backup routine. To ensure that changes to the system are regularly recorded, tbmrcfg.exe or tbmrcfg should be run as a preschedule command before regular TSM backups.
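The TSM client has a PRESCHEDULECMD option that runs a command before each scheduled operation, which is one way to arrange for tbmrcfg to run first. The sketch below is illustrative only: the helper name, the sample option file contents and the tbmrcfg path are all assumptions, so check your own TBMR install location and which client option file applies on your platform before using anything like it.

```python
def add_preschedule_command(options_text, command):
    """Return options_text with a PRESCHEDULECMD line for `command` appended.

    If a PRESCHEDULECMD line is already present, the text is returned
    unchanged so that an existing setting is never silently overwritten.
    """
    for line in options_text.splitlines():
        if line.strip().upper().startswith("PRESCHEDULECMD"):
            return options_text
    # Make sure the new option starts on its own line.
    if options_text and not options_text.endswith("\n"):
        options_text += "\n"
    return options_text + f'PRESCHEDULECMD "{command}"\n'


if __name__ == "__main__":
    # Hypothetical option file contents and tbmrcfg path, for illustration only.
    sample = "NODENAME server01\n"
    print(add_preschedule_command(sample, "/opt/tbmr/tbmrcfg"), end="")
```

The helper works on text rather than editing a file directly, so you can review the result before writing it back to the client options file.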
System Recovery Process
On recovery the TBMR DR media is used to boot the recovery server. The software then connects to the TSM client node and returns the configuration information. Using this data TBMR partitions and formats the disks ready for recovery. System, boot and application data are then recovered from TSM. If required, the user also has the opportunity to inject drivers to enable recovery to dissimilar hardware (fully automated on Linux).
As a useful extra, TBMR can also be used to migrate servers onto new hardware.
Cristie also supplies a cloud environment BMR product called CloneManager which lets you clone and move Windows and Linux clients. The clone function would let you create a set of 'standby' servers that are copies of the originals, and so remove the need for bare metal recovery. You can create these clones as exact copies of servers while they are running, without affecting business operations. Servers are displayed on a GUI, and you clone them simply by dragging and dropping a source machine from the left hand side of the screen to a target machine or environment on the right. Once the initial clone completes, it is possible to synchronise a clone with updates, without having to copy all the data again.
You can clone machines between, and within, different physical, virtual and cloud environments. You can also sync the cloned machines back to their source machine for DR purposes.
One of the selling points of CloneManager is that you do not need to install agents or any software on the source machine. However, you can take the option to install an agent and use this to get information about CPU, memory and disk usage on the source machine, then use this to make sure that your target machine is correctly sized.
You can also select a subset of disks and volumes to be cloned, configure the host name and IP addresses on the target machine, and schedule the clone and sync operations so that your BMR protection is fully automated.
VERITAS Bare Metal Restore (VBMR) supports Windows and Unix clients and is tightly coupled with Veritas NetBackup for backup data storage. VBMR requires a number of logical servers.
VBMR Master Server
The master server is used to manage the BMR processes and to hold central BMR data. It must be installed on a supported UNIX server and needs the BMR database configured on initial setup. You run the administration GUI on the master VBMR server.
VBMR Boot Server
Several boot servers are required for VBMR clients and they contain the boot images or SRTs for those clients. You will need a Boot Server for every different client operating system that you wish to protect, and ideally you should select a server that is running your latest version of those clients, as SRTs can be created for lower operating system versions, but not higher.
Shared Resource Trees
An SRT basically provides a temporary operating system that can be used to recover data to rebuild the real operating system. An SRT contains Operating System files, NetBackup client software, software to create and partition drives and anything else needed to get a basic system up. The exact format of the SRT depends on the type of client that you are protecting.
UNIX and Linux clients need a different SRT for each client type and OS level, and each SRT must be created on a Boot Server that runs the same operating system as the clients, at the same or a later version.
A Windows SRT just needs to match the client architecture, that is, 32-bit or 64-bit. Otherwise the Boot Server can run any release of Windows.
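The Boot Server rules above can be expressed as a simple check, which may help when planning how many Boot Servers you need. This is a sketch of the rules as described here, not anything supplied with NetBackup: the `Host` class, its field names and the `can_build_srt` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    os_family: str        # e.g. "windows", "aix", "linux" (illustrative labels)
    os_version: tuple     # e.g. (7, 2); not checked for Windows
    arch_bits: int = 64   # 32 or 64; only checked for Windows


def can_build_srt(boot_server, client):
    """Return True if boot_server could hold an SRT for client."""
    # A Boot Server only serves clients of its own operating system.
    if boot_server.os_family != client.os_family:
        return False
    if client.os_family == "windows":
        # Windows: only the architecture has to match.
        return boot_server.arch_bits == client.arch_bits
    # UNIX and Linux: Boot Server must be at the same or a later version.
    return boot_server.os_version >= client.os_version


if __name__ == "__main__":
    boot = Host("aix", (7, 2))
    print(can_build_srt(boot, Host("aix", (7, 1))))
    print(can_build_srt(boot, Host("aix", (7, 3))))
```

Versions are held as tuples so that, for example, (7, 10) correctly compares as later than (7, 2), which a plain string comparison would get wrong.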
SRTs are created on Boot Servers, but can be copied to CD or file shares for later use.
To create an SRT, you need the operating system installation software, your NetBackup client software and also any other software you might need to get your system running. You might also need patches, maintenance levels, service packs etc. The Shared Resource Tree Wizard will guide you through the creation process. It can take up to an hour to create an SRT, so these things are best planned and created in advance of a disaster. Once complete, the SRT can be installed temporarily onto a bare metal server and will give enough operating system and system applications to start off a normal recovery.
Taking VBMR Backups
To run a VBMR backup, you just need a suitable NetBackup policy. The policy type must be 'MS-Windows' for Windows or 'Standard' for UNIX or Linux. The policy must also have 'Collect Disaster Recovery Information for Bare Metal Restore' set.
Restoring with VBMR
Restoration is a very simple process. The VBMR GUI runs in a browser window. You simply select the server you want to recover from a list of registered clients, select a configuration for that client, select an SRT to use for the restore, then select the 'prepare for restore' button. Once VBMR indicates that the restore is ready to go, you click on an OK prompt, then reboot the client. VBMR will then go off and rebuild the client to your specification.
Once the restore is complete, you must take a full NetBackup backup, as any previous incrementals are now unusable.
You need bare metal recovery for mainframes too, though it is usually called Stand Alone Recovery (SAR). It is possible that you might find that all your disk data is trashed and you need to build a z/OS system from scratch. Here are a couple of processes using FDRSAR from Innovation DP and DFDSS SAR from IBM.
If you use a disaster recovery service like SunGard or Iron Mountain, they will supply a running starter system for you. Otherwise, the basic principle behind SAR is that you IPL from a tape by mounting a SAR tape in a tape drive and pointing the HMC at the drive address. You then use the SAR program on the tape to recover the set of system volumes that are required for an IPL. You communicate with the SAR program using a very simple command menu that is displayed on the system console.
What if you just have virtual tape libraries? IBM technotes exist that state that it is possible to run SAR from a TS7700 using virtual tapes and drives. You need to set the drive you want into StandAlone mode and make sure it is offline to all other systems. Then you can use the Stand Alone console on the TS7700 to select and mount the SAR tape, which must exist in the library of course. Once you IPL from the tape, you can then mount the tapes required for your small system from the same console.
Recovering volumes with SAR is a tedious, time consuming process. What you want is to set up a minimal special system that contains enough z/OS to be able to IPL from, with some network connectivity, your recovery software and some sample JCL libraries. It is possible to fit all that onto three or four disks, which you can back up to a single large capacity tape. You would store your SAR tape and your system tape in a secure location, but make sure that you refresh them every time you upgrade z/OS or your backup software.
Check your documentation for full details on using SAR, but you need to know the location on tape of all your required disk backups, including the tape number and the label, and also a list of target UCB addresses that you will recover to. Once you recover your small system with SAR, you can IPL from the recovered SYSRES volume.
At this point, you have a very limited z/OS system up and running, with access to TSO and ISPF, but no record of any backups or tapes. You need to recover your backup catalog and your tape management catalog next, then after that recovery should be reasonably easy.
So, it should now be obvious that you need some kind of tape catalog listing that you can access independent of your mainframe, especially if you use a product like DFDSS that has no native catalog facility. You can either print out your tape catalog every day and store it along with your offsite tapes, or you can file transfer a catalog listing to a remote fileserver. A more sophisticated approach is to write a program that just extracts backups of the disks that you need, works out target addresses, then copies these off to a safe location.
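As a rough illustration of the 'more sophisticated approach', the sketch below takes a catalog listing that has already been transferred off the mainframe, picks out the most recent backup of each disk you need, and pairs it with its target address. Everything specific here is an assumption: the CSV layout, the column names, the volsers and the UCB addresses are invented for the example, and a real listing from your tape management system will need its own parsing.

```python
import csv
import io


def build_recovery_plan(catalog_csv, targets):
    """Return one plan entry per required disk, using its latest backup.

    catalog_csv: text with columns disk_volser, backup_date (YYYY-MM-DD),
                 tape_volser, file_seq (a hypothetical listing format).
    targets:     dict mapping each required disk volser to a target UCB.
    """
    latest = {}
    for row in csv.DictReader(io.StringIO(catalog_csv)):
        vol = row["disk_volser"]
        if vol not in targets:
            continue  # not part of the minimal recovery system
        # ISO-format dates sort correctly as plain strings.
        if vol not in latest or row["backup_date"] > latest[vol]["backup_date"]:
            latest[vol] = row
    missing = sorted(set(targets) - set(latest))
    if missing:
        raise ValueError("no backup found for: " + ", ".join(missing))
    return [
        {
            "disk_volser": vol,
            "tape_volser": latest[vol]["tape_volser"],
            "file_seq": int(latest[vol]["file_seq"]),
            "target_ucb": targets[vol],
        }
        for vol in sorted(latest)
    ]


if __name__ == "__main__":
    listing = (
        "disk_volser,backup_date,tape_volser,file_seq\n"
        "SYSRES,2024-01-01,T00001,1\n"
        "SYSRES,2024-01-08,T00007,1\n"
        "SYSCAT,2024-01-08,T00007,2\n"
    )
    for entry in build_recovery_plan(listing, {"SYSRES": "0A80", "SYSCAT": "0A81"}):
        print(entry)
```

The function raises an error if any required disk has no backup in the listing, since it is much better to find that out during a routine run than in the middle of a disaster.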
Creating an ABR SAR tape
//S1 EXEC PGM=IEBCOPY
//SYSUT1 DD DISP=OLD,DSN=JCL,VOL=SER=FDR54T,
//SYSUT2 DD DSN=your.name,DISP=OLD
//SYSPRINT DD SYSOUT=J,OUTLIM=900000
//SYSIN DD *
Creating a DFDSS SAR tape
//STEP1 EXEC PGM=ADRDSSU,PARM='UTILMSG=YES'
//SAMODS DD DSN=SYS1.SADRYLIB,DISP=SHR
//TAPEDD DD DSN=ADRSA.IPLT,UNIT=BUNK,LABEL=(,NL),
//SYSPRINT DD SYSOUT=A
//SYSIN DD *