What is BMR?
Bare metal recovery is the art of recovering a machine from an empty hard drive or 'bare metal'. In a commercial environment it can take too much time to rebuild an operating system by hand, and this also requires skilled staff. Even if you have a record of all the information needed to build a server, including copies of all the applications that were loaded on it, it can still take days to complete a rebuild.
Windows operating systems were always a BMR challenge, as the recovery machine needed to be almost identical to the original, even down to the drivers. However, Microsoft introduced Wbadmin, a new backup utility, in Windows Server 2008, and it has built-in support for BMR. You can even recover your server to a Hyper-V virtual machine. Windows Server 2016 continues to offer this utility. You need to run the Wbadmin command from an elevated command prompt. A typical command to back up the system state and all critical volumes, which is what a bare metal recovery needs, would be:
Wbadmin start backup -backupTarget:[backup-location] -allcritical -systemstate -vssfull
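If you run this backup from a script, it helps to assemble the command in one place so that the same switches are used every time. The following is a minimal sketch only: the helper name build_wbadmin_command and the example target E: are assumptions made for illustration, and it uses nothing beyond the switches shown in the command above. It prints the command rather than running it.

```python
def build_wbadmin_command(backup_target):
    """Return the wbadmin argument list for the backup command shown above.

    backup_target is whatever you would put in place of [backup-location],
    for example a drive letter. The function name is hypothetical.
    """
    if not backup_target:
        raise ValueError("a backup target is required")
    return [
        "wbadmin", "start", "backup",
        f"-backupTarget:{backup_target}",
        "-allCritical",   # include every volume the operating system needs
        "-systemState",   # include the system state
        "-vssFull",       # take a full VSS backup
    ]


if __name__ == "__main__":
    # "E:" is a placeholder target, not a recommendation.
    print(" ".join(build_wbadmin_command("E:")))
```

The argument list can be passed to whatever scheduler or wrapper you already use, still from an elevated session.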
Of course, many large sites these days use VMware virtualisation, so they don't actually have many physical Windows servers and have little need for bare metal restores, as it is quick and easy to create a new VM. However, if you do have physical servers, then there are ways of making BMR easier. If a server is really business critical, it may be worth investing in a hot standby (that is, a spare server with all the software loaded on it and ready to go) to minimise the business impact of a failure. If it does not make economic sense to have a farm of spare servers sitting idle, then BMR tools exist to simplify a machine rebuild.
Cristie Recover can automatically and rapidly run a complete BMR directly from backups carried out by IBM Spectrum Protect TSM, EMC NetWorker, EMC Avamar or CommVault Simpana, without having to create or manage any additional backups with Cristie Recover.
Your recovered machine will be functionally identical to the original and can be managed remotely, or directly from the recovered machine.
A Recovery Simulator comes free with any solution from the Cristie Recover range and provides automated disaster recovery testing, identifying whether systems are recoverable in a DR scenario. You can schedule regular or instant recovery simulations to virtual or cloud targets, to test whether your backup will recover in the event of a failure. Recovery simulations can run from a different machine and provide a report to validate the reliability of the recovery.
In a virtual environment Cristie Recover delivers fully integrated and instant recoveries to physical, hypervisor and cloud environments.
For Power Linux and AIX, recovery works with standard backup software and removes the need for mksysb image backups.
Cristie Recover allows you to perform a bare machine recovery of your system direct from a Spectrum Protect TSM backup, a NetWorker backup, a CommVault Simpana backup or an EMC Avamar backup. These variants of Cristie Recover are called TBMR, NBMR, SBMR and ABMR respectively, and are re-sold globally by the backup software suppliers and their channel partners. They are currently available (2019) for Windows, Linux, Solaris and AIX operating systems, but check the Cristie website for up-to-date details.
xBMR, used here to mean any one of these variants, recovers critical servers directly from the backup software and will recover to any point in time provided by that software. You can also run and schedule simulated recoveries.
The outline process that follows applies in general to all four versions of Cristie BMR, but you should refer to the specific product documentation for the full process.
Assuming you have your TSM/NetWorker/Simpana/Avamar client already installed, you run the xBMR Install Wizard. During the install, it will automatically capture the initial configuration of your machine. Once the install is complete, you can launch the xBMR GUI. You will see a range of configuration tools, including a driver validation tool to check your machine drivers, and an automated Disaster Recover Answer File creator, which enables a seamless automatic recovery with minimal user intervention when needed. Once the configuration has been set, you just back up your machine with your chosen backup product in the usual way.
If you need to recover your system, and you created a Disaster Recover Answer File, you can run an unattended, automatic recovery.
If you wish to control the recovery process, you start a manual recovery wizard, which will guide you through the steps needed for a successful recovery. You can choose the latest backup, or a previous point in time.
Cristie also supplies a cloud environment BMR product called CloneManager, which lets you clone and move Windows and Linux clients. The clone function lets you create a set of 'standby' servers that are copies of the originals, and so removes the need for bare metal recovery. You can create these clones as exact copies of servers while they are running, without affecting business operations. Servers are displayed on a GUI, and you clone them simply by dragging and dropping a source machine from the left-hand side of the screen to a target machine or environment on the right. Once the initial clone completes, it is possible to synchronise a clone with updates, without having to copy all the data again.
You can clone machines between, and within, different physical, virtual and cloud environments. You can also sync the cloned machines back to their source machine for DR purposes.
One of the selling points of CloneManager is that you do not need to install agents or any software on the source machine. However, you can take the option to install an agent and use this to get information about CPU, memory and disk usage on the source machine, then use this to make sure that your target machine is correctly sized.
You can also select a subset of disks and volumes to be cloned, configure the host name and IP addresses on the target machine, and schedule the clone and sync operations so that your BMR protection is fully automated.
VERITAS Bare Metal Restore (VBMR) supports Windows and Unix clients and is tightly coupled with Veritas NetBackup for backup data storage. VBMR will restore the operating system, the system configuration, and all the system files and data files with a single command followed by a client reboot. You need to follow three steps to get a successful BMR of a client:
- Install a NetBackup backup policy that is BMR-enabled. On the next client backup, the client's system skeleton information that is needed for a BMR recovery will then be backed up. This system skeleton information comprises OS details, disk information, volume details, file system information and network information.
- Before you can run a recovery, you need to prepare a shared resource tree (SRT) on the BMR Boot Server. An SRT provides a staging environment for the BMR recovery. You need just one SRT for clients belonging to the same operating system family.
- The actual client recovery just involves a single click on a GUI, or a single command on a command line. The recovery can be Network-based boot or Media-based boot.
VBMR requires a number of logical servers.
VBMR Master Server
A NetBackup master server is used to manage backups and restores of managed clients. This NetBackup master server can also host a BMR master server, which is used to manage the BMR processes and to hold central BMR data. You would configure the BMR master server after you have installed the NetBackup master server.
NetBackup Media Server
The NetBackup media servers manage the storage devices, disk or tape, that are used to store the backup files.
VBMR Boot Server
Several boot servers are required for VBMR clients and they contain the boot images or SRTs for those clients. You will need a Boot Server for every different client operating system that you wish to protect, and ideally you should select a server that is running your latest version of those clients, as SRTs can be created for lower operating system versions, but not higher. The boot server must be registered with BMR Master Server once the NetBackup client is installed.
Shared Resource Trees
An SRT basically provides a temporary operating system that can be used to recover data to rebuild the real operating system. An SRT contains Operating System files, NetBackup client software, software to create and partition drives and anything else needed to get a basic system up. The exact format of the SRT depends on the type of client that you are protecting.
UNIX and Linux clients need a different SRT for each client type and OS level, and each SRT must be created on a Boot Server that runs the same operating system as the clients, at the same or a later version.
A Windows SRT just needs to be the same architecture as the client, that is, 32-bit or 64-bit; apart from that, the Boot Server can be any release of Windows.
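The SRT rules above are easy to mix up, so here is a minimal sketch that writes them down as two checks. It is one reading of the rules as described on this page, not a Veritas tool or API. The Machine record, both function names, and the OS names and version numbers in the usage example are all hypothetical.

```python
from typing import NamedTuple, Tuple


class Machine(NamedTuple):
    """Describes a client, a Boot Server, or the OS image held in an SRT."""
    os_name: str                 # for example "windows", "aix", "solaris"
    os_version: Tuple[int, ...]  # for example (7, 2); not checked for Windows
    arch: str                    # "32bit" or "64bit"


def boot_server_can_build_srt(boot_server, client):
    """True if this Boot Server may create an SRT for this client."""
    if boot_server.os_name != client.os_name:
        return False
    if client.os_name == "windows":
        # Any Windows release will do as the Boot Server.
        return True
    # UNIX and Linux: the Boot Server must be at the same or a later level.
    return boot_server.os_version >= client.os_version


def srt_matches_client(srt, client):
    """True if an existing SRT is suitable for restoring this client."""
    if srt.os_name != client.os_name:
        return False
    if client.os_name == "windows":
        # Windows SRTs only need to match the client architecture.
        return srt.arch == client.arch
    # UNIX and Linux: one SRT per client type and OS level.
    return srt.os_version == client.os_version
```

For example, with these assumptions a Boot Server described as Machine("aix", (7, 2), "64bit") can build an SRT for a client at (7, 1), but not the other way round.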
SRTs are created on Boot Servers, but can be copied to CD or file shares for later use.
To create an SRT, you need the operating system installation software, your NetBackup client software and also any other software you might need to get your system running. You might also need patches, maintenance levels, service packs etc. The Shared Resource Tree Wizard will guide you through the creation process. It can take up to an hour to create an SRT, so these things are best planned and created in advance of a disaster. Once complete, the SRT can be installed temporarily onto a bare metal server and will give enough operating system and system applications to start off a normal recovery.
Taking VBMR Backups
To take a VBMR backup, you just need a suitable NetBackup policy. The policy type must be 'MS-Windows' for Windows or 'Standard' for UNIX or Linux, and the policy must have 'Collect Disaster Recovery Information for Bare Metal Restore' set.
Restoring with VBMR
Restoration is a very simple process. The VBMR GUI runs in a browser window. You simply select the server you want to recover from a list of registered clients, select a configuration for that client, select an SRT to use for the restore, then select the 'prepare for restore' button. Once VBMR indicates that the restore is ready to go, you click on an OK prompt, then reboot the client. VBMR will then go off and rebuild the client to your specification.
Once the restore is complete, you must take a full NetBackup backup, as any previous incrementals are now unusable.
You need bare metal recovery for mainframes too, though it is usually called Stand Alone Recovery (SAR). It is possible that you might find that all your disk data is trashed and you need to build a z/OS system from scratch. Here are a couple of processes using FDRSAR from Innovation DP and DFDSS SAR from IBM.
If you use a disaster recovery service like SunGard or Iron Mountain, they will supply a running starter system for you. Otherwise, the basic principle behind SAR is that you IPL from a tape by mounting a SAR tape in a tape drive and specifying the drive address as the load address on the HMC. You then use the SAR program on the tape to recover the set of system volumes that are required for an IPL. You communicate with the SAR program using a very simple command menu that is displayed on the system console.
What if you just have virtual tape libraries? IBM technotes exist that state that it is possible to run SAR from a TS7700 using virtual tapes and drives. You need to set the drive you want into StandAlone mode and make sure it is offline to all other systems. Then you can use the Stand Alone console on the TS7700 to select and mount the SAR tape, which must exist in the library of course. Once you IPL from the tape, you can then mount the tapes required for your small system from the same console.
Recovering volumes with SAR is a tedious, time-consuming process. What you want is to set up a minimal special system that contains enough z/OS to be able to IPL from, with some network connectivity, your recovery software and some sample JCL libraries. It is possible to fit all that onto three or four disks, which you can back up to a single large capacity tape. You would store your SAR tape and your system tape in a secure location, but make sure that you refresh them every time you upgrade z/OS or your backup software.
Check your documentation for full details on using SAR, but you need to know the location on tape of all your required disk backups, including the tape number and the label, and also a list of target UCB addresses that you will recover to. Once you recover your small system with SAR, you can IPL from the recovered SYSRES volume.
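Because a missing tape position or a duplicated target address is painful to discover in the middle of a stand-alone restore, it can be worth checking your recovery worksheet in advance. The following is a minimal sketch of such a check, assuming you keep one record per disk volume. The record layout, the function name and every volume serial and address in the example are hypothetical.

```python
from typing import NamedTuple


class VolumeBackup(NamedTuple):
    disk_volser: str     # disk volume to be restored
    tape_volser: str     # tape that holds the backup
    file_sequence: int   # position of the backup file on that tape
    target_address: str  # device address the volume will be restored to


def worksheet_problems(entries, required_volumes):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    listed = {entry.disk_volser for entry in entries}
    for volser in required_volumes:
        if volser not in listed:
            problems.append(f"no backup listed for {volser}")

    targets_seen = {}
    for entry in entries:
        if not entry.tape_volser or entry.file_sequence < 1:
            problems.append(f"incomplete tape location for {entry.disk_volser}")
        if entry.target_address in targets_seen:
            other = targets_seen[entry.target_address]
            problems.append(
                f"{entry.disk_volser} and {other} share target {entry.target_address}"
            )
        else:
            targets_seen[entry.target_address] = entry.disk_volser
    return problems


if __name__ == "__main__":
    worksheet = [
        VolumeBackup("SARES1", "T00001", 1, "0A80"),
        VolumeBackup("SACAT1", "T00001", 2, "0A80"),  # duplicate target
    ]
    for problem in worksheet_problems(worksheet, ["SARES1", "SACAT1", "SAPAG1"]):
        print(problem)
```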
At this point, you have a very limited z/OS system up and running, with access to TSO and ISPF, but no record of any backups or tapes. You need to recover your backup catalog and your tape management catalog next, then after that recovery should be reasonably easy.
So, it should now be obvious that you need some kind of tape catalog listing that you can access independent of your mainframe, especially if you use a product like DFDSS that has no native catalog facility. You can either print out your tape catalog every day and store it along with your offsite tapes, or you can file transfer a catalog listing to a remote fileserver. A more sophisticated approach is to write a program that just extracts backups of the disks that you need, works out target addresses, then copies these off to a safe location.
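As a rough illustration of the extraction idea, the sketch below picks the most recent backup of each disk you need out of a catalog listing that has already been transferred off the mainframe. The listing format is an assumption: a comma-separated file with a header row and ISO-style dates. Real tape management or backup reports will look different and need their own parsing. The function name, column names and sample data are hypothetical, and working out target addresses is left out.

```python
import csv
import io


def latest_backups(listing_text, needed_volumes):
    """Map each needed disk volume to the row for its most recent backup."""
    needed = set(needed_volumes)
    latest = {}
    for row in csv.DictReader(io.StringIO(listing_text)):
        volser = row["disk_volser"]
        if volser not in needed:
            continue
        # Dates in YYYY-MM-DD form sort correctly as plain strings.
        if volser not in latest or row["backup_date"] > latest[volser]["backup_date"]:
            latest[volser] = row
    return latest


if __name__ == "__main__":
    sample = (
        "disk_volser,backup_date,tape_volser,file_sequence\n"
        "SARES1,2019-03-01,T00001,1\n"
        "SARES1,2019-03-08,T00007,1\n"
        "PROD01,2019-03-08,T00009,4\n"
    )
    for volser, row in latest_backups(sample, ["SARES1"]).items():
        print(volser, row["tape_volser"], row["file_sequence"])
```

The selected rows can then be written to a small file and stored with the offsite tapes or on the remote fileserver.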
Creating an ABR SAR tape
//S1 EXEC PGM=IEBCOPY
//SYSUT1 DD DISP=OLD,DSN=JCL,VOL=SER=FDR54T,
//SYSUT2 DD DSN=your.name,DISP=OLD
//SYSPRINT DD SYSOUT=J,OUTLIM=900000
//SYSIN DD *
Creating a DFDSS SAR tape
//STEP1 EXEC PGM=ADRDSSU,PARM='UTILMSG=YES'
//SAMODS DD DSN=SYS1.SADRYLIB,DISP=SHR
//TAPEDD DD DSN=ADRSA.IPLT,UNIT=BUNK,LABEL=(,NL),
//SYSPRINT DD SYSOUT=A
//SYSIN DD *