According to the 2018 IDG Cloud Computing survey, 'Seventy-three percent of organizations have at least one application, or a portion of their computing infrastructure already in the cloud'. Many people like the idea of limitless storage capacity in the Cloud, without really knowing what that means. One of the issues facing a storage architect is that business people hear about the Cloud and want to know why they are not using it. Another issue that gets in the way of developing a cloud strategy is that there is some confusion about what the Cloud is and what it means.
One definition is that the Cloud is all about delivering computing resources as a service, usually over the internet, where the customer just focuses on the service and does not need to worry about the technical implementation behind that service. The physical and virtual resources that host the cloud services are pooled and can be shared between several customers, which allows flexibility and economies of scale. Electricity supply is often used as an analogy for the Cloud. Startup companies do not generally build their own power stations to meet their electricity needs. Instead, they get their power from a utility supplier who will supply power on demand, and the customer only pays for the power needed at a given time. The utility supplier manages the standby resources that are needed to cope with peaks in demand, and the cost of this is shared among all the customers.
Flexibility is the key here: cloud technology means that computing services are automatically provisioned and deprovisioned as and when necessary. This matching of demand with resource availability means that the cloud can be very cost effective. The services provided by the Cloud are broadly classified as Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS).
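To make the idea of matching resources to demand concrete, here is a minimal sketch in Python. It is not how any particular provider implements autoscaling; the function name, the demand figures and the per-instance capacity are all assumptions made up for illustration.

```python
import math

def instances_needed(demand_rps: float, capacity_per_instance_rps: float,
                     min_instances: int = 1) -> int:
    """Return how many instances are needed to serve the given demand."""
    if capacity_per_instance_rps <= 0:
        raise ValueError("capacity_per_instance_rps must be positive")
    # Round up, and never drop below the configured minimum.
    return max(min_instances, math.ceil(demand_rps / capacity_per_instance_rps))

# Hypothetical demand for four consecutive hours, in requests per second.
hourly_demand = [40, 120, 480, 90]

# Assume each instance can serve 100 requests per second.
plan = [instances_needed(d, capacity_per_instance_rps=100) for d in hourly_demand]
print(plan)  # [1, 2, 5, 1]
```

The instance count rises to five for the busy hour and then falls back to one, so capacity is only held while it is needed.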
From a storage perspective, IaaS is probably the most important as it specifically deals with hardware delivered as a service. As well as the data storage hardware, IaaS includes virtualised server space and the network to connect it all. The pool of physical servers is often distributed across several data centers for resilience.
The PaaS model relates to application development and includes the IaaS hardware services and also those software and configuration items that developers need to create applications.
SaaS is a software delivery model in which applications are hosted and made available to customers over a network connection, usually the internet. A popular example of SaaS is Facebook, but common business applications are available in areas like accounting, invoicing, sales, communications and CRM systems.
There are three types of cloud provision model: Public, Private and Hybrid.
The 'Public cloud' model is the closest to the ideal above. The cloud providers own and operate the physical infrastructure in their own datacenters and usually provide access to cloud services over the internet. The physical resource pools can be enormous, so the provision is very cost effective and very scalable. Some applications are even free to the end consumer, as the supplier can make the application pay from advertising revenue. Facebook is an obvious example again.
Where the resources are charged for, they are usually pay-as-you-go, so you only pay for what you use at that time and do not need to maintain an unused but expensive capacity overhead.
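The difference between paying for peak capacity all the time and paying only for what you use can be shown with a little arithmetic. The sketch below uses invented usage figures and a single invented unit price for both models; real providers price these differently, so treat it purely as an illustration of the principle.

```python
def fixed_capacity_cost(peak_units: int, hours: int, price_per_unit_hour: float) -> float:
    """Cost of keeping enough capacity for the peak running for every hour."""
    return peak_units * hours * price_per_unit_hour

def pay_as_you_go_cost(usage_per_hour: list[int], price_per_unit_hour: float) -> float:
    """Cost when each hour is billed only for the units actually used."""
    return sum(usage_per_hour) * price_per_unit_hour

# Hypothetical usage: mostly 2 units, with a one-hour spike to 10 units.
usage = [2, 2, 10, 2]
price = 0.5  # assumed price per unit per hour

print(fixed_capacity_cost(max(usage), len(usage), price))  # 20.0
print(pay_as_you_go_cost(usage, price))                    # 8.0
```

The spikier the demand, the bigger the gap between the two numbers.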
Because public clouds are usually very big and spread over several datacenters, they are very resilient. It is even possible for an entire datacenter to go offline without affecting applications. As the public cloud is usually accessed over the internet, it can be accessed from almost anywhere, from a variety of devices.
While a public cloud sounds like utopia, one of the issues with it is security. It can be accessed by all from anywhere and this leaves it open to abuse and hacking. The 'Private Cloud' model tries to fix this issue by ensuring that the cloud resources can only be accessed by a single organisation. The services are managed by a third party and may be hosted offsite, but some are even hosted on the customer's premises behind a firewall to ensure access is locked down. Access may be across private leased lines or over public networks with secure encrypted connections.
One advantage that the private cloud does give you over in-house systems is 'Cloud bursting'. You host your applications in a private cloud, but when you get spikes in your demand, you can switch some less critical applications into the public cloud. This frees up private cloud resources for critical applications, without having to maintain a spare capacity overhead to cope with those spikes.
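The cloud bursting decision described above can be sketched as a simple placement rule. This is only an illustration under stated assumptions: the `Workload` class, the capacity units and the workload names are hypothetical, and a real scheduler would consider much more than a single capacity figure.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    demand: int      # capacity units required (hypothetical unit)
    critical: bool   # critical workloads must stay in the private cloud

def place_workloads(workloads: list[Workload], private_capacity: int) -> dict[str, str]:
    """Keep critical workloads private; burst non-critical ones when full."""
    placement = {}
    remaining = private_capacity
    # Critical workloads are placed first so they get private capacity
    # before anything else does.
    for w in sorted(workloads, key=lambda w: not w.critical):
        if w.demand <= remaining:
            placement[w.name] = "private"
            remaining -= w.demand
        elif w.critical:
            raise RuntimeError(f"no private capacity for critical workload {w.name}")
        else:
            placement[w.name] = "public"  # burst to the public cloud
    return placement

workloads = [
    Workload("reporting", demand=4, critical=False),
    Workload("orders", demand=6, critical=True),
    Workload("test-env", demand=3, critical=False),
]
placement = place_workloads(workloads, private_capacity=10)
print(placement)
```

With 10 units of private capacity, the critical `orders` workload and `reporting` stay private, while `test-env` is pushed out to the public cloud.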
This brings us neatly to the third type of cloud, the 'Hybrid Cloud'.
A Hybrid cloud consists of two or more clouds, private and public, that are separate, but can be connected together. They may even be managed by different cloud providers. The Cloud Burst function described above is an example of a hybrid cloud. Another example would be an application that stores detailed personal information in a private cloud, but passes anonymised data over to a business intelligence application that runs on a public cloud.
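As a rough sketch of the second example, the code below strips personal fields from a record and replaces the customer identifier with a keyed hash before the record leaves the private cloud. The field names and the key are hypothetical. Strictly speaking, a keyed hash gives you pseudonymised rather than fully anonymised data, so whether this is sufficient depends on your own legal and security requirements.

```python
import hashlib
import hmac

# Hypothetical secret; it would stay inside the private cloud.
SECRET_KEY = b"replace-with-a-real-secret"

def anonymise(record: dict) -> dict:
    """Return a copy of the record that is safe to pass to the public cloud."""
    token = hmac.new(SECRET_KEY, record["customer_id"].encode(),
                     hashlib.sha256).hexdigest()
    # Only non-personal fields are copied; name, address and so on are dropped.
    return {
        "customer_token": token,
        "region": record["region"],
        "order_total": record["order_total"],
    }

private_record = {
    "customer_id": "C-1001",
    "name": "Jane Doe",
    "region": "EU",
    "order_total": 42,
}
public_record = anonymise(private_record)
print(sorted(public_record))  # ['customer_token', 'order_total', 'region']
```

Because the same customer always maps to the same token, the business intelligence application can still group orders by customer without ever seeing who the customer is.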
A multi-cloud environment is one that uses more than one public cloud.
OK, so if we accept that we now know what the Cloud is, then maybe we can define a strategy. So where do you start with your Cloud strategy?
Like any other IT strategy, you start by understanding what your business is up to, where it sits now, and what its objectives are for the future. Once you understand this, you can establish a realistic cloud adoption strategy that is aligned with your business goals, so that your cloud adoption activities add value to your organization. This business alignment stuff is discussed in detail in the Writing an IT Strategy page, but some cloud-specific questions that will help you to do this are:
Once you have your company objectives documented, you can start looking at the detail of what the cloud can provide for you. The two important questions are: what is your organisation's appetite for risk, and do you need the ability to access data anywhere, from any device with an internet connection? If your business is willing to hand its data over to a Cloud provider, then a public Cloud may be ideal, but if its appetite for risk is low, then you probably need a private Cloud. However, a Hybrid approach may be best, with critical data kept onsite in a private Cloud, and less critical data kept in a public Cloud.
Do you want object storage or block storage? Object stores will let you access data from any device by using a URL. However, if this is not your objective, and you need to store data in a file system and randomly access subsets of that data programmatically, then block storage is a better option. You would create a storage volume that can be mounted as a raw block device, then create a file system on the device and use it as an additional hard drive.
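The difference in access pattern can be illustrated with a toy model in Python. Nothing here talks to a real storage service: the dictionary stands in for an object store, the in-memory buffer stands in for a block volume, and the URL and block size are made-up values.

```python
import io

# Object storage: whole objects are stored and fetched by a key,
# which in practice is a URL. The URL below is purely illustrative.
object_store: dict[str, bytes] = {}
url = "https://storage.example.com/bucket/report.csv"
object_store[url] = b"id,total\n1,20\n2,35\n"
whole_object = object_store[url]  # you get the complete object back

# Block storage: a volume is a run of fixed-size blocks that can be
# read or rewritten individually, which is what a file system builds on.
BLOCK_SIZE = 512
volume = io.BytesIO(bytes(BLOCK_SIZE * 8))  # eight empty blocks

def write_block(block_number: int, data: bytes) -> None:
    """Overwrite one block in place, padding the data with zero bytes."""
    if len(data) > BLOCK_SIZE:
        raise ValueError("data does not fit in one block")
    volume.seek(block_number * BLOCK_SIZE)
    volume.write(data.ljust(BLOCK_SIZE, b"\0"))

def read_block(block_number: int) -> bytes:
    """Read exactly one block without touching the rest of the volume."""
    volume.seek(block_number * BLOCK_SIZE)
    return volume.read(BLOCK_SIZE)

write_block(3, b"hello")
print(read_block(3).rstrip(b"\0"))  # b'hello'
```

With the object store you replace or fetch an object as a whole, whereas the block volume lets you change block 3 without reading or rewriting any of the others.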
If you decide to outsource your data to a public Cloud, then you are still responsible for making sure that the data is safe and secure. There are two ways to protect yourself from data loss in the Cloud: redundancy and snapshots. Redundancy can range from having checksum bits, to various types of RAID protection, right up to having your data synchronously mirrored between two sites. Obviously, the more redundancy you require, the more you will have to pay. So you need to ask, how many redundant copies are necessary, and how many redundant copies are too many? If you can easily reconstruct data, or you can lose it without too much impact, then you should consider cheaper, reduced redundancy storage. For example, large datasets used for test and development are suitable for reduced redundancy storage, as long as your developers can recreate that data. Absolutely business critical data, like your customer database, should be RAID protected and replicated.
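At the lightest end of that range, a checksum lets you detect that stored data has changed, although it cannot repair it. The sketch below uses CRC-32 from Python's standard `zlib` module as one example of a checksum; the function names are made up for illustration, and real storage systems may well use different algorithms.

```python
import zlib

def store(data: bytes) -> tuple[bytes, int]:
    """Return the data together with a CRC-32 checksum computed at write time."""
    return data, zlib.crc32(data)

def verify(data: bytes, checksum: int) -> bool:
    """Recompute the checksum at read time and compare it with the stored one."""
    return zlib.crc32(data) == checksum

data, checksum = store(b"customer database page 17")
print(verify(data, checksum))  # True

# Simulate a single flipped bit in the first byte.
corrupted = bytes([data[0] ^ 0x01]) + data[1:]
print(verify(corrupted, checksum))  # False
```

Detecting the corruption is only half the job. To recover the data you still need a second copy, which is where RAID and mirroring come in.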
Redundancy will protect you from hardware errors, and even site failure if you are using remote mirroring, but it will not protect you against all data corruption or from file deletion. For this, you need a snapshot of the data at a previous point in time. You need to decide how frequently to create snapshots and how long they should be retained.
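Once you have chosen a retention period, working out which snapshots can be removed is straightforward. This is a minimal sketch with invented dates and a hypothetical 14 day retention period; it only identifies expired snapshots and does not call any provider's API to delete them.

```python
from datetime import datetime, timedelta

def expired_snapshots(snapshot_times: list[datetime], now: datetime,
                      retention: timedelta) -> list[datetime]:
    """Return the snapshots that are older than the retention period."""
    cutoff = now - retention
    return [t for t in snapshot_times if t < cutoff]

# Hypothetical snapshots taken on 1, 10, 20 and 30 January 2024.
snapshots = [datetime(2024, 1, day) for day in (1, 10, 20, 30)]
now = datetime(2024, 1, 31)

to_delete = expired_snapshots(snapshots, now, retention=timedelta(days=14))
print([t.day for t in to_delete])  # [1, 10]
```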
This leads neatly into general backup and recovery. The Cloud is very suitable for storing backup data as you can use the economy and scale of public cloud based storage, and backup data is usually considered to be low risk. If you decide to replace your tape backup storage with lower cost, online and on-demand cloud storage, then you need to decide how much data redundancy you need for that backup data. Do you really need to back up backup data?
Long term archive data is also suitable for the cloud, but unlike backup data, which is just a copy of live data, archive data is usually the only copy of a set of live data that may be required in the future, but probably will not. Examples could be year end tax records and end of project records. In this case, you will need a backup, but as the data is unlikely to change, maybe a single backup is all that is required. You must also have a process in place to identify when archive data is no longer required so it can be deleted.
So how long should you retain copies of cloud-based data? I can't give you an answer to that, because it depends on the data, its importance to your company, and any legal requirements. Different types of data will require different retention periods. If you can implement some kind of life cycle management to handle your data, so it is kept for the correct length of time and then deleted when it is no longer required, then that would be ideal.
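A life cycle rule can be as simple as a table of retention periods per data type. The sketch below is only an illustration: the data types and the retention periods are invented, and your own values must come from your business and legal requirements.

```python
from datetime import date

# Hypothetical retention periods, in days, for each type of data.
RETENTION_DAYS = {
    "test_data": 7,
    "backup": 30,
    "tax_record": 7 * 365,
}

def lifecycle_action(data_type: str, created: date, today: date) -> str:
    """Decide whether a piece of data should be kept, deleted or reviewed."""
    retention = RETENTION_DAYS.get(data_type)
    if retention is None:
        # Unknown data types are never deleted automatically.
        return "review"
    age_days = (today - created).days
    return "delete" if age_days > retention else "keep"

today = date(2024, 3, 1)
print(lifecycle_action("backup", date(2024, 1, 1), today))      # delete
print(lifecycle_action("tax_record", date(2020, 1, 1), today))  # keep
print(lifecycle_action("unclassified", date(2020, 1, 1), today))  # review
```

Sending unknown data types for review rather than deleting them is a deliberately cautious choice, since deleting data that later turns out to be required is usually the more expensive mistake.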
The good side of storing your data in a public Cloud is that your providers will bill you for the storage, which makes it very easy to pass those costs on to the relevant business department or development team. When they see the direct cost, they may then decide that they can handle shorter retention times and fewer backups, but this must be balanced with the risk of data not being available when needed.
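Passing the costs on can be sketched as a simple chargeback calculation. The departments, the usage figures and the price per GB below are all made up, and the price is held in whole cents to keep the arithmetic exact.

```python
from collections import defaultdict

def chargeback(usage: list[tuple[str, int]], price_cents_per_gb: int) -> dict[str, int]:
    """Total the monthly storage cost, in cents, for each department."""
    totals: dict[str, int] = defaultdict(int)
    for department, gigabytes in usage:
        totals[department] += gigabytes * price_cents_per_gb
    return dict(totals)

# Hypothetical billing lines: (department, GB stored this month).
usage = [("finance", 500), ("dev", 2000), ("finance", 250)]

bill = chargeback(usage, price_cents_per_gb=2)
print(bill)  # {'finance': 1500, 'dev': 4000}
```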