Data Protection Diaries Fundamentals Walking The Data Protection Talk

November 26, 2017 – 6:51 pm

Data Protection Diaries Walking The Data Protection Talk

Companion to Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft ( CRC Press 2017)

server storage I/O data infrastructure trends

By Greg November 26, 2017

This is Part 8 of a multi-part series on Data Protection fundamental tools topics techniques terms technologies trends tradecraft tips as a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).

Software Defined Data Infrastructure Essentials Book SDDC

Click here to view the previous post Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends, and click here to view the next post who’s Doing What ( Toolbox Technology Tools).

Post in the series includes excerpts from Software Defined Data Infrastructure (SDDI) pertaining to data protection for legacy along with software defined data centers ( SDDC), data infrastructures in general along with related topics. In addition to excerpts, the posts also contain links to articles, tips, posts, videos, webinars, events and other companion material. Note that figure numbers in this series are those from the SDDI book and not in the order that they appear in the posts.

In this post the focus is around what I (and Server StorageIO) does for Data Protection besides just talking the talk and is a work in progress that is being updated over time with additional insights.

Walking The Data Protection Talk What I Do

A couple of years back I did the first post as part of the Data Protection Diaries series ( view here), that included the following image showing some data protection needs and requirements, as well as what being done, along with areas for improvement. Part of what I and Server StorageIO does involves consulting (strategy, design, assessment), advising and other influencers activities (e.g. blog, write articles, create reports, webinars, seminars, videos, podcasts) pertaining to data Infrastructure topics as well as data protection.

What this means is knowing about the trends, tools, technologies, what’s old and new, who’s doing what, what should be in the data protection toolbox, as well as how to use those for different scenarios. Its one thing to talk the talk, however I also prefer to walk the talk including eating my own dog food applying various techniques, approaches, tools and technologies discussed.

The following are from a previous Data Protection Diaries post where I discuss my data protection needs (and wants) some of which have evolved since then. Note the image on the left is my Livescribe Echo digital pen and paper tablet. On the right is an example of the digital image created and imported into my computer from the Livescribe. In other words, Im able to protect my hand written notes, diagrams and figures.

Data Protection Diaries Data Protection Diaries Walking The Talk
Via my Livescribe Echo digital pen ( get your Livescribe here at

My Environment and data protection is always evolving, some based on changing projects, others that are more stable. Likewise the applications along with data are varied after all, everything is not the same. My data protection includes snapshots, replication, mirror, sync, versions, backup, archive, RAID, erasure code among others technologies, tools, and techniques.

Applications range from desktop, office, email, documents, spreadsheets, presentations, video, audio and related items in support of day-to-day activities. Then there are items part of various projects that range from physical to virtual, cloud and container leveraging various tools. This means having protection copies (sync, backup, snapshots, consistency points) of virtual machines, physical machine instances, applications and databases such as SQL Server among many others. Other application workloads include web, word press blog and email among others.

The Server StorageIO environment consists of a mix of legacy on-premises technologies from servers, storage, hardware, software, networks, tools as well as software defined virtual (e.g. VMware, Hyper-V, Docker among others), as well as cloud. The StorageIO data Infrastructure environment consists of dedicated private server (DPS) that I have had for several years now that supports this blog as well as other sites and activity. I also have a passive standby site used for testing of the WordPress based blog on an AWS Lightsail server. I use tools such as Updraft Plus Premium to routinely create a complete data protection view (database, plugins, templates, settings, configuration, core) of my WordPress site (runs on DPS) that is stored in various locations, including at AWS.

Data Protection Diaries Walking The Talk
Some of my past data protection requirements (they have evolved)

Currently the Lightsail Virtual Private Server (VPS) is in passive mode, however plans are to enable it as a warm or active standby fail over site for some of the DPS functions. One of the tools I have for monitoring and insight besides those in WordPress and the DPS are AWS Route 53 alerts that I have set up to monitor endpoints. AWS Route 53 is a handy resource for monitoring your endpoints such as a website, blog among other things and have it notify you, or take action including facilitating DNS fail over if needed. For now, Im simply using Route 53 besides as a secondary DNS as a notification tool.

Speaking of AWS, I have compute instances in Elastic Cloud Compute (EC2) along with associated Elastic Block Storage (EBS) volumes as well as their snapshots. I also have AWS S3 buckets in different regions that are on various tiers from standard to infrequent access (IA), as well as some data on Glacier. Data from my DPS at Bluehost gets protected to a AWS S3 bucket that I can access from AWS EC2, as well as via other locations including Microsoft Azure as needed.

Some on-premises data also gets protected to AWS S3 (as well as to elsewhere) using various tools, for different granularity, frequency, access and retention. After all, everything is not the same, why treat it the same. Some of the data protected to AWS S3 buckets is in native format (e.g. they appear as objects to S3 or object enabled applications), as well as file to file based applications with appropriate tools.

Other data that is also protected to AWS S3 from different data protection or backup tools are stored in vendor neutral or vendor specific save set, zip, tar ball or other formats. In other words, I need the tool or compatible tool that knows the format of the saved data to retrieve individual data files, items or objects. Note that this is similar to storing data on tape, HDDs, SSD or other media in native format vs. in some type of encapsulate save set or other format.

In addition to protecting data to AWS, I also have data at Microsoft Azure among other locations. Other locations include non-cloud based off-site where encrypted removable media is periodically taken to a safe secure place as a master, gold in case of major emergency, ransomeware copy.

Why not just rely on cloud copies?

Simple, I can pull individual files or relatively small amounts of data back from the cloud sometimes faster (or easier) than from on-site copies, let alone my off-site, off-line, air gap copies. On the other hand, if I need to restore large amounts of data, without a fast network, it can be quicker to get the air gap off-line, off-site copy, do the large restore, then apply incremental or changed data via cloud. In other a hybrid approach.

Now a common question I get is why not just do one or the other and save some money. Good point, I would save some money, however by doing the above among other things, they are part of being able to test, try new and different things, gain insight, experience not to mention walk the talk vs. simply talking the talk.

Of course Im always looking for ways to streamline to make my data protection more efficient, as well as effective (along with remove complexity and costs).

  • Everything is not the same, so why treat it all the same with common SLO, RTO, RPO and retention?
  • Likewise why treat and store all data the same way, on the same tiers of technology
  • Gain insight and awareness into environment, applications, workloads, PACE needs
  • Applications, data, systems or devices are protected with different granularity and frequency
  • Apply applicable technology and tools to the task at hand
  • Any data I have in cloud has a copy elsewhere, likewise, any data on-premises has a copy in the cloud or elsewhere
  • I implement the 4 3 2 1 rule by having multiple copies, versions, data in different locations, on and off-line including cloud
  • From a security standpoint, many different things are implemented on a logical as well as physical basis including encryption
  • Ability to restore data as well as applications or image instances locally as well as into cloud environments
  • Leverage different insight and awareness, reporting, analytics and monitoring tools
  • Mix of local storage configured with different RAID and other protection
  • Test, find, fix, remediate improve the environment including leveraging lessons learned

Where To Learn More

Continue reading additional posts in this series of Data Infrastructure Data Protection fundamentals and companion to Software Defined Data Infrastructure Essentials (CRC Press 2017) book, as well as the following links covering technology, trends, tools, techniques, tradecraft and tips.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

Everything is not the same, thats why in my environment I use different technologies, tools and techniques to protect my data. This also means having different RTO, RPO across various applications, data and systems as well as devices. Data that is more important has more copies, versions in different locations as well as occurring more frequently as part of 4 3 2 1 data protection. Other data that does not change as frequently, or time sensitive have alternate RTO and RPO along with corresponding frequency of protection.

Get your copy of Software Defined Data Infrastructure Essentials here at, at CRC Press among other locations and learn more here. Meanwhile, continue reading with the next post in this series Part 9 who’s Doing What (Toolbox Technology Tools).

Ok, nuff said, for now.


Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2018 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.