November 26, 2017 (updated December 29, 2025)

Data Protection Diaries Access Availability RAID Erasure Codes LRC Deep Dive

Access Availability RAID Erasure Codes including LRC Deep Dive

Companion to Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft (CRC Press 2017)

By Greg Schulz – www.storageioblog.com November 26, 2017

This is Part 3 of a multi-part series on Data Protection fundamental tools, topics, techniques, terms, technologies, trends, tradecraft and tips. It is a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).

Click here to view the previous post Part 2 Reliability, Availability, Serviceability (RAS) Data Protection Fundamentals, and click here to view the next post Part 4 Data Protection Recovery Points (Archive, Backup, Snapshots, Versions).

Posts in this series include excerpts from Software Defined Data Infrastructure (SDDI) pertaining to data protection for legacy as well as software defined data centers (SDDC), data infrastructures in general, and related topics. In addition to excerpts, the posts also contain links to articles, tips, posts, videos, webinars, events and other companion material. Note that figure numbers in this series are those from the SDDI book and are not in the order that they appear in the posts.

In this post, part of the Data Protection Diaries series as well as a companion to Chapter 9 of the SDDI Essentials book, we are going on a longer, deeper dive. We are going to look at availability, access and durability including mirroring, replication and RAID, along with various traditional and newer parity approaches such as Erasure Codes (EC), Local Reconstruction Codes (LRC) and Reed Solomon (RS) codes (which have ties to RAID 2), among others. Later posts in this series look at point in time data protection to support recovery to a given time (e.g. RPO), while this and the previous post look at maintaining access and availability.

Keep in mind that if something can fail, it probably will. Also keep in mind that everything is not the same: different environments and application workloads (along with their data) have diverse performance, availability, capacity and economic (PACE) attributes, along with service level objectives (SLOs). Various SLOs include PACE attributes, recovery point objectives (RPO) and recovery time objectives (RTO), among others.

Availability, accessibility and durability (see part two in this series), along with associated RAS topics, are part of what enables meeting RTO as well as faults (or failures) to tolerate (FTT). This means that different fault tolerance modes (FTM) determine what technologies, tools, trends and techniques to use to meet different RTO, FTT and application PACE needs.

Maintaining access and availability along with durability (e.g. how many copies of data as well as where stored) protects against loss or failure of a component device (SSDs, HDDs, adapters, power supplies, controllers), node or system, appliance, server, rack, clusters, stamps, data center, availability zones, regions, or other fault or failure domains spanning hardware, software, and services.

Figure 1.5 Data Infrastructures and other IT Infrastructure Layers

Data Protection Access Availability RAID Erasure Codes

This is a good place to mention some context for RAID and RAID array, which can mean different things pertaining to Data Protection. Some people associate RAID with a hardware storage array, or with a RAID card. Other people consider an array to be a storage array that is a RAID enabled storage system. A trend is to refer to legacy storage systems as RAID arrays or hardware-based RAID, to differentiate from newer implementations.

Context comes into play in that a RAID group (i.e., a collection of HDDs or SSDs that is part of a RAID set) can be referred to as an array, a RAID array, or a virtual array. What this means is that while some RAID implementations may no longer be relevant, there are many new and evolving variations extending parity based protection, making at least software-defined RAID still relevant.

Keep context in mind, and don’t be afraid to ask what someone is referring to: a particular vendor storage system, a RAID implementation or packaging, a storage array, or a virtual array. Also keep the context of the virtual array in perspective vs. storage virtualization and virtual storage. RAID as a term is used to refer to different modes such as mirroring or parity, and parity can be legacy RAID 4, 5, or 6 along with erasure codes (EC). Note some people refer to erasure codes in the context of not being a RAID system, which can be an inference to not being a legacy storage system running hardware RAID (e.g. not software or software defined).

The following figure (9.13) shows various availability protection schemes (e.g. not recovery point) that maintain access while protecting against loss of a component, device, system, server, site, region or other part of a fault domain. Since everything is not the same, with environments and applications having different Performance, Availability, Capacity and Economic (PACE) attributes, there are various approaches for enabling availability along with accessibility.

Keep in mind that RAID and erasure codes along with their various derivatives, as well as replication and mirroring, are not by themselves a replacement for backup or other point in time (e.g. enable recovery point) protection.

Instead, availability technologies such as RAID and erasure code along with mirror as well as replication need to be combined with snapshots, point in time copies, consistency points, checkpoints, backups among other recovery point protection for complete data protection.

Speaking of replacement for backup, while many vendors and their pundits claim or want to see backup as being dead, as long as they keep talking about backup instead of broader data protection, backup will remain alive.

Figure 9.13 Various RAID, Mirror, Parity and Erasure Code (EC) approaches

Different RAID levels (including parity, EC, LRC and RS based) will affect storage energy effectiveness, similar to various SSD or HDD performance capacity characteristics; however, a balance of performance, availability, capacity, and energy needs to occur to meet application service needs. For example, RAID 1 mirroring or RAID 10 mirroring and striping use more HDDs and, thus, power, but will yield better performance than RAID 6 and erasure code parity protection.

RAID level	Normal performance	Availability	Performance overhead	Rebuild overhead	Availability overhead
RAID 0 (stripe)	Very good read & write	None	None	Full volume restore	None
RAID 1 (mirror or replicate)	Good reads; writes = device speed	Very good; two or more copies	Multiple copies can benefit reads	Re-synchronize with existing volume	2:1 for dual, 3:1 for three-way copies
RAID 4 (stripe with dedicated parity, i.e., 4 + 1 = 5 drives total)	Poor writes without cache	Good for smaller drive groups and devices	High on write without cache (i.e., parity)	Moderate to high, based on number and type of drives	Varies; 1 Parity/N, where N = number of devices
RAID 5 (stripe with rotating parity, 4 + 1 = 5 drives)	Poor writes without cache	Good for smaller drive groups and devices	High on write without cache (i.e., parity)	Moderate to high, based on number and type of drives	Varies; 1 Parity/N, where N = number of devices
RAID 6 (stripe with dual parity, 4 + 2 = 6 drives)	Poor writes without cache	Better for larger drive groups and devices	High on write without cache (i.e., parity)	Moderate to high, based on number and type of drives	Varies; 2 Parity/N, where N = number of devices
RAID 10 (mirror and stripe)	Good	Good	Minimum	Re-synchronize with existing volume	Twice mirror capacity stripe drives
Reed-Solomon (RS) parity, also known as erasure code (EC), local reconstruction code (LRC), and SHEC	Ok for reads, slow writes; good for static and cold data with front-end cache	Good	High on writes (CPU for parity calculation, extra I/O operations)	Moderate to high, based on number and type of drives, how implemented, extra I/Os for reconstruction	Varies, low overhead when using large number of devices; CPU, I/O, and network overhead.

Table 9.3 Common RAID Characteristics
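
To make the availability overhead column in Table 9.3 more concrete, here is a minimal Python sketch that estimates usable capacity for a few of the RAID levels above. The function name usable_capacity_tb and its simplifications (equal-size drives, no hot spares, no formatting or metadata overhead) are assumptions for illustration only; actual products will report different numbers.

```python
def usable_capacity_tb(level, drives, drive_tb, copies=2):
    """Approximate usable capacity of one drive group (illustrative only)."""
    if level == "RAID0":
        data_drives = drives            # stripe only, nothing spent on protection
    elif level in ("RAID1", "RAID10"):
        data_drives = drives / copies   # every block is stored `copies` times
    elif level in ("RAID4", "RAID5"):
        data_drives = drives - 1        # one drive's worth of parity
    elif level == "RAID6":
        data_drives = drives - 2        # two drives' worth of parity
    else:
        raise ValueError(f"unsupported level: {level}")
    return data_drives * drive_tb


# Hypothetical groups built from 4 TB drives
for level, drives in [("RAID0", 4), ("RAID10", 8), ("RAID5", 5), ("RAID6", 6)]:
    usable = usable_capacity_tb(level, drives, 4)
    print(f"{level}: {drives} drives, {usable:.0f} TB usable of {drives * 4} TB raw")
```

Under these assumptions a 4 + 1 RAID 5 group and a 4 + 2 RAID 6 group of 4 TB drives both provide 16 TB usable, with the RAID 6 group spending one more drive on protection.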

Besides those shown in Table 9.3, other RAID levels, including parity based approaches, include RAID 2 (Reed Solomon) and RAID 3 (synchronized stripe and dedicated parity), along with combinations such as 10, 01, 50 and 60, among others.

Similar to legacy parity-based RAID, some erasure code implementations use narrow drive groups while others use larger ones to increase protection and reduce capacity overhead. For example, some larger enterprise-class storage systems (RAID arrays) use narrow 3 + 1 or 4 + 1 RAID 5 or 4 + 2 or 6 + 2 RAID 6, which have higher protection storage capacity overhead and fault-impact footprint.

On the other hand, many smaller mid-range and scale-out storage systems, appliances, and solutions support wide stripes such as 7 + 1, 15 + 1, or larger RAID 5, or 14 + 2 or larger RAID 6. These solutions trade lower storage capacity protection overhead for the risk of multiple drive failures or impacts. Similarly, some EC implementations use relatively small groups such as 6, 2 (8 drives) or 4, 2 (6 drives), while others use 14, 4 (18 drives), 16, 4 (20 drives), or larger.

Table 9.4 shows options for a number of data devices (k) vs. a number of protect devices (m).

k (data devices)	m (protect devices)	Availability; Resiliency	Space capacity overhead	Normal performance	FTT	Comments; Examples
Narrow	Wide	Very good; Low impact of rebuild	Very high	Good (R/W)	Very good	Trade space for RAS; Larger m vs. k; 1, 1; 1, 2; 2, 2; 4, 5
Narrow	Narrow	Good	Good	Good (R/W)	Good	Use with smaller drive groups; 2, 1; 3, 1; 6, 2
Wide	Narrow	Ok to good; With larger m value	Low as m gets larger	Good (read); Writes can be slow	Ok to good	Smaller m can impact rebuild; 3, 1; 7, 1; 14, 2; 13, 3
Wide	Wide	Very good; Balanced	High	Good	Very good	Trade space for RAS; 2, 2; 4, 4; 8, 4; 18, 6

Table 9.4. Comparing Various Data Device vs. Protect Device Configurations

Note that a wide k with no m, such as 4, 0, would not have protection. If you are focused on reducing costs and storage space capacity overhead, then a wider group (i.e., more data devices) with fewer protect devices might make sense. On the other hand, if performance, availability, and minimal to no impact during rebuild or reconstruction are important, then a narrower drive set, or a smaller ratio of data to protect drives, might make sense.
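
The k vs. m trade-off can be worked through with a little arithmetic. The following Python sketch summarizes a few of the layouts mentioned above. The helper name ec_profile is hypothetical, and treating FTT as equal to m assumes a code (such as Reed Solomon) where any m devices in the group can fail without data loss; some variations such as LRC do not survive every combination of m failures.

```python
def ec_profile(k, m):
    """Summarize a (k, m) layout: k data devices, m protect devices."""
    if k < 1 or m < 0:
        raise ValueError("k must be >= 1 and m must be >= 0")
    total = k + m
    return {
        "devices": total,
        "ftt": m,                  # assumes any m devices in the group may fail
        "efficiency": k / total,   # share of raw capacity that holds data
        "overhead": m / k,         # protection capacity relative to data capacity
    }


for k, m in [(4, 0), (4, 2), (6, 2), (13, 3), (14, 4), (16, 4)]:
    p = ec_profile(k, m)
    print(f"{k}, {m}: {p['devices']} devices, FTT {p['ftt']}, "
          f"{p['efficiency']:.0%} usable, {p['overhead']:.0%} protection overhead")
```

Running the loop shows the pattern described above: 4, 2 and 6, 2 both tolerate two failures, but the wider 6, 2 group spends a smaller share of its capacity on protection, while 4, 0 has no overhead and no protection.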

Also note that a higher or larger RAID number, parity scheme, or number of "m" devices in a parity and erasure code group may not be better; likewise, smaller may not be better. What is better is whichever approach meets your specific application performance, availability, capacity, economic (PACE) needs, along with SLO, RTO, RPO requirements. What can also be good is to use hybrid approaches combining different technologies and tools to facilitate access, availability and durability along with point in time recovery across different layers of granularity (e.g. device, drive, adapter, controller, cabinet, file system, data center, etc).

Some focus on the lower level RAID as the single or primary point of protection, however watch out for that being your single point of failure as well. For example, instead of building a resilient RAID 10 and then neglecting to have adequate higher level access, as well as recovery point protection, combine different techniques including file system protection, snapshots, and backups among others.

Figure 9.14 shows various options and considerations for balancing between too many or too few data (k) and protect (m) devices. The balance is about enabling particular FTT along with PACE attributes and SLO. This means, for some environments or applications, using different failure-tolerant modes (FTM) in various combinations as well as configurations.

Figure 9.14 Comparing various data drive to protection devices

Figure 9.14 top shows no protection overhead (with no protection); the bottom shows 13 data drives and three protection drives in an EC (RS or LRC among others) configuration that could tolerate three devices failing before loss of data or access occurs. In between are various options that can also be scaled up or down across a different number of devices (HDDs, SSDs, or systems).

Some solutions allow the user or administrator to configure the I/O chunk, slabs, shard, or stripe size, for example, from 8 KB to 256 KB to 1 MB (or larger), aligning with application workload and I/O profiles. Other options include the ability to set or disable read-ahead, write-through vs. write-back cache (with battery-protected cache), among other options.

The width or number of devices in a RAID parity or erasure group is based on a combination of factors, including how much data is to be stored and what your FTT objective is, along with spreading out protection overhead. Another consideration is whether you have large or small files and objects.

For example, if you have many small files and a wide stripe, parity, or erasure code set with a large chunk or shard size, you may not have an optimal configuration from a performance perspective.
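
As a rough illustration of why small files and wide stripes with large chunks can be a poor match, the Python sketch below estimates the space consumed by one file. It assumes a hypothetical implementation that allocates whole stripes per file or object (k data chunks plus m protect chunks each); many real implementations pack, inline or mirror small objects instead, so treat the function stripe_footprint and its numbers as a worst-case sketch rather than a description of any product.

```python
import math


def stripe_footprint(file_bytes, k, m, chunk_bytes):
    """Return (allocated_bytes, amplification) when whole stripes are allocated."""
    if file_bytes <= 0:
        raise ValueError("file_bytes must be positive")
    full_stripe_data = k * chunk_bytes                 # data held by one full stripe
    stripes = math.ceil(file_bytes / full_stripe_data)
    allocated = stripes * (k + m) * chunk_bytes        # data plus protect chunks
    return allocated, allocated / file_bytes


KB, MB = 1024, 1024 * 1024

# A 4 KB file on a 14, 4 layout with 1 MB chunks vs. 8 KB chunks
for chunk in (1 * MB, 8 * KB):
    allocated, ratio = stripe_footprint(4 * KB, 14, 4, chunk)
    print(f"chunk {chunk // KB} KB: {allocated // KB} KB allocated, {ratio:.0f}x the file size")

# A 14 MB file fills one full stripe on the same layout with 1 MB chunks
print(stripe_footprint(14 * MB, 14, 4, 1 * MB))
```

Under that whole-stripe assumption, the large file pays only the expected k + m overhead, while the small file's footprint is dominated by the chunk size rather than by its own data.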

The following figure shows combining various data protection availability and accessibility technologies including local as well as remote mirroring and replication, along with parity or erasure code (including LRC, RS, SHEC among others) approaches. Instead of just using one technology, a hybrid approach is used leveraging mirror (local on SSD) and replication across sites including asynchronous and synchronous. Replication modes include asynchronous (time-delayed, eventual consistency) for longer distance, higher latency networks, and synchronous (strong consistency, real-time) for short distance or low-latency networks.

Note that the mirror and replication can be done in software deployed as part of a storage system, appliance or as tin-wrapped software, virtual machine, virtual storage appliance, container or some other deployment mode. Likewise RAID, parity and erasure code software can be deployed and packaged in different ways.

In addition to mirror and replication, solutions are also using parity based approaches, including erasure code variations, for lower cost, less active data. In other words, the mirror on SSD handles active hot data, as well as any buffering or cache, while lower performance, higher capacity, lower cost data gets de-staged or migrated to a parity erasure code tier. Some vendors, service providers and solutions leveraging variations of the approach in figure 9.15 include Microsoft (Azure and Windows) and VMware among others.

Figure 9.15 Combining various availability data protection techniques

A tradecraft skill is finding the balance, knowing your applications, the data, and how the data is allocated as well as used, then leveraging that insight and your experience to configure to meet your application PACE requirements.

Consider:

Number of drives (width) in a group, along with protection copies or parity
Balance rebuild performance impact and time vs. storage space overhead savings
Ability to mix and match various devices in different drive groups in a system
Management interface, tools, wizards, GUIs, CLIs, APIs, and plug-ins
Different approaches for various applications and environments
Context of a physical RAID array, system, appliance, or solution vs. logical

Erasure Codes (EC)

Erasure Codes (EC) combine advanced protection with variable space capacity overhead over many drives, devices, or systems, using larger parity chunks or shards compared to traditional parity RAID approaches. There are many variations of EC as well as parity based approaches; some are tied to Reed Solomon (RS) codes while others use different approaches.

Note that some EC are optimized for reducing the overhead and cost of storing data (e.g. less space capacity) for inactive, or primarily read data. Likewise, some EC or variations are optimized for performance of reads/writes as well as reducing overhead of rebuilds, reconstructions and repairs with the least impact. Which EC or parity derivative approach is best depends on what you are trying to do, or what impact you are trying to avoid.
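
The basic idea behind parity and erasure codes can be shown with the simplest case: a single XOR parity block protecting k data blocks (a k, 1 layout, as in RAID 4 or RAID 5). The Python sketch below is not Reed Solomon or any production erasure code, which use more involved math to survive multiple failures; the function names are made up for this illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the data blocks plus one parity block (a k, 1 layout)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recreate the block at lost_index from all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]   # k = 4 data blocks
stripe = encode(data)                          # 4 + 1 = 5 blocks in total

# Lose any one block (data or parity) and rebuild it from the other four
for lost in range(len(stripe)):
    assert rebuild(stripe, lost) == stripe[lost]
print("any single lost block can be rebuilt")
```

Note that rebuilding one block reads every surviving block in the group, which is part of why wider groups can have a larger rebuild impact.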

Reed Solomon (RS) codes

Reed Solomon (RS) codes are an advanced parity protection mathematical algorithm technique that works well on large amounts of data, providing protection with lower space capacity overhead depending on how configured. Many Erasure Codes (EC) are based on derivatives of RS. Btw, did you know (or remember) that RAID 2 (rarely used, with few legacy implementations) has ties to RS codes? Here are some additional links to RS including via Backblaze, CMU, and Dr Dobbs.

Local Reconstruction Codes (LRC)

Microsoft leverages LRC in Azure as well as in Windows Server. LRC are optimized for a balance of protection, space capacity savings and normal performance, as well as reducing impact on running workloads during a repair, rebuild or reconstruction. One of the tradeoffs that LRC makes is to add some amount of additional space capacity in exchange for normal and abnormal (e.g. during repair) performance improvements. Where RS, EC and other parity based derivatives typically use a (k,m) nomenclature (e.g. data, protection), LRC adds an extra variable to help with reconstructions (k,m,n).

Some might argue that LRC are not as space efficient as other EC, RS or parity derivative variations, to which the counter argument can be that some of those approaches are not as performance effective. In other words, everything is not the same, and one approach does not or should not have to be applied to all, unless of course your preferred solution approach can only do one thing.
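
The repair trade-off described above can be sketched with simple counting. The Python example below compares how many fragments must be read to rebuild a single lost data fragment when using one wide group vs. local groups, each with its own local parity. The layout used (12 data fragments, 2 local groups, 2 global parities) and the function names are illustrative assumptions, not a statement of how any particular product is configured.

```python
def repair_reads_wide(k):
    """Fragments read to rebuild one lost data fragment in one wide (k, m) group."""
    return k


def repair_reads_local(k, local_groups):
    """Fragments read when a lost data fragment is rebuilt inside its local group.

    Reads the other data fragments in the group plus that group's local parity.
    """
    if k % local_groups != 0:
        raise ValueError("k must divide evenly into local groups")
    group_size = k // local_groups
    return (group_size - 1) + 1


def raw_to_usable(k, parity_fragments):
    """Raw capacity consumed per unit of data stored."""
    return (k + parity_fragments) / k


k = 12
print("wide 12 + 4:", repair_reads_wide(k), "reads,",
      f"{raw_to_usable(k, 4):.2f}x raw capacity")
print("local groups, 12 + 2 local + 2 global:", repair_reads_local(k, 2), "reads,",
      f"{raw_to_usable(k, 2 + 2):.2f}x raw capacity")
```

In this sketch both layouts consume the same raw capacity, but the local group layout reads half as many fragments for the common single-failure repair. The cost is that a layout with local groups generally cannot survive every combination of four failures the way a 12 + 4 RS style layout can, which is the space vs. performance argument described above.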

Additional LRC related material includes:

(PDF by Microsoft) LRC Erasure Coding in Windows Storage Spaces
(Microsoft Usenix Paper) Best Paper Award Erasure Coding in Azure
(Via MSDN Shared) Azure Storage Erasure Coding with LRC
(Via Microsoft) Azure Storage with Strong Consistency
(Paper via Microsoft) 23rd ACM Symposium on Operating Systems Principles (SOSP)
(Microsoft) Erasure Coding in Azure with LRC
(Via Microsoft) Good collection of EC, RS, LRC and related material
(Via Microsoft) Storage Spaces Fault Tolerance
(Via Microsoft) Better Way To Store Data with EC/LRC
(Via Microsoft) Volume resiliency and efficiency in Storage Spaces

Shingled Erasure Code (SHEC)

Shingled Erasure Codes (SHEC) are a variation of Erasure Codes leveraging a shingled overlay approach similar to what is being used in Shingled Magnetic Recording (SMR) on some HDDs. Ceph has been an early promoter of SHEC, read more here, and here.

Replication and Mirroring

Replication and Mirroring create a mirror or replica copy of data across different devices, systems, servers, clusters, sites or regions. In addition to keeping a copy, mirror and replication can occur on different time intervals such as real-time (synchronous) and time deferred (asynchronous). Besides time intervals, mirror and replication are implemented in different locations at various altitudes or stack layers from lower level hardware adapter or storage systems and appliances, to operating systems, hypervisors, software defined storage, volume managers, databases and applications themselves.

Covered in more detail in chapters 5 and 6, synchronous provides real-time, strong consistency, although high-latency local or remote interfaces can impact primary application performance. Note there is a common myth that high-latency networks are only long distance when in fact some local networks can also be high-latency. Asynchronous (also discussed in more depth in chapters 5 and 6) enables local and remote high-latency communications to be spanned, facilitating protection over a distance without impacting primary application performance, albeit with lower consistency, time deferred, also known as eventual consistency.
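
The synchronous vs. asynchronous trade-off can be approximated with a simple latency model. In the Python sketch below, a synchronous write is acknowledged only after both the local and remote copies are written (issued in parallel), while an asynchronous write is acknowledged after the local write and the replica catches up later. The function names, the parallel-write assumption and the sample numbers are all hypothetical.

```python
def app_write_latency_ms(local_ms, link_rtt_ms, remote_ms, mode):
    """Latency the application sees for one write (illustrative model)."""
    if mode == "sync":
        # Acknowledge only after both the local and the remote copy are written.
        return max(local_ms, link_rtt_ms + remote_ms)
    if mode == "async":
        # Acknowledge after the local write; the replica is updated later.
        return local_ms
    raise ValueError(f"unknown mode: {mode}")


def data_at_risk_mb(write_rate_mb_s, replica_lag_s):
    """Data not yet on the replica if the primary is lost right now (async)."""
    return write_rate_mb_s * replica_lag_s


# Same storage, two links: a low-latency link and a high-latency link
for rtt in (0.2, 20.0):
    sync = app_write_latency_ms(0.5, rtt, 0.5, "sync")
    asyn = app_write_latency_ms(0.5, rtt, 0.5, "async")
    print(f"link RTT {rtt} ms: sync {sync} ms, async {asyn} ms per write")

print("async exposure:", data_at_risk_mb(50, 30), "MB at 50 MB/s with 30 s lag")
```

The model shows why link latency, rather than distance alone, is what impacts the primary application with synchronous replication, and why asynchronous replication trades that impact for some amount of data that has not yet reached the replica.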

Mirroring (also known as RAID 1) and replication create a copy (a mirror or replica) across two or more storage targets (devices, systems, file systems, cloud storage service, applications such as a database). The reason for using mirrors is to provide a faster (for normal running and during recovery) failure-tolerant mode for enabling availability, resiliency, and data protection, particularly for active data.

Figure 9.10 shows general replication scenarios. Illustrated are two basic mirror scenarios: at the top, a device, volume, file system, or object bucket is replicated to two other targets (i.e., three-way or three replicas); at the bottom is a primary storage device using a hybrid replica and dispersal technique where multiple data chunks, shards, fragments, or extents are spread across devices in different locations.

Figure 9.10 Various Mirror and Replication Approaches

Mirroring and replication can be done locally inside a system (server, storage system, or appliance), within a cabinet, rack, or data center, or remotely, including at cloud services. Mirroring can also be implemented inside a server in software or using RAID and HBA cards to off-load the processing.

Figure 9.11 Mirror or Replication combined with Snapshots or other PiT protection

Keep in mind that mirroring and replication by themselves are not a replacement for backups, versions, snapshots, or another recovery point, time-interval (time-gap) protection. The reason is that replication and mirroring maintain a copy of the source at one or more destination targets. What this means is that anything that changes on the primary source also gets applied to the target destination (mirror or replica). However, it also means that anything changed, deleted, corrupted, or damaged on the source is also impacted on the mirror replica (assuming the mirror or replicas were or are mounted and accessible on-line).
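
The point that a mirror faithfully repeats deletes and damage, while a point in time copy does not, can be shown with a toy model. The Python class below is a hypothetical in-memory stand-in for a mirrored volume with snapshots; it is a sketch of the behavior described above, not of any real replication product.

```python
import copy


class MirroredVolume:
    """Toy volume: every change is applied to the primary and the mirror."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}
        self.snapshots = []

    def write(self, name, data):
        self.primary[name] = data
        self.mirror[name] = data          # the mirror follows every write

    def delete(self, name):
        self.primary.pop(name, None)
        self.mirror.pop(name, None)       # and every delete

    def snapshot(self):
        """Keep a point in time copy that later changes do not touch."""
        self.snapshots.append(copy.deepcopy(self.primary))
        return len(self.snapshots) - 1

    def restore(self, snapshot_id, name):
        self.write(name, self.snapshots[snapshot_id][name])


vol = MirroredVolume()
vol.write("report.doc", "version 1")
snap = vol.snapshot()

vol.delete("report.doc")                  # accidental delete
print("on mirror after delete:", "report.doc" in vol.mirror)

vol.restore(snap, "report.doc")           # recovery point to the rescue
print("after restore:", vol.primary["report.doc"])
```

The mirror protects against losing a device, but only the snapshot (or another recovery point) brings back what was deleted.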

Implementations in various locations (hardware, software, cloud) include:

Applications and databases such as SQL Server, Oracle among others
File systems, volume managers, software-defined storage managers
Third-party storage software utilities and drivers
Operating systems and hypervisors
Hardware adapter and off-load devices
Storage systems and appliances
Cloud and managed services

Where To Learn More

Continue reading additional posts in this series of Data Infrastructure Data Protection fundamentals and companion to Software Defined Data Infrastructure Essentials (CRC Press 2017) book, as well as the following links covering technology, trends, tools, techniques, tradecraft and tips.

Part 1 – Data Infrastructure Data Protection Fundamentals
Part 2 – Reliability, Availability, Serviceability (RAS) Data Protection Fundamentals
Part 3 – Data Protection Access Availability RAID Erasure Codes (EC) including LRC
Part 4 – Data Protection Recovery Points (Archive, Backup, Snapshots, Versions)
Part 5 – Point In Time Data Protection Granularity Points of Interest
Part 6 – Data Protection Security Logical Physical Software Defined
Part 7 – Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends
Part 8 – Data Protection Diaries Walking Data Protection Talk
Part 9 – Who’s Doing What (Toolbox Technology Tools)
Part 10 – Data Protection Resources Where to Learn More
Revisiting RAID storage remains relevant and resources
Data Infrastructure server storage I/O network Recommended Reading List Book Shelf
Data Protection Diaries series
Software Defined Data Infrastructure Essentials (CRC 2017) Book

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

What This All Means

There are various data protection technologies, tools and techniques for enabling availability of information resources including applications, data and data infrastructure resources. Likewise there are many different aspects of RAID as well as context, from legacy hardware based to cloud, virtual, container and software defined. In other words, not all RAID is in legacy storage systems, and there is a lot of FUD about RAID in general that is probably targeted more at specific implementations or products.

There are different approaches to meet various needs, from stripe for performance with no protection by itself, to mirror and replication, as well as many parity approaches from legacy to erasure codes including Reed Solomon based as well as LRC among others. Which approach is best depends on your objectives, including balancing performance, availability, capacity and economics (PACE) for normal running behavior as well as during faults and failure modes.

Get your copy of Software Defined Data Infrastructure Essentials here at Amazon.com, at CRC Press among other locations and learn more here. Meanwhile, continue reading with the next post in this series, Part 4 Data Protection Recovery Points (Archive, Backup, Snapshots, Versions).

Ok, nuff said, for now.

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2026 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

November 26, 2017 (updated November 26, 2023)

Data Protection Fundamentals Recovery Points (Backup, Snapshots, Versions)

Enabling Recovery Points (Backup, Snapshots, Versions)

Updated 1/7/18

By Greg Schulz – www.storageioblog.com November 26, 2017

This is Part 4 of a multi-part series on Data Protection fundamental tools, topics, techniques, terms, technologies, trends, tradecraft and tips. It is a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).

Click here to view the previous post Part 3 Data Protection Access Availability RAID Erasure Codes (EC) including LRC, and click here to view the next post Part 5 Point In Time Data Protection Granularity Points of Interest.

In this post the focus is on Data Protection Recovery Points (Archive, Backup, Snapshots, Versions) from Chapter 10.

Figure 1.5 Data Infrastructures and other IT Infrastructure Layers

Enabling RPO (Archive, Backup, CDP, PIT Copy, Snapshots, Versions)

Figure 9.5 Data Protection and Availability Points of Interest

RAID, including parity and erasure code (EC), along with mirroring and replication, provides availability and accessibility. These by themselves, however, are not a replacement for backup (or other point in time data protection) to support recovery points. For complete data protection, the solution is to combine resiliency technology with point-in-time tools, enabling availability and facilitating going back to a previous consistency point in time.

Recovery point protection is implemented within applications using checkpoint and consistency points as well as log and journal switches or flush. Other places where recovery-point protection occurs include in middleware, database, key-value stores and repositories, file systems, volume managers, and software-defined storage, in addition to hypervisors, operating systems, containers, utilities, storage systems, appliances, and service providers.

In addition to where, there are also different approaches, technologies, techniques, and tools, including archive, backup, continuous data protection, point-in-time copies, or clones such as snapshots, along with versioning.

Common recovery point data protection related terms, technologies, techniques, trends and topics, spanning availability and access, durability and consistency, point in time protection and security, are shown below.

Time interval protection, for example with snapshots, backup/restore, point in time copies, checkpoints and consistency points among other approaches, can be scheduled or dynamic. These approaches also vary by how they copy data, for example a full copy or clone, or incremental and differential copies (e.g. what has changed), among other techniques to support 4 3 2 1 data protection. Other variations include how many concurrent copies, snapshots or versions can take place, along with how many are stored and for how long (retention).
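
To illustrate how full, incremental and differential copies differ at restore time, the Python sketch below works out which backups are needed to restore to a given day. It assumes the common definitions (an incremental holds changes since the previous backup of any kind, a differential holds changes since the last full); the function name restore_chain and the schedules are hypothetical.

```python
def restore_chain(backups, target_day):
    """Return the backups needed, in order, to restore to target_day.

    backups: iterable of (day, kind) with kind in
    {"full", "incremental", "differential"}.
    """
    eligible = [b for b in sorted(backups) if b[0] <= target_day]
    full_idx = max((i for i, b in enumerate(eligible) if b[1] == "full"),
                   default=None)
    if full_idx is None:
        raise ValueError("no full backup at or before the target day")

    chain = [eligible[full_idx]]
    later = eligible[full_idx + 1:]

    # The newest differential replaces everything between it and the full.
    diff_idx = max((i for i, b in enumerate(later) if b[1] == "differential"),
                   default=None)
    if diff_idx is not None:
        chain.append(later[diff_idx])
        later = later[diff_idx + 1:]

    # Any incrementals after that point must all be applied, in order.
    chain.extend(b for b in later if b[1] == "incremental")
    return chain


incrementals = [(0, "full"), (1, "incremental"), (2, "incremental"), (3, "incremental")]
differentials = [(0, "full"), (1, "differential"), (2, "differential"), (3, "differential")]

print(restore_chain(incrementals, 3))    # full plus every incremental
print(restore_chain(differentials, 3))   # full plus only the latest differential
```

Incrementals copy less each time but lengthen the restore chain, while differentials copy more each time but shorten it, which ties back to balancing RTO against protection overhead.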

Additional Data Protection Terms

Copy Data Management (CDM), as its name implies, is associated with managing various data copies for data protection and analytics, among other activities. This includes being able to identify what copies exist (along with versions) and where they are located, among other insight.

Data Protection Management (DPM), as its name implies, is the management of data protection from backup/restore, to snapshots and other recovery point in time protection, to replication. This includes configuration, monitoring, reporting, analytics, insight into what is protected, how well it is protected, versions, retention, expiration, disposition, and access control, among other items.

Number of 9s Availability – Availability (access or durability, or both) can be expressed as a number of nines. For example, 99.99 (four nines) indicates the availability objective, i.e., the level that downtime should not exceed. At 99.99% availability, there could be about 9 seconds of downtime in a 24-hour day, or about 52 minutes and 34 seconds per year. Note that numbers can vary depending on whether you are using 30 days for a month vs. 365/12 days, or 52 weeks vs. 365/7 for weeks, along with rounding and number of decimal places, as shown in Table 9.1.

Uptime (%)	24-hour Day	Week	Month	Year
99	0 h 14 m 24 s	1 h 40 m 48 s	7 h 18 m 17 s	3 d 15 h 36 m 15 s
99.9	0 h 01 m 27 s	0 h 10 m 05 s	0 h 43 m 26 s	0 d 08 h 45 m 36 s
99.99	0 h 00 m 09 s	0 h 01 m 01 s	0 h 04 m 12 s	0 d 00 h 52 m 34 s
99.999	0 h 00 m 01 s	0 h 00 m 07 s	0 h 00 m 36 s	0 d 00 h 05 m 15 s

Table 9.1 Number of 9’s Availability Shown as Downtime per Time Interval
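
The arithmetic behind a number-of-nines table is simply the unavailable fraction multiplied by the length of the period. The Python sketch below assumes a 7-day week, a 30-day month and a 365-day year, and rounds to whole seconds; as noted above, different period lengths and rounding choices produce slightly different values, so the output will not match every cell of Table 9.1. The function names are hypothetical.

```python
PERIOD_SECONDS = {
    "day": 24 * 3600,
    "week": 7 * 24 * 3600,
    "month": 30 * 24 * 3600,     # assumption: 30-day month
    "year": 365 * 24 * 3600,     # assumption: 365-day year
}


def downtime_seconds(availability_pct, period_seconds):
    """Maximum downtime allowed in a period at the given availability."""
    return (1 - availability_pct / 100) * period_seconds


def format_duration(seconds):
    """Format seconds as d/h/m/s, rounded to the nearest whole second."""
    seconds = round(seconds)
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days} d {hours:02d} h {minutes:02d} m {secs:02d} s"


for pct in (99, 99.9, 99.99, 99.999):
    row = [format_duration(downtime_seconds(pct, s)) for s in PERIOD_SECONDS.values()]
    print(pct, " | ".join(row))
```

Each additional nine cuts the allowed downtime by a factor of ten, which is why the step from 99.9 to 99.99 moves the yearly budget from hours to minutes.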

Service Level Objectives (SLOs) are metrics and key performance indicators (KPI) that guide meeting performance, availability, capacity, and economic targets. Examples include some number of 9’s availability or durability, a specific number of transactions per second, or recovery and restart of applications.

Service Level Agreement (SLA) specifies various service level objectives, such as PACE requirements including RTO and RPO among others, that define the expected level of service and any remediation for loss of service. An SLA can also specify availability objectives as well as penalties or remuneration should an SLO be missed.

Recovery Time Objective (RTO) is how much time is allowed before applications, data, or data infrastructure components need to be accessible, consistent, and usable. An RTO = 0 (zero) means no loss of access or service disruption, i.e., continuous availability. One example is an application end-to-end RTO of 4 hours, meaning that all components (application server, databases, file systems, settings, associated storage, networks) must be restored, rolled back, and restarted for use in 4 hours or less.

Another RTO example is at the component level for different data infrastructure layers, as well as cumulative or end to end. In this scenario, the 4 hours includes time to recover, restart, and rebuild a server, application software, storage devices, databases, networks, and other items. In other words, there are not 4 hours available to restore the database, or 4 hours to restore the storage, as time is also needed for all pieces to be verified along with their dependencies.
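To make the cumulative point concrete, the following is a minimal sketch that adds up per-component recovery estimates and compares them to the end-to-end RTO. The component names and durations are hypothetical, and the sketch assumes the steps run one after another (serially) rather than in parallel.

```python
from datetime import timedelta


def fits_rto(component_estimates: dict[str, timedelta], rto: timedelta) -> tuple[bool, timedelta]:
    """Sum serial recovery estimates; return (fits, remaining slack against the RTO)."""
    total = sum(component_estimates.values(), timedelta())
    return total <= rto, rto - total


# Hypothetical estimates for one application, recovered serially.
estimates = {
    "server rebuild": timedelta(minutes=45),
    "storage restore": timedelta(minutes=60),
    "database restore": timedelta(minutes=75),
    "network and settings": timedelta(minutes=15),
    "verification of dependencies": timedelta(minutes=30),
}

ok, slack = fits_rto(estimates, rto=timedelta(hours=4))
print("meets RTO:", ok, "slack:", slack)
```

If any one component were budgeted the full 4 hours, the total would exceed the end-to-end objective, which is the point of the scenario above.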

Data Loss Access (DLA) occurs when data still exists, is consistent, durable, and safe, but cannot be accessed due to a network, application, or other problem. Note that the inverse is data that can be accessed, but is damaged.

Data Loss Event (DLE) is an incident that results in loss of or damage to data. Note that some context is needed in a scenario in which data is stolen via a copy but the data still exists, vs. the actual data being taken and now missing (no copies exist). Also note that there can be different granularity as well as scope of DLE, for example, all data or just some data lost (or damaged).

Data Loss Prevention (DLP) encompasses the activities, techniques, technologies, tools, best practices, and tradecraft skills used to protect data from DLE or DLA.

Point in Time (PiT) such as PiT copy or data protection refers to a recovery or consistency point where data can be restored from or to (i.e., RPO), such as from a copy, snapshot, backup, sync, or clone. Essentially, as its name implies, it is the state of the data at that particular point in time.

Recovery Point Objective (RPO) is the point in time to which data needs to be recoverable (i.e., when it was last protected). Another way of looking at RPO is how much data you can afford to lose, with RPO = 0 (zero) meaning no data loss, or, for example, RPO = 5 minutes being up to 5 minutes of lost data.

SDDC SDDI RTO RPO
Figure 9.8 Recovery Points (point in time to recover from), and Recovery Time (how long recovery takes)

Frequency refers to how often and on what time interval protection is performed.
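Frequency and RPO are closely related: the data at risk is whatever changed since the last protection point. Here is a minimal sketch of that relationship; the schedule, failure time, and function names are hypothetical and only for illustration.

```python
from datetime import datetime, timedelta


def data_loss_window(protection_points, failure_time):
    """Time between the most recent protection point at or before the failure and the failure."""
    earlier = [p for p in protection_points if p <= failure_time]
    if not earlier:
        raise ValueError("no protection point exists before the failure")
    return failure_time - max(earlier)


def meets_rpo(protection_points, failure_time, rpo):
    """True if the exposure at failure time is within the RPO."""
    return data_loss_window(protection_points, failure_time) <= rpo


# Hypothetical schedule: protection every 15 minutes from 09:00 to 10:00.
start = datetime(2017, 11, 26, 9, 0)
points = [start + timedelta(minutes=15 * i) for i in range(5)]
failure = datetime(2017, 11, 26, 10, 12)

print(data_loss_window(points, failure))                  # 0:12:00
print(meets_rpo(points, failure, timedelta(minutes=5)))   # False
print(meets_rpo(points, failure, timedelta(minutes=15)))  # True
```

In this sketch a 15-minute protection frequency cannot meet a 5-minute RPO, since up to 15 minutes of changes can exist between protection points.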

4 3 2 1 and 3 2 1 data protection rule
Figure 9.4 Data Protection 4 3 2 1 and 3 2 1 rule

In the context of the 4 3 2 1 rule, enabling RPO is associated with durability, meaning number of copies and versions. Simply having more copies is not sufficient because if they are all corrupted, damaged, infected, or contain deleted data, or data with latent nefarious bugs or root kits, then they could all be bad. The solution is to have multiple versions and copies of the versions in different locations to provide data protection to a given point in time.

Timeline and delta or recovery points are when data can be recovered from to move forward. They are consistent points in the context of what is/was protected. Figure 10.1 shows on the left vertical axis different granularity, along with protection and consistency points that occur over time (horizontal axis). For example, data “Hello” is written to storage (A) and then (B), an update is made “Oh Hello,” followed by (C) full backup, clone, and master snapshot or a gold copy is made.

SDDC SDDI Data Protection Recovery consistency points
Figure 10.1 Recovery and consistency points

Next, data is changed (D) to “Oh, Hello,” followed by, at time-1 (E), an incremental backup, copy, snapshot. At (F) a full copy, the master snapshot, is made, which now includes (H) “Hello” and “Oh, Hello.” Note that the previous full contained “Hello” and “Oh Hello,” while the new full (H) contains “Hello” and “Oh, Hello.” Next (G) data is changed to “Oh, Hello there,” then changed (I) to “Oh, Hello there I’m here.” Next (J) another incremental snapshot or copy is made, data is changed (K) to “Oh, Hello there I’m over here,” followed by another incremental (L), and another incremental (M) made a short time later.

At (N) there is a problem with the file, object, or stored item requiring a restore, rollback, or recovery from a previous point in time. Since the incremental (M) was too close to the recovery point (RP) or consistency point (CP), and perhaps damaged or its consistency questionable, it is decided to go to (O), the previous snapshot, copy, or backup. Alternatively, if needed, one can go back to (P) or (Q).
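The decision to skip a recovery point that is too close to the problem can be sketched as follows. This is an illustration only: the labels loosely follow the lettering above, while the timestamps, the 15-minute safety margin, and the function name are my assumptions rather than anything taken from Figure 10.1.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class RecoveryPoint:
    label: str
    taken_at: datetime
    kind: str  # "full" or "incremental"


def choose_recovery_point(points, problem_at, safety_margin) -> Optional[RecoveryPoint]:
    """Pick the newest recovery point taken at least safety_margin before the problem."""
    cutoff = problem_at - safety_margin
    candidates = [p for p in points if p.taken_at <= cutoff]
    return max(candidates, key=lambda p: p.taken_at, default=None)


# Hypothetical timeline on a single day.
day = datetime(2017, 11, 26)
points = [
    RecoveryPoint("F", day.replace(hour=8), "full"),
    RecoveryPoint("J", day.replace(hour=10), "incremental"),
    RecoveryPoint("L", day.replace(hour=11), "incremental"),
    RecoveryPoint("M", day.replace(hour=11, minute=55), "incremental"),
]
problem = day.replace(hour=12)

chosen = choose_recovery_point(points, problem, safety_margin=timedelta(minutes=15))
print(chosen.label)  # L, because M falls inside the safety margin
```

Widening the margin walks the choice further back in time, which is the same idea as going back to an earlier point if the more recent ones are questionable.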

Note that simply having multiple copies and different versions is not enough for resiliency; some of those copies and versions need to be dispersed or placed in different systems or locations away from the source. How many copies, versions, systems, and locations are needed for your applications will depend on the applicable threat risks along with associated business impact.

The solution is to combine techniques for enabling copies with versions and point-in-time protection intervals. PiT intervals enable recovering or accessing data back in time, which is an RPO. That RPO can be an application, transactional, system, or other consistency point, or some other time interval. Some context here is that there are gaps in protection coverage, meaning something was not protected.

A good data protection gap is a time interval enabling RPO, or simply a physical and logical break and distance between the active or protection copy and alternate versions and copies. In contrast, a gap in coverage (i.e., a bad data protection gap) means something was not protected.

A protection air or distance gap is having one of those versions and copies on another system, in a different location and not directly accessible. In other words, if you delete, or data gets damaged locally, the protection copies are safe. Furthermore, if the local protection copies are also damaged, an air or distance gap means that the remote or alternate copies, which may be on-line or off-line, are also safe.

Good Data Protection Gaps
Figure 9.9 Air Gaps and Data Protection

Figure 10.2 shows on the left various data infrastructure layers moving from low altitude (lower in the stack) host servers or bare metal (BM) physical machines (PM) up to higher levels with applications. At each layer or altitude, there are different hardware and software components to protect, with various policy attributes. These attributes, besides PACE, FTT, RTO, RPO, and SLOs, include granularity (full or incremental), consistency points, coverage, frequency (when protected), and retention.

SDDC SDDI Data Protection Granularity
Figure 10.2 Protecting data infrastructure granularity and enabling resiliency at various stack layers (or altitude)

Also shown in the top left of Figure 10.2 are protections for various data infrastructure management tools and resources, including Active Directory (AD), Azure AD (AAD), domain controllers (DC), group policy objects (GPO) and organizational units (OU), network DNS, routing and firewall, among others. Also included are protecting management systems such as VMware vCenter and related servers, Microsoft System Center, OpenStack, as well as data protection tools along with their associated configurations, metadata, and catalogs.

The center of Figure 10.2 lists various items that get protected along with associated technologies, techniques, and tools. On the right-hand side of Figure 10.2 is an example of how different layers get protected at various times, granularity, and what is protected.

For example, the PM or host server BIOS and UEFI as well as other related settings seldom change, so they do not have to be protected as often. Also shown on the right of Figure 10.2 are what can be a series of full and incremental backups, as well as differential or synthetic ones.

Figure 10.3 is a variation of Figure 10.2 showing on the left different frequencies and intervals, with a granularity of focus or scope of coverage on the right. The middle shows how different layers or applications and data focus have various protection intervals, type of protection (full, incremental, snap, differentials), along with retention, as well as some copies to keep.

SDDC SDDI Data Protection Granularity
Figure 10.3 Protecting different focus areas with various granularities

Protection in Figures 10.2 and 10.3 for the PM could be as simple as documentation of what settings to configure, versions, and other related information. A hypervisor may have changes, such as patches, upgrades, or new drivers, more frequently than a PM. How you go about protecting may involve reinstalling from your standard or custom distribution software, then applying patches, drivers, and settings.

You might also have a master copy of a hypervisor on a USB thumb drive or another storage device that can be cloned, then customized with the server name, IP address, log location, and other information. Some backup and data protection tools also provide protection of hypervisors (or containers and cloud machine instances) in addition to the virtual machine (VM), guest operating systems, applications, and data.

The point is that as you go up the stack, higher in altitude (layers), the granularity and frequency of protection increases. What this means is that you may have more frequent smaller protection copies and consistency points higher up at the application layer, while lower down, less frequent, yet larger full image, volume, or VM protection, combining different tools, technology, and techniques.

Where To Learn More

Part 1 – Data Infrastructure Data Protection Fundamentals
Part 2 – Reliability, Availability, Serviceability (RAS) Data Protection Fundamentals
Part 3 – Data Protection Access Availability RAID Erasure Codes (EC) including LRC
Part 4 – Data Protection Recovery Points (Archive, Backup, Snapshots, Versions)
Part 5 – Point In Time Data Protection Granularity Points of Interest
Part 6 – Data Protection Security Logical Physical Software Defined
Part 7 – Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends
Part 8 – Data Protection Diaries Walking Data Protection Talk
Part 9 – Who’s Doing What (Toolbox Technology Tools)
Part 10 – Data Protection Resources Where to Learn More
Data Protection Diaries series
Data Infrastructure server storage I/O network Recommended Reading List Book Shelf
Software Defined Data Infrastructure Essentials (CRC 2017) Book

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

What This All Means

Everything is not the same across different environments, data centers, data infrastructures, applications and their workloads (along with data, and its value). Likewise there are different approaches for enabling data protection to meet various SLO needs including RTO, RPO, RAS, FTT and PACE attributes among others. What this means is that complete data protection requires using different new (and old) tools, technologies, trends, services (e.g. cloud) in new ways. This also means leveraging existing and new techniques, learning from lessons of the past to prevent making the same errors.

RAID (mirror, replicate, parity including erasure codes), regardless of where and how it is implemented (hardware, software, legacy, virtual, cloud), is not by itself a replacement for backup; it needs to be combined with recovery point protection of some type (backup, checkpoint, consistency point, snapshots). Also, protection should occur at multiple levels of granularity (device, system, application, database, table) to meet various SLO requirements, as well as at different time intervals, enabling 4 3 2 1 data protection.

Keep in mind what it is that you are protecting, why you are protecting it and against what, what is likely to happen, what the impact will be if something does happen, and what your SLO requirements are, while minimizing impact during normal operations as well as during failure scenarios. For example, do you need to have a full system backup to support recovery of an individual database table, or can that table be protected and recovered via checkpoints, snapshots or other fine-grained routine protection? Everything is not the same, so why treat and protect everything the same way?

Get your copy of Software Defined Data Infrastructure Essentials here at Amazon.com, at CRC Press among other locations and learn more here. Meanwhile, continue reading with the next post in this series, Part 5 Point In Time Data Protection Granularity Points of Interest.

Ok, nuff said, for now.

November 26, 2017 | November 26, 2023

Data Protection Diaries Fundamental Point In Time Granularity Points of Interest

Data Protection Diaries Fundamental Point In Time Granularity

Companion to Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft (CRC Press 2017)

server storage I/O data infrastructure trends

By Greg Schulz – www.storageioblog.com November 26, 2017

This is Part 5 of a multi-part series on Data Protection fundamental tools topics techniques terms technologies trends tradecraft tips as a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).

Click here to view the previous post Part 4 Data Protection Recovery Points (Archive, Backup, Snapshots, Versions), and click here to view the next post Part 6 Data Protection Security Logical Physical Software Defined.

In this post the focus is on data protection points of granularity, addressing different layers and stack altitude (higher application and lower system level), from Chapter 10 among others.

Point-in-Time Protection Granularity Points of Interest

SDDC SDDI Data Protection Recovery consistency points
Figure 10.1 Recovery and consistency points

Figure 10.1 above is a refresh from previous posts about the role and importance of having various recovery points at different time intervals to enable data protection (and restoration). Building upon Figure 10.1, Figure 10.5 looks at different granularity of where and how data should be protected. Keep in mind that everything is not the same, so why treat everything the same with the same type of protection?

Figure 10.5 shows backup and Data Protection focus, granularity, and coverage. For example, at the top left is less frequent protection of the operating system, hypervisors, and BIOS, UEFI settings. At the middle left is volume, or device level protection (full, incremental, differential), along with various views on the right ranging from protecting everything, to different granularity such as file system, database, database logs and journals, and operating system (OS) and application software, along with settings.

SDDC SDDI Different Protection Granularity
Figure 10.5 Backup and data protection focus, granularity, and coverage

In Figure 10.5, note that the different recovery point focus and granularity also take into consideration application and data consistency (as well as checkpoints), along with different frequencies and coverage (e.g. full, partial, incremental, incremental forever, differential) as well as retention.
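To illustrate how coverage differs between full, incremental, and differential protection, here is a minimal sketch. It assumes the common definitions (an incremental covers changes since the last protection copy of any type, a differential covers changes since the last full), and the file names and timestamps are hypothetical.

```python
from datetime import datetime


def files_to_protect(modified_at, mode, last_full, last_backup):
    """Select items by coverage type based on when each item was last modified."""
    if mode == "full":
        return sorted(modified_at)  # everything, changed or not
    if mode == "differential":
        since = last_full           # changes since the last full
    elif mode == "incremental":
        since = last_backup         # changes since the last backup of any type
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(name for name, ts in modified_at.items() if ts > since)


# Hypothetical week: full on Sunday, most recent incremental on Tuesday night.
last_full = datetime(2017, 11, 26, 0, 0)
last_backup = datetime(2017, 11, 28, 23, 0)
modified_at = {
    "settings.cfg": datetime(2017, 11, 25, 9, 0),   # unchanged since before the full
    "payroll.db": datetime(2017, 11, 27, 10, 0),    # changed Monday
    "journal.log": datetime(2017, 11, 29, 10, 0),   # changed Wednesday
}

for mode in ("full", "differential", "incremental"):
    print(mode, files_to_protect(modified_at, mode, last_full, last_backup))
```

The same selection logic applies whether the items are files, database tables, or volumes; what changes is the granularity of the thing being tracked.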

Tip – Some context is needed about object backup and backing up objects, which can mean different things. As mentioned elsewhere, objects refer to many different things, including cloud and object storage buckets, containers, blobs, and objects accessed via S3 or Swift, among other APIs. There are also database objects and entities, which are different from cloud or object storage objects.

Another context factor is that an object backup can refer to protecting different systems, servers, storage devices, volumes, and entities that collectively comprise an application such as accounting, payroll, or engineering, vs. focusing on the individual components. An object backup may, in fact, be a collection of individual backups, PIT copies, and snapshots that combined represent what’s needed to restore an application or system.

On the other hand, the content of a cloud or object storage repository (buckets, containers, blobs, objects, and metadata) can be backed up, as well as serve as a destination target for protection.

Backups can be cold and off-line like archives, as well as on-line and accessible. However, the difference between the two, besides intended use and scope, is granularity. Archives are intended to be coarser and less frequently accessed, while backups can be accessed more frequently and at finer granularity. Can you use a backup for an archive and vice versa? A qualified yes, as an archive could be a master gold copy such as an annual protection copy, in addition to functioning in its role as a compliance and retention copy. Likewise, a full backup set to long-term retention can provide and enable some archive functions.

What This All Means

A common theme in this series as well as in my books, webinars, seminars and general approach to data infrastructures, data centers and IT in general is that everything is not the same, so why treat it all the same? What this means is that there are differences across various environments, data centers, data infrastructures, applications, workloads and data. There are also different threat risk scenarios (e.g. threat vectors and attack surface if you like vendor industry talk) to protect against.

Rethinking and modernizing data protection means using new (and old) tools in new ways, stepping back and rethinking what to protect, when, where, why, how, with what. This also means protecting in different ways at various granularity, time intervals, as well as multiple layers or altitude (higher up the application stack, or lower level).

Get your copy of Software Defined Data Infrastructure Essentials here at Amazon.com, at CRC Press among other locations and learn more here. Meanwhile, continue reading with the next post in this series, Part 6 Data Protection Security Logical Physical Software Defined.

Ok, nuff said, for now.

November 26, 2017 | November 3, 2024

Data Infrastructure Data Protection Diaries Fundamental Security Logical Physical

Data Infrastructure Data Protection Security Logical Physical

Companion to Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft (CRC Press 2017)

server storage I/O data infrastructure trends

By Greg Schulz – www.storageioblog.com November 26, 2017

This is Part 6 of a multi-part series on Data Protection fundamental tools topics techniques terms technologies trends tradecraft tips as a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).

Click here to view the previous post Part 5 – Point In Time Data Protection Granularity Points of Interest, and click here to view the next post Part 7 – Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends.

In this post the focus is on data infrastructure and data protection security, including logical as well as physical, from Chapters 10, 13 and 14 among others.

Figure 1.5 Data Infrastructures and other IT Infrastructure Layers

There are many different aspects of security pertaining to data infrastructures that span various technology domains or focus areas: from higher level application software to lower level hardware, from legacy to cloud and software-defined, from servers to storage and I/O networking, logical and physical, and from access control to intrusion detection, monitoring, analytics, audit, telemetry logs, encryption, and digital forensics among many others. Security should not be an afterthought or something done independently of other data infrastructure, data center and IT functions; rather, it should be integrated.

Security Logical Physical Software Defined

Physical security includes locked doors of facilities, rooms, cabinets or devices to prevent unauthorized access. In addition to locked doors, physical security also includes safeguards to prevent accidental or intentional acts that would compromise the contents of a data center, including data infrastructure resources (servers, storage, I/O networks, hardware, software, services) along with the applications that they support.

Logical security includes access controls, passwords, event and access logs, and encryption among other technologies, tools and techniques. Figure 10.11 shows various data infrastructure security–related items from cloud to virtual, hardware and software, as well as network services. Also shown are mobile and edge devices as well as network connectivity between on-premises and remote cloud services. Cloud services include public, private, as well as hybrid and virtual private clouds (VPC) along with virtual private networks (VPN). Access logs for telemetry are also used to track who has accessed what and when, including successful as well as failed attempts.
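As a simple illustration of using access logs to track who accessed what and when, including failed attempts, here is a minimal sketch. The event fields, user names, resources, and the threshold of three failures are hypothetical and not tied to any particular product's log format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessEvent:
    when: datetime
    who: str
    what: str
    success: bool


def failed_attempts_by_user(events):
    """Count failed access attempts per user."""
    return Counter(e.who for e in events if not e.success)


def flag_users(events, threshold=3):
    """Return users whose failed attempts meet or exceed the threshold."""
    return sorted(user for user, n in failed_attempts_by_user(events).items() if n >= threshold)


# Hypothetical log entries.
t = datetime(2017, 11, 26, 9, 0)
events = [
    AccessEvent(t, "alice", "backup-catalog", True),
    AccessEvent(t, "mallory", "backup-catalog", False),
    AccessEvent(t, "mallory", "key-vault", False),
    AccessEvent(t, "mallory", "key-vault", False),
    AccessEvent(t, "bob", "vm-console", False),
]

print(flag_users(events))  # ['mallory']
```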

Other items include certificates (public or private), encryption, and access keys including .pem and RSA files obtained via a service provider or self-generated with a tool such as PuTTY or ssh-keygen among many others. Some additional terms include Two Factor Authentication (2FA), subordinated, role based and delegated management, Single Sign On (SSO), Shared Access Signature (SAS), which is used by Microsoft Azure for access control, and Server Side Encryption (SSE) with various Key Management System (KMS) attributes including customer managed or via a third party.

SDDC SDDI Data Protection Security
Figure 10.11 Various physical and logical security and access controls

Also shown in Figure 10.11 is encryption enabled at various layers, levels or altitudes, which can range from simple to complex. Also shown are iSCSI IPsec and CHAP along with firewalls, Active Directory (AD) along with Azure AD (AAD), Domain Controllers (DC), Group Policy Objects (GPO) and Roles. Note that firewalls can exist in various locations, both in hardware appliances in the network and in software defined networks (SDN) and network function virtualization (NFV), as well as higher up.

For example, there are firewalls in network routers and appliances, as well as within operating systems, hypervisors, and further up in web and blog platforms such as WordPress among many others. Likewise, further up the stack or higher in altitude, access to applications as well as databases among other resources is also controlled via their own authentication, rights and access control, or in conjunction with others including AD.

A term that might be new for some is attestation, which basically means to authenticate and be validated by a server or service; for example, a guarded host attests with an attestation server. What this means is that the guarded host (for example, a Microsoft Windows Server) attests with a known attestation server, which looks at the Windows server and compares it to known good fingerprints and profiles, making sure it is safe to run as a guarded resource.
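The basic idea of comparing a host against known good fingerprints can be sketched as below. To be clear, this is a greatly simplified illustration of the concept using SHA-256 digests, not how Microsoft's attestation service is actually implemented; the measurement values and function names are hypothetical.

```python
import hashlib
import hmac


def fingerprint(measurements: list[bytes]) -> str:
    """Combine a host's measurements (e.g. boot components) into one SHA-256 fingerprint."""
    digest = hashlib.sha256()
    for m in measurements:
        # Hash each item first so that item boundaries are unambiguous.
        digest.update(hashlib.sha256(m).digest())
    return digest.hexdigest()


def attest(host_measurements: list[bytes], known_good: set[str]) -> bool:
    """Return True only if the host's fingerprint matches a known good one."""
    candidate = fingerprint(host_measurements)
    return any(hmac.compare_digest(candidate, good) for good in known_good)


# Hypothetical known good profile registered with the attestation service.
known_good = {fingerprint([b"bootloader-v1", b"kernel-v5", b"policy-2017-11"])}

print(attest([b"bootloader-v1", b"kernel-v5", b"policy-2017-11"], known_good))   # True
print(attest([b"bootloader-v1", b"kernel-TAMPERED", b"policy-2017-11"], known_good))  # False
```

A host that fails the comparison would not be trusted to run guarded resources, which is the gatekeeping role described above.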

Other security concerns for legacy and software defined environments include secure boot, shielded VMs, guarded hosts and fabrics (networks or clusters of servers) for on-premises as well as cloud. The following image via Microsoft shows an example of shielded VMs in a Windows Server 2016 environment along with Host Guardian Service (HGS) components (see how to deploy here).

Via Microsoft.com Guarded Hosts, Shielded VMs and Key Protection Services

Encryption can be done in different locations, ranging from data in flight or transit over networks (local and remote), to data at rest or while stored. Strength of encryption is determined by the cipher algorithm and key length used, with hash algorithms such as SHA among others used for integrity and verification, ranging from simple to more complex. The encryption can be done by networks, servers, storage systems, hypervisors, operating systems, databases, email, word and many other tools at granularity from device, file system, folder, file, database, table, object or blob.

Virtual machines and their virtual disks (VHDX and VMDK) can be encrypted, as well as migration or movements such as vMotions among other activities. Here are some VMware vSphere encryption topics, along with deep dive previews from VMworld 2016 among other resources here, VMware hardening guides here (NSX, vSphere), and a VMware security white paper (PDF) here.

Other security-related items shown in Figure 10.11 include Lightweight Directory Access Protocol (LDAP), Remote Authentication Dial-In User Service (RADIUS), and Kerberos network authentication. Also shown are VPN along with Secure Sockets Layer (SSL) network security, along with security and authentication keys, and credentials for SSH remote access including SSO. The cloud shown in Figure 10.11 could be your own private cloud using Azure Stack, VMware (on-site, or public cloud such as IBM or AWS), OpenStack among others, or a public cloud such as AWS, Azure or Google (among others).

What This All Means

There are many different aspects, as well as layers, of security from logical to physical pertaining to data centers, applications and associated data infrastructure resources, both on-premises and cloud. Security for legacy and software defined environments needs to be integrated as part of various technology domain focus areas, as well as across them, including data protection. The above is a small sampling of security related topics, with more covered in various chapters of SDDI Essentials as well as in my other books, webinars, presentations and content.

From a data protection focus, security needs to be addressed from a physical perspective (who has access to primary and protection copies, what is being protected against and where) as well as a logical one (who can access protection copies, along with the configuration, settings, and certificates involved in data protection). In other words, how are you protecting your data protection environment, configuration and deployment? Data protection copies need to be encrypted to meet regulations, compliance and other requirements to guard against loss or theft, accidental or intentional. Likewise, access control needs to be managed, including granting of roles, security, authentication, monitoring of access, along with revocation.

Get your copy of Software Defined Data Infrastructure Essentials here at Amazon.com, at CRC Press among other locations and learn more here. Meanwhile, continue reading with the next post in this series, Part 7 Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends

Ok, nuff said, for now.

November 26, 2017 | October 18, 2024

Data Protection Diaries Tools Technologies Toolbox Buzzword Bingo Trends

Fundamental Tools, Technologies, Toolbox, Buzzword Bingo Trends

Companion to Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft (CRC Press 2017)

This is Part 7 of a multi-part series on Data Protection fundamental tools topics techniques terms technologies trends tradecraft tips as a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).

Click here to view the previous post Part 6 Data Protection Security Logical Physical Software Defined, and click here to view the next post Part 8 Walking The Data Protection Talk What I Do.

In this post the focus is around Data Protection related tools, technologies, trends as companion to other posts in this series, as well as across various chapters from the SDDI book.

Figure 1.5 Data Infrastructures and other IT Infrastructure Layers

Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends

There are many data infrastructure related topics, technologies, tools, trends, techniques and tips that pertain to data protection, many of which have been covered in this series of posts already, as well as in the SDDI Essentials book, and elsewhere. The following are some additional related data infrastructure data protection topics, tools, and technologies.

Buzzword Bingo is a popular industry activity involving terms, trends, tools and more, read more here, here, and here. The basic idea of buzzword bingo is when somebody starts mentioning lots of buzzwords, buzz terms, buzz trends at some point just say bingo. Sometimes you will get somebody who asks what that means, while others will know, perhaps get the point to move on to what’s relevant vs. talking the talk or showing how current they are on industry activity, trends and terms.

Just as everything is not the same across different environments, there are various sizes and focus areas, from hyper-scale clouds and managed service providers (MSP) with a server (and storage along with applications) focus, to smaller and regional cloud, hosting and MSPs, as well as large enterprise, small medium enterprise (SME), small medium business (SMB), remote office branch office (ROBO), small office home office (SOHO), prosumer, consumer and client or edge. Sometimes you will hear server vs. edge or client focus, thus context is important.

Data protection, just like data infrastructures, spans servers, storage, I/O networks, hardware, software, clouds, containers, virtual, hypervisors and related topics. On the other hand, some might view data protection as unique to a particular technology focus area or domain. For example, I once had a backup vendor tell me that backups and data protection were not a storage topic; can you guess which vendor did not get recommended for data protection of data stored on storage?

Data gets protected to different target media, mediums or services including HDDs, SSDs, tape, cloud, bulk and object storage among others, in various formats from native to encapsulated in save sets, zips, and tarballs among others.
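As a small example of the encapsulated format, the following sketch bundles a set of files into a compressed tarball using Python's standard tarfile module and then lists what the archive contains. The function name create_save_set and any paths you pass in are hypothetical; real backup tools add catalogs, metadata, and verification on top of this.

```python
import tarfile
from pathlib import Path


def create_save_set(sources: list[Path], archive: Path) -> list[str]:
    """Bundle the source files into a gzip-compressed tarball and return its member names."""
    with tarfile.open(str(archive), "w:gz") as tar:
        for src in sources:
            # Store each file under its base name rather than its full path.
            tar.add(str(src), arcname=src.name)
    # Re-open the archive to confirm what was actually written.
    with tarfile.open(str(archive), "r:gz") as tar:
        return tar.getnames()
```

For example, calling create_save_set with two files and a target such as a path ending in .tar.gz returns the two file names as stored in the archive, which can then be copied to disk, tape, or cloud storage.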

Bulk storage can be on-site, on-premises low-cost tape, disk (file, block or object) as well as off-site including cloud services such as AWS S3 (buckets and objects), Microsoft Azure (containers and blobs), Google among others using various access (protocols, personalities, front-end, back-end) technologies. Which type of data protection storage medium, location or service is best depends on what you are trying to do, along with other requirements.

SDDC SDDI data center data protection toolbox
Data Protection Toolbox

Figure 3.18 Generic Object (and Blob) architecture with Buckets (and Containers)

Object Storage

Before discussing Object Storage, let’s take a step back and look at some context that can clarify confusion around the term object. The word object has many different meanings and contexts, both inside the IT world and outside of it. As a noun, an object is a thing that can be seen or touched, as well as a person or thing toward which an action or feeling is directed.

Besides a person, place or physical thing, an object can be a software defined data structure that describes something. For example, a database record describing somebody’s contact or banking information, or a file descriptor with name, index ID, date and time stamps, permissions and access control lists along with other attributes or metadata. Another example is an object or blob stored in a cloud or object storage system repository, as well as an item in a hypervisor, operating system, container image or other application.

Besides being a noun, object can also be a verb, meaning to express disapproval of or disagreement with something or someone. From an IT context perspective, object can also refer to a programming method (e.g. object oriented programming [OOP], or Java [among other environments] objects and classes) and systems development in addition to describing entities with data structures.

In other words, a data structure describes an object that can be a simple variable, constant, complex descriptor of something being processed by a program, as well as a function or unit of work. There are also objects unique or with context to specific environments besides Java or databases, operating systems, hypervisors, file systems, cloud and other things.
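As a rough illustration of an object as a software defined data structure, here is a minimal Python sketch of a file descriptor along the lines of the one mentioned above. The class name, field names and sample values are assumptions made up for this example, not taken from any particular file system or product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class FileDescriptor:
    """Hypothetical descriptor; the fields are illustrative only."""
    name: str
    index_id: int
    modified: datetime
    permissions: int  # POSIX-style mode bits, e.g. 0o640
    acl: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)


# The object describes the file; it is not the file's data itself.
note = FileDescriptor(
    name="notes.txt",
    index_id=42,
    modified=datetime(2017, 11, 26, tzinfo=timezone.utc),
    permissions=0o640,
    acl=["greg:rw", "backup:r"],
    metadata={"tier": "standard"},
)
print(note.name, oct(note.permissions), note.metadata)
```

The same idea carries over to an object or blob in a cloud repository, where the descriptor travels with the stored item as metadata.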

Figure 3.19 AWS S3 Object storage example, objects left and descriptive names on right

The role of object storage (view more at www.objectstoragecenter.com) is to provide low-cost, scalable capacity and durable availability of data, including data protection copies on-premises or off-site. Note that not all object storage solutions or services are the same: some are immutable with write once read many (WORM) like attributes, while others are mutable, meaning that they can be not only appended to but also updated at page or block level granularity.

Also keep in mind that some solutions and services refer to the items being stored as objects while others call them blobs, and to the name space those are part of as a bucket or a container. Note that context is important so as not to confuse an object container with a Docker, Kubernetes or microservices container.

Many applications and storage systems as well as appliances support cloud access as back-end targets using the AWS S3 API (of the AWS S3 service or other solutions), as well as the OpenStack Swift API among others. There are also many open source and third-party tools for working with cloud storage including objects and blobs. Learn more about object storage and cloud storage at www.objectstoragecenter.com as well as in chapters 3, 4, 13 and 14 of the SDDI Essentials book.

S3 Simple Storage Service

Simple Storage Service (S3) is the Amazon Web Services (AWS) cloud object storage service that can be used for bulk and other storage needs. The S3 service can be accessed from within AWS as well as externally via different tools. AWS S3 supports a large number of buckets and objects across different regions and availability zones. Objects can be stored with names that follow a hierarchical directory structure format for compatibility with existing file systems, or in a simple flat name space.
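To make the hierarchical vs. flat name space point concrete, the following is a minimal Python sketch that emulates a directory-style listing over a flat set of object keys by using a prefix and a delimiter. It runs locally against a plain list and does not call AWS; the function name `list_prefix` and the sample keys are hypothetical.

```python
def list_prefix(keys, prefix="", delimiter="/"):
    """Emulate a directory-style listing over a flat set of object keys.

    Returns (objects, common_prefixes) found directly under `prefix`.
    """
    objects, common = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something deeper exists; report only the next "folder" level.
            common.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common)


# Hypothetical keys: the store itself is flat, the "/" is just a character.
keys = [
    "backups/2017/11/db.dump",
    "backups/2017/11/site.tar.gz",
    "backups/2017/12/db.dump",
    "readme.txt",
]

print(list_prefix(keys))
print(list_prefix(keys, "backups/2017/"))
```

Listing with an empty prefix returns `readme.txt` as an object and `backups/` as the only common prefix, which is what a file-like view of the bucket would show at the top level.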

Context is important with data protection and S3 which can mean the access API, or AWS service. Likewise context is important in that some solutions, software and services support S3 API access as part of their front-end (e.g. how servers or clients access their service), as well as a back-end target (what they can store data on).

Data Infrastructure Environments and Applications

Data infrastructure environments that need to be protected include legacy, software defined (SDDC, SDDI, SDS), cloud, virtual and container based, as well as clustered, scale-out, converged infrastructure (CI) and hyper-converged infrastructure (HCI) among others. In addition to data protection related topics already covered in the posts in this series (as well as those to follow), a related topic is Data Footprint Reduction (DFR). DFR comprises several different technologies and techniques including archiving, compression, compaction, deduplication (dedupe), single instance storage, normalization, factoring, zip, tiering and thin provisioning among many others.

Data Footprint Reduction (DFR) Including Dedupe

There is a long-term relationship between data protection and DFR: to reduce the impact of storing more data, traditional techniques such as compression and compaction have been used, along with archive and more recently dedupe among others. In the Software Defined Data Infrastructure Essentials book there is an entire chapter on DFR (chapter 11), as well as related topics in chapters 8 and 13 among others. For those interested in DFR and related topics, there is additional material in my books Cloud and Virtual Data Storage Networking (CRC Press) and The Green and Virtual Data Center (CRC Press), as well as various posts on StorageIOblog.com and storageio.com. Figure 11.4 is from Software Defined Data Infrastructure Essentials and shows the big picture of various places where DFR can be implemented along with different technologies, tools and techniques.

Figure 11.4 Various points of interest where DFR techniques and technology can be applied

Just as everything is not the same, there are different DFR techniques along with implementations to address various application workload and data performance, availability, capacity, economics (PACE) needs. Where the best location for DFR is depends on your objectives as well as what your particular technology can support. In general, however, I recommend putting DFR as close to where the data is created and stored as possible to maximize its effectiveness, which can be on the host server. That, however, also means leveraging DFR techniques downstream where data gets sent to be stored or protected. In other words, a hybrid DFR approach as a companion to data protection should use various techniques and technologies in different locations. Granted, your preferred vendor might only work in a given location or functionality so you can pretty much guess what the recommendations will be ;) .
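For a feel of how two of the DFR techniques mentioned above differ, here is a small Python sketch that applies fixed-size chunk dedupe and then compression to the same data. It is a simplified illustration under stated assumptions (fixed 4 KB chunks, SHA-256 digests as chunk identifiers, zlib for compression), not how any particular product implements DFR; the function names are hypothetical.

```python
import hashlib
import zlib


def dedupe_chunks(data: bytes, chunk_size: int = 4096):
    """Fixed-size chunk dedupe: keep each unique chunk once, keyed by SHA-256."""
    store = {}    # digest -> chunk bytes (unique chunks only)
    recipe = []   # ordered digests needed to rebuild the original data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe) -> bytes:
    """Reassemble the original data from the chunk store and the recipe."""
    return b"".join(store[digest] for digest in recipe)


# Sample data: 10 chunks, of which only 2 are unique.
data = (b"A" * 4096) * 8 + (b"B" * 4096) * 2
store, recipe = dedupe_chunks(data)
stored_bytes = sum(len(chunk) for chunk in store.values())
compressed = zlib.compress(data)

print("original bytes:", len(data))
print("after dedupe:  ", stored_bytes, "in", len(store), "unique chunks")
print("after zlib:    ", len(compressed))
```

Dedupe removes repeated chunks across the data set while compression squeezes redundancy within the stream, which is one reason the two are often combined and applied at different locations.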

Tips, Recommendations and Considerations

Additional learning experiences along with common questions (and answers), appendices, as well as tips can be found here.

General action items, tips, considerations and recommendations include:

- Everything is not the same; different applications have different SLO, PACE, FTT, FTM needs.
- Understand the 4 3 2 1 data protection rule and how to implement it.
- Balance rebuild performance impact and time vs. storage space overhead savings.
- Use different approaches for various applications and environments.
- What is best for somebody else may not be best for you and your applications.
- You can’t go forward in the future after a disaster if you can’t go back.
- Data protection is a shared responsibility between vendors, service providers and yourself.
- There are various aspects to data protection and data infrastructure management.
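As a sketch of how the 4 3 2 1 rule in the list above could be checked against an inventory of protection copies, the Python below assumes one common reading of the rule: at least four versions, at least three copies, at least two different systems or media, and at least one copy off-site. That interpretation, along with the `Copy` fields and sample inventory, is an assumption for illustration; adjust the thresholds to whatever definition you use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    version: str    # label for the point in time the copy represents
    system: str     # device, medium or service holding the copy
    offsite: bool   # True if the copy lives away from the primary site


def check_4321(copies):
    """Check an ASSUMED reading of 4 3 2 1: >=4 versions, >=3 copies,
    >=2 distinct systems or media, and >=1 copy off-site."""
    return {
        "versions>=4": len({c.version for c in copies}) >= 4,
        "copies>=3": len(copies) >= 3,
        "systems>=2": len({c.system for c in copies}) >= 2,
        "offsite>=1": any(c.offsite for c in copies),
    }


# Hypothetical inventory for a single application or data set.
copies = [
    Copy("2017-11-23", "local-nas", False),
    Copy("2017-11-24", "local-nas", False),
    Copy("2017-11-25", "cloud-bucket", True),
    Copy("2017-11-26", "removable-disk", True),
]
report = check_4321(copies)
print(report)
```

Returning a per-criterion report rather than a single pass or fail makes it easier to see which part of the rule a given application is missing.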

Where To Learn More

- Part 1 – Data Infrastructure Data Protection Fundamentals
- Part 2 – Reliability, Availability, Serviceability (RAS) Data Protection Fundamentals
- Part 3 – Data Protection Access Availability RAID Erasure Codes (EC) including LRC
- Part 4 – Data Protection Recovery Points (Archive, Backup, Snapshots, Versions)
- Part 5 – Point In Time Data Protection Granularity Points of Interest
- Part 6 – Data Protection Security Logical Physical Software Defined
- Part 7 – Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends
- Part 8 – Data Protection Diaries Walking Data Protection Talk
- Part 9 – Who’s Doing What (Toolbox Technology Tools)
- Part 10 – Data Protection Resources Where to Learn More
- Data Protection Diaries series
- Data Infrastructure server storage I/O network Recommended Reading List Book Shelf
- Software Defined Data Infrastructure Essentials (CRC 2017) Book

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

What This All Means

There are many different buzzwords, buzz terms and buzz trends pertaining to data infrastructure and data protection. These technologies span legacy and emerging, software-defined, cloud, virtual, container, hardware and software. The key point is what technology is the best fit for your needs and applications, as well as how to use the tools in different ways (e.g. skill craft techniques and tradecraft). Keep context in mind when looking at and discussing different technologies such as objects among others.

Ok, nuff said, for now.

November 26, 2017 / November 26, 2023

Data Protection Diaries Fundamentals Walking The Data Protection Talk

Data Protection Diaries Walking The Data Protection Talk

Companion to Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft ( CRC Press 2017)

By Greg Schulz – www.storageioblog.com November 26, 2017

This is Part 8 of a multi-part series on Data Protection fundamental tools topics techniques terms technologies trends tradecraft tips as a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).

Click here to view the previous post Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends, and click here to view the next post Who’s Doing What (Toolbox Technology Tools).

In this post the focus is on what I (and Server StorageIO) do for data protection besides just talking the talk. It is a work in progress that is being updated over time with additional insights.

Walking The Data Protection Talk What I Do

A couple of years back I did the first post as part of the Data Protection Diaries series (view here), which included the following image showing some data protection needs and requirements, as well as what is being done, along with areas for improvement. Part of what I and Server StorageIO do involves consulting (strategy, design, assessment), advising and other influencer activities (e.g. blog, write articles, create reports, webinars, seminars, videos, podcasts) pertaining to data infrastructure topics as well as data protection.

What this means is knowing about the trends, tools, technologies, what’s old and new, who’s doing what, what should be in the data protection toolbox, as well as how to use those for different scenarios. It’s one thing to talk the talk, however I also prefer to walk the talk, including eating my own dog food by applying the various techniques, approaches, tools and technologies discussed.

The following are from a previous Data Protection Diaries post where I discuss my data protection needs (and wants), some of which have evolved since then. Note the image on the left is my Livescribe Echo digital pen and paper tablet. On the right is an example of the digital image created and imported into my computer from the Livescribe. In other words, I’m able to protect my handwritten notes, diagrams and figures.

Via my Livescribe Echo digital pen ( get your Livescribe here at Amazon.com)

My environment and data protection are always evolving, some parts based on changing projects, others that are more stable. Likewise the applications along with data are varied; after all, everything is not the same. My data protection includes snapshots, replication, mirror, sync, versions, backup, archive, RAID and erasure code among other technologies, tools and techniques.

Applications range from desktop, office, email, documents, spreadsheets, presentations, video, audio and related items in support of day-to-day activities. Then there are items that are part of various projects that range from physical to virtual, cloud and container leveraging various tools. This means having protection copies (sync, backup, snapshots, consistency points) of virtual machines, physical machine instances, applications and databases such as SQL Server among many others. Other application workloads include web, WordPress blog and email among others.

The Server StorageIO environment consists of a mix of legacy on-premises technologies from servers, storage, hardware, software, networks and tools, as well as software defined virtual (e.g. VMware, Hyper-V, Docker among others) and cloud. The StorageIO data infrastructure environment includes a dedicated private server (DPS) that I have had for several years now that supports this blog as well as other sites and activity. I also have a passive standby site used for testing of the WordPress based blog on an AWS Lightsail server. I use tools such as Updraft Plus Premium to routinely create a complete data protection view (database, plugins, templates, settings, configuration, core) of my WordPress site (which runs on the DPS) that is stored in various locations, including at AWS.

Some of my past data protection requirements (they have evolved)

Currently the Lightsail Virtual Private Server (VPS) is in passive mode, however plans are to enable it as a warm or active standby fail over site for some of the DPS functions. One of the tools I have for monitoring and insight, besides those in WordPress and the DPS, is AWS Route 53 alerts that I have set up to monitor endpoints. AWS Route 53 is a handy resource for monitoring your endpoints such as a website or blog among other things, and having it notify you or take action, including facilitating DNS fail over if needed. For now, besides using Route 53 as a secondary DNS, I’m simply using it as a notification tool.
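The notify vs. fail over choice described above can be sketched as a small piece of decision logic. The Python below is a local, hypothetical illustration only: it does not call Route 53 or any AWS API, and the function name, the threshold of three consecutive failures and the returned action labels are assumptions.

```python
def decide_action(checks, failure_threshold=3, failover_enabled=False):
    """Decide what to do from recent endpoint health checks.

    checks: list of booleans, oldest first (True = endpoint answered healthy).
    The endpoint counts as unhealthy only after `failure_threshold`
    consecutive failures, to avoid reacting to a single missed check.
    """
    recent = checks[-failure_threshold:]
    unhealthy = len(recent) == failure_threshold and not any(recent)
    if not unhealthy:
        return "ok"
    return "failover" if failover_enabled else "notify"


# Passive standby: only send a notification.
print(decide_action([True, False, False, False]))
# Warm or active standby: switch DNS to the standby site.
print(decide_action([True, False, False, False], failover_enabled=True))
```

Keeping the fail over switch as an explicit flag mirrors the staged approach in the text, where monitoring and notification come first and automated fail over is enabled later.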

Speaking of AWS, I have compute instances in Elastic Compute Cloud (EC2) along with associated Elastic Block Store (EBS) volumes as well as their snapshots. I also have AWS S3 buckets in different regions that are on various tiers from standard to infrequent access (IA), as well as some data on Glacier. Data from my DPS at Bluehost gets protected to an AWS S3 bucket that I can access from AWS EC2, as well as via other locations including Microsoft Azure as needed.

Some on-premises data also gets protected to AWS S3 (as well as to elsewhere) using various tools, for different granularity, frequency, access and retention. After all, everything is not the same, so why treat it the same? Some of the data protected to AWS S3 buckets is in native format (e.g. items appear as objects to S3 or object enabled applications, as well as files to file based applications with appropriate tools).

Other data that is also protected to AWS S3 from different data protection or backup tools is stored in vendor neutral or vendor specific save set, zip, tarball or other formats. In other words, I need the tool or a compatible tool that knows the format of the saved data to retrieve individual data files, items or objects. Note that this is similar to storing data on tape, HDDs, SSD or other media in native format vs. in some type of encapsulated save set or other format.
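To illustrate the difference between native and encapsulated formats, the following Python sketch packs two files into an in-memory gzip-compressed tar archive and then retrieves one of them. It uses only the standard library `tarfile` module; the helper names `make_save_set` and `restore_one` and the sample file names are hypothetical, and a tar archive stands in here for whatever save set format a given backup tool uses.

```python
import io
import tarfile


def make_save_set(files: dict[str, bytes]) -> bytes:
    """Pack files into an in-memory gzip-compressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def restore_one(save_set: bytes, name: str) -> bytes:
    """Retrieve a single member; this needs a reader that understands tar."""
    with tarfile.open(fileobj=io.BytesIO(save_set), mode="r:gz") as tar:
        member = tar.extractfile(name)
        if member is None:
            raise KeyError(name)
        return member.read()


save_set = make_save_set({"docs/a.txt": b"alpha", "docs/b.txt": b"bravo"})
print(len(save_set), "bytes stored as one opaque object")
print(restore_one(save_set, "docs/b.txt"))
```

To the object store the save set is a single object; getting `docs/b.txt` back means fetching that object and opening it with a format-aware tool, whereas a native format copy would expose each file as its own object.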

In addition to protecting data to AWS, I also have data at Microsoft Azure among other locations. Other locations include non-cloud based off-site, where encrypted removable media is periodically taken to a safe, secure place as a master (gold) copy in case of a major emergency or ransomware.

Why not just rely on cloud copies?

Simple, I can pull individual files or relatively small amounts of data back from the cloud sometimes faster (or easier) than from on-site copies, let alone my off-site, off-line, air gap copies. On the other hand, if I need to restore large amounts of data without a fast network, it can be quicker to get the air gap off-line, off-site copy, do the large restore, then apply incremental or changed data via cloud. In other words, a hybrid approach.
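The trade-off above comes down to simple arithmetic, sketched here in Python. All of the numbers are made-up assumptions for illustration (a 100 Mbps internet link, a 4 Gbps local link, 80 percent usable bandwidth, four hours to fetch the off-line media), and the function names are hypothetical.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours to move size_gb (decimal GB) over a link rated at link_mbps."""
    bits = size_gb * 8e9
    usable_bits_per_sec = link_mbps * 1e6 * efficiency
    return bits / usable_bits_per_sec / 3600


def faster_restore(size_gb, wan_mbps, offline_fetch_hours, local_mbps):
    """Compare a cloud restore with fetching an off-line copy and restoring locally."""
    cloud = transfer_hours(size_gb, wan_mbps)
    offline = offline_fetch_hours + transfer_hours(size_gb, local_mbps)
    return ("cloud", cloud) if cloud <= offline else ("offline", offline)


# Small restore: a 1 GB folder over the assumed 100 Mbps link.
print(faster_restore(1, 100, 4, 4000))
# Large restore: 2 TB (2000 GB) with the same links.
print(faster_restore(2000, 100, 4, 4000))
```

With these assumed numbers the small restore is quickest from the cloud, while the 2 TB restore takes more than two days over the internet link versus a few hours using the off-line copy, after which only the changed data needs to come from the cloud.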

Now a common question I get is why not just do one or the other and save some money. Good point, I would save some money; however, doing the above, among other things, is part of being able to test, try new and different things, and gain insight and experience, not to mention walk the talk vs. simply talking the talk.

Of course I’m always looking for ways to streamline and make my data protection more efficient as well as effective (along with removing complexity and costs).

- Everything is not the same, so why treat it all the same with common SLO, RTO, RPO and retention?
- Likewise, why treat and store all data the same way, on the same tiers of technology?
- Gain insight and awareness into environment, applications, workloads, PACE needs.
- Applications, data, systems or devices are protected with different granularity and frequency.
- Apply applicable technology and tools to the task at hand.
- Any data I have in the cloud has a copy elsewhere; likewise, any data on-premises has a copy in the cloud or elsewhere.
- I implement the 4 3 2 1 rule by having multiple copies, versions and data in different locations, on and off-line including cloud.
- From a security standpoint, many different things are implemented on a logical as well as physical basis, including encryption.
- Ability to restore data as well as applications or image instances locally as well as into cloud environments.
- Leverage different insight and awareness, reporting, analytics and monitoring tools.
- Mix of local storage configured with different RAID and other protection.
- Test, find, fix, remediate and improve the environment, including leveraging lessons learned.

What This All Means

Everything is not the same, that’s why in my environment I use different technologies, tools and techniques to protect my data. This also means having different RTO and RPO across various applications, data and systems as well as devices. Data that is more important has more copies and versions in different locations, and is protected more frequently, as part of 4 3 2 1 data protection. Other data that does not change as frequently, or is not as time sensitive, has alternate RTO and RPO along with a corresponding frequency of protection.
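One way to picture different RPOs for different data is a small policy table with a check against it. The Python below is a sketch under assumptions: the tier names, the RPO values, the copy counts and the sample systems are invented for illustration and are not my actual settings.

```python
from datetime import datetime, timedelta

# Hypothetical tiers and objectives, for illustration only.
POLICIES = {
    "critical": {"rpo": timedelta(hours=1), "copies": 4},
    "standard": {"rpo": timedelta(hours=24), "copies": 3},
    "archive": {"rpo": timedelta(days=7), "copies": 2},
}


def rpo_violations(last_protected, tiers, now):
    """Return names whose newest protection copy is older than the tier's RPO."""
    late = []
    for name, when in last_protected.items():
        if now - when > POLICIES[tiers[name]]["rpo"]:
            late.append(name)
    return sorted(late)


now = datetime(2017, 11, 26, 12, 0)
tiers = {"sql-db": "critical", "blog": "standard", "photos": "archive"}
last_protected = {
    "sql-db": now - timedelta(hours=2),   # too old for a 1 hour RPO
    "blog": now - timedelta(hours=6),     # within a 24 hour RPO
    "photos": now - timedelta(days=3),    # within a 7 day RPO
}
print(rpo_violations(last_protected, tiers, now))
```

The database copy is the most recent of the three yet it is the only one flagged, because its tier has the tightest RPO.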

Get your copy of Software Defined Data Infrastructure Essentials here at Amazon.com, at CRC Press among other locations and learn more here. Meanwhile, continue reading with the next post in this series, Part 9 Who’s Doing What (Toolbox Technology Tools).

Ok, nuff said, for now.

November 26, 2017 / November 26, 2023

Data Protection Diaries Fundamentals Who Is Doing What Toolbox Technology Tools

Data Protection Toolbox Who’s Doing What Technology Tools

Updated 1/17/2018

Data protection toolbox is a companion to Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft ( CRC Press 2017)

By Greg Schulz – www.storageioblog.com November 26, 2017

This is Part 9 of a multi-part series on Data Protection fundamental tools topics techniques terms technologies trends tradecraft tips as a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).

Click here to view the previous post Part 8 Walking The Data Protection Talk, and click here to view the next post Part 10 Data Protection Resources Where to Learn More.

In this post the focus is on Data Protection Who’s Doing What (Toolbox Technology Tools).

Figure 1.5 Data Infrastructures and other IT Infrastructure Layers

Who’s Doing What (Toolbox Technology Tools)

SDDC SDDI data center data protection toolbox
Data Protection Toolbox

Note that this post is evolving with additional tools, technologies, techniques, hardware, software, services being added over time along with applicable industry links.

The following is a sampling of some hardware, software, solution and component vendors along with service providers involved with data protection from RAID, Erasure Codes (EC) to snapshots, backup, BC, BR, DR, archive, security, cloud, bulk object storage, HDDs, SSD and tape among others, including buzzword (and buzz term trends) bingo. These include Acronis, Actifio, Arcserve, ATTO, AWS, Backblaze, Barracuda, Broadcom, Caringo, Chelsio (offload), Code42/Crashplan, Cray, Ceph, Cisco, Cloudian, Cohesity, Compuverde, Commvault, Datadog, Datrium, Datos IO, DDN, Dell EMC, Druva, E8, Elastifile, Exagrid, Excelero, Fujifilm, Fujitsu, Google, HPE, Huawei, Hedvig, IBM, Intel, Iomega, Iron Mountain, Jungledisk, Kinetic key value drives (Seagate), Lenovo, LTO organization, Mangstor, Maxta, Mellanox (offload), Micron, Microsoft (Azure, Windows, Storage Spaces), Microsemi, Nakivo, NetApp, NooBaa, Nexsan, Nutanix, OpenIO, OpenStack (Swift), Oracle, Panasas, Panzura, Promise, Pure, Quantum, Quest, Qumulo, Retrospect, Riverbed, Rozo, Rubrik, Samsung, Scale, Scality, Seagate (DotHill), Sony, Solarwinds, Spectralogic, Starwind, Storpool, Strongbox, Sureline, Swiftstack, Synology, Toshiba, Tintri, Turbonomic, Unitrends, Unix and Linux platforms, Vantara, Veeam, VMware, Western Digital (Amplidata, Tegile and others), WekaIO, X-IO, Zadara and Zmanda among many others.

Note that if you don’t see yours, or your favorite, preferred or client’s, listed above or in the data infrastructure industry related links, send us a note for consideration to be included in future updates, or to have a link or sponsor spot pointing to your site added. Feel free to add courteous comments, without a sales or marketing pitch, to the comment section below.

View additional IT, data center and data Infrastructure along with data protection related vendors, services, tools, technologies links here.

What This All Means

Part of modernizing data protection for various data center and data infrastructure environments is to know the tools, technologies and trends that are part of your data protection toolbox. The other part of modernizing data protection is knowing the techniques of how to use different tools and technologies to meet various application workload performance, availability, capacity, economic (PACE) needs.

Also keep in mind that information services require applications (e.g. programs) and that programs are a combination of algorithms (code, rules, policies) and data structures (e.g. data and how it is organized, including unstructured). What this means is that data protection needs to address not only data but also the applications, configuration settings and metadata, as well as protecting the protection tools and their data.

Get your copy of Software Defined Data Infrastructure Essentials here at Amazon.com, at CRC Press among other locations and learn more here. Meanwhile, continue reading with the next post in this series, Part 10 Data Protection Fundamental Resources Where to Learn More.

Ok, nuff said, for now.

November 26, 2017 / December 29, 2025

Data Protection Diaries Fundamental Resources Where to Learn More

Companion to Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft ( CRC Press 2017)

By Greg Schulz – www.storageioblog.com November 26, 2017

This is the last in a multi-part series on Data Protection fundamental tools topics techniques terms technologies trends tradecraft tips as a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).

Click here to view the previous post Part 9 – Who’s Doing What (Toolbox Technology Tools).

In this post the focus is around Data Protection Resources Where to Learn More.

Figure 1.5 Data Infrastructures and other IT Infrastructure Layers

Software Defined Data Infrastructure Essentials Table of Contents (TOC)

Here is a link (PDF) to the table of contents (TOC) for Software Defined Data Infrastructure Essentials.

The following is a Software Defined Data Infrastructure Essentials book TOC summary:

Chapter 1: Server Storage I/O and Data Infrastructure Fundamentals
Chapter 2: Application and IT Environments
Chapter 3: Bits, Bytes, Blobs, and Software-Defined Building Blocks
Chapter 4: Servers: Physical, Virtual, Cloud, and Containers
Chapter 5: Server I/O and Networking
Chapter 6: Servers and Storage-Defined Networking
Chapter 7: Storage Mediums and Component Devices
Chapter 8: Data Infrastructure Services: Access and Performance
Chapter 9: Data Infrastructure Services: Availability, RAS, and RAID
Chapter 10: Data Infrastructure Services: Availability, Recovery-Point Objective, and Security
Chapter 11: Data Infrastructure Services: Capacity and Data Reduction
Chapter 12: Storage Systems and Solutions (Products and Cloud)
Chapter 13: Data Infrastructure and Software-Defined Management
Chapter 14: Data Infrastructure Deployment Considerations
Chapter 15: Software-Defined Data Infrastructure Futures, Wrap-up, and Summary
Appendix A: Learning Experiences
Appendix B: Additional Learning, Tools, and Tradecraft Tricks
Appendix C: Frequently Asked Questions
Appendix D: Book Shelf and Recommended Reading
Appendix E: Tools and Technologies Used in Support of This Book
Appendix F: How to Use This Book for Various Audiences
Appendix G: Companion Website and Where to Learn More
Glossary
Index

Click here to view (PDF) table of contents (TOC).

Data Protection Resources Where To Learn More

Learn more about Data Infrastructure and Data Protection related technology, trends, tools, techniques, tradecraft and tips with the following links.

The following are the various posts that are part of this data protection series:

Part 1 – Data Infrastructure Data Protection Fundamentals

Part 2 – Reliability, Availability, Serviceability ( RAS) Data Protection Fundamentals

Part 3 – Data Protection Access Availability RAID Erasure Codes ( EC) including LRC

Part 4 – Data Protection Recovery Points (Archive, Backup, Snapshots, Versions)

Part 5 – Point In Time Data Protection Granularity Points of Interest

Part 6 – Data Protection Security Logical Physical Software Defined

Part 7 – Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends

Part 8 – Data Protection Diaries Walking Data Protection Talk

Part 9 – Who’s Doing What (Toolbox Technology Tools)

Part 10 – Data Protection Resources Where to Learn More

The following are various data protection blog posts:

Welcome to the Data Protection Diaries

Until the focus expands to data protection, backup is staying alive!

The blame game, Does cloud storage result in data loss?

Loss of data access vs. data loss

Revisiting RAID storage remains relevant and resources

Only you can prevent cloud (or other) data loss

Data protection is a shared responsibility

Time for CDP (Commonsense Data Protection)?

Data Infrastructure Server Storage I/O Tradecraft Trends (skills, experiences, knowledge)

My copies were corrupted: The [4] 3-2-1 rule and more about 4 3 2 1 as well as 3 2 1 here and here

The following are various data protection tips and articles:

Via Infostor Cloud Storage Concerns, Considerations and Trends

Via Network World What’s a data infrastructure?

Via Infostor Data Protection Gaps, Some Good, Some Not So Good

Via Infostor Object Storage is in your future

Via Iron Mountain Preventing Unexpected Disasters

Via InfoStor – The Many Variations of RAID Storage

Via InfoStor – RAID Remains Relevant, Really!

Via WservNews Cloud Storage Considerations (Microsoft Azure)

Via ComputerWeekly Time to restore from backup: Do you know where your data is?

Via Network World Ensure your data infrastructure remains available and resilient

The following are various data protection related webinars and events:

BrightTalk Webinar Data Protection Modernization – Protect, Preserve and Serve Your Information

BrightTalk Webinar BCDR and Cloud Backup Protect Preserve and Secure Your Data Infrastructure

TechAdvisor Webinar (Free with registration) All You Need To Know about ROBO data protection

TechAdvisor Webinar (Free with registration) Tips for Moving from Backup to Full Disaster Recovery

The following are various data protection tools, technologies, services, vendor and industry resource links:

Various Data Infrastructure related news commentary, events, tips and articles

Data Center and Data Infrastructure industry links (vendors, services, tools, technologies, hardware, software)

Data Infrastructure server storage I/O network Recommended Reading List Book Shelf

Software Defined Data Infrastructure Essentials (CRC 2017) Book

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

What This All Means

Everything is not the same across environments, data centers, data infrastructures including SDDC, SDX and SDDI as well as applications along with their data.

Likewise, everything is not, and does not have to be, the same when it comes to Data Protection.

Since everything is not the same, various data protection approaches are needed to address various application performance, availability, capacity, economic (PACE) needs, as well as SLOs and SLAs.

Data protection encompasses many different hardware, software, services including cloud technologies, tools, techniques, best practices, policies and tradecraft experience skills (e.g. knowing what to use when, where, why and how).

Context is important, as terms have different meanings depending on what is being discussed. Likewise, technologies and topics such as object, blob, backup, replication, RAID, erasure code (EC), mirroring, gaps (good, bad, ugly), snapshot, checkpoint, availability and durability, among others, have various meanings depending on context as well as implementation approach.

In most cases there is no bad technology or tool; granted, there are some poor or bad (even ugly) implementations, as well as deployment or configuration decisions. What this means is that the best technology or approach for your needs may be different from somebody else’s, and vice versa.

Some other points: there is no such thing as an information recession, with more data generated every day; granted, how that data is transformed or stored can result in a smaller footprint. Likewise, there is an increase in the size of data, including unstructured big data, as well as in volume (how much data) and velocity (the speed at which it is created, moved, processed and stored). This also means there is an increased dependency on data being available, accessible and intact with consistency. Thus the fundamental role of data infrastructures (e.g. what’s inside the data center or cloud) is to combine resources, technologies, tools, techniques, best practices, policies, and people skill sets and experiences (e.g. tradecraft) to protect, preserve, secure and serve information (applications and data).

Modernizing data protection, including backup, availability and related topics, means more than swapping out one hardware, software, service or cloud for whatever is new, and then using it in old ways.

What this means is to start using new (and old) things in new ways; for example, move beyond using SSDs or HDDs like tape, as targets for backup or other data protection approaches. Instead, use SSDs, HDDs or cloud as a tier, and also to enable faster protection and recovery, by stepping back and rethinking what to protect, when, where, why and how, then applying applicable techniques, tools and technologies. Find a balance between two extremes: knowing all about the tools and trends without understanding how to use those toolbox items, and knowing all about the techniques for using the tools without knowing what the tools are.

Want to learn more, or have questions about specific tools, technologies, trends, vendors, products, services or techniques discussed in this series? Send a note (info at storageio dot com) or reach out via our contact page. We can set up a time to discuss your questions or needs pertaining to data protection as well as data infrastructure related topics, from legacy to software defined virtual, cloud and container among others. Examples include consulting, advisory services, architecture strategy design, technology selection and acquisition coaching, education knowledge transfer sessions, seminars, webinars, special projects, test drive lab reviews or audits, content generation, videos, podcasts, custom content, chapter excerpts and demand generation, among many other things.

Get your copy of Software Defined Data Infrastructure Essentials here at Amazon.com, at CRC Press among other locations and learn more here.

Ok, nuff said, for now.

November 14, 2017 | November 26, 2023

AWS Announces New S3 Cloud Storage Security Encryption Features

Updated 1/17/2018

Amazon Web Services (AWS) recently announced new Simple Storage Service (S3) encryption and security enhancements including Default Encryption, Permission Checks, Cross-Region Replication ACL Overwrite, Cross-Region Replication with KMS and Detailed Inventory Report. Another recent announcement by AWS is for PrivateLink endpoints within a Virtual Private Cloud (VPC).

AWS Dashboard
AWS Service Dashboard

Default Encryption

Extending previous security features, you can now mandate that all objects stored in a given S3 bucket be encrypted, without having to specify a bucket policy that rejects non-encrypted objects. There are three server-side encryption (SSE) options for S3 objects: keys managed by S3 (SSE-S3), keys managed by AWS KMS (SSE-KMS) and customer managed keys (SSE-C). These options provide more flexibility and control for different environments, along with increased granularity. Note that encryption can be forced on all objects in a bucket by specifying a bucket encryption configuration. When an unencrypted object is stored in an encrypted bucket, it inherits the bucket's encryption, or alternatively the encryption specified in the PUT request.
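
As a rough sketch of what specifying a bucket encryption configuration can look like from the AWS CLI (for example from a bash shell), the following writes a default-encryption rule to a file and defines, without running, a helper that applies it. The bucket name, file name and helper function name are placeholders made up for illustration, and SSE-S3 (AES256) is only one choice; "aws:kms" would select KMS managed keys instead.

```shell
# Hypothetical bucket name, replace with your own.
BUCKET="my-example-bucket"

# Default-encryption rule using S3 managed keys (SSE-S3).
# Assumption: to use KMS managed keys, set SSEAlgorithm to "aws:kms".
cat > sse-config.json <<'EOF'
{
  "Rules": [
    { "ApplyServerSideEncryptionByDefault": { "SSEAlgorithm": "AES256" } }
  ]
}
EOF

# Defined but not called here: needs the AWS CLI plus valid credentials.
apply_default_encryption() {
  # Attach the rule to the bucket, then read it back to confirm.
  aws s3api put-bucket-encryption --bucket "$BUCKET" \
    --server-side-encryption-configuration file://sse-config.json
  aws s3api get-bucket-encryption --bucket "$BUCKET"
}
```

Calling apply_default_encryption against a bucket you own should apply the rule and then echo it back; keeping the rule in a file makes it easy to review before it is applied.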

AWS S3 Bucket Encryption
AWS S3 Buckets

Permission Checks

There is now an indicator on the S3 console dashboard prominently showing which S3 buckets are publicly accessible. In the above image, some of my AWS S3 buckets are shown, including one that is public facing. Note in the image above how there is a notation next to buckets that are open to the public.

Cross-Region Replication ACL Overwrite and KMS

AWS Key Management Service (KMS) keys can be used for encrypting objects. Building on previous cross-region replication capabilities, now when you replicate objects across AWS accounts, a new ACL providing full access to the destination account can be specified.

Detailed Inventory Report

The S3 Inventory report (which can also be encrypted) now includes the encryption status of each object.

PrivateLink for AWS Services

PrivateLink enables AWS customers to access services from a VPC without using a public IP, and without traffic having to go across the internet (e.g. it keeps traffic within the AWS network). PrivateLink endpoints appear as Elastic Network Interfaces (ENI) with private IPs in your VPC and are highly available, resilient and scalable. Besides scaling and resiliency, PrivateLink eliminates the need for white listing of public IPs as well as managing internet gateway, NAT and firewall proxies to connect to AWS services (Elastic Compute Cloud (EC2), Elastic Load Balancer (ELB), Kinesis Streams, Service Catalog, EC2 Systems Manager). Learn more about AWS PrivateLink for services here including VPC Endpoint Pricing here.

What This All Means

Common cloud concerns include privacy and security. AWS S3, among other industry cloud service and storage providers, has had its share of not so pleasant news coverage involving security.

Keep in mind that data protection including security is a shared responsibility (and only you can prevent data loss). This means that the vendor or service provider has to take care of their responsibility, making sure their solutions have proper data protection and security features by default, as well as extensions, and making those capabilities known to consumers.

The other part of shared responsibility is that consumers and users of cloud services need to know what the capabilities, defaults and options are, as well as when to use various approaches. Ultimately it is up to the user of a cloud service to implement best practices to leverage cloud as well as their own on-premises technologies so that they can support data infrastructures that in turn protect, preserve, secure and serve information (along with their applications and data).

These are good enhancements by AWS to make their S3 cloud storage security encryption features available as well as provide options and awareness for users on how to use those capabilities.

Ok, nuff said, for now.

October 25, 2017 | December 29, 2025

Introducing Windows Subsystem for Linux WSL Overview #blogtober

Updated 1/21/2018

This post introduces Windows Subsystem for Linux (WSL) and gives an overview. Microsoft has been increasing its support of Linux across Azure public cloud, Hyper-V and Linux Integration Services (LIS), and Windows platforms including WSL, as well as Windows Server, along with Docker support.

WSL installed with Ubuntu on Windows 10

WSL with Ubuntu installed and open in a window on one of my Windows 10 systems.

WSL is not a virtual machine (VM) running on Windows or Hyper-V, rather it is a subsystem that coexists next to win32 (read more about how it works and features, enhancements here). Once installed, WSL enables use of Linux bash shell along with familiar tools (find, grep, sed, awk, rsync among others) as well as services such as ssh, MySQL among others.

What this all means is that if you work with both Windows and Linux, you can do so on the same desktop, laptop, server or system using your preferred commands. For example, in one window you can be using PowerShell or traditional Windows commands and tools, while in another window you work with grep, find and other tools, eliminating the need to install things such as wingrep among others.

Installing WSL

Depending on which release of Windows desktop or server you are running, there are a couple of different install paths. Since my Windows 10 is the most recent release (e.g. 1709), I was able to simply go to the Microsoft Windows Store via the desktop, search for Windows Linux, select the distribution, install and launch. Microsoft has some useful information for installing WSL on different Windows versions here, as well as for Windows Servers here.

Get WSL from Windows Store

Get WSL from Windows Store or more information and options here.

Microsoft WSL install

Click on Get the app

Select which Linux for WSL to install

Select desired WSL distribution

SUSE linux for WSL

Let's select SUSE as I already have Ubuntu installed (I have both)

WSL installing SUSE

SUSE WSL in the process of downloading. Note that SUSE needs an access code (free) that you get from https://www.suse.com/subscriptions/sles/developer/. While waiting for the download and install to finish is a good time to get that code.

launching WSL on Windows 10

When launching WSL with SUSE, you will be prompted to enter the code mentioned above. If you do not have a code, get it here from SUSE.

completing install of WSL

The WSL installation is very straightforward: enter the SUSE code (Ubuntu did not need a code). Note the Ubuntu and SUSE WSL task bar icons circled bottom center.

Ubuntu and SUSE WSL on Windows 10

Provide a username for accessing the WSL bash shell along with a password, confirm how root and sudo are to be applied, and that is it. Seriously, the install for WSL, at least with Windows 10 1709, is that fast and easy. Note in the above image, I have WSL with Ubuntu open in a window on the left, WSL with SUSE on the right, and their taskbar icons bottom center.

Windows WSL install error 0x8007007e

Enable Windows Subsystem for Linux Feature on Windows

If you get the above WSL error message 0x8007007e when installing WSL Ubuntu, SUSE or other shell distro, make sure to enable the Windows WSL feature if not already installed.

Windows WSL install error fix

One option is to install additional Windows features via settings or control panel. For example, Control panel -> Programs and features -> Turn Windows features on or off -> Check the box for Windows Subsystem for Linux

Another option is to install the Windows Subsystem for Linux feature via PowerShell (run as administrator), for example:

Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux

Using WSL

Once you have WSL installed, try something simple such as view your present directory:

pwd

Then look at the Windows C: drive location

ls /mnt/c -al

In case you did not notice the above, you can use Windows files and folders from the bash shell by placing /mnt in front of the device path. Note that you need to be case-sensitive such as User vs. user or Documents vs. documents.
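
To make the /mnt mapping concrete, here is a minimal sketch of a shell helper that turns a Windows drive path into the form seen from the bash shell. The function name win_to_wsl is made up for illustration and is not part of WSL; it assumes a simple drive-letter path such as C:\Users and lowercases only the drive letter, since the rest of the path is case-sensitive.

```shell
# Hypothetical helper (not part of WSL): map C:\some\path to /mnt/c/some/path.
win_to_wsl() {
  # Drive letter is the first character, lowercased for the /mnt mount point.
  drive=$(printf '%s' "$1" | cut -c1 | tr 'A-Z' 'a-z')
  # Everything after "C:" with backslashes turned into forward slashes.
  rest=$(printf '%s' "$1" | cut -c3- | tr '\\' '/')
  printf '/mnt/%s%s\n' "$drive" "$rest"
}

# Example call: keep the Windows path in single quotes so the shell
# leaves the backslashes alone.
example_path=$(win_to_wsl 'C:\Users\Test1\Documents')
```

The result can then be passed to ordinary commands, for example ls -al "$example_path".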

As a further example, I needed to change several .htm, .html, .php and .xml files on a Windows system whose contents had not yet changed from http://storageio.com to https://storageio.com. Instead of installing wingrep or some other tools, with WSL (such as with Ubuntu) finding the files can be done with grep, such as:

grep "http://storageio.com" /mnt/c/Users/*.xml

And then making changes using find and sed, such as:

find /mnt/c/Users -name \*.xml -exec sed -i "s,http://storageio.com,https://storageio.com,g" {} \;
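
Since sed -i rewrites files in place, it can be worth previewing which files will change and keeping backups. The following is a sketch of that workflow run against a scratch directory rather than real Windows folders; the sample files are made up for illustration, the old URL is assumed to be the http:// form of the site address, and the --include option assumes GNU grep (as found in Ubuntu on WSL).

```shell
# Work in a scratch directory so nothing real is modified (made-up sample files).
workdir=$(mktemp -d)
printf '<a href="http://storageio.com/x">x</a>\n' > "$workdir/a.xml"
printf '<p>no link here</p>\n' > "$workdir/b.xml"

# 1. Preview: list only the .xml files that contain the old URL.
matches=$(grep -rl --include='*.xml' 'http://storageio\.com' "$workdir")

# 2. Change: rewrite in place, keeping a .bak copy of every file sed touches.
find "$workdir" -name '*.xml' \
  -exec sed -i.bak 's,http://storageio\.com,https://storageio.com,g' {} \;

# 3. Verify: count .xml files still containing the old URL.
remaining=$(grep -rl --include='*.xml' 'http://storageio\.com' "$workdir" | wc -l)
```

Once the results look right, the .bak copies can be removed; pointing workdir at a folder under /mnt/c applies the same steps to Windows files.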

Note that not all Linux apps and tools can use files via /mnt, in which case a solution is to create a symbolic link.

For example:

ln -s "/mnt/c/Users/Test1/Documents"  /home/Test1/Projects
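
A slightly more defensive variation on the link creation above is sketched below. It uses scratch directories as stand-ins for the Windows folder and the Linux home directory (both made up for illustration), creates the link only when the target exists and the link name is free, and then checks where the link points.

```shell
# Scratch directories stand in for /mnt/c/Users/Test1/Documents and /home/Test1.
target=$(mktemp -d)
linkdir=$(mktemp -d)
link="$linkdir/Projects"

# Create the link only if the target exists and the link name is not taken.
if [ -d "$target" ] && [ ! -e "$link" ]; then
  ln -s "$target" "$link"
fi

# A file written through the link lands in the target directory.
echo "hello" > "$link/note.txt"

# readlink reports the path the link points to.
resolved=$(readlink "$link")
```

The existence checks avoid a surprising ln behavior: if the link name already exists as a directory, ln -s creates the new link inside it instead of failing.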

Where To Learn More

Learn more about related technology, trends, tools, techniques, and tips with the following links.

Cloud Conversations Azure AWS Service Maps via Microsoft
Microsoft Azure September 2017 Software Defined Data Infrastructure Updates
Azure Software Defined Data Infrastructure Architecture Resources
New family of Intel Xeon Scalable Processors enable software defined data infrastructures (SDDI) and SDDC
Microsoft Windows Server, Azure, Nano Life cycle Updates
Overview Review of Microsoft ReFS (Reliable File System) and resource links
General WSL information (Via MSDN)
WSL FAQs (Via MSDN) and reference material (Via MSDN)
Installing WSL on Windows 10 systems (Via MSDN) and on a Windows Server (Via MSDN)
What about powershell, bash and Windows (Via Microsoft Blogs)
WSL features, enhancements and other notes (Via Microsoft Blogs)
Fixing the Microsoft Windows 10 1709 post upgrade restart loop
Software Defined Data Infrastructure Essentials (CRC Press) book companion page

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

What This All Means

If you primarily work on (or have a preference for) Linux systems and need to do some functions from development to the administration or other activity on a Windows system, Windows Subsystem for Linux (WSL) provides a bash shell to do familiar tasks. Likewise, if you are primarily a Windows person and need to brush up on your Linux skills, WSL can help. If you need to run Linux server applications or workloads, put those into a Docker container, Hyper-V instance or Azure VM.

Overall I like WSL for what it is: a tool that eliminates the need to install several other tools to do common tasks, and that makes it easier to work across various Linux and Windows systems including bare metal, virtual and cloud-based. Now that you have been introduced to Windows Subsystem for Linux (WSL), with an overview including installing as well as using it, add it to your data infrastructure toolbox.

By the way, if you have not heard, it's #Blogtober, check out some of the other blogs and posts occurring during October here.

Ok, nuff said, for now.

October 18, 2017 | November 26, 2023

Cloud Conversations AWS Azure Service Maps via Microsoft

Updated 1/21/2018

Microsoft has created an Amazon Web Services (AWS) Azure Service Map. The AWS Azure Service Map is a list created by Microsoft that looks at corresponding services of both cloud providers.

Image via Azure.Microsoft.com

Note that this is an evolving work in progress from Microsoft; use it as a tool to help position the different services from Azure and AWS.

Also note that not all features or services may be available in all regions; visit the Azure and AWS sites to see current availability.

As with any comparison, these are often dated the day they are posted, hence this is a work in progress. If you are looking for another Microsoft created view of why Azure vs. AWS, then check out this here. If you are looking for an AWS vs. Azure comparison, do a simple Google (or Bing) search and watch all the various items appear, some sponsored, some not so sponsored, among others.

What's In the Service Map

The following AWS and Azure services are mapped:

Marketplace (e.g. where you select service offerings)
Compute (Virtual Machines instances, Containers, Virtual Private Servers, Serverless Microservices and Management)
Storage (Primary, Secondary, Archive, Premium SSD and HDD, Block, File, Object/Blobs, Tables, Queues, Import/Export, Bulk transfer, Backup, Data Protection, Disaster Recovery, Gateways)
Network & Content Delivery (Virtual networking, virtual private networks and virtual private cloud, domain name services (DNS), content delivery network (CDN), load balancing, direct connect, edge, alerts)
Database (Relational, SQL and NoSQL document and key value, caching, database migration)
Analytics and Big Data (data warehouse, data lake, data processing, real-time and batch, data orchestration, data platforms, analytics)
Intelligence and IoT (IoT hub and gateways, speech recognition, visualization, search, machine learning, AI)
Management and Monitoring (management, monitoring, advisor, DevOps)
Mobile Services (management, monitoring, administration)
Security, Identity and Access (Security, directory services, compliance, authorization, authentication, encryption, firewall)
Developer Tools (workflow, messaging, email, API management, media transcoding, development tools, testing, DevOps)
Enterprise Integration (application integration, content management)

Download a PDF version of the service map from Microsoft here.

Where To Learn More

Learn more about related technology, trends, tools, techniques, and tips with the following links.

What’s a data infrastructure? (Via NetworkWorld)
Introduction to Azure Files (Microsoft Docs)
Public preview of Virtual Network service endpoints for Azure storage and SQL database
Azure debuts Availability Zones for high availability and resiliency (Azure)
Azure and AWS comparison (Docs Microsoft)
Cloud Storage Considerations (via WServerNews)
Microsoft Windows Server, Azure, Nano Life cycle Updates
Overview Review of Microsoft ReFS (Reliable File System) and resource links
Software Defined Data Infrastructure Essentials (CRC Press) book companion page
Amazon Web Service AWS September 2017 Updates
AWS S3 Storage Gateway Revisited (Part I)
Cloud conversations: AWS EBS, Glacier and S3 overview (Part II S3)
Microsoft September 2017 Updates

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

What This All Means

On one hand this can and will likely be used as a comparison; however, use caution, as both Azure and AWS services are rapidly evolving, adding new features and extending others. Likewise the service regions and data center sites also continue to evolve, thus use the above as a general guide or tool to help map which service offerings are similar between AWS and Azure.

By the way, if you have not heard, it's Blogtober, check out some of the other blogs and posts occurring during October here.

Ok, nuff said, for now.

October 11, 2017 | April 27, 2025

Dell EMC VMware September 2017 Software Defined Data Infrastructure Updates

vmworld 2017

September was a busy month including VMworld in Las Vegas that featured many Dell EMC VMware (among other) software defined data infrastructure updates and announcements.

A summary of September VMware (and partner) related announcements include:

Workspace, security and endpoint solutions
Pivotal Container Service (PKS) with Google for Kubernetes serverless container management
DXC partnership for hybrid cloud management
Security enablement via AppDefense
Data infrastructure platform enhancements (integrated OpenStack, vRealize management tools, vSAN)
Multi-cloud and hybrid cloud support along with VMware on AWS
Dell EMC data protection for VMware and AWS environments

VMware and AWS via Amazon Web Services

VMware and AWS

Some of you might recall VMware's earlier attempt at public cloud with the vCloud Air service (see Server StorageIO lab test drive here), which has since been deprecated (e.g. retired). This new approach by VMware leverages the large global presence of AWS, enabling customers to set up public or hybrid vSphere, vSAN and NSX based clouds, as well as software defined data centers (SDDC) and software defined data infrastructures (SDDI).

VMware Cloud on AWS runs on dedicated, single-tenant infrastructure (unlike Elastic Compute Cloud (EC2) multi-tenant instances or VMs) that supports from 4 to 16 underlying hosts per cluster. Unlike EC2 virtual machine instances, VMware Cloud on AWS is delivered on elastic bare-metal (e.g. dedicated private servers aka DPS). Note that while AWS EC2 is more commonly known, AWS also has other options for server compute including Lambda micro services serverless containers, as well as Lightsail virtual private servers (VPS).

Besides servers with storage optimized I/O featuring low latency NVMe accessed SSDs, and applicable underlying server I/O networking, VMware Cloud on AWS leverages the VMware software stack directly on underlying host servers (e.g. there is no virtualization nesting taking place). This means more robust performance should be expected, similar to your on-premises VMware environment. VM workloads can move between your onsite VMware systems and VMware Cloud on AWS using various tools. The VMware Cloud on AWS is delivered and managed by VMware, including pricing. Learn more about VMware Cloud on AWS here, and here (VMware PDF) and here (VMware Hands On Lab aka HOL).

Read more about AWS September news and related updates here in this StorageIOblog post.

VMware and Pivotal PKS via VMware.com

Pivotal Container Service (PKS) and Google Kubernetes Partnership

During VMworld, VMware, Pivotal and Google announced a partnership for enabling Kubernetes container management called PKS (Pivotal Container Service). Kubernetes is evolving as a popular open source container microservice serverless management orchestration platform that has roots within Google. What this means is that what is good for Google and others for managing containers is now good for VMware and Pivotal. In related news, VMware has become a platinum sponsor of the Cloud Native Computing Foundation (CNCF). If you are not familiar with CNCF, add it to your vocabulary and learn more here at www.cncf.io.

Other VMworld and September VMware related announcements

Hyper converged data infrastructure provider Maxta has announced a VMware vSphere Escape Pod (parachute not included ;) ) to facilitate migration from ESXi based to Red Hat Linux hypervisor environments. Other announcements include an IBM and VMware cloud partnership, along with Dell EMC, IBM and VMware joint cloud solutions, as well as white listing of VMware vSphere VMs for enhanced security, which combines with earlier announced capabilities.

Note that both VMware with vSphere ESXi and Microsoft with Hyper-V (Windows and Azure based) are supporting various approaches for securing Virtual Machines (VMs) and the hosts they run on. These enhancements are moving beyond simply encrypting the VMDK or VHDX virtual disks the VMs reside in or use, as well as more than password, ssh and other security measures. For example Microsoft is adding support for host guarded fabrics (and machine hosts) as well as shielded VMs. Keep an eye on how both VMware and Microsoft extend the data protection and security capabilities for software defined data infrastructures for their solutions and services.

Dell EMC Announcements

At VMworld in September Dell EMC announcements included:

Hyper Converged Infrastructure (HCI) and Hybrid Cloud enhancements
Data Protection, Governance and Management suite updates
XtremIO X2 all flash array (AFA) availability optimized for vSphere and VDI

HCI and Hybrid Cloud enhancements include VxRail Appliance and VxRack SDDC (vSphere 6.5, vSAN 6.6, NSX 6.3), hybrid cloud platforms (Enterprise Hybrid Cloud and Native Hybrid Cloud), vSAN Ready Nodes (vSAN 6.6 and encryption) and VMware Ready System. Note that Dell EMC, in addition to supporting VMware hybrid clouds, also previously announced solutions for Microsoft Azure Stack back in May.

Software Defined Data Infrastructure Essentials at VMworld Bookstore

Software Defined Data Infrastructure Essentials (CRC Press) at VMworld bookstore

My new book Software Defined Data Infrastructure Essentials (CRC Press) made its public debut in the VMware book store where I did a book signing event. You can get your copy of Software Defined Data Infrastructure Essentials which includes Software Defined Data Centers (SDDC) along with hybrid, multi-cloud, serverless, converged and related topics at Amazon among other venues. Learn more here.

Where To Learn More

Learn more about related technology, trends, tools, techniques, and tips with the following links.

What’s a data infrastructure? (Via NetworkWorld)
Software Defined Data Infrastructure Essentials (CRC Press) book companion page
Top 10 things to know about vSAN (Via Yellow Bricks)
VMware vSAN Management Today and Future (Via Yellow Bricks)
Integrated Data Protection for vSAN (Via Yellow Bricks)
VMworld 2017 Video on demand (Via VMware)
Top 10 VMworld 2017 hands on labs (HOL) via VMware
Top VMworld 2017 sessions via VMware
Amazon Web Service AWS September 2017 Updates
Microsoft September 2017 Updates
Dell EMC VMware September 2017 Updates
Getting Caught Up What Happened In September 2017

What This All Means

A year ago at VMworld the initial conversations were started around what would become the VMware Cloud on AWS solution. Also a year ago besides VMware Integrated Containers (VIC) and some other pieces, the overall container and in particular related management story was a bit cloudy (pun intended). However, now the fog and cloud seem to be clearing with the PKS solution, along with details of VMware Cloud on AWS. Likewise vSphere, vSAN and NSX along with associated vRealize tools continue to evolve as well as customer deployment growing. All in all, VMware continues to evolve, let’s see how things progress now over the year until the next VMworld.

By the way, if you have not heard, it's Blogtober, check out some of the other blogs and posts occurring during October here.

Ok, nuff said, for now.
Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (and vSAN). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio.

Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

August 27, 2017 | November 3, 2024

August 2017 Server StorageIO Data Infrastructure Update Newsletter

Server StorageIO August 2017 Data Infrastructure Update Newsletter

Volume 17, Issue VII (Pre VMworld 2017)

Hello and welcome to the August 2017 issue of the Server StorageIO update newsletter.

It's the end of the summer season here in North America, which means wrapping up holidays, vacations, back to school shopping (and going to school), as well as the start of the fall IT technology conference season. VMworld 2017 USA is this week in Las Vegas and there will be several announcements coming out of that event. Given all of the activity so far this month, I’m going to cover the VMworld and related topics in a special early September issue of this newsletter.

Speaking of VMworld 2017, if you are going to be there in Las Vegas, stop by the book store located in the community village area on Tuesday at 1PM, where I will be doing a book signing and meet and greet. Stop by and say hello.

Thanks to all who participated in the recent thevPad top 100 vBloggers event; I am honored to have StorageIOblog listed in the top 100 vBlogs. Also congratulations to new and returning fellow Microsoft MVPs and VMware vExperts. There is a lot going on in the industry, so let's get to it in this Server StorageIO Data Infrastructure Update Newsletter.

Various Events and Webinars
Industry Resources and Links
Connect and Converse With Us
About Us

Enjoy this edition of the Server StorageIO update newsletter (pre VMworld edition).

Cheers GS

Data Infrastructure and IT Industry Activity Trends

Acronis announced True Image 2018 for home based data protection (backup), while Crashplan aka code42 announced they were getting out of the consumer, small office home office (SOHO) backup and data protection space to focus on the enterprise.

Cisco bought software defined storage converged infrastructure software vendor Springpath for about $320M USD. Cisco and Swiftstack (object storage software) also announced interoperability news with the UCS S3260 storage server platform.

GPU vendor NVIDIA announced Quadro Virtual Data Center workstation technology.

Meanwhile ioFABRIC announced their new Vicinity 3.0 software defined management solution.

Microsemi (remember PMC Sierra) announced the release of its Flashtec PCIe controllers to help speed adoption and deployment of SSDs, including NVMe based.

Microsoft bought Cycle Computing to enhance Azure services, while also making Azure Blob storage tiering available as part of an ongoing public preview. For those not aware, Azure Blob is similar to what other services call objects. Get in on the public preview here. For those who live in a hybrid world where your environment and experience include both Windows and Linux, check out Windows Subsystem for Linux here. With this feature, which can install onto a Windows 10 system alongside Win32 (e.g. it co-exists, it's not a virtual machine), you can choose from the Windows Store which Linux distro you want (e.g. Centos, Ubuntu, etc).

Need to learn, refresh or simply gain a better understanding of Microsoft PowerShell for software defined management of Windows, Azure and other environments? Check out this great post from Microsoft Blogs.

For those who work in a Windows or Azure environment, here are some useful icons for PowerPoint, Visio, PNG and SVG from Microsoft. With Microsoft Ignite coming up in September, watch for some interesting update enhancements to Windows Server from a server storage I/O perspective.

NextPlatform.com has an interesting article on Exascale Timeline for Storage and I/O systems worth a read. Panzura global name space and scale out software defined storage management software announced mobile client file sharing. After dropping their own cloud business, Verizon is now a virtual network services partner with Amazon.

Over at all flash array (AFA) SSD vendor Pure, revenues are growing closer to an annual $1B USD rate despite a loss per share. Pure also announced a change in leadership, with current CEO Scott Dietzen stepping aside for Charles Giancarlo to take the lead spot.

VMware has been talking about the continued increase in customer adoption and deployment of vSAN; now they are showing they eat their own dog food. Check out this post here from VMware that shows how many and what size vSAN clusters they are using for various internal operations. Also on the VMware storage front, learn more about enhancements for large and small file allocation blocks with vSphere VMFS6.

With all of the pre and post VMworld related announcements, remember to check out the tools available over at the VMware flings site including vSphere HTML5 Web Client, HCIBench, vRealize Operations Export, VisualEsxtop, ESXi Embedded Host Client, VMware OS Optimization Tool and many others. Watch for VMworld coverage in the September newsletter along with posts at www.storageioblog.com

Check out other industry news, comments, trends perspectives here.

Server StorageIO Commentary in the news

Recent Server StorageIO industry trends perspectives commentary in the news.

Via EnterpriseStorageForum: Comments on Who Will Rule the Storage World?
Via InfoGoto: Comments on Google Cloud Platform Gaining Data Storage Momentum
Via InfoGoto: Comments on Singapore High Rise Data Centers
Via InfoGoto: Comments on New Tape Storage Capacity

View more Server, Storage and I/O trends and perspectives comments here

Server StorageIOblog Posts

Recent and popular Server StorageIOblog posts include:

NVMe Wont Replace Flash By Itself They Complement Each Other
There has been some recent industry marketing buzz generated by a startup seeking attention by claiming, via a study whose sponsors include the startup, that Non-Volatile Memory (NVM) Express (NVMe) will replace flash storage. Granted, many IT customers as well as vendors are still confused by NVMe, thinking it is a storage medium as opposed to an interface used for accessing fast storage devices such as nand flash among other solid state devices (SSDs). Part of that confusion can be tied to common SSD based devices relying on NVM, which is persistent memory that retains data when powered off (unlike the memory in your computer).

Like IT Data Centers Do You Take Trade Show Exhibit Infrastructure For Granted?
Think about this for a moment: do you assume that Information Technology (IT) and Cloud based data centers along with their associated Data Infrastructure supporting various applications will be accessible when needed? Likewise, when you go to a trade show, conference, symposium, user group or another conclave, is it assumed that the trade show, exposition (expo), exhibits, booths, stands or demo areas will be ready, waiting and accessible? In case you did not hear, things heated up at the recent Flash Memory Summit (FMS) in Santa Clara, where a small fire in one of the exhibit booths prevented the expo hall from opening during the event.

Microsoft Azure Software Defined Data Infrastructure Reference Resources
Need to learn more about Microsoft Azure Cloud Software Defined Data Infrastructure topics including reference architecture among other resources for various application workloads? Microsoft Azure has an architecture and resources page (here) that includes various application workload reference tools.

Chelsio Storage over IP and other Networks Enable Data Infrastructures
Chelsio and Storage over IP (SoIP) continue to enable Data Infrastructures from legacy to software defined virtual, container, cloud as well as converged.

Announcing Software Defined Data Infrastructure Essentials Book by Greg Schulz
Software Defined Data Infrastructure (SDDI) Essentials is now generally available at various global venues in hardcopy, hardback print as well as various electronic versions including via Amazon and CRC Press among others.

Server StorageIO Industry Trends Perspectives Report WekaIO Matrix
Like Data They Protect For Now Quantum Revenues Continue To Grow
Travel Fun Crossword Puzzle For VMworld 2017 Las Vegas
Hot Popular New Trending Data Infrastructure Vendors To Watch

Server StorageIO Data Infrastructure Tips and Articles

Recent Server StorageIO industry trends perspectives commentary in the news.

Via NetworkWorld: Do you have an IT trade craft skills gap?

View more Server, Storage and I/O trends and perspectives comments here

Events and Activities

Recent and upcoming event activities.

Sep. 21, 2017 – MSP CMG – Minneapolis MN
Sep. 20, 2017 – Redmond Data Protection and Backup – Webinar
Sep. 14, 2017 – Fujifilm IT Executive Summit – Seattle WA
Sep. 12, 2017 – SNIA Software Developers Conference (SDC) – Santa Clara CA
Sep. 7, 2017 – WiPro – Planning Your Software Defined Journey – New York City
Aug. 29, 2017 – VMworld – Las Vegas

See more webinars and activities on the Server StorageIO Events page here.

Server StorageIO Industry Resources and Links

Useful links and pages:
Microsoft TechNet – Various Microsoft related from Azure to Docker to Windows
storageio.com/links – Various industry links (over 1,000 with more to be added soon)
objectstoragecenter.com – Cloud and object storage topics, tips and news items
OpenStack.org – Various OpenStack related items
storageio.com/protect – Various data protection items and topics
thenvmeplace.com – Focus on NVMe trends and technologies
thessdplace.com – NVM and Solid State Disk topics, tips and techniques
storageio.com/converge – Various CI, HCI and related SDS topics
storageio.com/performance – Various server, storage and I/O benchmark and tools
VMware Technical Network – Various VMware related items

Ok, nuff said, for now.

Cheers
Gs

Greg Schulz – Multi-year Microsoft MVP Cloud and Data Center Management, VMware vExpert (and vSAN). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio.

August 23, 2017; April 27, 2025

Announcing Software Defined Data Infrastructure Essentials Book by Greg Schulz

New SDDI Essentials Book by Greg Schulz of Server StorageIO

Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft

server storage I/O data infrastructure trends

Update 1/21/2018
Over the past several months I have been posting, commenting, presenting and discussing more about Software Defined Data Infrastructure Essentials aka SDDI or SDDC and SDI. Now it is time to announce my new book (my 4th solo project), Software Defined Data Infrastructure Essentials Book (CRC Press). Software Defined Data Infrastructure Essentials is now generally available at various global venues in hardcopy, hardback print as well as various electronic versions including via Amazon and CRC Press among others. For those attending VMworld 2017 in Las Vegas, I will be doing a book signing, meet and greet at 1PM Tuesday August 29 in the VMworld book store, as well as presenting at various other fall industry events.

Software Defined Data Infrastructure Essentials Book Announcement

(Via Businesswire) Stillwater, Minnesota – August 23, 2017 – Server StorageIO, a leading independent IT industry advisory and consultancy firm, in conjunction with publisher CRC Press, a Taylor and Francis imprint, announced the release and general availability of “Software-Defined Data Infrastructure Essentials,” a new book by Greg Schulz, noted author and Server StorageIO founder.

The Software Defined Data Infrastructure Essentials book covers physical, cloud, converged (and hyper-converged), container, and virtual server storage I/O networking technologies, revealing trends, tools, techniques, and tradecraft skills.

Various IT and Cloud Infrastructure Layers including Data Infrastructures

From cloud web scale to enterprise and small environments, IoT to database, software-defined data center (SDDC) to converged and container servers, flash solid state devices (SSD) to storage and I/O networking, the book helps develop or refine hardware, software, services and management experiences, providing real-world examples for those involved with or looking to expand their data infrastructure education knowledge and tradecraft skills.

Software Defined Data Infrastructure Essentials book topics include:

Cloud, Converged, Container, and Virtual Server Storage I/O networking
Data protection (archive, availability, backup, BC/DR, snapshot, security)
Block, file, object, structured, unstructured and data value
Analytics, monitoring, reporting, and management metrics
Industry trends, tools, techniques, decision making
Local, remote server, storage and network I/O troubleshooting
Performance, availability, capacity and economics (PACE)

Where To Purchase Your Copy

Order via Amazon.com and CRC Press along with Google Books among other global venues.

What People Are Saying About Software Defined Data Infrastructure Essentials Book

“From CIOs to operations, sales to engineering, this book is a comprehensive reference, a must-read for IT infrastructure professionals, beginners to seasoned experts,” said Tom Becchetti, advisory systems engineer.

“We had a front row seat watching Greg present live in our education workshop seminar sessions for ITC professionals in the Netherlands material that is in this book. We recommend this amazing book to expand your converged and data infrastructure knowledge from beginners to industry veterans.”

Gert and Frank Brouwer – Brouwer Storage Consultancy

“Software-Defined Data Infrastructures provides the foundational building blocks to improve your craft in several areas including applications, clouds, legacy, and more. IT professionals, as well as sales professionals and support personnel, stand to gain a great deal by reading this book.”

Mark McSherry – Oracle Regional Sales Manager

“Greg Schulz has provided a complete ‘toolkit’ for storage management along with the background and framework for the storage or data infrastructure professional (or those aspiring to become one).”
Greg Brunton – Experienced Storage and Data Management Professional

“Software-defined data infrastructures are where hardware, software, server, storage, I/O networking and related services converge inside data centers or clouds to protect, preserve, secure and serve applications and data,” said Schulz. “Both readers who are new to data infrastructures and seasoned pros will find this indispensable for gaining and expanding their knowledge.”

More About Software Defined Data Infrastructure Essentials
Software Defined Data Infrastructures (SDDI) Essentials provides fundamental coverage of physical, cloud, converged, and virtual server storage I/O networking technologies, trends, tools, techniques, and tradecraft skills. From webscale, software-defined, containers, database, key-value store, cloud, and enterprise to small or medium-size business, the book is filled with techniques, and tips to help develop or refine your server storage I/O hardware, software, Software Defined Data Centers (SDDC), Software Data Infrastructures (SDI) or Software Defined Anything (SDx) and services skills. Whether you are new to data infrastructures or a seasoned pro, you will find this comprehensive reference indispensable for gaining as well as expanding experience with technologies, tools, techniques, and trends.

This book is the definitive source providing comprehensive coverage about IT and cloud Data Infrastructures for experienced industry experts to beginners. Coverage of topics spans from higher level applications down to components (hardware, software, networks, and services) that get defined to create data infrastructures that support business, web, and other information services. This includes Servers, Storage, I/O Networks, Hardware, Software, Management Tools, Physical, Software Defined Virtual, Cloud, Containers (Docker and others) as well as Bulk, Block, File, Object, Cloud, Virtual and software defined storage.

Additional topics include Data protection (Availability, Archiving, Resiliency, HA, BC, BR, DR, Backup), Performance and Capacity Planning, Converged Infrastructure (CI), Hyper-Converged, NVM and NVMe Flash SSD, Storage Class Memory (SCM), NVMe over Fabrics, Benchmarking (including metrics that matter along with tools) and much more, including who's doing what, how things work, what to use when, where, why along with current and emerging trends.

Book Features

ISBN-13: 978-1498738156
ISBN-10: 149873815X
Hardcover: 672 pages
(Available in Kindle and other electronic formats)
Over 200 illustrations and 70 plus tables
Frequently asked Questions (and answers) along with many tips
Various learning exercises, extensive glossary and appendices
Publisher: Auerbach/CRC Press Publications; 1 edition (June 19, 2017)
Language: English

SDDI and SDDC toolbox

Where To Learn More

Learn more about related technology, trends, tools, techniques, and tips with the following links.

Whats a data infrastructure? (Via NetworkWorld)
Software Defined Data Infrastructure Essentials (CRC Press) book companion page (includes various images, sample figures and added content)
Click here to view (PDF) table of contents (TOC)
Click here to view (PDF) Preface, who should read, how organized and related material
Search and see whats inside Software Defined Data Infrastructure Essentials using this link to Google Books
Do you have an IT trade craft skills gap? (Via NetworkWorld)
Ensure your data infrastructure remains available and resilient (Via NetworkWorld)
Data Infrastructure Primer and Overview (Its Whats Inside The Data Center)
Data Infrastructure Server Storage I/O Tradecraft Trends
Data Infrastructure Server Storage I/O related Tradecraft Overview
Some server storage I/O benchmark workload scripts (Part I)
Server Storage I/O Benchmarking Performance Resource Tools
Server Storage I/O Converged (CI) Hyper-converged (HCI) overview
Data Infrastructure industry links page – Various Data Infrastructure related links
Welcome to the Data Protection Diaries – Data protection related topics
Time to restore your backups, do you know where is your data is?
Object and Cloud Storage Center (www.objectstoragecenter.com)
www.thessdplace.com – NVM, flash, SSD, SCM and related topics
www.thenvmeplace.com – NVM Express (NVMe) related topics
Cloud and Virtual Data Storage Networking (CRC Press) “Intel Recommended Reading” book companion page
The Green and Virtual Data Center (CRC Press) “Intel Recommended Reading” book companion page
Resilient Storage Networks – Designing Flexible Scalable Data Infrastructures (Elsevier) book companion page
Other books, articles, tips, blog posts, news, events, webinars, videos by Greg Schulz and Server StorageIO (portfolio page)

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

What This All Means

Data Infrastructures exist to protect, preserve, secure and serve information along with the applications and data they depend on. With more data being created at a faster rate, along with the size of data becoming larger, increased application functionality to transform data into information means more demands on data infrastructures and their underlying resources.

Software-Defined Data Infrastructure Essentials: Cloud, Converged, and Virtual Fundamental Server Storage I/O Tradecraft is for people who are currently involved with or looking to expand their knowledge and tradecraft skills (experience) of data infrastructures. Software-defined data centers (SDDC), software data infrastructures (SDI), software-defined data infrastructure (SDDI) and traditional data infrastructures are made up of software, hardware, services, and best practices and tools spanning servers, I/O networking, and storage from physical to software-defined virtual, container, and clouds. The role of data infrastructures is to enable and support information technology (IT) and organizational information applications.

Everything is not the same in business, organizations, IT, and in particular servers, storage, and I/O. This means that there are different audiences who will benefit from reading this book. Because everything and everybody is not the same when it comes to server and storage I/O along with associated IT environments and applications, different readers may want to focus on various sections or chapters of this book.

If you are looking to expand your knowledge into an adjacent area or to understand what's under the hood, from converged, hyper-converged to traditional data infrastructures topics, this book is for you. For experienced storage, server, and networking professionals, this book connects the dots as well as provides coverage of virtualization, cloud, and other convergence themes and topics.

This book is also for those who are new or need to learn more about data infrastructure, server, storage, I/O networking, hardware, software, and services. Another audience for this book is experienced IT professionals who are now responsible for or working with data infrastructure components, technologies, tools, and techniques.

Learn more here about Software Defined Data Infrastructure (SDDI) Essentials book along with cloud, converged, and virtual fundamental server storage I/O tradecraft topics, order your copy from Amazon.com or CRC Press here, and thank you in advance for learning more about SDDI and related topics.

Ok, nuff said, for now.

Access Availability RAID Erasure Codes including LRC Deep Dive

Data Protection Access Availability RAID Erasure Codes

Erasure Codes (EC)

Reed Solomon (RS) codes

Local Reconstruction Codes (LRC)

Shingled Erasure Code (SHEC)

Replication and Mirroring

Where To Learn More

What This All Means

Enabling Recovery Points (Backup, Snapshots, Versions)

Enabling RPO (Archive, Backup, CDP, PIT Copy, Snapshots, Versions)

Additional Data Protection Terms

Where To Learn More

What This All Means

Data Protection Diaries Fundamental Point In Time Granularity

Point-in-Time Protection Granularity Points of Interest

Where To Learn More

What This All Means

Data Infrastructure Data Protection Security Logical Physical

Security Logical Physical Software Defined

Where To Learn More

What This All Means

Fundamental Tools, Technologies, Toolbox, Buzzword Bingo Trends

Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends

Object Storage

S3 Simple Storage Service

Data Infrastructure Environments and Applications

Data Footprint Reduction (DFR) Including Dedupe

Tips, Recommendations and Considerations

Where To Learn More

What This All Means

Data Protection Diaries Walking The Data Protection Talk

Walking The Data Protection Talk What I Do

Where To Learn More

What This All Means

Data Protection Toolbox Whos Doing What Technology Tools

Who’s Doing What (Toolbox Technology Tools)

Where To Learn More

What This All Means

Data Protection Diaries Fundamental Resources Where to Learn More

Software Defined Data Infrastructure Essentials Table of Contents (TOC)

Data Protection Resources Where To Learn More

What This All Means

AWS Announces New S3 Cloud Storage Security Encryption Features

Default Encryption

Permission Checks

Cross-Region Replication ACL Overwrite and KMS

Detailed Inventory Report

PrivateLink for AWS Services

Where To Learn More

What This All Means

Introducing Windows Subsystem for Linux WSL Overview #blogtober

Installing WSL

Enable Windows Subsystem for Linux Feature on Windows

Using WSL

Where To Learn More

What This All Means

Cloud Conversations AWS Azure Service Maps via Microsoft

Whats In the Service Map

Where To Learn More

What This All Means

Dell EMC VMware September 2017 Software Defined Data Infrastructure Updates

VMware and AWS

Pivotal Container Service (PKS) and Google Kubernetes Partnership

Other VMworld and September VMware related announcements

Dell EMC Announcements

Software Defined Data Infrastructure Essentials at VMworld Bookstore

Where To Learn More

What This All Means