object Archives

July 23, 2018November 26, 2023

2018 Hot Popular New Trending Data Infrastructure Vendors to Watch

Here is the 2018 Hot Popular New Trending Data Infrastructure Vendors To Watch which includes startups as well as established vendors doing new things. This piece follows last year’s hot favorite trending data infrastructure vendors to watch list (here), as well as who will be top of storage world in a decade piece here.

Data Infrastructures Support Information Systems Applications and Their Data

Data Infrastructures are what exists inside physical data centers and cloud availability zones (AZ) that are defined to provide traditional, as well as cloud services. Cloud and legacy data infrastructures are combined by hardware (server, storage, I/O network), software along with management tools, policies, tradecraft techniques (skills), best practices to support applications and their data. There are different types of data infrastructures to meet the needs of various environments that range in size, scope, focus, application workloads, along with Performance and capacity.

Another important aspect of data infrastructures is that they exist to protect, preserve, secure and serve applications that transform data into information. This means that availability and Data Protection including archive, backup, business continuance (BC), business resiliency (BR), disaster recovery (DR), privacy and security among other related topics, technology, techniques, and trends are essential data infrastructure topics.

Different timelines of adoption and deployment for various audiences

2018 Hot Popular New Trending Data Infrastructure Vendors to Watch

Some of those on this year’s list are focused on different technology areas, while others on size or types of vendors, suppliers, service providers. Others on the list are focused on who is new, startup, evolving, or established which varies from if you are an industry insider or IT customer environment. Meanwhile others new and some are established doing new things, mix of some you may not have heard of for those who want or need to have the most current list to rattle off startups for industry adoption (and deployment), as well as what some established players are doing that might lead to customer deployment (and adoption).

AMD – The AMD EPYC family of processors is opening up new opportunities for AMD to challenge Intel among others for a more significant share of the general-purpose compute market in support of data center and data infrastructure markets. An advantage that AMD has and is playing to in the industry speeds feeds, slots and watts price performance game is the ability to support more memory and PCIe lanes per socket than others including Intel. Keep in mind that PCIe lanes will become even more critical as NVMe deployment increases, as well as the use of GPU’s and faster Ethernet among other devices. Name brand vendors including Dell and HPE among others have announced or are shipping AMD EPYC based processors.

Aperion – Cloud and managed service provider with diverse capabilities.

Amazon Web Services (AWS) – Continues to expand its footprint regarding regions, availability zones (AZ) also known as data centers in regions, as well as some services along with the breadth of those capabilities. AWS has recently announced a new Snowball Edge (SBE) which in the past has been a data migration appliance now enhanced with on-prem Elastic Cloud Compute (EC2) capabilities. What this means is that AWS can put on-prem compute capabilities as part of a storage appliance for short-term data movement, migration, conversion, importing of virtual machines and other items.

On the other hand, AWS can also be seen as using SBE as a first entry to placing equipment on-prem for hybrid clouds, or, converged infrastructure (CI), hyper-converged infrastructure (HCI), cloud in a box similar to Microsoft Azure Stack, as well as CI/HCI solutions from others.

My prediction near term, however, is that CI/HCI vendors will either ignore SBE, downplay it, create some new marketing on why it is not CI/HCI or fud about vendor lock-in. In other words, make some popcorn and sit back, watch the show.

Backblaze – Low-cost, high-capacity cloud storage for backup and archiving provider known for their quarterly disk drive reliability ratings (or failure) reports. They have been around for a while, have a good reputation among those who use their services for being a low-cost alternative to the larger providers.

Barefoot networks – Some of you may already be aware of or following Barefoot Networks, while others may not have heard of them outside of the networking space. They have some impressive capabilities, are new, you probably have not heard of them, thus an excellent addition to this list.

Cloudian – Continue to evolve and no longer just another object storage solution, Cloudian has been expanding via organic technology development, as well as acquisitions giving them a broad portfolio of software-defined storage and tiering from on-prem to the cloud, block, file and object access.

Cloudflare – Not exactly a startup, some of you may know or are using Cloudflare, while to others, their role as a web cache, DNS, and other service is transparent. I have been using Cloudflare on my various sites for over a year, and like the security, DNS, cache and analytics tools they provide as a customer.

Cobalt Iron – For some, they might be new, Software-defined Data protection and management is the name of the game over at Cobalt Iron which has been around a few years under the radar compared to more popular players. If you have or are involved with IBM Tivoli aka TSM based backup and data protection among others, check out the exciting capabilities that Cobalt can bring to the table.

CTERA – Having been around for a while, to some they might not be a startup, on the other hand, they may be new to others while offering new data and file management options to others.

DataCore – You might know of DataCore for their software-defined storage and past storage hypervisor activity. However, they have a new piece of software MaxParallel that boost server storage I/O performance. The software installs on your Windows Server instance (bare metal, VM, or cloud instance) and shows you performance with and without acceleration which you can dynamically turn off and off.

DataDirect Networks (DDN) – Recently acquired Lustre assets from Intel, now picking up the storage startup Tintri pieces after it ceased operations. What this means is that while beefing up their traditional High-Performance Compute (HPC) and Super Compute (SC) focus, DDN is also expanding into broader markets.

Dell Technologies – At its recent Dell Technology World event in Las Vegas during late April, early May 2018, several announcements were made, including some tied to emerging Gen-Z along with composability. More recently, Dell Technologies along with VMware announced business structure and finance changes. Changes include VMware declaring a dividend, Dell Technologies being its largest shareholder will use proceeds to fund restricting and debt service. Read more about VMware and Dell Technology business and financial changes here.

Densify – With a name like Densify no surprise they propose to drive densification and automation with AI-powered deep learning to optimize application resource use across on-prem software-defined virtual as well as cloud instances and containers.

FlureDB – If you are into databases (SQL or NoSQL), as well as Blockchain or distributed ledgers, check out FlureDB.

Innovium.com – When it comes to data infrastructure and data center networking, Innovium is probably not on your radar, however, keep an eye on these folks and their TERALYNX switching silicon to see where it ends up given their performance claims.

Komprise – File, and data management solutions including tiering along with partners such as IBM.

Kubernetes – A few years ago OpenStack, then Docker containers was the favorite and trending discussion topic, then Mesos and along comes Kubernetes. It’s safe to say, at least for now, Kubernetes is settling in as a preferred open source industry and customer defecto choice (I want to say standard, however, will hold off on that for now) for container and related orchestration management. Besides, do it yourself (DiY) leveraging open source, there are also managed AWS Elastic Kubernetes Service (EKS), Azure Kubernetes Services (AKS), Google Kubernetes Engine (GKE), and VMware Pivotal Container Service (PKS) among others. Besides Azure, Microsoft also includes Kubernetes support (along with Docker and Windows containers) as part of Windows Servers.

ManageEngine (part of Zoho) – Has data infrastructure monitoring technology called OpManager for keeping an eye on networking.

Marvel – Marvel may not be a familiar name (don’t confuse with comics), however, has been a critical component supplier to partners whose server or storage technology you may be familiar with or have yourself. Server, Storage, I/O Networking chip maker has closed on its acquisition of Cavium (who previously bought Qlogic among others). The combined company is well positioned as a key data infrastructure component supplier to various partners spanning servers, storage, I/O networking including Fibre Channel (FC), Ethernet, InfiniBand, NVMe (and NVMeoF) among others.

Mellanox – Known for their InfiniBand adapters, switches, and associated software, along with growing presence in RDMA over Converged Ethernet (RoCE), they are also well positioned for NVMe over Fabrics among other growth opportunities following recent boardroom updates, along with technology roadmap’s.

Microsoft – Azure public cloud continues to evolve similarly to AWS with more region locations, availability zone (AZ) data centers, as well as features and extensions. Microsoft also introduced about a year ago its hybrid on-prem CI/HCI cloud in a box platform appliance Azure Stack (read about my test drive here). However, there is more to Microsoft than just their current cloud first focus which means Windows (desktop), as well as Server, are also evolving. Currently, in public preview, Windows Server 2019 insiders build available to try out many new capabilities, some of which were covered in the recent free Microsoft Virtual Summit held in June. Key themes of Windows Server 2019 include security, performance, hybrid cloud, containers, software-defined storage and much more.

Microsemi – Has been around for a while is the combination of some vendors you may not have heard of or heard about in some time including PMC-Sierra (acquired Adaptec) and Vitesse among others. The reason I have Microsemi on this list is a combination of their acquisitions which might be an indicator of whom they pick up next. Another reason is that their components span data infrastructure topics from servers, storage, I/O and networking, PCIe and many more.

NVIDIA – GPU high performance compute and related compute offload technologies have been accessible for over a decade. More recently with new graphics and computational demands, GPU such as those NVIDIA are in need. Demand includes traditional graphics acceleration for physical and virtual, augmented and virtual reality, as well as cloud, along with compute-intensive analytics, AI, ML, DL along with other cognitive workloads.

NGDSystems (NGD) – Similar to what NVIDIA and other GPU vendors do for enabling compute offload for specific applications and workloads, NGD is working on a variation. That variation is to move offload compute capabilities for the server I/O storage-intensive workloads closer, in fact into storage system components such as SSDs and emerging SCMs and PMEMs. Unlike GPU based applications or workloads that tend to be more memory and compute intensive, NGD is positioned for applications that are the server I/O and storage intensive.

The premise of NGD is that they move the compute and application closer to where the data is, eliminating extra I/O, as well as reducing the amount of main server memory and compute cycles. If you are familiar with other server storage I/O offload engines and systems such as Oracle Exadata database appliance NGD is working at a tighter integration granularity. How it works is your application gets ported to run on the NGD storage platform which is SSD based and having a general-purpose processor. Your application is initiated from a host server, where it then runs on the NGD meaning I/Os are kept local to the storage system. Keep in mind that the best I/O is the one that you do not have to do, the second best is the one with the least resource or user impact.

Opvisor – Performance activity and capacity monitoring tools including for VMware environments.

Pavillon – Startup with an interesting NVMe based hardware appliance.

Quest – Having gained their independence as a free-standing company since divestiture from Dell Technologies (Dell had previously acquired Quest before EMC acquisition), Quest continues to make their data infrastructure related management tools available. Besides now being a standalone company again, keep an eye on Quest to see how they evolve their existing data protection and data infrastructure resource management tools portfolio via growth, acquisition, or, perhaps Quest will be on somebody else’s future growth list.

Retrospect – Far from being a startup, after gaining their independence from when EMC bought them several years ago, they have since continued to enhance their data protection technology. Disclosure, I have been a Retrospect customer since 2001 using it for on-site, as well as cloud data protection backups to the cloud.

Rubrik – Becoming more of a data infrastructure household name given their expanding technology portfolio and marketing efforts. More commonly known in smaller customer environments, as well as broadly within industry insider circles, Rubrik has potential with continued technology evolution to move further upmarket similar to how Commvault did back in the late 90s, just saying.

SkyScale – Cloud service provider that offers dedicated bare metal, as well as private, hybrid cloud instances along with GPU to support AI, ML, DL and other high performance, compute workloads.

Snowflake – The name does not describe well what they do or who they are. However, they have an interesting cloud data warehouse (old school) large-scale data lakes (new school) technologies.

Strongbox – Not to be confused with technology such as those from Iosafe (e.g., waterproof, fireproof), Strongbox is a data protection storage solution for storing archives, backups, BC/BR/DR data, as well as cloud tiering. For those who are into buzzword bingo, think cloud tiering, object, cold storage among others. The technology evolved out of Crossroads and with David Cerf at the helm has branched out into a private company with keeping an eye on.

Storbyte – With longtime industry insider sales and marketing pro-Diamond Lauffin (formerly Nexsan) involved as Chief Evangelist, this is worth keeping an eye on and could be entertaining as well as exciting. In some ways it could be seen as a bit of Nexsan meets NVme meets NAND Flash meets cost-effective value storage dejavu play.

Talon – Enterprise storage and management solutions for file sharing across organizations, ROBO and cloud environments.

Ubitqui – Also known as UBNT is a data infrastructure networking vendor whose technologies span from WiFi access points (AP), high-performance antennas, routing, switching and related hardware, along with software solutions. UBNT is not as well-known in more larger environments as a Cisco or others. However, they are making a name for themselves moving from the edge to the core. That is, working from the edge with AP and routers, firewalls, gateways for the SMB, ROBO, SOHO as well as consumer (I have several of their APs, switches, routers and high-performance antennas along with management software), these technologies are also finding their way into larger environments.

My first use of UBNT was several years ago when I needed to get an IP network connection to a remote building separated by several hundred yards of forest. The solution I found was to get a pair of UBNT NANO Apps, put them in secure bridge mode; now I have a high-performance WiFi service through a forest of trees. Since then have replaced an older Cisco router, several Cisco, and other APs, as well as the phased migration of switches.

UpdraftPlus– If you have a WordPress web or blog site, you should also have a UpdraftPlus plugin (go premium btw) for data protection. I have been using Updraft for several years on my various sites to backup and protect the MySQL databases and all other content. For those of you who are familiar with Spanning (e.g., was acquired by EMC then divested by Dell) and what they do for cloud applications, UpdraftPlus does similar for lower-end, smaller cloud-based applications.

Vexata – Startup scale out NVMe storage solution.

VMware – Expanding their cloud foundation from on-prem to in and on clouds including AWS among others. Data Infrastructure focus continues to expand from core to edge, server, storage, I/O, networking. With recent Dell Technologies and VMware declaring a dividend, should be interesting to see what lies ahead for both entities.

What About Those Not Mentioned?

By the way, if you were wondering about or why others are not in the above list, simple, check out last year’s list which includes Apcera, Blue Medora, Broadcom, Chelsio, Commvault, Compuverde, Datadog, Datrium, Docker, E8 Storage, Elastifile, Enmotus, Everspin, Excelero, Hedvig, Huawei, Intel, Kubernetes, Liqid, Maxta, Micron, Minio, NetApp, Neuvector, Noobaa, NVIDA, Pivot3, Pluribus Networks, Portwork, Rozo Systems, ScaleMP, Storpool, Stratoscale, SUSE Technology, Tidalscale, Turbonomic, Ubuntu, Veeam, Virtuozzo and WekaIO. Note that many of the above have expanded their capabilities in the past year and remain, or have become even more interesting to watch, while some might be on the future where are they now list sometime down the road. View additional vendors and service providers via our industry links and resources page here.

What About New, Emerging, Trending and Trendy Technologies

Bitcoin and Blockchain storage startups, some of which claim or would like to replace cloud storage taking on giants such as AWS S3 in the not so distant future have been popping up lately. Some of these have good and exciting stories if they can deliver on the hype along with the premise. A couple of names to drop include among others Filecoin, Maidsafe, Sia, Storj along with services from AWS, Azure, Google and a long list of others.

Besides Blockchain distributed ledgers, other technologies and trends to keep an eye on include compute processes from ARM to SoC, GPU, FPGA, ASIC for offload and specialized processing. GPU, ASIC, and FPGA are appearing in new deployments across cloud providers as they look to offload processing from their general servers to derive total effective productivity out of them. In other words, innovating by offloading to boost their effective return on investment (old ROI), as well as increase their return on innovation (the new ROI).

Other data infrastructure server I/O which also ties into storage and network trends to watch include Gen-Z that some may claim as the successor to PCIe, Ethernet, InfiniBand among others (hint, get ready for a new round of “something is dead” hype). Near-term the objective of Gen-Z is to coexist, complement PCIe, Ethernet, CPU to memory interconnect, while enabling more granular allocation of data infrastructure resources (e.g., composability). Besides watching who is part of the Gen-Z movement, keep an eye on who is not part of it yet, specifically Intel.

NVMe and its many variations from a server internal to networked NVMe over Fabrics (NVMeoF) along with its derivatives continue to gain both industry adoption, as well as customer deployment. There are some early NVMeoF based server storage deployments (along with marketing dollars). However, the server side NVMe customer adoption is where the dollars are moving to the vendors. In other words, it’s still early in the bigger broader NVMe and NVMeoF game.

Where to learn more

Learn more about data infrastructures and related topics via the following links:

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

What this all means

Let’s see how those mentioned last year as well as this year, along with some new and emerging vendors, service providers who did not get said end up next year, as well as the years after that.

Different timelines of adoption and deployment for various audiences

Keep in mind that there is a difference between industry adoption and customer deployment, granted they are related. Likewise let’s see who will be at the top in three, five and ten years, which means some of the current top or favorite vendors may or may not be on the list, same with some of the established vendors. Meanwhile, check out the 2018 Hot Popular New Trending Data Infrastructure Vendors to Watch.

Ok, nuff said, for now.

Cheers Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2018. Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

July 2, 2018November 26, 2023

Dell Technologies Announces Class V VMware Tracking Stock exchange for stock or cash

Dell Technologies Announces Class V VMware Tracking Stock exchange for stock or cash.

Image via Dell Technologies

Summary of Dell transaction announcement includes:

VMware declares an $11 Billion USD cash dividend pro rata to all VMware stock holders.
Given ownership percentage of VMware, Dell Technologies will receive approximately $9 Billion USD cash dividend.
Dell plans to list its Class C common stock shares on the New York Stock Exchange (NYSE).
Dell plans to use the VMware dividend proceeds to fund cash consideration to be paid to Class V (tracking stock) shareholders.
For each Class V share (e.g. VMware tracking stock) shareholders can choose to receive:
1.3665 shares of Dell Technologies Class C common stock, or
$109 in cash per DVMT (Class V share) a 29% premium per share

Image via Dell Technologies

Additional interest points of this transaction include:

Transaction expected to close Q4 CY2018, subject to Class V shareholder approval.
VMware maintains its independence as a separate publicly traded company.
Dell Technologies maintains its 81% ownership of VMware common stock
Dell Technologies Class V (DVMT) shareholders will own 20.8% to 31.0% of Dell Class C (depending on cash election amounts).
Streamline Dell capital and ownership structure.
Establishes a public security (stock) in global end to end data infrastructure provider (e.g. Dell Technologies Stock on NYSE).
Enables financial flexibility for future strategic initiatives

Image via Dell Technologies

Michael Dell and Silver Lake Continued Ownership

As part of this transaction, both Michael Dell and Silver Lake partners announce commitment to Dell Technologies. Michael Dell will continue to serve as Chairman and CEO, along with a committed stockholder beneficially owning between about 47% to 54% of Dell Technologies on a fully diluted basis. Silver Lake equity partners, an investor in Dell will continue its long-term partnership with Michael Dell beneficially owning between about 16%-18% of Dell Technologies on a fully diluted basis.

Where to learn more

Learn more about Dell Technologies, VMware, Data Infrastructures and related topics via the following links:

Dell Investors Events Page
Announcement of VMware Dividend, Dell Class V, and Class C stock news
Dell announcement conference call presentation ( PDF)
VMware VMworld
Via SearchStorage: Dell EMC storage IPO, VMware merger plans still unclear
Via SearchStorage: Dell EMC storage strategy talk buzzes Dell Tech World
Via SearchStorage: How a Dell and VMware merger could affect EMC storage
May 2018 Server StorageIO Data Infrastructure Update Newsletter
Dell Technology World 2018 Announcement Summary
VMware vSphere vSAN vCenter version 6.7 SDDC Update Summary
Application Data Value Characteristics Everything Is Not The Same (Part I)
Data Infrastructure Primer Overview (Its Whats Inside The Data Center)
If NVMe is the answer, what are the questions?
NVMe Primer (or refresh), The NVMe Place, and The SSD Place
Server Storage I/O Benchmark Performance Resource Tools
Data Infrastructure server storage I/O network Recommended Reading

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

What this all means

This announcement enables Dell to streamline its financial structure, while providing VMware shareholder with a dividend value. In addition, this Dell Technologies announcement puts to rest industry discussions of what will Michael Dell along with Dell Technologies and VMware do in the future. Speaking of the future, this transaction could also page the wave for future investment or acquisitions by Dell and/or VMware. Now the question is if you are a DVMT tracking stock shareholder, do you take the $109 USD cash, or, new Class C Dell Technologies stock? Now lets see how Dell Technologies Announces Class V VMware Tracking Stock exchange for stock or cash plays out during the rest of summer and into the fall.

Ok, nuff said, for now.

Cheers Gs

June 30, 2018November 26, 2023

June 2018 Server StorageIO Data Infrastructure Update Newsletter

Volume 18, Issue 6 (June 2018)

Hello and welcome to the June 2018 Server StorageIO Data Infrastructure Update Newsletter.

In cased you missed it, the May 2018 Server StorageIO Data Infrastructure Update Newsletter can be viewed here (HTML and PDF).

In this issue buzzwords topics include AI, All Flash, HPC, Lustre, Multi Cloud, NVMe, NVMeoF, SAS, and SSD among others:

Data Infrastructure Industry Activity
News Commentary and Tips
Server StorageIOblog posts
Recommended Reading
Various Events and Webinars
Industry Resources and Links

Enjoy this edition of the Server StorageIO Data Infrastructure update newsletter.

Cheers GS

Data Infrastructure and IT Industry Activity Trends

June data infrastructure, server, storage, I/O network, hardware, software, cloud, converged, and container as well as data protection industry activity includes among others:

Check out what’s new at Amazon Web Services (AWS) here, as well as Microsoft Azure here, Google Cloud Compute here, IBM Softlayer here, and OVH here. CTERA announced new cloud storage gateways (HC Series) for enterprise environments that include all flash SSD options, capacity up to 96TB (raw), Petabyte scale tiering to public and private cloud, 10 Gbe Ethernet connectivity, virtual machine deployment, along with high availability configuration.

Cray announced enhancements to its Lustre (parallel file system) based ClusterStor storage system for high performance compute (HPC) along with it previously acquired from Seagate (Who had acquired it as part of the Xyratex acquisition). New enhancements for ClusterStor include all flash SSD solution that will integrate and work with our existing hard disk drive (HDD) based systems.

In related Lustre based activity, DataDirect Network (DDN) has acquired from Intel, their Lustre File system capability. Intel acquired its Lustre capabilities via its purchase of Whamcloud back in 2012, and in 2017 announced that it was getting out of the Lustre business (here and here). DDN also announced new storage solutions for enabling HPC environment workloads along with Artificial Intelligence (AI) centric applications.

HPE which held its Discover event announced a $4 Billion USD investment over four years pertaining to development of edge technologies and services including software defined WAN (SD-WAN) and security among others.

Microsoft held its first virtual Windows Server Summit in June that outlined current and future plans for the operating system along with its hybrid cloud future.

Penguin computing has announced the Accelion solution for accessing geographically dispersed data enabling faster file transfer or other data movement functions.

SwiftStack has added multi cloud features (enhanced search, universal access, policy management, automation, data migration) and making them available via 1space open source project. 1space enables a single object namespace across different object storage locations including integration with OpenStack Swift.

Vexata announced a new version of its Vexata operating system (VX-OS) for its storage solution including NVMe over Fabric (NVMe-oF) support.

Speaking of NVMe and fabrics, the Fibre Channel Industry Association (FCIA) announced that the International Committee on Information Technology Standards (INCITS) has published T11 technical committee developed Fibre Channel over NVMe (FC-NVMe) standard.

Various NVMe front-end including NVMeoF along with NVMe back-end devices (U.2, M.2, AiC)

Keep in mind that there are many different facets to NVMe including direct attached (M.2, U.2/8639, PCIe AiC) along with fabrics. Likewise, there are various fabric options for the NVMe protocol including over Fibre Channel (FC-NVMe), along with other NVMe over Fabrics including RDMA over Converged Ethernet (RoCE) as well as IP based among others. NVMe can be used as a front-end on storage systems supporting server attachment (e.g. competes with Fibre Channel, iSCSI, SAS among others).

Another variation of NVMe is as a back-end for attachment of drives or other NVMe based devices in storage systems, as well as servers. There is also end to end NVMe (e.g. both front-end and back-end) options. Keep context in mind when you hear or talk about NVMe and in particular, NVMe over fabrics, learn more about NVMe at https://storageioblog.com/nvme-place-volatile-memory-express/.

Toshiba announced new RM5 series of high capacity SAS SSDs for replacement of SATA devices in servers. The RM5 series being added to the Toshiba portfolio combine capacity and economics traditional associated with SATA SSDs along with performance as well as connectivity of SAS.

Check out other industry news, comments, trends perspectives here.

Server StorageIO Commentary in the news, tips and articles

Recent Server StorageIO industry trends perspectives commentary in the news.

Via SearchStorage: Comments The storage administrator skills you need to keep up today
Via SearchStorage: Comments Managing storage for IoT data at the enterprise edge
Via SearchCloudComputing: Comments Hybrid cloud deployment demands a change in security mindset

View more Server, Storage and I/O trends and perspectives comments here.

Server StorageIOblog Data Infrastructure Posts

Recent and popular Server StorageIOblog posts include:

Announcing Windows Server Summit Virtual Online Event
May 2018 Server StorageIO Data Infrastructure Update Newsletter
Solving Application Server Storage I/O Performance Bottlenecks Webinar
Have you heard about the new CLOUD Act data regulation?
Data Protection Recovery Life Post World Backup Day Pre GDPR
Microsoft Windows Server 2019 Insiders Preview
Which Enterprise HDD for Content Server Platform
Server Storage I/O Benchmark Performance Resource Tools
Introducing Windows Subsystem for Linux WSL Overview
Data Infrastructure Primer Overview (Its Whats Inside The Data Center)
If NVMe is the answer, what are the questions?

View other recent as well as past StorageIOblog posts here

Server StorageIO Recommended Reading (Watching and Listening) List

In addition to my own books including Software Defined Data Infrastructure Essentials (CRC Press 2017) available at Amazon.com (check out special sale price), the following are Server StorageIO data infrastructure recommended reading, watching and listening list items. The Server StorageIO data infrastructure recommended reading list includes various IT, Data Infrastructure and related topics including Intel Recommended Reading List (IRRL) for developers is a good resource to check out. Speaking of my books, Didier Van Hoye (@WorkingHardInIt) has a good review over on his site you can view here, also check out the rest of his great content while there.

Watch for more items to be added to the recommended reading list book shelf soon.

Events and Activities

Recent and upcoming event activities.

July 25, 2018 – Webinar – Data Protect & Storage

June 27, 2018 – Webinar – App Server Performance

June 26, 2018 – Webinar – Cloud App Optimize

May 29, 2018 – Webinar – Microsoft Windows as a Service

April 24, 2018 – Webinar – AWS and on-site, on-premises hybrid data protection

See more webinars and activities on the Server StorageIO Events page here.

Data Infrastructure Server StorageIO Industry Resources and Links

Various useful links and resources:

Data Infrastructure Recommend Reading and watching list
Microsoft TechNet – Various Microsoft related from Azure to Docker to Windows
storageio.com/links – Various industry links (over 1,000 with more to be added soon)
objectstoragecenter.com – Cloud and object storage topics, tips and news items
OpenStack.org – Various OpenStack related items
storageio.com/downloads – Various presentations and other download material
storageio.com/protect – Various data protection items and topics
thenvmeplace.com – Focus on NVMe trends and technologies
thessdplace.com – NVM and Solid State Disk topics, tips and techniques
storageio.com/converge – Various CI, HCI and related SDS topics
storageio.com/performance – Various server, storage and I/O benchmark and tools
VMware Technical Network – Various VMware related items

What this all means and wrap-up

Data Infrastructures are what exists inside physical data centers as well as spanning cloud, converged, hyper-converged, virtual, serverless and other software defined as well as legacy environments. NVMe continues to gain in industry adoption as well as customer deployment. Cloud adoption also continues along with multi-cloud deployments. Enjoy this edition of the Server StorageIO Data Infrastructure update newsletter and watch for more NVMe,cloud, data protection among other topics in future posts, articles, events, and newsletters.

Ok, nuff said, for now.

June 9, 2018November 26, 2023

Solving Application Server Storage I/O Performance Bottlenecks Webinar

The best I/O is the one you do not have to do, the second best is the one with least server I/O and storage overhead along with application performance bottleneck impact.

Fast applications need fast servers, storage, I/O networking hardware, and software. Merely throwing more hardware as a cache at application performance bottlenecks can help. However, throwing more hardware at performance problems can also cost much cash. On the other hand, a little fast memory and storage in the right place, with robust software performance acceleration can have significant application productivity benefits. Fast hardware also needs fast software to help boost application and user productivity.

As application workloads activity increases, by implementing server software, performance acceleration along with additional fast memory and storage including flash, Storage Class Memories (SCM) among other SSD along with NVMe accessed devices, even more, work can be done boosting productivity while reducing cost.

Join me on June 27 at 1 PM Pacific Time (PT) when I host a free webinar (registration required) sponsored by DataCore and produced by Redmond Magazine/1105 Media as we discuss Solving Application Server Storage I/O Performance Bottlenecks including what you can do today.

I will be joined by guest presenters Augie Gonzalez, Director Technical Product Marketing and Tim Warden, Director Engineering Product Management both from DataCore. During the interactive webinar discussions, we invite you to participate with your questions, as we look at issues, challenges, various approaches, and what you can do today to boost different application performance and productivity.

This webinar is for those whose applications have the need for speed including database, VDI, SharePoint, Exchange, AI, ML and other I/O intensive workloads. Topics that we will be discussing in addition to your questions include:

Boosting application performance without breaking the bank
Improving application productivity and reducing user wait time
Gaining insight and awareness into bottlenecks and what to do
Unlocking value in your existing hardware and software licenses
What you can do today, literally right after or even during this webinar

Where to learn more

Learn more about Windows Server Summit and related topics via the following links:

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

What this all means

The best I/O is the one you do not have to do, the second best is the one that has the least impact on your applications, while boosting user productivity. There are many differ net approaches to addressing various server storage I/O performance bottlenecks across applications. Join me on June 27, 2018 at 1 PM PT for the free webinar Solving Application Server Storage I/O Performance Bottlenecks Webinar and learn what you can do today to boost your uses productivity.

Ok, nuff said, for now.

Cheers Gs

June 7, 2018November 26, 2023

Announcing Windows Server Summit Virtual Online Event

Microsoft will be hosting a free (no registration required) half day virtual (e.g. online) Windows Server Summit Virtual Online Event June 26, 2018 starting at 9AM PT. As part of its continued focus on supporting hybrid strategy spanning on-premises Windows Server to Azure (among others including AWS) cloud based, Microsoft is preparing for the launch later this year of Windows Server 2019.

There is no registration required, you can just show up without concern of getting email or other spam, however you can also click here to save the date, as well as here to get updates on the event.

Windows Server 2019 is now in insider preview (get it here) and is the next Long Term Service Channel (LTSC) release following Windows Server 2016. In the past, Microsoft would have called Windows Server 2019 something such as Windows Server 2016 R2, however that has changed with the new Semiannual Channel (SAC) and LTSC release cycles.

Keynote kick off presentations will be from Erin Chapple, Director of Program Management, Cloud + AI (which includes Windows Kernel, Hypervisors, Containers and Storage), Arpan Shah, General Manager of Azure Infrastructure marketing (Windows Server, Azure IaaS, Azure Stack, Azure Management and Security), and, Jeff Woosley Principal PM, Windows Server. In addition to the kick off presentations with current state and status of Windows Servers available for on-premises bare metal, virtual, container as well as cloud, there will be demos, Q&A, roadmap’s and much more. Topics will include new and recent functionalities such as Windows Server 2019, Windows Admin Center (formerly known as Honolulu), IoT, roadmap’s and much more.

Images Via Microsoft Windows Server Summit Page

Windows Server Summit Break Out Tracks

During the Windows Server Summit, there will be four technology focused tracks including:

Hybrid – From on-premisess to Azure, how Windows Server supports different workloads in various configurations, along with associated management tools (including Windows Admin Center aka Honolulu)
Security – New and recent security enhancements for Windows Server along with Hyper-V and other related topics.
Application Platform – Containers and Linux support along with associated management tools for on-premisess and Azure.
Hyper-converged infrastructure (HCI) – Leveraging software defined storage (SDS) with Storage Spaces Direct (S2D) in Windows Server 2016, along with Hyper-V and other technologies, learn how Microsoft supports HCI and beyond.

Where to learn more

Learn more about Windows Server Summit and related topics via the following links:

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

What this all means

Windows Server remains relevant today for traditional, on site, on-premises, as well as on-premisess along with cloud, container among other deployments. Remember to click here to save the date, click here to sign up for Windows Server Summit updates and learn more about the Windows Server Summit Virtual Online event here, see there, or at least virtually.

Ok, nuff said, for now.

Cheers Gs

May 24, 2018November 26, 2023

May 2018 Server StorageIO Data Infrastructure Update Newsletter

Volume 18, Issue 5 (May 2018)

Hello and welcome to the May 2018 Server StorageIO Data Infrastructure Update Newsletter.

In cased you missed it, the April 2018 Server StorageIO Data Infrastructure Update Newsletter can be viewed here (HTML and PDF).

May has been a busy month with a lot of data infrastructure related activity from software-defined virtual, cloud, container, converged, serverless to legacy, hardware, software, services, server, storage, I/O and networking along with data protection topics among others.

In this issue buzzwords topics include GDPR, NVMe, NVMeoF, Composable, Serverless, Data Protection, SCM, Gen-Z, MaaS:

Data Infrastructure Industry Activity
News Commentary and Tips
Server StorageIOblog posts
Recommended Reading
Various Events and Webinars
Industry Resources and Links

Enjoy this edition of the Server StorageIO Data Infrastructure update newsletter.

Cheers GS

Data Infrastructure and IT Industry Activity Trends

May has been a busy month, some data infrastructure, server, storage, I/O network, hardware, software, cloud, converged, and container as well as data protection activity includes among others:

Depending on when you read this, the new global data protection regulations (GDPR) are either days away, or already in effect. For those who are not aware of GDPR other than seeing many inbox items in your email pertaining to it, here are some resources as a refresher or primer:

Webinar with Danny Alan of Veeam (free with registration): GDPR experiences walking the talk
Blog post – Data Protection life after world backup day and pre GDPR
Various GDPR related links, posts and resources

May Buzzword, Buzz Topic and Trends

Besides data protection and GDPR, other recent data infrastructure related news, trends, technologies and topics to keep an eye on (besides AI, ML, DL, AR/VR, IoT, Blockchain, Serverless) include Metal as a Service (MaaS) that might be familiar to some, for others, something new. Canonical has been busy for sometime now with MaaS including in Ubuntu and they are not alone with variations appearing with various managed service providers, hosting and cloud providers as well. NVMe has become a more common topic, technology, trend including for use in servers as well as over fabrics (e.g. NVMe over Fabrics) as a language for server, storage, I/O communication.

A new emerging companion to NVMe is Gen-Z which initially is a companion to PCIe. Longer term, Gen-Z could maybe possibly be a replacement, as well as for use accessing direct random access memory (DRAM) among other uses. Storage Class Memory (SCM) has been an industry conversation topic for several years now with new persistent memories (PMEM) that combine the best of traditional DRAM (Speed and write endurance) as well as persistent, higher capacity, lower cost of traditional NAND flash SSDs.

Another trend topic is that for some, ASIC, FPGA and GPU are new companions to standard commodity compute processors along with servers, yet for others it may be Dejavu as they have been being used for years (ok, decades) in some solutions. For now, two other buzzwords, buzz terms to add or refresh your data infrastructure vocabulary include distributed ledgers (aka blockchains), composable resources and ephemeral instance storage (storage on a cloud instance).

May NVMe Momentum Movement Activity

May saw a lot of NVMe related activity, from chips and components (adapters, devices) to systems spanning direct attached to NVMe over Fabric (NVMeoF). Here is a primer (or refresh) for NVMe along with various deployment options. NVMeoF includes RDMA over Converged Ethernet (RoCE) based, along with NVMe over Fibre Channel (FC-NVMe), as well as emerging NVMe over IP.

NVMe being used for front-end accessed via shared PCIe along with back-end devices

There are many different facets of NVMe including for use as a front-end on storage systems supporting server attachment (e.g. competes with Fibre Channel, iSCSI, SAS among others). Another variation of NVMe is as a back-end for attachment of drives or other NVMe based devices in storage systems, as well as servers.

Front-end using traditional block SAN access with back-end NVMe, SAS and SATA devices

Read more about the many different options and variations of NVMe including key questions to ask or understand, deployment topology along with other related topics at thenvmeplace.com.

Various NVMe front-end including NVMeoF along with NVMe back-end devices (U.2, M.2, AiC)

Software Defined Data Infrastructure Activity

Amazon Web Services (AWS) continues to add new features, functionality as well as extending those as along with existing capabilities into various regions. Some recent updates include new Elastic Cloud Compute (EC2) Microsoft Windows Servers versions 1709 and 1803 Amazon Machine Images (AMIs). Other AWS updates include spot instances support for Red Hat BYOL (Bring Your Own License), VPN enhancements, X1e instances available in Frankfurt, H1 instance price reduction, as well as LightSail now in Canada, Paris, and Seoul regions.

For those who are not familiar with LightSail, they are virtual private servers (VPS) which are different from traditional EC2 instances. LightSail can be a cost-effective way for those who need to move out of general population shared hosting, yet cannot justify a full EC2 instance while requiring more than a container.

The LightSail instance also is available with various software pre-installed such as for WordPress websites among others. For example, I have used LightSail as a backup and standby WordPress site for StorageIOblog using Updraft Plus Pro for data protection.

In other news, AWS C5d EC2 instances are available in various regions. C5d instances are available with 2, 4, 8, 16, 36 and 72 vCPUs along with up to 1800GB of NVMe based ephemeral storage for on-demand reserved or spot instances.

Note that instance-based storage is temporary meaning that it persists for the life of the instance. What this means is that if you stop and restart the instance, the data is not persistence. Instance-based storage is useful for data that can be protected or persisted to other storage including EBS (Elastic Block Storage). Usage includes batch, log and analytics processing, burst buffers, cache or workspace.

AWS also announced a new Simple Storage Service (S3) storage class a month or so ago called One Zone Availability Infrequent Access. This new storage class primarily provides a lower cost of storage with lower durability (e.g., data spread across one zone vs. multiple). Over the past couple of months, I have been migrating from S3 Infrequent Access (IA) as well as standard into One Zone Availability. Some of my active data remains in S3 Standard storage class, while cold archives are in Glacier.

A tip about migrating to One Zone Availability, as well as between other S3 storage classes is paid attention to your API calls and monthly budget. You might see an increase in S3 costs during the migration time, that then settles into the lower prices once data has been moved due to API calls (gets, puts, lists, dir). In other words, pay attention to how many API calls you are allowed per storage class per month, along with other fees beyond focusing only on cost per TByte. Read about other recent AWS news updates here.

Software-defined storage startup Cloudian announced their technology available for test drive on Google Cloud Platform as part of a continued industry trend. That trend is for storage vendors to make their storage software technology available on different cloud platforms such as AWS, Azure, Google, Softlayer among others.

Dell Technologies made several announcements as part of Dell Technologies World that are covered in a series of posts here. Announcements included PowerMax the successor to VMAX, XtremIO X2 updates, new servers, workstations among many other items, read more here.

Besides the data infrastructure, cloud service providers and systems vendors, component suppliers including Cavium announced NVMe over Fibre Channel updates (here and here), along with Marvel NVMe updates here. HPE announced new thin clients and software (t430 Thin Client, HP mt44 Mobile Thin Client, HP ThinPro software), as well as updates to 3PAR and other storage solutions.

IBM announced various storage enhancements (and here) as well as a Happy 30th anniversary to the IBM Power9 based i systems. In other news, Kaseya bought backup data protection vendor Unitrends.

Micron announced the first quad layer cell (QLC) nand flash solid state device (SSD) named 52100 has begun shipping to select customers (and vendors). QLC packs or stacks 4 bits per cell. The 5200 is optimized for read-intensive workloads with up to 33% higher densities compared to previous generation TLC (triple layer cell) NAND flash. Broader market availability is expected to occur later fall 2018, 5210 form factor is 2.5” as a standard SSD or HDD, with capacities from 1.92TB to 7.68TB.

In other news, Micron also announced a $10 Billion (USD) stock repurchase plan, along with an extension of Intel 3D NAND flash memory partnership involving 3D NAND flash, as well as 96 layer 3D NAND. Meanwhile, various vendors are increasingly talking about how their systems are or will be storage class memory (SCM) ready including for use such as Micron 3D XPoint also known as Intel Optane among others.

Microsoft has placed into public preview Azure Active Directory (AAD) Storage authentication for Azure Blobs and Queues. Azure Storage Explorer is now released as version 1.0. AAD storage authentication enables organizations to implement role-based access control of Azure storage resources. Speaking of Azure, Microsoft has published several architectures, reference and other content at the Azure Virtual Datacenter portal here.

If you have not done so, check out Azure File Sync which is currently in public preview. Having been involved and using it for over a year including during private preview, Azure File Sync is an exciting, useful technology for creating a hybrid distributed file sharing with cloud tiering solutions. Learn more Azure File Sync here and here. In other news, Microsoft has announced a preview as part of the April 2018 Windows 10 build for a Hyper-V Google Android emulator support.

NetApp has had Azure based NAS storage in preview for a while now, and also announced Cloud Volumes on Google Cloud Platform (GCP). In addition to Cloud Volumes on AWS, Azure, and GCP, NetApp also announced enhanced NVMe based storage systems among other updates.

Two companies that have similar names are Opendrives (video workflow acceleration) and Opendrive (cloud storage, backup, and data protection). Meanwhile, data infrastructure startup Pavilion has received new funding as well as begun talking about their NVMe including NVMe over Fabric (NVMeOF) hardware storage system. Long-time data infrastructure converged server storage startup Pivot3 announced additional cloud workload mobility.

Pure storage made a couple of announcements including FlashArray//X NVMe based shared accelerated storage system as well as NVIDIA (GPU powered) based AIRI Mini for AI/DL/ML.

Have you heard about Snowflake computing, aka, the cloud data warehouse solution? If not, check them out here. Another cloud-related data infrastructure vendor to look into is Upbound.io who have received additional funding for their multi-cloud management solutions.

Building off of recent VMware vSphere updates (here), and Dell Technology World here, the following is an excellent post about Instant Clone in vSphere 6.7, and VMware vSAN HCI assessment tool here.

Check out other industry news, comments, trends perspectives here.

Server StorageIO Commentary in the news, tips and articles

Recent Server StorageIO industry trends perspectives commentary in the news.

Via SearchStorage: Comments Managing storage for IoT data at the enterprise edge
Via SearchCloudComputing: Comments Hybrid cloud deployment demands a change in security mindset
Via SearchStorage: Comments Dell EMC storage IPO, VMware merger plans still unclear
Via SearchStorage: Comments Dell EMC midrange storage keeps its overlapping arrays
Via SearchStorage: Comments Dell EMC all-flash PowerMax replaces VMAX, injects NVMe
Via IronMountain InfoGoto: The growing Trend of Secondary Data Storage

View more Server, Storage and I/O trends and perspectives comments here.

Server StorageIOblog Data Infrastructure Posts

Recent and popular Server StorageIOblog posts include:

Dell Technology World 2018 Announcement Summary
Part II Dell Technology World 2018 Modern Data Center Announcement Details
Part III Dell Technology World 2018 Storage Announcement Details
Part IV Dell Technology World 2018 PowerEdge MX Gen-Z Composable Infrastructure
Part V Dell Technology World 2018 Server Converged Announcement Details
April 2018 Server StorageIO Data Infrastructure Update Newsletter
VMware vSphere vSAN vCenter version 6.7 SDDC Update Summary
PCIe Fundamentals Server Storage I/O Network Essentials
Have you heard about the new CLOUD Act data regulation?
Data Protection Recovery Life Post World Backup Day Pre GDPR
Microsoft Windows Server 2019 Insiders Preview
Application Data Value Characteristics Everything Is Not The Same
Data Infrastructure Resource Links cloud data protection tradecraft trends
IT transformation Serverless Life Beyond DevOps Podcast
Data Protection Diaries Fundamental Topics Tools Techniques Technologies Tips
Introducing Windows Subsystem for Linux WSL Overview
Data Infrastructure Primer Overview (Its Whats Inside The Data Center)
If NVMe is the answer, what are the questions?

View other recent as well as past StorageIOblog posts here

Server StorageIO Recommended Reading (Watching and Listening) List

Containers, serverless, kubernetes continue to gain in industry adoption, as well as customer deployments. Here is some information about Microsoft Azure Kubernetes Service (AKS). Note that AWS has Elastic Kubernetes Service (EKS), Google, VMware and Pivotal with Pivotal Kubernetes Service (PKS) among others.

Here is an interesting perspective by Ben Kepps about Serverless (e.g. life beyond Kubernetes and containers (e.g. life beyond virtualization which to some is or was life (e.g. life beyond bare metal))) as well as the all to often punditry, evangelism of something new causing something else to be dead.

SNIA has updated their Emerald aka Green energy effectiveness (focus on productivity) measurement specification (V3.01) including NAS NFS file activity (besides block). Learn more at snia.org/forums/green.

Watch for more items to be added to the recommended reading list book shelf soon.

Events and Activities

Recent and upcoming event activities.

June 27, 2018 – Webinar – TBA

May 29, 2018 – Webinar – Microsoft Windows as a Service

April 24, 2018 – Webinar – AWS and on-site, on-premises hybrid data protection

See more webinars and activities on the Server StorageIO Events page here.

Data Infrastructure Server StorageIO Industry Resources and Links

Various useful links and resources:

Connect and Converse With Us

Subscribe to Newsletter – Newsletter Archives – StorageIO.com – StorageIOblog.com

What this all means and wrap-up

Data Infrastructures are what exists inside physical data centers spanning cloud, converged, hyper-converged, virtual, serverless and other software defined as well as legacy environments. So far this spring there has been a lot of data infrastructure related activity, from new technology announcements, to events, trends among others. Enjoy this edition of the Server StorageIO Data Infrastructure update newsletter and watch for more NVMe, Gen-Z, cloud, data protection among other topics in future posts, articles, events, and newsletters.

Ok, nuff said, for now.

March 20, 2018November 26, 2023

March 2018 Server StorageIO Data Infrastructure Update Newsletter

Volume 18, Issue 3 (March 2018)

Hello and welcome to the March 2018 Server StorageIO Data Infrastructure Update Newsletter.

If you are wondering where the January and February 2018 update newsletters are, they are rolled into this combined edition. In addition to the short email version (free signup here), you can access full versions (html here and PDF here) along with previous editions here.

In this issue:

Data Infrastructure Industry Activity
News Commentary and Tips
Server StorageIOblog posts
Recommended Reading
Various Events and Webinars
Industry Resources and Links

Enjoy this edition of the Server StorageIO Data Infrastructure update newsletter.

Cheers GS

Data Infrastructure and IT Industry Activity Trends

Data Infrastructure Data Protection and Backup BC BR DR HA Security

World Backup day is coming up on March 31 which is a good time to remember to verify and validate that your data protection is working as intended. On one hand I think it is a good idea to call out the importance of making sure your data is protected including backed up.

On the other hand data protection is not a once a year, rather a year around, 7 x 24 x 365 day focus. Also the focus needs to be on more than just backup, rather, all aspects of data protection from archiving to business continuance (BC), business resiliency (BR), disaster recovery (DR), always on, always accessible, along with security and recovery.

Data Infrastructure Data Protection Backup 4 3 2 1 rule
Data Infrastructure 4 3 2 1 Data Protection and Backup

Some data spring thoughts, perspectives and reminders. Data lakes may swell beyond their banks causing rivers of data to flood as they flow into larger reservoirs, great data lakes, gulfs of data, seas and oceans of data. Granted, some of that data will be inactive cold parked like glaciers while others semi-active floating around like icebergs. Hopefully your data is stored on durable storage solutions or services and does not melt.

Data Infrastructure Server Storage I/O flash SSD NVMe
Various NAND Flash SSD devices and SAS, SATA, NVMe, M.2 interfaces

Non-Volatile Memory (NVM) including various solid state device (SSD) mediums (e.g. nand flash, 3D XPoint, MRAM among others), packaging (drives, PCIe Add in cars [AiC] along with entire systems, appliances or arrays). Also part of the continue evolution of NVM, SSD and other persistent memories (PM) including storage class memories (SCM) are different access protocol interfaces.

Keep in mind that there is a difference between NVM (medium) and NVMe (access), NVM is the generic category of mediums or media and devices such as nand flash, nvram, 3D XPoint among others SCM (and PMs). In other words, NVM is what data devices use for storing data, NVMe is how devices and systems are accessed. NVMe and its variations is how NVM, SSD, PM, SCM media and devices get accessed locally, as well as over network fabrics (e.g. NVMe-oF an FC-NVMe).

NVMe continues to evolve including with networked fabric variations such as RDMA based NVMe over Fabric (NVMe-oF), along with Fibre Channel based (FC-NVMe). The Fibre Channel Industry Association trade group recently held its second multi-vendor plugfest in support of NVMe over Fibre Channel.

Read more about NVM, NVMe, SSD, SCM, flash and related technologies, tools, trends, tips via the following resources:

Has Object Storage failed to live up to its industry hype lacking traction? Or, is object storage (also known as blobs) progressing with customer adoption and deployment on normal realistic timelines? Recently I have seen some industry comments about object storage not catching on with customers or failing to live up to its hyped expectation. IMHO object storage is very much alive along with block, file, table (e.g. database SQL and NoSQL repositories), message/queue among others, as well as emerging blockchain aka data exchanges.

Various Industry and Customer Adoption Deployment Timeline (Via: StorageIOblog.com)

An issue with object storage is that it is still new, still evolving, many IT environments applications do not yet speak or access objects and blobs natively. Likewise as is often the case, industry adoption and deployment is usually early and short term around the hype, vs. the longer cycle of customer adoption and deployment. The downside for those who only focus on object storage (or blobs) is that they may be under pressure to do things short term instead of adjusting to customer cycles which take longer, however real adoption and deployment also last longer.

While the hype and industry buzz around object storage (and blobs) may have faded, customer adoption continues and is here to stay, along with block, file among others, learn more at www.objectstoragecenter.com. Also keep in mind that there is a difference between industry and customer adoption along with deployment.

Some recent Industry Activities, Trends, News and Announcements include:

In case you missed it, Amazon Web Services (e.g. AWS) announced EKS (Elastic Kubernetes Service) which as its name implies, is an easy to use and manage Kubernetes (containers, serverless data infrastructure) running on AWS. AWS joins others including Microsoft Azure Kubernetes Services (AKS), Googles Kubernetes Engine, EasyStack (ESContainer for openstack and Kubernetes),VMware Pivotal Container Service (PKS) among others. What this means is that in the container serverless data infrastructure ecosystem Kubernetes container management (orchestration platform) is gaining in both industry as well as customer adoption along with deployment.

Check out other industry news, comments, trends perspectives here.

Data Infrastructure Server StorageIO Comments Content

Server StorageIO Commentary in the news, tips and articles

Recent Server StorageIO industry trends perspectives commentary in the news.

Via BizTech: Why Hybrid (SSD and HDD) Storage Might Be Fit for SMB environments
Via Excelero: Server StorageIO white paper enabling database DBaaS productivity
Via Cloudian: YouTube video interview file services on object storage with HyperFile
Via CDW Solutions: Comments on Software Defined Access
Via SearchStorage: Comments on Cloudian HyperStore on demand cloud like pricing
Via EnterpriseStorageForum: Comments and tips on Software Defined Storage Best Practices
Via PRNewsWire: Comments on Excelero NVMe NVMesh Database and DBaaS solutions
Via SearchStorage: Comments on NooBaa multi-cloud storage management
Via CDW: Comments on New IT Strategies Improve Your Bottom Line
Via EnterpriseStorageForum: Comments on Software Defined Storage: Pros and Cons
Via DataCenterKnowledge: Comments on The Great Data Center Headache IoT
Via SearchStorage: Comments on Dell and VMware merger scenario options
Via PRNewswire: Comments on Chelsio Microsoft Validation of iWARP/RDMA
Via SearchStorage: Comments on Server Storage Industry trends and Dell EMC
Via ChannelProSMB: Comments on Hybrid HDD and SSD storage solutions
Via ChannelProNetwork: Comments on What the Future Holds for HDDs
Via HealthcareITnews: Comments on MOUNTAINS OF MOBILE DATA
Via SearchStorage: Comments on Cloudian HyperStore 7 targets multi-cloud complexities
Via GlobeNewsWire: Comments on Cloudian HyperStore 7
Via GizModo: Comments on Intel Optane 800P NVMe M.2 SSD
Via DataCenterKnowledge: Comments on getting data centers ready for IoT
Via DataCenterKnowledge: Comments on Beyond the Hype: AI in the Data Center
Via DataCenterKnowledge: Comments on Data Center and Cloud Disaster Recovery
Via SearchStoragae: Comments on Cloudian HyperFile marries NAS and object storage
Via SearchStoragae: Comments on Top 10 Tips on Solid State Storage Adoption Strategy
Via SearchStoragae: Comments on 8 Top Tips for Beating the Big Data Deluge

View more Server, Storage and I/O trends and perspectives comments here.

Data Infrastructure Server StorageIOblog posts

Server StorageIOblog Data Infrastructure Posts

Recent and popular Server StorageIOblog posts include:

Application Data Value Characteristics Everything Is Not The Same
Application Data Availability 4 3 2 1 Data Protection
AWS Cloud Application Data Protection Webinar
Microsoft Windows Server 2019 Insiders Preview
Application Data Characteristics Types Everything Is Not The Same
Application Data Volume Velocity Variety Everything Is Not The Same
Application Data Access Lifecycle Patterns Everything Is Not The Same
Veeam GDPR preparedness experiences Webinar walking the talk
VMware continues cloud construction with March announcements
Benefits of Moving Hyper-V Disaster Recovery to the Cloud Webinar
World Backup Day 2018 Data Protection Readiness Reminder
Use Intel Optane NVMe U.2 SFF 8639 SSD drive in PCIe slot
Data Infrastructure Resource Links cloud data protection tradecraft trends
How to Achieve Flexible Data Protection Availability with All Flash Storage Solutions
November 2017 Server StorageIO Data Infrastructure Update Newsletter
IT transformation Serverless Life Beyond DevOps Podcast
Data Protection Diaries Fundamental Topics Tools Techniques Technologies Tips
HPE Announces AMD Powered Gen 10 ProLiant DL385 For Software Defined Workloads
AWS Announces New S3 Cloud Storage Security Encryption Features
Introducing Windows Subsystem for Linux WSL Overview #blogtober
Hot Popular New Trending Data Infrastructure Vendors To Watch

View other recent as well as past StorageIOblog posts here

Server StorageIO Recommended Reading (Watching and Listening) List

In case you may have missed it, here is a good presentation from AWS re:invent 2017 by Brendan Gregg (@brendangregg) about how Netflix does EC2 and other AWS tuning along with plenty of great resource links. Keith Tenzer (@keithtenzer) provides a good perspective piece about containers in a large IT enterprise environment here including various options.

Speaking of IT data centers and data infrastructure environments, checkout the list of some of the worlds most extreme habitats for technology here. Mark Betz (@markbetz) has a series of Docker and Kubernetes networking fundamentals posts on his site here, as well as over at Medium including mention of Google Cloud (@googlecloud). The posts in Marks series are good refresher or intros to how Docker and Kubernetes handles basic networking between containers, pods, nodes, hosts in clusters. Check out part I here and part II here.

Blockchain elements
Image via https://stevetodd.typepad.com

Steve Todd (@Stevetodd) has some good perspectives about Trusted Data Exchanges e.g. life beyond blockchain and bitcoin here along with core element considerations (beyond the product pitch) here, along with associated data infrastructure and storage evolution vs. revolution here.

Watch for more items to be added to the recommended reading list book shelf soon.

Data Infrastructure Server StorageIO event activities

Events and Activities

Recent and upcoming event activities.

March 27, 2018 – Webinar – Veeams Road to GDPR Compliancy The 5 Lessons Learned

Feb 28, 2018 – Webinar – Benefits of Moving Hyper-V Disaster Recovery to the Cloud

Jan 30, 2018 – Webinar – Achieve Flexible Data Protection and Availability with All Flash Storage

Nov. 9, 2017 – Webinar – All You Need To Know about ROBO Data Protection Backup

See more webinars and activities on the Server StorageIO Events page here.

Data Infrastructure Server StorageIO Industry Resources and Links

Various useful links and resources:

Connect and Converse With Us

Subscribe to Newsletter – Newsletter Archives – StorageIO.com – StorageIOblog.com

What this all means and wrap-up

Data Infrastructures are what exists inside physical data centers spanning cloud, converged, hyper-converged, virtual, serverless and other software defined as well as legacy environments. The fundamental role of data infrastructures comprising server (compute), storage, I/O networking hardware, software, services defined by management tools, best practices and policies is to provide a platform for applications along with their data to deliver information services. With March 31 being world backup day, also focus on making sure that on April 1st you are not a fool trying to recover from a bad data protection copy. With the continued movement to flash SSD along with other forms of storage class memory (SCM) and persistent memories (PM), data moves at a faster rate meaning data protection is even more important to get you out of trouble as fast as you get into issues.

Ok, nuff said, for now.

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

November 26, 2017March 7, 2022

Data Protection Diaries Fundamental Topics Tools Techniques Technologies Tips

Data Protection Fundamental Topics Tools Techniques Technologies Tips

Update 1/16/2018

Data protection fundamental companion to Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft ( CRC Press 2017)

server storage I/O data infrastructure trends

By Greg Schulz – www.storageioblog.com November 26, 2017

This is Part I of a multi-part series on Data Protection fundamental tools topics techniques terms technologies trends tradecraft tips as a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).

The focus of this series is around data protection fundamental topics including Data Infrastructure Services: Availability, RAS, RAID and Erasure Codes (including LRC) ( Chapter 9), Data Infrastructure Services: Availability, Recovery Point ( Chapter 10). Additional Data Protection related chapters include Storage Mediums and Component Devices ( Chapter 7), Management, Access, Tenancy, and Performance ( Chapter 8), as well as Capacity, Data Footprint Reduction ( Chapter 11), Storage Systems and Solutions Products and Cloud ( Chapter 12), Data Infrastructure and Software-Defined Management ( Chapter 13) among others.

Post in the series includes excerpts from Software Defined Data Infrastructure (SDDI) pertaining to data protection for legacy along with software defined data centers ( SDDC), data infrastructures in general along with related topics. In addition to excerpts, the posts also contain links to articles, tips, posts, videos, webinars, events and other companion material. Note that figure numbers in this series are those from the SDDI book and not in the order that they appear in the posts.

Posts in this data protection fundamental series include:

Part 1 – Data Infrastructure Data Protection Fundamentals
Part 2 – Reliability, Availability, Serviceability ( RAS) Data Protection Fundamentals
Part 3 – Data Protection Access Availability RAID Erasure Codes ( EC) including LRC
Part 4 – Data Protection Recovery Points (Archive, Backup, Snapshots, Versions)
Part 5 – Point In Time Data Protection Granularity Points of Interest
Part 6 – Data Protection Security Logical Physical Software Defined
Part 7 – Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends
Part 8 – Data Protection Diaries Walking Data Protection Talk
Part 9 – who’s Doing What ( Toolbox Technology Tools)
Part 10 – Data Protection Resources Where to Learn More

Figure 1.5 Data Infrastructures and other IT Infrastructure Layers

Data Infrastructures

Data Infrastructures exists to support business, cloud and information technology (IT) among other applications that transform data into information or services. The fundamental role of data infrastructures is to provide a platform environment for applications and data that is resilient, flexible, scalable, agile, efficient as well as cost-effective.

Put another way, data infrastructures exist to protect, preserve, process, move, secure and serve data as well as their applications for information services delivery. Technologies that make up data infrastructures include hardware, software, or managed services, servers, storage, I/O and networking along with people, processes, policies along with various tools spanning legacy, software-defined virtual, containers and cloud. Read more about data infrastructures (its what’s inside data centers) here.

Why SDDC SDDI Need Data Protection
Various Needs Demand Drivers For Data Protection Fundamentals

Why The Need For Data Protection

Data Protection encompasses many different things, from accessibility, durability, resiliency, reliability, and serviceability ( RAS) to security and data protection along with consistency. Availability includes basic, high availability ( HA), business continuance ( BC), business resiliency ( BR), disaster recovery ( DR), archiving, backup, logical and physical security, fault tolerance, isolation and containment spanning systems, applications, data, metadata, settings, and configurations.

From a data infrastructure perspective, availability of data services spans from local to remote, physical to logical and software-defined, virtual, container, and cloud, as well as mobile devices. Figure 9.2 shows various data infrastructure availability, accessibility, protection, and security points of interest. On the left side of Figure 9.2 are various data protection and security threat risks and scenarios that can impact availability, or result in a data loss event ( DLE), data loss access ( DLA), or disaster. The right side of Figure 9.2 shows various techniques, tools, technologies, and best practices to protect data infrastructures, applications, and data from threat risks.

SDDI SDDC Data Protection Fundamental Big Picture
Figure 9.2 Various threat vectors, issues, problems, and challenges that drive the need for data protection

A fundamental role of data infrastructures (and data centers) is to protect, preserve, secure and serve information when needed with consistency. This also means that the data infrastructure resources (servers, storage, I/O networks, hardware, software, external services) and the applications (and data) they combine and are defined to protect are also accessible, durable and secure.

Data Protection topics include:

Maintaining availability, accessibility to information services, applications and data
Data include software, actual data, metadata, settings, certificates and telemetry
Ensuring data is durable, consistent, secure and recoverable to past points in time
Everything is not the same across different environments, applications and data
Aligning techniques and technologies to meet various service level objectives ( SLO)

Data Protection Fundamental Tradecraft Skills Experience Knowledge

Tools, technologies, trends are part of Data Protection, so to are the techniques of knowing (e.g. tradecraft) what to use when, where, why and how to protect against various threats risks (challenges, issues, problems).

Part of what is covered in this series of posts as well as in the Software Defined Data Infrastructure (SDDI) Essentials book is tradecraft skills, tips, experiences, insight into what to use, as well as how to use old and new things in new ways.

This means looking outside the technology box towards what is that you need to protect and why, then knowing how to use different skills, experiences, techniques part of your tradecraft combined with data protection toolbox tools. Read more about tradecraft here.

Where To Learn More

Continue reading additional posts in this series of Data Infrastructure Data Protection fundamentals and companion to Software Defined Data Infrastructure Essentials (CRC Press 2017) book, as well as the following links covering technology, trends, tools, techniques, tradecraft and tips.

Part 1 – Data Infrastructure Data Protection Fundamentals
Part 2 – Reliability, Availability, Serviceability ( RAS) Data Protection Fundamentals
Part 3 – Data Protection Fundamental Access Availability RAID Erasure Codes ( EC) including LRC
Part 4 – Data Protection Recovery Points (Archive, Backup, Snapshots, Versions)
Part 5 – Point In Time Data Protection Granularity Points of Interest
Part 6 – Data Protection Security Logical Physical Software Defined
Part 7 – Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends
Part 8 – Data Protection Diaries Walking Data Protection Talk
Part 9 – who’s Doing What ( Toolbox Technology Tools)
Part 10 – Data Protection Resources Where to Learn More
Data Protection Diaries series
Data Infrastructure server storage I/O network Recommended Reading List Book Shelf
Software Defined Data Infrastructure Essentials (CRC 2017) Book
PCIe Server Storage I/O Network Fundamentals
If NVMe is the answer, what are the questions?
Fixing the Microsoft Windows 10 1709 post upgrade restart loop
Data Infrastructure server storage I/O network Recommended Reading
Introducing Windows Subsystem for Linux WSL Overview
IT transformation Serverless Life Beyond DevOps with New York Times CTO Nick Rockwell Podcast
HPE Announces AMD Powered Gen 10 ProLiant DL385 For Software Defined Workloads
NVM Non Volatile Memory Express NVMe Place
AWS Announces New S3 Cloud Storage Security Encryption Features

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

What This All Means

Everything is not the same across environments, data centers, data infrastructures and applications.

Likewise everything is and does not have to be the same when it comes to Data Protection. Data protection fundamentals encompasses many different hardware, software, services including cloud technologies, tools, techniques, best practices, policies and tradecraft experience skills (e.g. knowing what to use when, where, why and how).

Since everything is not the same, various data protection approaches are needed to address various application performance availability capacity economic ( PACE) needs, as well as SLO and SLAs.

Get your copy of Software Defined Data Infrastructure Essentials here at Amazon.com, at CRC Press among other locations and learn more here. Meanwhile, continue reading with the next post in this series, Part 2 Reliability, Availability, Serviceability ( RAS) Data Protection Fundamentals.

Ok, nuff said, for now.

November 26, 2017November 26, 2023

Data Protection Diaries Reliability, Availability, Serviceability RAS Fundamentals

Reliability, Availability, Serviceability RAS Fundamentals

Companion to Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft ( CRC Press 2017)

server storage I/O data infrastructure trends

By Greg Schulz – www.storageioblog.com November 26, 2017

This is Part 2 of a multi-part series on Data Protection fundamental tools topics techniques terms technologies trends tradecraft tips as a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).

Click here to view the previous post Part 1 Data Infrastructure Data Protection Fundamentals, and click here to view the next post Part 3 Data Protection Access Availability RAID Erasure Codes (EC) including LRC.

In this post the focus is around Data Protection availability from Chapter 9 which includes access, durability, RAS, RAID and Erasure Codes (including LRC), mirroring and replication along with related topics.

Figure 1.5 Data Infrastructures and other IT Infrastructure Layers

Reliability, Availability, Serviceability (RAS) Data Protection Fundamentals

Reliability, Availability Serviceability (RAS) and other access availability along with Data Protection topics are covered in chapter 9. A resilient data infrastructure (software-defined, SDDC and legacy) protects, preserves, secures and serves information involving various layers of technology. These technologies enable various layers ( altitudes) of functionality, from devices up to and through the various applications themselves.

SDDI SDDC Data Protection Big Picture
Figure 9.2 Various threat issues and challenges that drive the need for data protection

Some applications need a faster rebuild, while others need sustained performance (bandwidth, latency, IOPs, or transactions) with the slower rebuild; some need lower cost at the expense of performance; others are ok with more space if other objectives are meet. The result is that since everything is different yet there are similarities, there is also the need to tune how data Infrastructure protects, preserves, secures, and serves applications and data.

General reliability, availability, serviceability, and data protection functionality includes:

Manually or automatically via policies, start, stop, pause, resume protection
Adjust priorities of protection tasks, including speed, for faster or slower protection
Fast-reacting to changes, disruptions or failures, or slower cautious approaches
Workload and application load balancing (performance, availability, and capacity)

RAS can be optimized for:

Reduced redundancy for lower overall costs vs. resiliency
Basic or standard availability (leverage component plus)
High availability (use better components, multiple systems, multiple sites)
Fault-tolerant with no single points of failure (SPOF)
Faster restart, restore, rebuild, or repair with higher overhead costs
Lower overhead costs (space and performance) with lower resiliency
Lower impact to applications during rebuild vs. faster repair
Maintenance and planned outages or for continues operations

Common availability Data Protection related terms, technologies, techniques, trends and topics pertaining to data protection from availability and access to durability and consistency to point in time protection and security are shown below.

Data Protection Gaps and Air Gap

There are Good Data Protection Gaps that provide recovery points to a past time enabling recoverability in the future to move forward. Another good data protection gap is an Air Gap that isolates protection copies off-site or off-line so that they can not be tampered with enabling recovery from ransomware and other software defined threats. There are Bad data protection gaps including gaps in coverage where data is not protected or items are missing. Then there are Ugly data protecting gaps which include Bad gaps that result in what you think is protected are not and finding that your copies are bad when it is too late.

Data Protection Gaps Good Bad and Ugly

The following figure shows good data protection gaps including recovery points (point in time protection) along with air gaps.

Good Data Protection Gaps
Figure 9.9 Air Gaps and Data Protection

Fault / Failures To Tolerate (FTT)

FTT is how many faults or failures to tolerate for a given solution or service which in turn determines what mode of protection, or fault tolerant mode ( FTM) to use.

Fault Tolerant Mode (FTM)

FTM is the mode or technique used to enable resiliency and protect against some number of faults.

Fault / Failure Domains

Fault or Failure domains are places and things that can fail from regions, data centers or availability zones, clusters, stamps, pods, servers, networks, storage, hardware (systems, components including SSD and HDDs, power supplies, adapters). Other fault domain topics and focus areas include facility power, cooling, software including applications, databases, operating systems and hypervisors among others.

SDDI SDDC Fault Domains Zones Regions
Figure 9.5 Various Fault and Failure Domains, Regions, Locations

Clustering

Clustering is a technique and technology for enabling resiliency, as well as scaling performance, availability, and capacity. Clusters can be local, remote, or wide-area to support different data infrastructure objectives, combined with replication and other techniques.

SDDI SDDC Clustering
Figure 9.12 Clustering and Replication Examples

Another characteristic of clustering and resiliency techniques is the ability to detect and react quickly to failures to isolate and contain faults, as well as invoking automatic repair if needed. Different clustering technologies enable various approaches, from proprietary hardware and software tightly coupled to loosely coupled general-purpose hardware or software.

Clustering characteristics include:

Application, database, file system, operating system (Windows Storage Replica)
Storage systems, appliances, adapters and network devices
Hypervisors ( Hyper-V, VMware vSphere ESXi and vSAN among others)
Share everything, share some things, share nothing
Tightly or loosely coupled with common or individual system metadata
Local in a data center, campus, metro, or stretch cluster
Wide-area in different regions and availability zones
Active/active for fast fail over or restart, active/passive (standby) mode

Additional clustering considerations include:

How does performance scale as nodes are added, or what overhead exists?
How is cluster resource locking in shared environments handled?
How many (or few) nodes are needed for quorum to exist?
Network and I/O interface (and management) requirements
Cluster partition or split-brain (i.e., cluster splits into two)?
Fast-reacting fail over and resiliency vs. overhead of failing back
Locality of where applications are located vs. storage access and clustering

Where To Learn More

Part 1 – Data Infrastructure Data Protection Fundamentals
Part 2 – Reliability, Availability, Serviceability ( RAS) Data Protection Fundamentals
Part 3 – Data Protection Access Availability RAID Erasure Codes ( EC) including LRC
Part 4 – Data Protection Recovery Points (Archive, Backup, Snapshots, Versions)
Part 5 – Point In Time Data Protection Granularity Points of Interest
Part 6 – Data Protection Security Logical Physical Software Defined
Part 7 – Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends
Part 8 – Data Protection Diaries Walking Data Protection Talk
Part 9 – who’s Doing What ( Toolbox Technology Tools)
Part 10 – Data Protection Resources Where to Learn More
Data Protection Diaries series
Data Infrastructure server storage I/O network Recommended Reading List Book Shelf
Software Defined Data Infrastructure Essentials (CRC 2017) Book

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

What This All Means

Everything is not the same across different environments, data centers, data infrastructures and applications. There are various performance, availability, capacity economic (PACE) considerations along with service level objectives (SLO). Availability means being able to access information resources (applications, data and underlying data infrastructure resources), as well as data being consistent along with durable. Being durable means enabling data to be accessible in the event of a device, component or other fault domain item failures (hardware, software, data center).

Just as everything is not the same across different environments, there are various techniques, technologies and tools that can be used in different ways to enable availability and accessibility. These include high availability (HA), RAS, mirroring, replication, parity along with derivative erasure code (EC), LRC, RS and other RAID implementations, along with clustering. Also keep in mind that pertaining to data protection, there are good gaps (e.g. time intervals for recovery points, air gaps), bad gaps (missed coverage or lack of protection), and ugly gaps (not being able to recover from a gap in time).

Note that mirroring, replication, EC, LRC, RS or other Parity and RAID approaches are not replacements for backup, rather they are companions to time interval based recovery point protection such as snapshots, backup, checkpoints, consistency points and versioning among others (discussed in follow-up posts in this series).

Which data protection tool, technology to trend is the best depends on what you are trying to accomplish and your application workload PACE requirements along with SLOs. Get your copy of Software Defined Data Infrastructure Essentials here at Amazon.com, at CRC Press among other locations and learn more here. Meanwhile, continue reading with the next post in this series, Part 3 Data Protection Access Availability RAID Erasure Codes (EC) including LRC.

Ok, nuff said, for now.

November 26, 2017November 26, 2023

Data Protection Diaries Access Availability RAID Erasure Codes LRC Deep Dive

Access Availability RAID Erasure Codes including LRC Deep Dive

Companion to Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft ( CRC Press 2017)

server storage I/O data infrastructure trends

By Greg Schulz – www.storageioblog.com November 26, 2017

This is Part 3 of a multi-part series on Data Protection fundamental tools topics techniques terms technologies trends tradecraft tips as a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).

Click here to view the previous post Part 2 Reliability, Availability, Serviceability (RAS) Data Protection Fundamentals, and click here to view the next post Part 4 Data Protection Recovery Points (Archive, Backup, Snapshots, Versions).

In this post part of the Data Protection diaries series as well as companion to Chapter 9 of SDDI Essentials book, we are going on a longer, deeper dive. We are going to look at availability, access and durability including mirror, replication, RAID including various traditional and newer parity approaches such as Erasure Codes ( EC), Local Reconstruction Code (LRC), Reed Solomon (RS) also known as RAID 2 among others. Later posts in this series look at point in time data protection to support recovery to a given time (e.g. RPO), while this and the previous post look at maintaining access and availability.

Keep in mind that if something can fail, it probably will, also that everything is not the same meaning different environments, application workloads (along with their data). Different environments and applications have diverse performance, availability, capacity economic (PACE) attributes, along with service level objectives ( SLOs). Various SLOs include PACE attributes, recovery point objectives ( RPO), recovery time objective ( RTO) among others.

Availability, accessibility and durability (see part two in this series) along with associated RAS topics are part of what enable RTO, as well as meet Faults (or failures) to tolerate ( FTT). This means that different fault tolerance modes ( FTM) determine what technologies, tools, trends and techniques to use to meet different RTO, FTT and application PACE needs.

Maintaining access and availability along with durability (e.g. how many copies of data as well as where stored) protects against loss or failure of a component device ( SSD, HDDs, adapters, power supply, controller), node or system, appliance, server, rack, clusters, stamps, data center, availability zones, regions, or other Fault or Failure domains spanning hardware, software, and services.

Figure 1.5 Data Infrastructures and other IT Infrastructure Layers

Data Protection Access Availability RAID Erasure Codes

This is a good place to mention some context for RAID and RAID array, which can mean different things pertaining to Data Protection. Some people associate RAID with a hardware storage array, or with a RAID card. Other people consider an array to be a storage array that is a RAID enabled storage system. A trend is to refer to legacy storage systems as RAID arrays or hardware-based RAID, to differentiate from newer implementations.

Context comes into play in that a RAID group (i.e., a collection of HDDs or SSD that is part of a RAID set) can be referred to as an array, a RAID array, or a virtual array. What this means is that while some RAID implementations may not be relevant, there are many new and evolving variations extending parity based protection making at least software-defined RAID still relevant

Keep context in mind, and don’t be afraid to ask what someone is referring to: a particular vendor storage system, a RAID implementation or packaging, a storage array, or a virtual array. Also keep the context of the virtual array in perspective vs. storage virtualization and virtual storage. RAID as a term is used to refer to different modes such as mirroring or parity, and parity can be legacy RAID 4, 5, or 6 along with erasure codes (EC). Note some people refer to erasure codes in the context of not being a RAID system, which can be an inference to not being a legacy storage system running hardware RAID (e.g. not software or software defined).

The following figure (9.13) shows various availability protection schemes (e.g. not recovery point) that maintain access while protecting against loss of a component, device, system, server, site, region or other part of a fault domain. Since everything is not the same with environments and applications having different Performance Availability Capacity Economic ( PACE) attributes, there are various approaches for enabling availability along with accessibility.

Keep in mind that RAID and Erasure codes along with their various, as well as replication and mirroring by themselves are not a replacement for backup or other point in time (e.g. enable recovery point) protection.

Instead, availability technologies such as RAID and erasure code along with mirror as well as replication need to be combined with snapshots, point in time copies, consistency points, checkpoints, backups among other recovery point protection for complete data protection.

Speaking of replacement for backup, while many vendors and their pundits claIm or want to see backup as being dead, as long as they keep talking about backup instead of broader data protection backup will remain alive.

SDDC SDDI RAID Parity Erasure Code EC
Figure 9.13 Various RAID, Mirror, Parity and Erasure Code (EC) approaches

Different RAID levels (including parity, EC, LRC and RS based) will affect storage energy effectiveness, similar to various SSD or HDD performance capacity characteristics; however, a balance of performance, availability, capacity, and energy needs to occur to meet application service needs. For example, RAID 1 mirroring or RAID 10 mirroring and striping use more HDDs and, thus, power, but will yield better performance than RAID 6 and erasure code parity protection.

	Normal performance	Availability	Performance overhead	Rebuild overhead	Availability overhead
RAID 0 (stripe)	Very good read & write	None	None	Full volume restore	None
RAID 1 (mirror or replicate)	Good reads; writes = device speed	Very good; two or more copies	Multiple copies can benefit reads	Re-synchronize with existing volume	2:1 for dual, 3:1 for three-way copies
RAID 4 (stripe with dedicated parity, i.e., 4 + 1 = 5 drives total)	Poor writes without cache	Good for smaller drive groups and devices	High on write without cache (i.e., parity)	Moderate to high, based on number and type of drives	Varies; 1 Parity/N, where N = number of devices
RAID 5 (stripe with rotating parity, 4 + 1 = 5 drives)	Poor writes without cache	Good for smaller drive groups and devices	High on write without cache (i.e., parity)	Moderate to high, based on number and type of drives	Varies 1 Parity/N, where N = number of devices
RAID 6 (stripe with dual parity, 4 + 2 = 6 drives)	Poor writes without cache	Better for larger drive groups and devices	High on write without cache (i.e., parity)	Moderate to high, based on number and type of drives	Varies; 2 Parity/N, where N = number of devices
RAID 10 (mirror and stripe)	Good	Good	Minimum	Re-synchronize with existing volume	Twice mirror capacity stripe drives
Reed-Solomon (RS) parity, also known as erasure code (EC), local reconstruction code (LRC), and SHEC	Ok for reads, slow writes; good for static and cold data with front-end cache	Good	High on writes (CPU for parity calculation, extra I/O operations)	Moderate to high, based on number and type of drives, how implemented, extra I/Os for reconstruction	Varies, low overhead when using large number of devices; CPU, I/O, and network overhead.

Table 9.3 Common RAID Characteristics

Besides those shown in table 9.3, other RAID including parity based approaches include 2 (Reed Solomon), 3 (synchronized stripe and dedicated parity) along with others including combinations such as 10, 01, 50, 60 among others.

Similar to legacy parity-based RAID, some erasure code implementations use narrow drive groups while others use larger ones to increase protection and reduce capacity overhead. For example, some larger enterprise-class storage systems (RAID arrays) use narrow 3 + 1 or 4 + 1 RAID 5 or 4 + 2 or 6 + 2 RAID 6, which have higher protection storage capacity overhead and fault=impact footprint.

On the other hand, many smaller mid-range and scale-out storage systems, appliances, and solutions support wide stripes such as 7 + 1, 15 + 1, or larger RAID 5, or 14 + 2 or larger RAID 6. These solutions trade the lower storage capacity protection overhead for risk of a multiple drive failures or impacts. Similarly, some EC implementations use relatively small groups such as 6, 2 (8 drives) or 4, 2 (6 drives), while others use 14, 4 (18 drives), 16, 4 (20 drives), or larger.

Table 9.4 shows options for a number of data devices (k) vs. a number of protect devices (m).

k (data devices)	m (protect devices)	Availability; Resiliency	Space capacity overhead	Normal performance	FTT	Comments; Examples
Narrow	Wide	Very good; Low impact of rebuild	Very high	Good (R/W)	Very good	Trade space for RAS; Larger m vs. k; 1, 1; 1, 2; 2, 2; 4, 5
Narrow	Narrow	Good	Good	Good (R/W)	Good	Use with smaller drive groups; 2, 1; 3, 1; 6, 2
Wide	Narrow	Ok to good; With larger m value	Low as m gets larger	Good (read); Writes can be slow	Ok to good	Smaller m can impact rebuild; 3, 1; 7, 1; 14, 2; 13, 3
Wide	Wide	Very good; Balanced	High	Good	Very good	Trade space for RAS; 2, 2; 4, 4; 8, 4; 18, 6

Table 9.4. Comparing Various Data Device vs. Protect Device Configurations

Note that wide k with no m, such as 4, 0, would not have protection. If you are focused on reducing costs and storage space capacity overhead, then a wider (i.e., more devices) with fewer protect devices might make sense. On the other hand, if performance, availability, and minimal to no impact during rebuild or reconstruction are important, then a narrower drive set, or a smaller ratio of data to protect drives, might make sense.

Also note that the higher or larger the RAID number, or parity scheme, or number of "m" devices in a parity and erasure code group may not be better, likewise smaller may not be better. What is better is which approach meets your specific application performance, availability, capacity, economic (PACE) needs, along with SLO, RTO, RPO requirements. What can also be good is to use hybrid approaches combining different technologies and tools to facilitate both access, availability, durability along with point in time recovery across different layers of granularity (e.g. device, drive, adapter, controller, cabinet, file system, data center, etc).

Some focus on the lower level RAID as the single or primary point of protection, however watch out for that being your single point of failure as well. For example, instead of building a resilient RAID 10 and then neglecting to have adequate higher level access, as well as recovery point protection, combine different techniques including file system protection, snapshots, and backups among others.

Figure 9.14 shows various options and considerations for balancing between too many or too few data (k) and protect (m) devices. The balance is about enabling particular FTT along with PACE attributes and SLO. This means, for some environments or applications, using different failure-tolerant modes ( FTM) in various combinations as well as configurations.

SDDC SDDI Data Protection
Figure 9.14 Comparing various data drive to protection devices

Figure 9.14 top shows no protection overhead (with no protection); the bottom shows 13 data drives and three protection drives in an EC (RS or LRC among others) configuration that could tolerate three devices failing before loss of data or access occurs. In between are various options that can also be scaled up or down across a different number of devices ( HDDs, SSD, or systems).

Some solutions allow the user or administrator to configure the I/O chunk, slabs, shard, or stripe size, for example, from 8 KB to 256 KB to 1 MB (or larger), aligning with application workload and I/O profiles. Other options include the ability to set or disable read-ahead, write-through vs. write-back cache (with battery-protected cache), among other options.

The width or number of devices in a RAID parity or erasure group is based on a combination of factor, including how much data is to be stored and what your FTT objective is, along with spreading out protection overhead. Another consideration is whether you have large or small files and objects.

For example, if you have many small files and a wide stripe, parity, or erasure code set with a large chunk or shard size, you may not have an optimal configuration from a performance perspective.

The following figure shows combing various data protection availability and accessibility technologies including local as well as remote mirroring and replication, along with parity or erasure code (including LRC, RS, SHEC among others) approaches. Instead of just using one technology, a hybrid approach is used leveraging mirror (local on SSD) and replication across sites including asynchronous and synchronous. Replication modes include Asynchronous (time-delayed, eventual consistency) for longer distance, higher latency networks, and synchronous (strong consistency, real-time) for short distance or low-latency networks.

Note that the mirror and replication can be done in software deployed as part of a storage system, appliance or as tin-wrapped software, virtual machine, virtual storage appliance, container or some other deployment mode. Likewise RAID, parity and erasure code software can be deployed and packaged in different ways.

In addition to mirror and replication, solutions are also using parity based including erasure code variations for lower cost, less active data. In other words, the mirror on SSD handles active hot data, as well as any buffering or cache, while lower performance, higher capacity, lower cost data gets de-staged or migrated to a parity erasure code tier. Some vendors, service provider and solutions leveraging variations of the approach in figure 9.15 include Microsoft ( Azure and Windows) and VMware among others.

SDDC SDDI Data Protection
Figure 9.15 Combining various availability data protection techniques

A tradecraft skill is finding the balance, knowing your applications, the data, and how the data is allocated as well as used, then leveraging that insight and your experience to configure to meet your application PACE requirements.

Consider:

Number of drives (width) in a group, along with protection copies or parity
Balance rebuild performance impact and time vs. storage space overhead savings
Ability to mix and match various devices in different drive groups in a system
Management interface, tools, wizards, GUIs, CLIs, APIs, and plug-ins
Different approaches for various applications and environments
Context of a physical RAID array, system, appliance, or solution vs. logical

Erasure Codes (EC)

Erasure Codes ( EC) combines advanced protection with variable space capacity overhead over many drives, devices, or systems using large parity chunks, shards compared to traditional parity RAID approaches. There are many variations of EC as well as parity based approaches, some are tied to Reed Solomon (RS) codes while others use different approaches.

Note that some EC are optimized for reducing the overhead and cost of storing data (e.g. less space capacity) for inactive, or primarily read data. Likewise, some EC or variations are optimized for performance of reads/writes as well as reducing overhead of rebuild, reconstructions, repairs with least impact. Which EC or parity derivative approach is best depends on what you are trying to do or impact to avoid.

Reed Solomon (RS) codes

Reed Solomon (RS) codes are advanced parity protection mathematical algorithm technique that works well on large amounts of data providing protection with lower space capacity overhead depending on how configured. Many Erasure Codes (EC) are based on derivatives of RS. Btw, did you know (or remember) that RAID 2 (rarely used with few legacy implementations) has ties to RS codes? Here are some additional links to RS including via Backblaze, CMU, and Dr Dobbs.

Local Reconstruction Codes (LRC)

Microsoft leverages LRC in Azure as well as in Windows Servers. LRC are optimized for a balance of protection, space capacity savings, normal performance as well as reducing impact on running workloads during a repair, rebuild or reconstruction. One of the tradeoffs that LRC uses is to add some amount of additional space capacity in exchange for normal and abnormal (e.g. during repair) performance improvements. Where RS, EC and other parity based derivatives typically use a (k,m) nomenclature (e.g. data, protection), LRC adds an extra variable to help with constructions (k,m,n).

Some might argue that LRC are not as space efficient as other EC, RS or parity derivative variations of which the counter argument can be that some of those approaches are not as performance effective. In other words, everything is not the same, one approach does not or should not have to be applied to all, unless of course your preferred solution approach can only do one thing.

Additional LRC related material includes:

(PDF by Microsoft) LRC Erasure Coding in Windows Storage Spaces
(Microsoft Usenix Paper) Best Paper Award Erasure Coding in Azure
(Via MSDN Shared) Azure Storage Erasure Coding with LRC
(Via Microsoft) Azure Storage with Strong Consistency
(Paper via Microsoft) 23rd ACM Symposium on Operating Systems Principles (SOSP)
(Microsoft) Erasure Coding in Azure with LRC
(Via Microsoft) Good collection of EC, RS, LRC and related material
(Via Microsoft) Storage Spaces Fault Tolerance
(Via Microsoft) Better Way To Store Data with EC/LRC
(Via Microsoft) Volume resiliency and efficiency in Storage Spaces

Shingled Erasure Code (SHEC)

Shingled Erasure Codes (SHEC) are a variation of Erasure Codes leveraging shingled overlay approach similar to what is being used in Shingled Magnetic Recording (SMR) on some HDDs. Ceph has been an early promoter of SHEC, read more here, and here.

Replication and Mirroring

Replication and Mirroring create a mirror or replica copy of data across different devices, systems, servers, clusters, sites or regions. In addition to keeping a copy, mirror and replication can occur on different time intervals such as real-time ( synchronous) and time deferred (Asynchronous). Besides time intervals, mirror and replication are implemented in different locations at various altitudes or stack layers from lower level hardware adapter or storage systems and appliances, to operating systems, hypervisors, software defined storage, volume managers, databases and applications themselves.

Covered in more detail in chapters 5 and 6, synchronous provides real-time, strong consistency, although high-latency local or remote interfaces can impact primary application performance. Note there is a common myth that high-latency networks are only long distance when in fact some local networks can also be high-latency. Asynchronous (also discussed in more depth in chapters 5 and 6) enables local and remote high-latency communications to be spanned, facilitating protection over a distance without impacting primary application performance, albeit with lower consistency, time deferred, also known as eventual consistency.

Mirroring (also known as RAID 1) and replication creates a copy (a mirror or replica) across two or more storage targets (devices, systems, file systems, cloud storage service, applications such as a database). The reason for using mirrors is to provide a faster (for normal running and during recovery) failure-tolerant mode for enabling availability, resiliency, and data protection, particularly for active data.

Figure 9.10 shows general replication scenarios. Illustrated are two basic mirror scenarios: At the top, a device, volume, file system, or object bucket is replicated to two other targets (i.e., three-way or three replicas); At the bottom, is a primary storage device using a hybrid replica and dispersal technique where multiple data chunks, shards, fragments, or extents are spread across devices in different locations.

SDDC SDDI Mirror and Replication
Figure 9.10 Various Mirror and Replication Approaches

Mirroring and replication can be done locally inside a system (server, storage system, or appliance), within a cabinet, rack, or data center, or remotely, including at cloud services. Mirroring can also be implemented inside a server in software or using RAID and HBA cards to off-load the processing.

SDDC SDDI Mirror Replication Techniques
Figure 9.11 Mirror or Replication combined with Snapshots or other PiT protection

Keep in mind that mirroring and replication by themselves are not a replacement for backups, versions, snapshots, or another recovery point, time-interval (time-gap) protection. The reason is that replication and mirroring maintain a copy of the source at one or more destination targets. What this means is that anything that changes on the primary source also gets applied to the target destination (mirror or replica). However, it also means that anything changed, deleted, corrupted, or damaged on the source is also impacted on the mirror replica (assuming the mirror or replicas were or are mounted and accessible on-line).

implementations in various locations (hardware, software, cloud) include:

Applications and databases such as SQL Server, Oracle among others
File systems, volume manager, Software-defined storage managers
Third-party storage software utilities and drivers
Operating systems and hypervisors
Hardware adapter and off-load devices
Storage systems and appliances
Cloud and managed services

Where To Learn More

Part 1 – Data Infrastructure Data Protection Fundamentals
Part 2 – Reliability, Availability, Serviceability ( RAS) Data Protection Fundamentals
Part 3 – Data Protection Access Availability RAID Erasure Codes ( EC) including LRC
Part 4 – Data Protection Recovery Points (Archive, Backup, Snapshots, Versions)
Part 5 – Point In Time Data Protection Granularity Points of Interest
Part 6 – Data Protection Security Logical Physical Software Defined
Part 7 – Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends
Part 8 – Data Protection Diaries Walking Data Protection Talk
Part 9 – who’s Doing What ( Toolbox Technology Tools)
Part 10 – Data Protection Resources Where to Learn More
Revisiting RAID storage remains relevant and resources
Data Infrastructure server storage I/O network Recommended Reading List Book Shelf
Data Protection Diaries series
Software Defined Data Infrastructure Essentials (CRC 2017) Book

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

What This All Means

There are various data protection technologies, tools and techniques for enabling availability of information resources including applications, data and data Infrastructure resources. Likewise there are many different aspects of RAID as well as context from legacy hardware based to cloud, virtual, container and software defined. In other words, not all RAID is in legacy storage systems, and there is a lot of FUD about RAID in general that is probably actually targeted more at specific implementations or products.

There are different approaches to meet various needs from stripe for performance with no protection by itself, to mirror and replication, as well as many parity approaches from legacy to erasure codes including Reed Solomon based as well as LRC among others. Which approach is best depends on your objects including balancing performance, availability, capacity economic (PACE) for normal running behavior as well as during faults and failure modes.

Get your copy of Software Defined Data Infrastructure Essentials here at Amazon.com, at CRC Press among other locations and learn more here. Meanwhile, continue reading with the next post in this series, Part 4 Data Protection Recovery Points (Archive, Backup, Snapshots, Versions).

Ok, nuff said, for now.

November 26, 2017November 26, 2023

Data Protection Fundamentals Recovery Points (Backup, Snapshots, Versions)

Enabling Recovery Points (Backup, Snapshots, Versions)

Updated 1/7/18

Companion to Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft ( CRC Press 2017)

server storage I/O data infrastructure trends

By Greg Schulz – www.storageioblog.com November 26, 2017

This is Part 4 of a multi-part series on Data Protection fundamental tools topics techniques terms technologies trends tradecraft tips as a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).

Click here to view the previous post Part 3 Data Protection Access Availability RAID Erasure Codes (EC) including LRC, and click here to view the next post Part 5 Point In Time Data Protection Granularity Points of Interest.

In this post the focus is around Data Protection Recovery Points (Archive, Backup, Snapshots, Versions) from Chapter 10 .

Figure 1.5 Data Infrastructures and other IT Infrastructure Layers

Enabling RPO (Archive, Backup, CDP, PIT Copy, Snapshots, Versions)

SDDC SDDI Data Protection Points of Interests
Figure 9.5 Data Protection and Availability Points of Interest

RAID, including parity and erasure code (EC) along with mirroring and replication, provide availability and accessibility. These by themselves, however, are not a replacement for backup (or other point in time data protection) to support recovery points. For complete data protection the solution is to combine resiliency technology with point-in-time tools enabling availability and facilitate going back to a previous consistency time.

Recovery point protection is implemented within applications using checkpoint and consistency points as well as log and journal switches or flush. Other places where recovery-point protection occurs include in middleware, database, key-value stores and repositories, file systems, volume managers, and software-defined storage, in addition to hypervisors, operating systems, containers, utilities, storage systems, appliances, and service providers.

In addition to where, there are also different approaches, technologies, techniques, and tools, including archive, backup, continuous data protection, point-in-time copies, or clones such as snapshots, along with versioning.

Common recovery point Data Protection related terms, technologies, techniques, trends and topics pertaining to data protection from availability and access to durability and consistency to point in time protection and security are shown below.

Time interval protection for example with Snapshot, backup/restore, point in time copies, checkpoints, consistency point among other approaches can be scheduled or dynamic. They can also vary by how they copy data for example full copy or clone, or incremental and differential (e.g. what has changed) among other techniques to support 4 3 2 1 data protection. Other variations include how many concurrent copies, snapshots or versions can take place, along with how many stored and for how long (retention).

Additional Data Protection Terms

Copy Data Management ( CDM) as its name implies is associated managing various data copies for data protection, analytics among other activities. This includes being able to identify what copies exist (along with versions), where they are located among other insight.

Data Protection Management ( DPM) as its name implies is the management of data protection from backup/restore, to snapshots and other recovery point in time protection, to replication. This includes configuration, monitoring, reporting, analytics, insight into what is protected, how well it is protected, versions, retention, expiration, disposition, access control among other items.

Number of 9s Availability – Availability (access or durability or access and availability) can be expressed in number of nines. For example, 99.99 (four nines), indicates the level of availability (downtime does not exceed) objective. For example, 99.99% availability means that in a 24-hour day there could be about 9 seconds of downtime, or about 52 minutes and 34 seconds per year. Note that numbers can vary depending on whether you are using 30 days for a month vs. 365/12 days, or 52 weeks vs. 365/7 for weeks, along with rounding and number decimal places as shown in Table 9.1.

Uptime	24-hour Day	Week	Month	Year
99	0 h 14 m 24 s	1 h 40 m 48 s	7 h 18 m 17 s	3 d 15 h 36 m 15 s
99.9	0 h 01 m 27 s	0 h 10 m 05 s	0 h 43 m 26 s	0 d 08 h 45 m 36 s
99.99	0 h 00 m 09 s	0 h 01 m 01 s	0 h 04 m 12 s	0 d 00 h 52 m 34 s
99.999	0 h 00 m 01s	0 h 00 m 07 s	0 h 00 m 36 s	0 d 00 h 05 m 15 s

Table 9.1 Number of 9’s Availability Shown as Downtime per Time Interval

Service Level Objectives SLOs are metrics and key performance indicators (KPI) that guide meeting performance, availability, capacity, and economic targets. For example, some number of 9’s availability or durability, a specific number of transactions per second, or recovery and restart of applications. Service-level agreement (SLA) – SLA specifies various service level objectives such as PACE requirements including RTO and RPO, among others that define the expected level of service and any remediation for loss of service. SLA can also specify availability objectives as well as penalties or remuneration should SLO be missed.

Recovery Time Objective RTO is how much time is allowed before applications, data, or data infrastructure components need to be accessible, consistent, and usable. An RTO = 0 (zero) means no loss of access or service disruption, i.e., continuous availability. One example is an application end-to-end RTO of 4 hours, meaning that all components (application server, databases, file systems, settings, associated storage, networks) must be restored, rolled back, and restarted for use in 4 hours or less.

Another RTO example is component level for different data infrastructure layers as well as cumulative or end to end. In this scenario, the 4 hours includes time to recover, restart, and rebuild a server, application software, storage devices, databases, networks, and other items. In this scenario, there are not 4 hours available to restore the database, or 4 hours to restore the storage, as some time is needed for all pieces to be verified along with their dependencies.

Data Loss Access DLA occurs when data still exists, is consistent, durable, and safe, but it cannot be accessed due to network, application, or other problem. Note that the inverse is data that can be accessed, but it is damaged. Data Loss Event DLE is an incident that results in loss or damage to data. Note that some context is needed in a scenario in which data is stolen via a copy but the data still exists, vs. the actual data is taken and is now missing (no copies exist). Also note that there can be different granularity as well as scope of DLE for example all data or just some data lost (or damaged). Data Loss Prevention DLP encompasses the activities, techniques, technologies, tools, best practices, and tradecraft skills used to protect data from DLE or DLA.

Point in Time (PiT) such as PiT copy or data protection refers to a recovery or consistency point where data can be restored from or to (i.e., RPO), such as from a copy, snapshot, backup, sync, or clone. Essentially, as its name implies, it is the state of the data at that particular point in time.

Recovery Point Objective RPO is the point in time to which data needs to be recoverable (i.e., when it was last protected). Another way of looking at RPO is how much data you can afford to lose, with RPO = 0 (zero) meaning no data loss, or, for example, RPO = 5 minutes being up to 5 minutes of lost data.

SDDC SDDI RTO RPO
Figure 9.8 Recovery Points (point in time to recover from), and Recovery Time (how long recovery takes)

Frequency refers to how often and on what time interval protection is performed.

4 3 2 1 and 3 2 1 data protection rule
Figure 9.4 Data Protection 4 3 2 1 and 3 2 1 rule

In the context of the 4 3 2 1 rule, enabling RPO is associated with durability, meaning number of copies and versions. Simply having more copies is not sufficient because if they are all corrupted, damaged, infected, or contain deleted data, or data with latent nefarious bugs or root kits, then they could all be bad. The solution is to have multiple versions and copies of the versions in different locations to provided data protection to a given point in time.

Timeline and delta or recovery points are when data can be recovered from to move forward. They are consistent points in the context of what is/was protected. Figure 10.1 shows on the left vertical axis different granularity, along with protection and consistency points that occur over time (horizontal axis). For example, data “Hello” is written to storage (A) and then (B), an update is made “Oh Hello,” followed by (C) full backup, clone, and master snapshot or a gold copy is made.

SDDC SDDI Data Protection Recovery consistency points
Figure 10.1 Recovery and consistency points

Next, data is changed (D) to “Oh, Hello,” followed by, at time-1 (E), an incremental backup, copy, snapshot. At (F) a full copy, the master snapshot, is made, which now includes (H) “Hello” and “Oh, Hello.” Note that the previous full contained “Hello” and “Oh Hello,” while the new full (H) contains “Hello” and “Oh, Hello.” Next (G) data is changed to “Oh, Hello there,” then changed (I) to “Oh, Hello there I’m here.” Next (J) another incremental snapshot or copy is made, date is changed (K) to “Oh, Hello there I’m over here,” followed by another incremental (L), and other incremental (M) made a short time later.

At (N) there is a problem with the file, object, or stored item requiring a restore, rollback, or recovery from a previous point in time. Since the incremental (M) was too close to the recovery point (RP) or consistency point (CP), and perhaps damaged or its consistency questionable, it is decided to go to (O), the previous snapshot, copy, or backup. Alternatively, if needed, one can go back to (P) or (Q).

Note that simply having multiple copies and different versions is not enough for resiliency; some of those copies and versions need to be dispersed or placed in different systems or locations away from the source. How many copies, versions, systems, and locations are needed for your applications will depend on the applicable threat risks along with associated business impact.

The solution is to combine techniques for enabling copies with versions and point-in-time protection intervals. PIT intervals enable recovering or access to data back in time, which is a RPO. That RPO can be an application, transactional, system, or other consistency point, or some other time interval. Some context here is that there are gaps in protection coverage, meaning something was not protected.

A good data protection gap is a time interval enabling RPO, or simply a physical and logical break and the distance between the active or protection copy, and alternate versions and copies. For example, a gap in coverage (e.g. bad data protection gap) means something was not protected.

A protection air or distance gap is having one of those versions and copies on another system, in a different location and not directly accessible. In other words, if you delete, or data gets damaged locally, the protection copies are safe. Furthermore, if the local protection copies are also damaged, an air or distance gap means that the remote or alternate copies, which may be on-line or off-line, are also safe.

Good Data Protection Gaps
Figure 9.9 Air Gaps and Data Protection

Figure 10.2 shows on the left various data infrastructure layers moving from low altitude (lower in the stack) host servers or bare metal (BM) physical machine (PM) and up to higher levels with applications. At each layer or altitude, there are different hardware and software components to protect, with various policy attributes. These attributes, besides PACE, FTT, RTO, RPO, and SLOs, include granularity (full or incremental), consistency points, coverage, frequency (when protected), and retention.

SDDC SDDI Data Protection Granularity
Figure 10.2 Protecting data infrastructure granularity and enabling resiliency at various stack layers (or altitude)

Also shown in the top left of Figure 10.2 are protections for various data infrastructure management tools and resources, including active directory (AD), Azure AD (AAD), domain controllers (DC), group policy objects (GPO) and organizational units (OU), network DNS, routing and firewall, among others. Also included are protecting management systems such as VMware vCenter and related servers, Microsoft System Center, OpenStack, as well as data protection tools along with their associated configurations, metadata, and catalogs.

The center of Figure 10.2 lists various items that get protected along with associated technologies, techniques, and tools. On the right-hand side of Figure 10.2 is an example of how different layers get protected at various times, granularity, and what is protected.

For example, the PM or host server BIOS and UEFI as well as other related settings seldom change, so they do not have to be protected as often. Also shown on the right of Figure 10.2 are what can be a series of full and incremental backups, as well as differential or synthetic ones.

Figure 10.3 is a variation of Figure 10.2 showing on the left different frequencies and intervals, with a granularity of focus or scope of coverage on the right. The middle shows how different layers or applications and data focus have various protection intervals, type of protection (full, incremental, snap, differentials), along with retention, as well as some copies to keep.

SDDC SDDI Data Protection Granularity
Figure 10.3 Protecting different focus areas with various granularities

Protection in Figures 10.2 and 10.3 for the PM could be as simple as documentation of what settings to configure, versions, and other related information. A hypervisors may have changes, such as patches, upgrades, or new drivers, more frequently than a PM. How you go about protecting may involve reinstalling from your standard or custom distribution software, then applying patches, drivers, and settings.

You might also have a master copy of a hypervisors on a USB thumb drive or another storage device that can be cloned, customized with the server name, IP address, log location, and other information. Some backup and data protection tools also provide protection of hypervisors (or containers and cloud machine instances) in addition to the virtual machine (VM), guest operating systems, applications, and data.

The point is that as you go up the stack, higher in altitude (layers), the granularity and frequency of protection increases. What this means is that you may have more frequent smaller protection copies and consistency points higher up at the application layer, while lower down, less frequent, yet larger full image, volume, or VM protection, combining different tools, technology, and techniques.

Where To Learn More

Part 1 – Data Infrastructure Data Protection Fundamentals
Part 2 – Reliability, Availability, Serviceability ( RAS) Data Protection Fundamentals
Part 3 – Data Protection Access Availability RAID Erasure Codes ( EC) including LRC
Part 4 – Data Protection Recovery Points (Archive, Backup, Snapshots, Versions)
Part 5 – Point In Time Data Protection Granularity Points of Interest
Part 6 – Data Protection Security Logical Physical Software Defined
Part 7 – Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends
Part 8 – Data Protection Diaries Walking Data Protection Talk
Part 9 – who’s Doing What ( Toolbox Technology Tools)
Part 10 – Data Protection Resources Where to Learn More
Data Protection Diaries series
Data Infrastructure server storage I/O network Recommended Reading List Book Shelf
Software Defined Data Infrastructure Essentials (CRC 2017) Book

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

What This All Means

Everything is not the same across different environments, data centers, data infrastructures, applications and their workloads (along with data, and its value). Likewise there are different approaches for enabling data protection to meet various SLO needs including RTO, RPO, RAS, FTT and PACE attributes among others. What this means is that complete data protection requires using different new (and old) tools, technologies, trends, services (e.g. cloud) in new ways. This also means leveraging existing and new techniques, learning from lessons of the past to prevent making the same errors.

RAID (mirror, replicate, parity including erasure codes) regardless of where and how implemented (hardware, software, legacy, virtual, cloud) by itself is not a replacement for backup, they need to be combined with recovery point protection of some type (backup, checkpoint, consistency point, snapshots). Also protection should occur at multiple levels of granularity (device, system, application, database, table) to meet various SLO requirements as well as different time intervals enabling 4 3 2 1 data protection.

Keep in mind what is it that you are protecting, why are you protecting it and against what, what is likely to happen, also if something happens what will its impact be, what are your SLO requirements, as well as minimize impact to normal operating, as well as during failure scenarios. For example do you need to have a full system backup to support recovery of an individual database table, or can that table be protected and recovered via checkpoints, snapshots or other fine-grained routine protection? Everything is not the same, why treat and protect everything the same way?

Get your copy of Software Defined Data Infrastructure Essentials here at Amazon.com, at CRC Press among other locations and learn more here. Meanwhile, continue reading with the next post in this series, Part 5 Point In Time Data Protection Granularity Points of Interest.

Ok, nuff said, for now.

November 26, 2017November 26, 2023

Data Protection Diaries Fundamental Point In Time Granularity Points of Interest

Data Protection Diaries Fundamental Point In Time Granularity

Companion to Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft ( CRC Press 2017)

server storage I/O data infrastructure trends

By Greg Schulz – www.storageioblog.com November 26, 2017

This is Part 5 of a multi-part series on Data Protection fundamental tools topics techniques terms technologies trends tradecraft tips as a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).

Click here to view the previous post Part 4 Data Protection Recovery Points (Archive, Backup, Snapshots, Versions), and click here to view the next post Part 6 Data Protection Security Logical Physical Software Defined.

In this post the focus is around Data Protection points of granularity, addressing different layers and stack altitude (higher application and lower system level) Chapter 10 . among others.

Point-in-Time Protection Granularity Points of Interest

SDDC SDDI Data Protection Recovery consistency points
Figure 10.1 Recovery and consistency points

Figure 10.1 above is a refresh from previous posts about the role and importance of having various recovery points at different time intervals to enable data protection (and restoration). Building upon figure 10.1, figure 10.5 looks at different granularity of where and how data should be protected. Keep in mind that everything is not the same, so why treat everything the same with the same type of protection?

Figure 10.5 shows backup and Data Protection focus, granularity, and coverage. For example, at the top left is less frequent protection of the operating system, hypervisors, and BIOS, UEFI settings. At the middle left is volume, or device level protection (full, incremental, differential), along with various views on the right ranging from protecting everything, to different granularity such as file system, database, database logs and journals, and operating system (OS) and application software, along with settings.

SDDC SDDI Different Protection Granularity
Figure 10.5 Backup and data protection focus, granularity, and coverage

In Figure 10.5, note that the different recovery point focus and granularity also take into consideration application and data consistency (as well as checkpoints), along with different frequencies and coverage (e.g. full, partial, incremental, incremental forever, differential) as well as retention.

Tip – Some context is needed about object backup and backing up objects, which can mean different things. As mentioned elsewhere, objects refer to many different things, including cloud and object storage buckets, containers, blobs, and objects accessed via S3 or Swift, among other APIs. There are also database objects and entities, which are different from cloud or object storage objects.

Another context factor is that an object backup can refer to protecting different systems, servers, storage devices, volumes, and entities that collectively comprise an application such as accounting, payroll, or engineering, vs. focusing on the individual components. An object backup may, in fact, be a collection of individual backups, PIT copies, and snapshots that combined represent what’s needed to restore an application or system.

On the other hand, the content of a cloud or object storage repository ( buckets, containers, blobs, objects, and metadata) can be backed up, as well as serve as a destination target for protection.

Backups can be cold and off-line like archives, as well as on-line and accessible. However, the difference between the two, besides intended use and scope, is granularity. Archives are intended to be coarser and less frequently accessed, while backups can be more frequently and granular accessed. Can you use a backup for an archive and vice versa? A qualified yes, as an archive could be a master gold copy such as an annual protection copy, in addition to functioning in its role as a compliance and retention copy. Likewise, a full backup set to long-term retention can provide and enable some archive functions.

Where To Learn More

Part 1 – Data Infrastructure Data Protection Fundamentals
Part 2 – Reliability, Availability, Serviceability ( RAS) Data Protection Fundamentals
Part 3 – Data Protection Access Availability RAID Erasure Codes ( EC) including LRC
Part 4 – Data Protection Recovery Points (Archive, Backup, Snapshots, Versions)
Part 5 – Point In Time Data Protection Granularity Points of Interest
Part 6 – Data Protection Security Logical Physical Software Defined
Part 7 – Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends
Part 8 – Data Protection Diaries Walking Data Protection Talk
Part 9 – who’s Doing What ( Toolbox Technology Tools)
Part 10 – Data Protection Resources Where to Learn More
Data Protection Diaries series
Data Infrastructure server storage I/O network Recommended Reading List Book Shelf
Software Defined Data Infrastructure Essentials (CRC 2017) Book

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

What This All Means

A common theme in this series as well as in my books, webinars, seminars and general approach to data infrastructures, data centers and IT in general is that everything is not the same, why treat it all the same? What this means is that there are differences across various environments, data centers, data infrastructures, applications, workloads and data. There are also different threat risks scenarios (e.g. threat vectors and attack surface if you like vendor industry talk) to protect against.

Rethinking and modernizing data protection means using new (and old) tools in new ways, stepping back and rethinking what to protect, when, where, why, how, with what. This also means protecting in different ways at various granularity, time intervals, as well as multiple layers or altitude (higher up the application stack, or lower level).

Get your copy of Software Defined Data Infrastructure Essentials here at Amazon.com, at CRC Press among other locations and learn more here. Meanwhile, continue reading with the next post in this series, Part 6 Data Protection Security Logical Physical Software Defined.

Ok, nuff said, for now.

November 26, 2017November 26, 2023

Data Infrastructure Data Protection Diaries Fundamental Security Logical Physical

Data Infrastructure Data Protection Security Logical Physical

Companion to Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft ( CRC Press 2017)

server storage I/O data infrastructure trends

By Greg Schulz – www.storageioblog.com November 26, 2017

This is Part 6 of a multi-part series on Data Protection fundamental tools topics techniques terms technologies trends tradecraft tips as a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).

Click here to view the previous post Part 5 – Point In Time Data Protection Granularity Points of Interest, and click here to view the next post Part 7 – Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends.

In this post the focus is around Data Infrastructure and Data Protection security including logical as well as physical from chapter 10 , 13 and 14 among others.

Figure 1.5 Data Infrastructures and other IT Infrastructure Layers

There are many different aspects of security pertaining to data infrastructures that span various technology domains or focus areas from higher level application software to lower level hardware, from legacy to cloud an software-defined, from servers to storage and I/O networking, logical and physical, from access control to intrusion detection, monitoring, analytics, audit, monitoring, telemetry logs, encryption, digital forensics among many others. Security should not be an after thought of something done independent of other data infrastructure, data center and IT functions, rather integrated.

Security Logical Physical Software Defined

Physical security includes locked doors of facilities, rooms, cabinets or devices to prevent un-authorized access. In addition to locked doors, physical security also includes safeguards to prevent accidental or intentional acts that would compromise the contents of a data center including data Infrastructure resources (servers, storage, I/O networks, hardware, software, services) along with the applications that they support.

Logical security includes access controls, passwords, event and access logs, encryption among others technologies, tools, techniques. Figure 10.11 shows various data infrastructure security–related items from cloud to virtual, hardware and software, as well as network services. Also shown are mobile and edge devices as well as network connectivity between on-premises and remote cloud services. Cloud services include public, private, as well as hybrid and virtual private clouds (VPC) along with virtual private networks (VPN). Access logs for telemetry are also used to track who has accessed what and when, as well as success along with failed attempts.

Certificates (public or private), Encryption, Access keys including .pem and RSA files via a service provider or self-generated with a tool such as Putty or ssh-keygen among many others. Some additional terms including Two Factor Authentication (2FA), Subordinated, Role based and delegated management, Single Sign On (SSO), Shared Access Signature (SAS) that is used by Microsoft Azure for access control, Server Side Encryption (SSE) with various Key Management System (KMS) attributes including customer managed or via a third-party.

SDDC SDDI Data Protection Security
Figure 10.11 Various physical and logical security and access controls

Also shown in figure 10.11 are encryption enabled at various layers, levels or altitude that can range from simple to complex. Also shown are iSCSI IPsec and CHAP along with firewalls, Active Directory (AD) along with Azure AD (AAD), and Domain Controllers (DC), Group Policies Objects (GPO) and Roles. Note that firewalls can exist in various locations both in hardware appliances in the network, as well as software defined network (SDN), network function virtualization (NFV), as well as higher up.

For example there are firewalls in network routers and appliances, as well as within operating systems, hypervisors, and further up in web blogs platforms such as WordPress among many others. Likewise further up the stack or higher in altitude access to applications as well as database among other resources is also controlled via their own, or in conjunction with other authentication, rights and access control including ADs among others.

A term that might be new for some is attestation which basically means to authenticate and be validated by a server or service, for example, a host guarded server attests with a attestation server. What this means is that the host guarded server (for example Microsoft Windows Server) attests with a known attestation server, that looks at the Windows server comparing it to known good fingerprints, profiles, making sure it is safe to run as a guarded resources.

Other security concerns for legacy and software defined environments include secure boot, shield VMs, host guarded servers and fabrics (networks or clusters of servers) for on-premises, as well as cloud. The following image via Microsoft shows an example of shielded VMs in a Windows Server 2016 environment along with host guarded service (HGS) components ( see how to deploy here).

Via Microsoft.com Guarded Hosts, Shielded VMs and Key Protection Services

Encryption can be done in different locations ranging from data in flight or transit over networks (local and remote), as well as data at rest or while stored. Strength of encryption is determined by different hash and cipher codes algorithms including SHA among others ranging from simple to more complex. The encryption can be done by networks, servers, storage systems, hypervisors, operating systems, databases, email, word and many other tools at granularity from device, file systems, folder, file, database, table, object or blob.

Virtual machine and their virtual disks ( VHDX and VMDK) can be encrypted, as well as migration or movements such as vMotions among other activities. Here are some VMware vSphere encryption topics, along with deep dive previews from VMworld 2016 among other resources here, VMware hardening guides here (NSX, vSphere), and a VMware security white paper (PDF) here.

Other security-related items shown in Figure 10.11 include Lightweight Direct Access Protocol (LDAP), Remote Authentication Dial-In User Service (RADIUS), and Kerberos network authentication. Also shown are VPN along with Secure Socket Layer (SSL) network security, along with security and authentication keys, credentials for SSH remote access including SSO. The cloud shown in figure 10.11 could be your own private using AzureStack, VMware (on-site, or public cloud such as IBM or AWS), OpenStack among others, or a public cloud such as AWS, Azure or Google (among others).

Where To Learn More

Part 1 – Data Infrastructure Data Protection Fundamentals
Part 2 – Reliability, Availability, Serviceability ( RAS) Data Protection Fundamentals
Part 3 – Data Protection Access Availability RAID Erasure Codes ( EC) including LRC
Part 4 – Data Protection Recovery Points (Archive, Backup, Snapshots, Versions)
Part 5 – Point In Time Data Protection Granularity Points of Interest
Part 6 – Data Protection Security Logical Physical Software Defined
Part 7 – Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends
Part 8 – Data Protection Diaries Walking Data Protection Talk
Part 9 – who’s Doing What ( Toolbox Technology Tools)
Part 10 – Data Protection Resources Where to Learn More
Data Protection Diaries series
Data Infrastructure server storage I/O network Recommended Reading List Book Shelf
Software Defined Data Infrastructure Essentials (CRC 2017) Book

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

What This All Means

There are many different aspects, as well as layers of security from logical to physical pertaining to data centers, applications and associated data Infrastructure resources, both on-premises and cloud. Security for legacy and software defined environments needs to be integrated as part of various technology domain focus areas, as well as across them including data protection. The above is a small sampling of security related topics with more covered in various chapters of SDDI Essentials as well as in my other books, webinars, presentations and content.

From a data protection focus, security needs to be addressed from a physical who has access to primary and protection copies, what is being protected against and where, as well as who can access logically protection copes, as well as the configuration, settings, certificates involved in data protection. In other words, how are you protecting your data protection environment, configuration and deployment. Data protection copies need to be encrypted to meet regulations, compliance and other requirements to guard against loss or theft, accidental or intentional. Likewise access control needs to be managed including granting of roles, security, authentication, monitoring of access, along with revocation.

Get your copy of Software Defined Data Infrastructure Essentials here at Amazon.com, at CRC Press among other locations and learn more here. Meanwhile, continue reading with the next post in this series, Part 7 Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends

Ok, nuff said, for now.

November 26, 2017November 26, 2023

Data Protection Diaries Tools Technologies Toolbox Buzzword Bingo Trends

Fundamental Tools, Technologies, Toolbox, Buzzword Bingo Trends

Companion to Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft ( CRC Press 2017)

server storage I/O data infrastructure trends

By Greg Schulz – www.storageioblog.com November 26, 2017

This is Part 7 of a multi-part series on Data Protection fundamental tools topics techniques terms technologies trends tradecraft tips as a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).

Click here to view the previous post Part 6 Data Protection Security Logical Physical Software Defined, and click here to view the next post Part 8 Walking The Data Protection Talk What I Do.

In this post the focus is around Data Protection related tools, technologies, trends as companion to other posts in this series, as well as across various chapters from the SDDI book.

Figure 1.5 Data Infrastructures and other IT Infrastructure Layers

Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends

There are many data Infrastructure related topics, technologies, tools, trends, techniques and tips that pertain to data protection, many of which have been covered in this series of posts already, as well as in the SDDI Essentials book, and elsewhere. The following are some additional related data Infrastructure data protection topics, tools, technologies.

Buzzword Bingo is a popular industry activity involving terms, trends, tools and more, read more here, here, and here. The basic idea of buzzword bingo is when somebody starts mentioning lots of buzzwords, buzz terms, buzz trends at some point just say bingo. Sometimes you will get somebody who asks what that means, while others will know, perhaps get the point to move on to what’s relevant vs. talking the talk or showing how current they are on industry activity, trends and terms.

Just as everything is not the same across different environments, there are various size and focus from hyper-scale clouds and managed service providers (MSP) server (and storage along with applications focus), smaller and regional cloud, hosting and MSPs, as well as large enterprise, small medium enterprise (SME), small medium business (SMB), remote office branch office (ROBO), small office home office (SOHO), prosumer, consumer and client or edge. Sometimes you will hear server vs. edge or client focus, thus context is important.

Data protection just like data infrastructures span servers, storage, I/O networks, hardware, software, clouds, containers, virtual, hypervisors and related topics. Otoh, some might view data protection as unique to a particular technology focus area or domain. For example, I once had backup vendor tell me that backups and data protection was not a storage topic, can you guess which vendor did not get recommend for data protection of data stored on storage?

Data gets protected to different target media, mediums or services including HDDs, SSD, tape, cloud, bulk and object storage among others in various format from native to encapsulated in save sets, zips, tar ball among others.

Bulk storage can be on-site, on-premises low-cost tape, disk (file, block or object) as well as off-site including cloud services such as AWS S3 (buckets and objects), Microsoft Azure (containers and blobs), Google among others using various Access ( Protocols, Personalities, Front-end, Back-end) technologies. Which type of data protection storage medium, location or service is best depends on what you are trying to do, along with other requirements.

SDDC SDDI data center data protection toolbox
Data Protection Toolbox

Figure 3.18 Generic Object (and Blob) architecture with Buckets (and Containers)

Object Storage

Before discussing Object Storage lets take a step back and look at some context that can clarify some confusion around the term object. The word object has many different meanings and context, both inside of the IT world as well as outside. Context matters with the term object such as a verb being a thing that can be seen or touched as well as a person or thing of action or feeling directed towards.

Besides a person, place or physical thing, an object can be a software defined data structure that describes something. For example, a database record describing somebody’s contact or banking information, or a file descriptor with name, index ID, date and time stamps, permissions and access control lists along with other attributes or metadata. Another example is an object or blob stored in a cloud or object storage system repository, as well as an item in a hypervisor, operating system, container image or other application.

Besides being a verb, object can also be a noun such as disapproval or disagreement with something or someone. From an IT context perspective, object can also refer to a programming method (e.g. object oriented programming [oop], or Java [among other environments] objects and class’s) and systems development in addition to describing entities with data structures.

In other words, a data structure describes an object that can be a simple variable, constant, complex descriptor of something being processed by a program, as well as a function or unit of work. There are also objects unique or with context to specific environments besides Java or databases, operating systems, hypervisors, file systems, cloud and other things.

Figure 3.19 AWS S3 Object storage example, objects left and descriptive names on right

The role of object storage (view more at www.objectstoragecenter.com) is to provide low-cost, scalable capacity, durable availability of data including data protection copies on-premises or off-site. Note that not all object storage solutions or services are the same, some are immutable with write once read many (WORM) like attributes, while others non-immutable meaning that they can be not only appended to, also updated to page or block level granularity.

Also keep in mind that some solutions and services refer to items being stored as objects while others as blobs, and the name space those are part of as a bucket or container. Note that context is important not to confuse an object container with a docker, kubernetes or micro services container.

Many applications and storage systems as well as appliances support as back-end targets cloud access using AWS S3 API (of AWS S3 service or other solutions), as well as OpenStack Switch API among others. There are also many open source and third-party tools for working with cloud storage including objects and blobs. Learn more about object storage, cloud storage at www.objectstoragecenter.com as well as in chapters 3, 4, 13 and 14 in SDDI Essentials book.

S3 Simple Storage Service

Simple Storage Service ( S3) is the Amazon Web Service (AWS) cloud object storage service that can be used for bulk and other storage needs. The S3 service can be accessed from within AWS as well as externally via different tools. AWS S3 supports large number of buckets and objects across different regions and availability zones. Objects can be stored in a hierarchical directory structure format for compatibility with existing file systems or as a simple flat name space.

Context is important with data protection and S3 which can mean the access API, or AWS service. Likewise context is important in that some solutions, software and services support S3 API access as part of their front-end (e.g. how servers or clients access their service), as well as a back-end target (what they can store data on).

Additional AWS S3 (service) and related resources include:

Data Infrastructure Environments and Applications

Data Infrastructure environments that need to be protected include legacy, software defined (SDDC, SDDI, SDS), cloud, virtual and container based, as well as clustered, scale-out, converged Infrastructure (CI), hyper-converged Infrastructure (HCI) among others. In addition to data protection related topics already converged in the posts in this series (as well as those to follow), a related topic is Data Footprint Reduction ( DFR). DFR comprises several different technologies and techniques including archiving, compression, compaction, deduplication (dedupe), single instance storage, normalization, factoring, zip, tiering and thin provisioning among many others.

Data Footprint Reduction (DFR) Including Dedupe

There is a long-term relationship with data protection and DFR in that to reduce the impact of storing more data, traditional techniques such as compression and compaction have been used, along with archive and more recently dedupe among others. In the Software Defined Data Infrastructure Essentials book there is an entire chapter on DFR ( chapter 11), as well as related topics in chapters 8 and 13 among others. For those interested in DFR and related topics, there is additional material in my books Cloud and Virtual Data Storage Networking (CRC Press), along with in The Green and Virtual Data Center (CRC Press), as well as various posts on StorageIOblog.com and storageio.com. Figure 11.4 is from Software Defined Data Infrastructure Essentials showing big picture of various places where DFR can be implemented along with different technologies, tools and techniques.

Figure 11.4 Various points of interest where DFR techniques and technology can be applied

Just as everything is not the same, there are different DFR techniques along with implementations to address various application workload and data performance, availability, capacity, economics (PACE) needs. Where is the best location for DFR that depends on your objectives as well as what your particular technology can support. However in general, I recommend putting DFR as close to where the data is created and stored as possible to maximize its effectiveness which can be on the host server. That however also means leveraging DFR techniques downstream where data gets sent to be stored or protected. In other words, a hybrid DFR approach as a companion to data protection should use various techniques, technologies in different locations. Granted, your preferred vendor might only work in a given location or functionality so you can pretty much guess what the recommendations will be ;) .

Tips, Recommendations and Considerations

Additional learning experiences along with common questions (and answers), appendices, as well as tips can be found here.

General action items, tips, considerations and recommendations include:

Everything is not the same; different applications with SLO, PACE, FTT, FTM needs
Understand the 4 3 2 1 data protection rule and how to implement it.
Balance rebuild performance impact and time vs. storage space overhead savings.
Use different approaches for various applications and environments.
What is best for somebody else may not be best for you and your applications.
You cant go forward in the future after a disaster if you cant go back
Data protection is a shared responsibility between vendors, service providers and yourself
There are various aspects to data protection and data Infrastructure management

Where To Learn More

Part 1 – Data Infrastructure Data Protection Fundamentals
Part 2 – Reliability, Availability, Serviceability ( RAS) Data Protection Fundamentals
Part 3 – Data Protection Access Availability RAID Erasure Codes ( EC) including LRC
Part 4 – Data Protection Recovery Points (Archive, Backup, Snapshots, Versions)
Part 5 – Point In Time Data Protection Granularity Points of Interest
Part 6 – Data Protection Security Logical Physical Software Defined
Part 7 – Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends
Part 8 – Data Protection Diaries Walking Data Protection Talk
Part 9 – who’s Doing What ( Toolbox Technology Tools)
Part 10 – Data Protection Resources Where to Learn More
Data Protection Diaries series
Data Infrastructure server storage I/O network Recommended Reading List Book Shelf
Software Defined Data Infrastructure Essentials (CRC 2017) Book

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

What This All Means

There are many different buzzword, buzz terms, buzz trends pertaining to data infrastructure and data protection. These technologies span legacy and emerging, software-defined, cloud, virtual, container, hardware and software. Key point is what technology is best fit for your needs and applications, as well as how to use the tools in different ways (e.g. skill craft techniques and tradecraft). Keep context in mind when looking at and discussing different technologies such as objects among others.

Ok, nuff said, for now.

2018 Hot Popular New Trending Data Infrastructure Vendors to Watch

2018 Hot Popular New Trending Data Infrastructure Vendors to Watch

What About Those Not Mentioned?

What About New, Emerging, Trending and Trendy Technologies

Where to learn more

What this all means

Share this:

Dell Technologies Announces Class V VMware Tracking Stock exchange for stock or cash

Michael Dell and Silver Lake Continued Ownership

Where to learn more

What this all means

Share this:

June 2018 Server StorageIO Data Infrastructure Update Newsletter

Volume 18, Issue 6 (June 2018)

What this all means and wrap-up

Share this:

Solving Application Server Storage I/O Performance Bottlenecks Webinar

Where to learn more

What this all means

Share this:

Announcing Windows Server Summit Virtual Online Event

Windows Server Summit Break Out Tracks

Where to learn more

What this all means

Share this:

May 2018 Server StorageIO Data Infrastructure Update Newsletter

Volume 18, Issue 5 (May 2018)

May Buzzword, Buzz Topic and Trends

May NVMe Momentum Movement Activity

Software Defined Data Infrastructure Activity

What this all means and wrap-up

Share this:

March 2018 Server StorageIO Data Infrastructure Update Newsletter

Volume 18, Issue 3 (March 2018)

What this all means and wrap-up

Share this:

Data Protection Fundamental Topics Tools Techniques Technologies Tips

Data Infrastructures

Why The Need For Data Protection

Data Protection Fundamental Tradecraft Skills Experience Knowledge

Where To Learn More

What This All Means

Share this:

Reliability, Availability, Serviceability RAS Fundamentals

Reliability, Availability, Serviceability (RAS) Data Protection Fundamentals

Data Protection Gaps and Air Gap

Fault / Failures To Tolerate (FTT)

Fault Tolerant Mode (FTM)

Fault / Failure Domains

Clustering

Where To Learn More

What This All Means

Share this:

Access Availability RAID Erasure Codes including LRC Deep Dive

Data Protection Access Availability RAID Erasure Codes

Erasure Codes (EC)

Reed Solomon (RS) codes

Local Reconstruction Codes (LRC)

Shingled Erasure Code (SHEC)

Replication and Mirroring

Where To Learn More

What This All Means

Share this:

Enabling Recovery Points (Backup, Snapshots, Versions)

Enabling RPO (Archive, Backup, CDP, PIT Copy, Snapshots, Versions)

Additional Data Protection Terms

Where To Learn More

What This All Means

Share this:

Data Protection Diaries Fundamental Point In Time Granularity

Point-in-Time Protection Granularity Points of Interest

Where To Learn More

What This All Means

Share this:

Data Infrastructure Data Protection Security Logical Physical

Security Logical Physical Software Defined

Where To Learn More

What This All Means

Share this:

Fundamental Tools, Technologies, Toolbox, Buzzword Bingo Trends