AWS Snowball Edge SBE Converged Cloud Storage Appliance

AWS Snowball Edge SBE Converged Cloud Storage Appliance

AWS Snowball Edge SBE Converged Cloud Storage Appliance

As part of extending their cloud platform reach, recent Amazon Web Services (AWS) announcements include AWS Snowball Edge SBE Converged Cloud Storage Appliance. Snowball Edge (SBE) has evolved from its previous focus as a data transfer, migration platform appliance to now include support for on-prem compute. SBE has previously been available as an appliance that ships from AWS to your location as a service to enable bulk data movement to the public cloud (e.g. AWS Simple Storage Service (S3) bucket). With this new capability, AWS is enabling SBE to support on-prem compute similar to Elastic Cloud Compute (EC2) cloud instances.

AWS Snowball Data Migration at PB scale
AWS Snowball Appliance Image via AWS.com

What is AWS Snowball

Snowball is a bulk physical data migration appliance that AWS ships to your location. You use Snowball by setting up a copy job with AWS, when the device arrives at your site, set it up, and enable the copy jobs to occur moving data from source to Snowball destination. Once data is copied, you ship the Snowball back to an AWS region and availability zone (AZ) where its contents are copied into a Simple Storage Service (S3) bucket of your choice. Once the copy job into an AWS S3 is complete, AWS performs a secure erase of the Snowball.

Basic Snowball includes 10 GbE network connections (RJ45 and SFP+ [fiber or copper]). Security and Encryption includes 256-bit keys that can be managed via AWS Key Management Service (KMS). Note that keys are not sent to or stored on the device for security during transit. For additional protection, tamper-resistant seals are included along with the Trusted Platform Module (TPM) to detect unauthorized hardware, firmware or software changes.

End to End tracking is enabled using E ink shipping labels and allow monitoring via AWS Simple Notification Service (SNS). Once your data transfer job completes along with verified, a software erasure of the SBE is performed by AWS following NIST media handling guidelines.

For management, SBE has an API for customer integration, as well as the ability to create and manage transfer jobs via the AWS management console. SBE Adapter also gives customers direct access to Snowball where it appears as an S3 endpoint (how you access the storage and data).

Backside view of AWS Snowball
Backside view of Snowball Image via Amazon.com

Additional Snowball Speeds and Specification Feature Feeds include:

  • Storage space capacity of 50TB (42TB usable) or 80TB (72TB usable)
  • Network connectivity 10 GbE RJ45 (Cat6), SFP+ (Copper and Optical). Cables include RJ45 and Copper SFP+. For Fiber attached Ethernet, the customer supplies their own SFP+ optical cables.
  • SBE is designed for office environments, as well as data centers (e.g., about 68db) and weigh about 47 pounds.
  • Power requirements include NEMA 5-15p (standard wall outlet) 100-200 volts with power cable included.

Note for traditional Snowball deployments an on-prem workstation or server is needed to copy data from source locations to the Snowball device.

How AWS Snowball and Snowball Edge work

How AWS Snowball Works

Referring to the image above, first step to using AWS Snowball (or Snowball Edge) is to place an order via AWS management console (A). Part of the ordering process involves setting up the data transfer job, and in the case of AWS Snowball Edge, defining the EC2 instance and image (read more about that here via AWS). After placing order and setup, the AWS Snowball arrives at your location (B), on-site setup is done and data transfer performed (C). Once data is transferred, the AWS Snowball is returned to designated AWS location via two day shipping (D) and data copied into your specified S3 or Glacier bucket (E). After your data is transferred into the S3 or Glacier bucket you specify as part of the transfer job, you are able to do what you want with your files, folders, images, videos, VHDX’s, VMDK’s, ISO, little data, big data.

What is AWS Snowball Edge

AWS has enhanced its Snowball Edge (SBE) data mobility, migration, and transport appliance to now also include compute. For those not familiar, Snowball is an appliance that comes in various sizes that you order from AWS, it shows up at your site, and then you copy your data to it for migration into AWS. Once data is copied, you return to AWS where the data then appears in your designated S3 bucket. From your S3 bucket, you can then move the data, files, volumes, images to other locations, use for standing up EC2 compute, populating databases or other items.

With the new compute feature, AWS is enabling compute on the snowball edge appliance functioning similar to EC2 instance, except that they are on your site. This means you can use the compute to run your own custom AMI’s (Amazon Machine Image) on site or on-prem in support of data migration, conversion or another process. You can also keep the appliance on-site for as long as you want, granted your credit card gets charged to support development, test, extended migration, or to have a converged, or, hyper-converged platform.

Note that with SBE having compute capability, you can now run an EC2 image that functions as your copy server eliminating the need to have a workstation or server on-prem for the copy operation.

Additional AWS Snowball Edge Speeds and feature function feeds include:

  • 100TB (82TB usable) storage space capacity
  • 10 GbE network, along with 10/25 GbE SFP28 and 40 GbE QSFP+ with device-based encryption (customer provided network cables)
  • Local computing with EC2 and Lambda functions for remote deployment along with scale-out clustering of multiple SBE’s
  • S3 compatible endpoint along with NFS endpoint (mount point) using both NFS v3 and v4.1.
  • Weighs about 50 pounds, tamper evident seals along with TPM similar to traditional Snowball along with detection of hardware, firmware or software changes.
  • Can exist in an office environment, or data center.
  • Power cables are included, NEMA 5-15p, 100-220 volts, 400 watts.

What is AWS Snowmobile

Need something with more capacity than an SBE? AWS has a more extensive version called Snowmobile that supports up to 100PB that is brought to your site via a 45-foot-long tractor-trailer truck. Both SBE and Snowmobile physically move data from your location to an AWS region availability zone (AZ) aka data center where it is placed into the Simple Storage Service (S3) or Glacier bucket of your choice. Once in the S3 or Glacier bucket, you can move the data to where ever you need it.

Why Snowball Edge and Snowmobile vs. Fast Networks

Some people ask why the need for services such as SBE and Snowmobile, or, physically shipping your SSDs, HDD’s, tape or other storage media to a cloud provider in the Internet era of fast networks. The reason can be quite simple; most environments do not have internet connection speeds of 10 GbE or higher that can be dedicated outside of regular use for data movement at scale.

Likewise, some public cloud service providers have limitations on the network speed of their front-end general-purpose Internet access.

Note that some such as AWS have high-speed, low latency direct connect services from partner staging locations. However, those too may be limited in speed for large bulk transfers. AWS also has other performance-enhanced services for general Internet access including S3 Transfer Acceleration. Note that Microsoft Azure has special connectivity options such as ExpressRoute, while Google Compute Platform (GCP) has Cloud Interconnect.

Is AWS SBE and CI, HCI, CiB or Appliance?

The answer to the question of if SBE is a Converged Infrastructure (CI), Hyper-Converged Infrastructure (HCI), Cloud in a Box (CiB) or Cloud Appliance depends on your view and definition of those deployment models. Some will argue that SBE is a CI or HCI as well as CiB based on what Cisco, Dell Technologies, HPE, Microsoft (Azure Stack and Windows S2D), NetApp, Nutanix, Pivot3 and VMware vSAN among others offerings.

On the other hand, some will argue that SBE is not the same as the above and others give it does not meet the definition of their CI, HCI, CiB or cloud appliance. What is important is not if CI, HCI, CiB or appliance, rather, what it can do, how it can adapt to your environment and work for you vs. you work for it. In other words, what is important is the enablement a solution provides vs. if it is CI, HCI, CIB or something else. Meanwhile watch to see who ignores SBE, who welcomes it to their market space, and who throws mud balls and fud balls at snowball.

When to use Snowball vs. Snowball Edge

If all you need is bulk data migration appliance using one of your servers or workstations for smaller amounts of data, traditional Snowball is a good fit. On the other hand, if you need to move more data, leverage SBE enabled on-prem compute with EC2 and Lambda functionality for short, or long-term duration, as well as scale-out to create a cluster, then SBE is for you. SBE is also a good fit for environments that need short-term, as well as the longer-term deployment of compute, storage and network (e.g., converged). For example, factory environments, rugged implementations on ships, energy exploration and processing, traveling venues and sporting events, distributed environments being consolidated among others.

AWS Regions, AZ locations
AWS Regions and AZ’s image Via AWS.com

What About AWS Snowball Edge Pricing

Pricing varies based on AWS region you are using for your transfer and management from. Another variable is if you are selecting data transfer only, or, enabling EC2 compute instance on-prem. Yet another pricing variable is how long you will keep the Snowball Edge on-prem. You are given ten (10) free days as part of your data transfer job along with days for shipping and return.

Beyond the ten free days, you will pay a daily rate that varies. The longer you keep the SBE on-prem, and for example commit to a one or three-year pre-pay, you will receive larger discounts. Also note that there are no data transfer fees for moving data into AWS. However, standard pricing applies once stored into AWS, or moved. Also note that standard AWS storage charges (e.g. S3, Glacier, along with API calls apply once data is stored).

As an example, data transfer only, the service fee for a data transfer job is USD 300 for the US and another non-Asia-Pacific (Singapore). Additional days are $30 each.

Another example is selecting data transfer plus EC2 compute instance which varies by region example is $500 for transfer job (US East Northern Virginia or Ohio), $50 a day extra fee. However, if you are will to pay up front for one year, the day fee drops to $42 (varies by region), and to $35 a day for a three commitment.

For some environments, it may cost less to buy a server with storage, set it up and manage, while for others, the simplicity of a turnkey converged platform may be more cost-effective along with better value. Learn more about AWS Snowball Edge pricing here.

Where to learn more

Learn more about AWS, Snowball Edge, Cloud and data infrastructures related topics via the following links:

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means

Has AWS embraced hybrid public cloud and on-prem computing? IMHO while AWS is making it easier for environments to use, access as well as move to public cloud, they are still focused on the public cloud as the destination. In other words, AWS is making it easy to move your data and applications to their services as well as access them with AWS Snowball Edge SBE Converged Cloud Storage Appliance.

Ok, nuff said, for now.

Cheers Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2018. Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Google Cloud Platform GCP announced new Los Angeles Region

Google Cloud Platform GCP announced new Los Angeles Region

Google Cloud Platform GCP announced new Los Angeles Region

Google Cloud Platform GCP announced new Los Angeles Region

Google Cloud Platform (GCP) has announced a new Los Angeles Region (e.g., uswest-2) with three initial Availability Zones (AZ) also known as data centers. Keep in mind that a region is a geographic area that is made up of two or more AZ’s. Thus, a region has multiple data centers for availability, resiliency, durability.

The new GCP uswest-2 region is the fifth in the US and seventh in the Americas. GCP regions (and AZ’s) in the Americas include Iowa (us-central1), Montreal Quebec Canada (northamerica-northeast1), Northern Virginia (us-east4), Oregon (us-west1), Los Angeles (us-west2), South Carolina (us-east1) and Sao Paulo Brazil (southamerica-east1). View other Geographies as well as services including Europe and the Asia-Pacific here.

How Does GCP Compare to AWS and Azure?

The following are simple graphical comparisons of what Amazon Web Services (AWS) and Microsoft Azure currently have deployed for regions and AZ’s across different geographies. Note, each region may have a different set of services available so check your cloud providers notes as to what is currently available at various locations.

Google Cloud Compute Platform regions
Google Compute Platform Locations (Regions and AZ’s) Image via Google.com

AWS Regions, AZ locations
AWS Regions and AZ’s image Via AWS.com

Microsoft Azure Cloud Region Locations
Microsoft Azure Regions and AZ’s image Via Azure.com

Where to learn more

Learn more about data infrastructures and related topics via the following links:

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means

Google continues to evolve its public cloud platform (GCP) both regarding geographical global physical locations (e.g., regions and AZ’s), also regarding feature, function, extensibility. By adding a new Los Angeles (e.g. uswest-2) Region and three AZ’s within it, Google is providing a local point of presence for data infrastructure intense (server compute, memory, I/O, storage) applications such as those in media, entertainment, high performance compute, aerospace among others in the southern California region.  Overall, Google Cloud Platform GCP announced new Los Angeles Region is good to see not only new features being added to GCP but also physical points of presences.

Ok, nuff said, for now.

Cheers Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2018. Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

2018 Hot Popular New Trending Data Infrastructure Vendors to Watch

2018 Hot Popular New Trending Data Infrastructure Vendors to Watch

2018 Hot Popular New Trending Data Infrastructure Vendors to Watch

2018 Hot Popular New Trending Data Infrastructure Vendors to Watch

Here is the 2018 Hot Popular New Trending Data Infrastructure Vendors To Watch which includes startups as well as established vendors doing new things. This piece follows last year’s hot favorite trending data infrastructure vendors to watch list (here), as well as who will be top of storage world in a decade piece here.

2018 Hot Popular New Trending Data Infrastructure Vendors to Watch
Data Infrastructures Support Information Systems Applications and Their Data

Data Infrastructures are what exists inside physical data centers and cloud availability zones (AZ) that are defined to provide traditional, as well as cloud services. Cloud and legacy data infrastructures are combined by hardware (server, storage, I/O network), software along with management tools, policies, tradecraft techniques (skills), best practices to support applications and their data. There are different types of data infrastructures to meet the needs of various environments that range in size, scope, focus, application workloads, along with Performance and capacity.

Another important aspect of data infrastructures is that they exist to protect, preserve, secure and serve applications that transform data into information. This means that availability and Data Protection including archive, backup, business continuance (BC), business resiliency (BR), disaster recovery (DR), privacy and security among other related topics, technology, techniques, and trends are essential data infrastructure topics.

2018 Hot Popular New Trending Data Infrastructure Vendors to Watch
Different timelines of adoption and deployment for various audiences

2018 Hot Popular New Trending Data Infrastructure Vendors to Watch

Some of those on this year’s list are focused on different technology areas, while others on size or types of vendors, suppliers, service providers. Others on the list are focused on who is new, startup, evolving, or established which varies from if you are an industry insider or IT customer environment. Meanwhile others new and some are established doing new things, mix of some you may not have heard of for those who want or need to have the most current list to rattle off startups for industry adoption (and deployment), as well as what some established players are doing that might lead to customer deployment (and adoption).

AMD – The AMD EPYC family of processors is opening up new opportunities for AMD to challenge Intel among others for a more significant share of the general-purpose compute market in support of data center and data infrastructure markets. An advantage that AMD has and is playing to in the industry speeds feeds, slots and watts price performance game is the ability to support more memory and PCIe lanes per socket than others including Intel. Keep in mind that PCIe lanes will become even more critical as NVMe deployment increases, as well as the use of GPU’s and faster Ethernet among other devices. Name brand vendors including Dell and HPE among others have announced or are shipping AMD EPYC based processors.

Aperion – Cloud and managed service provider with diverse capabilities.

Amazon Web Services (AWS) – Continues to expand its footprint regarding regions, availability zones (AZ) also known as data centers in regions, as well as some services along with the breadth of those capabilities. AWS has recently announced a new Snowball Edge (SBE) which in the past has been a data migration appliance now enhanced with on-prem Elastic Cloud Compute (EC2) capabilities. What this means is that AWS can put on-prem compute capabilities as part of a storage appliance for short-term data movement, migration, conversion, importing of virtual machines and other items.

On the other hand, AWS can also be seen as using SBE as a first entry to placing equipment on-prem for hybrid clouds, or, converged infrastructure (CI), hyper-converged infrastructure (HCI), cloud in a box similar to Microsoft Azure Stack, as well as CI/HCI solutions from others.

My prediction near term, however, is that CI/HCI vendors will either ignore SBE, downplay it, create some new marketing on why it is not CI/HCI or fud about vendor lock-in. In other words, make some popcorn and sit back, watch the show.

Backblaze – Low-cost, high-capacity cloud storage for backup and archiving provider known for their quarterly disk drive reliability ratings (or failure) reports. They have been around for a while, have a good reputation among those who use their services for being a low-cost alternative to the larger providers.

Barefoot networks – Some of you may already be aware of or following Barefoot Networks, while others may not have heard of them outside of the networking space. They have some impressive capabilities, are new, you probably have not heard of them, thus an excellent addition to this list.

Cloudian – Continue to evolve and no longer just another object storage solution, Cloudian has been expanding via organic technology development, as well as acquisitions giving them a broad portfolio of software-defined storage and tiering from on-prem to the cloud, block, file and object access.

Cloudflare – Not exactly a startup, some of you may know or are using Cloudflare, while to others, their role as a web cache, DNS, and other service is transparent. I have been using Cloudflare on my various sites for over a year, and like the security, DNS, cache and analytics tools they provide as a customer.

Cobalt Iron – For some, they might be new, Software-defined Data protection and management is the name of the game over at Cobalt Iron which has been around a few years under the radar compared to more popular players. If you have or are involved with IBM Tivoli aka TSM based backup and data protection among others, check out the exciting capabilities that Cobalt can bring to the table.

CTERA – Having been around for a while, to some they might not be a startup, on the other hand, they may be new to others while offering new data and file management options to others.

DataCore – You might know of DataCore for their software-defined storage and past storage hypervisor activity. However, they have a new piece of software MaxParallel that boost server storage I/O performance. The software installs on your Windows Server instance (bare metal, VM, or cloud instance) and shows you performance with and without acceleration which you can dynamically turn off and off.

DataDirect Networks (DDN) – Recently acquired Lustre assets from Intel, now picking up the storage startup Tintri pieces after it ceased operations. What this means is that while beefing up their traditional High-Performance Compute (HPC) and Super Compute (SC) focus, DDN is also expanding into broader markets.

Dell Technologies – At its recent Dell Technology World event in Las Vegas during late April, early May 2018, several announcements were made, including some tied to emerging Gen-Z along with composability. More recently, Dell Technologies along with VMware announced business structure and finance changes. Changes include VMware declaring a dividend, Dell Technologies being its largest shareholder will use proceeds to fund restricting and debt service. Read more about VMware and Dell Technology business and financial changes here.

Densify – With a name like Densify no surprise they propose to drive densification and automation with AI-powered deep learning to optimize application resource use across on-prem software-defined virtual as well as cloud instances and containers.

FlureDB – If you are into databases (SQL or NoSQL), as well as Blockchain or distributed ledgers, check out FlureDB.

Innovium.com – When it comes to data infrastructure and data center networking, Innovium is probably not on your radar, however, keep an eye on these folks and their TERALYNX switching silicon to see where it ends up given their performance claims.

Komprise – File, and data management solutions including tiering along with partners such as IBM.

Kubernetes – A few years ago OpenStack, then Docker containers was the favorite and trending discussion topic, then Mesos and along comes Kubernetes. It’s safe to say, at least for now, Kubernetes is settling in as a preferred open source industry and customer defecto choice (I want to say standard, however, will hold off on that for now) for container and related orchestration management. Besides, do it yourself (DiY) leveraging open source, there are also managed AWS Elastic Kubernetes Service (EKS), Azure Kubernetes Services (AKS), Google Kubernetes Engine (GKE), and VMware Pivotal Container Service (PKS) among others. Besides Azure, Microsoft also includes Kubernetes support (along with Docker and Windows containers) as part of Windows Servers.

ManageEngine (part of Zoho) – Has data infrastructure monitoring technology called OpManager for keeping an eye on networking.

Marvel – Marvel may not be a familiar name (don’t confuse with comics), however, has been a critical component supplier to partners whose server or storage technology you may be familiar with or have yourself. Server, Storage, I/O Networking chip maker has closed on its acquisition of Cavium (who previously bought Qlogic among others). The combined company is well positioned as a key data infrastructure component supplier to various partners spanning servers, storage, I/O networking including Fibre Channel (FC), Ethernet, InfiniBand, NVMe (and NVMeoF) among others.

Mellanox – Known for their InfiniBand adapters, switches, and associated software, along with growing presence in RDMA over Converged Ethernet (RoCE), they are also well positioned for NVMe over Fabrics among other growth opportunities following recent boardroom updates, along with technology roadmap’s.

Microsoft – Azure public cloud continues to evolve similarly to AWS with more region locations, availability zone (AZ) data centers, as well as features and extensions. Microsoft also introduced about a year ago its hybrid on-prem CI/HCI cloud in a box platform appliance Azure Stack (read about my test drive here). However, there is more to Microsoft than just their current cloud first focus which means Windows (desktop), as well as Server, are also evolving. Currently, in public preview, Windows Server 2019 insiders build available to try out many new capabilities, some of which were covered in the recent free Microsoft Virtual Summit held in June. Key themes of Windows Server 2019 include security, performance, hybrid cloud, containers, software-defined storage and much more.

Microsemi – Has been around for a while is the combination of some vendors you may not have heard of or heard about in some time including PMC-Sierra (acquired Adaptec) and Vitesse among others. The reason I have Microsemi on this list is a combination of their acquisitions which might be an indicator of whom they pick up next. Another reason is that their components span data infrastructure topics from servers, storage, I/O and networking, PCIe and many more.

NVIDIA – GPU high performance compute and related compute offload technologies have been accessible for over a decade. More recently with new graphics and computational demands, GPU such as those NVIDIA are in need. Demand includes traditional graphics acceleration for physical and virtual, augmented and virtual reality, as well as cloud, along with compute-intensive analytics, AI, ML, DL along with other cognitive workloads.

NGDSystems (NGD) – Similar to what NVIDIA and other GPU vendors do for enabling compute offload for specific applications and workloads, NGD is working on a variation. That variation is to move offload compute capabilities for the server I/O storage-intensive workloads closer, in fact into storage system components such as SSDs and emerging SCMs and PMEMs. Unlike GPU based applications or workloads that tend to be more memory and compute intensive, NGD is positioned for applications that are the server I/O and storage intensive.

The premise of NGD is that they move the compute and application closer to where the data is, eliminating extra I/O, as well as reducing the amount of main server memory and compute cycles. If you are familiar with other server storage I/O offload engines and systems such as Oracle Exadata database appliance NGD is working at a tighter integration granularity. How it works is your application gets ported to run on the NGD storage platform which is SSD based and having a general-purpose processor. Your application is initiated from a host server, where it then runs on the NGD meaning I/Os are kept local to the storage system. Keep in mind that the best I/O is the one that you do not have to do, the second best is the one with the least resource or user impact.

Opvisor – Performance activity and capacity monitoring tools including for VMware environments.

Pavillon – Startup with an interesting NVMe based hardware appliance.

Quest – Having gained their independence as a free-standing company since divestiture from Dell Technologies (Dell had previously acquired Quest before EMC acquisition), Quest continues to make their data infrastructure related management tools available. Besides now being a standalone company again, keep an eye on Quest to see how they evolve their existing data protection and data infrastructure resource management tools portfolio via growth, acquisition, or, perhaps Quest will be on somebody else’s future growth list.

Retrospect – Far from being a startup, after gaining their independence from when EMC bought them several years ago, they have since continued to enhance their data protection technology. Disclosure, I have been a Retrospect customer since 2001 using it for on-site, as well as cloud data protection backups to the cloud.

Rubrik – Becoming more of a data infrastructure household name given their expanding technology portfolio and marketing efforts. More commonly known in smaller customer environments, as well as broadly within industry insider circles, Rubrik has potential with continued technology evolution to move further upmarket similar to how Commvault did back in the late 90s, just saying.

SkyScale – Cloud service provider that offers dedicated bare metal, as well as private, hybrid cloud instances along with GPU to support AI, ML, DL and other high performance,  compute workloads.

Snowflake – The name does not describe well what they do or who they are. However, they have an interesting cloud data warehouse (old school) large-scale data lakes (new school) technologies.

Strongbox – Not to be confused with technology such as those from Iosafe (e.g., waterproof, fireproof), Strongbox is a data protection storage solution for storing archives, backups, BC/BR/DR data, as well as cloud tiering. For those who are into buzzword bingo, think cloud tiering, object, cold storage among others. The technology evolved out of Crossroads and with David Cerf at the helm has branched out into a private company with keeping an eye on.

Storbyte – With longtime industry insider sales and marketing pro-Diamond Lauffin (formerly Nexsan) involved as Chief Evangelist, this is worth keeping an eye on and could be entertaining as well as exciting. In some ways it could be seen as a bit of Nexsan meets NVme meets NAND Flash meets cost-effective value storage dejavu play.

Talon – Enterprise storage and management solutions for file sharing across organizations, ROBO and cloud environments.

Ubitqui – Also known as UBNT is a data infrastructure networking vendor whose technologies span from WiFi access points (AP), high-performance antennas, routing, switching and related hardware, along with software solutions. UBNT is not as well-known in more larger environments as a Cisco or others. However, they are making a name for themselves moving from the edge to the core. That is, working from the edge with AP and routers, firewalls, gateways for the SMB, ROBO, SOHO as well as consumer (I have several of their APs, switches, routers and high-performance antennas along with management software), these technologies are also finding their way into larger environments. 

My first use of UBNT was several years ago when I needed to get an IP network connection to a remote building separated by several hundred yards of forest. The solution I found was to get a pair of UBNT NANO Apps, put them in secure bridge mode; now I have a high-performance WiFi service through a forest of trees. Since then have replaced an older Cisco router, several Cisco, and other APs, as well as the phased migration of switches.

UpdraftPlus– If you have a WordPress web or blog site, you should also have a UpdraftPlus plugin (go premium btw) for data protection. I have been using Updraft for several years on my various sites to backup and protect the MySQL databases and all other content. For those of you who are familiar with Spanning (e.g., was acquired by EMC then divested by Dell) and what they do for cloud applications, UpdraftPlus does similar for lower-end, smaller cloud-based applications.

Vexata – Startup scale out NVMe storage solution.

VMware – Expanding their cloud foundation from on-prem to in and on clouds including AWS among others. Data Infrastructure focus continues to expand from core to edge, server, storage, I/O, networking. With recent Dell Technologies and VMware declaring a dividend, should be interesting to see what lies ahead for both entities.

What About Those Not Mentioned?

By the way, if you were wondering about or why others are not in the above list, simple, check out last year’s list which includes Apcera, Blue Medora, Broadcom, Chelsio, Commvault, Compuverde, Datadog, Datrium, Docker, E8 Storage, Elastifile, Enmotus, Everspin, Excelero, Hedvig, Huawei, Intel, Kubernetes, Liqid, Maxta, Micron, Minio, NetApp, Neuvector, Noobaa, NVIDA, Pivot3, Pluribus Networks, Portwork, Rozo Systems, ScaleMP, Storpool, Stratoscale, SUSE Technology, Tidalscale, Turbonomic, Ubuntu, Veeam, Virtuozzo and WekaIO. Note that many of the above have expanded their capabilities in the past year and remain, or have become even more interesting to watch, while some might be on the future where are they now list sometime down the road. View additional vendors and service providers via our industry links and resources page here.

What About New, Emerging, Trending and Trendy Technologies

Bitcoin and Blockchain storage startups, some of which claim or would like to replace cloud storage taking on giants such as AWS S3 in the not so distant future have been popping up lately. Some of these have good and exciting stories if they can deliver on the hype along with the premise. A couple of names to drop include among others Filecoin, Maidsafe, Sia, Storj along with services from AWS, Azure, Google and a long list of others.

Besides Blockchain distributed ledgers, other technologies and trends to keep an eye on include compute processes from ARM to SoC, GPU, FPGA, ASIC for offload and specialized processing. GPU, ASIC, and FPGA are appearing in new deployments across cloud providers as they look to offload processing from their general servers to derive total effective productivity out of them. In other words, innovating by offloading to boost their effective return on investment (old ROI), as well as increase their return on innovation (the new ROI).

Other data infrastructure server I/O which also ties into storage and network trends to watch include Gen-Z that some may claim as the successor to PCIe, Ethernet, InfiniBand among others (hint, get ready for a new round of “something is dead” hype). Near-term the objective of Gen-Z is to coexist, complement PCIe, Ethernet, CPU to memory interconnect, while enabling more granular allocation of data infrastructure resources (e.g., composability). Besides watching who is part of the Gen-Z movement, keep an eye on who is not part of it yet, specifically Intel.

NVMe and its many variations from a server internal to networked NVMe over Fabrics (NVMeoF) along with its derivatives continue to gain both industry adoption, as well as customer deployment. There are some early NVMeoF based server storage deployments (along with marketing dollars). However, the server side NVMe customer adoption is where the dollars are moving to the vendors. In other words, it’s still early in the bigger broader NVMe and NVMeoF game.

Where to learn more

Learn more about data infrastructures and related topics via the following links:

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means

Let’s see how those mentioned last year as well as this year, along with some new and emerging vendors, service providers who did not get said end up next year, as well as the years after that.

2018 Hot Popular New Trending Data Infrastructure Vendors to Watch
Different timelines of adoption and deployment for various audiences

Keep in mind that there is a difference between industry adoption and customer deployment, granted they are related. Likewise let’s see who will be at the top in three, five and ten years, which means some of the current top or favorite vendors may or may not be on the list, same with some of the established vendors. Meanwhile, check out the 2018 Hot Popular New Trending Data Infrastructure Vendors to Watch.

Ok, nuff said, for now.

Cheers Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2018. Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Dell Technologies Announces Class V VMware Tracking Stock exchange for stock or cash

Dell Technologies Announces Class V VMware Tracking Stock exchange for stock or cash

Dell Technologies Announces Class V VMware Tracking Stock exchange for stock or cash

Dell Technologies Announces Class V VMware Tracking Stock exchange for stock or cash.

Dell Technologies Announces Class V VMware Tracking Stock exchange for stock or cash
Image via Dell Technologies

Summary of Dell transaction announcement includes:

  • VMware declares an $11 Billion USD cash dividend pro rata to all VMware stock holders.
  • Given ownership percentage of VMware, Dell Technologies will receive approximately $9 Billion USD cash dividend.
  • Dell plans to list its Class C common stock shares on the New York Stock Exchange (NYSE).
  • Dell plans to use the VMware dividend proceeds to fund cash consideration to be paid to Class V (tracking stock) shareholders.
  • For each Class V share (e.g. VMware tracking stock) shareholders can choose to receive:

    1.3665 shares of Dell Technologies Class C common stock, or
    $109 in cash per DVMT (Class V share) a 29% premium per share

Dell Announces Class V VMware Tracking Stock exchange for stock or cash
Image via Dell Technologies

Additional interest points of this transaction include:

  • Transaction expected to close Q4 CY2018, subject to Class V shareholder approval.
  • VMware maintains its independence as a separate publicly traded company.
  • Dell Technologies maintains its 81% ownership of VMware common stock
    Dell Technologies Class V (DVMT) shareholders will own 20.8% to 31.0% of Dell Class C (depending on cash election amounts).
  • Streamline Dell capital and ownership structure.
  • Establishes a public security (stock) in global end to end data infrastructure provider (e.g. Dell Technologies Stock on NYSE).
  • Enables financial flexibility for future strategic initiatives

Dell Announces Class V VMware Tracking Stock exchange for stock or cash
Image via Dell Technologies

Michael Dell and Silver Lake Continued Ownership

As part of this transaction, both Michael Dell and Silver Lake partners announce commitment to Dell Technologies. Michael Dell will continue to serve as Chairman and CEO, along with a committed stockholder beneficially owning between about 47% to 54% of Dell Technologies on a fully diluted basis. Silver Lake equity partners, an investor in Dell will continue its long-term partnership with Michael Dell beneficially owning between about 16%-18% of Dell Technologies on a fully diluted basis.

Where to learn more

Learn more about Dell Technologies, VMware, Data Infrastructures and related topics via the following links:

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means

This announcement enables Dell to streamline its financial structure, while providing VMware shareholder with a dividend value. In addition, this Dell Technologies announcement puts to rest industry discussions of what will Michael Dell along with Dell Technologies and VMware do in the future. Speaking of the future, this transaction could also page the wave for future investment or acquisitions by Dell and/or VMware. Now the question is if you are a DVMT tracking stock shareholder, do you take the $109 USD cash, or, new Class C Dell Technologies stock? Now lets see how Dell Technologies Announces Class V VMware Tracking Stock exchange for stock or cash plays out during the rest of summer and into the fall.

Ok, nuff said, for now.

Cheers Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2018. Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

June 2018 Server StorageIO Data Infrastructure Update Newsletter

June 2018 Server StorageIO Data Infrastructure Update Newsletter

June 2018 Server StorageIO Data Infrastructure Update Newsletter

Volume 18, Issue 6 (June 2018)

Hello and welcome to the June 2018 Server StorageIO Data Infrastructure Update Newsletter.

In cased you missed it, the May 2018 Server StorageIO Data Infrastructure Update Newsletter can be viewed here (HTML and PDF).

In this issue buzzwords topics include AI, All Flash, HPC, Lustre, Multi Cloud, NVMe, NVMeoF, SAS, and SSD among others:

Enjoy this edition of the Server StorageIO Data Infrastructure update newsletter.

Cheers GS

Data Infrastructure and IT Industry Activity Trends

June data infrastructure, server, storage, I/O network, hardware, software, cloud, converged, and container as well as data protection industry activity includes among others:

Check out what’s new at Amazon Web Services (AWS) here, as well as Microsoft Azure here, Google Cloud Compute here, IBM Softlayer here, and OVH here. CTERA announced new cloud storage gateways (HC Series) for enterprise environments that include all flash SSD options, capacity up to 96TB (raw), Petabyte scale tiering to public and private cloud, 10 Gbe Ethernet connectivity, virtual machine deployment, along with high availability configuration.

Cray announced enhancements to its Lustre (parallel file system) based ClusterStor storage system for high performance compute (HPC) along with it previously acquired from Seagate (Who had acquired it as part of the Xyratex acquisition). New enhancements for ClusterStor include all flash SSD solution that will integrate and work with our existing hard disk drive (HDD) based systems.

In related Lustre based activity, DataDirect Network (DDN) has acquired from Intel, their Lustre File system capability. Intel acquired its Lustre capabilities via its purchase of Whamcloud back in 2012, and in 2017 announced that it was getting out of the Lustre business (here and here). DDN also announced new storage solutions for enabling HPC environment workloads along with Artificial Intelligence (AI) centric applications.

HPE which held its Discover event announced a $4 Billion USD investment over four years pertaining to development of edge technologies and services including software defined WAN (SD-WAN) and security among others.

Microsoft held its first virtual Windows Server Summit in June that outlined current and future plans for the operating system along with its hybrid cloud future.

Penguin computing has announced the Accelion solution for accessing geographically dispersed data enabling faster file transfer or other data movement functions.

SwiftStack has added multi cloud features (enhanced search, universal access, policy management, automation, data migration) and making them available via 1space open source project. 1space enables a single object namespace across different object storage locations including integration with OpenStack Swift.

Vexata announced a new version of its Vexata operating system (VX-OS) for its storage solution including NVMe over Fabric (NVMe-oF) support.

Speaking of NVMe and fabrics, the Fibre Channel Industry Association (FCIA) announced that the International Committee on Information Technology Standards (INCITS) has published T11 technical committee developed  Fibre Channel over NVMe (FC-NVMe) standard.

NVMe frontend NVMeoF
Various NVMe front-end including NVMeoF along with NVMe back-end devices (U.2, M.2, AiC)

Keep in mind that there are many different facets to NVMe including direct attached (M.2, U.2/8639, PCIe AiC) along with fabrics. Likewise, there are various fabric options for the NVMe protocol including over Fibre Channel (FC-NVMe), along with other NVMe over Fabrics including RDMA over Converged Ethernet (RoCE) as well as IP based among others. NVMe can be used as a front-end on storage systems supporting server attachment (e.g. competes with Fibre Channel, iSCSI, SAS among others).

Another variation of NVMe is as a back-end for attachment of drives or other NVMe based devices in storage systems, as well as servers. There is also end to end NVMe (e.g. both front-end and back-end) options. Keep context in mind when you hear or talk about NVMe and in particular, NVMe over fabrics, learn more about NVMe at https://storageioblog.com/nvme-place-volatile-memory-express/.

Toshiba announced new RM5 series of high capacity SAS SSDs for replacement of SATA devices in servers. The RM5 series being added to the Toshiba portfolio combine capacity and economics traditional associated with SATA SSDs along with performance as well as connectivity of SAS.

Check out other industry news, comments, trends perspectives here.

Data Infrastructure Server StorageIO Comments Content

Server StorageIO Commentary in the news, tips and articles

Recent Server StorageIO industry trends perspectives commentary in the news.

Via SearchStorage: Comments The storage administrator skills you need to keep up today
Via SearchStorage: Comments Managing storage for IoT data at the enterprise edge
Via SearchCloudComputing: Comments Hybrid cloud deployment demands a change in security mindset

View more Server, Storage and I/O trends and perspectives comments here.

Data Infrastructure Server StorageIOblog posts

Server StorageIOblog Data Infrastructure Posts

Recent and popular Server StorageIOblog posts include:

Announcing Windows Server Summit Virtual Online Event
May 2018 Server StorageIO Data Infrastructure Update Newsletter
Solving Application Server Storage I/O Performance Bottlenecks Webinar
Have you heard about the new CLOUD Act data regulation?
Data Protection Recovery Life Post World Backup Day Pre GDPR
Microsoft Windows Server 2019 Insiders Preview
Which Enterprise HDD for Content Server Platform
Server Storage I/O Benchmark Performance Resource Tools
Introducing Windows Subsystem for Linux WSL Overview
Data Infrastructure Primer Overview (Its Whats Inside The Data Center)
If NVMe is the answer, what are the questions?

View other recent as well as past StorageIOblog posts here

Server StorageIO Recommended Reading (Watching and Listening) List

Software-Defined Data Infrastructure Essentials SDDI SDDC

In addition to my own books including Software Defined Data Infrastructure Essentials (CRC Press 2017) available at Amazon.com (check out special sale price), the following are Server StorageIO data infrastructure recommended reading, watching and listening list items. The Server StorageIO data infrastructure recommended reading list includes various IT, Data Infrastructure and related topics including Intel Recommended Reading List (IRRL) for developers is a good resource to check out. Speaking of my books, Didier Van Hoye (@WorkingHardInIt) has a good review over on his site you can view here, also check out the rest of his great content while there.

Watch for more items to be added to the recommended reading list book shelf soon.

Data Infrastructure Server StorageIO event activities

Events and Activities

Recent and upcoming event activities.

July 25, 2018 – Webinar – Data Protect & Storage

June 27, 2018 – Webinar – App Server Performance

June 26, 2018 – Webinar – Cloud App Optimize

May 29, 2018 – Webinar – Microsoft Windows as a Service

April 24, 2018 – Webinar – AWS and on-site, on-premises hybrid data protection

See more webinars and activities on the Server StorageIO Events page here.

Data Infrastructure Server StorageIO Industry Resources and Links

Various useful links and resources:

Data Infrastructure Recommend Reading and watching list
Microsoft TechNet – Various Microsoft related from Azure to Docker to Windows
storageio.com/links – Various industry links (over 1,000 with more to be added soon)
objectstoragecenter.com – Cloud and object storage topics, tips and news items
OpenStack.org – Various OpenStack related items
storageio.com/downloads – Various presentations and other download material
storageio.com/protect – Various data protection items and topics
thenvmeplace.com – Focus on NVMe trends and technologies
thessdplace.com – NVM and Solid State Disk topics, tips and techniques
storageio.com/converge – Various CI, HCI and related SDS topics
storageio.com/performance – Various server, storage and I/O benchmark and tools
VMware Technical Network – Various VMware related items

What this all means and wrap-up

Data Infrastructures are what exists inside physical data centers as well as spanning cloud, converged, hyper-converged, virtual, serverless and other software defined as well as legacy environments. NVMe continues to gain in industry adoption as well as customer deployment. Cloud adoption also continues along with multi-cloud deployments. Enjoy this edition of the Server StorageIO Data Infrastructure update newsletter and watch for more NVMe,cloud, data protection among other topics in future posts, articles, events, and newsletters.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2018. Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Solving Application Server Storage I/O Performance Bottlenecks Webinar

Solving Application Server Storage I/O Performance Bottlenecks Webinar

Performance Bottlenecks Webinar

Solving Application Server Storage I/O Performance Bottlenecks Webinar

The best I/O is the one you do not have to do, the second best is the one with least server I/O and storage overhead along with application performance bottleneck impact.

Fast applications need fast servers, storage, I/O networking hardware, and software. Merely throwing more hardware as a cache at application performance bottlenecks can help. However, throwing more hardware at performance problems can also cost much cash. On the other hand, a little fast memory and storage in the right place, with robust software performance acceleration can have significant application productivity benefits. Fast hardware also needs fast software to help boost application and user productivity.

As application workloads activity increases, by implementing server software, performance acceleration along with additional fast memory and storage including flash, Storage Class Memories (SCM) among other SSD along with NVMe accessed devices, even more, work can be done boosting productivity while reducing cost.

Join me on June 27 at 1 PM Pacific Time (PT) when I host a free webinar (registration required) sponsored by DataCore and produced by Redmond Magazine/1105 Media as we discuss Solving Application Server Storage I/O Performance Bottlenecks including what you can do today.

I will be joined by guest presenters Augie Gonzalez, Director Technical Product Marketing and Tim Warden, Director Engineering Product Management both from DataCore. During the interactive webinar discussions, we invite you to participate with your questions, as we look at issues, challenges, various approaches, and what you can do today to boost different application performance and productivity.

This webinar is for those whose applications have the need for speed including database, VDI, SharePoint, Exchange, AI, ML and other I/O intensive workloads. Topics that we will be discussing in addition to your questions include:

  • Boosting application performance without breaking the bank
  • Improving application productivity and reducing user wait time
  • Gaining insight and awareness into bottlenecks and what to do
  • Unlocking value in your existing hardware and software licenses
  • What you can do today, literally right after or even during this webinar

Where to learn more

Learn more about Windows Server Summit and related topics via the following links:

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means

The best I/O is the one you do not have to do, the second best is the one that has the least impact on your applications, while boosting user productivity. There are many differ net approaches to addressing various server storage I/O performance bottlenecks across applications. Join me on June 27, 2018 at 1 PM PT for the free webinar Solving Application Server Storage I/O Performance Bottlenecks Webinar and learn what you can do today to boost your uses productivity.

 

Ok, nuff said, for now.

Cheers Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2018. Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Announcing Windows Server Summit Virtual Online Event

Announcing Windows Server Summit Virtual Online Event

Dell Technology World 2018 Announcement Summary

Announcing Windows Server Summit Virtual Online Event

Microsoft will be hosting a free (no registration required) half day virtual (e.g. online) Windows Server Summit Virtual Online Event June 26, 2018 starting at 9AM PT. As part of its continued focus on supporting hybrid strategy spanning on-premises Windows Server to Azure (among others including AWS) cloud based, Microsoft is preparing for the launch later this year of Windows Server 2019.

There is no registration required, you can just show up without concern of getting email or other spam, however you can also click here to save the date, as well as here to get updates on the event.

Microsoft Windows Server LTSC and SAC release

Windows Server 2019 is now in insider preview (get it here) and is the next Long Term Service Channel (LTSC) release following Windows Server 2016. In the past, Microsoft would have called Windows Server 2019 something such as Windows Server 2016 R2, however that has changed with the new Semiannual Channel (SAC) and LTSC release cycles.

Keynote kick off presentations will be from Erin Chapple, Director of Program Management, Cloud + AI (which includes Windows Kernel, Hypervisors, Containers and Storage), Arpan Shah, General Manager of Azure Infrastructure marketing (Windows Server, Azure IaaS, Azure Stack, Azure Management and Security), and, Jeff Woosley Principal PM, Windows Server. In addition to the kick off presentations with current state and status of Windows Servers available for on-premises bare metal, virtual, container as well as cloud, there will be demos, Q&A, roadmap’s and much more. Topics will include new and recent functionalities such as Windows Server 2019, Windows Admin Center (formerly known as Honolulu), IoT, roadmap’s and much more.

Windows Server Summit HybridWindows Server Summit SecurityWindows Server Summit HCIWindows Server Summit Application Development
Images Via Microsoft Windows Server Summit Page

Windows Server Summit Break Out Tracks

During the Windows Server Summit, there will be four technology focused tracks including:

  • Hybrid – From on-premisess to Azure, how Windows Server supports different workloads in various configurations, along with associated management tools (including Windows Admin Center aka Honolulu)
  • Security – New and recent security enhancements for Windows Server along with Hyper-V and other related topics.
  • Application Platform – Containers and Linux support along with associated management tools for on-premisess and Azure.
  • Hyper-converged infrastructure (HCI) – Leveraging software defined storage (SDS) with Storage Spaces Direct (S2D) in Windows Server 2016, along with Hyper-V and other technologies, learn how Microsoft supports HCI and beyond.

Where to learn more

Learn more about Windows Server Summit and related topics via the following links:

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means

Windows Server remains relevant today for traditional, on site, on-premises, as well as on-premisess along with cloud, container among other deployments. Remember to click here to save the date, click here to sign up for Windows Server Summit updates and learn more about the Windows Server Summit Virtual Online event here, see there, or at least virtually.

Ok, nuff said, for now.

Cheers Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2018. Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

May 2018 Server StorageIO Data Infrastructure Update Newsletter

May 2018 Server StorageIO Data Infrastructure Update Newsletter

May 2018 Server StorageIO Data Infrastructure Update Newsletter

Volume 18, Issue 5 (May 2018)

Hello and welcome to the May 2018 Server StorageIO Data Infrastructure Update Newsletter.

In cased you missed it, the April 2018 Server StorageIO Data Infrastructure Update Newsletter can be viewed here (HTML and PDF).

May has been a busy month with a lot of data infrastructure related activity from software-defined virtual, cloud, container, converged, serverless to legacy, hardware, software, services, server, storage, I/O and networking along with data protection topics among others.

In this issue buzzwords topics include GDPR, NVMe, NVMeoF, Composable, Serverless, Data Protection, SCM, Gen-Z, MaaS:

Enjoy this edition of the Server StorageIO Data Infrastructure update newsletter.

Cheers GS

Data Infrastructure and IT Industry Activity Trends

May has been a busy month, some data infrastructure, server, storage, I/O network, hardware, software, cloud, converged, and container as well as data protection activity includes among others:

Depending on when you read this, the new global data protection regulations (GDPR) are either days away, or already in effect. For those who are not aware of GDPR other than seeing many inbox items in your email pertaining to it, here are some resources as a refresher or primer:

May Buzzword, Buzz Topic and Trends

Besides data protection and GDPR, other recent data infrastructure related news, trends, technologies and topics to keep an eye on (besides AI, ML, DL, AR/VR, IoT, Blockchain, Serverless) include Metal as a Service (MaaS) that might be familiar to some, for others, something new. Canonical has been busy for sometime now with MaaS including in Ubuntu and they are not alone with variations appearing with various managed service providers, hosting and cloud providers as well. NVMe has become a more common topic, technology, trend including for use in servers as well as over fabrics (e.g. NVMe over Fabrics) as a language for server, storage, I/O communication.

A new emerging companion to NVMe is Gen-Z which initially is a companion to PCIe. Longer term, Gen-Z could maybe possibly be a replacement, as well as for use accessing direct random access memory (DRAM) among other uses. Storage Class Memory (SCM) has been an industry conversation topic for several years now with new persistent memories (PMEM) that combine the best of traditional DRAM (Speed and write endurance) as well as persistent, higher capacity, lower cost of traditional NAND flash SSDs.

Another trend topic is that for some, ASIC, FPGA and GPU are new companions to standard commodity compute processors along with servers, yet for others it may be Dejavu as they have been being used for years (ok, decades) in some solutions. For now, two other buzzwords, buzz terms to add or refresh your data infrastructure vocabulary include distributed ledgers (aka blockchains), composable resources and ephemeral instance storage (storage on a cloud instance).

May NVMe Momentum Movement Activity

May saw a lot of NVMe related activity, from chips and components (adapters, devices) to systems spanning direct attached to NVMe over Fabric (NVMeoF). Here is a primer (or refresh) for NVMe along with various deployment options. NVMeoF includes RDMA over Converged Ethernet (RoCE) based, along with NVMe over Fibre Channel (FC-NVMe), as well as emerging NVMe over IP.

NVMe options
NVMe being used for front-end accessed via shared PCIe along with back-end devices

There are many different facets of NVMe including for use as a front-end on storage systems supporting server attachment (e.g. competes with Fibre Channel, iSCSI, SAS among others). Another variation of NVMe is as a back-end for attachment of drives or other NVMe based devices in storage systems, as well as servers.

NVMe backend
Front-end using traditional block SAN access with back-end NVMe, SAS and SATA devices

Read more about the many different options and variations of NVMe including key questions to ask or understand, deployment topology along with other related topics at thenvmeplace.com.

NVMe frontend NVMeoF
Various NVMe front-end including NVMeoF along with NVMe back-end devices (U.2, M.2, AiC)

Software Defined Data Infrastructure Activity

Amazon Web Services (AWS) continues to add new features, functionality as well as extending those as along with existing capabilities into various regions. Some recent updates include new Elastic Cloud Compute (EC2) Microsoft Windows Servers versions 1709 and 1803 Amazon Machine Images (AMIs). Other AWS updates include spot instances support for Red Hat BYOL (Bring Your Own License), VPN enhancements, X1e instances available in Frankfurt, H1 instance price reduction, as well as LightSail now in Canada, Paris, and Seoul regions.

For those who are not familiar with LightSail, they are virtual private servers (VPS) which are different from traditional EC2 instances. LightSail can be a cost-effective way for those who need to move out of general population shared hosting, yet cannot justify a full EC2 instance while requiring more than a container.

The LightSail instance also is available with various software pre-installed such as for WordPress websites among others. For example, I have used LightSail as a backup and standby WordPress site for StorageIOblog using Updraft Plus  Pro for data protection.

In other news, AWS C5d EC2 instances are available in various regions. C5d instances are available with 2, 4, 8, 16, 36 and 72 vCPUs along with up to 1800GB of NVMe based ephemeral storage for on-demand reserved or spot instances.

Note that instance-based storage is temporary meaning that it persists for the life of the instance. What this means is that if you stop and restart the instance, the data is not persistence. Instance-based storage is useful for data that can be protected or persisted to other storage including EBS (Elastic Block Storage). Usage includes batch, log and analytics processing, burst buffers, cache or workspace.

AWS also announced a new Simple Storage Service (S3) storage class a month or so ago called One Zone Availability Infrequent Access. This new storage class primarily provides a lower cost of storage with lower durability (e.g., data spread across one zone vs. multiple). Over the past couple of months, I have been migrating from S3 Infrequent Access (IA) as well as standard into One Zone Availability. Some of my active data remains in S3 Standard storage class, while cold archives are in Glacier.

A tip about migrating to One Zone Availability, as well as between other S3 storage classes is paid attention to your API calls and monthly budget. You might see an increase in S3 costs during the migration time, that then settles into the lower prices once data has been moved due to API calls (gets, puts, lists, dir). In other words, pay attention to how many API calls you are allowed per storage class per month, along with other fees beyond focusing only on cost per TByte. Read about other recent AWS news updates here.

Software-defined storage startup Cloudian announced their technology available for test drive on Google Cloud Platform as part of a continued industry trend. That trend is for storage vendors to make their storage software technology available on different cloud platforms such as AWS, Azure, Google, Softlayer among others.

Dell Technology World 2018

Dell Technologies made several announcements as part of Dell Technologies World that are covered in a series of posts here. Announcements included PowerMax the successor to VMAX, XtremIO X2 updates, new servers, workstations among many other items, read more here.

Besides the data infrastructure, cloud service providers and systems vendors, component suppliers including Cavium announced NVMe over Fibre Channel updates (here and here), along with Marvel NVMe updates here. HPE announced new thin clients and software (t430 Thin Client, HP mt44 Mobile Thin Client, HP ThinPro software), as well as updates to 3PAR and other storage solutions.

IBM announced various storage enhancements (and here) as well as a Happy 30th anniversary to the IBM Power9 based i systems. In other news, Kaseya bought backup data protection vendor Unitrends.

NVMe NAND flash Intel Optane

Micron announced the first quad layer cell (QLC) nand flash solid state device (SSD) named 52100 has begun shipping to select customers (and vendors). QLC packs or stacks 4 bits per cell. The 5200 is optimized for read-intensive workloads with up to 33% higher densities compared to previous generation TLC (triple layer cell) NAND flash. Broader market availability is expected to occur later fall 2018, 5210 form factor is 2.5” as a standard SSD or HDD, with capacities from 1.92TB to 7.68TB.

In other news, Micron also announced a $10 Billion (USD) stock repurchase plan, along with an extension of Intel 3D NAND flash memory partnership involving 3D NAND flash, as well as 96 layer 3D NAND. Meanwhile, various vendors are increasingly talking about how their systems are or will be storage class memory (SCM) ready including for use such as Micron 3D XPoint also known as Intel Optane among others.

Microsoft has placed into public preview Azure Active Directory (AAD) Storage authentication for Azure Blobs and Queues. Azure Storage Explorer is now released as version 1.0. AAD storage authentication enables organizations to implement role-based access control of Azure storage resources. Speaking of Azure, Microsoft has published several architectures, reference and other content at the Azure Virtual Datacenter portal here.

If you have not done so, check out Azure File Sync which is currently in public preview. Having been involved and using it for over a year including during private preview, Azure File Sync is an exciting, useful technology for creating a hybrid distributed file sharing with cloud tiering solutions. Learn more Azure File Sync here and here. In other news, Microsoft has announced a preview as part of the April 2018 Windows 10 build for a Hyper-V Google Android emulator support.

NetApp has had Azure based NAS storage in preview for a while now, and also announced Cloud Volumes on Google Cloud Platform (GCP). In addition to Cloud Volumes on AWS, Azure, and GCP, NetApp also announced enhanced NVMe based storage systems among other updates.

Two companies that have similar names are Opendrives (video workflow acceleration) and Opendrive (cloud storage, backup, and data protection). Meanwhile, data infrastructure startup Pavilion has received new funding as well as begun talking about their NVMe including NVMe over Fabric (NVMeOF) hardware storage system. Long-time data infrastructure converged server storage startup Pivot3 announced additional cloud workload mobility.

Pure storage made a couple of announcements including  FlashArray//X NVMe based shared accelerated storage system as well as NVIDIA (GPU powered) based AIRI Mini for AI/DL/ML.

Have you heard about Snowflake computing, aka, the cloud data warehouse solution? If not, check them out here. Another cloud-related data infrastructure vendor to look into is Upbound.io who have received additional funding for their multi-cloud management solutions.

Building off of recent VMware vSphere updates (here), and Dell Technology World here, the following is an excellent post about Instant Clone in vSphere 6.7, and VMware vSAN HCI assessment tool here.

Check out other industry news, comments, trends perspectives here.

Data Infrastructure Server StorageIO Comments Content

Server StorageIO Commentary in the news, tips and articles

Recent Server StorageIO industry trends perspectives commentary in the news.

Via SearchStorage: Comments Managing storage for IoT data at the enterprise edge
Via SearchCloudComputing: Comments Hybrid cloud deployment demands a change in security mindset
Via SearchStorage: Comments Dell EMC storage IPO, VMware merger plans still unclear
Via SearchStorage: Comments Dell EMC midrange storage keeps its overlapping arrays
Via SearchStorage: Comments Dell EMC all-flash PowerMax replaces VMAX, injects NVMe
Via IronMountain InfoGoto:  The growing Trend of Secondary Data Storage

View more Server, Storage and I/O trends and perspectives comments here.

Data Infrastructure Server StorageIOblog posts

Server StorageIOblog Data Infrastructure Posts

Recent and popular Server StorageIOblog posts include:

Dell Technology World 2018 Announcement Summary
Part II Dell Technology World 2018 Modern Data Center Announcement Details
Part III Dell Technology World 2018 Storage Announcement Details
Part IV Dell Technology World 2018 PowerEdge MX Gen-Z Composable Infrastructure
Part V Dell Technology World 2018 Server Converged Announcement Details
April 2018 Server StorageIO Data Infrastructure Update Newsletter
VMware vSphere vSAN vCenter version 6.7 SDDC Update Summary
PCIe Fundamentals Server Storage I/O Network Essentials
Have you heard about the new CLOUD Act data regulation?
Data Protection Recovery Life Post World Backup Day Pre GDPR
Microsoft Windows Server 2019 Insiders Preview
Application Data Value Characteristics Everything Is Not The Same
Data Infrastructure Resource Links cloud data protection tradecraft trends
IT transformation Serverless Life Beyond DevOps Podcast
Data Protection Diaries Fundamental Topics Tools Techniques Technologies Tips
Introducing Windows Subsystem for Linux WSL Overview
Data Infrastructure Primer Overview (Its Whats Inside The Data Center)
If NVMe is the answer, what are the questions?

View other recent as well as past StorageIOblog posts here

Server StorageIO Recommended Reading (Watching and Listening) List

Software-Defined Data Infrastructure Essentials SDDI SDDC

In addition to my own books including Software Defined Data Infrastructure Essentials (CRC Press 2017) available at Amazon.com (check out special sale price), the following are Server StorageIO data infrastructure recommended reading, watching and listening list items. The Server StorageIO data infrastructure recommended reading list includes various IT, Data Infrastructure and related topics including Intel Recommended Reading List (IRRL) for developers is a good resource to check out. Speaking of my books, Didier Van Hoye (@WorkingHardInIt) has a good review over on his site you can view here, also check out the rest of his great content while there.

Containers, serverless, kubernetes continue to gain in industry adoption, as well as customer deployments. Here is some information about Microsoft Azure Kubernetes Service (AKS). Note that AWS has Elastic Kubernetes Service (EKS), Google, VMware and Pivotal with Pivotal Kubernetes Service (PKS) among others.

Here is an interesting perspective by Ben Kepps about Serverless (e.g. life beyond Kubernetes and containers (e.g. life beyond virtualization which to some is or was life (e.g. life beyond bare metal))) as well as the all to often punditry, evangelism of something new causing something else to be dead.

SNIA has updated their Emerald aka Green energy effectiveness (focus on productivity) measurement specification (V3.01) including NAS NFS file activity (besides block). Learn more at snia.org/forums/green.

Watch for more items to be added to the recommended reading list book shelf soon.

Data Infrastructure Server StorageIO event activities

Events and Activities

Recent and upcoming event activities.

June 27, 2018 – Webinar – TBA

May 29, 2018 – Webinar – Microsoft Windows as a Service

April 24, 2018 – Webinar – AWS and on-site, on-premises hybrid data protection

See more webinars and activities on the Server StorageIO Events page here.

Data Infrastructure Server StorageIO Industry Resources and Links

Various useful links and resources:

Data Infrastructure Recommend Reading and watching list
Microsoft TechNet – Various Microsoft related from Azure to Docker to Windows
storageio.com/links – Various industry links (over 1,000 with more to be added soon)
objectstoragecenter.com – Cloud and object storage topics, tips and news items
OpenStack.org – Various OpenStack related items
storageio.com/downloads – Various presentations and other download material
storageio.com/protect – Various data protection items and topics
thenvmeplace.com – Focus on NVMe trends and technologies
thessdplace.com – NVM and Solid State Disk topics, tips and techniques
storageio.com/converge – Various CI, HCI and related SDS topics
storageio.com/performance – Various server, storage and I/O benchmark and tools
VMware Technical Network – Various VMware related items

Connect and Converse With Us

Storage IO RSS storageio linkedin storageio facebook Server StorageIO on twitter @StorageIO   Google+  Server StorageIO email storageio youtube  storageio instagram

Subscribe to Newsletter – Newsletter Archives StorageIO.comStorageIOblog.com

What this all means and wrap-up

Data Infrastructures are what exists inside physical data centers spanning cloud, converged, hyper-converged, virtual, serverless and other software defined as well as legacy environments. So far this spring there has been a lot of data infrastructure related activity, from new technology announcements, to events, trends among others. Enjoy this edition of the Server StorageIO Data Infrastructure update newsletter and watch for more NVMe, Gen-Z, cloud, data protection among other topics in future posts, articles, events, and newsletters.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2018. Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

VMware continues cloud construction with March announcements

VMware continues cloud construction with March announcements

VMware continues cloud construction sddc

VMware continues cloud construction with March announcements of new features and other enhancements.

VMware continues cloud construction SDDC data infrastructure strategy big picture
VMware Cloud Provides Consistent Operations and Infrastructure Via: VMware.com

With its recent announcements, VMware continues cloud construction adding new features, enhancements, partnerships along with services.

VMware continues cloud construction, like other vendors and service providers who tried and test the waters of having their own public cloud, VMware has moved beyond its vCloud Air initiative selling that to OVH. VMware which while being a public traded company (VMW) is by way of majority ownership part of the Dell Technologies family of company via the 2016 acquisition of EMC by Dell. What this means is that like Dell Technologies, VMware is focused on providing solutions and services to its cloud provider partners instead of building, deploying and running its own cloud in competition with partners.

VMware continues cloud construction SDDC data infrastructure strategy layers
VMware Cloud Data Infrastructure and SDDC layers Via: VMware.com

The VMware Cloud message and strategy is focused around providing software solutions to cloud and other data infrastructure partners (and customers) instead of competing with them (e.g. divesting of vCloud Air, partnering with AWS, IBM Softlayer). Part of the VMware cloud message and strategy is to provide consistent operations and management across clouds, containers, virtual machines (VM) as well as other software defined data center (SDDC) and software defined data infrastructures.

In other words, what this means is VMware providing consistent management to leverage common experiences of data infrastructure staff along with resources in a hybrid, cross cloud and software defined environment in support of existing as well as cloud native applications.

VMware continues cloud construction on AWS SDDC

VMware Cloud on AWS Image via: AWS.com

Note that VMware Cloud services run on top of AWS EC2 bare metal (BM) server instances, as well as on BM instances at IBM softlayer as well as OVH. Learn more about AWS EC2 BM compute instances aka Metal as a Service (MaaS) here. In addition to AWS, IBM and OVH, VMware claims over 4,000 regional cloud and managed service providers who have built their data infrastructures out using VMware based technologies.

VMware continues cloud construction updates

Building off of previous announcements, VMware continues cloud construction with enhancements to their Amazon Web Services (AWS) partnership along with services for IBM Softlayer cloud as well as OVH. As a refresher, OVH is what formerly was known as VMware vCloud air before it was sold off.

Besides expanding on existing cloud partner solution offerings, VMware also announced additional cloud, software defined data center (SDDC) and other software defined data infrastructure environment management capabilities. SDDC and Data infrastructure management tools include leveraging VMwares acquisition of Wavefront among others.

VMware Cloud Updates and New Features

  • VMware Cloud on AWS European regions (now in London, adding Frankfurt German)
  • Stretch Clusters with synchronous replication for cross geography location resiliency
  • Support for data intensive workloads including data footprint reduction (DFR) with vSAN based compression and data de duplication
  • Fujitsu services offering relationships
  • Expanded VMware Cloud Services enhancements

VMware Cloud Services enhancements include:

  • Hybrid Cloud Extension
  • Log intelligence
  • Cost insight
  • Wavefront

VMware Cloud in additional AWS Regions

As part of service expansion, VMware Cloud on AWS has been extended into European region (London) with plans to expand into Frankfurt and an Asian Pacific location. Previously VMware Cloud on AWS has been available in US West Oregon and US East Northern Virginia regions. Learn more about AWS Regions and availability zones (AZ) here.

VMware Cloud Stretch Cluster

VMware Cloud on AWS Stretch Clusters Source: VMware.com

VMware Cloud on AWS Stretch Clusters

In addition to expanding into additional regions, VMware Cloud on AWS is also being extended with stretch clusters for geography dispersed protection. Stretched clusters provide protection against an AZ failure (e.g. data center site) for mission critical applications. Build on vSphere HA and DRS  automated host failure technology, stretched clusters provide recovery point objective zero (RPO 0) for continuous protection, high availability across AZs at the data infrastructure layer.

The benefit of data infrastructure layer based HA and resiliency is not having to re architect or modify upper level, higher up layered applications or software. Synchronous replication between AZs enables RPO 0, if one AZ goes down, it is treated as a vSphere HA event with VMs restarted in another AZ.

vSAN based Data Footprint Reduction (DFR) aka Compression and De duplication

To support applications that leverage large amounts of data, aka data intensive applications in marketing speak, VMware is leveraging vSAN based data footprint reduction (DFR) techniques including compression as well as de duplication (dedupe). Leveraging DFR technologies like compression and dedupe integrated into vSAN, VMware Clouds have the ability to store more data in a given cubic density. Storing more data in a given cubic density storage efficiency (e.g. space saving utilization) as well as with performance acceleration, also facilitate storage effectiveness along with productivity.

With VMware vSAN technology as one of the core underlying technologies for enabling VMware Cloud on AWS (among other deployments), applications with large data needs can store more data at a lower cost point. Note that VMware Cloud can support 10 clusters per SDDC deployment, with each cluster having 32 nodes, with cluster wide and aware dedupe. Also note that for performance, VMware Cloud on AWS leverages NVMe attached Solid State Devices (SSD) to boost effectiveness and productivity.

VMware Hybrid Cloud Extension

Extending VMware vSphere any to any migration across clouds Source: VMware.com

VMware Hybrid Cloud Extension

VMware Hybrid Cloud Extension enables common management of common underlying data infrastructure as well as software defined environments including across public, private as well as hybrid clouds. Some of the capabilities include enabling warm VM migration across various software defined environments from local on-premises and private cloud to public clouds.

New enhancements leverages previously available technology now as a service for enterprises besides service providers to support data center to data center, or cloud centric AZ to AZ, as well as region to region migrations. Some of the use cases include small to large bulk migrations of hundreds to thousands of VM move and migrations, both scheduling as well as the actual move. Move and migrations can span hybrid deployments with mix of on-premises as well as various cloud services.

VMware Cloud Cost Insight

VMware Cost Insight enables analysis, compare cloud costs across public AWS, Azure and private VMware clouds) to avoid flying blind in and among clouds. VMware Cloud cost insight enables awareness of how resources are used, their cost and benefit to applications as well as IT budget impacts. Integrates vSAN sizer tool along with AWS metrics for improved situational awareness, cost modeling, analysis and what if comparisons.

With integration to Network insight, VMware Cloud Cost Insight also provides awareness into networking costs in support of migrations. What this means is that using VMware Cloud Cost insight you can take the guess-work out of what your expenses will be for public, private on-premisess or hybrid cloud will be having deeper insight awareness into your SDDC environment. Learn more about VVMware Cost Insight here.

VMware Log Intelligence

Log Intelligence is a new VMware cloud service that provides real-time data infrastructure insight along with application visibility from private, on-premises, to public along with hybrid clouds. As its name implies, Log Intelligence provides syslog and other log insight, analysis and intelligence with real-time visibility into VMware as well as AWS among other resources for faster troubleshooting, diagnostics, event correlation and other data infrastructure management tasks.

Log and telemetry input sources for VMware Log Intelligence include data infrastructure resources such as operating systems, servers, system statistics, security, applications among other syslog events. For those familiar with VMware Log Insight, this capability is an extension of that known experience expanding it to be a cloud based service.

VMware Wavefront SaaS analytics
Wavefront by VMware Source: VMware.com

VMware Wavefront

VMware Wavefront enables monitoring of cloud native high scale environments with custom metrics and analytics. As a reminder Wavefront was acquired by VMware to enable deep metrics and analytics for developers, DevOps, data infrastructure operations as well as SaaS application developers among others. Wavefront integrates with VMware vRealize along with enabling monitoring of AWS data infrastructure resources and services. With the ability to ingest, process, analyze various data feeds, the Wavefront engine enables the predictive understanding of mixed application, cloud native data and data infrastructure platforms including big data based.

Where to learn more

Learn more about VMware, vSphere, vRealize, VMware Cloud, AWS (and other clouds), along with data protection, software defined data center (SDDC), software defined data infrastructures (SDDI) and related topics via the following links:

SDDC Data Infrastructure

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means and wrap-up

VMware continues cloud construction. For now, it appears that VMware like Dell Technologies is content on being a technology provider partner to large as well as small public, private and hybrid cloud environments instead of building their own and competing. With these series of announcements, VMware continues cloud construction enabling its partners and customers on their various software defined data center (SDDC) and related data infrastructure journeys. Overall, this is a good set of enhancements, updates, new and evolving features for their partners as well as customers who leverage VMware based technologies. Meanwhile VMware continues cloud construction.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

World Backup Day 2018 Data Protection Readiness Reminder

World Backup Day 2018 Data Protection Readiness Reminder

server storage I/O trends

It’s that time of year again, World Backup Day 2018 Data Protection Readiness Reminder.

In case you have forgotten, or were not aware, this coming Saturday March 31 is World Backup (and recovery day). The annual day is a to remember to make sure you are protecting your applications, data, information, configuration settings as well as data infrastructures. While the emphasis is on Backup, that also means recovery as well as testing to make sure everything is working properly.

data infrastructure data protection

Its time that the focus of world backup day should expand from just a focus on backup to also broader data protection and things that start with R. Some data protection (and backup) related things, tools, tradecraft techniques, technologies and trends that start with R include readiness, recovery, reconstruct, restore, restart, resume, replication, rollback, roll forward, RAID and erasure codes, resiliency, recovery time objective (RTO), recovery point objective (RPO), replication among others.

data protection threats ransomware software defined

Keep in mind that Data Protection is a broader focus than just backup and recovery. Data protection includes disaster recovery DR, business continuance BC, business resiliency BR, security (logical and physical), standard and high availability HA, as well as durability, archiving, data footprint reduction, copy data management CDM along with various technologies, tradecraft techniques, tools.

data protection 4 3 2 1 rule and 3 2 1 rule

Quick Data Protection, Backup and Recovery Checklist

  • Keep the 4 3 2 1 or shorter older 3 2 1 data protection rules in mind
  • Do you know what data, applications, configuration settings, meta data, keys, certificates are being protected?
  • Do you know how many versions, copies, where stored and what is on or off-site, on or off-line?
  • Implement data protection at different intervals and coverage of various layers (application, transaction, database, file system, operating system, hypervisors, device or volume among others)
  • data infrastructure backup data protection

  • Have you protected your data protection environment including software, configuration, catalogs, indexes, databases along with management tools?
  • Verify that data protection point in time copies (backups, snapshots, consistency points, checkpoints, version, replicas) are working as intended
  • Make sure that not only are the point in time protection copies running when scheduled, also that they are protected what’s intended
  • data infrastructure backup data protection

  • Test to see if the protection copies can actually be used, this means restoring as well as accessing the data via applications
  • Watch out to prevent a disaster in the course of testing, plan, prepare, practice, learn, refine, improve
  • In addition to verifying your data protection (backup, bc, dr) for work, also take time to see how your home or personal data is protected
  • View additional tips, techniques, checklist items in this Data Protection fundamentals series of posts here.

storageio data protection toolbox

Where To Learn More

View additional Data Infrastructure Data Protection and related tools, trends, technology and tradecraft skills topics via the following links.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

You can not go forward if you can not go back to a particular point in time (e.g. recovery point objective or RPO). Likewise, if you can not go back to a given RPO, how can you go forward with your business as well as meet your recovery time objective (RTO)?

data protection restore rto rpo

Backup is as important as restore, without a good backup or data protection point in time copy, how can you restore? Some will say backup is more important than recovery, however its the enablement that matters, in other words being able to provide data protection and recover, restart, resume or other things that start with R. World backup day should be a reminder to think about broader data protection which also means recovery, restore and realizing if your copies and versions are good. Keep the above in mind and this is your World Backup Day 2018 Data Protection Readiness Reminder.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

IT transformation Serverless Life Beyond DevOps with New York Times CTO Nick Rockwell Podcast

IT transformation Serverless Life Beyond DevOps with New York Times CTO Nick Rockwell Podcast

server storage I/O data infrastructure trends

By Greg Schulzwww.storageioblog.com November 30, 2017

In this Server StorageIO podcast episode New York Times CTO / CIO Nick Rockwell (@nicksrockwell) joins me for a conversation discussing Digital, Business and IT transformation, Serverless Life Beyond DevOps and related topics.

In our conversation we discuss challenges with metrics, understanding value vs. cost particular for software, Nicks perspective as both a CIO and CTO of the New York Times, importance of IT being involved and understanding the business vs. just being technology focused. We also discuss the bigger broader opportunity of serverless (aka micro services, containers) life beyond DevOps and how higher level business logic developers can benefit from the technology instead of just a DevOps for infrastructure focus. Buzzwords, buzz terms and themes include datacenter technologies, NY Times, data infrastructure, management, trends, metrics, digital transformation, tradecraft skills, DevOps, serverless among others.

.Gregs Server StorageIO Podcast

Check out Nicks post The Futile Resistance to Serverless here. Listen to the podcast discussion here (MP3 16 minutes and 50 seconds) as well as on iTunes here.

Where to learn more

Learn more about Oracle, Database Performance, Benchmarking along with other tools via the following links:

What this all means and wrap-up

Check out my discussion here (MP3) with Nick Rockwell as we discuss IT and business transition, metrics, software development, and serverless life beyond DevOps. Also available on 

Ok, nuff said, for now…

Cheers
Gs

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Data Protection Diaries Fundamental Topics Tools Techniques Technologies Tips

Data Protection Fundamental Topics Tools Techniques Technologies Tips

Update 1/16/2018

Data protection fundamental companion to Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft ( CRC Press 2017)

server storage I/O data infrastructure trends

By Greg Schulzwww.storageioblog.com November 26, 2017

This is Part I of a multi-part series on Data Protection fundamental tools topics techniques terms technologies trends tradecraft tips as a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).

Software Defined Data Protection Fundamental Infrastructure Essentials Book SDDC

The focus of this series is around data protection fundamental topics including Data Infrastructure Services: Availability, RAS, RAID and Erasure Codes (including LRC) ( Chapter 9), Data Infrastructure Services: Availability, Recovery Point ( Chapter 10). Additional Data Protection related chapters include Storage Mediums and Component Devices ( Chapter 7), Management, Access, Tenancy, and Performance ( Chapter 8), as well as Capacity, Data Footprint Reduction ( Chapter 11), Storage Systems and Solutions Products and Cloud ( Chapter 12), Data Infrastructure and Software-Defined Management ( Chapter 13) among others.

Post in the series includes excerpts from Software Defined Data Infrastructure (SDDI) pertaining to data protection for legacy along with software defined data centers ( SDDC), data infrastructures in general along with related topics. In addition to excerpts, the posts also contain links to articles, tips, posts, videos, webinars, events and other companion material. Note that figure numbers in this series are those from the SDDI book and not in the order that they appear in the posts.

Posts in this data protection fundamental series include:

SDDC, SDI, SDDI data infrastructure
Figure 1.5 Data Infrastructures and other IT Infrastructure Layers

Data Infrastructures

Data Infrastructures exists to support business, cloud and information technology (IT) among other applications that transform data into information or services. The fundamental role of data infrastructures is to provide a platform environment for applications and data that is resilient, flexible, scalable, agile, efficient as well as cost-effective.

Put another way, data infrastructures exist to protect, preserve, process, move, secure and serve data as well as their applications for information services delivery. Technologies that make up data infrastructures include hardware, software, or managed services, servers, storage, I/O and networking along with people, processes, policies along with various tools spanning legacy, software-defined virtual, containers and cloud. Read more about data infrastructures (its what’s inside data centers) here.

Why SDDC SDDI Need Data Protection
Various Needs Demand Drivers For Data Protection Fundamentals

Why The Need For Data Protection

Data Protection encompasses many different things, from accessibility, durability, resiliency, reliability, and serviceability ( RAS) to security and data protection along with consistency. Availability includes basic, high availability ( HA), business continuance ( BC), business resiliency ( BR), disaster recovery ( DR), archiving, backup, logical and physical security, fault tolerance, isolation and containment spanning systems, applications, data, metadata, settings, and configurations.

From a data infrastructure perspective, availability of data services spans from local to remote, physical to logical and software-defined, virtual, container, and cloud, as well as mobile devices. Figure 9.2 shows various data infrastructure availability, accessibility, protection, and security points of interest. On the left side of Figure 9.2 are various data protection and security threat risks and scenarios that can impact availability, or result in a data loss event ( DLE), data loss access ( DLA), or disaster. The right side of Figure 9.2 shows various techniques, tools, technologies, and best practices to protect data infrastructures, applications, and data from threat risks.

SDDI SDDC Data Protection Fundamental Big Picture
Figure 9.2 Various threat vectors, issues, problems, and challenges that drive the need for data protection

A fundamental role of data infrastructures (and data centers) is to protect, preserve, secure and serve information when needed with consistency. This also means that the data infrastructure resources (servers, storage, I/O networks, hardware, software, external services) and the applications (and data) they combine and are defined to protect are also accessible, durable and secure.

Data Protection topics include:

  • Maintaining availability, accessibility to information services, applications and data
  • Data include software, actual data, metadata, settings, certificates and telemetry
  • Ensuring data is durable, consistent, secure and recoverable to past points in time
  • Everything is not the same across different environments, applications and data
  • Aligning techniques and technologies to meet various service level objectives ( SLO)

Data Protection Fundamental Tradecraft Skills Experience Knowledge

Tools, technologies, trends are part of Data Protection, so to are the techniques of knowing (e.g. tradecraft) what to use when, where, why and how to protect against various threats risks (challenges, issues, problems).

Part of what is covered in this series of posts as well as in the Software Defined Data Infrastructure (SDDI) Essentials book is tradecraft skills, tips, experiences, insight into what to use, as well as how to use old and new things in new ways.

This means looking outside the technology box towards what is that you need to protect and why, then knowing how to use different skills, experiences, techniques part of your tradecraft combined with data protection toolbox tools. Read more about tradecraft here.

Where To Learn More

Continue reading additional posts in this series of Data Infrastructure Data Protection fundamentals and companion to Software Defined Data Infrastructure Essentials (CRC Press 2017) book, as well as the following links covering technology, trends, tools, techniques, tradecraft and tips.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

Everything is not the same across environments, data centers, data infrastructures and applications.

Likewise everything is and does not have to be the same when it comes to Data Protection. Data protection fundamentals encompasses many different hardware, software, services including cloud technologies, tools, techniques, best practices, policies and tradecraft experience skills (e.g. knowing what to use when, where, why and how).

Since everything is not the same, various data protection approaches are needed to address various application performance availability capacity economic ( PACE) needs, as well as SLO and SLAs.

Get your copy of Software Defined Data Infrastructure Essentials here at Amazon.com, at CRC Press among other locations and learn more here. Meanwhile, continue reading with the next post in this series, Part 2 Reliability, Availability, Serviceability ( RAS) Data Protection Fundamentals.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Data Protection Diaries Reliability, Availability, Serviceability RAS Fundamentals

Reliability, Availability, Serviceability RAS Fundamentals

Companion to Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft ( CRC Press 2017)

server storage I/O data infrastructure trends

By Greg Schulzwww.storageioblog.com November 26, 2017

This is Part 2 of a multi-part series on Data Protection fundamental tools topics techniques terms technologies trends tradecraft tips as a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).

Software Defined Data Infrastructure Essentials Book SDDC

Click here to view the previous post Part 1 Data Infrastructure Data Protection Fundamentals, and click here to view the next post Part 3 Data Protection Access Availability RAID Erasure Codes (EC) including LRC.

Post in the series includes excerpts from Software Defined Data Infrastructure (SDDI) pertaining to data protection for legacy along with software defined data centers ( SDDC), data infrastructures in general along with related topics. In addition to excerpts, the posts also contain links to articles, tips, posts, videos, webinars, events and other companion material. Note that figure numbers in this series are those from the SDDI book and not in the order that they appear in the posts.

In this post the focus is around Data Protection availability from Chapter 9 which includes access, durability, RAS, RAID and Erasure Codes (including LRC), mirroring and replication along with related topics.

SDDC, SDI, SDDI data infrastructure
Figure 1.5 Data Infrastructures and other IT Infrastructure Layers

Reliability, Availability, Serviceability (RAS) Data Protection Fundamentals

Reliability, Availability Serviceability (RAS) and other access availability along with Data Protection topics are covered in chapter 9. A resilient data infrastructure (software-defined, SDDC and legacy) protects, preserves, secures and serves information involving various layers of technology. These technologies enable various layers ( altitudes) of functionality, from devices up to and through the various applications themselves.

SDDI SDDC Data Protection Big Picture
Figure 9.2 Various threat issues and challenges that drive the need for data protection

Some applications need a faster rebuild, while others need sustained performance (bandwidth, latency, IOPs, or transactions) with the slower rebuild; some need lower cost at the expense of performance; others are ok with more space if other objectives are meet. The result is that since everything is different yet there are similarities, there is also the need to tune how data Infrastructure protects, preserves, secures, and serves applications and data.

General reliability, availability, serviceability, and data protection functionality includes:

  • Manually or automatically via policies, start, stop, pause, resume protection
  • Adjust priorities of protection tasks, including speed, for faster or slower protection
  • Fast-reacting to changes, disruptions or failures, or slower cautious approaches
  • Workload and application load balancing (performance, availability, and capacity)

RAS can be optimized for:

  • Reduced redundancy for lower overall costs vs. resiliency
  • Basic or standard availability (leverage component plus)
  • High availability (use better components, multiple systems, multiple sites)
  • Fault-tolerant with no single points of failure (SPOF)
  • Faster restart, restore, rebuild, or repair with higher overhead costs
  • Lower overhead costs (space and performance) with lower resiliency
  • Lower impact to applications during rebuild vs. faster repair
  • Maintenance and planned outages or for continues operations

Common availability Data Protection related terms, technologies, techniques, trends and topics pertaining to data protection from availability and access to durability and consistency to point in time protection and security are shown below.

Data Protection Gaps and Air Gap

There are Good Data Protection Gaps that provide recovery points to a past time enabling recoverability in the future to move forward. Another good data protection gap is an Air Gap that isolates protection copies off-site or off-line so that they can not be tampered with enabling recovery from ransomware and other software defined threats. There are Bad data protection gaps including gaps in coverage where data is not protected or items are missing. Then there are Ugly data protecting gaps which include Bad gaps that result in what you think is protected are not and finding that your copies are bad when it is too late.

Data Protection Gaps Good Bad Ugly
Data Protection Gaps Good Bad and Ugly

The following figure shows good data protection gaps including recovery points (point in time protection) along with air gaps.

Good Data Protection Gaps
Figure 9.9 Air Gaps and Data Protection

Fault / Failures To Tolerate (FTT)

FTT is how many faults or failures to tolerate for a given solution or service which in turn determines what mode of protection, or fault tolerant mode ( FTM) to use.

Fault Tolerant Mode (FTM)

FTM is the mode or technique used to enable resiliency and protect against some number of faults.

Fault / Failure Domains

Fault or Failure domains are places and things that can fail from regions, data centers or availability zones, clusters, stamps, pods, servers, networks, storage, hardware (systems, components including SSD and HDDs, power supplies, adapters). Other fault domain topics and focus areas include facility power, cooling, software including applications, databases, operating systems and hypervisors among others.

SDDI SDDC Fault Domains Zones Regions
Figure 9.5 Various Fault and Failure Domains, Regions, Locations

Clustering

Clustering is a technique and technology for enabling resiliency, as well as scaling performance, availability, and capacity. Clusters can be local, remote, or wide-area to support different data infrastructure objectives, combined with replication and other techniques.

SDDI SDDC Clustering
Figure 9.12 Clustering and Replication Examples

Another characteristic of clustering and resiliency techniques is the ability to detect and react quickly to failures to isolate and contain faults, as well as invoking automatic repair if needed. Different clustering technologies enable various approaches, from proprietary hardware and software tightly coupled to loosely coupled general-purpose hardware or software.

Clustering characteristics include:

  • Application, database, file system, operating system (Windows Storage Replica)
  • Storage systems, appliances, adapters and network devices
  • Hypervisors ( Hyper-V, VMware vSphere ESXi and vSAN among others)
  • Share everything, share some things, share nothing
  • Tightly or loosely coupled with common or individual system metadata
  • Local in a data center, campus, metro, or stretch cluster
  • Wide-area in different regions and availability zones
  • Active/active for fast fail over or restart, active/passive (standby) mode

Additional clustering considerations include:

  • How does performance scale as nodes are added, or what overhead exists?
  • How is cluster resource locking in shared environments handled?
  • How many (or few) nodes are needed for quorum to exist?
  • Network and I/O interface (and management) requirements
  • Cluster partition or split-brain (i.e., cluster splits into two)?
  • Fast-reacting fail over and resiliency vs. overhead of failing back
  • Locality of where applications are located vs. storage access and clustering

Where To Learn More

Continue reading additional posts in this series of Data Infrastructure Data Protection fundamentals and companion to Software Defined Data Infrastructure Essentials (CRC Press 2017) book, as well as the following links covering technology, trends, tools, techniques, tradecraft and tips.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

Everything is not the same across different environments, data centers, data infrastructures and applications. There are various performance, availability, capacity economic (PACE) considerations along with service level objectives (SLO). Availability means being able to access information resources (applications, data and underlying data infrastructure resources), as well as data being consistent along with durable. Being durable means enabling data to be accessible in the event of a device, component or other fault domain item failures (hardware, software, data center).

Just as everything is not the same across different environments, there are various techniques, technologies and tools that can be used in different ways to enable availability and accessibility. These include high availability (HA), RAS, mirroring, replication, parity along with derivative erasure code (EC), LRC, RS and other RAID implementations, along with clustering. Also keep in mind that pertaining to data protection, there are good gaps (e.g. time intervals for recovery points, air gaps), bad gaps (missed coverage or lack of protection), and ugly gaps (not being able to recover from a gap in time).

Note that mirroring, replication, EC, LRC, RS or other Parity and RAID approaches are not replacements for backup, rather they are companions to time interval based recovery point protection such as snapshots, backup, checkpoints, consistency points and versioning among others (discussed in follow-up posts in this series).

Which data protection tool, technology to trend is the best depends on what you are trying to accomplish and your application workload PACE requirements along with SLOs. Get your copy of Software Defined Data Infrastructure Essentials here at Amazon.com, at CRC Press among other locations and learn more here. Meanwhile, continue reading with the next post in this series, Part 3 Data Protection Access Availability RAID Erasure Codes (EC) including LRC.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Data Protection Diaries Access Availability RAID Erasure Codes LRC Deep Dive

Access Availability RAID Erasure Codes including LRC Deep Dive

Companion to Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft ( CRC Press 2017)

server storage I/O data infrastructure trends

By Greg Schulzwww.storageioblog.com November 26, 2017

This is Part 3 of a multi-part series on Data Protection fundamental tools topics techniques terms technologies trends tradecraft tips as a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).

Software Defined Data Infrastructure Essentials Book SDDC

Click here to view the previous post Part 2 Reliability, Availability, Serviceability (RAS) Data Protection Fundamentals, and click here to view the next post Part 4 Data Protection Recovery Points (Archive, Backup, Snapshots, Versions).

Post in the series includes excerpts from Software Defined Data Infrastructure (SDDI) pertaining to data protection for legacy along with software defined data centers ( SDDC), data infrastructures in general along with related topics. In addition to excerpts, the posts also contain links to articles, tips, posts, videos, webinars, events and other companion material. Note that figure numbers in this series are those from the SDDI book and not in the order that they appear in the posts.

In this post part of the Data Protection diaries series as well as companion to Chapter 9 of SDDI Essentials book, we are going on a longer, deeper dive. We are going to look at availability, access and durability including mirror, replication, RAID including various traditional and newer parity approaches such as Erasure Codes ( EC), Local Reconstruction Code (LRC), Reed Solomon (RS) also known as RAID 2 among others. Later posts in this series look at point in time data protection to support recovery to a given time (e.g. RPO), while this and the previous post look at maintaining access and availability.

Keep in mind that if something can fail, it probably will, also that everything is not the same meaning different environments, application workloads (along with their data). Different environments and applications have diverse performance, availability, capacity economic (PACE) attributes, along with service level objectives ( SLOs). Various SLOs include PACE attributes, recovery point objectives ( RPO), recovery time objective ( RTO) among others.

Availability, accessibility and durability (see part two in this series) along with associated RAS topics are part of what enable RTO, as well as meet Faults (or failures) to tolerate ( FTT). This means that different fault tolerance modes ( FTM) determine what technologies, tools, trends and techniques to use to meet different RTO, FTT and application PACE needs.

Maintaining access and availability along with durability (e.g. how many copies of data as well as where stored) protects against loss or failure of a component device ( SSD, HDDs, adapters, power supply, controller), node or system, appliance, server, rack, clusters, stamps, data center, availability zones, regions, or other Fault or Failure domains spanning hardware, software, and services.

SDDC, SDI, SDDI data infrastructure
Figure 1.5 Data Infrastructures and other IT Infrastructure Layers

Data Protection Access Availability RAID Erasure Codes

This is a good place to mention some context for RAID and RAID array, which can mean different things pertaining to Data Protection. Some people associate RAID with a hardware storage array, or with a RAID card. Other people consider an array to be a storage array that is a RAID enabled storage system. A trend is to refer to legacy storage systems as RAID arrays or hardware-based RAID, to differentiate from newer implementations.

Context comes into play in that a RAID group (i.e., a collection of HDDs or SSD that is part of a RAID set) can be referred to as an array, a RAID array, or a virtual array. What this means is that while some RAID implementations may not be relevant, there are many new and evolving variations extending parity based protection making at least software-defined RAID still relevant

Keep context in mind, and don’t be afraid to ask what someone is referring to: a particular vendor storage system, a RAID implementation or packaging, a storage array, or a virtual array. Also keep the context of the virtual array in perspective vs. storage virtualization and virtual storage. RAID as a term is used to refer to different modes such as mirroring or parity, and parity can be legacy RAID 4, 5, or 6 along with erasure codes (EC). Note some people refer to erasure codes in the context of not being a RAID system, which can be an inference to not being a legacy storage system running hardware RAID (e.g. not software or software defined).

The following figure (9.13) shows various availability protection schemes (e.g. not recovery point) that maintain access while protecting against loss of a component, device, system, server, site, region or other part of a fault domain. Since everything is not the same with environments and applications having different Performance Availability Capacity Economic ( PACE) attributes, there are various approaches for enabling availability along with accessibility.

Keep in mind that RAID and Erasure codes along with their various, as well as replication and mirroring by themselves are not a replacement for backup or other point in time (e.g. enable recovery point) protection.

Instead, availability technologies such as RAID and erasure code along with mirror as well as replication need to be combined with snapshots, point in time copies, consistency points, checkpoints, backups among other recovery point protection for complete data protection.

Speaking of replacement for backup, while many vendors and their pundits claIm or want to see backup as being dead, as long as they keep talking about backup instead of broader data protection backup will remain alive.

SDDC SDDI RAID Parity Erasure Code EC
Figure 9.13 Various RAID, Mirror, Parity and Erasure Code (EC) approaches

Different RAID levels (including parity, EC, LRC and RS based) will affect storage energy effectiveness, similar to various SSD or HDD performance capacity characteristics; however, a balance of performance, availability, capacity, and energy needs to occur to meet application service needs. For example, RAID 1 mirroring or RAID 10 mirroring and striping use more HDDs and, thus, power, but will yield better performance than RAID 6 and erasure code parity protection.

 

Normal performance

 

Availability

Performance overhead

Rebuild overhead

Availability overhead

RAID 0 (stripe)

Very good read & write

None

None

Full volume restore

None

RAID 1 (mirror or replicate)

Good reads; writes = device speed

Very good; two or more copies

Multiple copies can benefit reads

Re-synchronize with existing volume

2:1 for dual, 3:1 for three-way copies

RAID 4 (stripe with dedicated parity, i.e., 4 + 1 = 5 drives total)

Poor writes without cache

Good for smaller drive groups and devices

High on write without cache (i.e., parity)

Moderate to high, based on number and type of drives

Varies; 1 Parity/N, where N = number of devices

RAID 5
(stripe with rotating parity, 4 + 1 = 5 drives)

Poor writes without cache

Good for smaller drive groups and devices

High on write without cache (i.e., parity)

Moderate to high, based on number and type of drives

Varies
1 Parity/N, where N = number of devices

RAID 6
(stripe with dual parity, 4 + 2 = 6 drives)

Poor writes without cache

Better for larger drive groups and devices

High on write without cache (i.e., parity)

Moderate to high, based on number and type of drives

Varies; 2 Parity/N, where N = number of devices

RAID 10
(mirror and stripe)

Good

Good

Minimum

Re-synchronize with existing volume

Twice mirror capacity stripe drives

Reed-Solomon (RS) parity, also known as erasure code (EC), local reconstruction code (LRC), and SHEC

Ok for reads, slow writes; good for static and cold data with front-end cache

Good

High on writes (CPU for parity calculation, extra I/O operations)

Moderate to high, based on number and type of drives, how implemented, extra I/Os for reconstruction

Varies, low overhead when using large number of devices; CPU, I/O, and network overhead.

Table 9.3 Common RAID Characteristics

Besides those shown in table 9.3, other RAID including parity based approaches include 2 (Reed Solomon), 3 (synchronized stripe and dedicated parity) along with others including combinations such as 10, 01, 50, 60 among others.

Similar to legacy parity-based RAID, some erasure code implementations use narrow drive groups while others use larger ones to increase protection and reduce capacity overhead. For example, some larger enterprise-class storage systems (RAID arrays) use narrow 3 + 1 or 4 + 1 RAID 5 or 4 + 2 or 6 + 2 RAID 6, which have higher protection storage capacity overhead and fault=impact footprint.

On the other hand, many smaller mid-range and scale-out storage systems, appliances, and solutions support wide stripes such as 7 + 1, 15 + 1, or larger RAID 5, or 14 + 2 or larger RAID 6. These solutions trade the lower storage capacity protection overhead for risk of a multiple drive failures or impacts. Similarly, some EC implementations use relatively small groups such as 6, 2 (8 drives) or 4, 2 (6 drives), while others use 14, 4 (18 drives), 16, 4 (20 drives), or larger.

Table 9.4 shows options for a number of data devices (k) vs. a number of protect devices (m).

k
(data devices)

m
(protect devices)

Availability;
Resiliency

Space capacity overhead

Normal performance

FTT

Comments;
Examples

Narrow

Wide

Very good;
Low impact of rebuild

Very high

Good (R/W)

Very good

Trade space for RAS;
Larger m vs. k;
1, 1; 1, 2; 2, 2; 4, 5

Narrow

Narrow

Good

Good

Good (R/W)

Good

Use with smaller drive groups;
2, 1; 3, 1; 6, 2

Wide

Narrow

Ok to good;
With larger m value

Low as m gets larger

Good (read);
Writes can be slow

Ok to good

Smaller m can impact rebuild;
3, 1; 7, 1; 14, 2; 13, 3

Wide

Wide

Very good;
Balanced

High

Good

Very good

Trade space for RAS;
2, 2; 4, 4; 8, 4; 18, 6

Table 9.4. Comparing Various Data Device vs. Protect Device Configurations

Note that wide k with no m, such as 4, 0, would not have protection. If you are focused on reducing costs and storage space capacity overhead, then a wider (i.e., more devices) with fewer protect devices might make sense. On the other hand, if performance, availability, and minimal to no impact during rebuild or reconstruction are important, then a narrower drive set, or a smaller ratio of data to protect drives, might make sense.

Also note that the higher or larger the RAID number, or parity scheme, or number of "m" devices in a parity and erasure code group may not be better, likewise smaller may not be better. What is better is which approach meets your specific application performance, availability, capacity, economic (PACE) needs, along with SLO, RTO, RPO requirements. What can also be good is to use hybrid approaches combining different technologies and tools to facilitate both access, availability, durability along with point in time recovery across different layers of granularity (e.g. device, drive, adapter, controller, cabinet, file system, data center, etc).

Some focus on the lower level RAID as the single or primary point of protection, however watch out for that being your single point of failure as well. For example, instead of building a resilient RAID 10 and then neglecting to have adequate higher level access, as well as recovery point protection, combine different techniques including file system protection, snapshots, and backups among others.

Figure 9.14 shows various options and considerations for balancing between too many or too few data (k) and protect (m) devices. The balance is about enabling particular FTT along with PACE attributes and SLO. This means, for some environments or applications, using different failure-tolerant modes ( FTM) in various combinations as well as configurations.

SDDC SDDI Data Protection
Figure 9.14 Comparing various data drive to protection devices

Figure 9.14 top shows no protection overhead (with no protection); the bottom shows 13 data drives and three protection drives in an EC (RS or LRC among others) configuration that could tolerate three devices failing before loss of data or access occurs. In between are various options that can also be scaled up or down across a different number of devices ( HDDs, SSD, or systems).

Some solutions allow the user or administrator to configure the I/O chunk, slabs, shard, or stripe size, for example, from 8 KB to 256 KB to 1 MB (or larger), aligning with application workload and I/O profiles. Other options include the ability to set or disable read-ahead, write-through vs. write-back cache (with battery-protected cache), among other options.

The width or number of devices in a RAID parity or erasure group is based on a combination of factor, including how much data is to be stored and what your FTT objective is, along with spreading out protection overhead. Another consideration is whether you have large or small files and objects.

For example, if you have many small files and a wide stripe, parity, or erasure code set with a large chunk or shard size, you may not have an optimal configuration from a performance perspective.

The following figure shows combing various data protection availability and accessibility technologies including local as well as remote mirroring and replication, along with parity or erasure code (including LRC, RS, SHEC among others) approaches. Instead of just using one technology, a hybrid approach is used leveraging mirror (local on SSD) and replication across sites including asynchronous and synchronous. Replication modes include Asynchronous (time-delayed, eventual consistency) for longer distance, higher latency networks, and synchronous (strong consistency, real-time) for short distance or low-latency networks.

Note that the mirror and replication can be done in software deployed as part of a storage system, appliance or as tin-wrapped software, virtual machine, virtual storage appliance, container or some other deployment mode. Likewise RAID, parity and erasure code software can be deployed and packaged in different ways.

In addition to mirror and replication, solutions are also using parity based including erasure code variations for lower cost, less active data. In other words, the mirror on SSD handles active hot data, as well as any buffering or cache, while lower performance, higher capacity, lower cost data gets de-staged or migrated to a parity erasure code tier. Some vendors, service provider and solutions leveraging variations of the approach in figure 9.15 include Microsoft ( Azure and Windows) and VMware among others.

SDDC SDDI Data Protection
Figure 9.15 Combining various availability data protection techniques

A tradecraft skill is finding the balance, knowing your applications, the data, and how the data is allocated as well as used, then leveraging that insight and your experience to configure to meet your application PACE requirements.

Consider:

  • Number of drives (width) in a group, along with protection copies or parity
  • Balance rebuild performance impact and time vs. storage space overhead savings
  • Ability to mix and match various devices in different drive groups in a system
  • Management interface, tools, wizards, GUIs, CLIs, APIs, and plug-ins
  • Different approaches for various applications and environments
  • Context of a physical RAID array, system, appliance, or solution vs. logical

Erasure Codes (EC)

Erasure Codes ( EC) combines advanced protection with variable space capacity overhead over many drives, devices, or systems using large parity chunks, shards compared to traditional parity RAID approaches. There are many variations of EC as well as parity based approaches, some are tied to Reed Solomon (RS) codes while others use different approaches.

Note that some EC are optimized for reducing the overhead and cost of storing data (e.g. less space capacity) for inactive, or primarily read data. Likewise, some EC or variations are optimized for performance of reads/writes as well as reducing overhead of rebuild, reconstructions, repairs with least impact. Which EC or parity derivative approach is best depends on what you are trying to do or impact to avoid.

Reed Solomon (RS) codes

Reed Solomon (RS) codes are advanced parity protection mathematical algorithm technique that works well on large amounts of data providing protection with lower space capacity overhead depending on how configured. Many Erasure Codes (EC) are based on derivatives of RS. Btw, did you know (or remember) that RAID 2 (rarely used with few legacy implementations) has ties to RS codes? Here are some additional links to RS including via Backblaze, CMU, and Dr Dobbs.

Local Reconstruction Codes (LRC)

Microsoft leverages LRC in Azure as well as in Windows Servers. LRC are optimized for a balance of protection, space capacity savings, normal performance as well as reducing impact on running workloads during a repair, rebuild or reconstruction. One of the tradeoffs that LRC uses is to add some amount of additional space capacity in exchange for normal and abnormal (e.g. during repair) performance improvements. Where RS, EC and other parity based derivatives typically use a (k,m) nomenclature (e.g. data, protection), LRC adds an extra variable to help with constructions (k,m,n).

Some might argue that LRC are not as space efficient as other EC, RS or parity derivative variations of which the counter argument can be that some of those approaches are not as performance effective. In other words, everything is not the same, one approach does not or should not have to be applied to all, unless of course your preferred solution approach can only do one thing.

Additional LRC related material includes:

  • (PDF by Microsoft) LRC Erasure Coding in Windows Storage Spaces
  • (Microsoft Usenix Paper) Best Paper Award Erasure Coding in Azure
  • (Via MSDN Shared) Azure Storage Erasure Coding with LRC
  • (Via Microsoft) Azure Storage with Strong Consistency
  • (Paper via Microsoft) 23rd ACM Symposium on Operating Systems Principles (SOSP)
  • (Microsoft) Erasure Coding in Azure with LRC
  • (Via Microsoft) Good collection of EC, RS, LRC and related material
  • (Via Microsoft) Storage Spaces Fault Tolerance
  • (Via Microsoft) Better Way To Store Data with EC/LRC
  • (Via Microsoft) Volume resiliency and efficiency in Storage Spaces

Shingled Erasure Code (SHEC)

Shingled Erasure Codes (SHEC) are a variation of Erasure Codes leveraging shingled overlay approach similar to what is being used in Shingled Magnetic Recording (SMR) on some HDDs. Ceph has been an early promoter of SHEC, read more here, and here.

Replication and Mirroring

Replication and Mirroring create a mirror or replica copy of data across different devices, systems, servers, clusters, sites or regions. In addition to keeping a copy, mirror and replication can occur on different time intervals such as real-time ( synchronous) and time deferred (Asynchronous). Besides time intervals, mirror and replication are implemented in different locations at various altitudes or stack layers from lower level hardware adapter or storage systems and appliances, to operating systems, hypervisors, software defined storage, volume managers, databases and applications themselves.

Covered in more detail in chapters 5 and 6, synchronous provides real-time, strong consistency, although high-latency local or remote interfaces can impact primary application performance. Note there is a common myth that high-latency networks are only long distance when in fact some local networks can also be high-latency. Asynchronous (also discussed in more depth in chapters 5 and 6) enables local and remote high-latency communications to be spanned, facilitating protection over a distance without impacting primary application performance, albeit with lower consistency, time deferred, also known as eventual consistency.

Mirroring (also known as RAID 1) and replication creates a copy (a mirror or replica) across two or more storage targets (devices, systems, file systems, cloud storage service, applications such as a database). The reason for using mirrors is to provide a faster (for normal running and during recovery) failure-tolerant mode for enabling availability, resiliency, and data protection, particularly for active data.

Figure 9.10 shows general replication scenarios. Illustrated are two basic mirror scenarios: At the top, a device, volume, file system, or object bucket is replicated to two other targets (i.e., three-way or three replicas); At the bottom, is a primary storage device using a hybrid replica and dispersal technique where multiple data chunks, shards, fragments, or extents are spread across devices in different locations.

SDDC SDDI Mirror and Replication
Figure 9.10 Various Mirror and Replication Approaches

Mirroring and replication can be done locally inside a system (server, storage system, or appliance), within a cabinet, rack, or data center, or remotely, including at cloud services. Mirroring can also be implemented inside a server in software or using RAID and HBA cards to off-load the processing.

SDDC SDDI Mirror Replication Techniques
Figure 9.11 Mirror or Replication combined with Snapshots or other PiT protection

Keep in mind that mirroring and replication by themselves are not a replacement for backups, versions, snapshots, or another recovery point, time-interval (time-gap) protection. The reason is that replication and mirroring maintain a copy of the source at one or more destination targets. What this means is that anything that changes on the primary source also gets applied to the target destination (mirror or replica). However, it also means that anything changed, deleted, corrupted, or damaged on the source is also impacted on the mirror replica (assuming the mirror or replicas were or are mounted and accessible on-line).

implementations in various locations (hardware, software, cloud) include:

  • Applications and databases such as SQL Server, Oracle among others
  • File systems, volume manager, Software-defined storage managers
  • Third-party storage software utilities and drivers
  • Operating systems and hypervisors
  • Hardware adapter and off-load devices
  • Storage systems and appliances
  • Cloud and managed services

Where To Learn More

Continue reading additional posts in this series of Data Infrastructure Data Protection fundamentals and companion to Software Defined Data Infrastructure Essentials (CRC Press 2017) book, as well as the following links covering technology, trends, tools, techniques, tradecraft and tips.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

There are various data protection technologies, tools and techniques for enabling availability of information resources including applications, data and data Infrastructure resources. Likewise there are many different aspects of RAID as well as context from legacy hardware based to cloud, virtual, container and software defined. In other words, not all RAID is in legacy storage systems, and there is a lot of FUD about RAID in general that is probably actually targeted more at specific implementations or products.

There are different approaches to meet various needs from stripe for performance with no protection by itself, to mirror and replication, as well as many parity approaches from legacy to erasure codes including Reed Solomon based as well as LRC among others. Which approach is best depends on your objects including balancing performance, availability, capacity economic (PACE) for normal running behavior as well as during faults and failure modes.

Get your copy of Software Defined Data Infrastructure Essentials here at Amazon.com, at CRC Press among other locations and learn more here. Meanwhile, continue reading with the next post in this series, Part 4 Data Protection Recovery Points (Archive, Backup, Snapshots, Versions).

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.