March 31st is world backup day; when is world recovery day

If March 31st is world backup day, when is world recovery day?

For several years, if not decades, March 31st has been world backup day, a reminder to protect and back up your apps and data. Data protection, including backup, recovery, business continuance (BC), disaster recovery (DR), and business resilience (BR), should be a 365-day-a-year focus. If you have regular data protection, including backup, that is great; when was the last time you tested a restore?

Some related content

Upcoming and past events including webinars, tips and commentary
World Backup Day Reminder Don’t Be an April Fool Test Your Data Recovery
Data Infrastructure Overview, Its What’s Inside of a Data Center
Application Data Value Characteristics Everything Is Not The Same
Data Protection Diaries Topics Tools Techniques Technologies Tips

Reminder to Protect your data and apps and settings

Thus, this is also a reminder to protect your data, apps, and their settings regularly. What's even better is evolving from no protection, or once-a-year protection, to more frequent data protection, including backup of your critical and noncritical apps and data. Notice I keep mentioning apps and not just the usual focus on data. Program apps are, broadly speaking, also data; after all, apps along with your settings and metadata are just data when stored and protected.

There is also often a focus on just the data, which can lead to problems when it comes time to recover an app program, settings, or metadata. Also, a reminder that data protection, including backup, is not just for large enterprises; it applies to organizations and entities of all sizes, including small and medium businesses (SMBs), non-profits, and homes (e.g., your photos, worksheets, and other documents).

What About Recovery?

If March 31st is world backup day, when is world recovery day? So far, I have been talking about backup as part of data protection or ensuring your apps, data, and settings are protected; what about recovery?

Sometimes data protection discussions drift into debating what is more critical, backup or recovery, which is a bit like the chicken and egg question: which is more important, the chicken or the egg?

Recovery is only as good as your backup (or snapshot, point-in-time copy, checkpoint, or consistency point), and your backup or protection copy is only as good as its recoverability. Recoverability means not only that there is something to restore from a given point in time (e.g., recovery point objective or RPO) within a given amount of time (recovery time objective or RTO).

Recoverability also means that you can pull the data (e.g., bits, bytes, blocks, blobs, objects, files, tables) from the protection medium, media, or service and use it. Recovery means that the data is valid and consistent, has integrity, or is otherwise not bad, missing, damaged, or corrupted (e.g., usable).
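
To make RPO and RTO concrete, here is a minimal sketch in Python (the values and function name are hypothetical, not from any particular backup product) that checks whether the newest usable protection copy and a measured restore time meet stated objectives:

    from datetime import datetime, timedelta

    def meets_objectives(last_good_copy: datetime,
                         restore_duration: timedelta,
                         rpo: timedelta,
                         rto: timedelta) -> bool:
        """True if the newest usable copy satisfies both the recovery point
        objective (worst-case data loss window) and the recovery time
        objective (how long the restore takes)."""
        data_age = datetime.now() - last_good_copy
        return data_age <= rpo and restore_duration <= rto

    # Example: nightly backups (RPO 24h) with a four-hour restore window (RTO 4h)
    ok = meets_objectives(datetime.now() - timedelta(hours=20),
                          timedelta(hours=3),
                          rpo=timedelta(hours=24),
                          rto=timedelta(hours=4))
    print("Objectives met" if ok else "Objectives missed")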

What About Recovery Day?

For several years I have said, and will continue to say, that if March 31st is world backup day, then April 1st should be world recovery day. So why April 1st for world recovery day? Simple: you don't want to look like a fool the day after world backup day if you can't restore and use the data backed up the day before.

Not comfortable with April 1st for world recovery day? Then make your world recovery day (or test) a day or so later. The important message is to ensure your apps, data, and settings are protected (e.g., copied, backed up, snapshot, checkpoint, etc.), then trust yet verify, and test your restorations.

Why do I mention apps, data, and settings?

The important message here is that it is good if you are already protecting your data, your spreadsheets, worksheets, databases, files, photos, and the application programs that use them. However, also ensure that you are protecting application settings, configurations, metadata, encryption keys, the backup or protection mechanisms, and their data.

For example, when I accidentally delete a data file or configuration settings, I can restore those without recovering everything. Suppose, for instance, I accidentally or intentionally uninstall an application program. In that case, I can reinstall (assuming I have a copy of the program), then restore my settings and pick up where I left off.
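
As a simple illustration of protecting settings separately (a minimal Python sketch; the paths, app name, and retention count are hypothetical), a configuration directory can be copied to timestamped folders so a bad edit or an uninstall can be rolled back without a full restore:

    import shutil
    from datetime import datetime
    from pathlib import Path

    SETTINGS_DIR = Path.home() / "AppData/Roaming/SomeApp"  # hypothetical app settings
    BACKUP_ROOT = Path("D:/protect/settings/SomeApp")       # hypothetical protection target

    def backup_settings(keep: int = 7) -> Path:
        """Copy settings to a timestamped folder, pruning all but the newest versions."""
        dest = BACKUP_ROOT / datetime.now().strftime("%Y%m%d-%H%M%S")
        shutil.copytree(SETTINGS_DIR, dest)
        for old in sorted(BACKUP_ROOT.iterdir())[:-keep]:
            shutil.rmtree(old)
        return dest

    print(f"Settings protected to {backup_settings()}")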

Who does this apply to?

This applies to organizations of all sizes and types as well as individuals. If you have, generate, or save data that is worth having (or that you must keep), then it should be protected. How often to protect data (the time interval) will be based on your recovery point objective (RPO). Likewise, how fast you need to recover is defined by your recovery time objective (RTO).

Remember that it is not a question of if you will need to restore, recover, reload, refresh, or repair your apps, data, and settings, but when. It might be because of accidental or planned deletion, an accident, a hardware, software, or cloud service issue, ransomware, or malware, among other things that can and do happen.

What to do?

If March 31st is world backup day, when is world recovery day? Ensure you have regular copies of your apps, data, and configuration settings, including encryption keys. Implement a variation of the old school three two one (e.g., 3 2 1) data protection (e.g., backup) scheme: three or more copies, stored on two or more devices, systems, media, or mediums, with at least one of them offsite (preferably offline), including at a cloud service.

A variation of the new school 4 3 2 1 data protection scheme has:
Four or more versions of your protected data.
Three or more copies (feel free to swap the number of copies and versions).
Stored on two or more different systems (devices, media, or locations).
At least one copy offsite (preferably with one offline), including cloud.

The big difference between the old school 3 2 1 and the new school 4 3 2 1 is the emphasis on having multiple copies as well as various versions (e.g., points in time). For example, storing three copies on two systems with one offsite is good, unless all of those copies contain the same damaged or corrupted data. Having different versions (e.g., points in time) and multiple copies of those versions stored in different places, including at least one offline (e.g., air-gapped), is essential.
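
Expressed as a quick check (a minimal Python sketch of the 4 3 2 1 guideline above; the copy inventory is hypothetical), the distinction between copies, versions, systems, and offsite placement looks like this:

    from dataclasses import dataclass

    @dataclass
    class Copy:
        version: str   # point in time, e.g. "2024-03-31T02:00"
        system: str    # device, system, or media the copy lives on
        offsite: bool
        offline: bool  # isolated / air-gapped, not live

    def meets_4321(copies: list[Copy]) -> bool:
        """Four or more copies, three or more versions, two or more systems,
        and at least one copy offsite."""
        return (len(copies) >= 4
                and len({c.version for c in copies}) >= 3
                and len({c.system for c in copies}) >= 2
                and any(c.offsite for c in copies))

    inventory = [Copy("v3", "nas-local", False, False),
                 Copy("v2", "usb-drive", False, True),
                 Copy("v2", "azure-blob", True, False),
                 Copy("v1", "tape-vault", True, True)]
    print("4 3 2 1 satisfied" if meets_4321(inventory) else "Gap in protection")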

Trust yet verify, test your backups and recovery

Test to verify your data protection is working and that data (apps, data, settings) can be restored. When testing restores, be careful not to overwrite your good data and cause a disaster. Also, ensure your data is encrypted in multiple locations and layers and that you protect your encryption keys. Finally, make sure your backup, protection software, catalog, and settings are encrypted, secured, and protected.
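
One way to test a restore safely (a minimal Python sketch with hypothetical paths) is to restore into a scratch area and compare checksums against the live data, so nothing good ever gets overwritten:

    import hashlib
    from pathlib import Path

    def sha256(path: Path) -> str:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def verify_restore(original: Path, restored: Path) -> list[str]:
        """Compare a test restore (in a scratch area) against live data; never
        writes to the original. Returns files that are missing or differ."""
        problems = []
        for src in original.rglob("*"):
            if src.is_file():
                dst = restored / src.relative_to(original)
                if not dst.exists() or sha256(src) != sha256(dst):
                    problems.append(str(src))
        return problems

    issues = verify_restore(Path("D:/data"), Path("E:/restore-test/data"))
    print("Restore verified" if not issues else f"{len(issues)} files failed verification")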

If you have questions or are not sure, learn more in my books Software Defined Data Infrastructure Essentials (CRC Press) and Data Infrastructure Management Insight and Strategies (CRC Press), check out the links listed below, or reach out to me or others. If you are an individual consumer just looking to protect some photos, valuable documents, and heirlooms, get in touch with professionals who specialize in these types of things.

What do I do?

Implement 4 3 2 1 type data protection with different granularities and frequencies. For example, my data protection includes regular point-in-time copies, including backups and snapshots, checkpoints, and consistency points of systems, volumes, shares, apps, files, data, and settings at different intervals. Because I have different types of apps and data, some more static and others changing, protection is also varied to avoid treating everything the same, reduce cost, and increase coverage.

I protect my apps, data, and settings with multiple versions and copies locally on different systems, devices, and mediums, as well as offsite, including offline and at cloud services. So why do I store data offsite vs. having it all in the cloud? Simple: speed of recovery and flexibility.

If it's a few files, perhaps a few GBs of data, and I don't have a good copy locally, it is usually faster to get it from Microsoft Azure. On the other hand, if I need to restore TBs of data (something terrible happened), it can be faster to bring an offline, offsite copy back, restore from that, and then pull only the more recent data I need from the cloud.
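
The arithmetic behind that choice is simple transfer time. A back-of-the-envelope Python sketch (the link speeds and data sizes are hypothetical):

    def transfer_hours(size_tb: float, mbps: float) -> float:
        """Hours to move size_tb terabytes over an mbps (megabits/sec) link."""
        return size_tb * 1e12 * 8 / (mbps * 1e6) / 3600

    # 10 TB over a 100 Mbps internet link vs. a local 6 Gbps restore from
    # an offsite drive brought back on site.
    print(f"Cloud pull : {transfer_hours(10, 100):.0f} hours")   # ~222 hours
    print(f"Local drive: {transfer_hours(10, 6000):.1f} hours")  # ~3.7 hours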

What are some of the tools and technologies that I use?

Locally I have multiple Microsoft Windows Servers (Server 2022) with various storage (HDDs and SSDs), including removable devices. In addition to on-prem, I have data stored offsite on removable media and cloud copies. For my cloud copies, I have a mix of files and blobs stored at Microsoft Azure.

A challenge moving from AWS to Azure was that Retrospect did not support objects (Azure blobs). No worries, I realized: Retrospect supports storing data on local storage (SSD or HDD) on regular filesystems as files. The solution was to set up an Azure file share for Retrospect, and everything has worked fantastically.
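
Since the share presents as a regular filesystem, any tool that writes plain files can use it as a target. A minimal Python sketch of that idea (the drive letter, paths, and file names are hypothetical, and it assumes the share is already mounted, e.g., via SMB):

    import shutil
    from pathlib import Path

    share = Path("Z:/retrospect-target")  # hypothetical mounted Azure file share
    share.mkdir(parents=True, exist_ok=True)

    # Copy a local file to the share and confirm it round-trips as a plain file
    src = Path("D:/backups/example-backup-set.dat")  # hypothetical backup file
    dst = share / src.name
    shutil.copy2(src, dst)
    assert dst.stat().st_size == src.stat().st_size
    print(f"{dst} written to the Azure file share as a regular file")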

Are there things I need and want to improve? Yes, it’s an ongoing process and journey.

What should you do next?

Make sure you have a data backup; if not, March 31st is a good reminder. Trust yet verify that your backups are working and that you can recover, so you are not an April 1st fool.

Where to learn more

Learn more about world backup day, recovery and data protection along with other related topics via the following links:

Upcoming and past events including webinars, tips and commentary
Next Generation Hybrid Data Infrastructures Are In Your Future
Cloud File Data Storage Consolidation and Economic Comparison Model
New Book Data Infrastructure Management Insight Strategies
World Backup Day Reminder Don’t Be an April Fool Test Your Data Recovery
Virtual, Cloud and IT Availability, it’s a shared responsibility
Don’t Stop Learning Expand Your Skills Experiences Everyday
Data Infrastructure Overview, Its What’s Inside of a Data Center
Application Data Value Characteristics Everything Is Not The Same
Data Protection Diaries Topics Tools Techniques Technologies Tips
Data Infrastructure Server Storage I/O related Tradecraft Overview

Additional learning experiences can be found in Software Defined Data Infrastructure Essentials book. Also check out Data Infrastructure Management Insight and Strategies.

What this all means

If March 31st is world backup day, when is world recovery day? Every day should be a backup day (e.g., some protection, backup, copy, snapshot, checkpoint, consistency point). Likewise, every day should be able to be a recovery day. World backup day and recovery apply to organizations of all sizes as well as individuals. Remember: if March 31st is world backup day, when is world recovery day?

Ok, nuff said.

Cheers gs

Greg Schulz – Multi-year Microsoft MVP Cloud and Data Center Management, ten-time VMware vExpert. Author of Data Infrastructure Insights (CRC Press), Software Defined Data Infrastructure Essentials (CRC). Cloud and Virtual Data Storage Networking (CRC), The Green and Virtual Data Center (CRC), Resilient Storage Networks (Elsevier). Visit twitter @storageio as well as www.picturesoverstillwater.com to view various UAS/UAV e.g. drone based aerial content created by Greg Schulz. Courteous comments are welcome for consideration. First published on https://storageioblog.com. Any reproduction without attribution or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. Visit our companion site https://picturesoverstillwater.com to view drone based aerial photography and video related topics. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO and UnlimitedIO LLC.

Cloud Ready Data Protection for Hybrid Data Centers Are In Your Future

Cloud Ready Data Protection for Hybrid Data Centers

Join me for a free webinar, Cloud Ready Data Protection for Hybrid Data Centers and Data Infrastructures, 11AM PT Thursday July 11th, produced by Redmond Magazine and sponsored by Quest Software.

Hybrid Data Infrastructures and Data Centers

Hybrid cloud and on-prem data centers are in your future if not already a reality. In addition to using public cloud and on-prem resources, your environment is likely a mix of many different operating systems, applications and servers (virtual and physical), along with multiple backup and recovery technologies.

Cloud Ready Data Protection for Hybrid Data Centers

In this engaging, interactive webinar, we will look at trends, issues, and challenges, as well as provide best practices in what you can do to address them today. You’ll learn how to simplify and streamline your system, application and data protection in both the cloud and data center without compromise, all while removing complexity and cost.

What You Will Learn

Join Microsoft MVP, VMware vExpert and IT analyst Greg Schulz of Server StorageIO along with Michael Gogos, Data Protection expert from Quest, as they discuss how to:

  • Become hybrid and cloud data protection ready
  • Use the cloud for backup and disaster recovery
  • Protect cloud applications and their data
  • Address different hybrid data protection scenarios
  • Take action today to prepare for tomorrow

Where to learn more

Learn more about world backup day, recovery and data protection along with other related topics via the following links:

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

What this all means

I look forward to you joining Michael Gogos of Quest Software and myself on Thursday July 11th 11AM PT for our interactive discussion (bring your questions) around Cloud Ready Data Protection for Hybrid Data Centers and what you can do today (Register here).

Ok, nuff said, for now.

Cheers GS

Greg Schulz – Multi-year Microsoft MVP Cloud and Data Center Management, ten-time VMware vExpert. Author of Data Infrastructure Insights (CRC Press), Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Also visit www.picturesoverstillwater.com to view various UAS/UAV e.g. drone based aerial content created by Greg Schulz. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2019 Server StorageIO and UnlimitedIO. Visit our companion site https://picturesoverstillwater.com to view drone based aerial photography and video related topics. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

April 2018 Server StorageIO Data Infrastructure Update Newsletter

Volume 18, Issue 4 (April 2018) Data Infrastructure Update Newsletter

Hello and welcome to the April 2018 Server StorageIO Data Infrastructure Update Newsletter.

In case you missed it, the March 2018 Server StorageIO Data Infrastructure Update Newsletter can be viewed here (HTML and PDF).

In this issue, themes include life beyond world backup day; a focus on recovery, restoration, and resiliency; and getting ready for GDPR. Also covered are NVMe along with NVMe over Fabric (NVMeoF), Storage Class Memories (SCM), and related technologies from the March 2018 newsletter, as well as recent VMware public, private, and hybrid cloud announcements along with vSphere v6.7, among other topics.

Enjoy this edition of the Server StorageIO Data Infrastructure update newsletter.

Cheers GS

Server StorageIO Commentary in the news, tips and articles

Recent Server StorageIO industry trends perspectives commentary in the news.

Via SearchStorage: Comments on Dell EMC storage strategy talk buzzes Dell Tech World
Via GizModo: Comments Can a Loud Noise Really Bring Down a Data Center?
Via StateTech: Comments IT Ingenuity Lets State and Local Agencies Do More with Less
Via StateTech: Comments State and Local Agencies See Power in the VDI, HCI Combination
Via StateTech: Comments How Local Governments Can Meet the Demands of IoT Networking
Via IronMountain: Comments Hybrid cloud deployment demands a change in security mindset
Via IronMountain: Hybrid 4 3 2 1 Data Protection
Via IronMountain InfoGoto: The growing Trend of Secondary Data Storage
Via IronMountain InfoGoto: World Backup Day Best Practices For a Hybrid Approach
Via BizTech: Why Hybrid (SSD and HDD) Storage Might Be Fit for SMB environments

View more Server, Storage and I/O trends and perspectives comments here.

Server StorageIOblog Data Infrastructure Posts

Recent and popular Server StorageIOblog posts include:

VMware vSphere vSAN vCenter version 6.7 SDDC Update Summary
VMware vSphere vSAN vCenter v6.7 SDDC details
VMware vSphere vSAN vCenter Server Storage I/O Enhancements
Have you heard about the new CLOUD Act data regulation?
Data Protection Recovery Life Post World Backup Day Pre GDPR
Microsoft Windows Server 2019 Insiders Preview
March 2018 Server StorageIO Data Infrastructure Update Newsletter
Application Data Value Characteristics Everything Is Not The Same
Application Data Availability 4 3 2 1 Data Protection
VMware continues cloud construction with March announcements
World Backup Day 2018 Data Protection Readiness Reminder
Use Intel Optane NVMe U.2 SFF 8639 SSD drive in PCIe slot
Data Infrastructure Resource Links cloud data protection tradecraft trends
IT transformation Serverless Life Beyond DevOps Podcast
Data Protection Diaries Fundamental Topics Tools Techniques Technologies Tips
AWS Announces New S3 Cloud Storage Security Encryption Features
Introducing Windows Subsystem for Linux WSL Overview #blogtober
Hot Popular New Trending Data Infrastructure Vendors To Watch

View other recent as well as past StorageIOblog posts here

Events and Activities

Recent and upcoming event activities.

April 25, 2018 – Webinar – SDDC and Data Protection Discussion

April 24, 2018 – Webinar – AWS and on-site, on-premises hybrid data protection

March 27, 2018 – Webinar – Veeams Road to GDPR compliance The 5 Lessons Learned

Feb 28, 2018 – Webinar – Benefits of Moving Hyper-V Disaster Recovery to the Cloud

See more webinars and activities on the Server StorageIO Events page here.

Data Infrastructure Server StorageIO Industry Resources and Links

Various useful links and resources:

Data Infrastructure Recommend Reading and watching list
Microsoft TechNet – Various Microsoft related from Azure to Docker to Windows
storageio.com/links – Various industry links (over 1,000 with more to be added soon)
objectstoragecenter.com – Cloud and object storage topics, tips and news items
OpenStack.org – Various OpenStack related items
storageio.com/downloads – Various presentations and other download material
storageio.com/protect – Various data protection items and topics
thenvmeplace.com – Focus on NVMe trends and technologies
thessdplace.com – NVM and Solid State Disk topics, tips and techniques
storageio.com/converge – Various CI, HCI and related SDS topics
storageio.com/performance – Various server, storage and I/O benchmark and tools
VMware Technical Network – Various VMware related items

Connect and Converse With Us

Follow and connect via RSS, LinkedIn, Facebook, Twitter @StorageIO, Google+, email, YouTube, and Instagram.

Subscribe to Newsletter – Newsletter Archives – StorageIO.com – StorageIOblog.com

What this all means and wrap-up

Data Infrastructures are what exist inside of physical and cloud data centers that in turn support applications that transform data into information. Clouds continue to be a popular topic as well as deployment platform for and of data infrastructures. VMware has made several announcements in support of their public, private and hybrid cloud environments. A fundamental role of data infrastructures is to protect, preserve, secure and serve information via server, storage, I/O network hardware, software as well as services.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

March 2018 Server StorageIO Data Infrastructure Update Newsletter

Volume 18, Issue 3 (March 2018)

Hello and welcome to the March 2018 Server StorageIO Data Infrastructure Update Newsletter.

If you are wondering where the January and February 2018 update newsletters are, they are rolled into this combined edition. In addition to the short email version (free signup here), you can access full versions (html here and PDF here) along with previous editions here.

In this issue:

Enjoy this edition of the Server StorageIO Data Infrastructure update newsletter.

Cheers GS

Data Infrastructure and IT Industry Activity Trends

Data Infrastructure Data Protection and Backup BC BR DR HA Security

World Backup Day is coming up on March 31, which is a good time to remember to verify and validate that your data protection is working as intended. On one hand, I think it is a good idea to call out the importance of making sure your data is protected, including backed up.

On the other hand, data protection is not a once-a-year event; rather, it is a year-round, 24 x 7 x 365 focus. Also, the focus needs to be on more than just backup; rather, on all aspects of data protection from archiving to business continuance (BC), business resiliency (BR), disaster recovery (DR), always on, always accessible, along with security and recovery.

Data Infrastructure 4 3 2 1 Data Protection and Backup

Some data spring thoughts, perspectives and reminders. Data lakes may swell beyond their banks causing rivers of data to flood as they flow into larger reservoirs, great data lakes, gulfs of data, seas and oceans of data. Granted, some of that data will be inactive cold parked like glaciers while others semi-active floating around like icebergs. Hopefully your data is stored on durable storage solutions or services and does not melt.

Various NAND Flash SSD devices and SAS, SATA, NVMe, M.2 interfaces

Non-Volatile Memory (NVM) spans various solid state device (SSD) mediums (e.g., NAND flash, 3D XPoint, MRAM among others) and packaging (drives, PCIe add-in cards [AiC], along with entire systems, appliances, or arrays). Also part of the continued evolution of NVM, SSD, and other persistent memories (PM), including storage class memories (SCM), are different access protocol interfaces.

Keep in mind that there is a difference between NVM (medium) and NVMe (access). NVM is the generic category of mediums or media and devices such as NAND flash, NVRAM, and 3D XPoint, among other SCMs (and PMs). In other words, NVM is what devices use for storing data; NVMe is how devices and systems are accessed. NVMe and its variations are how NVM, SSD, PM, and SCM media and devices get accessed locally, as well as over network fabrics (e.g., NVMe-oF and FC-NVMe).

NVMe continues to evolve including with networked fabric variations such as RDMA based NVMe over Fabric (NVMe-oF), along with Fibre Channel based (FC-NVMe). The Fibre Channel Industry Association trade group recently held its second multi-vendor plugfest in support of NVMe over Fibre Channel.

Read more about NVM, NVMe, SSD, SCM, flash and related technologies, tools, trends, tips via the following resources:

Has object storage failed to live up to its industry hype, lacking traction? Or is object storage (also known as blobs) progressing with customer adoption and deployment on normal, realistic timelines? Recently I have seen some industry comments about object storage not catching on with customers or failing to live up to its hyped expectations. IMHO object storage is very much alive along with block, file, table (e.g., database SQL and NoSQL repositories), and message/queue among others, as well as emerging blockchain aka data exchanges.

Various Industry and Customer Adoption Deployment Timeline (Via: StorageIOblog.com)

An issue with object storage is that it is still new and still evolving; many IT environments and applications do not yet speak or access objects and blobs natively. Likewise, as is often the case, industry adoption and deployment are usually early and short term around the hype, vs. the longer cycle of customer adoption and deployment. The downside for those who only focus on object storage (or blobs) is that they may be under pressure to do things short term instead of adjusting to customer cycles, which take longer; however, real adoption and deployment also last longer.

While the hype and industry buzz around object storage (and blobs) may have faded, customer adoption continues and is here to stay, along with block, file among others, learn more at www.objectstoragecenter.com. Also keep in mind that there is a difference between industry and customer adoption along with deployment.

Some recent Industry Activities, Trends, News and Announcements include:

In case you missed it, Amazon Web Services (e.g., AWS) announced EKS (Elastic Kubernetes Service) which, as its name implies, is an easy to use and manage Kubernetes (containers, serverless data infrastructure) service running on AWS. AWS joins others including Microsoft Azure Kubernetes Service (AKS), Google's Kubernetes Engine, EasyStack (ESContainer for OpenStack and Kubernetes), and VMware Pivotal Container Service (PKS), among others. What this means is that in the container serverless data infrastructure ecosystem, Kubernetes container management (orchestration platform) is gaining in both industry as well as customer adoption along with deployment.

Check out other industry news, comments, trends perspectives here.

Data Infrastructure Server StorageIO Comments Content

Server StorageIO Commentary in the news, tips and articles

Recent Server StorageIO industry trends perspectives commentary in the news.

Via BizTech: Why Hybrid (SSD and HDD) Storage Might Be Fit for SMB environments
Via Excelero: Server StorageIO white paper enabling database DBaaS productivity
Via Cloudian: YouTube video interview file services on object storage with HyperFile
Via CDW Solutions: Comments on Software Defined Access
Via SearchStorage: Comments on Cloudian HyperStore on demand cloud like pricing
Via EnterpriseStorageForum: Comments and tips on Software Defined Storage Best Practices
Via PRNewsWire: Comments on Excelero NVMe NVMesh Database and DBaaS solutions
Via SearchStorage: Comments on NooBaa multi-cloud storage management
Via CDW: Comments on New IT Strategies Improve Your Bottom Line 
Via EnterpriseStorageForum: Comments on Software Defined Storage: Pros and Cons
Via DataCenterKnowledge: Comments on The Great Data Center Headache IoT
Via SearchStorage: Comments on Dell and VMware merger scenario options
Via PRNewswire: Comments on Chelsio Microsoft Validation of iWARP/RDMA
Via SearchStorage: Comments on Server Storage Industry trends and Dell EMC
Via ChannelProSMB: Comments on Hybrid HDD and SSD storage solutions
Via ChannelProNetwork: Comments on What the Future Holds for HDDs
Via HealthcareITnews: Comments on MOUNTAINS OF MOBILE DATA
Via SearchStorage: Comments on Cloudian HyperStore 7 targets multi-cloud complexities
Via GlobeNewsWire: Comments on Cloudian HyperStore 7
Via GizModo: Comments on Intel Optane 800P NVMe M.2 SSD
Via DataCenterKnowledge: Comments on getting data centers ready for IoT
Via DataCenterKnowledge: Comments on Beyond the Hype: AI in the Data Center
Via DataCenterKnowledge: Comments on Data Center and Cloud Disaster Recovery
Via SearchStorage: Comments on Cloudian HyperFile marries NAS and object storage
Via SearchStorage: Comments on Top 10 Tips on Solid State Storage Adoption Strategy
Via SearchStorage: Comments on 8 Top Tips for Beating the Big Data Deluge

View more Server, Storage and I/O trends and perspectives comments here.

Data Infrastructure Server StorageIOblog posts

Server StorageIOblog Data Infrastructure Posts

Recent and popular Server StorageIOblog posts include:

Application Data Value Characteristics Everything Is Not The Same
Application Data Availability 4 3 2 1 Data Protection
AWS Cloud Application Data Protection Webinar
Microsoft Windows Server 2019 Insiders Preview
Application Data Characteristics Types Everything Is Not The Same
Application Data Volume Velocity Variety Everything Is Not The Same
Application Data Access Lifecycle Patterns Everything Is Not The Same
Veeam GDPR preparedness experiences Webinar walking the talk
VMware continues cloud construction with March announcements
Benefits of Moving Hyper-V Disaster Recovery to the Cloud Webinar
World Backup Day 2018 Data Protection Readiness Reminder
Use Intel Optane NVMe U.2 SFF 8639 SSD drive in PCIe slot
Data Infrastructure Resource Links cloud data protection tradecraft trends
How to Achieve Flexible Data Protection Availability with All Flash Storage Solutions
November 2017 Server StorageIO Data Infrastructure Update Newsletter
IT transformation Serverless Life Beyond DevOps Podcast
Data Protection Diaries Fundamental Topics Tools Techniques Technologies Tips
HPE Announces AMD Powered Gen 10 ProLiant DL385 For Software Defined Workloads
AWS Announces New S3 Cloud Storage Security Encryption Features
Introducing Windows Subsystem for Linux WSL Overview #blogtober
Hot Popular New Trending Data Infrastructure Vendors To Watch

View other recent as well as past StorageIOblog posts here

Server StorageIO Recommended Reading (Watching and Listening) List

In addition to my own books, including Software Defined Data Infrastructure Essentials (CRC Press 2017) available at Amazon.com (check out the special sale price), the following are Server StorageIO data infrastructure recommended reading, watching, and listening list items. The list covers various IT, data infrastructure, and related topics; the Intel Recommended Reading List (IRRL) for developers is also a good resource to check out. Speaking of my books, Didier Van Hoye (@WorkingHardInIt) has a good review over on his site you can view here; also check out the rest of his great content while there.

In case you may have missed it, here is a good presentation from AWS re:invent 2017 by Brendan Gregg (@brendangregg) about how Netflix does EC2 and other AWS tuning along with plenty of great resource links. Keith Tenzer (@keithtenzer) provides a good perspective piece about containers in a large IT enterprise environment here including various options.

Speaking of IT data centers and data infrastructure environments, check out the list of some of the world's most extreme habitats for technology here. Mark Betz (@markbetz) has a series of Docker and Kubernetes networking fundamentals posts on his site here, as well as over at Medium, including mention of Google Cloud (@googlecloud). The posts in Mark's series are good refreshers or intros to how Docker and Kubernetes handle basic networking between containers, pods, nodes, and hosts in clusters. Check out part I here and part II here.

Blockchain elements (image via https://stevetodd.typepad.com)

Steve Todd (@Stevetodd) has some good perspectives about Trusted Data Exchanges (e.g., life beyond blockchain and bitcoin) here, core element considerations (beyond the product pitch) here, and associated data infrastructure and storage evolution vs. revolution here.

Watch for more items to be added to the recommended reading list book shelf soon.

Data Infrastructure Server StorageIO event activities

Events and Activities

Recent and upcoming event activities.

March 27, 2018 – Webinar – Veeams Road to GDPR Compliancy The 5 Lessons Learned

Feb 28, 2018 – Webinar – Benefits of Moving Hyper-V Disaster Recovery to the Cloud

Jan 30, 2018 – Webinar – Achieve Flexible Data Protection and Availability with All Flash Storage

Nov. 9, 2017 – Webinar – All You Need To Know about ROBO Data Protection Backup

See more webinars and activities on the Server StorageIO Events page here.

Data Infrastructure Server StorageIO Industry Resources and Links

Various useful links and resources:

Data Infrastructure Recommend Reading and watching list
Microsoft TechNet – Various Microsoft related from Azure to Docker to Windows
storageio.com/links – Various industry links (over 1,000 with more to be added soon)
objectstoragecenter.com – Cloud and object storage topics, tips and news items
OpenStack.org – Various OpenStack related items
storageio.com/downloads – Various presentations and other download material
storageio.com/protect – Various data protection items and topics
thenvmeplace.com – Focus on NVMe trends and technologies
thessdplace.com – NVM and Solid State Disk topics, tips and techniques
storageio.com/converge – Various CI, HCI and related SDS topics
storageio.com/performance – Various server, storage and I/O benchmark and tools
VMware Technical Network – Various VMware related items

Connect and Converse With Us

Follow and connect via RSS, LinkedIn, Facebook, Google+, YouTube, and Instagram.

Subscribe to Newsletter – Newsletter Archives – StorageIO.com – StorageIOblog.com

What this all means and wrap-up

Data Infrastructures are what exist inside physical data centers spanning cloud, converged, hyper-converged, virtual, serverless, and other software defined as well as legacy environments. The fundamental role of data infrastructures, comprising server (compute), storage, and I/O networking hardware, software, and services defined by management tools, best practices, and policies, is to provide a platform for applications along with their data to deliver information services. With March 31 being world backup day, also focus on making sure that on April 1st you are not a fool trying to recover from a bad data protection copy. With the continued movement to flash SSD along with other forms of storage class memory (SCM) and persistent memory (PM), data moves at a faster rate, meaning data protection is even more important to get you out of trouble as fast as you got into it.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Application Data Value Characteristics Everything Is Not The Same (Part I)

This is part one of a five-part mini-series looking at Application Data Value Characteristics Everything Is Not The Same as a companion excerpt from chapter 2 of my new book Software Defined Data Infrastructure Essentials – Cloud, Converged and Virtual Fundamental Server Storage I/O Tradecraft (CRC Press 2017), available at Amazon.com and other global venues. In this post, we start things off by looking at general application server storage I/O characteristics that have an impact on data value as well as access.

Everything is not the same across different organizations including Information Technology (IT) data centers, data infrastructures along with the applications as well as data they support. For example, there is so-called big data that can be many small files, objects, blobs or data and bit streams representing telemetry, click stream analytics, logs among other information.

Keep in mind that applications impact how data is accessed, used, processed, moved, and stored. What this means is that a focus on data value, access patterns, and other related topics also needs to consider application performance, availability, capacity, and economic (PACE) attributes.

If everything is not the same, why is so much data along with many applications treated the same from a PACE perspective?

Data Infrastructure resources including servers, storage, and networks might be cheap or inexpensive; however, there is a cost to managing them along with the data.

Managing includes data protection (backup, restore, BC, DR, HA, security) along with other activities. Likewise, there is a cost to the software along with cloud services among others. By understanding how applications use and interact with data, smarter, more informed data management decisions can be made.

IT Applications and Data Infrastructure Layers

Keep in mind that everything is not the same across various organizations, data centers, data infrastructures, data, and the applications that use them. Also keep in mind that programs (e.g., applications) = algorithms (code) + data structures (how data is defined and organized, structured or unstructured).

There are traditional applications, along with those tied to Internet of Things (IoT), Artificial Intelligence (AI) and Machine Learning (ML), Big Data and other analytics including real-time click stream, media and entertainment, security and surveillance, log and telemetry processing among many others.

What this means is that there are many different applications with various characteristics and attributes, along with resource (server compute, I/O network and memory, storage) and service requirements.

Common Applications Characteristics

Different applications will have various attributes, in general, as well as in how they are used; for example, database transaction activity vs. reporting or analytics, logs and journals vs. redo logs, indices, tables, import/export, scratch and temp space. Performance, availability, capacity, and economics (PACE) describe the application and data characteristics and needs shown in the following figure.

Application PACE attributes (via Software Defined Data Infrastructure Essentials)

All applications have PACE attributes, however:

  • PACE attributes vary by application and usage
  • Some applications and their data are more active than others
  • PACE characteristics may vary within different parts of an application

Think of an application along with its associated data PACE as its personality: how it behaves, what it does, how it does it, and when, along with its value, benefit, or cost, as well as quality-of-service (QoS) attributes.

Understanding applications in different environments, including data values and associated PACE attributes, is essential for making informed server, storage, I/O, and data infrastructure decisions. Data infrastructure decisions range from configuration to acquisitions or upgrades; when, where, why, and how to protect; and how to optimize performance including capacity planning, reporting, and troubleshooting, not to mention addressing budget concerns.

Primary PACE attributes for active and inactive applications and data are:

P – Performance and activity (how things get used)
A – Availability and durability (resiliency and data protection)
C – Capacity and space (what things use or occupy)
E – Economics and Energy (people, budgets, and other barriers)

Some applications need more performance (server compute, or storage and network I/O), while others need space capacity (storage, memory, network, or I/O connectivity). Likewise, some applications have different availability needs (data protection, durability, security, resiliency, backup, business continuity, disaster recovery) that determine the tools, technologies, and techniques to use.

Budgets are also nearly always a concern, which for some applications means enabling more performance per cost while others are focused on maximizing space capacity and protection level per cost. PACE attributes also define or influence policies for QoS (performance, availability, capacity), as well as thresholds, limits, quotas, retention, and disposition, among others.

Performance and Activity (How Resources Get Used)

Some applications or components that comprise a larger solution will have more performance demands than others. Likewise, the performance characteristics of applications along with their associated data will also vary. Performance applies to the server, storage, and I/O networking hardware along with associated software and applications.

For servers, performance is focused on how much CPU or processor time is used, along with memory and I/O operations. I/O operations to create, read, update, or delete (CRUD) data include activity rate (frequency or data velocity) of I/O operations (IOPS). Other considerations include the volume or amount of data being moved (bandwidth, throughput, transfer), response time or latency, along with queue depths.

Activity is the amount of work to do or being done in a given amount of time (seconds, minutes, hours, days, weeks), which can be transactions, rates, IOPs. Additional performance considerations include latency, bandwidth, throughput, response time, queues, reads or writes, gets or puts, updates, lists, directories, searches, pages views, files opened, videos viewed, or downloads.
 
Server, storage, and I/O network performance include:

  • Processor CPU usage time and queues (user and system overhead)
  • Memory usage effectiveness including page and swap
  • I/O activity including between servers and storage
  • Errors, retransmission, retries, and rebuilds

The following figure shows a generic performance example of data being accessed (mixed reads, writes, random, sequential, big, small, low and high-latency) on a local and a remote basis. The example shows how, for a given time interval (see lower right), applications are accessing and working with data via different data streams (larger image, left center). Also shown are queues and I/O handling along with end-to-end (E2E) response time.

Server I/O performance fundamentals (via Software Defined Data Infrastructure Essentials)

Also shown on the left in the above figure is an example of E2E response time from the application through the various data infrastructure layers, as well as, lower center, the response time from the server to the memory or storage devices.

Various queues are shown in the middle of the above figure which are indicators of how much work is occurring, if the processing is keeping up with the work or causing backlogs. Context is needed for queues, as they exist in the server, I/O networking devices, and software drivers, as well as in storage among other locations.
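
A handy rule of thumb here is Little's Law (a general queueing relationship, not specific to any product): the average number of I/Os in flight equals the activity rate times the response time. A quick Python example with hypothetical numbers:

    # Little's Law: outstanding I/Os (queued + in service) = arrival rate x response time
    iops = 20_000         # I/O operations per second (hypothetical)
    response_s = 0.0005   # 0.5 ms average end-to-end response time
    print(f"~{iops * response_s:.0f} I/Os in flight on average")  # ~10 concurrent I/Os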

Some basic server, storage, I/O metrics that matter include:

  • Queue depth of I/Os waiting to be processed and concurrency
  • CPU and memory usage to process I/Os
  • I/O size, or how much data can be moved in a given operation
  • I/O activity rate or IOPs = amount of data moved/I/O size per unit of time
  • Bandwidth = data moved per unit of time = I/O size × I/O rate
  • Latency usually increases with larger I/O sizes, decreases with smaller requests
  • I/O rates usually increase with smaller I/O sizes and vice versa
  • Bandwidth increases with larger I/O sizes and vice versa
  • Sequential stream access data may have better performance than some random access data
  • Not all data is conducive to being sequential stream, or random
  • Lower response time is better, higher activity rates and bandwidth are better

Queues with high latency and small I/O size or small I/O rates could indicate a performance bottleneck. Queues with low latency and high I/O rates with good bandwidth or data being moved could be a good thing. An important note is to look at several metrics, not just IOPs or activity, or bandwidth, queues, or response time. Also, keep in mind that metrics that matter for your environment may be different from those for somebody else.

Something to keep in perspective is that there can be a large amount of data with low performance, or a small amount of data with high-performance, not to mention many other variations. The important concept is that as space capacity scales, that does not mean performance also improves or vice versa, after all, everything is not the same.
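
To make the I/O size, rate, and bandwidth relationships listed above concrete, here is a small worked example in Python (the device numbers are hypothetical):

    def bandwidth_mbs(io_size_kb: float, iops: float) -> float:
        """Bandwidth = I/O size x I/O rate (result in MB/s, decimal units)."""
        return io_size_kb * iops / 1000

    # Small I/Os yield high IOPS but modest bandwidth; large I/Os the reverse.
    print(f"4 KB x 50,000 IOPS = {bandwidth_mbs(4, 50_000):.0f} MB/s")    # 200 MB/s
    print(f"128 KB x 4,000 IOPS = {bandwidth_mbs(128, 4_000):.0f} MB/s")  # 512 MB/s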

Where to learn more

Learn more about Application Data Value, application characteristics, PACE along with data protection, software defined data center (SDDC), software defined data infrastructures (SDDI) and related topics via the following links:

SDDC Data Infrastructure

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

What this all means and wrap-up

Keep in mind that with Application Data Value Characteristics, everything is not the same across various organizations, data centers, and data infrastructures spanning legacy, cloud, and other software defined data center (SDDC) environments. However, all applications have some element (high or low) of performance, availability, capacity, and economic (PACE) attributes along with various similarities. Likewise, data has different value at various times. Continue reading the next post (Part II Application Data Availability Everything Is Not The Same) in this five-part mini-series here.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Application Data Availability 4 3 2 1 Data Protection

This is part two of a five-part mini-series looking at Application Data Value Characteristics Everything Is Not The Same as a companion excerpt from chapter 2 of my new book Software Defined Data Infrastructure Essentials – Cloud, Converged and Virtual Fundamental Server Storage I/O Tradecraft (CRC Press 2017), available at Amazon.com and other global venues. In this post, we continue looking at application performance, availability, capacity, and economic (PACE) attributes that have an impact on data value as well as availability.

Availability (Accessibility, Durability, Consistency)

Just as there are many different aspects and focus areas for performance, there are also several facets to availability. Note that application performance requires availability, and availability relies on some level of performance.

Availability is a broad and encompassing area that includes data protection to protect, preserve, and serve (backup/restore, archive, BC, BR, DR, HA) data and applications. There are logical and physical aspects of availability including data protection as well as security including key management (manage your keys or authentication and certificates) and permissions, among other things.

Availability = accessibility (can you get to your application and data) + durability (is the data intact and consistent). This includes basic Reliability, Availability, Serviceability (RAS), as well as high availability, accessibility, and durability. “Durable” has multiple meanings, so context is important. Durable means how data infrastructure resources hold up to, survive, and tolerate wear and tear from use (i.e., endurance), for example, Flash SSD or mechanical devices such as Hard Disk Drives (HDDs). Another context for durable refers to data, meaning how many copies in various places.
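
A classic way to quantify the accessibility side is availability = uptime / (uptime + repair time), which also shows why redundant components help. A small Python sketch (a standard formula with hypothetical numbers; the parallel calculation assumes independent failures):

    def availability(mtbf_hours: float, mttr_hours: float) -> float:
        """Availability = MTBF / (MTBF + MTTR)."""
        return mtbf_hours / (mtbf_hours + mttr_hours)

    a = availability(10_000, 8)    # one component (hypothetical MTBF and MTTR)
    pair = 1 - (1 - a) ** 2        # two independent components in parallel
    print(f"Single: {a:.5f}  Redundant pair: {pair:.9f}")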

Server, storage, and I/O network availability topics include:

  • Resiliency and self-healing to tolerate failure or disruption
  • Hardware, software, and services configured for resiliency
  • Accessibility to reach or be reached for handling work
  • Durability and consistency of data to be available for access
  • Protection of data, applications, and assets including security

Additional server I/O and data infrastructure along with storage topics include:

  • Backup/restore, replication, snapshots, sync, and copies
  • Basic Reliability, Availability, Serviceability, HA, fail over, BC, BR, and DR
  • Alternative paths, redundant components, and associated software
  • Applications that are fault-tolerant, resilient, and self-healing
  • Non disruptive upgrades, code (application or software) loads, and activation
  • Immediate data consistency and integrity vs. eventual consistency
  • Virus, malware, and other data corruption or loss prevention

From a data protection standpoint, the fundamental rule or guideline is 4 3 2 1, which means having at least four copies consisting of at least three versions (different points in time), at least two of which are on different systems or storage devices, and at least one of those off-site (on-line, off-line, cloud, or other). There are many variations of the 4 3 2 1 rule, shown in the following figure, along with approaches and technologies to use. We will go deeper into this subject in later chapters. For now, remember the following.

4 3 2 1 data protection (via Software Defined Data Infrastructure Essentials)

4 – At least four copies of data (or more). Enables durability in case a copy goes bad, is deleted or corrupted, or a device or site fails.
3 – At least three versions of the data to retain (or more). Enables various recovery points in time to restore, resume, or restart from.
2 – Data located on two or more systems (devices or media/mediums). Enables protection against device, system, server, file system, or other fault/failure.
1 – At least one of those copies off-premise and not live (isolated from the active primary copy). Enables resiliency across sites, as well as a space, time, and distance gap for protection.

Capacity and Space (What Gets Consumed and Occupied)

In addition to being available and accessible in a timely manner (performance), data (and applications) occupy space. That space includes memory in servers, as well as consumable processor CPU time along with I/O (performance), including over networks.

Data and applications also consume storage space where they are stored. In addition to basic data space, there is also space consumed for metadata as well as protection copies (and overhead), application settings, logs, and other items. Another aspect of capacity includes network IP ports and addresses, software licenses, server, storage, and network bandwidth or service time.
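
As a simplified illustration of how protection copies consume capacity (a Python sketch; the data size, version count, and change rate are hypothetical, and real overhead varies with the protection method used):

    primary_tb = 10.0        # live data
    versions = 3             # retained points in time
    copies_per_version = 2   # copies of each version on different systems
    change_rate = 0.10       # fraction of data changed between versions

    # Full first version plus incremental-style deltas for later versions, per copy
    per_copy = primary_tb * (1 + change_rate * (versions - 1))
    print(f"~{per_copy * copies_per_version:.1f} TB of protection capacity")  # ~24.0 TB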

Server, storage, and I/O network capacity topics include:

  • Consumable time-expiring resources (processor time, I/O, network bandwidth)
  • Network IP and other addresses
  • Physical resources of servers, storage, and I/O networking devices
  • Software licenses based on consumption or number of users
  • Primary and protection copies of data and applications
  • Active and standby data infrastructure resources and sites
  • Data footprint reduction (DFR) tools and techniques for space optimization
  • Policies, quotas, thresholds, limits, and capacity QoS
  • Application and database optimization

DFR includes various techniques, technologies, and tools to reduce the impact or overhead of protecting, preserving, and serving more data for longer periods of time. There are many different approaches to implementing a DFR strategy, since there are various applications and data.

Common DFR techniques and technologies include archiving, backup modernization, copy data management (CDM), clean up, compress, and consolidate, data management, deletion and dedupe, storage tiering, RAID (including parity-based, erasure codes, local reconstruction codes [LRC], Reed-Solomon, and Ceph Shingled Erasure Code [SHEC], among others), along with protection configurations and thin-provisioning, among others.

DFR can be implemented in various complementary locations from row-level compression in database or email to normalized databases, to file systems, operating systems, appliances, and storage systems using various techniques.

Also, keep in mind that not all data is the same; some is sparse, some is dense, and some can be compressed or deduped while other data cannot. However, identical copies can be identified and links created to a common copy.
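
A minimal Python sketch of that idea (the path is hypothetical; real dedupe typically works at block or chunk granularity, while this groups whole files by content hash):

    import hashlib
    from pathlib import Path

    def find_duplicates(root: Path) -> dict[str, list[Path]]:
        """Group files by content hash; groups with more than one member are
        identical copies that could be replaced with links to a common copy."""
        groups: dict[str, list[Path]] = {}
        for p in root.rglob("*"):
            if p.is_file():
                digest = hashlib.sha256(p.read_bytes()).hexdigest()
                groups.setdefault(digest, []).append(p)
        return {h: ps for h, ps in groups.items() if len(ps) > 1}

    for digest, paths in find_duplicates(Path("D:/data")).items():
        print(f"{len(paths)} identical copies: {[p.name for p in paths]}")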

Economics (People, Budgets, Energy and other Constraints)

If the one constant in life and technology is change, then the other constant is concern about economics or cost. There is a cost to enable and maintain a data infrastructure, on premise or in the cloud, which exists to protect, preserve, and serve data and information applications.

However, there should also be a benefit to having the data infrastructure to house data and support applications that provide information to users of the services. A common economic focus is what something costs, either as up-front capital expenditure (CapEx) or as an operating expenditure (OpEx) expense, along with recurring fees.

In general, economic considerations include:

  • Budgets (CapEx and OpEx), both up front and in recurring fees
  • Whether you buy, lease, rent, subscribe, or use free and open sources
  • People time needed to integrate and support even free open-source software
  • Costs including hardware, software, services, power, cooling, facilities, tools
  • People time includes base salary, benefits, training and education

Where to learn more

Learn more about Application Data Value, application characteristics, PACE along with data protection, software defined data center (SDDC), software defined data infrastructures (SDDI) and related topics via the following links:

SDDC Data Infrastructure

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means and wrap-up

Keep in mind that with application data value characteristics, everything is not the same across various organizations, data centers, and data infrastructures spanning legacy, cloud, and other software-defined data center (SDDC) environments. All applications have some element of performance, availability, capacity, and economic (PACE) needs as well as resource demands. With data storage there is often a focus on efficiency and utilization, which is where data footprint reduction (DFR) techniques, tools, trends, and technologies address capacity requirements. However, there is also an expanding focus on storage effectiveness, also known as productivity, tied to performance along with availability, including 4 3 2 1 data protection. Continue reading the next post (Part III Application Data Characteristics Types Everything Is Not The Same) in this series here.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Application Data Characteristics Types Everything Is Not The Same

This is part three of a five-part mini-series looking at Application Data Value Characteristics everything is not the same, as a companion excerpt from chapter 2 of my book Software Defined Data Infrastructure Essentials – Cloud, Converged and Virtual Fundamental Server Storage I/O Tradecraft (CRC Press 2017), available at Amazon.com and other global venues. In this post, we continue looking at application and data characteristics with a focus on different types of data. There is more to data than simply being big data, fast data, big fast, or unstructured, structured, or semistructured, some of which has been touched on in this series, with more to follow. Note that there is also data in terms of the programs, applications, code, rules, and policies, as well as configuration settings, metadata, and other items stored.

Application Data Value Software Defined Data Infrastructure Essentials Book SDDC

Various Types of Data

Data types and characteristics include big data, little data, fast data, and old as well as new data, each with different value, life cycle, volume, and velocity. There are big files and objects representing images, figures, text, and binary data, structured or unstructured, that are software defined by the applications that create, modify, and use them.

There are many different types of data and applications to meet various business, organization, or functional needs. Keep in mind that applications are based on programs, which consist of algorithms and data structures that define the data, how to use it, as well as how and when to store it. Those data structures define data that programs transform into information, while the data is held in memory and stored on data storage in various formats.

Just as various applications have different algorithms, they also have different types of data. Even though everything is not the same across environments, or even in how the same applications get used across various organizations, there are some similarities and general characteristics. Keep in mind that information is the result of programs (applications and their algorithms) that process data into something useful or of value.

Data typically has a basic life cycle of:

  • Creation and some activity, including being protected
  • Dormant, followed by either continued activity or going inactive
  • Disposition (delete or remove)

In general, data can be:

  • Temporary, ephemeral or transient
  • Dynamic or changing (“hot data”)
  • Active static on-line, near-line, or off-line (“warm-data”)
  • In-active static on-line or off-line (“cold data”)

Data is organized as:

  • Structured
  • Semi-structured
  • Unstructured

General data characteristics include:

  • Value = From no value to unknown to some or high value
  • Volume = Amount of data, files, objects of a given size
  • Variety = Various types of data (small, big, fast, structured, unstructured)
  • Velocity = Data streams, flows, rates, load, process, access, active or static

The following figure shows how different data has various values over time. Data that has no value today or in the future can be deleted, while data with unknown value can be retained.

Different data with various values over time

Application Data Value across sddc
Data Value Known, Unknown and No Value

General characteristics include the value of the data, which in turn determines its performance, availability, capacity, and economic considerations. Also, data can be ephemeral (temporary) or kept for longer periods of time on persistent, non-volatile storage (you do not lose the data when power is turned off). Examples of temporary data include work and scratch areas, such as where data gets imported into, or exported out of, an application or database.

Data can also be little, big, or big and fast, terms which describe in part the size as well as volume along with the speed or velocity of being created, accessed, and processed. The importance of understanding characteristics of data and how their associated applications use them is to enable effective decision-making about performance, availability, capacity, and economics of data infrastructure resources.

Data Value

There is more to data storage than how much space capacity per cost.

All data has one of three basic values:

  • No value = ephemeral/temp/scratch = Why keep it?
  • Some value = current or emerging future value, which can be low or high = Keep
  • Unknown value = protect until value is unlocked, or no remaining value

In addition to the above basic three, data with some value can also be further subdivided into little value, some value, or high value. Of course, you can keep subdividing into as many more or different categories as needed, after all, everything is not always the same across environments.

Besides data having some value, that value can also increase or decrease over time, or even go from unknown to known, known to unknown, or to no value. Data with no value can be discarded; if in doubt, make and keep a copy of that data somewhere safe until its value (or lack thereof) is fully known and understood.

The importance of understanding the value of data is to enable effective decision-making on where and how to protect, preserve, and cost-effectively store the data. Note that cost-effective does not necessarily mean the cheapest or lowest-cost approach, rather it means the way that aligns with the value and importance of the data at a given point in time.
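
To make that decision-making concrete, here is a hedged Python sketch; the value categories, tiers, and policies below are illustrative assumptions, not a prescribed classification:

    def placement_policy(value):
        """Map a data value category to an illustrative placement/protection policy."""
        policy = {
            "high": "fast SSD tier, multiple protection copies",
            "some": "capacity HDD tier, standard protection",
            "unknown": "low-cost cold/cloud tier, retain until value is known",
            "none": "candidate for deletion (after verification)",
        }
        return policy.get(value, "review: unclassified data")

    for value in ("high", "some", "unknown", "none"):
        print(f"{value:8} -> {placement_policy(value)}")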

Where to learn more

Learn more about Application Data Value, application characteristics, PACE along with data protection, software-defined data center (SDDC), software-defined data infrastructures (SDDI) and related topics via the following links:

SDDC Data Infrastructure

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means and wrap-up

Data has different value at various times, and that value is also evolving. Everything Is Not The Same across various organizations, data centers, data infrastructures spanning legacy, cloud and other software defined data center (SDDC) environments. Continue reading the next post (Part IV Application Data Volume Velocity Variety Everything Not The Same) in this series here.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Application Data Volume Velocity Variety Everything Is Not The Same

This is part four of a five-part mini-series looking at Application Data Value Characteristics everything is not the same, as a companion excerpt from chapter 2 of my book Software Defined Data Infrastructure Essentials – Cloud, Converged and Virtual Fundamental Server Storage I/O Tradecraft (CRC Press 2017), available at Amazon.com and other global venues. In this post, we continue looking at application and data characteristics with a focus on data volume, velocity, and variety; after all, everything is not the same, not to mention the many different aspects of big data as well as little data.

Application Data Value Software Defined Data Infrastructure Essentials Book SDDC

Volume of Data

More data is being created at a faster rate every day, and that data is being retained for longer periods. Some data being retained has known value, while a growing amount of data has unknown value. Data is generated or created from many sources, including mobile devices, social networks, web-connected systems or machines, and sensors including IoT and IoD. Besides the sources that create data, there are also many consumers of data (applications) that range from legacy to mobile, cloud, and IoT, among others.

Unknown-value data may eventually have value in the future when somebody realizes they can do something with it, or a technology tool or application becomes available to transform the data with unknown value into valuable information.

Some data gets retained in its native or raw form, while other data gets processed by application program algorithms into summary data, or is curated and aggregated with other data and transformed into new useful data. The figure below shows, from left to right and front to back, more data being created, with that data also getting larger over time. For example, on the left are two data items, objects, files, or blocks representing some information.

In the center of the following figure are more columns and rows of data, with each of those data items also becoming larger. Moving farther to the right, there are yet more data items stacked up higher, as well as across and farther back, with those items also being larger. The following figure can represent blocks of storage, files in a file system, rows, and columns in a database or key-value repository, or objects in a cloud or object storage system.

Application Data Value sddc
Increasing data velocity and volume, more data and data getting larger

In addition to more data being created, some of that data is relatively small in terms of the records or data structure entities being stored. However, there can be a large quantity of those smaller data items. In addition to the amount of data, as well as the size of the data, protection or overhead copies of data are also kept.

Another dimension is that data is also getting larger: the data structures describing a piece of data for an application have increased in size. For example, a still photograph taken with a digital camera, cell phone, drone, or other mobile or IoT device increases in size with each new generation of cameras as megapixel counts grow.
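
As a rough worked example (assuming uncompressed 24-bit RGB, i.e., 3 bytes per pixel; compressed formats will be smaller but scale the same way):

    def raw_image_bytes(megapixels, bytes_per_pixel=3):
        """Uncompressed image size: pixel count times bytes per pixel (24-bit RGB)."""
        return int(megapixels * 1_000_000 * bytes_per_pixel)

    # Successive camera generations grow the footprint of the "same" photo:
    for mp in (8, 12, 24, 48):
        print(f"{mp} MP -> ~{raw_image_bytes(mp) / 1_000_000:.0f} MB uncompressed")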

Variety of Data

In addition to having value and volume, there are also different varieties of data, including ephemeral (temporary), persistent, primary, metadata, structured, semi-structured, unstructured, little, and big data. Keep in mind that programs, applications, tools, and utilities get stored as data, while they also use, create, access, and manage data.

There is also primary data and metadata, or data about data, as well as system data that is also sometimes referred to as metadata. Here is where context comes into play as part of tradecraft, as there can be metadata describing data being used by programs, as well as metadata about systems, applications, file systems, databases, and storage systems, among other things, including little and big data.

Context also matters regarding big data, as there are applications such as statistical analysis software and Hadoop, among others, for processing (analyzing) large amounts of data. The data being processed may not be big in terms of individual records or data entity items, but there may be a large volume of them. In addition to big data analytics, data, and applications, there is also data that is itself very big (as well as large volumes or collections of data sets).

For example, video and audio, among others, may also be referred to as big fast data, or large data. A challenge with larger data items is the complexity of moving them over distance promptly, as well as processing them, which requires new approaches, algorithms, data structures, and storage management techniques.

Likewise, the challenges with large volumes of smaller data are similar in that data needs to be moved, protected, preserved, and served cost-effectively for long periods of time. Both large and small data are stored (in memory or storage) in various types of data repositories.

In general, data in repositories is accessed locally, remotely, or via a cloud (see the sketch after this list) using:

  • Objects and blobs via streams, queues, and application programming interfaces (APIs)
  • File-based using local or networked file systems
  • Block-based access of disk partitions, LUNs (logical unit numbers), or volumes
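
The sketch below contrasts the three access styles in Python; the bucket, share, and device paths are hypothetical placeholders, and the object example assumes an S3-style endpoint via the boto3 library with credentials already configured:

    import os
    import boto3  # S3-style object access

    # Object/blob access via an API (bucket and key names are hypothetical):
    s3 = boto3.client("s3")
    s3.put_object(Bucket="example-bucket", Key="item.bin", Body=b"data")

    # File-based access via a local or networked file system (path hypothetical):
    with open("/mnt/share/item.bin", "wb") as f:
        f.write(b"data")

    # Block-based access: read 4 KiB at offset 0 of a raw volume (device hypothetical):
    fd = os.open("/dev/sdb", os.O_RDONLY)
    block = os.pread(fd, 4096, 0)
    os.close(fd)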

The following figure shows varieties of application data value including (left) photos or images, audio, videos, and various log, event, and telemetry data, as well as (right) sparse and dense data.

Application Data Value bits bytes blocks blobs bitstreams sddc
Varieties of data (bits, bytes, blocks, blobs, and bitstreams)

Velocity of Data

Data, in addition to having value (known, unknown, or none), volume (size and quantity), and variety (structured, unstructured, semi-structured, primary, metadata, small, big), also has velocity. Velocity refers to how fast (or slowly) data is accessed, including being stored, retrieved, updated, or scanned, and whether it is active (being updated) or static and dormant. In addition to data access and life cycle, velocity also refers to how data is used, such as random or sequential access or some combination. Think of data velocity as how data, or streams of data, flow in various ways.

Velocity also describes how data is used and accessed, including:

  • Active (hot), static (warm and WORM), or dormant (cold)
  • Random or sequential, read or write-accessed
  • Real-time (online, synchronous) or time-delayed

Why this matters is that by understanding how applications use data, or how data is accessed via applications, you can make informed decisions. That insight also informs how to design, configure, and manage servers, storage, and I/O resources (hardware, software, services) to meet various needs. Understanding application data value, including the velocity of data both when it is created and when it is used, is important for aligning the applicable performance techniques and technologies.
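
One way to see access patterns in action is to time the same set of reads in order versus shuffled; a minimal sketch (the block size is an assumption, and results depend heavily on the device and file system cache):

    import os
    import random
    import time

    def read_blocks(path, block=4096, sequential=True):
        """Time reading a file block by block, in order or in shuffled order."""
        offsets = list(range(0, os.path.getsize(path), block))
        if not sequential:
            random.shuffle(offsets)
        start = time.perf_counter()
        with open(path, "rb") as f:
            for off in offsets:
                f.seek(off)
                f.read(block)
        return time.perf_counter() - start

    # On rotating media the shuffled pass is typically much slower; on SSDs the
    # gap narrows, and OS caching can mask differences on repeat runs.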

Where to learn more

Learn more about Application Data Value, application characteristics, performance, availability, capacity, economic (PACE) along with data protection, software-defined data center (SDDC), software-defined data infrastructures (SDDI) and related topics via the following links:

SDDC Data Infrastructure

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means and wrap-up

Data has different value, size, and velocity as part of its characteristics, including how it is used by various applications. Keep in mind that with application data value characteristics, everything is not the same across various organizations, data centers, and data infrastructures spanning legacy, cloud, and other software-defined data center (SDDC) environments. Continue reading the next post (Part V Application Data Access Life cycle Patterns Everything Is Not The Same) in this series here.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Application Data Access Life Cycle Patterns Everything Is Not The Same (Part V)

This is part five of a five-part mini-series looking at Application Data Value Characteristics everything is not the same, as a companion excerpt from chapter 2 of my book Software Defined Data Infrastructure Essentials – Cloud, Converged and Virtual Fundamental Server Storage I/O Tradecraft (CRC Press 2017), available at Amazon.com and other global venues. In this post, we look at various application and data life-cycle patterns as well as wrap up this series.

Application Data Value Software Defined Data Infrastructure Essentials Book SDDC

Active (Hot), Static (Warm and WORM), or Dormant (Cold) Data and Lifecycles

When it comes to Application Data Value, a common question I hear is why not keep all data?

If the data has value, and you have a large enough budget, why not? On the other hand, most organizations have a budget and other constraints that determine how much and what data to retain.

Another common question I get asked (or told) is: isn't the objective to keep less data to cut costs?

If the data has no value, then get rid of it. On the other hand, if data has value or unknown value, then find ways to remove the cost of keeping more data for longer periods of time so its value can be realized.

In general, the data life cycle (called by some cradle to grave, or birth/creation to disposition) is: data is created, saved and stored, perhaps updated and read, with access patterns and value changing over time. During that time, the data (which includes applications and their settings) will be protected with copies or some other technique, and eventually disposed of.

Between the time when data is created and when it is disposed of, there are many variations of what gets done and needs to be done. Considering static data for a moment: some applications and their data, or data and their applications, create data that is active for a short period, then goes dormant, then is briefly active again before going cold (see the left side of the following figure). This is the classic application, data, and information life-cycle management (ILM) model, with tiering or data movement and migration that still applies in some scenarios.

Application Data Value
Changing data access patterns for different applications

However, a newer scenario over the past several years that continues to increase is shown on the right side of the above figure. In this scenario, data is initially active for updates, then goes cold or WORM (Write Once/Read Many); however, it warms back up as a static reference, on the web, as big data, and for other uses where it is used to create new data and information.

Data, in addition to its other attributes already mentioned, can be active (hot), residing in a memory cache or buffers inside a server, or on a fast storage appliance or caching appliance. Hot data means that it is actively being used for reads or writes (this is what the term heat map refers to in the context of servers, storage, data, and applications). The heat map shows where the hot or active data is, along with its other characteristics.

Context is important here, as there are also IT facilities heat maps, which refer to physical facilities including what servers are consuming power and generating heat. Note that some current and emerging data center infrastructure management (DCIM) tools can correlate the physical facilities power, cooling, and heat to actual work being done from an applications perspective. This correlated or converged management view enables more granular analysis and effective decision-making on how to best utilize data infrastructure resources.

In addition to being hot or active, data can be warm (not as heavily accessed) or cold (rarely if ever accessed), as well as online, near-line, or off-line. As their names imply, warm data may occasionally be used, either updated and written, or static and just being read. Some data also gets protected as WORM data using hardware or software technologies. WORM (immutable) data, not to be confused with warm data, is fixed or immutable (cannot be changed).

When looking at data (or storage), it is important to see when the data was created as well as when it was modified. However, you should avoid the mistake of looking only at when it was created or modified: also look at when it was last read, as well as how often it is read. You might find that some data has not been updated for several years, yet is still accessed several times an hour or minute. Also, keep in mind that the metadata about the actual data may be updated even while the data itself remains static.
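
For example, here is a small sketch comparing last-read and last-modified ages; the path is hypothetical, and note that many systems mount with relatime or noatime, so access times are a hint rather than a guarantee:

    import os
    import time

    def days_since_read_and_write(path):
        """Days since a file was last read (atime) and last modified (mtime)."""
        st = os.stat(path)
        now = time.time()
        return (now - st.st_atime) / 86400, (now - st.st_mtime) / 86400

    read_age, write_age = days_since_read_and_write("/var/data/report.db")
    if write_age > 365 and read_age < 1:
        print("static but still actively read: keep on fast/online storage")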

Also, look at your applications' characteristics as well as how data gets used, to see whether it is conducive to caching or automated tiering based on activity, events, or time. For example, a large amount of data for an energy or oil exploration project may normally sit on slower, lower-cost storage, yet now and then some analysis needs to run against it.

Using data and storage management tools, given notice or based on activity, that large or big data could be promoted to faster storage, or applications migrated to be closer to the data to speed up processing. Another example is weekly, monthly, quarterly, or year-end processing of financial, accounting, payroll, inventory, or enterprise resource planning (ERP) schedules. Knowing how and when applications use the data (which also means understanding the data), automated tools and policies can be used to tier or cache data to speed up processing and thereby boost productivity.

All applications have performance, availability, capacity, economic (PACE) attributes, however:

  • PACE attributes vary by Application Data Value and usage
  • Some applications and their data are more active than others
  • PACE characteristics may vary within different parts of an application
  • PACE application and data characteristics along with value change over time

Read more about Application Data Value, PACE and application characteristics in Software Defined Data Infrastructure Essentials (CRC Press 2017).

Where to learn more

Learn more about Application Data Value, application characteristics, PACE along with data protection, software defined data center (SDDC), software defined data infrastructures (SDDI) and related topics via the following links:

SDDC Data Infrastructure

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means and wrap-up

Keep in mind that Application Data Value everything is not the same across various organizations, data centers, data infrastructures, data and the applications that use them.

Also keep in mind that more data is being created, the sizes of those data items, files, objects, entities, and records are increasing, and so is the speed at which they get created and accessed. The challenge is not just that there is more data, or that data is bigger or accessed faster; it is all of those together, along with changing value as well as diverse applications to keep in perspective. With the new General Data Protection Regulation (GDPR) going into effect May 25, 2018, now is a good time to assess and gain insight into what data you have, its value, and its retention and disposition policies.

Remember, there are different data types, value, life-cycle, volume and velocity that change over time, and with Application Data Value Everything Is Not The Same, so why treat and manage everything the same?

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Benefits of Moving Hyper-V Disaster Recovery to the Cloud Webinar

Hyper-V Disaster Recovery sddc server storage I/O data infrastructure trends

Benefits of Moving Hyper-V Disaster Recovery to the Cloud and Achieve global cloud data availability from an Always-On approach with Veeam Cloud Connect webinar.

Feb. 28, 2018 at 11am PT / 2pm ET

Windows Server and Hyper-V software-defined data center (SDDC) based applications need always-on availability and access to data, which means enabling cloud-based data protection (including backup/recovery) for seamless disaster recovery (DR), business continuance (BC), business resiliency (BR), and high availability (HA). Key to an always-on, available, and accessible environment is having robust RTO and RPO aligned to your application workload needs. In other words, it is time for data protection to work for you and your applications instead of you working for it (e.g., the data protection tools and technologies).

This free data protection webinar (registration required), sponsored by KeepItSafe and produced by Virtualization & Cloud Review, will be an interactive discussion (not death by PowerPoint or UI/GUI product demo ;)) pertaining to enabling always-on application (as well as data) availability for Windows Server and Hyper-V environments. Keep in mind, with world backup day coming up on March 31, now is a good time to make sure your applications and data are protected as well as recoverable when something bad happens, leveraging Hyper-V disaster recovery.

Hyper-V Disaster Recovery SDDC Data Infrastructure Data Protection

Join me along with representatives from Veeam and KeepItSafe for an informal conversation including strategies along with how to enable an always on, always available applications data infrastructure for Hyper-V based solutions.

Our conversation will include discussion around:

  • Data protection strategies for Microsoft Windows Server Hyper-V applications
  • Enabling rapid recovery time objectives (RTO) and good recovery point objectives (RPO)
  • Evolving from VM disaster recovery to cloud-based DRaaS
  • Implement 4 3 2 1 data protection availability for Hyper-V with Veeam and KeepItSafe DRaaS

Register for the live event or catch the replay here.

Where to learn more

Learn more about data protection, software defined data center (SDDC), software defined data infrastructures (SDDI), Hyper-V, cloud and related topics via the following links:

SDDC Data Infrastructure

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means and wrap-up

You cannot go forward if you cannot go back to a particular point in time (e.g., recovery point objective, or RPO). Likewise, if you cannot go back to a given RPO, how can you go forward with your business as well as meet your recovery time objective (RTO)? Join us for the live conversation or replay by registering (free) here to learn how to enable robust Hyper-V disaster recovery and business resiliency.
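
A minimal sketch of that "can you go back" test in Python; the four-hour RPO and six-hour-old restore point are assumed example values, not recommendations:

    from datetime import datetime, timedelta

    def rpo_met(last_good_recovery_point, rpo):
        """True if the newest verified recovery point is within the RPO window."""
        return datetime.now() - last_good_recovery_point <= rpo

    # Hypothetical: a 4-hour RPO with the last verified restore point 6 hours old.
    print(rpo_met(datetime.now() - timedelta(hours=6), timedelta(hours=4)))  # False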

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

How to Achieve Flexible Data Protection Availability with All-Flash Storage Solutions

server storage I/O data infrastructure trends

Updated 1/21/2018

How to Achieve Flexible Flash Data Protection and Availability with All-Flash Storage Solutions

Interactive webinar discussion (not death by PowerPoint or UI/GUI product demo ;)) pertaining to flash data protection
Tuesday, January 30, 2018, 11 AM PT / 2 PM ET
Via Redmond Magazine (free with registration)

Everything is not the same across different organizations, environments, application workloads and the data infrastructures that support them. Fast application and workloads need fast protection, restoration, and resumption as well as fast flash storage. This applies across legacy, software-defined, virtual, container, cloud, hybrid, converged and HCI among other environments.

SDDC Data Infrastructure Data Protection

Join me along with representatives from Pure Storage along with Veeam for this interactive discussion as we explore how to boost the performance, availability, capacity, and economics (PACE) of your applications along with the data infrastructures that support them.

  • How all-flash storage enables faster protection and restoration of fast applications
  • Why data protection and availability should not be an afterthought
  • Ways to leverage your data protection storage to drive business change
  • How to simplify and reduce complexity to boost productivity while lowering costs
  • Why workload aggregation and consolidation should not cause aggravation

Register for the live event or catch the replay here.

Where to learn more

Learn more about data protection, SSD, flash, data infrastructure and related topics via the following links:

SDDC Data Infrastructure

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means and wrap-up

Fast applications need fast and resilient data infrastructures that include server, storage, and I/O networking along with data protection. Performance depends on availability along with durability; likewise, availability and accessibility depend on performance. They go hand in hand. Join me and others from Pure Storage as well as Veeam for this conversational discussion about how to achieve flexible data protection and availability with all-flash storage solutions.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Data Protection Diaries Fundamental Topics Tools Techniques Technologies Tips

Update 1/16/2018

A data protection fundamentals companion to Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft (CRC Press 2017)

server storage I/O data infrastructure trends

By Greg Schulz, www.storageioblog.com, November 26, 2017

This is Part I of a multi-part series on Data Protection fundamental tools topics techniques terms technologies trends tradecraft tips as a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).

Software Defined Data Protection Fundamental Infrastructure Essentials Book SDDC

The focus of this series is around data protection fundamentals, including Data Infrastructure Services: Availability, RAS, RAID and Erasure Codes including LRC (Chapter 9), and Data Infrastructure Services: Availability, Recovery Point (Chapter 10). Additional data protection related chapters include Storage Mediums and Component Devices (Chapter 7), Management, Access, Tenancy, and Performance (Chapter 8), as well as Capacity, Data Footprint Reduction (Chapter 11), Storage Systems and Solutions Products and Cloud (Chapter 12), and Data Infrastructure and Software-Defined Management (Chapter 13), among others.

Posts in the series include excerpts from Software Defined Data Infrastructure (SDDI) Essentials pertaining to data protection for legacy along with software-defined data centers (SDDC), and data infrastructures in general, along with related topics. In addition to excerpts, the posts also contain links to articles, tips, posts, videos, webinars, events, and other companion material. Note that figure numbers in this series are those from the SDDI book and not in the order that they appear in the posts.

Posts in this data protection fundamental series include:

SDDC, SDI, SDDI data infrastructure
Figure 1.5 Data Infrastructures and other IT Infrastructure Layers

Data Infrastructures

Data infrastructures exist to support business, cloud, and information technology (IT) among other applications that transform data into information or services. The fundamental role of data infrastructures is to provide a platform environment for applications and data that is resilient, flexible, scalable, agile, efficient, and cost-effective.

Put another way, data infrastructures exist to protect, preserve, process, move, secure, and serve data as well as the applications that deliver information services. Technologies that make up data infrastructures include hardware, software, and managed services: servers, storage, I/O and networking, along with people, processes, policies, and various tools spanning legacy, software-defined, virtual, container, and cloud environments. Read more about data infrastructures (they are what's inside data centers) here.

Why SDDC SDDI Need Data Protection
Various Needs Demand Drivers For Data Protection Fundamentals

Why The Need For Data Protection

Data protection encompasses many different things, from accessibility, durability, resiliency, reliability, and serviceability (RAS) to security and consistency. Availability includes basic and high availability (HA), business continuance (BC), business resiliency (BR), disaster recovery (DR), archiving, backup, logical and physical security, fault tolerance, and isolation and containment spanning systems, applications, data, metadata, settings, and configurations.

From a data infrastructure perspective, availability of data services spans from local to remote, physical to logical and software-defined, virtual, container, and cloud, as well as mobile devices. Figure 9.2 shows various data infrastructure availability, accessibility, protection, and security points of interest. On the left side of Figure 9.2 are various data protection and security threat risks and scenarios that can impact availability, or result in a data loss event (DLE), data loss access (DLA), or disaster. The right side of Figure 9.2 shows various techniques, tools, technologies, and best practices to protect data infrastructures, applications, and data from threat risks.

SDDI SDDC Data Protection Fundamental Big Picture
Figure 9.2 Various threat vectors, issues, problems, and challenges that drive the need for data protection

A fundamental role of data infrastructures (and data centers) is to protect, preserve, secure, and serve information when needed, with consistency. This also means that the data infrastructure resources (servers, storage, I/O networks, hardware, software, external services) and the applications (and data) they combine with and are defined to protect are themselves accessible, durable, and secure.

Data Protection topics include:

  • Maintaining availability and accessibility of information services, applications, and data
  • Data includes software, actual data, metadata, settings, certificates, and telemetry
  • Ensuring data is durable, consistent, secure, and recoverable to past points in time
  • Everything is not the same across different environments, applications, and data
  • Aligning techniques and technologies to meet various service level objectives (SLO)

Data Protection Fundamental Tradecraft Skills Experience Knowledge

Tools, technologies, and trends are part of data protection; so too are the techniques of knowing (e.g., tradecraft) what to use when, where, why, and how to protect against various threat risks (challenges, issues, problems).

Part of what is covered in this series of posts as well as in the Software Defined Data Infrastructure (SDDI) Essentials book is tradecraft skills, tips, experiences, insight into what to use, as well as how to use old and new things in new ways.

This means looking outside the technology box toward what it is that you need to protect and why, then knowing how to use the different skills, experiences, and techniques that are part of your tradecraft, combined with the tools in your data protection toolbox. Read more about tradecraft here.

Where To Learn More

Continue reading additional posts in this series of Data Infrastructure Data Protection fundamentals and companion to Software Defined Data Infrastructure Essentials (CRC Press 2017) book, as well as the following links covering technology, trends, tools, techniques, tradecraft and tips.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

Everything is not the same across environments, data centers, data infrastructures and applications.

Likewise everything is and does not have to be the same when it comes to Data Protection. Data protection fundamentals encompasses many different hardware, software, services including cloud technologies, tools, techniques, best practices, policies and tradecraft experience skills (e.g. knowing what to use when, where, why and how).

Since everything is not the same, various data protection approaches are needed to address various application performance availability capacity economic ( PACE) needs, as well as SLO and SLAs.

Get your copy of Software Defined Data Infrastructure Essentials here at Amazon.com, at CRC Press among other locations and learn more here. Meanwhile, continue reading with the next post in this series, Part 2 Reliability, Availability, Serviceability ( RAS) Data Protection Fundamentals.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Data Protection Diaries Reliability, Availability, Serviceability RAS Fundamentals

Companion to Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft (CRC Press 2017)

server storage I/O data infrastructure trends

By Greg Schulz, www.storageioblog.com, November 26, 2017

This is Part 2 of a multi-part series on Data Protection fundamental tools topics techniques terms technologies trends tradecraft tips as a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).

Software Defined Data Infrastructure Essentials Book SDDC

Click here to view the previous post Part 1 Data Infrastructure Data Protection Fundamentals, and click here to view the next post Part 3 Data Protection Access Availability RAID Erasure Codes (EC) including LRC.

Posts in the series include excerpts from Software Defined Data Infrastructure (SDDI) Essentials pertaining to data protection for legacy along with software-defined data centers (SDDC), and data infrastructures in general, along with related topics. In addition to excerpts, the posts also contain links to articles, tips, posts, videos, webinars, events, and other companion material. Note that figure numbers in this series are those from the SDDI book and not in the order that they appear in the posts.

In this post, the focus is on data protection availability from Chapter 9, which includes access, durability, RAS, RAID and erasure codes (including LRC), and mirroring and replication, along with related topics.

SDDC, SDI, SDDI data infrastructure
Figure 1.5 Data Infrastructures and other IT Infrastructure Layers

Reliability, Availability, Serviceability (RAS) Data Protection Fundamentals

Reliability, availability, and serviceability (RAS), along with other access, availability, and data protection topics, are covered in Chapter 9. A resilient data infrastructure (software-defined, SDDC, and legacy) protects, preserves, secures, and serves information involving various layers of technology. These technologies enable various layers (altitudes) of functionality, from devices up to and through the applications themselves.

SDDI SDDC Data Protection Big Picture
Figure 9.2 Various threat issues and challenges that drive the need for data protection

Some applications need a faster rebuild, while others need sustained performance (bandwidth, latency, IOPS, or transactions) with a slower rebuild; some need lower cost at the expense of performance; others are OK with more space if other objectives are met. The result is that since everything is different yet there are similarities, there is also the need to tune how a data infrastructure protects, preserves, secures, and serves applications and data.

General reliability, availability, serviceability, and data protection functionality includes:

  • Manually or automatically via policies, start, stop, pause, resume protection
  • Adjust priorities of protection tasks, including speed, for faster or slower protection
  • Fast-reacting to changes, disruptions or failures, or slower cautious approaches
  • Workload and application load balancing (performance, availability, and capacity)

RAS can be optimized for:

  • Reduced redundancy for lower overall costs vs. resiliency
  • Basic or standard availability (leverage component plus)
  • High availability (use better components, multiple systems, multiple sites)
  • Fault-tolerant with no single points of failure (SPOF)
  • Faster restart, restore, rebuild, or repair with higher overhead costs
  • Lower overhead costs (space and performance) with lower resiliency
  • Lower impact to applications during rebuild vs. faster repair
  • Maintenance and planned outages, or for continuous operations

Common availability and data protection related terms, technologies, techniques, trends, and topics, spanning availability and access, durability and consistency, and point-in-time protection and security, are shown below.

Data Protection Gaps and Air Gap

There are good data protection gaps that provide recovery points to a past time, enabling recoverability moving forward. Another good data protection gap is an air gap, which isolates protection copies off-site or off-line so that they cannot be tampered with, enabling recovery from ransomware and other software-defined threats. There are bad data protection gaps, including gaps in coverage where data is not protected or items are missing. Then there are ugly data protection gaps: bad gaps where what you think is protected is not, and you find that your copies are bad when it is too late.

Data Protection Gaps Good Bad Ugly
Data Protection Gaps Good Bad and Ugly

The following figure shows good data protection gaps including recovery points (point in time protection) along with air gaps.

Good Data Protection Gaps
Figure 9.9 Air Gaps and Data Protection
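
One way to turn an ugly gap into a known (and fixable) bad gap is to verify protection copies before they are needed. A minimal Python sketch, with hypothetical paths and a digest assumed to have been recorded at protection time:

    import hashlib

    def sha256_file(path):
        """Checksum a file in chunks so large protection copies fit in memory."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    expected = "..."  # digest recorded when the protection copy was made
    if sha256_file("/backups/app.tar") != expected:
        print("protection copy failed verification: re-protect now")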

Fault / Failures To Tolerate (FTT)

FTT is how many faults or failures to tolerate for a given solution or service, which in turn determines what mode of protection, or fault tolerance mode (FTM), to use.

Fault Tolerant Mode (FTM)

FTM is the mode or technique used to enable resiliency and protect against some number of faults.
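
The arithmetic behind matching an FTM to a given FTT is straightforward; here is a sketch comparing mirroring against a k+m erasure code, where the 4+2 layout is an illustrative example rather than a recommendation:

    def replicas_needed(ftt):
        """Mirroring/replication FTM: surviving `ftt` failures takes ftt + 1 copies."""
        return ftt + 1

    def ec_space_overhead(k, m):
        """Erasure-code FTM: k data + m parity fragments tolerate m failures;
        extra space is m/k of the data (e.g., 4+2 -> 50%, vs. 200% for 3 copies)."""
        return m / k

    print(replicas_needed(2))                # 3 full copies to tolerate 2 faults
    print(f"{ec_space_overhead(4, 2):.0%}")  # 50% overhead, also tolerates 2 faults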

Fault / Failure Domains

Fault or failure domains are places and things that can fail, from regions, data centers or availability zones, clusters, stamps, pods, servers, networks, and storage to hardware (systems and components, including SSDs and HDDs, power supplies, and adapters). Other fault domain topics and focus areas include facility power and cooling, along with software, including applications, databases, operating systems, and hypervisors, among others.

SDDI SDDC Fault Domains Zones Regions
Figure 9.5 Various Fault and Failure Domains, Regions, Locations

Clustering

Clustering is a technique and technology for enabling resiliency, as well as scaling performance, availability, and capacity. Clusters can be local, remote, or wide-area to support different data infrastructure objectives, combined with replication and other techniques.

SDDI SDDC Clustering
Figure 9.12 Clustering and Replication Examples

Another characteristic of clustering and resiliency techniques is the ability to detect and react quickly to failures to isolate and contain faults, as well as invoking automatic repair if needed. Different clustering technologies enable various approaches, from proprietary hardware and software tightly coupled to loosely coupled general-purpose hardware or software.

Clustering characteristics include:

  • Application, database, file system, operating system (Windows Storage Replica)
  • Storage systems, appliances, adapters and network devices
  • Hypervisors (Hyper-V, VMware vSphere ESXi, and vSAN among others)
  • Share everything, share some things, share nothing
  • Tightly or loosely coupled with common or individual system metadata
  • Local in a data center, campus, metro, or stretch cluster
  • Wide-area in different regions and availability zones
  • Active/active for fast failover or restart, or active/passive (standby) mode

Additional clustering considerations include (see the quorum sketch after this list):

  • How does performance scale as nodes are added, or what overhead exists?
  • How is cluster resource locking in shared environments handled?
  • How many (or few) nodes are needed for quorum to exist?
  • Network and I/O interface (and management) requirements
  • Cluster partition or split-brain (i.e., cluster splits into two)?
  • Fast-reacting failover and resiliency vs. overhead of failing back
  • Locality of where applications are located vs. storage access and clustering
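
For the quorum question above, the usual majority rule is simple to state in code; this is a sketch, and specific products may weight votes or use witness/tiebreaker nodes instead:

    def quorum(nodes):
        """Votes needed for a majority, avoiding split-brain on a partition."""
        return nodes // 2 + 1

    for n in (2, 3, 5):
        print(f"{n} nodes -> quorum {quorum(n)}, survives losing {n - quorum(n)}")

    # Note the 2-node case survives zero losses, which is why odd node counts
    # (or a witness/tiebreaker) are commonly used.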

Where To Learn More

Continue reading additional posts in this series of Data Infrastructure Data Protection fundamentals and companion to Software Defined Data Infrastructure Essentials (CRC Press 2017) book, as well as the following links covering technology, trends, tools, techniques, tradecraft and tips.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

Everything is not the same across different environments, data centers, data infrastructures and applications. There are various performance, availability, capacity economic (PACE) considerations along with service level objectives (SLO). Availability means being able to access information resources (applications, data and underlying data infrastructure resources), as well as data being consistent along with durable. Being durable means enabling data to be accessible in the event of a device, component or other fault domain item failures (hardware, software, data center).

Just as everything is not the same across different environments, there are various techniques, technologies and tools that can be used in different ways to enable availability and accessibility. These include high availability (HA), RAS, mirroring, replication, parity along with derivative erasure code (EC), LRC, RS and other RAID implementations, along with clustering. Also keep in mind that pertaining to data protection, there are good gaps (e.g. time intervals for recovery points, air gaps), bad gaps (missed coverage or lack of protection), and ugly gaps (not being able to recover from a gap in time).

Note that mirroring, replication, EC, LRC, RS, or other parity and RAID approaches are not replacements for backup; rather, they are companions to time-interval-based recovery point protection such as snapshots, backups, checkpoints, consistency points, and versioning, among others (discussed in follow-up posts in this series).

Which data protection tool, technology, or trend is best depends on what you are trying to accomplish and your application workload PACE requirements along with SLOs. Get your copy of Software Defined Data Infrastructure Essentials here at Amazon.com, at CRC Press among other locations, and learn more here. Meanwhile, continue reading with the next post in this series, Part 3 Data Protection Access Availability RAID Erasure Codes (EC) including LRC.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Data Protection Diaries Access Availability RAID Erasure Codes LRC Deep Dive

Companion to Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft (CRC Press 2017)

server storage I/O data infrastructure trends

By Greg Schulz, www.storageioblog.com, November 26, 2017

This is Part 3 of a multi-part series on Data Protection fundamental tools topics techniques terms technologies trends tradecraft tips as a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).

Software Defined Data Infrastructure Essentials Book SDDC

Click here to view the previous post Part 2 Reliability, Availability, Serviceability (RAS) Data Protection Fundamentals, and click here to view the next post Part 4 Data Protection Recovery Points (Archive, Backup, Snapshots, Versions).

Posts in the series include excerpts from Software Defined Data Infrastructure (SDDI) Essentials pertaining to data protection for legacy along with software-defined data centers (SDDC), and data infrastructures in general, along with related topics. In addition to excerpts, the posts also contain links to articles, tips, posts, videos, webinars, events, and other companion material. Note that figure numbers in this series are those from the SDDI book and not in the order that they appear in the posts.

In this post, part of the Data Protection Diaries series as well as a companion to Chapter 9 of the SDDI Essentials book, we are going on a longer, deeper dive. We look at availability, access, and durability, including mirroring, replication, and RAID, along with various traditional and newer parity approaches such as erasure codes (EC), local reconstruction codes (LRC), and Reed-Solomon (RS), among others. Later posts in this series look at point-in-time data protection to support recovery to a given time (e.g., RPO), while this and the previous post look at maintaining access and availability.

Keep in mind that if something can fail, it probably will, and that everything is not the same, meaning different environments and application workloads (along with their data) differ. Different environments and applications have diverse performance, availability, capacity, and economic (PACE) attributes, along with service level objectives (SLOs). Various SLOs include PACE attributes, recovery point objectives (RPO), and recovery time objectives (RTO), among others.

Availability, accessibility, and durability (see part two in this series), along with associated RAS topics, are part of what enables meeting RTO as well as faults (or failures) to tolerate (FTT) requirements. This means that different fault tolerance modes (FTM) determine which technologies, tools, trends, and techniques to use to meet different RTO, FTT, and application PACE needs.

Maintaining access and availability along with durability (e.g., how many copies of data exist as well as where they are stored) protects against the loss or failure of a component device (SSD, HDD, adapter, power supply, controller), node or system, appliance, server, rack, cluster, stamp, data center, availability zone, region, or other fault (or failure) domain spanning hardware, software, and services.

Figure 1.5 Data Infrastructures and other IT Infrastructure Layers

Data Protection Access Availability RAID Erasure Codes

This is a good place to mention some context for RAID and RAID array, which can mean different things pertaining to Data Protection. Some people associate RAID with a hardware storage array, or with a RAID card. Other people consider an array to be a storage array that is a RAID enabled storage system. A trend is to refer to legacy storage systems as RAID arrays or hardware-based RAID, to differentiate from newer implementations.

Context comes into play in that a RAID group (i.e., a collection of HDDs or SSDs that is part of a RAID set) can be referred to as an array, a RAID array, or a virtual array. What this means is that while some RAID implementations may no longer be relevant, there are many new and evolving variations extending parity-based protection, making at least software-defined RAID still relevant.

Keep context in mind, and don’t be afraid to ask what someone is referring to: a particular vendor storage system, a RAID implementation or packaging, a storage array, or a virtual array. Also keep the context of the virtual array in perspective vs. storage virtualization and virtual storage. RAID as a term is used to refer to different modes such as mirroring or parity, and parity can be legacy RAID 4, 5, or 6 along with erasure codes (EC). Note that some people refer to erasure codes as not being RAID, which can be an inference to not being a legacy storage system running hardware RAID (e.g., as opposed to software or software defined).

The following figure (9.13) shows various availability protection schemes (e.g., not recovery point) that maintain access while protecting against loss of a component, device, system, server, site, region, or other part of a fault domain. Since everything is not the same, with environments and applications having different Performance Availability Capacity Economic (PACE) attributes, there are various approaches for enabling availability along with accessibility.

Keep in mind that RAID and erasure codes (along with their various derivatives), as well as replication and mirroring, are not by themselves a replacement for backup or other point-in-time (e.g., recovery point enabling) protection.

Instead, availability technologies such as RAID and erasure codes, along with mirroring and replication, need to be combined with snapshots, point-in-time copies, consistency points, checkpoints, and backups, among other recovery point protection, for complete data protection.

Speaking of replacements for backup, while many vendors and their pundits claim or want to see backup as being dead, as long as they keep talking about backup instead of broader data protection, backup will remain alive.

Figure 9.13 Various RAID, Mirror, Parity and Erasure Code (EC) approaches

Different RAID levels (including parity, EC, LRC and RS based) will affect storage energy effectiveness, similar to various SSD or HDD performance capacity characteristics; however, a balance of performance, availability, capacity, and energy needs to occur to meet application service needs. For example, RAID 1 mirroring or RAID 10 mirroring and striping use more HDDs and, thus, power, but will yield better performance than RAID 6 and erasure code parity protection.

 

| RAID level | Normal performance | Availability | Performance overhead | Rebuild overhead | Availability overhead |
| --- | --- | --- | --- | --- | --- |
| RAID 0 (stripe) | Very good read & write | None | None | Full volume restore | None |
| RAID 1 (mirror or replicate) | Good reads; writes = device speed | Very good; two or more copies | Multiple copies can benefit reads | Re-synchronize with existing volume | 2:1 for dual, 3:1 for three-way copies |
| RAID 4 (stripe with dedicated parity, i.e., 4 + 1 = 5 drives total) | Poor writes without cache | Good for smaller drive groups and devices | High on write without cache (i.e., parity) | Moderate to high, based on number and type of drives | Varies; 1 parity/N, where N = number of devices |
| RAID 5 (stripe with rotating parity, 4 + 1 = 5 drives) | Poor writes without cache | Good for smaller drive groups and devices | High on write without cache (i.e., parity) | Moderate to high, based on number and type of drives | Varies; 1 parity/N, where N = number of devices |
| RAID 6 (stripe with dual parity, 4 + 2 = 6 drives) | Poor writes without cache | Better for larger drive groups and devices | High on write without cache (i.e., parity) | Moderate to high, based on number and type of drives | Varies; 2 parity/N, where N = number of devices |
| RAID 10 (mirror and stripe) | Good | Good | Minimum | Re-synchronize with existing volume | Twice mirror capacity stripe drives |
| Reed-Solomon (RS) parity, also known as erasure code (EC), local reconstruction code (LRC), and SHEC | Ok for reads, slow writes; good for static and cold data with front-end cache | Good | High on writes (CPU for parity calculation, extra I/O operations) | Moderate to high, based on number and type of drives, how implemented, extra I/Os for reconstruction | Varies; low overhead when using a large number of devices; CPU, I/O, and network overhead |

Table 9.3 Common RAID Characteristics

Besides those shown in Table 9.3, other RAID and parity-based approaches include RAID 2 (Reed Solomon based) and RAID 3 (synchronized stripe with dedicated parity), along with combinations such as 10, 01, 50, and 60, among others.

Similar to legacy parity-based RAID, some erasure code implementations use narrow drive groups while others use larger ones to increase protection and reduce capacity overhead. For example, some larger enterprise-class storage systems (RAID arrays) use narrow 3 + 1 or 4 + 1 RAID 5, or 4 + 2 or 6 + 2 RAID 6, which have higher protection storage capacity overhead but a smaller fault-impact footprint.

On the other hand, many smaller mid-range and scale-out storage systems, appliances, and solutions support wide stripes such as 7 + 1, 15 + 1, or larger RAID 5, or 14 + 2 or larger RAID 6. These solutions trade lower storage capacity protection overhead for the risk of multiple drive failures or impacts. Similarly, some EC implementations use relatively small groups such as 6, 2 (8 drives) or 4, 2 (6 drives), while others use 14, 4 (18 drives), 16, 4 (20 drives), or larger.

Table 9.4 shows options for a number of data devices (k) vs. a number of protect devices (m).

| k (data devices) | m (protect devices) | Availability; resiliency | Space capacity overhead | Normal performance | FTT | Comments; examples |
| --- | --- | --- | --- | --- | --- | --- |
| Narrow | Wide | Very good; low impact of rebuild | Very high | Good (R/W) | Very good | Trade space for RAS; larger m vs. k; 1,1; 1,2; 2,2; 4,5 |
| Narrow | Narrow | Good | Good | Good (R/W) | Good | Use with smaller drive groups; 2,1; 3,1; 6,2 |
| Wide | Narrow | Ok to good; with larger m value | Low as m gets larger | Good (read); writes can be slow | Ok to good | Smaller m can impact rebuild; 3,1; 7,1; 14,2; 13,3 |
| Wide | Wide | Very good; balanced | High | Good | Very good | Trade space for RAS; 2,2; 4,4; 8,4; 18,6 |

Table 9.4 Comparing Various Data Device vs. Protect Device Configurations

Note that wide k with no m, such as 4, 0, would not have protection. If you are focused on reducing costs and storage space capacity overhead, then a wider (i.e., more devices) with fewer protect devices might make sense. On the other hand, if performance, availability, and minimal to no impact during rebuild or reconstruction are important, then a narrower drive set, or a smaller ratio of data to protect drives, might make sense.
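To make the capacity vs. resiliency tradeoff concrete, here is a minimal sketch (my own illustration, not from the SDDI book) that computes usable capacity, protection space overhead, and FTT for some of the k (data) plus m (protect) configurations discussed above:

```python
# Minimal sketch (illustrative, not from the SDDI book) comparing
# k data devices + m protect devices configurations. For a simple
# dispersal/parity scheme: usable fraction = k/(k+m), protection
# space overhead = m/(k+m), and FTT (faults to tolerate) = m.

def describe(k: int, m: int) -> str:
    total = k + m
    usable = 100.0 * k / total
    overhead = 100.0 * m / total
    return (f"{k}+{m} ({total} devices): {usable:.1f}% usable, "
            f"{overhead:.1f}% protection overhead, FTT={m}")

# Narrow RAID 5/6 style groups vs. wider erasure code groups
for k, m in [(3, 1), (4, 2), (6, 2), (4, 0), (14, 2), (16, 4)]:
    if m == 0:
        print(f"{k}+0: no protection (FTT=0)")  # e.g., wide k with no m
    else:
        print(describe(k, m))
```

For example, a narrow 3+1 group carries 25% overhead yet tolerates only one fault, while a wider 16+4 group carries 20% overhead and tolerates four faults, at the cost of more devices being involved in any rebuild.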

Also note that a higher RAID number, parity scheme, or larger number of "m" devices in a parity or erasure code group is not necessarily better; likewise, smaller is not necessarily better. What is better is whichever approach meets your specific application performance, availability, capacity, economic (PACE) needs, along with SLO, RTO, and RPO requirements. It can also be good to use hybrid approaches, combining different technologies and tools to facilitate access, availability, and durability along with point-in-time recovery across different layers of granularity (e.g., device, drive, adapter, controller, cabinet, file system, data center, etc.).

Some focus on lower-level RAID as the single or primary point of protection; however, watch out for that becoming your single point of failure as well. For example, instead of building a resilient RAID 10 and then neglecting adequate higher-level access and recovery point protection, combine different techniques, including file system protection, snapshots, and backups, among others.

Figure 9.14 shows various options and considerations for balancing between too many or too few data (k) and protect (m) devices. The balance is about enabling particular FTT along with PACE attributes and SLOs. This means, for some environments or applications, using different failure-tolerant modes (FTM) in various combinations and configurations.

Figure 9.14 Comparing various data drive to protection devices

The top of Figure 9.14 shows no protection overhead (and no protection); the bottom shows 13 data drives and three protection drives in an EC (RS or LRC, among others) configuration that could tolerate three devices failing before loss of data or access occurs. In between are various options that can be scaled up or down across a different number of devices (HDDs, SSDs, or systems).

Some solutions allow the user or administrator to configure the I/O chunk, slab, shard, or stripe size, for example from 8 KB to 256 KB to 1 MB (or larger), aligning with application workload and I/O profiles. Other options include the ability to set or disable read-ahead, or to choose write-through vs. write-back cache (with battery-protected cache), among other options.

The width or number of devices in a RAID parity or erasure code group is based on a combination of factors, including how much data is to be stored and what your FTT objective is, along with spreading out protection overhead. Another consideration is whether you have large or small files and objects.

For example, if you have many small files and a wide stripe, parity, or erasure code set with a large chunk or shard size, you may not have an optimal configuration from a performance perspective.
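As a back-of-the-envelope illustration (the numbers here are hypothetical, not from the book), compare the full-stripe footprint of a wide group with a large chunk size against a small file that lands on it:

```python
# Back-of-the-envelope sketch (hypothetical numbers): the full-stripe
# footprint of a k+m group with a given chunk (shard) size, compared
# to a small file that lands on it.

def full_stripe_bytes(k: int, m: int, chunk_bytes: int) -> int:
    # A full stripe spans one chunk per device, including the m
    # parity/protect chunks that get written (or read-modify-written).
    return (k + m) * chunk_bytes

file_size = 4 * 1024                              # a 4 KB file
stripe = full_stripe_bytes(14, 2, 1024 * 1024)    # 14+2 group, 1 MB chunks

print(f"file: {file_size} B, full stripe: {stripe} B "
      f"({stripe // file_size}x the file size)")
# A 4 KB file in a 14+2 group with 1 MB chunks sits in a 16 MB stripe,
# 4096x the payload - a poor fit when there are many small files.
```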

The following figure shows combining various data protection availability and accessibility technologies, including local as well as remote mirroring and replication, along with parity or erasure code (including LRC, RS, and SHEC, among others) approaches. Instead of just one technology, a hybrid approach is used, leveraging mirroring (local on SSD) and replication across sites, both asynchronous and synchronous. Replication modes include asynchronous (time-delayed, eventual consistency) for longer distance, higher-latency networks, and synchronous (strong consistency, real-time) for short distance or low-latency networks.

Note that mirroring and replication can be done in software deployed as part of a storage system or appliance, as tin-wrapped software, a virtual machine, a virtual storage appliance, a container, or some other deployment mode. Likewise, RAID, parity, and erasure code software can be deployed and packaged in different ways.

In addition to mirroring and replication, solutions also use parity-based approaches, including erasure code variations, for lower-cost, less active data. In other words, the mirror on SSD handles active hot data as well as any buffering or cache, while lower-performance, higher-capacity, lower-cost data gets de-staged or migrated to a parity or erasure code tier. Some vendors, service providers, and solutions leveraging variations of the approach in Figure 9.15 include Microsoft (Azure and Windows) and VMware, among others.

Figure 9.15 Combining various availability data protection techniques

A tradecraft skill is finding the balance, knowing your applications, the data, and how the data is allocated as well as used, then leveraging that insight and your experience to configure to meet your application PACE requirements.

Consider:

  • Number of drives (width) in a group, along with protection copies or parity
  • Balance rebuild performance impact and time vs. storage space overhead savings
  • Ability to mix and match various devices in different drive groups in a system
  • Management interface, tools, wizards, GUIs, CLIs, APIs, and plug-ins
  • Different approaches for various applications and environments
  • Context of a physical RAID array, system, appliance, or solution vs. logical

Erasure Codes (EC)

Erasure Codes (EC) combine advanced protection with variable space capacity overhead across many drives, devices, or systems, using larger parity chunks or shards compared to traditional parity RAID approaches. There are many variations of EC as well as parity-based approaches; some are tied to Reed Solomon (RS) codes, while others use different approaches.

Note that some EC are optimized for reducing the overhead and cost of storing data (e.g., less space capacity) for inactive or primarily read data. Likewise, some EC variations are optimized for read/write performance as well as for reducing the overhead of rebuilds, reconstructions, and repairs with the least impact. Which EC or parity derivative approach is best depends on what you are trying to do, or what impact you are trying to avoid.
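To show the underlying idea, here is a minimal single-parity sketch of my own (the simplest erasure scheme, as in RAID 4/5); real EC and RS implementations use more advanced math to tolerate m > 1 failures:

```python
# Minimal single-parity sketch (RAID 4/5 style, the simplest erasure
# idea): XOR k data chunks into one parity chunk, then rebuild any one
# lost chunk from the survivors. Illustrative only, not production code.

def xor_chunks(chunks):
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]     # k = 3 data chunks
parity = xor_chunks(data)              # m = 1 protect chunk

# Simulate losing chunk 1, then reconstruct it from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_chunks(survivors)
assert rebuilt == data[1]
print("rebuilt:", rebuilt)
```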

Reed Solomon (RS) codes

Reed Solomon (RS) codes are an advanced mathematical parity protection technique that works well on large amounts of data, providing protection with lower space capacity overhead depending on how they are configured. Many Erasure Codes (EC) are based on derivatives of RS. Btw, did you know (or remember) that RAID 2 (rarely used, with few legacy implementations) has ties to RS codes? Here are some additional links to RS material, including via Backblaze, CMU, and Dr. Dobbs.
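If you want to experiment hands-on, one option (my suggestion, not something referenced in the book) is the third-party Python reedsolo package, which implements RS encoding and decoding:

```python
# Hands-on RS experiment using the third-party reedsolo package
# (pip install reedsolo) - an illustration, not from the SDDI book.
from reedsolo import RSCodec

rsc = RSCodec(4)                       # add 4 ecc symbols; corrects up
encoded = rsc.encode(b"server storage I/O")   # to 2 unknown byte errors

damaged = bytearray(encoded)           # corrupt two bytes in place
damaged[0] ^= 0xFF
damaged[5] ^= 0xFF

# Newer reedsolo versions return (message, message+ecc, errata positions)
decoded, _, _ = rsc.decode(bytes(damaged))
print(bytes(decoded))                  # b'server storage I/O'
```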

Local Reconstruction Codes (LRC)

Microsoft leverages LRC in Azure as well as in Windows Server. LRC are optimized for a balance of protection, space capacity savings, and normal performance, as well as reducing impact on running workloads during a repair, rebuild, or reconstruction. One of the tradeoffs LRC makes is adding some additional space capacity in exchange for normal and abnormal (e.g., during repair) performance improvements. Where RS, EC, and other parity-based derivatives typically use a (k, m) nomenclature (e.g., data, protection), LRC adds an extra variable (k, m, n) to describe the local reconstruction groups.

Some might argue that LRC are not as space efficient as other EC, RS, or parity derivative variations, to which the counterargument can be that some of those approaches are not as performance effective. In other words, everything is not the same; one approach does not, or should not, have to be applied to all, unless of course your preferred solution can only do one thing.
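As a toy illustration of the local reconstruction idea (my own sketch, not Microsoft's actual algorithm), split the data chunks into small local groups, each with its own XOR parity, so a single lost chunk is rebuilt by reading only its local group rather than the whole stripe (real LRC also adds global parities for multi-failure protection):

```python
# Toy illustration of the LRC idea (not Microsoft's actual algorithm):
# each local group has its own XOR parity, so a single lost chunk is
# rebuilt from a few local survivors instead of the entire k+m stripe.

def xor_chunks(chunks):
    out = bytearray(len(chunks[0]))
    for c in chunks:
        for i, b in enumerate(c):
            out[i] ^= b
    return bytes(out)

# k = 6 data chunks in two local groups of 3, one local parity each.
group_a = [b"d0", b"d1", b"d2"]
group_b = [b"d3", b"d4", b"d5"]
local_a = xor_chunks(group_a)   # local parity for group A
local_b = xor_chunks(group_b)   # local parity for group B

# Lose d1: rebuild from 3 local survivors (d0, d2, local_a) instead
# of reading across all six data chunks plus global parities.
rebuilt = xor_chunks([group_a[0], group_a[2], local_a])
assert rebuilt == group_a[1]
print("locally rebuilt:", rebuilt)
```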

Additional LRC related material includes:

  • (PDF by Microsoft) LRC Erasure Coding in Windows Storage Spaces
  • (Microsoft Usenix Paper) Best Paper Award Erasure Coding in Azure
  • (Via MSDN Shared) Azure Storage Erasure Coding with LRC
  • (Via Microsoft) Azure Storage with Strong Consistency
  • (Paper via Microsoft) 23rd ACM Symposium on Operating Systems Principles (SOSP)
  • (Microsoft) Erasure Coding in Azure with LRC
  • (Via Microsoft) Good collection of EC, RS, LRC and related material
  • (Via Microsoft) Storage Spaces Fault Tolerance
  • (Via Microsoft) Better Way To Store Data with EC/LRC
  • (Via Microsoft) Volume resiliency and efficiency in Storage Spaces

Shingled Erasure Code (SHEC)

Shingled Erasure Codes (SHEC) are a variation of erasure codes leveraging a shingled overlay approach similar to what is used in Shingled Magnetic Recording (SMR) on some HDDs. Ceph has been an early promoter of SHEC; read more here, and here.

Replication and Mirroring

Replication and mirroring create a mirror or replica copy of data across different devices, systems, servers, clusters, sites, or regions. In addition to keeping a copy, mirroring and replication can occur on different time intervals, such as real-time (synchronous) and time-deferred (asynchronous). Besides time intervals, mirroring and replication are implemented in different locations at various altitudes or stack layers, from lower-level hardware adapters or storage systems and appliances, to operating systems, hypervisors, software-defined storage, volume managers, databases, and applications themselves.

Covered in more detail in chapters 5 and 6, synchronous provides real-time, strong consistency, although high-latency local or remote interfaces can impact primary application performance. Note there is a common myth that high-latency networks are only long distance, when in fact some local networks can also be high-latency. Asynchronous (also discussed in more depth in chapters 5 and 6) enables local and remote high-latency communications to be spanned, facilitating protection over distance without impacting primary application performance, albeit with lower, time-deferred consistency, also known as eventual consistency.
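Here is a toy model (illustrative only, not any vendor's implementation) of the write-acknowledgment difference between the two modes:

```python
# Toy model (illustrative only) of the write-acknowledgment difference
# between synchronous and asynchronous replication.
import queue
import threading
import time

remote = []                       # stand-in for the remote replica
async_q = queue.Queue()           # pending asynchronous updates

def sync_write(data, latency_s):
    # Synchronous: the primary waits for the remote copy before acking,
    # so application write latency includes the network round trip.
    time.sleep(latency_s)         # simulated network latency
    remote.append(data)
    return "ack"                  # strong consistency at ack time

def async_write(data):
    # Asynchronous: ack immediately, replicate in the background.
    async_q.put(data)             # eventual consistency; RPO > 0
    return "ack"

def replicator():
    while True:                   # background thread drains the queue
        remote.append(async_q.get())

threading.Thread(target=replicator, daemon=True).start()
print(sync_write(b"txn1", 0.05))  # slower ack; remote is current
print(async_write(b"txn2"))       # fast ack; remote catches up later
```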

Mirroring (also known as RAID 1) and replication create a copy (a mirror or replica) across two or more storage targets (devices, systems, file systems, cloud storage services, or applications such as a database). The reason for using mirrors is to provide a faster (for normal running and during recovery) failure-tolerant mode enabling availability, resiliency, and data protection, particularly for active data.

Figure 9.10 shows general replication scenarios. Two basic mirror scenarios are illustrated: at the top, a device, volume, file system, or object bucket is replicated to two other targets (i.e., three-way or three replicas); at the bottom, a primary storage device uses a hybrid replica and dispersal technique where multiple data chunks, shards, fragments, or extents are spread across devices in different locations.

Figure 9.10 Various Mirror and Replication Approaches

Mirroring and replication can be done locally inside a system (server, storage system, or appliance), within a cabinet, rack, or data center, or remotely, including at cloud services. Mirroring can also be implemented inside a server in software or using RAID and HBA cards to off-load the processing.

Figure 9.11 Mirror or Replication combined with Snapshots or other PiT protection

Keep in mind that mirroring and replication by themselves are not a replacement for backups, versions, snapshots, or other recovery point, time-interval (time-gap) protection. The reason is that replication and mirroring maintain a copy of the source at one or more destination targets. What this means is that anything that changes on the primary source also gets applied to the target destination (mirror or replica). However, it also means that anything changed, deleted, corrupted, or damaged on the source is also impacted on the mirror or replica (assuming the mirror or replicas were or are mounted and accessible online).
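A tiny sketch (my own illustration) makes the point: a deletion on the source flows to the mirror, while a point-in-time copy preserves the prior state:

```python
# Tiny illustration of why a mirror alone is not a backup: changes,
# including deletions, propagate to the replica, while a point-in-time
# (PiT) snapshot preserves the earlier state.

primary = {"report.doc": b"v1"}
replica = dict(primary)            # mirror copy, kept in sync
snapshot = dict(primary)           # point-in-time copy

# An accidental delete (or corruption) on the primary...
del primary["report.doc"]
replica = dict(primary)            # ...is faithfully mirrored as well

print("replica:", replica)         # {} - the mirror lost it too
print("snapshot:", snapshot)       # {'report.doc': b'v1'} - recoverable
```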

Mirror and replication implementations in various locations (hardware, software, cloud) include:

  • Applications and databases such as SQL Server, Oracle among others
  • File systems, volume manager, Software-defined storage managers
  • Third-party storage software utilities and drivers
  • Operating systems and hypervisors
  • Hardware adapter and off-load devices
  • Storage systems and appliances
  • Cloud and managed services

Where To Learn More

Continue reading additional posts in this series of Data Infrastructure Data Protection fundamentals and companion to Software Defined Data Infrastructure Essentials (CRC Press 2017) book, as well as the following links covering technology, trends, tools, techniques, tradecraft and tips.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.


What This All Means

There are various data protection technologies, tools, and techniques for enabling availability of information resources, including applications, data, and data infrastructure resources. Likewise, there are many different aspects of RAID, as well as contexts, from legacy hardware-based to cloud, virtual, container, and software defined. In other words, not all RAID is in legacy storage systems, and much of the FUD about RAID in general is probably actually targeted at specific implementations or products.

There are different approaches to meet various needs, from striping for performance (with no protection by itself), to mirroring and replication, as well as many parity approaches from legacy to erasure codes, including Reed Solomon based as well as LRC, among others. Which approach is best depends on your objectives, including balancing performance, availability, capacity, economic (PACE) attributes for normal running behavior as well as during fault and failure modes.

Get your copy of Software Defined Data Infrastructure Essentials here at Amazon.com, at CRC Press among other locations and learn more here. Meanwhile, continue reading with the next post in this series, Part 4 Data Protection Recovery Points (Archive, Backup, Snapshots, Versions).

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com; any reproduction in whole or in part, with changes to content, without source attribution under title, or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.