Fundamental Tools, Technologies, Toolbox, Buzzword Bingo Trends
Companion to Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Fundamental Server Storage I/O Tradecraft ( CRC Press 2017)
This is Part 7 of a multi-part series on Data Protection fundamental tools topics techniques terms technologies trends tradecraft tips as a follow-up to my Data Protection Diaries series, as well as a companion to my new book Software Defined Data Infrastructure Essentials – Cloud, Converged, Virtual Server Storage I/O Fundamental tradecraft (CRC Press 2017).
Post in the series includes excerpts from Software Defined Data Infrastructure (SDDI) pertaining to data protection for legacy along with software defined data centers ( SDDC), data infrastructures in general along with related topics. In addition to excerpts, the posts also contain links to articles, tips, posts, videos, webinars, events and other companion material. Note that figure numbers in this series are those from the SDDI book and not in the order that they appear in the posts.
Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends
There are many data Infrastructure related topics, technologies, tools, trends, techniques and tips that pertain to data protection, many of which have been covered in this series of posts already, as well as in the SDDI Essentials book, and elsewhere. The following are some additional related data Infrastructure data protection topics, tools, technologies.
Buzzword Bingo is a popular industry activity involving terms, trends, tools and more, read more here, here, and here. The basic idea of buzzword bingo is when somebody starts mentioning lots of buzzwords, buzz terms, buzz trends at some point just say bingo. Sometimes you will get somebody who asks what that means, while others will know, perhaps get the point to move on to what’s relevant vs. talking the talk or showing how current they are on industry activity, trends and terms.
Just as everything is not the same across different environments, there are various size and focus from hyper-scale clouds and managed service providers (MSP) server (and storage along with applications focus), smaller and regional cloud, hosting and MSPs, as well as large enterprise, small medium enterprise (SME), small medium business (SMB), remote office branch office (ROBO), small office home office (SOHO), prosumer, consumer and client or edge. Sometimes you will hear server vs. edge or client focus, thus context is important.
Data protection just like data infrastructures span servers, storage, I/O networks, hardware, software, clouds, containers, virtual, hypervisors and related topics. Otoh, some might view data protection as unique to a particular technology focus area or domain. For example, I once had backup vendor tell me that backups and data protection was not a storage topic, can you guess which vendor did not get recommend for data protection of data stored on storage?
Data gets protected to different target media, mediums or services including HDDs, SSD, tape, cloud, bulk and object storage among others in various format from native to encapsulated in save sets, zips, tar ball among others.
Bulk storage can be on-site, on-premises low-cost tape, disk (file, block or object) as well as off-site including cloud services such as AWS S3 (buckets and objects), Microsoft Azure (containers and blobs), Google among others using various Access ( Protocols, Personalities, Front-end, Back-end) technologies. Which type of data protection storage medium, location or service is best depends on what you are trying to do, along with other requirements.
Data Protection Toolbox
Before discussing Object Storage lets take a step back and look at some context that can clarify some confusion around the term object. The word object has many different meanings and context, both inside of the IT world as well as outside. Context matters with the term object such as a verb being a thing that can be seen or touched as well as a person or thing of action or feeling directed towards.
Besides a person, place or physical thing, an object can be a software defined data structure that describes something. For example, a database record describing somebody’s contact or banking information, or a file descriptor with name, index ID, date and time stamps, permissions and access control lists along with other attributes or metadata. Another example is an object or blob stored in a cloud or object storage system repository, as well as an item in a hypervisor, operating system, container image or other application.
Besides being a verb, object can also be a noun such as disapproval or disagreement with something or someone. From an IT context perspective, object can also refer to a programming method (e.g. object oriented programming [oop], or Java [among other environments] objects and class’s) and systems development in addition to describing entities with data structures.
In other words, a data structure describes an object that can be a simple variable, constant, complex descriptor of something being processed by a program, as well as a function or unit of work. There are also objects unique or with context to specific environments besides Java or databases, operating systems, hypervisors, file systems, cloud and other things.
The role of object storage (view more at www.objectstoragecenter.com) is to provide low-cost, scalable capacity, durable availability of data including data protection copies on-premises or off-site. Note that not all object storage solutions or services are the same, some are immutable with write once read many (WORM) like attributes, while others non-immutable meaning that they can be not only appended to, also updated to page or block level granularity.
Also keep in mind that some solutions and services refer to items being stored as objects while others as blobs, and the name space those are part of as a bucket or container. Note that context is important not to confuse an object container with a docker, kubernetes or micro services container.
Many applications and storage systems as well as appliances support as back-end targets cloud access using AWS S3 API (of AWS S3 service or other solutions), as well as OpenStack Switch API among others. There are also many open source and third-party tools for working with cloud storage including objects and blobs. Learn more about object storage, cloud storage at www.objectstoragecenter.com as well as in chapters 3, 4, 13 and 14 in SDDI Essentials book.
S3 Simple Storage Service
Simple Storage Service ( S3) is the Amazon Web Service (AWS) cloud object storage service that can be used for bulk and other storage needs. The S3 service can be accessed from within AWS as well as externally via different tools. AWS S3 supports large number of buckets and objects across different regions and availability zones. Objects can be stored in a hierarchical directory structure format for compatibility with existing file systems or as a simple flat name space.
Context is important with data protection and S3 which can mean the access API, or AWS service. Likewise context is important in that some solutions, software and services support S3 API access as part of their front-end (e.g. how servers or clients access their service), as well as a back-end target (what they can store data on).
Additional AWS S3 (service) and related resources include:
- Cloud conversations: AWS EBS, Glacier and S3 overview (Part I)
- AWS S3 Storage Gateway Revisited (Part I)
- S3motion Buckets Containers Objects AWS S3 Cloud and EMCcode
- Cloud conversations: AWS EBS, Glacier and S3 overview (Part II S3)
- Cloud Conversations: AWS S3 Cross Region Replication storage enhancements
- Part II Revisiting AWS S3 Storage Gateway (Test Drive Deployment)
Data Infrastructure Environments and Applications
Data Infrastructure environments that need to be protected include legacy, software defined (SDDC, SDDI, SDS), cloud, virtual and container based, as well as clustered, scale-out, converged Infrastructure (CI), hyper-converged Infrastructure (HCI) among others. In addition to data protection related topics already converged in the posts in this series (as well as those to follow), a related topic is Data Footprint Reduction ( DFR). DFR comprises several different technologies and techniques including archiving, compression, compaction, deduplication (dedupe), single instance storage, normalization, factoring, zip, tiering and thin provisioning among many others.
Data Footprint Reduction (DFR) Including Dedupe
There is a long-term relationship with data protection and DFR in that to reduce the impact of storing more data, traditional techniques such as compression and compaction have been used, along with archive and more recently dedupe among others. In the Software Defined Data Infrastructure Essentials book there is an entire chapter on DFR ( chapter 11), as well as related topics in chapters 8 and 13 among others. For those interested in DFR and related topics, there is additional material in my books Cloud and Virtual Data Storage Networking (CRC Press), along with in The Green and Virtual Data Center (CRC Press), as well as various posts on StorageIOblog.com and storageio.com. Figure 11.4 is from Software Defined Data Infrastructure Essentials showing big picture of various places where DFR can be implemented along with different technologies, tools and techniques.
Just as everything is not the same, there are different DFR techniques along with implementations to address various application workload and data performance, availability, capacity, economics (PACE) needs. Where is the best location for DFR that depends on your objectives as well as what your particular technology can support. However in general, I recommend putting DFR as close to where the data is created and stored as possible to maximize its effectiveness which can be on the host server. That however also means leveraging DFR techniques downstream where data gets sent to be stored or protected. In other words, a hybrid DFR approach as a companion to data protection should use various techniques, technologies in different locations. Granted, your preferred vendor might only work in a given location or functionality so you can pretty much guess what the recommendations will be ;) .
Tips, Recommendations and Considerations
General action items, tips, considerations and recommendations include:
- Everything is not the same; different applications with SLO, PACE, FTT, FTM needs
- Understand the 4 3 2 1 data protection rule and how to implement it.
- Balance rebuild performance impact and time vs. storage space overhead savings.
- Use different approaches for various applications and environments.
- What is best for somebody else may not be best for you and your applications.
- You cant go forward in the future after a disaster if you cant go back
- Data protection is a shared responsibility between vendors, service providers and yourself
- There are various aspects to data protection and data Infrastructure management
Where To Learn More
Continue reading additional posts in this series of Data Infrastructure Data Protection fundamentals and companion to Software Defined Data Infrastructure Essentials (CRC Press 2017) book, as well as the following links covering technology, trends, tools, techniques, tradecraft and tips.
- Part 1 – Data Infrastructure Data Protection Fundamentals
- Part 2 – Reliability, Availability, Serviceability ( RAS) Data Protection Fundamentals
- Part 3 – Data Protection Access Availability RAID Erasure Codes ( EC) including LRC
- Part 4 – Data Protection Recovery Points (Archive, Backup, Snapshots, Versions)
- Part 5 – Point In Time Data Protection Granularity Points of Interest
- Part 6 – Data Protection Security Logical Physical Software Defined
- Part 7 – Data Protection Tools, Technologies, Toolbox, Buzzword Bingo Trends
- Part 8 – Data Protection Diaries Walking Data Protection Talk
- Part 9 – who’s Doing What ( Toolbox Technology Tools)
- Part 10 – Data Protection Resources Where to Learn More
- Data Protection Diaries series
- Data Infrastructure server storage I/O network Recommended Reading List Book Shelf
- Software Defined Data Infrastructure Essentials (CRC 2017) Book
Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.
What This All Means
There are many different buzzword, buzz terms, buzz trends pertaining to data infrastructure and data protection. These technologies span legacy and emerging, software-defined, cloud, virtual, container, hardware and software. Key point is what technology is best fit for your needs and applications, as well as how to use the tools in different ways (e.g. skill craft techniques and tradecraft). Keep context in mind when looking at and discussing different technologies such as objects among others.
Get your copy of Software Defined Data Infrastructure Essentials here at Amazon.com, at CRC Press among other locations and learn more here. Meanwhile, continue reading with the next post in this series, Part 8 Walking The Data Protection Talk.
Ok, nuff said, for now.
Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.
All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2018 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.