Industry trend: People plus data are aging and living longer

Let's face it: people and information are both living longer, so there is more of each, along with a strong interdependency between the two.

That people are living longer and data is being retained longer should not be a surprise if you take a step back and look at the bigger picture. There is no such thing as an information recession: more data is being generated, processed, moved and stored for longer periods of time, not to mention that individual data objects are also getting larger.

Industry trend and performance

As an example of data objects getting larger, think about a digital photo taken on a typical camera ten years ago, whose resolution was lower and whose file size would thus have been measured in kilobytes (thousands of bytes). Today megapixel resolutions are common on cell phones, smart phones and PDAs, and go even higher on more robust digital and high definition (HD) still and video cameras. This means that a photo of the same object that resulted in a file of hundreds of KBytes ten years ago would be measured in MBytes today. With three dimensional (3D) cameras appearing along with higher resolution, you do not need to be a rocket scientist or industry pundit to figure out what that growth trajectory looks like.
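To put rough numbers on that growth, here is a minimal sketch of the arithmetic. The two resolutions and the 3 bytes per pixel are my own illustrative assumptions, not measurements of any specific camera, and real photo files are usually compressed (e.g. JPEG) so they end up smaller than these raw figures.

```python
def uncompressed_photo_bytes(width_px, height_px, bytes_per_pixel=3):
    """Raw (uncompressed) size of an image: one value per color channel per pixel."""
    return width_px * height_px * bytes_per_pixel


# Hypothetical resolutions chosen only for illustration.
older_photo = uncompressed_photo_bytes(640, 480)     # about 0.3 megapixels
newer_photo = uncompressed_photo_bytes(4000, 3000)   # 12 megapixels

print(f"older: {older_photo / 1_000:.0f} KBytes raw")
print(f"newer: {newer_photo / 1_000_000:.0f} MBytes raw")
print(f"growth factor: {newer_photo / older_photo:.1f}x")
```

Even before counting extra copies, the same picture of the same object takes dozens of times more space under these assumptions.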

However, it is not just the size of the data that is getting larger; there are also more instances and copies of those files, photos, videos and other objects being created, stored and retained. Similar to data, there are more people now than ten years ago, and some of them have also grown larger, at least around the waistline. This means that more people are creating and relying on larger amounts of information being available or accessible when and where needed. As people grow older, the amount of data that they generate will naturally increase, as will the information that they consume and rely upon.

Where things get interesting is when looking further back in history, that is, more than ten or even a hundred years: the trend is that there are more people, they are living longer, and they are generating larger amounts of data that is taking on new value or meaning. Heck, you can even go back hundreds to thousands of years and see early forms of data archiving and storage in drawings on the walls of caves and other venues. I wonder, had the cost (and ease of use) of storing and keeping data been lower back then, would more information have been saved? Or was it a case of the then state of the art storage medium being too difficult to use, combined with limited capacity, so that people simply ran out of storage and retention media (e.g. walls and ceilings)?

Let's come back to the present for a moment and another trend: data that in the past would have been kept offline, or at best near line, due to cost and other limits or constraints is finding its way online in either public or private venues (or clouds if you prefer).

Thus the trend of expanding data life cycles, with some types of data being kept online or readily accessible as their value is discovered.

Evolving data life cycle and access patterns

Here is an easy test: think of something that you may have googled or searched for a year or two ago that either could not be found or was very difficult to find. Now run that same search or topic query and see if anything appears and, if it does, how many instances of it appear. Then make a note to do the same test again in a year or even six months and compare the results.

Now back to the future, however with an eye to the past, and things get even more interesting: some researchers are saying that in centuries to come we should expect to see more people living not only into their hundreds but even longer. This follows the trend of average life expectancy continuing to increase over decades and centuries.

What if people start to live hundreds of years or even longer? What about the information they will generate and rely upon, and its later life cycle or span?

More information and data

Here is a link to a post where a researcher sees that, very far down the road, people could live to be a thousand years old, which brings up the question: what about all the data they generate and rely upon during their lifetime?

Ok, now back to the 21st century, where it is safe to say that there will be more data and information to process, move, store and keep for longer periods of time in a cost effective way. This means applying data footprint reduction (DFR) such as archiving, backup and data protection modernization, compression, consolidation where possible, dedupe and data management (including deletion where applicable), along with other techniques and technologies combined with best practices.

Will you outlive your data, or will your data survive you?

These are among other things to ponder while you enjoy your summer (northern hemisphere) vacation sitting on a beach or pool side enjoying a cool beverage perhaps gazing at the passing clouds reflecting on all things great and small.

Clouds: Don't be scared, however look before you leap and be prepared

Ok, nuff said for now.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2011 StorageIO and UnlimitedIO All Rights Reserved

Full RSS archive feeds are now available for StorageIOblog

To speed up access to the StorageIO and StorageIOblog site RSS full and RSS summary feeds, older posts have been moved to a new archive RSS feed. These changes are only to the RSS full and summary feed files; no changes have been made to the StorageIOblog site.

View or access the full StorageIO RSS feed (httP://storageioblog.com/RSSfullArchive.xml) here.

Enjoy the faster access RSS full and summary feeds, plus the archived feeds. Ok, nuff said for now.

Cheers gs

Greg Schulz – Author The Green and Virtual Data Center (CRC), Resilient Storage Networks (Elsevier) and coming summer 2011 Cloud and Virtual Data Storage Networking (CRC)

StorageIO going Dutch: Seminar for Storage and I/O professionals

Data and Storage Networking Industry Trends and Technology Seminar

Greg Schulz of StorageIO, in conjunction with our Dutch partner Brouwer Storage Consultancy, will be presenting a two day seminar for storage professionals on Tuesday 24th and Wednesday 25th of May 2011 at Ampt van Nijkerk, Netherlands.

Brouwer Storage Consultancy and The Server and StorageIO Group

This two day interactive education seminar for storage professionals will focus on current data and storage networking trends, technology and business challenges along with available technologies and solutions. During the seminar, learn what technologies and management techniques are available, how different vendors' solutions compare and what to use when and where. This seminar digs into the various IT tools, techniques, technologies and best practices for enabling an efficient, effective, flexible, scalable and resilient data infrastructure.

The format of this two day seminar will be a mix of presentation and interactive discussion, allowing attendees plenty of time to discuss among themselves and with seminar presenters. Attendees will gain insight into how to compare and contrast various technologies and solutions in addition to identifying and aligning those solutions to their specific issues, challenges and requirements.

Major themes that will be discussed include:

  • Who is doing what with various storage solutions and tools
  • Is RAID still relevant for today and tomorrow
  • Are hard disk drives and tape finally dead at the hands of SSD and clouds
  • What am I routinely hearing, seeing or being asked to comment on
  • Enabling storage optimization, efficiency and effectiveness (performance and capacity)
  • What do I see as opportunities for leveraging various technologies, techniques, trends
  • Supporting virtual servers including re-architecting data protection
  • How to modernize data protection (backup/restore, BC, DR, replication, snapshots)
  • Data footprint reduction (DFR) including archive, compression and dedupe
  • Clarifying cloud confusion, don’t be scared, however look before you leap

In addition, this two day seminar will look at new and improved technologies and techniques and at who is doing what, along with discussions around industry and vendor activity including mergers and acquisitions. Greg will also preview the contents and themes of his new book Cloud and Virtual Data Storage Networking (CRC) for enabling efficient, optimized and effective information services delivery across cloud, virtual and traditional environments.

Buzzwords and topic themes to be discussed among others include:
E2E, FCoE and DCB, CNAs, SAS, I/O virtualization, server and storage virtualization, public and private cloud, Dynamic Infrastructures, VDI, RAID and advanced data protection options, SSD, flash, SAN, DAS and NAS, object storage, application optimized or aware storage, open storage, scale out storage solutions, federated management, metrics and measurements, performance and capacity, data movement and migration, storage tiering, data protection modernization, SRA and SRM, data footprint reduction (archive, compress, dedupe), unified and multi-protocol storage, solution bundle and stacks.

For more information or to register contact Brouwer Storage Consultancy

Brouwer Storage Consultancy
Olevoortseweg 43
3861 MH Nijkerk
The Netherlands
Telephone: +31-33-246-6825
Cell: +31-652-601-309
Fax: +31-33-245-8956
Email: info@brouwerconsultancy.com
Web: www.brouwerconsultancy.com

Learn about other events involving Greg Schulz and StorageIO at www.storageio.com/events

Ok, nuff said for now

Cheers Gs


Using Removable Hard Disk Drives (RHDDs)

Removable hard disk drives (RHDDs) are a form of removable media, a category that also includes magnetic tape, and they address many common use cases. Usage scenarios include enabling bulk data portability for larger environments, or D2D backup where the media needs to be physically moved offsite for small and mid sized environments. RHDDs include, among others, those from Imation such as the Odyssey (which is what I use) and the Prostor RDX (OEMed by Imation and others). Because RHDDs, tape and other forms of portable media (including those that use flash) are removable and portable, they presumably should have some extra packaging protection to safeguard against static shock, in addition to supporting encryption capabilities.

Compared to disks including RHDDs, tape should, for most and particularly larger environments, have an overall lower media cost for parking, preserving and when needed serving inactive or archived data (e.g. the changing role of tape from day to day backup to archive). Of course your real costs will vary by use, as well as by how the media is combined with data footprint reduction and other technologies.

A big benefit of RHDDs is that they are random access, meaning data can be searched and found quickly, vs. tape media, which has great sequential or streaming capabilities if you have a system that can support that ability. The other benefit of RHDDs is that, depending on their implementation, they should plug and play with your systems, appearing as disk without any extra drivers, configuration or software tools, making for ease of use. Being removable, they can be used for portability, such as sending data to a cloud or MSP as part of an initial bulk copy, or sending data offsite or taking it home as part of an offsite backup, data protection or BC/DR strategy, as well as for archiving. The caveat with RHDDs is that their cost per TByte will generally be higher than that of tape, and they require a docking station or specific drive interface depending on the specific product and configuration.

RHDDs are a great complement to traditional fixed or non removable disk, Hybrid Hard Disk Drive (HHDD) and Solid State Device (SSD) based storage, and can coexist with cloud or MSP backup and archive solutions. The smaller the environment, the more affordable RHDDs become vs. tape for backup and archive operations or when portability is required. Even if you are using a cloud or managed service provider (MSP) for backup, network bandwidth costs, availability or performance may limit the amount of data that can be moved in a cost effective way. For example, you can place an archive and gold or master copy of your static data on an RHDD that is kept on site in a safe or in a secure off-site place, and then send data that is routinely changed to the cloud or MSP (think full local and offsite plus partial full and incremental in the cloud).
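As a rough illustration of that split between static and routinely changed data, here is a minimal sketch. The function name, the file names and the 90 day threshold are all hypothetical, chosen only for the example; a real backup tool would have its own policies and change tracking.

```python
from datetime import datetime, timedelta


def split_by_activity(last_modified, now, static_after_days=90):
    """Split files into static data (candidate for the RHDD gold copy)
    and recently changed data (candidate for cloud or MSP backup).

    last_modified maps a file name to its last modification time.
    """
    cutoff = now - timedelta(days=static_after_days)
    to_rhdd = sorted(name for name, changed in last_modified.items() if changed < cutoff)
    to_cloud = sorted(name for name, changed in last_modified.items() if changed >= cutoff)
    return to_rhdd, to_cloud


# Hypothetical inventory for illustration only.
inventory = {
    "2009_project_archive.zip": datetime(2010, 1, 15),
    "customer_report_draft.doc": datetime(2011, 4, 20),
}
static_files, active_files = split_by_activity(inventory, now=datetime(2011, 5, 1))
print("RHDD:", static_files)
print("Cloud/MSP:", active_files)
```

The point is simply that only the active portion needs to cross the network on a routine basis.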

By leveraging archiving and data footprint reduction (DFR) techniques including dedupe and compression, you can stretch your budget by sending less data to cloud or MSP services while using removable media for data protection. You would be surprised how many TBytes of data can be kept in a safe deposit box. For my own business, I have used RHDDs for several years to keep gold master copies as well as archives offsite as part of a disk to disk (D2D) or D2D2RHDD strategy. The data protection strategy is also complemented by sending active data to a cloud backup MSP (encrypted of course). It might be belt and suspenders, however it is also eating my own dog food, practicing what I talk about, and the approach has proven itself a few times.
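For those curious how dedupe and compression shrink what has to be sent, here is a minimal sketch using fixed size chunks, SHA-256 fingerprints and zlib from the Python standard library. It is a simplified illustration under my own assumptions (the 4 KByte chunk size and the function names are hypothetical), not how any particular backup product or MSP implements DFR.

```python
import hashlib
import zlib


def reduce_footprint(data: bytes, chunk_size: int = 4096):
    """Dedupe fixed size chunks by fingerprint, then compress each unique chunk.

    Returns (store, recipe): store maps a fingerprint to a compressed chunk,
    recipe is the ordered list of fingerprints needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in store:          # keep only the first copy of a chunk
            store[fingerprint] = zlib.compress(chunk)
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the unique chunks and the recipe."""
    return b"".join(zlib.decompress(store[fingerprint]) for fingerprint in recipe)


def stored_bytes(store) -> int:
    """Total bytes that would actually be kept or sent."""
    return sum(len(chunk) for chunk in store.values())


# Ten identical chunks plus one different chunk.
sample = b"A" * 4096 * 10 + b"B" * 4096
store, recipe = reduce_footprint(sample)
print(len(sample), "bytes in,", stored_bytes(store), "bytes kept,", len(store), "unique chunks")
```

Highly repetitive data such as backups benefits the most; data that is already compressed or unique will see far less reduction.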

Here are some related links to more material:
Removable disk drives vs. tape storage for small businesses
The pros and cons of removable disk storage for small businesses
Removable storage media appealing to SMBs, but with caveats
StorageIO Momentus Hybrid Hard Disk Drive (HHDD) Moments

Ok, nuff said for now

Cheers Gs


Spring 2011 Server and StorageIO News Letter

Spring 2011 Newsletter

Welcome to the Spring 2011 edition of the Server and StorageIO Group (StorageIO) newsletter. This follows the Winter 2011 edition.

You can access this news letter via various social media venues (some are shown below) in addition to StorageIO web sites and subscriptions.

 

Click on the following links to view the Spring 2011 edition as HTML or PDF, or go to the newsletter page to view previous editions.

Follow via Google FeedBurner here or via email subscription here.

You can also subscribe to the news letter by simply sending an email to newsletter@storageio.com

Enjoy this edition of the StorageIO newsletter, and let me know your comments and feedback.

Nuff said for now

Cheers
Gs


StorageIO V20.11 (2011) events seminars and web casts schedule

The V20.11 (e.g. 2011 or follow up from V20.10) Server and StorageIO (StorageIO) out and about events schedule continues to evolve.

In the meantime, here are a few (actually a couple dozen) seminars and web casts currently on the event calendar for 2011 that I will be speaking or presenting at. Topics and themes include Server and Storage Optimization, Clouds, Virtualization, Data Protection Modernization (HA, BC, DR, Backup/restore) along with Data Footprint Reduction (DFR including archive, compression, dedupe), End to End (E2E) Management, efficient IT data centers (and storage) among other related items.

Later this summer watch for the release of my new book Cloud and Virtual Data Storage Networking (CRC) as well as keep an eye on the StorageIO events page for additional events or details to appear. Also check out the news page for commentary on industry activities, announcements, trends or related topics in addition to the tips (or articles) page. You can also view videos, webinars and pod casts along with news letters containing links from while out and about during 2011 activities (or from past events).

When | Venue | Event Name | Location
Nov 10, 2011 | - | Data Center Summit BC and DR Track keynote speaker: Protect, preserve and serve your organization's essential applications and information services in an affordable manner | Los Angeles, CA
Nov 8, 2011 | - | Data Center Summit BC and DR Track keynote speaker: Protect, preserve and serve your organization's essential applications and information services in an affordable manner | Seattle, WA
Nov 3, 2011 | - | Data Center Summit BC and DR Track keynote speaker: Protect, preserve and serve your organization's essential applications and information services in an affordable manner | Denver, CO
Nov 1, 2011 | - | Data Center Summit BC and DR Track keynote speaker: Protect, preserve and serve your organization's essential applications and information services in an affordable manner | Chicago, IL
Sept 29, 2011 | - | Data Center Summit BC and DR Track keynote speaker: Protect, preserve and serve your organization's essential applications and information services in an affordable manner | Minneapolis, MN
Aug 4, 2011 | - | Data Center Summit BC and DR Track keynote speaker: Protect, preserve and serve your organization's essential applications and information services in an affordable manner | San Francisco, CA
Jul 28, 2011 | - | Data Center Summit BC and DR Track keynote speaker: Protect, preserve and serve your organization's essential applications and information services in an affordable manner | Houston, TX
Jul 21, 2011 | - | Event keynote speaker: Data Center Summit: Virtualization, Business Continuity and Cloud Computing | Raleigh, NC
Jun 28, 2011 | - | Data Center Summit BC and DR Track keynote speaker: Protect, preserve and serve your organization's essential applications and information services in an affordable manner | Boston, MA
June 23, 2011 | - | Data Center Summit BC and DR Track keynote speaker: Protect, preserve and serve your organization's essential applications and information services in an affordable manner | New York City
June 21, 2011 | - | Event keynote speaker: Data Center Summit: Virtualization, Business Continuity and Cloud Computing | Tampa, FL
May 18, 2011 | - | Keynote: 2011 Virtualization Best Practices | Irvine, CA
May 12, 2011 | - | Keynote: 2011 Virtualization Best Practices | Chicago, IL
May 10, 2011 | - | Keynote: 2011 Virtualization Best Practices | Dallas, TX
May 5, 2011 | - | Keynote: 2011 Virtualization Best Practices | New York, NY
May 3, 2011 | - | Keynote: 2011 Virtualization Best Practices | Boston, MA
Apr 28, 2011 | - | Data Center Summit BC and DR Track keynote speaker: Protect, preserve and serve your organization's essential applications and information services in an affordable manner | Dallas, TX
Apr 12, 2011 | - | Event keynote speaker: Data Center Summit: Virtualization, Business Continuity and Cloud Computing | St. Louis, MO
Mar 29, 2011 | Webcast | Cloud and Virtual BC/DR | Webcast
Mar 24, 2011 | Webcast | Tape's Evolving Data Storage Role | Webcast
Mar 15, 2011 | Wildfire Grille | Keynote: Virtualization, storage and the enterprise cloud | Eden Prairie, MN
Feb 10, 2011 | - | Guest participant – Enabling safe and secure SaaS | On demand eSeminar
Jan 31, 2011 | Century College | Cloud and Virtual Data Storage Networking: Industry Trends | Mahtomedi, MN
Jan 12, 2011 | - | Presenter – E2E Awareness and insight for cloud, virtualized and legacy IT environments (see more here, including viewing the webcast) | Virtual event (more information)

Watch here for more event updates and information, and sign up for the free StorageIO news letter here.

Nuff said for now, look forward to seeing as well as hearing from you while out and about during 2011.

Cheers gs


Tape talk time (tape summit and tape is alive, for some)

Welcome to the tape summit resources micro site, with links for those who are interested in magnetic tape for backup, archive, BC, DR, big and little data.

For a technology that has been declared dead or a zombie (here, here or here), tape remains very much alive; however, its role is changing. There is no disputing that hard disk drives (HDDs) are continuing to expand their role for data protection including backup/restore, BC and DR, where tape has been used for decades.

What is also occurring is that tape's role is changing from day to day backup to longer term data preservation including archiving, with more data stored on tape today than at any time in the past, and at a lower cost. In fact, the continued reduction in cost per tape along with improved capacity and utilization has worked against tape from a marketing and competitive standpoint. For example, if you look at a chart showing tape (media and drive) revenues you see a decline, similar to what was seen a couple of years ago for HDDs.

What is not shown on some charts is how many units (drives or media) shipped, with more capacity for a given price, so that net capacity increased (again, as was reported for HDDs a few years ago). Vendors of tape technology have also kept a rather low profile, particularly those with other technologies that have received more marketing resources (people, time, money). After all, if a product is on a plateau of productivity and profitability, why spend time or effort on extensive marketing or promotion vs. directing resources to getting new items into the market?

As a result, those looking to make a case based on revenues that tape is on the decline, in order to convince customers to move away from the technology, have a marketing freebie. Recently Oracle announced a new large capacity tape drive and media, following previous announcements of an enhanced LTO roadmap and of future 35TByte tape capabilities announced in January 2010 by Fujifilm and IBM.

For those who are interested, following are some links to various topics including how SSD, HDD and tape can coexist, complementing each other for different roles or functions. As to those who do not like tape, feel free to read on if you like, as there is also material on SSD, HDD, dedupe, cloud, data protection and other topics.

Some previous blog posts:

Here are some additional articles, commentary and reports pertaining to tape related topics:

Something tells me we will be hearing, reading or watching more about tape being alive in the months to come.

Nuff said for now

Cheers gs

Thanks for visiting the tape summit resources micro site, with links for those who are interested in magnetic tape for backup, archive, BC, DR, big and little data.


Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

Have VTLs or VxLs become Zombies, Declared dead yet still alive?

Have you heard or read the reports and speculation that VTLs (Virtual Tape Libraries) are dead?

It seems that in IT the all too popular trend is to declare something dead so that your new product or technology can have a chance of making it into the market, or perhaps be seen in a better light.

Sometimes this approach works to temporarily freeze the market until common sense and clarity return or until something else fun to talk about comes along; in other cases, the messages can fall on deaf ears.

The approach of declaring something dead tends to play well for those who like shiny new toys (SNT) or new shiny toys (NST) and being on the popular, cool trendy bandwagon.

Not surprisingly, while some actual IT customers can fall into the SNT or NST syndrome, it's often the broader industry including media, bloggers, analysts, consultants and other self proclaimed or anointed pundits, as well as vendors, who latch on to the declare it dead movement. After all, who wants to talk about something that is old, boring and already being sold to paying customers who are using it? Now this is not a bad thing, as we need a balance of up and coming challengers to keep the status quo challenged; likewise we need a balance of the new to avoid death grips on the old and what is working.

Likewise, many IT customers, particularly larger ones, tend to be very risk averse and conservative with their budgets, protecting their investments; thus they may only go leading bleeding edge if there is a dual redundant blood bank with a backup on hot standby (that's some HA humor BTW).

Another reason for declaring items dead in support of SNT and NST is that while many of the commonly declared dead items are on the proverbial plateau of productivity for IT customers, that can also mean that they are on the plateau of profitability for the vendors.

However, not all good things last, and at some time there is the need to transition from the old to the new. This is where things like virtualization, including virtual tape libraries, virtual disk libraries, virtual storage libraries or whatever you want to call a VxL (more on what a VxL is in a moment), can come into play.

I realize that for some, particularly those who like to grasp on to SNT and NST and ride the dead pool bandwagons, this will probably appear snarky or cynical, which is fine. After all, some of you should be laughing all the way to the bank, and if not, you may in fact be missing out on an opportunity to play in the dead pool marketing game.

Now back to VxL.

In the case of VTLs, for some it is the T word that bothers them; you know, T as in Tape, which is not an SNT or NST in an age where SSD has supposedly killed the disk drive, which allegedly terminated tape (yeah right). Sure, tape is not being used as much for backup as it has been in the past, with its role shifting to that of longer term retention, something that it is well suited for.

For tape fans (or cynics) you can read more here, here and here. However there is still a large amount of backup/restore along with other data protection or preservation (e.g. archiving) processing (software tools, processes, procedures, skill sets, management tools) that still expects to see tape.

Hence this is where VTLs or VxLs come into play, leveraging virtualization in a Life Beyond Consolidation (and here) scenario by providing abstraction, transparency, agility and emulation. IMHO they are still very much alive and evolving.

Ok, for those who do not like tape or do not believe in its continued existence and evolving role, substitute the T (tape) with X and you get a VxL. That is, plug in whatever X word makes you happy or marketable, or a Shiny New TLA. For example Virtual Disk Library, Virtual Storage Library, Virtual Backup Library, Virtual Compression Library, Virtual Dedupe Library, Virtual ILM Library, Virtual Archive Library, Virtual Cloud Library and so forth. Granted, some VxLs only emulate tape and hence are VTLs, while others support NAS and other protocols (or personalities), not to mention functionality ranging from replication and DFR to automated policy management.

However, whether your preference is VTL, VxL or whatever other buzzword bingo name you want to use or come up with, look at how virtualization in the form of abstraction, transparency and emulation can bridge the gap between the new (disk based data protection combined with DFR, or Data Footprint Reduction) and the old (existing backup/restore, archive or other management tools and processes).

Here are some additional links pertaining to VTLs (excuse me, VxLs):

  • Virtual tape libraries: Old backup technology holdover or gateway to the future?
  • Not to mention here, here, here, here or here.

Ok, nuff said.

Cheers gs


More Data Footprint Reduction (DFR) Material

This is part of an ongoing series of short industry trends and perspectives (ITP) blog post briefs based on what I am seeing and hearing in my conversations with IT professionals on a global basis.

These short posts complement other longer posts along with traditional industry trends and perspective white papers, research reports, videos, podcasts, webcasts as well as solution brief content found at www.storageioblog.com/reports and www.storageio.com/articles.

If you recall from previous posts including here, here or here among others, Data Footprint Reduction (DFR) is a collection of tools, technologies and best practices for addressing growing data storage management and cost impacts.

DFR encompasses many different tools, techniques and technologies across various applications ranging from active or primary storage to secondary and inactive along with backup and archive.

Some of the techniques and technologies include archiving, backup modernization, compression, data management, dedupe, space saving snapshots and thin provisioning among others.

Following are some links to various articles and commentary pertaining to DFR:

  • Using DFR including dedupe and compression to defray storage and management costs
  • Deduplicate, compress and defray costs of data storage management
  • Virtual tape libraries: Old backup technology holdover or gateway to the future?
  • As well as here, here or here

In the spirit of DFR, that is doing more with less, nuff said (for now).

Of course let me know what your thoughts and perspectives are on this and other related topics.

Ok, nuff said.

Cheers gs


What is DFR or Data Footprint Reduction?

Updated 10/9/2018

Data Footprint Reduction (DFR) is a collection of techniques, technologies, tools and best practices that are used to address data growth management challenges. Dedupe is currently the industry darling for DFR particularly in the scope or context of backup or other repetitive data.

However, DFR expands the scope to cover expanding data footprints and their impact across primary, secondary and offline data, ranging from high performance to inactive high capacity.

Consequently the focus of DFR is not just on reduction ratios, it's also about meeting time or performance rates and data protection windows.

This means DFR is about using the right tool for the task at hand to effectively meet business needs and cost objectives while meeting service requirements across all applications.
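To make the ratio vs. rate tradeoff a bit more concrete, here is a minimal sketch that measures both for a single compression pass using zlib from the Python standard library. The function name and the sample data are my own illustrative assumptions; actual ratios and rates depend heavily on the data, the algorithm and the hardware.

```python
import time
import zlib


def measure(data: bytes, level: int):
    """Compress once and return (reduction ratio, throughput in MBytes per second)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(compressed)
    rate_mb_s = (len(data) / 1_000_000) / elapsed if elapsed > 0 else float("inf")
    return ratio, rate_mb_s


# Hypothetical, highly repetitive sample data for illustration only.
sample = b"backup block of repetitive data " * 50_000

for level in (1, 6, 9):   # zlib levels: 1 favors speed, 9 favors reduction
    ratio, rate = measure(sample, level)
    print(f"level {level}: {ratio:.1f}:1 at {rate:.0f} MBytes/sec")
```

A setting that gives the best ratio but cannot finish inside the backup window is not the right tool for that task, which is the point of looking at rates as well as ratios.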

Examples of DFR technologies include Archiving, Compression, Dedupe, Data Management and Thin Provisioning among others.

Read more about DFR in Part I and Part II of a two part series found here and here.

Where to learn more

Learn more about data footprint reduction (DFR), data footprint overhead and related topics via the following links:

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

What this all means

That is all for now, hope you find this ongoing series of current or emerging Industry Trends and Perspectives posts of interest.

Ok, nuff said, for now.

Cheers Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2018. Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Data footprint reduction (Part 2): Dell, IBM, Ocarina and Storwize

Over the past couple of weeks there has been a flurry of IT industry activity around data footprint impact reduction with Dell buying Ocarina and IBM acquiring Storwize. For those who want the quick (compacted, reduced) synopsis of what Dell buying Ocarina as well as IBM acquiring Storwize means, read the first post in this two part series as well as some of my comments here and here.

This piece and its companion, part I of this two part series, are about expanding the discussion to the much larger opportunity for vendors or VARs of overall data footprint impact reduction beyond where they are currently focused. Likewise, this is about IT customers realizing that there are more opportunities to address data and storage optimization across their entire organization using various techniques instead of just focusing on backup or VMware virtual servers.

Who are Ocarina and Storwize?
Ocarina is a data and storage management software startup focused on data footprint reduction using a variety of approaches, techniques and algorithms. They differ from the traditional data dedupers (e.g. Asigra, Bakbone, Commvault, EMC Avamar, Datadomain and Networker, Exagrid, Falconstor, HP, IBM Protectier and TSM, Quantum, Sepaton and Symantec among others) by looking at data footprint reduction beyond just backup.

This means looking at how to reduce data footprint across different types of data including videos, images as well as text based documents among others. As a result, the market sweet spot for Ocarina is general data footprint reduction for static along with active data including entertainment, video surveillance or gaming, reference data, web 2.0 and other bulk storage application data needs (this should complement Dell's recent Exanet acquisition).

What this means is that Ocarina is very well suited to address the rapidly growing amount of unstructured data that may not otherwise be handled as efficiently by dedupe alone.

Storwize is a data and storage management startup focused on data footprint reduction using inline compression with an emphasis on maintaining performance for reads as well as writes of unstructured as well as structured database data. Consequently the market sweet spot for Storwize is around boosting the capacity of existing NAS storage systems from different vendors without negatively impacting performance. The trade off of the Storwize approach is that you do not get the spectacular data reduction ratios associated with backup centric or focused dedupe, however, you maintain performance associated with online storage that some dedupers dream of.

Both Dell and IBM have existing dedupe solutions for general purpose as well as backup along with other data footprint impact reduction tools (either owned or via partners). Now they are both expanding their focus and reach similar to what others such as EMC, HP, NetApp, Oracle and Symantec among others are doing. What this means is that someone at Dell and IBM sees that there is much more to data footprint impact reduction than just a focus on dedupe for backup.

Wait, what does all of this discussion (or read here for background issues, challenges and opportunities) about unstructured data and changing access lifecycles have to do with dedupe, Ocarina and Storwize?

Continue reading on as this is about the expanding opportunity for data footprint reduction across entire organizations. That is, more data is being kept online and expanding data footprint impact needs to be addressed to meet business objectives using various techniques balancing performance, availability, capacity and energy or economics (PACE).

What does all of this have to do with IBM buying Storwize and Dell acquiring Ocarina?
If you have not pieced this together yet, let me net it out.

This is about the opportunity to address the organization wide expanding data footprint impact across all applications, types of data as well as tiers of storage to support business growth (more data to store) while maintaining QoS yet reducing per unit costs including management.

This is about expanding the story from the more narrowly focused backup and dedupe discussion, which is still in its infancy relative to its full market potential (read more here), to broader data footprint impact reduction.

Now are you seeing where this is going and fits?

Does this mean IBM and Dell defocus from their existing dedupe product lines or partners?
I do not believe so, at least as long as their respective revenue prevention departments are kept on the sidelines and off of the field of play. What I mean by this is that the challenge for IBM and Dell is similar to what others such as EMC face in having diverse portfolios or technology toolboxes. The challenge is messaging to the bigger issues, then aligning the right tool to the task at hand to address given issues and opportunities, instead of focusing singularly on a specific product and causing revenue prevention elsewhere.

As an example, for backup, I would expect Dell to continue to work with its existing dedupe backup centric partners and technologies while finding new opportunities to leverage its Ocarina solution. Likewise, I would expect IBM to continue to show customers where Tivoli software based dedupe or Protectier (aka the deduper formerly known as Diligent) or other target based dedupe fits, and to expand into other data footprint impact areas with Storwize.

Does this change the playing field?
IMHO these moves as well as some previous moves by the likes of EMC and NetApp among others are examples of expanding the scope and dimension of the playing field. That is, the focus is much more than just dedupe for backup or of virtual machines (e.g. VMware vSphere or Microsoft HyperV).

This signals a growing awareness around the much larger and broader opportunity around organization wide data footprint impact reduction. In the broader context, some applications or data get compressed in application software such as databases, file systems, operating systems or even hypervisors. Data also gets compressed in networks using protocol or bandwidth optimizers, as well as with inline compression or post processing techniques, as has been the case with streaming tape devices for some time.

This also means that while the primary focus or marketing angle for dedupe up until recently has been around reduction ratios, data transfer rates also become important to meet the needs of time or performance sensitive applications.

Hence the role of policy based data footprint reduction, where the right tool or technique is applied to meet specific service requirements. For those vendors with a diverse data footprint impact reduction tool kit including archive, compression, dedupe and thin provisioning among other techniques, I would expect to hear expanded messaging around the theme of applying the right tool to the task at hand.

Does this mean Dell bought Ocarina to accessorize EqualLogic?
Perhaps, however that would then beg the question of why EqualLogic needs accessorizing. Granted there are many EqualLogic along with other Dell sold storage systems attached to Dell and other vendors' servers operating as NFS or Windows CIFS file servers that are candidates for Ocarina. However there are also many environments that do not yet include Dell EqualLogic solutions where Ocarina is a means for Dell to extend its reach, enabling those organizations to do more with what they have while supporting growth.

In other words, Ocarina can be used to accessorize, or it can be used to generate and create pull through for various Dell products. I also see a very strong affinity and opportunity for Dell to combine its recent Exanet NAS storage clustering software with Dell servers and storage to create bulk or scale out solutions similar to what HP and other vendors have done. Of course what Dell does with the Ocarina software over time, where it integrates it into its own products as well as OEMs it to others, should be interesting to watch or speculate upon.

Does this mean IBM bought Storwize to accessorize XIV?
Well, I guess if you put a gateway (or software on a server which is the same thing) in front of XIV to transform it into a NAS system, sure, then Storwize could be used to increase the net usable capacity of the XIV installed base. However that is a lot of work and cost for what is on a relative basis a small footprint, yet it is a viable option nevertheless.

IMHO IBM has much more of a play, perhaps a home run, by walking before they run: placing Storwize in front of their existing large installed base of NetApp N series (not to mention targeting NetApp's own install base) as well as complementing their SONAS solutions. From there as IBM gets their legs and mojo, they could go on the attack by going after other vendors' NAS solutions with an efficiency story, similar to how IBM server groups target other vendors' server business for takeout opportunities, except in a complementing manner.

Longer term I would not be surprised to see IBM continue development of the block based IP (as well as file) in the Storwize product for deployment in solutions ranging from SVC to their own or OEM based products along with articulating their comprehensive data footprint reduction solution portfolio. What will be important for IBM to do is articulate what solution to use when, where, why and how without confusing their customers, partners and the rest of the industry (something that Dell will also have to do).

Some links for additional reading on the above and related topics

Wrap up (for now)

Organizations of all shapes and sizes are encountering some form of growing data footprint impact that currently needs, or soon will need, to be addressed. Given that different applications and types of data along with associated storage mediums or tiers have various performance, availability, capacity, energy as well as economic characteristics, multiple data footprint impact reduction tools or techniques are needed. What this all means is that the focus of data footprint reduction is expanding beyond that of just dedupe for backup or other early deployment scenarios.

Note what this means is that dedupe has an even brighter future than where it currently is focused which is still only scratching the surface of potential market adoption as was discussed in part 1 of this series.

However this also means that dedupe is not the only solution to all data footprint reduction scenarios. Other techniques including archiving, compression, data management, thin provisioning, data deletion, tiered storage and consolidation will start to gain respect, coverage discussions and debates.

Bottom line, use the most applicable technologies or combinations along with best practice for the task and activity at hand.

For some applications, reduction ratios are the important focus, and so are the tools or modes of operation that achieve those results.

Likewise for other applications where the focus is on performance with some data reduction benefit, tools are optimized for performance first and reduction secondary.

Thus I expect messaging from some vendors to adjust (expand) to those capabilities that they have in their toolboxes (product portfolio offerings).

Consequently, IMHO some of the backup centric dedupe solutions may find themselves in niche roles in the future unless they can diversify. Vendors with multiple data footprint reduction tools will also do better than those with only a single function or focused tool.

However for those who only have a single or perhaps a couple of tools, well, guess what the approach and messaging will be. After all, if all you have is a hammer everything looks like a nail, if all you have is a screwdriver, well, you get the picture.

On the other hand, if you are still not clear on what all this means, send me a note, give a call, post a comment or a tweet and I will be happy to discuss with you.

Oh, FWIW, if interested, disclosure: Storwize was a client a couple of years ago.

Ok, nuff said.

Cheers gs


Data footprint reduction (Part 1): Life beyond dedupe and changing data lifecycles

Over the past couple of weeks there has been a flurry of IT industry activity around data footprint impact reduction with Dell buying Ocarina and IBM acquiring Storwize. For those who want the quick (compacted, reduced) synopsis of what Dell buying Ocarina as well as IBM acquiring Storwize means read this post here along with some of my comments here and here.

Now, before any Drs or Divas of Dedupe get concerned and feel the need to debate dedupe's expanding role, success or applicability, relax, take a deep breath, then read on and take another breath before responding if so inclined.

The reason I mention this is that some may mistake this for a piece against or not in favor of dedupe, as it talks about life beyond dedupe, which could be mistaken as indicating a diminished role for dedupe. That is not the case (read ahead and see figure 5 to see the bigger picture).

Likewise, since this piece talks about archiving for compliance and non regulatory situations along with compression, data management and other forms of data footprint reduction, some may feel compelled to defend dedupe's honor and future role.

Again, relax, take a deep breath and read on, this is not about the death of dedupe.

Now for others, you might wonder why the dedupe tongue in cheek humor mentioned above (which is what it is) and the answer is quite simple. The industry in general is drunk on dedupe, in some cases having numbed its senses, not to mention having blurred its vision of the even bigger opportunities for the business benefits of data footprint reduction beyond today's backup centric or VMware server virtualization dedupe discussions.

Likewise, it is time for the industry to wake (or sober) up and stop trying to stuff everything under or into the narrowly focused dedupe bottle. Instead, realize that there is a broader umbrella called data footprint impact reduction which includes, among other techniques, dedupe, archive, compression, data management, data deletion and thin provisioning across all types of data and applications. What this means is a broader opportunity or market than what exists or is being discussed today, leveraging different techniques, technologies and best practices.

Consequently this piece is about expanding the discussion to the larger opportunity for vendors or VARs to extend their focus to the bigger world of overall data footprint impact reduction beyond where they are currently focused. Likewise, this is about IT customers realizing that there are more opportunities to address data and storage optimization across their entire organization using various techniques instead of just focusing on backup.

In other words, there is a very bright future for dedupe as well as other techniques and technologies that fall under the data footprint reduction umbrella including data stored online, offline, near line, primary, secondary, tertiary, virtual and in a public or private cloud.

Before going further however let's take a step back and look at some business along with IT issues, challenges and opportunities.

What is the business and IT issue or challenge?
Given that there is no such thing as a data or information recession, as shown in figure 1, IT organizations of all sizes are faced with the constant demand to store more data, including multiple copies of the same or similar data, for longer periods of time.


Figure 1: IT resource demand growth continues

The result is an expanding data footprint and increased IT expenses, both capital and operational, due to additional Infrastructure Resource Management (IRM) activities to sustain given levels of application Quality of Service (QoS) delivery, as shown in figure 2.

Some common IT costs associated with supporting an increased data footprint include among others:

  • Data storage hardware and management software tools acquisition
  • Associated networking or IO connectivity hardware, software and services
  • Recurring maintenance and software renewal fees
  • Facilities fees for floor space, power and cooling along with IT staffing
  • Physical and logical security for data and IT resources
  • Data protection for HA, BC or DR including backup, replication and archiving


Figure 2: IT Resources and cost balancing conflicts and opportunities

Figure 2 shows that IT organizations of all sizes are faced with having to do more with what they have, or with less, including maximizing available resources. In addition, IT organizations often have to overcome common footprint constraints (available power, cooling, floor space, server, storage and networking resources, management, budgets, and IT staffing) while supporting business growth.

Figure 2 also shows that to support demand, more resources are needed (real or virtual) in a denser footprint, while maintaining or enhancing QoS plus lowering per unit resource cost. The trick is improving on available resources while maintaining QoS in a cost effective manner. By comparison, traditionally if costs are reduced, one of the other curves (amount of resources or QoS) is often negatively impacted and vice versa. Meanwhile in other situations the result can be moving problems around that later resurface elsewhere. Instead, find, identify, diagnose and prescribe the applicable treatment or form of data footprint reduction or other IT IRM technology, technique or best practices to cure the ailment.

What is driving the expanding data footprint?
Granted, more data can be stored in the same or smaller physical footprint than in the past, thus requiring less power and cooling per Gbyte, Tbyte or PByte. However, data growth rates necessary to sustain business activity, enhance IT service delivery and enable new applications are placing continued demands to move, protect, preserve, store and serve data for longer periods of time.

The popularity of rich media and Internet based applications has resulted in explosive growth of unstructured file data requiring new and more scalable storage solutions. Unstructured data includes spreadsheets, PowerPoint slide decks, Adobe PDF and Word documents, web pages, and image, audio and video files such as JPEG, MP3 and MP4. This trend towards increasing data storage requirements does not appear to be slowing anytime soon for organizations of all sizes.

After all, there is no such thing as a data or information recession!

Changing data access lifecycles
Many strategies or marketing stories are built around the premise that shortly after data is created data is seldom, if ever accessed again. The traditional transactional model lends itself to what has become known as information lifecycle management (ILM) where data can and should be archived or moved to lower cost, lower performing, and high density storage or even deleted where possible.

Figure 3 shows as an example on the left side of the diagram the traditional transactional data lifecycle with data being created and then going dormant. The amount of dormant data will vary by the type and size of an organization along with application mix. 


Figure 3: Changing access and data lifecycle patterns

However, unlike the transactional data lifecycle models where data can be removed after a period of time, Web 2.0 and related data needs to remain online and readily accessible. Unlike traditional data lifecycles where data goes dormant after a period of time, on the right side of figure 3, data is created and then accessed on an intermittent basis with variable frequency. The frequency between periods of inactivity could be hours, days, weeks or months and, in some cases, there may be sustained periods of activity.

A common example is a video or some other content that gets created and posted to a web site or social networking site such as Facebook, LinkedIn or YouTube among others. Once the content is discussed, while it may not change, additional comment and collaborative data can be wrapped around the data as additional viewers discover and comment on the content. Solution approaches for the new category and data lifecycle model include low cost, relatively good performing, high capacity storage such as clustered bulk storage as well as leveraging different forms of data footprint reduction techniques.

Given that a large (and growing) percentage of new data is unstructured, NAS based storage solutions including clustered, bulk, cloud and managed service offerings with file based access are gaining in popularity. To reduce cost while supporting increased business demands (figure 2), a growing trend is to utilize clustered, scale out and bulk NAS file systems that support NFS and CIFS for concurrent large and small IOs as well as optionally pNFS for large parallel access of files. These solutions are also increasingly being deployed with either built in or add on accessorized data footprint reduction techniques including archive, policy management, dedupe and compression among others.

What is your data footprint impact?
Your data footprint impact is the total data storage needed to support your various business application and information needs. Your data footprint may be larger than how much actual data storage you have as seen in figure 4. In Figure 4, an example is an organization that has 20TBytes of storage space allocated and being used for databases, email, home directories, shared documents, engineering documents, financial and other data in different formats (structured and unstructured) not to mention varying access patterns.


Figure 4: Expanding data footprint due to data proliferation and copies being retained

Of the 20TBytes of data allocated and used, it is very likely that the consumed storage space is not 100 percent used. Database tables may be sparsely (empty or not fully) allocated and there is likely duplicate data in email and other shared documents or folders. Additionally, of the 20TBytes, 10TBytes are duplicated to three different areas on a regular basis for application testing, training and business analysis and reporting purposes.

The overall data footprint is the total amount of data including all copies plus the additional storage required for supporting that data such as extra disks for Redundant Array of Independent Disks (RAID) protection or remote mirroring.

In this overly simplified example, the data footprint and subsequent storage requirement are several times that of the 20TBytes of data. Consequently, the larger the data footprint the more data storage capacity and performance bandwidth needed, not to mention being managed, protected and housed (powered, cooled, situated in a rack or cabinet on a floor somewhere).
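To put rough numbers on the example above, here is a minimal sketch of the footprint arithmetic. The 20TBytes, 10TBytes and three copies come from the example; the 25 percent protection overhead (think one parity drive per four data drives) is purely an assumption for illustration, and backups or remote mirrors would push the total higher still.

```python
# Illustrative data footprint estimate for the 20TByte example above.
# The protection overhead figure is a hypothetical assumption, not from the text.

def data_footprint_tb(primary_tb, copied_tb, copies, protection_overhead):
    """Return total footprint in TBytes: primary data, plus extra copies,
    plus protection overhead applied to everything stored."""
    stored_tb = primary_tb + copied_tb * copies
    return stored_tb * (1 + protection_overhead)

primary_tb = 20        # allocated and used storage (from the example)
copied_tb = 10         # portion duplicated for testing, training and reporting
copies = 3             # three different areas (from the example)
raid_overhead = 0.25   # assumed: e.g. 4 data drives + 1 parity drive

total_tb = data_footprint_tb(primary_tb, copied_tb, copies, raid_overhead)
print(f"Stored before protection: {primary_tb + copied_tb * copies} TBytes")
print(f"Footprint with protection: {total_tb} TBytes")
```

Swap in your own copy counts and protection scheme; the point is simply that the footprint is a multiple of the primary data, not the primary data itself.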

Data footprint reduction techniques
While data storage capacity has become less expensive on a relative basis, as data footprints continue to expand in order to support business requirements, more IT resources will need to be made available in a cost effective, yet QoS satisfying manner (again, refer back to figure 2). What this means is that more IT resources including server, storage and networking capacity, management tools along with associated software licensing and IT staff time will be required to protect, preserve and serve information.

By more effectively managing the data footprint across different applications and tiers of storage, it is possible to enhance application service delivery and responsiveness as well as facilitate more timely data protection to meet compliance and business objectives. To realize the full benefits of data footprint reduction, look beyond backup and offline data improvements to include online and active data using various techniques such as those in table 1 among others.

There are several methods (shown in table 1) that can be used to address data footprint proliferation without compromising data protection or negatively impacting application and business service levels. These approaches include archiving of structured (database), semi structured (email) and unstructured (general files and documents), data compression (real time and offline) and data deduplication.

 

| | Archiving | Compression | Deduplication |
|---|---|---|---|
| When to use | Structured (database), email and unstructured | Online (database, email, file sharing), backup or archive | Backup or archiving or recurring and similar data |
| Characteristic | Software to identify and remove unused data from active storage devices | Reduce amount of data to be moved (transmitted) or stored on disk or tape | Eliminate duplicate files or file content observed over a period of time to reduce data footprint |
| Examples | Database, email, unstructured file solutions with archive storage | Host software, disk or tape, (network routers) and compression appliances or software as well as appearing in some primary storage system solutions | Backup and archive target devices and Virtual Tape Libraries (VTLs), specialized appliances |
| Caveats | Time and knowledge to know what and when to archive and delete, data and application aware | Software based solutions require host CPU cycles impacting application performance | Works well in background mode for backup data to avoid performance impact during data ingestion |

Table 1: Data footprint reduction approaches and techniques

Archiving for compliance and general data retention
Data archiving is often perceived as a solution for compliance, however, archiving can be used for many other non compliance purposes. These include general data footprint reduction, boosting performance and enhancing routine data maintenance and data protection. Archiving can be applied to structured database data, semi structured email data and attachments, and unstructured file data.

A key to deploying an archiving solution is having insight into what data exists along with applicable rules and policies to determine what can be archived, for how long, how many copies and how data ultimately may be finally retired or deleted. Archiving requires a combination of hardware, software and people to implement business rules.
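As a rough illustration of turning such rules and policies into something a tool can act on, here is a minimal sketch. The data types, day counts and the `classify` function are all hypothetical; real values come from business, legal and compliance requirements, and a person still makes the final call on deletion.

```python
from datetime import date, timedelta

# Hypothetical retention rules; real policies come from business and legal requirements.
RULES = {
    "email":    {"archive_after_days": 90,  "retain_days": 7 * 365},
    "database": {"archive_after_days": 365, "retain_days": 10 * 365},
    "file":     {"archive_after_days": 180, "retain_days": 3 * 365},
}

def classify(kind: str, last_access: date, created: date, today: date) -> str:
    """Return 'keep', 'archive' or 'delete-review' for one item under RULES."""
    rule = RULES[kind]
    if (today - created).days > rule["retain_days"]:
        return "delete-review"   # retention expired: flag for a person to decide
    if (today - last_access).days > rule["archive_after_days"]:
        return "archive"         # inactive long enough to move off active storage
    return "keep"

today = date(2010, 8, 1)
print(classify("email", today - timedelta(days=10), today - timedelta(days=400), today))
print(classify("file", today - timedelta(days=200), today - timedelta(days=400), today))
```

Note that the sketch only looks at age and last access; it knows nothing about the value of the data, which is exactly the hard part of archiving.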

A challenge with archiving is having the time and tools available to identify what data should be archived and what data can be securely destroyed when no longer needed. Further complicating archiving is that knowledge of the data value is also needed; this may well include legal issues as to who is responsible for making decisions on what data to keep or discard.

If a business can invest in the time and software tools, as well as identify which data to archive to support an effective archive strategy, the returns can be very positive towards reducing the data footprint without limiting the amount of information available for use.

Data compression (real time and offline)
Data compression is a commonly used technique for reducing the size of data being stored or transmitted to improve network performance or reduce the amount of storage capacity needed for storing data. If you have used a traditional or TCP/IP based telephone or cell phone, watched either a DVD or HDTV, listened to an MP3, transferred data over the internet or used email you have most likely relied on some form of compression technology that is transparent to you. Some forms of compression are time delayed, such as using PKZIP to zip files, while others are real time or on the fly based such as when using a network, cell phone or listening to an MP3.

Two different approaches to data compression, which vary in time delay or impact on application performance along with the amount of compression and loss of data, are lossless (no data loss) and lossy (some data loss for higher compression ratio). In addition to these approaches, there are also different implementations, including real time with no performance impact to applications and time delayed where there is a performance impact to applications.
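For those who want to see lossless compression in action, here is a small sketch using Python's standard zlib module. The sample data is made up: repeated text stands in for typical documents, and random bytes stand in for data that is already compressed (such as media files). Your ratios will vary with your data.

```python
import os
import zlib

def compression_ratio(data: bytes, level: int = 6) -> float:
    """Compress with zlib (lossless) and return original size / compressed size."""
    compressed = zlib.compress(data, level)
    # Lossless means the round trip returns the exact original bytes.
    assert zlib.decompress(compressed) == data
    return len(data) / len(compressed)

repetitive_text = b"data footprint reduction " * 4000   # highly repetitive sample
random_bytes = os.urandom(100_000)                       # stand-in for already compressed media

print(f"text ratio:   {compression_ratio(repetitive_text):.1f}:1")
print(f"random ratio: {compression_ratio(random_bytes):.2f}:1")
```

The repetitive text shrinks dramatically while the random bytes barely change, which is why reduction ratios always depend on the type of data being stored.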

In contrast to traditional ZIP or offline, time delayed compression approaches that require complete decompression of data prior to modification, online compression allows for reading from, or writing to, any location within a compressed file without full file decompression and resulting application or time delay. Real time appliance or target based compression capabilities are well suited for supporting online applications including databases, OLTP, email, home directories, web sites and video streaming among others without consuming host server CPU or memory resources or degrading storage system performance.
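One way to picture the difference is to compress data in independent chunks so that a read or write only touches the chunks it needs. The following is a simplified sketch of that general idea, not how any particular vendor or appliance implements it; the 64KByte chunk size and the function names are assumptions for illustration.

```python
import zlib

CHUNK_SIZE = 64 * 1024  # assumed chunk size, chosen only for illustration

def compress_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Compress each fixed-size chunk independently of the others."""
    return [zlib.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

def read_range(chunks: list[bytes], offset: int, length: int,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    """Read bytes [offset, offset + length) by decompressing only the chunks touched."""
    if length <= 0:
        return b""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    plain = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    start = offset - first * chunk_size
    return plain[start:start + length]

def write_range(chunks: list[bytes], offset: int, payload: bytes,
                chunk_size: int = CHUNK_SIZE) -> None:
    """Overwrite existing bytes in place, recompressing only the chunks touched."""
    if not payload:
        return
    first = offset // chunk_size
    last = (offset + len(payload) - 1) // chunk_size
    plain = bytearray(b"".join(zlib.decompress(c) for c in chunks[first:last + 1]))
    start = offset - first * chunk_size
    if start + len(payload) > len(plain):
        raise ValueError("write extends past end of data")
    plain[start:start + len(payload)] = payload
    for n, i in enumerate(range(first, last + 1)):
        chunks[i] = zlib.compress(bytes(plain[n * chunk_size:(n + 1) * chunk_size]))

data = bytes(range(256)) * 1024               # 256 KiB sample "file"
chunks = compress_chunks(data)
write_range(chunks, 65_530, b"HELLO-WORLD")   # this write spans two chunks
print(read_range(chunks, 65_530, 11))
```

The trade off is that smaller chunks mean less work per IO but usually a lower compression ratio, since each chunk is compressed without knowledge of its neighbors.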

Note that with the increase of CPU server processing performance along with multiple cores, server based compression running in applications such as database, email, file systems or operating systems can be a viable option for some environments.

A scenario for using real time data compression is for time sensitive applications that require large amounts of data such as online databases, video and audio media servers, web and analytic tools. For example, databases such as Oracle support NFS3 Direct IO (DIO) and Concurrent IO (CIO) capabilities to enable random and direct addressing of data within an NFS based file. This differs from traditional NFS operations where a file would be sequentially read or written.

Another example of using real time compression is to combine it with a NAS file server configured with 300GB or 600GB high performance 15K RPM Fibre Channel or SAS HDDs, in addition to flash based SSDs, to boost the effective storage capacity of active data without introducing a performance bottleneck associated with using larger capacity HDDs. Of course, compression ratios will vary with the type of solution being deployed and type of data being stored, just as dedupe ratios will differ depending on the algorithm along with whether the data is text, video or object based among other factors.

Deduplication (Dedupe)
Data deduplication (also known as single instance storage, commonality factoring, data differencing or normalization) is a data footprint reduction technique that eliminates recurring occurrences of the same data. Deduplication works by normalizing the data being backed up or stored by eliminating recurring or duplicate copies of files or data blocks depending on the implementation.
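As a simplified illustration of block level dedupe, the sketch below splits data into fixed size blocks, fingerprints each block with SHA-256 and stores only the first copy of each unique block. The 4KByte block size, the in-memory dictionary used as the block store and the sample "backups" are assumptions for illustration; actual products differ in chunking (fixed vs. variable), fingerprinting and indexing.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size

def dedupe(data: bytes, store: dict[str, bytes],
           block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks, keep one copy of each unique block
    in `store`, and return the list of fingerprints needed to rebuild the data."""
    recipe = []
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)   # store only the first occurrence
        recipe.append(fingerprint)
    return recipe

def restore(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Rebuild the original data from its fingerprints."""
    return b"".join(store[f] for f in recipe)

store: dict[str, bytes] = {}
monday = b"A" * 8192 + b"B" * 4096
tuesday = b"A" * 8192 + b"C" * 4096    # mostly the same as Monday's data
recipes = [dedupe(monday, store), dedupe(tuesday, store)]

logical = len(monday) + len(tuesday)
physical = sum(len(block) for block in store.values())
print(f"logical {logical} bytes, stored {physical} bytes, ratio {logical / physical:.1f}:1")
```

The more repetitive the data (for example nightly backups of the same files), the better the ratio; with no repeated blocks the ratio would be 1:1.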

Some data deduplication solutions boast spectacular ratios for data reduction given specific scenarios, such as backup of repetitive and similar files, while providing little value over a broader range of applications.

This is in contrast with traditional data compression approaches that provide lower, yet more predictable and consistent data reduction ratios over more types of data and applications, including online and primary storage scenarios. For example, in environments where there is little to no common or repetitive data files, data deduplication will have little to no impact while data compression generally will yield some amount of data footprint reduction across almost all types of data.

Some data deduplication solution providers have either already added, or have announced plans to add, compression techniques to complement and increase the data footprint effectiveness of their solutions across a broader range of applications and storage scenarios, attesting to the value and importance of data compression to reduce data footprint.

When looking at deduplication solutions, determine if the solution is designed to scale in terms of performance, capacity and availability over a large amount of data along with how restoration of data will be impacted by scaling for growth. Other items to consider include how data is deduplicated, such as real time using inline or some form of time delayed post processing, and the ability to select the mode of operation.

For example, a dedupe solution may be able to process data at a specific ingest rate inline until a certain threshold is hit and then processing reverts to post processing so as to not cause a performance degradation to the application writing data to the deduplication solution. The downside of post processing is that more storage is needed as a buffer. It can, however, also enable solutions to scale without becoming a bottleneck during data ingestion.
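A toy model of that threshold behavior is sketched below. It assumes a hypothetical inline limit in MB per second: any interval whose ingest rate exceeds the limit lands unreduced in a buffer for post processing, and spare inline capacity in quieter intervals drains that buffer. The function name, the parameters and the drain rule are my own simplifications, not a description of any vendor's implementation.

```python
def plan_ingest(rates_mb_s, inline_limit_mb_s, interval_s=1.0):
    """Return (mode per interval, peak buffer in MB) for a series of ingest rates."""
    modes = []
    buffered_mb = 0.0
    peak_mb = 0.0
    for rate in rates_mb_s:
        if rate <= inline_limit_mb_s:
            modes.append("inline")
            # Spare inline capacity works off data parked earlier.
            spare_mb = (inline_limit_mb_s - rate) * interval_s
            buffered_mb = max(0.0, buffered_mb - spare_mb)
        else:
            modes.append("post-process")
            # Data written this interval is stored unreduced for later.
            buffered_mb += rate * interval_s
        peak_mb = max(peak_mb, buffered_mb)
    return modes, peak_mb


if __name__ == "__main__":
    # Hypothetical ingest rates (MB per second) with a 200MB per second limit.
    modes, peak = plan_ingest([100, 300, 300, 50], inline_limit_mb_s=200)
    print(modes, peak)
```

The peak buffer figure is the "more storage is needed as a buffer" downside in numeric form, while the application writing the data never waits on the dedupe engine.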

However, there is life beyond dedupe, which is in no way to diminish dedupe or its very strong and bright future. Having talked with hundreds of IT professionals (e.g. the customers), I'm increasingly convinced that only the surface is being scratched for dedupe, not to mention the larger data footprint impact opportunity seen in figure 5.


Figure 5: Dedupe adoption and deployment waves over time

While dedupe is a popular technology from a discussion standpoint and has good deployment traction, it is far from reaching mass customer adoption or even broad coverage in environments where it is being used. StorageIO research shows broadest adoption of dedupe centered around backup in smaller or SMB environments (dedupe deployment wave one in figure 5) with some deployment in Remote Office Branch Office (ROBO) work groups as well as departmental environments.

StorageIO research also shows that adoption in many of those SMB, ROBO, work group or smaller environments has yet to reach 100 percent. This means that there remains a large population that has yet to deploy dedupe as well as further opportunities to increase the level of dedupe deployment by those already doing so.

There has also been some early adoption in larger core IT environments where dedupe coexists with and complements existing data protection and preservation practices. Another current deployment scenario for dedupe has been for supporting core edge deployments in larger environments that provide support for backup and data protection of ROBO, work group and departmental systems.

Note that figure 5 simply shows the general types of environments in which dedupe is being adopted and not any sort of indicators as to the degree of deployment by a given customer or IT environment.

What to do about your expanding data footprint impact?
Develop an overall data footprint reduction strategy that leverages different techniques and technologies addressing online primary, secondary and offline data. Assess and discover what data exists and how it is used in order to effectively manage storage needs.

Determine policies and rules for retention and deletion of data combining archiving, compression (online and offline) and dedupe in a comprehensive data footprint strategy. The benefit of a broader, more holistic, data footprint reduction strategy is the ability to address the overall environment, including all applications that generate and use data as well as IRM or overhead functions that compound and impact the data footprint.

Data footprint reduction: life beyond (and complementing) dedupe
The good news is that the Drs. and Divas of dedupe marketing (the ones who also are good at the disco dedupe dance debates) have targeted backup as an initial market sweet (and success) spot shown in figure 5 given the high degree of duplicate data.


Figure 6: Leverage multiple data footprint reduction techniques and technologies

However that same good news is bad news in that there is now a stigma that dedupe is only for backup, similar to how archive was hijacked by the compliance marketing folks in the post Y2K era. There are several techniques that can be used individually to address specific data footprint reduction issues or in combination as seen in figure 7 to implement a more cohesive and effective data footprint reduction strategy.


Figure 7: How various data footprint reduction techniques are complementary

What this means is that archive, dedupe and other forms of data footprint reduction can and should be used beyond where they have been target marketed, using the applicable tool for the task at hand. For example, a common industry rule of thumb is that on average, ten percent of data changes per day (your mileage and rate of change will certainly vary given applications, environment and other factors).

Now assuming that you have 100TB (feel free to subtract a zero or two, or add as many as needed) of data (note I did not say storage capacity or percent utilized), ten percent change would be 10TB that needs to be backed up, replicated and so forth. Basic 2 to 1 streaming tape compression (2.5 to 1 in upcoming LTO enhancements) would reduce the daily backup footprint from 10TB to 5TB.

Using dedupe with 10 to 1 would get that from 10TB down to 1TB or about the size of a large capacity disk drive. With 20 to 1 that cuts the daily backup down to 500GB and so forth. The net effect is that more daily backups can be stored in the same footprint which in turn helps expedite individual file recovery by having more options to choose from off of the disk based cache, buffer or storage pool.

On the other hand, if your objective is to reduce and eliminate storage capacity, then the same amount of backups can be stored on less disk freeing up resources. Now take the savings times the number of days in your backup retention and you should see the numbers start to add up.
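The arithmetic above is simple enough to sketch in a few lines so you can plug in your own numbers. The function names and the 30 day retention period are assumptions for illustration. The 100TB, ten percent change rate and the 2 to 1, 10 to 1 and 20 to 1 ratios come from the example.

```python
def daily_backup_tb(total_tb: float, change_percent: float,
                    reduction_ratio: float = 1.0) -> float:
    """Daily changed data after applying an N to 1 reduction ratio."""
    return total_tb * change_percent / 100 / reduction_ratio


def retained_tb(daily_tb: float, retention_days: int) -> float:
    """Footprint of keeping one daily backup per day of retention."""
    return daily_tb * retention_days


if __name__ == "__main__":
    total_tb = 100        # amount of data, not raw storage capacity
    change_percent = 10   # rule of thumb daily change rate
    retention_days = 30   # hypothetical retention period

    for label, ratio in [("no reduction", 1), ("2 to 1 compression", 2),
                         ("10 to 1 dedupe", 10), ("20 to 1 dedupe", 20)]:
        daily = daily_backup_tb(total_tb, change_percent, ratio)
        kept = retained_tb(daily, retention_days)
        print(f"{label}: {daily}TB per day, {kept}TB over {retention_days} days")
```

This treats every day as having the same change rate and the same ratio, which real environments will not, so take the output as a rough forecast rather than a sizing.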

Now what about the other 90 percent of the data that may not have changed, or, that did change and exists on higher performance storage?

Can its footprint impact be reduced?

The answer should be perhaps, or it depends, which in turn prompts the question of what tool would be best. As is often the case with industry buzzwords or technologies, there is a popular line of thinking to use it everywhere. After all, goes the thinking, if it is a good thing, why not use and deploy more of it everywhere?

Keep in mind that dedupe trades time, to perform thinking and apply intelligence to further reduce data, in exchange for space capacity. Trading time for space capacity can thus have a negative impact on applications that need lower response time and higher performance, where the focus is on rates vs ratios. For example, the other 90 to 100 percent of the data in the above example may have to be on a mix of high and medium performance storage to meet QoS or service level agreement (SLA) objectives. While it would be fun or perhaps cool to try and achieve a high data reduction ratio on the entire 100TB of active data with dedupe (e.g. trying to achieve primary dedupe), the performance penalty could hurt those applications.

The option is to apply a mix of different data footprint reduction techniques across the entire 100TB. That is, use dedupe where it is applicable and higher reduction ratios can be achieved while balancing performance, and use compression for streaming data to tape for retention or archive as well as in databases or other application software, not to mention in networks. Likewise, use real time compression or what some refer to as primary dedupe for online active changing data along with online static read only data.

Deploy a comprehensive data footprint reduction strategy combining various techniques and technologies to address point solution needs as well as the overall environment, including online, near line for backup, and offline for archive data.

Let's not forget about archiving, thin provisioning, space saving snapshots and commonsense data management, among other techniques, across the entire environment. In other words, if your focus is just on dedupe for backup to achieve an optimized and efficient storage environment, you are also missing out on a larger opportunity. However, this also means having multiple tools or technologies in your IT IRM toolbox as well as understanding what to use when, where and why.

Data transfer rates are a key metric for performance (time) optimization such as meeting backup or restore or other data protection windows. Data reduction ratios are a key metric for capacity (space) optimization where the focus is on storing as much data as possible in a given footprint.
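To put rates and ratios side by side, here is a small sketch that answers the time question (will the data move within the window?) and the space question (how much footprint is left after reduction?). The 500MB per second rate, the eight hour window and the function names are hypothetical values picked for the example. Units are decimal (1TB = 1,000,000MB).

```python
def transfer_hours(data_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move data_tb at a sustained transfer rate."""
    return data_tb * 1_000_000 / rate_mb_per_s / 3600


def stored_tb(data_tb: float, reduction_ratio: float) -> float:
    """Space consumed after an N to 1 data reduction ratio."""
    return data_tb / reduction_ratio


if __name__ == "__main__":
    data_tb = 10         # daily change from the earlier 100TB example
    rate_mb_per_s = 500  # hypothetical sustained ingest rate
    window_hours = 8     # hypothetical backup window

    hours = transfer_hours(data_tb, rate_mb_per_s)
    print(f"time: {hours:.2f} hours, fits window: {hours <= window_hours}")
    print(f"space at 10 to 1: {stored_tb(data_tb, 10)}TB")
```

A tool can look great on the second number and still miss the first, which is why both metrics need to be considered for the task at hand.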

Some additional take away points:

  • Develop a data footprint reduction strategy for online and offline data
  • Energy avoidance can be accomplished by powering down storage
  • Energy efficiency can be accomplished by using tiered storage to meet different needs
  • Measure and compare storage based on idle and active workload conditions
  • Storage efficiency metrics include IOPS or bandwidth per watt for active data
  • Storage capacity per watt per footprint and cost is a measure for inactive data
  • Small percentage reductions on a large scale have big benefits
  • Align the applicable form of virtualization for the given task at hand

Wrap up (for now, read part II here)

For some applications, reduction ratios are the important focus, so the emphasis is on the tools or modes of operation that achieve those results.

Likewise for other applications where the focus is on performance with some data reduction benefit, tools are optimized for performance first and reduction secondary.

Thus I expect messaging from some vendors to adjust (expand) to those capabilities that they have in their toolboxes (product portfolios).

Consequently, IMHO some of the backup centric dedupe solutions may find themselves in niche roles in the future unless they can diversify. Vendors with multiple data footprint reduction tools will also do better than those with only a single function or focused tool.

However for those who only have a single or perhaps a couple of tools, well, guess what the approach and messaging will be.

After all, if all you have is a hammer everything looks like a nail, if all you have is a screw driver, well, you get the picture.

On the other hand, if you are still not clear on what all this means, send me a note, give me a call, post a comment or a tweet and I will be happy to discuss with you.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

Supreme Court Rules Sarbox intact, Oversight Board Changes


Today the US Supreme Court ruled on a Nevada case involving constitutionality of the 2002 Sarbanes-Oxley (Sarbox) accounting regulations pertaining to appointments to the independent public company accounting oversight board.

The Supreme Court ruled that the Sarbox regulations or law remain intact; however, the process or controls around the oversight board must change.

My interpretation and perspective from reading a few different reports is that Sarbox as you know and love (or hate) it is essentially still intact. However, what has changed, or will change, is that individual board members can now be removed, or at least removed more easily. Rather than striking down the Sarbox regulations as requested, the Supreme Court appears to have left them intact while ruling that board members can be changed or removed.

What does this all mean?

Perhaps not much other than firms who have been making money on Sarbox now having something else to talk or consult about (Hmmm, a Sarbox stimulus?).

On the other hand, with the ability to have Sarbox board members more easily removed, perhaps we will see a new board installed that could influence the thinking and thus applicability of Sarbox activity.

Near term, I can see this as being non news for some and confusion for others, and let's not forget that in chaos or confusion there is opportunity.

Here are some links to read more

  • US Supreme Court website and other news
  • Supreme Court to Hear Challenge to Accounting Board
  • Court Strikes Down Part of Sarbanes-Oxley

    Nuff said about this for now, what's your take?

    Cheers gs

    Greg Schulz – Author The Green and Virtual Data Center (CRC) and Resilient Storage Networks (Elsevier)
    twitter @storageio

    March Metric Madness: Fun with Simple Math

    It's March and besides being spring in North America, it also means tournament season including the NCAA basketball series among others known as March Madness.

    Given the office pools and other forms of playing with numbers tied to the tournaments and real or virtual money, here is a quick timeout looking at some fun with math.

    The fun is showing how simple math can be used to show relative growth for IT resources such as data storage. For example, say that you have 10Tbytes of storage or data and that it is growing at only 10 percent per year. Simple math yields 14.6Tbytes in year five, counting today as year one (that is, four rounds of annual growth).

    Now let's assume the growth rate is 50 percent per year. Over the same five years, instead of having 10Tbytes, that now jumps to 50.6Tbytes. If you have 100Tbytes today at a 50 percent growth rate, that would yield 506.3Tbytes, or about half of a petabyte, in 5 years. If by chance you have say 1Pbyte or 1,000Tbytes today, at 25 percent year over year growth you would have 2.44Pbytes in 5 years.
    Figure 1: Fun with simple math and projected growth rates (basic storage forecast)
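If you want to play with the numbers yourself, the simple math is just compound growth. The sketch below reproduces the figures quoted above under the assumption that year one is what you have today, so year five reflects four rounds of growth. That is the only way the quoted 14.6, 50.6, 506.3 and 2.44 values work out. The function names are made up for the example.

```python
def project_capacity(start_tb: float, growth_percent: float, steps: int) -> float:
    """Compound start_tb by growth_percent once per step."""
    return start_tb * (1 + growth_percent / 100) ** steps


def forecast(start_tb: float, growth_percent: float, years: int = 5) -> list[float]:
    """Capacity per year, where year one is the amount you have today."""
    return [project_capacity(start_tb, growth_percent, n) for n in range(years)]


if __name__ == "__main__":
    scenarios = [(10, 10), (10, 50), (100, 50), (1000, 25)]
    for start_tb, growth_percent in scenarios:
        year_five = forecast(start_tb, growth_percent)[-1]
        print(f"{start_tb}TB at {growth_percent} percent: "
              f"{year_five:.2f}TB in year five")
```

If you would rather count five full rounds of growth from today, call project_capacity with steps=5 and the numbers come out higher still (about 16.1Tbytes for the first case).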

    Granted this is simple math showing basic examples however the point is that depending on your growth rate and amount of either current data or storage, you might be surprised at the forecast or projected needs in only five years.

    In a nutshell, these are examples of very basic, primitive capacity forecasts that would vary by other factors. For instance, if the data is 10Tbytes and your policy is for 25 percent free space, that would require even more storage than the base amount. Go with a different RAID level, add some extra space for replication, snapshots and disk to disk backups, not to mention test and development, and those numbers go up even higher.
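Here is a rough sketch of how those extras stack up. It assumes that "25 percent free space" means a quarter of the total capacity stays unused (so capacity is the data divided by 0.75), and it treats each extra copy as a fraction of that base. The function names and the snapshot and replica fractions are hypothetical. RAID overhead is left out since it depends on the level and layout chosen.

```python
def capacity_with_free_space(data_tb: float, free_percent: float) -> float:
    """Capacity needed so that free_percent of it stays unused."""
    if not 0 <= free_percent < 100:
        raise ValueError("free_percent must be at least 0 and below 100")
    return data_tb / (1 - free_percent / 100)


def capacity_with_copies(base_tb: float, copy_fractions: dict[str, float]) -> float:
    """Add space for extra copies, each given as a fraction of base_tb."""
    return base_tb * (1 + sum(copy_fractions.values()))


if __name__ == "__main__":
    base_tb = capacity_with_free_space(10, 25)
    # Hypothetical extras: snapshots at 20 percent of base, one full replica.
    total_tb = capacity_with_copies(base_tb, {"snapshots": 0.2, "replica": 1.0})
    print(f"base: {base_tb:.2f}TB, with copies: {total_tb:.2f}TB")
```

Combine this with the growth forecast and it becomes clear how a modest 10Tbytes of data can turn into a much larger storage requirement.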

    Sure those amounts can be offset with thin provisioning, dedupe, archiving, compression and other forms of data footprint reduction, however the point here is to realize how simple math can portray a very basic forecast and picture of growth.

    Read more about performance and capacity in Chapter 10 – Performance and capacity planning for storage networks – Resilient Storage Networks (Elsevier) as well as at www.cmg.org (Computer Measurement Group).

    And that is all I have to say about this for now, enjoy March madness and fun with numbers.

    Ok, nuff said.

    Cheers gs
