Behind the Scenes, SANta Claus Global Cloud Story

There is a ton of discussion, stories, articles, videos, conferences and blogs about the benefits and value proposition of cloud computing. Not to mention, discussion or debates about what is or what is not a cloud or cloud product, service or architecture including some perspectives and polls from me.

Now SANta does not really care about these and other similar debates, I have learned. However, he is concerned with who has been naughty and nice, as well as watching out for impersonators or members of his crew who misbehave.

In the spirit of the holidays, how about a quick look at how SANta leverages cloud technologies to support his global operations.

Many in IT think that SANta bases his operations out of the North Pole because it is convenient for cooling all of his servers, storage, networks and telecom equipment (which it is). However, it's also centrally located (see chart) for the northern hemisphere (folks down under may get serviced via SANta's secret Antarctica base of operations). Just as ANC (Anchorage International Airport) is a popular transit, transload and refueling base for cargo carriers, SANta also leverages the North and South Pole regions to his advantage.

SANta's Global Reach via Great Circle Mapper

Now do not worry if you have never heard about SANta's dual redundant South Pole operations; it's one of his better kept secrets. Many organizations with global mega IT operations and logistics centers, including SANta's partners such as Microsoft, have followed SANta's lead of leveraging various locations outside of the Pacific Northwest. Granted, like some of his partners and managed service providers, he does maintain a presence in Washington's Columbia River basin, which provides nice PR among other benefits.

Likewise, many in business as well as those in IT think that SANta leverages cloud technologies for cost savings or avoidance, which is partially the case. However, he also leverages cloud, hosting, managed service provider (MSP), virtual data center, virtual operations center, XaaS, SaaS or SOA technologies, services, protocols and products that are transparent and complementary to his own in-house resources, addressing various business and service requirements.

What this has to do with the holidays and clouds is that you may not realize how Santa or St. Nick if you prefer (feel free to plug in whoever you like if Santa or St. Nick does not turn your crank) extensively relies on flexible and scalable resilient technologies for boosting productivity in a cost effective manner. Some of it is IT related, some of it is not. For example, from the GPS and Radar along with recently added RNP and RNAV enhanced capabilities to his increasingly high tech bio fueled powered sleigh, not to mention his information technology (IT) that powers his global operations, old St Nick has got it together when it comes to technology.

The heart or brains of the SANta operation is his global system operations center (SOC) or network operation center (NOC) that rivals those seen at NASA among others with multiple data feeds. The SOC is a 24×365 operations function that covers all aspects from transportation, logistics, distribution, assembly or packaging, financials back office, CRM, IT and communications among other functions.

Naturally, think of the Apollo moon shots, whose Grumman built LEM lunar lander had to have 100% availability: to get off of the moon, its engine only had to fire once, however it had to work 100% of the time! This thought process is said to have leveraged principles from SANta's operations guide, where he has one night a year to accomplish the impossible.

I should mention, while I cannot disclose (due to NDA) the exact locations of the SOCs, data or logistics centers, not to mention the vendors or the technology being used, I can tell you that they are all around you! The fully redundant SOCs, data and call centers as well as logistics sites (including staff, facilities, technology) leverage different time zones for efficiency.

SANta's staff have also found that the redundant SOCs, part of an approach used across SANta's entire vast organization, have helped to guard against global epidemics and pandemics including SARS and H1N1 among others by isolating workers while providing appropriate coverage and availability, something many large organizations have since followed.

Carrying through on the philosophy of redundant SOCs, all other aspects of SANtas operations are distributed yet with centralized coordinated management, leveraging real-time situation awareness, event and activity correlation (what we used to call or refer to as AI), cross technology domain management, proactive monitoring and planning yet with ability for on the spot decision making.

What this means is that the various locations have the ability to make localized decisions on the spot. However, those decisions are coordinated with primary operations or mission control, which streamlines global operations and keeps the focus on strategic activity along with exception handling to be more effective. Thus it is not fully distributed nor fully centralized, rather a hybrid in terms of management, technologies and the way they work.

For example, to handle the diverse applications, there are some primary large processing and data retention facilities that back up and replicate information to other peer sites as well as to smaller regional remote and branch offices close to where information services are needed. To say the environment is highly virtualized would be an understatement.

Likewise, optimization is key, not just to keep costs low or to avoid overheating SANta's facilities in the Arctic and Antarctic regions (which could melt the ice cap); the facilities are also optimized to keep response time as low as possible while boosting productivity.

Thus, SANta has to rely on very robust and diverse communications networking leveraging LAN, SAN, MAN, WAN, POTS and PANs among other technologies. For example, his communications portfolio is said to involve landlines (copper and optical) and RF including microwave and other radio based communications, supporting or using 3G, 4G, MPLS, SONET/SDH, xWDM and free space optics among others.

SANta's networking and communications elves are also said to be working with 5G and 100GbE multiplexed on 256 lambda WDM trunk circuits in non-core trunk applications. Of course, given the airborne operations, satellite and ACARS are a must to avoid overflying a destination while remaining in positive control during low visibility. Note that SANta routinely makes more CAT 3+ low visibility landings than most of the world's airlines and air freight companies combined.

My sources also tell me that SANta has virtual desktop capability leveraging PCoIP and other optimizations on his primary and backup sleighs, enabling rapid reconfiguration for changing workload conditions. He is also fully equipped with onboard social media capabilities for updates via Twitter, Facebook and LinkedIn among others, designed by his chief social networking elf.

Consequently, given the vast amount of information needed to support his operations, from CRM, shipping and tracking to historical and profiling needs, transactional volumes on the data as well as voice and social media networks dwarf stock market trading volume.

Feeding SANta's vast organization are online, highly available, robust databases for transactional purposes, plus reference and unstructured data including videos, websites and more. Some of these look hauntingly familiar given those that are part of SANta's eWorld Helpers initiative, including: Sears, Amazon, Netflix, Target, Albertsons, Staples, EMC, Walmart, Overstock, RadioShack, Lands' End, Dell, HP, eBay, Lowes, Publix, eMusic, Rite Aid and Supervalu among others (I'm just sayin…).

The actual size of SANta's information repository is a closely guarded secret, as is the exact topology, schema and content structure. However, it is understood that on peak days SANta's highly distributed, high performance, low latency data warehouse sees upwards of 1,225 PBytes of data added, a figure that is rumored to make Larry Ellison gush with excitement over its growth possibilities.

How does SANta pull this all off? By leveraging virtualization, automation, and efficient, enabling technologies that allow him and his elves (excuse me, associates or team members) to be more productive in their areas of focus, to a degree that is the envy of the universe.

Some of their efficiency is measured in terms of:

  • How many packages can be processed per elf with minimum or no mistakes
  • Number of calls, requests, inquiries per day per elf in a friendly and understandable manner
  • Knowing who has been naughty or nice in the blink of an eye including historical profiles
  • Virtual machines (VM) or physical machine (PM) servers managed per team member
  • Databases and applications, local and remote, logical and physical per team member
  • Storage in terms of PByte and Exabyte managed to given service level per team member
  • Network circuits and bandwidth with fewest dropped packets (or packages) per member
  • Fewest misdirected packages as well as aborted landings per crew
  • Fewest pounds gained from consumption of most milk and cookies per crew

From how many packages can be processed per hour, to the number of virtual servers, PBytes of data managed, network connections and circuits, and databases and applications per person, to takeoffs and landings (SANta is at the top of the list for this one), they are all highly efficient and effective.

Likewise, SANta leverages the partners in his eWorld Helpers initiative network to help out where, of course, he looks for value; however, value is not just lowest price per VM, lowest cost per TByte or cost per bandwidth. For SANta, value is also very focused on performance, availability, capacity and economic efficiency, not to mention quality with an environmentally friendly green supply chain.

By having a green supply chain, SANta benefits from a responsible, global approach that also makes economic sense on where to manufacture and produce or procure products. Contrary to growing popular belief, locally produced may not always be the most environmentally as well as economically favorable approach. For example (read more here), growing flowers and plants in western Europe where they are consumed would require more energy for heat and lights, not to mention water and other resources. SANta has bucked the trend, instead relying on the economics and environmental benefit of leveraging flowers and plants grown in warmer, sunnier climates.

Granted, and rest assured, SANta still has an army of elves busily putting things together in his own factories along with managing IT related activities in an economically positive manner.

SANta has also applied this thinking to his data, information and communications networks, leveraging sites such as those in the Arctic where solar power can be used during summer months along with cooling economizers to offset the impact of batteries, while workload is shifted around the world as needed. This approach is rumored to be the envy of the US EPA Energy Star for Server, Storage and Data Center crew, not to mention their followers.

How does SANta make sure all of the data and information is protected and available? It's a combination of best practices, techniques and technologies including hardware, software, data protection management tools, disk, dedupe, compression, tape and cloud among others.

Rest assured, if it is in the technology buzzword bingo book, it is a good bet that it has been tested in one of SANta's facilities or partner sites long before you hear about it, even under a strict NDA discussion with one of his elves (oops, I mean supplier partners).

When asked of the importance of his information and data networks, resources and cloud enabled highly virtualized efficient operations SANta responded with a simple:

Ho Ho Ho, Merry Christmas to all, and to all, a good night!

As you sit back and relax, reflect, recreate, recoup or recharge, or whatever it is that you do this time of the year, take a moment to think about and thank all of SANta's helpers. They are the ones that work behind the scenes in SANta's facilities as well as those of his partners or suppliers, some in the clouds, some on or underground, to make the world's largest single event day (excuse me, night) possible! Or, is this SANta and cloud thing all just one big fantasy?

Happy and safe holidays or whatever you want to refer to it as, best wishes and thanks!

BTW: FTC disclosure information can be found here!

Me on a break during a SANta site tour

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

Another StorageIO Appearance on Storage Monkeys InfoSmack

Following up from a previous appearance, I recently had another opportunity to participate in another Storage Monkeys InfoSmack podcast episode.

In the most recent podcast, discussions centered on the recent service disruption at Microsoft/T-Mobile Sidekick cloud services, FTC blogger disclosure guidelines, whether Brocade is up for sale and who should buy them, SNIA and SNW among other topics.

Here are a couple of relevant links pertaining to topics discussed in this InfoSmack session.

If you are involved with servers, storage, I/O networking, virtualization and other related data infrastructure topics, check out Storage Monkeys and InfoSmack.

Cheers – gs

Greg Schulz – StorageIO, Author “The Green and Virtual Data Center” (CRC)

Clouds and Data Loss: Time for CDP (Commonsense Data Protection)?

Today SNIA issued a press release pertaining to cloud storage, timed to coincide with SNW, where we can only presume vendors are talking about their cloud storage stories.

Yet the chatter on the coconut wire along with various news (here and here and here) and social media sites is: how could cloud storage and information service provider T-Mobile/Microsoft/Sidekick lose customers' data?

Data loss is a dangerous phrase. After all, your data may still be intact somewhere; however, if you cannot get to it when needed, that may seem like data loss to you.

There are many types of data loss, including loss of accessibility or availability along with flat out loss. Let me clarify: loss of data availability or accessibility means that somewhere your data is still intact, perhaps off-line on a removable disk, optical media or tape, or at another site on-line, near-line or off-line; it's just that you cannot get to it yet. There is also real data loss, where your primary copy, backup and archive data are all lost, stolen, corrupted or were never actually protected.
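To make that distinction concrete, here is a minimal Python sketch, not anything taken from a particular product. The names CopyState, Outcome and classify are hypothetical, made up for illustration, and the sketch assumes you can list the state of every known copy (primary, backup, archive) of a given piece of data.

```python
from enum import Enum


class CopyState(Enum):
    """State of one copy of the data (hypothetical categories)."""
    ONLINE = "online"    # reachable right now
    OFFLINE = "offline"  # intact, but not reachable yet (tape, removable disk, other site)
    LOST = "lost"        # lost, stolen, corrupted or never actually protected


class Outcome(Enum):
    AVAILABLE = "available"
    LOSS_OF_ACCESS = "loss of access"
    DATA_LOSS = "data loss"


def classify(copies: list[CopyState]) -> Outcome:
    """Classify the situation from the state of every known copy."""
    if any(c is CopyState.ONLINE for c in copies):
        return Outcome.AVAILABLE
    if any(c is CopyState.OFFLINE for c in copies):
        # Data still exists somewhere, you just cannot get to it yet.
        return Outcome.LOSS_OF_ACCESS
    # No intact copy anywhere (or no copies were ever made): real data loss.
    return Outcome.DATA_LOSS
```

For example, a lost primary with an intact off-line tape classifies as loss of access, while an empty list of copies (data that was never protected) classifies as real data loss.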

Clouds or managed service providers in general are getting beat up due to some loss of access, availability or actual data loss. However, before jumping on that bandwagon and pointing fingers at the service, how about a step back for a minute? Granted, given all of the cloud hype and proliferation of managed service offerings on the web (excuse me, cloud), there is a bit of a lightning rod backlash or "see, I told you so" approach.

What's different about this story compared to prior disruptions with Amazon, Google and Blackberry among others is that, rather than access to information or services ranging from calendar, emails, contacts or other documents being disrupted for a period of time, it sounds as though data may have been lost.

Lost data, you say? How can you lose data? After all, there are copies of copies of data that have been snapshotted, replicated and deduplicated across different tiers of storage, right?

Certainly anyone involved in data management or data protection is asking the question: why not go back to a snapshot copy, replicated volume, or backup copy on disk or tape?

Needless to say, finger pointing aerobics are or will be in full swing. Instead, let's ask the question: is it time for CDP, as in Commonsense Data Protection?

However, I would rather not point blame or spout off about how bad clouds are, nor argue that clouds are getting an unfair shake and undue coverage with the recent outages and that a few bad ones do not make all clouds bad.

I can think of many ways to actually lose data. However, totally losing data does not require a technology failure; the cause can be something much simpler, and it applies equally to cloud, virtual and physical data centers and storage environments, from the largest to the smallest to the consumer. The remedy is simple too: common sense and best practices, making copies of all data and keeping extra copies around somewhere, with more frequent or recent data having copies readily available.

Some trends I'm seeing include, among others:

  • Low cost craze leveraging free or near free services and products
  • Cloud hype and cloud bashing, and the need to discuss the wide area in between those extremes
  • Renewed need for basic data protection including BC/DR, HA, backup and security
  • Opportunity to re-architect data protection in conjunction with other initiatives
  • Lack of adequate funding for continued and proactive data protection

Just to be safe, let's revisit some common data protection best practices:

  • Learn from mistakes, preferably during testing, with the aim of not repeating them
  • Most disasters in IT and elsewhere are the result of a chain of events not being contained
  • RAID is not a replacement for backup, it simply provides availability or accessibility
  • Likewise, mirroring or replication by themselves are not a replacement for backup
  • Use point in time RPO based data protection such as snapshots or backup with replication
  • Maintain a master backup or gold copy that can be used to restore to a given point of time
  • Keep backup on another medium, also protect backup catalog or other configuration data
  • If using deduplication, make sure that indexes/dictionary or metadata is also protected
  • Moving your data into the cloud is not a replacement for a data protection strategy
  • Test restoration of backed up data both locally, as well as from cloud services
  • Employ data protection management (DPM) tools for event correlation and analysis
  • Data stored in clouds needs to be part of a BC/DR and overall data protection strategy
  • Have extra copy of data placed in clouds kept in alternate location as part of BC/DR
  • Ask yourself, what will you do when your cloud data goes away (note it's not if, it's when)
  • Combine multiple layers or rings of defense and assume what can break will break
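As a small illustration of the "test restoration" and "keep backup on another medium" items above, here is a hedged Python sketch that restores a backup copy to a scratch location and compares checksums against the primary. The function names (sha256_of, verify_restore, demo) and the directory names are assumptions for illustration only, and shutil.copyfile merely stands in for whatever backup and restore tools you actually use.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_restore(original: Path, restored: Path) -> bool:
    """A restore test passes only if the restored bytes match the original."""
    return sha256_of(original) == sha256_of(restored)


def demo() -> bool:
    """Back up, restore and verify one file inside a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "primary" / "data.bin"
        backup = root / "other_medium" / "data.bin"    # hypothetical second medium
        restored = root / "restore_test" / "data.bin"  # hypothetical scratch area
        for p in (source, backup, restored):
            p.parent.mkdir(parents=True)
        source.write_bytes(b"important data" * 1000)
        shutil.copyfile(source, backup)    # stand-in for the backup job
        shutil.copyfile(backup, restored)  # stand-in for the test restore
        return verify_restore(source, restored)
```

The point of the sketch is the habit, not the tooling: a backup that has never been restored and compared is only a hope, whether it lives on disk, tape or in a cloud.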

Clouds should not be scary, yet clouds do not magically solve all IT or consumer issues. However, they can be an effective tool when of high caliber and used as part of a total data protection strategy.

Perhaps this will be a wake up call, a reminder that it is time to think beyond cost savings and shift back to basic data protection best practices. What good is the best or most advanced technology if you have less than adequate practices or policies? Bottom line, time for Commonsense Data Protection (CDP).

Ok, nuff said for now, I need to go and make sure I have a good removable backup in case my other local copies fail or I'm not able to get to my cloud copies!

Ok, nuff said.

Cheers gs

Back to School and Dedupe School

Summer is over here in the northern hemisphere and it's back to school time.

This coming week I will be the substitute teacher filling in for my friend Mr. Backup in Minneapolis and Toronto for TechTarget's Dedupe School. If you are in either city and have not yet signed up, check out the link here to learn more.

Hope to see you this week, or next week at Infrastructure Optimization in Chicago or Storage Decisions in NYC, where I will also be presenting (or teaching, if you prefer) as well as listening and learning from the attendees what's on their minds.

Stay current on other upcoming activities on our events page, as well as see what's new or in the news here.

Ok, nuff said.

Cheers gs

Worried about IT M&A, here come the new startups!

Late last year, I did a post (see here) countering the notion that there is a lack of innovation in IT and specifically around data storage. Recently I did a post about a Funeral for Friend, not to mention yesterday's post about Summer marriages.

For those who are concerned about lack of innovation, or, that consolidation will result in just a few big vendors, here’s some food for thought. Those big vendors in addition to growing via internal organic growth, also grow by buying or merging with other vendors. Those other vendors emerge as startups, some grow, blossom and are bought, some make a decent business on their own, some are looking to be bought, some need to be bought, some will see fire sales, liquidation or simply closing their doors and perhaps re-launching as a new company.

With all the M&A activity that has taken place recently, and I’m sure (speculation only ;) ) that there will be plenty more, here’s a short and far from comprehensive list of some startups or companies you may not have heard of yet. There are additional ones that are still in deep stealth; some on the list are still in stealth yet talking and letting information trickle out, thus only non-NDA information is shown here. In other words, you can find out about these via publicly available information and sources.

Something that I have noticed and talked with others in the industry about is that this generation of startups, at least for now, is taking a far more low-key approach to launches than in the past. Gone, at least for now, are the dot-com era over the top announcements, in some cases made before there was even a product, let alone one shipping for actual customer production deployment. This crop or corps of startups is taking its time, leveraging the current economic situation to further incubate technologies and go to market strategies, not to mention minimizing the amount of over the top VC funding we have seen in the past. Some of these may not appear to be storage related, and that would be correct. The common theme is that this list includes those associated with data infrastructure technologies, from servers to storage to networking, hardware, software and services among others.

Disclosure Notice: None of the companies mentioned are, nor have ever been, clients of StorageIO. Why do I mention this? Why not!

Balesio – File compression solutions
Box.net – Internet/web/cloud storage service with high availability and backup
Cirrustore – Backup data protection tools
Dataslide – Hard rectangular disk (HRD)
Enclarity – Healthcare CRM and analysis tools
Enstratus – Amazon cloud computing management tools
Exludas – Multi-core optimization
Firescope – CMDB data solutions
Greenbytes – ZFS based storage management solutions
Likewise – Open backup software for Mac/Linux/Windows
Liquidcomputing – High density servers
Maxiscale – Web infrastructure (Stealth)
Metalogix – Archiving solutions
Neptuny – Capacity Planning
Netronome – Network and I/O optimization technology
Newboundary – IT policy management and IRM tools
Nexenta – ZFS based storage management solutions
Pergamumsystems – Archive solutions (Stealth)
Pranah – SMB Storage vendor formerly known as Marner
Procedo – Archiving and migration solutions
Rebit – Backup and data protection solutions
Rightscale – Amazon cloud computing management tools
Rmsource – Cloud backup solutions
RNAnetworks – Virtual memory management solutions
Scale Computing – Clustered storage management software
ScaleMP – Multi-core virtualization for scale out
SiberSystems – Goodsync data protection solutions
Sparebackup – Backup data protection solutions
StorageFusion – Storage resource analysis
Storspeed – NAS/NFS optimization solutions (Stealth)
Sugarsync – Backup and data protection solutions
Surgient – Cloud computing solutions
Synology – SMB storage solutions
TwinStrata – BC/DR analysis and assessment tools
Vadium – Security and encryption tools
Vembu – Backup data protection tools
Versant – Object database management solutions
Vipre – Security, data loss, data leak prevention
VirtenSys – Virtual I/O and I/O virtualization (IOV)
Vizrt – Video management software tools
WhipTail – Flash SSD solutions
Xenos – Archive and data footprint reduction solutions

Links to the above along with many other companies including manufacturers and VARs can be found on the Interesting Links page at StorageIO.

Food for thought for your summer technology picnic fun.

Nuf said for now.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

Did someone forget to tell Dell that Tape is dead?

Did someone forget to send a memo to Dell that magnetic tape is dead, or was Dell perhaps pre-occupied with other activities? Maybe nobody at Dell read the “virtual” or “fictional” memo that tape is dead?

Ok, enough with the cynicism and joking around, tape is not dead (See recent Computerworld and Dell story) and Dell is one of several vendors including IBM who still find time to talk about tape as part of a solution to different customer and environment needs.

Sure, tape might be in or heading into its golden years, or what can also be called the plateau of productivity (for customers) or profitability (for some vendors). Tape does not get the marketing dollars and media coverage because it has been around as a technology for a long time and there are cooler and niftier (techno term) things to discuss, including disk based backup and data protection, CDP, VTLs, de-dupe debates, clusters, grids and clouds, FCoE vs. iSCSI, NAS, SAS, virtualization, OSD and pretty much anything except tape.

However, the reality is that many organizations, particularly larger ones, still use and rely on tape based data protection for backup/BC/DR as well as archive for compliance and non-compliance data retention or data preservation activities, in some cases complementing and co-existing with disk based solutions.

Disk to disk (D2D) based backups and data protection certainly continue to gain adoption and deployments in both large and small environments. However, shifting to disk based data protection, or clinging to tape with a death grip, does not have to be, nor should it be, an all or nothing value proposition. Disk and tape can and do co-exist for different uses and purposes, leveraging the various economics and benefits of each technology to address various tasks and requirements.

New and emerging technologies certainly need to be discussed, dissected, developed and deployed, as they are the future for maintaining and sustaining business growth via IT service delivery in an economical and reliable fashion. That is, apply whichever technologies make economic and business sense at a given point in time to minimize risk while maximizing useful benefits to your business.

Ok, nuff said.

Cheers gs

Data Protection for Virtual Environments at VMware VMworld

Data protection for virtual environments, including protecting virtual servers and virtual storage as well as using virtualization techniques to protect applications and data on non-consolidated servers, is gaining plenty of attention, building on past, recent and this week's announcements as well as others forthcoming during VMworld 2008, taking place now in Las Vegas. The last month or so has been busy with the usual analyst pre-briefing sessions for some of the items now announced as well as others that are still in the wings.

Here are a few links, one to a recent webcast (Industry Trends and Perspectives: Data Protection for Virtual Server Environments) along with another to an industry trends and perspective white paper titled “Data Protection Options for Virtual Servers”.

Now it's time to get ready to travel off to New Orleans, where I will be speaking about data protection and other related topics for virtual server and storage environments tonight at an event, and then later this week in Chicago.

Ok, nuff said.

Cheers gs

Links to Upcoming and Recent Webcasts and Videocasts

Here are links to several recent and upcoming webcasts and videocasts covering a wide range of topics. Some of these free webcasts and videocasts may require registration.

Industry Trends & Perspectives – Data Protection for Virtual Server Environments

Next Generation Data Centers Today: What’s New with Storage and Networking

Hot Storage Trends for 2008

Expanding your Channel Business with Performance and Capacity Planning

Top Ten I/O Strategies for the Green and Virtual Data Center

Cheers
Greg Schulz – StorageIO

SMB capacity planning; Focusing on energy conservation

Here’s a link to a new tip I wrote that is posted over at SearchSMBStorage on Capacity Planning and energy conservation.

Here are some added links to other recent tips I wrote and posted at SearchSMBStorage:

Improve your storage energy efficiency

Data protection for virtual server environments

Data footprint reduction for SMBs

Is clustered NAS for SMBs?

Ok, nuff said.

Cheers gs

Missing Dedupe Debate Detail!

The de-dupe vendors like to debate details of their solutions, ranging from compression or de-dupe ratios, to hashing and caching algorithms, to processor vs. disk vs. memory, to in-band vs. out-of-band, pre or post processing among other items. At times the dedupe debates can get more lively than a political debate or even the legendary storage virtualization debates of yesteryear.

However, one item that an IT professional recently mentioned is not being addressed or talked about during the de-dupe debates: how IT customers will get around vendor lock-in. Never mind the usual lock-in debates of whose back-end storage or disk drives are used, whose server the de-dupe appliance software runs on, and so forth.

The real concern is how data in the future will be recoverable from a de-dupe solution, similar to how data can be recovered from tape today. Granted, this is an apples to oranges comparison at best. The only real similarity is that a backup or archive solution sends a data stream in a tar-ball, a backup or archive save set, or perhaps a file format to the tape or de-dupe appliance. Then the VTL or de-dupe appliance software puts the data into yet another format.

Granted, not all tape media can be interchanged between different tape drives given formats and generations, and of course the proper backup or archive application is needed to un-pack the data for use. Probably a more applicable apples to oranges comparison would be how IT personnel will get data back from a VTL (non de-duping) disk based storage system compared to getting data back from a VTL or de-dupe appliance.

Today and for the foreseeable future the answer is simple, if your pain point is severe and you need the benefits of de-dupe, then the de-dupe software and appliance is your point of vendor lock-in. If vendor lock-in is a main concern, take your time, do your homework and due diligence for solutions that reduce lock-in or at least give a reasonable strategy for data access in the future.

Welcome to the world of virtualized data and virtualized data protection. Here's the golden rule for de-dupe: as with virtualization, whoever controls the software and management metadata controls the vendor lock-in. Good, bad or indifferent, that's the harsh reality.
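To see why the metadata matters so much, consider a toy Python sketch of a de-dupe store. This is a simplified illustration under stated assumptions, not how any particular vendor's appliance works: the class name ToyDedupeStore, the fixed-size chunking and the use of SHA-256 as the chunk key are all choices made up for this example.

```python
import hashlib


class ToyDedupeStore:
    """Toy fixed-size-chunk de-dupe store (illustrative only)."""

    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}       # unique chunk data, keyed by hash
        self.recipes: dict[str, list[str]] = {}  # metadata: name -> ordered chunk hashes

    def put(self, name: str, data: bytes) -> None:
        """Split data into chunks, store each unique chunk once, record the recipe."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            key = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(key, chunk)  # duplicate chunks are not stored again
            recipe.append(key)
        self.recipes[name] = recipe

    def get(self, name: str) -> bytes:
        """Rebuild the original data; impossible without the recipe metadata."""
        return b"".join(self.chunks[key] for key in self.recipes[name])
```

Storing b"ABCDABCDABCD" with a chunk size of 4 keeps a single unique chunk plus a three-entry recipe. Lose or lock away the recipes and the chunks are still sitting on disk, yet get() can no longer reassemble anything, which is the lock-in (and the metadata protection) point in miniature.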

For the record, I like de-dupe technology in general as part of an overall data footprint reduction strategy combined with archiving and real-time compression for on-line and off-line data. I see a very bright future for it moving forward. I also see many of the heavy thinking and heavy lifting issues to support large-scale deployments and processing getting addressed over time allowing de-dupe to move from mid markets to large-scale mainstream adoption.

Now, back to your regularly scheduled de-dupe debate drama!

Cheers
gs