Missing Dedupe Debate Detail!

Storage I/O trends

The de-dupe vendors like to debate the details of their solutions, ranging from compression or de-dupe ratios, to hashing and caching algorithms, to processor vs. disk vs. memory, to in-band vs. out-of-band and pre- or post-processing, among other items. At times the de-dupe debates can get more lively than a political debate, or even the legendary storage virtualization debates of yesteryear.

However, one item an IT professional recently mentioned that is not being addressed or talked about during the de-dupe debates is how IT customers will get around vendor lock-in. Never mind the usual lock-in debates over whose back-end storage or disk drives are used, whose server the de-dupe appliance software runs on, and so forth.

The real concern is how data will be recoverable from a de-dupe solution in the future, similar to how data can be recovered from tape today. Granted, this is an apples-to-oranges comparison at best. The only real similarity is that a backup or archive solution sends a data stream in a tar-ball, a backup or archive save set, or perhaps a file format to the tape or de-dupe appliance. Then, the VTL or de-dupe appliance software puts the data into yet another format.
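To make the "yet another format" point concrete, the sketch below shows, in grossly simplified form, what a de-dupe appliance does internally: it carves the incoming backup stream into chunks, stores each unique chunk once, and keeps a "recipe" of hashes needed to reassemble the stream. The fixed chunk size, SHA-256 hashing and dictionary layout here are illustrative assumptions; real products use proprietary (and typically content-defined) chunking and on-disk formats, which is exactly why the vendor's software sits between you and your data.

```python
import hashlib

class ChunkStore:
    """Toy content-addressed store: fixed-size chunks keyed by SHA-256.
    Illustrative only -- real appliances use variable (content-defined)
    chunking and proprietary on-disk layouts."""
    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}   # digest -> chunk bytes, each stored only once

    def ingest(self, stream: bytes):
        """Split a backup stream into chunks; return the recipe (digest list)."""
        recipe = []
        for i in range(0, len(stream), self.chunk_size):
            chunk = stream[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)   # de-dupe: store once
            recipe.append(digest)
        return recipe

    def restore(self, recipe):
        """Without the recipe (the metadata), the chunks are unrecoverable."""
        return b"".join(self.chunks[d] for d in recipe)

store = ChunkStore()
data = b"ABCD" * 4096            # a highly redundant backup stream
recipe = store.ingest(data)
assert store.restore(recipe) == data
print(len(recipe), "chunk references, but only", len(store.chunks), "unique chunk stored")
```

Note that the restore path depends entirely on the recipe metadata: lose the metadata (or the software that understands it) and the chunk pool is just an opaque blob, which is the lock-in in miniature.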

Granted, not all tape media can be interchanged between different tape drives given format and generation differences, and of course the proper backup or archive application is needed to unpack the data for use. Probably a more applicable apples-to-apples comparison would be how IT personnel will get data back from a (non de-duping) VTL disk-based storage system compared to getting data back from a VTL or de-dupe appliance.

Today, and for the foreseeable future, the answer is simple: if your pain point is severe and you need the benefits of de-dupe, then the de-dupe software and appliance is your point of vendor lock-in. If vendor lock-in is a main concern, take your time, do your homework and due diligence, and look for solutions that reduce lock-in or at least offer a reasonable strategy for data access in the future.

Welcome to the world of virtualized data and virtualized data protection. Here's the golden rule for de-dupe: like virtualization, whoever controls the software and management metadata controls the vendor lock-in. Good, bad or indifferent, that's the harsh reality.

For the record, I like de-dupe technology in general as part of an overall data footprint reduction strategy, combined with archiving and real-time compression for on-line and off-line data, and I see a very bright future for it moving forward. I also see many of the heavy-thinking and heavy-lifting issues around supporting large-scale deployments and processing getting addressed over time, allowing de-dupe to move from mid markets to large-scale mainstream adoption.

Now, back to your regularly scheduled de-dupe debate drama!

Cheers
gs

Do Disk-Based VTLs Draw Less Power Than Tape?

The tape-is-dead debates rage on as they have for decades, making for good press and discussion or debate during slow times, similar to coverage of what Britney Spears or Paris Hilton are or are not wearing.

In the on-going debates and greenwashing over which technology or vendor is greener to prevent global warming, some recent tape-is-dead flare-ups have occurred, including one hinting that tape libraries can draw more power than a disk-based VTL with de-dupe, discussed over on Tony Pearson's (of IBM fame) blog site as well as Beth Pariseau's TechTarget StorageSoup site.

I posted some comments on those sites along with a link to a StorageIO Industry Trends and Perspective report titled “Energy Savings without Performance Compromise” as an example (look for an updated version of the comparison charts in the report in the not so distant future). The report looks at how different storage tiers, including on-line disk, MAID, MAID 2.0 and tape libraries, vary in addressing different PCFE (power, cooling, floor-space, environment) issues while supporting various service levels including performance, availability, capacity and energy use.

Additional related material can be found at www.storageio.com and www.greendatastorage.com, including the Industry Trends and Perspective report “Business Benefits of Data Footprint Reduction”, covering archiving and compression (on-line and off-line) along with de-duplication.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

Wide World of Archiving – Life Beyond Compliance

Earlier this week I did a keynote talk at a TechTarget event in the New York City area titled “Wide World of Archiving – Life Beyond Compliance”, with the basic theme that archiving and data preservation for future or possible future use is not unique or exclusive to SARBOX, HIPAA, CFR, PCI, OHSAS, ISO or other members of the common alphabet soup of governmental or industry regulatory compliance needs.

The basic theme is that archiving can be used to address many IT and business pain points and issues, from preserving project-oriented or seasonal data, to off-loading un-used or seldom used data to free up resources to meet power, cooling, floor space and environmental (PCFE), aka "Green", issues, along with boosting performance for on-line access as well as backup, BC and DR.

The challenge, however, is that while archiving is a powerful technique, it is also complex in that it requires hardware and media to park your data on, software to find data and then execute policies defined by someone to move data to the archive medium and, if applicable, delete or clean up data that has been moved, all of which has cost and application-specific issues. Then there is the human side, which is more involved than simply throwing head count at the tasks and avoiding the mistakes of the Mythical Man Month.
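As a minimal sketch of the "find data and then execute policies" piece, the hypothetical function below scans a directory tree and moves files whose modification time exceeds an age threshold to an archive location. The function name, the single age-based rule and the dry-run default are all assumptions for illustration; a real archiving tool would add indexing, verification, transparent recall stubs and defensible deletion.

```python
import shutil
import time
from pathlib import Path

def archive_old_files(source: Path, archive: Path, max_age_days: int,
                      dry_run: bool = True):
    """Move files not modified within max_age_days from source to archive.
    Illustrative policy engine: one rule (age), one action (move)."""
    cutoff = time.time() - max_age_days * 86400
    moved = []
    for path in source.rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            target = archive / path.relative_to(source)
            if not dry_run:
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.move(str(path), str(target))   # the actual data mover
            moved.append(path.relative_to(source))
    return moved
```

Running with dry_run=True first produces a report of what would move, which is one reasonable way to validate a policy with the business owners before any data actually leaves primary storage.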

The human side of archiving is the glue that makes it work. Similar to cleaning out your garage, attic, basement or store-room, you can have someone come and do the real work, however do they have the insight to know what to keep and what to discard? Sure, that's an overly simple example, and there are plenty of search and discovery software management tool vendors who will be more than happy to show you a demo of their wares that will discover, classify, categorize and index what data you have, as well as interface with policy managers, data movers and archiving devices.

However, who is going to tell the management tools what policies are applicable and the different variances for your different business segments or activities? Consequently the key to making archiving work, particularly on a broader basis, is to get internal personnel familiar with your business, IT personnel, as well as external subject matter experts involved, all of which leads to a challenge and dilemma: is it cheaper to just buy more energy-efficient, space-saving storage than to pay the fees to find, manage, move and archive data? One of the attendees brought up a good point that this all makes sense, however there is a scaling challenge: when dealing with 100s of TBytes or PBytes, the complexity increases.

This is where the notion of scaling with stability comes into play. Many solutions exist to address different functionality, for example archiving, de-duping, compression, server or storage virtualization and thin-provisioning among many others, however how do they scale with stability? That is, how stable or reliable do the solutions remain when scaling from 10s to 100s to 10,000s or even 100,000s of users, email boxes, sessions or streams, or from 10s of TBytes to 10s of PBytes? How does the performance hold up, how does the availability hold up, and how does the management and on-going care and feeding change for the better or worse? Concerns around scaling are a common issue I hear from IT organizations pertaining to both hardware and software tools, in that what works great during a WebEx demo or PowerPoint or PDF slide show may differ from real-world performance, management, reliability and complexity. After all, have you ever seen a WebEx or live office or PowerPoint or PDF slide deck showing a hardware or software based solution that could not scale or provide transparent interoperability? That would be akin to finding a used car sales rep who gives you a tour of how a car was refurbished inside and out after it was declared totaled by the previous owner's insurance company after the last great flood or hurricane.

Getting back to archiving: rather than trying to conquer all of your data at one time, take a divide and conquer approach. Go for some low hanging fruit where your chances of success go up, so that you can build some momentum and perhaps a business case to do a larger project. Also, one solution, particularly one archiving software solution, may not be applicable to all of your needs, in that you may need a tool specialized for email, one for databases and another general purpose tool. Likewise you may need to engage different subject matter experts to help you with policy definition and establishing rules to meet different requirements, which is where business partners can come into play with either their in-house staff, partners, or associates that they work with for different issues and needs.

Look beyond the hardware and software to the people, or human and knowledge, side of archiving, and look beyond archiving for compliance, as there is a much bigger wide world of archiving and opportunity. If you remember the ABC sports TV show “Wide World of Sports” you may recall Jim McKay saying “Spanning the globe to bring you the constant variety of sports… the thrill of victory… and the agony of defeat… the human drama of athletic competition… This is ABC's Wide World of Sports!”.

From an archiving perspective, keep this in mind: there is a wide world of opportunities for archiving. The thrill of victory is the benefits; the agony of defeat is the missteps, lack of scaling, and out-of-control costs or complexity; the human drama is what makes or breaks a solution. This is the “Wide World of Archiving”…

Rest assured, some form of archiving of structured database, semi-structured email with attachments, and unstructured Word, PowerPoint, PDF, MP3 and other data is in your future; it's a matter of when. Archiving is just one of many tools available for effectively managing your data and addressing data footprint sprawl, particularly for data that you cannot simply delete and ignore: if you need it to go forward, you need to keep it. Or, as a friend of mine says, "You can't go forward unless you can go back." Likewise, you can't manage what you don't know about; you can't move and delete what you can't manage.

Look for solution providers who are not simply looking to get you to buy the latest and greatest archiving storage device, or the slickest archiving management tool with a GUI that rivals those on a Wii or Xbox, or looking to simply run up billable hours. That's a balancing act that requires investing time with different business solution providers to see where their core business is, how they can scale, and where and how they make their money, to help you decide where and how they fit, as opposed to simply adding complexity to your environment and existing issues.

Ok, nuff said.

Cheers gs


Airport Parking, Tiered Storage and Latency


Ok, so what do airport parking, tiered storage and latency have in common? Based on some recent travel experience I will assert that there is a bit in common, or at least an analogy. What got me thinking about this was that recently I could not get a parking spot at the airport's primary parking ramp next to the terminal (either a reasonable walk or a short tram ride away), which offers quick access to the departure gate.

Granted, there is a premium for this ability to park or "store" my vehicle for a few days near the airport terminal, however that premium is off-set by the time savings and fewer disruptions, enabling me a few extra minutes to get other things done while traveling.

Let me call the normal primary airport parking tier-1 (regardless of what level of the ramp you park on), with tier-0 being valet parking, where you pay a fee that might rival the cost of your airline ticket, yet your car stays in a climate controlled area, gets washed and cleaned, maybe gets an oil change, and hopefully sits in a more secure environment with even faster access to your departure gate; something for the rich and famous.

Now the primary airport parking has been full lately, not surprising given the cold weather and everyone looking to use up their carbon off-set credits to fly somewhere warm, attend business meetings or whatever it is that they are doing.

Budgeting some extra time, a couple of weeks ago I tried one of those off-site airport parking facilities where the bus picks you up in the parking lot and then whisks you off to the airport. On return, you wait for the bus to pick you up at the airport, ride to the lot, tour the lot looking at everyone's car as they get dropped off, and 30-40 minutes later you are finally at your vehicle, faced with the challenge of how to get out of the parking lot late at night. As it is such a budget operation, they have gone to lights-out and automated check-out: put your credit card in the machine and the gate opens. That is, if the credit card reader is not frozen, because it is about "zero" outside and the machine won't read your card, using up more time. However heck, I saved a few dollars a day.

On another recent trip, again the main parking ramp was full; at least the airport has a parking or storage resource monitoring (aka Airport SRM) tool that you can check ahead of time to see if the ramps are full or not. This time I went to another terminal, parked in the ramp there, walked a mile (it would have been a nice walk had it not been 1 above zero (F) with a 20 mile per hour wind) to the light rail train station, waited ten minutes for the 3 minute train ride to the main terminal, then walked to the tram for the 1-2 minute tram ride to the real terminal to go to my departure gate. On return, the process was reversed, adding what I estimate to be about an hour to the experience, which, if you have the time, is not a bad option and certainly good exercise, even if it was freezing cold.

During the planes, trains and automobiles expedition, it dawned on me that airport parking is a lot like tiered storage, in that you have different types of parking with different cost points, locality of reference (latency, or how much time it takes to get from your car to your plane), and levels of protection and security, among others.

I likened the off-airport parking experience to off-line tier-3 tape or MAID, or at best near-line tier-2 storage, in that I saved some money at the cost of lost time and productivity. The parking at the remote airport ramp involving a train ride and tram ride I likened to tier-2 or near-line storage over a very slow network or I/O path, in that the ramp itself was pretty efficient, however the transit delays or latency were ugly. I did save some money, a couple of bucks; not as much as the off-site lot, however a few dollars less than the primary parking.

Hence I jump back to the primary ramp as the fastest, tier-1, unless you have someone footing your parking bills and can afford tier-0. It also dawned on me that, like primary or tier-1 storage, regardless of whether it is enterprise class like an EMC DMX, IBM DS8K, Fujitsu or HDS USP, mid-range like an EMC CLARiiON, HP EVA, IBM DS4K, HDS AMS, Dell, EqualLogic, 3PAR, Fujitsu or NetApp, or an entry-level product from many different vendors, people still pay for the premium storage, aka tier-1 storage, in a given price band even if there are cheaper alternatives. However, like the primary airport parking, there are limits on how much primary storage or parking can be supported due to floor space, power, cooling and budget constraints.

With tiered storage the notion is to align different types and classes of storage with various usage and application categories based on service (performance, availability, capacity, energy consumption) requirements balanced with cost or other concerns. For example, there is the high cost yet ultra-high performance, low energy consumption and relatively small capacity of tier-0 solid state devices (SSD), using either FLASH or dynamic random access memory (DRAM), deployed as part of a storage system, as a storage device or as a caching appliance to meet I/O or activity intensive scenarios. Tier-1 is high performance, however not as high performance as tier-0; although given a large enough budget, enough power and cooling ability and no constraints on floor space, you can make a collection of traditional disk drives out-perform even solid state, with a lot more capacity, at the tradeoff of power, cooling, floor space and of course cost.

For most environments tier-1 storage will be the fastest storage with a reasonable amount of capacity, as tier-1 provides a good balance of performance and capacity per amount of energy consumed for active storage and data. On the other hand, lower cost, higher capacity and slower tier-2 storage, also known as near-line or secondary storage, is used in some environments as primary storage where performance is not a concern, yet is typically reserved for non-performance intensive applications.

Again, given enough money, unlimited power, cooling and floor space, not to mention enclosures, controllers and management software, you can aggregate a large number of low-cost SATA drives, as an example, to produce a high level of performance. However, the cost to reach a high activity or performance level, either IOPS or bandwidth, particularly where the excess capacity is not needed, would make SSD technology look cheap on an overall cost basis.
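A quick back-of-the-envelope calculation illustrates the point; every figure below is an assumption for illustration, not a vendor specification:

```python
# Back-of-the-envelope comparison: aggregating SATA drives vs. one SSD
# for an IOPS-bound workload. All figures are illustrative assumptions.
sata_iops_per_drive = 80       # assumed: 7.2K RPM SATA, small random I/O
sata_watts_per_drive = 10      # assumed active power per drive
ssd_iops = 20_000              # assumed FLASH SSD random I/O rate
ssd_watts = 15                 # assumed SSD power draw

drives_needed = -(-ssd_iops // sata_iops_per_drive)   # ceiling division
print(f"SATA drives to match {ssd_iops} IOPS: {drives_needed}")
print(f"Aggregate SATA power: {drives_needed * sata_watts_per_drive} W vs SSD: {ssd_watts} W")
```

Even with generous assumptions for the disk drives, matching the SSD's IOPS by aggregating spindles burns orders of magnitude more power (and capacity you may not need), which is the point in numeric form.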

Likewise, replacing all of your disk with SSD, particularly for capacity-based environments, is not really practical outside of extreme corner case applications, unless you have the disposable income of a small country for your data storage and IT budget.

Another aspect of tiered storage is the common confusion between a class of storage and the class of a storage vendor, or where a product is positioned, for example by price band or target environment such as enterprise, small medium environment, small medium business (SMB), small office or home office (SOHO) or prosumer/consumer.

I often hear discussions that go along the lines of tier-1 storage being products for the enterprise, tier-2 being for workgroups and tier-3 being for SMB and SOHO. I also hear confusion around tier-1 being block based, tier-2 being NAS and tier-3 being tape. "What we have here is a failure to communicate" in that there is confusion around tiers, categories, classification, price bands and product positioning and perception. To add to the confusion, there are also different tiers of access, including Fibre Channel and FICON using 8GFC (coming soon to a device near you), 4GFC, 2GFC and even 1GFC, along with 1GbE and 10GbE for iSCSI and/or NAS (NFS and/or CIFS), as well as InfiniBand for block (iSCSI or SRP) and file (NAS) access, offering different costs, performance, latency and other differing attributes to align with various application service and cost requirements.
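To put those tiered-access options in perspective, the snippet below estimates how long moving 1 TByte takes over each interface. The throughput figures are nominal rule-of-thumb assumptions (Fibre Channel's 8b/10b encoding makes 1GFC roughly 100 MBytes/sec of payload; the Ethernet figures are raw line rate, and real iSCSI or NAS throughput will be lower):

```python
# Nominal usable throughput per interface in MBytes/sec
# (rule-of-thumb assumptions, not measured values).
links_mb_s = {"1GFC": 100, "2GFC": 200, "4GFC": 400, "8GFC": 800,
              "1GbE": 125, "10GbE": 1250}

data_mb = 1_000_000   # 1 TByte (decimal) to move between tiers

for name, mb_s in sorted(links_mb_s.items(), key=lambda kv: kv[1]):
    hours = data_mb / mb_s / 3600
    print(f"{name:>6}: ~{hours:.1f} hours to move 1 TByte")
```

The spread, roughly a factor of twelve between the slowest and fastest links here, is why the access tier matters as much as the media tier when aligning service levels.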

What this all means is that there is more to tiered storage: there is tiered access, tiered protection, tiered media, and different price bands and categories of vendors and solutions to be aligned with applicable usage and service requirements. On the other hand, similar to airport parking, I can choose to skip the airport parking and take a cab to the airport, which would be analogous to shifting your storage needs to a managed service provider. However, ultimately it will come down to balancing performance, availability, capacity and energy (PACE) efficiency with the level of service and specific environment or application needs.
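As a closing sketch of that balancing act, the toy function below picks the lowest-cost tier that satisfies a workload's minimum PACE-style requirements. The tier names echo the discussion above, but every number in the table is a made-up illustration, not a benchmark or price quote:

```python
# Illustrative only: match a workload's PACE-style requirements to the
# cheapest tier that satisfies them. All attribute values are assumptions.
tiers = [  # (name, iops, availability, tb_capacity, watts_per_tb, usd_per_tb)
    ("tier-0 SSD",       50000, 0.9999,    5,  40, 30000),
    ("tier-1 FC disk",    5000, 0.9999,   50, 120,  5000),
    ("tier-2 SATA",       1000, 0.999,   200,  60,  1500),
    ("tier-3 tape/MAID",    10, 0.999,  1000,   5,   300),
]

def pick_tier(min_iops, min_avail, min_tb):
    """Return the cheapest tier meeting all minimums, or None."""
    candidates = [t for t in tiers
                  if t[1] >= min_iops and t[2] >= min_avail and t[3] >= min_tb]
    return min(candidates, key=lambda t: t[5])[0] if candidates else None

print(pick_tier(min_iops=20000, min_avail=0.9999, min_tb=1))   # I/O-bound workload
print(pick_tier(min_iops=1,     min_avail=0.999,  min_tb=500)) # capacity-bound archive
```

The design point is simply that tier selection is a constraint-satisfaction problem, cheapest resource that still meets the service level, rather than a one-size-fits-all ranking of products.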

Greg Schulz www.storageio.com and www.greendatastorage.com

StorageIO Outlines Intelligent Power Management and MAID 2.0 Storage Techniques, Advocates New Technologies to Address Modern Data Center Energy Concerns


Marketwire – January 23, 2008 – StorageIO Outlines Intelligent Power Management and MAID 2.0 Storage Techniques, Advocates New Technologies to Address Modern Data Center Energy Concerns. Intelligent Power Management and MAID 2.0 Equal Energy Efficiency Without Compromising Performance.

The StorageIO Group explores these issues in detail in two new Industry Trends and Perspectives white papers entitled “MAID 2.0: Energy Savings without Performance Compromises” and “The Many Faces of MAID Storage Technology.” These and other Industry Trends and Perspectives white papers addressing power, cooling, floor space and green storage related topics, including “Business Benefits of Data Footprint Reduction” and “Achieving Energy Efficiency using FLASH SSD”, are available for download at www.storageio.com and www.greendatastorage.com.