Another StorageIO Hybrid Momentus Moment

Its been a few months since my last post (read it here) about Hybrid Hard Disk Drive (HHDD) such as the Seagate Momentus XT that I have been using.

The Momentus XT HHDD I have been using is a 500GB 7,200RPM 2.5 inch SATA Hard Disk Drive (HDD) with 4GB of embedded FLASH (aka SSD) and 32MB of DRAM memory for buffering hence the hybrid name.

I have been using the XT HHDD mainly for transferring large multi GByte size files between computers and for doing some disk to disk (D2D) backups while becoming more comfortable with it. While not as fast as my 64GB all flash SSD, the XT HHDD is as fast as my 7,200RPM 160GB Momentus HDD and in some cases faster on burst reads or writes. The notion of having a 500GB HDD that was affordable to support D2D was attractive however the ability to get some performance boost now and then via the embedded 4GB FLASH opens many different possibilities particularly when combined with compression.

Recently I switched the role of the Momentus XT HHDD from that of being a utility drive to becoming the main disk in one of my laptops. Despite many forums or bulletin boards touting issues or problems with the Seagate Momentus XT causing system hangs or Windows Blue Screen of Death (BSoD), I continued on with the next phase of testing.

Making the switch to XT HHDD as a primary disk

I took a few precaution including eating some of my own dog food that I routinely talk about. For example, I made sure that the Lenovo T61 where the Momentus XT was going to be installed was backed up. In addition, I synced my traveling laptop so that it was the primary so that I could continue working during the conversion not to mention having an extra copy in addition to normal on and offsite backups.

Ok, lets get back to the conversion or migration from a regular HDD to the HHDD.

Once I knew I had a good backup, I used the Seagate Discwizard (e.g. Acronis based) tool for imaging the existing T61 HDD to the Momentus XT HHDD. Using Discwizard (you could use other tools as well) I configured it to initialize the HHDD which was attached via a Seagate Goflex USB to SATA cable kit as well as image or copy the contents of the T61 HDD partitions to the Momentus XT. During the several hours it took to copy and create a new bootable disk image on the HHDD I continued working on my travel or standby laptop.

After the image copy was completed and verified, it was time to reboot and see how Windows (XP SP3) liked the HHDD which all seemed to be normal. There were some parts of the boot that seemed a bit faster, however not 100 percent conclusive. The next step was to shutdown the laptop and physically swap the old internal HDD with the HHDD and reboot. The subsequent boot did seem faster and programs accessing large files also seemed to run a bit faster.

Keep in mind that the HHDD is still a spinning 7,200RPM disk drive so comparisons to a full time SSD would be apples to oranges as would the cost capacity difference between those devices. However, for what I wanted to see and use, the limited 4GB of flash does seem to provide a performance boost and if I needed full time super fast performance, I could buy a larger capacity SSD and install it. Im going to hold off on buying any more larger capacity flash SSD for the time being however.

Do I see HHDD appearing in SMB, SME or enterprise storage systems anytime soon? Probably not, at least not in primary storage systems. However perhaps in some D2D backup, archive or dedupe and VTL devices or other appliances.

Momentus XT Speed Bumps

Now, to be fair, there have been some bumps in the road!

The first couple of days were smooth sailing other than hearing the mystery chirp the HHDD makes a couple of times a day. Low and behold after a couple of days, just as many forums had indicated, a mystery system hang occurred (and no, not like Windows might normally do so for those Microsoft cynics). Other than the inconvenience of a reboot, no data was lost as files being updated were saved or had been backed up not to mention after the reboot, everything was intact anyway. So far just an inconvenience or so I thought.

Almost 24 hours later, same thing except this time I got to see the BSoD which candidly, I very rarely see despite hearing stories from others. Ok, this was annoying, however as long as I did not lose any data, other than lost time from a reboot, lets chalk this up to a learning experience and see where it goes. Now guess what, about 12 hours later, once again, the system froze up and this time I was in the middle of a document edit. This time I did lose about 8 minutes of typing data that had not been auto saved (I have since changed my auto save from 10 minutes to 5 minutes).

With this BSoD incident, I took some notes and using the X61s, started checking some web sites and verified the BIOS firmware on the T61 which was up to date. However I noticed that the Seagate Momentus XT HHDD was at firmware 22 while there was a 23 version available. Reading through some web sites and forums, I was on the fence on trying firmware 23 given that it appears a newer firmware version for the HHDD is in the works. Deciding to forge forward with the experiment, after all, no real data loss had occurred, and I still had the X61s not to mention the original T61 HDD to fall back to worse case.

Going to the Seagate web site, I downloaded the firmware 23 install kit and ran it to their instructions which was a breeze and then did the reboot.

It has not been quite a week yet, however knocking on wood, while I keep expecting to see one, no BSoD or system freezes have occurred. However having said that and knocking on wood, Im also making sure things are backed up protected and ready if needed. Likewise, if I start to see a rash of BSoD, my plan is to fall back to the original T61 HDD, bring it up to date and use it until a newer HHDD firmware version is available to resume testing.

What is next for my Seagate Momentus XT HHDD?

Im going to wait to see if the BSoD and mystery system hangs disappear as well as for the arrival of the new firmware followed by some more testing. However, when Im confident with it, the next step is to put the XT HHDD into the X61s which is used primarily for travel purpose.

Why wait? Simple, while I can tolerate a reboot or crash or data loss or disruption while in the office given access to copies as well as standby or backup systems to work from, when traveling options are more limited. Sure if there is data loss, I can go to my cloud provider and rapidly recall a file or multiple ones as needed or for critical data, recover from a portable encrypted USB device. Consequently I want more confidence in the XT HHDD before deploying it for travel mode which it is probably safe to do as of now, however I want to see how stable it is in the office before taking it on the road.

What does this all mean?

  • Simple, have a backup of your data and systems
  • Test and verify those backups or standby systems periodically
  • Have a fall back plan for when trying new things
  • Keep productivity in mind, at some point you may have to fall back
  • If something is important enough to protect, have multiple copies
  • Be ready to eat your own dog food or what you talk about
  • Do not be scared, however be prepared, look before you leap

How about you are you using a HHDD yet and if so, what are your experiences? I am curious to hear if anyone has tried using a HHDD in their VMware lab environments yet in place of a regular HDD or before spending a boat load of money for a similar sized SSD.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

IBMs Storwize or wise Storage, the V7000 and DFR

A few months ago IBM bought a Data Footprint Reduction (DFR) technology company called Storwize (read more about DFR and Storwize Real time Compression here, here, here, here and here).

A couple of weeks ago IBM renamed the Storwize real time compression technology to surprise surprise, IBM real time compression (wow, wonder how lively that market focus research group study discussion was).

Subsequently IBM recycled the Storwize name in time to be used for the V7000 launch.

Now to be clear right up front, currently the V7000 does not include real time compression capabilities, however I would look for that and other forms of DFR techniques to appear on an increasing basis in IBM products in the future.

IBM has a diverse storage portfolio with good products some with longer legs than others to compete in the market. By long legs, that means both technology and marketability for enabling their direct as well as partners including distributors or vars to effectively compete with other vendors offerings.

The enablement capability of the V7000 will be to give IBM and their business partners a product that they will want go tell and sell to customers competing with Cisco, Dell, EMC, Fujitsu, HDS, HP, NEC, NetApp and Oracle among others.

What about XIV?

For those interested in XIV regardless of if you are a fan, nay sayer or simply an observer, here, here and here are some related posts to view if you like (as well as comment on).

Back to the V7000

A couple of common themes about the IBM V7000 are:

  • It appears to be a good product based on the SVC platform with many enhancements
  • Expanding the industry scope and focus awareness around Data Footprint Reduction (DFR)
  • Branding the storwize acquisition as real-time compression as part of their DFR portfolio
  • Confusion about using the Storwize name for a storage virtualization solution
  • Lack of Data Footprint Reduction (DFR) particularly real-time compression (aka Storwize)
  • Yet another IBM storage product adding to confusion around product positioning

Common questions that Im being asked about the IBM V7000 include among others:

  • Is the V7000 based on LSI, NetApp or other third party OEM technology?

    No, it is based on the IBM SVC code base along with an XIV like GUI and features from other IBM products.

  • Is the V7000 based on XIV?

    No, as mentioned above, the V7000 is based on the IBM SVC code base along with an XIV like GUI and features from other IBM products.

  • Does the V7000 have DFR such as dedupe or compression?

    No, not at this time other than what was previously available with the SVC.

  • Does this mean there will be a change or defocusing on or of other IBM storage products?

    IMHO I do not think so other than perhaps around XIV. If anything, I would expect IBM to start pushing the V7000 as well as the entire storage product portfolio more aggressively. Now there could be some defocusing on XIV or put a different way, putting all products on the same equal footing and let the customer determine what they want based on effective solution selling from IBM and their business partners.

  • What does this mean for XIV is that product no longer the featured or marquee product?

    IMHO XIV remains relevant for the time being. However, I also expect to be put on equal footprint with other IBM products or, if you prefer, other IBM products particularly the V7000 to be unleashed to compete with other external vendors solutions such as those from Cisco, Dell, EMC, Fujitsu, HDS, HP, NEC, NetApp and Oracle among others. Read more here, here and here about XIV remaining relevant.

  • Why would I not just buy an SVC and add storage to it?

    That is an option and strength of SVC to sit in front of different IBM storage products as well as those of third party competitors. However with the V7000 customers now have a turnkey storage solution to sell instead of a virtualization appliance.

  • Is this a reaction to EMC VPLEX, HDS VSP, HP SVSP or 3PAR, Oracle/Sun 7000?

    Perhaps it is, perhaps it is a reaction to XIV, and perhaps it is a realization that IBM has a lot of IP that could be combined into a solution to respond to a market need among many other scenarios. However, IBM has had a virtualization platform with a decent installed base in the form of SVC which happens to be at the heart of the V7000.

  • Does this mean IBM is jumping on the using off the shelf server instead of purpose built hardware for storage systems bandwagon like Oracle, HP and others are doing?

    If you are new to storage or IBM, it might appear that way, however, IBM has been shipping storage systems that are based on general purpose servers for a couple for a couple of decades now. Granted, some of those products are based on IBM Power PC (e.g. power platform) also used in their pSeries formerly known as the RS6000s. For example, the DS8000 series similar to its predecessors the ESS (aka Shark) and VSS before that have been based on the Power platform. Likewise, SVC has been based on general purpose processors since its inception.

    Likewise, while only generally deployed in two node pairs, the DS8000 is architected to scale into many more nodes that what has been shipped meaning that IBM has had clustered storage for some time, granted, some of their competitors will dispute that.

  • How does the V7000 stack up from a performance standpoint?

    Interestingly, IBM has traditionally been very good if not out front running public benchmarks and workload simulations ranging from SPC to TPC to SPEC to Microsoft ESRP among others for all of their storage systems except one (e.g. XIV). However true to traditional IBM systems and storage practices, just a couple of weeks after the V7000 launch, IBM has released the first wave of performance comparisons including SPC for the V7000 which can be seen here to compare with others.

  • What do I think of the V7000?

    Like other products both in the IBM storage portfolio or from other vendors, the V7000 has its place and in that place which needs to be further articulated by IBM, it has a bright future. I think that the V7000 for many environments particularly those that were looking at XIV will be a good IBM based solution as well as competitor to other solutions from Dell, EMC, HDS, HP, NetApp, Oracle as well as some smaller startups providers.

Comments, thoughts and perspectives:

IBM is part of a growing industry trend realizing that data footprint reduction (DFR) focus should expand the scope beyond backup and dedupe to span an entire organization using many different tools, techniques and best practices. These include archiving of databases, email, file systems for both compliance and non compliance purposes, backup/restore modernization or redesign, compression (real-time for online and post processing). In addition, DFR includes consolidation of storage capacity and performance (e.g. fast 15K SAS, caching or SSD), data management (including some data deletion where practical), data dedupe, space saving snapshots such as copy on write or redirect on write, thin provisioning as well as virtualization for both consolidation and enabling agility.

IBM has some great products, however too often with such a diverse product portfolio better navigation and messaging of what to use when, where and why is needed not to mention the confusion over the current product dejur.

As has been the case for the past couple of years, lets see how this all plays out in a year or so from now. Meanwhile cast your vote or see the results of others as to if XIV remains relevant. Likewise, join in on the new poll below as to if the V7000 is now relevant or not.

Note: As with the ongoing is XIV relevant polling (above), for the new is the V7000 relevant polling (below) you are free to vote early, vote often, vote for those who cannot or that care not to vote.

Here are some links to read more about this and related topics:

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

Re visiting if IBM XIV is still relevant with V7000

Over the past couple of years I routinely get asked what I think of XIV by fans as well as foes in addition to many curious or neutral onlookers including XIV competitors, other analysts, media, bloggers, consultants as well as IBM customers, prospects, vars and business partners. Consequently I have done some blog posts about my thoughts and perspectives.

Its time again for what has turned out to be the third annual perspective or thoughts around IBM XIV and if it is still relevant as a result of the recent IBM V7000 (excuse me, I meant to say IBM Storwize V7000) storage system launch.

For those wanting to take a step back in time, here is an initial thought perspective about IBM and XIV storage from 2008, as well as the 2009 revisiting of XIV relevance post and the latest V7000 companion post found here.

What is the IBM V7000?

Here is a link to a companion post pertaining to the IBM V7000 that you will want to have a look at.

In a nut shell, the V7000 is a new storage system with built in storage virtualization or virtual storage if you prefer that leverages IBM developed software from its San Volume Controller (SVC), DS8000 enterprise system and others.

Unlike the SVC which is a gateway or appliance head that virtualizes various IBM and third party storage systems providing data movement, migration, copy, replication, snapshot and other agility or abstraction capabilities, the V7000 is a turnkey integrated solution.

By being a turnkey solution, the V7000 combines the functionality of the SVC as a basis for adding other IBM technologies including a GUI management tool similar to that found on XIV along with dedicated attached storage (e.g. SAS disk drives including fast, high capacity as well as SSD).

In other words, for those customer or prospects who liked XIV because of its management GUI interface, you may like the V7000.

For those who liked the functionality capabilities of the SVC however needed it to be a turnkey solution, you might like the V7000.

For those of you who did not like or competed with the SVC in the past, well, you know what to do.

BTW, for those who knew of Storwize the Data Footprint Reduction (DFR) vendor with real time compression that IBM recently acquired and renamed IBM Real time Compression, the V7000 does not contain any real time compression (yet).

What are my thoughts and perspectives?

In addition to the comments in the companion post found here, right now Im of the mind set that XIV does not fade away quietly into the sunset or take a timeout at the IBM technology rest and recuperation resort located on the beautiful someday isle.

The reason I think XIV will remain somewhat relevant for some time, (time to be determined of course) is that IBM has expended over the past two and half years significant resources to promote it. Those resources have included marketing time, messaging space and in some instances perhaps inadvertinly at the expense of other IBM storage solutions. Simiarly, a lot of time, money and effort have gone into business partner outreach to establish and keep XIV relevant with those commuities who in turn have gone to their customers to tell and sell the XIV story to some customers who have bought it.

Consequently or as a result of all of that investment, I would be surprised if IBM were simply to walk away from XIV at least near term.

What I do see as happening including some early indicators is that the V7000 (along with other IBM products) now will be getting equal billing, resources and promotional support. Weather this means the XIV division finally being assimilated into the mainstream IBM fold and on equal footing with other IBM products, or, that other IBM products being brought up to an elevated position of XIV is subject to interpretation and your own perception.

I expect to continue to see IBM teams and subsequently their distributors, vars and other business partners get more excited talking about the V7000 along with other IBM solutions. For example, SONAS for bulk, clustered and scale out NAS, DS8000 for high end, GMAS and Information Archive platforms as well as N and DS3K/DS4K/DS5K not to mentiuon the TS/TL backup and archive target platforms along with associated Tivoli software. Also, lets not forget about SVC among other IBM solutions including of course, XIV.

I would also not be surprised if some of the diehard XIV loyalist (e.g. sales and marketing reps that were faithful members of Moshe Yani army who appears to be MIA at IBM) pack up their bags and leave the IBM storage SANdbox in virtual protest. That is, refusing to be assimilated into the general IBM storage pool and thus leaving for Greener IT pastures elsewhere. Some will stick around discovering the opportunities associated with selling a broader more diverse product portfolio into their target accounts where they have spent time and resources to establish relationships or getting thier proverbial foot in the door.

Consequently, I think XIV remains somewhat relevant for now given all of the resources that IBM poured into it and relationships that their partner ecosystem also spent on establishing with the installed customer base.

However, I do think that the V7000 despite some confusion (here and here) around its recycled Storwize name that is built around the field proven SVC and other IBM technology has some legs. Those legs of the V7000 are both from a technology standpoint as well as a means to get the entire IBM systems and storage group energized to go out and compete with their primary nemesis (e.g. Dell, EMC, HP, HDS, NetApp and Oracle among others).

As has been the case for the past couple of years, lets see how this all plays out in a year or so from now. Meanwhile cast your vote or see the results of others as to if XIV remains relevant. Likewise, join in on the new poll below as to if the V7000 is now relevant or not.

Note: As with the ongoing is XIV relevant polling (above), for the new is the V7000 relevant polling (below) you are free to vote early, vote often, vote for those who cannot or that care not to vote.

Here are some links to read more about this and related topics:

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

Have VTLs or VxLs become Zombies, Declared dead yet still alive?

Have you heard or read the reports and speculation that VTLs (Virtual Tape Libraries) are dead?

It seems that in IT the all to popular trend is to declare something dead so that your new product or technology can have a chance of making it in to the market or perhaps seen in a better light.

Sometimes this approach works to temporary freeze the market until common sense and clarity returns to the market or until something else fun to talk about comes along and in other cases, the messages can fall on deft ears.

The approach of declaring something dead tends to play well for those who like shiny new toys (SNT) or new shiny toys (NST) and being on the popular, cool trendy bandwagon.

Not surprisingly, while some actual IT customers can fall into the SNT or NST syndrome, its often the broader industry including media, bloggers, analysts, consultants and other self proclaimed or anointed pundits as well as vendors who latch on to the declare it dead movement. After all, who wants to talk about something that is old, boring and already being sold to paying customers who are using it. Now this is not a bad thing as we need a balance of up and coming challengers to keep the status quo challenged, likewise we need a balance of the new to avoid death grips on the old and what is working.

Likewise, many IT customers particularly larger ones tend to be very risk averse and conservative with their budgets protecting their investments thus they may only go leading bleeding edge if there is a dual redundant blood bank with a backup on hot standby (thats some HA humor BTW).

Another reason that declaring items dead in support of SNT and NST is that while many of the commonly declared dead items are on the proverbial plateau of productivity for IT customers, that also can mean that they are on the plateau of profitability for the vendors.

However, not all good things last and at sometime, there is the need to transition from the old to the new and this is where things like virtualization including virtual tape libraries or virtual disk libraries or virtual storage library or what ever you want to call a VxL (more on what a VxL is in a moment) can come into play.

I realize that for some, particularly those who like to grasp on to SNT, NST and ride the dead pool bandwagons this will probably appear as snarky or cynical which is fine, after all, for some, you should be laughing to the bank and if not, you may in fact be missing out on an opportunity for playing in the dead pool marketing game.

Now back to VxL.

In the case of VTLs, for some it is the T word that bothers them, you know T as in Tape which is not a SNT or NST in an age where SSD has supposedly killed the disk drive which allegedly terminated tape (yeah right). Sure tape is not being used as much for backup as it has in the past with its role shifting to that of longer term retention, something that it is well suited for.

For tape fans (or cynics) you can read more here, here and here. However there is still a large amount of backup/restore along with other data protection or preservation (e.g. archiving) processing (software tools, processes, procedures, skill sets, management tools) that still expects to see tape.

Hence this is where VTLs or VxLs come into play leveraging virtualization in an Life Beyond Consolidation (and here) scenario providing abstraction, transparency, agility and emulation and IMHO are still very much alive and evolving.

Ok, for those who do not like or believe in or of its continued existence and evolving role, substitute the T (tape) with X and you get a VxL. That is, plug in what ever X word that makes you happy or marketable or a Shiny New TLA. For example Virtual Disk Library, Virtual Storage Library, Virtual Backup Library, Virtual Compression Library, Virtual Dedupe Library, Virtual ILM Library, Virtual Archive Library, Virtual Cloud Library and so forth. Granted some VxLs only emulate tape and hence are VTLs while others support NAS and other protocols (or personalities) not to mention functionality ranging from replication, DFR as well as automated policy management.

However, keep in mind that if your preference is VTL, VxL or what ever other buzzword bingo name that you want to use or come up with, look at how virtualization in the form of abstraction, transparency and emulation can bridge the gap between the new (disk based data protection) combined with DFR (Data Footprint Reduction) and the old (existing backup/restore, archive or other management tools and processes.

Here are some additional links pertaining to VTLs (excuse me, VxLs):

  • Virtual tape libraries: Old backup technology holdover or gateway to the future?
  • Not to mention here, here, here, here or here.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

More Data Footprint Reduction (DFR) Material

This is part of an ongoing series of short industry trends and perspectives (ITP) blog posts briefs based on what I am seeing and hearing in my conversations with IT professionals on a global basis.

These short posts compliment other longer posts along with traditional industry trends and perspective white papers, research reports, videos, podcasts, webcasts as well as solution brief content found a www.storageioblog.com/reports and www.storageio.com/articles.

If you recall from previous posts including here, here or here among others, Data Footprint Reduction (DFR) is a collection of tools, technologies and best practices for addressing growing data storage management and cost impacts.

DFR encompasses many different tools, techniques and technologies across various applications ranging from active or primary storage to secondary and inactive along with backup and archive.

Some of the technologies techniques and technologies include archiving, backup modernization, compression, data management, dedupe, space saving snapshots and thin provisioning among others.

Following are some links to various articles and commentary pertaining to DFR:

  • Using DFR including dedupe and compression to defry storage and management costs
  • Deduplicate, compress and defray costs of data storage management
  • Virtual tape libraries: Old backup technology holdover or gateway to the future?
  • As well as here, here or here

In the spirit of DFR, that is doing more with less, nuff said (for now).

Of course let me know what your thoughts and perspectives are on this and other related topics.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

What do you do when your service provider drops the ball

Do you have a web, internet, backup or other IT cloud service provider of some type?

Do you pay for it, or is it a free service?

Do you take your service provider for granted?

Does your service provider take you or your data for granted?

Does your provider offer some form of service level objectives (SLO)?

For example, Recovery Time Objectives (RTO), Recovery Point Objectives (RPO), Quality of Service (QOS) or if a backup service alternate forms of recovery among others?

So what happens when there is a service disruption, do you threaten to leave the provider and if so, how much does that (or would it) cost you to move?

A couple of weeks ago I was using on a Delta airlines flight from LAX to MSP returning from a west coast speaking engagement event.

During the late evening three hour flight, I was using the gogo inflight wifi service to get caught up on some emails, blog items along with other work items in addition to doing a few twitter tweets while flying high over the real clouds from my virtual office.

During that time, I saw a twitter tweet from Devang Panchigar (@storageNerve) commenting that his hosting service provider Bluehost was down or offline. This caught my attention as Bluehost is also my service provider and a quick check verified that my sites and services were still working. I subsequently sent a tweet to Devang indicating that Bluehost or at least from looking at my sites and services were still functioning, or at least for the time being as I was about to find out. Long story short, about 20 to 25 minutes later, I noticed that I could not longer get to any of my sites, low and behold my Bluehost services were also now offline.

Bluehost

Overall, I have been pleased with Bluehost as a service provider including finding their call support staff very accommodating and easy to work with when I have questions or need something taken care of. Normally I would have simply called Bluehost to see what was going on, however being at about 38,000 feet above the clouds, a quick conversation was not going to be possible. Instead, I checked some forums that revealed Bluehost was experiencing some electrical power issues with their data center (I believe in Utah). Looking at some of the forums as well as various twitter comments, I also decided to check to see if Bluehost CEO Matt Heaton blog was functioning (it was).

It would have been too easy to do one of those irate customer type posts telling them how bad they were, how I was dropping them like a hot potato and then doing a blog post telling everyone to never use them again or along those lines that are far to common and often get deleted as spam.

Instead, I took a different approach (you could have read it here however I just checked and it has been deleted). My comment on Matts blog post took a week or so to be moderated (now since deleted). Essentially my post took the opposite approach of going off on the usual customer tirade instead commenting how ironic that a hosting service for my web site which contains content information about resilient data infrastructure themes was offline.

Now I realize that I am not paying for a high end no downtime always available hosting service, however I also realize that I am paying for a more premium package vs. a basic subscription or even a for free service. While I was not happy about the one hour of downtime around midnight, it was comforting to know that no data was lost and my sites were only offline for a short period of time.

What does all of this mean?

There have been some widely publicized and discussed internet and cloud service related disruptions.

I hope Bluehost continues to improve on their services to stay out of the news for a major disruption as well as minimize or eliminate downtime for their for fee based services.

I also hope that Bluehost CEO Matt Heaton continues to listen to what his customers have to say while improving his services to keep us as customers instead of taking us for granted as some providers or vendors do.

Thanks again to Devang for the tip that there was a service disruption, after all, sometimes we take services for granted and in other situations some service providers take their customers for granted.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

End to End (E2E) Systems Resource Analysis (SRA) for Cloud and Virtual Environments

A new StorageIO Industry Trends and Perspective (ITP) white paper titled “End to End (E2E) Systems Resource Analysis (SRA) for Cloud, Virtual and Abstracted Environments” is now available at www.storageioblog.com/reports compliments of SANpulse technologies.

End to End (E2E) Systems Resource Analysis (SRA) for Virtual, Cloud and abstracted environments: Importance of Situational Awareness for Virtual and Abstracted Environments

Abstract:
Many organizations are in the planning phase or already executing initiatives moving their IT applications and data to abstracted, cloud (public or private) virtualized or other forms of efficient, effective dynamic operating environments. Others are in the process of exploring where, when, why and how to use various forms of abstraction techniques and technologies to address various issues. Issues include opportunities to leverage virtualization and abstraction techniques that enable IT agility, flexibility, resiliency and salability in a cost effective yet productive manner.

An important need when moving to a cloud or virtualized dynamic environment is to have situational awareness of IT resources. This means having insight into how IT resources are being deployed to support business applications and to meet service objectives in a cost effective manner.

Awareness of IT resource usage provides insight necessary for both tactical and strategic planning as well as decision making. Effective management requires insight into not only what resources are at hand but also how they are being used to decide where different applications and data should be placed to effectively meet business requirements.

Learn more about the importance and opportunities associated with gaining situational awareness using E2E SRA for virtual, cloud and abstracted environments in this StorageIO Industry Trends and Perspective (ITP) white paper compliments of SANpulse technologies by clicking here.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

What is DFR or Data Footprint Reduction?

What is DFR or Data Footprint Reduction?

What is DFR or Data Footprint Reduction?

Updated 10/9/2018

What is DFR or Data Footprint Reduction?

Data Footprint Reduction (DFR) is a collection of techniques, technologies, tools and best practices that are used to address data growth management challenges. Dedupe is currently the industry darling for DFR particularly in the scope or context of backup or other repetitive data.

However DFR expands the scope of expanding data footprints and their impact to cover primary, secondary along with offline data that ranges from high performance to inactive high capacity.

Consequently the focus of DFR is not just on reduction ratios, its also about meeting time or performance rates and data protection windows.

This means DFR is about using the right tool for the task at hand to effectively meet business needs, and cost objectives while meeting service requirements across all applications.

Examples of DFR technologies include Archiving, Compression, Dedupe, Data Management and Thin Provisioning among others.

Read more about DFR in Part I and Part II of a two part series found here and here.

Where to learn more

Learn more about data footprint reducton (DFR), data footprint overhead and related topics via the following links:

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means

That is all for now, hope you find these ongoing series of current or emerging Industry Trends and Perspectives posts of interest.

Ok, nuff said, for now.

Cheers Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2018. Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

August 2010 StorageIO News Letter

StorageIO News Letter Image
August 2010 Newsletter

Welcome to the August Summer Wrap Up 2010 edition of the Server and StorageIO Group (StorageIO) newsletter. This follows the June 2010 edition building on the great feedback received from recipients.
Items that are new in this expanded edition include:

  • Out and About Update
  • Industry Trends and Perspectives (ITP)
  • Featured Article

You can access this news letter via various social media venues (some are shown below) in addition to StorageIO web sites and subscriptions. Click on the following links to view the August 2010 edition as an HTML or PDF or, to go to the newsletter page to view previous editions.

Follow via Goggle Feedburner here or via email subscription here.

You can also subscribe to the news letter by simply sending an email to newsletter@storageio.com

Enjoy this edition of the StorageIO newsletter, let me know your comments and feedback.

Cheers gs

Greg Schulz – Author The Green and Virtual Data Center (CRC) and Resilient Storage Networks (Elsevier)
twitter @storageio

Dell Will Buy Someone, However Not Brocade (At least for now)

Dell

Earlier this week Dell announced that they were buying 3APR for $1.15B USD

As a follow up to this, this and this recent posts, I keep getting asked in different forums, venues, via email, telephone calls and in person who will or should Dell buy next, and will Dell buy Brocade, who will buy Brocade or anyone else for that matter.

Ok, first let me say that everything in this post is just a perspective based on openly (e.g. publicly) available information along with some common sense. Thus there is no NDA or confidential insight or tips from some anonymous source named blue horseshoe (remember the movie wall street?).

However I did used to work for a SAN, MAN and WAN company called INRANGE that was a supplier to server and storage vendors as well as partnered with Emulex, Qlogic as well as Adva among others. INRANGE which became OUT of RANGE (that is some SAN humor btw) when it was sold to CNT was then bought by EMC spin off McData (I left before then) which in turn was bought by Brocade. Now does any of that make more qualified than any other arm chair quarterback pundit with a keyboard and pulse to jump into the whom Dell will buy next sweepstakes to I say no.

However, let me use some experience to analyze a few things, then connect some dots. From there, I will leave it up to you to agree, disagree, bet, guess, speculate or wish upon a falling star as to whom Dell might buy, or for that matter, what others may or may not do.

First, since Brocade keeps coming up in conversations, here is a previous post I did on the topic of them being for sale or who might buy them.

I still think that Brocade can survive on their own, granted they need to kick it into gear on the switch (Ethernet, Fibre Channel and FCoE), distance extension, HBA or CNA if you prefer as well as management tools front. Brocade built their business with OEM partnerships via Dell, EMC, HP, HDS, IBM, NetApp and Oracle/Sun among many others not to mention their channel distribution programs.

Thus Brocade needs to leverage those OEMs on a go forward basis. However, that model and channel partner model also gets in the way of Brocade being bought by one of their OEMs. Keep in mind that EMC once owned McData and made a nice profit on that spin off (or spin out) while IBM sold off their networking division to Cisco, now both do good business with their OEM suppliers. Likewise, both leverage multiple suppliers as that is what their partners and customers want (e.g. choice of suppliers).

Now, keep in mind that HP has had their procurve low end Ethernet switches for some time and historically flipped some business (excuse me, partnered) to Cisco for high end Ethernet LAN networking technology. Lets also not forget about HPs recent acquisition of 3COM (read about it here).

Now with Cisco tip toeing into the server market trying to flex its muscles in the small server pool (no offense Cisco or to your faithful followers) HP and other server vendors might be wanting to flip something else at Cisco besides business. Oh oh, I think I hear the Cisco UCS truth squads knocking at the door with large amounts of truth serum (Ok, Im just kidding folks).

Lets get back to HP and 3COM.

IMHO that was partly an opportunity to pick up some additional revenue, partly to grab a brand name that also has ties into the Chinese market. Keep Huawei (here and here) in mind, you know, that sometimes Cisco nemeses networking company who had 2009 revenues of RMB149.1B or $21.8B USD. Now back to H and 3COM, that was also IMHO play to gain access to additional SMB, SOHO, ROBO and consumer market channels for a bargain price. HP is not alone as others have done similar acquisitions in part or in whole to pick up a brand name that also hade partners, channels, products and revenues. For example among many others, EMC and Iomega, Seagate and Maxtor, Symantec and Norton, CA buying, well, I think or hope you get the picture.

Now back to Brocade and Dell.

Why would Dell need Brocade for which they would have to a pay a premium price of $6-7B USD (assume 3 to 3.5x multiplier on revenue) which would get them just under $900M in debt and a couple of billion in annual revenue. Keep in mind that Dell has somewhere in the neighborhood of $9-10B in cash although while Im not an accountant, the financial people tell me they need to maintain their strategic reserves of which such a deal would put a big dent into.

However, there is more to the story which is that revenue would be in jeopardy if the other server and or storage vendors (e.g. EMC, Fujitsu, HP, HDS, IBM, NEC, NetApp, and Oracle/Sun etc) did not like Dell owning one of their suppliers. In other words, unless Cisco really upsets the server vendors which they have been doing to a lesser degree already, why would Dell want to risk a Texas size pile of cash to get a revenue stream that could blow away in a Texas size hurricane or dust storm?

Granted if Dell could talk Michael Klayko (Brocades CEO) and board as well as other investors into a low ball offer the math might virtually work. However that is also doubtful knowing that Klayko also knows Joe Tucci of EMC who knows how to drive a deal or bargain. Thus, I do not see Brocade rolling over in desperation to sell them at a discount as much as some might want you to believe that they need to do.

Thus, while anything is possible, I do not see Dell buying Brocade except for one possible scenario which could result in a bidding war not to mention utter industry chaos.

That scenario is what I refer to as MAD which is a Mutual Assured Destruction situation. In other words, an all out war or ensuing instability that throws existing OEMs, partners and business into chaos (keep in mind however in chaos or confusion there is opportunity). The MAD scenario could be triggered by Cisco finally getting truly and really serious about servers. Granted Cisco is doing their best to test their partners, OEMs and even customers as too how much they will tolerate in terms of entering the server market.

Im not convinced they are ready to be number one, two or three let alone four or five. After all, my numbers may be off, however best I can tell the number of Cisco blade servers is measured in thousands or best case a few ten thousand since its launch. By comparison, how many thousands of servers do Cisco OEMs Dell, HP, IBM, Oracle among others ship per week or month? In other words, Cisco to really get serious would need to ramp up that server business by several factors of ten, a move that would not sit well (even worse than now) with their major OEM partners.

Thus, if Cisco were to get serious and want to move up into the top two or three spot of the server market, something people always tell me that Cisco feels they have to be in a top market spot, they step all over their OEMs. This in turn would set off the MAD scenario mentioned above, kind of like a scene out of war games, perhaps what you are seeing with some of the early Cisco posturing. Sure Cisco made some moves with their UCS and their EMC alliances as well as dancing with whoever buys them a drink and sure HP bought 3COM which I guess could be seen as a warning shot if you like. Sure Cisco is the 800 lb guerrilla when compared to the networking vendors except do not forget about Huawei (read more here).

Thus for the time being, I expect Cisco to keep making noise, testing the waters, pushing its OEMs and partners. Perhaps Cisco also does some arms treaties in the form of marketing alliances as it continues to push its FCoE and unified compute initiatives. Sure they will keep pushing Virtual Desktop Initiatives (VDI) and anything else that can generate network traffic so they can support those needs. However, also keep in mind that VMwares biggest platform deployment (e.g. servers) customers or partners are HP and Dell in no particular order (I will let you rank them depending on whose data you choose).

Oh no, I have to stop now as I wanted this to be a short post.

So what does this have to do with Dell and Brocade?

Simple, why would Dell want to go down that path if they do not have to?

As to who Dell should buy, real quickly, how about a data protection (security, backup, restore, BC, DR) company or a data management or a desktop management company, how about one that fits all of those like Symantec which from a revenue standpoint is about three times that of Brocade.

Heck, if you think Dell could afford Brocade, then why not a Symantec which might actually be worth more in pieces than as a whole. Dell could sell off what they do not need or want or make that part of a deal or keep it all! As for others, how Dell buying a low end consumer, prosumer, SOHO storage play like Drobo or Snap among others.

Ok, I have to wrap up for now.

Talk to you all soon either here, or in one of the many other different venues or social media as well as traditional mediums as this story is far from being done.

Whats is your take?

Cheers gs

Greg Schulz – Author The Green and Virtual Data Center (CRC) and Resilient Storage Networks (Elsevier)
twitter @storageio

Here are some links to read more about the above topics and themes

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

Back to school shopping: Dude, Dell Digests 3PAR Disk storage

Dell

No sooner has the dust settled from Dells other recent acquisitions, its back to school shopping time and the latest bargain for the Round Rock Texas folks is bay (San Francisco) area storage vendor 3PAR for $1.15B. As a refresh, some of Dells more recent acquisitions including a few years ago $1.4B for EqualLogic, $3.9B for Perot systems not to mention Exanet, Kace and Ocarina earlier this year. For those interested, as of April 2010 reporting figures found here, Dell showed about $10B USD in cash and here is financial information on publicly held 3PAR (PAR).

Who is 3PAR
3PAR is a publicly traded company (PAR) that makes a scalable or clustered storage system with many built in advanced features typically associated with high end EMC DMX and VMAX as well as CLARiiON, in addition to Hitachi or HP or IBM enterprise class solutions. The Inserv (3PARs storage solution) combines hardware and software providing a very scalable solution that can be configured for smaller environments or larger enterprise by varying the number of controllers or processing nodes, connectivity (server attachment) ports, cache and disk drives.

Unlike EqualLogic which is more of a mid market iSCSI only storage system, the 3PAR Inserv is capable of going head to head with the EMC CLARiiON as well as DMC or VMAX systems that support a mix of iSCSI and Fibre Channel or NAS via gateway or appliances. Thus while there were occasional competitive situations between 3PAR and Dell EqualLogic, they for the most part were targeted at different market sectors or customers deployment scenarios.

What does Dell get with 3PAR?

  • A good deal if not a bargain on one of the last new storage startup pure plays
  • A public company that is actually generating revenue with a large and growing installed base
  • A seasoned sales force who knows how to sell into the enterprise storage space against EMC, HP, IBM, Oracle/SUN, Netapp and others
  • A solution that can scale in terms of functionality, connectivity, performance, availability, capacity and energy efficiency (PACE)
  • Potential route to new markets where 3PAR has had success, or to bridge gaps where both have played and competed in the past
  • Did I say a company with an established footprint of installed 3PAR Inserv storage systems and good list of marquee customers
  • Ability to sell a solution that they own the intellectual property (IP) instead of that of partner EMC
  • Plenty of IP that can be leveraged within other Dell solutions, not to mention combine 3PAR with other recently acquired technologies or companies.

On a lighter note, Dell picks up once again Marc Farley who was with them briefly after the EqualLogic acquisition who then departed to 3PAR where he became director of social media including launch of Infosmack on Storage Monkeys with co host Greg Knieriemen (@Knieriemen). Of course the twitter world and traditional coconut wires are now speculating where Farley will go next that Dell may end up buying in the future.

What does this mean for Dell and their data storage portfolio?
While in no ways all inclusive or comprehensive, table 1 provides a rough framework of different price bands, categories, tiers and market or application segments requiring various types of storage solutions where Dell can sell into.

 

HP

Dell

EMC

IBM

Oracle/Sun

Servers

Blade systems, rack mount, towers to desktop

Blade systems, rack mount, towers to desktop

Virtual servers with VMware, servers via vBlock servers via Cisco

Blade systems, rack mount, towers to desktop

Blade systems, rack mount, towers to desktop

Services

HP managed services, consulting and hosting supplemented by EDS acquisition

Bought Perot systems (an EDS spin off/out)

Partnered with various organizations and services

Has been doing smaller acquisitions adding tools and capabilities to IBM global services

Large internal consulting and services as well as Software as a Service (SaaS) hosting, partnered with others

Enterprise storage

XP (FC, iSCSI, FICON for mainframe and NAS with gateway) which is OEMed from Hitachi Japan parent of HDS

3PAR (iSCSI and FICON or NAS with gateway) replaces EMC CLARiiON or perhaps rare DMX/VMAX at high end?

DMX and VMAX

DS8000

Sun resold HDS version of XP/USP however Oracle has since dropped it from lineup

Data footprint impact reduction

Dedupe on VTL via Sepaton plus HP developed technology or OEMed products

Dedupe in OEM or partner software or hardware solutions, recently acquired Ocarina

Dedupe in Avamar, Datadomain, Networker, Celerra, Centera, Atmos. CLARiiON and Celerra compression

Dedupe in various hardware and software solutions, source and target, compression with Storwize

Dedupe via OEM VTLs and other sun solutions

Data preservation

Database and other archive tools, archive storage

OEM solutions from EMC and others

Centera and other solutions

Various hardware and software solutions

Various hardware and software solutions

General data protection (excluding logical or physical security and DLP)

Internal Data Protector software plus OEM, partners with other software, various VTL, TL and target solutions as well as services

OEM and resell partner tools as well as Dell target devices and those of partners. Could this be a future acquisition target area?

Networker and Avamar software, Datadomain and other targets, DPA management tools and Mozy services

Tivoli suite of software and various hardware targets, management tools and cloud services

Various software and partners tools, tape libraries, VTLs and online storage solutions

Scale out, bulk, or clustered NAS

eXtreme scale out, bulk and clustered storage for unstructured data applications

Exanet on Dell servers with shared SAS, iSCSI or FC storage

Celerra and ATMOS

IBM SONAS or N series (OEM from NetApp)

ZFS based solutions including 7000 series

General purpose NAS

Various gateways for EVA or MSA or XP, HP IBRIX or Polyserve based as well as Microsoft WSS solutions

EMC Celerra, Dell Exanet, Microsoft WSS based. Acquisition or partner target area?

Celerra

N Series OEMed from Netapp as well as growing awareness of SONAS

ZFS based solutions. Whatever happened to Procom?

Mid market multi protocol block

EVA (FC with iSCSI or NAS gateways), LeftHand (P Series iSCSI) for lowered of this market

3PAR (FC and iSCSI, NAS with gateway) for mid to upper end of this market, EqualLogic (iSCSI) for the lower end of the market, some residual EMC CX activity phases out over time?

CLARiiON (FC and iSCSI with NAS via gateway), Some smaller DMX or VMAX configurations for mid to upper end of this market

DS5000, DS4000 (FC and iSCSI with NAS via a gateway) both OEMed from LSI, XIV and N series (Netapp)

7000 series (ZFS and Sun storage software running on Sun server with internal storage, optional external storage)

6000 series

Scalable SMB iSCSI

LeftHand (P Series)

EqualLogic

Celerra NX, CLARiiON AX/CX

XIV, DS3000, N Series

2000
7000

Entry level shared block

MSA2000 (iSCSI, FC, SAS)

MD3000 (iSCSI, FC, SAS)

AX (iSCSI, FC)

DS3000 (iSCSI, FC, SAS), N Series (iSCSI, FC, NAS)

2000
7000

Entry level unified multi function

X (not to be confused with eXtreme series) HP servers with Windows Storage Software

Dell servers with Windows Storage Software or EMC Celerra

Celerra NX, Iomega

xSeries servers with Microsoft or other software installed

ZFS based solutions running on Sun servers

Low end SOHO

X (not to be confused with eXtreme series) HP servers with Windows Storage Software

Dell servers with storage and Windows Storage Software. Future acqustion area perhaps?

Iomega

 

 

Table 1: Sampling of various tiers, architectures, functionality and storage solution options

Clarifying some of the above categories in table 1:

Servers: Application servers or computers running Windows, Linux, HyperV, VMware or other applications, operating systems and hypervisors.

Services: Professional and consulting services, installation, break fix repair, call center, hosting, managed services or cloud solutions

Enterprise storage: Large scale (hundreds to thousands of drives, many front end as well as back ports, multiple controllers or storage processing engines (nodes), large amount of cache and equally strong performance, feature rich functionality, resilient and scalable.

Data footprint impact reduction: Archive, data management, compression, dedupe, thin provision among other techniques. Read more here and here.

Data preservation: Archiving for compliance and non regulatory applications or data including software, hardware, services.

General data protection: Excluding physical or logical data security (firewalls, dlp, etc), this would be backup/restore with encryption, replication, snapshots, hardware and software to support BC, DR and normal business operations. Read more about data protection options for virtual and physical storage here.

Scale out NAS: Clustered NAS, bulk unstructured storage, cloud storage system or file system. Read more about clustered storage here. HP has their eXtreme X series of scale out and bulk storage systems as well as gateways. These leverage IBRIX and Polyserve which were bought by HP as software, or as a solution (HP servers, storage and software), perhaps with optional data reduction software such as Ocarina OEMed by Dell. Dell now has Exanet which they bought recently as software, or as a solution running on Dell servers, with either SAS, iSCSI or FC back end storage plus optional data footprint reduction software such as Ocarina. IBM has GPFS as a software solution running on IBM or other vendors servers with attached storage, or as a solution such as SONAS with IBM servers running software with IBM DS mid range storage. IBM also OEMs Netapp as the N series.

General purpose NAS: NAS (NFS and CIFS or optional AFP and pNFS) for everyday enterprise (or SME/SMB) file serving and sharing

Mid market multi protocol block: For SMB to SME environments that need scalable shared (SAN) scalable block storage using iSCSI, FC or FCoE

Scalable SMB iSCSI: For SMB to SME environments that need scalable iSCSI storage with feature rich functionality including built in virtualization

Entry level shared block: Block storage with flexibility to support iSCSI, SAS or Fibre Channel with optional NAS support built in or available via a gateway. For example external SAS RAID shared storage between 2 or more servers configured in a HyeprV or VMware clustered that do not need or can afford higher cost of iSCSI. Another example would be shared SAS (or iSCSI or Fibre Channel) storage attached to a server running storage software such as clustered file system (e.g. Exanet) or VTL, Dedupe, Backup, Archiving or data footprint reduction tools or perhaps database software where higher cost or complexity of an iSCSI or Fibre Channel SAN is not needed. Read more about external shared SAS here.

Entry level unified multifunction: This is storage that can do block and file yet is scaled down to meet ease of acquisition, ease of sale, channel friendly, simplified deployment and installation yet affordable for SMBs or larger SOHOs as well as ROBOs.

Low end SOHO: Storage that can scale down to consumer, prosumer or lower end of SMB (e.g. SOHO) providing mix of block and file, yet priced and positioned below higher price multifunction systems.

Wait a minute, are that too many different categories or types of storage?

Perhaps, however it also enables multiple tools (tiers of technologies) to be in a vendors tool box, or, in an IT professionals tool bin to address different challenges. Lets come back to this in a few moments.

 

Some Industry trends and perspectives (ITP) thoughts:

How can Dell with 3PAR be an enterprise play without IBM mainframe FICON support?
Some would say forget about it, mainframes are dead thus not a Dell objective even though EMC, HDS and IBM sell a ton of storage into those environments. However, fair enough argument and one that 3PAR has faced for years while competing with EMC, HDS, HP, IBM and Fujitsu thus they are versed in how to handle that discussion. Thus the 3PAR teams can help the Dell folks determine where to hunt and farm for business something that many of the Dell folks already know how to do. After all, today they have to flip the business to EMC or worse.

If truly pressured and in need, Dell could continue reference sales with EMC for DMX and VMAX. Likewise they could also go to Bustech and/or Luminex who have open systems to mainframe gateways (including VTL support) under a custom or special solution sale. Ironically EMC has OEMed in the past Bustech to transform their high end storage into Mainframe VTLs (not to be confused with Falconstor or Quantum for open system) as well as Datadomain partnered with Luminex.

BTW, did you know that Dell has had for several years a group or team that handles specialized storage solutions addressing needs outside the usual product portfolio?

Thus IMHO Dells enterprise class focus will be that for open systems large scale out where they will compete with EMC DMX and VMAX, HDS USP or their soon to be announced enhancements, HP and their Hitachi Japan OEMed XP, IBM and the DS8000 as well as the seldom heard about yet equally scalable Fujitsu Eternus systems.

 

Why only 1.15B, after all they paid 1.4B for EqualLogic?
IMHO, had this deal occurred a couple of years ago when some valuations were still flying higher than today, and 3PAR were at their current sales run rate, customer deployment situations, it is possible the amount would have been higher, either way, this is still a great value for both Dell and 3PAR investors, customers, employees and partners.

 

Does this mean Dell dumps EMC?
Near term I do not think Dell dumps the EMC dudes (or dudettes) as there is still plenty of business in the mid market for the two companies. However, over time, I would expect that Dell will unleash the 3PAR folks into the space where normally a CLARiiON CX would have been positioned such as deals just above where EqualLogic plays, or where Fibre Channel is preferred. Likewise, I would expect Dell to empower the 3PAR team to go after additional higher end deals where a DMX or VMAX would have been the previous option not to mention where 3PAR has had success.

This would also mean extending into sales against HP EVA and XPs, IBM DS5000 and DS8000 as well as XIV, Oracle/Sun 6000 and 7000s to name a few. In other words there will be some spin around coopition, however longer term you can read the writing on the wall. Oh, btw, lest you forget, Dell is first and foremost a server company who now is getting into storage in a much bigger way and EMC is first and foremost a storage company who is getting into severs via VMware as well as their Cisco partnerships.

Are shots being fired across each other bows? I will leave that up to you to speculate.

 

Does this mean Dell MD1000/MD3000 iSCSI, SAS and FC disappears?
I do not think so as they have had a specific role for entry level below where the EqualLogic iSCSI only solution fits providing mixed iSCSI, SAS and Fibre Channel capabilities to compete with the HP MSA2000 (OEMed by Dothill) and IBM DS3000 (OEMed from LSI). While 3PAR could be taken down into some of these markets, which would also potentially dilute the brand and thus premium margin of those solutions.

Likewise, there is a play with server vendors to attach shared SAS external storage to small 2 and 4 node clusters for VMware, HyperV, Exchange, SQL, SharePoint and other applications where iSCSI or Fibre Channel are to expensive or not needed or where NAS is not a fit. Another play for the shared external SAS attached is for attaching low cost storage to scale out clustered NAS or bulk storage where software such as Exanet runs on a Dell server. Take a closer look at how HP is supporting their scale out as well as IBM and Oracle among others. Sure you can find iSCSI or Fibre Channel or even NAS back end to file servers. However growing trend of using shared SAS.

 

Does Dell now have too many different storage systems and solutions in their portfolio?
Possibly depending upon how you look at it and certainly the potential is there for revenue prevention teams to get in the way of each other instead of competing with external competitors. However if you compare the Dell lineup with those of EMC, HP, IBM and Oracle/Sun among others, it is not all that different. Note that HP, IBM and Oracle also have something in common with Dell in that they are general IT resource providers (servers, storage, networks, services, hardware and software) as compared to other traditional storage vendors.

Consequently if you look at these vendors in terms of their different markets from consumer to prosumer to SOHO at the low end of the SMB to SME that sits between SMB and enterprise, they have diverse customer needs. Likewise, if you look at these vendors server offerings, they too are diverse ranging from desktops to floor standing towers to racks, high density racks and blade servers that also need various tiers, architectures, price bands and purposed storage functionality.

 

What will be key for Dell to make this all work?
The key for Dell will be similar to that of their competitors which is to clearly communicate the value proposition of the various products or solutions, where, who and what their target markets are and then execute on those plans. There will be overlap and conflict despite the best spin as is always the case with diverse portfolios by vendors.

However if Dell can keep their teams focused on expanding their customer footprints at the expense of their external competition vs. cannibalizing their own internal product lines, not to mention creating or extending into new markets or applications. Consequently Dell now has many tools in their tool box and thus need to educate their solution teams on what to use or sell when, where, why and how instead of just having one tool or a singular focus. In other words, while a great solution, Dell no longer has to respond with the solution to everything is iSCSI based EqualLogic.

Likewise Dell can leverage the same emotion and momentum behind the EqualLogic teams to invigorate and unleash the best with 3PAR teams and solution into or onto the higher end of the SMB, SME and enterprise environments.

Im still thinking that Exanet is a diamond in the rough for Dell where they can install the clustered scalable NAS software onto their servers and use either lower end shared SAS RAID (e.g. MD3000), or iSCSI (MD3000, EqualLogic or 3PAR) or higher end Fibre Channel with 3PAR) for scale out, cloud and other bulk solutions competing with HP, Oracle and IBM. Dell still has the Windows based storage server for entry level multi protocol block and file capabilities as well as what they OEM from EMC.

 

Is Dell done shopping?
IMHO I do not think so as there are still areas where Dell can extend their portfolio and not just in storage. Likewise there are still some opportunities or perhaps bargains out there for fall and beyond acquisitions.

 

Does this mean that Dell is not happy with EqualLogic and iSCSI
Simply put from my perspective talking with Dell customers, prospects, and partners and seeing them all in action nothing could be further from Dell not being happy with iSCSI or EqualLogic. Look at this as being a way to extend the Dell story and capabilities into new markets, granted the EqualLogic folks now have a new sibling to compete with internal marketing and management for love and attention.

 

Isnt Dell just an iSCSI focused company?
A couple of years I was quoted in one of the financial analysis reports as saying that Dell needed to remain open to various forms of storage instead of becoming singularly focused on just iSCSI as a result of the EqualLogic deal. I standby that statement in that Dell to be a strong enterprise contender needs to have a balanced portfolio across different price or market bands, from block to file, from shared SAS to iSCSI to Fibre Channel and emerging FCoE.

This also means supporting traditional NAS across those different price band or market sectors as well as support for emerging and fast growing unstructured data markets where there is a need for scale out and bulk storage. Thus it is great to see Dell remaining open minded and not becoming singularly focused on just iSCSI instead providing the right solution to meet their diverse customer as well as prospect needs or opportunities.

While EqualLogic was and is a very successfully iSCSI focused storage solution not to mention one that Dell continues to leverage, Dell is more than just iSCSI. Take a look at Dells current storage line up as well as up in table 1 and there is a lot of existing diversity. Granted some of that current diversity is via partners which the 3PAR deal helps to address. What this means is that iSCSI continues to grow in popularity however there are other needs where shared SAS or Fibre Channel or FCoE will be needed opening new markets to Dell.

 

Bottom line and wrap up (for now)
This is a great move for Dell (as well as 3PAR) to move up market in the storage space with less reliance on EMC. Assuming that Dell can communicate the what to use when, where, why and how to both their internal teams, partners as well as industry and customers not to mention then execute on, they should have themselves a winner.

Will this deal end up being an even better bargain than when Dell paid $1.4B for EqualLogic?

Not sure yet, it certainly has potential if Dell can execute on their plans without losing momentum in any other their other areas (products).

Whats your take?

Cheers gs

Greg Schulz – Author The Green and Virtual Data Center (CRC) and Resilient Storage Networks (Elsevier)
twitter @storageio

Here are some related links to read more

Data footprint reduction (Part 2): Dell, IBM, Ocarina and Storwize

Dell

IBM

Over the past couple of weeks there has been a flurry of IT industry activity around data footprint impact reduction with Dell buying Ocarina and IBM acquiring Storwize. For those who want the quick (compacted, reduced) synopsis of what Dell buying Ocarina as well as IBM acquiring Storwize means read the first post in this two part series as well as some of my comments here and here.

This piece and it companion in part I of this two part series is about expanding the discussion to the much larger opportunity for vendors or vars of overall data footprint impact reduction beyond where they are currently focused. Likewise, this is about IT customers realizing that there are more opportunities to address data and storage optimization across your entire organization using various techniques instead of just focusing on backup or vmware virtual servers.

Who is Ocarina and Storwize?
Ocarina is a data and storage management software startup focused on data footprint reduction using a variety of approaches, techniques and algorithms. They differ from the traditional data dedupers (e.g. Asigra, Bakbone, Commvault, EMC Avamar, Datadomain and Networker, Exagrid, Falconstor, HP, IBM Protectier and TSM, Quantum, Sepaton and Symantec among others) by looking at data footprint reduction beyond just backup.

This means looking at how to reduce data footprint across different types of data including videos, image as well as text based documents among others. As a result, the market sweet spot for Ocarina is for general data footprint reduction including static along with active data including entertainment, video surveillance or gaming, reference data, web 2.0 and other bulk storage application data needs (this should compliment Dells recent Exanet acquisition).

What this means is that Ocarina is very well suited to address the rapidly growing amount of unstructured data that may not otherwise be handled as efficiently with by dedupe alone.

Storwize is a data and storage management startup focused on data footprint reduction using inline compression with an emphasis on maintaining performance for reads as well as writes of unstructured as well as structured database data. Consequently the market sweet spot for Storwize is around boosting the capacity of existing NAS storage systems from different vendors without negatively impacting performance. The trade off of the Storwize approach is that you do not get the spectacular data reduction ratios associated with backup centric or focused dedupe, however, you maintain performance associated with online storage that some dedupers dream of.

Both Dell and IBM have existing dedupe solutions for general purpose as well as backup along with other data footprint impact reduction tools (either owned or via partners). Now they are both expanding their focus and reach similar to what others such as EMC, HP, NetApp, Oracle and Symantec among others are doing. What this means is that someone at Dell and IBM see that there is much more to data footprint impact reduction than just a focus on dedupe for backup.

Wait, what does all of this discussion (or read here for background issues, challenges and opportunities) about unstructured data and changing access lifecycles have to do with dedupe, Ocarina and Storwize?

Continue reading on as this is about the expanding opportunity for data footprint reduction across entire organizations. That is, more data is being kept online and expanding data footprint impact needs to be addressed to meet business objectives using various techniques balancing performance, availability, capacity and energy or economics (PACE).

Dell

IBM

What does all of this have to do with IBM buying Storwize and Dell acquiring Ocarina?
If you have not pieced this together yet, let me net it out.

This is about the opportunity to address the organization wide expanding data footprint impact across all applications, types of data as well as tiers of storage to support business growth (more data to store) while maintaining QoS yet reduce per unit costs including management.

This is about expanding the story to the broader data footprint impact reduction from the more narrowly focused backup and dedupe discussion which are still in their infancy on a relative basis to their full market potential (read more here).

Now are you seeing where this is going and fits?

Does this mean IBM and Dell defocus on their existing Dedupe product lines or partners?
I do not believe so, at least as long as their respective revenue prevention departments are kept on the sidelines and off of the field of play. What I mean by this is that the challenge for IBM and Dell is similar to that of what others such as EMC are faced with having diverse portfolios or technology toolboxes. The challenge is messaging to the bigger issues, then aligning the right tool to the task at hand to address given issues and opportunities instead of singularly focused on a specific product causing revenue prevention elsewhere.

As an example, for backup, I would expect Dell to continue to work with its existing dedupe backup centric partners and technologies however find new opportunities to leverage their Ocarina solution. Likewise, IBM I would expect to continue to show customers where Tivoli software based dedupe or Protectier (aka the deduper formerly known as Diligent) or other target based dedupe fits and expand into other data footprint impact areas with Storewize.

Does this change the playing field?
IMHO these moves as well as some previous moves by the likes of EMC and NetApp among others are examples of expanding the scope and dimension of the playing field. That is, the focus is much more than just dedupe for backup or of virtual machines (e.g. VMware vSphere or Microsoft HyperV).

This signals a growing awareness around the much larger and broader opportunity around organization wide data footprint impact reduction. In the broader context some applications or data gets compressed either in application software such as databases, file systems, operating systems or even hypervisors as well as in networks using protocol or bandwidth optimizers as well as inline compression or post processing techniques as has been the case with streaming tape devices for some time.

This also means that where with dedupe the primary focus or marketing angle up until recently has been around reduction ratios, to meet the needs of time or performance sensitive applications data transfer rates also become important.

Hence the role of policy based data footprint reduction where the right tool or technique to meet specific service requirements is applied. For those vendors with a diverse data footprint impact reduction tool kit including archive, compression, dedupe, thin provision among other techniques, I would expect to hear expanded messaging around the theme of applying the right tool to the task at hand.

Does this mean Dell bought Ocarina to accessorize EqualLogic?
Perhaps, however that would then beg the question of why EqualLogic needs accessorizing. Granted there are many EqualLogic along with other Dell sold storage systems attached to Dell and other vendors servers operating as NFS or Windows CIFS file servers that are candidates for Ocarina. However there are also many environments that do not yet include Dell EqualLogic solutions where Ocarina is a means for Dell to extend their reach enabling those organizations to do more with what they have while supporting growth.

In other words, Ocarina can be used to accessorize, or, it can be used to generate and create pull through for various Dell products. I also see a very strong affinity and opportunity for Dell to combine their recent Exanet NAS storage clustering software with Dell servers, storage to create bulk or scale out solutions similar to what HP and other vendors have done. Of course what Dell does with the Ocarina software over time, where they integrate it into their own products as well as OEM to others should be interesting to watch or speculate upon.

Does this mean IBM bought Storwize to accessorize XIV?
Well, I guess if you put a gateway (or software on a server which is the same thing) in front of XIV to transform it into a NAS system, sure, then Storwize could be used to increase the net usable capacity of the XIV installed base. However that is a lot of work and cost for what is on a relative basis a small footprint, yet it is a viable option never the less.

IMHO IBM has much more of a play, perhaps a home run by walking before they run by placing Storwize in front of their existing large installed base of NetApp N series (not to mention targeting NetApps own install base) as well as complimenting their SONAS solutions. From there as IBM gets their legs and mojo, they could go on the attack by going after other vendors NAS solutions with an efficiency story similar to how IBM server groups target other vendors server business for takeout opportunities except in a complimenting manner.

Longer term I would not be surprised to see IBM continue development of the block based IP (as well as file) in the storwize product for deployment in solutions ranging from SVC to their own or OEM based products along with articulating their comprehensive data footprint reduction solution portfolio. What will be important for IBM to do is articulating what solution to use when, where, why and how without confusing their customers, partners and rest of the industry (something that Dell will also have to do).

Some links for additional reading on the above and related topics

Wrap up (for now)

Organizations of all shape and size are encountering some form of growing data footprint impact that currently, or soon will need to be addressed. Given that different applications and types of data along with associated storage mediums or tiers have various performance, availability, capacity, energy as well as economic characteristics multiple data footprint impact reduction tools or techniques are needed. What this all means is that the focus of data footprint reduction is expanding beyond that of just dedupe for backup or other early deployment scenarios.

Note what this means is that dedupe has an even brighter future than where it currently is focused which is still only scratching the surface of potential market adoption as was discussed in part 1 of this series.

However this also means that dedupe is not the only solution to all data footprint reduction scenarios. Other techniques including archiving, compression, data management, thin provisioning, data deletion, tiered storage and consolidation will start to gain respect, coverage discussions and debates.

Bottom line, use the most applicable technologies or combinations along with best practice for the task and activity at hand.

For some applications reduction ratios are an important focus on the tools or modes of operations that achieve those results.

Likewise for other applications where the focus is on performance with some data reduction benefit, tools are optimized for performance first and reduction secondary.

Thus I expect messaging from some vendors to adjust (expand) to those capabilities that they have in their toolboxes (product portfolios) offerings

Consequently, IMHO some of the backup centric dedupe solutions may find themselves in niche roles in the future unless they can diversity. Vendors with multiple data footprint reduction tools will also do better than those with only a single function or focused tool.

However for those who only have a single or perhaps a couple of tools, well, guess what the approach and messaging will be. After all, if all you have is a hammer everything looks like a nail, if all you have is a screw driver, well, you get the picture.

On the other hand, if you are still not clear on what all this means, send me a note, give a call, post a comment or a tweet and will be happy to discuss with you.

Oh, FWIW, if interested, disclosure: Storwize was a client a couple of years ago.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

Data footprint reduction (Part 1): Life beyond dedupe and changing data lifecycles

Over the past couple of weeks there has been a flurry of IT industry activity around data footprint impact reduction with Dell buying Ocarina and IBM acquiring Storwize. For those who want the quick (compacted, reduced) synopsis of what Dell buying Ocarina as well as IBM acquiring Storwize means read this post here along with some of my comments here and here.

Now, before any Drs or Divas of Dedupe get concerned and feel the need to debate dedupes expanding role, success or applicability, relax, take a deep breath, then read on and take another breath before responding if so inclined.

The reason I mention this is that some may mistake this as a piece against or not in favor of dedupe as it talks about life beyond dedupe which could be mistaken as indicating dedupes diminished role which is not the case (read ahead and see figure 5 to see the bigger picture).

Likewise some might feel that since this piece talks about archiving for compliance and non regulatory situations along with compression, data management and other forms of data footprint reduction they may be compelled to defend dedupes honor and future role.

Again, relax, take a deep breath and read on, this is not about the death of dedupe.

Now for others, you might wonder why the dedupe tongue in check humor mentioned above (which is what it is) and the answer is quite simple. The industry in general is drunk on dedupe and in some cases thus having numbed its senses not to mention having blurred its vision of the even bigger opportunities for the business benefits of data footprint reduction beyond todays backup centric or vmware server virtualization dedupe discussions.

Likewise, it is time for the industry to wake (or sober) up and instead of trying to stuff everything under or into the narrowly focused dedupe bottle. Instead, realize that there is a broader umbrella called data footprint impact reduction which includes among other techniques, dedupe, archive, compression, data management, data deletion and thin provisioning across all types of data and applications. What this means is a broader opportunity or market than what exists or being discussed today leveraging different techniques, technologies and best practices.

Consequently this piece is about expanding the discussion to the larger opportunity for vendors or vars to extend their focus to the bigger world of overall data footprint impact reduction beyond where currently focused. Likewise, this is about IT customers realizing that there are more opportunities to address data and storage optimization across your entire organization using various techniques instead of just focusing on backup.

In other words, there is a very bright future for dedupe as well as other techniques and technologies that fall under the data footprint reduction umbrella including data stored online, offline, near line, primary, secondary, tertiary, virtual and in a public or private cloud..

Before going further however lets take a step back and look at some business along with IT issues, challenges and opportunities.

What is the business and IT issue or challenge?
Given that there is no such thing as a data or information recession shown in figure 1, IT organizations of all size are faced with the constant demand to store more data, including multiple copies of the same or similar data, for longer periods of time.


Figure 1: IT resource demand growth continues

The result is an expanding data footprint, increased IT expenses, both capital and operational, due to additional Infrastructure Resource Management (IRM) activities to sustain given levels of application Quality of Service (QoS) delivery shown in figure 2.

Some common IT costs associated with supporting an increased data footprint include among others:

  • Data storage hardware and management software tools acquisition
  • Associated networking or IO connectivity hardware, software and services
  • Recurring maintenance and software renewal fees
  • Facilities fees for floor space, power and cooling along with IT staffing
  • Physical and logical security for data and IT resources
  • Data protection for HA, BC or DR including backup, replication and archiving


Figure 2: IT Resources and cost balancing conflicts and opportunities

Figure 2 shows the result is that IT organizations of all size are faced with having to do more with what they have or with less including maximizing available resources. In addition, IT organizations often have to overcome common footprint constraints (available power, cooling, floor space, server, storage and networking resources, management, budgets, and IT staffing) while supporting business growth.

Figure 2 also shows that to support demand, more resources are needed (real or virtual) in a denser footprint, while maintaining or enhancing QoS plus lowering per unit resource cost. The trick is improving on available resources while maintaining QoS in a cost effective manner. By comparison, traditionally if costs are reduced, one of the other curves (amount of resources or QoS) are often negatively impacted and vice versa. Meanwhile in other situations the result can be moving problems around that later resurface elsewhere. Instead, find, identify, diagnose and prescribe the applicable treatment or form of data footprint reduction or other IT IRM technology, technique or best practices to cure the ailment.

What is driving the expanding data footprint?
Granted more data can be stored in the same or smaller physical footprint than in the past, thus requiring less power and cooling per Gbyte, Tbyte or PByte. Data growth rates necessary to sustain business activity, enhanced IT service delivery and enable new applications are placing continued demands to move, protect, preserve, store and serve data for longer periods of time.

The popularity of rich media and Internet based applications has resulted in explosive growth of unstructured file data requiring new and more scalable storage solutions. Unstructured data includes spreadsheets, Power Point, slide decks, Adobe PDF and word documents, web pages, video and audio JPEG, MP3 and MP4 files. This trend towards increasing data storage requirements does not appear to be slowing anytime soon for organizations of all sizes.

After all, there is no such thing as a data or information recession!

Changing data access lifecycles
Many strategies or marketing stories are built around the premise that shortly after data is created data is seldom, if ever accessed again. The traditional transactional model lends itself to what has become known as information lifecycle management (ILM) where data can and should be archived or moved to lower cost, lower performing, and high density storage or even deleted where possible.

Figure 3 shows as an example on the left side of the diagram the traditional transactional data lifecycle with data being created and then going dormant. The amount of dormant data will vary by the type and size of an organization along with application mix. 


Figure 3: Changing access and data lifecycle patterns

However, unlike the transactional data lifecycle models where data can be removed after a period of time, Web 2.0 and related data needs to remain online and readily accessible. Unlike traditional data lifecycles where data goes dormant after a period of time, on the right side of figure 3, data is created and then accessed on an intermittent basis with variable frequency. The frequency between periods of inactivity could be hours, days, weeks or months and, in some cases, there may be sustained periods of activity.

A common example is a video or some other content that gets created and posted to a web site or social networking site such as Face book, Linked in, or You Tube among others. Once the content is discussed, while it may not change, additional comment and collaborative data can be wrapped around the data as additional viewers discover and comment on the content. Solution approaches for the new category and data lifecycle model include low cost, relative good performing high capacity storage such as clustered bulk storage as well as leveraging different forms of data footprint reduction techniques.

Given that a large (and growing) percentage of new data is unstructured, NAS based storage solutions including clustered, bulk, cloud and managed service offerings with file based access are gaining in popularity. To reduce cost along with support increased business demands (figure 2), a growing trend is to utilize clustered, scale out and bulk NAS file systems that support NFS, CIFS for concurrent large and small IOs as well as optionally pNFS for large parallel access of files. These solutions are also increasingly being deployed with either built in or add on accessorized data footprint reduction techniques including archive, policy management, dedupe and compression among others.

What is your data footprint impact?
Your data footprint impact is the total data storage needed to support your various business application and information needs. Your data footprint may be larger than how much actual data storage you have as seen in figure 4. In Figure 4, an example is an organization that has 20TBytes of storage space allocated and being used for databases, email, home directories, shared documents, engineering documents, financial and other data in different formats (structured and unstructured) not to mention varying access patterns.


Figure 4: Expanding data footprint due to data proliferation and copies being retained

Of the 20TBytes of data allocated and used, it is very likely that the consumed storage space is not 100 percent used. Database tables may be sparsely (empty or not fully) allocated and there is likely duplicate data in email and other shared documents or folders. Additionally, of the 20TBytes, 10TBytes are duplicated to three different areas on a regular basis for application testing, training and business analysis and reporting purposes.

The overall data footprint is the total amount of data including all copies plus the additional storage required for supporting that data such as extra disks for Redundant Array of Independent Disks (RAID) protection or remote mirroring.

In this overly simplified example, the data footprint and subsequent storage requirement are several times that of the 20TBytes of data. Consequently, the larger the data footprint the more data storage capacity and performance bandwidth needed, not to mention being managed, protected and housed (powered, cooled, situated in a rack or cabinet on a floor somewhere).

Data footprint reduction techniques
While data storage capacity has become less expensive on a relative basis, as data footprint continue to expand in order to support business requirements, more IT resources will be needed to be made available in a cost effective, yet QoS satisfying manner (again, refer back to figure 2). What this means is that more IT resources including server, storage and networking capacity, management tools along with associated software licensing and IT staff time will be required to protect, preserve and serve information.

By more effectively managing the data footprint across different applications and tiers of storage, it is possible to enhance application service delivery and responsiveness as well as facilitate more timely data protection to meet compliance and business objectives. To realize the full benefits of data footprint reduction, look beyond backup and offline data improvements to include online and active data using various techniques such as those in table 1 among others.

There are several methods (shown in table 1) that can be used to address data footprint proliferation without compromising data protection or negatively impacting application and business service levels. These approaches include archiving of structured (database), semi structured (email) and unstructured (general files and documents), data compression (real time and offline) and data deduplication.

 

Archiving

Compression

Deduplication

When to use

Structured (database), email and unstructured

Online (database, email, file sharing), backup or archive

Backup or archiving or recurring and similar data

Characteristic

Software to identify and remove unused data from active storage devices

Reduce amount of data to be moved (transmitted) or stored on disk or tape.

Eliminate duplicate files or file content observed over a period of time to reduce data footprint

Examples

Database, email, unstructured file solutions with archive storage

Host software, disk or tape, (network routers) and compression appliances or software as well as appearing in some primary storage system solutions

Backup and archive target devices and Virtual Tape Libraries (VTLs), specialized appliances

Caveats

Time and knowledge to know what and when to archive and delete, data and application aware

Software based solutions require host CPU cycles impacting application performance

Works well in background mode for backup data to avoid performance impact during data ingestion

Table 1: Data footprint reduction approaches and techniques

Archiving for compliance and general data retention
Data archiving is often perceived as a solution for compliance, however, archiving can be used for many other non compliance purposes. These include general data footprint reduction, to boost performance and enhance routine data maintenance and data protection. Archiving can be applied to structured databases data, semi structured email data and attachments and unstructured file data.

A key to deploying an archiving solution is having insight into what data exists along with applicable rules and policies to determine what can be archived, for how long, how many copies and how data ultimately may be finally retired or deleted. Archiving requires a combination of hardware, software and people to implement business rules.

A challenge with archiving is having the time and tools available to identify what data should be archived and what data can be securely destroyed when no longer needed. Further complicating archiving is that knowledge of the data value is also needed; this may well include legal issues as to who is responsible for making decisions on what data to keep or discard.

If a business can invest in the time and software tools, as well as identify which data to archive to support an effective archive strategy, the returns can be very positive towards reducing the data footprint without limiting the amount of information available for use.

Data compression (real time and offline)
Data compression is a commonly used technique for reducing the size of data being stored or transmitted to improve network performance or reduce the amount of storage capacity needed for storing data. If you have used a traditional or TCP/IP based telephone or cell phone, watched either a DVD or HDTV, listened to an MP3, transferred data over the internet or used email you have most likely relied on some form of compression technology that is transparent to you. Some forms of compression are time delayed, such as using PKZIP to zip files, while others are real time or on the fly based such as when using a network, cell phone or listening to an MP3.

Two different approaches to data compression that vary in time delay or impact on application performance along with the amount of compression and loss of data are loss less (no data loss) and lossy (some data loss for higher compression ratio). In addition to these approaches, there are also different implementations of including real time for no performance impact to applications and time delayed where there is a performance impact to applications.

In contrast to traditional ZIP or offline, time delayed compression approaches that require complete decompression of data prior to modification, online compression allows for reading from, or writing to, any location within a compressed file without full file decompression and resulting application or time delay. Real time appliance or target based compression capabilities are well suited for supporting online applications including databases, OLTP, email, home directories, web sites and video streaming among others without consuming host server CPU or memory resources or degrading storage system performance.

Note that with the increase of CPU server processing performance along with multiple cores, server based compression running in applications such as database, email, file systems or operating systems can be a viable option for some environments.

A scenario for using real time data compression is for time sensitive applications that require large amounts of data such as online databases, video and audio media servers, web and analytic tools. For example, databases such as Oracle support NFS3 Direct IO (DIO) and Concurrent IO (CIO) capabilities to enable random and direct addressing of data within an NFS based file. This differs from traditional NFS operations where a file would be sequential read or written.

Another example of using real time compression is to combine a NAS file server configured with 300GB or 600GB high performance 15.5K Fibre Channel or SAS HDDs in addition to flash based SSDs to boost the effective storage capacity of active data without introducing a performance bottleneck associated with using larger capacity HDDs. Of course, compression would vary with the type of solution being deployed and type of data being stored just as dedupe ratios will differ depending on algorithm along with if text or video or object based among other factors.

Deduplication (Dedupe)
Data deduplication (also known as single instance storage, commonalty factoring, data difference or normalization) is a data footprint reduction technique that eliminates the occurrence of the same data. Deduplication works by normalizing the data being backed up or stored by eliminating recurring or duplicate copies of files or data blocks depending on the implementation.

Some data deduplication solutions boast spectacular ratios for data reduction given specific scenarios, such as backup of repetitive and similar files, while providing little value over a broader range of applications.

This is in contrast with traditional data compression approaches that provide lower, yet more predictable and consistent data reduction ratios over more types of data and application, including online and primary storage scenarios. For example, in environments where there is little to no common or repetitive data files, data deduplication will have little to no impact while data compression generally will yield some amount of data footprint reduction across almost all types of data.

Some data deduplication solution providers have either already added, or have announced plans to add, compression techniques to compliment and increase the data footprint effectiveness of their solutions across a broader range of applications and storage scenarios, attesting to the value and importance of data compression to reduce data footprint.

When looking at deduplication solutions, determine if the solution is designed to scale in terms of performance, capacity and availability over a large amount of data along with how restoration of data will be impacted by scaling for growth. Other items to consider include how data is reduplicated, such as real time using inline or some form of time delayed post processing, and the ability to select the mode of operation.

For example, a dedupe solution may be able to process data at a specific ingest rate inline until a certain threshold is hit and then processing reverts to post processing so as to not cause a performance degradation to the application writing data to the deduplication solution. The downside of post processing is that more storage is needed as a buffer. It can, however, also enable solutions to scale without becoming a bottleneck during data ingestion.

However, there is life beyond dedupe which is to in no way diminish dedupe or its very strong and bright future, one that Im increasingly convinced of having talked with hundreds of IT professionals (e.g. the customers) is that only the surface is being scratched for dedupe, not to mention larger data footprint impact opportunity seen in figure 5.


Figure 5: Dedupe adoption and deployment waves over time

While dedupe is a popular technology from a discussion standpoint and has good deployment traction, it is far from reaching mass customer adoption or even broad coverage in environments where it is being used. StorageIO research shows broadest adoption of dedupe centered around backup in smaller or SMB environments (dedupe deployment wave one in figure 5) with some deployment in Remote Office Branch Office (ROBO) work groups as well as departmental environments.

StorageIO research also shows that complete adoption in many of those SMB, ROBO, work group or smaller environments has yet to reach 100 percent. This means that there remains a large population that has yet to deploy dedupe as well as further opportunities to increase the level of dedupe deployment by those already doing so.

There has also been some early adoption in larger core IT environments where dedupe coexists with complimenting existing data protection and preservation practices. Another current deployment scenario for dedupe has been for supporting core edge deployments in larger environments that provide support for backup and data protection of ROBO, work group and departmental systems.

Note that figure 5 simply shows the general types of environments in which dedupe is being adopted and not any sort of indicators as to the degree of deployment by a given customer or IT environment.

What to do about your expanding data footprint impact?
Develop an overall data foot reduction strategy that leverages different techniques and technologies addressing online primary, secondary and offline data. Assess and discover what data exists and how it is used in order to effectively manage storage needs.

Determine policies and rules for retention and deletion of data combining archiving, compression (online and offline) and dedupe in a comprehensive data footprint strategy. The benefit of a broader, more holistic, data footprint reduction strategy is the ability to address the overall environment, including all applications that generate and use data as well as IRM or overhead functions that compound and impact the data footprint.

Data footprint reduction: life beyond (and complimenting) dedupe
The good news is that the Drs. and Divas of dedupe marketing (the ones who also are good at the disco dedupe dance debates) have targeted backup as an initial market sweet (and success) spot shown in figure 5 given the high degree of duplicate data.


Figure 6: Leverage multiple data footprint reduction techniques and technologies

However that same good news is bad news in that there is now a stigma that dedupe is only for backup, similar to how archive was hijacked by the compliance marketing folks in the post Y2K era. There are several techniques that can be used individually to address specific data footprint reduction issues or in combination as seen in figure 7 to implement a more cohesive and effective data footprint reduction strategy.


Figure 7: How various data footprint reduction techniques are complimentary

What this means is that both archive, dedupe as well as other forms of data footprint reduction can and should be used beyond where they have been target marketed using the applicable tool for the task at hand. For example, a common industry rule of thumb is that on average, ten percent of data changes per day (your mileage and rate of change will certainly vary given applications, environment and other factors).

Now assuming that you have 100TB (feel free to subtract a zero or two, or add as many as needed) of data (note I did not say storage capacity or percent utilized), ten percent change would be 10TB that needs to be backed up, replicated and so forth. Now with basic 2 to 1 streaming tape compression (2.5 to 1 in upcoming LTO enhancements) would reduce the daily backup footprint from 10TB to 5TB.

Using dedupe with 10 to 1 would get that from 10TB down to 1TB or about the size of a large capacity disk drive. With 20 to 1 that cuts the daily backup down to 500GB and so forth. The net effect is that more daily backups can be stored in the same footprint which in turn helps expedite individual file recover by having more options to choose from off of the disk based cache, buffer or storage pool.

On the other hand, if your objective is to reduce and eliminate storage capacity, then the same amount of backups can be stored on less disk freeing up resources. Now take the savings times the number of days in your backup retention and you should see the numbers start to add up.

Now what about the other 90 percent of the data that may not have changed, or, that did change and exists on higher performance storage?

Can its footprint impact be reduced?

The answer should be perhaps or it depends as well as prompts the question of what tool would be best. There is a popular thinking as is often the case with industry buzzwords or technologies to use it everywhere. After all goes the thinking, if it is a good thing why not use and deploy more of it everywhere?

Keep in mind that dedupe trades time to perform thinking and apply intelligence to further reduce data in exchange for space capacity. Thus trading time for space capacity can have a negative impact on applications that need lower response time, higher performance where the focus is on rates vs ratios. For example, the other 90 to 100 percent of the data in the above example may have to be on a mix of high and medium performance storage to meet QoS or service level agreement (SLA) objectives. While it would fun or perhaps cool to try and achieve a high data reduction ratio on the entire 100TB of active data with dedupe (e.g. trying to achieve primary dedupe), the performance impacts could have a negative impact.

The option is to apply a mix of different data footprint reduction techniques across the entire 100TB. That is, use dedupe where applicable and higher reduction ratios can be achieved while balancing performance, compression used for streaming data to tape for retention or archive as well as in databases or other applications software not to mention in networks. Likewise, use real time compression or what some refer to as primary dedupe for online active changing data along with online static read only data.

Deploy a comprehensive data footprint reduction strategy combining various techniques and technologies to address point solution needs as well as the overall environment, including online, near line for backup, and offline for archive data.

Lets not forget about archiving, thin provisioning, space saving snapshots, commonsense data management among other techniques across the entire environment. In other words, if your focus is just on dedupe for backup to
achieve an optimized and efficient storage environment, you are also missing

out on a larger opportunity. However, this also means having multiple tools or

technologies in your IT IRM toolbox as well as understanding what to use when, where and why.

Data transfer rates is a key metric for performance (time) optimization such as meeting backup or restore or other data protection windows. Data reduction ratios is a key metric for capacity (space) optimization where the focus is on storing as much data in a given footprint

Some additional take away points:

  • Develop a data footprint reduction strategy for online and offline data
  • Energy avoidance can be accomplished by powering down storage
  • Energy efficiency can be accomplished by using tiered storage to meet different needs
  • Measure and compare storage based on idle and active workload conditions
  • Storage efficiency metrics include IOPS or bandwidth per watt for active data
  • Storage capacity per watt per footprint and cost is a measure for in active data
  • Small percentage reductions on a large scale have big benefits
  • Align the applicable form of virtualization for the given task at hand

Some links for additional reading on the above and related topics

Wrap up (for now, read part II here)

For some applications reduction ratios are an important focus on the tools or modes of operations that achieve those results.

Likewise for other applications where the focus is on performance with some data reduction benefit, tools are optimized for performance first and reduction secondary.

Thus I expect messaging from some vendors to adjust (expand) to those capabilities that they have in their toolboxes (product portfolios) offerings

Consequently, IMHO some of the backup centric dedupe solutions may find themselves in niche roles in the future unless they can diversity. Vendors with multiple data footprint reduction tools will also do better than those with only a single function or focused tool.

However for those who only have a single or perhaps a couple of tools, well, guess what the approach and messaging will be.

After all, if all you have is a hammer everything looks like a nail, if all you have is a screw driver, well, you get the picture.

On the other hand, if you are still not clear on what all this means, send me a note, give a call, post a comment or a tweet and will be happy to discuss with you.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

July 2010 Odds and Ends: Perspectives, Tips and Articles

Here are some items that have been added to the main StorageIO website news, tips and articles, video podcast related pages that pertain to a variety of topics ranging from data storage, IO, networking, data centers, virtualization, Green IT, performance, metrics and more.

These content items include various odds and end pieces such as industry or technology commentary, articles, tips, ATEs (See additional ask the expert tips here) or FAQs as well as some video and podcasts for your mid summer (if in the northern hemisphere) enjoyment.

The New Green IT: Productivity, supporting growth, doing more with what you have

Energy efficient and money saving Green IT or storage optimization are often associated to mean things like MAID, Intelligent Power Management (IPM) for servers and storage disk drive spin down or data deduplication. In other words, technologies and techniques to minimize or avoid power consumption as well as subsequent cooling requirements which for some data, applications or environments can be the case. However there is also shifting from energy avoidance to that of being efficient, effective, productive not to mention profitable as forms of optimization. Collectively these various techniques and technologies help address or close the Green Gap and can reduce the amount of Green IT confusion in the form of boosting productivity (same goes for servers or networks) in terms of more work, IOPS, bandwidth, data moved, frames or packets, transactions, videos or email processed per watt per second (or other unit of time).

Click here to read and listen to my comments about boosting IOPs per watt, or here to learn more about the many facets of energy efficient storage and here on different aspects of storage optimization. Want to read more about the next major wave of server, storage, desktop and networking virtualization? Then click here to read more about virtualization life beyond consolidation where the emphasis or focus expands to abstraction, transparency, enablement in addition to consolidation for servers, storage, networks. If you are interested in metrics and measurements, Storage Resource Management (SRM) not to mention discussion about various macro data center metrics including PUE among others, click on the preceding links.

NAS and Shared Storage, iSCSI, DAS, SAS and more

Shifting gears to general industry trends and commentary, here are some comments on consumer and SOHO storage sharing, the role and importance Value Added Resellers (VARs) serve for SMB environments, as well as the top storage technologies that are in use and remain relevant. Here are some comments on iSCSI which continues to gain in popularity as well as storage options for small businesses.

Are you looking to buy or upgrade a new server? Here are some vendor and technology neutral tips to help determine needs along with requirements to help be a more effective informed buyer. Interested or do you want to know more about Serial Attached SCSI (6Gb/s SAS) including for use as external shared direct attached storage (DAS) for Exchange, Sharepoint, Oracle, VMware or HyperV clusters among other usage scenarios, check out this FAQ as well as podcast. Here are some other items including a podcast about using storage partitions in your data storage infrastructure, an ATE about what type of 1.5TB centralized storage to support multiple locations, and a video on scaling with clustered storage.

That is all for now, hope all is well and enjoy the content.

Cheers gs

Greg Schulz – Author The Green and Virtual Data Center (CRC) and Resilient Storage Networks (Elsevier)
twitter @storageio