Clarifying Clustered Storage Confusion

Clustered storage can be block based (iSCSI or Fibre Channel) or NAS file system based (NFS, CIFS or a proprietary file system). Clustered storage can also be found in virtual tape libraries (VTL), including dedupe solutions, along with other storage solutions such as those for archiving, cloud, medical or other specialized grids.

Recently in the IT and data storage industry, there has been a flurry of merger and acquisition (M&A) activity (here and here), along with new product enhancements and announcements, around clustered storage. For example, HP buying clustered file system vendor IBRIX complements its acquisition of another clustered file system vendor (PolyServe) a few years ago and of iSCSI block clustered storage software vendor LeftHand earlier this year. Another recent acquisition is LSI buying clustered NAS vendor ONstor, not to mention Dell buying iSCSI block clustered storage vendor EqualLogic about a year and a half ago, along with other vendor acquisitions and announcements involving storage and clustering.

Where the confusion enters into play is the term cluster, which means different things to different people, even more so when clustered storage is combined with NAS or file based storage. For example, clustered NAS may imply a clustered file system when in reality a solution may only be multiple NAS filers, NAS heads, controllers or storage processors configured for availability or failover.

What this means is that an NFS or CIFS file system may only be active on one node at a time; in the event of a failover, the file system shifts from one NAS hardware device (e.g. NAS head or filer) to another. On the other hand, a clustered file system enables an NFS, CIFS or other file system to be active on multiple nodes (e.g. NAS heads, controllers, etc.) concurrently. The concurrent access may be for small random reads and writes, for example supporting a popular website or file serving application, or it may be for parallel reads or writes to a large sequential file.
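To make the distinction concrete, here is a minimal Python sketch of the two behaviors. It is a toy model rather than any vendor's implementation: the class names `FailoverNas` and `ClusteredFileSystem`, and the node names `head-a` and `head-b`, are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverNas:
    """Hypothetical active/passive NAS: one node serves the file system at a time."""
    nodes: list[str]
    active: str = field(init=False)

    def __post_init__(self) -> None:
        # The first node starts out as the only one serving the file system.
        self.active = self.nodes[0]

    def serving_nodes(self) -> list[str]:
        return [self.active]

    def fail(self, node: str) -> None:
        self.nodes = [n for n in self.nodes if n != node]
        if not self.nodes:
            raise RuntimeError("no surviving node to take over")
        if self.active == node:
            # Failover: the file system shifts to a surviving node.
            self.active = self.nodes[0]


@dataclass
class ClusteredFileSystem:
    """Hypothetical active/active cluster: every healthy node serves the file system."""
    nodes: list[str]

    def serving_nodes(self) -> list[str]:
        return list(self.nodes)

    def fail(self, node: str) -> None:
        # The remaining nodes simply keep serving.
        self.nodes = [n for n in self.nodes if n != node]


pair = FailoverNas(["head-a", "head-b"])
cluster = ClusteredFileSystem(["head-a", "head-b"])
print(pair.serving_nodes(), cluster.serving_nodes())
pair.fail("head-a")
print(pair.serving_nodes())
```

Both could be sold as "clustered NAS", yet only the second spreads concurrent access to the same file system across nodes.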

Clustered storage is no longer exclusive to the confines of high-performance sequential and parallel scientific computing or ultra large environments. Small files and I/O (read or write), including meta-data information, are also being supported by a new generation of multipurpose, flexible, clustered storage solutions that can be tailored to support different applications workloads.

There are many different types of clustered and bulk storage systems. Clustered storage solutions may be block (iSCSI or Fibre Channel), NAS or file serving, virtual tape library (VTL), or archiving and object- or content-addressable storage. Clustered storage in general is similar to using clustered servers, providing scale beyond the limits of a single traditional system: scale for performance, scale for availability, and scale for capacity. It also enables growth in a modular fashion, adding performance and intelligence capabilities along with capacity.

For smaller environments, clustered storage enables modular pay-as-you-grow capabilities to address specific performance or capacity needs. For larger environments, clustered storage enables growth beyond the limits of a single storage system to meet performance, capacity, or availability needs.

Applications that lend themselves to clustered and bulk storage solutions include:

  • Unstructured data files, including spreadsheets, PDFs, slide decks, and other documents
  • Email systems, including Microsoft Exchange Personal (.PST) files stored on file servers
  • Users’ home directories and online file storage for documents and multimedia
  • Web-based managed service providers for online data storage, backup, and restore
  • Rich media data delivery, hosting, and social networking Internet sites
  • Media and entertainment creation, including animation rendering and post processing
  • High-performance databases such as Oracle with NFS direct I/O
  • Financial services and telecommunications, transportation, logistics, and manufacturing
  • Project-oriented development, simulation, and energy exploration
  • Low-cost, high-performance caching for transient and look-up or reference data
  • Real-time performance including fraud detection and electronic surveillance
  • Life sciences, chemical research, and computer-aided design

Clustered storage solutions go beyond meeting the basic requirements of supporting large sequential parallel or concurrent file access. Clustered storage systems can also support random access of small files for highly concurrent online and other applications. Scalable and flexible clustered file servers that leverage commonly deployed servers, networking, and storage technologies are well suited for new and emerging applications, including bulk storage of online unstructured data, cloud services, and multimedia, where extreme scaling of performance (IOPS or bandwidth), low latency, storage capacity, and flexibility at a low cost are needed.

The bandwidth-intensive and parallel-access performance characteristics associated with clustered storage are generally known; what is not so commonly known is the breakthrough to support small and random IOPS associated with database, email, general-purpose file serving, home directories, and meta-data look-up (Figure 1). Note that a clustered storage system, and in particular, a clustered NAS may or may not include a clustered file system.

Figure 1 – Generic clustered storage model (Courtesy “The Green and Virtual Data Center” (CRC))

More nodes, ports, memory, and disks do not guarantee more performance for applications. Performance depends on how those resources are deployed and how the storage management software enables those resources to avoid bottlenecks. For some clustered NAS and storage systems, more nodes are required to compensate for overhead or performance congestion when processing diverse application workloads. Other things to consider include support for industry-standard interfaces, protocols, and technologies.
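A rough way to see why more nodes do not automatically mean more performance is a back-of-the-envelope model. The sketch below is an assumption-laden illustration, not a benchmark: the function `aggregate_iops`, the per-peer overhead fraction and the meta-data cap are all hypothetical, and the numbers are made up.

```python
from typing import Optional


def aggregate_iops(
    nodes: int,
    per_node_iops: float,
    overhead_per_peer: float = 0.0,
    metadata_cap: Optional[float] = None,
) -> float:
    """Estimate cluster IOPS under a deliberately simple, hypothetical model."""
    # Each node loses a fixed fraction of its throughput for every peer it
    # must coordinate with (locking, cache coherency, and so on).
    efficiency = max(0.0, 1.0 - overhead_per_peer * (nodes - 1))
    total = nodes * per_node_iops * efficiency
    # If every request funnels through one node (for example for meta-data
    # look-up), that node's limit caps the whole cluster.
    if metadata_cap is not None:
        total = min(total, metadata_cap)
    return total


# Made-up inputs: 10,000 IOPS per node, 2% overhead per peer,
# and a single meta-data node that tops out at 60,000 IOPS.
for n in (4, 8, 16):
    print(n, round(aggregate_iops(n, 10_000, overhead_per_peer=0.02, metadata_cap=60_000)))
```

With these made-up inputs, going from 8 to 16 nodes adds nothing because the single funnel point is already saturated, which is the kind of question worth asking a vendor.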

Scalable and flexible clustered file server and storage systems provide the potential to leverage the inherent processing capabilities of constantly improving underlying hardware platforms. For example, software-based clustered storage systems that do not rely on proprietary hardware can be deployed on industry-standard high-density servers and blade centers and utilize third-party internal or external storage.

Clustered storage is no longer exclusive to niche applications or scientific and high-performance computing environments. Organizations of all sizes can benefit from ultra scalable, flexible, clustered NAS storage that supports application performance needs from small random I/O to meta-data lookup and large-stream sequential I/O that scales with stability to grow with business and application needs.

Additional considerations for clustered NAS storage solutions include the following.

  • Can memory, processors, and I/O devices be varied to meet application needs?
  • Is there support for large file systems supporting many small files as well as large files?
  • What is the performance for small random IOPS and bandwidth for large sequential I/O?
  • How is performance enabled across different applications in the same cluster instance?
  • Are I/O requests, including meta-data look-up, funneled through a single node?
  • How does a solution scale as the number of nodes and storage devices is increased?
  • How disruptive and time-consuming is adding new or replacing existing storage?
  • Is proprietary hardware needed, or can industry-standard servers and storage be used?
  • What data management features, including load balancing and data protection, exist?
  • What storage interface can be used: SAS, SATA, iSCSI, or Fibre Channel?
  • What types of storage devices are supported: SSD, SAS, Fibre Channel, or SATA disks?

As with most storage systems, it is not the total number of hard disk drives (HDDs), the quantity and speed of tiered-access I/O connectivity, the types and speeds of the processors, or even the amount of cache memory that determines performance. The performance differentiator is how a manufacturer combines the various components to create a solution that delivers a given level of performance with lower power consumption.

To avoid performance surprises, be leery of performance claims based solely on speed and quantity of HDDs or the speed and number of ports, processors and memory. How the resources are deployed and how the storage management software enables those resources to avoid bottlenecks are more important. For some clustered NAS and storage systems, more nodes are required to compensate for overhead or performance congestion.

Learn more about clustered storage (block, file, VTL/dedupe, archive), clustered NAS, clustered file system, grids and cloud storage among other topics in the following links:

"The Many faces of NAS – Which is appropriate for you?"

Article: Clarifying Storage Cluster Confusion
Presentation: Clustered Storage: “From SMB, to Scientific, to File Serving, to Commercial, Social Networking and Web 2.0”
Video Interview: How to Scale Data Storage Systems with Clustering
Guidelines for controlling clustering
The benefits of clustered storage

Along with other material on the StorageIO Tips and Tools or portfolio archive or events pages.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

Why XIV is so important to IBM’s storage business – It’s Not About the Technology or Product!

Storage I/O trends

Ok, so I know I’m not taking a popular stance on this one with either camp. The IBMers and their faithful followers, as well as the growing legion of XIV followers, will take exception, I’m sure.

Likewise, the naysayers would ask why not take a real swing and knock the ball out of the park as if it were baseball batting practice. No, I’m going a different route, as either of those approaches would be too easy and both have been pretty well addressed already.

The XIV product that IBM acquired back in January 2008 is getting a lot of buzz (some good, some not so good) lately in the media and blogosphere (here and here, which in turn lead to many others) as well as in various industry and customer discussions.

How ironic that the 2008 version of storage in an election year in the U.S. pits the IBM and XIV faithful in one camp against the naysayers and competition in the other camps. To hear both camps go at it with points, counterpoints, mud-slinging and lipstick slurs should be no surprise when it comes to vendors’ points and counterpoints. In fact, the only thing missing from some of the discussions, or excuse me, debates, is an impromptu on-stage appearance by Senators Biden, Clinton, McCain or Obama or Governor Palin to weigh in on the issues. After all, it is the 2008 edition of storage in an election year here in the United States.

Rather than jump on the XIV-bashing bandwagon, which about everyone in the industry is now doing except for the proponents or the folks taking a step back to look at the bigger non-partisan picture, I’m taking a different view. Those stepping back include Steve Duplessie, the genesis billionaire founder of ESG and probably the future owner of the New England Patriots (American) football team, whose valuation may have dipped enough for Steve to buy now that their star quarterback Tom Brady is out with a leg injury that will take longer to rebuild than all the RAID 6 configured 1 TByte SATA disk drives in the test labs of 3PAR, Dell, EMC, HGST, HP, IBM, NetApp, Seagate, Sun and Western Digital, as well as many other vendors, combined. As for the proponents or faithful, in the spirit of providing freedom of choice and flexible options, the Kool-Aid comes in both XIV orange as well as traditional IBM XIV blue, nuff said.

In my opinion, which is just that, an opinion, XIV is going to help IBM’s storage business, and may have already done so, not because of its technical architecture, its product capabilities, or even the number of units that IBM might eventually sell bundled or un-bundled. Rather, XIV is getting IBM exposure and coverage, letting IBMers sit at the table with some re-invigorated spirit to tell the customer what IBM is doing. If they pay attention, in-between slide decks they can grasp the orders for upgrades, expansion or new installs for the existing IBM storage product line, continue with their pitch until the customer asks to place another upgrade or expansion order, quickly grab that order, and then continue with the presentation while touching lightly on the products IBM customers continue to buy and are looking to upgrade, including:

IBM disk
IBM tape – tape and virtual tape
DS8000 – Mainframe and open systems storage
DS5000 – New version of DS4000 to compete with new EMC CLARiiON CX4s
DS4000 – aka the array formerly known as the FAStT
DS3000 – Entry level iSCSI, SAS and FC storage
NetApp based N-Series – For NAS windows CIFS and NFS file sharing
DR550 archiving solution
SAN Volume Controller (SVC)

Not to mention other niche products such as the DataDirect Networks (DDN) based DCS9550, the IBM developed DS6000, or the recently acquired Diligent VTL and de-duping software.

IBM will be successful with XIV not by how many systems they sell or give away, oh, excuse me, add value to other solutions. IBM should instead gauge XIV success by increased sales of its other storage systems and associated software and networking technologies, including the mainframe attachable DS8000 and the new high performance midrange DS5000 that builds on the success of the DS4000. All of these should have both Brocade and Cisco salivating, given their need for more 4GFC and 8GFC Fibre Channel (and FICON for DS8000) ports, switches, adapters and directors. Then there is the NetApp based N-Series for NAS and file serving to support unstructured data including Web and social networking.

If I were Brocade, Cisco, NetApp or any of the other many IBM suppliers, I would certainly be putting solution bundles together to ride the XIV wave, but I would also have bundles ready to play to the collateral impact of all the other IBM storage products getting coverage. For example, sure, Brocade and Cisco will want to talk about more Fibre Channel and iSCSI switch ports for the XIV; however, they should also talk performance to unleash the capabilities of the DS8000 and DS5000, or file management tools for the N-Series, as well as bundles around the DR550 archiving solution.

There is the N-Series NAS gateway that could, in theory, be used to dress up XIV and actually make it usable for NAS file serving, file sharing and Web 2.0 related applications or unstructured data. There is the IBM SAN Volume Controller (SVC) that virtualizes almost everything except the kitchen sink, which may be in a future release. There is the DR550 archiving and compliance platform that not only provides RAID 6 protected energy-efficient storage, it also supports movement of data to tape. Now if only IBM could get the story out on that solution; maybe in the course of talking about XIV, the IBM DR550 might get discovered as well. Of course there are all the other backup, archiving, data protection management and associated tools that will get pick-up and traction as well.

You see, even if IBM quadruples the XIV footprint of revenue installed in production systems, with 400% growth rates year over year, never mind the naysayers, that would only be about 1/20th or 1/50th of what Dell/EqualLogic, LeftHand via HP/Intel or even IBM xSeries, not to mention all the others using IBRIX, HP/PolyServe, Isilon, 3PAR, Panasas, Permabit, NEC and similar clustered solutions (and the list goes on), have already done.

The point is to watch for an uptick, even if only 10%, on the installed DS8000, DS5000 (new), DS4000, DS3000, N-Series (NetApp), DR550 (the archive appliance IBM should talk more about), SVC or the TS series VTLs.

Even a 1% jump due to IBM folks getting out in front of customers and business partners adds up. A 10% jump on the installed base of somewhere around 40,000 DS8000 (and earlier ESS versions) systems is 4,000 new systems. On the combined DS5000/DS4000/DS3000 line, formerly known as FAStT, with a combined footprint of over 100,000 systems in the field, 10% would be 10,000 new systems. Take the SVC, with about 3,000 instances (or about 11,000 clustered nodes): 10% would mean another 300 new instances. Continue this sort of improvement across the rest of the line and IBM will have paid for not only XIV but also the retirement fund of Moshe (former EMCer, founder of XIV and now IBM fellow).
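For anyone who wants to play with the numbers, the arithmetic above can be sketched in a few lines of Python. The installed-base figures are the rough estimates quoted in this post, not official IBM counts, and the helper name `additional_systems` is made up for illustration.

```python
# Rough installed-base estimates quoted in the post (not official figures).
installed_base = {
    "DS8000 (and earlier ESS versions)": 40_000,
    "DS5000/DS4000/DS3000": 100_000,
    "SVC instances": 3_000,
}


def additional_systems(base: int, uplift_percent: int) -> int:
    """Return how many new systems a given percentage uplift represents."""
    return base * uplift_percent // 100


for product, base in installed_base.items():
    one = additional_systems(base, 1)
    ten = additional_systems(base, 10)
    print(f"{product}: 1% = {one:,} systems, 10% = {ten:,} systems")
```

Swap in your own estimates for the installed base or the uplift percentage to see how quickly a small bump across a large footprint outweighs a big percentage gain on a small one.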

IBM may be laughing all the way to the big blue bank, even after having enough money to finally buy a clustered NAS file system for Web 2.0 and bulk storage such as IBRIX before someone else like Dell, EMC or HP gets their hands on it, all while everyone else continues to bash how badly XIV is performing. Whether this is a by-design strategy or one that IBM simply falls into, it could be brilliant if played out and well executed; however, only time will tell.

If those who want to rip on XIV really want to inflict damage, they should cease and ignore XIV for what it is or is not and find something else to talk about. Rest assured, if there are other good stories, they will get covered and XIV will be ignored.

Instead of ripping on XIV, or listening to more XIV hype, I’m going fishing and maybe will come back with a fish story to rival the XIV hype. In the meantime, I look forward to seeing IBM’s success for their storage business as a whole, due to the opportunity for IBMers and their partners to get excited to go and talk about storage and be surprised by their customers giving them orders for other IBM products. That is, unless the IBM revenue prevention department gets in the way, for example if IBMers or their partners, in the excitement of the XIV moment, forget to sell customers what they want and will buy today, and fail to grab the low hanging fruit (sales orders for upgrades and new sales) of current and recently enhanced products while trying to reprogram and re-condition customers to the XIV story.

Congratulations to IBM and their partners as well as OEM suppliers if they can collectively pull the ruse off and actually stimulate total storage sales while XIV becomes a decoy and maybe even gets a few more installs and some revenue to help prop it up as a decoy.

Ok, nuff said.

Cheers gs


Hot Storage Topics Converge on Chicago Next Week


Next week in Chicago (May 12th) at Storage Strategies, the event for channel professionals held the evening before StorageDecisions, I will be talking about Hot Storage Topics for 2008. These include data protection for virtual environments; power, cooling, floor space and environmental (PCFE), aka green, items and the “Green Gap”; and data footprint reduction using real-time data compression for on-line active and changing data, archiving for in-active or dormant data, and de-dupe for backup data. Also on the list of hot topics will be clustered NAS and clustered storage for Web 2.0 along with other timely and relevant items.

At the StorageDecisions event, I will be talking about “Green and Environmental Friendly Storage” Tuesday morning May 13th in the presentation “Practical Ways to Achieve Energy Efficiency – Power, Cooling, Floor-Space and Environmental (PCFE) Issues and Trends”, looking at different issues including the “Green Gap” or disconnect between messaging and common IT data center issues along with various options to boost efficiency for both active and in-active data and storage resources.

Also while at StorageDecisions next week, on Wednesday the 14th I will be talking about clustered storage including clustered NAS in the session “Clustered Storage – From SMB, to Scientific, to File Serving, to Commercial, Social Networking and Web 2.0”. Given some recent vendor technology announcements and statements of direction, Web 2.0 and unstructured data are gaining popularity, as are the confusing options or different types of clustered storage solutions including “Cluster Wanna Bee’s”. If you are in Chicago next week, stop in and check out the event and if you can attend any of my sessions, stop by and say hello.

Cheers
GS