EMC Storage and Management Software Getting FAST

EMC has announced the availability of the first phase of FAST (Fully Automated Storage Tiering) functionality for its Symmetrix VMAX, CLARiiON and Celerra storage systems.

FAST was first previewed earlier this year (see here and here).

Key themes of FAST are leveraging policies to enable automation in large scale environments, doing more with what you have, enabling virtual data centers for traditional, private and public clouds, and enhancing IT economics.

This means enabling performance and capacity planning analysis along with facilitating load balancing or other infrastructure optimization activities to boost productivity, efficiency and resource usage effectiveness, not to mention enabling Green IT.

Is FAST revolutionary? That will depend on who you talk or listen to.

Some vendors will jump up and down, similar to Donkey in Shrek wanting to be picked or noticed, claiming to have been the first to implement LUN or file movement inside of storage systems, or built into an operating system, file system or volume manager. Others will claim to have done it via third party information lifecycle management (ILM) software, including hierarchical storage management (HSM) tools among others. Ok, fair enough, then let their games begin (or continue) and I will leave it up to the various vendors and their followings to debate who's got what or not.

BTW, anyone remember system managed storage on IBM mainframes or array based movement in HP AutoRAID among others?

Vendors have also in the past provided built in or third party add on tools for insight and awareness, ranging from capacity or space usage and allocation storage resource management (SRM) tools to performance advisory activity monitors or charge back among others. For example, hot file analysis and reporting tools have been popular in the past, often operating system specific, for identifying candidate files for placement on SSD or other fast storage. Granted the tools provided insight and awareness, there was still the time consuming and error prone task of decision making and subsequent data movement, not to mention associated down time.

What is new here with FAST is the integrated approach: tools that are operating system independent, functionality in the array, availability across different product families and price bands, and optimization for improving user and IT productivity in medium to high-end enterprise scale environments.

One of the knocks on previous technology has been either the performance impact to an application when its data is moved, or the impact to other applications when data is being moved in the background. Another issue has been avoiding excessive thrashing, where data movement takes performance cycles away from production applications. This is similar to having too many snapshots or unoptimized RAID rebuilds running in the background on a storage system lacking sufficient performance capability. Another knock has been that historically, either 3rd party host or appliance based software was needed, or solutions were designed and targeted for workgroup, departmental or small environments.

What is FAST and how is it implemented?

FAST is technology for moving data within storage systems (and externally for Celerra) for load balancing, capacity and performance optimization to meet quality of service (QoS) performance, availability and capacity along with energy and economic initiatives (figure 1) across different tiers or types of storage devices. For example, moving data from slower SATA disks where a performance bottleneck exists to faster Fibre Channel or SSD devices. Similarly, cold or infrequently accessed data on faster, more expensive storage devices can be marked as a candidate for migration to lower cost SATA devices based on customer policies.

Figure 1: FAST big picture (Source: EMC)

The premise is that policies are defined based on activity along with capacity to determine when data becomes a candidate for movement. All movement is performed in the background while applications continue to access data without disruption. This means that there are no stub files, application pauses, timeouts or erratic I/O activity while data is being migrated. Another aspect of FAST data movement, which is performed in the actual storage systems by their respective controllers, is the ability for EMC management tools to identify hot or active LUNs or volumes (files in the case of Celerra) as candidates for moving (figure 2).

Figure 2: FAST what it does (Source: EMC)
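
To make the policy idea concrete, here is a minimal Python sketch of how activity based thresholds could flag LUNs as candidates for movement. This is not EMC code or the FAST algorithm. The Lun class, the tier names, the IOPS metric and the HOT_IOPS and COLD_IOPS thresholds are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Lun:
    name: str
    tier: str    # hypothetical tier labels: "ssd", "fc" or "sata"
    iops: float  # hypothetical metric: recent average I/Os per second


# Hypothetical policy thresholds; in practice an administrator would set these.
HOT_IOPS = 500.0
COLD_IOPS = 20.0


def find_candidates(luns):
    """Return (lun_name, from_tier, to_tier) recommendations based on activity."""
    moves = []
    for lun in luns:
        if lun.tier == "sata" and lun.iops >= HOT_IOPS:
            # Busy data on slow disks is a candidate for promotion.
            moves.append((lun.name, "sata", "fc"))
        elif lun.tier in ("ssd", "fc") and lun.iops <= COLD_IOPS:
            # Idle data on fast, expensive disks is a candidate for demotion.
            moves.append((lun.name, lun.tier, "sata"))
    return moves


luns = [
    Lun("db_log", "sata", 900.0),
    Lun("archive", "fc", 5.0),
    Lun("mail", "fc", 200.0),
]
recommendations = find_candidates(luns)
for name, src, dst in recommendations:
    print(f"{name}: {src} -> {dst}")
```

In this sketch only the busy SATA LUN and the idle Fibre Channel LUN are flagged, while the moderately active one stays where it is. A real policy would also weigh capacity and the cost of the move itself.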

However, users specify whether they want data moved automatically or under supervision, enabling a deterministic environment where the storage system and associated management tools make recommendations for administrators to approve before migration occurs. This capability can be a safeguard as well as a learn mode, enabling organizations to become comfortable with the technology and its recommendations while applying knowledge of current business dynamics (figure 3).

Figure 3: The value proposition of FAST (Source: EMC)
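
The difference between letting data move on its own and moving it under supervision can be sketched in a few lines of Python. Again this is only an illustration under assumptions, not how the EMC tools are implemented. The plan_moves function, the mode names and the approve callback are hypothetical.

```python
def plan_moves(recommendations, mode, approve=None):
    """Decide which recommended moves are actually scheduled.

    mode "automatic": every recommendation is scheduled.
    mode "supervised": only recommendations an administrator approves are
    scheduled; with no reviewer supplied, nothing moves.
    """
    if mode == "automatic":
        return list(recommendations)
    if mode == "supervised":
        if approve is None:
            return []
        return [rec for rec in recommendations if approve(rec)]
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical recommendations as (lun_name, from_tier, to_tier) tuples.
recs = [("db_log", "sata", "fc"), ("archive", "fc", "sata")]


def admin_review(rec):
    # Example business rule: approve promotions, hold back demotions for now.
    return rec[2] != "sata"


automatic = plan_moves(recs, "automatic")
supervised = plan_moves(recs, "supervised", approve=admin_review)
print(automatic)
print(supervised)
```

The supervised path is where knowledge of current business dynamics comes in: the reviewer can hold back a move that looks right on paper but is badly timed.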

FAST is implemented as technology resident or embedded in the EMC VMAX (aka Symmetrix), CLARiiON and Celerra along with external management software tools. In the case of the block storage systems (figure 4), including the DMX/VMAX and CLARiiON families of products that support FAST, data movement is on a LUN or volume basis and within a single storage system. For NAS or file based Celerra storage systems, FAST is implemented using FMA technology, enabling movement on a file basis either in the box or externally to other storage systems.

Figure 4: Example of FAST activity (Source: EMC)

What this means is that data at the LUN or volume level can be moved across different tiers of storage or disk drives within a CLARiiON instance, or within a VMAX instance (e.g. amongst the nodes). For example, Virtual LUNs are a building block that is leveraged for data movement and migration, combined with external management tools including Navisphere for the CLARiiON and Symmetrix Management Console along with Ionix, all of which have been enhanced.

Note however that initially data is not moved externally between different CLARiiONs or VMAX systems. For external data movement, other existing EMC tools would be deployed. In the case of Celerra, files can be moved within a specific CLARiiON as well as externally across other storage systems. External storage systems that files can be moved across using EMC FMA technology include other Celerras, Centera and Atmos solutions based upon defined policies.

What do I like most and why?

Integration of management tools providing insight, with the ability for users to set up policies as well as approve or intercede with data movement and placement as their specific philosophies dictate. This is key: for those who want to, let the system manage itself, with your supervision of course. For those who prefer to take their time, take simple steps by using the solution initially for insight into hot or cold spots and then for help deciding what changes to make. Use the solution and adapt it to your specific environment and philosophy. What a concept, a tool that works for you vs. you working for it.

What don't I like and why?

There is and will remain some confusion about intra and inter box or system data movement and migration, operations that can be done by other EMC technology today for those who need it. For example, I have had questions asking if FAST is nothing more than EMC Invista or some other data mover appliance sitting in front of Symmetrix or CLARiiONs, and the answer is NO. Thus EMC will need to articulate that FAST is both an umbrella term and a product feature set combining the storage system with associated management tools unique to each of the different storage systems. In addition, there will be confusion, at least at GA, about the lack of support for Symmetrix DMX vs. the supported VMAX. Of course with EMC pricing is always a question, so let's see how this plays out in the market with customer acceptance.

What about the others?

Certainly some will jump up and down claiming ratification of their visions and welcoming EMC to the game, while forgetting that there were others before them. However, it can also be said that EMC, like others who have had LUN and volume movement or cloning capabilities for large scale solutions, is taking the next step. Thus I would expect other vendors to continue moving in the same direction with their own unique spin and approach. For others who have in the past made automated tiering their marketing differentiation, I would suggest they come up with some new spins and stories, as those functions are about to become table stakes or common feature functionality going forward.

When and where to use?

In theory, anyone with a Symmetrix/VMAX, CLARiiON or Celerra that supports the new functionality should be a candidate for the capabilities, that is, at least the insight, analysis, monitoring and situational awareness capabilities. Note that this does not mean actually enabling the automated movement initially.

While the concept is to enable automated system managed storage (Hmmm, mainframe deja vu anyone), for those who want to walk before they run, enabling the insight and awareness capabilities can provide valuable information about how resources are being used. The next step would then be to look at the recommendations of the tools and, if you concur with them, take remedial action by telling the system when the movement can occur.

For those ready to run, then let it rip and take off as FAST as you want. In either situation, look at FAST for providing insight and situational awareness of hot and cold storage and where opportunities exist for optimizing and gaining efficiency in how resources are used, all important aspects for enabling a Green and Virtual Data Center, not to mention supporting public and private clouds.

FYI, FTC Disclosure and FWIW

I have done content related projects for EMC in the past (see here). They are not currently a client, nor have they sponsored, underwritten, influenced, remunerated, utilized third party offshore Swiss, Cayman or South American unnumbered bank accounts, or provided any other reimbursement for this post. However, I did personally sign and hand to Joe Tucci a copy of my book The Green and Virtual Data Center (CRC) ;).

Bottom line

Do I like what EMC is doing with FAST and this approach? Yes.

Do I think there is room for improvement and additional enhancements? Absolutely!

What's my recommendation? Have a look, do your homework and due diligence, and see if it's applicable to your environment while asking other vendors what they will be doing (under NDA if needed).

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2026 Server StorageIO and UnlimitedIO LLC All Rights Reserved

Clarifying Clustered Storage Confusion

Clustered storage can be block based (iSCSI or Fibre Channel) or file based NAS (NFS, CIFS or a proprietary file system). Clustered storage can also be found in virtual tape libraries (VTL), including dedupe solutions, along with other storage solutions such as those for archiving, cloud, medical or other specialized grids among others.

Recently in the IT and data storage industry, there has been a flurry of merger and acquisition (M&A) (here and here), product enhancement and announcement activity around clustered storage. For example, HP is buying clustered file system vendor IBRIX, complementing its acquisition of another clustered file system vendor (PolyServe) a few years ago and of iSCSI block clustered storage software vendor LeftHand earlier this year. Another recent acquisition is that of LSI buying clustered NAS vendor ONstor, along with Dell buying iSCSI block clustered storage vendor EqualLogic about a year and a half ago, not to mention other vendor acquisitions or announcements involving storage and clustering.

Where the confusion enters into play is the term cluster, which means many things to different people, and even more so when clustered storage is combined with NAS or file based storage. For example, clustered NAS may imply a clustered file system when in reality a solution may only be multiple NAS filers, NAS heads, controllers or storage processors configured for availability or failover.

What this means is that an NFS or CIFS file system may only be active on one node at a time; in the event of a failover, the file system shifts from one NAS hardware device (e.g. NAS head or filer) to another. On the other hand, a clustered file system enables an NFS, CIFS or other file system to be active on multiple nodes (e.g. NAS heads, controllers, etc.) concurrently. The concurrent access may be for small random reads and writes, for example supporting a popular website or file serving application, or it may be for parallel reads or writes to a large sequential file.
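
The distinction can be shown with a small Python sketch. It is a toy model under assumptions, not any vendor's implementation. The active_nodes function, the mode names and the node names are made up for illustration.

```python
def active_nodes(nodes, mode, failed=()):
    """Return the nodes actively serving a single file system.

    mode "failover": one healthy node serves at a time, the rest stand by.
    mode "clustered": every healthy node serves the file system concurrently.
    """
    healthy = [n for n in nodes if n not in failed]
    if mode == "failover":
        return healthy[:1]
    if mode == "clustered":
        return healthy
    raise ValueError(f"unknown mode: {mode}")


heads = ["nas1", "nas2", "nas3"]

# Failover NAS: the file system lives on one head, then shifts when it fails.
before = active_nodes(heads, "failover")
after = active_nodes(heads, "failover", failed=("nas1",))

# Clustered file system: the surviving heads all keep serving it.
clustered = active_nodes(heads, "clustered", failed=("nas1",))

print(before, after, clustered)
```

In both cases the file system survives the loss of a head. Only the clustered case adds concurrent access from several heads to the same file system.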

Clustered storage is no longer exclusive to the confines of high-performance sequential and parallel scientific computing or ultra large environments. Small files and I/O (read or write), including meta-data information, are also being supported by a new generation of multipurpose, flexible, clustered storage solutions that can be tailored to support different application workloads.

There are many different types of clustered and bulk storage systems. Clustered storage solutions may be block (iSCSI or Fibre Channel), NAS or file serving, virtual tape library (VTL), or archiving and object- or content-addressable storage. Clustered storage in general is similar to using clustered servers, providing scale beyond the limits of a single traditional system: scale for performance, scale for availability, and scale for capacity, enabling growth in a modular fashion by adding performance and intelligence capabilities along with capacity.

For smaller environments, clustered storage enables modular pay-as-you-grow capabilities to address specific performance or capacity needs. For larger environments, clustered storage enables growth beyond the limits of a single storage system to meet performance, capacity, or availability needs.

Applications that lend themselves to clustered and bulk storage solutions include:

  • Unstructured data files, including spreadsheets, PDFs, slide decks, and other documents
  • Email systems, including Microsoft Exchange Personal (.PST) files stored on file servers
  • Users’ home directories and online file storage for documents and multimedia
  • Web-based managed service providers for online data storage, backup, and restore
  • Rich media data delivery, hosting, and social networking Internet sites
  • Media and entertainment creation, including animation rendering and post processing
  • High-performance databases such as Oracle with NFS direct I/O
  • Financial services and telecommunications, transportation, logistics, and manufacturing
  • Project-oriented development, simulation, and energy exploration
  • Low-cost, high-performance caching for transient and look-up or reference data
  • Real-time performance including fraud detection and electronic surveillance
  • Life sciences, chemical research, and computer-aided design

Clustered storage solutions go beyond meeting the basic requirements of supporting large sequential parallel or concurrent file access. Clustered storage systems can also support random access of small files for highly concurrent online and other applications. Scalable and flexible clustered file servers that leverage commonly deployed servers, networking, and storage technologies are well suited for new and emerging applications, including bulk storage of online unstructured data, cloud services, and multimedia, where extreme scaling of performance (IOPS or bandwidth), low latency, storage capacity, and flexibility at a low cost are needed.

The bandwidth-intensive and parallel-access performance characteristics associated with clustered storage are generally known; what is not so commonly known is the breakthrough to support small and random IOPS associated with database, email, general-purpose file serving, home directories, and meta-data look-up (Figure 1). Note that a clustered storage system, and in particular, a clustered NAS may or may not include a clustered file system.

Figure 1 – Generic clustered storage model (Courtesy “The Green and Virtual Data Center” (CRC))

More nodes, ports, memory, and disks do not guarantee more performance for applications. Performance depends on how those resources are deployed and how the storage management software enables those resources to avoid bottlenecks. For some clustered NAS and storage systems, more nodes are required to compensate for overhead or performance congestion when processing diverse application workloads. Other things to consider include support for industry-standard interfaces, protocols, and technologies.
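
As a back-of-the-envelope illustration of why more nodes do not guarantee more performance, the Python sketch below applies Amdahl's law, a general scaling formula I am borrowing here rather than anything specific to a given product. It assumes some fraction of the work, for example meta-data look-ups funneled through a single node, cannot be spread across the cluster. The cluster_speedup function and the 10 percent figure are assumptions for illustration only.

```python
def cluster_speedup(nodes, serialized_fraction):
    """Amdahl's law: best-case speedup over one node when a fraction of the
    work is serialized (e.g. funneled through a single node)."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if not 0.0 <= serialized_fraction <= 1.0:
        raise ValueError("serialized_fraction must be between 0 and 1")
    parallel_fraction = 1.0 - serialized_fraction
    return 1.0 / (serialized_fraction + parallel_fraction / nodes)


# Assume 10 percent of the work is serialized through one node.
for n in (1, 4, 16, 64):
    print(f"{n:3d} nodes -> {cluster_speedup(n, 0.10):.2f}x")
```

Under that assumption, 16 nodes deliver roughly 6.4 times the performance of one node rather than 16 times, and no number of nodes can exceed 10 times. How a solution handles that serialized portion matters more than the raw node count.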

Scalable and flexible clustered file server and storage systems provide the potential to leverage the inherent processing capabilities of constantly improving underlying hardware platforms. For example, software-based clustered storage systems that do not rely on proprietary hardware can be deployed on industry-standard high-density servers and blade centers and utilize third-party internal or external storage.

Clustered storage is no longer exclusive to niche applications or scientific and high-performance computing environments. Organizations of all sizes can benefit from ultra scalable, flexible, clustered NAS storage that supports application performance needs from small random I/O to meta-data lookup and large-stream sequential I/O that scales with stability to grow with business and application needs.

Additional considerations for clustered NAS storage solutions include the following.

  • Can memory, processors, and I/O devices be varied to meet application needs?
  • Is there support for large file systems supporting many small files as well as large files?
  • What is the performance for small random IOPS and bandwidth for large sequential I/O?
  • How is performance enabled across different applications in the same cluster instance?
  • Are I/O requests, including meta-data look-up, funneled through a single node?
  • How does a solution scale as the number of nodes and storage devices is increased?
  • How disruptive and time-consuming is adding new or replacing existing storage?
  • Is proprietary hardware needed, or can industry-standard servers and storage be used?
  • What data management features, including load balancing and data protection, exist?
  • What storage interface can be used: SAS, SATA, iSCSI, or Fibre Channel?
  • What types of storage devices are supported: SSD, SAS, Fibre Channel, or SATA disks?

As with most storage systems, it is not the total number of hard disk drives (HDDs), the quantity and speed of tiered-access I/O connectivity, the types and speeds of the processors, or even the amount of cache memory that determines performance. The performance differentiator is how a manufacturer combines the various components to create a solution that delivers a given level of performance with lower power consumption.

To avoid performance surprises, be leery of performance claims based solely on speed and quantity of HDDs or the speed and number of ports, processors and memory. How the resources are deployed and how the storage management software enables those resources to avoid bottlenecks are more important. For some clustered NAS and storage systems, more nodes are required to compensate for overhead or performance congestion.

Learn more about clustered storage (block, file, VTL/dedupe, archive), clustered NAS, clustered file system, grids and cloud storage among other topics in the following links:

"The Many faces of NAS – Which is appropriate for you?"

Article: Clarifying Storage Cluster Confusion
Presentation: Clustered Storage: “From SMB, to Scientific, to File Serving, to Commercial, Social Networking and Web 2.0”
Video Interview: How to Scale Data Storage Systems with Clustering
Guidelines for controlling clustering
The benefits of clustered storage

Along with other material on the StorageIO Tips and Tools or portfolio archive or events pages.

Ok, nuff said.

Cheers gs
