What is DFR or Data Footprint Reduction?

Updated 10/9/2018

Data Footprint Reduction (DFR) is a collection of techniques, technologies, tools and best practices used to address data growth management challenges. Dedupe is currently the industry darling for DFR, particularly in the context of backup or other repetitive data.

However, DFR takes a broader view of expanding data footprints and their impact, covering primary and secondary as well as offline data that ranges from high performance to inactive high capacity.

Consequently, the focus of DFR is not just on reduction ratios; it is also about meeting time or performance rates and data protection windows.
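To make the ratio versus rate point concrete, here is a minimal Python sketch. The function name, its parameters and the sample numbers are all hypothetical and chosen only for illustration. It assumes decimal units and a constant sustained processing rate, and simply checks whether a protection job finishes inside its window and how much capacity ends up stored.

```python
def protection_job(data_tb, reduction_ratio, rate_mb_per_sec, window_hours):
    """Return (hours_needed, stored_tb, fits_window) for one job.

    Assumptions (illustrative only): 1 TB = 1,000,000 MB, a constant
    sustained rate, and a reduction ratio expressed as N:1 (10 means 10:1).
    """
    data_mb = data_tb * 1_000_000
    hours_needed = data_mb / rate_mb_per_sec / 3600
    stored_tb = data_tb / reduction_ratio
    return hours_needed, stored_tb, hours_needed <= window_hours


# Hypothetical comparison over the same 20 TB and the same 8 hour window:
# a high ratio at a slow rate vs. a modest ratio at a fast rate.
slow_high_ratio = protection_job(data_tb=20, reduction_ratio=15,
                                 rate_mb_per_sec=400, window_hours=8)
fast_low_ratio = protection_job(data_tb=20, reduction_ratio=3,
                                rate_mb_per_sec=1000, window_hours=8)

for name, (hours, stored, fits) in (("slow/high ratio", slow_high_ratio),
                                    ("fast/low ratio", fast_low_ratio)):
    print(f"{name}: {hours:.1f} h, {stored:.2f} TB stored, fits window: {fits}")
```

With these made up numbers the higher ratio stores less yet misses the window, while the lower ratio stores more yet finishes in time, which is the trade off a ratio only comparison hides.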

This means DFR is about using the right tool for the task at hand to effectively meet business needs and cost objectives while meeting service requirements across all applications.

Examples of DFR technologies include Archiving, Compression, Dedupe, Data Management and Thin Provisioning among others.
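As a rough illustration of two of these techniques, the following Python sketch applies fixed size chunk dedupe and then general purpose compression to the same sample data. It is a simplified sketch under stated assumptions, not how any particular product works: the helper names (`dedupe_chunks`, `rebuild`), the 4 KB chunk size and the sample data are hypothetical, and the metadata overhead of the chunk index is ignored.

```python
import hashlib
import zlib


def dedupe_chunks(data: bytes, chunk_size: int = 4096):
    """Fixed-size chunk dedupe: keep one copy of each unique chunk.

    Returns (store, recipe): store maps a SHA-256 digest to chunk bytes,
    recipe is the ordered list of digests needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # store a chunk only the first time it is seen
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe) -> bytes:
    """Reassemble the original data from the unique chunks."""
    return b"".join(store[digest] for digest in recipe)


# Hypothetical sample: the same two blocks repeated three times
# (think repetitive backups) plus one unique block.
block_a = b"A" * 4096
block_b = b"B" * 4096
unique = bytes(range(256)) * 16  # 4096 bytes
data = (block_a + block_b) * 3 + unique

store, recipe = dedupe_chunks(data)
logical = len(data)                                   # bytes before reduction
physical = sum(len(chunk) for chunk in store.values())  # unique bytes kept
dedupe_ratio = logical / physical

compressed = zlib.compress(data)  # compression applied to the same data
compression_ratio = logical / len(compressed)

print(f"dedupe: {logical} -> {physical} bytes ({dedupe_ratio:.2f}:1)")
print(f"compression: {logical} -> {len(compressed)} bytes ({compression_ratio:.1f}:1)")
```

Both techniques are lossless here (the data can be rebuilt or decompressed exactly), yet they reduce different kinds of redundancy, which is one reason to treat them as complementary tools instead of competitors.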

Read more about DFR in Part I and Part II of a two part series found here and here.

Where to learn more

Learn more about data footprint reduction (DFR), data footprint overhead and related topics via the following links:

Additional learning experiences along with common questions (and answers), as well as tips, can be found in the Software Defined Data Infrastructure Essentials book.

What this all means

That is all for now, hope you find this ongoing series of current or emerging Industry Trends and Perspectives posts of interest.

Ok, nuff said, for now.

Cheers Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2018. Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

August 2010 StorageIO News Letter


Welcome to the August Summer Wrap Up 2010 edition of the Server and StorageIO Group (StorageIO) newsletter. This follows the June 2010 edition building on the great feedback received from recipients.
Items that are new in this expanded edition include:

  • Out and About Update
  • Industry Trends and Perspectives (ITP)
  • Featured Article

You can access this news letter via various social media venues (some are shown below) in addition to StorageIO web sites and subscriptions. Click on the following links to view the August 2010 edition as HTML or PDF, or go to the newsletter page to view previous editions.

Follow via Google FeedBurner here or via email subscription here.

You can also subscribe to the news letter by simply sending an email to newsletter@storageio.com

Enjoy this edition of the StorageIO newsletter and let me know your comments and feedback.

Cheers gs

Greg Schulz – Author The Green and Virtual Data Center (CRC) and Resilient Storage Networks (Elsevier)
twitter @storageio

My Favorite Late Summer Reading Material

No it is not the Tape Times, or the Oracle Sun times, or IBM Magic Moments, or EMC Money Magazine, nor is it the Oracle Law Journal review. Sorry to say that it is not the Dedupe Discovery Debate Diaries, nor is it the Virtual Vanity Fair or NetApp Networking News.

My favorite late summer reading is not the eDiscovery Entertainment this week, or Men's Metadata Monthly, and it is not the Cisco Chronicles let alone the HP National Inquirer Pages.

No my favorite late summer reading is not Business Barons, NFL weekly wrap up nor Virtualization Hyperventilation Health tips. Neither is it the editorials, advertisements or cheerleading sections in the Cloud Crowd Confusion Chronicles, nor is it million miler monthly and it is not Green IT Eggs and Spam. While all good reads, it is not Wine Snob Weekly, or the Great Grape Gazette or Beer Brewers News, Minnesota DNR news, Virtual Motor head Monthly, or Freshwater Dock Yachting Yearly review, Aviation Leak and Space Technology nor Rolling Stone.

It is also not one of the local newspapers, or national ones for that matter, although the Singapore Shipping Times is a good diversion read that reminds me of my past visits there.

While I would like to say it is one of the many popular blogs (industry or other), let alone one of the many great books out there in print or kindle, no, it is something completely different.

Granted all of the above or their virtual reality physical variant are in fact great reading material that I enjoy and do recommend (or their reasonable facsimile).

However, there is one that stands out above all others and it is called Cooks Illustrated (FTC disclosure, my wife gave me a subscription).

Is there a point to all of the above which, if you could not tell, includes some tongue in cheek humor, perhaps what some might see as skepticism or snarkiness while others might have a good laugh (to each their own)?

Yes the point is this.

Take a break from your normal wide world of work routine, stop typing or talking for a bit, sit back, maybe put some tunes on and read something to stimulate (as well as relax) the brain for a bit.

Find and enjoy some recreational or diversion reading material no matter how light or heavy, humor or serious, perhaps listen to some music and enjoy a cold (or warm) beverage perhaps even drifting into a drool producing nap. Enjoy the balance of your summer (or winter for friends down under) and take some time to read something to stimulate that gray matter between the ears located slightly behind your eyes.

Ok, now I'm hungry, have to go.

BTW: What is your favorite late summer reading material (and/or relaxation activity, music, food or beverage)?

Cheers gs


Back to school shopping: Dude, Dell Digests 3PAR Disk storage

No sooner has the dust settled from Dell's other recent acquisitions than it's back to school shopping time, and the latest bargain for the Round Rock, Texas folks is Bay Area (San Francisco) storage vendor 3PAR for $1.15B. As a refresher, some of Dell's more recent acquisitions include $1.4B for EqualLogic a few years ago and $3.9B for Perot Systems, not to mention Exanet, Kace and Ocarina earlier this year. For those interested, as of April 2010 reporting figures found here, Dell showed about $10B USD in cash, and here is financial information on publicly held 3PAR (PAR).

Who is 3PAR
3PAR is a publicly traded company (PAR) that makes a scalable or clustered storage system with many built in advanced features typically associated with high end EMC DMX and VMAX as well as CLARiiON, in addition to Hitachi, HP or IBM enterprise class solutions. The InServ (3PAR's storage solution) combines hardware and software, providing a very scalable solution that can be configured for smaller environments or larger enterprises by varying the number of controllers or processing nodes, connectivity (server attachment) ports, cache and disk drives.

Unlike EqualLogic, which is more of a mid market iSCSI only storage system, the 3PAR InServ is capable of going head to head with the EMC CLARiiON as well as DMX or VMAX systems that support a mix of iSCSI and Fibre Channel, or NAS via gateways or appliances. Thus while there were occasional competitive situations between 3PAR and Dell EqualLogic, they were for the most part targeted at different market sectors or customer deployment scenarios.

What does Dell get with 3PAR?

  • A good deal if not a bargain on one of the last new storage startup pure plays
  • A public company that is actually generating revenue with a large and growing installed base
  • A seasoned sales force who knows how to sell into the enterprise storage space against EMC, HP, IBM, Oracle/Sun, NetApp and others
  • A solution that can scale in terms of functionality, connectivity, performance, availability, capacity and energy efficiency (PACE)
  • Potential route to new markets where 3PAR has had success, or to bridge gaps where both have played and competed in the past
  • Did I say a company with an established footprint of installed 3PAR InServ storage systems and a good list of marquee customers?
  • Ability to sell a solution for which they own the intellectual property (IP) instead of that of partner EMC
  • Plenty of IP that can be leveraged within other Dell solutions, not to mention combine 3PAR with other recently acquired technologies or companies.

On a lighter note, Dell once again picks up Marc Farley, who was with them briefly after the EqualLogic acquisition and then departed for 3PAR, where he became director of social media, including the launch of Infosmack on Storage Monkeys with co host Greg Knieriemen (@Knieriemen). Of course the twitter world and traditional coconut wires are now speculating about where Farley will go next, and thus which company Dell may end up buying in the future.

What does this mean for Dell and their data storage portfolio?
While in no way all inclusive or comprehensive, table 1 provides a rough framework of different price bands, categories, tiers and market or application segments requiring various types of storage solutions into which Dell can sell.

 

| Category | HP | Dell | EMC | IBM | Oracle/Sun |
|---|---|---|---|---|---|
| Servers | Blade systems, rack mount, towers to desktop | Blade systems, rack mount, towers to desktop | Virtual servers with VMware, servers via vBlock, servers via Cisco | Blade systems, rack mount, towers to desktop | Blade systems, rack mount, towers to desktop |
| Services | HP managed services, consulting and hosting supplemented by EDS acquisition | Bought Perot Systems (an EDS spin off/out) | Partnered with various organizations and services | Has been doing smaller acquisitions adding tools and capabilities to IBM Global Services | Large internal consulting and services as well as Software as a Service (SaaS) hosting, partnered with others |
| Enterprise storage | XP (FC, iSCSI, FICON for mainframe and NAS with gateway) which is OEMed from Hitachi Japan, parent of HDS | 3PAR (iSCSI and FC or NAS with gateway) replaces EMC CLARiiON or perhaps rare DMX/VMAX at high end? | DMX and VMAX | DS8000 | Sun resold HDS version of XP/USP however Oracle has since dropped it from lineup |
| Data footprint impact reduction | Dedupe on VTL via Sepaton plus HP developed technology or OEMed products | Dedupe in OEM or partner software or hardware solutions, recently acquired Ocarina | Dedupe in Avamar, Data Domain, NetWorker, Celerra, Centera, Atmos. CLARiiON and Celerra compression | Dedupe in various hardware and software solutions, source and target, compression with Storwize | Dedupe via OEM VTLs and other Sun solutions |
| Data preservation | Database and other archive tools, archive storage | OEM solutions from EMC and others | Centera and other solutions | Various hardware and software solutions | Various hardware and software solutions |
| General data protection (excluding logical or physical security and DLP) | Internal Data Protector software plus OEM, partners with other software, various VTL, TL and target solutions as well as services | OEM and resell partner tools as well as Dell target devices and those of partners. Could this be a future acquisition target area? | NetWorker and Avamar software, Data Domain and other targets, DPA management tools and Mozy services | Tivoli suite of software and various hardware targets, management tools and cloud services | Various software and partners tools, tape libraries, VTLs and online storage solutions |
| Scale out, bulk, or clustered NAS | eXtreme scale out, bulk and clustered storage for unstructured data applications | Exanet on Dell servers with shared SAS, iSCSI or FC storage | Celerra and Atmos | IBM SONAS or N series (OEM from NetApp) | ZFS based solutions including 7000 series |
| General purpose NAS | Various gateways for EVA or MSA or XP, HP IBRIX or Polyserve based as well as Microsoft WSS solutions | EMC Celerra, Dell Exanet, Microsoft WSS based. Acquisition or partner target area? | Celerra | N series OEMed from NetApp as well as growing awareness of SONAS | ZFS based solutions. Whatever happened to Procom? |
| Mid market multi protocol block | EVA (FC with iSCSI or NAS gateways), LeftHand (P Series iSCSI) for lower end of this market | 3PAR (FC and iSCSI, NAS with gateway) for mid to upper end of this market, EqualLogic (iSCSI) for the lower end of the market, some residual EMC CX activity phases out over time? | CLARiiON (FC and iSCSI with NAS via gateway), some smaller DMX or VMAX configurations for mid to upper end of this market | DS5000, DS4000 (FC and iSCSI with NAS via a gateway) both OEMed from LSI, XIV and N series (NetApp) | 7000 series (ZFS and Sun storage software running on Sun server with internal storage, optional external storage), 6000 series |
| Scalable SMB iSCSI | LeftHand (P Series) | EqualLogic | Celerra NX, CLARiiON AX/CX | XIV, DS3000, N series | 2000, 7000 |
| Entry level shared block | MSA2000 (iSCSI, FC, SAS) | MD3000 (iSCSI, FC, SAS) | AX (iSCSI, FC) | DS3000 (iSCSI, FC, SAS), N series (iSCSI, FC, NAS) | 2000, 7000 |
| Entry level unified multi function | X (not to be confused with eXtreme series) HP servers with Windows Storage Software | Dell servers with Windows Storage Software or EMC Celerra | Celerra NX, Iomega | xSeries servers with Microsoft or other software installed | ZFS based solutions running on Sun servers |
| Low end SOHO | X (not to be confused with eXtreme series) HP servers with Windows Storage Software | Dell servers with storage and Windows Storage Software. Future acquisition area perhaps? | Iomega | | |

 

 

Table 1: Sampling of various tiers, architectures, functionality and storage solution options

Clarifying some of the above categories in table 1:

Servers: Application servers or computers running Windows, Linux, HyperV, VMware or other applications, operating systems and hypervisors.

Services: Professional and consulting services, installation, break fix repair, call center, hosting, managed services or cloud solutions

Enterprise storage: Large scale (hundreds to thousands of drives, many front end as well as back end ports, multiple controllers or storage processing engines (nodes), large amounts of cache) with equally strong performance, feature rich functionality, resiliency and scalability.

Data footprint impact reduction: Archive, data management, compression, dedupe, thin provision among other techniques. Read more here and here.

Data preservation: Archiving for compliance and non regulatory applications or data including software, hardware, services.

General data protection: Excluding physical or logical data security (firewalls, DLP, etc.), this would be backup/restore with encryption, replication, snapshots, hardware and software to support BC, DR and normal business operations. Read more about data protection options for virtual and physical storage here.

Scale out NAS: Clustered NAS, bulk unstructured storage, cloud storage system or file system. Read more about clustered storage here. HP has their eXtreme X series of scale out and bulk storage systems as well as gateways. These leverage IBRIX and Polyserve which were bought by HP as software, or as a solution (HP servers, storage and software), perhaps with optional data reduction software such as Ocarina OEMed by Dell. Dell now has Exanet which they bought recently as software, or as a solution running on Dell servers, with either SAS, iSCSI or FC back end storage plus optional data footprint reduction software such as Ocarina. IBM has GPFS as a software solution running on IBM or other vendors servers with attached storage, or as a solution such as SONAS with IBM servers running software with IBM DS mid range storage. IBM also OEMs Netapp as the N series.

General purpose NAS: NAS (NFS and CIFS or optional AFP and pNFS) for everyday enterprise (or SME/SMB) file serving and sharing

Mid market multi protocol block: For SMB to SME environments that need scalable shared (SAN) block storage using iSCSI, FC or FCoE

Scalable SMB iSCSI: For SMB to SME environments that need scalable iSCSI storage with feature rich functionality including built in virtualization

Entry level shared block: Block storage with flexibility to support iSCSI, SAS or Fibre Channel with optional NAS support built in or available via a gateway. One example is external SAS RAID storage shared between 2 or more servers configured in a Hyper-V or VMware cluster that does not need or cannot afford the higher cost of iSCSI. Another example would be shared SAS (or iSCSI or Fibre Channel) storage attached to a server running storage software such as a clustered file system (e.g. Exanet) or VTL, dedupe, backup, archiving or data footprint reduction tools, or perhaps database software, where the higher cost or complexity of an iSCSI or Fibre Channel SAN is not needed. Read more about external shared SAS here.

Entry level unified multifunction: This is storage that can do block and file yet is scaled down for ease of acquisition, ease of sale, channel friendliness and simplified deployment and installation while remaining affordable for SMBs or larger SOHOs as well as ROBOs.

Low end SOHO: Storage that can scale down to consumer, prosumer or lower end of SMB (e.g. SOHO) providing mix of block and file, yet priced and positioned below higher price multifunction systems.

Wait a minute, is that too many different categories or types of storage?

Perhaps, however it also enables multiple tools (tiers of technologies) to be in a vendor's tool box, or in an IT professional's tool bin, to address different challenges. Let's come back to this in a few moments.

 

Some Industry trends and perspectives (ITP) thoughts:

How can Dell with 3PAR be an enterprise play without IBM mainframe FICON support?
Some would say forget about it, mainframes are dead and thus not a Dell objective, even though EMC, HDS and IBM sell a ton of storage into those environments. Fair enough, however it is an argument that 3PAR has faced for years while competing with EMC, HDS, HP, IBM and Fujitsu, thus they are versed in how to handle that discussion. The 3PAR teams can therefore help the Dell folks determine where to hunt and farm for business, something that many of the Dell folks already know how to do. After all, today they have to flip the business to EMC or worse.

If truly pressured and in need, Dell could continue reference sales with EMC for DMX and VMAX. Likewise they could also go to Bustech and/or Luminex, who have open systems to mainframe gateways (including VTL support), under a custom or special solution sale. Ironically, EMC has in the past OEMed Bustech to transform their high end storage into mainframe VTLs (not to be confused with Falconstor or Quantum for open systems), and Data Domain partnered with Luminex.

BTW, did you know that Dell has had for several years a group or team that handles specialized storage solutions addressing needs outside the usual product portfolio?

Thus IMHO Dell's enterprise class focus will be on open systems large scale out, where they will compete with EMC DMX and VMAX, HDS USP or its soon to be announced enhancements, HP and their Hitachi Japan OEMed XP, IBM and the DS8000 as well as the seldom heard about yet equally scalable Fujitsu Eternus systems.

 

Why only $1.15B, after all they paid $1.4B for EqualLogic?
IMHO, had this deal occurred a couple of years ago when some valuations were still flying higher than today, with 3PAR at their current sales run rate and customer deployment situations, it is possible the amount would have been higher. Either way, this is still a great value for both Dell and 3PAR investors, customers, employees and partners.

 

Does this mean Dell dumps EMC?
Near term I do not think Dell dumps the EMC dudes (or dudettes) as there is still plenty of business in the mid market for the two companies. However, over time, I would expect that Dell will unleash the 3PAR folks into the space where normally a CLARiiON CX would have been positioned such as deals just above where EqualLogic plays, or where Fibre Channel is preferred. Likewise, I would expect Dell to empower the 3PAR team to go after additional higher end deals where a DMX or VMAX would have been the previous option not to mention where 3PAR has had success.

This would also mean extending into sales against HP EVA and XPs, IBM DS5000 and DS8000 as well as XIV, and Oracle/Sun 6000 and 7000s to name a few. In other words there will be some spin around coopetition, however longer term you can read the writing on the wall. Oh, btw, lest you forget, Dell is first and foremost a server company who is now getting into storage in a much bigger way, and EMC is first and foremost a storage company who is getting into servers via VMware as well as their Cisco partnerships.

Are shots being fired across each other's bows? I will leave that up to you to speculate.

 

Does this mean Dell MD1000/MD3000 iSCSI, SAS and FC disappears?
I do not think so, as they have had a specific role for entry level below where the EqualLogic iSCSI only solution fits, providing mixed iSCSI, SAS and Fibre Channel capabilities to compete with the HP MSA2000 (OEMed from Dot Hill) and IBM DS3000 (OEMed from LSI). While 3PAR could be taken down into some of these markets, doing so would potentially dilute the brand and thus the premium margin of those solutions.

Likewise, there is a play for server vendors to attach shared SAS external storage to small 2 and 4 node clusters for VMware, Hyper-V, Exchange, SQL, SharePoint and other applications where iSCSI or Fibre Channel are too expensive or not needed, or where NAS is not a fit. Another play for shared external SAS is attaching low cost storage to scale out clustered NAS or bulk storage where software such as Exanet runs on a Dell server. Take a closer look at how HP is supporting their scale out, as well as IBM and Oracle among others. Sure, you can find iSCSI or Fibre Channel or even NAS on the back end of file servers. However, there is a growing trend of using shared SAS.

 

Does Dell now have too many different storage systems and solutions in their portfolio?
Possibly, depending upon how you look at it, and certainly the potential is there for revenue prevention teams to get in the way of each other instead of competing with external competitors. However if you compare the Dell lineup with those of EMC, HP, IBM and Oracle/Sun among others, it is not all that different. Note that HP, IBM and Oracle also have something in common with Dell in that they are general IT resource providers (servers, storage, networks, services, hardware and software) as compared to other traditional storage vendors.

Consequently if you look at these vendors in terms of their different markets from consumer to prosumer to SOHO at the low end of the SMB to SME that sits between SMB and enterprise, they have diverse customer needs. Likewise, if you look at these vendors server offerings, they too are diverse ranging from desktops to floor standing towers to racks, high density racks and blade servers that also need various tiers, architectures, price bands and purposed storage functionality.

 

What will be key for Dell to make this all work?
The key for Dell will be similar to that of their competitors which is to clearly communicate the value proposition of the various products or solutions, where, who and what their target markets are and then execute on those plans. There will be overlap and conflict despite the best spin as is always the case with diverse portfolios by vendors.

However, Dell needs to keep their teams focused on expanding their customer footprints at the expense of their external competition vs. cannibalizing their own internal product lines, not to mention creating or extending into new markets or applications. Dell now has many tools in their tool box and thus needs to educate their solution teams on what to use or sell when, where, why and how instead of just having one tool or a singular focus. In other words, while EqualLogic is a great solution, Dell no longer has to respond that the solution to everything is iSCSI based EqualLogic.

Likewise Dell can leverage the same emotion and momentum behind the EqualLogic teams to invigorate and unleash the best of the 3PAR teams and solution into or onto the higher end of the SMB, SME and enterprise environments.

I'm still thinking that Exanet is a diamond in the rough for Dell, where they can install the clustered scalable NAS software onto their servers and use either lower end shared SAS RAID (e.g. MD3000), iSCSI (MD3000, EqualLogic or 3PAR) or higher end Fibre Channel (with 3PAR) for scale out, cloud and other bulk solutions competing with HP, Oracle and IBM. Dell still has the Windows based storage server for entry level multi protocol block and file capabilities as well as what they OEM from EMC.

 

Is Dell done shopping?
IMHO I do not think so as there are still areas where Dell can extend their portfolio and not just in storage. Likewise there are still some opportunities or perhaps bargains out there for fall and beyond acquisitions.

 

Does this mean that Dell is not happy with EqualLogic and iSCSI?
Simply put, from my perspective talking with Dell customers, prospects and partners and seeing them all in action, nothing could be further from the truth than Dell not being happy with iSCSI or EqualLogic. Look at this as a way to extend the Dell story and capabilities into new markets, granted the EqualLogic folks now have a new sibling to compete with for internal marketing and management love and attention.

 

Isn't Dell just an iSCSI focused company?
A couple of years ago I was quoted in one of the financial analysis reports as saying that Dell needed to remain open to various forms of storage instead of becoming singularly focused on just iSCSI as a result of the EqualLogic deal. I stand by that statement in that Dell, to be a strong enterprise contender, needs to have a balanced portfolio across different price or market bands, from block to file, from shared SAS to iSCSI to Fibre Channel and emerging FCoE.

This also means supporting traditional NAS across those different price bands or market sectors as well as support for emerging and fast growing unstructured data markets where there is a need for scale out and bulk storage. Thus it is great to see Dell remaining open minded and not becoming singularly focused on just iSCSI, instead providing the right solution to meet their diverse customer as well as prospect needs or opportunities.

While EqualLogic was and is a very successful iSCSI focused storage solution, not to mention one that Dell continues to leverage, Dell is more than just iSCSI. Take a look at Dell's current storage line up as well as table 1 above and there is a lot of existing diversity. Granted some of that current diversity is via partners, which the 3PAR deal helps to address. What this means is that iSCSI continues to grow in popularity, however there are other needs where shared SAS or Fibre Channel or FCoE will be needed, opening new markets to Dell.

 

Bottom line and wrap up (for now)
This is a great move for Dell (as well as 3PAR) to move up market in the storage space with less reliance on EMC. Assuming that Dell can communicate the what to use when, where, why and how to their internal teams and partners as well as industry and customers, not to mention then execute on it, they should have themselves a winner.

Will this deal end up being an even better bargain than when Dell paid $1.4B for EqualLogic?

Not sure yet, it certainly has potential if Dell can execute on their plans without losing momentum in any of their other areas (products).

What's your take?

Cheers gs


Here are some related links to read more

Data footprint reduction (Part 1): Life beyond dedupe and changing data lifecycles

Over the past couple of weeks there has been a flurry of IT industry activity around data footprint impact reduction with Dell buying Ocarina and IBM acquiring Storwize. For those who want the quick (compacted, reduced) synopsis of what Dell buying Ocarina as well as IBM acquiring Storwize means read this post here along with some of my comments here and here.

Now, before any Drs or Divas of Dedupe get concerned and feel the need to debate dedupe's expanding role, success or applicability, relax, take a deep breath, then read on and take another breath before responding if so inclined.

The reason I mention this is that some may mistake this as a piece against or not in favor of dedupe, as it talks about life beyond dedupe, which could be mistaken as indicating a diminished role for dedupe. That is not the case (read ahead and see figure 5 to see the bigger picture).

Likewise, since this piece talks about archiving for compliance and non regulatory situations along with compression, data management and other forms of data footprint reduction, some may feel compelled to defend dedupe's honor and future role.

Again, relax, take a deep breath and read on, this is not about the death of dedupe.

Now for others, you might wonder why the dedupe tongue in cheek humor mentioned above (which is what it is), and the answer is quite simple. The industry in general is drunk on dedupe, in some cases having numbed its senses, not to mention blurred its vision of the even bigger opportunities for the business benefits of data footprint reduction beyond today's backup centric or VMware server virtualization dedupe discussions.

Likewise, it is time for the industry to wake (or sober) up and, instead of trying to stuff everything under or into the narrowly focused dedupe bottle, realize that there is a broader umbrella called data footprint impact reduction which includes, among other techniques, dedupe, archive, compression, data management, data deletion and thin provisioning across all types of data and applications. What this means is a broader opportunity or market than what exists or is being discussed today, leveraging different techniques, technologies and best practices.

Consequently this piece is about expanding the discussion to the larger opportunity for vendors or VARs to extend their focus to the bigger world of overall data footprint impact reduction beyond where they are currently focused. Likewise, this is about IT customers realizing that there are more opportunities to address data and storage optimization across the entire organization using various techniques instead of just focusing on backup.

In other words, there is a very bright future for dedupe as well as other techniques and technologies that fall under the data footprint reduction umbrella, including for data stored online, offline, near line, primary, secondary, tertiary, virtual and in a public or private cloud.

Before going further however, let's take a step back and look at some business along with IT issues, challenges and opportunities.

What is the business and IT issue or challenge?
Given that there is no such thing as a data or information recession, as shown in figure 1, IT organizations of all sizes are faced with the constant demand to store more data, including multiple copies of the same or similar data, for longer periods of time.


Figure 1: IT resource demand growth continues

The result is an expanding data footprint and increased IT expenses, both capital and operational, due to the additional Infrastructure Resource Management (IRM) activities needed to sustain given levels of application Quality of Service (QoS) delivery, as shown in figure 2.

Some common IT costs associated with supporting an increased data footprint include among others:

  • Data storage hardware and management software tools acquisition
  • Associated networking or IO connectivity hardware, software and services
  • Recurring maintenance and software renewal fees
  • Facilities fees for floor space, power and cooling along with IT staffing
  • Physical and logical security for data and IT resources
  • Data protection for HA, BC or DR including backup, replication and archiving


Figure 2: IT Resources and cost balancing conflicts and opportunities

Figure 2 shows the result is that IT organizations of all sizes are faced with having to do more with what they have, or with less, including maximizing available resources. In addition, IT organizations often have to overcome common footprint constraints (available power, cooling, floor space, server, storage and networking resources, management, budgets, and IT staffing) while supporting business growth.

Figure 2 also shows that to support demand, more resources are needed (real or virtual) in a denser footprint, while maintaining or enhancing QoS plus lowering per unit resource cost. The trick is improving on available resources while maintaining QoS in a cost effective manner. By comparison, traditionally if costs are reduced, one of the other curves (amount of resources or QoS) is often negatively impacted and vice versa. Meanwhile in other situations the result can be moving problems around that later resurface elsewhere. Instead, find, identify and diagnose the ailment, then prescribe the applicable treatment: a form of data footprint reduction or another IT IRM technology, technique or best practice to cure it.

What is driving the expanding data footprint?
Granted more data can be stored in the same or smaller physical footprint than in the past, thus requiring less power and cooling per Gbyte, Tbyte or PByte. Data growth rates necessary to sustain business activity, enhanced IT service delivery and enable new applications are placing continued demands to move, protect, preserve, store and serve data for longer periods of time.

The popularity of rich media and Internet based applications has resulted in explosive growth of unstructured file data requiring new and more scalable storage solutions. Unstructured data includes spreadsheets, PowerPoint slide decks, Adobe PDF and Word documents, web pages, and image, audio and video files such as JPEG, MP3 and MP4. This trend towards increasing data storage requirements does not appear to be slowing anytime soon for organizations of all sizes.

After all, there is no such thing as a data or information recession!

Changing data access lifecycles
Many strategies or marketing stories are built around the premise that shortly after data is created it is seldom, if ever, accessed again. The traditional transactional model lends itself to what has become known as information lifecycle management (ILM) where data can and should be archived or moved to lower cost, lower performing, high density storage or even deleted where possible.

Figure 3 shows as an example on the left side of the diagram the traditional transactional data lifecycle with data being created and then going dormant. The amount of dormant data will vary by the type and size of an organization along with application mix. 


Figure 3: Changing access and data lifecycle patterns

However, unlike the transactional data lifecycle models where data can be removed after a period of time, Web 2.0 and related data needs to remain online and readily accessible. Unlike traditional data lifecycles where data goes dormant after a period of time, on the right side of figure 3, data is created and then accessed on an intermittent basis with variable frequency. The frequency between periods of inactivity could be hours, days, weeks or months and, in some cases, there may be sustained periods of activity.

A common example is a video or some other content that gets created and posted to a web site or social networking site such as Facebook, LinkedIn or YouTube among others. Once the content is discussed, while it may not change, additional comment and collaborative data can be wrapped around the data as additional viewers discover and comment on the content. Solution approaches for the new category and data lifecycle model include low cost, relatively good performing high capacity storage such as clustered bulk storage as well as leveraging different forms of data footprint reduction techniques.

Given that a large (and growing) percentage of new data is unstructured, NAS based storage solutions including clustered, bulk, cloud and managed service offerings with file based access are gaining in popularity. To reduce cost while supporting increased business demands (figure 2), a growing trend is to utilize clustered, scale out and bulk NAS file systems that support NFS and CIFS for concurrent large and small IOs as well as optionally pNFS for large parallel access of files. These solutions are also increasingly being deployed with either built in or add on data footprint reduction techniques including archive, policy management, dedupe and compression among others.

What is your data footprint impact?
Your data footprint impact is the total data storage needed to support your various business application and information needs. Your data footprint may be larger than how much actual data storage you have as seen in figure 4. In Figure 4, an example is an organization that has 20TBytes of storage space allocated and being used for databases, email, home directories, shared documents, engineering documents, financial and other data in different formats (structured and unstructured) not to mention varying access patterns.


Figure 4: Expanding data footprint due to data proliferation and copies being retained

Of the 20TBytes of data allocated and used, it is very likely that the consumed storage space is not 100 percent used. Database tables may be sparsely (empty or not fully) allocated and there is likely duplicate data in email and other shared documents or folders. Additionally, of the 20TBytes, 10TBytes are duplicated to three different areas on a regular basis for application testing, training and business analysis and reporting purposes.

The overall data footprint is the total amount of data including all copies plus the additional storage required for supporting that data such as extra disks for Redundant Array of Independent Disks (RAID) protection or remote mirroring.

In this overly simplified example, the data footprint and subsequent storage requirement are several times that of the 20TBytes of data. Consequently, the larger the data footprint the more data storage capacity and performance bandwidth needed, not to mention being managed, protected and housed (powered, cooled, situated in a rack or cabinet on a floor somewhere).
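To make the arithmetic behind this example concrete, here is a minimal sketch in Python. The 20TBytes of primary data and the three 10TByte copies come from the example above. The RAID overhead factor of 1.25 and the single remote mirror are purely illustrative assumptions, not figures from this post, so plug in your own numbers.

```python
def data_footprint_tb(primary_tb, copy_tb, num_copies,
                      raid_overhead=1.25, remote_mirrors=1):
    """Estimate the total storage footprint in TBytes.

    raid_overhead and remote_mirrors are hypothetical values
    chosen only to illustrate the calculation.
    """
    # Primary data plus the copies made for test, training and reporting
    logical_tb = primary_tb + copy_tb * num_copies
    # Extra disk capacity consumed by RAID protection
    protected_tb = logical_tb * raid_overhead
    # Each remote mirror holds another full protected copy
    return protected_tb * (1 + remote_mirrors)


logical = 20 + 10 * 3
total = data_footprint_tb(primary_tb=20, copy_tb=10, num_copies=3)
print(f"Logical data including copies: {logical} TB")
print(f"Footprint with RAID and mirror: {total} TB")
print(f"Multiple of the original 20 TB: {total / 20:.2f}x")
```

Even before counting backups or snapshots, the footprint under these assumptions lands at several times the original 20TBytes.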

Data footprint reduction techniques
While data storage capacity has become less expensive on a relative basis, as data footprints continue to expand in order to support business requirements, more IT resources will need to be made available in a cost effective, yet QoS satisfying manner (again, refer back to figure 2). What this means is that more IT resources including server, storage and networking capacity, management tools along with associated software licensing and IT staff time will be required to protect, preserve and serve information.

By more effectively managing the data footprint across different applications and tiers of storage, it is possible to enhance application service delivery and responsiveness as well as facilitate more timely data protection to meet compliance and business objectives. To realize the full benefits of data footprint reduction, look beyond backup and offline data improvements to include online and active data using various techniques such as those in table 1 among others.

There are several methods (shown in table 1) that can be used to address data footprint proliferation without compromising data protection or negatively impacting application and business service levels. These approaches include archiving of structured (database), semi structured (email) and unstructured (general files and documents), data compression (real time and offline) and data deduplication.

 

|                | Archiving | Compression | Deduplication |
|----------------|-----------|-------------|---------------|
| When to use    | Structured (database), email and unstructured | Online (database, email, file sharing), backup or archive | Backup or archiving or recurring and similar data |
| Characteristic | Software to identify and remove unused data from active storage devices | Reduce amount of data to be moved (transmitted) or stored on disk or tape | Eliminate duplicate files or file content observed over a period of time to reduce data footprint |
| Examples       | Database, email, unstructured file solutions with archive storage | Host software, disk or tape, (network routers) and compression appliances or software as well as appearing in some primary storage system solutions | Backup and archive target devices and Virtual Tape Libraries (VTLs), specialized appliances |
| Caveats        | Time and knowledge to know what and when to archive and delete, data and application aware | Software based solutions require host CPU cycles impacting application performance | Works well in background mode for backup data to avoid performance impact during data ingestion |

Table 1: Data footprint reduction approaches and techniques

Archiving for compliance and general data retention
Data archiving is often perceived as a solution for compliance; however, archiving can be used for many other non compliance purposes. These include general data footprint reduction, boosting performance and enhancing routine data maintenance and data protection. Archiving can be applied to structured database data, semi structured email data and attachments, and unstructured file data.

A key to deploying an archiving solution is having insight into what data exists along with applicable rules and policies to determine what can be archived, for how long, how many copies and how data ultimately may be finally retired or deleted. Archiving requires a combination of hardware, software and people to implement business rules.

A challenge with archiving is having the time and tools available to identify what data should be archived and what data can be securely destroyed when no longer needed. Further complicating archiving is that knowledge of the data value is also needed; this may well include legal issues as to who is responsible for making decisions on what data to keep or discard.

If a business can invest in the time and software tools, as well as identify which data to archive to support an effective archive strategy, the returns can be very positive towards reducing the data footprint without limiting the amount of information available for use.

Data compression (real time and offline)
Data compression is a commonly used technique for reducing the size of data being stored or transmitted to improve network performance or reduce the amount of storage capacity needed for storing data. If you have used a traditional or TCP/IP based telephone or cell phone, watched either a DVD or HDTV, listened to an MP3, transferred data over the internet or used email you have most likely relied on some form of compression technology that is transparent to you. Some forms of compression are time delayed, such as using PKZIP to zip files, while others are real time or on the fly based such as when using a network, cell phone or listening to an MP3.

Two different approaches to data compression, which vary in time delay or impact on application performance along with the amount of compression and loss of data, are lossless (no data loss) and lossy (some data loss for a higher compression ratio). In addition to these approaches, there are also different implementations, including real time, with no performance impact to applications, and time delayed, where there is a performance impact to applications.
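As a quick illustration of the lossless case, the following sketch uses the zlib module from the Python standard library to compress some sample data and then restore it. The sample log line is made up and deliberately repetitive, so the ratio it reports says nothing about what your own data will achieve.

```python
import zlib

# Repetitive sample data (a made-up log line repeated many times)
original = b"2010-07-01 backup job completed OK\n" * 1000

compressed = zlib.compress(original, 6)
restored = zlib.decompress(compressed)

# Lossless: the restored bytes are identical to the original
assert restored == original

ratio = len(original) / len(compressed)
print(f"{len(original)} bytes -> {len(compressed)} bytes ({ratio:.1f} to 1)")
```

A lossy approach (as used for MP3 audio or JPEG images) would fail the equality check by design, trading exact reproduction for a higher ratio.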

In contrast to traditional ZIP or offline, time delayed compression approaches that require complete decompression of data prior to modification, online compression allows for reading from, or writing to, any location within a compressed file without full file decompression and resulting application or time delay. Real time appliance or target based compression capabilities are well suited for supporting online applications including databases, OLTP, email, home directories, web sites and video streaming among others without consuming host server CPU or memory resources or degrading storage system performance.

Note that with the increase of CPU server processing performance along with multiple cores, server based compression running in applications such as database, email, file systems or operating systems can be a viable option for some environments.

A scenario for using real time data compression is for time sensitive applications that require large amounts of data such as online databases, video and audio media servers, web and analytic tools. For example, databases such as Oracle support NFS3 Direct IO (DIO) and Concurrent IO (CIO) capabilities to enable random and direct addressing of data within an NFS based file. This differs from traditional NFS operations where a file would be sequentially read or written.

Another example of using real time compression is to combine a NAS file server configured with 300GB or 600GB high performance 15K RPM Fibre Channel or SAS HDDs in addition to flash based SSDs to boost the effective storage capacity of active data without introducing a performance bottleneck associated with using larger capacity HDDs. Of course, compression would vary with the type of solution being deployed and type of data being stored, just as dedupe ratios will differ depending on the algorithm along with whether the data is text, video or object based among other factors.

Deduplication (Dedupe)
Data deduplication (also known as single instance storage, commonality factoring, data differencing or normalization) is a data footprint reduction technique that eliminates the occurrence of the same data. Deduplication works by normalizing the data being backed up or stored by eliminating recurring or duplicate copies of files or data blocks depending on the implementation.
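To show the basic idea, here is a minimal block level sketch in Python. It assumes fixed size 4KB blocks and SHA-256 fingerprints, both of which are choices made for this illustration; actual products vary in how they chunk, fingerprint and index data. The function names are hypothetical.

```python
import hashlib


def dedupe_blocks(data, block_size=4096):
    """Split data into fixed-size blocks and keep each unique block once."""
    store = {}    # fingerprint -> block contents (stored once)
    recipe = []   # ordered fingerprints needed to rebuild the data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)
        recipe.append(fingerprint)
    return store, recipe


def rebuild(store, recipe):
    """Reassemble the original data from the ordered fingerprints."""
    return b"".join(store[fp] for fp in recipe)


# Ten "nightly backups" of the same 8 KB of data (hypothetical input)
one_backup = b"A" * 4096 + b"B" * 4096
backups = one_backup * 10

store, recipe = dedupe_blocks(backups)
stored_bytes = sum(len(block) for block in store.values())

assert rebuild(store, recipe) == backups
print(f"Blocks referenced: {len(recipe)}, unique blocks stored: {len(store)}")
print(f"Reduction ratio: {len(backups) / stored_bytes:.0f} to 1")
```

The high ratio here comes entirely from the input being the same backup repeated, which is exactly the scenario where dedupe shines.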

Some data deduplication solutions boast spectacular ratios for data reduction given specific scenarios, such as backup of repetitive and similar files, while providing little value over a broader range of applications.

This is in contrast with traditional data compression approaches that provide lower, yet more predictable and consistent data reduction ratios over more types of data and applications, including online and primary storage scenarios. For example, in environments where there is little to no common or repetitive data, data deduplication will have little to no impact while data compression generally will yield some amount of data footprint reduction across almost all types of data.
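The contrast can be seen with a small experiment. The sketch below generates made-up text records in which no 4KB block repeats, then measures a simple fixed block dedupe ratio and a zlib compression ratio on the same data. The record format, block size and helper names are all assumptions for illustration.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # illustrative fixed block size


def dedupe_ratio(data, block_size=BLOCK_SIZE):
    """Original size divided by the size of the unique blocks only."""
    unique = {}
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        unique[hashlib.sha256(block).digest()] = len(block)
    return len(data) / sum(unique.values())


def compression_ratio(data):
    """Original size divided by the zlib compressed size."""
    return len(data) / len(zlib.compress(data))


# Every record carries a different number, so no block is ever repeated
records = "".join(
    f"order {n:08d} shipped to customer {n * 7:08d}\n" for n in range(20000)
).encode()

print(f"Dedupe ratio:      {dedupe_ratio(records):.2f} to 1")
print(f"Compression ratio: {compression_ratio(records):.2f} to 1")
```

With no duplicate blocks to find, the dedupe ratio stays at 1 to 1, while compression still finds savings inside the similar looking records.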

Some data deduplication solution providers have either already added, or have announced plans to add, compression techniques to complement and increase the data footprint reduction effectiveness of their solutions across a broader range of applications and storage scenarios, attesting to the value and importance of data compression to reduce data footprint.

When looking at deduplication solutions, determine if the solution is designed to scale in terms of performance, capacity and availability over a large amount of data along with how restoration of data will be impacted by scaling for growth. Other items to consider include how data is deduplicated, such as in real time using inline processing or via some form of time delayed post processing, and the ability to select the mode of operation.

For example, a dedupe solution may be able to process data at a specific ingest rate inline until a certain threshold is hit and then processing reverts to post processing so as to not cause a performance degradation to the application writing data to the deduplication solution. The downside of post processing is that more storage is needed as a buffer. It can, however, also enable solutions to scale without becoming a bottleneck during data ingestion.

However, there is life beyond dedupe, which is in no way to diminish dedupe or its very strong and bright future. One thing that I'm increasingly convinced of, having talked with hundreds of IT professionals (e.g. the customers), is that only the surface is being scratched for dedupe, not to mention the larger data footprint impact opportunity seen in figure 5.


Figure 5: Dedupe adoption and deployment waves over time

While dedupe is a popular technology from a discussion standpoint and has good deployment traction, it is far from reaching mass customer adoption or even broad coverage in environments where it is being used. StorageIO research shows broadest adoption of dedupe centered around backup in smaller or SMB environments (dedupe deployment wave one in figure 5) with some deployment in Remote Office Branch Office (ROBO) work groups as well as departmental environments.

StorageIO research also shows that complete adoption in many of those SMB, ROBO, work group or smaller environments has yet to reach 100 percent. This means that there remains a large population that has yet to deploy dedupe as well as further opportunities to increase the level of dedupe deployment by those already doing so.

There has also been some early adoption in larger core IT environments where dedupe coexists with and complements existing data protection and preservation practices. Another current deployment scenario for dedupe has been for supporting core edge deployments in larger environments that provide support for backup and data protection of ROBO, work group and departmental systems.

Note that figure 5 simply shows the general types of environments in which dedupe is being adopted and not any sort of indicators as to the degree of deployment by a given customer or IT environment.

What to do about your expanding data footprint impact?
Develop an overall data footprint reduction strategy that leverages different techniques and technologies addressing online primary, secondary and offline data. Assess and discover what data exists and how it is used in order to effectively manage storage needs.

Determine policies and rules for retention and deletion of data combining archiving, compression (online and offline) and dedupe in a comprehensive data footprint strategy. The benefit of a broader, more holistic, data footprint reduction strategy is the ability to address the overall environment, including all applications that generate and use data as well as IRM or overhead functions that compound and impact the data footprint.

Data footprint reduction: life beyond (and complementing) dedupe
The good news is that the Drs. and Divas of dedupe marketing (the ones who also are good at the disco dedupe dance debates) have targeted backup as an initial market sweet (and success) spot shown in figure 5 given the high degree of duplicate data.


Figure 6: Leverage multiple data footprint reduction techniques and technologies

However that same good news is bad news in that there is now a stigma that dedupe is only for backup, similar to how archive was hijacked by the compliance marketing folks in the post Y2K era. There are several techniques that can be used individually to address specific data footprint reduction issues or in combination as seen in figure 7 to implement a more cohesive and effective data footprint reduction strategy.


Figure 7: How various data footprint reduction techniques are complementary

What this means is that archive, dedupe as well as other forms of data footprint reduction can and should be used beyond where they have been target marketed, using the applicable tool for the task at hand. For example, a common industry rule of thumb is that on average, ten percent of data changes per day (your mileage and rate of change will certainly vary given applications, environment and other factors).

Now assuming that you have 100TB (feel free to subtract a zero or two, or add as many as needed) of data (note I did not say storage capacity or percent utilized), ten percent change would be 10TB that needs to be backed up, replicated and so forth. Basic 2 to 1 streaming tape compression (2.5 to 1 in upcoming LTO enhancements) would reduce the daily backup footprint from 10TB to 5TB.

Using dedupe with 10 to 1 would get that from 10TB down to 1TB or about the size of a large capacity disk drive. With 20 to 1 that cuts the daily backup down to 500GB and so forth. The net effect is that more daily backups can be stored in the same footprint which in turn helps expedite individual file recovery by having more options to choose from off of the disk based cache, buffer or storage pool.

On the other hand, if your objective is to reduce and eliminate storage capacity, then the same amount of backups can be stored on less disk freeing up resources. Now take the savings times the number of days in your backup retention and you should see the numbers start to add up.
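Here is the same back of the envelope math as a short Python sketch so that you can substitute your own numbers. The 100TB, ten percent change rate and the reduction ratios come from the example above. The 30 day retention period is a hypothetical value, and applying the same ratio to every daily backup is a simplification.

```python
def daily_backup_tb(total_tb, change_rate, reduction_ratio):
    """Daily backup size after applying an N to 1 reduction ratio."""
    return total_tb * change_rate / reduction_ratio


TOTAL_TB = 100        # amount of data, not raw storage capacity
CHANGE_RATE = 0.10    # rule of thumb: ten percent changes per day
RETENTION_DAYS = 30   # hypothetical retention period

scenarios = [("no reduction", 1), ("2 to 1 compression", 2),
             ("10 to 1 dedupe", 10), ("20 to 1 dedupe", 20)]

for label, ratio in scenarios:
    daily = daily_backup_tb(TOTAL_TB, CHANGE_RATE, ratio)
    retained = daily * RETENTION_DAYS
    print(f"{label:>20}: {daily:5.1f} TB per day, {retained:6.1f} TB retained")
```

Multiplying the daily savings by the retention period is where the numbers start to add up.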

Now what about the other 90 percent of the data that may not have changed, or, that did change and exists on higher performance storage?

Can its footprint impact be reduced?

The answer should be perhaps, or it depends, which in turn prompts the question of what tool would be best. As is often the case with industry buzzwords or technologies, there is a popular line of thinking that says to use it everywhere. After all, goes the thinking, if it is a good thing why not use and deploy more of it everywhere?

Keep in mind that dedupe trades time (to perform thinking and apply intelligence to further reduce data) in exchange for space capacity. Trading time for space capacity can thus have a negative impact on applications that need lower response time and higher performance, where the focus is on rates vs ratios. For example, the other 90 to 100 percent of the data in the above example may have to be on a mix of high and medium performance storage to meet QoS or service level agreement (SLA) objectives. While it would be fun or perhaps cool to try and achieve a high data reduction ratio on the entire 100TB of active data with dedupe (e.g. trying to achieve primary dedupe), the performance impact could be negative.

The option is to apply a mix of different data footprint reduction techniques across the entire 100TB. That is, use dedupe where applicable and where higher reduction ratios can be achieved while balancing performance, and use compression for streaming data to tape for retention or archive as well as in databases or other application software, not to mention in networks. Likewise, use real time compression or what some refer to as primary dedupe for online active changing data along with online static read only data.

Deploy a comprehensive data footprint reduction strategy combining various techniques and technologies to address point solution needs as well as the overall environment, including online, near line for backup, and offline for archive data.

Let's not forget about archiving, thin provisioning, space saving snapshots and commonsense data management among other techniques across the entire environment. In other words, if your focus is just on dedupe for backup to achieve an optimized and efficient storage environment, you are also missing out on a larger opportunity. However, this also means having multiple tools or technologies in your IT IRM toolbox as well as understanding what to use when, where and why.

Data transfer rates are a key metric for performance (time) optimization such as meeting backup, restore or other data protection windows. Data reduction ratios are a key metric for capacity (space) optimization where the focus is on storing as much data as possible in a given footprint.
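The difference between the two metrics is easy to see in a small sketch. The function names, the 10TB nightly backup, the 8 hour window and the transfer rates below are all hypothetical values picked for illustration, using decimal units (1TB = 1,000,000MB).

```python
def hours_to_move(data_tb, rate_mb_per_sec):
    """Hours needed to move data_tb at a sustained rate (decimal units)."""
    megabytes = data_tb * 1_000_000
    return megabytes / rate_mb_per_sec / 3600


def fits_window(data_tb, rate_mb_per_sec, window_hours):
    """Rate view: does the transfer finish inside the protection window?"""
    return hours_to_move(data_tb, rate_mb_per_sec) <= window_hours


def reduction_ratio(original_tb, stored_tb):
    """Ratio view: how much space is saved, regardless of time taken."""
    return original_tb / stored_tb


# Hypothetical 10 TB nightly backup with an 8 hour window
for rate in (200, 400):
    hours = hours_to_move(10, rate)
    verdict = "fits" if fits_window(10, rate, 8) else "misses"
    print(f"{rate} MB/sec: {hours:.1f} hours, {verdict} the 8 hour window")

print(f"10 TB stored as 1 TB is {reduction_ratio(10, 1):.0f} to 1")
```

A solution can post an impressive ratio and still miss the window, which is why both numbers need to be looked at for the task at hand.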

Some additional take away points:

  • Develop a data footprint reduction strategy for online and offline data
  • Energy avoidance can be accomplished by powering down storage
  • Energy efficiency can be accomplished by using tiered storage to meet different needs
  • Measure and compare storage based on idle and active workload conditions
  • Storage efficiency metrics include IOPS or bandwidth per watt for active data
  • Storage capacity per watt per footprint and cost is a measure for inactive data
  • Small percentage reductions on a large scale have big benefits
  • Align the applicable form of virtualization for the given task at hand
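The active and idle metrics in the list above can be compared with a few lines of Python. The two tiers and every number in this sketch are made up for illustration; use measured values from your own environment.

```python
def iops_per_watt(iops, watts):
    """Activity metric for active data: work done per watt."""
    return iops / watts


def tb_per_watt(capacity_tb, watts):
    """Capacity metric for inactive data: space stored per watt."""
    return capacity_tb / watts


# Hypothetical tiers; all figures are invented for this example
tiers = [
    {"name": "fast tier", "iops": 50_000, "capacity_tb": 10,
     "active_watts": 500, "idle_watts": 400},
    {"name": "bulk tier", "iops": 5_000, "capacity_tb": 100,
     "active_watts": 400, "idle_watts": 200},
]

for tier in tiers:
    active = iops_per_watt(tier["iops"], tier["active_watts"])
    idle = tb_per_watt(tier["capacity_tb"], tier["idle_watts"])
    print(f"{tier['name']}: {active:.1f} IOPS per watt active, "
          f"{idle:.3f} TB per watt idle")
```

Neither tier wins on both measures, which is the point of aligning tiered storage to different needs.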


Wrap up (for now, read part II here)

For some applications, reduction ratios are the important focus, so the emphasis is on the tools or modes of operation that achieve those results.

Likewise, for other applications where the focus is on performance with some data reduction benefit, tools are optimized for performance first and reduction second.

Thus I expect messaging from some vendors to adjust (expand) to those capabilities that they have in their toolboxes (product portfolios).

Consequently, IMHO some of the backup centric dedupe solutions may find themselves in niche roles in the future unless they can diversify. Vendors with multiple data footprint reduction tools will also do better than those with only a single function or focused tool.

However for those who only have a single or perhaps a couple of tools, well, guess what the approach and messaging will be.

After all, if all you have is a hammer everything looks like a nail, if all you have is a screwdriver, well, you get the picture.

On the other hand, if you are still not clear on what all this means, send me a note, give me a call, post a comment or a tweet and I will be happy to discuss it with you.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

July 2010 Odds and Ends: Perspectives, Tips and Articles

Here are some items that have been added to the main StorageIO website news, tips and articles, video podcast related pages that pertain to a variety of topics ranging from data storage, IO, networking, data centers, virtualization, Green IT, performance, metrics and more.

These content items include various odds and ends pieces such as industry or technology commentary, articles, tips, ATEs (See additional ask the expert tips here) or FAQs as well as some video and podcasts for your mid summer (if in the northern hemisphere) enjoyment.

The New Green IT: Productivity, supporting growth, doing more with what you have

Energy efficient and money saving Green IT or storage optimization is often taken to mean things like MAID, Intelligent Power Management (IPM) for servers and storage, disk drive spin down or data deduplication. In other words, technologies and techniques to minimize or avoid power consumption as well as subsequent cooling requirements, which for some data, applications or environments can be the case. However, there is also a shift from energy avoidance to being efficient, effective and productive, not to mention profitable, as forms of optimization. Collectively these various techniques and technologies help address or close the Green Gap and can reduce the amount of Green IT confusion by boosting productivity (same goes for servers or networks) in terms of more work, IOPS, bandwidth, data moved, frames or packets, transactions, videos or email processed per watt per second (or other unit of time).

Click here to read and listen to my comments about boosting IOPs per watt, or here to learn more about the many facets of energy efficient storage and here on different aspects of storage optimization. Want to read more about the next major wave of server, storage, desktop and networking virtualization? Then click here to read more about virtualization life beyond consolidation where the emphasis or focus expands to abstraction, transparency, enablement in addition to consolidation for servers, storage, networks. If you are interested in metrics and measurements, Storage Resource Management (SRM) not to mention discussion about various macro data center metrics including PUE among others, click on the preceding links.

NAS and Shared Storage, iSCSI, DAS, SAS and more

Shifting gears to general industry trends and commentary, here are some comments on consumer and SOHO storage sharing, the role and importance Value Added Resellers (VARs) serve for SMB environments, as well as the top storage technologies that are in use and remain relevant. Here are some comments on iSCSI which continues to gain in popularity as well as storage options for small businesses.

Are you looking to buy a new server or upgrade an existing one? Here are some vendor and technology neutral tips to help determine needs along with requirements to help you be a more effective, informed buyer. If you are interested in or want to know more about Serial Attached SCSI (6Gb/s SAS), including for use as external shared direct attached storage (DAS) for Exchange, SharePoint, Oracle, VMware or Hyper-V clusters among other usage scenarios, check out this FAQ as well as podcast. Here are some other items including a podcast about using storage partitions in your data storage infrastructure, an ATE about what type of 1.5TB centralized storage to use to support multiple locations, and a video on scaling with clustered storage.

That is all for now, hope all is well and enjoy the content.

Cheers gs

Greg Schulz – Author The Green and Virtual Data Center (CRC) and Resilient Storage Networks (Elsevier)
twitter @storageio

Happy Earth Day 2010!

Here in the northern hemisphere it is late April and thus mid spring time.

That means the trees are sprouting their buds, leaves and flowers while other plants and things come to life.

In Minnesota where I live, there is not a cloud in the sky today, the sun is out and it's going to be another warm day in the 60s, a nice day to not be flying or traveling and thus enjoy the fine weather.

Other things of note on this Earth Day 2010 include:

  • Minnesota Twins new home Target Field was just named the most Green Major League Baseball (MLB) stadium as well as greenest in the US with its LEED (or see here) certification.
  • Iceland's Eyjafjallajokull volcano continues to spew water vapor steam, CO2 and ash at a slower rate than last week when it first erupted, with some speculating that there could be impending activity from other Icelandic volcanoes. Some estimates placed the initial eruption CO2 impact and subsequent flight cancellations to be neutral, essentially canceling each other out, however I'm sure we will be hearing many different stories in the weeks to come.

  • Image of Iceland Eyjafjallajokull Volcano Eruption via Boston.com

  • Flights to/from and within Europe and the UK are returning to normal
  • Toyota continues to deal with recalls on some of their US built automobiles including the energy efficient Prius, some of which may have been purchased during the recent US cash for clunkers (CFC) program (hmm, is that ironic or what?)
  • Greenpeace, in addition to using a Facebook page to protest Facebook data center practices, is now targeting cloud IT in general, including just before the Apple iPad launch (here are some comments from Microsoft).
  • Vendors in all industries are lining up for the second coming of Green marketing or perhaps Green Washing 2.0

The new Green IT, moving beyond Green wash and hype

Speaking of Green IT including Green Computing, Green Storage, Virtualization, Cloud, Federation and more, here is a link to a post that I did back in February discussing how the Green Gap continues to exist.

The green gap centers around confusion about what Green means, along with the common disconnect between the issues messaged under the green umbrella and the core IT issues or barriers to becoming more efficient, effective, flexible and optimized from both an economic and an environmental basis (read more here).

Regardless of where you stand on Green, Green washing, Green hype, environmentalism, eco-tech and other related themes, for at least a moment, set aside the politics and science debates and think in terms of practicality and economics.

That is, look for simple, recurring things that can be done to stretch your dollar or spending ability in order to support demand (See figure below) in a more effective manner along with reducing waste. For example, to meet growing demand requirements in the face of shrinking or stagnant budgets, the action is to stretch available resources to do more work when needed, or retain more where applicable with the same or less footprint. What this means is that while common messaging is around reducing costs, look at the inverse, which is to do more with available budgets or resources. The result is green in terms of economic and environmental benefits.

IT Resource demand
Increasing IT Resource Demand

Green IT wheel of opportunity
Green IT enablement techniques and technologies

Look at and understand the broader aspects of being green, which have both economic and environmental benefits without compromising on productivity or functionality. There are many aspects or facets of being green beyond those commonly discussed or perceived to be so (See Green IT enablement techniques and technologies figure above).

Certainly recycling of paper, water, aluminum, plastics and other items including technology equipment is important to reduce waste and is something to consider. Another aspect of reducing waste, particularly in IT, is to avoid rework. Examples range from network bottlenecks or problems that result in continuous retransmission of data for failed backup, replication or data transfers, causing lost opportunity or wasted resource consumption. Likewise, programming errors (bugs) or misconfiguration that result in rework or lost productivity are also forms of waste, among others.

Another theme is that of shifting from energy avoidance to energy efficiency and effectiveness, which are often thought to be the same. However, the expanded focus is also about getting more work done when needed with the same or fewer resources (See figure below), for example increasing activity (IOPS, transactions, emails or video served, bandwidth or messages) per watt of energy consumed.

From energy avoidance to effectiveness
Shifting from energy avoidance to effectiveness
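As a rough illustration of the activity per watt idea, here is a minimal Python sketch. The function name and the IOPS and wattage figures are hypothetical values chosen for illustration, not measurements from any product.

```python
def activity_per_watt(activity_rate, watts):
    """Return work done per watt of power drawn (for example IOPS per watt)."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return activity_rate / watts


# Hypothetical figures: same power draw, more work done after optimization.
baseline = activity_per_watt(activity_rate=10_000, watts=500)
optimized = activity_per_watt(activity_rate=15_000, watts=500)

print(f"baseline:  {baseline:.1f} IOPS per watt")
print(f"optimized: {optimized:.1f} IOPS per watt")
```

The point of the metric is that the optimized case is "greener" even though no energy was avoided: more work is done for the same power.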

One of the many techniques and approaches for addressing energy, including stretching resources and being green, is intelligent power management (IPM). With IPM, the focus is not strictly centered around energy avoidance, but instead about intelligently adapting to different workloads or activity, balancing performance and energy. Thus when there is work to be done, get the work done quickly with as little energy as possible (IOP or activity per watt); when there is less work, provide lower performance and thus smaller energy requirements; and when there is no work to be done, go into additional energy saving modes. Thus power management does not have to be exclusively about turning off the lights or IT equipment in order to be green.
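The IPM behavior described above can be sketched as a simple policy that maps workload to a power mode. This is only a sketch under stated assumptions: the mode names and the 30% threshold are hypothetical, and real devices expose their own vendor specific power states.

```python
def select_power_mode(utilization):
    """Pick a power mode from current utilization (0.0 to 1.0).

    Mode names and the 0.3 threshold are hypothetical placeholders.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")
    if utilization == 0.0:
        # No work to be done: drop into an additional energy saving mode.
        return "energy-saving"
    if utilization < 0.3:
        # Less work: lower performance and smaller energy requirements.
        return "reduced-performance"
    # Work to be done: finish it quickly.
    return "full-performance"


for load in (0.0, 0.1, 0.8):
    print(load, select_power_mode(load))
```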

The following two figures look at Green IT past, present and future with an expanding focus around optimization and effectiveness meaning getting more work done, storing more data for longer periods of time, meeting growth demands with what appears to be additional resources however at a lower per unit cost without compromising on performance, availability or economics.

Green IT wheel of opportunity
Green IT: Past, present and future shift from avoidance to efficiency and effectiveness

Green IT wheel of opportunity
The new Green IT: Boosting business effectiveness, maximize ROI while helping the environment

If you think about going green as simply doing or using things more effectively, reducing waste, working more intelligently or effectively the benefits are both economical and environmentally positive (See the two figures above).

Instead of finding ways to fund green initiatives, shift the focus to how you can enable enhanced productivity, stretching resources further and doing more in the same or smaller footprint (floor space, power, cooling, energy, personnel, licensing, budgets) for business economic and environmental sustainability, with the result being environmental enhancements.

Also keep in mind that small percentage changes on a large or recurring basis have significant benefits. For example a small change in cooling temperatures while staying within vendor guideline recommendations can result in big savings for large environments.

 

Bottom line

If you are a business and discounting green as simply a fad, or perhaps as a public relations (PR) initiative or activity tied to reducing carbon footprints and recycling then you are missing out on economic (top and bottom line) enhancement opportunities.

Likewise, if you think that going green is only about the environment, then there is a missed opportunity to boost economic opportunities to help fund those initiatives.

Going green means many different things to various people and is often more broad and common sense based than most realize.

That is all for now, happy earth day 2010

Cheers gs

Greg Schulz – Author The Green and Virtual Data Center (CRC) and Resilient Storage Networks (Elsevier)
twitter @storageio

Inaugural StorageIO Newsletter

Welcome to the winter 2010 edition of the Server and StorageIO (StorageIO) newsletter. This inaugural edition of the StorageIO newsletter coincides with our 5th year in business along with recent web site and blog enhancements.

In an age of social media including Facebook, Twitter, blogs and video, some might ask the question of why a newsletter. After all, is that not old school or non social media?

For those who are immersed into twitter, blogs, facebook, feeds and other Web 2.0 means of communication, a traditional newsletter might not be in vogue.

StorageIO News Letter Image
Winter 2010 Newsletter
(Inaugural Edition)

However, there is still a large percentage of the population, which also means a vast number of visitors and guests of StorageIO web sites and blogs or viewers of articles along with other content, that does not use Twitter, Facebook, LinkedIn or RSS feeds. I realize that there is still a role for a newsletter.

Thus, it makes sense to bring info to those of you who prefer a traditional newsletter format via email or other subscription. This newsletter is available in HTML or PDF formats.

You can access this newsletter via various social media venues (some are shown below) in addition to StorageIO web sites and subscriptions. Click on the following links to view the inaugural newsletter as HTML or PDF, or to go to the newsletter page.

Follow via Google FeedBurner here or via email subscription here.

You can also subscribe to the newsletter by simply sending an email to newsletter@storageio.com

Enjoy this inaugural edition of the StorageIO newsletter, let me know your comments and feedback.

Also, a very big thank you to everyone who has helped make StorageIO a success!

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2012 StorageIO and UnlimitedIO All Rights Reserved

Green IT, Green Gap, Tiered Energy and Green Myths

There are many different aspects of Green IT along with several myths or misperceptions not to mention missed opportunities.

There is a Green Gap or disconnect between environmentally aware, focused messaging and core IT data center issues. For example, when I ask IT professionals whether they have or are under direction to implement green IT initiatives, the number averages in the 10-15% range.

However, when I ask the same audiences who has or sees power, cooling, floor space, supporting growth, or addressing environmental health and safety (EHS) related issues, the average is 75 to 90%. What this means is a disconnect between what is perceived as being green and opportunities for IT organizations to make improvements from an economic and efficiency standpoint including boosting productivity.

 

Some IT Data Center Green Myths
Is “green IT” a convenient or inconvenient truth or a legend?

When it comes to green and virtual environments, there are plenty of myths and realities, some of which vary depending on market or industry focus, price band, and other factors.

For example, there are lines of thinking that only ultra large data centers are subject to PCFE-related issues, or that all data centers need to be built along the Columbia River basin in Washington State, or that virtualization eliminates vendor lock-in, or that hardware is more expensive to power and cool than it is to buy.

The following are some myths and realities as of today, some of which may be subject to change from reality to myth or from myth to reality as time progresses.

Myth: Green and PCFE issues are applicable only to large environments.

Reality: I commonly hear that green IT applies only to the largest of companies. The reality is that PCFE issues or green topics are relevant to environments of all sizes, from the largest of enterprises to the small/medium business, to the remote office branch office, to the small office/home office or “virtual office,” all the way to the digital home and consumer.

 

Myth: All computer storage is the same, and powering disks off solves PCFE issues.

Reality: There are many different types of computer storage, with various performance, capacity, power consumption, and cost attributes. Although some storage can be powered off, other storage that is needed for online access does not lend itself to being powered off and on. For storage that needs to be always online and accessible, energy efficiency is achieved by doing more with less—that is, boosting performance and storing more data in a smaller footprint using less power.

 

Myth: Servers are the main consumer of electrical power in IT data centers.

Reality: In the typical IT data center, on average, 50% of electrical power is consumed by cooling, with the balance used for servers, storage, networking, and other aspects. However, in many environments, particularly processing or computation intensive environments, servers in total (including power for cooling and to power the equipment) can be a major power draw.

 

Myth: IT data centers produce 2 to 8% of all global Carbon Dioxide (CO2) and carbon emissions.

Reality: This might perhaps be true, given some creative accounting and marketing math in order to help build a justification case or to scare you into doing something. However, the reality is that in the United States, for example, IT data centers consume around 2 to 4% of electrical power (depending on when you read this), and less than 80% of all U.S. CO2 emissions are from electrical power generation, so the math does not quite add up. The reality is this: if no action is taken to improve IT data center energy efficiency, continued demand growth will shift IT power-related emissions from myth to reality, not to mention cause constraints on IT and business sustainability from an economic and productivity standpoint.
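To see why the math does not quite add up, here is a back-of-the-envelope sketch using only the percentages quoted above. It assumes, for illustration, that data center emissions are proportional to their share of electrical power consumed; the variable names are hypothetical.

```python
# Figures quoted in the text above (U.S. example).
dc_share_of_electricity = (0.02, 0.04)   # data centers: 2 to 4% of electrical power
electricity_share_of_co2 = 0.80          # upper bound: "less than 80%" of U.S. CO2

# Assumption: emissions scale with share of electricity consumed.
low = dc_share_of_electricity[0] * electricity_share_of_co2
high = dc_share_of_electricity[1] * electricity_share_of_co2

print(f"Implied data center share of U.S. CO2: below {low:.1%} to below {high:.1%}")
```

Even at the top of the quoted ranges, the implied share stays under 4%, well short of the upper end of the 2 to 8% claim.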

Myth: Server consolidation with virtualization is a silver bullet to address PCFE issues.

Reality: Server virtualization for consolidation is only part of an overall solution that should be combined with other techniques, including lower power, faster and more energy efficient servers, and improved data and storage management techniques.

 

Myth: Hardware costs more to power than to purchase.

Reality: Currently, for some low-cost servers, standalone disk storage, or entry level networking switches and desktops, this may be true, particularly where energy costs are excessively high and the devices are kept and used continually for three to five years. A general rule of thumb is that the actual cost of most IT hardware will be a fraction of the price of associated management and software tool costs plus facilities and cooling costs. For the most part, at least as of this writing, small standalone individual hard disk drives or small entry level volume servers can be bought and then used in locations that have very high electrical costs over a three to five year time frame.

 

Regarding this last myth, for the more commonly deployed external storage systems across all price bands and categories, generally speaking, except for extremely inefficient and hot running legacy equipment, the reality is that it is still cheaper to power the equipment than to buy it. Having said that, there are some qualifiers that should also be used as key indicators to keep the equation balanced. These qualifiers include the acquisition cost, if any, for new, expanded, or remodeled habitats or space to house the equipment; the price of energy in a given region, including surcharges; and cooling, length of time, and continuous time the device will be used.

For larger businesses, IT equipment in general still costs more to purchase than to power, particularly with newer, more energy efficient devices. However, given rising energy prices, or the need to build new facilities, this could change moving forward, particularly if a move toward energy efficiency is not undertaken.

There are many variables when purchasing hardware, including acquisition cost, the energy efficiency of the device, power and cooling costs for a given location and habitat, and facilities costs. For example, if a new storage solution is purchased for $100,000, yet new habitat or facilities must be built for three to five times the cost of the equipment, those costs must be figured into the purchase cost.

Likewise, if the price of a storage solution decreases dramatically, but the device consumes a lot of electrical power and needs a large cooling capacity while operating in a region with expensive electricity costs, that, too, will change the equation and the potential reality of the myth.
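A minimal sketch of the purchase versus power comparison follows. Only the $100,000 purchase price comes from the example above; the 5 kW average draw, the $0.10 per kWh price, the five year life and the cooling overhead of 1.0 (cooling drawing as much power as the equipment) are hypothetical assumptions, and the function name is made up for illustration.

```python
HOURS_PER_YEAR = 24 * 365


def lifetime_energy_cost(avg_kw, years, price_per_kwh, cooling_overhead=1.0):
    """Estimate the cost to power and cool a device over its service life.

    cooling_overhead is the extra cooling power per unit of equipment power
    (1.0 means cooling draws as much power as the equipment itself).
    """
    equipment_kwh = avg_kw * HOURS_PER_YEAR * years
    return equipment_kwh * (1.0 + cooling_overhead) * price_per_kwh


purchase_price = 100_000.0  # storage solution price from the example above
energy_cost = lifetime_energy_cost(avg_kw=5.0, years=5, price_per_kwh=0.10)

print(f"purchase: ${purchase_price:,.0f}  power and cooling: ${energy_cost:,.0f}")
print("costs more to power than buy" if energy_cost > purchase_price
      else "costs more to buy than power")
```

Changing the energy price, the power draw or the time in service shifts the balance, which is the point of the qualifiers above. New facilities costs, where applicable, would be added to the purchase side.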

 

Tiered Energy Sources
Given that IT resources and facilities require energy to power equipment as well as keep it cool, electricity and energy are popular topics associated with Green IT, economics and efficiency, with lots of metrics and numbers tossed around. With that in mind, the U.S. national average CO2 emission is 1.34 lb/kWh of electrical power. Granted, this number will vary depending on the region of the country and the source of fuel for the power-generating station or power plant.
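To put the 1.34 lb/kWh average in context, here is a small sketch that estimates annual CO2 for a device running continuously. The 1 kW device is a hypothetical example, and the result will vary with region and fuel source as noted above.

```python
US_AVG_LB_CO2_PER_KWH = 1.34  # U.S. national average quoted above
HOURS_PER_YEAR = 24 * 365


def annual_co2_lb(avg_kw, lb_per_kwh=US_AVG_LB_CO2_PER_KWH):
    """Estimate pounds of CO2 per year for a device drawing avg_kw continuously."""
    return avg_kw * HOURS_PER_YEAR * lb_per_kwh


# Hypothetical device drawing an average of 1 kW around the clock.
print(f"{annual_co2_lb(1.0):,.1f} lb of CO2 per year")
```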

Like IT tiered resources (servers, storage, I/O networks, virtual machines and facilities), of which there are various tiers or types of technologies to meet various needs, there are also multiple types of energy sources. Different tiers of energy sources vary by their cost, availability and environmental characteristics, among others. For example, in the US there are different types of coal, and not all coal is as dirty when combined with emissions air scrubbers as you might be led to believe. However, there are other energy sources to consider as well.

Coal continues to be a dominant fuel source for electrical power generation both in the United States and abroad, with other fuel sources including oil, gas, natural gas, liquid propane gas (LPG or propane), nuclear, hydro, thermo or steam, wind and solar. Within a category of fuel, for example, coal, there are different emissions per ton of fuel burned. Eastern U.S. coal is higher in CO2 emissions per kilowatt hour than western U.S. lignite coal. However, eastern coal has more British thermal units (Btu) of energy per ton of coal, enabling less coal to be burned in smaller physical power plants.

If you have ever noticed that coal power plants in the United States seem to be smaller in the eastern states than in the Midwest and western states, it’s not an optical illusion. Because eastern coal burns hotter, producing more Btu, smaller boilers and stockpiles of coal are needed, making for smaller power plant footprints. On the other hand, as you move into the Midwest and western states of the United States, coal power plants are physically larger, because more coal is needed to generate 1 kWh, resulting in bigger boilers and vent stacks along with larger coal stockpiles.

On average, a gallon of gasoline produces about 20 lb of CO2, depending on usage and efficiency of the engine as well as the nature of the fuel in terms of octane or amount of Btu. Aviation fuel and diesel fuel differ from gasoline, as does natural gas or various types of coal commonly used in the generation of electricity. For example, natural gas is less expensive than LPG but also provides fewer Btu per gallon or pound of fuel. This means that more natural gas is needed as a fuel to generate a given amount of power.

Recently, while researching small, 10 to 12 kW standby generators for my office, I learned about some of the differences between propane and natural gas. What I found was that with natural gas as fuel, a given generator produced about 10.5 kW, whereas the same unit attached to an LPG or propane fuel source produced 12 kW. The trade off was that to get as much power as possible out of the generator, the higher cost LPG was the better choice. To use lower cost fuel but get less power out of the device, the choice would be natural gas. If more power was needed, then a larger generator could be deployed to use natural gas, with the trade off of requiring a larger physical footprint.

Oil and gas are not used as much as fuel sources for electrical power generation in the United States as in other countries such as the United Kingdom. Gasoline, diesel, and other petroleum based fuels are used for some power plants in the United States, including standby or peaking plants. In the electrical power generation and transmission (G and T) industry, as in IT where different tiers of servers and storage are used for different applications, there are different tiers of power plants using different fuels with various costs. Peaking and standby plants are brought online when there is heavy demand for electrical power, during disruptions when a lower cost or more environmentally friendly plant goes offline for planned maintenance, or in the event of a trip or unplanned outage.

CO2 is commonly discussed with respect to green and associated emissions; however, there are other so called greenhouse gases (GHG), including Nitrogen Dioxide (NO2) and water vapor among others. Carbon makes up only a fraction of CO2. To be specific, only about 27% of a pound of CO2 is carbon; the balance is not. Consequently, carbon emissions tax schemes (ETS), as opposed to CO2 tax schemes, need to account for the amount of carbon per ton of CO2 being put into the atmosphere. In some parts of the world, including the EU and the UK, ETS are either already in place or in initial pilot phases, to provide incentives to improve energy efficiency and use.

Meanwhile, in the United States there are voluntary programs for buying carbon offset credits along with initiatives such as the Carbon Disclosure Project. The Carbon Disclosure Project (www.cdproject.net) is a not for profit organization that facilitates the flow of information pertaining to emissions by organizations so that investors can make informed decisions and business assessments from an economic and environmental perspective. Another voluntary program is the United States EPA Climate Leaders initiative, where organizations commit to reduce their GHG emissions to a given level over a specific period of time.
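The roughly 27% figure follows from atomic masses: one carbon atom (about 12) and two oxygen atoms (about 16 each) make up a CO2 molecule. A short sketch of the arithmetic, with a hypothetical helper for converting a mass of CO2 into its carbon content:

```python
ATOMIC_MASS_CARBON = 12.011   # standard atomic masses
ATOMIC_MASS_OXYGEN = 15.999

# CO2 is one carbon atom plus two oxygen atoms.
CARBON_FRACTION_OF_CO2 = ATOMIC_MASS_CARBON / (
    ATOMIC_MASS_CARBON + 2 * ATOMIC_MASS_OXYGEN
)


def carbon_in_co2(co2_mass):
    """Return the mass of carbon contained in a given mass of CO2 (same units)."""
    return co2_mass * CARBON_FRACTION_OF_CO2


print(f"carbon fraction of CO2: {CARBON_FRACTION_OF_CO2:.1%}")
print(f"carbon in 1 ton of CO2: {carbon_in_co2(1.0):.3f} ton")
```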

Regardless of your stance or perception on green issues, the reality is that for business and IT sustainability, a focus on ecological and, in particular, the corresponding economic aspects cannot be ignored. There are business benefits to aligning the most energy efficient and low power IT solutions combined with best practices to meet different data and application requirements in an economic and ecologically friendly manner.

Green initiatives need to be seen in a different light, as business enablers as opposed to ecological cost centers. For example, many local utilities and state energy or environmentally concerned organizations are providing funding, grants, loans, or other incentives to improve energy efficiency. Some of these programs can help offset the costs of doing business and going green. Instead of being seen as the cost to go green, by addressing efficiency, the by-products are economic as well as ecological.

Put a different way, a company can spend carbon credits to offset its environmental impact, similar to paying a fine for noncompliance or it can achieve efficiency and obtain incentives. There are many solutions and approaches to address these different issues, which will be looked at in the coming chapters.

What does this all mean?
There are real things that can be done today that can be effective toward achieving a balance of performance, availability, capacity, and energy effectiveness to meet particular application and service needs.

Sustaining for economic and ecological purposes can be achieved by balancing performance, availability, capacity, and energy to applicable application service level and physical floor space constraints along with intelligent power management. Energy economics should be considered as much a strategic resource part of IT data centers as are servers, storage, networks, software, and personnel.

The bottom line is that without electrical power, IT data centers come to a halt. Rising fuel prices, strained generating and transmission facilities for electrical power, and a growing awareness of environmental issues are forcing businesses to look at PCFE issues. To support and sustain business growth, including storing and processing more data, IT data centers need to leverage energy efficiency as a means of addressing PCFE issues. By adopting effective solutions, economic value can be achieved with positive ecological results while sustaining business growth.


Want to learn or read more?

Check out Chapter 1 (Green IT and the Green Gap, Real or Virtual?) in my book “The Green and Virtual Data Center” (CRC) here or here.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio


Technology Tiering, Servers Storage and Snow Removal

Granted it is winter in the northern hemisphere and thus snow storms should not be a surprise.

However, between December 2009 and early 2010 there has been plenty of record activity, from the U.K. (or here) to the U.S. east coast including New York, Boston and Washington DC, across the midwest and out to California. It made for a white Christmas and SANta fun along with snow fun in general in the new year.

2010 Snow Storm via www.star-telegram.com

What does this have to do with Information Factories aka IT resources including public or private clouds, facilities, server, storage, networking along with data management let alone tiering?

What does this have to do with tiered snow removal, or even snow fun?

Simple, different tools are needed for addressing various types of snow, from wet and heavy to light powdery or dustings to deep downfalls. Likewise, there are different types of servers, storage, data networks along with operating systems, management tools and even hypervisors to deal with various application needs or requirements.

First, let's look at tiered IT resources (servers, storage, networks, facilities, data protection and hypervisors) to meet various efficiency, optimization and service level needs.

Do you have tiered IT resources?

Let me rephrase that question: do you have different types of servers with various performance, availability, connectivity and software that support various applications and cost levels?

Thus the whole notion of tiered IT resources is to be able to have different resources that can be aligned to the task at hand in order to meet performance, availability, capacity and energy along with economic and service level agreement (SLA) requirements.

Computers or servers are targeted for different markets including Small Office Home Office (SOHO), Small Medium Business (SMB), Small Medium Enterprise (SME) and ultra large scale or extreme scaling, including high performance super computing. Servers are also positioned for different price bands and deployment scenarios.

General categories of tiered servers and computers include:

  • Laptops, desktops and workstations
  • Small floor standing towers or rack mounted 1U and 2U servers
  • Medium size floor standing towers or larger rack mounted servers
  • Blade Centers and Blade Servers
  • Large size floor standing servers, including mainframes
  • Specialized fault tolerant, rugged and embedded processing or real time servers

Servers have different names associated with them depending on their use: email server, database server, application server, web server, video or file server, network server, security server, backup server or storage server. In each of the previous examples, what defines the type of server is the type of software being used to deliver a type of service. Sometimes the term appliance will be used for a server; this is indicative of the type of service the combined hardware and software solution is providing. For example, the same physical server running different software could be a general purpose applications server, a database server running for example Oracle, IBM, Microsoft or Teradata among other databases, an email server or a storage server.

This can lead to confusion when looking at servers, in that a server may be able to support different types of workloads and thus could be considered a server, storage, networking or application platform. It depends on the type of software being used on the server. If, for example, storage software in the form of a clustered and parallel file system is installed on a server to create a highly scalable network attached storage (NAS) or cloud based storage service solution, then the server is a storage server. If the server has a general purpose operating system such as Microsoft Windows, Linux or UNIX and a database on it, it is a database server.

While not technically a type of server, some manufacturers use the term tin wrapped software in an attempt to not be classified as an appliance, server or hardware vendor while still positioning their software as more of a turnkey solution. The idea is to avoid being perceived as a software only solution that requires integration with hardware. The solution is to use off the shelf commercially available general purpose servers with the vendor's software technology pre integrated and installed ready for use. Thus, tin wrapped software is a turnkey software solution with some tin, or hardware, wrapped around it.

How about the same with tiered storage?

That is, different tiers (Figure 1) of fast high performance disk including RAM or flash based SSD, fast Fibre Channel or SAS disk drives, or high capacity SAS and SATA disk drives along with magnetic tape as well as cloud based backup or archive?

Tiered Storage Resources
Figure 1: Tiered Storage resources

Tiered storage is also sometimes thought of in terms of large enterprise class solutions or midrange, entry level, primary, secondary, near line and offline. Not to be forgotten, there are also tiered networks that support various speeds, convergence, multi tenancy and other capabilities, from IO Virtualization (IOV) to traditional LAN, SAN, MAN and WANs including 1Gb Ethernet (1GbE), 10GbE up to emerging 40GbE and 100GbE, not to mention various Fibre Channel speeds supporting various protocols.

The notion around tiered networks is, as with servers and storage, to enable aligning the right technology to the task at hand economically while meeting service needs.

Two other common IT resource tiering techniques include facilities and data protection. Tiered facilities can indicate size, availability, resiliency among other characteristics. Likewise, tiered data protection is aligning the applicable technology to support different RTO and RPO requirements for example using synchronous replication where applicable vs. asynchronous time delayed for longer distance combined with snapshots. Other forms of tiered data protection include traditional backups either to disk, tape or cloud.
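The idea of aligning data protection technology to recovery point objectives can be sketched as a simple selection function. This is a minimal sketch under stated assumptions: the function name, the distance limit for synchronous replication and the time thresholds are all hypothetical placeholders, not vendor guidance.

```python
MAX_SYNC_DISTANCE_KM = 100  # hypothetical limit for synchronous replication


def select_protection_tier(rpo_seconds, distance_km):
    """Suggest a data protection technique for a recovery point objective (RPO).

    rpo_seconds is how much recent data the application can afford to lose;
    distance_km is the distance to the secondary site.
    """
    if rpo_seconds < 0 or distance_km < 0:
        raise ValueError("rpo_seconds and distance_km must be non-negative")
    if rpo_seconds < 60 and distance_km <= MAX_SYNC_DISTANCE_KM:
        return "synchronous replication"
    if rpo_seconds < 3600:
        # Longer distances or looser RPOs: time delayed copies plus snapshots.
        return "asynchronous replication with snapshots"
    return "backup to disk, tape or cloud"


print(select_protection_tier(rpo_seconds=0, distance_km=10))
print(select_protection_tier(rpo_seconds=0, distance_km=500))
print(select_protection_tier(rpo_seconds=86_400, distance_km=10))
```

A fuller version would also weigh RTO and cost per tier, which are left out here to keep the sketch short.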

There is a new emerging form of tiering in many IT environments, and that is tiered virtualization, or specifically tiered server hypervisors in virtual data centers, with similar objectives to having different server, storage, network, data protection or facilities tiers. Instead of an environment running all VMware, a mix of VMware, Microsoft Hyper-V or Xen among other hypervisors may be deployed to meet different application service class requirements. For example, VMware may be used for premium features and functionality on some applications, while others that do not need those features and require lower operating costs leverage Hyper-V or Xen based solutions. Taking the tiering approach a step further, one could also declare tiered databases, for example legacy Oracle vs. MySQL or Microsoft SQL Server among other examples.

What about IT clouds, are those different types of resources, or, essentially an extension of existing IT capabilities for example cloud storage being another tier of data storage?

There is another form of tiering, particularly during the winter months in the northern hemisphere where there is an abundance of snow this time of the year. That is, tiered snow management, removal or movement technologies.

What about tiered snow removal?

Well, let's get back to that then.

Like IT resources, there are different technologies that can be used for moving, removing, melting or managing snow.

For example, I can't do much about getting rid of snow other than pushing it all down the hill and into the river, something that would take time and lots of fuel. Or, I can manage where I put snow piles to be prepared for the next storm, and put them where the piles of snow will melt and help avoid spring flooding. Some technologies can be used for relocating snow elsewhere, kind of like archiving data onto different tiers of storage.

Regardless of whether it is snowstorm or IT clouds (public or private), virtual, managed service provider (MSP), hosted or traditional IT data centers, all require physical servers, storage, I/O and data networks along with software including management tools.

Granted, not all servers, storage or networking technology, let alone software, are the same, as they address different needs. IT resources including servers, storage, networks, operating systems and even hypervisors for virtual machines are often categorized and aligned to different tiers corresponding to needs and characteristics (Figure 2).

Tiered IT Resources
Figure 2: Tiered IT resources

For example, in figure 3 there is a light weight plastic shovel (Shovel 1) for moving small amounts of snow in a wide stripe or pass. Then there is a narrow shovel for digging things out, or breaking up snow piles (Shovel 2). Also shown is a light duty snow blower (snow thrower) capable of dealing with powdery or non wet snow and grooming in tight corners or small areas.

Tiered Snow tools
Figure 3: Tiered Snow management and migration tools

For other light dustings, a yard leaf blower does double duty for migrating or moving snow in small or tight corners such as decks and patios, or for cleanup. Larger snowfalls, or cases where there is a lot of area to clear, involve heavier duty tools such as the Kawasaki Mule with a 5 foot Curtis plow. The Mule is a multifunction, multi protocol tool capable of being used for hauling items, towing, pulling or recreational tasks.

When all else fails, there is a pickup truck to get or go out and about, not to mention to pull other vehicles out of ditches or piles of snow when they become stuck!

Snow movement
Figure 4: Sometimes the snow is light, making for fast, low latency migration

Snow movement
Figure 5: And sometimes even snow migration technology goes off line!

Snow movement

And that is it for now!

Enjoy the northern hemisphere winter and snow while it lasts, make the best of it with the right tools to simplify the tasks of movement and management, similar to IT resources.

Keep in mind, it's about the tools, along with when and how to use them for various tasks for efficiency and effectiveness, and a bit of snow fun.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

California Center for Sustainable Energy (CCSE)



CCSE Facility and Seminar Series

This past week I had the honor of delivering a keynote presentation in San Diego at the California Center for Sustainable Energy (CCSE) as part of their continuing education, community outreach, workshop and seminar series. The theme of the well attended event was Next Generation Data Center Solutions, and my talk centered on leveraging Green and Virtual Data Centers for enabling efficiency and effectiveness. In addition to my keynote, the event included a panel discussion that I moderated with representatives of the event's sponsor Compucom, along with their special guests APC, HP, Intel and VMware.

The CCSE has a focus around Climate Change, Energy Efficiency, Green Buildings, Renewable Energy, Transportation, Home and Business. Their services and focus include awareness and outreach, education programs, library and tools, consultant and associated services. Speaking of their library, there is even a signed copy of my book The Green and Virtual Data Center (CRC) now at the CCSE library that can be checked out along with their other resources.

The CCSE staff and facilities were fantastic with hosts Mike Bigelow (an energy engineer) and Marlene King (program manager) orchestrating a great event.

If you are in the San Diego area, check out the CCSE located at 8690 Balboa Ave., Suite 100. They have a great library, cool demonstrations and tools that you can check out to assist with optimizing IT data centers from an energy efficiency standpoint. Learn more about the CCSE here.

Following are some relevant links to the keynote along with panel discussion from the CCSE event:

Follow these links to view additional videos or podcasts, tips, articles, books, reports and events.

Cheers
gs

Greg Schulz – Author The Green and Virtual Data Center (CRC) and Resilient Storage Networks (Elsevier)
twitter @storageio


EPA Server and Storage Workshop Feb 2, 2010

EPA Energy Star

Following up on a recent post pertaining to US EPA Energy Star® for Servers, Data Center Storage and Data Centers, there will be a workshop held Tuesday February 2, 2010 in San Jose, CA.

Here is the note (Italics added by me for clarity) from the folks at EPA with information about the event and how to participate.

 

Dear ENERGY STAR® Servers and Storage Stakeholders:

Representatives from the US EPA will be in attendance at The Green Grid Technical Forum in San Jose, CA in early February, and will be hosting information sessions to provide updates on recent ENERGY STAR servers and storage specification development activities.  Given the timing of this event with respect to ongoing data collection and comment periods for both product categories, EPA intends for these meetings to be informal and informational in nature.  EPA will share details of recent progress, identify key issues that require further stakeholder input, discuss timelines for the completion, and answer questions from the stakeholder community for each specification.

The sessions will take place on February 2, 2010, from 10:00 AM to 4:00 PM PT, at the San Jose Marriott.  A conference line and Webinar will be available for participants who cannot attend the meeting in person.  The preliminary agenda is as follows:

Servers (10:00 AM to 12:30 PM)

  • Draft 1 Version 2.0 specification development overview & progress report
    • Tier 1 Rollover Criteria
    • Power & Performance Data Sheet
    • SPEC efficiency rating tool development
  • Opportunities for energy performance data disclosure

 

Storage (1:30 PM to 4:00 PM)

  • Draft 1 Version 1.0 specification development overview & progress report
  • Preliminary stakeholder feedback & lessons learned from data collection 

A more detailed agenda will be distributed in the coming weeks.  Please RSVP to storage@energystar.gov or servers@energystar.gov no later than Friday, January 22.  Indicate in your response whether you will be participating in person or via Webinar, and which of the two sessions you plan to attend.

Thank you for your continued support of ENERGY STAR.

 

End of EPA Transmission

For those attending the event, I look forward to seeing you there in person on Tuesday before flying down to San Diego where I will be presenting on Wednesday the 3rd at The Green Data Center Conference.

Cheers
Gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2012 StorageIO and UnlimitedIO All Rights Reserved

RAID Relevance Revisited

Following up from some previous posts on the topic, a continued discussion point in the data storage industry is the relevance (or lack thereof) of RAID (Redundant Array of Independent Disks).

These discussions tend to revolve around how RAID is dead due to its real or perceived inability to continue scaling in terms of performance, availability, capacity, economies or energy capabilities, either as needed or when compared to newer techniques, technologies or products.

RAID Relevance

While there are many new and evolving approaches to protecting data in addition to maintaining availability or accessibility to information, RAID, despite the fanfare, is far from being dead, at least on the technology front.

Sure, there are issues or challenges that require continued investing in RAID, as has been the case over the past 20 years. However, those will also be addressed on a go forward basis via continued innovation and evolution along with riding technology improvement curves.

Now from a marketing standpoint, ok, I can see where the RAID story is dead, boring, and something new and shiny is needed, or, at least change the pitch to sound like something new.

Consequently, older technologies that are long in the tooth, have some of the aforementioned issues among others, and may be boring or lack sizzle or marketing dollars can be, and often are, declared dead on the buzzword bingo circuit. After all, how long now has the industry trade group RAID Advisory Board (RAB) been missing in action, retired, spun down, archived or ILMed?

RAID remains relevant because like other dead or zombie technologies it has reached the plateau of productivity and profitability. That success is also something that emerging technologies envy as their future domain and thus a classic marketing move is to declare the incumbent dead.

The reality is that RAID in all of its various instances from hardware to software, standard to non-standard with extensions is very much alive from the largest enterprise to the SMB to the SOHO down into consumer products and all points in between.

Now candidly, RAID is like any technology that is about 20 years old, if not older. (After all, the disk drive is over 50 years old, and it has been declared dead for how long now?) RAID in some ways is long in the tooth, and there are certainly issues to be addressed, as others have been taken care of in the past. Some of these include the overhead of rebuilding large capacity 1TB and 2TB disk drives, and even larger drives in the not so distant future.
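To put the rebuild overhead in rough numbers, here is a minimal back of the envelope sketch. It assumes a best case where a replacement drive is rewritten end to end at a constant sustained rate. The function name and the 50 MB/s rate are hypothetical, chosen only for illustration; real rebuild rates vary widely with the controller, RAID level, drive type and competing application I/O.

```python
def rebuild_hours(capacity_bytes: float, rate_bytes_per_sec: float) -> float:
    """Best-case hours to rewrite a whole drive at a constant sustained rate."""
    if capacity_bytes <= 0 or rate_bytes_per_sec <= 0:
        raise ValueError("capacity and rate must be positive")
    return capacity_bytes / rate_bytes_per_sec / 3600.0


TB = 10**12  # decimal terabyte, the way drive capacities are usually quoted
MB = 10**6

# Hypothetical sustained rebuild rate, an assumption for illustration only.
ASSUMED_RATE = 50 * MB

for capacity_tb in (1, 2, 4):
    hours = rebuild_hours(capacity_tb * TB, ASSUMED_RATE)
    print(f"{capacity_tb} TB drive at 50 MB/s: about {hours:.1f} hours")
```

Under this simple model the rebuild time grows linearly with capacity, so doubling the drive size doubles the exposure window unless the rebuild rate improves as well.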

There are also issues pertaining to distributed data protection in support of cloud, virtualized or other solutions that need to be addressed. In fact, go way, way back to when RAID appeared commercially on the scene in the late 80s, and one of the value propositions among others was to address the reliability of emerging large capacity multi MByte sized SCSI disk drives. It seems almost laughable today that a decade later, when 1GB disk drives appeared in the market back in the 90s, there was renewed concern about RAID and disk drive rebuild times.

Rest assured, I think that there is a need and plenty of room for continued innovation and evolution around RAID related technologies and their associated storage systems or packaging on a go forward basis.

What I find interesting is that some of the issues facing RAID today are similar to those of a decade ago, for example having to deal with large capacity disk drive rebuilds, distributed data protection and availability, performance, ease of use, and so the list goes.

However, what happened was that vendors continued to innovate, both in terms of basic performance and accelerated rebuild rates, through improvements to rebuild algorithms and by leveraging faster processors, busses and other techniques. In addition, vendors continued to innovate in terms of new functionality, including adopting RAID 6, which for the better part of a decade languished outside of a few niche vendors as one of those future technologies that probably nobody would ever adopt. We know that to be different now, as it has been for the past several years. RAID 6 is one of those areas where vendors who do not have it are either adding it, enhancing it, or telling you why you do not need it or why it is no good for you.
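For readers who have not looked under the covers of a rebuild, the following is a minimal sketch of the single parity (RAID 5 style) idea: parity is the XOR of the data blocks, so any one lost block can be recomputed from the survivors. The function names and sample blocks are hypothetical. This is not how any particular vendor implements it, and RAID 6 dual parity needs a second, independent check calculation that is not shown here.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recompute the one lost block from the surviving data and parity blocks."""
    return xor_blocks(surviving_blocks)


# One stripe across three hypothetical data drives plus one parity drive.
data = [b"tier", b"snow", b"raid"]
parity = xor_blocks(data)

# Pretend the second drive failed; rebuild its block from everything else.
survivors = [data[0], data[2], parity]
recovered = rebuild_missing(survivors)
print(recovered == data[1])
```

Note that the rebuild has to read every surviving drive in the group to recover a single one, which is part of why rebuild time becomes a concern as drive capacities grow.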

An example of how RAID 6 is being enhanced is boosting performance on normal read and write operations along with accelerating performance during disk rebuild. Also tied to RAID 6 and disk drive rebuild are improvements in controller design to detect and proactively make repairs on the fly to minimize or eliminate errors or diminish the need for drive rebuilds, similar to what was done in previous generations. Let's also not forget the improvements in disk drives, boosting performance, availability, capacity and energy efficiency over time.

Funny how these and other enhancements are similar to those made in the early to mid 2000s, when RAID controller hardware and software were fine tuned in support of high capacity SATA disk drives that had different RAS characteristics than higher performance, lower capacity enterprise drives.

Here is my point.

RAID to some may be dead while others continue to rely on it. Meanwhile, others are working on enhancing technologies for future generations of storage systems and application requirements. Thus in different shapes, forms, configurations, features, functionality or packaging, the spirit of RAID is very much alive and well, remaining relevant.

Regardless of whether a solution uses two or three disk mirroring for availability; RAID 0 with fast SSD, SAS or FC disks in a stripe configuration for performance, with data protection via rapid restoration from some other low cost medium (perhaps RAID 6 or tape); or single, dual or triple parity protection; whether it uses small block, multi MByte or volume based chunklets; let alone whether it is hardware or software based, local or distributed, standard or non standard, chances are there is some theme of RAID involved.
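As a rough way to compare the approaches just listed, here is a small sketch of the usable capacity and drive failure tolerance of striping, mirroring and single, dual or triple parity. It is an idealized model with equal sized drives and no spares, metadata or vendor specific layout. The class and function names, the 8 drive group and the 2 TB drive size are all hypothetical, picked only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    name: str
    drives: int              # total drives in the group
    data_drives: int         # drives' worth of usable capacity
    failures_tolerated: int  # concurrent drive losses the group survives

    def usable_tb(self, drive_tb: float) -> float:
        return self.data_drives * drive_tb


def stripe(drives: int) -> Layout:
    """RAID 0 style stripe: all capacity usable, no protection."""
    if drives < 2:
        raise ValueError("a stripe needs at least two drives")
    return Layout("stripe (RAID 0)", drives, drives, 0)


def mirror(copies: int) -> Layout:
    """N-way mirror: one drive's worth of capacity, survives copies - 1 losses."""
    if copies < 2:
        raise ValueError("a mirror needs at least two copies")
    return Layout(f"{copies}-way mirror", copies, 1, copies - 1)


def parity(drives: int, parity_drives: int) -> Layout:
    """Single, dual or triple parity: give up one drive's capacity per parity."""
    names = {1: "single parity", 2: "dual parity", 3: "triple parity"}
    if parity_drives not in names:
        raise ValueError("parity_drives must be 1, 2 or 3")
    if drives <= parity_drives:
        raise ValueError("need at least one data drive")
    return Layout(names[parity_drives], drives, drives - parity_drives, parity_drives)


DRIVE_TB = 2.0  # hypothetical drive size
layouts = [stripe(8), mirror(2), mirror(3), parity(8, 1), parity(8, 2), parity(8, 3)]
for layout in layouts:
    print(f"{layout.name:16} {layout.drives} drives, "
          f"{layout.usable_tb(DRIVE_TB):.1f} TB usable, "
          f"survives {layout.failures_tolerated} drive failure(s)")
```

The point of the comparison is the trade off: each step up in protection gives back some capacity, which is why different applications end up on different RAID themes.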

Granted, you do not have to call it RAID if you prefer!

As a closing thought, if RAID were no longer relevant, then why do the post RAID, next generation, life beyond RAID (or whatever you prefer to call them) technologies need to tie themselves to the themes of RAID? Simple: RAID is still relevant in some shape or form to different audiences, and it is a great way of stimulating discussion or debate in a constantly evolving industry.

BTW, I'm still waiting for the revolutionary piece of hardware that does not require software, and the software that does not require hardware, and that includes playing games with serverless servers using hypervisors :) .


Here are some additional related and relevant RAID links of interest:

Stay tuned for more about RAID's relevance, as I don't think we have heard the last on this.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio


Upcoming Events and Activities Update V2010.1

The end of year Christmas and New Year's holiday season has come and gone, which means of course that 2009 is a wrap, along with the travel from being out and about.

In addition to getting some time to relax a bit (playing Wii resort, snow plowing, cooking etc.), I have also been catching up on developing some new content including articles, blogs (some yet to be posted), tips as well as podcasts, along with some custom research advisory projects.

Check out some recent tips, articles, videos and podcasts here along with perspectives and comments on industry news here.

2009 events and activities saw visits to cities including San Jose, Tucson, Cancun Mexico, Dallas, Tampa, Miami, Los Angeles, San Jose, Las Vegas, Milwaukee, Atlanta, St. Louis, Birmingham, Cincinnati, Santa Ana, Minneapolis, Boston, Dallas, Boston, Chicago, Parsippany, Raleigh, Providence, Kansas City, Denver, Chicago, Orlando, Chicago, Philadelphia, Toronto, Richmond, Columbus, Princeton, Seattle, Portland, Dallas, San Francisco, Minneapolis, Toronto, Chicago, New York, Milwaukee, Atlanta, Boston, Cleveland and Detroit among others.

This time of the year also means that the 2010 events and activities, including in person keynotes and presentations also known as out and about, are getting underway. While the 2010 schedule of events is still being finalized, some initial events are on the calendar, my bags are about to be packed and tickets are in hand, not to mention finalizing the presentation and discussion content.

In addition to some non public events, including keynote presentations at some vendors' annual sales (kick off) meetings, the following are some of the events currently on the calendar. Click on the links below to learn more about the venues.

February 3, 2010 Green Data Center Conference, San Diego, CA
January 21, 2010 Dinner Event keynote Speaker Dynamic IT Infrastructure, Beverly Hills, CA
January 21, 2010 Morning keynote Speaker The Green and Virtual Data Center, San Diego, CA
January 19, 2010 Dinner Event keynote Speaker Dynamic IT Infrastructure, Miami, FL

Watch for updates to the events calendar, and I look forward to seeing you all while I'm out and about during 2010.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio
