Ceph Day Amsterdam 2012 (Object and cloud storage)

StorageIO industry trends cloud, virtualization and big data

Recently while I was in Europe presenting some sessions at conferences and doing some seminars, I was invited by Ed Saipetch (@edsai) of Inktank.com to attend the first Ceph Day in Amsterdam.

Ceph day image

As luck or fate would have it, I was in Nijkerk, which is about an hour's train ride from Amsterdam Central station, and I had a free day in my schedule. After a morning train ride and a nice walk from Amsterdam Central I arrived at the Tobacco Theatre (a former tobacco trading venue) where Ceph Day was underway, in time for a lunch of kroketten sandwiches.

Attendees at Ceph Day

Let's take a quick step back and address, for those not familiar, what Ceph is (the name is short for cephalopod) and why it was worth spending a day to attend this event. Ceph is an open source, distributed, scale-out (e.g. cluster or grid) object storage software platform that runs on industry standard hardware.

Dell server supporting Ceph demo
Sketch of Ceph demo configuration

Ceph is used for deploying object storage, cloud storage and managed services, general purpose storage for research, commercial, scientific, high performance computing (HPC) or high productivity computing (commercial) along with backup or data protection and archiving destinations. Other software similar in functionality or capabilities to Ceph includes OpenStack Swift, Basho Riak CS, Cleversafe, Scality and Caringo among others. There are also the tin wrapped software (e.g. appliances or pre-packaged) solutions such as Dell DX (Caringo), DataDirect Networks (DDN) WOS, EMC Atmos and Centera, Amplidata and HDS HCP among others. From a service standpoint, these solutions can be used to build services similar to Amazon S3 and Glacier, Rackspace Cloud Files and Cloud Block Storage, DreamHost DreamObjects and HP Cloud storage among others.

Ceph cloud and object storage architecture image

At the heart of Ceph is RADOS, a distributed object store that consists of peer nodes functioning as object storage devices (OSDs). Data can be accessed via REST (Amazon S3-like) APIs, libraries, CephFS and gateways, with information being spread across nodes and OSDs using a CRUSH based algorithm (note Sage Weil is one of the authors of CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data). Ceph scales in terms of performance, availability and capacity by adding extra nodes with hard disk drives (HDDs) or solid state devices (SSDs). One of the presentations covered DreamHost, an early adopter of Ceph that uses it to build its DreamObjects (cloud storage) offering.
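To make the placement idea a bit more concrete, here is a minimal sketch of computing where an object's copies live from its name alone, rather than looking them up in a central table. Note that this is not CRUSH: it swaps in plain rendezvous (highest random weight) hashing as a much simpler stand-in, and the node names, the 12 drives per node, the object name and the place() helper are all made up for illustration.

```python
import hashlib


def place(object_name, osds, replicas=3):
    """Pick `replicas` OSDs on distinct nodes using rendezvous hashing.

    Illustrative stand-in only, not the CRUSH algorithm. `osds` is a list
    of (node, disk) tuples; both fields are hypothetical identifiers.
    """
    def score(osd):
        node, disk = osd
        digest = hashlib.sha256(f"{object_name}|{node}|{disk}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    chosen, used_nodes = [], set()
    # Walk the OSDs from highest to lowest score, skipping any OSD whose
    # node already holds a copy, so the copies land on separate nodes.
    for osd in sorted(osds, key=score, reverse=True):
        if osd[0] not in used_nodes:
            chosen.append(osd)
            used_nodes.add(osd[0])
        if len(chosen) == replicas:
            break
    return chosen


# Hypothetical small cluster: 4 nodes with 12 drives (OSDs) each.
osds = [(f"node{n:02d}", d) for n in range(4) for d in range(12)]
print(place("bucket/photo.jpg", osds))
```

Any client that knows the list of OSDs gets the same answer for the same object name, which is the property that matters here. How failure domains, weights and rebalancing are handled is where the real CRUSH does far more than this sketch.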

Ceph cloud and object storage deployment image

In addition to storage nodes, there is also an odd number of monitor nodes to coordinate and manage the Ceph cluster, along with optional gateways for file access. In the above figure (via DreamHost), load balancers sit in front of gateways that interact with the storage nodes. The storage node in this example is a physical server with 12 x 3TB HDDs, each configured as an OSD.

Ceph dreamhost dreamobject cloud and object storage configuration image

In the DreamHost example above, there are 90 storage nodes plus 3 management nodes. The total raw storage capacity (no RAID) is about 3PB (12 x 3TB = 36TB per node, and 36TB x 90 nodes = 3.24PB). Instead of using RAID or mirroring, each object's data is replicated or copied to three (e.g. N=3) different OSDs (on separate nodes), where N is adjustable for a given level of data protection, for a usable storage capacity of about 1PB.

Note that N could be set lower for more usable capacity at the cost of lower availability, while a larger value of N would give more durability or data protection at a higher storage capacity overhead cost. In addition to using JBOD configurations with replication, Ceph can also be configured with a combination of RAID and replication, providing more flexibility for larger environments to balance performance, availability, capacity and economics.
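The capacity arithmetic for the DreamHost example can be worked through for a few values of N. This is a back-of-the-envelope sketch only: it assumes decimal units (1PB = 1000TB) and ignores file system overhead, journals and any free-space headroom, and the function name is made up for illustration.

```python
def cluster_capacity_tb(nodes, drives_per_node, drive_tb, replicas):
    """Return (raw_tb, usable_tb) for a replicated JBOD cluster.

    Simplified: usable capacity is raw capacity divided by the number
    of copies (N) kept of each object.
    """
    raw_tb = nodes * drives_per_node * drive_tb
    usable_tb = raw_tb / replicas
    return raw_tb, usable_tb


# 90 storage nodes, each with 12 x 3TB HDDs, as in the example above.
for n in (2, 3, 4):
    raw, usable = cluster_capacity_tb(90, 12, 3, n)
    print(f"N={n}: raw {raw / 1000:.2f}PB, usable {usable / 1000:.2f}PB")
# N=2: raw 3.24PB, usable 1.62PB
# N=3: raw 3.24PB, usable 1.08PB
# N=4: raw 3.24PB, usable 0.81PB
```

With N=3 the roughly 1PB usable figure falls out directly, and the other rows show what moving N down or up costs or buys in capacity terms.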

Ceph dreamhost and dreamobject cloud and object storage deployment image

One of the benefits of Ceph is the flexibility to configure it how you want or need for different applications. This can be in a cost-effective hardware light configuration using JBOD or internal HDDs in small form factor generally available servers, or high density servers and storage enclosures with optional RAID adapters along with SSD. This flexibility is different from some cloud and object storage systems or software tools which take a stance of not using or avoiding RAID vs. providing options and flexibility to configure and use the technology how you see fit.

Here are some links to presentations from Ceph Day:
Introduction and Welcome by Wido den Hollander
Ceph: A Unified Distributed Storage System by Sage Weil
Ceph in the Cloud by Wido den Hollander
DreamObjects: Cloud Object Storage with Ceph by Ross Turk
Cluster Design and Deployment by Greg Farnum
Notes on Librados by Sage Weil

Presentations during ceph day

While at Ceph Day, I was able to spend a few minutes with Sage Weil, Ceph creator and founder of inktank.com, to record a podcast (listen here) about what Ceph is, where and when to use it, along with other related topics. Also while at the event I had a chance to sit down with Curtis (aka Mr. Backup) Preston where we did a simulcast video and podcast. The simulcast involved Curtis recording this video with me as a guest discussing Ceph, cloud and object storage, backup, data protection and related themes while I recorded this podcast.

One of the interesting things about Ceph Day was what I did not hear, yet tend to hear at related conferences such as SNW: going off on erasure codes or other tangents. Instead the focus was on where and how to use, configure and deploy Ceph, along with various configuration options and replication or copy modes. In other words, instead of focusing on data protection protocols and algorithms, or on what is wrong with the competition or other architectures, Ceph Day focused on removing objections to cloud and object storage and on enablement.

Where do you get Ceph? You can get it here, as well as via 42on.com and inktank.com.

Thanks again to Sage Weil for taking time out of his busy schedule to record a podcast talking about Ceph, as well as to 42on.com and Inktank for hosting, and for the invitation to attend the first Ceph Day in Amsterdam.

View of downtown Amsterdam on way to train station to return to Nijkerk
Returning to Amsterdam central station after Ceph Day

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2012 StorageIO and UnlimitedIO All Rights Reserved

Garbage data in, garbage information out, big data or big garbage?


Do you know the computer technology saying, garbage data in results in garbage information out?

In other words even with the best algorithms and hardware, bad, junk or garbage data put in results in garbage information delivered. Of course, you might have data analysis and cleaning software to look for, find and remove bad or garbage data, however that’s for a different post on another day.

If garbage data in results in garbage information out, does garbage big data in result in big garbage out?

I’m sure my sales and marketing friends or their surrogates will jump at the opportunity to tell me why and how big data is the solution to the decades old garbage data in problem.

Likewise they will probably tell me big data is the solution to problems that have not even occurred or been discovered yet, yeah right.

However, garbage data does not discriminate or show preference towards big data or little data; in fact it can infiltrate all types of data and systems.

Let's shift gears from big and little data to how all of that information is protected, backed up, replicated, copied for HA, BC, DR, compliance, regulatory or other reasons. I wonder how much garbage data is really out there and how many garbage backups, snapshots, replication or other copies of data exist? Sounds like a good reason to modernize data protection.

If we don’t know where the garbage data is, how can we know if there is a garbage copy of the data for protection on some other tape, disk or cloud? That also means plenty of garbage data to compact (e.g. compress and dedupe) to cut its data footprint impact, particularly in tough economic times.

Does this mean then that the cloud is the new destination for garbage data in different shapes or forms, from online primary to back up and archive?

Does that then make the cloud the new virtual garbage dump for big and little data?

Hmm, I think I need to empty my desktop trash bin and email deleted items among other digital house keeping chores now.

On the other hand, I just had a thought about orphaned data and orphaned storage, however let's let those sleeping dogs lie where they rest for now.

Ok, nuff said.

Cheers gs


SSD, flash and DRAM, DejaVu or something new?


Recently I was in Europe for a couple of weeks including stops at Storage Networking World (SNW) Europe in Frankfurt, StorageExpo Holland, Ceph Day in Amsterdam (object and cloud storage), and Nijkerk where I delivered two separate two-day seminars and a single one-day seminar.

Image of Frankfurt train station
Image of inside front of ICE train going from Frankfurt to Utrecht

At the recent StorageExpo Holland event in Utrecht, I gave a couple of presentations, one on cloud, virtualization and storage networking trends, the other taking a deeper look at Solid State Devices (SSDs). As in the past, StorageExpo Holland was great, in a fantastic venue with many large exhibits and great attendance, which I heard was over 6,000 people over two days (excluding exhibitor vendors, VARs, analysts, press and bloggers). That is several times larger than what was seen in Frankfurt at the SNW event.

Image of Ilja Coolen (twitter @iCoolen) who was session host for SSD presentation in Utrecht
Image of StorageExpo Holland exhibit show floor in Utrecht

Both presentations were very well attended and included lively interactive discussion during and after the sessions. The theme of my second talk was that with SSD the question is not if, rather what to use where, how and when, which brings us to this post.

For those who have been around or using SSD for more than a decade outside of cell phones, cameras, SD cards or USB thumb drives, SSD probably means DRAM based with some form of data persistency mechanism. More recently, mention SSD and that implies NAND flash-based, either MLC, eMLC or SLC, or perhaps emerging MRAM or PCM. Some might even think of NVRAM or other forms of SSD including memristors among others, however let's stick to NAND flash and DRAM for now.

image of ssd technology evolution

Often in technology what is old can be new, and what is new can be seen as old. If you have seen, experienced or done something before, you will have a sense of DejaVu and it might be evolutionary. On the other hand, if you have not seen, heard or experienced it, or it has found a new audience, then it can be revolutionary or maybe even an industry first ;).

Technology evolves, gets improved on, matures, and can often go in cycles of adoption, deployment, refinement, retirement, and so forth. SSD in general has been an on-again, off-again technology for the past several decades, except for the past six to seven years. Normally there is an up cycle tied to different events: servers not being fast enough or affordable, so SSD is used to help address performance woes, or drives and storage systems not being fast enough, and so forth.

Btw, for those of you who think that the current SSD focused technology (NAND flash) is new, it is in fact 25 years old, still evolving and far from reaching its full potential in terms of customer deployment opportunities.


NAND flash memory has helped keep SSD practical for the past several years, riding a curve similar to the one that is keeping alive the hard disk drives (HDDs) that SSDs were supposed to replace. That is, improved reliability, endurance or duty cycle, better annual failure rate (AFR), larger space capacity, lower cost, and enhanced interfaces, packaging, power and functionality.

Where SSD can be used and options

DRAM historically, at least for enterprise, has been the main option for SSD based solutions using some form of data persistency. Data persistency options include battery backup combined with internal HDDs to de-stage information from the DRAM before power is lost. TMS (recently bought by IBM) was one of the early SSD vendors from the DRAM era that made the transition to flash, including being one of the first, many years ago, to combine DRAM as a cache layer over NAND flash as a persistency or de-stage layer. This is an example of how, if you were not familiar with TMS back then and their capabilities, you might think or believe that some more recent introductions are new and revolutionary, and perhaps they are in their own right or with enough caveats and qualifiers.

An emerging trend, which for some will be DejaVu, is that of using more DRAM in combination with NAND flash SSD.

Oracle is one example of a vendor who IMHO rather quietly (intentionally or accidentally) has done this in the 7000 series storage systems as well as ExaData based database storage systems. Rest assured they are not alone, and in fact many of the legacy large storage vendors have also piled up large amounts of DRAM based cache in their storage systems. For example EMC with 2TB of DRAM cache in their VMAX 40K, or similar systems from Fujitsu, HP, HDS, IBM and NetApp (including recent acquisition of DRAM based CacheIQ) among others. This has also prompted the question of whether SSD has been successful in traditional storage arrays, systems or appliances, as some would have you believe it has not; click here to learn more and cast your vote.

SSD, IO, memory and storage hierarchy

So is the future in the past? Some would say no, some will say yes, however IMHO there are lessons to learn and leverage from the past while looking and moving forward.

Early SSDs were essentially RAM disks, that is a portion of main random access memory (RAM), or what we now call DRAM, set aside as a non persistent (unless battery backed up) cache or device. Using a device driver, applications could use the RAM disk as though it were a normal storage system. Various vendors sprang up with drivers for different platforms, then disappeared as the need for them was reduced by faster storage systems and interfaces and by RAM disk drivers supplied by platform vendors, not to mention SSD devices.

Oh, for you tech trivia types, there were also database machines from the late 80’s such as those from Britton Lee that would offload your database processing functions to a specialized appliance. Sound like Oracle ExaData I, II or III to anybody?

Image of Oracle ExaData storage system

Ok, so we have seen this movie before, no worries, old movies or shows get remade, and unless you are nostalgic or cling to the past, sure some of the remakes are duds, however many can be quite good.

Same goes with the remake of some of what we are seeing now. Sure there is a generation that does not know nor care about the past; it's full speed ahead and leverage what will get them there.

Thus we are seeing in-memory databases again. Some of you may remember the original series (pick your generation, platform, tool and technology), with each variation getting better. With 64-bit processors, 128-bit and beyond file systems and addressing, not to mention the ability for more DRAM to be accessed directly or via memory address extension, combined with memory data footprint reduction or compression, there is more space to put things (e.g. no such thing as a data or information recession).

Let's also keep in mind that the best IO is the IO that you do not have to do, and that SSD, which is an extension of the memory map, plays by the same rules of real estate. That is, location matters.

Thus, here we go again for some of you (DejaVu), while for others get ready for a new and exciting ride (new and revolutionary). We are back to the future with in-memory databases, which for a time will take some pressure off the underlying IO systems, until they once again outgrow server memory addressing limits (or IT budgets).

However for those who do not fall into a false sense of security, no fear, as there is no such thing as a data or information recession. Sure as the sun rises in the east and sets in the west, sooner or later those IOs that were or are being kept in memory will need to be de-staged to persistent storage, either NAND flash SSD, HDD or somewhere down the road PCM, MRAM and more.


There is another trend: with more IOs being cached, reads are moving to where they should resolve, which is closer to the application or higher up in the memory and IO pyramid or hierarchy (shown above).

Thus, we could see a shift over time to more writes and ugly IOs being sent down to the storage systems. Keep in mind that any cache historically provides temporal relief; the question is how long that relief lasts, or when the next new and revolutionary or DejaVu technology shows up.
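The shift toward writes can be illustrated with some simple arithmetic. This is a rough sketch with made-up numbers (a 70/30 read/write mix of 10,000 IOs), and it assumes every write eventually has to be de-staged to the storage system while only read misses travel down the hierarchy. The function name and parameters are hypothetical.

```python
def backend_io_mix(reads, writes, read_hit_ratio):
    """Return (backend_reads, backend_writes, write_share) seen by storage.

    Simplified model: cache hits absorb a fraction of the reads, while all
    writes are assumed to reach persistent storage sooner or later.
    """
    backend_reads = reads * (1.0 - read_hit_ratio)
    backend_total = backend_reads + writes
    write_share = writes / backend_total if backend_total else 0.0
    return backend_reads, writes, write_share


# Hypothetical workload: 7,000 reads and 3,000 writes from the application.
for hit in (0.0, 0.5, 0.9):
    r, w, share = backend_io_mix(reads=7000, writes=3000, read_hit_ratio=hit)
    print(f"read hit ratio {hit:.0%}: backend reads {r:.0f}, "
          f"writes {w}, write share {share:.0%}")
```

As the read hit ratio climbs, the application's mix stays read heavy while what lands on the storage system becomes mostly writes. Real caches also coalesce or absorb some writes, which this sketch deliberately leaves out.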

Ok, go have fun now, nuff said.

Cheers gs


Data Center Infrastructure Management (DCIM) and IRM


There are many business drivers and technology reasons for adopting data center infrastructure management (DCIM) and infrastructure resource management (IRM) techniques, tools and best practices. Today’s agile data centers need updated management systems, tools, and best practices that allow organizations to plan, run at a low cost, and analyze for workflow improvement. After all, there is no such thing as an information recession, and that drives the need to move, process and store more data. With budget and other constraints, organizations need to be able to stretch available resources further while reducing costs, including for physical space and energy consumption.

The business value proposition of DCIM and IRM includes:

DCIM, Data Center, Cloud and storage management figure

Data Center Infrastructure Management or DCIM, also known as IRM, has, as the names describe, a focus on managing resources in the data center or information factory. IT resources include physical floor and cabinet space, power and cooling, networks and cabling, physical (and virtual) servers and storage, and other hardware and software management tools. For some organizations, DCIM will have a more facilities oriented view focusing on physical floor space, power and cooling. Other organizations will have a converged view crossing hardware, software and facilities, along with how those are used to effectively deliver information services in a cost-effective way.

Common to all DCIM and IRM practices are metrics and measurements, along with other related information about available resources, for gaining situational awareness. Situational awareness enables visibility into what resources exist, how they are configured and being used, by what applications, and their performance, availability, capacity and economic effectiveness (PACE) to deliver a given level of service. In other words, DCIM enabled with metrics and measurements that matter allows you to avoid flying blind and to make prompt and effective decisions.

DCIM, Data Center and Cloud Metrics Figure

DCIM comprises the following:

  • Facilities, power (primary and standby, distribution), cooling, floor space
  • Resource planning, management, asset and resource tracking
  • Hardware (servers, storage, networking)
  • Software (virtualization, operating systems, applications, tools)
  • People, processes, policies and best practices for management operations
  • Metrics and measurements for analytics and insight (situational awareness)

The evolving DCIM model is built around elasticity, multi-tenancy, scalability and flexibility, and is metered and service-oriented. Service-oriented means being able to rapidly deliver new services while keeping customer experience and satisfaction in mind. Also part of being focused on the customer is to enable organizations to be competitive with outside service offerings while focusing on being more productive and economically efficient.

DCIM, Data Center and Cloud E2E management figure

While specific technology domain areas or groups may be focused on their respective areas, interdependencies across IT resource areas are a matter of fact for efficient virtual data centers. For example, provisioning a virtual server relies on configuration and security of the virtual environment, physical servers, storage and networks along with associated software and facility related resources.

You can read more about DCIM, ITSM and IRM in this white paper that I did, as well as in my books Cloud and Virtual Data Storage Networking (CRC Press) and The Green and Virtual Data Center (CRC Press).

Ok, nuff said, for now.

Cheers gs


Mr. Backup (Curtis Preston) goes back to Ceph School


This is a new episode in the continuing StorageIO industry trends and perspectives podcast series (you can view more episodes or shows along with other audio and video content here). You can also listen via iTunes or via your preferred means using this RSS feed (https://storageio.com/StorageIO_Podcast.xml).


In this episode, I am at the Ceph Day event in Amsterdam, Holland, held at the Tobacco Theatre and hosted by 42on.com and inktank.com.

Ceph Day Amsterdam 2012

My guest for this episode is Curtis (Mr. Backup) Preston (@wcpreston) of Backup School and Backup Central fame. We discuss what Ceph and object storage are, cloud storage, file systems, backup and data protection, along with a dinner we had at an Indonesian restaurant.

Dinner Restaurant Blauw Utrecht Netherlands
Mr Backup getting ready to compress and dedupe dinner

The dinner we are referring to was at Restaurant Blauw in Utrecht, Holland (click here), where Curtis and I were joined by Hans De Leenher @hansdeleenher of Veeam (thanks again for the dinner, that was a disclosure btw ;) ).

Note that this is a special episode in that while I’m recording the podcast, Curtis is recording a video of our discussion for his truebit.tv site that you can view here.

Click here (right-click to download MP3 file) or on the microphone image to listen to the conversation with Curtis and myself.

StorageIO podcast


Watch (and listen) for more StorageIO industry trends and perspectives audio blog posts, podcasts and other upcoming events. Also be sure to check out other related podcasts, videos, posts, tips and industry commentary at StorageIO.com and StorageIOblog.com.

Also check out the companion to this podcast where I meet up with Ceph creator Sage Weil while at Ceph Day.

Enjoy this episode Mr. Backup (Curtis Preston) goes back to Ceph School.

 

Ok, nuff said.

Cheers gs


Ben Woo on Big Data Buzzword Bingo and Business Benefits


This is a new episode in the continuing StorageIO industry trends and perspectives podcast series (you can view more episodes or shows along with other audio and video content here). You can also listen via iTunes or via your preferred means using this RSS feed (https://storageio.com/StorageIO_Podcast.xml).


In this episode, I’m joined in Frankfurt, Germany by Ben Woo (@benwoony) of Neuralytix.com. Our conversation includes cloud, big data and how buzzword bingo technology focused discussions can result in missed business benefits for both vendors and customers. We also reminisce about MTI, where we worked together, and talk about protecting home storage.

Click here (right-click to download MP3 file) or on the microphone image to listen to the conversation with Ben and myself.

StorageIO podcast

Watch (and listen) for more StorageIO industry trends and perspectives audio blog posts, podcasts and other upcoming events. Also be sure to check out other related podcasts, videos, posts, tips and industry commentary at StorageIO.com and StorageIOblog.com.

Enjoy this episode with Ben Woo talking big data and business benefits vs. buzzword bingo.

Ok, nuff said.

Cheers gs


Ceph Day in Amsterdam and Sage Weil on Object Storage


This is a new episode in the continuing StorageIO industry trends and perspectives podcast series (you can view more episodes or shows along with other audio and video content here). You can also listen via iTunes or via your preferred means using this RSS feed (https://storageio.com/StorageIO_Podcast.xml).


In this episode, I am at the Ceph Day event in Amsterdam, Holland, at the Tobacco Theatre. My guest for this episode is Ceph (short for cephalopod) creator Sage Weil, who is also the founder of inktank.com, which provides services and support for the open source based Ceph project.

For those not familiar with Ceph, it is an open source, distributed, scale-out object storage software platform that can be used for deploying cloud and managed services, general purpose storage for research, commercial, scientific, high performance computing (HPC) or high productivity computing (commercial) along with backup or data protection and archiving destinations.

During our conversation Sage presents an overview of what Ceph is (e.g. Ceph for non Dummies), where and how it can be used, some history of the project and how it fits in with or provides an alternative to other solutions. Sage also talks about the business or commercial considerations for open source based projects, importance of community and having good business mentors and partners as well as staying busy with his young family.

If you are a Ceph fan, gain more insight into Sage along with Ceph Day sponsors Inktank and 42on. On the other hand, if you are new to object storage, open source storage software or cloud storage, listen in to gain perspectives on where technology such as Ceph fits for public, private, hybrid or traditional environments.

Click here (right-click to download MP3 file) or on the microphone image to listen to the conversation with Sage and myself.

StorageIO podcast


Watch (and listen) for more StorageIO industry trends and perspectives audio blog posts, podcasts and other upcoming events. Also be sure to check out other related podcasts, videos, posts, tips and industry commentary at StorageIO.com and StorageIOblog.com.

Enjoy this episode Ceph Day in Amsterdam with Sage Weil.

Ok, nuff said.

Cheers gs


Little data, big data and very big data (VBD) or big BS?


This is an industry trends and perspective piece about big data and little data, industry adoption and customer deployment.

If you are in any way associated with information technology (IT), business, scientific, media and entertainment computing or related areas, you may have heard big data mentioned. Big data has been a popular buzzword bingo topic and term for a couple of years now. Big data is being used to describe new and emerging along with existing types of applications and information processing tools and techniques.

I routinely hear from different people or groups trying to define what is or is not big data, and all too often those definitions are based on a particular product, technology, service or application focus. Thus it should be no surprise that those trying to police what is or is not big data will often do so based on their interests, sphere of influence, knowledge or experience, and what their jobs depend on.

Traveling and big data images

Not long ago while out traveling I ran into a person who told me that big data is new data that did not exist just a few years ago. Turns out this person was involved in geology, so I was surprised that somebody in that field was not aware of or working with geophysical, mapping, seismic and other legacy or traditional big data. Turns out this person was basing his statements on what he knew, had heard or been told, or on his sphere of influence around a particular technology, tool or approach.

Fwiw, if you have not figured out already, as with cloud, virtualization and other technology enabling tools and techniques, I tend to take a pragmatic approach vs. becoming latched on to a particular bandwagon (for or against) per se.

Not surprisingly there is confusion and debate about what is or is not big data, including if it only applies to new vs. existing and old data. As with any new technology, technique or buzzword bingo topic theme, various parties will try to place what is or is not under the definition to align with their needs, goals and preferences. This is the case with big data, where you can routinely find proponents of Hadoop and MapReduce positioning big data as aligning with the capabilities and usage scenarios of those related technologies for business and other forms of analytics.

SAS software for big data

Not surprisingly the granddaddy of all business analytics, data science and statistic analysis number crunching is the Statistical Analysis Software (SAS) from the SAS Institute. If these types of technology solutions and their peers define what is big data then SAS (not to be confused with Serial Attached SCSI which can be found on the back-end of big data storage solutions) can be considered first generation big data analytics or Big Data 1.0 (BD1 ;) ). That means Hadoop Map Reduce is Big Data 2.0 (BD2 ;) ;) ) if you like, or dislike for that matter.

Funny thing about some fans and proponents or surrogates of BD2 is that they may have heard of BD1 tools like SAS, yet have a limited understanding of what it is or how it is or can be used. When I worked in IT as a performance and capacity planning analyst focused on servers, storage, network hardware, software and applications, I used SAS to crunch various data streams of event, activity and other data from diverse sources. This involved correlating data and running various analytic algorithms on the data to determine response times, availability, usage and other things in support of modeling, forecasting, tuning and troubleshooting. Hmm, sounds like first generation big data analytics or Data Center Infrastructure Management (DCIM) and IT Service Management (ITSM) to anybody?

Now to be fair, comparing SAS, SPSS or any number of other BD1 generation tools to Hadoop and Map Reduce or BD2 second generation tools is like comparing apples to oranges, or apples to pears.

Let's move on, as there is much more to big data than simply a focus on SAS or Hadoop.

Another type of big data is the information generated, processed, stored and used by applications that result in large files, data sets or objects. Large files, objects or data sets include low resolution and high-definition photos, videos, audio, security and surveillance, geophysical mapping and seismic exploration among others. Then there are data warehouses, where transactional data from databases gets moved for analysis in systems such as those from Oracle, Teradata, Vertica or FX among others. Some of those other tools even play (or work) in both traditional (e.g. BD1) and new or emerging BD2 worlds.

This is where some interesting discussions, debates or disagreements can occur between those who latch onto or want to keep big data associated with being something new and usually focused around their preferred tool or technology. What results from these types of debates or disagreements is a missed opportunity for organizations to realize that they might already be doing or using a form of big data and thus have a familiarity and comfort zone with it.

By having a familiarity or comfort zone vs. seeing big data as something new, different, hype or full of FUD (or BS), an organization can be comfortable with the term big data. Often after taking a step back and looking at big data beyond the hype or FUD, the reaction is along the lines of, oh yeah, now we get it, sure, we are already doing something like that, so let's take a look at some of the new tools and techniques to see how we can extend what we are doing.

Likewise many organizations are doing big bandwidth already and may not realize it, thinking that is only what media and entertainment, government, technical or scientific computing, high performance computing or high productivity computing (HPC) does. I’m assuming that some of the big data and big bandwidth pundits will disagree, however if in your environment you are doing many large backups, archives, content distribution, or copying large amounts of data for different purposes, then you are consuming big bandwidth and need big bandwidth solutions.

Yes I know, that’s apples to oranges and perhaps stretching the limits of what is or can be called big bandwidth based on somebody’s definition, taxonomy or preference. Hopefully you get the point that there is diversity across various environments as well as types of data and applications, technologies, tools and techniques.

What about little data then?

I often say that if big data is getting all the marketing dollars to generate industry adoption, then little data is generating all the revenue (and profit or margin) dollars by customer deployment. While tools and technologies related to Hadoop (or Haydoop if you are from HDS) are getting industry adoption attention (e.g. marketing dollars being spent), revenues from customer deployment are growing.

Big data revenues for most vendors today are strongest around solutions for hosting, storing, managing and protecting big files and big objects. These include scale out NAS solutions for large unstructured data like those from Amplidata, Cray, Dell, Data Direct Networks (DDN), EMC (e.g. Isilon), HP X9000 (IBRIX), IBM SONAS, NetApp, Oracle and Xyratex among others. Then there are flexible converged compute storage platforms optimized for analytics and running different software tools, such as those from EMC (Greenplum), IBM (Netezza), NetApp (via partnerships) or Oracle among others, that can be used for different purposes in addition to supporting Hadoop and Map Reduce.

If little data is databases and things not generally lumped into the big data bucket, and if you think or perceive big data only to be Hadoop Map Reduce based data, then does that mean all the large unstructured non little data is very big data or VBD?

Of course the virtualization folks might want to, if they have not already, corner the V for Virtual Big Data. In that case, instead of Very Big Data, how about very very Big Data (vvBD)? How about Ultra-Large Big Data (ULBD), or High-Revenue Big Data (HRBD)? Granted, the HR might cause some to think it's unique to Health Records or Human Resources, both of which btw leverage different forms of big data regardless of what you see or think big data is.

Does that then mean we should really be calling videos, audio, PACS, seismic, security surveillance video and related data VBD? Would this further confuse the market or the industry, or help elevate it to a grander status in terms of size (data file or object capacity, bandwidth, market size and application usage, market revenue and so forth)?

Do we need various industry consortiums, lobbyists or trade groups to go off and create models, taxonomies, standards and dictionaries based on their constituents' needs, and would they align with those of the customers? After all, there are big dollars flowing around big data industry adoption (marketing).

What does this all mean?

Is Big Data BS?

First let me be clear, big data is not BS, however there is a lot of marketing BS by some, along with hype and FUD adding to the confusion and chaos, perhaps even missed opportunities. Keep in mind that in chaos and confusion there can be opportunity for some.

IMHO big data is real.

There are different variations, use cases and types of products, technologies and services that fall under the big data umbrella. That does not mean everything can or should fall under the big data umbrella as there is also little data.

What this all means is that there are different types of applications for various industries that have big and little data, virtual and very big data from videos, photos, images, audio, documents and more.

Big data is a big buzzword bingo term these days with big vendor marketing dollars being applied, so the buzz, hype, FUD and more are no surprise.

Ok, nuff said, for now.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2012 StorageIO and UnlimitedIO All Rights Reserved

Industry trends and perspectives: SNW 2012 Rapping with Dave Raffo of SearchStorage

This is the seventh (here is the first, second, third, fourth, fifth and sixth) in a series of StorageIO industry trends and perspective audio blog and pod cast discussions from Storage Networking World (SNW) Fall 2012 in Santa Clara California.

Given how conversations at conferences tend to occur in the hallways, lobbies and bar areas of venues, what better place to have candid conversations with people from throughout the industry, some you know, some you will get to know better.

In this episode, my co-host Bruce Rave aka Bruce Ravid of Ravid and Associates (twitter @brucerave) meets up with Sr. News Director Dave Raffo of TechTarget and Search Storage in the SNW trade show expo hall. Our conversation covers past and present SNWs along with other industry conferences, industry trends, software defined buzzwords, Green Bay Packers smack and more.

Click here (right-click to download MP3 file) or on the microphone image to listen to the conversation with Dave, Bruce and myself.

StorageIO podcast

Watch (and listen) for more StorageIO industry trends and perspectives audio blog posts and pod casts from SNW and other upcoming events. Also be sure to check out other related pod casts, videos, posts, tips and industry commentary at StorageIO.com and StorageIOblog.com.

Enjoy listening to Rapping with Dave Raffo of Search Storage from the Fall SNW 2012 pod cast.

Ok, nuff said.

Cheers gs

Industry trends and perspectives: Meeting up with Marty Foltyn of SNIA

This is the fourth (here is the first, second and third) in a series of StorageIO industry trends and perspective audio blog and pod cast discussions from Storage Networking World (SNW) Fall 2012 in Santa Clara California.

Given how conversations at conferences tend to occur in the hallways, lobbies and bar areas of venues, what better place to have candid conversations with people from throughout the industry, some you know, some you will get to know better.

In this episode, while I’m on a plane flying home above the clouds, my co-host Bruce Rave aka Bruce Ravid of Ravid and Associates (twitter @brucerave) meets up with Marty Foltyn (@martyfoltyn) of SNIA Hands On Lab (HOL).

Click here (right-click to download MP3 file) or on the microphone image to listen to the conversation with Marty and Bruce.

StorageIO podcast

Watch (and listen) for more StorageIO industry trends and perspectives audio blog posts and pod casts from SNW and other upcoming events.

Enjoy listening to meeting up with Marty Foltyn from the Fall SNW 2012 pod cast.

Ok, nuff said.

Cheers gs

Industry trends and perspectives: Catching up with Quantum CTE David Chapa

This is the third (here is the first and the second) in a series of StorageIO industry trends and perspective audio blog and pod cast discussions from Storage Networking World (SNW) Fall 2012 in Santa Clara California.

Given how conversations at conferences tend to occur in the hallways, lobbies and bar areas of venues, what better place to have candid conversations with people from throughout the industry, some you know, some you will get to know better.

In this episode, I’m joined by my co-host Bruce Rave aka Bruce Ravid of Ravid & Associates (twitter @brucerave) as we catch up and visit with David Chapa (@davidchapa) Chief Technology Evangelist (CTE) of Quantum Corporation (@quantumcorp) in the Santa Clara Hyatt (event venue) lobby bar area. Disclosure note, Quantum has in the past been a client of StorageIO.

Click here (right-click to download MP3 file) or on the microphone image to listen to the conversation with David and Bruce. Our conversation covers SNW, the evolution and transformation of Quantum, global travels in and around the clouds, big data myths and realities, monetizing and transforming data into information, using big data to drive diapers and beer sales, people and data living longer as well as getting larger, managing your diet and data footprint, and rethinking and modernizing data protection, among other topics.

StorageIO podcast

Watch (and listen) for more StorageIO industry trends and perspectives audio blog posts and pod casts from SNW and other upcoming events.

Enjoy listening to catching up with David Chapa from the Fall SNW 2012 pod cast.

Ok, nuff said.

Cheers gs

Does Dell have a cloudy cloud strategy story (Part II)?

This is the second of two posts (here is the first post) that are part of ongoing industry trends and perspectives cloud conversations series that looks at Dell and their cloud strategy story.

So what does the first post have to do with Dell having a cloudy cloud strategy story?

Simple, there have been some rather low-key, almost quiet or muddled announcements (also here, here and here) about Dell and Nirvanix collaborating around public cloud storage. Keep in mind that Nirvanix and IBM not too long ago also announced a partnership, which led some to jump to the conclusion that big blue was about to buy the startup vendor, even though IBM already has other cloud and storage as a service, backup as a service and DR as a service offerings. What the heck, the more the merrier for big blue?

Dell image

What about Dell and their partnership with Nirvanix, (more on that in the first post) did somebody jump the gun, or jump the shark?

Is Dell trying to walk the tightrope between being a supplier to major cloud providers while carefully moving into the cloud services market themselves, or are they simply addressing point customer situations or opportunities, at least for the time being?

Alternatively, is this nothing more than Dell establishing another partnership with a technology partner who also happens to be in the services business, similar to what Dell is doing with OpenStack and others?

OpenStack image for cloud and virtual data storage networking

IMHO Dell has some of the pieces and partnerships and could be a strong contender in the SMB and SME private cloud space, along with VDI and related areas with their Citrix, Microsoft and VMware partnerships. This also leverages their servers, storage, software, networking and other solutions to supply service providers.

The rest comes down to what markets or areas of focus does Dell want to target, that would in turn dictate how to extend what they already have or what they need to go out and get or partner around.

Dont be scared of clouds, learn and gain confidence with cloud and virtual data storage networking

What say you, what’s your take on Dell's cloud strategy story and portfolio?

Ok, nuff said (for now).

Cheers gs

Does Dell have a cloudy cloud strategy story (Part I)?

This is the first of a two-part post (click here for second post) that is part of an ongoing industry trends and perspective cloud conversations series that looks at Dell and their cloud strategy story. For background, some previous Dell posts are found here, here, here and here. Here is a link that has video of the live Dell Storage Customer Advisory Panel (CAP) that Dell asked me to moderate back in June that touches on some related themes and topics. Btw, fwiw and for disclosure Dell AppAssure is a site advertiser on storageioblog.com ;).

Dell image

Depending on your view of what is or is not a cloud service, product or solution, naturally you will then have various opinions of where Dell is at with their cloud strategy and story.

If you consider object based storage to be part of or a component of private clouds or at least for medical, healthcare and related focus, then Dell is already there with their DX object storage solutions (Caringo based).

From a scale out, clustered or grid file system perspective, Dell bought Exanet in a post holiday shopping sale a few years back and has invested in its development, having renamed it Fluid File System. It was initially available as the FS7000 series (EqualLogic) and more recently expanded to systems such as the FS8600 (Compellent based), EqualLogic and NX3500 (MD3000 based).

Dont be scared of clouds, learn and gain confidence with cloud and virtual data storage networking

If you view clouds as being part of services provided, including via hosting or similar, Dell is already there via their Perot Systems acquisition.

If you view cloud as being part of VDI, or VDI being part of cloud, Dell is there with their tools including various acquisitions and solution bundles.

On the other hand if you view clouds as reference architectures across VMware vSphere, Microsoft Hyper-V and Citrix Xen among others, guess what, Dell is also there with their VIS.

Or, if you view private clouds as being a bundled solution (server, storage, hardware, software) such as EMC vBlock or NetApp FlexPod, then Dell vStart (not to be confused as being a service) is on the list with other infrastructure stack solutions.

OpenStack image for cloud and virtual data storage networking

How about being a technology supplier to what you may consider as being true cloud providers or enablers, including those who use OpenStack or other APIs and cloud tools? Guess what, Dell is also there, including at Rackspace (via public web info).

So the above all comes back to this: Dell, like many vendors who offer services, solutions and related items for data and information infrastructures, has diverse offerings including servers, storage, networking, hardware, software and support. Dell, like others similar to them, has to find a balance between providing services that compete with their customers and being a supplier to them, such as to Rackspace. In this case Dell is no different from EMC, who happened to move their Mozy backup service off to their VMware subsidiary and has managed to help define where VCE (and here) and ATMOS fit as products while being services capable. IBM has figured this out, having a mix of old school services such as SmartCloud Services (or here), IBM Global Services and BCRS (business continuity recovery services), not to mention newer backup and storage cloud services, products and solutions they have acquired, OEM or have reseller agreements with.

HP has expanded their traditionally focused EDS as well as other HP services, with products being joined by their Amazon-like Cloud Services including compute, storage and content distribution network (CDN) capabilities. NetApp is taking the partnering route along with Cisco, staying focused at least for now on being a partner supplier. Oracle, well Oracle is Oracle and they have a mix of products and services. In fact some might say Oracle is late to the cloud game, however they have been in the game since the late 90s when they came out with Oracle online, granted the cloud purist will call that an application service provider (ASP) model vs. today’s applications as a service (AaaS) models.

Continue with the second post here, ok, nuff said (for now).

Cheers gs

StorageIO going Dutch and Deutsch fall 2012

Following a busy spring and summer schedule, the fall 2012 StorageIO out and about activities are underway including events on both the European and North American continents.

StorageIO events, object storage, ssd cloud, virtualization and big data

In addition to in person events, there are also some virtual activities including live and recorded video and audio sessions, as well as webcast on the fall schedule with more in the works.

Some of the fall events include SNW (past SNW posts here, here, and here) in Santa Clara, as well as SNW Europe and the Power the Cloud event (Frankfurt Deutschland aka Germany) on October 30th and 31st, where I will be doing some meetings and briefings, along with attending sessions and the expo activities.

StorageIO modernize data protection with clouds, for virtualization and big data

On November 1st it's off to Storage Expo Holland in Utrecht (here and here) where I will be presenting two sessions. One is on SSD industry trends and tips on deployment with a theme of not if, rather when, where, why and with what to use SSD. In addition I will be doing a general industry trends and perspective session on gaining confidence with clouds, virtualization, data and storage networking including object storage and backup (e.g. data protection modernization).

Storage IO travel clouds and virtualization
European travel tools and technologies

In addition to the above activities, following successful past events in Nijkerk Holland including the most recent May 2012 sessions, a new seminar has been announced focused on backup, restore, BC, DR and archiving hosted by Brouwer Consultancy on November 5th and 6th 2012. These workshop format seminars are very interactive providing independent perspectives on technology, tools, trends and what to do to address various challenges including more informed and effective IT decision-making.

Greg in action Nijkerk Storage Seminar

In addition to the new seminar that you can learn more about here, two other sessions will also be offered in Holland. One is a backup, restore, BC, DR and archiving session. The other session covers storage and networking industry trends including clouds, virtualization and other broad topics.

Storage IO travel clouds and virtualization
Examples of Dutch refreshments

Learn more about the Dutch seminars including how to register here.

Watch for more events, seminars, live video, webinars and virtual trade shows by visiting the StorageIO events page.

StorageIO events, object storage, ssd cloud, virtualization and big data

Drop me a note if you would like to schedule or arrange for a meeting, webinar, seminar or other activity at an event near you. If you are planning to be in or near Holland in early November and are interested in scheduling a meeting or session, send me a note or contact Brouwer Consultancy (here) to make arrangements.

Time to get ready for these and other events, ok, nuff said.

Cheers gs
