Ceph Day Amsterdam 2012 (Object and cloud storage)

StorageIO industry trends cloud, virtualization and big data

Recently while I was in Europe presenting some sessions at conferences and doing some seminars, I was invited by Ed Saipetch (@edsai) of Inktank.com to attend the first Ceph Day in Amsterdam.

Ceph day image

As luck or fate would turn out, I was in Nijkerk which is about an hour train ride from Amsterdam central station plus a free day in my schedule. After a morning train ride and nice walk from Amsterdam Central I arrived at the Tobacco Theatre (a former tobacco trading venue) where Ceph Day was underway, and in time for lunch of Krokettens sandwich.

Attendees at Ceph Day

Lets take a quick step back and address for those not familiar what is Ceph (Cephalanthera) and why it was worth spending a day to attend this event. Ceph is an open source distributed object scale out (e.g. cluster or grid) software platform running on industry standard hardware.

Dell server supporting ceph demoSketch of ceph demo configuration

Ceph is used for deploying object storage, cloud storage and managed services, general purpose storage for research, commercial, scientific, high performance computing (HPC) or high productivity computing (commercial) along with backup or data protection and archiving destinations. Other software similar in functionality or capabilities to Ceph include OpenStack Swift, Basho Riak CS, Cleversafe, Scality and Caringo among others. There are also the tin wrapped software (e.g. appliances or pre-packaged) solutions such as Dell DX (Caringo), DataDirect Networks (DDN) WOS, EMC ATMOS and Centera, Amplidata and HDS HCP among others. From a service standpoint, these solutions can be used to build services similar Amazon S3 and Glacier, Rackspace Cloud files and Cloud Block, DreamHost DreamObject and HP Cloud storage among others.

Ceph cloud and object storage architecture image

At the heart of Ceph is RADOS a distributed object store that consists of peer nodes functioning as object storage devices (OSD). Data can be accessed via REST (Amazon S3 like) APIs, Libraries, CEPHFS and gateway with information being spread across nodes and OSDs using a CRUSH based algorithm (note Sage Weil is one of the authors of CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data). Ceph is scalable in terms of performance, availability and capacity by adding extra nodes with hard disk drives (HDD) or solid state devices (SSDs). One of the presentations pertained to DreamHost that was an early adopter of Ceph to make their DreamObjects (cloud storage) offering.

Ceph cloud and object storage deployment image

In addition to storage nodes, there are also an odd number of monitor nodes to coordinate and manage the Ceph cluster along with optional gateways for file access. In the above figure (via DreamHost), load balancers sit in front of gateways that interact with the storage nodes. The storage node in this example is a physical server with 12 x 3TB HDDs each configured as a OSD.

Ceph dreamhost dreamobject cloud and object storage configuration image

In the DreamHost example above, there are 90 storage nodes plus 3 management nodes, the total raw storage capacity (no RAID) is about 3PB (12 x 3TB = 36TB x 90 = 3.24PB). Instead of using RAID or mirroring, each objects data is replicated or copied to three (e.g. N=3) different OSDs (on separate nodes), where N is adjustable for a given level of data protection, for a usable storage capacity of about 1PB.

Note that for more usable capacity and lower availability, N could be set lower, or a larger value of N would give more durability or data protection at higher storage capacity overhead cost. In addition to using JBOD configurations with replication, Ceph can also be configured with a combination of RAID and replication providing more flexibility for larger environments to balance performance, availability, capacity and economics.

Ceph dreamhost and dreamobject cloud and object storage deployment image

One of the benefits of Ceph is the flexibility to configure it how you want or need for different applications. This can be in a cost-effective hardware light configuration using JBOD or internal HDDs in small form factor generally available servers, or high density servers and storage enclosures with optional RAID adapters along with SSD. This flexibility is different from some cloud and object storage systems or software tools which take a stance of not using or avoiding RAID vs. providing options and flexibility to configure and use the technology how you see fit.

Here are some links to presentations from Ceph Day:
Introduction and Welcome by Wido den Hollander
Ceph: A Unified Distributed Storage System by Sage Weil
Ceph in the Cloud by Wido den Hollander
DreamObjects: Cloud Object Storage with Ceph by Ross Turk
Cluster Design and Deployment by Greg Farnum
Notes on Librados by Sage Weil

Presentations during ceph day

While at Ceph day, I was able to spend a few minutes with Sage Weil Ceph creator and founder of inktank.com to record a pod cast (listen here) about what Ceph is, where and when to use it, along with other related topics. Also while at the event I had a chance to sit down with Curtis (aka Mr. Backup) Preston where we did a simulcast video and pod cast. The simulcast involved Curtis recording this video with me as a guest discussing Ceph, cloud and object storage, backup, data protection and related themes while I recorded this pod cast.

One of the interesting things I heard, or actually did not hear while at the Ceph Day event that I tend to hear at related conferences such as SNW is a focus on where and how to use, configure and deploy Ceph along with various configuration options, replication or copy modes as opposed to going off on erasure codes or other tangents. In other words, instead of focusing on the data protection protocol and algorithms, or what is wrong with the competition or other architectures, the Ceph Day focused was removing cloud and object storage objections and enablement.

Where do you get Ceph? You can get it here, as well as via 42on.com and inktank.com.

Thanks again to Sage Weil for taking time out of his busy schedule to record a pod cast talking about Ceph, as well 42on.com and inktank for hosting, and the invitation to attend the first Ceph Day in Amsterdam.

View of downtown Amsterdam on way to train station to return to Nijkerk
Returning to Amsterdam central station after Ceph Day

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2012 StorageIO and UnlimitedIO All Rights Reserved

Seven databases in seven weeks, a book review of NoSQL databases

StorageIO industry trends cloud, virtualization and big data

Seven Databases in Seven Weeks (A Guide to Modern Databases and the NoSQL Movement) is a book written Eric Redmond (@coderoshi) and Jim Wilson (@hexlib), part of The Pragmatic Programmers (@pragprog) series that takes a look at several non SQL based database systems.

Cover image of seven databases in seven weeks book image

Coverage includes PostgreSQL, Riak, Apache HBase, MongoDB, Apache CouchDB, Neo4J and Redis with plenty of code and architecture examples. Also covered include relational vs. key value, columnar and document based systems among others.

The details: Seven Databases in Seven Weeks
Paperback: 352 pages
Publisher: Pragmatic Bookshelf (May 18, 2012)
Language: English
ISBN-10: 1934356921
ISBN-13: 978-1934356920
Product Dimensions: 7.5 x 0.8 x 9 inches

Buzzwords (or keywords) include availability, consistency, performance and related themes. Others include MongoDB, Cassandra, Redis, Neo4J, JSON, CouchDB, Hadoop, HBase, Amazon Dynamo, Map Reduce, Riak (Basho) and Postgres along with data models including relational, key value, columnar, document and graph along with big data, little data, cloud and object storage.

While this book is not a how to tutorial or installation guide, it does give a deep dive into the different databases covered. The benefit is gaining an understanding of what the different databases are good for, strengths, weakness, where and when to use or choose them for various needs.

Look inside seven databases in seven weeks book image
A look inside my copy of Seven Databases in Seven Days

Who should this book includes applications developers, programmers, Cloud, big data and IT/ICT architects, planners and designers along with database, server, virtualization and storage professionals. What I like about the book is that it is a great intro and overview along with sufficient depth to understand what these different solutions can and cannot do, when, where and why to use these tools for different situations in a quick read format and plenty of detail.

Would I recommend buying it: Yes, I bought a copy myself on Amazon.com, get your copy by clicking here.

Ok, nuff said

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2012 StorageIO and UnlimitedIO All Rights Reserved

SSD, flash and DRAM, DejaVu or something new?

StorageIO industry trends cloud, virtualization and big data

Recently I was in Europe for a couple of weeks including stops at Storage Networking World (SNW) Europe in Frankfurt, StorageExpo Holland, Ceph Day in Amsterdam (object and cloud storage), and Nijkerk where I delivered two separate 2 day, and a single 1 day seminar.

Image of Frankfurt transtationImage of inside front of ICE train going from Frankfurt to Utrecht

At the recent StorageExpo Holland event in Utrecht, I gave a couple of presentations, one on cloud, virtualization and storage networking trends, the other taking a deeper look at Solid State Devices (SSD’s). As in the past, StorageExpo Holland was great in a fantastic venue, with many large exhibits and great attendance which I heard was over 6,000 people over two days (excluding exhibitor vendors, vars, analysts, press and bloggers) which was several times larger than what was seen in Frankfurt at the SNW event.

Image of Ilja Coolen (twitter @@iCoolen) who was session host for SSD presentation in UtrechtImage of StorageExpo Holland exhibit show floor in Utrecht

Both presentations were very well attended and included lively interactive discussion during and after the sessions. The theme of my second talk was SSD, the question is not if, rather what to use where, how and when which brings us up to this post.

For those who have been around or using SSD for more than a decade outside of cell phones, camera, SD cards or USB thumb drives, that probably means DRAM based with some form of data persistency mechanisms. More recently mention SSD and that implies nand flash-based, either MLC or eMLC or SLC or perhaps emerging mram or PCM. Some might even think of NVRAM or other forms of SSD including emerging mram or mem-resistors among others, however lets stick to nand flash and dram for now.

image of ssd technology evolution

Often in technology what is old can be new, what is new can be seen as old, if you have seen, experienced or done something before you will have a sense of DejaVu and it might be evolutionary. On the other hand, if you have not seen, heard, experienced, or found a new audience, then it can be  revolutionary or maybe even an industry first ;).

Technology evolves, gets improved on, matures, and can often go in cycles of adoption, deployment, refinement, retirement, and so forth. SSD in general has been an on again, off again type cycle technology for the past several decades except for the past six to seven years. Normally there is an up cycle tied to different events, servers not being fast enough or affordable so use SSD to help address performance woes, or drives and storage systems not being fast enough and so forth.

Btw, for those of you who think that the current SSD focused technology (nand flash) is new, it is in fact 25 years old and still evolving and far from reaching its full potential in terms of customer deployment opportunities.

StorageIO industry trends cloud, virtualization and big data

Nand flash memory has helped keep SSD practical for the past several years riding the similar curve that is keeping hard disk drives (HDD’s) that they were supposed  to replace alive. That is improved reliability, endurance or duty cycle, better annual failure rate (AFR), larger space capacity, lower cost, and enhanced interfaces, packaging, power and functionality.

Where SSD can be used and options

DRAM historically at least for enterprise has been the main option for SSD based solutions using some form of data persistency. Data persistency options include battery backup combined with internal HDD’s to de stage information from the DRAM before power was lost. TMS (recently bought by IBM) was one of the early SSD vendors from the DRAM era that made the transition to flash including being one of the first many years ago to combine DRAM as a cache layer over nand flash as a persistency or de-stage layer. This would be an example of if you were not familiar with TMS back then and their capacities, you might think or believe that some more recent introductions are new and revolutionary, and perhaps they are in their own right or with enough caveats and qualifiers.

An emerging trend, which for some will be Dejavu, is that of using more DRAM in combination with nand flash SSD.

Oracle is one example of a vendor who IMHO rather quietly (intentionally or accidentally) has done this in the 7000 series storage systems as well as ExaData based database storage systems. Rest assured they are not alone and in fact many of the legacy large storage vendors have also piled up large amounts of DRAM based cache in their storage systems. For example EMC with 2TByte of DRAM cache in their VMAX 40K, or similar systems from Fujitsu HP, HDS, IBM and NetApp (including recent acquisition of DRAM based CacheIQ) among others. This has also prompted the question of if SSD has been successful in traditional storage arrays, systems or appliances as some would have you believe not, click here to learn more and cast your vote.

SSD, IO, memory and storage hirearchy

So is the future in the past? Some would say no, some will say yes, however IMHO there are lessons to learn and leverage from the past while looking and moving forward.

Early SSD’s were essentially RAM disks, that is a portion of main random access memory (RAM) or what we now call DRAM set aside as a non persistent (unless battery backed up) cache or device. Using a device driver, applications could use the RAM disk as though it were a normal storage system. Different vendors springing up with drivers for various platforms and disappeared as their need were reduced with faster storage systems, interfaces and ram disks drives supplied by vendors, not to mention SSD devices.

Oh, for you tech trivia types, there was also database machines from the late 80’s such as Briton Lee that would offload your database processing functions to a specialized appliance. Sound like Oracle ExaData  I, II or III to anybody?

Image of Oracle ExaData storage system

Ok, so we have seen this movie before, no worries, old movies or shows get remade, and unless you are nostalgic or cling to the past, sure some of the remakes are duds, however many can be quite good.

Same goes with the remake of some of what we are seeing now. Sure there is a generation that does not know nor care about the past, its full speed ahead and leverage what will get them there.

Thus we are seeing in memory databases again, some of you may remember the original series (pick your generation, platform, tool and technology) with each variation getting better. With 64 bit processor, 128 bit and beyond file system and addressing, not to mention ability for more DRAM to be accessed directly, or via memory address extension, combined with memory data footprint reduction or compression, there is more space to put things (e.g. no such thing as a data or information recession).

Lets also keep in mind that the best IO is the IO that you do not have to do, and that SSD which is an extension of the memory map plays by the same rules of real estate. That is location matters.

Thus, here we go again for some of you (DejaVu), while for others get ready for a new and exciting ride (new and revolutionary). We are back to the future with in memory database which while for a time will take some pressure from underlying IO systems until they once again out grow server memory addressing limits (or IT budgets).

However for those who do not fall into a false sense of security, no fear, as there is no such thing as a data or information recession. Sure as the sun rises in the east and sets in the west, sooner or later those IO’s that were or are being kept in memory will need to be de-staged to persistent storage, either nand flash SSD, HDD or somewhere down the road PCM, mram and more.

StorageIO industry trends cloud, virtualization and big data

There is another trend that with more IOs being cached, reads are moving to where they should resolve which is closer to the application or via higher up in the memory and IO pyramid or hierarchy (shown above).

Thus, we could see a shift over time to more writes and ugly IOs being sent down to the storage systems. Keep in mind that any cache historically provides temporal relieve, question is how long of a temporal relief or until the next new and revolutionary or DejaVu technology shows up.

Ok, go have fun now, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2012 StorageIO and UnlimitedIO All Rights Reserved

Mr. Backup (Curtis Preston) goes back to Ceph School

Now also available via

This is a new episode in the continuing StorageIO industry trends and perspectives pod cast series (you can view more episodes or shows along with other audio and video content here) as well as listening via iTunes or via your preferred means using this RSS feed (https://storageio.com/StorageIO_Podcast.xml)

StorageIO industry trends cloud, virtualization and big data

In this episode, I am at the Ceph day in Amsterdam Holland event at the Tobacco Theatre hosted by on42.com and inktank.com.

Ceph Day Amsterdam 2012

My guest for this episode is Curtis (Mr. Backup) Preston (@wcpreston) of Backup School and Backup Central fame where we discuss what is Ceph and object storage, cloud storage, file systems, backup and data protection along with dinner we had at an Indonesian restaurant .

Dinner Restaurant Blauw Utrecht Netherlands
Mr Backup getting ready to compress and dedupe dinner

The dinner we are referring to was at Restaurant Blauw in Utrecht Holland (click here) where Curtis and me were joined by Hans De Leenher @hansdeleenher of Veeam (thanks again for the dinner, that was a disclosure btw ;) ).

Note that this is a special episode in that while I’m recording the pod cast, Curtis is recording a video of our discussion for his truebit.tv site that you can view here.

Click here (right-click to download MP3 file) or on the microphone image to listen to the conversation with Curtis and myself.

StorageIO podcast

Also available via

Watch (and listen) for more StorageIO industry trends and perspectives audio blog posts pod casts and other upcoming events. Also be sure to heck out other related pod casts, videos, posts, tips and industry commentary at StorageIO.com and StorageIOblog.com.

Also check out the companion to this pod cast where I meet up with Ceph Creator Sage Weil while at Ceph Day.

Enjoy this episode Mr. Backup (Curtis Preston) goes back to Ceph School.

 

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2012 StorageIO and UnlimitedIO All Rights Reserved