Software Defined, Bulk, Cloud, Scale Out, Object Storage Fundamentals


Updated 1/21/2018

Welcome to the Cloud, Big Data, Software Defined, scale-out, Bulk and Object Storage Fundamentals page.

This page contains various resources, tips and essential topics pertaining to Software Defined, scale-out, Cloud, Bulk and Object Storage Fundamentals. Other resources pertaining to Software Defined, scale-out, Cloud, Bulk and Object Storage include:

  • www.objectstoragecenter.com
  • Software Defined Data Infrastructure Essentials book (CRC Press)
  • Cloud, Software Defined, Scale-Out, Object Storage News Trends
    There are various types of cloud, bulk and object storage, including public services such as Amazon Web Services (AWS) Simple Storage Service (S3), Google, Microsoft Azure, IBM Softlayer and Rackspace among many others. There are also solutions for hybrid and private deployment from Cisco, Cloudian, Fujifilm, DDN, Dell EMC, Fujitsu, HDS, HPE, IBM, NetApp, Noobaa, OpenStack, Quantum, Rackspace, Scality, Seagate, Spectra, Storpool, Suse, Swift and WD among others.

    Cloud products and services, along with associated data infrastructures including object storage, file systems, repositories and access methods, are at the center of bulk, big data, big bandwidth and little data initiatives on a public, private, hybrid and community basis. After all, not everything is the same in cloud, virtual and traditional data centers or information factories, from active data to inactive deep digital archiving.

    Cloud Object Storage Fundamentals Access and Architectures

    There are many facets to object storage including technology implementation, products, services, access and architectures for various applications and use scenarios.

    • Project or Account – Top of the hierarchy, representing the owner or billing information for a service, and to which buckets are attached.
    • Region – Location where data is stored, which can include one or more data centers, also known as Availability Zones (AZs).
      (Figure: AWS S3 cross-region replication, moving and replicating buckets/containers, subfolders and objects)

    • Availability Zone (AZ) – Data center (or servers) within a region that implements durability and accessibility for availability.
      (Figure: Example of AWS Regions and Availability Zones (AZs))

    • Bucket or Container – Where objects or sub-folders containing objects are attached and accessed.

    • Sub-folder – While objects are stored in a flat namespace, for commonality and organization some solutions and services support the notion of sub-folders that resemble a traditional directory hierarchy.
    • Object – Byte (or bit) stream that can be as small as one byte or as large as several TBytes (some solutions and services support objects up to 5TBytes in size). The object contains data in any organization or format, along with metadata. Different solutions and services support from a couple hundred KBytes to MBytes worth of metadata. Objects can store anything you want: files, videos, images, virtual disks (VMDK, VHDX), ZIP or tar files, backup and archive save sets, executable images or ISOs.
    • End-point – The address that your software, applications, tools, utilities and gateways attach to for accessing buckets and objects.
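The bucket, sub-folder and object hierarchy above can be illustrated with a small in-memory model. This is a minimal sketch, not any product's API: keys live in one flat namespace, and "sub-folders" only appear when a listing rolls keys up at a delimiter, similar in spirit to how S3-style listings combine a prefix with a "/" delimiter. The `Bucket` class and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Bucket:
    """Hypothetical in-memory bucket: a flat map of key -> (data, metadata)."""
    name: str
    objects: Dict[str, Tuple[bytes, Dict[str, str]]] = field(default_factory=dict)

    def put(self, key: str, data: bytes,
            metadata: Optional[Dict[str, str]] = None) -> None:
        # The key is an opaque string; "/" has no special meaning on write.
        self.objects[key] = (data, dict(metadata or {}))

    def list_objects(self, prefix: str = "",
                     delimiter: str = "/") -> Tuple[List[str], List[str]]:
        """Return (keys, sub_folders) directly under `prefix`.

        Keys whose remainder after the prefix contains the delimiter are
        rolled up into a common prefix, which tools display as a sub-folder.
        """
        keys: List[str] = []
        sub_folders = set()
        for key in sorted(self.objects):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter and delimiter in rest:
                sub_folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                keys.append(key)
        return keys, sorted(sub_folders)


media = Bucket("media")
media.put("videos/2018/intro.mp4", b"...")
media.put("videos/2018/demo.mp4", b"...")
media.put("videos/readme.txt", b"...", {"owner": "gs"})
media.put("backup.tar", b"...")
print(media.list_objects())           # (['backup.tar'], ['videos/'])
print(media.list_objects("videos/"))  # (['videos/readme.txt'], ['videos/2018/'])
```

Passing an empty delimiter returns every matching key as-is, which is what a truly flat view of the namespace looks like.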

      A common theme for object storage is flexibility and scaling (performance, availability, capacity, economics), along with extensibility without compromise or complexity. From those basics there are many themes and variations: how data is protected (RAID or no RAID, hardware or software), whether it is deployed as a service or as tin wrapped software (an appliance), and whether it is optimized for archiving, video serving or other applications.

      Many facets of cloud and object storage access

      One aspect of object and cloud storage is accessing or using object methods, including application programming interfaces (APIs), vs. traditional block (LUN) or NAS (file) based approaches. Keep in mind that many object storage systems, software and services also support NAS file-based access, including NFS, CIFS and HDFS among others, for compatibility and ease of use.

      Likewise, various APIs can be found across different object solutions, software or services, including the Amazon Web Services (AWS) Simple Storage Service (S3) HTTP REST-based API, among others. Other APIs vary by specific vendor or product, however they can include iOS (e.g. Apple iPhone and iPad), WebDAV, FTP, JSON, XML, XAM, CDMI, SOAP and DICOM among others. Another aspect of object and cloud storage is expanded and dynamic metadata.

      While traditional file systems and NAS have simple or fixed metadata, object and cloud storage systems, services and solutions, along with some scale-out file systems, have the ability to support user-defined metadata. Specific systems, solutions, software and services vary in the amount of metadata supported, which could range from 100s of KBytes on the low end to tens or more MBytes.
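As an illustration of user-defined metadata with the S3 HTTP REST style of access mentioned above: S3 carries user metadata as request headers whose names begin with `x-amz-meta-`. The sketch below only assembles such headers for a PUT; it does not sign or send a request. The function name, the size-accounting rule and the limit parameter are assumptions for illustration, since metadata limits vary by system and service.

```python
from typing import Dict

# In the S3 REST API, user-defined metadata is sent as HTTP request headers
# whose names start with this prefix.
META_PREFIX = "x-amz-meta-"


def build_put_headers(content_type: str, user_meta: Dict[str, str],
                      max_meta_bytes: int) -> Dict[str, str]:
    """Build headers for an S3-style object PUT carrying user metadata.

    max_meta_bytes is supplied by the caller because metadata limits vary
    by solution and service. The size accounting (UTF-8 bytes of each header
    name plus value) is an assumption for illustration.
    """
    headers = {"Content-Type": content_type}
    total = 0
    for name, value in user_meta.items():
        header = META_PREFIX + name.lower()  # HTTP header names are case-insensitive
        total += len(header.encode("utf-8")) + len(value.encode("utf-8"))
        headers[header] = value
    if total > max_meta_bytes:
        raise ValueError(f"user metadata is {total} bytes; limit is {max_meta_bytes}")
    return headers


print(build_put_headers("video/mp4", {"Camera": "A7", "project": "demo"}, 2048))
```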


      Where to learn more

      The following resources provide additional information about big data, bulk, software defined, cloud and object storage.

      Click here to view software defined, bulk, cloud and object storage trend news.


      StorageIO Founder Greg Schulz: File Services on Object Storage with HyperFile

      Via InfoStor: Object Storage Is In Your Future
      Via FujiFilm IT Summit: Software Defined Data Infrastructures (SDDI) and Hybrid Clouds
      Via StorageIOblog: AWS EFS Elastic File System (Cloud NAS) First Preview Look
      Via InfoStor: Cloud Storage Concerns, Considerations and Trends
      Via Server StorageIO: April 2015 Newsletter Focus on Cloud and Object storage
      Via StorageIOblog: AWS S3 Cross Region Replication storage enhancements
      Cloud conversations: AWS EBS, Glacier and S3 overview
      AWS (Amazon) storage gateway, first, second and third impressions
      Cloud and Virtual Data Storage Networking (CRC Book)
      Via ChannelPartnersOnline: Selling Software-Defined Storage: Not All File Systems Are the Same
      Via ITProPortal: IBM kills off its first cloud storage platform
      Via ITBusinessEdge: Time to Rein in Cloud Storage
      Via SearchCloudStorage: Ctera Networks’ file-sharing services gain intelligent cache
      Via StorageIOblog: Who Will Be At Top Of Storage World Next Decade?

      Videos and podcasts at storageio.tv are also available via Apple iTunes.

      Human Face of Big Data (Book review)

      Seven Databases in Seven Weeks (Book review)

      Additional learning experiences along with common questions (and answers), as well as tips, can be found in the Software Defined Data Infrastructure Essentials book.


      Wrap up and summary

      Object and cloud storage are in your future; the questions are when, where, with what and how, among others.

      Watch for more content and links to be added soon to this object storage center page, including posts, presentations, pod casts, polls and perspectives, along with service and product solution profiles.

      Ok, nuff said, for now.

      Gs

      Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

      All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

    Which Enterprise HDD for Content Applications Different File Size Impact


    Updated 1/23/2018

    Which enterprise HDD to use with a content server platform: the impact of different file sizes.

    Insight for effective server storage I/O decision making
    Server StorageIO Lab Review

    Which enterprise HDD to use for content servers

    This is the fifth in a multi-part series (read part four here) based on a white paper hands-on lab report I did, courtesy of Servers Direct and Seagate, that you can read in PDF form here. The focus is on the Servers Direct (www.serversdirect.com) converged Content Solution platforms with Seagate Enterprise Hard Disk Drives (HDDs). This post looks at large and small file I/O processing.

    File Performance Activity

    Tip: Content solutions use files in various ways. Use the following to gain perspective on how various HDDs handle workloads similar to your specific needs.

    Two separate file processing workloads were run (see Note 12): one with a relatively small number of large files, and another with a large number of small files. For the large file processing (table-3), 5 GByte sized files were created and then accessed via 128 KByte (128KB) sized I/O over a 10 hour period with 90% reads using 64 threads (workers). The large file workload simulates what might be seen with higher definition video, image or other content streaming.

    (Note 12) File processing workloads were run using Vdbench 5.04 and file anchors, with the sample script configurations below. Instead of Vdbench, you could also use other tools such as sysbench or fio, among others.

    VdbenchFSBigTest.txt
    # Sample script for big files testing
    fsd=fsd1,anchor=H:,depth=1,width=5,files=20,size=5G
    fwd=fwd1,fsd=fsd1,rdpct=90,xfersize=128k,fileselect=random,fileio=random,threads=64
    rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=10h,interval=30

    vdbench -f VdbenchFSBigTest.txt -m 16 -o Results_FSbig_H_060615

    VdbenchFSSmallTest.txt
    # Sample script for small files testing
    fsd=fsd1,anchor=H:,depth=1,width=64,files=25600,size=16k
    fwd=fwd1,fsd=fsd1,rdpct=90,xfersize=1k,fileselect=random,fileio=random,threads=64
    rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=10h,interval=30

    vdbench -f VdbenchFSSmallTest.txt -m 16 -o Results_FSsmall_H_060615
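As a cross-check of the two configurations above, the arithmetic below works out how many files, and how much data, each file system definition (fsd) creates. It is a sketch that assumes Vdbench's `files=` counts the files in each lowest-level directory (so the total is width^depth × files) and that the `k` and `G` sizes are binary multiples; `fsd_footprint` is a hypothetical helper, not part of Vdbench.

```python
from typing import Tuple

KiB, GiB = 1024, 1024 ** 3


def fsd_footprint(depth: int, width: int, files: int,
                  size_bytes: int) -> Tuple[int, int]:
    """Return (total_files, total_bytes) for a Vdbench-style fsd definition.

    Assumes files= counts files per lowest-level directory, so the number
    of leaf directories is width ** depth.
    """
    total_files = (width ** depth) * files
    return total_files, total_files * size_bytes


big = fsd_footprint(depth=1, width=5, files=20, size_bytes=5 * GiB)
small = fsd_footprint(depth=1, width=64, files=25_600, size_bytes=16 * KiB)

print(big[0], big[1] / GiB)      # 100 500.0
print(small[0], small[1] / GiB)  # 1638400 25.0
```

Under those assumptions, the large file test touches about 500 GiB in 100 files, while the small file test holds only about 25 GiB spread over 1,638,400 files, which is why the two stress the drives so differently.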

    The 10% writes are intended to reflect some update activity for new content or other changes to content. Note that 128KB per second translates to roughly 1 Mbps (128 × 1,024 bytes × 8 bits) of streaming content such as video; higher definition video requires proportionally more. 4K video (not optimized) would require even higher speeds, as well as result in larger file sizes. Table-3 shows the performance during the large file access period, including average read/write rates and response times, CPU utilization and bandwidth (MBps).

    | HDD / RAID | Avg. File Read Rate (per sec.) | Avg. Read Resp. Time (ms) | Avg. File Write Rate (per sec.) | Avg. Write Resp. Time (ms) | Avg. CPU % Total | Avg. CPU % System | Avg. MBps Read | Avg. MBps Write |
    |---|---|---|---|---|---|---|---|---|
    | ENT 15K R1 | 580.7 | 107.9 | 64.5 | 19.7 | 52.2 | 35.5 | 72.6 | 8.1 |
    | ENT 10K R1 | 455.4 | 135.5 | 50.6 | 44.6 | 34.0 | 22.7 | 56.9 | 6.3 |
    | ENT CAP R1 | 285.5 | 221.9 | 31.8 | 19.0 | 43.9 | 28.3 | 37.7 | 4.0 |
    | ENT 10K R10 | 690.9 | 87.21 | 76.8 | 48.6 | 35.0 | 21.8 | 86.4 | 9.6 |

    Table-3 Performance summary for large file access operations (90% read)
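The bandwidth columns in table-3 follow from the file rates, since each operation moves one 128KB transfer, and the same arithmetic converts 128KB per second into a streaming bit rate. A minimal sketch, assuming the table's MBps values are binary megabytes per second (the ENT 15K R1 and ENT 10K R10 rows match that reading); the helper names are illustrative.

```python
KiB = 1024
MiB = 1024 ** 2


def mib_per_sec(ops_per_sec: float, xfer_bytes: int) -> float:
    """Bandwidth in MiB/s for a given operation rate and transfer size."""
    return ops_per_sec * xfer_bytes / MiB


def mbit_per_sec(bytes_per_sec: float) -> float:
    """Convert bytes/s to decimal megabits/s (1 Mbps = 1,000,000 bits/s)."""
    return bytes_per_sec * 8 / 1_000_000


# Table-3, ENT 15K R1: 580.7 reads/s and 64.5 writes/s at 128KB each.
print(round(mib_per_sec(580.7, 128 * KiB), 1))  # 72.6
print(round(mib_per_sec(64.5, 128 * KiB), 1))   # 8.1
# One 128KB transfer per second expressed as a streaming bit rate.
print(round(mbit_per_sec(128 * KiB), 2))        # 1.05
```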

    Table-3 shows that for two-drive RAID 1, the Enterprise 15K HDDs deliver the fastest performance; however, a RAID 10 with four 10K HDDs with enhanced cache features provides a good price, performance and space capacity option. Software RAID was used in this workload test.

    Figure-4 shows the relative performance of various HDD options handling large files. Keep in mind that for the response time lines lower is better, while for the activity rate bars higher is better.

    Figure-4 Large file processing 90% read, 10% write rate and response time

    In figure-4 you can see the performance in terms of response time (reads larger dashed line, writes smaller dotted line) along with the number of file operations per second (reads solid blue column bar, writes green column bar). Remember that lower response times and higher activity rates are better. Performance declines moving from left to right, from 15K to 10K Enterprise Performance with enhanced cache feature to Enterprise Capacity (7.2K RPM), all of which were hardware RAID 1. Also shown is a hardware RAID 10 (four 10K HDDs).

    Results in figure-4 above and table-4 below show how various drives can be configured to balance their performance, capacity and costs to meet different needs. Table-4 below shows an analysis looking at average file reads per second (RPS) performance vs. HDD costs, usable capacity and protection level.

    Table-4 is an example of looking at multiple metrics to make informed decisions as to which HDD would be best suited to your specific needs. For example, RAID 10 using four 10K drives provides good performance and protection along with large usable space; however, that also comes at a higher cost (e.g. price).

    | HDD / RAID | Avg. File Reads per Sec. (RPS) | Single Drive Cost per RPS | Multi-Drive Cost per RPS | Single Drive Cost per GB Capacity | Cost per GB Usable (Protected) Cap. | Drive Cost (Multiple Drives) | Protection Overhead (Space Capacity for RAID) | Cost per Usable GB per RPS | Avg. File Read Resp. (ms) |
    |---|---|---|---|---|---|---|---|---|---|
    | ENT 15K R1 | 580.7 | $1.02 | $2.05 | $0.99 | $0.99 | $1,190 | 100% | $2.1 | 107.9 |
    | ENT 10K R1 | 455.4 | $1.92 | $3.84 | $0.49 | $0.49 | $1,750 | 100% | $3.8 | 135.5 |
    | ENT CAP R1 | 285.5 | $1.40 | $2.80 | $0.20 | $0.20 | $798 | 100% | $2.8 | 221.9 |
    | ENT 10K R10 | 690.9 | $1.27 | $5.07 | $0.49 | $0.97 | $3,500 | 100% | $5.1 | 87.2 |

    Table-4 Performance, capacity and cost analysis for big file processing
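The cost per RPS columns in table-4 are simple ratios of drive cost to average file reads per second. The sketch below reproduces them, assuming the single-drive cost is the multi-drive cost divided by the number of drives in the set (two for RAID 1, four for RAID 10); the `Config` class is illustrative.

```python
from dataclasses import dataclass


@dataclass
class Config:
    name: str
    drives: int          # 2 for RAID 1, 4 for RAID 10
    total_cost: float    # "Drive Cost (Multiple Drives)" column, USD
    rps: float           # average file reads per second

    def single_drive_cost_per_rps(self) -> float:
        return (self.total_cost / self.drives) / self.rps

    def multi_drive_cost_per_rps(self) -> float:
        return self.total_cost / self.rps


# Large file figures from table-4.
configs = [
    Config("ENT 15K R1", 2, 1190, 580.7),
    Config("ENT 10K R1", 2, 1750, 455.4),
    Config("ENT CAP R1", 2, 798, 285.5),
    Config("ENT 10K R10", 4, 3500, 690.9),
]

for c in configs:
    # e.g. ENT 15K R1 prints $1.02 $2.05, matching table-4
    print(f"{c.name:12} ${c.single_drive_cost_per_rps():.2f} ${c.multi_drive_cost_per_rps():.2f}")
```

The same two ratios applied to the small file read rates give the corresponding columns of the small file analysis.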

    Small File Size Processing

    To simulate a general file sharing environment, or content streaming with many smaller objects, 1,638,400 16KB sized files were created on each device being tested (table-5). These files were spread across 64 directories (25,600 files each) and accessed via 64 threads (workers) doing 90% reads with a 1KB I/O size over a ten hour time frame. As with the large file test and the database activity, all workloads were run at the same time (i.e. test devices were concurrently busy).

    | HDD / RAID | Avg. File Read Rate (per sec.) | Avg. Read Resp. Time (ms) | Avg. File Write Rate (per sec.) | Avg. Write Resp. Time (ms) | Avg. CPU % Total | Avg. CPU % System | Avg. MBps Read | Avg. MBps Write |
    |---|---|---|---|---|---|---|---|---|
    | ENT 15K R1 | 3,415.7 | 1.5 | 379.4 | 132.2 | 24.9 | 19.5 | 3.3 | 0.4 |
    | ENT 10K R1 | 2,203.4 | 2.9 | 244.7 | 172.8 | 24.7 | 19.3 | 2.2 | 0.2 |
    | ENT CAP R1 | 1,063.1 | 12.7 | 118.1 | 303.3 | 24.6 | 19.2 | 1.1 | 0.1 |
    | ENT 10K R10 | 4,590.5 | 0.7 | 509.9 | 101.7 | 27.7 | 22.1 | 4.5 | 0.5 |

    Table-5 Performance summary for small sized (16KB) file access operations (90% read)

    Figure-5 shows the relative performance of various HDD options handling small files. Keep in mind that for the response time lines lower is better, while for the activity rate bars higher is better.

    Figure-5 Small file processing 90% read, 10% write rate and response time

    In figure-5 you can see the performance in terms of response time (reads larger dashed line, writes smaller dotted line) along with the number of file operations per second (reads solid blue column bar, writes green column bar). Remember that lower response times and higher activity rates are better. Performance declines moving from left to right, from 15K to 10K Enterprise Performance with enhanced cache feature to Enterprise Capacity (7.2K RPM), all of which were hardware RAID 1. Also shown is a hardware RAID 10 (four 10K RPM HDDs) with higher performance and capacity, along with higher costs (table-6).

    Results in figure-5 above and table-6 below show how various drives can be configured to balance their performance, capacity and costs to meet different needs. Table-6 below shows an analysis looking at average file reads per second (RPS) performance vs. HDD costs, usable capacity and protection level.

    Table-6 is an example of looking at multiple metrics to make informed decisions as to which HDD would be best suited to your specific needs. For example, RAID 10 using four 10K drives provides good performance and protection along with large usable space; however, that also comes at a higher cost (e.g. price).

    | HDD / RAID | Avg. File Reads per Sec. (RPS) | Single Drive Cost per RPS | Multi-Drive Cost per RPS | Single Drive Cost per GB Capacity | Cost per GB Usable (Protected) Cap. | Drive Cost (Multiple Drives) | Protection Overhead (Space Capacity for RAID) | Cost per Usable GB per RPS | Avg. File Read Resp. (ms) |
    |---|---|---|---|---|---|---|---|---|---|
    | ENT 15K R1 | 3,415.7 | $0.17 | $0.35 | $0.99 | $0.99 | $1,190 | 100% | $0.35 | 1.51 |
    | ENT 10K R1 | 2,203.4 | $0.40 | $0.79 | $0.49 | $0.49 | $1,750 | 100% | $0.79 | 2.90 |
    | ENT CAP R1 | 1,063.1 | $0.38 | $0.75 | $0.20 | $0.20 | $798 | 100% | $0.75 | 12.70 |
    | ENT 10K R10 | 4,590.5 | $0.19 | $0.76 | $0.49 | $0.97 | $3,500 | 100% | $0.76 | 0.70 |

    Table-6 Performance, capacity and cost analysis for small file processing

    The small file processing analysis in table-5 and table-6 shows that, on an apples-to-apples basis (i.e. same RAID level and number of drives), the 15K HDDs provide the best performance. However, when also factoring in space capacity, performance, different RAID levels or other protection schemes along with cost, there are other considerations. On the other hand, the Enterprise Capacity 2TB HDDs have a low cost per capacity but do not have the performance of the other options, should your applications need more performance.

    Thus the right HDD for one application may not be the best one for a different scenario, and multiple metrics, as shown in table-5 and table-6, need to be included in an informed storage decision-making process.
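To make the multi-metric point concrete, the sketch below picks the "best" small file configuration from table-5 and table-6 under three different criteria. Each criterion selects a different configuration. The criteria and the `best` helper are illustrative, not a recommended scoring method.

```python
# name: (avg file reads/s, multi-drive $ per RPS, single-drive $ per GB, read resp. ms)
rows = {
    "ENT 15K R1": (3415.7, 0.35, 0.99, 1.51),
    "ENT 10K R1": (2203.4, 0.79, 0.49, 2.90),
    "ENT CAP R1": (1063.1, 0.75, 0.20, 12.70),
    "ENT 10K R10": (4590.5, 0.76, 0.49, 0.70),
}

# Each criterion maps a row to a score where lower is better.
criteria = {
    "most reads/s": lambda r: -r[0],
    "lowest $ per RPS": lambda r: r[1],
    "lowest $ per GB": lambda r: r[2],
}


def best(criterion: str) -> str:
    score = criteria[criterion]
    return min(rows, key=lambda name: score(rows[name]))


for c in criteria:
    print(f"{c:18} -> {best(c)}")
```

Performance favors the four-drive RAID 10, cost per RPS favors the 15K RAID 1 pair, and cost per GB favors the Enterprise Capacity drives, so the weighting you give each metric decides the answer.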

    Where To Learn More

    Additional learning experiences along with common questions (and answers), as well as tips, can be found in the Software Defined Data Infrastructure Essentials book.


    What This All Means

    File processing is a common task for content applications, with files that are small, large or mixed, and with both reads and writes. Even if your content environment is using object storage, chances are that, unless it is a new application or a gateway exists, you may be using NAS or file-based access. Thus, if your applications do file-based processing, either run your own applications or use tools that simulate as closely as possible what your environment is doing.

    Continue reading part six in this multi-part series here, where the focus is on general I/O, including 8KB and 128KB sized I/Os, along with associated metrics.

    Ok, nuff said, for now.

    Gs


    USENIX FAST (File and Storage Technologies) 2014 Conference Proceedings


    In case you missed it, the 12th annual USENIX conference on File and Storage Technologies (FAST) was recently held in Santa Clara, CA.


    Big Data, Little Data, Fast SSD and Erasure Code Data

    If, like me, you are interested in FAST-related technologies, trends, tools and related research, check out the conference PDF proceedings here.

    You can also go here to the USENIX FAST site to view additional information about the sessions along with other download material.

    The PDF format proceedings contain over 320 pages of content including some good white papers and information covering RAID and Erasure code, Big Data and Little Data, Cloud and Virtualization, Flash, DRAM, SSD, Filesystem performance, metrics, measurement and related software along with plenty of file system related material.

    (Figure: USENIX FAST 2014 proceedings index excerpts)

    Heads up though: these are not your usual high-level vendor marketing white papers, but rather what you would expect from a technical conference such as FAST, as you can see in the index with abstracts above.

    So add the 2014 USENIX FAST Proceedings to your reading list.

    Ok, nuff said

    Cheers gs

    Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
    twitter @storageio

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

    Ceph Day Amsterdam 2012 (Object and cloud storage)


    Recently while I was in Europe presenting some sessions at conferences and doing some seminars, I was invited by Ed Saipetch (@edsai) of Inktank.com to attend the first Ceph Day in Amsterdam.


    As luck or fate would have it, I was in Nijkerk, which is about an hour's train ride from Amsterdam Central station, and had a free day in my schedule. After a morning train ride and a nice walk from Amsterdam Central, I arrived at the Tobacco Theatre (a former tobacco trading venue) where Ceph Day was underway, in time for a lunch of kroketten sandwiches.

    Attendees at Ceph Day

    Let's take a quick step back and address, for those not familiar, what Ceph (short for cephalopod) is and why it was worth spending a day to attend this event. Ceph is an open source, distributed, scale-out (e.g. cluster or grid) object software platform that runs on industry standard hardware.

    (Figure: Dell server supporting the Ceph demo; sketch of the Ceph demo configuration)

    Ceph is used for deploying object storage, cloud storage and managed services, general purpose storage for research, commercial, scientific, high performance computing (HPC) or high productivity computing (commercial), along with backup or data protection and archiving destinations. Other software similar in functionality or capabilities to Ceph includes OpenStack Swift, Basho Riak CS, Cleversafe, Scality and Caringo among others. There are also tin wrapped software (e.g. appliances or pre-packaged) solutions such as Dell DX (Caringo), DataDirect Networks (DDN) WOS, EMC ATMOS and Centera, Amplidata and HDS HCP among others. From a service standpoint, these solutions can be used to build services similar to Amazon S3 and Glacier, Rackspace Cloud Files and Cloud Block, DreamHost DreamObjects and HP Cloud storage among others.

    (Figure: Ceph cloud and object storage architecture)

    At the heart of Ceph is RADOS, a distributed object store that consists of peer nodes functioning as object storage devices (OSDs). Data can be accessed via REST (Amazon S3-like) APIs, libraries, CephFS and gateways, with information being spread across nodes and OSDs using a CRUSH-based algorithm (note that Sage Weil is one of the authors of CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data). Ceph scales in terms of performance, availability and capacity by adding extra nodes with hard disk drives (HDDs) or solid state devices (SSDs). One of the presentations covered DreamHost, an early adopter of Ceph, which used it to build their DreamObjects (cloud storage) offering.
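CRUSH itself works from a cluster map, placement groups and weighted hierarchical buckets, none of which is reproduced here. As a much simpler stand-in for the core idea, that any client can compute where an object's replicas live from the object name alone without a central lookup table, the sketch below uses rendezvous (highest random weight) hashing to pick N distinct OSDs. It is not CRUSH, and the OSD names are hypothetical.

```python
import hashlib
from typing import List


def _score(obj: str, osd: str) -> int:
    """Deterministic pseudo-random weight for an (object, OSD) pair."""
    digest = hashlib.sha256(f"{obj}:{osd}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj: str, osds: List[str], replicas: int = 3) -> List[str]:
    """Pick `replicas` distinct OSDs for an object via rendezvous hashing.

    Every client computing this with the same OSD list gets the same answer.
    Only distinct OSDs are guaranteed; forcing replicas onto separate hosts,
    as Ceph's CRUSH rules can, is not modeled here.
    """
    if replicas > len(osds):
        raise ValueError("not enough OSDs for the requested replica count")
    return sorted(osds, key=lambda osd: _score(obj, osd), reverse=True)[:replicas]


osds = [f"osd.{i}" for i in range(12)]  # hypothetical 12-OSD node
print(place("videos/intro.mp4", osds))
```

A useful property of this scheme is that removing an OSD only changes placement for objects that had a replica on it; every other object keeps its OSDs.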

    (Figure: Ceph cloud and object storage deployment, via DreamHost)

    In addition to storage nodes, there is also an odd number of monitor nodes to coordinate and manage the Ceph cluster, along with optional gateways for file access. In the above figure (via DreamHost), load balancers sit in front of gateways that interact with the storage nodes. The storage node in this example is a physical server with 12 x 3TB HDDs, each configured as an OSD.
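The reason for an odd number of monitors is majority quorum: Ceph monitors use Paxos to agree on the cluster map, so a strict majority of them must be reachable. The arithmetic below (a sketch with illustrative helper names) shows that going from three to four monitors does not increase the number of monitor failures the cluster can tolerate.

```python
def quorum(monitors: int) -> int:
    """Smallest strict majority of `monitors`."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can fail while a majority is still available."""
    return monitors - quorum(monitors)


for n in range(1, 8):
    print(f"{n} monitors: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```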

    (Figure: DreamHost DreamObjects Ceph configuration)

    In the DreamHost example above, there are 90 storage nodes plus 3 management nodes, and the total raw storage capacity (no RAID) is about 3PB (12 x 3TB = 36TB per node, x 90 nodes = 3.24PB). Instead of using RAID or mirroring, each object's data is replicated or copied to three (e.g. N=3) different OSDs (on separate nodes), where N is adjustable for a given level of data protection, for a usable storage capacity of about 1PB.

    Note that for more usable capacity and lower availability, N could be set lower, or a larger value of N would give more durability or data protection at higher storage capacity overhead cost. In addition to using JBOD configurations with replication, Ceph can also be configured with a combination of RAID and replication providing more flexibility for larger environments to balance performance, availability, capacity and economics.
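The DreamHost capacity figures can be reproduced with a couple of lines of arithmetic, and the same sketch shows the capacity trade-off of changing N. It assumes decimal units (as drive capacities are quoted) and ignores monitor nodes, file system and other overheads; the helper names are illustrative.

```python
TB = 10 ** 12  # decimal terabyte
PB = 10 ** 15  # decimal petabyte


def raw_capacity(nodes: int, drives_per_node: int, drive_bytes: int) -> int:
    """Total raw capacity across all storage nodes (no RAID)."""
    return nodes * drives_per_node * drive_bytes


def usable_capacity(raw_bytes: int, replicas: int) -> float:
    """Usable capacity when each object is stored `replicas` times."""
    return raw_bytes / replicas


raw = raw_capacity(nodes=90, drives_per_node=12, drive_bytes=3 * TB)
print(raw / PB)  # 3.24
for n in (2, 3, 4):
    print(n, usable_capacity(raw, n) / PB)  # 2 1.62, then 3 1.08, then 4 0.81
```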

    (Figure: DreamHost DreamObjects Ceph deployment)

    One of the benefits of Ceph is the flexibility to configure it how you want or need for different applications. This can be a cost-effective, hardware-light configuration using JBOD or internal HDDs in small form factor, generally available servers, or high density servers and storage enclosures with optional RAID adapters along with SSDs. This flexibility differs from some cloud and object storage systems or software tools that take a stance of avoiding RAID rather than providing options and flexibility to configure and use the technology how you see fit.

    Here are some links to presentations from Ceph Day:
    Introduction and Welcome by Wido den Hollander
    Ceph: A Unified Distributed Storage System by Sage Weil
    Ceph in the Cloud by Wido den Hollander
    DreamObjects: Cloud Object Storage with Ceph by Ross Turk
    Cluster Design and Deployment by Greg Farnum
    Notes on Librados by Sage Weil

    Presentations during ceph day

    While at Ceph day, I was able to spend a few minutes with Sage Weil Ceph creator and founder of inktank.com to record a pod cast (listen here) about what Ceph is, where and when to use it, along with other related topics. Also while at the event I had a chance to sit down with Curtis (aka Mr. Backup) Preston where we did a simulcast video and pod cast. The simulcast involved Curtis recording this video with me as a guest discussing Ceph, cloud and object storage, backup, data protection and related themes while I recorded this pod cast.

    One of the interesting things I heard (or actually did not hear) at the Ceph Day event, compared with what I tend to hear at related conferences such as SNW, was the focus on where and how to use, configure and deploy Ceph, along with its various configuration options and replication or copy modes, as opposed to going off on erasure codes or other tangents. In other words, instead of focusing on data protection protocols and algorithms, or on what is wrong with the competition or other architectures, Ceph Day focused on removing cloud and object storage objections and on enablement.

    Where do you get Ceph? You can get it here, as well as via 42on.com and inktank.com.

    Thanks again to Sage Weil for taking time out of his busy schedule to record a pod cast talking about Ceph, as well as to 42on.com and Inktank for hosting and for the invitation to attend the first Ceph Day in Amsterdam.

    View of downtown Amsterdam on way to train station to return to Nijkerk
    Returning to Amsterdam central station after Ceph Day

    Ok, nuff said.

    Cheers gs

    Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

    twitter @storageio

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2012 StorageIO and UnlimitedIO All Rights Reserved

    Mr. Backup (Curtis Preston) goes back to Ceph School


    This is a new episode in the continuing StorageIO industry trends and perspectives pod cast series (you can view more episodes or shows along with other audio and video content here) as well as listening via iTunes or via your preferred means using this RSS feed (https://storageio.com/StorageIO_Podcast.xml)


    In this episode, I am at the Ceph Day event in Amsterdam, Holland, at the Tobacco Theatre, hosted by 42on.com and inktank.com.

    Ceph Day Amsterdam 2012

    My guest for this episode is Curtis (Mr. Backup) Preston (@wcpreston) of Backup School and Backup Central fame, where we discuss what Ceph and object storage are, along with cloud storage, file systems, backup and data protection, and the dinner we had at an Indonesian restaurant.

    Dinner Restaurant Blauw Utrecht Netherlands
    Mr Backup getting ready to compress and dedupe dinner

    The dinner we are referring to was at Restaurant Blauw in Utrecht, Holland (click here), where Curtis and I were joined by Hans De Leenher @hansdeleenher of Veeam (thanks again for the dinner, that was a disclosure btw ;) ).

    Note that this is a special episode in that while I’m recording the pod cast, Curtis is recording a video of our discussion for his truebit.tv site that you can view here.

    Click here (right-click to download MP3 file) or on the microphone image to listen to the conversation with Curtis and myself.



    Watch (and listen) for more StorageIO industry trends and perspectives audio blog posts, pod casts and other upcoming events. Also be sure to check out other related pod casts, videos, posts, tips and industry commentary at StorageIO.com and StorageIOblog.com.

    Also check out the companion to this pod cast where I meet up with Ceph Creator Sage Weil while at Ceph Day.

    Enjoy this episode Mr. Backup (Curtis Preston) goes back to Ceph School.

     

    Ok, nuff said.

    Cheers gs


    Post-Holiday IT Shopping Bargains, Dell Buying Exanet?

    For consumers, the time leading up to the Christmas holiday season is usually busy, with door busters, Black Friday and other specials for purchasing gifts and other items. However, savvy shoppers will wait until after Christmas, or after the holidays altogether, perhaps well into the New Year, when some good bargains become available. IT customers are no different: many have budgets to use up before the end of the year, resulting in a flurry of purchases that should become evident soon as we enter earnings announcement season.

    However, there are also bargains for IT organizations looking to take advantage of special vendor promotions intended to stimulate sales, not to mention opportunities for IT vendors to do some shopping of their own. Consequently, in addition to the flurry of merger and acquisition (M&A) activity from last summer through the fall, there have been several recent deals, some of which might make Monty Hall blush!

    Recent acquisition activity includes, among others:

    • Dell bought Perot Systems for $3.9B
    • DotHill bought Cloverleaf
    • Texas Memory Systems (TMS) bought Incipient
    • HP bought IBRIX and 3COM among others
    • LSI bought Onstor
    • VMware bought Zimbra
    • Micron bought Numonyx
    • Exar bought Neterion

    Now the industry is abuzz that Dell, perhaps using some of the loose change left over from holiday sales, is in the process of acquiring Israeli clustered storage startup Exanet for about $12M USD. Compared to previous Dell acquisitions, including EqualLogic in 2007 for about $1.4B or last year's Perot deal in the $3.9B range, $12M is a bargain. It would probably not even put a dent in the sales and marketing advertising budget, let alone the corporate cash coffers, which as of Dell's Q3-F10 balance sheet held about $12.795B in cash.
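    To put the bargain in proportion, here is a quick back-of-the-envelope ratio using only the figures quoted above: the roughly $12M reported Exanet price against the approximately $1.4B EqualLogic price, the approximately $3.9B Perot deal and the approximately $12.795B in cash. Since the Exanet price is a reported estimate, treat these percentages as approximate.

    ```latex
    $$
    \frac{\$12\text{M}}{\$1.4\text{B}} \approx 0.86\%, \qquad
    \frac{\$12\text{M}}{\$3.9\text{B}} \approx 0.31\%, \qquad
    \frac{\$12\text{M}}{\$12.795\text{B}} \approx 0.094\%
    $$
    ```

    In other words, the reported Exanet price is under 1% of the EqualLogic deal and under a tenth of a percent of Dell's cash on hand.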

    Who is Exanet and what is their product solution?
    Exanet is a small Israeli startup providing a clustered, scale-out NAS file serving storage solution (Figure 1) that began shipping in 2003. The Exanet solution (ExaStore) can be deployed either as software alone or as a packaged solution, with ExaStore software installed on standard x86 servers and external RAID storage arrays combined to form a clustered NAS file server.

    Product features include global namespace, distributed metadata, expandable file systems, virtual volumes, quotas, snapshots, file migration, replication, virus scanning and load balancing, along with NFS, CIFS and AFP access. Exanet scales up to 1 exabyte of storage capacity and supports large files and billions of files per cluster.
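    To illustrate what a global namespace with distributed metadata and load balancing means in a clustered NAS, here is a minimal Python sketch. It is a generic toy model, not Exanet's (ExaStore) actual design: the class and method names (GlobalNamespace, create, lookup, load) are hypothetical, and hashing the full path to pick a metadata-owning node is an assumption chosen for simplicity. Every node presents the same path tree, while ownership of each file's metadata is spread across the cluster.

    ```python
    import hashlib
    from dataclasses import dataclass, field


    @dataclass
    class Node:
        """One cluster node; holds metadata only for the paths it owns."""
        name: str
        metadata: dict = field(default_factory=dict)  # path -> metadata


    class GlobalNamespace:
        """Hypothetical toy clustered-NAS namespace (not any vendor's design).

        All clients see one path tree, but each file's metadata is owned by
        a single node chosen by hashing the path, so no central metadata
        server is needed and metadata load spreads across nodes.
        """

        def __init__(self, node_names):
            self.nodes = [Node(n) for n in node_names]

        def _owner(self, path: str) -> Node:
            # Assumption: simple hash-modulo placement of metadata ownership
            digest = hashlib.sha256(path.encode("utf-8")).digest()
            index = int.from_bytes(digest[:8], "big") % len(self.nodes)
            return self.nodes[index]

        def create(self, path: str, size: int) -> str:
            # Record metadata on the owning node and report which node it is
            owner = self._owner(path)
            owner.metadata[path] = {"size": size}
            return owner.name

        def lookup(self, path: str) -> dict:
            # The same hash routes any request to the owning node
            return self._owner(path).metadata[path]

        def load(self) -> dict:
            # Per-node count of owned files, a rough view of load balance
            return {n.name: len(n.metadata) for n in self.nodes}


    if __name__ == "__main__":
        ns = GlobalNamespace(["node-a", "node-b", "node-c"])
        for i in range(300):
            ns.create(f"/projects/file{i:03d}.dat", size=i)
        print(ns.lookup("/projects/file042.dat"))  # {'size': 42}
        print(ns.load())  # per-node file counts, roughly balanced
    ```

    Simple hash-modulo placement reassigns most paths when a node is added, so a scale-out file system that grows by adding nodes would more likely use consistent hashing, directory-based partitioning or a similar scheme to limit metadata movement.
    
    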

    The target market Exanet pursues is large scale-out NAS, where both performance (either small random or large sequential I/Os) and capacity are required. Consequently, competitors in the scale-out, clustered NAS file serving space include IBM GPFS (SONAS), HP IBRIX or PolyServe, Sun Lustre and Symantec SFS, among others.

    Figure 1 Generic clustered storage model (Courtesy The Green and Virtual Data Center (CRC))

    For a turnkey solution, Exanet packaged its cluster file system software with various vendors' hardware, combined with third-party external Fibre Channel or other storage. This should play well for Dell, which can package the Exanet software on its own servers and leverage either SAS or Fibre Channel MD1000/MD3000 external RAID storage, among other options (see more below).



    What's the Dell play?

    • It's an opportunity to acquire some intellectual property (IP)
    • It's an opportunity to own IP similar to that of EMC, HP, IBM, NetApp, Oracle and Symantec, among others
    • It's an opportunity to address a market gap or need
    • It's an opportunity to sell more Dell servers, storage and services
    • It's an opportune time for doing acquisitions (bargain shopping)

    Note: IBM also announced this past week its new bundled scale-out clustered NAS file serving solution based on GPFS, called SONAS. HP has IBRIX in addition to its previous PolyServe acquisition, and Sun has ZFS and Lustre.

    How does Exanet fit into the Dell lineup?

    • Dell sells Microsoft-based NAS as the NX series
    • Dell has an OEM relationship with EMC
    • Dell OEMed or resold IBRIX in the past for certain applications or environments
    • Dell has needed to expand its NAS story to balance its iSCSI-centric storage story, as well as to complement its multifunction block storage (e.g. MD3000) and server solutions.

    Why Exanet?
    Why Exanet, and why not one of the other startups or small NAS or cloud file system vendors, including BlueArc, Isilon, Panasas, Parascale, Reldata, OpenE or Zetta, among others?

    My take is that those were probably passed over because they were not relevant to what Dell is looking for, lacked a seamless technology and business fit, had technology tied to non-Dell hardware, lacked technology maturity, had investors still expecting a premium valuation, or some combination of the preceding.

    Additional thoughts on why Exanet
    I think that Dell simply saw an opportunity to acquire some intellectual property (IP), probably including a patent or two. The value of the patents could be in the form of current or future product offerings, as a negotiating tool, or, if nothing else, as a marketing tool. As a marketing tool, Dell, via its EqualLogic acquisition among others, has been able to demonstrate and generate awareness that it actually owns some IP, rather than only OEMing or reselling IP from others. I also think that this is an opportunity to either fill or supplement the solution offering that IBRIX provided for high performance, bulk storage and scale-out file serving needs.

    NAS and file serving for unstructured data are a strong growth market across commercial, high performance, specialized or research, and small business environments. Thus, while EqualLogic plays to the iSCSI block theme, Dell needs to expand its NAS and file serving solutions to provide product diversity for various customer application needs, similar to what it does with block-based storage. For example, while iSCSI-based EqualLogic PS systems get the bulk of the marketing attention, Dell also has a robust business around the PowerVault MD1000/MD3000 (SAS/iSCSI/FC) and the Microsoft multiprotocol-based PowerVault NX series, not to mention its EMC CLARiiON-based OEM solutions (e.g. Dell AX, Dell/EMC CX).

    Thus, Dell can complement the Microsoft multiprotocol (block and NAS file) NX with a packaged solution: Dell servers and MD (or other affordable block) storage, powered by Exanet. It is also possible that Dell will find a way to package Exanet as a NAS gateway in front of iSCSI-based EqualLogic PS systems, although that would make for an expensive scale-out NAS solution compared to those from other vendors.

    That's it for now.

    Let's see how this all plays out.

    Cheers gs

    Greg Schulz – Author The Green and Virtual Data Center (CRC) and Resilient Storage Networks (Elsevier)
    twitter @storageio
