Software Defined, Bulk, Cloud, Scale Out, Object Storage Fundamentals

Cloud, Bulk, Scale-Out, Object Storage Fundamentals

Welcome to the Cloud, Big Data, Software Defined, scale-out, Bulk and Object Storage Fundamentals page.

This page contains various resources, tips, essential topics pertaining to Software Defined, scale-out, Cloud, Bulk and Object and blob Storage Fundamentals. Other resources pertaining to Software Defined, scale-out, Cloud, Bulk and Object Storage include:

There are various types of cloud, bulk and object storage including public services such as Amazon Web Services (AWS) Simple Storage Service (S3), Google, Microsoft Microsoft Azure, IBM Softlayer, Rackspace among many others. There are also solutions for hybrid and private deployment from Cisco, Cloudian, Fujifilm, DDN, Dell EMC, Fujitsu, HDS, HPE, IBM, NetApp, Noobaa, OpenStack, Quantum, Rackspace, Scality, Seagate, Spectra, Storpool, Suse, Swift and WD among others.

Cloud products and services among others, along with associated data infrastructures including object storage, file systems, repositories and access methods are at the center of bulk, big data, big bandwidth and little data initiatives on a public, private, hybrid and community basis. After all, not everything is the same in cloud, virtual and traditional data centers or information factories from active data to in-active deep digital archiving.

Cloud Object Storage Fundamentals Access and Architectures

There are many facets to object storage including technology implementation, products, services, access and architectures for various applications and use scenarios.

    • Project or Account – Top of the hierarchy that can represent the owner or billing information for a service that where buckets are also attached.
    • Region – Location where data is stored that can include one or more data centers also known as Availability Zones.

AWS S3 Cross region replication
Moving and Replicating Buckets/Containers, Subfolders and Objects

    • Availability Zone (AZ) or data center or server that implement durability and accessibility for availability within a region.

AWS Regions and Availability Zones AZs
Example of Regions and Availability Zones (AZs)

    • Bucket or Container – Where objects or sub-folders containing objects are attached and accessed.

Object storage fundamentals sddc and cloud software defined

    • Sub-folder – While object storage can be located in a flat namespace for commonality and organization some solutions and service support the notion of sub-folder that resemble traditional directory hierarchy.
    • Object – Byte (or bit) stream that can be as small as one byte to as large as several Tbytes (some solutions and services support up to 5TByte sized objects). The object contains whatever data in any organization along with metadata. Different solutions and services support from a couple hundred KBytes of meta-data to Mbytes worth of meta-data. Regarding what can be stored in an object, anything from files, videos, images, virtual disks (VMDKs, VHDX), ZIP or tar files, backup and archive save sets, executable images or ISO’s, anything you want.
    • End-point – Where or what your software, application or tool and utilities along with gateways attach to for accessing buckets and objects.

 

object storage fundamentals, sddc and cloud storage example

A common theme for object storage is flexibility, along with scaling (performance, availability, capacity, economics) along with extensibility without compromise or complexity. From those basics, there are many themes and variations from how data is protected (RAID or no RAID, hardware or software), deployed as a service or as tin wrapped software (an appliance), optimized for archiving or video serving or other applications.

Many facets of cloud and object storage access

One aspect of object and cloud storage is accessing or using object methods including application programming interfaces (API’s) vs. traditional block (LUN) or NAS (file) based approaches. Keep in mind that many object storage systems, software, and services support NAS file-based access including NFS, CIFS, HDFS  among others for compatibility and ease of use.

Likewise various API’s can be found across different object solutions, software or services including Amazon Web Services (AWS) Simple Storage Service (S3) HTTP REST based, among others. Other API’s will vary by specific vendor or product however can include IOS (e.g. Apple iPhone and iPad), WebDav, FTP, JSON, XML, XAM, CDMI, SOAP, and DICOM among others. Another aspect of object and cloud storage are expanded  and dynamic metadata.

While traditional file systems and NAS have simple or fixed metadata, object and cloud storage systems, services and solutions along with some scale-out file systems have ability to support user defined metadata. Specific systems, solutions, software, and services will vary on the amount of metadata that could range on the low-end from 100s of KBytes  to tens or more Mbytes.

cloud object storage

Where to learn more

The following resources provide additional information about big data, bulk, software defined, cloud and object storage.

Click here to view software defined, bulk, cloud and object storage trend news.


StorageIO Founder Greg Schulz: File Services on Object Storage with HyperFile

Via InfoStor: Object Storage Is In Your Future
Via FujiFilm IT Summit: Software Defined Data Infrastructures (SDDI) and Hybrid Clouds
Via StorageIOblog: AWS EFS Elastic File System (Cloud NAS) First Preview Look
Via InfoStor: Cloud Storage Concerns, Considerations and Trends
Via InfoStor: Object Storage Is In Your Future
Via Server StorageIO: April 2015 Newsletter Focus on Cloud and Object storage
Via StorageIOblog: AWS S3 Cross Region Replication storage enhancements
Cloud conversations: AWS EBS, Glacier and S3 overview
AWS (Amazon) storage gateway, first, second and third impressions
Cloud and Virtual Data Storage Networking (CRC Book)
Via ChannelPartnersOnline: Selling Software-Defined Storage: Not All File Systems Are the Same
Via ITProPortal: IBM kills off its first cloud storage platform
Via ITBusinessEdge: Time to Rein in Cloud Storage
Via SerchCloudStorge: Ctera Networks’ file-sharing services gain intelligent cache
Via StorageIOblog: Who Will Be At Top Of Storage World Next Decade?

Videos and podcasts at storageio.tv also available via Applie iTunes.

Human Face of Big Data
Human Face of Big Data (Book review)

Seven Databases in Seven weeks
Seven Databases in Seven Weeks (Book review)

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

Wrap up and summary

Object and cloud storage are in your future, the questions are when, where, with what and how among others.

Watch for more content and links to be added here soon to this object storage center page including posts, presentations, pod casts, polls, perspectives along with services and product solutions profiles.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Which Enterprise HDD for Content Applications Different File Size Impact

Which HDD for Content Applications Different File Size Impact

Different File Size Impact server storage I/O trends

Updated 1/23/2018

Which enterprise HDD to use with a content server platform different file size impact.

Insight for effective server storage I/O decision making
Server StorageIO Lab Review

Which enterprise HDD to use for content servers

This is the fifth in a multi-part series (read part four here) based on a white paper hands-on lab report I did compliments of Servers Direct and Seagate that you can read in PDF form here. The focus is looking at the Servers Direct (www.serversdirect.com) converged Content Solution platforms with Seagate Enterprise Hard Disk Drive (HDD’s). In this post the focus looks at large and small file I/O processing.

File Performance Activity

Tip, Content solutions use files in various ways. Use the following to gain perspective how various HDD’s handle workloads similar to your specific needs.

Two separate file processing workloads were run (12), one with a relative small number of large files, and another with a large number of small files. For the large file processing (table-3), 5 GByte sized files were created and then accessed via 128 Kbyte (128KB) sized I/O over a 10 hour period with 90% read using 64 threads (workers). Large file workload simulates what might be seen with higher definition video, image or other content streaming.

(Note 12) File processing workloads were run using Vdbench 5.04 and file anchors with sample script configuration below. Instead of vdbench you could also use other tools such as sysbench or fio among others.

VdbenchFSBigTest.txt
# Sample script for big files testing
fsd=fsd1,anchor=H:,depth=1,width=5,files=20,size=5G
fwd=fwd1,fsd=fsd1,rdpct=90,xfersize=128k,fileselect=random,fileio=random,threads=64
rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=10h,interval=30

vdbench -f VdbenchFSBigTest.txt -m 16 -o Results_FSbig_H_060615

VdbenchFSSmallTest.txt
# Sample script for big files testing
fsd=fsd1,anchor=H:,depth=1,width=64,files=25600,size=16k
fwd=fwd1,fsd=fsd1,rdpct=90,xfersize=1k,fileselect=random,fileio=random,threads=64
rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=10h,interval=30

vdbench -f VdbenchFSSmallTest.txt -m 16 -o Results_FSsmall_H_060615

The 10% writes are intended to reflect some update activity for new content or other changes to content. Note that 128KB per second translates to roughly 1 Gbps streaming content such as higher definition video. However 4K video (not optimized) would require a higher speed as well as resulting in larger file sizes. Table-3 shows the performance during the large file access period showing average read /write rates and response time, bandwidth (MBps), average open and close rates with response time.

Avg. File Read Rate

Avg. Read Resp. Time
Sec.

Avg. File Write Rate

Avg. Write Resp. Time
Sec.

Avg.
CPU %
Total

Avg. CPU % System

Avg. MBps
Read

Avg. MBps
Write

ENT 15K R1

580.7

107.9

64.5

19.7

52.2

35.5

72.6

8.1

ENT 10K R1

455.4

135.5

50.6

44.6

34.0

22.7

56.9

6.3

ENT CAP R1

285.5

221.9

31.8

19.0

43.9

28.3

37.7

4.0

ENT 10K R10

690.9

87.21

76.8

48.6

35.0

21.8

86.4

9.6

Table-3 Performance summary for large file access operations (90% read)

Table-3 shows that for two-drive RAID 1, the Enterprise 15K are the fastest performance, however using a RAID 10 with four 10K HDD’s with enhanced cache features provide a good price, performance and space capacity option. Software RAID was used in this workload test.

Figure-4 shows the relative performance of various HDD options handling large files, keep in mind that for the response line lower is better, while for the activity rate higher is better.

large file processing
Figure-4 Large file processing 90% read, 10% write rate and response time

In figure-4 you can see the performance in terms of response time (reads larger dashed line, writes smaller dotted line) along with number of file read operations per second (reads solid blue column bar, writes green column bar). Reminder that lower response time, and higher activity rates are better. Performance declines moving from left to right, from 15K to 10K Enterprise Performance with enhanced cache feature to Enterprise Capacity (7.2K), all of which were hardware RAID 1. Also shown is a hardware RAID 10 (four x 10K HDD’s).

Results in figure-4 above and table-4 below show how various drives can be configured to balance their performance, capacity and costs to meet different needs. Table-4 below shows an analysis looking at average file reads per second (RPS) performance vs. HDD costs, usable capacity and protection level.

Table-4 is an example of looking at multiple metrics to make informed decisions as to which HDD would be best suited to your specific needs. For example RAID 10 using four 10K drives provides good performance and protection along with large usable space, however that also comes at a budget cost (e.g. price).

Avg.
File Reads Per Sec. (RPS)

Single Drive Cost per RPS

Multi-Drive Cost per RPS

Single Drive Cost / Per GB Capacity

Cost / Per GB Usable (Protected) Cap.

Drive Cost (Multiple Drives)

Protection Overhead (Space Capacity for RAID)

Cost per usable GB per RPS

Avg. File Read Resp. (Sec.)

ENT 15K R1

580.7

$1.02

$2.05

$ 0.99

$0.99

$1,190

100%

$2.1

107.9

ENT 10K R1

455.5

1.92

3.84

0.49

0.49

1,750

100%

3.8

135.5

ENT CAP R1

285.5

1.40

2.80

0.20

0.20

798

100%

2.8

271.9

ENT 10K R10

690.9

1.27

5.07

0.49

0.97

3,500

100%

5.1

87.2

Table-4 Performance, capacity and cost analysis for big file processing

Small File Size Processing

To simulate a general file sharing environment, or content streaming with many smaller objects, 1,638,464 16KB sized files were created on each device being tested (table-5). These files were spread across 64 directories (25,600 files each) and accessed via 64 threads (workers) doing 90% reads with a 1KB I/O size over a ten hour time frame. Like the large file test, and database activity, all workloads were run at the same time (e.g. test devices were concurrently busy).

Avg. File Read Rate

Avg. Read Resp. Time
Sec.

Avg. File Write Rate

Avg. Write Resp. Time
Sec.

Avg.
CPU %
Total

Avg. CPU % System

Avg. MBps
Read

Avg. MBps
Write

ENT 15K R1

3,415.7

1.5

379.4

132.2

24.9

19.5

3.3

0.4

ENT 10K R1

2,203.4

2.9

244.7

172.8

24.7

19.3

2.2

0.2

ENT CAP R1

1,063.1

12.7

118.1

303.3

24.6

19.2

1.1

0.1

ENT 10K R10

4,590.5

0.7

509.9

101.7

27.7

22.1

4.5

0.5

Table-5 Performance summary for small sized (16KB) file access operations (90% read)

Figure-5 shows the relative performance of various HDD options handling large files, keep in mind that for the response line lower is better, while for the activity rate higher is better.

small file processing
Figure-5 Small file processing 90% read, 10% write rate and response time

In figure-5 you can see the performance in terms of response time (reads larger dashed line, writes smaller dotted line) along with number of file read operations per second (reads solid blue column bar, writes green column bar). Reminder that lower response time, and higher activity rates are better. Performance declines moving from left to right, from 15K to 10K Enterprise Performance with enhanced cache feature to Enterprise Capacity (7.2K RPM), all of which were hardware RAID 1. Also shown is a hardware RAID 10 (four x 10K RPM HDD’s) that has higher performance and capacity along with costs (table-5).

Results in figure-5 above and table-5 below show how various drives can be configured to balance their performance, capacity and costs to meet different needs. Table-6 below shows an analysis looking at average file reads per second (RPS) performance vs. HDD costs, usable capacity and protection level.

Table-6 is an example of looking at multiple metrics to make informed decisions as to which HDD would be best suited to your specific needs. For example RAID 10 using four 10K drives provides good performance and protection along with large usable space, however that also comes at a budget cost (e.g. price).

Avg.
File Reads Per Sec. (RPS)

Single Drive Cost per RPS

Multi-Drive Cost per RPS

Single Drive Cost / Per GB Capacity

Cost / Per GB Usable (Protected) Cap.

Drive Cost (Multiple Drives)

Protection Overhead (Space Capacity for RAID)

Cost per usable GB per RPS

Avg. File Read Resp. (Sec.)

ENT 15K R1

3,415.7

$0.17

$0.35

$0.99

$0.99

$1,190

100%

$0.35

1.51

ENT 10K R1

2,203.4

0.40

0.79

0.49

0.49

1,750

100%

0.79

2.90

ENT CAP R1

1,063.1

0.38

0.75

0.20

0.20

798

100%

0.75

12.70

ENT 10K R10

4,590.5

0.19

0.76

0.49

0.97

3,500

100%

0.76

0.70

Table-6 Performance, capacity and cost analysis for small file processing

Looking at the small file processing analysis in table-5 shows that the 15K HDD’s on an apples to apples basis (e.g. same RAID level and number of drives) provide the best performance. However when also factoring in space capacity, performance, different RAID level or other protection schemes along with cost, there are other considerations. On the other hand the Enterprise Capacity 2TB HDD’s have a low cost per capacity, however do not have the performance of other options, assuming your applications need more performance.

Thus the right HDD for one application may not be the best one for a different scenario as well as multiple metrics as shown in table-5 need to be included in an informed storage decision making process.

Where To Learn More

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

File processing are common content applications tasks, some being small, others large or mixed as well as reads and writes. Even if your content environment is using object storage, chances are unless it is a new applications or a gateway exists, you may be using NAS or file based access. Thus the importance of if your applications are doing file based processing, either run your own applications or use tools that can simulate as close as possible to what your environment is doing.

Continue reading part six in this multi-part series here where the focus is around general I/O including 8KB and 128KB sized IOPs along with associated metrics.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Welcome to the Cloud Bulk Object Storage Resources Center

Updated 8/31/19

Cloud Bulk Big Data Software Defined Object Storage Resources

server storage I/O trends Object Storage resources

Welcome to the Cloud, Big Data, Software Defined, Bulk and Object Storage Resources Center Page objectstoragecenter.com.

This object storage resources, along with software defined, cloud, bulk, and scale-out storage page is part of the server StorageIOblog microsite collection of resources. Software-defined, Bulk, Cloud and Object Storage exist to support expanding and diverse application data demands.

Other related resources include:

  • Software Defined, Cloud, Bulk and Object Storage Fundamentals
  • Software Defined Data Infrastructure Essentials book (CRC Press)
  • Cloud, Software Defined, Scale-Out, Object Storage News Trends
  •  Object storage SDDC SDDI
    Via Software Defined Data Infrastructure Essentials (CRC Press 2017)

    Bulk, Cloud, Object Storage Solutions and Services

    There are various types of cloud, bulk, and object storage including public services such as Amazon Web Services (AWS) Simple Storage Service (S3), Backblaze, Google, Microsoft Azure, IBM Softlayer, Rackspace among many others. There are also solutions for hybrid and private deployment from Cisco, Cloudian, CTERA, Cray, DDN, Dell EMC, Elastifile, Fujitsu, Vantera/HDS, HPE, Hedvig, Huawei, IBM, NetApp, Noobaa, OpenIO, OpenStack, Quantum, Rackspace, Rozo, Scality, Spectra, Storpool, StorageCraft, Suse, Swift, Virtuozzo, WekaIO, WD, among many others.

    Bulk Cloud Object storage SDDC SDDI
    Via Software Defined Data Infrastructure Essentials (CRC Press 2017)

    Cloud products and services among others, along with associated data infrastructures including object storage, file systems, repositories and access methods are at the center of bulk, big data, big bandwidth and little data initiatives on a public, private, hybrid and community basis. After all, not everything is the same in cloud, virtual and traditional data centers or information factories from active data to in-active deep digital archiving.

    Object Context Matters

    Before discussing Object Storage lets take a step back and look at some context that can clarify some confusion around the term object. The word object has many different meanings and context, both inside of the IT world as well as outside. Context matters with the term object such as a verb being a thing that can be seen or touched as well as a person or thing of action or feeling directed towards.

    Besides a person, place or physical thing, an object can be a software-defined data structure that describes something. For example, a database record describing somebody’s contact or banking information, or a file descriptor with name, index ID, date and time stamps, permissions and access control lists along with other attributes or metadata. Another example is an object or blob stored in a cloud or object storage system repository, as well as an item in a hypervisor, operating system, container image or other application.

    Besides being a verb, an object can also be a noun such as disapproval or disagreement with something or someone. From an IT context perspective, an object can also refer to a programming method (e.g. object-oriented programming [oop], or Java [among other environments] objects and classes) and systems development in addition to describing entities with data structures.

    In other words, a data structure describes an object that can be a simple variable, constant, complex descriptor of something being processed by a program, as well as a function or unit of work. There are also objects unique or with context to specific environments besides Java or databases, operating systems, hypervisors, file systems, cloud and other things.

    The Need For Bulk, Cloud and Object Storage

    There is no such thing as an information recession with more data being generated, moved, processed, stored, preserved and served, granted there are economic realities. Likewise as a society our dependence on information being available for work or entertainment, from medical healthcare to social media and all points in between continues to increase (check out the Human Face of Big Data).

    In addition, people and data are living longer, as well as getting larger (hence little data, big data and very big data). Cloud products and services along with associated object storage, file systems, repositories and access methods are at the center of big data, big bandwidth and little data initiatives on a public, private, hybrid and community basis. After all, not everything is the same in cloud, virtual and traditional data centers or information factories from active data to in-active deep digital archiving.

    Click here to view (and hear) more content including cloud and object storage fundamentals

    Click here to view software defined, bulk, cloud and object storage trend news

    cloud object storage

    Where to learn more

    The following resources provide additional information about big data, bulk, software defined, cloud and object storage.



    Via InfoStor: Object Storage Is In Your Future
    Via FujiFilm IT Summit: Software Defined Data Infrastructures (SDDI) and Hybrid Clouds
    Via MultiChannel: After ditching cloud business, Verizon inks Virtual Network Services deal with Amazon
    Via MultiChannel: Verizon Digital Media Services now offers integrated Microsoft Azure Storage
    Via StorageIOblog: AWS EFS Elastic File System (Cloud NAS) First Preview Look
    Via InfoStor: Cloud Storage Concerns, Considerations and Trends
    Via InfoStor: Object Storage Is In Your Future
    Via Server StorageIO: April 2015 Newsletter Focus on Cloud and Object storage
    Via StorageIOblog: AWS S3 Cross Region Replication storage enhancements
    Cloud conversations: AWS EBS, Glacier and S3 overview
    AWS (Amazon) storage gateway, first, second and third impressions
    Cloud and Virtual Data Storage Networking (CRC Book)

    View more news, trends and related cloud object storage activity here.

    Videos and podcasts at storageio.tv also available via Applie iTunes.

    Human Face of Big Data
    Human Face of Big Data (Book review)

    Seven Databases in Seven weeks Seven Databases in Seven Weeks (Book review)

    Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

    Software Defined Data Infrastructure Essentials Book SDDC

    What This All Means

    Object and cloud storage are in your future, the questions are when, where, with what and how among others.

    Watch for more content and links to be added here soon to this object storage center page including posts, presentations, pod casts, polls, perspectives along with services and product solutions profiles.

    Ok, nuff said, for now.

    Gs

    Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.