Which HDD for Content Applications Different File Size Impact
Updated 1/23/2018
Which enterprise HDD to use with a content server platform different file size impact.
Insight for effective server storage I/O decision making
Server StorageIO Lab Review
This is the fifth in a multi-part series (read part four here) based on a hands-on lab report white paper I did, compliments of Servers Direct and Seagate, that you can read in PDF form here. The focus is on the Servers Direct (www.serversdirect.com) converged Content Solution platforms with Seagate Enterprise Hard Disk Drives (HDD’s). This post looks at large and small file I/O processing.
File Performance Activity
Tip: Content solutions use files in various ways. Use the following to gain perspective on how various HDD’s handle workloads similar to your specific needs.
Two separate file processing workloads were run (12), one with a relatively small number of large files, and another with a large number of small files. For the large file processing (table-3), 5 GByte sized files were created and then accessed via 128 Kbyte (128KB) sized I/O over a 10 hour period with 90% reads using 64 threads (workers). The large file workload simulates what might be seen with higher definition video, image or other content streaming.
(Note 12) File processing workloads were run using Vdbench 5.04 and file anchors with sample script configuration below. Instead of vdbench you could also use other tools such as sysbench or fio among others.
VdbenchFSBigTest.txt
# Sample script for big files testing
fsd=fsd1,anchor=H:,depth=1,width=5,files=20,size=5G
fwd=fwd1,fsd=fsd1,rdpct=90,xfersize=128k,fileselect=random,fileio=random,threads=64
rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=10h,interval=30
vdbench -f VdbenchFSBigTest.txt -m 16 -o Results_FSbig_H_060615
VdbenchFSSmallTest.txt
# Sample script for small files testing
fsd=fsd1,anchor=H:,depth=1,width=64,files=25600,size=16k
fwd=fwd1,fsd=fsd1,rdpct=90,xfersize=1k,fileselect=random,fileio=random,threads=64
rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=10h,interval=30
vdbench -f VdbenchFSSmallTest.txt -m 16 -o Results_FSsmall_H_060615
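To get a feel for how much data the two fsd lines above lay down, here is a minimal Python sketch. The helper fsd_footprint is hypothetical (it is not part of Vdbench), and it assumes that files are created only in the lowest level directories and that the k and G suffixes are binary units. For the small file script it works out to 64 directories of 25,600 files, or 1,638,400 files, slightly below the 1,638,464 quoted later in this post; one guess is that the larger number also counts the 64 directories.

```python
# Hypothetical helper (not part of Vdbench): estimate the footprint of a
# Vdbench file system definition (fsd) from its depth, width, files and size.

KIB = 1024
GIB = 1024 ** 3


def fsd_footprint(depth, width, files, size_bytes):
    """Return (leaf_directories, total_files, total_bytes).

    Assumption: files live only in the lowest-level directories, so the
    number of directories holding files is width ** depth.
    """
    leaf_dirs = width ** depth
    total_files = leaf_dirs * files
    return leaf_dirs, total_files, total_files * size_bytes


# Parameters copied from the two sample scripts
big = fsd_footprint(depth=1, width=5, files=20, size_bytes=5 * GIB)
small = fsd_footprint(depth=1, width=64, files=25600, size_bytes=16 * KIB)

for name, (dirs, count, total) in (("big", big), ("small", small)):
    print(f"{name}: {dirs} directories, {count:,} files, {total / GIB:,.1f} GiB")
```

Running a quick estimate like this before a test helps confirm the target device has room for the format phase.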
The 10% writes are intended to reflect some update activity for new content or other changes to content. Note that 128 MBytes per second translates to roughly 1 Gbps of streaming content such as higher definition video (by comparison, 128KB per second is only about 1 Mbps). However, 4K video (not optimized) would require a higher speed as well as resulting in larger file sizes. Table-3 shows the performance during the large file access period, including average read/write rates and response times, bandwidth (MBps), and average open and close rates with response time.
Configuration | Avg. File Read Rate | Avg. Read Resp. Time | Avg. File Write Rate | Avg. Write Resp. Time | Avg. CPU % Total | Avg. CPU % System | Avg. MBps Read | Avg. MBps Write |
ENT 15K R1 | 580.7 | 107.9 | 64.5 | 19.7 | 52.2 | 35.5 | 72.6 | 8.1 |
ENT 10K R1 | 455.4 | 135.5 | 50.6 | 44.6 | 34.0 | 22.7 | 56.9 | 6.3 |
ENT CAP R1 | 285.5 | 221.9 | 31.8 | 19.0 | 43.9 | 28.3 | 37.7 | 4.0 |
ENT 10K R10 | 690.9 | 87.21 | 76.8 | 48.6 | 35.0 | 21.8 | 86.4 | 9.6 |
Table-3 Performance summary for large file access operations (90% read)
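The two MBps columns in Table-3 appear to follow from the file read and write rates multiplied by the 128KB transfer size. The Python sketch below shows that relationship; the function name mbps is made up for illustration, and treating 1 MB as 1,024 KB is an assumption. For most rows the product lands on the listed value, although the ENT CAP R1 read figure comes out at about 35.7 rather than the 37.7 listed, so treat this as an approximation of how the columns relate rather than how they were measured.

```python
def mbps(ops_per_sec, xfer_kib):
    """Bandwidth in MB per second for a given operation rate and I/O size.

    Assumption: 1 MB is treated as 1,024 KB.
    """
    return ops_per_sec * xfer_kib / 1024


# (avg. file read rate, avg. file write rate) copied from Table-3
table3_rates = {
    "ENT 15K R1": (580.7, 64.5),
    "ENT 10K R1": (455.4, 50.6),
    "ENT CAP R1": (285.5, 31.8),
    "ENT 10K R10": (690.9, 76.8),
}

for config, (reads, writes) in table3_rates.items():
    print(
        f"{config}: read {mbps(reads, 128):.1f} MBps, "
        f"write {mbps(writes, 128):.1f} MBps"
    )
```

The same function can be used with a 1KB transfer size to relate the rates and bandwidth of a small file workload.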
Table-3 shows that for two-drive RAID 1, the Enterprise 15K HDD’s deliver the fastest performance; however, a RAID 10 using four 10K HDD’s with enhanced cache features provides a good price, performance and space capacity option. Software RAID was used in this workload test.
Figure-4 shows the relative performance of various HDD options handling large files. Keep in mind that for the response line lower is better, while for the activity rate higher is better.
Figure-4 Large file processing 90% read, 10% write rate and response time
In figure-4 you can see the performance in terms of response time (reads larger dashed line, writes smaller dotted line) along with the number of file operations per second (reads solid blue column bar, writes green column bar). As a reminder, lower response times and higher activity rates are better. Performance declines moving from left to right, from 15K to 10K Enterprise Performance with enhanced cache feature to Enterprise Capacity (7.2K), all of which were hardware RAID 1. Also shown is a hardware RAID 10 (four x 10K HDD’s).
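One way to sanity check rate and response time figures against each other is Little's Law: the average number of operations in flight equals throughput multiplied by response time. The Python sketch below applies it to the Table-3 values. It assumes the response times are in milliseconds (an assumption, the table does not state the unit) and that reads and writes together keep the 64 worker threads busy; the function name in_flight is made up for illustration. Under those assumptions every configuration comes out close to 64.

```python
def in_flight(read_rate, read_resp_ms, write_rate, write_resp_ms):
    """Average concurrent operations by Little's Law (rate x response time).

    Assumption: response times are in milliseconds, hence the division by 1000.
    """
    return (read_rate * read_resp_ms + write_rate * write_resp_ms) / 1000.0


# (read rate, read resp., write rate, write resp.) copied from Table-3
table3 = {
    "ENT 15K R1": (580.7, 107.9, 64.5, 19.7),
    "ENT 10K R1": (455.4, 135.5, 50.6, 44.6),
    "ENT CAP R1": (285.5, 221.9, 31.8, 19.0),
    "ENT 10K R10": (690.9, 87.21, 76.8, 48.6),
}

for config, row in table3.items():
    print(f"{config}: about {in_flight(*row):.1f} operations in flight")
```

A result far from the configured thread count would suggest either a different unit or time spent in operations not listed here, such as file opens and closes.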
Results in figure-4 above and table-4 below show how various drives can be configured to balance their performance, capacity and costs to meet different needs. Table-4 below shows an analysis looking at average file reads per second (RPS) performance vs. HDD costs, usable capacity and protection level.
Table-4 is an example of looking at multiple metrics to make informed decisions as to which HDD would be best suited to your specific needs. For example RAID 10 using four 10K drives provides good performance and protection along with large usable space, however that also comes at a budget cost (e.g. price).
Configuration | Avg. File Reads per Sec. (RPS) | Single Drive Cost per RPS | Multi-Drive Cost per RPS | Single Drive Cost / Per GB Capacity | Cost / Per GB Usable (Protected) Cap. | Drive Cost (Multiple Drives) | Protection Overhead (Space Capacity for RAID) | Cost per usable GB per RPS | Avg. File Read Resp. (Sec.) |
ENT 15K R1 | 580.7 | $1.02 | $2.05 | $0.99 | $0.99 | $1,190 | 100% | $2.1 | 107.9 |
ENT 10K R1 | 455.5 | 1.92 | 3.84 | 0.49 | 0.49 | 1,750 | 100% | 3.8 | 135.5 |
ENT CAP R1 | 285.5 | 1.40 | 2.80 | 0.20 | 0.20 | 798 | 100% | 2.8 | 221.9 |
ENT 10K R10 | 690.9 | 1.27 | 5.07 | 0.49 | 0.97 | 3,500 | 100% | 5.1 | 87.2 |
Table-4 Performance, capacity and cost analysis for big file processing
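The cost per RPS columns in Table-4 can be reproduced by dividing drive cost by the average file reads per second. The Python sketch below is a rough illustration; the function name cost_per_rps is hypothetical, and the drive counts (two for RAID 1, four for RAID 10) are taken from the configurations described in this post.

```python
def cost_per_rps(total_drive_cost, drive_count, reads_per_sec):
    """Return (single-drive cost per RPS, multi-drive cost per RPS)."""
    single = (total_drive_cost / drive_count) / reads_per_sec
    multi = total_drive_cost / reads_per_sec
    return single, multi


# (total drive cost in dollars, number of drives, avg. file reads per second)
# copied from Table-4
table4 = {
    "ENT 15K R1": (1190, 2, 580.7),
    "ENT 10K R1": (1750, 2, 455.5),
    "ENT CAP R1": (798, 2, 285.5),
    "ENT 10K R10": (3500, 4, 690.9),
}

for config, (cost, drives, rps) in table4.items():
    single, multi = cost_per_rps(cost, drives, rps)
    print(f"{config}: ${single:.2f} per RPS (one drive), ${multi:.2f} per RPS (all drives)")
```

Swapping in your own drive prices and measured rates gives a quick way to compare options on a cost per unit of work basis.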
Small File Size Processing
To simulate a general file sharing environment, or content streaming with many smaller objects, 1,638,464 16KB sized files were created on each device being tested (table-5). These files were spread across 64 directories (25,600 files each) and accessed via 64 threads (workers) doing 90% reads with a 1KB I/O size over a ten hour time frame. Like the large file test, and database activity, all workloads were run at the same time (e.g. test devices were concurrently busy).
Configuration | Avg. File Read Rate | Avg. Read Resp. Time | Avg. File Write Rate | Avg. Write Resp. Time | Avg. CPU % Total | Avg. CPU % System | Avg. MBps Read | Avg. MBps Write |
ENT 15K R1 | 3,415.7 | 1.5 | 379.4 | 132.2 | 24.9 | 19.5 | 3.3 | 0.4 |
ENT 10K R1 | 2,203.4 | 2.9 | 244.7 | 172.8 | 24.7 | 19.3 | 2.2 | 0.2 |
ENT CAP R1 | 1,063.1 | 12.7 | 118.1 | 303.3 | 24.6 | 19.2 | 1.1 | 0.1 |
ENT 10K R10 | 4,590.5 | 0.7 | 509.9 | 101.7 | 27.7 | 22.1 | 4.5 | 0.5 |
Table-5 Performance summary for small sized (16KB) file access operations (90% read)
Figure-5 shows the relative performance of various HDD options handling small files. Keep in mind that for the response line lower is better, while for the activity rate higher is better.
Figure-5 Small file processing 90% read, 10% write rate and response time
In figure-5 you can see the performance in terms of response time (reads larger dashed line, writes smaller dotted line) along with the number of file operations per second (reads solid blue column bar, writes green column bar). As a reminder, lower response times and higher activity rates are better. Performance declines moving from left to right, from 15K to 10K Enterprise Performance with enhanced cache feature to Enterprise Capacity (7.2K RPM), all of which were hardware RAID 1. Also shown is a hardware RAID 10 (four x 10K RPM HDD’s) that has higher performance and capacity along with costs (table-5).
Results in figure-5 above and table-6 below show how various drives can be configured to balance their performance, capacity and costs to meet different needs. Table-6 below shows an analysis looking at average file reads per second (RPS) performance vs. HDD costs, usable capacity and protection level.
Table-6 is an example of looking at multiple metrics to make informed decisions as to which HDD would be best suited to your specific needs. For example RAID 10 using four 10K drives provides good performance and protection along with large usable space, however that also comes at a budget cost (e.g. price).
Configuration | Avg. File Reads per Sec. (RPS) | Single Drive Cost per RPS | Multi-Drive Cost per RPS | Single Drive Cost / Per GB Capacity | Cost / Per GB Usable (Protected) Cap. | Drive Cost (Multiple Drives) | Protection Overhead (Space Capacity for RAID) | Cost per usable GB per RPS | Avg. File Read Resp. (Sec.) |
ENT 15K R1 | 3,415.7 | $0.17 | $0.35 | $0.99 | $0.99 | $1,190 | 100% | $0.35 | 1.51 |
ENT 10K R1 | 2,203.4 | 0.40 | 0.79 | 0.49 | 0.49 | 1,750 | 100% | 0.79 | 2.90 |
ENT CAP R1 | 1,063.1 | 0.38 | 0.75 | 0.20 | 0.20 | 798 | 100% | 0.75 | 12.70 |
ENT 10K R10 | 4,590.5 | 0.19 | 0.76 | 0.49 | 0.97 | 3,500 | 100% | 0.76 | 0.70 |
Table-6 Performance, capacity and cost analysis for small file processing
The small file processing analysis in table-5 shows that, on an apples to apples basis (e.g. same RAID level and number of drives), the 15K HDD’s provide the best performance. However, when also factoring in space capacity, different RAID levels or other protection schemes along with cost, there are other considerations. On the other hand, the Enterprise Capacity 2TB HDD’s have a low cost per capacity but do not have the performance of the other options, which matters if your applications need more performance.
Thus the right HDD for one application may not be the best one for a different scenario, and multiple metrics such as those shown in table-5 need to be included in an informed storage decision making process.
Where To Learn More
- Part 1 of this series – Trends and Content Applications Servers
- Part 2 of this series – Content applications server decisions and testing plans
- Part 3 of this series – Test hardware and software configuration
- Part 4 of this series – Large file I/O processing
- Part 5 of this series – Small file I/O processing
- Part 6 of this series – General I/O processing
- Part 7 of this series – How HDD continue to evolve over different generations and wrap up
- As the platters spin, HDD’s for cloud, virtual and traditional storage environments
- How many IOPS can a HDD, HHDD or SSD do?
- Hard Disk Drives (HDD) for Virtual Environments
- Server and Storage I/O performance and benchmarking tools
- Additional Server StorageIO White Papers and Lab Reports, Solutions Briefs and Profiles, Tips and Articles
- PDF White Paper version of this post
- www.thenvmeplace.com and www.thessdplace.com
Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.
What This All Means
File processing is a common content application task, with some files being small, others large or mixed, along with both reads and writes. Even if your content environment is using object storage, chances are that unless it is a new application or a gateway exists, you may be using NAS or file based access. Thus, if your applications are doing file based processing, it is important to either run your own applications or use tools that can simulate as closely as possible what your environment is doing.
Continue reading part six in this multi-part series here where the focus is around general I/O including 8KB and 128KB sized IOPs along with associated metrics.
Ok, nuff said, for now.
Gs
Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.
All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.