IOPS performance on NVMe + HDD configuration with Windows Server 2016 and Storage Spaces Direct

In our previous blog on Storage Spaces Direct, we discussed three different configurations that we jointly developed with Microsoft: IOPS optimized (all-flash NVMe), throughput/capacity optimized (all-flash NVMe and SATA SSD), and capacity optimized (hybrid NVMe and HDD). Since then, we have been testing these configurations against the Windows Server 2016 TP5 release in our lab and monitoring how they perform with Storage Spaces Direct enabled. In this blog, we present the results of the IOPS performance test for the hybrid NVMe and HDD configuration.

Configuration

The hybrid NVMe and HDD setup consisted of four 2U Intel® Server Systems built on the Intel® Server Board S2600WT2R. Each server was configured with:

- Processor: 2x Intel® Xeon® processor E5-2650 v4 (30M cache, 2.2 GHz, 12 cores, 105 W)
- Storage:
  - Cache tier: 2x 2 TB Intel® SSD DC P3700 Series
  - Capacity tier: 8x 6 TB 3.5" HDD Seagate^ ST6000NM0024
- Network: 1x 10 GbE dual-port Chelsio^ T520 adapter

With a total of 192 TB of capacity storage in the cluster [(48 TB/node) x 4] and three-way mirroring, we had 64 TB of usable space (192 TB / 3 = 64 TB), which works out to 16 TB of available storage per node (64 TB / 4 nodes = 16 TB). The total share space used was 14 TB x 4 (= 56 TB) + 2 TB = 58 TB. For cluster networking, a single 10 GbE Extreme Networks Summit X670-48x switch was used. The short sketch below walks through this capacity arithmetic.
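To make the capacity arithmetic concrete, here is a minimal Python sketch of the same calculation. The drive counts, node count, and three-way mirror factor are taken from the configuration above; the variable names are ours.

```python
# Capacity arithmetic for the 4-node hybrid cluster described above.
# This mirrors the figures in the Configuration section; it is an illustration,
# not a sizing tool.

NODES = 4
HDDS_PER_NODE = 8
HDD_TB = 6            # capacity-tier drives (6 TB Seagate ST6000NM0024)
MIRROR_COPIES = 3     # three-way mirroring keeps three copies of every extent

raw_per_node_tb = HDDS_PER_NODE * HDD_TB             # 48 TB capacity tier per node
raw_cluster_tb = raw_per_node_tb * NODES             # 192 TB raw across the cluster
usable_cluster_tb = raw_cluster_tb / MIRROR_COPIES   # 64 TB usable with 3-way mirror
usable_per_node_tb = usable_cluster_tb / NODES       # 16 TB usable per node

print(f"raw: {raw_cluster_tb} TB, usable: {usable_cluster_tb} TB, "
      f"per node: {usable_per_node_tb} TB")
```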
24x Azure-like VMs were deployed per node; each VM had 2 cores, 3.5 GB RAM, and a 60 GB OS disk. Each VM was also equipped with a 500 GB data VHD (53.76 TB of total space used from the shares) containing 4x 98 GB DISKSPD files (spill over) and 2x 10 GB DISKSPD files (cached in).

VMs:
- 24x Azure-like VMs per node
- 60 GB OS VHD + 500 GB data VHD per VM [53.76 TB total space used from the shares]
- Spill over: 4x 98 GB DISKSPD files per VM
- Cached in: 2x 10 GB DISKSPD files per VM

Results

With 24 VMs per node, for a total of 96 VMs, we ran DISKSPD (version 2.0.15) in each virtual machine with 4 threads and 32 outstanding IOs. With the working set contained within the caching tier, we achieved 954,240 aggregate IOPS at an average CPU utilization of 80.23% for 4K 100% random 100% reads. For the 8K 70/30 read/write scenario, we achieved 641,979 aggregate IOPS at an average CPU utilization of 88.6%. (An illustrative DISKSPD invocation is sketched after the table below.)

When the entire working set is contained within the SSD caching tier (All Cached In):

                         | 4K Random Reads | 8K 70/30 RW
VMs                      | 96 VMs          | 96 VMs
Aggregate IOPS           | 954,240         | 641,979
Avg. CPU Utilization (%) | 80.23           | 88.6

(IOPS counters: 4K – 100% Random 100% Reads: CSVFS Reads/sec; 8K 70/30 Read & Write: CSVFS Reads/sec + CSVFS Writes/sec.)
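For reference, the sketch below shows one way the stated workload parameters (4K or 8K block size, random access, 4 threads, 32 outstanding IOs, 0% or 30% writes) map onto DISKSPD command-line switches. The exact command lines used in the tests were not published, so the duration, warm-up time, caching flags, and target file paths below are assumptions.

```python
# Illustrative mapping of the stated workload parameters to DISKSPD switches.
# Duration, warm-up, caching flags, and target paths are assumed, not taken from
# the tests described above.

def diskspd_cmd(block, write_pct, targets, duration_s=120, warmup_s=60):
    return [
        "diskspd.exe",
        f"-b{block}",       # block size: 4K or 8K
        "-t4",              # 4 threads per target (the post states 4 threads)
        "-o32",             # 32 outstanding IOs per thread
        "-r",               # random IO
        f"-w{write_pct}",   # write percentage: 0 = 100% reads, 30 = 70/30 read/write
        f"-d{duration_s}",  # test duration in seconds (assumed)
        f"-W{warmup_s}",    # warm-up time in seconds (assumed)
        "-Sh",              # disable software caching and hardware write caching (assumed)
        "-L",               # collect latency statistics
    ] + targets

# Hypothetical target files on a VM's 500 GB data VHD.
targets = [r"E:\io1.dat", r"E:\io2.dat"]

print(" ".join(diskspd_cmd("4K", 0, targets)))   # 4K 100% random 100% reads
print(" ".join(diskspd_cmd("8K", 30, targets)))  # 8K 70/30 random read/write
```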
When the working set was increased to use 78% of the total storage on each node, with the same configuration as above, we achieved 176,613 aggregate IOPS at an average CPU utilization of 21.89% for 4K 100% random 100% reads. For the 8K 70/30 read/write scenario, we achieved 135,365 aggregate IOPS at an average CPU utilization of 16.42%.

When the entire working set is 78% of total storage on each node:

                         | 4K Random Reads | 8K 70/30 RW
VMs                      | 96 VMs          | 96 VMs
Aggregate IOPS           | 176,613         | 135,365
Avg. CPU Utilization (%) | 21.89           | 16.42

To understand cluster IOPS performance as the working set starts to spill over the caching tier, we did a theoretical analysis (these are estimates only, since IOPS was not measured at every working set size) based on the results above to estimate the effect of working set size on IOPS. In the All Cached In scenario, the working set fits within the caching tier; for this configuration, with a 4 TB caching tier per node, the results above show that IOPS stays consistent for 4K 100% random 100% reads (954,240 aggregate IOPS). If the working set grows beyond the 4 TB cache tier, we expect overall IOPS to drop: at a 6 TB working set, estimated IOPS for 4K 100% random 100% reads has already fallen by ~50%. As the working set grows further, IOPS keeps falling; in our testing, when the working set was increased to 9.4 TB, IOPS was reduced by ~82% compared to the working set being contained entirely within the caching tier. (The reduction figures quoted here and below are recomputed in the short sketch after the two estimate tables.)

The table below outlines the theoretical analysis used to estimate IOPS vs. working set size.

4K 100% Random 100% Reads:

Working Set Size [TB] | Cache Size [TB] | Spill over [TB] | IOPS/Node | IOPS
0.96                  | 4               | 0               | 238,560** | 954,240**
1                     | 4               | 0               | 238,560*  | 954,240*
2                     | 4               | 0               | 238,560*  | 954,240*
3                     | 4               | 0               | 238,560*  | 954,240*
4                     | 4               | 0               | 238,560*  | 954,240*
6                     | 4               | 2               | 119,280*  | 477,120*
7                     | 4               | 3               | 79,520*   | 318,080*
8                     | 4               | 4               | 59,640*   | 238,560*
9                     | 4               | 5               | 47,712*   | 190,848*
9.408                 | 4               | 5.4             | 44,178**  | 176,711**

*Estimated IOPS  **Measured IOPS

Similarly, for 8K 70/30 read & write, the theoretical analysis projects a ~60% reduction in IOPS as the working set grows from 4 TB to 6 TB. When the working set increases to 9.4 TB, the measured IOPS was ~77% lower than with the working set contained within the caching tier.

Theoretical analysis used in estimating IOPS vs. working set size for 8K 70/30 Read & Write:

Working Set Size [TB] | Cache Size [TB] | Spill over [TB] | IOPS/Node | IOPS
0.96                  | 4               | 0               | 160,495** | 641,979**
1                     | 4               | 0               | 160,495*  | 641,979*
2                     | 4               | 0               | 160,495*  | 641,979*
3                     | 4               | 0               | 160,495*  | 641,979*
4                     | 4               | 0               | 160,495*  | 641,979*
6                     | 4               | 2               | 66,873*   | 267,491*
7                     | 4               | 3               | 51,773*   | 207,090*
8                     | 4               | 4               | 42,235*   | 168,942*
9                     | 4               | 5               | 35,666*   | 142,662*
9.408                 | 4               | 5.4             | 33,576**  | 134,305**

*Estimated IOPS  **Measured IOPS
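As a quick sanity check, the short Python sketch below recomputes the reduction percentages from the per-node values in the two tables; the helper name is ours, and the results land within a point or two of the rounded figures quoted in the prose.

```python
# Recompute the reduction percentages discussed above from the per-node table values
# (* = estimated, ** = measured in the tables).

def pct_drop(baseline: float, value: float) -> float:
    """Percentage reduction of `value` relative to `baseline`."""
    return 100.0 * (1.0 - value / baseline)

# 4K 100% random reads, per node (all-cached-in baseline: 238,560 IOPS/node)
print(f"{pct_drop(238_560, 119_280):.0f}%")  # 6 TB working set   -> 50% (text: ~50%)
print(f"{pct_drop(238_560, 44_178):.0f}%")   # 9.4 TB working set -> 81% (text: ~82%)

# 8K 70/30 read/write, per node (all-cached-in baseline: 160,495 IOPS/node)
print(f"{pct_drop(160_495, 66_873):.0f}%")   # 6 TB working set   -> 58% (text: ~60%)
print(f"{pct_drop(160_495, 33_576):.0f}%")   # 9.4 TB working set -> 79% (text: ~77%)
```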

Conclusion

Hybrid storage configurations like NVMe SSD + HDD work well for workloads whose working set fits within the NVMe SSD cache. When the entire working set is resident within the capacity of the NVMe drives, we see aggregate IO performance of ~950K IOPS (4K 100% random 100% reads). As the working set grows, or as a multi-tenant configuration changes the request profile on the storage, data will spill out of the NVMe SSDs. When that happens, performance is gated by the IOPS capacity of the HDDs, resulting in imbalance across the nodes. This can potentially be addressed with Windows Server 2016 Storage QoS, by pre-defining minimum and maximum performance for virtual machines. To support a growing working set while maintaining consistent performance across all of the nodes, it would be more effective to deploy SSDs in the capacity tier.

We'll be presenting the IOPS performance results for the all-flash NVMe and SATA SSD configuration soon, so stay tuned for our next blog.

Disclaimers

Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as HammerDB, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Source: Internal Testing.

^ Other names and brands may be claimed as the property of others.
