
VMware Native Multipathing Plugin (NMP) Configuration

VMware offers a Native Multipathing Plugin (NMP) layer in vSphere through Storage Array Type Plugins (SATP) and Path Selection Policies (PSP) as part of the VMware APIs for Pluggable Storage Architecture (PSA). The SATP has the array-specific knowledge needed to aggregate I/Os across multiple channels and the intelligence to send failover commands when a path has failed. The Path Selection Policy can be one of “Fixed”, “Most Recently Used” or “Round Robin”.

Round Robin Path Selection Policy

To best leverage the active-active nature of the front end of the FlashArray, Pure Storage requires that you configure FlashArray volumes to use the Round Robin Path Selection Policy. The Round Robin PSP rotates between all discovered paths for a given volume which allows ESXi (and therefore the virtual machines running on the volume) to maximize the possible performance by using all available resources (HBAs, target ports, etc.).

BEST PRACTICE: Use the Round Robin Path Selection Policy for FlashArray volumes.

The I/O Operations Limit

The Round Robin Path Selection Policy allows for additional tuning of its path-switching behavior in the form of a setting called the I/O Operations Limit. The I/O Operations Limit (sometimes called the “IOPS” value) dictates how often ESXi switches logical paths for a given device. By default, when Round Robin is enabled on a device, ESXi will switch to a new logical path every 1,000 I/Os. In other words, ESXi will choose a logical path, and start issuing all I/Os for that device down that path. Once it has issued 1,000 I/Os for that device, down that path, it will switch to a new logical path and so on.

Pure Storage recommends tuning this value down to the minimum of 1. This will cause ESXi to change logical paths after every single I/O, instead of 1,000.
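
As a rough illustration of the switching behavior described above, the following shell sketch models a device with a given number of paths and counts how many I/Os land on each path for a given I/O Operations Limit. It is a simplified model, not ESXi code; the function name simulate_rr and its arguments are made up for this example.

```shell
# simulate_rr PATHS LIMIT TOTAL
# Prints how many of TOTAL I/Os each path receives when the host moves to
# the next path after LIMIT I/Os. Simplified model for illustration only.
simulate_rr() {
    awk -v paths="$1" -v limit="$2" -v total="$3" 'BEGIN {
        path = 0; sent = 0
        for (i = 0; i < total; i++) {
            count[path]++
            sent++
            if (sent == limit) {          # limit reached: rotate to the next path
                sent = 0
                path = (path + 1) % paths
            }
        }
        for (p = 0; p < paths; p++)
            printf "path%d=%d\n", p, count[p] + 0
    }'
}
```

For example, simulate_rr 2 1000 1500 reports 1,000 I/Os on the first path and 500 on the second, while simulate_rr 2 1 1500 splits the same 1,500 I/Os evenly at 750 per path.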

This recommendation is made for a few reasons:

  1. Performance. Performance is the reason most often cited for changing this value. The gain is real in certain cases, especially with iSCSI, but it is not usually profound (generally a single-digit percentage increase). Changing this value from 1,000 to 1 generally will not solve a major performance problem.
  2. Path Failover Time. Testing has shown that ESXi fails logical paths much more quickly when this value is set to the minimum of 1. During a physical failure in the storage environment (loss of an HBA, switch, cable, port, or controller), ESXi will, after a certain period of time, fail any logical path that relies on the failed hardware and stop attempting to use it for a given volume. This does not always happen immediately. When the I/O Operations Limit is set to the default of 1,000, path failover can sometimes take tens of seconds, which can lead to a noticeable disruption in performance during the failure. When this value is set to the minimum of 1, path failover time generally drops below ten seconds. This greatly reduces the impact of a physical failure in the storage environment and provides greater performance resiliency and reliability.
  3. FlashArray Controller I/O Balance. When Purity is upgraded on a FlashArray, the following process is observed (at a high level): upgrade Purity on one controller, reboot it, wait for it to come back up, upgrade Purity on the other controller, reboot it and you’re done. Because of the reboots, half of the FlashArray front-end ports go away twice during the process. For this reason, we want to ensure that all hosts are actively using both controllers prior to the upgrade. One method used to confirm this is to check the I/O balance from each host across both controllers. When volumes are configured to use Most Recently Used, an imbalance of 100% is usually observed (ESXi tends to select paths that lead to the same front-end port for all devices). This then means additional troubleshooting to make sure that the host can survive a controller reboot. When Round Robin is enabled with the default I/O Operations Limit, port imbalance improves to a difference of about 20-30%. When the I/O Operations Limit is set to 1, this imbalance is less than 1%. This gives Pure Storage and the end user confidence that all hosts are properly using all available front-end ports.

For the three reasons above, Pure Storage highly recommends changing the I/O Operations Limit to 1. For additional information, you can read the VMware KB regarding setting the IOPS limit.

BEST PRACTICE: Change the Round Robin I/O Operations Limit from 1,000 to 1 for FlashArray volumes on vSphere. This is a default configuration in all supported vSphere releases.

To fully utilize CPU resources, set the host's active power policy to high performance.

ESXi 6.0 Express Patch 5 or 6.5 Update 1 and later

Starting with ESXi 6.0 Express Patch 5 (build 5572656) (release notes) and ESXi 6.5 Update 1 (build 5969303) (release notes), Round Robin with an I/O Operations Limit of 1 is the default configuration for all Pure Storage FlashArray devices (iSCSI and Fibre Channel), and no configuration is required.

A new default SATP rule, provided by VMware, was built specifically for the FlashArray to match Pure Storage’s best practices. Inside ESXi you will see a new system rule:

Name                 Device  Vendor    Model             Driver  Transport  Options                     Rule Group  Claim Options                        Default PSP  PSP Options     Description
-------------------  ------  --------  ----------------  ------  ---------  --------------------------  ----------  -----------------------------------  -----------  --------------  --------------------------------------------------------------------------
VMW_SATP_ALUA                PURE      FlashArray                                                       system                                           VMW_PSP_RR   iops=1

For more information, refer to this blog post:

https://www.codyhosterman.com/2017/0...e-now-default/

Configuring Round Robin and the I/O Operations Limit

If you are running a release earlier than ESXi 6.0 Express Patch 5 or 6.5 Update 1, there are several ways to configure Round Robin and the I/O Operations Limit. They can be set on a per-device basis as each new volume is added. This is not a particularly good option: it must be done for every new volume on every host, which makes it easy to forget and leaves considerable room for mistakes.

The recommended option is to create a rule so that any FlashArray device added to that host in the future automatically gets the Round Robin PSP and an I/O Operations Limit of 1.

The following command creates a rule that achieves both of these for only Pure Storage FlashArray devices:

esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -V "PURE" -M "FlashArray" -P "VMW_PSP_RR" -O "iops=1" -e "FlashArray SATP Rule"

This must be repeated for each ESXi host.
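
After adding the rule, it can be worth confirming that it is present on the host. The sketch below wraps esxcli storage nmp satp rule list and succeeds only if a PURE FlashArray rule for VMW_PSP_RR carries the requested PSP option. The helper name has_pure_rule is hypothetical, and the field matching assumes the whitespace-separated column layout shown in the rule listing earlier in this article.

```shell
# has_pure_rule OPTION
# Succeeds (exit status 0) if an ALUA rule for PURE FlashArray with
# VMW_PSP_RR and the exact PSP option OPTION (for example iops=1) exists.
has_pure_rule() {
    esxcli storage nmp satp rule list -s VMW_SATP_ALUA |
        awk -v opt="$1" '
            /PURE/ && /FlashArray/ && /VMW_PSP_RR/ {
                # compare whole fields so that iops=1 does not match iops=10
                for (i = 1; i <= NF; i++)
                    if ($i == opt) found = 1
            }
            END { exit !found }'
}
```

A possible use on a host is: has_pure_rule iops=1 && echo "FlashArray rule present".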

This can also be accomplished through PowerCLI. Once connected to a vCenter Server, this script iterates through all of the hosts in that vCenter and creates a default rule that sets Round Robin with an I/O Operations Limit of 1 for all Pure Storage FlashArray devices.

Connect-VIServer -Server <vCenter> -Credential (Get-Credential)
Get-VMhost | Get-EsxCli -V2 | % {$_.storage.nmp.satp.rule.add.Invoke(@{description='Pure Storage FlashArray SATP';model='FlashArray';vendor='PURE';satp='VMW_SATP_ALUA';psp='VMW_PSP_RR'; pspoption='iops=1'})}

Furthermore, this can be configured using vSphere Host Profiles:

host-profile.png

It is important to note that existing, previously presented devices must be set manually to Round Robin and an I/O Operations Limit of 1. Alternatively, the ESXi host can be rebooted so that those devices inherit the multipathing configuration from the new rule.

For setting a new I/O Operation Limit on an existing device, see Appendix I: Per-Device NMP Configuration.
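
Appendix I is not reproduced here. As a minimal sketch of what a per-device change can look like, the loop below sets Round Robin and an I/O Operations Limit of 1 on every device whose identifier starts with naa.624a9370. That prefix is an assumption based on the FlashArray device identifier used as an example in this article, so confirm it against your own device list before running anything like this. The function name set_pure_devices_rr is hypothetical.

```shell
# set_pure_devices_rr
# For each device whose NAA identifier starts with the assumed Pure Storage
# prefix, select Round Robin and set the I/O Operations Limit to 1.
set_pure_devices_rr() {
    for dev in $(esxcli storage nmp device list | grep -E '^naa\.624a9370'); do
        # switch the device to the Round Robin PSP
        esxcli storage nmp device set --device "$dev" --psp VMW_PSP_RR
        # change paths after every single I/O
        esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=1 --device="$dev"
    done
}
```

Like the SATP rule, this applies to one host only and would need to be run on each ESXi host.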

Note that an I/O Operations Limit of 1 is the default in 6.0 Express Patch 5 and later in the 6.0 code branch, 6.5 Update 1 and later in the 6.5 code branch, and all versions of 6.7 and later.

Enhanced Round Robin Load Balancing (Latency Based PSP)

With the release of vSphere 6.7 U1, there is now a sub-policy option for Round Robin that actively monitors individual path performance. This new sub-policy is called "Enhanced Round Robin Load Balancing" (also known as the Latency Based Path Selection Policy (PSP)). Before this policy became available, the ESXi host would use all active paths by sending I/O requests down each path in a "fire and forget" fashion, sending 1 I/O down each path before moving to the next. This often resulted in performance penalties when individual paths became degraded and were not performing as well as the other available paths, because the ESXi host had limited insight into overall path health and would continue using the non-optimal path. The Latency Based PSP changes this by monitoring each path for latency, along with outstanding I/Os, which allows the ESXi host to make smarter, more dynamic decisions about which paths to use and which to exclude.

How it Works

Like all other Native Multipathing Plugin (NMP) policies, this sub-policy is set on a per-LUN or per-datastore basis. Once enabled, the NMP begins by assessing the first 16 user I/O requests per path and calculating their average latency. Once all of the paths have been analyzed, the NMP uses the average latency of each path to determine which paths are healthy (optimal) and which are unhealthy (non-optimal). If a path falls outside of the average latency, it is deemed non-optimal and will not be used until its latency returns to an optimal response time.

After the initial assessment, the ESXi host repeats the process outlined above every 3 minutes. It tests every active path, including any non-optimal paths, to confirm whether the latency has improved, worsened, or remained the same. Those results are again analyzed and used to determine which paths should continue sending I/O requests and which should be paused to see whether they report better health in the next 3 minutes. Throughout this process, the NMP also takes into account any outstanding I/Os for each path to make more informed decisions.
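
The exact criterion ESXi uses to decide that a path "falls outside" the average is not described here. Purely to illustrate the idea, the sketch below averages latency samples per path and marks a path non-optimal when its average exceeds a chosen multiple of the mean across all paths. The tolerance argument, the input format and the function name classify_paths are all assumptions made for this example and do not correspond to ESXi settings.

```shell
# classify_paths TOLERANCE
# Reads "path latency_ms" samples on stdin, averages them per path, and
# marks a path non-optimal when its average exceeds TOLERANCE times the
# mean of all path averages. TOLERANCE is a hypothetical knob.
classify_paths() {
    awk -v tol="$1" '
        { sum[$1] += $2; n[$1]++ }          # accumulate samples per path
        END {
            for (p in sum) { avg[p] = sum[p] / n[p]; total += avg[p]; paths++ }
            if (paths == 0) exit            # no samples, nothing to classify
            mean = total / paths
            for (p in avg)
                printf "%s %.2f %s\n", p, avg[p], (avg[p] > tol * mean ? "non-optimal" : "optimal")
        }' | sort
}
```

With three paths averaging 0.5 ms, 0.5 ms and 5 ms and a tolerance of 1.5, only the 5 ms path is marked non-optimal, since the mean of the averages is 2 ms and the cut-off is therefore 3 ms.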

Configuring Round Robin and the Latency Based Sub-Policy

If you are using ESXi 7.0 or later, no changes are required to enable this new sub-policy, as it is the recommendation moving forward. To make things easier for end users, a new SATP rule has been added that automatically applies this policy to any Pure Storage LUNs presented to the ESXi host:

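Name                 Device  Vendor    Model             Driver  Transport  Options                     Rule Group  Claim Options                        Default PSP  PSP Options     Description
-------------------  ------  --------  ----------------  ------  ---------  --------------------------  ----------  -----------------------------------  -----------  --------------  --------------------------------------------------------------------------
VMW_SATP_ALUA                PURE      FlashArray                                                       system                                           VMW_PSP_RR   policy=latency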

If your environment is using ESXi 6.7 U1 or later and you wish to use this feature, which Pure Storage supports, the best way is to create a SATP rule on each ESXi host as follows:

esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -V "PURE" -M "FlashArray" -P "VMW_PSP_RR" -O "policy=latency" -e "FlashArray SATP Rule"

Alternatively, this can be done using PowerShell:

Connect-VIServer -Server <vCenter> -Credential (Get-Credential)
Get-VMhost | Get-EsxCli -V2 | % {$_.storage.nmp.satp.rule.add.Invoke(@{description='Pure Storage FlashArray SATP';model='FlashArray';vendor='PURE';satp='VMW_SATP_ALUA';psp='VMW_PSP_RR'; pspoption='policy=latency'})}

Setting a new SATP rule only changes the policy for newly presented LUNs. It is not applied to LUNs that were present before the rule was set until the host is rebooted.

Lastly, if you would like to change an individual LUN (or set of LUNs), you can run the following command to change the PSP to latency (where the device identifier is specific to your environment):

esxcli storage nmp psp roundrobin deviceconfig set --type=latency --device=naa.624a93708a75393becad4e43000540e8

Tuning

By default, the RR latency policy is configured to send 16 user I/O requests down each path and to evaluate each path every three minutes (180,000 ms). Based on extensive testing, Pure Storage's recommendation is to leave these options at their defaults; no changes are required.

BEST PRACTICE: Enhanced Round Robin Load Balancing is configured by default on ESXi 7.0 and later. No configuration changes are required.

Verifying Connectivity

It is important to verify proper connectivity prior to implementing production workloads on a host or volume.

This consists of a few steps:

  1. Verifying proper multipathing settings in ESXi.
  2. Verifying the proper numbers of paths.
  3. Verifying I/O balance and redundancy on the FlashArray.

The Path Selection Policy and number of paths can be verified easily inside of the vSphere Web Client.

verify-psp.png

This will report the path selection policy and the number of logical paths. The number of logical paths will depend on the number of HBAs, zoning and the number of ports cabled on the FlashArray.

The I/O Operations Limit cannot be checked from the vSphere Web Client—it can only be verified or altered via command line utilities. The following command can check a particular device for the PSP and I/O Operations Limit:

esxcli storage nmp device list -d naa.<device NAA>

Picture5.png
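
To check just the I/O Operations Limit, the output of the command above can be filtered. This sketch assumes the Round Robin settings appear on a line beginning with "Path Selection Policy Device Config:" that contains an iops=<n> entry, which is how this output is typically laid out; the helper name iops_limit is hypothetical.

```shell
# iops_limit DEVICE
# Prints the iops= value from the Round Robin device configuration of
# DEVICE, or nothing if no such entry is found in the output.
iops_limit() {
    esxcli storage nmp device list -d "$1" |
        sed -n 's/.*Path Selection Policy Device Config:.*iops=\([0-9][0-9]*\).*/\1/p'
}
```

For a correctly configured FlashArray volume, iops_limit naa.<device NAA> would be expected to print 1.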

Please remember that each of these settings is a per-host setting, so while a volume might be configured properly on one host, it may not be correct on another.

Additionally, it is also possible to check multipathing from the FlashArray.

A CLI command exists to monitor I/O balance coming into the array:

purehost monitor --balance --interval <how long to sample> --repeat <how many iterations>

The command will report a few things:

  1. The host name.
  2. The individual initiators from the host. If an initiator is logged into more than one FlashArray port, it will be reported more than once. If an initiator is not logged in at all, it will not appear.
  3. The port that the initiator is logged into.
  4. The number of I/Os that came into that port from that initiator over the time period sampled.
  5. The relative percentage of I/Os for that initiator as compared to the maximum.

The balance command will count the I/Os that came down from a particular initiator during the sampled time period, and it will do that for all initiator/target relationships for that host. Whichever relationship/path has the most I/Os will be designated as 100%. The rest of the paths will be then denoted as a percentage of that number. So if a host has two paths, and the first path has 1,000 I/Os and the second path has 800, the first path will be 100% and the second will be 80%.
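
The percentage calculation described above can be reproduced with a few lines of awk. The sketch takes simplified "path io_count" pairs on standard input rather than real purehost output, whose exact format is not shown here; the function name relative_balance and the input format are assumptions for illustration.

```shell
# relative_balance
# Reads "path io_count" lines on stdin and prints each path's I/O count as
# a whole-number percentage (truncated) of the busiest path.
relative_balance() {
    awk '
        { name[NR] = $1; ios[NR] = $2; if ($2 + 0 > max + 0) max = $2 + 0 }
        END {
            for (i = 1; i <= NR; i++)
                printf "%s %d%%\n", name[i], (max > 0 ? ios[i] * 100 / max : 0)
        }'
}
```

Feeding it the two paths from the example, printf 'pathA 1000\npathB 800\n' | relative_balance, reports pathA at 100% and pathB at 80%.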

On a well-balanced host, the paths should be within a few percentage points of each other. Anything more than 15% or so might be worthy of investigation. Refer to this post for more information.

Please keep in mind that if the Latency Based PSP is in use, I/O may not be balanced 1 to 1 across all paths from the ESXi hosts to the array.

There is nothing inherently wrong with I/O not being balanced 1 to 1 across all paths, as the Latency Based PSP distributes I/O based on which path has the lowest latency. A difference of a few percentage points should not be cause for alarm. However, if some paths receive very little or no I/O, investigate the SAN to find out why those paths are performing poorly.

The GUI will also report on host connectivity in general, based on initiator logins.

2018-01-29_12-02-17.png

This should report as redundant for all hosts, meaning that each host is connected to each controller. If it reports something else, investigate zoning and/or host configuration to correct this.

For a detailed explanation of the various reported states, please refer to the FlashArray User Guide which can be found directly in your GUI:

2018-01-26_16-25-39.png

Round Robin Path Selection Policy

To best leverage the active-active nature of the front end of the FlashArray, Pure Storage requires that you configure FlashArray volumes to use the Round Robin Path Selection Policy. The Round Robin PSP rotates between all discovered paths for a given volume which allows ESXi (and therefore the virtual machines running on the volume) to maximize the possible performance by using all available resources (HBAs, target ports, etc.).

BEST PRACTICE: Use the Round Robin Path Selection Policy for FlashArray volumes.

The I/O Operations Limit

The Round Robin Path Selection Policy allows for additional tuning of its path-switching behavior in the form of a setting called the I/O Operations Limit. The I/O Operations Limit (sometimes called the “IOPS” value) dictates how often ESXi switches logical paths for a given device. By default, when Round Robin is enabled on a device, ESXi will switch to a new logical path every 1,000 I/Os. In other words, ESXi will choose a logical path, and start issuing all I/Os for that device down that path. Once it has issued 1,000 I/Os for that device, down that path, it will switch to a new logical path and so on.

Pure Storage recommends tuning this value down to the minimum of 1. This will cause ESXi to change logical paths after every single I/O, instead of 1,000.

This recommendation is made for a few reasons:

  1. Performance. Often the reason cited to change this value is performance. While this is true in certain cases, the performance impact of changing this value is not usually profound (generally in the single digits of a percentage performance increase). While changing this value from 1,000 to 1 can improve performance, it generally will not solve a major performance problem. Regardless, changing this value can improve performance in some use cases, especially with iSCSI.
  2. Path Failover Time. It has been noted in testing that ESXi will fail logical paths much more quickly when this value is set to a the minimum of 1. During a physical failure of the storage environment (loss of a HBA, switch, cable, port, controller) ESXi, after a certain period of time, will fail any logical path that relies on that failed physical hardware and will discontinue attempting to use it for a given volume. This failure does not always happen immediately. When the I/O Operations Limit is set to the default of 1,000 path failover time can sometimes be in the 10s of seconds which can lead to noticeable disruption in performance during this failure. When this value is set to the minimum of 1, path failover generally decreases to sub-ten seconds. This greatly reduces the impact of a physical failure in the storage environment and provides greater performance resiliency and reliability.
  3. FlashArray Controller I/O Balance. When Purity is upgraded on a FlashArray, the following process is observed (at a high level): upgrade Purity on one controller, reboot it, wait for it to come back up, upgrade Purity on the other controller, reboot it and you’re done. Due to the reboots, twice during the process half of the FlashArray front-end ports go away. Because of this, we want to ensure that all hosts are actively using both controllers prior to upgrade. One method that is used to confirm this is to check the I/O balance from each host across both controllers. When volumes are configured to use Most Recently Used, an imbalance of 100% is usually observed (ESXi tends to select paths that lead to the same front end port for all devices). This then means additional troubleshooting to make sure that host can survive a controller reboot. When Round Robin is enabled with the default I/O Operations Limit, port imbalance is improved to about 20-30% difference. When the I/O Operations Limit is set to 1, this imbalance is less than 1%. This gives Pure Storage and the end user confidence that all hosts are properly using all available front-end ports.

For these three above reasons, Pure Storage highly recommends altering the I/O Operations Limit to 1. For additional information you can read the VMware KB regarding setting the IOPs Limit.

BEST PRACTICE: Change the Round Robin I/O Operations Limit from 1,000 to 1 for FlashArray volumes on vSphere. This is a default configuration in all supported vSphere releases.

To fully utilize CPU resources, set the host's active power policy to high performance.

ESXi Express Patch 5 or 6.5 Update 1 and later

Starting with ESXi 6.0 Express Patch 5 (build 5572656) and later (Release notes) and ESXi 6.5 Update 1 (build 5969303) and later (release notes), Round Robin and an I/O Operations limit is the default configuration for all Pure Storage FlashArray devices (iSCSI and Fibre Channel) and no configuration is required.

A new default SATP rule, provided by VMware by default was specifically built for the FlashArray to Pure Storage’s best practices. Inside of ESXi you will see a new system rule:

Name                 Device  Vendor    Model             Driver  Transport  Options                     Rule Group  Claim Options                        Default PSP  PSP Options     Description
-------------------  ------  --------  ----------------  ------  ---------  --------------------------  ----------  -----------------------------------  -----------  --------------  --------------------------------------------------------------------------
VMW_SATP_ALUA                PURE      FlashArray                                                       system                                           VMW_PSP_RR   iops=1

For information, refer to this blog post:

https://www.codyhosterman.com/2017/0...e-now-default/

Configuring Round Robin and the I/O Operations Limit

If you are running earlier than ESXi 6.0 Express Patch 5 or 6.5 Update 1, there are a variety of ways to configure Round Robin and the I/O Operations Limit. This can be set on a per-device basis and as every new volume is added, these options can be set against that volume. This is not a particularly good option as one must do this for every new volume, which can make it easy to forget, and must do it on every host for every volume. This makes the chance of exposure to mistakes quite large.

The recommended option for configuring Round Robin and the correct I/O Operations Limit is to create a rule that will cause any new FlashArray device that is added in the future to that host to automatically get the Round Robin PSP and an I/O Operation Limit value of 1.

The following command creates a rule that achieves both of these for only Pure Storage FlashArray devices:

esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -V "PURE" -M "FlashArray" -P "VMW_PSP_RR" -O "iops=1" -e "FlashArray SATP Rule"

This must be repeated for each ESXi host.

This can also be accomplished through PowerCLI. Once connected to a vCenter Server this script will iterate through all of the hosts in that particular vCenter and create a default rule to set Round Robin for all Pure Storage FlashArray devices with an I/O Operation Limit set to 1.

Connect-VIServer -Server <vCenter> -Credential (Get-Credential)
Get-VMhost | Get-EsxCli –V2 | % {$_.storage.nmp.satp.rule.add.Invoke(@{description='Pure Storage FlashArray SATP';model='FlashArray';vendor='PURE';satp='VMW_SATP_ALUA';psp='VMW_PSP_RR'; pspoption='iops=1'})}

Furthermore, this can be configured using vSphere Host Profiles:

host-profile.png

It is important to note that existing, previously presented devices will need to be manually set to Round Robin and an I/O Operation Limit of 1. Optionally, the ESXi host can be rebooted so that it can inherit the multipathing configuration set forth by the new rule.

For setting a new I/O Operation Limit on an existing device, see Appendix I: Per-Device NMP Configuration.

Note that I/O Operations of 1 is the default in 6.0 Patch 5 and later in the 6.0 code branch, 6.5 Update 1 and later in the 6.5 code branch, and all versions of 6.7 and later.

The I/O Operations Limit

The Round Robin Path Selection Policy allows for additional tuning of its path-switching behavior in the form of a setting called the I/O Operations Limit. The I/O Operations Limit (sometimes called the “IOPS” value) dictates how often ESXi switches logical paths for a given device. By default, when Round Robin is enabled on a device, ESXi will switch to a new logical path every 1,000 I/Os. In other words, ESXi will choose a logical path, and start issuing all I/Os for that device down that path. Once it has issued 1,000 I/Os for that device, down that path, it will switch to a new logical path and so on.

Pure Storage recommends tuning this value down to the minimum of 1. This will cause ESXi to change logical paths after every single I/O, instead of 1,000.

This recommendation is made for a few reasons:

  1. Performance. Often the reason cited to change this value is performance. While this is true in certain cases, the performance impact of changing this value is not usually profound (generally in the single digits of a percentage performance increase). While changing this value from 1,000 to 1 can improve performance, it generally will not solve a major performance problem. Regardless, changing this value can improve performance in some use cases, especially with iSCSI.
  2. Path Failover Time. It has been noted in testing that ESXi will fail logical paths much more quickly when this value is set to a the minimum of 1. During a physical failure of the storage environment (loss of a HBA, switch, cable, port, controller) ESXi, after a certain period of time, will fail any logical path that relies on that failed physical hardware and will discontinue attempting to use it for a given volume. This failure does not always happen immediately. When the I/O Operations Limit is set to the default of 1,000 path failover time can sometimes be in the 10s of seconds which can lead to noticeable disruption in performance during this failure. When this value is set to the minimum of 1, path failover generally decreases to sub-ten seconds. This greatly reduces the impact of a physical failure in the storage environment and provides greater performance resiliency and reliability.
  3. FlashArray Controller I/O Balance. When Purity is upgraded on a FlashArray, the following process is observed (at a high level): upgrade Purity on one controller, reboot it, wait for it to come back up, upgrade Purity on the other controller, reboot it and you’re done. Due to the reboots, twice during the process half of the FlashArray front-end ports go away. Because of this, we want to ensure that all hosts are actively using both controllers prior to upgrade. One method that is used to confirm this is to check the I/O balance from each host across both controllers. When volumes are configured to use Most Recently Used, an imbalance of 100% is usually observed (ESXi tends to select paths that lead to the same front end port for all devices). This then means additional troubleshooting to make sure that host can survive a controller reboot. When Round Robin is enabled with the default I/O Operations Limit, port imbalance is improved to about 20-30% difference. When the I/O Operations Limit is set to 1, this imbalance is less than 1%. This gives Pure Storage and the end user confidence that all hosts are properly using all available front-end ports.

For these three above reasons, Pure Storage highly recommends altering the I/O Operations Limit to 1. For additional information you can read the VMware KB regarding setting the IOPs Limit.

BEST PRACTICE: Change the Round Robin I/O Operations Limit from 1,000 to 1 for FlashArray volumes on vSphere. This is a default configuration in all supported vSphere releases.

To fully utilize CPU resources, set the host's active power policy to high performance.

ESXi Express Patch 5 or 6.5 Update 1 and later

Starting with ESXi 6.0 Express Patch 5 (build 5572656) and later (Release notes) and ESXi 6.5 Update 1 (build 5969303) and later (release notes), Round Robin and an I/O Operations limit is the default configuration for all Pure Storage FlashArray devices (iSCSI and Fibre Channel) and no configuration is required.

A new default SATP rule, provided by VMware by default was specifically built for the FlashArray to Pure Storage’s best practices. Inside of ESXi you will see a new system rule:

Name                 Device  Vendor    Model             Driver  Transport  Options                     Rule Group  Claim Options                        Default PSP  PSP Options     Description
-------------------  ------  --------  ----------------  ------  ---------  --------------------------  ----------  -----------------------------------  -----------  --------------  --------------------------------------------------------------------------
VMW_SATP_ALUA                PURE      FlashArray                                                       system                                           VMW_PSP_RR   iops=1

For information, refer to this blog post:

https://www.codyhosterman.com/2017/0...e-now-default/

Configuring Round Robin and the I/O Operations Limit

If you are running earlier than ESXi 6.0 Express Patch 5 or 6.5 Update 1, there are a variety of ways to configure Round Robin and the I/O Operations Limit. This can be set on a per-device basis and as every new volume is added, these options can be set against that volume. This is not a particularly good option as one must do this for every new volume, which can make it easy to forget, and must do it on every host for every volume. This makes the chance of exposure to mistakes quite large.

The recommended option for configuring Round Robin and the correct I/O Operations Limit is to create a rule that will cause any new FlashArray device that is added in the future to that host to automatically get the Round Robin PSP and an I/O Operation Limit value of 1.

The following command creates a rule that achieves both of these for only Pure Storage FlashArray devices:

esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -V "PURE" -M "FlashArray" -P "VMW_PSP_RR" -O "iops=1" -e "FlashArray SATP Rule"

This must be repeated for each ESXi host.

This can also be accomplished through PowerCLI. Once connected to a vCenter Server this script will iterate through all of the hosts in that particular vCenter and create a default rule to set Round Robin for all Pure Storage FlashArray devices with an I/O Operation Limit set to 1.

Connect-VIServer -Server <vCenter> -Credential (Get-Credential)
Get-VMhost | Get-EsxCli –V2 | % {$_.storage.nmp.satp.rule.add.Invoke(@{description='Pure Storage FlashArray SATP';model='FlashArray';vendor='PURE';satp='VMW_SATP_ALUA';psp='VMW_PSP_RR'; pspoption='iops=1'})}

Furthermore, this can be configured using vSphere Host Profiles:

host-profile.png

It is important to note that existing, previously presented devices will need to be manually set to Round Robin and an I/O Operation Limit of 1. Optionally, the ESXi host can be rebooted so that it can inherit the multipathing configuration set forth by the new rule.

For setting a new I/O Operation Limit on an existing device, see Appendix I: Per-Device NMP Configuration.
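As a rough sketch of what the manual change can look like for every FlashArray device on one host (the appendix remains the reference): the loop below sets the PSP and then the I/O Operations Limit for each device. It assumes that FlashArray volumes can be recognized by the NAA prefix naa.624a9370, which matches the example device shown elsewhere in this document. The helper name filter_pure_devices is made up for this example, and the `command -v` guard only keeps the snippet harmless on machines without esxcli.

```shell
# Keep only FlashArray device identifiers from "esxcfg-scsidevs -c" style
# output, where the identifier is the first column.
# Assumption: FlashArray volumes use the NAA prefix naa.624a9370.
filter_pure_devices() {
  awk '{print $1}' | grep '^naa\.624a9370'
}

if command -v esxcli >/dev/null 2>&1; then
  for dev in $(esxcfg-scsidevs -c | filter_pure_devices); do
    # Switch the device to Round Robin, then set the I/O Operations Limit to 1.
    esxcli storage nmp device set --device "$dev" --psp VMW_PSP_RR
    esxcli storage nmp psp roundrobin deviceconfig set --device "$dev" --type iops --iops 1
  done
else
  echo "esxcli not found: run this in the ESXi shell"
fi
```

Like the SATP rule, this has to be repeated on each ESXi host.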

Note that an I/O Operations Limit of 1 is the default for FlashArray devices in ESXi 6.0 Express Patch 5 and later in the 6.0 code branch, 6.5 Update 1 and later in the 6.5 code branch, and all versions of 6.7 and later.

Enhanced Round Robin Load Balancing (Latency Based PSP)

With the release of vSphere 6.7 U1, Round Robin gained a sub-policy option that actively monitors individual path performance. This sub-policy is called "Enhanced Round Robin Load Balancing" (also known as the Latency Based Path Selection Policy (PSP)). Before this policy became available, the ESXi host would use all active paths by sending I/O requests down each path in a "fire and forget" fashion, sending 1 I/O down each path before moving to the next. This often resulted in performance penalties when individual paths became degraded and were not functioning as well as the other available paths, because the ESXi host had limited insight into overall path health and would continue using the non-optimal path. The Latency Based PSP changes this by monitoring each path for latency and outstanding I/Os, which allows the ESXi host to decide more dynamically which paths to use and which to exclude.

How it Works

Like all other Native Multipathing Plugin (NMP) policies, this sub-policy is set on a per-LUN or per-datastore basis. Once enabled, the NMP begins by assessing the first 16 user I/O requests per path and calculating their average latency. Once all of the paths have been analyzed, the NMP uses the per-path averages to determine which paths are healthy (optimal) and which are unhealthy (non-optimal). If a path falls outside of the average latency, it is deemed non-optimal and will not be used until its latency returns to an optimal response time.

After the initial assessment, the ESXi host repeats the same process every 3 minutes. It tests every active path, including any non-optimal paths, to confirm whether the latency has improved, worsened, or remained the same. Those results are again analyzed and used to determine which paths should continue sending I/O requests and which should be paused to see if they report better health in the next 3 minutes. Throughout this process the NMP also takes into account any outstanding I/Os on each path to make more informed decisions.

Configuring Round Robin and the Latency Based Sub-Policy

If you are using ESXi 7.0 or later, no changes are required to enable this sub-policy, as it is the recommended configuration moving forward. To make things easier for end users, a SATP rule has been added that automatically applies this policy to any Pure Storage LUNs presented to the ESXi host:

Name                 Device  Vendor    Model             Driver  Transport  Options                     Rule Group  Claim Options                        Default PSP  PSP Options     Description
VMW_SATP_ALUA                PURE      FlashArray                                                       system                                           VMW_PSP_RR   policy=latency

If your environment is running ESXi 6.7 U1 or later and you wish to use this feature, which Pure Storage supports, the best approach is to create a SATP rule on each ESXi host as follows:

esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -V "PURE" -M "FlashArray" -P "VMW_PSP_RR" -O "policy=latency" -e "FlashArray SATP Rule"

Alternatively, this can be done using PowerCLI:

Connect-VIServer -Server <vCenter> -Credential (Get-Credential)
Get-VMHost | Get-EsxCli -V2 | % {$_.storage.nmp.satp.rule.add.Invoke(@{description='Pure Storage FlashArray SATP';model='FlashArray';vendor='PURE';satp='VMW_SATP_ALUA';psp='VMW_PSP_RR';pspoption='policy=latency'})}

Setting a new SATP rule only changes the policy for newly presented LUNs. It is not applied to LUNs that were present before the rule was set until the host is rebooted.
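A quick way to check which FlashArray rules a host currently has (the built-in system rule and any user-added rule) is to filter the SATP rule list for the vendor string. This is a minimal sketch that assumes it is run in the ESXi shell; the variable name VENDOR is only illustrative, and the `command -v` guard keeps the snippet harmless on machines without esxcli.

```shell
# Vendor string used by the FlashArray SATP rules shown above.
VENDOR="PURE"

if command -v esxcli >/dev/null 2>&1; then
  # List all SATP claim rules and keep only the lines that mention the vendor.
  esxcli storage nmp satp rule list | grep -i "$VENDOR"
else
  echo "esxcli not found: run this in the ESXi shell"
fi
```

The PSP Options column of the matching lines shows whether a rule applies iops=1 or policy=latency.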

Lastly, if you would like to change an individual LUN (or set of LUNs), you can run the following command to change the PSP to latency (where the device identifier is specific to your environment):

esxcli storage nmp psp roundrobin deviceconfig set --type=latency --device=naa.624a93708a75393becad4e43000540e8
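To confirm that the change took effect, the Round Robin configuration of the same device can be read back. This is a sketch under the assumption that it is run in the ESXi shell; the device identifier is the example value from the command above and must be replaced with one from your environment.

```shell
# Example device from the command above; replace with your own identifier.
DEVICE="naa.624a93708a75393becad4e43000540e8"

if command -v esxcli >/dev/null 2>&1; then
  # Print the Round Robin settings currently applied to this device.
  esxcli storage nmp psp roundrobin deviceconfig get --device "$DEVICE"
else
  echo "esxcli not found: run this in the ESXi shell"
fi
```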

Tuning

By default, the RR latency policy is configured to send 16 user I/O requests down each path and to evaluate each path every three minutes (180000 ms). Based on extensive testing, Pure Storage recommends leaving these options at their defaults; no changes are required.

BEST PRACTICE: Enhanced Round Robin Load Balancing is configured by default on ESXi 7.0 and later. No configuration changes are required.

Verifying Connectivity

It is important to verify proper connectivity prior to implementing production workloads on a host or volume.

This consists of a few steps:

  1. Verifying proper multipathing settings in ESXi.
  2. Verifying the proper numbers of paths.
  3. Verifying I/O balance and redundancy on the FlashArray.

The Path Selection Policy and number of paths can be verified easily inside of the vSphere Web Client.

verify-psp.png

This will report the path selection policy and the number of logical paths. The number of logical paths will depend on the number of HBAs, zoning and the number of ports cabled on the FlashArray.

The I/O Operations Limit cannot be checked from the vSphere Web Client—it can only be verified or altered via command line utilities. The following command can check a particular device for the PSP and I/O Operations Limit:

esxcli storage nmp device list -d naa.<device NAA>

Picture5.png

Please remember that each of these settings is a per-host setting, so while a volume might be configured properly on one host, it may not be correct on another.

Additionally, it is also possible to check multipathing from the FlashArray.

A CLI command exists to monitor I/O balance coming into the array:

purehost monitor --balance --interval <how long to sample> --repeat <how many iterations>

The command will report a few things:

  1. The host name.
  2. The individual initiators from the host. If an initiator is logged into more than one FlashArray port, it will be reported more than once. If an initiator is not logged in at all, it will not appear.
  3. The port that the initiator is logged into.
  4. The number of I/Os that came into that port from that initiator over the time period sampled.
  5. The relative percentage of I/Os for that initiator as compared to the maximum.

The balance command will count the I/Os that came down from a particular initiator during the sampled time period, and it will do that for all initiator/target relationships for that host. Whichever relationship/path has the most I/Os will be designated as 100%. The rest of the paths will be then denoted as a percentage of that number. So if a host has two paths, and the first path has 1,000 I/Os and the second path has 800, the first path will be 100% and the second will be 80%.
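The percentage calculation described above can be sketched with plain shell arithmetic. The function name balance_pct is made up for this illustration and is not part of the FlashArray CLI; it takes one I/O count per path and prints each count as a percentage of the busiest path, using integer division.

```shell
# Print each path's I/O count as a percentage of the busiest path.
# Arguments: one I/O count per initiator/target path.
balance_pct() {
  max=0
  for n in "$@"; do
    if [ "$n" -gt "$max" ]; then max=$n; fi
  done
  for n in "$@"; do
    # Avoid dividing by zero when no I/O was seen on any path.
    if [ "$max" -eq 0 ]; then echo 0; else echo $(( n * 100 / max )); fi
  done
}

# The two-path example from the text: prints 100, then 80.
balance_pct 1000 800
```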

On a well-balanced host, all paths should be within a few percentage points of each other. A difference of more than 15% or so might be worthy of investigation. Refer to this post for more information.

Please keep in mind that if the Latency Based PSP is in use, I/O may not be balanced 1 to 1 across all paths from the ESXi hosts to the array.

There is nothing inherently wrong with this, as the Latency Based PSP distributes I/O based on which path has the lowest latency. A difference of a few percentage points is no cause for alarm. However, if some paths receive very little or no I/O, investigate the SAN to find out why those paths are performing poorly.

The GUI will also report on host connectivity in general, based on initiator logins.

2018-01-29_12-02-17.png

Every host in this report should be listed as redundant, meaning that it is connected to each controller. If a host reports something else, investigate zoning and/or host configuration to correct this.

For a detailed explanation of the various reported states, please refer to the FlashArray User Guide which can be found directly in your GUI:

2018-01-26_16-25-39.png

Disk.DiskMaxIOSize

The ESXi host setting, Disk.DiskMaxIOSize, controls the largest I/O size that ESXi will allow to be sent from ESXi to an underlying storage device. By default this is 32 MB. If an I/O is larger than the Disk.DiskMaxIOSize value, ESXi will split the I/O requests into segments under the configured limit.

If you are running an older release of ESXi (earlier than the fixed releases listed below), this setting needs to be modified only if the environment uses Fibre Channel (this issue is not present with iSCSI) and one or more of the following scenarios apply:

  1. A virtual machine is using EFI (Extensible Firmware Interface) instead of BIOS and is using VMware Hardware Version 12 or earlier.
  2. Your environment utilizes vSphere Replication.
  3. Your environment contains VMs which house applications that are sending READ or WRITE requests larger than 4 MB.

VMware has resolved this issue in two places: in ESXi itself (ESXi now reads the maximum supported SCSI I/O size from the array, sends only I/Os of that size or smaller, and splits anything larger) and within the VMware virtual hardware (HW).

This is resolved in the following ESXi releases:

  • ESXi 6.0, Patch Release ESXi600-201909001
  • ESXi 6.5, Patch Release ESXi650-201811002
  • ESXi 6.7 Update 1 Release
  • ESXi 7.0 all releases

If you are not running one of these newer releases, it is necessary to reduce the ESXi parameter Disk.DiskMaxIOSize from the default of 32 MB (32,768 KB) down to 4 MB (4,096 KB) or less.
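On an affected host, the value can also be changed from the command line. The following is a minimal sketch, assuming it is run in the ESXi shell; the variable name MAX_IO_KB is only illustrative, and the `command -v` guard keeps the snippet harmless on machines without esxcli.

```shell
# 4 MB expressed in KB, the unit used by Disk.DiskMaxIOSize.
MAX_IO_KB=$((4 * 1024))

if command -v esxcli >/dev/null 2>&1; then
  # Show the current value, then lower it to 4096 KB.
  esxcli system settings advanced list --option /Disk/DiskMaxIOSize
  esxcli system settings advanced set --option /Disk/DiskMaxIOSize --int-value "$MAX_IO_KB"
else
  echo "esxcli not found: run this in the ESXi shell"
fi
```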

The above scenarios are only applicable if the VMs reside on a Pure Storage FlashArray. If you have VMs in your environment that are not on a Pure Storage FlashArray please consult with your vendor to verify if any changes are required.

If this is not configured for ESXi hosts running EFI-enabled VMs, the virtual machines will fail to boot properly. If it is not changed on hosts running VMs being replicated by vSphere Replication, replication will fail. If it is not changed for VMs whose applications are sending requests larger than 4 MB, the larger I/O requests will fail, which causes the application to fail as well.

DiskMaxIOSize-2.png

This should be set on every ESXi host in the cluster that VMs may have access to, in order to ensure vMotion is successful from one ESXi host to another. If none of the above circumstances apply to your environment then this value can remain at the default. There is no known performance impact by changing this value.

For more detail on this change, please refer to the VMware KB article here:

https://kb.vmware.com/s/article/2137402

BEST PRACTICE: Upgrade ESXi to a release that adheres to the maximum supported SCSI I/O size reported by the FlashArray.

VAAI Configuration

The VMware API for Array Integration (VAAI) primitives offer a way to offload and accelerate certain operations in a VMware environment.

Pure Storage requires that all VAAI features be enabled on every ESXi host that is using FlashArray storage. Disabling VAAI features can greatly reduce the efficiency and performance of FlashArray storage in ESXi environments.

All VAAI features are enabled by default (set to 1) in ESXi 5.x and later, so no action is typically required, though these settings can be verified via the vSphere Web Client or CLI tools.

  1. WRITE SAME: DataMover.HardwareAcceleratedInit
  2. XCOPY: DataMover.HardwareAcceleratedMove
  3. ATOMIC TEST & SET: VMFS3.HardwareAcceleratedLocking

vaai.png

BEST PRACTICE: Keep VAAI enabled. DataMover.HardwareAcceleratedInit, DataMover.HardwareAcceleratedMove, and VMFS3.HardwareAcceleratedLocking should all remain set to 1.
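One way to check the three settings from the command line is to list each advanced option and look at its current integer value. This is a sketch that assumes it is run in the ESXi shell; the variable name VAAI_OPTIONS is only illustrative, and the option paths are the three settings named above written in esxcli path form.

```shell
# The three VAAI-related advanced options, in esxcli path form.
VAAI_OPTIONS="/DataMover/HardwareAcceleratedInit /DataMover/HardwareAcceleratedMove /VMFS3/HardwareAcceleratedLocking"

if command -v esxcli >/dev/null 2>&1; then
  for opt in $VAAI_OPTIONS; do
    echo "$opt"
    # A current "Int Value" of 1 means the primitive is enabled.
    esxcli system settings advanced list --option "$opt" | grep '^ *Int Value'
  done
else
  echo "esxcli not found: run this in the ESXi shell"
fi
```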

In order to provide a more efficient heart-beating mechanism for datastores VMware introduced a new host-wide setting called /VMFS3/UseATSForHBOnVMFS5. In VMware’s own words:

“A change in the VMFS heartbeat update method was introduced in ESXi 5.5 Update 2, to help optimize the VMFS heartbeat process. Whereas the legacy method involves plain SCSI reads and writes with the VMware ESXi kernel handling validation, the new method offloads the validation step to the storage system.”

Pure Storage recommends keeping this value enabled whenever possible. That being said, it is a host-wide setting, and it can negatively affect storage arrays from other vendors.

Read the VMware KB article here:

ESXi host loses connectivity to a VMFS3 and VMFS5 datastore

Pure Storage is NOT susceptible to this issue, but in the case of the presence of an affected array from another vendor, it might be necessary to turn this off. In this case, Pure Storage supports disabling this value and reverting to traditional heart-beating mechanisms.

ats-heartbeat.png

BEST PRACTICE: Keep VMFS3.UseATSForHBOnVMFS5 enabled; this is preferred. If an array from another vendor is present and that vendor prefers the setting to be disabled, Pure Storage supports disabling it.
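The current state of this setting can be checked from the command line. The following is a minimal sketch, assuming it is run in the ESXi shell; the variable name ATS_HB_OPTION is only illustrative, and the commented-out line shows how the value could be set back to 1 if it had been disabled earlier.

```shell
# Host-wide ATS heartbeat setting, in esxcli path form.
ATS_HB_OPTION="/VMFS3/UseATSForHBOnVMFS5"

if command -v esxcli >/dev/null 2>&1; then
  # A current "Int Value" of 1 means ATS heartbeating is enabled.
  esxcli system settings advanced list --option "$ATS_HB_OPTION" | grep '^ *Int Value'
  # To re-enable it if it was previously disabled:
  # esxcli system settings advanced set --option "$ATS_HB_OPTION" --int-value 1
else
  echo "esxcli not found: run this in the ESXi shell"
fi
```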

For additional information, please refer to the document VMware Storage APIs for Array Integration with the Pure Storage FlashArray.

iSCSI Configuration

As with any other array that supports iSCSI, Pure Storage recommends the following changes to an iSCSI-based vSphere environment for the best performance.

For a detailed walkthrough of setting up iSCSI on VMware ESXi and on the FlashArray please refer to the following VMware white paper. This is required reading for any VMware/iSCSI user:

https://core.vmware.com/resource/best-practices-running-vmware-vsphere-iscsi

Set Login Timeout to a Larger Value

For example, to set the Login Timeout value to 30 seconds, use steps similar to the following:

  1. Log in to the vSphere Web Client and select the host under Hosts and Clusters.
  2. Navigate to the Manage tab (the Configure tab in vSphere 6.5 and later).
  3. Select the Storage option.
  4. Under Storage Adapters, select the iSCSI vmhba to be modified.
  5. Select Advanced and change the Login Timeout parameter. This can be done on the iSCSI adapter itself or on a specific target.

The default Login Timeout value is 5 seconds and the maximum value is 60 seconds.

BEST PRACTICE: Set iSCSI Login Timeout for FlashArray targets to 30 seconds. A higher value is supported but not necessary.
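The same change can be made at the adapter level from the command line. This is a minimal sketch, assuming it is run in the ESXi shell; the adapter name vmhba64 is a hypothetical placeholder (use `esxcli iscsi adapter list` to find the real one), and the variable names are only illustrative.

```shell
# Hypothetical adapter name; find yours with: esxcli iscsi adapter list
ADAPTER="vmhba64"
LOGIN_TIMEOUT=30

if command -v esxcli >/dev/null 2>&1; then
  # Set the Login Timeout on the adapter, then read the parameter back.
  esxcli iscsi adapter param set --adapter "$ADAPTER" --key LoginTimeout --value "$LOGIN_TIMEOUT"
  esxcli iscsi adapter param get --adapter "$ADAPTER" | grep LoginTimeout
else
  echo "esxcli not found: run this in the ESXi shell"
fi
```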

Disable DelayedAck

DelayedAck is an advanced iSCSI option that allows or disallows an iSCSI initiator to delay acknowledgment of received data packets.

Disabling DelayedAck:

  1. Log in to the vSphere Web Client and select the host under Hosts and Clusters.
  2. Navigate to the Configure tab.
  3. Select the Storage option.
  4. Under Storage Adapters, select the iSCSI vmhba to be modified.

Navigate to Advanced Options and modify the DelayedAck setting by using the option that best matches your requirements, as follows:

Option 1: Modify the DelayedAck setting on a particular discovery address (recommended) as follows:

  1. Select Targets.
  2. On a discovery address, select the Dynamic Discovery tab.
  3. Select the iSCSI server.
  4. Click Advanced.
  5. Change DelayedAck to false.

Option 2: Modify the DelayedAck setting on a specific target as follows:

  1. Select Targets.
  2. Select the Static Discovery tab.
  3. Select the iSCSI server and click Advanced.
  4. Change DelayedAck to false.

Option 3: Modify the DelayedAck setting globally for the iSCSI adapter as follows:

  1. Select the Advanced Options tab and click Advanced.
  2. Change DelayedAck to false.

Pure Storage highly recommends disabling DelayedAck, but does not strictly require it. In highly congested networks, performance can drop if packets are lost or simply take too long to be acknowledged. With DelayedAck enabled, not every packet is acknowledged at once (instead, one acknowledgment is sent per so many packets), so far more retransmission can occur, further exacerbating congestion. This can lead to continually decreasing performance until the congestion clears. Because DelayedAck can contribute to this, disabling it is recommended in order to greatly reduce the effect of congested networks and packet retransmission.

Enabling jumbo frames can make this worse, since retransmitted packets are far larger. If jumbo frames are enabled, disabling DelayedAck is strongly recommended.

See the following VMware KB for more information:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002598

BEST PRACTICE: Disable DelayedAck for FlashArray iSCSI targets.
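For hosts managed from the command line, the adapter-wide variant (Option 3 above) can be sketched as follows. This assumes it is run in the ESXi shell and that the parameter key is DelayedAck, as listed by `esxcli iscsi adapter param get`; the adapter name vmhba64 is a hypothetical placeholder. Existing sessions may keep their previous behavior until they are re-established, so verify the result in your environment.

```shell
# Hypothetical adapter name; find yours with: esxcli iscsi adapter list
ADAPTER="vmhba64"
DELAYED_ACK="false"

if command -v esxcli >/dev/null 2>&1; then
  # Disable DelayedAck for the whole adapter, then read the parameter back.
  esxcli iscsi adapter param set --adapter "$ADAPTER" --key DelayedAck --value "$DELAYED_ACK"
  esxcli iscsi adapter param get --adapter "$ADAPTER" | grep DelayedAck
else
  echo "esxcli not found: run this in the ESXi shell"
fi
```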

iSCSI Port Binding

For software iSCSI initiators, the default behavior without additional configuration is for ESXi to use its routing tables to identify a path to its configured iSCSI targets. Without a solid understanding of the network configuration and routing behavior, this can lead to unpredictable pathing and/or path unavailability during a hardware failure. To get predictable and reliable path selection and failover, it is necessary to configure iSCSI port binding (iSCSI multipathing).

Configuration and detailed discussion are out of the scope of this document, but it is recommended to read through the following VMware document that describes this and other concepts in-depth:

http://www.vmware.com/files/pdf/techpaper/vmware-multipathing-configuration-software-iSCSI-port-binding.pdf

BEST PRACTICE: Use Port Binding for ESXi software iSCSI adapters when possible.

Note that ESXi 6.5 has expanded support for port binding and features such as iSCSI routing (though the use of iSCSI routing is not usually recommended) and multiple subnets. Refer to ESXi 6.5 release notes for more information.

Jumbo Frames

In some iSCSI environments it is required to enable jumbo frames to adhere to the network configuration between the host and the FlashArray. Enabling jumbo frames is a cross-environment change so careful coordination is required to ensure proper configuration. It is important to work with your networking team and Pure Storage representatives when enabling jumbo frames. Please note that this is not a requirement for iSCSI use on the Pure Storage FlashArray—in general, Pure Storage recommends leaving MTU at the default setting.

That being said, altering the MTU is fully supported and is up to the discretion of the user.

  1. Configure jumbo frames on the FlashArray iSCSI ports.

2018-01-29_10-00-52.png

  2. Configure jumbo frames on the physical network switch/infrastructure for each port using the relevant switch CLI or GUI.
  3. Configure jumbo frames on the VMkernel network adapter and vSwitch of each ESXi host:
    1. Browse to a host in the vSphere Web Client navigator.
    2. Click the Configure tab and select Networking > Virtual Switches.
    3. Select the switch from the vSwitch list.
    4. Click the name of the VMkernel network adapter.
    5. Click the pencil icon to edit.
    6. Click NIC settings and set the MTU to your desired value.
    7. Click OK.
    8. Click the pencil icon at the top to edit the vSwitch itself.
    9. Set the MTU to your desired value.
    10. Click OK.
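The ESXi-side MTU change in the steps above can also be sketched from the command line. This assumes a standard vSwitch (a distributed switch is configured in vCenter instead) and that it is run in the ESXi shell; vSwitch1 and vmk1 are hypothetical names that must be replaced with the vSwitch and VMkernel adapter carrying iSCSI traffic in your environment.

```shell
# Hypothetical names; list yours with:
#   esxcli network vswitch standard list
#   esxcli network ip interface list
VSWITCH="vSwitch1"
VMK="vmk1"
MTU=9000

if command -v esxcli >/dev/null 2>&1; then
  # Raise the MTU on the vSwitch first, then on the VMkernel adapter.
  esxcli network vswitch standard set --vswitch-name "$VSWITCH" --mtu "$MTU"
  esxcli network ip interface set --interface-name "$VMK" --mtu "$MTU"
else
  echo "esxcli not found: run this in the ESXi shell"
fi
```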

Once jumbo frames are configured, verify end-to-end jumbo frame compatibility. To verify, try to ping an address on the storage network with vmkping.

vmkping -d -s 8972 <ip address of Pure Storage iSCSI port>

If the ping operation does not return successfully, jumbo frames are not properly configured in ESXi, the networking devices, and/or the FlashArray port.
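The payload size of 8972 used with vmkping follows from a 9000-byte MTU minus the IPv4 header (20 bytes, without options) and the ICMP header (8 bytes). A small sketch of that arithmetic, useful when the MTU in your environment is not 9000; the variable names are only illustrative.

```shell
MTU=9000
IP_HEADER=20     # IPv4 header without options
ICMP_HEADER=8    # ICMP echo header

# Largest ICMP payload that fits in one unfragmented frame.
PAYLOAD=$((MTU - IP_HEADER - ICMP_HEADER))

echo "vmkping -d -s ${PAYLOAD} <ip address of Pure Storage iSCSI port>"
```

The -d flag prevents fragmentation, so the ping only succeeds if every hop accepts the full-size frame.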

Challenge-Handshake Authentication Protocol (CHAP)

iSCSI CHAP is supported on the FlashArray for unidirectional or bidirectional authentication. Enabling CHAP is optional and up to the discretion of the user. Please refer to the following post for a detailed walkthrough:

http://www.codyhosterman.com/2015/03/configuring-iscsi-chap-in-vmware-with-the-flasharray/

2018-01-29_10-56-13.png

Please note that iSCSI CHAP is not currently supported with dynamic iSCSI targets on the FlashArray. If CHAP is going to be used, you MUST configure your iSCSI FlashArray targets as static targets.

iSCSI Failover Times

A common question encountered at Pure Storage is why extended pauses in I/O are seen during specific operations or tests when using the iSCSI protocol. Often the underlying cause is a disconnected network cable, a misbehaving switch port, or a failover of the backend storage array, though this list is certainly not exhaustive.

When the default iSCSI configuration is in use with VMware ESXi, the delay for these events will generally be 25-35 seconds. While the majority of environments recover from these events unscathed, this is not true for all environments. On a handful of occasions, environments have contained applications that need faster recovery times. Without these faster recovery times, I/O failures were noted and manual recovery efforts were required to bring the environment back online.

While Pure Storage's official best practice is to use the default iSCSI configuration for failover times, we also understand that not all environments are created equal. As such, we do support modifying the necessary iSCSI advanced parameters to decrease failover times for sensitive applications.

Recovery times are controlled by the following 3 iSCSI advanced parameters:

Name                  Current     Default     Min  Max       Settable  Inherit
--------------------  ----------  ----------  ---  --------  --------  -------
NoopOutInterval       15          15          1    60            true    false
NoopOutTimeout        10          10          10   30            true     true
RecoveryTimeout       10          10          1    120           true     true
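One way to relate these defaults to the pause described above is to add them up. The sketch below assumes (this is not stated in the table) that in the worst case the three timers elapse back to back: a full NoopOutInterval passes before the next NOP-Out is sent, the NOP-Out then times out, and the recovery timeout runs afterwards. Under that assumption the defaults give an upper bound that matches the top of the 25-35 second range.

```shell
# Default values from the table above, in seconds.
NOOP_OUT_INTERVAL=15
NOOP_OUT_TIMEOUT=10
RECOVERY_TIMEOUT=10

# Assumed worst case: the three timers run one after another.
WORST_CASE=$((NOOP_OUT_INTERVAL + NOOP_OUT_TIMEOUT + RECOVERY_TIMEOUT))

echo "Upper bound with default timers: ${WORST_CASE}s"
```

Changing the values at the top gives a quick estimate of how a planned change might shift that upper bound before it is tested in a real environment.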

To better understand how these parameters are used in iSCSI recovery efforts it is recommended you read the following blog posts for deeper insight:

iSCSI: A 25-second pause in I/O during a single link loss? What gives?

iSCSI Advanced Settings

Once these iSCSI options have been thoroughly reviewed, additional testing within your own environment is strongly recommended to ensure that the changes do not introduce any new issues.

Set Login Timeout to a Larger Value

For example, to set the Login Timeout value to 30 seconds, use commands similar to the following:

  1. Log in to the vSphere Web Client and select the host under Hosts and Clusters.
  2. Navigate to the Manage tab.
  3. Select the Storage option.
  4. Under Storage Adapters, select the iSCSI vmhba to be modified.
  5. Select Advanced and change the Login Timeout parameter. This can be done on the iSCSI adapter itself or on a specific target.

The default Login Timeout value is 5 seconds and the maximum value is 60 seconds.

BEST PRACTICE: Set iSCSI Login Timeout for FlashArray targets to 30 seconds. A higher value is supported but not necessary.

Disable DelayedAck

DelayedAck is an advanced iSCSI option that allows or disallows an iSCSI initiator to delay acknowledgment of received data packets.

Disabling DelayedAck:

  1. Log in to the vSphere Web Client and select the host under Hosts and Clusters.
  2. Navigate to the Configure tab.
  3. Select the Storage option.
  4. Under Storage Adapters, select the iSCSI vmhba to be modified.

Navigate to Advanced Options and modify the DelayedAck setting by using the option that best matches your requirements, as follows:

Option 1: Modify the DelayedAck setting on a particular discovery address (recommended) as follows:

  1. Select Targets.
  2. On a discovery address, select the Dynamic Discovery tab.
  3. Select the iSCSI server.
  4. Click Advanced.
  5. Change DelayedAck to false.

Option 2: Modify the DelayedAck setting on a specific target as follows:

  1. Select Targets.
  2. Select the Static Discovery tab.
  3. Select the iSCSI server and click Advanced.
  4. Change DelayedAck to false.

Option 3: Modify the DelayedAck setting globally for the iSCSI adapter as follows:

  1. Select the Advanced Options tab and click Advanced.
  2. Change DelayedAck to false.

DelayedAck is highly recommended to be disabled, but is not absolutely required by Pure Storage. In highly-congested networks, if packets are lost, or simply take too long to be acknowledged, due to that congestion, performance can drop. If DelayedAck is enabled, where not every packet is acknowledged at once (instead one acknowledgment is sent per so many packets) far more re-transmission can occur, further exacerbating congestion. This can lead to continually decreasing performance until congestion clears. Since DelayedAck can contribute to this it is recommended to disable it in order to greatly reduce the effect of congested networks and packet retransmission.

Enabling jumbo frames can further harm this since packets that are retransmitted are far larger. If jumbo frames are enabled, it is absolutely recommended to disable DelayedAck.

See the following VMware KB for more information:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002598

BEST PRACTICE: Disable DelayedAck for FlashArray iSCSI targets.

iSCSI Port Binding

For software iSCSI initiators, without additional configuration the default behavior for iSCSI pathing is for ESXi to leverage its routing tables to identify a path to its configured iSCSI targets. Without solid understanding of network configuration and routing behaviors, this can lead to unpredictable pathing and/or path unavailability in a hardware failure. To configure predictable and reliable path selection and failover it is necessary to configure iSCSI port binding (iSCSI multipathing).

Configuration and detailed discussion are out of the scope of this document, but it is recommended to read through the following VMware document that describes this and other concepts in-depth:

http://www.vmware.com/files/pdf/techpaper/vmware-multipathing-configuration-software-iSCSI-port-binding.pdf

BEST PRACTICE: Use Port Binding for ESXi software iSCSI adapters when possible.

Note that ESXi 6.5 has expanded support for port binding and features such as iSCSI routing (though the use of iSCSI routing is not usually recommended) and multiple subnets. Refer to ESXi 6.5 release notes for more information.

Jumbo Frames

In some iSCSI environments it is required to enable jumbo frames to adhere to the network configuration between the host and the FlashArray. Enabling jumbo frames is a cross-environment change so careful coordination is required to ensure proper configuration. It is important to work with your networking team and Pure Storage representatives when enabling jumbo frames. Please note that this is not a requirement for iSCSI use on the Pure Storage FlashArray—in general, Pure Storage recommends leaving MTU at the default setting.

That being said, altering the MTU is a fully supported and is up to the discretion of the user.

  1. Configure jumbo frames on the FlashArray iSCSI ports. 2018-01-29_10-00-52.png

Configure jumbo frames on the physical network switch/infrastructure for each port using the relevant switch CLI or GU I.

  1. Configure jumbo frames on the physical network switch/infrastructure for each port using the relevant switch CLI or GUI.
    1. Browse to a host in the vSphere Web Client navigator.
    2. Click the Configure tab and select Networking > Virtual Switches.
    3. Select the switch from the vSwitch list.
    4. Click the name of the VMkernel network adapter.
    5. Click the pencil icon to edit.
    6. Click NIC settings and set the MTU to your desired value.
    7. Click OK.
    8. Click the pencil icon to edit on the top to edit the vSwitch itself.
    9. Set the MTU to your desired value.
    10. Click OK.

Once jumbo frames are configured, verify end-to-end jumbo frame compatibility. To verify, try to ping an address on the storage network with vmkping.

vmkping -d -s 8972 <ip address of Pure Storage iSCSI port>

If the ping operations does not return successfully, then jumbo frames is not properly configured in ESXi, the networking devices, and/or the FlashArray port.

Challenge-Handshake Authentication Protocol (CHAP)

iSCSI CHAP is supported on the FlashArray for unidirectional or bidirectional authentication. Enabling CHAP is optional and up to the discretion of the user. Please refer to the following post for a detailed walkthrough:

http://www.codyhosterman.com/2015/03/configuring-iscsi-chap-in-vmware-with-the-flasharray/

Please note that iSCSI CHAP is not currently supported with dynamic iSCSI targets on the FlashArray. If CHAP is going to be used, you MUST configure your iSCSI FlashArray targets as static targets.

iSCSI Failover Times

A common question encountered here at Pure Storage is why extended pauses in I/O are noted during specific operations or tests when utilizing the iSCSI protocol. Often the underlying cause of these pauses is a network cable being disconnected, a misbehaving switch port, or a failover of the backend storage array, though this list is certainly not exhaustive.

When the default configuration for iSCSI is in use with VMware ESXi, the delay for these events will generally be 25-35 seconds. While the majority of environments are able to recover from these events unscathed, this is not true for all environments. On a handful of occasions, environments have contained applications that need faster recovery times. Without these faster recovery times, I/O failures were noted and manual recovery efforts were required to bring the environment back online.

While Pure Storage's official best practice is to utilize the default iSCSI configuration for failover times, we also understand that not all environments are created equal. As such, we do support modifying the necessary iSCSI advanced parameters to decrease failover times for sensitive applications.

Recovery times are controlled by the following 3 iSCSI advanced parameters:

Name                  Current     Default     Min  Max       Settable  Inherit
--------------------  ----------  ----------  ---  --------  --------  -------
NoopOutInterval       15          15          1    60            true    false
NoopOutTimeout        10          10          10   30            true     true
RecoveryTimeout       10          10          1    120           true     true

To better understand how these parameters are used in iSCSI recovery efforts it is recommended you read the following blog posts for deeper insight:

iSCSI: A 25-second pause in I/O during a single link loss? What gives?

iSCSI Advanced Settings

Once you have thoroughly reviewed these iSCSI options, additional testing within your own environment is strongly recommended to ensure that these changes do not introduce additional issues.
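As a rough illustration of how the three parameters interact, the sketch below adds them together to estimate an upper bound on the pause. This additive model is a simplifying assumption made for this example, not a description of the exact ESXi recovery logic; with the defaults it yields 35 seconds, the top of the range quoted above. The minimum and maximum bounds are copied from the parameter table, while the class name, method name, and the tuned values in the demo are hypothetical.

```python
from dataclasses import dataclass

# (min, max) bounds in seconds, copied from the parameter table above.
BOUNDS = {
    "noop_out_interval": (1, 60),
    "noop_out_timeout": (10, 30),
    "recovery_timeout": (1, 120),
}


@dataclass(frozen=True)
class IscsiTimers:
    """Hypothetical container for the three iSCSI recovery parameters."""
    noop_out_interval: int = 15
    noop_out_timeout: int = 10
    recovery_timeout: int = 10

    def __post_init__(self):
        # Reject values outside the ranges listed in the table.
        for name, (low, high) in BOUNDS.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}..{high}")

    def rough_upper_bound(self) -> int:
        """Assumed worst case: the three timers simply add up."""
        return self.noop_out_interval + self.noop_out_timeout + self.recovery_timeout


if __name__ == "__main__":
    print("defaults:", IscsiTimers().rough_upper_bound(), "seconds")
    # Hypothetical tuned values, for illustration only.
    tuned = IscsiTimers(noop_out_interval=5, recovery_timeout=5)
    print("tuned:", tuned.rough_upper_bound(), "seconds")
```

Any values chosen this way still need to be validated in your own environment, as noted above.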

Network Time Protocol (NTP)

No matter how perfectly an environment is configured, there will always come a time when troubleshooting an issue is required. This is inevitable when dealing with large and complex environments. One way to help alleviate some of the stress that comes with troubleshooting is to ensure that the Network Time Protocol (NTP) is enabled on all components in the environment. NTP ensures that the timestamps for servers, arrays, switches, etc. are all aligned and in sync. It is for this reason that Pure Storage recommends as a best practice that NTP be enabled and configured on all components.

Please refer to VMware KB Configuring Network Time Protocol (NTP) on an ESXi host using the vSphere Client for steps on how to configure NTP on your ESXi hosts.

Often the VMware vCenter Server is configured to sync time with the ESXi host it resides on. If you do not use this option, please ensure the vCenter Server has NTP properly configured and enabled as well.

Remote Syslog Server

Another helpful troubleshooting tool is a remote syslog server. There may be times when an investigation is required in the environment, but when attempting to review the logs it is discovered that they are no longer available. Often this is a result of the increased logging that happened during the time of the issue. The increased logging causes the thresholds for file size and count to be exceeded, and the older logs are automatically deleted as a result.

Pure Storage recommends the use of the VMware vRealize Log Insight OVA. This provides for a quick and easy integration for the ESXi hosts and vCenter. Additionally, the Pure Storage Content Pack can be used with vRealize Log Insight which provides a single logging destination for both the vSphere and Pure Storage environments.

Configuring vCenter Server and ESXi with Log Insight

As explained above, configuring vCenter Server and ESXi is a relatively quick and simple process.

  • Log in to VMware vRealize Log Insight.
  • Click on Administration.
  • Under Integration click on vSphere.
  • Click + Add vCenter Server.
  • Fill in the applicable vCenter Server information and Test Connection.
  • Ensure the following boxes are checked:
    • Collect vCenter Server events, tasks, and alarms
    • Configure ESXi hosts to send logs to Log Insight
  • Click Save to commit all of the requested changes.

loginsight-configuration.png

The screenshot above is applicable for vRealize Log Insight 8.x. If you have an earlier version of Log Insight, refer to the VMware documentation on how to properly configure vCenter and ESXi.

Additional Remote syslog Options

It is understood that not every customer or environment will have vRealize Log Insight installed or available. If your environment uses a different solution, please refer to the third-party documentation on the best way to integrate it with your vSphere environment. You can also refer to VMware's Knowledge Base article Configuring syslog on ESXi for additional options and configuration information.

Virtual Machine and Guest Configuration

Virtual Disk Choice

Storage provisioning in virtual infrastructure involves several crucial decisions. VMware vSphere offers three virtual disk formats: thin, zeroedthick and eagerzeroedthick.

vmdk-provisioning-types.png

To quickly review the types:

  1. Thin: thin virtual disks only allocate what is used by the guest. Upon creation, thin virtual disks only consume one block of space. As the guest writes data, new blocks are allocated on VMFS, then zeroed out, and then the data is committed to storage. Therefore there is some additional latency for new writes.
  2. Zeroedthick (lazy): zeroedthick virtual disks allocate all of the space on the VMFS upon creation. When the guest writes to a specific block in the virtual disk for the first time, the block is first zeroed, then the data is committed. Therefore there is some additional latency for new writes. This latency is lower than with thin (since the disk only has to zero, not also allocate), but the performance difference between zeroedthick (lazy) and thin is negligible.
  3. Eagerzeroedthick: eagerzeroedthick virtual disks allocate all of their provisioned size upon creation and also zero out the entire capacity upon creation. This type of disk cannot be used until the zeroing is complete. Eagerzeroedthick has no first-write latency penalty because allocation and zeroing are done in advance, not on demand.

Prior to WRITE SAME support, the performance differences between these virtual disk allocation mechanisms were distinct. This was due to the fact that before an unallocated block could be written to, zeroes would have to be written first causing an allocate-on-first-write penalty (increased latency). Therefore, for every new block written, there were actually two writes; the zeroes then the actual data. For thin and zeroedthick virtual disks, this zeroing was on-demand so the penalty was seen by applications. For eagerzeroedthick, it was noticed during deployment because the entire virtual disk had to be zeroed prior to use. This zeroing caused unnecessary I/O on the SAN fabric, subtracting available bandwidth from “real” I/O.

To resolve this issue, VMware introduced WRITE SAME support. WRITE SAME is a SCSI command that tells a target device (or array) to write a pattern (in this case, zeros) to a target location. ESXi utilizes this command to avoid having to actually send a payload of zeros but instead simply communicates to any array that it needs to write zeros to a certain location on a certain device. This not only reduces traffic on the SAN fabric, but also speeds up the overall process since the zeros do not have to traverse the data path.

This process is optimized even further on the Pure Storage FlashArray. Since the array does not store space-wasting patterns like contiguous zeros on the array, the zeros are discarded and any subsequent reads will result in the array returning zeros to the host. This additional array-side optimization further reduces the time and penalty caused by pre-zeroing of newly-allocated blocks.

With this knowledge, choosing a virtual disk is a factor of a few different variables that need to be evaluated. In general, Pure Storage makes the following recommendations:

  • Lead with thin virtual disks. They offer the greatest flexibility and functionality and the performance difference is only at issue with the most sensitive of applications.
  • For highly-sensitive applications with high performance requirements, eagerzeroedthick is the best choice. It is always the best-performing virtual disk type.
  • In no situation does Pure Storage recommend the use of zeroedthick (thick provision lazy zeroed) virtual disks. This format offers very little advantage over the others and can also lead to stranded space as described in this post.

With that being said, for more details on how these recommendations were decided upon, refer to the following considerations. Note that each consideration ends with a recommendation, but that recommendation is valid only when that specific consideration is the only one that matters. When choosing a virtual disk type, take into account your virtual machine business requirements and let these requirements drive your design decisions. Based on those decisions, choose the virtual disk type that is best suited to your virtual machine.

  • Performance: with the introduction of WRITE SAME support (more information on WRITE SAME can be found in the section Block Zero or WRITE SAME), the performance difference between the different types of virtual disks is dramatically reduced, almost eliminated. In lab experiments, a difference can be observed during writes to unallocated portions of a thin or zeroedthick virtual disk. This difference is negligible but of course still non-zero. Therefore, performance is no longer an overriding factor in the choice of virtual disk type, but for the most latency-sensitive of applications eagerzeroedthick will always be slightly better than the others. Recommendation: eagerzeroedthick.
  • Protection against space exhaustion: each virtual disk type, based on its architecture, has varying degrees of protection against space exhaustion. Thin virtual disks do not reserve space on the VMFS datastore upon creation and instead grow in 1 MB blocks as needed. Therefore, if unmonitored, as one or more thin virtual disks grow on the datastore, they could exhaust the capacity of the VMFS, even if the underlying array has plenty of additional capacity to provide. If careful monitoring is in place that allows capacity exhaustion to be resolved proactively (by moving virtual machines around or growing the VMFS), thin virtual disks are a perfectly acceptable choice. Storage DRS is an excellent solution for space exhaustion prevention. While careful monitoring can protect against this possibility, it can still be a concern and should be considered upon initial provisioning. Zeroedthick and eagerzeroedthick virtual disks are not susceptible to VMFS logical capacity exhaustion because the space is reserved on the VMFS upon creation. Recommendation: eagerzeroedthick.
  • Virtual disk density —it should be noted that while all virtual disk types take up the same amount of physical space on the FlashArray due to data reduction, they have different requirements on the VMFS layer. Thin virtual disks can be oversubscribed (more capacity provisioned than the VMFS reports as being available) allowing for far more virtual disks to fit on a given volume than either of the thick formats. This provides a greater virtual machine to VMFS datastore density and reduces the number or size of volumes that are required to store them. This, in effect, reduces the management overhead of provisioning and managing additional volumes in a VMware environment. Recommendation: thin.
  • Time to create —the virtual disk types also vary in how long it takes to initially create them. Since thin and zeroedthick virtual disks do not zero space until they are actually written to by a guest they are both created in trivial amounts of time—usually a second or two. Eagerzeroedthick disks, on the other hand, are pre-zeroed at creation and consequently take additional time to create. If the time-to-first-IO is paramount for whatever reason, thin or zeroedthick is best. Recommendation: thin.
  • Space efficiency —the aforementioned bullet on “virtual disk density” describes efficiency on the VMFS layer. Efficiency on the underlying array should also be considered. In vSphere 6.0, thin virtual disks support guest-OS initiated UNMAP to a virtual disk, through the VMFS and down to the physical storage. Therefore, thin virtual disks can be more space efficient as time wears on and data is written and deleted. For more information on this functionality in vSphere 6.0, refer to the section, In-Guest UNMAP in ESXi 6.x, that can be found later in this paper. Recommendation: thin.
  • Storage usage trending —A useful metric to know and track is how much capacity is actually being used by a virtual machine guest. If you know how much space is being used by the guests, and furthermore, at what rate that is growing, you can more appropriately size and project storage allocations. Since thick type virtual disks reserve all of the space on the VMFS whether or not the guest has used it, it is difficult to know, without guest tools, how much the guest has actually written. Often it is not known until the application has used its available space and the administrator requests more. This leads to abrupt and unplanned capacity increases. Thin virtual disks only reserve what the guest has written, therefore will grow as the guest adds more data. This growth can be monitored and trended. This will allow VMware administrators to plan and predict future storage needs. Recommendation: thin.
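The density and space-exhaustion considerations above come down to two ratios for a datastore holding thin virtual disks: how much capacity has been provisioned relative to the VMFS size, and how much has actually been written. The sketch below computes both. The function name, the 80% warning threshold, and the sample sizes are assumptions made for illustration; they are not values recommended by Pure Storage or VMware.

```python
def datastore_report(capacity_gb, thin_disks, warn_at=0.8):
    """Summarize thin-disk usage on one VMFS datastore.

    thin_disks is a list of (provisioned_gb, used_gb) tuples.
    warn_at is a hypothetical alert threshold on the used fraction.
    """
    provisioned = sum(p for p, _ in thin_disks)
    used = sum(u for _, u in thin_disks)
    used_fraction = used / capacity_gb
    return {
        # Above 1.0 means more capacity is provisioned than the VMFS holds.
        "oversubscription_ratio": provisioned / capacity_gb,
        "used_fraction": used_fraction,
        "needs_attention": used_fraction >= warn_at,
    }


if __name__ == "__main__":
    # Three 500 GB thin disks on a 1000 GB datastore (made-up numbers).
    report = datastore_report(1000, [(500, 200), (500, 300), (500, 350)])
    print(report)
```

Tracking the used fraction over time gives the growth trend described in the storage usage trending consideration.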

BEST PRACTICE: Use thin virtual disks for most virtual machines. Use eagerzeroedthick for virtual machines that require very high performance levels.

Do not use zeroedthick.

No virtual disk option quite fits all possible use-cases perfectly, so choosing an allocation method should generally be decided upon on a case-by-case basis. VMs that are intended for short term use, without extraordinarily high performance requirements, fit nicely with thin virtual disks. For VMs that have higher performance needs eagerzeroedthick is a good choice.
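The recommendations in this section can be condensed into a small lookup, shown below as a Python sketch. The per-consideration values are copied from the list above and the default rule mirrors the best practice statement; the dictionary and function names are hypothetical, and a real decision should still be made case by case as described.

```python
# Recommendation given at the end of each consideration above.
RECOMMENDATION_BY_CONSIDERATION = {
    "performance": "eagerzeroedthick",
    "protection against space exhaustion": "eagerzeroedthick",
    "virtual disk density": "thin",
    "time to create": "thin",
    "space efficiency": "thin",
    "storage usage trending": "thin",
}


def choose_virtual_disk(needs_very_high_performance: bool) -> str:
    """Apply the best practice: thin by default, eagerzeroedthick for
    very high performance requirements, and never zeroedthick."""
    return "eagerzeroedthick" if needs_very_high_performance else "thin"


if __name__ == "__main__":
    print(choose_virtual_disk(needs_very_high_performance=False))
    print(choose_virtual_disk(needs_very_high_performance=True))
```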

Virtual Hardware Configuration

Pure Storage makes the following recommendations for configuring a virtual machine in vSphere:

Virtual SCSI Adapter: the best performing and most efficient virtual SCSI adapter is the VMware Paravirtual SCSI Adapter. This adapter has the best CPU efficiency at high workloads and provides the highest queue depths for a virtual machine, starting at an adapter queue depth of 256 and a virtual disk queue depth of 64 (twice what the LSI Logic can provide by default). The queue limits of PVSCSI can be further tuned; please refer to the Guest-level Settings section for more information. The virtual NVMe adapter is supported by both Pure and VMware, but at this time there is no significant benefit to its use over PVSCSI. In the future, that will likely change, but as of ESXi 7.0 U1 the recommendation (though not a requirement) remains PVSCSI.

Virtual Hardware: it is recommended to use the latest virtual hardware version that the hosting ESXi hosts support.

VMware tools —in general, it is advisable to install the latest supported version of VMware tools in all virtual machines.

CPU and Memory - provision vCPUs and memory as per the application requirements.

VM encryption —vSphere 6.5 introduced virtual machine encryption which encrypts the VM’s virtual disk from a VMFS perspective. Pure Storage generally recommends not using this and instead relying on FlashArray-level Data-At-Rest-Encryption. Though, if it is necessary to leverage VM Encryption, doing so is fully supported by Pure Storage—but it should be noted that data reduction will disappear for that virtual machine as host level encryption renders post-encryption deduplication and compression impossible.

IOPS Limits: if you want to limit a virtual machine to a particular amount of IOPS, you can use the built-in ESXi IOPS limits. ESXi allows you to specify the number of IOPS a given virtual machine can issue to a given virtual disk. Once the virtual machine exceeds that number, any additional I/Os will be queued. In ESXi 6.0 and earlier this can be applied via the “Edit Settings” option of a virtual machine.

Screen Shot 2020-04-21 at 10.51.27 AM.png
In ESXi 6.5 and later, this can also be configured via a VM Storage Policy:

Screen Shot 2020-04-21 at 10.53.55 AM.png

BEST PRACTICE: Use the Paravirtual SCSI adapter for virtual machines for best performance.
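To make the queuing behavior of an IOPS limit concrete, the toy model below tracks a virtual disk in one-second buckets: I/Os above the limit are not dropped but carried over as backlog into the next second. This is a deliberately simplified sketch; the one-second bucketing, the function name, and the sample numbers are assumptions for illustration and do not describe the actual ESXi scheduler.

```python
def simulate_iops_limit(demand_per_second, limit):
    """Return (issued, backlog) lists, one entry per second.

    demand_per_second: new I/Os the guest wants to issue each second.
    limit: maximum I/Os allowed through per second.
    """
    backlog = 0
    issued, backlogs = [], []
    for demand in demand_per_second:
        pending = backlog + demand      # queued I/Os plus new requests
        sent = min(pending, limit)      # at most `limit` get through
        backlog = pending - sent        # the remainder waits in the queue
        issued.append(sent)
        backlogs.append(backlog)
    return issued, backlogs


if __name__ == "__main__":
    issued, backlogs = simulate_iops_limit([500, 1500, 1200, 200], limit=1000)
    print("issued :", issued)
    print("backlog:", backlogs)
```

In this model a burst above the limit shows up as added latency for the queued I/Os rather than as errors.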

Template Configuration

In general, template configuration is no different than virtual machine configuration. Standard recommendations apply. That being said, since templates are by definition frequently copied, Pure Storage recommends putting copies of the templates on FlashArrays that are frequent targets of virtual machines deployed from a template. If the template and target datastore are on the same FlashArray, the copy process can take advantage of VAAI XCOPY, which greatly accelerates the copy process while reducing the workload impact of the copy operation.

BEST PRACTICE: For the fastest and most efficient virtual machine deployments, place templates on the same FlashArray as the target datastore.

Prior to Full Copy (XCOPY) API support, when virtual machines needed to be copied or moved from one location to another, such as with Storage vMotion or a virtual machine cloning operation, ESXi would issue many SCSI read/write commands between the source and target storage location (the same or different device). This resulted in a very intense and often lengthy additional workload to this set of devices. This SCSI I/O consequently stole available bandwidth from more “important” I/O such as the I/O issued from virtualized applications. Therefore, copy or move operations often had to be scheduled to occur only during non-peak hours in order to limit interference with normal production storage performance.  This restriction effectively decreased the ability of administrators to use the virtualized infrastructure in the dynamic and flexible nature that was intended.

The introduction of XCOPY support for virtual machine movement allows this workload to be offloaded from the virtualization stack almost entirely onto the storage array. The ESXi kernel is no longer directly in the data copy path and the storage array instead does all the work. XCOPY functions by having the ESXi host identify a region of a VMFS that needs to be copied. ESXi describes this space as a series of XCOPY SCSI commands and sends them to the array. The array then translates these block descriptors and copies/moves the data from the described source locations to the described target locations. This architecture therefore does not require the moved data to be sent back and forth between the host and array; the SAN fabric does not play a role in traversing the data. This vastly reduces the time to move data. XCOPY benefits are leveraged during the following operations[1]:

  • Virtual machine cloning
  • Storage vMotion
  • Deploying virtual machines from template

During these offloaded operations, the throughput required on the data path is greatly reduced, as is the load on the ESXi hardware resources (HBAs, CPUs, etc.) that initiate the request. This frees up resources for more important virtual machine operations by letting ESXi do what it does best (run virtual machines) and letting the storage do what it does best (manage the storage).

On the Pure Storage FlashArray, XCOPY sessions are exceptionally quick and efficient. Due to FlashReduce technology (features like deduplication, pattern removal and compression) similar data is never stored on the FlashArray more than once. Therefore, during a host-initiated copy operation such as with XCOPY, the FlashArray does not need to copy the data—this would be wasteful. Instead, Purity simply accepts and acknowledges the XCOPY requests and creates new (or in the case of Storage vMotion, redirects existing) metadata pointers. By not actually having to copy/move data, the offload duration is greatly reduced. In effect, the XCOPY process is a 100% inline deduplicated operation. A non-VAAI copy process for a virtual machine containing 50 GB of data can take on the order of multiple minutes or more depending on the workload on the SAN. When XCOPY is enabled this time drops to a matter of a few seconds.

XCOPY on the Pure Storage FlashArray works directly out of the box without any configuration required. Nevertheless, there is one ESXi host setting that affects XCOPY operations. ESXi offers an advanced setting called MaxHWTransferSize that controls the maximum amount of data that a single XCOPY SCSI command can describe. The default value for this setting is 4 MB, which means that any given XCOPY SCSI command sent from that ESXi host cannot describe more than 4 MB of data. Pure Storage recommends leaving this at the default value, but does support increasing it, up to the maximum of 16 MB, if another vendor requires it. Increasing this value has no XCOPY performance impact. Decreasing the value below 4 MB can slow down XCOPY sessions somewhat and should not be done without guidance from VMware or Pure Storage support.
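To give a sense of what the transfer size means in practice, the sketch below counts how many XCOPY commands would be needed to describe a given amount of data at the 4 MB default and at the 16 MB maximum. It assumes every command describes the full transfer size and treats MB and GB as binary units; both are simplifications for this example, and the function name is illustrative.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def xcopy_commands_needed(data_bytes: int, max_transfer_bytes: int = 4 * MIB) -> int:
    """Number of XCOPY commands if each one describes up to max_transfer_bytes."""
    if max_transfer_bytes <= 0:
        raise ValueError("max_transfer_bytes must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return -(-data_bytes // max_transfer_bytes)


if __name__ == "__main__":
    size = 50 * GIB  # the 50 GB virtual machine used as an example above
    for transfer in (4 * MIB, 16 * MIB):
        count = xcopy_commands_needed(size, transfer)
        print(f"{transfer // MIB} MB per command: {count} commands")
```

A smaller transfer size means more commands for the same copy, which is consistent with the warning above about decreasing the value below the default.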

Guest-level Settings

In general, standard operating system configuration best practices apply and Pure Storage does not make any overriding recommendations. Please refer to VMware and/or OS vendor documentation for the particulars of configuring a guest operating system for best operation in a VMware virtualized environment.

That being said, Pure Storage does recommend two non-default options for file system configuration in a guest on a virtual disk residing on a FlashArray volume. Both configurations provide automatic space reclamation support. While it is highly recommended to follow these recommendations, it is not absolutely required.

In short:

  • For Linux guests in vSphere 6.5 or later using thin virtual disks, mount filesystems with the discard option
  • For Windows 2012 R2 or later guests in vSphere 6.0 or later using thin virtual disks, use an NTFS allocation unit size of 64K

Refer to the in-guest space reclamation section for a detailed description of enabling these options.

  • [1] Note that there are VMware-enforced caveats in certain situations that would prevent XCOPY behavior and revert to legacy software copy. Refer to VMware documentation for this information at www.vmware.com.

High-IOPS Virtual Machines

As mentioned earlier, the Paravirtual SCSI adapter should be leveraged for the best default performance. For virtual machines that host applications that need to push a large amount of IOPS (50,000+) to a single virtual disk, some non-default configurations are required. The PVSCSI adapter allows the default adapter queue depth limit and the per-device queue depth limit to be increased from the default of 256 and 64 (respectively) to 1024 and 256.

In general, this change is not needed and therefore not recommended for most workloads. Only increase these values if you know a virtual machine needs or will need this additional queue depth. Opening this queue for a virtual machine that does not (or should not) need it can expose noisy neighbor performance issues. If a virtual machine has a process that unexpectedly becomes intense, it can unfairly steal queue slots from other virtual machines sharing the underlying datastore on that host. This can then cause the performance of other virtual machines to suffer.

BEST PRACTICE: Leave virtual machine queue depth limits at the default unless performance requirements dictate otherwise.

If an application does need to push a high amount of IOPS to a single virtual disk, these limits must be increased. See the VMware KB for information on how to configure Paravirtual SCSI adapter queue limits. The process differs slightly between Linux and Windows.

Refer to this blog post for more information:

http://www.codyhosterman.com/2017/02/understanding-vmware-esxi-queuing-and-the-flasharray/

A few general recommendations:

  1. Only increase these limits when needed.
  2. If you change these limits, you must also change the queue depth limits in ESXi; otherwise changing these values will have no tangible effect.
  3. A good indicator that you need to change these values is that you are not getting the IOPS you expect and latency is high in the guest, but is not reported as high in ESXi or on the FlashArray volume.
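The relationship between queue depth and achievable IOPS can be approximated with Little's Law: outstanding I/Os equal IOPS multiplied by latency, so a queue depth limit caps IOPS at queue depth divided by latency. The sketch below applies that to the PVSCSI per-device limits mentioned above and also shows why the ESXi-side limit matters: the lowest limit in the path is the effective one. The 1 ms latency and the ESXi limit of 64 are hypothetical inputs chosen for the example, and the function names are illustrative.

```python
def effective_queue_depth(guest_limit: int, esxi_limit: int) -> int:
    """The smallest limit along the I/O path is the one that applies."""
    return min(guest_limit, esxi_limit)


def iops_ceiling(queue_depth: int, latency_ms: float) -> float:
    """Little's Law: max IOPS = outstanding I/Os / latency (in seconds)."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return queue_depth * 1000.0 / latency_ms


if __name__ == "__main__":
    latency_ms = 1.0   # hypothetical end-to-end latency
    esxi_limit = 64    # hypothetical ESXi-side device queue depth
    for guest_limit in (64, 256):
        depth = effective_queue_depth(guest_limit, esxi_limit)
        print(f"guest limit {guest_limit}: effective depth {depth}, "
              f"ceiling {iops_ceiling(depth, latency_ms):,.0f} IOPS")
```

With the ESXi limit left at 64 in this example, raising the guest limit to 256 changes nothing, which is the situation described in recommendation 2.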

Best Practices: Configuration and Tuning

Horizon Connection Server Tuning

  1. Use the SE sparse virtual disk format: VMware Horizon 5.2 and above supports a VMDK disk format called Space Efficient (SE) sparse virtual disks, which was introduced in vSphere 5.1. The advantages of SE sparse virtual disks can be summarized as follows:
    • They grow and shrink dynamically, which prevents VMDK bloat as desktops rewrite and delete data.
    • Available only for Horizon View Composer based linked-clone desktops (not for persistent desktops)
    • VM hardware version 9 or later
    • No need to do a refresh/recompose operation to reclaim space
    • No need to set blackout periods, as we handle UNMAPs efficiently
  2. We recommend using this disk format for deploying linked-clone and instant-clone desktops on Pure Storage due to the space efficiencies and preventing VMDK bloat.
  3. Disable View Storage Accelerator (linked-clones only, VSA must be enabled to use instant-clones)
    • The View Storage Accelerator (VSA) is a feature in VMware View 5.1 onwards based on VMware vSphere content-based read caching (CBRC). There are several advantages to enabling VSA, including containing boot storms through host-side caching of commonly used blocks. It also helps the steady-state performance of desktops that use the same applications. Because the Pure Storage FlashArray provides a large number of IOPS at very low latency, the extra layer of caching at the host level is not needed. The biggest disadvantage of VSA is the time it takes to recompose and refresh desktops, as the disk digest file has to be rebuilt every time the image file changes. VSA also consumes host-side memory for caching and host CPU for building digest files. For shorter desktop recompose times, we recommend turning off VSA.
  4. Tune maximum concurrent vCenter operations. The default limits for concurrent vCenter operations are defined in the View configuration’s advanced vCenter settings. These values are quite conservative and can be increased. The Pure Storage FlashArray can sustain more concurrent operations, including:
    • Max concurrent vCenter provisioning operations (recommended value >= 50)
    • Max concurrent power operations (recommended value >= 50)
    • Max concurrent View Composer operations (recommended value >= 50)

The higher values will drastically cut down the amount of time needed to accomplish typical View Administrative tasks such as recomposing or creating a new pool.

view.png

Some caveats include:

  1. These settings are global and affect all pools. Pools on other, slower disk arrays will suffer if you raise these values, so increasing them can have adverse effects in mixed environments.
  2. The vCenter configuration, especially the number of vCPUs, the amount of memory, and the backing storage, is affected by these settings. To attain the best possible performance, size vCenter according to VMware’s sizing guidelines and increase resources as needed if you notice that a resource has become saturated.

Space Management and Reclamation

VMware Dead Space Overview

There are two places where dead space can be introduced:

  • VMFS: When an administrator deletes a virtual disk or an entire virtual machine (or moves it to another datastore), the space that used to store that virtual disk or virtual machine is now dead on the array. The array does not know that the space has been freed, so those blocks become dead space.
  • In-guest: When a file is moved or deleted from a guest file system inside a virtual machine on a virtual disk, the underlying VMFS does not know that a block is no longer in use by the virtual disk, and consequently neither does the array. That space is now also dead space.

Dead space can therefore accumulate in two ways. Fortunately, VMware has methods for dealing with both that leverage the UNMAP support of the FlashArray.

Space Reclamation with VMFS

Space reclamation with VMFS differs depending on the version of ESXi. VMware has supported UNMAP in various forms since ESXi 5.0. This document is only going to focus on UNMAP implementation for ESXi 5.5 and later. For previous UNMAP behaviors, refer to VMware documentation.

In vSphere 5.5 and 6.0, VMFS UNMAP is a manual process, executed on demand by an administrator. In vSphere 6.5, VMFS UNMAP is an automatic process that gets executed by ESXi as needed without administrative intervention.

VMFS UNMAP in vSphere 5.5 through 6.0

To reclaim space in vSphere 5.5 and 6.0, UNMAP is available in the command “esxcli”. UNMAP can be run anywhere esxcli is installed and therefore does not require an SSH session:

esxcli storage vmfs unmap -l <datastore name> -n <blocks per iteration>

UNMAP with esxcli is an iterative process. The block count specifies how large each iteration is. If you do not specify a block count, 200 blocks will be the default value (each block is 1 MB, so each iteration issues UNMAP to a 200 MB section at a time). The operation runs UNMAP against the free space of the VMFS volume until the entirety of the free space has been reclaimed. If the free space is not perfectly divisible by the block count, the block count will be reduced at the final iteration to whatever amount of space is left.
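
The iteration behavior described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not of how ESXi is implemented; the function name `unmap_iterations` is hypothetical.

```python
from typing import List


def unmap_iterations(free_mb: int, block_count: int = 200) -> List[int]:
    """Return the size in MB (1 MB blocks) of each UNMAP iteration."""
    if free_mb < 0 or block_count <= 0:
        raise ValueError("free_mb must be >= 0 and block_count must be > 0")
    full, remainder = divmod(free_mb, block_count)
    sizes = [block_count] * full
    if remainder:
        # The final iteration shrinks to whatever free space is left.
        sizes.append(remainder)
    return sizes


print(unmap_iterations(free_mb=1050, block_count=200))
# [200, 200, 200, 200, 200, 50]
```

With the default of 200 blocks, a datastore with 1,050 MB free needs five full iterations and one final 50 MB iteration.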

While the FlashArray can handle very large values for this operation, ESXi does not support increasing the block count any larger than 1% of the free capacity of the target VMFS volume. Consequently, the best practice for block count during UNMAP is no greater than 1% of the free space. As an example, if a VMFS volume has 1,048,576 MB free, the largest block count supported is 10,485 (always round down). If you specify a larger value, the command will still be accepted, but ESXi will override the value back down to the default of 200 blocks (200 MB), which will dramatically slow down the operation.

It is imperative to calculate the block count as 1% of the free space only when that capacity is expressed in megabytes, because VMFS 5 blocks are 1 MB each. This allows simple and accurate identification of the largest allowable block count for a given datastore. Using GB or TB can lead to rounding errors and, as a result, a block count that is too large. Always round down to the nearest whole MB when calculating this number (do not round up).
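
As a worked example of the 1% rule, the short Python sketch below computes the largest block count from free space in MB and shows how rounding in GB first can overstate it. The function name `max_block_count` and the second free-space figure are illustrative assumptions.

```python
def max_block_count(free_mb: int) -> int:
    """Return 1% of the free space in MB, rounded down to a whole block."""
    if free_mb < 0:
        raise ValueError("free_mb must not be negative")
    return free_mb // 100


print(max_block_count(1_048_576))  # 10485, matching the example in the text

# Rounding in GB first overstates the limit: 1,048,166 MB free is about
# 1023.6 GB, which rounds to 1024 GB and suggests 10485 instead of 10481.
exact = max_block_count(1_048_166)
from_rounded_gb = max_block_count(1024 * 1024)
print(exact, from_rounded_gb)  # 10481 10485
```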

BEST PRACTICE: For shortest UNMAP duration, use a large block count.

There are other methods to run or even schedule UNMAP, such as PowerCLI, vRealize Orchestrator and the FlashArray vSphere Web Client Plugin. These methods are outside the scope of this document. Refer to the respective VMware and FlashArray integration documents for further detail.

If an UNMAP process seems to be slow, you can check whether the block count value was overridden by examining the hostd.log file in the /var/log/ directory on the target ESXi host. For every UNMAP operation there is a series of messages that report the block count for every iteration. Look for a line that contains the UUID of the VMFS volume being reclaimed. The line will look like the example below:

Unmap: Async Unmapped 5000 blocks from volume 545d6633-4e026dce-d8b2-90e2ba392174 
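
A minimal Python sketch for pulling the block count and volume UUID out of such a line is shown below. The pattern is derived only from the single example line above; real hostd.log entries may carry timestamps or other prefixes, which is why the sketch searches within the line rather than matching it from the start. The function name `parse_unmap_line` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Pattern based solely on the example log line shown in this document.
UNMAP_LINE = re.compile(
    r"Unmap: Async Unmapped (?P<blocks>\d+) blocks from volume (?P<uuid>[0-9a-f-]+)"
)


def parse_unmap_line(line: str) -> Optional[Tuple[int, str]]:
    """Return (block_count, volume_uuid), or None if the line does not match."""
    match = UNMAP_LINE.search(line)
    if match is None:
        return None
    return int(match.group("blocks")), match.group("uuid")


sample = "Unmap: Async Unmapped 5000 blocks from volume 545d6633-4e026dce-d8b2-90e2ba392174"
print(parse_unmap_line(sample))
# (5000, '545d6633-4e026dce-d8b2-90e2ba392174')
```

A block count of 200 in these messages, when a larger value was requested, indicates that ESXi overrode the requested value.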

From ESXi 5.5 Patch 3 and later, any UNMAP operation against a datastore that is 75% or more full will use a block count of 200, regardless of any block count specified in the command. For more information, refer to the VMware KB article here.
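
The two override rules (the 1% limit and the 75%-full rule) can be combined into one small Python sketch that predicts which block count will actually be used. This is a simplified model of the behavior described in this section, assuming ESXi 5.5 Patch 3 or later; the function name `effective_block_count` is hypothetical and this is not ESXi's own logic.

```python
DEFAULT_BLOCK_COUNT = 200


def effective_block_count(requested: int, capacity_mb: int, free_mb: int) -> int:
    """Predict the block count used, per the rules described in the text."""
    if capacity_mb <= 0 or not 0 <= free_mb <= capacity_mb:
        raise ValueError("need capacity_mb > 0 and 0 <= free_mb <= capacity_mb")
    used_mb = capacity_mb - free_mb
    # Rule 1: a datastore that is 75% or more full always uses 200 blocks.
    if used_mb * 4 >= capacity_mb * 3:
        return DEFAULT_BLOCK_COUNT
    # Rule 2: a request above 1% of free space is overridden to 200 blocks.
    if requested > free_mb // 100:
        return DEFAULT_BLOCK_COUNT
    return requested


# 2 TB datastore, half free: the 1% value is honored.
print(effective_block_count(10485, capacity_mb=2_097_152, free_mb=1_048_576))
# 4 TB datastore with the same free space is 75% full: falls back to 200.
print(effective_block_count(10485, capacity_mb=4_194_304, free_mb=1_048_576))
```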

VMFS UNMAP in vSphere 6.5

In the ESXi 6.5 release, VMware introduced automatic UNMAP support for VMFS volumes. ESXi 6.5 introduced a new version of VMFS, version 6. With VMFS-6, there is a new setting for all VMFS-6 volumes called UNMAP priority. This defaults to low.

unmap1.png

Pure Storage recommends that this be configured to “low” and not disabled. VMware only offers a low priority for ESXi 6.5—medium and high priorities were not enabled in the ESXi kernel.

Automatic UNMAP with vSphere 6.5 is an asynchronous task: reclamation does not occur immediately and typically takes 12 to 24 hours to complete. Each ESXi 6.5 host has an UNMAP “crawler”, and the crawlers work in tandem to reclaim space on all VMFS-6 volumes the hosts have access to. If, for some reason, the space needs to be reclaimed immediately, the esxcli UNMAP operation described in the previous section can be run.

Please note that VMFS-6 Automatic UNMAP will not be issued to inactive datastores. In other words, if a datastore does not have actively running virtual machines on it, the datastore will be ignored. In those cases, the simplest option to reclaim them is to run the traditional esxcli UNMAP command.

Pure Storage does support disabling automatic UNMAP if the customer prefers that for some reason. However, to provide the most efficient and accurate environment, leaving it enabled is highly recommended.

VMFS UNMAP in vSphere 6.7 and later

In ESXi 6.7 VMware introduced a new option for utilizing automatic UNMAP as well as adding additional configuration options for the existing features available in ESXi 6.5.

The two methods available for automatic UNMAP in ESXi 6.7 and later are:

  • fixed (new to ESXi 6.7)
  • priority (medium and high are new to ESXi 6.7)

Since priority based space reclamation is already familiar from the VMFS UNMAP in vSphere 6.5 section above, let's start with the priority enhancements available in 6.7 before reviewing "fixed" space reclamation.

In ESXi 6.5 you only had one option available for automatic space reclamation (priority based) and a singular option of "low". With ESXi 6.7 that now changes and you have the added options of "medium" and "high".

The differences between the three are as follows:**

Space Reclamation Priority | Description
None | Disables UNMAP operations for the datastore.
Low (default) | Sends the UNMAP command at a rate of approximately 25–50 MB per second.
Medium | Sends the UNMAP command at a rate of approximately 50–100 MB per second.
High | Sends the UNMAP command at a rate of over 100 MB per second.

**The information in this table comes from VMware knowledge sources here.

As noted above, priority based space reclamation is a variable process whose reclamation speed depends on the option you have chosen. This gives ESXi flexibility in how quickly space is recovered, depending on the current load of the datastore(s).

Also available in ESXi 6.7 is a new "fixed" space reclamation method. This option provides the end-user with the ability to determine how quickly (in MB/s) UNMAP operations can happen to the backing storage at a "fixed" rate. The options vary from 100 MB/s up to 2000 MB/s. Introducing this option provides the end-user with the ability to set a static rate for space reclamation as well as allowing for a much more aggressive reclamation process if the backing storage is able to ingest the higher load.
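
To get a feel for what these rates mean in practice, the Python sketch below estimates how long reclaiming a given amount of dead space would take. It is an idealized calculation that assumes the rate is sustained for the whole run and ignores crawler scheduling and datastore load, so real durations will differ. The function name `reclaim_hours` and the 1 TB example are illustrative assumptions.

```python
def reclaim_hours(dead_space_mb: float, rate_mb_per_s: float) -> float:
    """Idealized time in hours to UNMAP dead_space_mb at a constant rate."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate_mb_per_s must be positive")
    return dead_space_mb / rate_mb_per_s / 3600


dead_space_mb = 1_048_576  # 1 TB of dead space, expressed in MB

# Rates taken from the priority table and the fixed-rate range above.
for label, rate in [
    ("priority low, 25 MB/s", 25),
    ("priority low, 50 MB/s", 50),
    ("fixed, 100 MB/s", 100),
    ("fixed, 2000 MB/s", 2000),
]:
    print(f"{label}: {reclaim_hours(dead_space_mb, rate):.2f} h")
```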

Pure Storage still recommends utilizing priority based reclamation set to the default option of "low". As additional testing is performed this recommendation may change in the future.

Modifying VMFS UNMAP priorities

The first opportunity to configure space reclamation is when you create a new datastore. The options here are limited: you can only disable space reclamation (not recommended) or use priority based reclamation at low priority (the default).

create-ds-space-reclamation.png

Let's say however that you want to use the fixed space reclamation method for a higher rate of UNMAPs being sent to the underlying storage provider.

Once your datastore has been created you can right click on the datastore and select "Edit Space Reclamation". From here you can select the desired speed and save the changes.

This is illustrated below.

rightclick-ds-option.png

fixed-rate-space-reclamation.png

The last scenario is changing the priority level from "low" to "medium" or "high". As the screenshot above shows, the GUI offers no option to modify the space reclamation priority, so this change must be made from the command line interface (CLI) on the ESXi host. To do so, review the VMware article "Use the ESXCLI Command to Change the Space Reclamation Parameters".

Space Reclamation In-Guest

The discussion above speaks only about space reclamation directly on a VMFS volume which pertains to dead space accumulated by the deletion or migration of virtual machines and virtual disks. Running UNMAP on a VMFS only removes dead space in that scenario. But, as mentioned earlier, dead space can accumulate higher up in the VMware stack—inside of the virtual machine itself.

When a guest writes data to a file system on a virtual disk, the required capacity is allocated on the VMFS (if not already allocated) by expanding the file that represents the virtual disk. The data is then committed down to the array. When that data is deleted by the guest, the guest OS filesystem is cleared of the file, but this deletion is not reflected by the virtual disk allocation on the VMFS, nor the physical capacity on the array. To ensure the below layers are accurately reporting used space, in-guest UNMAP should be enabled.

Understanding In-Guest UNMAP in ESXi

Prior to ESXi 6.0 and virtual machine hardware version 11, guests could not leverage native UNMAP capabilities on a virtual disk because ESXi virtualized the SCSI layer and did not report UNMAP capability up through to the guest. So even if guest operating systems supported UNMAP natively, they could not issue UNMAP to a file system residing on a virtual disk. Consequently, reclaiming this space was a manual and tedious process.

In ESXi 6.0, VMware has resolved this problem and streamlined the reclamation process. With in-guest UNMAP support, guests running in a virtual machine using hardware version 11 can now issue UNMAP directly to virtual disks. The process is as follows:

  1. A guest application or user deletes a file from a file system residing on a thin virtual disk
  2. The guest automatically (or manually) issues UNMAP to the guest file system on the virtual disk
  3. The virtual disk is then shrunk in accordance with the amount of space reclaimed inside of it.
  4. If EnableBlockDelete is enabled, UNMAP will then be issued to the VMFS volume for the space that previously was held by the thin virtual disk. The capacity is then reclaimed on the FlashArray.

Prior to ESXi 6.0, the parameter EnableBlockDelete was a defunct option that was previously only functional in very early versions of ESXi 5.0 to enable or disable automated VMFS UNMAP. This option is now functional in ESXi 6.0 and has been re-purposed to allow in-guest UNMAP to be translated down to the VMFS and accordingly the SCSI volume. By default, EnableBlockDelete is disabled and can be enabled via the vSphere Web Client or CLI utilities.

enable-block-delete.png

In-guest UNMAP support does not actually require this parameter to be enabled, though. Enabling this parameter allows for end-to-end UNMAP or, in other words, for in-guest UNMAP commands to be passed down to the VMFS layer. For this reason, enabling this option is a best practice for ESXi 6.x and later.

Enable the option “VMFS3.EnableBlockDelete” on ESXi 6.x & 7.x hosts where VMFS 5 datastores are in use. This is disabled by default and is not required for VMFS 6 datastores. To enable set the value to "1".

For more information on EnableBlockDelete and VMFS-6, you can refer to the following blog post here.

ESXi 6.5 expands support for in-guest UNMAP to additional guest types. In ESXi 6.0, in-guest UNMAP is supported only with Windows Server 2012 R2 (or Windows 8) and later. ESXi 6.5 introduces support for Linux operating systems. The underlying reason is that ESXi 6.0 and earlier only supported SCSI version 2. Windows uses SCSI-2 UNMAP and therefore could take advantage of this feature set. Linux uses SCSI version 5 and could not. In ESXi 6.5, VMware enhanced its SCSI support up to SCSI-6, which allows guests like Linux to issue commands that they could not before.

Using the Linux tool sg_inq (part of the sg3_utils package), you can see, through an excerpt of the response, the SCSI support difference between the ESXi versions:

unmap3.png

You can note the differences in SCSI support level and also in the product revision of the virtual disks themselves (version 1 to 2).

It is important to note that simply upgrading to ESXi 6.5 will not provide SCSI-6 support. The virtual hardware for the virtual machine must be upgraded to version 13 once ESXi has been upgraded. VM hardware version 13 is what provides the additional SCSI support to the guest.

The following are the requirements for in-guest UNMAP to properly function:

  1. The target virtual disk must be a thin virtual disk. Thick-type virtual disks do not support UNMAP.
  2. For Windows In-Guest UNMAP:
    1. ESXi 6.0 and later
    2. VM Hardware version 11 and later
  3. For Linux In-Guest UNMAP:
    1. ESXi 6.5 and later
    2. VM Hardware version 13 and later
  4. If Change Block Tracking (CBT) is enabled for a virtual disk, In-Guest UNMAP for that virtual disk is only supported starting with ESXi 6.5
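
The requirements above can be expressed as a simple check. The Python sketch below encodes only the four rules in this list; it does not verify the guest OS release (for example Windows Server 2012 R2 or later) or the file system, and the `VmDisk` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class VmDisk:
    guest_os: str                   # "windows" or "linux"
    esxi_version: Tuple[int, int]   # e.g. (6, 5) for ESXi 6.5
    hw_version: int                 # virtual machine hardware version
    thin: bool                      # True for a thin virtual disk
    cbt_enabled: bool = False       # Change Block Tracking


def in_guest_unmap_supported(disk: VmDisk) -> bool:
    """Apply the requirements listed in this section to one virtual disk."""
    if not disk.thin:
        return False  # thick-type virtual disks do not support UNMAP
    if disk.cbt_enabled and disk.esxi_version < (6, 5):
        return False  # CBT plus in-guest UNMAP needs ESXi 6.5 or later
    if disk.guest_os == "windows":
        return disk.esxi_version >= (6, 0) and disk.hw_version >= 11
    if disk.guest_os == "linux":
        return disk.esxi_version >= (6, 5) and disk.hw_version >= 13
    return False  # other guest types are not covered by this document


print(in_guest_unmap_supported(VmDisk("linux", (6, 5), 13, thin=True)))   # True
print(in_guest_unmap_supported(VmDisk("linux", (6, 5), 11, thin=True)))   # False
```

The second call returns False because upgrading ESXi alone is not enough: the virtual hardware must also be at version 13 for Linux guests.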

In-Guest UNMAP Alignment Requirements

VMware ESXi requires that any UNMAP request sent down by a guest be aligned to 1 MB. For a variety of reasons, not all UNMAP requests are aligned this way, and in ESXi 6.5 (before Patch 1) and earlier a large percentage of them failed. In ESXi 6.5 Patch 1, ESXi was altered to be more tolerant of misaligned UNMAP requests. See the VMware patch information here.

Prior to this, any UNMAP request that was even partially misaligned would fail entirely, leading to no reclamation. In ESXi 6.5 P1, any portion of an UNMAP request that is aligned is accepted and passed along to the underlying array. Misaligned portions are accepted but not passed down. Instead, the blocks referred to by the misaligned portions are zeroed out with WRITE SAME. The benefit of this behavior on the FlashArray is that zeroing is identical in behavior to UNMAP, so all of the space is reclaimed regardless of misalignment.
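
The ESXi 6.5 P1 handling can be illustrated with a short Python sketch that splits a byte range on 1 MB boundaries into the aligned portion (passed down as UNMAP) and the misaligned edges (zeroed with WRITE SAME). This models the behavior as described in this section; whether ESXi computes the split in exactly this way is an assumption, and the function name `split_unmap` is hypothetical.

```python
from typing import List, Optional, Tuple

MB = 1024 * 1024
Range = Tuple[int, int]  # (start_byte, end_byte), end exclusive


def split_unmap(start: int, end: int, align: int = MB) -> Tuple[Optional[Range], List[Range]]:
    """Return (aligned range passed down as UNMAP, ranges zeroed instead)."""
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    aligned_start = -(-start // align) * align  # round start up to a boundary
    aligned_end = (end // align) * align        # round end down to a boundary
    if aligned_start >= aligned_end:
        # No whole aligned chunk: the entire request is misaligned.
        return None, ([(start, end)] if end > start else [])
    zeroed: List[Range] = []
    if start < aligned_start:
        zeroed.append((start, aligned_start))
    if aligned_end < end:
        zeroed.append((aligned_end, end))
    return (aligned_start, aligned_end), zeroed


# A request from 0.5 MB to 3.25 MB: 1 MB to 3 MB is passed down as UNMAP,
# and the two misaligned edges are zeroed.
passed, zeroed = split_unmap(512 * 1024, 3 * MB + 256 * 1024)
print(passed, zeroed)
```

Before Patch 1, the same request would have failed as a whole because part of it is misaligned.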

BEST PRACTICE: Apply ESXi 6.5 Patch Release ESXi650-201703001 (2148989) as soon as possible to be able to take full advantage of in-guest UNMAP.

In-Guest UNMAP with Windows

Starting with ESXi 6.0, In-Guest UNMAP is supported with Windows 2012 R2 and later Windows-based operating systems. For a full report of UNMAP support with Windows, please refer to Microsoft documentation.

NTFS supports automatic UNMAP by default—this means (assuming the underlying storage supports it) Windows will issue UNMAP to the blocks a file used to consume immediately once it has been deleted or moved.

Automatic UNMAP is enabled by default in Windows. This can be verified with the following CLI command:

fsutil behavior query DisableDeleteNotify

If DisableDeleteNotify is set to 0, UNMAP is enabled. Setting it to 1 disables it. Pure Storage recommends that UNMAP remain enabled. To set it back to enabled, use the following command:

fsutil behavior set DisableDeleteNotify 0

fsutil1.png

Windows also supports manual UNMAP, which can be run on-demand or per a schedule. This is performed using the Disk Optimizer tool. Thin virtual disks can be identified in the tool as volume media types of “thin provisioned drive”—these are the volumes that support UNMAP.

fsutil2.png

Select the drive and click “Optimize”. Or configure a scheduled optimization.

Windows prior to ESXi 6.5 Patch 1

Ordinarily, this would work with the default configuration of NTFS, but VMware enforces additional UNMAP alignment that requires a non-default NTFS configuration. To enable in-guest UNMAP in Windows for a given NTFS volume, that volume must be formatted using a 32K or 64K allocation unit size. This forces far more Windows UNMAP operations to be aligned with VMware requirements.

ntfs1.png

64K is also the standard recommendation for SQL Server installations, which makes this a generally accepted change. To check that existing NTFS volumes use an allocation unit size that supports UNMAP, run this two-line PowerShell command to list a report:

$wql = "SELECT Label, Blocksize, Name FROM Win32_Volume WHERE FileSystem='NTFS'"
Get-WmiObject -Query $wql -ComputerName '.' | Select-Object Label, Blocksize, Name

ntfs2.png

BEST PRACTICE: Use the 32 or 64K Allocation Unit Size for NTFS to enable automatic UNMAP in a Windows virtual machine.

Due to alignment issues, the manual UNMAP tool (Disk Optimizer) is not particularly effective as often most UNMAPs are misaligned and will fail.

Windows with ESXi 6.5 Patch 1 and Later

As of ESXi 6.5 Patch 1, all NTFS allocation unit sizes will work with in-guest UNMAP. So at this ESXi level no unit size change is required to enable this functionality. That being said, there is additional benefit to using a 32 or 64 K allocation unit. While all sizes will allow all space to be reclaimed on the FlashArray, a 32 or 64 K allocation unit will cause more UNMAP requests to be aligned and therefore more of the underlying virtual disk will be returned to the VMFS (more of it will be shrunk).

The manual tool, Disk Optimizer, now works quite well and can be used. If UNMAP is disabled in Windows (it is enabled by default) this tool can be used to reclaim space on-demand or via a schedule. If automatic UNMAP is enabled, there is generally no need to use this tool.

For more information on this, please read the following blog post here.

In-Guest UNMAP with Linux

Starting with ESXi 6.5, In-Guest UNMAP is supported with Linux-based operating systems and most common file systems (Ext4, Btrfs, JFS, XFS, F2FS, VFAT). For a full report of UNMAP support with Linux configurations, please refer to appropriate Linux distribution documentation. To enable this behavior it is necessary to use Virtual Machine Hardware Version 13 or later.

Linux supports both automatic and manual methods of UNMAP.

Linux file systems do not support automatic UNMAP by default—this behavior needs to be enabled during the mount operation of the file system. This is achieved by mounting the file system with the “discard” option.

pureuser@ubuntu:/mnt$ sudo mount /dev/sdd /mnt/unmaptest -o discard

When mounted with the discard option, Linux will issue UNMAP to the blocks a file used to consume immediately once it has been deleted or moved.

Pure Storage does not require this feature to be enabled, but generally recommends doing so to keep capacity information correct throughout the storage stack.

BEST PRACTICE: Mount Linux filesystems with the “discard” option to enable in-guest UNMAP for Linux-based virtual machines.

Linux with ESXi 6.5

In ESXi 6.5, automatic UNMAP is supported and is able to reclaim most of the identified dead space. In general, Linux aligns most UNMAP requests in automatic UNMAP and therefore is quite effective in reclaiming space.

The manual method, fstrim, does not align initial UNMAP requests and therefore fails entirely.

linux1.png

Linux with ESXi 6.5 Patch 1 and Later

In ESXi 6.5 Patch 1 and later, automatic UNMAP is even more effective, now that even the small number of misaligned UNMAPs are handled. Furthermore, the manual method via fstrim works as well. So in this ESXi version, either method is a valid option.

What to expect after UNMAP is run on the FlashArray

The behavior of space reclamation (UNMAP) on a data-reducing array such as the FlashArray is somewhat changed and this is due to the concept of data-deduplication. When a host runs UNMAP (ESXi or otherwise), an UNMAP SCSI command is issued to the storage device that indicates what logical blocks are no longer in use. Traditionally, a logical block address referred to a specific part of an underlying disk on an array. So when UNMAP was issued, the physical space was always reclaimed because there was a direct correlation between a logical block and a physical cylinder/track/block on the storage device. This is not necessarily the case on a data reduction array.

A logical block on a FlashArray volume does not refer directly to a physical location on flash. Instead, if there is data written to that block, there is just a reference to a metadata pointer. That pointer then refers to a physical location. If UNMAP is executed against that block, only the metadata pointer is guaranteed to be removed. The physical data will remain if it is deduplicated, meaning other blocks (anywhere else on the array) have metadata pointers to that data too. A physical block is only reclaimed once the last pointer on your array to that data is removed. Therefore, UNMAP only directly removes metadata pointers. The reclamation of physical capacity is only a possible consequential result of UNMAP.
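
The pointer behavior described above can be illustrated with a toy reference-counting model in Python. It is purely conceptual and says nothing about how the FlashArray is actually implemented; the class `ToyDedupArray` and its methods are hypothetical.

```python
from typing import Dict


class ToyDedupArray:
    """Toy model: logical blocks point at deduplicated physical data."""

    def __init__(self) -> None:
        self.pointers: Dict[int, str] = {}  # logical block -> data fingerprint
        self.refcount: Dict[str, int] = {}  # fingerprint -> number of pointers

    def write(self, lba: int, fingerprint: str) -> None:
        self.unmap(lba)  # an overwrite drops the old pointer first
        self.pointers[lba] = fingerprint
        self.refcount[fingerprint] = self.refcount.get(fingerprint, 0) + 1

    def unmap(self, lba: int) -> bool:
        """Remove the pointer; return True only if physical data was freed."""
        fingerprint = self.pointers.pop(lba, None)
        if fingerprint is None:
            return False
        self.refcount[fingerprint] -= 1
        if self.refcount[fingerprint] == 0:
            del self.refcount[fingerprint]  # last pointer gone: reclaim
            return True
        return False

    def physical_blocks(self) -> int:
        return len(self.refcount)


array = ToyDedupArray()
array.write(0, "A")
array.write(1, "A")  # deduplicated against block 0
array.write(2, "B")
print(array.unmap(0), array.physical_blocks())  # False 2 (block 1 still points at "A")
print(array.unmap(2), array.physical_blocks())  # True 1  ("B" had a single pointer)
```

UNMAP always removes the metadata pointer, but physical capacity is returned only when the last pointer to that data is removed.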

Herein lies the importance of UNMAP: it keeps the metadata tables of the FlashArray accurate, which allows space to be reclaimed as soon as possible. Generally, some physical space will be returned immediately upon reclamation, as not everything is dedupable. In the end, the amount of reclaimed space relies heavily on how dedupable the data set is: the higher the dedupability, the lower the likelihood, amount, and immediacy of physical space being reclaimed. The fact to remember is that UNMAP is important for the long-term “health” of space reporting and usage on the array.

In addition to using the Pure Storage vSphere Web Client Plugin, standard provisioning methods through the FlashArray GUI or FlashArray CLI can be utilized. This section highlights the end-to-end provisioning of storage volumes on the Pure Storage FlashArray from creation of a volume to formatting it on an ESXi host. The management simplicity is one of the guiding principles of FlashArray as just a few clicks are required to configure and provision storage to the server.

Space Reclamation with VMFS

Space reclamation with VMFS differs depending on the version of ESXi. VMware has supported UNMAP in various forms since ESXi 5.0. This document is only going to focus on UNMAP implementation for ESXi 5.5 and later. For previous UNMAP behaviors, refer to VMware documentation.

In vSphere 5.5 and 6.0, VMFS UNMAP is a manual process, executed on demand by an administrator. In vSphere 6.5, VMFS UNMAP is an automatic process that gets executed by ESXi as needed without administrative intervention.

VMFS UNMAP in vSphere 5.5 through 6.0

To reclaim space in vSphere 5.5 and 6.0, UNMAP is available in the command “esxcli”. UNMAP can be run anywhere esxcli is installed and therefore does not require an SSH session:

esxcli storage vmfs unmap -l <datastore name> -n (blocks per iteration)

UNMAP with esxcli is an iterative process. The block count specifies how large each iteration is. If you do not specify a block count, 200 blocks will be the default value (each block is 1 MB, so each iteration issues UNMAP to a 200 MB section at a time). The operation runs UNMAP against the free space of the VMFS volume until the entirety of the free space has been reclaimed. If the free space is not perfectly divisible by the block count, the block count will be reduced at the final iteration to whatever amount of space is left.

While the FlashArray can handle very large values for this operation, ESXi does not support increasing the block count any larger than 1% of the free capacity of the target VMFS volume. Consequently, the best practice for block count during UNMAP is no greater than 1% of the free space. So as an example, if a VMFS volume has 1,048,576 MB free, the largest block count supported is 10,485 (always round down). If you specify a larger value the command will still be accepted, but ESXi will override the value back down to the default of 200 MB, which will dramatically slow down the operation.

It is imperative to calculate the block count value based off of the 1% of the free space only when that capacity is expressed in megabytes—since VMFS 5 blocks are 1 MB each. This will allow for simple and accurate identification of the largest allowable block count for a given datastore. Using GB or TB can lead to rounding errors, and as a result, too large of a block count value. Always round off decimals to the lowest near MB in order to calculate this number (do not round up).

BEST PRACTICE: For shortest UNMAP duration, use a large block count.

There are other methods to run or even schedule UNMAP, such as PowerCLI, vRealize Orchestrator and the FlashArray vSphere Web Client Plugin. These methods are outside of the scope of this document, please refer to the respective VMware and FlashArray integration documents for further detail.

If an UNMAP process seems to be slow, you can check to see if the block count value was overridden. You can check the hostd.log file in the /var/log/ directory on the target ESXi host. For every UNMAP operation there will be a series of messages that dictate the block count for every iteration. Examine the log and look for a line that indicates the UUID of the VMFS volume being reclaimed, the line will look like the example below:

Unmap: Async Unmapped 5000 blocks from volume 545d6633-4e026dce-d8b2-90e2ba392174 

From ESXi 5.5 Patch 3 and later, any UNMAP operation against a datastore that is 75% or more full will use a block count of 200 regardless to any block count specified in the command. For more information refer to the VMware KB article here.

VMFS UNMAP in vSphere 6.5

In the ESXi 6.5 release, VMware introduced automatic UNMAP support for VMFS volumes. ESXi 6.5 introduced a new version of VMFS, version 6. With VMFS-6, there is a new setting for all VMFS-6 volumes called UNMAP priority. This defaults to low.

unmap1.png

Pure Storage recommends that this be configured to “low” and not disabled. VMware only offers a low priority for ESXi 6.5—medium and high priorities were not enabled in the ESXi kernel.

Automatic UNMAP with vSphere 6.5 is an asynchronous task and reclamation will not occur immediately and will typically take 12 to 24 hours to complete. Each ESXi 6.5 host has a UNMAP “crawler” that will work in tandem to reclaim space on all VMFS-6 volumes they have access to. If, for some reason, the space needs to be reclaimed immediately, the esxcli UNMAP operation described in the previous section can be run.

Please note that VMFS-6 Automatic UNMAP will not be issued to inactive datastores. In other words, if a datastore does not have actively running virtual machines on it, the datastore will be ignored. In those cases, the simplest option to reclaim them is to run the traditional esxcli UNMAP command.

Pure Storage does support automatic UNMAP being disabled, if that is, for some reason, preferred by the customer. But to provide the most efficient and accurate environment, it is highly recommended to be left enabled.

VMFS UNMAP in vSphere 6.7 and later

In ESXi 6.7 VMware introduced a new option for utilizing automatic UNMAP as well as adding additional configuration options for the existing features available in ESXi 6.5.

The two methods available for Automatic UNMAP in ESXi 6.7 and later:

  • fixed (new to ESXi 6.7)
  • priority (medium and high)

Since we are already familiar with what priority based space reclamation is from the VMFS UNMAP in vSphere 6.5 section above let's start with the enhancements available in 6.7 before reviewing "fixed" space reclamation.

In ESXi 6.5 you only had one option available for automatic space reclamation (priority based) and a singular option of "low". With ESXi 6.7 that now changes and you have the added options of "medium" and "high".

The differences between the three are as follows:**

Space Reclamation Priority Description
None Disables UNMAP operations for the datastore.
Low (default) Sends the UNMAP command at a rate of approximately 25–50 MB per second.
Medium Sends the UNMAP command at a rate of  approximately 50–100 MB per second.
High Sends the UNMAP command at a rate of over 100 MB per second.

**Information on chart was found from VMware knowledge sources here.

As you will note above, priority based space reclamation process is a variable process with speed of reclamation depending on what option you have chosen. This provides for flexibility within ESXi depending on the current load of the datastore(s) on how quickly space can be recovered.

Also available in ESXi 6.7 is a new "fixed" space reclamation method. This option provides the end-user with the ability to determine how quickly (in MB/s) UNMAP operations can happen to the backing storage at a "fixed" rate. The options vary from 100 MB/s up to 2000 MB/s. Introducing this option provides the end-user with the ability to set a static rate for space reclamation as well as allowing for a much more aggressive reclamation process if the backing storage is able to ingest the higher load.

Pure Storage still recommends utilizing priority based reclamation set to the default option of "low". As additional testing is performed this recommendation may change in the future.

Modifying VMFS UNMAP priorities

It is while creating a new datastore when you will have the first opportunity to configure space reclamation. The options here are limited as you can only disable space reclamation (not recommended) or use priority based reclamation at low priority (default option selected).

create-ds-space-reclamation.png

Let's say however that you want to use the fixed space reclamation method for a higher rate of UNMAPs being sent to the underlying storage provider.

Once your datastore has been created you can right click on the datastore and select "Edit Space Reclamation". From here you can select the desired speed and save the changes.

This is illustrated below.

rightclick-ds-option.png

fixed-rate-space-reclamation.png

The last scenario is changing the priority level from "low" to "medium" or "high". As the screenshot above shows, the GUI offers no option to modify the space reclamation priority, so this change must be made from the command line interface (CLI) on the ESXi host. If you wish to do this, please review the VMware article "Use the ESXCLI Command to Change the Space Reclamation Parameters" for instructions.
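
As a rough illustration of what such a CLI change might look like, the Python sketch below only assembles the argument list for an esxcli call; it does not run anything. The helper name is hypothetical, and the `--volume-label` and `--reclaim-priority` flags are written as the author of this sketch understands them from VMware's documentation, so verify them against the VMware article for your ESXi build before use.

```python
import shlex
from typing import List

# Priority names taken from the table earlier in this section.
VALID_PRIORITIES = ("none", "low", "medium", "high")


def reclaim_priority_command(datastore_label: str, priority: str) -> List[str]:
    """Build (but do not execute) an esxcli argument list.

    Assumption: flag names follow VMware's documented
    'esxcli storage vmfs reclaim config set' syntax; confirm before use.
    """
    if priority not in VALID_PRIORITIES:
        raise ValueError(f"priority must be one of {VALID_PRIORITIES}")
    return [
        "esxcli", "storage", "vmfs", "reclaim", "config", "set",
        "--volume-label", datastore_label,
        "--reclaim-priority", priority,
    ]


if __name__ == "__main__":
    # "MyDatastore" is a placeholder datastore name.
    print(shlex.join(reclaim_priority_command("MyDatastore", "medium")))
```

Building the command as a list rather than a single string avoids quoting problems when a datastore name contains spaces.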

VMFS UNMAP in vSphere 5.5 through 6.0

To reclaim space in vSphere 5.5 and 6.0, UNMAP is available through the “esxcli” command. UNMAP can be run from anywhere esxcli is installed and therefore does not require an SSH session:

esxcli storage vmfs unmap -l <datastore name> -n <blocks per iteration>

UNMAP with esxcli is an iterative process. The block count specifies how large each iteration is. If you do not specify a block count, 200 blocks will be the default value (each block is 1 MB, so each iteration issues UNMAP to a 200 MB section at a time). The operation runs UNMAP against the free space of the VMFS volume until the entirety of the free space has been reclaimed. If the free space is not perfectly divisible by the block count, the block count will be reduced at the final iteration to whatever amount of space is left.
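
The iteration behavior described above can be sketched in a few lines of Python. This is only a model of the description, not of the actual ESXi implementation, and the function name is hypothetical. It assumes 1 MB blocks, so free space in MB equals free space in blocks.

```python
from typing import Iterator

DEFAULT_BLOCK_COUNT = 200  # blocks per iteration when none is specified


def unmap_iterations(free_space_mb: int,
                     block_count: int = DEFAULT_BLOCK_COUNT) -> Iterator[int]:
    """Yield the size, in 1 MB blocks, of each UNMAP iteration."""
    if block_count <= 0:
        raise ValueError("block count must be positive")
    remaining = free_space_mb
    while remaining > 0:
        # The final iteration shrinks to whatever free space is left.
        size = min(block_count, remaining)
        yield size
        remaining -= size


if __name__ == "__main__":
    print(list(unmap_iterations(450)))  # [200, 200, 50]
```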

While the FlashArray can handle very large values for this operation, ESXi does not support increasing the block count any larger than 1% of the free capacity of the target VMFS volume. Consequently, the best practice for block count during UNMAP is no greater than 1% of the free space. As an example, if a VMFS volume has 1,048,576 MB free, the largest block count supported is 10,485 (always round down). If you specify a larger value the command will still be accepted, but ESXi will override the value back down to the default of 200 blocks (200 MB per iteration), which will dramatically slow down the operation.

Calculate the block count as 1% of the free space only when that capacity is expressed in megabytes, since VMFS 5 blocks are 1 MB each. This allows simple and accurate identification of the largest allowable block count for a given datastore. Using GB or TB can lead to rounding errors and, as a result, a block count that is too large. Always round the result down to the nearest whole MB (do not round up).
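
The calculation is simple enough to express as integer arithmetic. The Python sketch below (function name hypothetical) takes free space in MB and rounds down, and also shows how starting from a rounded GB figure can produce a value that is too large.

```python
def max_unmap_block_count(free_space_mb: int) -> int:
    """Largest block count (1 MB blocks) that stays within 1% of free space.

    Integer floor division rounds down, as the text above requires.
    A result of 0 means the free space is too small for this rule to help.
    """
    if free_space_mb < 0:
        raise ValueError("free space cannot be negative")
    return free_space_mb // 100


if __name__ == "__main__":
    # Example from the text: 1,048,576 MB free.
    print(max_unmap_block_count(1_048_576))  # 10485

    # Rounding pitfall: 1023.5 GB free is 1,048,064 MB, but if it is
    # read as "1024 GB" the computed value exceeds the real 1% limit.
    actual = max_unmap_block_count(1_048_064)      # 10480
    from_rounded_gb = max_unmap_block_count(1024 * 1024)  # 10485
    print(actual, from_rounded_gb, from_rounded_gb > actual)
```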

BEST PRACTICE: For shortest UNMAP duration, use a large block count.

There are other methods to run or even schedule UNMAP, such as PowerCLI, vRealize Orchestrator and the FlashArray vSphere Web Client Plugin. These methods are outside the scope of this document; please refer to the respective VMware and FlashArray integration documents for further detail.

If an UNMAP process seems slow, you can check whether the block count value was overridden by examining the hostd.log file in the /var/log/ directory on the target ESXi host. For every UNMAP operation there will be a series of messages that report the block count for each iteration. Look for a line that includes the UUID of the VMFS volume being reclaimed. The line will look like the example below:

Unmap: Async Unmapped 5000 blocks from volume 545d6633-4e026dce-d8b2-90e2ba392174 
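
If there are many such lines to check, a small script can pull out the block count and volume UUID. The Python sketch below assumes log lines contain text in exactly the form shown above; the function name is hypothetical, and other ESXi builds may word the message differently.

```python
import re
from typing import Optional, Tuple

# Pattern modeled on the single example line shown above.
UNMAP_LINE = re.compile(
    r"Unmap: Async Unmapped (\d+) blocks from volume ([0-9a-fA-F-]+)"
)


def parse_unmap_line(line: str) -> Optional[Tuple[int, str]]:
    """Return (block_count, volume_uuid), or None if the line does not match."""
    match = UNMAP_LINE.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


if __name__ == "__main__":
    sample = ("Unmap: Async Unmapped 5000 blocks from volume "
              "545d6633-4e026dce-d8b2-90e2ba392174")
    print(parse_unmap_line(sample))
```

A reported block count of 200 when a larger value was requested suggests that ESXi overrode the requested value.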

From ESXi 5.5 Patch 3 and later, any UNMAP operation against a datastore that is 75% or more full will use a block count of 200 regardless of any block count specified in the command. For more information refer to the VMware KB article here.
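
Putting the two override rules from this section together, the Python sketch below predicts which block count ESXi would actually use. It is a model of the rules as described here (the 1% limit and the 75% full rule), not of ESXi internals, and the function name is hypothetical.

```python
DEFAULT_BLOCK_COUNT = 200


def effective_block_count(requested: int, free_mb: int, capacity_mb: int) -> int:
    """Predict the block count used, per the rules described in this section."""
    if capacity_mb <= 0 or not 0 <= free_mb <= capacity_mb:
        raise ValueError("need 0 <= free_mb <= capacity_mb and capacity_mb > 0")
    used_mb = capacity_mb - free_mb
    # Rule 1 (ESXi 5.5 Patch 3 and later): datastore 75% or more full.
    # Integer comparison avoids floating-point edge cases at exactly 75%.
    if 4 * used_mb >= 3 * capacity_mb:
        return DEFAULT_BLOCK_COUNT
    # Rule 2: a request above 1% of free space falls back to the default.
    if requested > free_mb // 100:
        return DEFAULT_BLOCK_COUNT
    return requested


if __name__ == "__main__":
    # Half-full 2 TB datastore, request at the 1% limit: accepted.
    print(effective_block_count(10485, 1_048_576, 2_097_152))
    # Same datastore, request too large: overridden to 200.
    print(effective_block_count(20000, 1_048_576, 2_097_152))
```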

VMFS UNMAP in vSphere 6.5

In the ESXi 6.5 release, VMware introduced automatic UNMAP support for VMFS volumes. ESXi 6.5 introduced a new version of VMFS, version 6. With VMFS-6, there is a new setting for all VMFS-6 volumes called UNMAP priority. This defaults to low.

unmap1.png

Pure Storage recommends that this be configured to “low” and not disabled. VMware only offers a low priority for ESXi 6.5—medium and high priorities were not enabled in the ESXi kernel.

Automatic UNMAP with vSphere 6.5 is an asynchronous task: reclamation will not occur immediately and will typically take 12 to 24 hours to complete. Each ESXi 6.5 host runs an UNMAP “crawler”, and the crawlers work in tandem to reclaim space on all VMFS-6 volumes the hosts have access to. If, for some reason, the space needs to be reclaimed immediately, the esxcli UNMAP operation described in the previous section can be run.

Please note that VMFS-6 automatic UNMAP will not be issued to inactive datastores. In other words, if a datastore does not have actively running virtual machines on it, the datastore will be ignored. In those cases, the simplest way to reclaim space is to run the traditional esxcli UNMAP command.

Pure Storage does support disabling automatic UNMAP if, for some reason, the customer prefers that. However, to provide the most efficient and accurate environment, it is highly recommended that it be left enabled.

Space Reclamation In-Guest

The discussion above speaks only about space reclamation directly on a VMFS volume which pertains to dead space accumulated by the deletion or migration of virtual machines and virtual disks. Running UNMAP on a VMFS only removes dead space in that scenario. But, as mentioned earlier, dead space can accumulate higher up in the VMware stack—inside of the virtual machine itself.

When a guest writes data to a file system on a virtual disk, the required capacity is allocated on the VMFS (if not already allocated) by expanding the file that represents the virtual disk. The data is then committed down to the array. When that data is deleted by the guest, the guest OS filesystem is cleared of the file, but this deletion is not reflected by the virtual disk allocation on the VMFS, nor the physical capacity on the array. To ensure that the layers below report used space accurately, in-guest UNMAP should be enabled.

Understanding In-Guest UNMAP in ESXi

Prior to ESXi 6.0 and virtual machine hardware version 11, guests could not leverage native UNMAP capabilities on a virtual disk because ESXi virtualized the SCSI layer and did not report UNMAP capability up through to the guest. So even if guest operating systems supported UNMAP natively, they could not issue UNMAP to a file system residing on a virtual disk. Consequently, reclaiming this space was a manual and tedious process.

In ESXi 6.0, VMware has resolved this problem and streamlined the reclamation process. With in-guest UNMAP support, guests running in a virtual machine using hardware version 11 can now issue UNMAP directly to virtual disks. The process is as follows:

  1. A guest application or user deletes a file from a file system residing on a thin virtual disk.
  2. The guest automatically (or manually) issues UNMAP to the guest file system on the virtual disk.
  3. The virtual disk is then shrunk in accordance with the amount of space reclaimed inside of it.
  4. If EnableBlockDelete is enabled, UNMAP will then be issued to the VMFS volume for the space that previously was held by the thin virtual disk. The capacity is then reclaimed on the FlashArray.

Prior to ESXi 6.0, the parameter EnableBlockDelete was a defunct option that was previously only functional in very early versions of ESXi 5.0 to enable or disable automated VMFS UNMAP. This option is now functional in ESXi 6.0 and has been re-purposed to allow in-guest UNMAP to be translated down to the VMFS and accordingly the SCSI volume. By default, EnableBlockDelete is disabled and can be enabled via the vSphere Web Client or CLI utilities.

enable-block-delete.png

In-guest UNMAP support does not actually require this parameter to be enabled. Enabling it allows for end-to-end UNMAP; in other words, in-guest UNMAP commands are passed down to the VMFS layer. For this reason, enabling this option is a best practice for ESXi 6.x and later.

Enable the option “VMFS3.EnableBlockDelete” on ESXi 6.x and 7.x hosts where VMFS 5 datastores are in use. It is disabled by default and is not required for VMFS 6 datastores. To enable it, set the value to "1".
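
The interaction between disk type, VMFS version and EnableBlockDelete can be summarized as a small decision function. The Python sketch below is a simplified model of the statements in this section only: the function name is hypothetical, and for VMFS 6 it assumes that automatic UNMAP has not been disabled on the datastore.

```python
def in_guest_unmap_reaches_array(thin_disk: bool,
                                 vmfs_version: int,
                                 enable_block_delete: bool) -> bool:
    """Model whether space freed in the guest is passed down to the array."""
    if not thin_disk:
        # In-guest UNMAP applies to thin virtual disks only.
        return False
    if vmfs_version >= 6:
        # Per the text, EnableBlockDelete is not required on VMFS 6
        # (assumption: automatic UNMAP is left enabled on the datastore).
        return True
    # VMFS 5: the guest's UNMAP shrinks the virtual disk, but the space is
    # only released to the array when EnableBlockDelete is set to 1.
    return enable_block_delete


if __name__ == "__main__":
    print(in_guest_unmap_reaches_array(True, 5, False))  # False
    print(in_guest_unmap_reaches_array(True, 5, True))   # True
```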

For more information on EnableBlockDelete and VMFS-6, you can refer to the following blog post here.

ESXi 6.5 expands support for in-guest UNMAP to additional guest types. In ESXi 6.0, in-guest UNMAP is supported only with Windows Server 2012 R2 (or Windows 8) and later. ESXi 6.5 introduces support for Linux operating systems. The underlying reason is that ESXi 6.0 and earlier only supported SCSI version 2. Windows uses SCSI-2 UNMAP and therefore could take advantage of this feature set. Linux uses SCSI version 5 and could not. In ESXi 6.5, VMware enhanced their SCSI support to go up to SCSI-6, which allows guests like Linux to issue commands that they could not before.

Using the Linux tool sg_inq (part of the sg3_utils package), you can see, through an excerpt of the response, the SCSI support difference between the ESXi versions:

unmap3.png

You can note the differences in SCSI support level and also in the product revision of the virtual disks themselves (version 1 to 2).

It is important to note that simply upgrading to ESXi 6.5 will not provide SCSI-6 support. The virtual hardware for the virtual machine must be upgraded to version 13 once ESXi has been upgraded. VM hardware version 13 is what provides the additional SCSI support to the guest.

The following are the requirements for in-guest UNMAP to properly function:

  1. The target virtual disk must be a thin virtual disk. Thick-type virtual disks do not support UNMAP.
  2. For Windows In-Guest UNMAP:
    1. ESXi 6.0 and later
    2. VM Hardware version 11 and later
  3. For Linux In-Guest UNMAP:
    1. ESXi 6.5 and later
    2. VM Hardware version 13 and later
  4. If Change Block Tracking (CBT) is enabled for a virtual disk, In-Guest UNMAP for that virtual disk is only supported starting with ESXi 6.5
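
The requirements above lend themselves to a simple checklist function. The following Python sketch encodes only the rules in this list; the class, field names and guest family labels are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VirtualDisk:
    guest_family: str              # "windows" or "linux" (illustrative labels)
    esxi_version: Tuple[int, int]  # e.g. (6, 5) for ESXi 6.5
    hw_version: int                # virtual machine hardware version
    thin: bool
    cbt_enabled: bool = False


# Minimum (ESXi version, VM hardware version) per guest family, from the list.
MINIMUMS = {"windows": ((6, 0), 11), "linux": ((6, 5), 13)}


def unmap_blockers(disk: VirtualDisk) -> List[str]:
    """Return the reasons in-guest UNMAP would not work (empty list = OK)."""
    blockers = []
    if not disk.thin:
        blockers.append("virtual disk is not thin")
    if disk.guest_family not in MINIMUMS:
        blockers.append("guest family is not covered by the requirements list")
    else:
        min_esxi, min_hw = MINIMUMS[disk.guest_family]
        if disk.esxi_version < min_esxi:
            blockers.append(f"ESXi {min_esxi[0]}.{min_esxi[1]} or later required")
        if disk.hw_version < min_hw:
            blockers.append(f"VM hardware version {min_hw} or later required")
    if disk.cbt_enabled and disk.esxi_version < (6, 5):
        blockers.append("CBT is enabled, which requires ESXi 6.5 or later")
    return blockers


if __name__ == "__main__":
    linux_on_60 = VirtualDisk("linux", (6, 0), 11, thin=True)
    for reason in unmap_blockers(linux_on_60):
        print(reason)
```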

In-Guest UNMAP Alignment Requirements

VMware ESXi requires that any UNMAP request sent down by a guest be aligned to 1 MB. For a variety of reasons, not all UNMAP requests will be aligned as such, and in ESXi 6.5 and earlier a large percentage failed. In ESXi 6.5 Patch 1, ESXi has been altered to be more tolerant of misaligned UNMAP requests. See the VMware patch information here.

Prior to this, any UNMAP request that was even partially misaligned would fail entirely, leading to no reclamation. In ESXi 6.5 P1, any portion of an UNMAP request that is aligned will be accepted and passed along to the underlying array. Misaligned portions will be accepted but not passed down. Instead, the blocks referred to by the misaligned portions will be zeroed out with WRITE SAME. The benefit of this behavior on the FlashArray is that zeroing is identical in behavior to UNMAP, so all of the space will be reclaimed regardless of misalignment.
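
To make the Patch 1 behavior concrete, the Python sketch below splits a byte range into the portion that falls on 1 MB boundaries (passed down as UNMAP) and the leftover head and tail (zeroed with WRITE SAME). It illustrates the description above rather than ESXi's actual code; the function name is hypothetical, and it assumes byte offsets with 1 MB taken as 1,048,576 bytes.

```python
from typing import List, Optional, Tuple

MIB = 1024 * 1024
Range = Tuple[int, int]  # (start, end) byte offsets, end exclusive


def split_unmap_request(offset: int, length: int,
                        alignment: int = MIB) -> Tuple[Optional[Range], List[Range]]:
    """Return (aligned_range_or_None, misaligned_ranges) for one request."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    end = offset + length
    aligned_start = -(-offset // alignment) * alignment  # round up
    aligned_end = (end // alignment) * alignment         # round down
    if aligned_start >= aligned_end:
        # No fully aligned portion: the whole request is misaligned.
        return None, [(offset, end)]
    misaligned: List[Range] = []
    if offset < aligned_start:
        misaligned.append((offset, aligned_start))
    if aligned_end < end:
        misaligned.append((aligned_end, end))
    return (aligned_start, aligned_end), misaligned


if __name__ == "__main__":
    # A 3 MiB request that starts 512 KiB into the disk.
    aligned, leftovers = split_unmap_request(512 * 1024, 3 * MIB)
    print("UNMAP:", aligned)
    print("WRITE SAME (zero):", leftovers)
```

Before Patch 1, a request like the one in the example would have failed as a whole because its head and tail are misaligned.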

BEST PRACTICE: Apply ESXi 6.5 Patch Release ESXi650-201703001 (2148989) as soon as possible to be able to take full advantage of in-guest UNMAP.

In-Guest UNMAP with Windows

Starting with ESXi 6.0, In-Guest UNMAP is supported with Windows 2012 R2 and later Windows-based operating systems. For a full report of UNMAP support with Windows, please refer to Microsoft documentation.

NTFS supports automatic UNMAP by default—this means (assuming the underlying storage supports it) Windows will issue UNMAP to the blocks a file used to consume immediately once it has been deleted or moved.

Automatic UNMAP is enabled by default in Windows. This can be verified with the following CLI command:

fsutil behavior query DisableDeleteNotify

If DisableDeleteNotify is set to 0, UNMAP is ENABLED. Setting it to 1 DISABLES it. Pure Storage recommends that UNMAP remain enabled (a value of 0). To set it to 0, use the following command:

fsutil behavior set DisableDeleteNotify 0

fsutil1.png
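
Because the setting is inverted (0 means enabled), it is easy to misread. The Python sketch below interprets captured output from the query command. The function name is hypothetical, and the sample strings are assumptions: the exact output format varies between Windows versions, so the pattern only looks for the "DisableDeleteNotify = <digit>" part.

```python
import re
from typing import Optional


def unmap_enabled_from_fsutil(output: str) -> Optional[bool]:
    """Interpret 'fsutil behavior query DisableDeleteNotify' output.

    Returns True if UNMAP is enabled (value 0), False if disabled (value 1),
    or None if no value could be found. Only the first match is used.
    """
    match = re.search(r"DisableDeleteNotify\s*=\s*([01])", output)
    if match is None:
        return None
    return match.group(1) == "0"


if __name__ == "__main__":
    print(unmap_enabled_from_fsutil("DisableDeleteNotify = 0"))  # True
    print(unmap_enabled_from_fsutil("DisableDeleteNotify = 1"))  # False
```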

Windows also supports manual UNMAP, which can be run on-demand or per a schedule. This is performed using the Disk Optimizer tool. Thin virtual disks can be identified in the tool as volume media types of “thin provisioned drive”—these are the volumes that support UNMAP.

fsutil2.png

Select the drive and click “Optimize”, or configure a scheduled optimization.

Windows prior to ESXi 6.5 Patch 1

Ordinarily, this would work with the default configuration of NTFS, but VMware enforces additional UNMAP alignment that requires a non-default NTFS configuration. To enable in-guest UNMAP in Windows for a given NTFS volume, that volume must be formatted using a 32K or 64K allocation unit size. This will force far more Windows UNMAP operations to be aligned with VMware requirements.

ntfs1.png

64K is also the standard recommendation for SQL Server installations, which makes this a generally accepted change. To check that existing NTFS volumes are using the proper allocation unit size to support UNMAP, the following two-line PowerShell command can be run to list a report:

$wql = "SELECT Label, Blocksize, Name FROM Win32_Volume WHERE FileSystem='NTFS'"
Get-WmiObject -Query $wql -ComputerName '.' | Select-Object Label, Blocksize, Name

ntfs2.png
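
The Blocksize column in this report is expressed in bytes, so 32K and 64K appear as 32768 and 65536. As a small illustration, the Python sketch below filters a list of (name, block size) pairs, such as values copied from the report, down to the volumes that do not use a recommended size. The function name and sample data are hypothetical.

```python
from typing import Iterable, List, Tuple

# 32K and 64K allocation unit sizes, in bytes.
RECOMMENDED_BLOCK_SIZES = {32 * 1024, 64 * 1024}


def volumes_with_other_unit_sizes(
        volumes: Iterable[Tuple[str, int]]) -> List[str]:
    """Return names of volumes whose allocation unit is not 32K or 64K."""
    return [name for name, block_size in volumes
            if block_size not in RECOMMENDED_BLOCK_SIZES]


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    report = [("C:\\", 4096), ("D:\\", 65536), ("E:\\", 32768)]
    print(volumes_with_other_unit_sizes(report))
```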

BEST PRACTICE: Use the 32 or 64K Allocation Unit Size for NTFS to enable automatic UNMAP in a Windows virtual machine.

Due to alignment issues, the manual UNMAP tool (Disk Optimizer) is not particularly effective, because most of the UNMAPs it issues are misaligned and will fail.

Windows with ESXi 6.5 Patch 1 and Later

As of ESXi 6.5 Patch 1, all NTFS allocation unit sizes will work with in-guest UNMAP, so at this ESXi level no unit size change is required to enable this functionality. That being said, there is additional benefit to using a 32K or 64K allocation unit. While all sizes will allow all space to be reclaimed on the FlashArray, a 32K or 64K allocation unit will cause more UNMAP requests to be aligned, and therefore more of the underlying virtual disk will be returned to the VMFS (more of it will be shrunk).

The manual tool, Disk Optimizer, now works quite well and can be used. If UNMAP is disabled in Windows (it is enabled by default) this tool can be used to reclaim space on-demand or via a schedule. If automatic UNMAP is enabled, there is generally no need to use this tool.

For more information on this, please read the following blog post here.

In-Guest UNMAP with Linux

Starting with ESXi 6.5, In-Guest UNMAP is supported with Linux-based operating systems and most common file systems (Ext4, Btrfs, JFS, XFS, F2FS, VFAT). For a full report of UNMAP support with Linux configurations, please refer to appropriate Linux distribution documentation. To enable this behavior it is necessary to use Virtual Machine Hardware Version 13 or later.

Linux supports both automatic and manual methods of UNMAP.

Linux file systems do not support automatic UNMAP by default—this behavior needs to be enabled during the mount operation of the file system. This is achieved by mounting the file system with the “discard” option.

pureuser@ubuntu:/mnt$ sudo mount /dev/sdd /mnt/unmaptest -o discard

When mounted with the discard option, Linux will issue UNMAP to the blocks a file used to consume immediately once it has been deleted or moved.

Pure Storage does not require this feature to be enabled, but generally recommends doing so to keep capacity information correct throughout the storage stack.

BEST PRACTICE: Mount Linux filesystems with the “discard” option to enable in-guest UNMAP for Linux-based virtual machines.
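
To confirm that a file system is actually mounted with the option, the mount table can be inspected. The Python sketch below parses text in the /proc/mounts format (device, mount point, type, comma-separated options) and reports whether "discard" is present for a given mount point. The function name is hypothetical, and the sample line is made up to match the mount command above.

```python
def mounted_with_discard(mounts_text: str, mount_point: str) -> bool:
    """Return True if mount_point appears in mounts_text with 'discard' set."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, ...
        if len(fields) >= 4 and fields[1] == mount_point:
            return "discard" in fields[3].split(",")
    return False


if __name__ == "__main__":
    sample = "/dev/sdd /mnt/unmaptest ext4 rw,relatime,discard 0 0\n"
    print(mounted_with_discard(sample, "/mnt/unmaptest"))
    # On a live Linux guest, pass the contents of /proc/mounts instead:
    #   with open("/proc/mounts") as f:
    #       mounted_with_discard(f.read(), "/mnt/unmaptest")
```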

Linux with ESXi 6.5

In ESXi 6.5, automatic UNMAP is supported and is able to reclaim most of the identified dead space. In general, Linux aligns most UNMAP requests in automatic UNMAP and therefore is quite effective in reclaiming space.

The manual method, fstrim, does not align initial UNMAP requests and therefore fails entirely.

linux1.png

Linux with ESXi 6.5 Patch 1 and Later

In ESXi 6.5 Patch 1 and later, automatic UNMAP is even more effective, now that even the small number of misaligned UNMAPs are handled. Furthermore, the manual method via fstrim works as well. So in this ESXi version, either method is a valid option.

In-Guest UNMAP with Linux

Starting with ESXi 6.5, In-Guest UNMAP is supported with Linux-based operating systems and most common file systems (Ext4, Btrfs, JFS, XFS, F2FS, VFAT). For a full report of UNMAP support with Linux configurations, please refer to appropriate Linux distribution documentation. To enable this behavior it is necessary to use Virtual Machine Hardware Version 13 or later.

Linux supports both automatic and manual methods of UNMAP.

Linux file systems do not support automatic UNMAP by default—this behavior needs to be enabled during the mount operation of the file system. This is achieved by mounting the file system with the “discard” option.

pureuser@ubuntu:/mnt$ sudo mount /dev/sdd /mnt/unmaptest -o discard

When mounted with the discard option, Linux will issue UNMAP to the blocks a file used to consume immediately once it has been deleted or moved.

Pure Storage does not require this feature to be enabled, but generally recommends doing so to keep capacity information correct throughout the storage stack.

BEST PRACTICE: Mount Linux filesystems with the “discard” option to enable in-guest UNMAP for Linux-based virtual machines.

Linux with ESXi 6.5

In ESXi 6.5, automatic UNMAP is supported and is able to reclaim most of the identified dead space. In general, Linux aligns most UNMAP requests in automatic UNMAP and therefore is quite effective in reclaiming space.

The manual method, fstrim, does not align its initial UNMAP requests and therefore fails entirely.


Linux with ESXi 6.5 Patch 1 and Later

In ESXi 6.5 Patch 1 and later, automatic UNMAP is even more effective, now that even the small number of misaligned UNMAPs are handled. Furthermore, the manual method via fstrim works as well. So in this ESXi version, either method is a valid option.

What to expect after UNMAP is run on the FlashArray

Space reclamation (UNMAP) behaves somewhat differently on a data-reducing array such as the FlashArray than on a traditional array, and the reason is data deduplication. When a host runs UNMAP (ESXi or otherwise), an UNMAP SCSI command is issued to the storage device that indicates which logical blocks are no longer in use. Traditionally, a logical block address referred to a specific part of an underlying disk on an array. So when UNMAP was issued, the physical space was always reclaimed because there was a direct correlation between a logical block and a physical cylinder/track/block on the storage device. This is not necessarily the case on a data reduction array.

A logical block on a FlashArray volume does not refer directly to a physical location on flash. Instead, if there is data written to that block, there is just a reference to a metadata pointer. That pointer then refers to a physical location. If UNMAP is executed against that block, only the metadata pointer is guaranteed to be removed. The physical data will remain if it is deduplicated, meaning other blocks (anywhere else on the array) have metadata pointers to that data too. A physical block is only reclaimed once the last pointer on your array to that data is removed. Therefore, UNMAP only directly removes metadata pointers. The reclamation of physical capacity is only a possible consequential result of UNMAP.
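
The pointer behavior described above can be illustrated with a toy reference-counting model in Python. This is only a conceptual sketch: the class, its methods, and the use of a content hash as the deduplication key are assumptions made for illustration and say nothing about how the FlashArray is actually implemented.

```python
import hashlib


class ToyDedupArray:
    """Toy model: logical blocks hold pointers, physical data is reference counted."""

    def __init__(self):
        self.pointers = {}   # (volume, lba) -> content key
        self.refcount = {}   # content key -> number of pointers to it

    def write(self, volume, lba, data):
        # An overwrite first drops the old pointer for this logical block.
        self.unmap(volume, lba)
        key = hashlib.sha256(data).hexdigest()
        self.pointers[(volume, lba)] = key
        self.refcount[key] = self.refcount.get(key, 0) + 1

    def unmap(self, volume, lba):
        """Remove the pointer; return True only if physical data was reclaimed."""
        key = self.pointers.pop((volume, lba), None)
        if key is None:
            return False
        self.refcount[key] -= 1
        if self.refcount[key] == 0:
            # Last pointer on the array is gone, so the data can be freed.
            del self.refcount[key]
            return True
        return False

    def physical_blocks(self):
        return len(self.refcount)


if __name__ == "__main__":
    array = ToyDedupArray()
    array.write("vol1", 0, b"A")
    array.write("vol2", 7, b"A")   # same content, deduplicated
    array.write("vol1", 1, b"B")
    print(array.physical_blocks())   # 2 unique blocks stored
    print(array.unmap("vol1", 0))    # False: vol2 still points at the data
    print(array.unmap("vol2", 7))    # True: last pointer removed
    print(array.physical_blocks())   # 1
```

In this model the first UNMAP frees no physical space because another volume still references the same data; only the second UNMAP, which removes the last pointer, reclaims it.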

Herein lies the importance of UNMAP: it keeps the metadata tables of the FlashArray accurate, which allows space to be reclaimed as soon as possible. Generally, some physical space will be returned immediately upon reclamation, as not everything is dedupable. In the end, the amount of reclaimed space depends heavily on how dedupable the data set is: the higher the dedupability, the lower the likelihood, amount, and immediacy of physical space being reclaimed. The fact to remember is that UNMAP is important for the long-term “health” of space reporting and usage on the array.

In addition to using the Pure Storage vSphere Web Client Plugin, standard provisioning methods through the FlashArray GUI or FlashArray CLI can be utilized. This section highlights the end-to-end provisioning of storage volumes on the Pure Storage FlashArray from creation of a volume to formatting it on an ESXi host. The management simplicity is one of the guiding principles of FlashArray as just a few clicks are required to configure and provision storage to the server.

Troubleshooting: Pure Storage Icon Not Visible in vCenter Server

Problem

The Pure Storage icon is not showing up in the vCenter Web Client. This can be caused by the JDK not being properly installed on the VMware vCenter server. You can verify whether the JDK is installed correctly, and which Java version the Pure Plugin is using, in the VMware vSphere Web Client main log file (vsphere_client_virgo.log).

Impact

An incorrect installation and configuration of the JDK will cause issues with the Pure Plugin.

Solution

The Java version can be found in vsphere_client_virgo.log. It is only written to the log when the vSphere Web Client is restarted.

In this example, JDK 1.7u17 is installed on Windows Server 2008 R2 for vCenter Server 5.1.

[2016-07-15 10:22:41.225] INFO  [INFO ] start-signalling-1            com.vmware.vise.util.debug.SystemUsageMonitor                     System info :
 OS - Windows Server 2008 R2
 Arch - amd64
 Java Version - 1.7.0_17 
[2016-07-15 10:22:41.256] INFO  [INFO ] Timer-2                       com.vmware.vise.util.debug.SystemUsageMonitor                     
 Heap     : init = 201292928(196575K) used = 309326640(302076K) committed = 672727040(656960K) max = 954466304(932096K)
 non-Heap : init = 136773632(133568K) used = 82368472(80437K) committed = 142344192(139008K) max = 318767104(311296K)
 No of loaded classes : 13796 
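
As a small convenience, the version line in an excerpt like the one above can be pulled out with a short Python sketch. The regular expression is an assumption based solely on the "Java Version - ..." line shown in this example; other vSphere Web Client releases may format the line differently, and the helper names are illustrative.

```python
import re

# Matches lines such as " Java Version - 1.7.0_17" as seen in the example log.
JAVA_VERSION_RE = re.compile(r"Java Version\s*-\s*(\S+)")


def last_java_version(log_text):
    """Return the most recently logged Java version, or None if none is found."""
    versions = JAVA_VERSION_RE.findall(log_text)
    return versions[-1] if versions else None


# Shortened copy of the example log lines from this article.
SAMPLE_LOG = """\
 OS - Windows Server 2008 R2
 Arch - amd64
 Java Version - 1.7.0_17
"""

if __name__ == "__main__":
    print(last_java_version(SAMPLE_LOG))
```

Because the version is only logged at restart, taking the last match reflects the most recent restart of the Web Client.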

The Java version requirements are listed in the vSphere Web Client Plugin Release Notes.

Quick Reference: Best Practice Settings

Best Practices for ALL versions of ESXi

| ESXi Parameters | Recommended | Default | Description |
|---|---|---|---|
| HardwareAcceleratedInit | 1 | 1 | Enables and controls use of Block Same (WRITESAME). |
| HardwareAcceleratedMove | 1 | 1 | Enables and controls use of XCOPY. |
| VMFS3.HardwareAcceleratedLocking | 1 | 1 | Enables and controls the use of Atomic Test & Set (ATS). |
| iSCSI Login Timeout | 30 | 5 | Ensures iSCSI sessions survive controller reboots. |
| TCP Delayed ACK (iSCSI) | Disabled | Enabled | Improves performance when disabled in congested networks. |
| Jumbo Frames (Optional) | 9000 | 1500 | If you have a workload that would benefit from Jumbo Frames, and a network that supports it, then this is the recommended configuration. Otherwise, utilize 1500 to reduce complexity in configuration. |
| HBA Queue Depth Limits | Default | Varies by vendor | Default is recommended unless specifically requested by Pure Storage due to high-performance workloads. |
| VMware Tools | Install | Not Installed | VMware paravirtual driver, clock sync, increased disk timeouts, and graphic support are part of the tools hence it is a crucial step. |
| VM Virtual SCSI Adapter | Paravirtual | Varies by OS type | Not required to be changed, but for high-performance requirements, PVSCSI is required to be used. |
| Network Time Protocol (NTP) | Enabled | Disabled | Enabling NTP is recommended for more efficient troubleshooting. |
| Remote Syslog Server | Enabled | Disabled | Configuring a remote syslog server is recommended to ensure logging required for troubleshooting is available. |

It is also a best practice to set the "ESXi host personality" on the FlashArray for all ESXi host objects. This is described in detail in: FlashArray Configuration - Setting the FlashArray ESXi host Personality section. Please ensure this is applied whenever possible.
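
For the numeric rows of the table above, a simple drift check can be sketched in Python. The recommended values are taken from the table; how the current values are collected from a host is left out, and the dictionary layout and function name are assumptions made for illustration.

```python
# Recommended values copied from the "ALL versions of ESXi" table.
RECOMMENDED = {
    "HardwareAcceleratedInit": 1,
    "HardwareAcceleratedMove": 1,
    "VMFS3.HardwareAcceleratedLocking": 1,
    "iSCSI Login Timeout": 30,
}


def find_drift(current):
    """Return {setting: (actual, recommended)} for every value that differs."""
    drift = {}
    for name, wanted in RECOMMENDED.items():
        actual = current.get(name)  # None means the setting was not collected
        if actual != wanted:
            drift[name] = (actual, wanted)
    return drift


# Hypothetical host that still has the default iSCSI login timeout of 5.
HOST = {
    "HardwareAcceleratedInit": 1,
    "HardwareAcceleratedMove": 1,
    "VMFS3.HardwareAcceleratedLocking": 1,
    "iSCSI Login Timeout": 5,
}

if __name__ == "__main__":
    for name, (actual, wanted) in find_drift(HOST).items():
        print(f"{name}: found {actual}, recommended {wanted}")
```

A host left at ESXi defaults would only be flagged for the iSCSI login timeout here, since the three VAAI settings already default to the recommended value.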

Best Practices specific to ESXi 5.x

| ESXi Parameters | Recommended | Default | Description |
|---|---|---|---|
| DSNRO / Number of outstanding IOs | 32 | 32 | Default is recommended unless specifically requested by Pure Storage due to high-performance workloads. |
| Path Selection Policy | Round Robin | MRU | Path Selection Policy for FC and iSCSI. |
| IO Operations Limit (Path Switching) | 1 | 1000 | How many I/Os until ESXi switches to another path for a volume. |
| VMFS Version | 5 | 5 | Please upgrade if any VMFS-3 datastores are in use. |
| UNMAP Block Count | 1% of free VMFS space or less | 200 | Please refer to the VAAI or Best Practices document for additional information. |
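
The "1% of free VMFS space or less" guidance for the UNMAP Block Count can be turned into a number with a little arithmetic. The Python sketch below assumes a 1 MB VMFS-5 block size and that the free capacity is already known in bytes; the function name is illustrative, and the VAAI or Best Practices document remains the authority on the exact value to use.

```python
MIB = 1024 * 1024


def unmap_block_count(free_bytes, block_size_bytes=MIB, percent=1):
    """Return the number of VMFS blocks equal to `percent` of free space.

    Assumes a 1 MB VMFS-5 block size unless another size is passed in.
    The result is rounded down and never falls below one block.
    """
    free_blocks = free_bytes // block_size_bytes
    return max(1, free_blocks * percent // 100)


if __name__ == "__main__":
    one_tib = 1024 ** 4
    # 1 TiB free = 1,048,576 blocks of 1 MB; 1% of that is 10,485 blocks.
    print(unmap_block_count(one_tib))
```

On ESXi 5.5, a value computed this way is the kind of number that would be supplied as the reclaim unit to esxcli storage vmfs unmap; rounding down keeps the count at or below the 1% ceiling.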

Disk.SchedNumReqOutstanding (DSNRO) is the same as "Number of outstanding IOs". The difference is that it changed from a host level configuration (Disk.SchedNumReqOutstanding) to a per volume level (Number of outstanding IOs) in ESXi 5.5 and later.

You can read the VMware KB Article: Setting the Maximum Outstanding Disk Requests for Virtual Machines for additional information around this.

Best Practices specific to ESXi 6.x and ESXi 7.x+

| ESXi Parameters | Recommended | Default | Description |
|---|---|---|---|
| EnableBlockDelete | 1 | 0 | Provides end-to-end in guest support for space reclamation (UNMAP). Only applicable to VMFS-5. |
| Number of outstanding IOs (Per LUN QDepth) | 32 | 32 | Default is recommended unless specifically requested by Pure Storage due to high-performance workloads. |
| Path Selection Policy | Round Robin | MRU | Path Selection Policy for FC and iSCSI. |
| Latency Based PSP (ESXi 7.0+) | samplingCycles - 16; latencyEvalTime - 180000 ms | samplingCycles - 16; latencyEvalTime - 180000 ms | How often a path is evaluated (every 3 minutes) and how many I/Os to sample (16) during the evaluation. |
| IO Operations Limit (ESXi 6.0 - 6.7) | 1 | 1000 | How many I/Os until ESXi switches to another path for a volume. |
| VMFS Version | 6 | 5 | Use VMFS-6 on vSphere 6.5 and later. |

If vSphere 6.7U1 or later is in use, you can use the VMW_PSP_RR module set to "latency" rather than setting the IO Operations Limit to 1. In vSphere 7.0 and later, you should use the VMW_PSP_RR module set to "latency". Please refer here for more information on Enhanced Round Robin Load Balancing and here for more details on why Pure recommends this change.
