VMware vSphere 8 Cluster Setup End to End -- ESXi, VCSA, vSAN, DRS, HA, and vLCM

vSphere 8 ESXi vCenter vSAN DRS HA vDS vLCM

Standalone Infrastructure | Component: vSphere 8.0, VCSA 8.0 | Audience: Enterprise Architects, Senior Sysadmins

Building a vSphere 8 cluster end to end means making a sequence of decisions that compound on each other. vSAN or external storage changes your networking requirements. vSAN OSA or ESA changes your disk requirements. vLCM or baseline management changes how you'll handle every future update. The decisions you make in the first hour are hard to reverse without rebuilding the cluster from scratch.

This article covers the full stack: hardware requirements, ESXi installation and post-install configuration, VCSA deployment, Distributed Switch setup with VMkernel adapters, cluster creation with DRS and HA, storage architecture with a decision guide between vSAN, SAN, and NFS, vSAN setup covering both OSA and ESA, and vSphere Lifecycle Manager for keeping the cluster current. The baseline assumption is a three node cluster minimum, which is the smallest configuration that gives you meaningful HA and vSAN with the default failures to tolerate policy (FTT=1).


1. Hardware Requirements and Pre-Install Decisions

Every component in a vSphere 8 cluster should be on the VMware Compatibility Guide (VCG). This isn't a formality. VMware support cases on hardware related issues require VCG compliance. An ESXi host with uncertified NICs or HBAs generates a warning banner in vCenter and limits some configuration options. Check the VCG before purchasing hardware, not after.

Component | Minimum | Production Recommended | Notes
CPU | Dual core with hardware virtualization | Dual socket, 10+ cores per socket | Intel VT or AMD-V required. Check VCG for certified CPU list. Mixed CPU generations within a cluster require EVC mode.
RAM | 8 GB | 256 GB per node | ESXi overhead: ~1 GB. Plan for VM density. vSAN ESA requires minimum 512 GB per node for production workloads.
Boot device | USB or SD (deprecated in 8.0) | Two SSDs in RAID1 or two NVMe M.2 drives | VMware deprecated USB/SD boot in vSphere 8.0. Use persistent SSD or NVMe for the ESXi boot partition. M.2 NVMe is the cleanest option.
Network | 1 GbE | Two 25 GbE NICs minimum | 25 GbE strongly recommended for both vSAN OSA and ESA per official Broadcom guidance. 10 GbE is supported for both. Management on 1 GbE is acceptable.
vSAN OSA disks | One cache SSD + one capacity disk | One NVMe cache per disk group + multiple capacity SSDs | One disk group per host minimum. Two disk groups recommended for resilience. Max 5 disk groups per host, 7 capacity disks per group.
vSAN ESA disks | One NVMe per host | Four to eight NVMe per host | ESA uses storage pools, no disk groups, no separate cache tier. All NVMe for both performance and capacity. Official minimum: 128 GB RAM per host. 512 GB required for VCF ESA deployments.
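The disk rows above feed directly into capacity planning: with FTT=1 mirroring, every gigabyte written consumes two, and vSAN needs a healthy slack of free space for rebuilds and rebalancing. A back-of-the-envelope sketch (the per-host raw capacity and the 30% slack figure are illustrative assumptions, not requirements from this article):

```shell
#!/bin/sh
# Rough usable-capacity estimate for a 3-node vSAN cluster with FTT=1.
# raw_tb_per_host and slack_pct are illustrative assumptions.
raw_tb_per_host=8
hosts=3
slack_pct=30                                    # keep ~30% free for rebuilds
raw_total=$((raw_tb_per_host * hosts))          # total raw TB in the cluster
after_slack=$((raw_total * (100 - slack_pct) / 100))
usable=$((after_slack / 2))                     # FTT=1 mirroring writes two copies
echo "raw=${raw_total}TB usable~=${usable}TB"
```

The halving for FTT=1 is what surprises people sizing their first vSAN cluster: 24 TB raw lands closer to 8 TB of safely usable space once mirroring and slack are accounted for.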

USB and SD Card Boot: What Changed in vSphere 8

VMware deprecated USB and SD card boot devices starting in vSphere 8.0. They still work for installation but Broadcom has signaled removal in a future release, and several vSphere 8 features require a local persistent storage device for coredump and scratch partition configuration. If you're deploying new hardware, use M.2 NVMe or SATA SSD for the ESXi boot partition. If you're upgrading existing hosts that boot from USB or SD, plan the migration to local storage as part of your upgrade work.


2. ESXi Installation and Post-Install Configuration

Install ESXi 8 on each host independently from ISO, kickstart, or via Auto Deploy. For three node clusters, manual ISO installation is straightforward. For larger environments, Auto Deploy with host profiles is worth the setup time. Every host in a cluster must run the same ESXi version and build number. Mixed versions are supported during rolling updates but shouldn't be your steady state.
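The same-version rule is easy to verify once hosts report in. A throwaway sketch that flags a cluster running mixed ESXi builds from a list of "host build" pairs (the host names and build numbers below are hypothetical sample data, not real releases):

```shell
#!/bin/sh
# Flag mixed ESXi builds in a cluster; host/build pairs are sample data.
builds="esxi01 22380479
esxi02 22380479
esxi03 21930508"
# Count distinct build numbers across the cluster
unique=$(printf '%s\n' "$builds" | awk '{print $2}' | sort -u | wc -l | tr -d ' ')
if [ "$unique" -eq 1 ]; then
    result="builds match"
else
    result="MIXED BUILDS: $unique distinct versions"
fi
echo "$result"
```

In practice you would feed this the build numbers reported by each host (visible in the vSphere Client or via `vmware -vl` in the ESXi shell) rather than a hardcoded list.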

Post-Install DCUI Configuration on Each Host

  1. Set a static management IP, subnet mask, default gateway, and DNS servers via the DCUI (F2 at the console). DHCP assigned management IPs cause vCenter connection failures when leases renew.
  2. Set the hostname to a proper FQDN. Add an A record and PTR record in DNS for each host before adding to vCenter. Forward and reverse DNS resolution is required for vCenter to manage hosts correctly.
  3. Configure NTP. Go to Troubleshooting Options in the DCUI and enable SSH temporarily for initial configuration, or use the vSphere Client after adding to vCenter. Set at least two NTP servers. vSphere HA and vSAN are sensitive to clock drift between hosts.
  4. If this host will contribute to vSAN, don't configure any storage yet. Let vCenter and the vSAN wizard claim disks during cluster setup.
ESXi Shell: Post-install NTP and SSH configuration via CLI
# Configure NTP servers and enable the service in a single call
# (a later "ntp set" replaces the previously configured server list)
esxcli system ntp set --server=ntp1.yourdomain.local --server=ntp2.yourdomain.local --enabled=true

# Verify NTP is syncing
esxcli system ntp get

# Set hostname if not already set via DCUI
esxcli system hostname set --fqdn=esxi01.yourdomain.local

# Configure DNS
esxcli network ip dns server add --server=10.0.100.1
esxcli network ip dns server add --server=10.0.100.2
esxcli network ip dns search add --domain=yourdomain.local

# Disable SSH after initial configuration (re-enable via vCenter when needed)
# vim-cmd hostsvc/disable_ssh
# Best practice: leave disabled, enable only when needed, disable immediately after

3. vCenter Server (VCSA) Deployment

The VCSA is deployed as an OVA onto an existing ESXi host. You don't need a cluster or vCenter to deploy vCenter: you deploy it onto a standalone ESXi host, then add that host to vCenter after VCSA is up. The installer is a two-stage process: Stage 1 deploys the OVA and powers it on, Stage 2 runs first boot configuration and starts services.

VCSA Sizing

Deployment Size | Hosts | VMs | vCPU | RAM | Storage
Tiny | Up to 10 | Up to 100 | 2 | 14 GB | 415 GB
Small | Up to 100 | Up to 1,000 | 4 | 21 GB | 480 GB
Medium | Up to 400 | Up to 4,000 | 8 | 37 GB | 700 GB
Large | Up to 1,000 | Up to 10,000 | 16 | 69 GB | 1,065 GB
X-Large | Up to 2,000 | Up to 35,000 | 24 | 133 GB | 1,805 GB

Size up if in doubt. VCSA is the single most critical component in your vSphere environment and it runs constantly in the background collecting metrics, running DRS calculations, and processing vCenter events. An undersized VCSA causes slow vSphere Client performance and DRS latency. The storage requirement is for the embedded PostgreSQL database and vCenter log files. Use thin provisioning for the VCSA disks to avoid consuming the full allocation immediately, but make sure the datastore has room for the VCSA to grow to its maximum allocation.
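The sizing table reduces to a simple lookup: pick the smallest tier whose host ceiling and VM ceiling both cover your inventory, then size up if you are near a boundary. A sketch of that selection logic (the 3-host, 120-VM inventory is an example):

```shell
#!/bin/sh
# Pick a VCSA deployment size: the smallest tier covering both counts.
# Thresholds mirror the sizing table; inventory values are examples.
hosts=3
vms=120
if   [ "$hosts" -le 10 ]   && [ "$vms" -le 100 ];   then size="Tiny"
elif [ "$hosts" -le 100 ]  && [ "$vms" -le 1000 ];  then size="Small"
elif [ "$hosts" -le 400 ]  && [ "$vms" -le 4000 ];  then size="Medium"
elif [ "$hosts" -le 1000 ] && [ "$vms" -le 10000 ]; then size="Large"
else size="X-Large"
fi
echo "$size"
```

Note that 3 hosts with 120 VMs lands on Small, not Tiny: the VM count exceeds Tiny's 100-VM ceiling even though the host count fits. Both limits have to hold.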

Where to Run the VCSA

Deploy VCSA on one of your ESXi hosts initially. After the cluster is formed, migrate the VCSA VM onto the vSAN datastore or shared storage so it benefits from HA restart if the host it's on fails. A VCSA running on local storage on a host is vulnerable: if that host fails, vCenter is down until the host recovers, even though the other hosts are healthy and running VMs. vCenter being down doesn't immediately impact running VMs, but it does prevent DRS, HA, and most management operations from functioning.


4. vDS Networking: Distributed Switches, Port Groups, and VMkernel Adapters

The vSphere Distributed Switch (vDS) is the networking foundation of a vSphere cluster. Unlike the standard vSwitch which is configured per-host, the vDS is configured once in vCenter and the configuration is pushed to all member hosts. Network changes happen once. Port group consistency is enforced centrally. For any environment with more than one host, vDS is the right choice.

vDS Design for a Three Node vSAN Cluster

Port Group | VLAN | VMkernel Services | NIC Uplinks
PG-Management | 100 | Management, Provisioning | vmnic0 active, vmnic1 standby
PG-vMotion | 200 | vMotion | vmnic1 active, vmnic0 standby
PG-vSAN | 300 | vSAN | vmnic2 active, vmnic3 active (both active for vSAN)
PG-VMs | trunk (0-4094) | None (VM traffic only) | vmnic0 and vmnic1 active/active
PowerCLI: Create vDS, port groups, and VMkernel adapters
# Connect to vCenter
Connect-VIServer -Server vcenter.yourdomain.local -User administrator@vsphere.local -Password "password"

# Create the Distributed Switch
$vds = New-VDSwitch -Server $global:DefaultVIServers `
    -Name "vDS-Production" `
    -Location (Get-Datacenter "Datacenter01") `
    -NumUplinkPorts 4 `
    -Version "8.0.0" `
    -Mtu 9000    # Jumbo frames for vSAN

# Create port groups
New-VDPortgroup -Name "PG-Management" -VDSwitch $vds -VlanId 100
New-VDPortgroup -Name "PG-vMotion"    -VDSwitch $vds -VlanId 200
New-VDPortgroup -Name "PG-vSAN"       -VDSwitch $vds -VlanId 300
New-VDPortgroup -Name "PG-VMs"        -VDSwitch $vds -VlanTrunkRange "0-4094"

# Add each ESXi host to the vDS and migrate vmnic adapters
$hosts = Get-VMHost -Location (Get-Datacenter "Datacenter01")
foreach ($esxiHost in $hosts) {
    # Add host to vDS
    Add-VDSwitchVMHost -VDSwitch $vds -VMHost $esxiHost

    # Assign physical NICs to uplinks via the vDS wizard in the vSphere
    # Client in production; CLI NIC assignment varies by host hardware
    # naming conventions

    # Derive the host number from the short hostname (esxi01 -> 1)
    $n = [int](($esxiHost.Name.Split('.')[0]) -replace '\D', '')

    # Create VMkernel adapter for vSAN traffic
    New-VMHostNetworkAdapter `
        -VMHost $esxiHost `
        -PortGroup "PG-vSAN" `
        -VirtualSwitch $vds `
        -IP "10.0.30.$n" `
        -SubnetMask "255.255.255.0" `
        -VsanTrafficEnabled $true

    # Create VMkernel adapter for vMotion
    New-VMHostNetworkAdapter `
        -VMHost $esxiHost `
        -PortGroup "PG-vMotion" `
        -VirtualSwitch $vds `
        -IP "10.0.200.$n" `
        -SubnetMask "255.255.255.0" `
        -VMotionEnabled $true
}

Disconnect-VIServer -Confirm:$false
vSAN traffic must use jumbo frames (MTU 9000) end to end: on the vDS, on the physical switch ports, and on the physical NICs. A mismatch anywhere in the path causes vSAN to operate with degraded performance and intermittent connectivity errors that are difficult to diagnose. Set MTU 9000 on the vDS and confirm the physical switch ports connected to vSAN NICs are also set to MTU 9000 before enabling vSAN.
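A practical way to confirm the end-to-end path before enabling vSAN is vmkping from the ESXi shell with fragmentation disabled. The ICMP payload must be the MTU minus 28 bytes of IP and ICMP headers; the vmk interface name and target IP below are placeholders for your vSAN VMkernel adapter and a peer host's vSAN IP:

```shell
#!/bin/sh
# Build the jumbo-frame test command: payload = MTU minus 28 header bytes.
# vmk2 and 10.0.30.12 are placeholders, not values from this article.
mtu=9000
payload=$((mtu - 28))
cmd="vmkping -I vmk2 -d -s ${payload} 10.0.30.12"
echo "$cmd"
# Run the printed command on each host against every other host's vSAN IP.
# A timeout with -d (don't fragment) set means an MTU mismatch in the path.
```

If the same ping succeeds without `-d` but fails with it, something in the path is fragmenting, which is exactly the silent misconfiguration that produces degraded vSAN performance.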

5. Storage Decision Guide: vSAN vs SAN vs NFS

Factor | vSAN | SAN (iSCSI or FC) | NFS
Architecture | Hyperconverged: storage lives in the ESXi hosts | External array, separate management | NAS appliance, file based access
Minimum nodes | 3 for standard, 2 for two-node with witness | 2 (storage is external) | 2 (storage is external)
Capital cost | Higher server cost (NVMe drives in hosts). No external array. | Lower server cost, significant array cost | Lower server cost, NAS appliance cost
Performance | vSAN ESA with NVMe: highest available. vSAN OSA with SAS HDDs: moderate. | All-flash SAN: very high, consistent | NFS v3/v4.1: good for most workloads, latency higher than block
Simplicity | Integrated with vSphere. No external management for storage. | Separate storage team and tools. LUN management required. | Simplest to set up. NFS share mounts directly.
vSphere integration | Native. vSAN is built into ESXi and vCenter. | VAAI supported. Storage policies via SPBM with external providers. | VAAI NAS supported on certified NAS. Storage policies limited.
When to use | New deployments, greenfield environments, teams who want one platform to manage | Existing SAN investment, mixed vSphere and bare metal workloads, mature storage teams | File heavy workloads, existing NAS infrastructure, dev/test

6. vSAN Setup: OSA and ESA

vSAN 8 ships with two storage architectures. The Original Storage Architecture (OSA) is the classic disk group model: a dedicated cache SSD plus capacity disks per disk group. The Express Storage Architecture (ESA) uses NVMe only, with storage pools replacing disk groups and no separate cache tier. ESA requires NVMe only disks, a minimum of 128 GB RAM per node (512 GB for VCF ESA deployments), and Broadcom strongly recommends 25 GbE networking, though 10 GbE is supported. If you have the hardware for ESA, it produces significantly better performance and simpler management. If you're on SAS or SATA SSDs, OSA is your path.

Creating the Cluster and Enabling vSAN

  1. In vCenter, right click the Datacenter and select New Cluster. Name the cluster and enable vSphere DRS, vSphere HA, and vSAN at the cluster level in the wizard. Add your ESXi hosts to the cluster during this step or after.
  2. On the vSAN configuration step, select Single Site Cluster. Choose OSA or ESA based on your hardware. If you select vLCM (Lifecycle Manager) managed cluster here, vSAN ESA becomes available as an option.
  3. Configure disk claiming. Manual disk claiming is strongly recommended for production: you explicitly choose which disks go to vSAN and which stay available for other uses. Automatic claiming takes all eligible disks.
  4. For OSA: assign cache and capacity tiers per disk group. The cache SSD should be at least 10% of the capacity tier's total size per disk group. NVMe backed cache dramatically outperforms SATA SSD cache.
  5. For ESA: assign NVMe disks to the storage pool. All assigned disks contribute to both performance and capacity; there's no separate cache designation.
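The 10% cache rule in step 4 is easy to sanity check per disk group before you buy drives. A sketch with illustrative disk counts and sizes (not values from this article):

```shell
#!/bin/sh
# OSA cache sizing: cache SSD should be at least ~10% of the disk group's
# capacity tier. Disk counts and sizes here are illustrative.
capacity_disks=4
capacity_gb_each=1920
cap_total=$((capacity_disks * capacity_gb_each))   # capacity tier per group
min_cache_gb=$((cap_total / 10))                   # 10% rule of thumb
echo "capacity_tier=${cap_total}GB min_cache=${min_cache_gb}GB"
```

So four 1.92 TB capacity SSDs per disk group call for roughly a 768 GB cache device; a common 800 GB NVMe cache drive clears that bar, while a 400 GB one does not.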
PowerCLI: Create cluster, enable vSAN, and verify health
Connect-VIServer -Server vcenter.yourdomain.local -User administrator@vsphere.local -Password "password"

# Create the cluster with DRS, HA, and vSAN
$cluster = New-Cluster `
    -Name "Cluster01" `
    -Location (Get-Datacenter "Datacenter01") `
    -DrsEnabled `
    -HAEnabled `
    -VsanEnabled

# Add hosts to the cluster
$hostList = "esxi01.yourdomain.local","esxi02.yourdomain.local","esxi03.yourdomain.local"
foreach ($h in $hostList) {
    Add-VMHost -Name $h `
               -Location $cluster `
               -User root `
               -Password "esxi-root-password" `
               -Force
}

# After hosts are added, verify vSAN health
$vsanView = Get-VsanView -Id "VsanVcClusterHealthSystem-vsan-cluster-health-system"
$healthResult = $vsanView.VsanQueryVcClusterHealthSummary(
    $cluster.ExtensionData.MoRef, 2, $null, $true, $null, $null, "defaultView"
)
$healthResult.OverallHealth

# List vSAN datastore
Get-Datastore | Where-Object { $_.Type -eq "vsan" }

Disconnect-VIServer -Confirm:$false

7. DRS and HA Configuration

DRS Automation Levels

DRS has three automation levels that control how aggressively it migrates VMs to balance load across hosts.

  • Manual: DRS generates recommendations but doesn't act. You approve each migration. Use this when you're first deploying DRS and want visibility before trusting it to act autonomously, or in environments where unplanned VM migrations are operationally problematic.
  • Partially Automated: DRS places VMs automatically during initial power-on but only generates recommendations for ongoing load balancing. The right middle ground for environments with mixed workload sensitivity.
  • Fully Automated: DRS places VMs at power-on and migrates them automatically to balance load. This is the right setting for most production environments. DRS won't migrate a VM unless the load imbalance exceeds the migration threshold you configure.

The DRS migration threshold (scale of 1 to 5 in the vSphere Client) controls how aggressively DRS acts. A threshold of 3 (the default) means DRS only migrates when it will produce a meaningful improvement. Setting it to 1 makes DRS very aggressive and generates unnecessary vMotion traffic. Setting it to 5 makes DRS essentially inactive. Leave it at 3 unless you have a specific reason to change it.

HA Admission Control

HA admission control ensures the cluster always has enough spare capacity to restart VMs from a failed host. The three policies are slot based (legacy), percentage based, and dedicated failover hosts. For most environments, percentage based admission control is the right choice: set the reserved CPU and memory percentages to represent the capacity of your largest host as a fraction of the cluster total.

For a three node cluster where you want to tolerate one host failure, reserve 33% CPU and memory as the failover capacity. This guarantees that if any one host fails, the surviving two hosts can restart all VMs that were on the failed host. If your hosts aren't identical in size, use the capacity of the largest host divided by the cluster total as your percentage.
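For non-identical hosts, the math from the paragraph above is largest-host capacity divided by cluster total, rounded up. A sketch of that calculation (the per-host memory figures are example values):

```shell
#!/bin/sh
# Percentage-based HA admission control: reserve ceil(largest/total * 100).
# Per-host memory figures (GB) are example values for a mixed cluster.
mem_per_host="256 256 512"
total=0
largest=0
for m in $mem_per_host; do
    total=$((total + m))
    [ "$m" -gt "$largest" ] && largest=$m
done
reserve_pct=$(( (largest * 100 + total - 1) / total ))   # ceiling division
echo "reserve ${reserve_pct}% of cluster memory for failover"
```

With one 512 GB host alongside two 256 GB hosts, the reservation jumps to 50%: the cluster must be able to absorb the failure of its largest member, not its average one.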

PowerCLI: Configure DRS and HA admission control settings
Connect-VIServer -Server vcenter.yourdomain.local -User administrator@vsphere.local -Password "password"

$cluster = Get-Cluster -Name "Cluster01"

# Configure DRS: Fully Automated with threshold 3
Set-Cluster -Cluster $cluster `
    -DrsAutomationLevel FullyAutomated `
    -DrsEnabled $true `
    -Confirm:$false

# Configure HA with admission control enabled.
# Note: -HAFailoverLevel sets the "host failures cluster tolerates" policy.
# To switch to the recommended percentage based policy (33% for a three
# node cluster), use Cluster > Settings > vSphere Availability in the
# vSphere Client; Set-Cluster has no direct parameter for the percentage.
Set-Cluster -Cluster $cluster `
    -HAEnabled $true `
    -HAAdmissionControlEnabled $true `
    -HAFailoverLevel 1 `
    -Confirm:$false

# Set HA restart priority for critical VMs
Get-VM -Name "SQL-Primary" | Set-VM -HARestartPriority High -Confirm:$false
Get-VM -Name "DC-Primary"  | Set-VM -HARestartPriority High -Confirm:$false

# Verify cluster configuration
Get-Cluster -Name "Cluster01" | Select-Object Name, DrsEnabled, DrsAutomationLevel, HAEnabled

Disconnect-VIServer -Confirm:$false

8. vSphere Lifecycle Manager

vSphere Lifecycle Manager (vLCM) replaced the older Update Manager baselines with a single image approach to cluster management. Instead of applying individual patches, you define a desired software image for the cluster (ESXi version, firmware versions, and driver versions), and vLCM ensures every host in the cluster matches that image. When VMware releases a new ESXi build, you update the image and vLCM remediates each host in sequence.

vLCM Image vs Baseline Management

A cluster can be managed by either vLCM images or baselines, not both. You choose at cluster creation. vLCM images are the forward looking approach: they give you a single consistent software definition for the entire cluster, support firmware and driver management alongside ESXi patches, and are required for vSAN ESA and for Quickboot (fast reboot during patching). Baselines are the legacy approach and still work, but if you're building a new cluster on vSphere 8, there's no reason to start with baselines.

PowerCLI: Check cluster compliance and initiate remediation via vLCM
Connect-VIServer -Server vcenter.yourdomain.local -User administrator@vsphere.local -Password "password"

$cluster = Get-Cluster -Name "Cluster01"

# vLCM image editing, compliance checks, and remediation are driven from
# the vSphere Client or the vSphere Automation API; the commands below
# stick to core PowerCLI that works against any vSphere 8 vCenter.
Write-Host "Cluster: $($cluster.Name)"

# List hosts and their current ESXi versions
Get-VMHost -Location $cluster | Sort-Object Name | Select-Object `
    Name, Version, Build, ConnectionState, PowerState | Format-Table -AutoSize

# Image management and compliance review are UI-first in vSphere 8:
Write-Host "`nTo remediate, use vSphere Client: Cluster > Updates > Image"
Write-Host "vLCM will perform sequential host evacuation, update, and rejoin"
Write-Host "with Quickboot enabled, most hosts patch in under 5 minutes"

Disconnect-VIServer -Confirm:$false

The actual vLCM image update workflow is best done through the vSphere Client rather than PowerCLI because it involves interactive steps: reviewing what changes will be applied, checking hardware compatibility for the new firmware versions, and approving the remediation schedule. The CLI can check status and trigger remediation, but the image editing and compliance review steps are UI-first in vSphere 8.


Key Takeaways

  • Check the VMware Compatibility Guide before purchasing hardware. Uncertified components generate vCenter warnings and limit configuration options. VCG compliance is required for VMware support cases on hardware issues.
  • USB and SD card boot are deprecated in vSphere 8.0. Use M.2 NVMe or SATA SSD for the ESXi boot partition on new deployments. Don't start a new cluster on boot media that's on a deprecation path.
  • Set static management IPs and proper FQDNs with forward and reverse DNS on every host before adding to vCenter. DHCP assigned management IPs and missing DNS records cause vCenter connectivity failures.
  • vSAN ESA requires NVMe only disks, 25 GbE networking, and 512 GB RAM per node minimum for production. If you have that hardware, ESA outperforms OSA significantly. OSA works on 10 GbE with mixed SSD and HDD configurations.
  • vSAN traffic requires jumbo frames (MTU 9000) end to end: on the vDS, on the physical switch ports, and on the physical NICs. A single MTU mismatch causes degraded performance and intermittent errors that don't always present as obvious connectivity failures.
  • Deploy VCSA, then migrate it to shared storage (vSAN or SAN) after the cluster is formed. A VCSA running on local storage on a single host is vulnerable to that host's failure, which takes down management even though other hosts are healthy.
  • For a three node cluster tolerating one host failure, set HA admission control to 33% reserved CPU and memory. This guarantees the two surviving hosts can restart all VMs from the failed node.
  • Use vLCM image management for new vSphere 8 clusters, not baselines. vLCM images are required for vSAN ESA and Quickboot, provide firmware and driver management alongside ESXi patches, and enforce a single consistent software definition across all cluster nodes.
