VMware vSphere 8 Cluster Setup End to End -- ESXi, VCSA, vSAN, DRS, HA, and vLCM
Standalone Infrastructure | Component: vSphere 8.0, VCSA 8.0 | Audience: Enterprise Architects, Senior Sysadmins
Building a vSphere 8 cluster end to end means making a sequence of decisions that compound on each other. vSAN or external storage changes your networking requirements. vSAN OSA or ESA changes your disk requirements. vLCM or baseline management changes how you'll handle every future update. The decisions you make in the first hour are hard to reverse without rebuilding the cluster from scratch.
This article covers the full stack: hardware requirements, ESXi installation and post-install configuration, VCSA deployment, Distributed Switch setup with VMkernel adapters, cluster creation with DRS and HA, storage architecture with a decision guide between vSAN, SAN, and NFS, vSAN setup covering both OSA and ESA, and vSphere Lifecycle Manager for keeping the cluster current. The baseline assumption is a three node cluster minimum, which is the smallest configuration that gives you meaningful HA and vSAN with default FTT=1 tolerance.
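The FTT=1 assumption drives the capacity math from day one: RAID-1 mirroring stores two full copies of every object (plus a small witness component on a third host), so usable capacity is roughly half of raw, minus the free space vSAN wants for rebuilds and rebalancing. A rough Python sketch of that planning math (the 30% slack figure is a common planning assumption, not a fixed requirement):

```python
def vsan_usable_tib(hosts: int, raw_per_host_tib: float, slack: float = 0.30) -> float:
    """Rough usable capacity for a vSAN cluster with FTT=1 RAID-1 mirroring.

    FTT=1 mirroring stores two copies of every object, so divide by 2.
    The slack fraction reserves free space for rebuilds and rebalancing
    (30% here is a planning assumption, not an official figure).
    """
    if hosts < 3:
        raise ValueError("FTT=1 needs at least 3 hosts (2 data copies + witness)")
    raw_total = hosts * raw_per_host_tib
    return raw_total * (1 - slack) / 2

# 3 hosts x 20 TiB raw ~ 21 TiB usable under these assumptions
print(round(vsan_usable_tib(3, 20.0), 1))  # 21.0
```

Run the numbers before ordering drives: the 2x mirroring overhead surprises teams used to external arrays with RAID-5/6 efficiency.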
1. Hardware Requirements and Pre-Install Decisions
Every component in a vSphere 8 cluster should be on the VMware Compatibility Guide (VCG). This isn't a formality. VMware support cases on hardware related issues require VCG compliance. An ESXi host with uncertified NICs or HBAs generates a warning banner in vCenter and limits some configuration options. Check the VCG before purchasing hardware, not after.
| Component | Minimum | Production Recommended | Notes |
|---|---|---|---|
| CPU | Dual core with hardware virtualization | Dual socket, 10+ cores per socket | Intel VT or AMD-V required. Check VCG for certified CPU list. Mixed CPU generations within a cluster require EVC mode. |
| RAM | 8 GB | 256 GB per node | ESXi overhead: ~1 GB. Plan for VM density. vSAN ESA has an official minimum of 128 GB per host (512 GB for VCF ESA deployments). |
| Boot device | USB or SD (deprecated in 8.0) | Two SSDs in RAID1 or two NVMe M.2 drives | VMware deprecated USB/SD boot in vSphere 8.0. Use persistent SSD or NVMe for the ESXi boot partition. M.2 NVMe is the cleanest option. |
| Network | 1 GbE | Two 25 GbE NICs minimum | 25 GbE strongly recommended for both vSAN OSA and ESA per official Broadcom guidance. 10 GbE is supported for both. Management on 1 GbE is acceptable. |
| vSAN OSA disks | One cache SSD + one capacity disk | One NVMe cache per disk group + multiple capacity SSDs | One disk group per host minimum. Two disk groups recommended for resilience. Max 5 disk groups per host, 7 capacity disks per group. |
| vSAN ESA disks | One NVMe per host | Four to eight NVMe per host | ESA uses storage pools, no disk groups, no separate cache tier. All NVMe for both performance and capacity. Official minimum: 128 GB RAM per host. 512 GB required for VCF ESA deployments. |
USB and SD Card Boot: What Changed in vSphere 8
VMware deprecated USB and SD card boot devices starting in vSphere 8.0. They still work for installation but Broadcom has signaled removal in a future release, and several vSphere 8 features require a local persistent storage device for coredump and scratch partition configuration. If you're deploying new hardware, use M.2 NVMe or SATA SSD for the ESXi boot partition. If you're upgrading existing hosts that boot from USB or SD, plan the migration to local storage as part of your upgrade work.
2. ESXi Installation and Post-Install Configuration
Install ESXi 8 on each host independently from ISO, kickstart, or via Auto Deploy. For three node clusters, manual ISO installation is straightforward. For larger environments, Auto Deploy with host profiles is worth the setup time. Every host in a cluster must run the same ESXi version and build number. Mixed versions are supported during rolling updates but shouldn't be your steady state.
Post-Install DCUI Configuration on Each Host
- Set a static management IP, subnet mask, default gateway, and DNS servers via the DCUI (F2 at the console). DHCP assigned management IPs cause vCenter connection failures when leases renew.
- Set the hostname to a proper FQDN. Add an A record and PTR record in DNS for each host before adding to vCenter. Forward and reverse DNS resolution is required for vCenter to manage hosts correctly.
- Configure NTP. Go to Troubleshooting Options in the DCUI and enable SSH temporarily for initial configuration, or use the vSphere Client after adding to vCenter. Set at least two NTP servers. vSphere HA and vSAN are sensitive to clock drift between hosts.
- If this host will contribute to vSAN, don't configure any storage yet. Let vCenter and the vSAN wizard claim disks during cluster setup.
# Configure NTP servers
esxcli system ntp set --server=ntp1.yourdomain.local --server=ntp2.yourdomain.local
esxcli system ntp set --enabled=true
# Verify NTP is syncing
esxcli system ntp get
# Set hostname if not already set via DCUI
esxcli system hostname set --fqdn=esxi01.yourdomain.local
# Configure DNS
esxcli network ip dns server add --server=10.0.100.1
esxcli network ip dns server add --server=10.0.100.2
esxcli network ip dns search add --domain=yourdomain.local
# Disable SSH after initial configuration (re-enable via vCenter when needed)
# vim-cmd hostsvc/disable_ssh
# Best practice: leave disabled, enable only when needed, disable immediately after
3. vCenter Server (VCSA) Deployment
The VCSA is deployed as an OVA onto an existing ESXi host. You don't need a cluster or vCenter to deploy vCenter: you deploy it onto a standalone ESXi host, then add that host to vCenter after VCSA is up. The installer is a two-stage process: Stage 1 deploys the OVA and powers it on, Stage 2 runs first boot configuration and starts services.
VCSA Sizing
| Deployment Size | Hosts | VMs | vCPU | RAM | Storage |
|---|---|---|---|---|---|
| Tiny | Up to 10 | Up to 100 | 2 | 14 GB | 415 GB |
| Small | Up to 100 | Up to 1,000 | 4 | 21 GB | 480 GB |
| Medium | Up to 400 | Up to 4,000 | 8 | 37 GB | 700 GB |
| Large | Up to 1,000 | Up to 10,000 | 16 | 69 GB | 1,065 GB |
| X-Large | Up to 2,000 | Up to 35,000 | 24 | 133 GB | 1,805 GB |
Size up if in doubt. VCSA is the single most critical component in your vSphere environment and it runs constantly in the background collecting metrics, running DRS calculations, and processing vCenter events. An undersized VCSA causes slow vSphere Client performance and DRS latency. The storage requirement is for the embedded PostgreSQL database and vCenter log files. Use thin provisioning for the VCSA disks to avoid consuming the full allocation immediately, but make sure the datastore has room for the VCSA to grow to its maximum allocation.
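Picking a deployment size is a lookup against the table: take the smallest size whose host and VM limits both cover your expected peak inventory, then size up a step if you're near a boundary. A sketch of that lookup (limits copied from the sizing table above):

```python
# (name, max_hosts, max_vms) taken from the VCSA sizing table above
VCSA_SIZES = [
    ("Tiny",    10,    100),
    ("Small",   100,   1_000),
    ("Medium",  400,   4_000),
    ("Large",   1_000, 10_000),
    ("X-Large", 2_000, 35_000),
]

def vcsa_size(hosts: int, vms: int) -> str:
    """Smallest VCSA deployment size whose limits cover the expected inventory."""
    for name, max_hosts, max_vms in VCSA_SIZES:
        if hosts <= max_hosts and vms <= max_vms:
            return name
    raise ValueError("Inventory exceeds X-Large limits; split across vCenters")

print(vcsa_size(3, 60))     # Tiny
print(vcsa_size(120, 900))  # Medium -- host count pushes past Small
```

Note the second example: either limit can force the next size up, which is why VM count alone isn't enough to size from.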
Where to Run the VCSA
Deploy VCSA on one of your ESXi hosts initially. After the cluster is formed, migrate the VCSA VM onto the vSAN datastore or shared storage so it benefits from HA restart if the host it's on fails. A VCSA running on local storage on a host is vulnerable: if that host fails, vCenter is down until the host recovers, even though the other hosts are healthy and running VMs. vCenter being down doesn't immediately impact running VMs, but it does prevent DRS, HA, and most management operations from functioning.
4. vDS Networking: Distributed Switches, Port Groups, and VMkernel Adapters
The vSphere Distributed Switch (vDS) is the networking foundation of a vSphere cluster. Unlike the standard vSwitch which is configured per-host, the vDS is configured once in vCenter and the configuration is pushed to all member hosts. Network changes happen once. Port group consistency is enforced centrally. For any environment with more than one host, vDS is the right choice.
vDS Design for a Three Node vSAN Cluster
| Port Group | VLAN | VMkernel Services | NIC Uplinks |
|---|---|---|---|
| PG-Management | 100 | Management, Provisioning | vmnic0 active, vmnic1 standby |
| PG-vMotion | 200 | vMotion | vmnic1 active, vmnic0 standby |
| PG-vSAN | 300 | vSAN | vmnic2 active, vmnic3 active (both active for vSAN) |
| PG-VMs | trunk (0-4094) | None (VM traffic only) | vmnic0 and vmnic1 active active |
# Connect to vCenter (prefer Get-Credential or a credential store over a plaintext password)
Connect-VIServer -Server vcenter.yourdomain.local -User administrator@vsphere.local -Password "password"
# Create the Distributed Switch
$vds = New-VDSwitch -Server $global:DefaultVIServers `
-Name "vDS-Production" `
-Location (Get-Datacenter "Datacenter01") `
-NumUplinkPorts 4 `
-Version "8.0.0" `
-Mtu 9000 # Jumbo frames for vSAN
# Create port groups
New-VDPortgroup -Name "PG-Management" -VDSwitch $vds -VlanId 100
New-VDPortgroup -Name "PG-vMotion" -VDSwitch $vds -VlanId 200
New-VDPortgroup -Name "PG-vSAN" -VDSwitch $vds -VlanId 300
New-VDPortgroup -Name "PG-VMs" -VDSwitch $vds -VlanTrunkRange "0-4094"
# Add each ESXi host to the vDS and migrate vmnic adapters
$hosts = Get-VMHost -Location (Get-Cluster "Cluster01")
foreach ($esxiHost in $hosts) {
# Add host to vDS
Add-VDSwitchVMHost -VDSwitch $vds -VMHost $esxiHost
# Assign physical NICs to uplinks
# Use the vDS wizard in vSphere Client for NIC assignment in production
# CLI NIC assignment varies by host hardware naming conventions
# Create VMkernel adapter for vSAN traffic
# The third octet must be a valid IP octet (VLAN 300 maps to the 10.0.30.0/24
# subnet here); the trailing-digit trick assumes hostnames end in the host
# number, e.g. esxi01 through esxi09
New-VMHostNetworkAdapter `
-VMHost $esxiHost `
-PortGroup "PG-vSAN" `
-VirtualSwitch $vds `
-IP "10.0.30.$($esxiHost.Name.Substring($esxiHost.Name.Length-1))" `
-SubnetMask "255.255.255.0" `
-VsanTrafficEnabled $true
# Create VMkernel adapter for vMotion
New-VMHostNetworkAdapter `
-VMHost $esxiHost `
-PortGroup "PG-vMotion" `
-VirtualSwitch $vds `
-IP "10.0.200.$($esxiHost.Name.Substring($esxiHost.Name.Length-1))" `
-SubnetMask "255.255.255.0" `
-VMotionEnabled $true
}
Disconnect-VIServer -Confirm:$false
5. Storage Decision Guide: vSAN vs SAN vs NFS
| Factor | vSAN | SAN (iSCSI or FC) | NFS |
|---|---|---|---|
| Architecture | Hyperconverged: storage lives in the ESXi hosts | External array, separate management | NAS appliance, file based access |
| Minimum nodes | 3 for standard, 2 for two-node with witness | 2 (storage is external) | 2 (storage is external) |
| Capital cost | Higher server cost (NVMe drives in hosts). No external array. | Lower server cost, significant array cost | Lower server cost, NAS appliance cost |
| Performance | vSAN ESA with NVMe: highest available. vSAN OSA with SAS HDDs: moderate. | All-flash SAN: very high, consistent | NFS v3/v4.1: good for most workloads, latency higher than block |
| Simplicity | Integrated with vSphere. No external management for storage. | Separate storage team and tools. LUN management required. | Simplest to set up. NFS share mounts directly. |
| vSphere integration | Native. vSAN is built into ESXi and vCenter. | VAAI supported. Storage policies via SPBM with external providers. | VAAI NAS supported on certified NAS. Storage policies limited. |
| When to use | New deployments, greenfield environments, teams who want one platform to manage | Existing SAN investment, mixed vSphere and bare metal workloads, mature storage teams | File heavy workloads, existing NAS infrastructure, dev/test |
6. vSAN Setup: OSA and ESA
vSAN 8 ships with two storage architectures. The Original Storage Architecture (OSA) is the classic disk group model: a dedicated cache SSD plus capacity disks per disk group. The Express Storage Architecture (ESA) uses NVMe only, with storage pools replacing disk groups and no separate cache tier. ESA requires NVMe only disks, a minimum of 128 GB RAM per node (512 GB for VCF ESA deployments), and Broadcom strongly recommends 25 GbE networking, though 10 GbE is supported. If you have the hardware for ESA, it produces significantly better performance and simpler management. If you're on SAS or SATA SSDs, OSA is your path.
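The OSA vs ESA decision reduces to a hardware gate check against the requirements just listed. A sketch of that logic (the thresholds mirror the numbers above; treat this as a planning aid, not Broadcom's official validator):

```python
def esa_eligible(all_nvme: bool, ram_gb: int, nic_gbe: int,
                 vcf: bool = False) -> tuple[bool, list[str]]:
    """Check the ESA hardware gates described above; returns (ok, reasons)."""
    reasons = []
    if not all_nvme:
        reasons.append("ESA requires all-NVMe storage (OSA is the path for SAS/SATA)")
    min_ram = 512 if vcf else 128  # VCF ESA deployments require 512 GB
    if ram_gb < min_ram:
        reasons.append(f"ESA needs at least {min_ram} GB RAM per host here")
    if nic_gbe < 10:
        reasons.append("ESA needs 10 GbE minimum; 25 GbE strongly recommended")
    return (not reasons, reasons)

ok, why = esa_eligible(all_nvme=True, ram_gb=256, nic_gbe=25)
print(ok)  # True
```

Run every host through the same check: one SAS-backed or RAM-light node in the cluster forces the whole cluster onto OSA.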
Creating the Cluster and Enabling vSAN
- In vCenter, right click the Datacenter and select New Cluster. Name the cluster and enable vSphere DRS, vSphere HA, and vSAN at the cluster level in the wizard. Add your ESXi hosts to the cluster during this step or after.
- On the vSAN configuration step, select Single Site Cluster. Choose OSA or ESA based on your hardware. If you select vLCM (Lifecycle Manager) managed cluster here, vSAN ESA becomes available as an option.
- Configure disk claiming. Manual disk claiming is strongly recommended for production: you explicitly choose which disks go to vSAN and which stay available for other uses. Automatic claiming takes all eligible disks.
- For OSA: assign cache and capacity tiers per disk group. A long-standing rule of thumb sizes the cache SSD at roughly 10% of the anticipated consumed capacity per disk group (often simplified to 10% of the capacity tier's raw size). NVMe backed cache dramatically outperforms SATA SSD cache.
- For ESA: assign NVMe disks to the storage pool. All assigned disks contribute to both performance and capacity; there's no separate cache designation.
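The OSA cache sizing rule in the steps above is easy to encode (a sketch of the 10% rule of thumb applied to the raw capacity tier; real sizing should key on anticipated consumed capacity and workload write rates):

```python
import math

def min_cache_gb(capacity_disks: int, capacity_disk_gb: int,
                 ratio: float = 0.10) -> int:
    """Minimum cache SSD size for one OSA disk group under the 10% rule of thumb."""
    if not 1 <= capacity_disks <= 7:
        raise ValueError("OSA supports 1-7 capacity disks per disk group")
    return math.ceil(capacity_disks * capacity_disk_gb * ratio)

# 4 x 1920 GB capacity disks -> 768 GB cache minimum under this rule
print(min_cache_gb(4, 1920))  # 768
```

In practice this usually means an 800 GB class NVMe cache device for a disk group of that size; undersized cache shows up as destaging latency under sustained writes.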
Connect-VIServer -Server vcenter.yourdomain.local -User administrator@vsphere.local -Password "password"
# Create the cluster with DRS, HA, and vSAN
$cluster = New-Cluster `
-Name "Cluster01" `
-Location (Get-Datacenter "Datacenter01") `
-DrsEnabled `
-HAEnabled `
-VsanEnabled
# Add hosts to the cluster
$hostList = "esxi01.yourdomain.local","esxi02.yourdomain.local","esxi03.yourdomain.local"
foreach ($h in $hostList) {
Add-VMHost -Name $h `
-Location $cluster `
-User root `
-Password "esxi-root-password" `
-Force
}
# After hosts are added, verify vSAN health
$vsanView = Get-VsanView -Id "VsanVcClusterHealthSystem-vsan-cluster-health-system"
$healthResult = $vsanView.VsanQueryVcClusterHealthSummary(
$cluster.ExtensionData.MoRef, 2, $null, $true, $null, $null, "defaultView"
)
$healthResult.OverallHealth
# List vSAN datastore
Get-Datastore | Where-Object { $_.Type -eq "vsan" }
Disconnect-VIServer -Confirm:$false
7. DRS and HA Configuration
DRS Automation Levels
DRS has three automation levels that control how aggressively it migrates VMs to balance load across hosts.
- Manual: DRS generates recommendations but doesn't act. You approve each migration. Use this when you're first deploying DRS and want visibility before trusting it to act autonomously, or in environments where unplanned VM migrations are operationally problematic.
- Partially Automated: DRS places VMs automatically during initial power-on but only generates recommendations for ongoing load balancing. The right middle ground for environments with mixed workload sensitivity.
- Fully Automated: DRS places VMs at power-on and migrates them automatically to balance load. This is the right setting for most production environments. DRS won't migrate a VM unless the load imbalance exceeds the migration threshold you configure.
The DRS migration threshold (scale of 1 to 5 in the vSphere Client) controls how aggressively DRS acts. A threshold of 3 (the default) means DRS only migrates when it will produce a meaningful improvement. Setting it to 1 makes DRS the most conservative: it applies only mandatory moves, such as satisfying affinity rules or evacuating a host entering maintenance mode. Setting it to 5 makes DRS the most aggressive, migrating for even marginal gains and generating unnecessary vMotion traffic. Leave it at 3 unless you have a specific reason to change it.
HA Admission Control
HA admission control ensures the cluster always has enough spare capacity to restart VMs from a failed host. The three policies are slot based (legacy), percentage based, and dedicated failover hosts. For most environments, percentage based admission control is the right choice: set the reserved CPU and memory percentages to represent the capacity of your largest host as a fraction of the cluster total.
For a three node cluster where you want to tolerate one host failure, reserve 33% CPU and memory as the failover capacity. This guarantees that if any one host fails, the surviving two hosts can restart all VMs that were on the failed host. If your hosts aren't identical in size, use the capacity of the largest host divided by the cluster total as your percentage.
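The 33% figure above generalizes: reserve the largest host's share of total cluster capacity. A sketch of the arithmetic (rounded to the nearest whole percent, matching the common 33% shorthand for three identical nodes; round up instead if you want strict headroom):

```python
def ha_failover_percent(host_capacities: list[float]) -> int:
    """Percentage of cluster CPU or memory to reserve so the failure of the
    largest single host can be absorbed by the survivors.

    Pass per-host capacity in any consistent unit (GHz, GB); compute
    separately for CPU and memory if hosts are asymmetric.
    """
    total = sum(host_capacities)
    return round(max(host_capacities) / total * 100)

print(ha_failover_percent([100.0, 100.0, 100.0]))  # 33 -- three identical nodes
print(ha_failover_percent([128.0, 96.0, 96.0]))    # 40 -- uneven hosts need more
```

The second example shows why uneven hosts matter: the reservation must cover the largest host, not the average one.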
Connect-VIServer -Server vcenter.yourdomain.local -User administrator@vsphere.local -Password "password"
$cluster = Get-Cluster -Name "Cluster01"
# Configure DRS: Fully Automated (the migration threshold stays at its
# default of 3; changing it requires the vSphere Client or the vSphere API)
Set-Cluster -Cluster $cluster `
-DrsAutomationLevel FullyAutomated `
-DrsEnabled $true `
-Confirm:$false
# Configure HA admission control. Note: Set-Cluster's -HAFailoverLevel sets the
# "host failures cluster tolerates" policy, not the percentage based policy.
# Percentage based admission control (33% = 1 of 3 nodes reserved) is configured
# in the vSphere Client under Configure > vSphere Availability, or via the API.
Set-Cluster -Cluster $cluster `
-HAEnabled $true `
-HAAdmissionControlEnabled $true `
-HAFailoverLevel 1 `
-Confirm:$false
# Set HA restart priority for critical VMs
Get-VM -Name "SQL-Primary" | Set-VM -HARestartPriority High -Confirm:$false
Get-VM -Name "DC-Primary" | Set-VM -HARestartPriority High -Confirm:$false
# Verify cluster configuration
Get-Cluster -Name "Cluster01" | Select-Object Name, DrsEnabled, DrsAutomationLevel, HAEnabled
Disconnect-VIServer -Confirm:$false
8. vSphere Lifecycle Manager
vSphere Lifecycle Manager (vLCM) replaced the older Update Manager baselines with a single image approach to cluster management. Instead of applying individual patches, you define a desired software image for the cluster (ESXi version, firmware versions, and driver versions), and vLCM ensures every host in the cluster matches that image. When VMware releases a new ESXi build, you update the image and vLCM remediates each host in sequence.
vLCM Image vs Baseline Management
A cluster can be managed by either vLCM images or baselines, not both. You choose at cluster creation. vLCM images are the forward looking approach: they give you a single consistent software definition for the entire cluster, support firmware and driver management alongside ESXi patches, and are required for vSAN ESA. They also work with Quick Boot, which skips the full hardware reboot during patching. Baselines are the legacy approach and still work, but if you're building a new cluster on vSphere 8, there's no reason to start with baselines.
Connect-VIServer -Server vcenter.yourdomain.local -User administrator@vsphere.local -Password "password"
$cluster = Get-Cluster -Name "Cluster01"
# List hosts and their current ESXi versions and builds; after remediation,
# every host in the cluster should report the same build number
Get-VMHost -Location $cluster | Sort-Object Name | Select-Object `
Name, Version, Build, ConnectionState, PowerState | Format-Table -AutoSize
# Image editing, compliance review, and remediation scheduling are UI-first in vSphere 8:
Write-Host "`nTo remediate, use vSphere Client: Cluster > Updates > Image"
Write-Host "vLCM will perform sequential host evacuation, update, and rejoin"
Write-Host "with Quickboot enabled, most hosts patch in under 5 minutes"
Disconnect-VIServer -Confirm:$false
The actual vLCM image update workflow is best done through the vSphere Client rather than PowerCLI because it involves interactive steps: reviewing what changes will be applied, checking hardware compatibility for the new firmware versions, and approving the remediation schedule. The CLI can check status and trigger remediation, but the image editing and compliance review steps are UI-first in vSphere 8.
Key Takeaways
- Check the VMware Compatibility Guide before purchasing hardware. Uncertified components generate vCenter warnings and limit configuration options. VCG compliance is required for VMware support cases on hardware issues.
- USB and SD card boot are deprecated in vSphere 8.0. Use M.2 NVMe or SATA SSD for the ESXi boot partition on new deployments. Don't start a new cluster on boot media that's on a deprecation path.
- Set static management IPs and proper FQDNs with forward and reverse DNS on every host before adding to vCenter. DHCP assigned management IPs and missing DNS records cause vCenter connectivity failures.
- vSAN ESA requires NVMe only disks and at least 128 GB RAM per host (512 GB for VCF ESA deployments); Broadcom strongly recommends 25 GbE networking, though 10 GbE is supported. If you have that hardware, ESA outperforms OSA significantly. OSA works on 10 GbE with mixed SSD and HDD configurations.
- vSAN traffic requires jumbo frames (MTU 9000) end to end: on the vDS, on the physical switch ports, and on the physical NICs. A single MTU mismatch causes degraded performance and intermittent errors that don't always present as obvious connectivity failures.
- Deploy VCSA, then migrate it to shared storage (vSAN or SAN) after the cluster is formed. A VCSA running on local storage on a single host is vulnerable to that host's failure, which takes down management even though other hosts are healthy.
- For a three node cluster tolerating one host failure, set HA admission control to 33% reserved CPU and memory. This guarantees the two surviving hosts can restart all VMs from the failed node.
- Use vLCM image management for new vSphere 8 clusters, not baselines. vLCM images are required for vSAN ESA, provide firmware and driver management alongside ESXi patches, and enforce a single consistent software definition across all cluster nodes.