OpenStack End-to-End Setup -- Kolla-Ansible, OVN Networking, Ceph Storage, and Multi-Tenancy
Standalone Infrastructure | Component: OpenStack 2024.2 (Dalmatian) via Kolla-Ansible | Audience: Private Cloud Architects, Senior Infrastructure Engineers
OpenStack is the most widely deployed open source private cloud platform running in production at scale. It's also the most complex infrastructure platform in this series by a significant margin, and it's worth saying that upfront. A properly designed OpenStack deployment can run hundreds of thousands of VMs across dozens of sites. Getting there requires making good decisions about control plane topology, networking architecture, storage backend selection, and the multi-tenancy model before you run a single command.
This article covers a production-quality OpenStack deployment using Kolla-Ansible, which is the current recommended deployment method for OpenStack. PackStack is deprecated and no longer suitable for production. The OpenStack release covered here is 2024.2 (Dalmatian), the most recent stable release at time of writing. The article covers control plane architecture, Kolla-Ansible deployment, networking with OVN, storage with Ceph via Cinder, VM and image management, multi-tenancy with projects and quotas, and OpenShift on OpenStack considerations.
1. Hardware Requirements and Pre-Install Decisions
Node Roles
An OpenStack deployment has distinct node roles. In a lab you can run everything on one machine. In production, these roles run on separate servers for scale and blast radius isolation.
| Role | Services | Minimum Specs | Notes |
|---|---|---|---|
| Controller | Keystone, Glance, Nova API, Neutron Server, Cinder API, Horizon, RabbitMQ, MariaDB, Memcached, HAProxy | 3 nodes (HA), 8 vCPU, 32 GB RAM, 100 GB OS disk | Three controllers minimum for HA. RabbitMQ and MariaDB require three nodes for quorum. HAProxy balances API traffic across them. |
| Compute | Nova Compute, Neutron OVN agent, Ceph client | 2+ nodes, 16+ cores, 64+ GB RAM | Add compute nodes to scale VM capacity. No maximum. Each compute node runs the hypervisor (KVM) and Nova Compute daemon. |
| Network (Edge) | OVN Northbound/Southbound DBs, OVN Gateway chassis | 2+ nodes, 4 cores, 16 GB RAM | Handles North South routing between tenant networks and the physical network. Can run on controller nodes in smaller deployments. |
| Storage (Ceph) | Ceph MON, MGR, OSD | 3+ nodes, 4 cores, 16 GB RAM, dedicated SSDs per OSD | Separate Ceph cluster recommended for production. Ceph provides Cinder block storage and Glance image storage. |
Network Requirements
OpenStack requires multiple networks. Three is the bare minimum; the four below are the practical floor for a Ceph-backed deployment. In a proper production deployment, five or more isolated networks reduce the blast radius of any single network failure and keep different traffic types from interfering with each other.
- Management/API network: All OpenStack API calls, service-to-service communication, RabbitMQ, MariaDB replication. All controller, compute, and storage nodes need this network.
- Tunnel network: Carries the OVN Geneve overlay traffic between compute nodes for tenant VM-to-VM communication. Dedicated 10 GbE or better. Keep it separate from management.
- Provider/external network: Connects OVN gateway to the physical network for floating IPs and North South routing. Requires a NIC on gateway nodes that connects directly to the external network without a Linux bridge getting in the way.
- Storage network: Ceph OSD-to-OSD replication and compute-to-Ceph client traffic. 10 GbE minimum, 25 GbE preferred.
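On Ubuntu hosts, the interface-to-network mapping is usually expressed in netplan before Kolla-Ansible runs. A sketch for one node, assuming `eno1` through `eno4` carry the four networks above (interface names, addresses, and the netplan filename are illustrative):

```yaml
# /etc/netplan/01-openstack.yaml (illustrative interface names and addressing)
network:
  version: 2
  ethernets:
    eno1:                      # management/API network
      addresses: [10.0.100.11/24]
      routes:
        - to: default
          via: 10.0.100.1
      nameservers:
        addresses: [10.0.100.2]
    eno2:                      # provider/external - no IP; OVN attaches it to the provider bridge
      dhcp4: false
    eno3:                      # tunnel network (Geneve overlay)
      addresses: [10.0.101.11/24]
      mtu: 9000                # jumbo frames leave room for encapsulation overhead
    eno4:                      # storage network (Ceph client traffic)
      addresses: [10.0.102.11/24]
      mtu: 9000
```

The key constraint is the one noted above: the provider/external interface stays unaddressed so OVN can bridge it directly.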
2. Control Plane Overview: The Core Services
Before deploying, understand what each service does. Most OpenStack operational questions come down to knowing which service owns which function and where to look when something breaks.
| Service | What It Does | Where Failures Show Up |
|---|---|---|
| Keystone | Identity and authentication. Issues tokens for all API calls. Every other service validates tokens against Keystone. | Login failures, "401 Unauthorized" on any API call, service-to-service authentication errors. |
| Nova | Compute management. Schedules VMs to compute nodes, manages VM lifecycle (start, stop, migrate), talks to the hypervisor via libvirt. | VM launch failures, migration errors, compute capacity reporting issues. |
| Neutron | Networking. Manages virtual networks, routers, floating IPs, security groups, and the OVN integration that enforces them on compute nodes. | VM network connectivity failures, floating IP failures, security group enforcement issues. |
| Cinder | Block storage. Manages volume lifecycle: create, attach, detach, snapshot, resize. Talks to the storage backend (Ceph RBD, LVM, or commercial arrays). | Volume attach failures, snapshot failures, storage quota errors. |
| Glance | Image service. Stores and retrieves VM images (qcow2, raw, ISO). Nova pulls images from Glance when launching a VM. | VM launch failures reporting "image not found"; slow VM launches when large images sit on slow backends. |
| Horizon | Web dashboard. A thin UI layer over the OpenStack APIs. Everything in Horizon can also be done via CLI or direct API calls. | Horizon failures usually indicate a Keystone token issue or memcached cache corruption. The APIs still work when Horizon doesn't. |
| RabbitMQ | Message queue. Services communicate with each other by sending messages via RabbitMQ. A RabbitMQ failure stops inter-service communication. | Slow operations, delayed VM state changes, eventual complete stoppage of most operations. |
| MariaDB (Galera) | Database for all services except Swift. Runs as a three-node Galera cluster in HA deployments. | Any service failure. When MariaDB is down, no writes succeed anywhere. |
3. Deploying with Kolla-Ansible
Kolla-Ansible is the recommended production deployment tool for OpenStack. It deploys OpenStack services as Docker containers managed by Ansible. Every service runs in its own container with a consistent configuration across all nodes. Updates are container image swaps rather than package upgrades, which significantly simplifies the upgrade path.
PackStack is no longer recommended for production. It installs OpenStack services directly on the host OS via RPM packages, which creates complex dependency chains and makes upgrades risky. If you're evaluating PackStack for a lab, that's fine. Don't use it for anything you intend to operate long-term.
Install Kolla-Ansible from PyPI pinned to the versioned release rather than pulling from the stable branch tip via git. The stable branch can contain unreleased commits that haven't completed the full test cycle. Pinning to a PyPI release gives you a tested, reproducible deployment baseline. Kolla-Ansible 18.x maps to OpenStack 2024.2 (Dalmatian).
```shell
# Deploy node: a separate machine or the first controller
# that orchestrates the Kolla-Ansible playbooks

# Install in a Python virtualenv to avoid dependency conflicts
python3 -m venv /opt/kolla-venv
source /opt/kolla-venv/bin/activate

# Install Kolla-Ansible for OpenStack 2024.2 (Dalmatian)
# Pin to the PyPI release rather than the stable branch tip to avoid
# unreleased commits landing in your deployment
pip install 'kolla-ansible==18.4.0'

# Create the Kolla config directory
mkdir -p /etc/kolla
chown $USER:$USER /etc/kolla

# Copy the example configuration files
cp -r $(python3 -c "import kolla_ansible; print(kolla_ansible.__path__[0])")/etc_examples/kolla/* /etc/kolla/

# Copy the multinode inventory template
cp $(python3 -c "import kolla_ansible; print(kolla_ansible.__path__[0])")/ansible/inventory/multinode .

# Install Ansible dependencies
kolla-ansible install-deps
```
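The copied multinode inventory is then edited to map hosts to roles. A trimmed sketch, assuming the hostnames below (the real template contains many more child groups that can be left at their defaults):

```ini
# multinode inventory (trimmed; hostnames and addresses are illustrative)
[control]
ctl01 ansible_host=10.0.100.11
ctl02 ansible_host=10.0.100.12
ctl03 ansible_host=10.0.100.13

[network]
# OVN gateway chassis; co-located with the controllers in this sketch
ctl0[1:3]

[compute]
cmp01 ansible_host=10.0.100.21
cmp02 ansible_host=10.0.100.22

[monitoring]
ctl01

[storage]
# populate only if Kolla manages storage services on these hosts

[deployment]
localhost ansible_connection=local
```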
```yaml
# /etc/kolla/globals.yml - critical settings to configure before deployment

# OpenStack release
kolla_base_distro: "ubuntu"
openstack_release: "2024.2"

# Networking: set to the management interface on controller/compute nodes
network_interface: "eno1"

# The interface handed to OVN for the provider/external network.
# It must carry no IP address; Kolla attaches it to the provider bridge.
neutron_external_interface: "eno2"

# Optional: dedicate an interface to the Geneve overlay
# (defaults to network_interface when unset)
# tunnel_interface: "eno3"

# The VIP used to access the OpenStack APIs (load-balanced across controllers)
kolla_internal_vip_address: "10.0.100.50"

# Enable High Availability (requires 3 controller nodes)
enable_haproxy: "yes"
enable_keepalived: "yes"

# Networking backend: OVN is the current recommended backend
neutron_plugin_agent: "ovn"

# Storage: use Ceph for Cinder and Glance
enable_cinder: "yes"
cinder_backend_ceph: "yes"
glance_backend_ceph: "yes"
ceph_glance_pool_name: "glance"
ceph_cinder_pool_name: "cinder-volumes"

# Enable Horizon dashboard
enable_horizon: "yes"

# TLS for the API endpoints (recommended for production)
kolla_enable_tls_internal: "yes"
kolla_enable_tls_external: "yes"
```
```shell
source /opt/kolla-venv/bin/activate

# Generate cryptographic passwords for all services
kolla-genpwd

# Bootstrap the target servers (installs Docker, sets sysctl, configures NTP)
kolla-ansible -i multinode bootstrap-servers

# Run prechecks - fix everything flagged before proceeding
kolla-ansible -i multinode prechecks

# Deploy OpenStack
kolla-ansible -i multinode deploy

# After deployment, generate the admin credentials file
kolla-ansible -i multinode post-deploy

# Source the admin credentials
source /etc/kolla/admin-openrc.sh

# Verify the deployment
openstack endpoint list
openstack compute service list
openstack network agent list
```
4. Networking with OVN
OVN (Open Virtual Network) is the current standard networking backend for OpenStack Neutron, replacing the older OVS based ML2 plugin. OVN implements distributed routing: every compute node participates in routing decisions rather than routing all traffic through a centralized network node. This eliminates the network node as a single point of failure and scales routing throughput with the number of compute nodes.
Core Networking Concepts
- Provider networks: Directly connected to the physical network infrastructure. VMs on a provider network get IPs from an external DHCP server or from fixed IPs assigned via Nova/Neutron. Provider networks bypass the overlay entirely. Use provider networks when VMs need direct layer 2 access to the physical network.
- Tenant (self-service) networks: Private networks created by tenants using Geneve overlay encapsulation. VMs on tenant networks are isolated from each other and from the physical network unless a router and floating IP are configured. Each tenant creates their own network topology independently of other tenants.
- Routers: Neutron routers connect tenant networks to provider networks for North South traffic. With OVN, routing is distributed: the router is implemented on each compute node that has VMs connected to it, rather than on a dedicated network node.
- Floating IPs: Public IP addresses allocated from a provider network pool and associated with a specific VM's private tenant network IP. Incoming traffic to the floating IP is DNAT'd to the private IP. The VM itself only sees its private tenant network IP.
- Security groups: Stateful firewall rules applied per virtual port (VM NIC). OVN implements security groups in the OVS flow tables on each compute node. Rules apply to ingress and egress traffic independently. The default behavior is deny all ingress, allow all egress unless you override it.
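One practical consequence of the Geneve overlay: encapsulation headers eat into every frame. With ML2/OVN on a standard 1500-byte underlay, instances on tenant networks are advertised a 1442-byte MTU, reflecting 58 bytes of outer IPv4, UDP, and Geneve headers plus OVN's metadata option. A quick sketch of the arithmetic; the jumbo-frame figure assumes your tunnel network supports a 9000-byte MTU:

```shell
# Geneve overhead on an IPv4 underlay as accounted by ML2/OVN:
# 20 (outer IPv4) + 8 (UDP) + 30 (Geneve header + OVN metadata option) = 58 bytes
GENEVE_OVERHEAD=58

# Standard 1500-byte underlay: instances see a reduced MTU
UNDERLAY_MTU=1500
TENANT_MTU=$((UNDERLAY_MTU - GENEVE_OVERHEAD))
echo "tenant MTU on ${UNDERLAY_MTU}-byte underlay: ${TENANT_MTU}"   # 1442

# A jumbo-frame tunnel network keeps the tenant MTU comfortably above 1500
JUMBO_MTU=9000
JUMBO_TENANT_MTU=$((JUMBO_MTU - GENEVE_OVERHEAD))
echo "tenant MTU on ${JUMBO_MTU}-byte underlay: ${JUMBO_TENANT_MTU}"
```

This is why the tunnel network above is a good candidate for jumbo frames: it spares guests from path-MTU surprises.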
```shell
source /etc/kolla/admin-openrc.sh   # or source your project credentials

# Create a tenant private network
openstack network create tenant-net-01
openstack subnet create \
  --network tenant-net-01 \
  --subnet-range 192.168.100.0/24 \
  --gateway 192.168.100.1 \
  --dns-nameserver 8.8.8.8 \
  tenant-subnet-01

# Create a router and attach the tenant network
openstack router create tenant-router-01
openstack router add subnet tenant-router-01 tenant-subnet-01

# Set the router's external gateway to the provider network
openstack router set \
  --external-gateway public \
  tenant-router-01

# Allocate a floating IP from the public network pool and capture its address
FIP=$(openstack floating ip create public -f value -c floating_ip_address)

# Launch a VM
openstack server create \
  --image ubuntu-22.04 \
  --flavor m1.small \
  --network tenant-net-01 \
  --key-name my-keypair \
  test-vm-01

# Associate the floating IP with the VM
openstack server add floating ip test-vm-01 "$FIP"
```
5. Storage: Cinder and Ceph Integration
Cinder provides block storage volumes that attach to VMs like virtual hard disks. The underlying storage backend determines the performance, reliability, and features available. Ceph RBD is the recommended backend for production OpenStack deployments: it provides thin provisioning, snapshots, cloning, and multipath access without a dedicated storage appliance.
Connecting Cinder to an Existing Ceph Cluster
```shell
# On your existing Ceph cluster, create pools and users for OpenStack
ceph osd pool create cinder-volumes 64 64
ceph osd pool create glance 32 32
ceph osd pool create nova-vms 64 64   # for Nova ephemeral storage if needed

# Tag the pools with the rbd application (required by recent Ceph releases)
ceph osd pool application enable cinder-volumes rbd
ceph osd pool application enable glance rbd
ceph osd pool application enable nova-vms rbd

# Create a Ceph user for Cinder
ceph auth get-or-create client.cinder \
  mon 'profile rbd' \
  osd 'profile rbd pool=cinder-volumes, profile rbd pool=nova-vms, profile rbd-read-only pool=glance' \
  -o /etc/ceph/ceph.client.cinder.keyring

# Create a Ceph user for Glance
ceph auth get-or-create client.glance \
  mon 'profile rbd' \
  osd 'profile rbd pool=glance' \
  -o /etc/ceph/ceph.client.glance.keyring

# Copy the keyring and ceph.conf to the Kolla config directories
mkdir -p /etc/kolla/config/cinder/cinder-volume
mkdir -p /etc/kolla/config/glance
cp /etc/ceph/ceph.conf /etc/kolla/config/cinder/cinder-volume/
cp /etc/ceph/ceph.client.cinder.keyring /etc/kolla/config/cinder/cinder-volume/
cp /etc/ceph/ceph.conf /etc/kolla/config/glance/
cp /etc/ceph/ceph.client.glance.keyring /etc/kolla/config/glance/
```
Volume Types
Cinder volume types map to different storage backends or configurations. You create volume types as an admin and users select them when creating volumes. A common pattern: a "performance" volume type that maps to an all-flash Ceph pool, and a "standard" volume type that maps to a hybrid pool. Users choose the right tier for their workload. Quota enforcement applies per volume type, so you can limit how much premium storage any project can consume.
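Under Kolla, extra backend definitions go into a cinder.conf overlay in /etc/kolla/config, which Kolla merges into the rendered configuration. A sketch of the two-tier pattern, assuming a second all-flash pool named `cinder-volumes-flash` exists on the Ceph cluster (pool and backend names are illustrative):

```ini
# /etc/kolla/config/cinder/cinder-volume.conf (merged into cinder.conf by Kolla)
[DEFAULT]
enabled_backends = rbd-standard,rbd-performance

[rbd-standard]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = rbd-standard
rbd_pool = cinder-volumes
rbd_user = cinder
rbd_ceph_conf = /etc/ceph/ceph.conf

[rbd-performance]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = rbd-performance
rbd_pool = cinder-volumes-flash      # hypothetical all-flash pool
rbd_user = cinder
rbd_ceph_conf = /etc/ceph/ceph.conf
```

Each volume type is then tied to a backend via its `volume_backend_name` property, e.g. `openstack volume type create --property volume_backend_name=rbd-performance performance`.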
6. VM Management via Horizon and CLI
Horizon is the web dashboard and the right interface for operators doing infrequent tasks or showing the environment to stakeholders. The OpenStack CLI and direct API calls are the right tools for anything you do more than once. Horizon is a thin wrapper over the APIs with limited filtering and batch operation support. Everything Horizon can do, the CLI can do faster with better output formatting.
```shell
source /etc/kolla/admin-openrc.sh

# List all servers across all projects (admin)
openstack server list --all-projects

# Get detailed information about a specific VM
openstack server show test-vm-01

# Live migrate a VM to a specific compute node
openstack server migrate --live-migration --host compute-node-02 test-vm-01

# Create a volume snapshot
openstack volume snapshot create \
  --volume my-data-volume \
  --force \
  my-data-snapshot-$(date +%Y%m%d)

# Create a VM image from a server (shut the server down first for a consistent image)
openstack server image create \
  --name "web-server-template-v1" \
  test-vm-01

# Show quotas for a project
openstack quota show my-project

# Update quotas for a project
openstack quota set \
  --cores 100 \
  --ram 204800 \
  --instances 50 \
  --volumes 100 \
  --gigabytes 10000 \
  my-project
```
7. Multi-Tenancy: Projects, Users, and Quotas
Multi-tenancy in OpenStack is built around projects (historically called tenants). A project is an isolated namespace with its own network topology, VMs, volumes, images, and quota limits. Users belong to one or more projects with role based access control determining what they can do within each project.
Role Hierarchy
- admin: Cloud administrator. Can see and manage all projects, resources, and quotas; sets policies, adds compute nodes, and manages Ceph backends.
- member: Standard project user. Can create and manage VMs, volumes, and networks within their assigned project. Can't see other projects' resources.
- reader: Read-only access within a project. Can view resource states but can't create or modify anything.
```shell
source /etc/kolla/admin-openrc.sh

# Create a new project
openstack project create \
  --description "Development team project" \
  --enable \
  dev-team-project

# Create a user and assign to the project
openstack user create \
  --project dev-team-project \
  --password "SecurePassword123!" \
  --enable \
  dev-user-01

openstack role add \
  --project dev-team-project \
  --user dev-user-01 \
  member

# Set reasonable quotas for the project
openstack quota set \
  --cores 40 \
  --ram 81920 \
  --instances 20 \
  --volumes 50 \
  --gigabytes 5000 \
  --floating-ips 10 \
  --security-groups 20 \
  dev-team-project

# Verify the quotas
openstack quota show dev-team-project
```
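With the user created, they need project-scoped credentials for the CLI. A minimal openrc sketch using the standard `OS_*` environment variables (the VIP, region, and password values are illustrative):

```shell
# dev-openrc.sh - project-scoped credentials for dev-user-01 (illustrative values)
export OS_AUTH_URL=https://10.0.100.50:5000/v3   # Keystone behind the internal VIP
export OS_IDENTITY_API_VERSION=3
export OS_PROJECT_DOMAIN_NAME=Default
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_NAME=dev-team-project
export OS_USERNAME=dev-user-01
export OS_PASSWORD='SecurePassword123!'          # better: prompt with 'read -s' instead
export OS_REGION_NAME=RegionOne
```

After `source dev-openrc.sh`, commands like `openstack server list` operate within dev-team-project only, scoped by the member role assigned above.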
8. OpenShift on OpenStack
Running OpenShift on OpenStack is a supported and common pattern at organizations that have invested in OpenStack for their private cloud and want to run OpenShift workloads on the same infrastructure. OpenShift treats OpenStack as a cloud provider, using Nova for VM provisioning, Cinder for persistent volumes, Neutron for VM networking, and Octavia for load balancers.
Prerequisites
- Octavia load balancer service must be deployed in OpenStack. OpenShift uses Octavia to provision load balancers for the API server and for OpenShift Route and Service objects. Deploying OpenShift on OpenStack without Octavia requires manual load balancer configuration that's significantly more complex to maintain.
- The OpenStack project that will host OpenShift needs quotas sized for the cluster: typically 30+ cores, 96+ GB RAM, and 1 TB+ of Cinder storage for the initial three-master, three-worker deployment. Quotas that are too small cause cryptic installation failures late in the IPI deployment process.
- Floating IPs must be available. OpenShift IPI on OpenStack allocates floating IPs for the API VIP and the Ingress VIP automatically. Have at least two floating IPs pre-allocated or ensure the quota allows their creation.
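Because undersized quotas fail late and cryptically, it's worth tallying the installer's footprint against the project quota before starting. A rough sketch using illustrative flavor sizes (4 vCPU/16 GB masters, 2 vCPU/8 GB workers) and assuming the temporary bootstrap VM matches the master flavor:

```shell
# Flavor sizes assumed for this sketch (vCPU / GB RAM)
MASTER_CPU=4;  MASTER_RAM=16   # master flavor
WORKER_CPU=2;  WORKER_RAM=8    # worker flavor
BOOT_CPU=4;    BOOT_RAM=16     # bootstrap VM, deleted after install completes

MASTERS=3
WORKERS=3

# Peak consumption during installation (bootstrap coexists with the masters)
CORES=$((MASTERS * MASTER_CPU + WORKERS * WORKER_CPU + BOOT_CPU))
RAM=$((MASTERS * MASTER_RAM + WORKERS * WORKER_RAM + BOOT_RAM))

echo "minimum cores during install: ${CORES}"    # 22
echo "minimum RAM during install:   ${RAM} GB"   # 88
```

The 30+ core / 96+ GB guidance above adds headroom on top of this floor for day-2 scaling and larger flavors.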
IPI Installation
OpenShift IPI (Installer Provisioned Infrastructure) on OpenStack reads your clouds.yaml file for OpenStack credentials and handles all VM provisioning automatically. The install-config.yaml specifies the platform as openstack with the cloud name, the external network for floating IPs, the compute flavor for master and worker nodes, and the number of replicas. The openshift-install program then creates the VMs, configures networking, and bootstraps the cluster. You don't manually create VMs or configure Nova.
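The clouds.yaml entry the installer reads might look like this (endpoint, project, and credential values are illustrative):

```yaml
# ~/.config/openstack/clouds.yaml (illustrative values)
clouds:
  myopenstack:
    auth:
      auth_url: https://10.0.100.50:5000/v3
      username: ocp-installer
      password: "..."
      project_name: ocp-project
      user_domain_name: Default
      project_domain_name: Default
    region_name: RegionOne
```

The cloud name here is what install-config.yaml references in its `platform.openstack.cloud` field.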
```yaml
apiVersion: v1
baseDomain: yourdomain.local
metadata:
  name: ocp-cluster-01
platform:
  openstack:
    cloud: myopenstack          # matches entry in clouds.yaml
    externalNetwork: public     # the provider network with floating IPs
    defaultMachinePlatform:
      type: m1.xlarge           # flavor: 4 vCPU, 16 GB RAM minimum for masters
pullSecret: '{"auths": ...}'    # from Red Hat pull secret page
sshKey: |
  ssh-rsa AAAA... # your SSH public key
controlPlane:
  name: master
  replicas: 3
  platform:
    openstack:
      type: m1.xlarge
compute:
- name: worker
  replicas: 3
  platform:
    openstack:
      type: m1.large            # 2 vCPU, 8 GB RAM minimum per worker
```
Key Takeaways
- PackStack is deprecated. Don't use it for production. Kolla-Ansible is the current recommended deployment tool. It deploys OpenStack services as Docker containers, making upgrades container image swaps rather than package dependency nightmares.
- Run kolla-ansible prechecks and fix every failure before running deploy. A partially failed deployment requires manual cleanup. Prechecks are not optional.
- OVN is the current standard networking backend, replacing the older OVS ML2 plugin. OVN implements distributed routing on every compute node, eliminating the network node as a single point of failure.
- Three controllers are the production minimum for HA. RabbitMQ and MariaDB Galera both require three nodes for quorum. A two-controller deployment has no quorum on these critical services and isn't truly HA.
- Ceph RBD is the recommended Cinder and Glance backend for production. It provides thin provisioning, snapshots, cloning, and multi-node redundancy without a dedicated storage appliance. Size Ceph pools before running kolla-ansible deploy and configure the keyring files in /etc/kolla/config before deployment.
- Horizon is a thin wrapper over the APIs. Use the OpenStack CLI for any operation you do more than once. Horizon has limited filtering, no bulk operations, and adds latency. The CLI is faster for every operational task.
- OpenShift IPI on OpenStack requires Octavia. Without it, load balancer provisioning for the API server and ingress controller fails late in the installation, which is one of the most painful places for an IPI install to fail. Deploy Octavia in OpenStack before starting the OpenShift installation.