# System Requirements for an Embedded Cluster Install
While the Turbine platform can be installed on a single node for testing, a 3+ node cluster is recommended for production environments to provide redundancy and high availability (HA). Any multiple-node cluster must have an odd number of total nodes. This table details the recommended sizing and data per node.

| Components | Single Node | 3-Node Cluster Small | 3-Node Cluster Medium | 3-Node Cluster Large |
| --- | --- | --- | --- | --- |
| CPU | 8 CPU cores | 8 CPU cores | 16 CPU cores | 32 CPU cores |
| Memory | 32 GB RAM | 32 GB RAM | 64 GB RAM | 128 GB RAM |
| Storage | 600 GB SSD / 3000 IOPS per node | 600 GB SSD / 3000 IOPS per node | 1 TB SSD / 3000 IOPS per node | 1 TB SSD / 3000 IOPS per node |
| Record creation boundaries + active users | Records created in a day: 250,000<br>Total records: 5 million<br>Active users: 10 | Records created in a day: 500,000<br>Total records: 20 million<br>Active users: 30 | Records created in a day: 1 million<br>Total records: 20 million<br>Active users: 50 | Records created in a day: 1 million<br>Total records: 20 million<br>Active users: 200 |
| Integration calculations | Integrations in use: < 20<br>Average integration actions/day: < 250,000 | Integrations in use: < 20<br>Average integration actions/day: < 500,000 | Integrations in use: < 20<br>Average integration actions/day: < 1 million | Integrations in use: > 20<br>Average integration actions/day: < 1 million |
| Pods | API: 1, Tasks: 1, Web: 1, MongoDB: 1, Reports: 1 | API: 3, Tasks: 3, Web: 3, MongoDB: 3, Reports: 3 | API: 3, Tasks: 3, Web: 3, MongoDB: 3, Reports: 3 | API: 6, Tasks: 9, Web: 3, MongoDB: 3, Reports: 3 |

## External MongoDB Resource Recommendations

This table illustrates the resource recommendations (per node) for a standalone Mongo deployment. All of these values can be subtracted from the system requirements above when allocating resources for the remainder of the Turbine pods. For more information about deploying on an external MongoDB cluster, see Deploy with an External MongoDB Cluster.

| Components | Single Node | 3-Node Cluster SM | 3-Node Cluster MED | 3-Node Cluster LG |
| --- | --- | --- | --- | --- |
| CPU | 4 CPU cores | 4 CPU cores | 8 CPU cores | 8 CPU cores |
| RAM | 16 GB RAM | 16 GB RAM | 16 GB RAM | 32 GB RAM |
| Storage | 300 GB SSD / 3000 IOPS per node | 300 GB SSD / 3000 IOPS per node | 700 GB SSD / 3000 IOPS per node | 700 GB SSD / 3000 IOPS per node |

## Remaining Turbine Cluster Resources

This table illustrates the resources necessary for the remainder of Turbine if you are using external MongoDB resources.

| Components | Single Node | 3-Node Cluster SM | 3-Node Cluster MED | 3-Node Cluster LG |
| --- | --- | --- | --- | --- |
| CPU | 4 CPU cores | 4 CPU cores | 8 CPU cores | 24 CPU cores |
| RAM | 16 GB RAM | 16 GB RAM | 48 GB RAM | 96 GB RAM |
| Storage | 300 GB SSD / 3000 IOPS per node | 300 GB SSD / 3000 IOPS per node | 300 GB SSD / 3000 IOPS per node | 300 GB SSD / 3000 IOPS per node |

## Resource Utilization Thresholds

All nodes need to stay below certain resource utilization thresholds to ensure that pods always have available resources to operate in. If any of the following thresholds are exceeded on a node, all pods on that node will be removed until the resource utilization is addressed.

| Resource | Threshold |
| --- | --- |
| Memory | Less than 100 [mebibytes](https://simple.wikipedia.org/wiki/Mebibyte) (MiB) available |
| Disk space (/var/lib/containerd partition) | Less than 15% available |
| Disk space (/var/lib/kubelet partition) | Less than 10% available |
| Disk inodes (/var/lib/kubelet partition) | Less than 5% available |

## Backup Requirements

Taking a snapshot requires enough free disk space for a compressed archive of the Swimlane database to be saved in ephemeral storage before it is uploaded to the snapshot destination. Free disk space on the cluster at /var/lib/kubelet should be greater than or equal to the size of the uncompressed database to ensure there is no disk pressure during the snapshot process.
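A quick way to see how close a node is to these thresholds is to check the relevant partitions and memory directly on the node. The commands below are a minimal sketch and assume the default partition layout described later in this topic.

```bash
# Available disk space on the partitions with eviction thresholds
df -h /var/lib/containerd /var/lib/kubelet

# Available inodes on the kubelet partition
df -i /var/lib/kubelet

# Available memory in mebibytes (compare against the 100 MiB threshold)
free -m
```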
## Cloud Service Sizing Tools

Instance sizing calculators:

- AWS instance sizing: https://aws.amazon.com/ec2/instance-types/
- Azure instance sizing: https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-general#dsv2-series
- GCP instance sizing: https://cloud.google.com/compute/docs/machine-types

Most cloud providers limit IOPS for disks and for instances/virtual machines. Consult your provider's documentation to ensure the effective IOPS for the cluster nodes meet the requirements in the sizing table above.

Provider documentation:

- AWS disk performance limits: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html
- AWS instance performance limits: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html
- Azure disk performance limits: https://docs.microsoft.com/en-us/azure/virtual-machines/disks-scalability-targets
- Azure VM performance limits: https://docs.microsoft.com/en-us/azure/virtual-machines/sizes-general
- GCP disk performance limits: https://cloud.google.com/compute/docs/disks/performance
- GCP VM performance limits: https://cloud.google.com/compute/docs/disks/performance#machine-type-disk-limits

## Critical Prerequisites

The following operating systems have been tested by Swimlane for your node setup:

- RHEL 8: 8.6, 8.9, 8.10
- RHEL 9: 9.2, 9.4, 9.5
- Rocky Linux 9: 9.2, 9.4
- Ubuntu: 20.04.6 LTS, 22.04.2 LTS, 24.04.1 LTS
- Amazon Linux: Amazon Linux 2, Amazon Linux 2023

Limitations:

- Only static IPs are supported (dynamic IPs are not allowed), and the selected static IP cannot be changed later.
- Node hostnames must be static and cannot be changed.
- Nodes must not have any existing installations of Kubernetes, Docker, or containerd.
- Automatic updates for Kubernetes and Docker related packages must be disabled. Updates to these packages are handled through the Turbine Platform Installer.
- IPv6 and dual-stack networks are not supported.
- IP forwarding must be enabled. The installer enables IPv4 forwarding, but you must ensure it remains enabled on all cluster nodes permanently, including after reboots (a sketch for persisting this setting appears after the CPU architecture prerequisites below).

Prerequisites for installation on Rocky Linux or RHEL 9.x:

- nfs-utils
- conntrack-tools
- socat
- git
- fio

Use the following command to install them:

```bash
sudo dnf -y install nfs-utils conntrack-tools socat git fio container-selinux tar zip unzip
```

For Rocky Linux 9.2, run the following command to change the file permissions:

```bash
sudo chmod 755 /etc/rc.d/rc.local
```

For each node, ensure that you have:

- sudo/root access
- NUMA (non-uniform memory access) disabled
- Accurate system time. To maintain accurate system time, you must have Network Time Protocol (NTP) or a similar time sync service.

## Critical CPU Architecture Prerequisites

The CPU architecture must be compatible with MongoDB, unless an external Mongo solution is used. See the [MongoDB platform support matrix](https://www.mongodb.com/docs/v7.0/administration/production-notes/#platform-support-matrix) for more information.

To outline some important requirements noted in the MongoDB documentation:

- The [AVX](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#CPUs_with_AVX) CPU instruction set is required to run MongoDB.
- NUMA (non-uniform memory access) needs to be disabled.
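The following is a minimal sketch of per-node checks for these prerequisites; the sysctl file name is an arbitrary example, and the exact commands available depend on your distribution.

```bash
# Confirm the CPU exposes the AVX instruction set required by MongoDB
grep -m1 -o 'avx' /proc/cpuinfo || echo "AVX not detected"

# Review the NUMA layout reported by the kernel (NUMA should be disabled)
lscpu | grep -i numa

# Enable IPv4 forwarding now and persist it across reboots
# (the file name 99-turbine.conf is just an example)
echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-turbine.conf
sudo sysctl --system
```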
## Critical Network Prerequisites

- The IP ranges 10.32.0.0/20 and 10.96.0.0/22 are the default IP ranges used internally in the cluster for Kubernetes service and pod networking. If these ranges are in use in your network and routable by the cluster nodes, the internal cluster IP ranges must be overridden. See Define Custom Pod and Service Subnets for instructions on overriding the internal cluster IP ranges.
- At a minimum, a network throughput of 1 Gbps is generally acceptable for most use cases.
- Swimlane recommends maintaining a latency of less than 5 milliseconds between workload pods and the primary MongoDB pod. For latency between MongoDB replica set members, a more relaxed threshold of 10 milliseconds is sufficient.
- Ensure all nodes are in the same cloud provider region or physical data center network. Nodes behind different WAN links in the same cluster are not supported.

## Critical Prerequisites for Airgapped Network Installations

For access to these optional services, ensure you have the following within the airgapped network:

- LDAP login functionality requires an LDAP server inside the airgapped subnet, or access to outside subnets or to the IP or domain where the LDAP server resides, so that the required ports can be opened. Non-secure LDAP uses port 389; LDAPS (secure LDAP) uses port 636 and is preferred.
- SSO login functionality requires the SSO service to be able to reach inside the airgapped network (where the Turbine instance resides).
- Email functionality requires the Turbine instance to use a functioning mail proxy that resides within the airgapped subnet, access to outside subnets where an email server resides, or access to your chosen email server on the internet. Non-secure SMTP uses port 25; secure SMTP uses port 587 and is preferred.
- To provide threat intelligence enrichment, the SOC solution requires access to the following threat intelligence URLs, each over TCP ports 443 and 80:
  - VirusTotal: https://virustotal.com/ and subdomains
  - URLhaus (Malware URL Exchange): https://urlhaus.abuse.ch/ and subdomains
  - Recorded Future: http://recordedfuture.com/ and subdomains
  - IPQualityScore: http://ipqualityscore.com/ and subdomains

## Critical Load Balancer Prerequisites

A load balancer that supports hairpinning is required for HA installations. Here are some suggested load balancers and configurations that you can use.

Layer 7 load balancers:

- AWS Application Load Balancer
- Azure Standard Application Gateway
- Azure Standard v2 Application Gateway
- GCP HTTPS Load Balancer
- HAProxy Load Balancer

TCP forwarding (Layer 4) load balancers:

- AWS Network Load Balancer
- Azure Load Balancer
- GCP TCP Load Balancer
- HAProxy Load Balancer

### Extra Considerations for Load Balancer Setup on Nodes

Each node in the cluster needs to be reachable by the load balancer on these ports (a sketch for opening them with firewalld follows this list):

- TCP 443 (Turbine Platform UI)
- TCP 8800 (Turbine Platform Installer UI)
- TCP 6443 (Kubernetes API)
- TCP 80 (optional HTTP to HTTPS redirect for the Turbine Platform UI)
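On RHEL-family nodes that run firewalld, the rules below are one hedged way to open these ports; adjust them to whatever host firewall you actually use.

```bash
# Open the ports the load balancer uses to reach each node (firewalld assumed)
sudo firewall-cmd --permanent --add-port=443/tcp    # Turbine Platform UI
sudo firewall-cmd --permanent --add-port=8800/tcp   # Turbine Platform Installer UI
sudo firewall-cmd --permanent --add-port=6443/tcp   # Kubernetes API
sudo firewall-cmd --permanent --add-port=80/tcp     # optional HTTP to HTTPS redirect
sudo firewall-cmd --reload
```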
A load balancer is required for the Kubernetes API and the Turbine Platform Installer. If you are using a Layer 4 load balancer for the Turbine Platform Installer and the Turbine Platform, it can be combined with the Kubernetes API load balancer so that only one load balancer is required.

The Kubernetes API load balancer must be a Layer 4 load balancer:

- The front-end port on the load balancer should be 6443 (Kubernetes control plane).
- The back-end port on the load balancer to the nodes should be 6443.

The Turbine Platform Installer and Turbine Platform load balancer can be either a Layer 4 or Layer 7 load balancer:

- Front-end ports on the load balancer should be 443 (Turbine Platform) and 8800 (Turbine Platform Installer).
- Back-end ports on the load balancer map differently based on the load balancer type:
  - For Layer 7 load balancers, 443 should map to 4443 and 8800 should map to 8800.
  - For Layer 4 load balancers, 443 should map to 443 and 8800 should map to 8800.

## Other Critical Prerequisites for Production Environments

Use an Amazon S3 bucket or S3-compatible storage solution to store snapshots externally so that a total cluster outage does not result in a loss of snapshots. If Amazon S3 is unavailable for use, an S3-compatible solution such as [MinIO](https://min.io/) can be used, but it should be installed external to the cluster.

To maximize disk space, Swimlane recommends that you set up partitions. Here are the minimum recommended partition sizes:

| Partition | Size | Description of highest storage consumers by function | Storage concerns and what to look out for |
| --- | --- | --- | --- |
| / | 50 GB | Base OS, installation files, logs, and other smaller cluster dependencies | Local logs storage and log rotation policies |
| /var/lib/containerd | 100 GB | Container images, logs, and runtime volumes | Image growth (unused images get pruned when the default images threshold of 15% is reached) and container log growth |
| /var/lib/kubelet | 100 GB | Kubernetes runtime, ephemeral storage, and in-flight snapshot temporary storage | Snapshot size and pod temporary storage increasing |
| /var/openebs | 300 GB | Persistent storage subsystem used for database volumes, the Swimlane Platform Installer database, and completed local snapshots | Local storage of snapshots (see note below) and the amount of Turbine data and integrations |
| /var/log | 5 GB | Storage for the Kubernetes API server logs | 100% usage of /var/log/apiserver will cause a Swimlane outage until space is recovered |

Notes:

- The directory /var/lib/kurl/ is the default location for installer temporary files, but the path can be overridden at install time. The partition that path is on needs to have 20 GB available during the install to ensure a successful installation; the files can be removed after the installation has completed.
- Production environments should use external storage locations for snapshots. If you plan to store snapshots locally in a test/lab environment, you will need to make the /var/openebs partition larger to accommodate them.
- If you plan to use a separate disk or volume for /var/openebs, ensure it meets the minimum IOPS defined in the recommended data and sizing table at the beginning of this topic (a measurement sketch follows these notes).
- The partition /var/log needs at least 5 GB available to avoid degrading the Turbine cluster or causing an outage.
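Because fio is already in the prerequisite package list, it can double as a rough IOPS check for the disk backing /var/openebs. The job parameters below are only an illustrative sketch, not a Swimlane benchmark procedure.

```bash
# Rough random-write IOPS measurement for the disk backing /var/openebs
sudo fio --name=openebs-iops --directory=/var/openebs --rw=randwrite \
  --bs=4k --size=1G --ioengine=libaio --iodepth=64 --direct=1 \
  --runtime=60 --time_based --group_reporting

# Remove the fio test file when finished
sudo rm -f /var/openebs/openebs-iops*
```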
### Domain Name System (DNS)

- Use the IP or DNS record for the Kubernetes API load balancer to specify the load balancer address during the initial installation.
- A DNS record should be created that points to the Turbine Platform Installer and Turbine Platform load balancer. This is the DNS address you specify when you configure Turbine after the installation. Append :8800 to that address to access the Turbine Platform Installer UI. If only one Layer 4 load balancer is used, a single DNS record can be created and used for both purposes.
- You must use DNS-compliant records. A DNS record can be up to 63 characters long and can only contain letters, numbers, and hyphens. The record cannot start or end with a hyphen, nor have consecutive hyphens.

### Exceptions to Network Access Control Lists (ACL)

Turbine installations on servers with tight network access control (NAC) will need several exceptions made to properly install, license, and stage a Turbine deployment with the Turbine Platform Installer. See the tables below for the outbound and inbound exceptions required.

Outbound URL ACL exceptions:

| Exception | Purpose |
| --- | --- |
| get.swimlane.io | Turbine Platform installation script |
| k8s.kurl.sh | Turbine Platform installation script |
| kurl.sh | Turbine Platform installation script |
| kurl-sh.s3.amazonaws.com | Turbine Platform installation script dependencies |
| registry.replicated.com | Turbine Platform container images |
| proxy.replicated.com | Turbine Platform container images |
| ghcr.io | Swimlane Platform container dependency images |
| registry.k8s.io | Swimlane Platform container dependency images |
| k8s.gcr.io | Turbine Platform container dependency images |
| storage.googleapis.com | Turbine Platform container dependency images |
| quay.io, cdn.quay.io, cdn01.quay.io, cdn02.quay.io, cdn03.quay.io, cdn04.quay.io, cdn05.quay.io, cdn06.quay.io | Turbine Platform container dependency images |
| replicated.app | Turbine Platform Installer license verification |
| auth.docker.io | Docker authentication |
| registry-1.docker.io | Docker registry |
| production.cloudflare.docker.com | Docker infrastructure |
| files.pythonhosted.org | Python packages for Turbine integrations |
| pypi.org | Python packages for Turbine integrations |
| `<LoadBalancerIP>:6443` | Kubernetes API |

## Port Requirements

### External Access

The following ports are required externally to allow access to the Turbine platform components.

| Ports | Protocol | Purpose | Access From |
| --- | --- | --- | --- |
| 443 | TCP | Turbine Platform UI | Clients that need to access the Turbine Platform UI |
| 80 | TCP | Turbine Platform UI | Optional HTTP to HTTPS redirect for the Turbine Platform UI |
| 8800 | TCP | Turbine Platform Installer UI | Clients that need to access the Turbine Platform Installer UI |
| 22 | TCP | Shell access | Management workstations to manage the cluster nodes and install Turbine |

### Between Cluster Nodes

The following ports are required between the cluster nodes to allow cluster operation.

| Ports | Protocol | Purpose |
| --- | --- | --- |
| 2379-2380 | TCP | Kubernetes etcd |
| 6443 | TCP | Kubernetes API |
| 8472 | UDP | Kubernetes CNI |
| 10250-10252 | TCP | Kubernetes components (kubelet, kube-scheduler, kube-controller) |

### From Load Balancers

The following ports are required between the cluster nodes and the load balancer(s).

| Ports | Protocol | Purpose |
| --- | --- | --- |
| 443 | TCP | Turbine Platform UI |
| 80 | TCP | Optional HTTP to HTTPS redirect for the Turbine Platform UI |
| 8800 | TCP | Turbine Platform Installer UI |
| 6443 | TCP | Kubernetes API |

### Available Ports

In addition to all ports listed above in the Between Cluster Nodes table, the following ports must be available and unused by other processes on each cluster node in order to install.

| Ports | Protocol | Purpose |
| --- | --- | --- |
| 2381 | TCP | Kubernetes etcd |
| 8472 | TCP | Kubernetes CNI |
| 10248 | TCP | Kubernetes kubelet health server |
| 10249 | TCP | Kubernetes kube-proxy metrics server |
| 10257 | TCP | Kubernetes kube-controller-manager health server |
| 10259 | TCP | Kubernetes kube-scheduler health server |
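Before installing, it can help to confirm that none of these ports already has a listener on a node. The one-liner below is just a convenience sketch using ss.

```bash
# List any existing listeners on the ports the installer needs free
sudo ss -lntup | grep -E ':(2379|2380|2381|6443|8472|10248|10249|10250|10251|10252|10257|10259)\s' \
  || echo "No conflicting listeners found"
```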
## External Monitoring Considerations

Swimlane recommends that any TPI installation has some amount of external monitoring set up, configured to alert when user-defined thresholds are met, so that you are warned of possible outages of your production instances. Different installation scenarios may call for different metrics, so your implementation may vary. As a baseline, the following checks are recommended:

- Are all nodes healthy?
- Does each node have sufficient free space in all partitions?
- Are CPU and memory usage levels within acceptable levels?
- Is disk latency within acceptable ranges?
- Are any pods in a not-ready state?
- Are any Deployments or StatefulSets not reporting the correct number of ready replicas?
- Do any load balancers have health checks in place? Are they healthy?

For ad hoc spot checks of several of these questions, see the kubectl sketch at the end of this topic.

### Third-Party Monitoring Solutions

There are several third-party monitoring solutions that you can use to monitor resource usage for a node. Any tool that you put into use should be installed externally to the cluster so that it does not interfere with cluster operations and can still alert if a metric enters a failing state. These products may require that their own agents or exporters be installed on the nodes in order to facilitate monitoring. Any agent or exporter should be tested against your cluster to validate that it does not interfere with Turbine operations or port requirements.

The use of .NET Core process monitors such as Dynatrace is known to cause instabilities in Swimlane/Turbine Kubernetes pods, specifically higher than normal CPU and/or memory consumption. Use such software with caution and uninstall it from your Swimlane/Turbine servers if its use causes Swimlane/Turbine pods to crash and restart.
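The commands below are a minimal sketch for spot-checking the baseline monitoring questions above; they assume kubectl access on a cluster node and, for `kubectl top`, a working metrics pipeline. They are not a substitute for an external monitoring tool.

```bash
# Are all nodes healthy / Ready?
kubectl get nodes

# Do any nodes report memory or disk pressure?
kubectl describe nodes | grep -iE 'memorypressure|diskpressure'

# Are any pods not running?
kubectl get pods -A --field-selector=status.phase!=Running

# Are CPU and memory usage within acceptable levels? (requires metrics to be available)
kubectl top nodes
```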