
Smart Scaler Apps Installer
Ansible-based installer for Smart Scaler components and Kubernetes cluster deployment.
Table of Contents
- Prerequisites for Deploying K8s Cluster (~2–3 mins)
- Installation Steps for Deploying K8s Cluster (~15–20 mins)
- Prerequisites for Installing SmartScaler Apps (~2 mins)
- Instructions to Deploy SmartScaler Apps (time depends on NIM profile: 70B ~20–25 mins, 8B ~10–15 mins, 1B ~10 mins)
- Example Test Run Steps (~15 mins)
- Execution Order Control (optional) (~1 min)
- Destroying the Kubernetes Cluster (~5 mins)
- Documentation Links
- Troubleshooting
1. Prerequisites for Deploying K8s Cluster
System Requirements
Control Plane Nodes (Master)
- CPU: 8 cores minimum
- RAM: 16GB minimum
- Storage: 500GB minimum (actual requirement depends on the NIM profile's image size and NIM cache PVC requirements)
- OS: Ubuntu 22.04+ or compatible Linux distribution
Worker Nodes (Optional)
- CPU: 8 cores minimum
- RAM: 16GB minimum
- Storage: 500GB minimum (actual requirement depends on the NIM profile's image size and NIM cache PVC requirements)
- OS: Same as control plane nodes
Required Software
- Python 3.x and pip
- Git
- SSH key generation capability
- helm v3.15.0+
- kubectl v1.25.0+
Network Requirements
- SSH access between installer machine and all cluster nodes
- Internet connectivity for downloading packages
- Open ports: 6443 (API server), 2379-2380 (etcd), 10250 (kubelet)
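If the nodes run a host firewall, these ports must be reachable from the other nodes and from the installer machine. Below is a minimal sketch assuming Ubuntu's ufw; adapt it to your firewall or cloud security groups, and see the Kubernetes Firewall Configuration doc for the authoritative list.
# Run on each node; the port list mirrors the requirements above
sudo ufw allow 22/tcp          # SSH for Ansible
sudo ufw allow 6443/tcp        # Kubernetes API server
sudo ufw allow 2379:2380/tcp   # etcd client/peer traffic
sudo ufw allow 10250/tcp       # kubelet
sudo ufw reload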
2. Installation Steps for Deploying K8s Cluster
Step 2.1: Clone Repository and Setup Environment
# Clone the repository
git clone https://github.com/smart-scaler/smartscaler-apps-installer.git
cd smartscaler-apps-installer
# Install Python3
sudo apt update
sudo apt-get install python3-venv python3-full -y
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate
# Install Python dependencies
chmod +x files/install-requirements.sh
./files/install-requirements.sh
# Install Ansible collections
LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 ansible-galaxy collection install -r requirements.yml --force
Step 2.2: Generate SSH Keys
# Generate SSH key for cluster access
ssh-keygen -t rsa -b 4096 -f ~/.ssh/k8s_rsa -N ""
# Copy SSH key to each node (repeat for all nodes)
ssh-copy-id -i ~/.ssh/k8s_rsa.pub user@node-ip
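Optionally verify the connection before moving on; this quick check uses the same user@node-ip placeholder as above.
# Confirm key-based SSH works and (if configured) passwordless sudo is available
ssh -i ~/.ssh/k8s_rsa user@node-ip 'hostname && sudo -n true && echo sudo-ok'
# If sudo requires a password, set ansible_sudo_pass in user_input.yml instead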
Step 2.3: Configure user_input.yml
Edit user_input.yml with your cluster configuration:
This section defines the settings required to enable and configure a Kubernetes cluster deployment using Ansible.
🔧 Note: Replace placeholders with actual values before running the playbook.
kubernetes_deployment:
  enabled: true                                    # Enable Kubernetes deployment via Ansible
  api_server:
    host: "PUBLIC_IP"                              # Public IP of Kubernetes API server
    port: 6443                                     # Default secure port
    secure: true                                   # Use HTTPS (recommended)
  ssh_key_path: "/absolute/path/to/.ssh/k8s_rsa"   # SSH private key path
  default_ansible_user: "REPLACE_SSH_USER"         # SSH user (e.g., ubuntu, ec2-user)
  ansible_sudo_pass: ""                            # Optional: sudo password
  control_plane_nodes:
    - name: "master-1"
      ansible_host: "PUBLIC_IP"                    # Public IP for SSH
      ansible_user: "REPLACE_SSH_USER"
      ansible_become: true
      ansible_become_method: "sudo"
      ansible_become_user: "root"
      private_ip: "PRIVATE_IP"                     # Internal/private IP
⚙️ For Single Node: Quick Configuration Update (Command-Line Shortcut)
You can quickly update your user_input.yml by replacing only the values in this command based on your environment.
Keep the placeholder keywords (PUBLIC_IP, PRIVATE_IP, etc.) on the left side exactly as-is.
⚠️ Warning: Replace only the values on the right-hand side (192.168.1.100, root, etc.) with your actual environment details. Do not modify the placeholder keywords (PUBLIC_IP, PRIVATE_IP, etc.); they are required for matching.
🧪 Example Command
sed -i \
-e 's|PUBLIC_IP|172.235.157.18|g' \
-e 's|PRIVATE_IP|172.235.157.18|g' \
-e 's|REPLACE_SSH_USER|root|g' \
-e 's|/absolute/path/to/.ssh/k8s_rsa|/root/.ssh/k8s_rsa|g' \
-e '/kubernetes_deployment:/,/^[[:space:]]*[^[:space:]]*enabled:/ s/enabled: false/enabled: true/' \
user_input.yml
✅ This command will:
- Replace the PUBLIC_IP and PRIVATE_IP placeholders with your node IP
- Set the correct SSH user and key path
- Enable Kubernetes deployment by changing enabled: false to enabled: true
📌 Note:
If you're deploying on a single node and running the command from the same server, you can use the same IP address for both PUBLIC_IP and PRIVATE_IP.
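After running the sed command, it can help to confirm that no placeholders were left behind; a quick grep check (it should print nothing if every placeholder was replaced):
# Should produce no output once all placeholders have real values
grep -nE 'PUBLIC_IP|PRIVATE_IP|REPLACE_SSH_USER' user_input.yml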
Step 2.4: Deploy Kubernetes Cluster
# Make the script executable
chmod +x setup_kubernetes.sh
# Run the installation script with sudo
./setup_kubernetes.sh
Step 2.5: Change ownership of the smartscaler working directory
sudo chown $(whoami):$(whoami) -R .
# Set the KUBECONFIG environment variable
export KUBECONFIG=output/kubeconfig
# Verify cluster access and node status
kubectl get nodes
Step 2.6: Verify Installation
# Check cluster status
kubectl get nodes
kubectl cluster-info
# Verify all system pods are running
kubectl get pods --all-namespaces
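If some pods are still starting, you can optionally block until the nodes report Ready and then re-check the system namespace; these are standard kubectl commands:
# Wait (up to 5 minutes) for every node to become Ready
kubectl wait --for=condition=Ready nodes --all --timeout=300s
# Core add-ons (CoreDNS, kube-proxy, CNI) should all be Running
kubectl get pods -n kube-system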
3. Prerequisites for Installing SmartScaler Apps
Cluster Requirements
- Kubernetes cluster must be running and accessible
- kubectl configured with proper kubeconfig
- Helm v3.15.0+ installed
Required Environment Variables
Set the following environment variables before deployment:
export NGC_API_KEY="your_ngc_api_key"
export NGC_DOCKER_API_KEY="your_ngc_docker_api_key"
export AVESHA_DOCKER_USERNAME="your_avesha_username"
export AVESHA_DOCKER_PASSWORD="your_avesha_password"
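Because the playbook consumes these values via --extra-vars, a missing variable only surfaces later in the run. A small optional bash sanity check you can run first:
# Print an error for any required variable that is unset or empty
for v in NGC_API_KEY NGC_DOCKER_API_KEY AVESHA_DOCKER_USERNAME AVESHA_DOCKER_PASSWORD; do
  [ -n "${!v}" ] || echo "ERROR: $v is not set"
done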
Configure user_input.yml
Important: Set kubernetes_deployment.enabled to false in user_input.yml before running apps installation:
kubernetes_deployment:
  enabled: false # Must be false for apps-only deployment
> ℹ️ **Required Kubeconfig Settings** – Already included above; this section can be skipped.
global_control_plane_ip: "YOUR_MASTER_PUBLIC_IP" # Provide the public IP for metallb/Nginx
global_kubeconfig: "output/kubeconfig" # Required: Path to kubeconfig file
global_kubecontext: "kubernetes-admin@cluster.local" # Required: Kubernetes context
use_global_context: true # Required: Use global context
Quick Configuration Update (Command-Line Shortcut)
You can quickly replace the placeholder values in your user_input.yml configuration using the following sed command:
🧪 Example:
sed -i \
-e '/kubernetes_deployment:/,/^[[:space:]]*[^[:space:]]*enabled:/ s/enabled: true/enabled: false/' \
user_input.yml
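To confirm the flag was flipped, you can inspect the first lines of the kubernetes_deployment block (a simple grep; adjust the number of context lines if your file layout differs):
# Should show "enabled: false" directly under kubernetes_deployment
grep -n -A 1 'kubernetes_deployment:' user_input.yml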
4. Instructions to Deploy SmartScaler Apps
Step 4.1: Verify Prerequisites
# Verify cluster access
kubectl get nodes
kubectl cluster-info
# Verify required tools
kubectl version --client
helm version
# Verify environment variables
echo $NGC_API_KEY
echo $NGC_DOCKER_API_KEY
echo $AVESHA_DOCKER_USERNAME
echo $AVESHA_DOCKER_PASSWORD
Step 4.2: Deploy Applications
# Deploy with explicit credentials
ansible-playbook site.yml \
-e "ngc_api_key=$NGC_API_KEY" \
-e "ngc_docker_api_key=$NGC_DOCKER_API_KEY" \
-e "avesha_docker_username=$AVESHA_DOCKER_USERNAME" \
-e "avesha_docker_password=$AVESHA_DOCKER_PASSWORD" \
-vvvv
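The playbook can run for a while depending on the NIM profile; if you want live feedback, an optional watch in a second terminal is a convenient companion:
# Refresh the pod list every 10 seconds while site.yml runs
watch -n 10 kubectl get pods --all-namespaces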
Step 4.3: Verify Deployment
# Check all namespaces
kubectl get namespaces
# Expected namespaces:
# - gpu-operator
# - keda
# - monitoring
# - nim
# - nim-load-test
# - smart-scaler
# Verify component status
kubectl get pods -n gpu-operator
kubectl get pods -n monitoring
kubectl get pods -n keda
kubectl get pods -n nim
kubectl get pods -n smart-scaler
kubectl get pods -n nim-load-test
Expected output:
# Infrastructure Components
# GPU Operator
gpu-operator-666bbffcd-drrwk 1/1 Running 0 96m
gpu-operator-node-feature-discovery-gc-7c7f68d5f4-dz7jk 1/1 Running 0 96m
gpu-operator-node-feature-discovery-master-58588c6967-8pjhc 1/1 Running 0 96m
gpu-operator-node-feature-discovery-worker-xkbk2 1/1 Running 0 96m
# Monitoring
alertmanager-prometheus-kube-prometheus-alertmanager-0 2/2 Running 0 98m
prometheus-grafana-67dc5c9fc9-5jzhh 3/3 Running 0 98m
prometheus-kube-prometheus-operator-775d58dc6b-bgglg 1/1 Running 0 98m
prometheus-kube-state-metrics-856b96f64d-7st5q 1/1 Running 0 98m
prometheus-prometheus-kube-prometheus-prometheus-0 2/2 Running 0 98m
prometheus-prometheus-node-exporter-nm8zl 1/1 Running 0 98m
pushgateway-65497548cc-6v7sv 1/1 Running 0 97m
# Keda
keda-admission-webhooks-7c6fc8d849-9cchf 1/1 Running 0 98m
keda-operator-6465596cb9-4j54h 1/1 Running 1 (98m ago) 98m
keda-operator-metrics-apiserver-dc4dd6d79-gzxpq 1/1 Running 0 98m
# AI/ML
meta-llama3-8b-instruct-pod 1/1 Running 0 97m
nim-k8s-nim-operator-7565b7477b-6d7rs 1/1 Running 0 98m
# Smart Scaler
smart-scaler-llm-inf-5f4bf754dd-6qbm9 1/1 Running 0 98m
# Load Testing Service
locust-load-54748fd47d-tndsr 1/1 Running 0 97m
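Exact pod names and ages will differ in your environment. As a quick health check, you can list anything that is not in the Running or Succeeded phase (a standard field selector; an empty result means everything is healthy):
# Surfaces pods stuck in Pending, Failed, or Unknown phases
kubectl get pods --all-namespaces --field-selector=status.phase!=Running,status.phase!=Succeeded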
Step 4.4: Accessing Prometheus & Grafana via NodePort
After deploying the application stack, Prometheus and Grafana can be accessed through the exposed NodePort services using your node’s IP address.
🧾 Check Service Ports
Run the following command to list the monitoring services:
kubectl get svc -n monitoring
✅ Sample Output
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 3m21s
prometheus-grafana NodePort 10.233.59.186 <none> 80:32321/TCP 3m30s
prometheus-kube-prometheus-alertmanager ClusterIP 10.233.23.33 <none> 9093/TCP,8080/TCP 3m30s
prometheus-kube-prometheus-operator ClusterIP 10.233.49.28 <none> 443/TCP 3m30s
prometheus-kube-prometheus-prometheus NodePort 10.233.38.213 <none> 9090:30090/TCP,8080:32020/TCP 3m30s
prometheus-kube-state-metrics ClusterIP 10.233.40.63 <none> 8080/TCP 3m30s
prometheus-operated ClusterIP None <none> 9090/TCP 3m21s
prometheus-prometheus-node-exporter ClusterIP 10.233.55.211 <none> 9100/TCP 3m30s
pushgateway ClusterIP 10.233.42.8 <none> 9091/TCP 104s
🌐 Access URLs
Assuming your node IP is 192.168.100.10:
- Grafana Dashboard 🔗 http://192.168.100.10:32321
- Prometheus UI 🔗 http://192.168.100.10:30090
⚠️ Note:
- The username and password for the Grafana UI are: admin / prom-operator
- NodePort values (like 32321 for Grafana and 30090 for Prometheus) may change in your environment. Always verify with kubectl get svc -n monitoring.
- Ensure firewall rules or cloud security groups allow traffic to these NodePorts.
- Import NIM Dashboard: import the following NIM dashboard JSON into Grafana: https://github.com/smart-scaler/smartscaler-apps-installer/blob/main/files/grafana-dashboards/nim-dashboard.json
Note: Customize it for your environment and model, if needed.
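If NodePort access is blocked by your network policy, kubectl port-forward from the installer machine is a workable alternative (service names taken from the sample output above; run each command in its own terminal):
# Grafana on http://localhost:3000
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
# Prometheus on http://localhost:9090
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090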
Proceed to Test Run
📖 (Example Test Run Steps)
Documentation Links
- User Input Configuration Guide - Complete user_input.yml guide
- User Input Reference - All configuration options
- Kubernetes Configuration - Cluster setup details
- Kubernetes Firewall Configuration - Network and firewall setup
- NVIDIA Container Runtime Configuration - GPU runtime setup
Troubleshooting
Common Issues
- SSH Connection Failed
  - Verify SSH keys are properly copied to all nodes
  - Check SSH user permissions and sudo access
- Cluster Deployment Failed
  - Check that the system requirements are met
  - Verify network connectivity between nodes
  - Review firewall settings
- Apps Deployment Failed
  - Ensure kubernetes_deployment.enabled is set to false
  - Verify all environment variables are set
  - Check cluster accessibility with kubectl get nodes
- GPU Support Issues
  - Verify NVIDIA drivers are installed on the nodes
  - Check that nvidia_runtime.enabled is set to true
  - Review GPU operator pod status
Debug Commands
# Check specific namespace issues
kubectl describe pods -n <namespace>
kubectl logs -n <namespace> <pod-name>
# Verify cluster resources
kubectl top nodes
kubectl get events --all-namespaces
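For GPU-specific issues, two additional checks are often useful: whether the GPU operator pods are healthy, and whether the nodes advertise the nvidia.com/gpu resource to the scheduler (both use standard kubectl output):
# GPU operator components should all be Running
kubectl get pods -n gpu-operator
# Nodes should report nvidia.com/gpu under Capacity/Allocatable once the operator is ready
kubectl describe nodes | grep -i 'nvidia.com/gpu'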
For additional support, please refer to the detailed documentation in the docs/ folder or create an issue in the repository.
Execution Order Control
The deployment process follows a specific execution order defined in user_input.yml. You can control which components to execute by modifying the execution order or using --extra-vars with Ansible.
Available Components
Core Infrastructure (Optional)
- metallb_chart - MetalLB load balancer installation
- metallb_l2_config - L2 configuration for MetalLB
- metallb_ip_pool - IP pool configuration for MetalLB
- nginx_ingress_config - NGINX ingress controller configuration
- nginx_ingress_chart - NGINX ingress controller installation
- cert_manager - Cert-manager for certificate management (required for AMD GPU operator)
Base Components
- gpu_operator_chart - NVIDIA GPU operator installation
- prometheus_stack - Prometheus monitoring stack
- pushgateway_manifest - Prometheus Pushgateway
- keda_chart - KEDA autoscaling
- nim_operator_chart - NIM operator installation
- create_ngc_secrets - NGC credentials setup
- verify_ngc_secrets - NGC credentials verification
- create_avesha_secret - Avesha credentials setup
AMD GPU Support (Alternative to NVIDIA)
- amd_gpu_operator_chart - AMD GPU operator for AMD Instinct GPU accelerators
- amd_gpu_deviceconfig_manifest - AMD GPU device configuration and settings
EGS Installation
- kubeslice_controller_egs - KubeSlice EGS controller for multi-cluster management
- kubeslice_ui_egs - KubeSlice EGS management UI interface
- egs_project_manifest - EGS project configuration
- egs_cluster_registration_worker_1 - Register worker cluster
- fetch_worker_secret_worker_1 - Fetch worker authentication secrets
- kubeslice_worker_egs_worker_1 - Install EGS worker components
NIM 70B Components
- nim_cache_manifest_70b - NIM cache for 70B model
- wait_for_nim_cache_70b - Wait for cache initialization
- nim_cache_wait_job_70b - Cache wait job
- nim_service_manifest_70b - NIM service for 70B model
- keda_scaled_object_manifest_70b - KEDA scaling configuration
- create_inference_pod_configmap_70b - Inference configuration
- smart_scaler_inference_70b - Smart Scaler setup
- create_locust_configmap_70b - Load test configuration
- locust_manifest_70b - Load testing setup
- smart_scaler_mcp_server_manifest - MCP server configuration
NIM 1B Components (Optional)
- nim_cache_manifest_1b - NIM cache for 1B model
- nim_service_manifest_1b - NIM service for 1B model
- keda_scaled_object_manifest_1b - KEDA scaling configuration
- create_inference_pod_configmap_1b - Inference configuration
- smart_scaler_inference_1b - Smart Scaler setup
- create_locust_configmap_1b - Load test configuration
- locust_manifest_1b - Load testing setup
NIM 8B Components (Optional)
- nim_cache_manifest_8b - NIM cache for 8B model
- nim_service_manifest_8b - NIM service for 8B model
- keda_scaled_object_manifest_8b - KEDA scaling configuration
- create_inference_pod_configmap_8b - Inference configuration
- smart_scaler_inference_8b - Smart Scaler setup
- create_locust_configmap_8b - Load test configuration
- locust_manifest_8b - Load testing setup
Controlling Execution
To execute specific components, use the execution_order variable with a list of components:
# Execute only GPU operator and monitoring stack
sudo ansible-playbook site.yml \
--extra-vars "execution_order=['gpu_operator_chart','prometheus_stack']" \
-e "ngc_api_key=$NGC_API_KEY" \
-e "ngc_docker_api_key=$NGC_DOCKER_API_KEY" \
-e "avesha_docker_username=$AVESHA_DOCKER_USERNAME" \
-e "avesha_docker_password=$AVESHA_DOCKER_PASSWORD" \
-vv
# Execute AMD GPU operator setup (alternative to NVIDIA)
sudo ansible-playbook site.yml \
--extra-vars "execution_order=['cert_manager','amd_gpu_operator_chart','amd_gpu_deviceconfig_manifest']" \
-e "ngc_api_key=$NGC_API_KEY" \
-e "ngc_docker_api_key=$NGC_DOCKER_API_KEY" \
-e "avesha_docker_username=$AVESHA_DOCKER_USERNAME" \
-e "avesha_docker_password=$AVESHA_DOCKER_PASSWORD" \
-vv
# Execute EGS installation
sudo ansible-playbook site.yml \
--extra-vars "execution_order=['cert_manager','kubeslice_controller_egs','kubeslice_ui_egs','egs_project_manifest','egs_cluster_registration_worker_1','fetch_worker_secret_worker_1','kubeslice_worker_egs_worker_1']" \
-e "ngc_api_key=$NGC_API_KEY" \
-e "ngc_docker_api_key=$NGC_DOCKER_API_KEY" \
-e "avesha_docker_username=$AVESHA_DOCKER_USERNAME" \
-e "avesha_docker_password=$AVESHA_DOCKER_PASSWORD" \
-vv
# Execute only NGINX ingress setup
sudo ansible-playbook site.yml \
--extra-vars "execution_order=['nginx_ingress_config','nginx_ingress_chart']" \
-e "ngc_api_key=$NGC_API_KEY" \
-e "ngc_docker_api_key=$NGC_DOCKER_API_KEY" \
-e "avesha_docker_username=$AVESHA_DOCKER_USERNAME" \
-e "avesha_docker_password=$AVESHA_DOCKER_PASSWORD" \
-vv
# Execute all NIM 70B components
sudo ansible-playbook site.yml \
--extra-vars "execution_order=['nim_cache_manifest_70b','wait_for_nim_cache_70b','nim_cache_wait_job_70b','nim_service_manifest_70b','keda_scaled_object_manifest_70b','create_inference_pod_configmap_70b','smart_scaler_inference_70b','create_locust_configmap_70b','locust_manifest_70b']" \
-e "ngc_api_key=$NGC_API_KEY" \
-e "ngc_docker_api_key=$NGC_DOCKER_API_KEY" \
-e "avesha_docker_username=$AVESHA_DOCKER_USERNAME" \
-e "avesha_docker_password=$AVESHA_DOCKER_PASSWORD" \
-vv
💡 Tip: Components are executed in the order they appear in the list. Make sure to list dependent components in the correct order and include all required credentials.
Destroying the Kubernetes Cluster
To completely remove the Kubernetes cluster and clean up all resources, run the following command from the root directory:
ansible-playbook kubespray/reset.yml -i inventory/kubespray/inventory.ini
This command will:
- Remove all Kubernetes components from the nodes
- Clean up all cluster-related configurations
- Reset the nodes to their pre-Kubernetes state
⚠️ Warning: This action is irreversible. Make sure to backup any important data before proceeding with the cluster destruction.
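After the reset completes, you may also want to discard the kubeconfig the installer wrote locally. A small optional cleanup, assuming the output/kubeconfig path from Step 2.5:
# Drop the stale kubeconfig reference on the installer machine
unset KUBECONFIG
rm -f output/kubeconfig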
Example Test Run Steps
Each test run can include multiple cycles, with each cycle typically lasting around 1 hour. Running multiple cycles helps in evaluating consistency and observing Smart Scaler's behavior over time.
🔄 Starting (restarting) a Test Run
Follow these steps to (re)start a clean test cycle:
Scale Down LLM and Load Generator Pods
Scale the Locust deployment replicas to 0:
kubectl scale deployment locust-load-70b --replicas=0 -n nim-load-test
Scale the NIM LLM deployment replicas to 1:
kubectl scale deployment meta-llama3-70b-instruct --replicas=1 -n nim
Verify Smart Scaler and HPA Settings
Ensure the HorizontalPodAutoscaler (HPA) replica count is also set to 1:
kubectl get hpa -n nim
Wait for Stabilization
Wait for some time (5-20 minutes) to allow both Smart Scaler and HPA to fully scale down and stabilize at 1 replica.
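Instead of polling by hand, you can optionally keep a watch running until the counts settle (standard kubectl plus watch):
# Re-check HPA and pod counts every 30 seconds
watch -n 30 'kubectl get hpa -n nim; kubectl get pods -n nim'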
Then confirm the HPA has settled at 1 replica:
kubectl get hpa -n nim
Verify the Smart Scaler / HPA Configuration
Smart Scaler
Note:
- Verify and edit the ScaledObject if needed (typically required when switching from HPA to Smart Scaler)
Edit ScaledObject resource
kubectl edit scaledobjects llm-demo-keda-70b -n nim
Set the Prometheus trigger metadata fields (under spec.triggers) with the following values:
- metadata:
    metricName: smartscaler_hpa_num_pods
    query: smartscaler_hpa_num_pods{ss_app_name="nim-llama",ss_deployment_name="meta-llama3-8b-instruct",job="pushgateway",ss_app_version="1.0", ss_cluster_name="nim-llama", ss_namespace="nim", ss_tenant_name="tenant-b200-local"}
    serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
    threshold: "1"
Check and reset the spec.maxReplicaCount to 8
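If you prefer a non-interactive change over kubectl edit, here is a kubectl patch sketch (it assumes KEDA's ScaledObject schema, where maxReplicaCount lives directly under spec):
# Set the maximum replica count to 8 without opening an editor
kubectl -n nim patch scaledobject llm-demo-keda-70b --type=json \
  -p='[{"op":"replace","path":"/spec/maxReplicaCount","value":8}]'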
For HPA setup
Note:
- Verify and edit the ScaledObject if needed (typically required when switching from Smart Scaler to HPA)
Edit ScaledObject resource
kubectl edit scaledobjects llm-demo-keda-70b -n nim
Set the Prometheus trigger metadata fields (under spec.triggers) with the following values:
Note: The threshold value differs per model and GPU, based on the PSE values.
- For B200: llama3.1 70b, threshold: 80
- For B200: llama3.1 8b, threshold: 200
- metadata:
    metricName: smartscaler_hpa_num_pods
    query: sum(num_requests_running) + sum(num_requests_waiting)
    serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
    threshold: "80"
Check that the current replica count is 1 and the model pod is Running and Ready:
kubectl get hpa -n nim
kubectl get pods -n nim
Restart Load Generation
Scale the Locust replicas up to 1 to initiate the next test cycle:
kubectl scale deployment locust-load-70b -n nim-load-test --replicas=1
Monitor the Test
Observe metrics and scaling behavior using the NIM Dashboard.
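While the test runs, you can also spot-check the Smart Scaler recommendation metric directly from Prometheus' HTTP API. This example assumes the NodePort 30090 from Step 4.4; replace <node-ip>, verify the port with kubectl get svc -n monitoring, and drop the jq pipe if jq is not installed:
# Query the current smartscaler_hpa_num_pods series
curl -s "http://<node-ip>:30090/api/v1/query?query=smartscaler_hpa_num_pods" | jq .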