Install Alauda Hyperflux

Download package and upload to cluster

You can download the app named 'Alauda Hyperflux' from the Marketplace on the Customer Portal website. The downloaded package is a tarball file named alauda-hyperflux-<version>.tar.gz.

Download the violet command line tool if it is not present on the machine:

  1. Log into the ACP Web Console and switch to the Administrator view.
  2. In Marketplace / Upload Packages, click Download Packaging and Listing Tool.
  3. Select the right OS/CPU arch, and click Download.
  4. Run chmod +x ${PATH_TO_THE_VIOLET_TOOL} to make the tool executable.

Save the following script as upload.sh, then edit the file and fill in the correct configuration values according to the comments.

#!/usr/bin/env bash
# Set ACP address and admin user credentials
export PLATFORM_ADDRESS=https://platform-address
export PLATFORM_ADMIN_USER=<admin>
export PLATFORM_ADMIN_PASSWORD=<admin-password>
# Set the package file to push
export PACKAGE_FILE=alauda-hyperflux-<version>.tar.gz
# Set the name of the destination cluster
export CLUSTER=<cluster-name>

VIOLET_EXTRA_ARGS=()
IS_EXTERNAL_REGISTRY=""

# If the image registry type of destination cluster is not platform built-in (external private or public repository).
# Additional configuration is required (uncomment following line):
# IS_EXTERNAL_REGISTRY=true
if [[ "${IS_EXTERNAL_REGISTRY}" == "true" ]]; then
    REGISTRY_ADDRESS=<external-registry-url>
    REGISTRY_USERNAME=<registry-username>
    REGISTRY_PASSWORD=<registry-password>

    VIOLET_EXTRA_ARGS+=(
        --dst-repo "${REGISTRY_ADDRESS}"
        --username "${REGISTRY_USERNAME}"
        --password "${REGISTRY_PASSWORD}"
    )
fi

# Push the Alauda Hyperflux package to the destination cluster
violet push \
    "${PACKAGE_FILE}" \
    --platform-address="${PLATFORM_ADDRESS}" \
    --platform-username="${PLATFORM_ADMIN_USER}" \
    --platform-password="${PLATFORM_ADMIN_PASSWORD}" \
    --clusters="${CLUSTER}" \
    "${VIOLET_EXTRA_ARGS[@]}"
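Once the placeholders are filled in, the file runs as an ordinary shell script; a usage sketch (violet's exact output depends on the tool version):

```shell
# Make the script executable and run it; a non-zero exit code means the push failed
chmod +x upload.sh
./upload.sh && echo "package pushed" || echo "push failed"
```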

Prepare your LLM and rerank service

Before installing Alauda Hyperflux, you need to prepare an LLM service for Alauda Hyperflux to use. You can use the Azure OpenAI service, or deploy an on-premise LLM service such as vLLM using Alauda AI.

You will use the LLM service endpoint, model name and API key in the Alauda Hyperflux installation step.

Optionally, if you want to enable the rerank feature in Alauda Hyperflux, you also need to prepare a rerank service that supports Cohere Reranker API v2. See Setup On-Premise Reranker Service for one way to deploy this with Alauda AI + vLLM.
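Before installation, it can help to smoke-test both services from a machine that can reach them. A sketch assuming an OpenAI-compatible LLM endpoint and a Cohere Rerank API v2-compatible reranker; all hosts, ports, model names, and keys below are placeholders:

```shell
# LLM: OpenAI-compatible chat completion (replace host, key, and model)
curl -s http://<your-vllm-host>:<port>/v1/chat/completions \
  -H "Authorization: Bearer <api-key>" \
  -H "Content-Type: application/json" \
  -d '{"model": "<model-name>", "messages": [{"role": "user", "content": "ping"}]}'

# Reranker: Cohere Rerank API v2 (replace host, key, and model)
curl -s http://<your-reranker-host>:<port>/v2/rerank \
  -H "Authorization: Bearer <api-key>" \
  -H "Content-Type: application/json" \
  -d '{"model": "<rerank-model>", "query": "install", "documents": ["doc a", "doc b"], "top_n": 1}'
```

A JSON response from each call confirms the endpoint, model name, and key you will enter in the install form.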

NOTE: Starting from v1.4.0, the bundled knowledge-base dump file ships inside the plugin package — you no longer need to download it separately, and the manual pg_restore step from earlier versions is gone. The init container restores the selected dump on first startup automatically. See Build a Custom Knowledge Base if you want to add or replace the bundled corpus with your own internal documentation.

Install Alauda Hyperflux cluster plugin

Go to the Administrator / Marketplace / Cluster Plugins page, select the global cluster from the cluster dropdown, then find the Alauda Hyperflux plugin and click Install.

NOTE: Alauda Hyperflux MUST be installed in the Global cluster.

The install form fields are grouped by topic below. Required fields are marked (required).

Database

  • Enable builtin PGVector — when enabled, the chart provisions a single PostgreSQL + ParadeDB instance for Alauda Hyperflux. Set:
    • PGVector Storage Size — the storage size for the PostgreSQL PVC (e.g. 10Gi).
    • PGVector StorageClass name — the Kubernetes storage class for the PVC, e.g. sc-topolvm.
  • When disabled, create a Secret with the external PostgreSQL connection info instead. Hyperflux uses three logical databases on the same instance — docvec_sys_kb (built-in product knowledge base), docvec_user_kb (user-uploaded BYO Knowledge), and docvec_kb (chat history) — all created automatically by the init container if missing.
    apiVersion: v1
    kind: Secret
    metadata:
      name: pg-secret
      namespace: cpaas-system
    type: Opaque
    stringData:
      host: <your-pg-host>
      port: <your-pg-port>
      username: <your-pg-username>
      password: <your-pg-password>
      uri: "postgresql+psycopg://<your-pg-username>:<your-pg-password>@<your-pg-host>:<your-pg-port>"
    Then enter the secret name in pg database secret name.
  • PG database name — the chat-history database name (default docvec). Created on first start if missing.
  • PG collection name — the LangChain PGVector collection name used by the server. The default value matches the bundled gte-multilingual-base dump and should not be changed unless you are deploying a custom KB built with Build a Custom Knowledge Base.
  • Built-in KnowledgeBase File — choose which built-in dump to restore on first start:
    • docvec_gte_cs2000_<date>.dump (default) — chunk size 2000, balanced recall and answer focus.
    • docvec_gte_cs3000_<date>.dump — chunk size 3000, slightly better recall on long-form documents at the cost of larger context per hit.
  • Enable builtin Redis — when enabled, the chart provisions a single Redis instance used by the rate limiter. When disabled, supply a Redis credentials secret in redis database secret name.
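For the external-PostgreSQL case, the Secret shown above can be applied with kubectl before installing the plugin; a sketch assuming the manifest is saved as pg-secret.yaml:

```shell
# Create the Secret in the namespace the plugin expects
kubectl apply -f pg-secret.yaml

# Verify the keys are present (host, port, username, password, uri)
kubectl -n cpaas-system get secret pg-secret -o jsonpath='{.data}'
```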

Scheduling

  • Node Selector (optional) — pin the Hyperflux pods to specific nodes by label. Add one or more rows; different label keys are evaluated with OR.

LLM service

  • LLM Model type (required) — azure or openai.
  • LLM Base URL (required) — base URL for LLM API calls. For an on-premise vLLM deployment use http://<your-vllm-host>:<port>/v1.
  • LLM Model Name (required) — the model name passed in API calls, e.g. gpt-5-mini or qwen2.
  • LLM API Key (required) — the API key for LLM API calls. Stored as an external password.
  • Azure API Version — only when LLM Model type = azure, e.g. 2024-12-01-preview.
  • Azure Deployment Name — only when LLM Model type = azure, e.g. o4-mini.

Reranker

  • Enable Reranker (required) — turn on Cohere-API-compatible reranking. Boosts answer relevance at the cost of one extra service hop. When enabled, set:
    • Cohere Reranker BaseUrl — base URL of the reranker service.
    • Cohere Reranker Model — model name.
    • Cohere Reranker API key — API key (any non-empty value works for vLLM deployments that don't enforce auth).

Agent Mode

  • Enable Agent Mode (required) — turn on multi-step reasoning so the agent can call MCP tools. Recommended: use a strong LLM (≥ GPT-4 / Qwen-72B class) when this is on; smaller models can loop or misuse tools.
  • Enable MCP Tools — load ACP MCP tools so the agent can read live cluster state. Only available when Agent Mode is on.
  • Expose MCP — expose the bundled acp-mcp-server over Ingress so external MCP clients (e.g. IDE-side coding agents) can reach it. Only available when Agent Mode is on.

NOTE: Earlier versions required setting an "MCP K8s API Server Address" (the erebus URL). That field has been removed in v1.4.0 — the bundled acp-mcp-server now talks to the cluster directly inside the global cluster, and external traffic is routed through the Ingress.

Retrieval (RAG) tuning

  • Total Search K (required) — number of candidates to fetch from the knowledge base before reranking, default 20.
  • RAG Similarity Threshold (required) — minimum cosine similarity for a chunk to be kept, default 0.81. Lower values trade precision for recall.
  • Cohere Reranker Top N (required) — number of top-ranked chunks fed to the LLM after reranking, default 6. Only applied when reranking is on.
  • Max History Number (required) — number of previous turns kept in the prompt, default 1.
  • Model Context Window — total context window of the LLM in tokens (e.g. 128000). Leave empty to auto-detect by model name; the conversation-history compressor uses this to decide when to summarise older turns.

Audit and identity

  • Admin Users — comma-separated list of usernames that can view audit logs in Alauda Hyperflux, e.g. admin@cpaas.io,admin.

Rate limiter

  • Enable Rate Limiter (required) — when on, per-user request frequency and daily token quotas are enforced via Redis.
  • Max Requests Per Minute (RPM) — per-user request cap, default 5.
  • RPM Window Time (Minute) — sliding window for the RPM check, default 5.
  • Max Total Tokens Per Day — per-user combined input + output token cap, default 1000000.
  • Max Input Tokens Per Day — per-user input token cap, default 200000.
  • Max Output Tokens Per Day — per-user output token cap, default 1000000.

Click Install to start installation. The init container will:

  1. Create the three logical databases (docvec_sys_kb, docvec_user_kb, docvec_kb) if missing.
  2. Restore the selected built-in dump into docvec_sys_kb.
  3. Apply schema migrations and ensure the BM25 index exists.
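After installation completes, the bootstrap can be verified from any host with psql access to the PostgreSQL instance; a sketch using the database names from the steps above (connection values are placeholders):

```shell
# List the three logical databases created by the init container
psql "postgresql://<user>:<password>@<host>:<port>/postgres" -c "\l" \
  | grep -E 'docvec_sys_kb|docvec_user_kb|docvec_kb'
```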

Troubleshooting

If the chat interface fails to respond, check the Alauda Hyperflux pod logs:

# Server
kubectl -n cpaas-system logs -l app=smart-doc -c serve

# Init container (first-start KB bootstrap and upgrade-time KB swap)
kubectl -n cpaas-system logs -l app=smart-doc -c init-database

Most issues are caused by:

  • Incorrect LLM service configuration — wrong base URL, wrong API version for Azure, wrong model name.
  • Cohere API misconfiguration when reranking is on.
  • The init container failing to create or restore the system KB database — the init log lines (prefixed [upgrade] for the data swap step) point to the failing step.
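Beyond the logs, pod and container status often pinpoints the failure faster; a sketch using the same app=smart-doc label and cpaas-system namespace as above:

```shell
# Pod status and recent events (restarts, image pull errors, probe failures)
kubectl -n cpaas-system get pods -l app=smart-doc
kubectl -n cpaas-system describe pods -l app=smart-doc

# Previous log of the serve container if it is crash-looping
kubectl -n cpaas-system logs -l app=smart-doc -c serve --previous
```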