
Pipeline Deployment Overview

A Pipeline Deployment represents the configuration, resources, and desired state for an instance of a pipeline. Deployments are scoped and named, similar to a pipeline, which enables you to segregate each pipeline into its own namespace. Pipeline deployments separate the building of a pipeline from the operation of the pipeline.

Although the naming may be the same, a pipeline deployment in AlgoRun is different from a native Kubernetes Deployment. An AlgoRun pipeline deployment is a superset of a Kubernetes Deployment in that it is composed of many Kubernetes Deployments and other Kubernetes resource types.

Configuration

Create a Deployment

A new deployment can be created from the Deployments tab in the AlgoRun UI. A newly created deployment does not have any pipeline assigned to it. Once the empty deployment is created, a pipeline can be assigned.

Assign a Pipeline

A pipeline can be assigned to the deployment from the Pipeline tab of the Deployment in the UI.

A single pipeline version can be assigned to a deployment at any given time. In order to assign a pipeline, the deployment status must be Terminated.

Allocate Resources

Once a pipeline is assigned, options become available that enable you to allocate resources and set up auto-scaling for each component in the pipeline. The resource settings use the same options as defined in the Kubernetes documentation. You can also control the number of instances (replicas) for each component in the pipeline, which enables horizontal scaling within the Kubernetes cluster.
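For reference, these settings correspond to the standard Kubernetes container resource model. A minimal sketch of the equivalent Kubernetes YAML, with illustrative values only:

    resources:
      requests:
        cpu: "500m"      # CPU guaranteed to the component
        memory: "256Mi"  # memory guaranteed to the component
      limits:
        cpu: "1"         # hard ceiling before CPU throttling
        memory: "512Mi"  # hard ceiling before the container is restarted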

Configure Hooks

Here you can register callback WebHook URLs for the named event hooks that were added to the pipeline during pipeline design. When you assign a WebHook to a named event, any message piped to that event will be relayed to the WebHook URL. To configure the WebHook, you must set the URL, the HTTP verb, and any additional HTTP headers you want to include in the callback.
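As an illustration (the URL, verb, and header values here are hypothetical), a delivery for a hook configured with the URL https://example.com/hooks/alerts, the POST verb, and an Authorization header would be equivalent to:

    curl -X POST https://example.com/hooks/alerts \
      -H "Authorization: Bearer {token}" \
      -d '{event message payload}'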

Pipeline Deployment Lifecycle

A pipeline deployment has the following states:

  • Terminated - The initial state of a pipeline deployment is terminated. When in the terminated state, all the associated pipeline resources in the Kubernetes cluster have been removed. The pipeline deployment configuration is stored within AlgoRun awaiting (re)deployment.
  • Progressing - When a deployment is started, the pipeline resources within Kubernetes will take some time to reach the desired state. During this period, the pipeline deployment will be in the progressing state.
  • Deployed - Once all resources have been deployed within Kubernetes and all readiness and liveness probes have succeeded, the pipeline deployment will be in the Deployed state.
  • Error - If the pipeline deployment is unable to start due to errors within one or more resources, the pipeline deployment will be in the error state.

Pipeline Operator

When AlgoRun is installed in the Kubernetes cluster, a component called the Pipeline Operator is created. The Pipeline Operator is responsible for managing all Kubernetes custom resources and configuration required to operate the pipeline deployment. The Pipeline Operator continually watches the Kubernetes cluster for changes to any pipeline deployment. When the PipelineDeployment custom resource is applied to Kubernetes, the operator picks up the configuration and begins the reconciliation process. This reconciliation process involves updating the desired state, monitoring each component in the pipeline, and relaying the status back to the AlgoRun API and all other event consumers.

The complete pipeline operator reconciliation process is composed of sub-reconciliation processes for each of the components in the pipeline.

Algos

To complete the deployment of each Algo, the reconciliation ensures that:

  • A Kubernetes Deployment is created or updated for each Algo in the pipeline. The deployment consists of a Pod with two containers, the Algo container and the AlgoRunner sidecar container.
  • A Kubernetes Service is created to expose the HTTP endpoint of the AlgoRunner sidecar for metrics and health monitoring.
  • Depending on the Algo Executor, the AlgoRunner is configured appropriately:
    • If Executable, the AlgoRunner binary will be copied into the Algo container using an Init container. This is to allow the AlgoRunner to run the Algo executable within the context of the Algo container.
    • If HTTP or gRPC, the AlgoRunner will be started from the sidecar and the Algo container will start its HTTP or gRPC server.
  • Two local EmptyDir volumes are mounted:
    • /input folder is mounted for all input data
    • /output folder is mounted for all output data
  • The deployment is complete when the readiness and liveness probes for every Algo return successfully. A sketch of the generated Deployment follows this list.
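A minimal sketch of the Kubernetes Deployment the reconciliation produces for an HTTP or gRPC based Algo; the names, images, and structure below are illustrative assumptions, not the exact manifest the Pipeline Operator generates:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-algo                        # hypothetical name
    spec:
      replicas: 1                          # set by the instance count in the deployment configuration
      selector:
        matchLabels:
          component: algo
          algo: my-algo
      template:
        metadata:
          labels:
            system: algorun
            component: algo
            algo: my-algo
        spec:
          containers:
            - name: algo                   # the Algo container, running its own HTTP or gRPC server
              image: example/my-algo:1.0   # hypothetical image
              volumeMounts:
                - name: input
                  mountPath: /input        # all input data
                - name: output
                  mountPath: /output       # all output data
            - name: algorunner             # AlgoRunner sidecar for metrics and health monitoring
              image: example/algorunner:latest   # hypothetical image
          volumes:
            - name: input
              emptyDir: {}
            - name: output
              emptyDir: {}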
Kubernetes Labels

The Algo Kubernetes resources are labelled with the following info, which can be used to query the Kubernetes cluster directly:

Label                      Value
-------------------------  ---------------------------------------------------
system                     algorun
component                  algo
pipelinedeploymentowner    << The username of the pipeline deployment owner >>
pipelinedeployment         << The pipeline deployment name >>
pipelineowner              << The username of the pipeline owner >>
pipeline                   << The pipeline name >>
algoowner                  << The username of the algo owner >>
algo                       << The algo name >>
algoversion                << The version of the algo >>
index                      << The index of the algo in the pipeline >>
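
For example, the Algo Pods for a given pipeline can be listed directly with kubectl (the pipeline name here is illustrative):

    kubectl get pods -l system=algorun,component=algo,pipeline=my-pipeline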

Endpoint

The endpoint reconciliation ensures that:

  • A Kubernetes Deployment is created or updated for the Endpoint container.
  • A Kubernetes Service is created to expose the gRPC and HTTP ports internally with a ClusterIP (a sketch follows this list).
  • Ambassador mappings are created to open the ingress gateway to the Endpoint.
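A minimal sketch of the internal Service this produces, with hypothetical names and port numbers:

    apiVersion: v1
    kind: Service
    metadata:
      name: my-deployment-endpoint   # hypothetical name
      labels:
        system: algorun
        component: endpoint
    spec:
      type: ClusterIP                # internal only; Ambassador provides the external ingress
      selector:
        component: endpoint
      ports:
        - name: http
          port: 8080                 # hypothetical HTTP port
        - name: grpc
          port: 9090                 # hypothetical gRPC port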
Kubernetes Labels

The Endpoint Kubernetes resources are labelled with the following info:

Label                      Value
-------------------------  ---------------------------------------------------
system                     algorun
component                  endpoint
pipelinedeploymentowner    << The username of the pipeline deployment owner >>
pipelinedeployment         << The pipeline deployment name >>
pipelineowner              << The username of the pipeline owner >>
pipeline                   << The pipeline name >>

Data Connectors

To complete the deployment of each Data Connector, the reconciliation ensures that:

  • A Kubernetes Deployment is created for each data connector in the pipeline. Currently, this is created through an integration with the Strimzi Kafka operator.
  • Once the data connector deployment is running, the connector instance is created using the Kafka Connect REST API, as illustrated below.
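As an illustration of the Kafka Connect REST call involved (the connector host, name, and class are hypothetical):

    curl -X POST http://my-connect-cluster:8083/connectors \
      -H "Content-Type: application/json" \
      -d '{
            "name": "my-source-connector",
            "config": {
              "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
              "tasks.max": "1"
            }
          }'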
Kubernetes Labels

The Data Connector Kubernetes resources are labelled with the following info:

Label                      Value
-------------------------  ---------------------------------------------------
system                     algorun
component                  dataconnector
pipelinedeploymentowner    << The username of the pipeline deployment owner >>
pipelinedeployment         << The pipeline deployment name >>
pipelineowner              << The username of the pipeline owner >>
pipeline                   << The pipeline name >>
dataconnector              << The data connector name >>
dataconnectorversion       << The version of the data connector >>
index                      << The index of the data connector in the pipeline >>

Hook Reconcile

The hook reconciliation ensures that a Kubernetes Deployment is created or updated for the Hook container.

Kubernetes Labels

The Hook Kubernetes resources are labelled with the following info:

Label                      Value
-------------------------  ---------------------------------------------------
system                     algorun
component                  hook
pipelinedeploymentowner    << The username of the pipeline deployment owner >>
pipelinedeployment         << The pipeline deployment name >>
pipelineowner              << The username of the pipeline owner >>
pipeline                   << The pipeline name >>

Topic Reconcile

Each Endpoint, Algo, and Data Connector can have a set of outputs, each of which requires a Kafka topic that the data will be written to. To optimize the Kafka topic configuration, each output topic is pre-created in the Kafka cluster when the pipeline is deployed. The topic reconciliation process automatically calculates the number of partitions by summing the instance counts of the downstream pipe destinations. In order to prevent under-partitioned topics and reduce the need to add partitions later, the topic is over-partitioned by a configurable growth factor. By default, the topic is over-partitioned by 50%.
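For example, assuming the growth factor is applied multiplicatively: if a topic feeds two downstream components running 2 and 4 instances respectively, the base partition count is 6, and the default 50% growth factor means the topic is created with 9 partitions.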

Kubernetes Labels

The Topic Kubernetes resources are labelled with the following info:

Label                      Value
-------------------------  ---------------------------------------------------
system                     algorun
component                  topic
pipelinedeploymentowner    << The username of the pipeline deployment owner >>
pipelinedeployment         << The pipeline deployment name >>
pipelineowner              << The username of the pipeline owner >>
pipeline                   << The pipeline name >>

Bucket Reconcile

A pipeline deployment uses an S3-compatible storage bucket for any non-embedded data output from an Algo. This bucket can also be used to read and write any files utilized by the deployment. The bucket reconciliation process ensures the bucket is created and ready for use.

The bucket name follows this naming convention:
{Pipeline Deployment Owner}.{Pipeline Deployment Name}
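
For example, with illustrative names, a deployment named fraud-detection owned by the user acme would use the bucket acme.fraud-detection.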

Starting a Deployment

When a Deploy command is received, the AlgoRun API validates that the deployment is in the Terminated state. If the state is Deployed, Progressing, or Error, the deployment can only be updated or terminated. The API then generates a Kubernetes custom resource called PipelineDeployment, which is applied to the Kubernetes cluster.
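A heavily simplified sketch of what the applied custom resource might look like; the apiVersion, field names, and values below are assumptions for illustration, not the actual AlgoRun schema:

    apiVersion: algorun.io/v1beta1       # assumed group and version
    kind: PipelineDeployment
    metadata:
      name: acme.fraud-detection         # hypothetical {owner}.{deployment name}
    spec:
      pipeline: acme/fraud-pipeline:v2   # hypothetical reference to the assigned pipeline version
      components:
        - name: my-algo                  # hypothetical Algo with its allocated resources
          replicas: 2
          resources:
            requests:
              cpu: "500m"
              memory: "256Mi"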

Terminating the Deployment

Terminating the deployment removes all of the pipeline-related resources from Kubernetes. The deployment configuration is still persisted in the database, so you can re-deploy the pipeline when needed.

Updating the Pipeline

Updating a pipeline deployment requires that the deployment be terminated before the new pipeline version is applied. Pipelines can change drastically between versions, which can cause breaking changes in the operation of the pipeline. For this reason, pipeline version updates should be carefully planned to ensure the changes will not break expected functionality.

Work is in progress to enable in-place updates of a pipeline deployment that is already running. This update process will use Kubernetes rollouts to achieve eventual consistency for any changes to the components of the pipeline. In many cases, creating a new pipeline deployment and assigning the new pipeline version to it is a better approach, assuming the external URLs used by the deployment can easily be updated.

To update the Pipeline Deployment in the UI:

  • Ensure the Deployment is in the Terminated state
  • Go to the Pipeline tab in the Deployment
  • Click 'Change Pipeline' and choose the updated pipeline version
  • Click 'Deploy' to deploy the updated version

Monitoring

AlgoRun simplifies monitoring, status tracking, logging, and alerting for pipelines by centralizing the management of the core observability stack. Using Prometheus, Grafana, and a pluggable logging framework, you are able to track metrics for everything required to run the pipeline. Status tracking and monitoring for an active pipeline are managed by the Pipeline Operator. By continually monitoring the Pods allocated to each component, the Pipeline Operator is able to gather status and expose additional performance metrics beyond what Kubernetes provides. These resources provide the foundation for monitoring your entire AI pipeline within a single interface.

Status

The Pipeline Operator observes any changes to the Pod and Deployment statuses to derive the overall pipeline deployment status. This status is stored in the Kubernetes custom resource and also displayed in the AlgoRun UI.

Logs

The logs for a pipeline deployment can be accessed centrally through AlgoRun. The logs are separated into types that help organize and filter log content for debugging, troubleshooting and detailed observation of the pipeline operations. The logging types are:

  • Algo - Algo logs are the stdout/stderr output produced by any executable Algo. For server-based Algos, this log output will be empty, as all logging facilities come from the server itself.
  • Server - Server log messages are the output from the server that is started by the runner (i.e., an HTTP or gRPC server).
  • Runner - Runner logs are the messages produced directly by the AlgoRunner sidecar.
  • Data - Data logs are produced by the S3 data operations and anything data transfer related.
  • Hook - Hook logs are produced by the pipeline hook container and contain messages related to the event webhook deliveries.
  • Operator - Operator logs are produced by the Pipeline Operator and contain messages related to the deployment of the pipeline in Kubernetes.

Metrics

Using Prometheus endpoints, every component in the system exposes a set of metrics that are automatically configured when the pipeline is deployed. AlgoRun also installs Grafana, which provides the visualizations and graphs for monitoring each pipeline deployment and the entire AlgoRun system.
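Assuming the conventional Prometheus exposition path and a hypothetical metrics port, a component's raw metrics can be inspected directly with:

    curl http://{Pod IP}:{metrics port}/metrics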

The Grafana UI can be accessed from:
http(s)://{AlgoRun IP}/grafana

For a complete list of available metrics, check out this article.
