With the constantly evolving landscape of machine learning frameworks, microservices, cloud infrastructure, data management and related AI technologies, we make it a point to keep an open mind (and roadmap) about what is included in the AlgoRun platform. Our goal is to evolve alongside the industry, continuously integrate with other projects and simplify the operation of the ultimate AI pipeline.
With that said, our current roadmap consists of some core features we are building to increase the flexibility of the platform.
Below is a high-level overview of the features and enhancements being developed, in no particular order of priority.
We are adding a schema registry to the pipeline, which enables better data validation and compact encoding of the data streams. The plan is to implement the Hortonworks Schema Registry, which will support both Avro schemas and protocol buffers. Each Endpoint, Algo and Data Connector will be able to define the schema used for its output so the structure and compatibility of each can be validated.
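As a sketch of what schema-validated output could look like, here is a hypothetical Avro record schema for an Algo's output, paired with a minimal pure-Python structural check. The schema, field names and validator are illustrative assumptions, not part of AlgoRun or the Hortonworks registry:

```python
# Hypothetical Avro schema an Algo might declare for its output stream.
prediction_schema = {
    "type": "record",
    "name": "PredictionOutput",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "score", "type": "double"},
        # Avro union type: the field may be null, with a default.
        {"name": "label", "type": ["null", "string"], "default": None},
    ],
}

# Mapping from Avro primitive type names to the Python types they accept.
AVRO_PRIMITIVES = {"string": str, "double": float, "int": int, "boolean": bool}

def validate(record, schema):
    """Minimal structural check of a record against an Avro record schema."""
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        if name not in record:
            if "default" in field:
                continue  # missing field is fine when a default exists
            return False
        value = record[name]
        if isinstance(ftype, list):  # union, e.g. ["null", "string"]
            ok = any(
                (t == "null" and value is None)
                or isinstance(value, AVRO_PRIMITIVES.get(t, ()))
                for t in ftype
            )
            if not ok:
                return False
        elif not isinstance(value, AVRO_PRIMITIVES[ftype]):
            return False
    return True

print(validate({"id": "a1", "score": 0.93}, prediction_schema))  # True
```

A real registry performs this check (plus compatibility checks between schema versions) centrally, so producers and consumers agree on the stream layout without shipping full field names in every message.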
Finding the correct and most up-to-date data feature for your ML models or data processing pipelines can be daunting. Our goal is to build in a feature registry (or feature store) that catalogs all of the data features created by you, your team and the entire organization.
- Kafka TLS and authentication - Currently the Kafka setup can be secured but out-of-the-box the configuration does not have TLS or authentication enabled. We will be enhancing the Kafka deployment configuration with a way to quickly enable TLS and authentication.
- TLS throughout - Currently some internal services by default do not have TLS enabled. While these services are not exposed outside of the Kubernetes cluster, we will be adding configuration to ensure every service communication can have TLS encryption enabled.
- Auth proxy for grafana dashboards - In order to provide a more seamless integration with Grafana, a custom auth proxy is being developed to create a robust SSO interface.
- Synchronize with custom remote repositories - Currently AlgoRun can only synchronize with the AlgoHub.com Algo registry. We are enhancing the import and push functionality to allow adding any remote AlgoRun instance, so Algos and Pipelines can be shared between any user, team and the community.
- Save History - Currently the only way to snapshot a pipeline configuration is to create a new version. Creating a pipeline save history will allow tracking of all pipeline changes and add the ability to undo unwanted changes.
- Duplicate Pipeline - Currently you can duplicate a pipeline version, but we will also be adding the ability to duplicate the entire pipeline configuration under a new name.
- Inline scripts - Currently you must package a Python, R or other script into a Docker container to be utilized by the pipeline. We are adding the ability to quickly add a Python or R script directly to the pipeline and define its dependencies; the script will be dynamically mounted into a container at runtime. This makes it much easier to get started building script-based pipelines where the script versions follow the pipeline version.
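AlgoRun's exact mounting mechanism is not described here, but the standard Kubernetes pattern for injecting a script into a container at runtime is a ConfigMap volume. A hypothetical sketch (all names and the script itself are illustrative):

```yaml
# Illustrative only: a ConfigMap holding an inline script, mounted into a pod.
apiVersion: v1
kind: ConfigMap
metadata:
  name: score-script            # hypothetical name
data:
  score.py: |
    import sys, json
    record = json.load(sys.stdin)
    print(json.dumps({"score": record["value"] * 2}))
---
apiVersion: v1
kind: Pod
metadata:
  name: score-runner            # hypothetical name
spec:
  containers:
    - name: runner
      image: python:3.11-slim
      command: ["python", "/scripts/score.py"]
      volumeMounts:
        - name: script
          mountPath: /scripts   # the ConfigMap contents appear as files here
  volumes:
    - name: script
      configMap:
        name: score-script
```

Because the script lives in the cluster configuration rather than a baked image, updating it does not require rebuilding or republishing a container.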
- Locate component - As the number of components in a pipeline grows, it becomes difficult to locate the component you would like to edit. We are adding a component search functionality to locate the component within the graph UI.
- Implement pre/post transformation processors - Currently data transformations must be done within a separate container, whose output is then piped to the destination Algo. Sometimes it is useful to run simple data transformations alongside the actual Algo container, just before the data is delivered to the Algo input. We are adding data transforms (think jq) that execute within the algo-runner sidecar.
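As an illustration of the kind of jq-style transform this enables, here is a minimal Python sketch of a path-based reshape. The path syntax and function names are assumptions for illustration, not the actual algo-runner implementation:

```python
def get_path(data, path):
    """Resolve a jq-style dotted path like '.user.name' against nested dicts."""
    current = data
    for key in path.lstrip(".").split("."):
        current = current[key]
    return current

def transform(record, mapping):
    """Reshape a record: each output field is pulled from a jq-style path."""
    return {out_field: get_path(record, path) for out_field, path in mapping.items()}

raw = {"user": {"name": "ada", "meta": {"age": 36}}, "score": 0.9}
# A pre-transform applied in the sidecar before the Algo sees its input.
shaped = transform(raw, {"name": ".user.name", "age": ".user.meta.age"})
print(shaped)  # {'name': 'ada', 'age': 36}
```

Running the reshape in the sidecar means the Algo container can declare the input shape it expects and stay free of per-pipeline glue code.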
- Data Connector Management UI - The current data connector configuration UI is functional but rudimentary, and we have plans to revamp the interface. Better parameter management and configuration defaults, plus the ability to test connectivity and preview queries, are some of the features planned.
- Custom Kafka source topics - Currently all routable Kafka topics come from Endpoint, Algo or Data Connector outputs. We plan to allow manually adding Kafka topics to the pipeline that can be routed to any input. You can then push data into the topic using any existing Kafka producer you may have, and the pipeline will consume it.
Pipeline Deployment Features
- Synchronous execution - Currently, all calls to a pipeline endpoint execute asynchronously; for the calling client to get a result, it must receive a webhook callback. To simplify this interaction, an option to run the pipeline synchronously is being added. When this option is set, the endpoint will block the response until all hook event results have been fulfilled (or the timeout is reached). Once all the hook results are complete, the endpoint will send the results for all hooks back in a single response envelope.
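The blocking behaviour can be sketched in plain Python: a collector waits until every expected hook result arrives or the timeout elapses, then returns a single envelope. The class and field names are illustrative, not AlgoRun's actual API:

```python
import threading

class HookCollector:
    """Blocks until all expected hook results arrive, or a timeout elapses."""

    def __init__(self, expected_hooks):
        self.expected = set(expected_hooks)
        self.results = {}
        self._done = threading.Event()
        self._lock = threading.Lock()

    def deliver(self, hook_name, result):
        """Called once per hook as its result becomes available."""
        with self._lock:
            self.results[hook_name] = result
            if self.expected.issubset(self.results):
                self._done.set()  # every expected hook has reported

    def wait(self, timeout):
        """Block up to `timeout` seconds, then return a single envelope."""
        completed = self._done.wait(timeout)
        return {"complete": completed, "results": dict(self.results)}

# Simulate two hooks reporting back shortly after the request starts.
collector = HookCollector({"model-a", "model-b"})
threading.Timer(0.05, collector.deliver, ("model-a", {"score": 0.9})).start()
threading.Timer(0.10, collector.deliver, ("model-b", {"score": 0.4})).start()

envelope = collector.wait(timeout=2.0)
print(envelope["complete"])  # True
```

If the timeout fires first, the envelope still returns with `complete` set to `False` and whatever partial results arrived, which matches the "or the timeout is reached" behaviour described above.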
- Run batch methods - Currently, Endpoints receive the data to be processed directly in the request. We are adding functionality to upload batches of data prior to running the pipeline, then call the Endpoint with a reference to the batch data to be processed.
- Auto Scaling Implementation - Currently, auto scaling uses the Kubernetes Horizontal Pod Autoscaler based on CPU / memory only. We will be enhancing the functionality to enable scaling based on any performance metric generated by the pipeline.
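For reference, this is roughly what scaling on a custom per-pod metric looks like with the Kubernetes `autoscaling/v2` HorizontalPodAutoscaler (assuming a custom metrics adapter is installed); the deployment and metric names below are illustrative, not AlgoRun's actual configuration:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: algo-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-algo                        # hypothetical Algo deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods                         # custom metric averaged across pods
      pods:
        metric:
          name: messages_in_per_second   # illustrative pipeline metric
        target:
          type: AverageValue
          averageValue: "100"            # scale out above 100 msgs/s per pod
```

For a streaming pipeline, a throughput or queue-lag metric like this is usually a much better scaling signal than raw CPU or memory.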
- Node selector - Currently, the Kubernetes node an Algo runs on is selected based on resource requirements. For example, if enough CPU / memory / GPU capacity exists on any node, the Algo pod will be placed there. We are adding the ability to include node labelling directly in the pipeline configuration to allow placement of specific Algos on targeted nodes. This can be very helpful for fine-tuning the performance of a pipeline.
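Under the hood this maps naturally onto the standard Kubernetes `nodeSelector` field. A hypothetical example pinning a GPU-hungry Algo to labelled nodes (the label, names and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-algo
spec:
  nodeSelector:
    accelerator: nvidia-t4       # only schedule on nodes carrying this label
  containers:
    - name: algo
      image: example/gpu-algo:latest
      resources:
        limits:
          nvidia.com/gpu: 1      # still request the GPU resource itself
```

Nodes are labelled once (e.g. `kubectl label nodes <node> accelerator=nvidia-t4`), after which any Algo carrying the matching selector lands only on those nodes.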
- Dashboard Graphs - We have plans to add new graphs and monitoring metrics to the dashboard.
API Integration Generator
Our goal is to make it as easy as possible to integrate AlgoRun with existing APIs. One way we plan to make this happen is with tooling that can instantly generate an AlgoRun integration based on an OpenAPI or Swagger API spec. This will allow any pipeline to call REST APIs and execute actions against thousands of existing APIs, whether public on the internet or internally developed.
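As a sketch of the first step such a generator would take, here is a minimal Python example that extracts the callable operations from an OpenAPI 3 spec using only the standard library. The spec fragment and function names are illustrative, not the actual generator:

```python
import json

# A tiny OpenAPI 3 fragment of the kind the generator would consume (illustrative).
spec_json = """
{
  "openapi": "3.0.0",
  "paths": {
    "/sentiment": {
      "post": {"operationId": "analyzeSentiment", "summary": "Score text"}
    },
    "/health": {
      "get": {"operationId": "getHealth", "summary": "Liveness check"}
    }
  }
}
"""

def list_operations(spec):
    """Extract (method, path, operationId) triples -- the starting point for
    generating one pipeline-callable action per API operation."""
    ops = []
    for path, methods in spec.get("paths", {}).items():
        for method, details in methods.items():
            ops.append((method.upper(), path, details.get("operationId")))
    return sorted(ops)

operations = list_operations(json.loads(spec_json))
print(operations)
# [('GET', '/health', 'getHealth'), ('POST', '/sentiment', 'analyzeSentiment')]
```

From these triples, a generator would read each operation's parameter and response schemas to produce a typed request/response wrapper the pipeline can invoke as an action.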