JuliaRun is used for scalable production deployment of Julia, such as large-scale parallel simulations, in the public or private cloud, on AWS, Microsoft Azure, or Google Cloud.
JuliaRun takes the guesswork out of building scalable solutions. It can be used by data scientists and engineers with little or no knowledge of how such systems need to be architected. A Julia user can develop an application using JuliaPro and deploy it to a scalable, robust environment with a single click.
JuliaRun offers distributed programming mechanisms beyond those available in core Julia. It can revive failed worker nodes of a distributed application. Workers can be requested to join gradually and asynchronously as resources become available, while the workers already present continue to do the work. A distributed application that communicates through a failsafe external messaging framework can also recover from failure of the master node. Different Julia applications can be connected together in flexible ways through a queue-based messaging framework, which makes it possible to deploy long-running, autoscaled processing pipelines via JuliaRun.
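The source does not show JuliaRun's own API for worker revival. The Python sketch below only illustrates the general idea of queue-fed workers that are replaced after a failure while the other workers keep running. `run_stage`, `n_workers`, and `max_restarts` are hypothetical names. Threads stand in for worker nodes, and an in-process `queue.Queue` stands in for the external messaging framework.

```python
# Illustrative sketch only: run_stage is a hypothetical name, not JuliaRun's API.
import queue
import threading


def run_stage(tasks, process, n_workers=3, max_restarts=10):
    """Apply `process` to every task with a pool of queue-fed worker threads.

    A worker whose task raises puts the task back on the queue and exits,
    standing in for a crashed node. After each worker exits, the supervisor
    loop checks for leftover work and starts a replacement, while the other
    workers keep running.
    """
    items = list(tasks)
    inbox = queue.Queue()
    for index, item in enumerate(items):
        inbox.put((index, item))
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                index, item = inbox.get_nowait()
            except queue.Empty:
                return  # no work left: exit normally
            try:
                out = process(item)
            except Exception:
                inbox.put((index, item))  # requeue the task, then "crash"
                return
            with lock:
                results[index] = out

    def start():
        thread = threading.Thread(target=worker)
        thread.start()
        return thread

    pool = [start() for _ in range(n_workers)]
    revivals = 0
    while pool:
        pool.pop(0).join()
        if not inbox.empty():  # requeued work is waiting: revive a worker
            revivals += 1
            if revivals > max_restarts:
                raise RuntimeError("too many worker failures")
            pool.append(start())
    return [results[index] for index in range(len(items))]
```

Suppose a `process` function fails once on one input. `run_stage` still returns every result in input order, because the failed task goes back on the queue and a fresh worker picks it up. The `max_restarts` cap keeps a task that always fails from being retried forever.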
JuliaRun provides an interface for managing both code and packages, and gives insight into them through comprehensive application metrics and detailed log management tools. Through simple, uniform, platform-independent APIs, the user can rely on JuliaRun for authentication and multi-tenancy, choose appropriate storage for data, and let it autoscale both processes and VMs across dynamic node types (GPUs, spot instances, high-memory, high-CPU).
JuliaRun is deeply integrated with Azure, AWS, and Google Cloud via their published APIs. It can authenticate to services, create containers and VMs, configure disks and IP addresses, and apply security policies to these resources. This integration benefits the end user, not just the devops engineer creating the cluster.
The robust JuliaRun architecture leverages open source technologies such as Kubernetes and Docker. This makes it flexible and easily accessible via simple Julia APIs.
JuliaRun provides APIs to filter, stream, and search distributed application logs at any level of granularity. JuliaRun collects logs and messages from stdout and stderr into a central location, where they can be queried in real time with Elasticsearch, the popular open source search and analytics engine.
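In JuliaRun these queries are served by Elasticsearch. The Python sketch below is not Elasticsearch's query API and not JuliaRun's. It only shows, in memory, what filtering at different granularities means: the whole application, a single worker, a minimum severity, or a text match. The `LogRecord` fields, the `LEVELS` table, and `search_logs` are hypothetical. Because `search_logs` is a generator, it can also consume a live stream of records.

```python
# Illustrative sketch only: LogRecord, LEVELS and search_logs are hypothetical.
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

LEVELS = {"DEBUG": 10, "INFO": 20, "WARN": 30, "ERROR": 40}


@dataclass
class LogRecord:
    app: str       # application name
    worker: int    # worker/process id within the application
    stream: str    # "stdout" or "stderr"
    level: str     # one of LEVELS
    message: str


def search_logs(records: Iterable[LogRecord], app: Optional[str] = None,
                worker: Optional[int] = None, min_level: str = "DEBUG",
                text: Optional[str] = None) -> Iterator[LogRecord]:
    """Lazily yield records that match every filter that is not None.

    Narrowing from app to worker mirrors querying at coarser or finer
    granularity.
    """
    threshold = LEVELS[min_level]
    for record in records:
        if app is not None and record.app != app:
            continue
        if worker is not None and record.worker != worker:
            continue
        if LEVELS[record.level] < threshold:
            continue
        if text is not None and text not in record.message:
            continue
        yield record
```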
JuliaRun can record metrics as name-value pairs and can assist in visualizing, aggregating and plotting them in multiple ways.
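Recording metrics as name-value pairs and aggregating them can be sketched as follows. `MetricStore` and its methods are hypothetical names, not JuliaRun's metrics API. Visualization and plotting are left out, and the sketch shows only recording and aggregation.

```python
# Illustrative sketch only: MetricStore is a hypothetical in-memory store.
from collections import defaultdict
from statistics import mean


class MetricStore:
    """Minimal store of name-value metric samples."""

    _AGGREGATES = {"mean": mean, "sum": sum, "min": min, "max": max, "count": len}

    def __init__(self):
        self._samples = defaultdict(list)  # metric name -> recorded values

    def record(self, name, value):
        """Append one name-value sample."""
        self._samples[name].append(float(value))

    def aggregate(self, name, how="mean"):
        """Reduce all samples of `name` with one of the _AGGREGATES functions."""
        values = self._samples.get(name)
        if not values:
            raise KeyError(f"no samples for metric {name!r}")
        return self._AGGREGATES[how](values)
```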
Alerts and events derived from metrics and logs signal abnormal or exceptional conditions, and can trigger scaling or powering applications up or down accordingly.
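A simple way to derive an event from a metric stream is a rule that fires only after its condition has held for several consecutive samples. This keeps a single spike from triggering a scaling action. The sketch below assumes this design. `AlertRule`, `for_samples`, and the `"scale_up"` event name are hypothetical, not JuliaRun's alerting API. A log-derived signal, such as a count of ERROR lines per minute, could be fed to the same rule.

```python
# Illustrative sketch only: AlertRule and its event names are hypothetical.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AlertRule:
    name: str
    condition: Callable[[float], bool]  # tested against each new sample
    for_samples: int                    # consecutive matches required to fire
    event: str                          # event emitted when the rule fires
    _streak: int = 0

    def observe(self, value: float) -> Optional[str]:
        """Feed one sample; return the event once when the streak reaches for_samples."""
        self._streak = self._streak + 1 if self.condition(value) else 0
        if self._streak == self.for_samples:
            return self.event
        return None
```

Because `observe` returns the event only when the streak first reaches the threshold, a sustained condition produces a single event rather than one per sample.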
OpenID-based authentication lets JuliaRun combine multiple authentication platforms, while Kubernetes provides seamless authorization and resource allocation (using RBAC, Namespaces, and Quotas).
i) Cluster Autoscaler: JuliaRun provides a fully customizable cluster autoscaler. Its features include choosing which nodes to remove when scaling down so that critical processes are not interrupted, scaling based on resource reservations instead of system load, and keeping configurable headroom to handle workload spikes predictably.
ii) Process Autoscaler: Autoscaling is not limited to clusters. JuliaRun autoscales processes as well, and makes this fully customizable in Julia. Apart from the standard metrics, custom metrics published by the application and custom Julia functions can be used to determine the scaling factor.
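The two autoscalers described above can be sketched as small decision functions. The sketch assumes that demand is measured as reserved (requested) CPU cores and that headroom is a spare fraction of that demand. All names are hypothetical: `ClusterNode`, `nodes_to_add`, `nodes_to_remove`, `scale_processes`, `queue_depth_policy`, and the `headroom` and 100-items-per-process values. JuliaRun's actual autoscalers are configured in Julia, and their interface is not shown here.

```python
# Illustrative sketch only: names and policies are hypothetical, not JuliaRun's API.
import math
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ClusterNode:
    name: str
    cpu_capacity: float        # cores on the node
    cpu_reserved: float = 0.0  # cores requested by processes placed on it
    critical: bool = False     # hosts a process that must not be interrupted


def nodes_to_add(nodes: List[ClusterNode], pending_cpu: float,
                 node_cpu: float, headroom: float = 0.2) -> int:
    """Scale up on reserved CPU rather than measured load, keeping spare headroom."""
    capacity = sum(n.cpu_capacity for n in nodes)
    target = (sum(n.cpu_reserved for n in nodes) + pending_cpu) * (1 + headroom)
    if target <= capacity:
        return 0
    return math.ceil((target - capacity) / node_cpu)


def nodes_to_remove(nodes: List[ClusterNode], headroom: float = 0.2) -> List[str]:
    """Pick idle, non-critical nodes to drain without dropping below demand plus headroom."""
    capacity = sum(n.cpu_capacity for n in nodes)
    target = sum(n.cpu_reserved for n in nodes) * (1 + headroom)
    removable = []
    for n in nodes:
        if n.critical or n.cpu_reserved > 0:
            continue  # removing this node would interrupt work
        if capacity - n.cpu_capacity >= target:
            capacity -= n.cpu_capacity
            removable.append(n.name)
    return removable


def scale_processes(current: int, metrics: Dict[str, float],
                    policy: Callable[[Dict[str, float]], float],
                    min_procs: int = 1, max_procs: int = 64) -> int:
    """Process autoscaler: a user-supplied policy returns a scaling factor."""
    wanted = math.ceil(current * policy(metrics))
    return max(min_procs, min(max_procs, wanted))


def queue_depth_policy(metrics: Dict[str, float]) -> float:
    # Hypothetical custom policy: aim for about 100 queued items per process.
    return metrics["queue_depth"] / (metrics["procs"] * 100)
```

Passing the policy as a function mirrors the idea that custom functions, not only built-in metrics, decide the scaling factor. The clamp to `min_procs` and `max_procs` keeps a misbehaving policy from scaling to zero or without bound.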
JuliaRun includes an authentication proxy that can be dynamically and instantaneously set up to authenticate and route external requests to HTTP services on the cluster. APIs provided as part of JuliaRun register services with the router. Authenticated credentials are passed to the downstream service.
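The routing step of such a proxy can be sketched as a lookup: authenticate the request, pick the registered service with the longest matching path prefix, and forward the caller's identity. The following is a minimal sketch under stated assumptions. `AuthRouter`, the static token table (standing in for a real OpenID provider), and the `X-Authenticated-User` header are all hypothetical and are not JuliaRun's proxy API. The sketch computes the downstream URL and headers rather than performing real HTTP forwarding.

```python
# Illustrative sketch only: AuthRouter, the token table and the
# X-Authenticated-User header are hypothetical, not JuliaRun's API.
from typing import Dict, Optional, Tuple


class AuthRouter:
    def __init__(self, tokens: Dict[str, str]):
        self._tokens = dict(tokens)        # bearer token -> user (OpenID stand-in)
        self._routes: Dict[str, str] = {}  # path prefix -> backend base URL

    def register(self, prefix: str, backend: str) -> None:
        """Register a cluster service under a path prefix."""
        self._routes[prefix.rstrip("/")] = backend.rstrip("/")

    def route(self, path: str, headers: Dict[str, str]) -> Tuple[int, Optional[str], Dict[str, str]]:
        """Return (status, downstream URL, headers to forward)."""
        auth = headers.get("Authorization", "")
        token = auth[len("Bearer "):] if auth.startswith("Bearer ") else None
        user = self._tokens.get(token) if token is not None else None
        if user is None:
            return 401, None, {}
        matches = [p for p in self._routes if path == p or path.startswith(p + "/")]
        if not matches:
            return 404, None, {}
        prefix = max(matches, key=len)  # longest registered prefix wins
        url = self._routes[prefix] + path[len(prefix):]
        return 200, url, {"X-Authenticated-User": user}  # identity passed downstream
```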
Beyond the features above, JuliaRun lets the user deploy a fully customizable cluster scheduler when extraordinary scheduling rules are needed. The scheduler can direct processes to specific nodes and co-locate them with required resources or with other related processes.
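A custom placement rule typically combines hard constraints (the node must have the CPU and GPUs the process needs) with soft preferences (run next to a related process group). The sketch below shows that pattern. `SchedNode`, `SchedProc`, `place`, and the scoring order are hypothetical and are not JuliaRun's scheduler interface.

```python
# Illustrative sketch only: a toy placement function, not JuliaRun's scheduler API.
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class SchedNode:
    name: str
    free_cpu: float
    free_gpus: int = 0
    groups: Set[str] = field(default_factory=set)  # process groups already here


@dataclass
class SchedProc:
    name: str
    cpu: float
    gpus: int = 0
    group: str = ""
    colocate_with: Optional[str] = None  # preferred neighbour group, if any


def place(proc: SchedProc, nodes: List[SchedNode]) -> Optional[str]:
    """Filter nodes by hard constraints (CPU, GPU), then prefer co-location."""
    feasible = [n for n in nodes if n.free_cpu >= proc.cpu and n.free_gpus >= proc.gpus]
    if not feasible:
        return None  # unschedulable for now

    def score(n: SchedNode):
        colocated = proc.colocate_with is not None and proc.colocate_with in n.groups
        return (colocated, n.free_cpu)  # co-location first, then the emptiest node

    best = max(feasible, key=score)
    best.free_cpu -= proc.cpu
    best.free_gpus -= proc.gpus
    best.groups.add(proc.group or proc.name)
    return best.name
```

Keeping co-location as a soft preference means a process still gets scheduled when its preferred neighbour has no room. Turning it into a hard constraint would only require adding it to the `feasible` filter.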