Installing Acceldata Torch

Acceldata Torch uses Kubernetes for deployment and execution. Deployment is managed by Replicated Kots, which provides a single-click installation experience for end customers. Torch can be deployed either on a managed cloud Kubernetes environment or on on-premise machines. In an on-premise environment, Replicated Kots installs Kubernetes on the provided nodes and deploys Torch into that environment.

Once you sign up for Torch, you will be provided with a license file that needs to be used during the installation process.

To install Acceldata Torch, you must provide either a cloud-managed Kubernetes environment or on-premise nodes on which Kubernetes will be installed.

Once the Kubernetes environment is ready, the process of configuring and installing Torch is the same for both environments.

Minimum Hardware Recommendation

The recommended hardware configuration depends on the following factors:

  • Amount of data to be processed
  • Type of Spark deployment used
| Data Volume | Spark Deployment Mode | K8s Cluster Configuration |
| --- | --- | --- |
| Low (< 10 GB) | External Spark (Hadoop cluster) | 1 master and 2 worker nodes (2 cores, 8 GB+ memory each) |
| Low (< 10 GB) | Spark on Kubernetes | 1 master and 4 worker nodes (4 cores, 8 GB+ memory each) |
| Medium (10 GB to 100 GB) | External Spark (Hadoop cluster) | 1 master and 2 worker nodes (2 cores, 8 GB+ memory each) |
| Medium (10 GB to 100 GB) | Spark on Kubernetes | 1 master and 6 worker nodes (4 cores, 8 GB+ memory each) |
| High (100 GB+) | External Spark (Hadoop cluster) | 1 master and 2 worker nodes (2 cores, 8 GB+ memory each) |
| High (100 GB+) | Spark on Kubernetes | 1 master and 8 worker nodes (4 cores, 16 GB+ memory each) |

On-premise software installation

The first step in the process is to install the Kubernetes cluster on the nodes.

  • SSH into the master node.
  • Execute the following command:
curl -sSL | sudo bash
  • Follow the guided procedure step by step.
  • If you are prompted with the statement This application is incompatible with memory swapping enabled. Disable swap to continue? (Y/n), press Y.
  • Disable the firewall if prompted.
  • Select the network interface, if prompted.
  • Finally, Kots installs the components required for the Kubernetes master.
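If you prefer to satisfy the swap and firewall prerequisites before running the installer, they can be handled up front. A minimal sketch, assuming a systemd host with firewalld (adjust for your distribution):

```shell
# Turn off swap now, and comment out swap entries so it stays off after reboot
sudo swapoff -a
sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab

# Stop and disable the firewall so the Kubernetes ports are reachable
sudo systemctl disable --now firewalld
```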

Once the Kots components are installed, copy the content printed at the end of the installation and store it for future reference. Example output:

Complete ✔
The UIs of Prometheus, Grafana and Alertmanager are exposed on NodePorts 30900, 30902 and 30903 respectively.
To access Grafana use the generated user:password of admin:xxxxxxxxx .
Login with password (will not be shown again): xxxxxxxxx
To access the cluster with kubectl, copy kubeconfig to your home directory:
cp /etc/kubernetes/admin.conf ~/.kube/config
chown -R root ~/.kube
echo unset KUBECONFIG >> ~/.profile
bash -l
You need to use sudo to copy and chown admin.conf.
Node join commands expire after 24 hours.
To generate new node join commands, run curl -sSL | sudo bash -s join_token on this node.
To add worker nodes to this installation, run the following script on your other nodes:
curl -sSL | sudo bash -s kubeadm-token=v5atbd.gvkm08e0lx3t8iks kubeadm-token-ca-hash=sha256:a0fe34a1f1fc7ea9d4adb76e58c6264b555302f12b07d7fb9352d58abd2d1731 kubernetes-version=1.19.2

In the next step, log in to the worker nodes and execute the join command shown at the end of the master installation output.

curl -sSL | sudo bash -s kubeadm-token=v5atbd.gvkm08e0lx3t8iks kubeadm-token-ca-hash=sha256:a0fe34a1f1fc7ea9d4adb76e58c6264b555302f12b07d7fb9352d58abd2d1731 kubernetes-version=1.19.2

Follow the instructions; on completion of the installation, the worker nodes are joined to the cluster.

To check if the nodes are ready, execute the following command on the master node.

kubectl get nodes
xxxxxxxxxxxxxx Ready master 45m v1.19.2
xxxxxxxxxxxxxx Ready worker 45m v1.19.2
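If you want to script this readiness check instead of reading the output manually, the STATUS column can be tested directly. A small sketch, assuming kubectl is configured on the master node:

```shell
# Exit non-zero if any node is not in the Ready state
kubectl get nodes --no-headers | awk '$2 != "Ready" { bad++ } END { exit bad ? 1 : 0 }'
```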

Managed Cloud Kubernetes Installation

In a managed Kubernetes cluster, the nodes are managed by the cloud provider, so only Kots needs to be installed.

Execute the following commands in an environment where kubectl is configured and points to the cluster.

curl | bash
kubectl kots install torch/db-kots

The above commands install Kots, and the system is ready for Torch deployment.

Configure and Install Torch

  • Open any browser and go to the following URL: http://master-node:8800 to open the Replicated admin console. Click the Continue to Setup button.

  • If a pop-up warns that the connection is not private, proceed by adding an exception.

  • In the following window, click Skip & continue to bypass setting an SSL certificate for the admin console.


  • The password window is displayed.


Enter the Kots admin password that was generated during installation. It appears in the installation output as:

Login with password (will not be shown again): xxxxxxxxx
  • Upload the license provided by Acceldata in the next window that is displayed.


  • Next, provide the configurations described in the sections below.

  • Click Continue.

  • In a few minutes, Torch installation will complete and the Kubernetes artifacts will be deployed.


Torch version

Displays the current Acceldata Torch version to be installed. This is a read-only field, shown for reference.


Hive Configuration

Click Enable Hive support if Hive support is required. Upload the hive-site.xml file at the specified location. If enabled, you must also provide the configurations under Other Hadoop settings.

Hive version

Other Hadoop Configuration

If you have enabled Hive support, core-site.xml and hdfs-site.xml must be provided. This configuration is also required if job results are to be saved in HDFS.

Other Hadoop Config

Job result persistence configuration

Torch stores the results of jobs in a distributed file system. Currently, it can store them in HDFS or AWS S3.

Select one of the two options given below:

  • Use HDFS file system
  • Use AWS S3 file system

HDFS configuration:

Job Result HDFS

Inputs required:

  • Directory: HDFS directory where job results will be stored (Default: /tmp/ad/torch_results)

Other Hadoop configurations are also required for this option. Refer to the Other Hadoop Configuration section.
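Before saving the configuration, the target directory can be created and checked for write access from any host with the HDFS client configured. A sketch, using the documented default path:

```shell
# Create the results directory and verify write access with a create/delete round trip
hdfs dfs -mkdir -p /tmp/ad/torch_results
hdfs dfs -touchz /tmp/ad/torch_results/.write_test
hdfs dfs -rm /tmp/ad/torch_results/.write_test
```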

AWS S3 configuration:

Job Result S3

Inputs required:

  1. AWS S3 Access key: Access Key for the bucket

  2. AWS S3 Secret key: Secret Key for the bucket

  3. AWS S3 Bucket name: Name of the bucket where the job results are to be stored.


    It should only contain alphanumeric characters.
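The bucket-name restriction can be checked, and the credentials exercised, before submitting the form. A sketch assuming the AWS CLI is installed (the bucket name below is hypothetical):

```shell
BUCKET="torchresults01"   # hypothetical bucket name; replace with your own

# Torch expects a purely alphanumeric bucket name
echo "$BUCKET" | grep -Eq '^[A-Za-z0-9]+$' || { echo "invalid bucket name" >&2; exit 1; }

# Confirm the access/secret keys can list the bucket (uses the AWS CLI's default credential chain)
aws s3 ls "s3://$BUCKET" >/dev/null
```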

Spark Support

Torch uses Apache Spark for running jobs. Currently, Torch supports three modes of deployment.

Use Embedded Spark


In this mode, Torch runs jobs locally inside a service. No separate installation or configuration is required.


This should only be used for testing.

Use Existing Spark cluster

Use Existing Spark

If there is an existing Hadoop cluster with Apache Spark installed, Torch can run the jobs inside that cluster. Apache Livy must be installed as well; Torch connects to Livy over HTTP and submits the Spark jobs.

Inputs required:

  1. Apache Livy URL: HTTP endpoint for Livy
  2. Apache Livy Queue: The queue name to which the jobs are submitted
  3. Number of executors: Number of executors that are spawned for each job
  4. Number of CPU cores: Number of CPU cores per executor
  5. Memory per executor: Amount of memory to be allocated to each executor
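Connectivity to Livy can be verified up front via its REST API, which lists active sessions at GET /sessions. A sketch, with a hypothetical endpoint:

```shell
LIVY_URL="http://livy-host:8998"   # hypothetical endpoint; use your Apache Livy URL

# An HTTP 200 with a JSON body confirms Torch will be able to reach Livy
curl -fsS "$LIVY_URL/sessions"
```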

Deploy Spark on Kubernetes


In this mode, the installer deploys Spark on Kubernetes, which is then used for running the jobs.

Inputs required:

  1. Number of executors: Number of executors to be spawned for each job
  2. Number of CPU cores: Number of CPU cores per executor
  3. Memory per executor: Amount of memory to be allocated to each executor

This is the preferred option.

Notification Configuration

Click Enable notification if notification support is required. When enabled, Torch sends emails or Slack messages for various events occurring in the system.


Inputs required:

  1. Default email ID: The default email address from which mails are sent
  2. Default Slack webhook URL: The default webhook URL to which Slack messages are sent
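A Slack webhook can be tested before it is configured here; Slack's incoming-webhook API accepts a JSON payload with a text field. A sketch (the URL below is a placeholder):

```shell
WEBHOOK_URL="https://hooks.slack.com/services/XXX/YYY/ZZZ"   # placeholder; use your webhook URL
PAYLOAD='{"text": "Torch notification test"}'

# Slack replies with the literal body "ok" when the message is accepted
curl -X POST -H 'Content-type: application/json' --data "$PAYLOAD" "$WEBHOOK_URL"
```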

After configuring the system, click the Deploy button on the next screen. Deployment takes a few minutes to complete.

Deployment Screen

Verify installation

After a few minutes, the following services should be visible.

☁ ~ kubectl get services
ad-analysis-service ClusterIP <none> 19021/TCP 11d
ad-catalog ClusterIP <none> 8888/TCP 11d
ad-catalog-auth-db ClusterIP <none> 27017/TCP 11d
ad-catalog-db ClusterIP <none> 5432/TCP 11d
ad-catalog-ui ClusterIP <none> 4000/TCP 11d
ad-torch-auth ClusterIP <none> 9090/TCP 11d
ad-torch-ml ClusterIP <none> 19035/TCP 11d
kotsadm ClusterIP <none> 3000/TCP 11d
kotsadm-postgres ClusterIP <none> 5432/TCP 11d
kubernetes ClusterIP <none> 443/TCP 11d
kurl-proxy-kotsadm NodePort <none> 8800:8800/TCP 11d
torch-api-gateway NodePort <none> 80:80/TCP,443:443/TCP 10d
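Beyond listing services, a quick check that every pod has reached a healthy state can be scripted. A sketch, assuming kubectl points at the Torch cluster:

```shell
# Print any pods whose STATUS is neither Running nor Completed; exit non-zero if any are found
kubectl get pods --no-headers | awk '$3 != "Running" && $3 != "Completed" { print; bad++ } END { exit bad ? 1 : 0 }'
```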

Accessing the Torch Application

For the torch-api-gateway service shown above, the system assigns node port 80.

The Torch UI can be accessed from port 80 of the Kubernetes master node.

For example, open http://<master-node> to start the Torch UI, where <master-node> is the K8S master node IP or hostname.