From Zero to Diffgram on Amazon Web Services (AWS)

 Diffgram Installation Guide on Elastic Kubernetes Service (EKS)

Diffgram can be deployed on any Kubernetes (K8s) setup: bare metal, Azure, GCP, and so on.
This guide is for AWS. Please contact us for other platform-specific guides.


If you're choosing an AI data labeling platform, consider that Diffgram offers robust on-premise and Kubernetes services and support. Here we show a small part of that, with step-by-step instructions for going from zero to a fully deployed EKS cluster on AWS with Diffgram installed on it.


Pre-Requisites
 
  1. To install Diffgram you will need access to the on premise helm chart we offer to our corporate clients. Contact us if you’re interested in getting it!

  2. You will need an AWS account with access to manage the EKS service, RDS service, S3 service and EC2 service (for subnet creation).

  3. An AWS S3 bucket or GCP Storage Bucket with its credentials. You can reference this link if you need help setting it up: https://medium.com/@shamnad.p.s/how-to-create-an-s3-bucket-and-aws-access-key-id-and-secret-access-key-for-accessing-it-5653b6e54337 

  4. A TLS certificate if you plan to use HTTPS on your Diffgram Instance.

  5. You need the AWS CLI installed and configured with your AWS credentials. See the official AWS guide on how to set this up: https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html (About 5 minutes to install)

  6. You need the eksctl command line utility: https://docs.aws.amazon.com/eks/latest/userguide/eksctl.html (About 5 minutes to install)

  7. Helm installed https://helm.sh/docs/intro/install/ 

 

You can skip Section 1 (creating the RDS instance) if you already have a Relational Database Service (RDS) instance on your AWS account. We strongly recommend RDS as your database service if you're on AWS, as it gives you a scalable, highly available database service that is easy to administer and low in cost.

1.1 Creating the RDS VPC

The first step to create the RDS service is to create a VPC. You can create one with the following command.

aws ec2 create-vpc --cidr-block 10.0.0.0/24

You can save the output in a JSON file, as we will need the VPC ID later when creating the subnets.

# For referencing later

export RDS_VPC_ID=<VPC ID FROM THE ABOVE COMMAND>
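Instead of copying the ID by hand, the AWS CLI's global --query and --output options can extract it from the response. A minimal sketch, assuming the same CIDR block as above; the function name create_rds_vpc is only illustrative and not part of the original guide:

```shell
# Hypothetical helper: create the VPC and print only its ID.
# --query takes a JMESPath expression evaluated against the JSON response,
# and --output text prints the result without JSON quoting.
create_rds_vpc() {
  aws ec2 create-vpc \
    --cidr-block "$1" \
    --query 'Vpc.VpcId' \
    --output text
}

# Usage (makes a real API call, so it is left commented out):
# export RDS_VPC_ID="$(create_rds_vpc 10.0.0.0/24)"
```

Wrapping the call in a function keeps the export on a single line and avoids transcription mistakes when pasting IDs.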

1.2 Creating the RDS Subnets

Each RDS instance must have a Database (DB) subnet group. A DB subnet group is a collection of subnets that live inside a Virtual Private Cloud (VPC). Each DB subnet group should have subnets in at least 2 Availability Zones of an AWS region. We’ll divide the RDS VPC into 2 equal subnets: 10.0.0.0/25 and 10.0.0.128/25.
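The split can be checked with a little arithmetic: a /25 prefix leaves 32 - 25 = 7 host bits, so each half holds 2^7 = 128 addresses and the second half starts at .128. A small sketch of that calculation follows; the helper names are hypothetical and it only handles a /24 whose last octet is 0:

```shell
# Number of addresses in a CIDR block with the given prefix length.
cidr_size() {
  echo $(( 1 << (32 - $1) ))
}

# Print the two /25 halves of an a.b.c.0/24 block.
split_24_into_25s() {
  base="${1%.*}"           # "10.0.0.0" -> "10.0.0"
  half="$(cidr_size 25)"   # 128 addresses per /25
  echo "${base}.0/25"
  echo "${base}.${half}/25"
}

split_24_into_25s 10.0.0.0
```

Run against 10.0.0.0, this prints 10.0.0.0/25 and 10.0.0.128/25, the two blocks used in the commands below. Note that AWS reserves 5 addresses in every subnet, so fewer than 128 are usable in each.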

Create The First Subnet

aws ec2 create-subnet --availability-zone "us-west-1a" --vpc-id ${RDS_VPC_ID} --cidr-block 10.0.0.0/25


# For referencing later

export RDS_SUBNET_1_ID=<THE ID FROM THE ABOVE COMMAND> 

Note: Changing the Availability Zone

Each subnet must be in a different Availability Zone, for example us-west-1a and us-west-1c. Otherwise, creating the DB subnet group later will fail with `Please add subnets to cover at least 2 availability zones.` If you already created a subnet in the wrong zone, delete it with `aws ec2 delete-subnet --subnet-id {id_of_one_to_be_deleted}`

https://docs.aws.amazon.com/cli/latest/reference/ec2/delete-subnet.html 

Then recreate the subnet and resume the steps, including the route table association step.

Create Second Subnet

aws ec2 create-subnet --availability-zone "us-west-1c" --vpc-id ${RDS_VPC_ID} --cidr-block 10.0.0.128/25

# For referencing later

export RDS_SUBNET_2_ID=<THE ID FROM THE ABOVE COMMAND>  

The commands above read the VPC ID from the RDS_VPC_ID variable set in step 1.1. You can also change the Availability Zones to ones that fit your infrastructure needs or region. Save the subnet ID returned by each command, as we will use them in the next commands.

Now we need to associate the created subnets with the VPC’s route table.

# Get the route table information.

aws ec2 describe-route-tables --filters Name=vpc-id,Values=${RDS_VPC_ID}

# For referencing later.

export RDS_ROUTE_TABLE_ID=<THE RouteTableId FROM THE ABOVE COMMAND>
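The describe-route-tables response is a fairly large JSON document, and the value needed here is the RouteTableId field. As a sketch, the same --query approach can pull it out directly; the function name is hypothetical, and the code assumes the VPC still has only the single route table it was created with:

```shell
# Hypothetical helper: print the ID of the first route table in a VPC.
# A freshly created VPC has exactly one route table (its main one),
# so index 0 is sufficient under that assumption.
vpc_route_table_id() {
  aws ec2 describe-route-tables \
    --filters "Name=vpc-id,Values=$1" \
    --query 'RouteTables[0].RouteTableId' \
    --output text
}

# Usage (makes a real API call, so it is left commented out):
# export RDS_ROUTE_TABLE_ID="$(vpc_route_table_id "${RDS_VPC_ID}")"
```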

Associate The Route Table to the Created Subnets

Route table rules only take effect on the subnets the table is associated with. Without this step, the route table will have rules that do not affect anything inside the VPC.

# Associate the first subnet with the route table

aws ec2 associate-route-table --route-table-id ${RDS_ROUTE_TABLE_ID} --subnet-id ${RDS_SUBNET_1_ID}

# Associate the second subnet with the route table

aws ec2 associate-route-table --route-table-id ${RDS_ROUTE_TABLE_ID} --subnet-id ${RDS_SUBNET_2_ID}


Note:
The route table ID begins with rtb- and the subnet IDs begin with subnet-. These values are different from the other IDs saved earlier.

Congrats!! You are doing great.

1.3 Create the DB Subnet Group

Now that we have the 2 subnets we can group them together by creating the DB subnet group.

aws rds create-db-subnet-group --db-subnet-group-name "DiffgramDBSubnetGroup" --db-subnet-group-description "Diffgram DB Subnet Group" --subnet-ids ${RDS_SUBNET_1_ID} ${RDS_SUBNET_2_ID}

Potential Errors:

An error occurred (DBSubnetGroupDoesNotCoverEnoughAZs) when calling the CreateDBSubnetGroup operation: DB Subnet Group doesn't meet availability zone coverage requirement. Please add subnets to cover at least 2 availability zones. Current coverage: 1

Solution: Go back to the Subnet creation step [Section 1.2] for help on how to solve this.

1.4 Create the VPC Security Group

Now we’ll need to create a VPC security group. This provides a virtual firewall where we can configure rules to control inbound and outbound traffic.

aws ec2 create-security-group --group-name DiffgramRDSSecurityGroup --description "Diffgram RDS security group" --vpc-id ${RDS_VPC_ID}

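This will be useful later, as we’ll need to set an inbound rule to allow all the traffic from the EKS cluster to the RDS instance. Save the GroupId from the command's output, since the create-db-instance command in the next step uses it.

# For referencing later

export RDS_VPC_SECURITY_GROUP_ID=<THE GroupId FROM THE ABOVE COMMAND>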

1.5 Create the RDS Instance

Now we can finally create the RDS instance. We’ve added some suggested defaults for the instance class and the storage amount, but feel free to increase them if you expect your organization to need more. Please also replace the user and password with values that meet your security standards. You can always update the instance class and storage later if you start running out of resources.

aws rds create-db-instance \
  --db-name diffgramdb \
  --db-instance-identifier diffgramdbinstance \
  --allocated-storage 200 \
  --db-instance-class db.t2.micro \
  --engine postgres \
  --engine-version "12.5" \
  --master-username diffgramuser \
  --master-user-password diffgrampassword \
  --no-publicly-accessible \
  --vpc-security-group-ids ${RDS_VPC_SECURITY_GROUP_ID} \
  --db-subnet-group-name "DiffgramDBSubnetGroup" \
  --availability-zone us-west-1a \
  --port 5432


Same command with no line breaks (for Windows users)

aws rds create-db-instance --db-name diffgramdb --db-instance-identifier diffgramdbinstance --allocated-storage 200 --db-instance-class db.t2.micro --engine postgres --engine-version "12.5" --master-username diffgramuser --master-user-password diffgrampassword --no-publicly-accessible --vpc-security-group-ids ${RDS_VPC_SECURITY_GROUP_ID} --db-subnet-group-name "DiffgramDBSubnetGroup" --availability-zone us-west-1a --port 5432

Please take note of the RDS endpoint, as we’ll need it when we set up all the Kubernetes resources.
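The endpoint only exists once the instance has finished provisioning. One way to wait for that and then read the endpoint from the CLI is sketched below; the function names are hypothetical, and the instance identifier passed in is assumed to be the one used in the create command above:

```shell
# Block until the instance reports the "available" status.
wait_for_rds() {
  aws rds wait db-instance-available --db-instance-identifier "$1"
}

# Print the hostname part of the instance's endpoint.
rds_endpoint() {
  aws rds describe-db-instances \
    --db-instance-identifier "$1" \
    --query 'DBInstances[0].Endpoint.Address' \
    --output text
}

# Usage (makes real API calls, so it is left commented out):
# wait_for_rds diffgramdbinstance
# export RDS_ENDPOINT="$(rds_endpoint diffgramdbinstance)"
```

RDS_ENDPOINT is an illustrative variable name; the guide itself only asks you to note the value down.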


PDF Version Available

This guide is also available as PDF with more sample screenshots and examples.

Section 2: Creating and Setting Up the EKS Kubernetes Cluster


2.1 Installing EKS

To create a Kubernetes cluster you only need to run a single command. Feel free to change the name of the cluster, the number of nodes, and the region.

eksctl create cluster --name=diffgram-eks-cluster --nodes=2 --region=us-west-1 --node-volume-size=100 

The above command creates a 2-node cluster with a 100 GB disk on each node. By default each node has 8 GB of memory and 2 CPUs. If you want a different memory or CPU size, you can specify it in a ClusterConfig YAML file passed to eksctl with the --config-file flag.
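As a rough illustration of such a file, the snippet below writes a ClusterConfig that mirrors the command above but requests a larger instance type. The node group name and the m5.xlarge instance type are assumptions chosen for the example, not values from the Diffgram chart; adjust them to your workload.

```shell
# Write an eksctl ClusterConfig file to the current directory.
cat > cluster.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: diffgram-eks-cluster
  region: us-west-1

nodeGroups:
  - name: diffgram-nodes      # hypothetical node group name
    instanceType: m5.xlarge   # 4 vCPUs, 16 GiB memory
    desiredCapacity: 2        # same node count as the command above
    volumeSize: 100           # disk size per node, in GiB
EOF

# Create the cluster from the file (real API calls, so commented out):
# eksctl create cluster --config-file=cluster.yaml
```

When a config file is used, the cluster name, region and node settings come from the file rather than from command line flags.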



Potential Errors:

CREATE_FAILED – "The maximum number of VPCs has been reached."

Potential solution: Create the cluster in a different region, or delete unused VPCs in the current one.

Cluster creation typically takes between 10 and 15 minutes.


2.2 Installing Dependencies for Ingress

Diffgram’s helm chart uses an Nginx ingress controller as its main controller. If you would like to have a separate ingress controller or want to create your own ingress for the system please contact us.

1. Install Nginx controller resources

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.43.0/deploy/static/provider/aws/deploy.yaml

This is for getting the nginx ingress to work on our kubernetes cluster. We provide an nginx ingress on our helm chart, but feel free to rewrite another one if you need a different ingress for your cluster.

2.3 [Optional] Install Cert Manager

This is useful for managing TLS certificates and being able to access Diffgram via HTTPS.

If you want to have TLS connections, please make sure you have a domain available and access to the name servers so you can modify the records to point to the IP addresses of the ingress.

  1. First, we’ll need to get the IP addresses to point our domain to. Go to the AWS console, open the EC2 service, and select Load Balancers on the left panel. Then select the load balancer of the EKS cluster and copy its name.

  2. Now go to the Network Interfaces section on the left panel and paste the name in the search bar. You should see 2 interfaces attached to the load balancer. If you click each of them, the “Public IP Address” section shows the actual IP address of that interface. Copy both IP addresses and add an A record at your DNS provider that points to these IPs.


Once you do that, your domain should show the same page as the ingress public endpoint of the Kubernetes cluster. Notice that the connection is not secured, as we still don’t have the TLS certificate.

  3. Now we need to install cert-manager with helm. This will allow us to generate certificates for our domain.

helm repo add jetstack https://charts.jetstack.io

helm install cert-manager --namespace default jetstack/cert-manager --set installCRDs=true


  4. Now edit the values.yaml of Diffgram’s helm chart and change the following keys:

    1. diffgramDomain: set it to the domain you own.

    2. useCertManager: set this to true. This allows the certificate issuer to be created, so you automatically get a TLS certificate for your domain from Let’s Encrypt.

  5. Upgrade the helm release

helm upgrade diffgram ./diffgram -f ./diffgram/values.yaml

  6. After a few minutes you should be able to see the issuer and the certificate generated. You can confirm this by running:

kubectl describe issuer letsencrypt-prod

To check the issued certificate, run:

kubectl describe secret diffgram-cert-tls

Now you should have your TLS certificates ready to go!

2.4 [Optional] Set Autoscaler and Kubernetes Dashboard.

  1. To set up auto scaling follow this guide: https://aws.amazon.com/premiumsupport/knowledge-center/eks-cluster-autoscaler-setup

  2. For the UI dashboard follow this guide: https://docs.aws.amazon.com/eks/latest/userguide/dashboard-tutorial.html 

Section 3: Create a VPC Peering Connection To Access the RDS Instance

The next step is to create a VPC Peering Connection. This will enable the RDS instance in one VPC to communicate with the EKS cluster’s VPC.

3.1 Create and Accept VPC Peering Connections

  1. Go to the VPC console: https://console.aws.amazon.com/vpc/

  2. Select Peering Connections on the left panel and then click “Create Peering Connection”.

  3. Configure the details and hit create.

The key thing here is that:

  1. The requester is the Kubernetes cluster (EKS) VPC

  2. The accepter is the Database (RDS) VPC

  4. After creating the peering connection, go to the peering connection list, select the created connection and, in the Actions menu, accept the peering connection request.

  5. Now just save the ID on your terminal for future reference.

export VPC_PEERING_CONNECTION_ID=<THE ID OF YOUR PEERING CONNECTION>
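The same console steps can also be done from the terminal. The sketch below is one possible CLI equivalent, assuming both VPCs are in the same AWS account and region; the function names are hypothetical, and the cluster name is assumed to be the one used in step 2.1.

```shell
# Look up the VPC that was created for the EKS cluster.
eks_vpc_id() {
  aws eks describe-cluster --name "$1" \
    --query 'cluster.resourcesVpcConfig.vpcId' --output text
}

# Request a peering connection (requester: EKS VPC, accepter: RDS VPC),
# accept it, and print the connection ID.
peer_vpcs() {
  pcx_id="$(aws ec2 create-vpc-peering-connection \
    --vpc-id "$1" --peer-vpc-id "$2" \
    --query 'VpcPeeringConnection.VpcPeeringConnectionId' \
    --output text)" || return 1
  aws ec2 accept-vpc-peering-connection \
    --vpc-peering-connection-id "$pcx_id" > /dev/null || return 1
  echo "$pcx_id"
}

# Usage (makes real API calls, so it is left commented out):
# EKS_VPC_ID="$(eks_vpc_id diffgram-eks-cluster)"
# export VPC_PEERING_CONNECTION_ID="$(peer_vpcs "${EKS_VPC_ID}" "${RDS_VPC_ID}")"
```

The first argument to peer_vpcs is the requester and the second is the accepter, matching the roles listed above.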








3.2 Update The EKS Cluster’s VPC’s Route Table

  1. In the VPC section of the AWS console, open the Route Tables subsection on the left panel.

  2. Find the EKS VPC’s route table by searching for the VPC ID that corresponds to your EKS cluster.

Get the ID of the route table that has the “Main” attribute with the value “Yes”.

# For easy reference

export EKS_ROUTE_TABLE_ID=<THE RouteTableId FROM THE ABOVE STEP>

The --vpc-peering-connection-id value is the peering connection ID saved in step 3.1.

# Add route:

aws ec2 create-route --route-table-id ${EKS_ROUTE_TABLE_ID} --destination-cidr-block 10.0.0.0/24 --vpc-peering-connection-id ${VPC_PEERING_CONNECTION_ID}

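Potential Errors:

An error occurred (InvalidRouteTableID.NotFound) when calling the CreateRoute operation: The routeTable ID 'rtb-09e0ab9912789a921' does not exist

Potential solution: Check that EKS_ROUTE_TABLE_ID holds the route table ID (it begins with rtb-) copied from the console, and that your AWS CLI is configured for the same region as the cluster.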

3.3 Update the RDS VPC’s Route Table

You can find the RDS_ROUTE_TABLE_ID inside the VPC service of the AWS console. Go to the Route Tables section in the left panel and find the row that has the same VPC ID as the RDS VPC. This is essentially the same process as step 3.2, but with the RDS VPC’s main route table.





# For referencing

export RDS_ROUTE_TABLE_ID=<THE ROUTE_TABLE ID FROM THE CONSOLE>

# Create the route in the RDS route table.

aws ec2 create-route --route-table-id ${RDS_ROUTE_TABLE_ID} --destination-cidr-block 192.168.0.0/16 --vpc-peering-connection-id ${VPC_PEERING_CONNECTION_ID}

3.4 Update RDS Instance’s security group

The following command allows traffic from the EKS cluster to the RDS instance on port 5432. You can find the VPC security group ID in the RDS service of the AWS console, inside the Connectivity & Security tab.







# For referencing

export RDS_VPC_SECURITY_GROUP_ID=<THE VPC SECURITY GROUP ID FROM THE ABOVE STEP>

aws ec2 authorize-security-group-ingress --group-id ${RDS_VPC_SECURITY_GROUP_ID} --protocol tcp --port 5432 --cidr 192.168.0.0/16

3.5 Test The connection

# Run a test container

kubectl run -i --tty --rm postgresdebug --image=alpine:3.5 --restart=Never -- sh

# Install psql

apk update

apk add postgresql

# Connect to DB

psql -h <HOST> -U <USER> -d <DBNAME>

Replace HOST with your RDS endpoint, USER with the RDS user you created, and DBNAME with the name of the database created by default. The command will ask you for the password. If you can connect successfully, the peering connection is working correctly.

Section 4: Install Diffgram Helm Chart

The Diffgram team should provide you with the Diffgram helm chart or give you access to the repositories where you can download the chart.

If you're interested in getting it, please contact us.

Once you have the chart, there are some variables that you’ll need to set in the “values.yaml” file to install correctly on AWS using EKS. Please set the values of the following keys:

4.1 Database Settings

  • dbSettings.dbProvider: Set this to “rds”

  • dbSettings.rdsEndpoint: Set this to your RDS instance endpoint, so diffgram can use it as the database.

  • dbSettings.dbUser: Set this to the postgres user you want to use with Diffgram.

  • dbSettings.dbName: Set this to the name of the Postgres database you want to create the tables in.

  • dbSettings.dbPassword: Set this to the RDS instance’s password.
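For illustration, the database keys above could be collected in a small override file instead of editing values.yaml in place. This is only a sketch: the nesting under dbSettings is inferred from the dotted key names and should be checked against the values.yaml shipped with the chart, and the file name values-aws.yaml and the function name are hypothetical. The user and database names are the ones used in step 1.5.

```shell
# Write the database settings to values-aws.yaml.
# $1 = RDS endpoint hostname, $2 = RDS password
write_db_values() {
  cat > values-aws.yaml <<EOF
dbSettings:
  dbProvider: "rds"
  rdsEndpoint: "$1"
  dbUser: "diffgramuser"
  dbName: "diffgramdb"
  dbPassword: "$2"
EOF
}

# Usage (replace the placeholders with your own values):
# write_db_values "<YOUR RDS ENDPOINT>" "<YOUR RDS PASSWORD>"
# helm install diffgram ./diffgram -f values-aws.yaml
```

Since the file contains the database password, it is best kept out of version control.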

4.2 Diffgram Configuration Settings

  • diffgramSecrets.DIFFGRAM_AWS_ACCESS_KEY_ID: Set this to your AWS credentials access key. Make sure the account has permissions to the S3 bucket you’ll use as static storage.

  • diffgramSecrets.DIFFGRAM_AWS_ACCESS_KEY_SECRET: Set this to your AWS credentials secret. Make sure the account has permissions to the S3 bucket you’ll use as static storage.

  • diffgramSettings.DIFFGRAM_S3_BUCKET_NAME: Set this to your S3’s bucket name for static file storage.

  • diffgramSettings.ML__DIFFGRAM_S3_BUCKET_NAME: Set this to your S3’s bucket name for static file storage.

You can also tweak the allocated resources per service. We recommend you start with the given defaults and scale replicas as necessary as you start getting usage data.

4.3 Domain and TLS Settings

  • diffgramDomain: Set this to the domain you own, so the TLS manager can generate certificates for you.

  • useCertManager: Set this to true and make sure to follow Section 2.3 beforehand.

4.4 Installing the helm chart.

Once you have set the values in the values.yaml file, you can install the chart by running:

helm install diffgram ./diffgram

You may need to replace “./diffgram” with the path where you have the folder that contains the helm chart (The folder that contains the Chart.yaml file).


Now you should have a fully working instance of Diffgram on your very own EKS cluster!




Done!


If you have any questions or are interested in getting our on premise system, please contact us!


Guide purpose:

The purpose of this document is to guide you through the installation of Diffgram in a Kubernetes cluster using the AWS EKS service, RDS and Simple Storage Service (S3). After you finish following the guide, you should have a fully working instance of Diffgram on your private AWS cloud.


Why Private Deploy:

  1. Full control over your data and system. You control where and how Diffgram runs, the encryption keys and the data.

  2. You have complete control over system load and specifications. You can use your existing cloud setup.

  3. Run on any bare metal server or cloud of your choice. 


Step by Step Guide

From start to finish this guide can be completed in about one to two hours. If you already have an EKS cluster up and running, it can be completed much faster. The guide provides step-by-step suggestions, with every command line and UI action needed, plus context for why each action should be taken.

Table of Contents

Section 1: Creating a Relational Database (RDS) Instance

Section 2: Creating and Setting Up the EKS Kubernetes Cluster

Section 3: Create a VPC Peering Connection To Access the RDS Instance

Section 4: Install Diffgram Helm Chart

Section 1: Creating The RDS Instance

We'll create a Postgres RDS instance that is ready to be connected to EKS via a VPC Peering Connection.
