Create a Highly Scalable Cluster in the cloud using Terraform on OCI

Creating a Cluster in the Cloud
Step 1 : Setting Up a Terraform VM on OCI

Assumptions

  • All required SSH private and public keys and the PEM key have been generated.
  • The user has full administrative access to the cloud environment.
  • An OCI compartment has already been created.

Log in to the Cloud My Services dashboard and select Compute.

From the OCI Compute web console, select Instances.

Then click the Create Instance button.

 

Within a few seconds the instance should be up and running.

SSH to the VM and Install Terraform
ubuntu@TerraformVM:~$ sudo apt-get update
Get:1 http://security.ubuntu.com/ubuntu bionic-security InRelease [83.2 kB]
Hit:2 http://iad-ad-3.clouds.archive.ubuntu.com/ubuntu bionic InRelease
Get:3 http://iad-ad-3.clouds.archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
...
Fetched 26.4 MB in 6s (4631 kB/s)                           
Reading package lists... Done
ubuntu@TerraformVM:~$ sudo apt-get install unzip
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  grub-pc-bin
Use 'sudo apt autoremove' to remove it.
Suggested packages:
  zip
The following NEW packages will be installed:
  unzip
0 upgraded, 1 newly installed, 0 to remove and 10 not upgraded.
Need to get 167 kB of archives.
After this operation, 558 kB of additional disk space will be used.
Get:1 http://iad-ad-3.clouds.archive.ubuntu.com/ubuntu bionic/main amd64 unzip amd64 6.0-21ubuntu1 [167 kB]
Fetched 167 kB in 0s (371 kB/s)
Selecting previously unselected package unzip.
(Reading database ... 67173 files and directories currently installed.)
Preparing to unpack .../unzip_6.0-21ubuntu1_amd64.deb ...
Unpacking unzip (6.0-21ubuntu1) ...
Processing triggers for mime-support (3.60ubuntu1) ...
Setting up unzip (6.0-21ubuntu1) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
ubuntu@TerraformVM:~$ wget https://releases.hashicorp.com/terraform/0.11.10/terraform_0.11.10_linux_amd64.zip
--2018-12-14 06:16:42--  https://releases.hashicorp.com/terraform/0.11.10/terraform_0.11.10_linux_amd64.zip
Resolving releases.hashicorp.com (releases.hashicorp.com)... 151.101.201.183, 2a04:4e42:2f::439
Connecting to releases.hashicorp.com (releases.hashicorp.com)|151.101.201.183|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20940986 (20M) [application/zip]
Saving to: ‘terraform_0.11.10_linux_amd64.zip’

terraform_0.11.10_linux_amd6 100%[===========================================>]  19.97M  --.-KB/s    in 0.1s    

2018-12-14 06:16:42 (178 MB/s) - ‘terraform_0.11.10_linux_amd64.zip’ saved [20940986/20940986]

ubuntu@TerraformVM:~$ unzip terraform_0.11.10_linux_amd64.zip
Archive:  terraform_0.11.10_linux_amd64.zip
  inflating: terraform               
ubuntu@TerraformVM:~$ sudo mv terraform /usr/local/bin/
ubuntu@TerraformVM:~$ terraform --version 
Terraform v0.11.10
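
The manual steps above can be wrapped in two small shell functions so that the version number appears in only one place. This is a minimal sketch, not part of the original setup: the function names terraform_zip_url and install_terraform are hypothetical, and the URL pattern is simply the one used in the wget command above.

```shell
# Build the release URL for a given Terraform version (Linux amd64 build).
terraform_zip_url() {
  version="$1"
  echo "https://releases.hashicorp.com/terraform/${version}/terraform_${version}_linux_amd64.zip"
}

# Download, unpack, and install that version; mirrors the manual steps above.
install_terraform() {
  version="$1"
  wget "$(terraform_zip_url "$version")" &&
    unzip -o "terraform_${version}_linux_amd64.zip" &&
    sudo mv terraform /usr/local/bin/
}

# Example: install_terraform 0.11.10
```

Nothing is downloaded until install_terraform is called, so the functions can be pasted into a shell session safely.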
Setting Up the OCI Terraform Cluster from the Git Repo

Git Source : https://github.com/ACRC/oci-cluster-terraform

Clone the Git Repo and Initialize Terraform

SSH to the machine where Terraform was installed and run the following:

ubuntu@TerraformVM:~$ git clone https://github.com/ACRC/oci-cluster-terraform.git
Cloning into 'oci-cluster-terraform'...
remote: Enumerating objects: 14, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 132 (delta 3), reused 3 (delta 1), pack-reused 118
Receiving objects: 100% (132/132), 24.90 KiB | 12.45 MiB/s, done.
Resolving deltas: 100% (70/70), done.
ubuntu@TerraformVM:~$ ls
oci-cluster-terraform  terraform_0.11.10_linux_amd64.zip
ubuntu@TerraformVM:~$ cd oci-cluster-terraform/
ubuntu@TerraformVM:~/oci-cluster-terraform$ ls
LICENSE         export.tf        network.tf                userdata
README.rst      file_system.tf   output.tf                 variables.tf
compute.tf      files            provider.tf
datasources.tf  mount_target.tf  terraform.tfvars.example
ubuntu@TerraformVM:~/oci-cluster-terraform$ terraform init

Initializing provider plugins...
- Checking for available provider plugins on https://releases.hashicorp.com...
- Downloading plugin for provider "tls" (1.2.0)...
- Downloading plugin for provider "oci" (3.10.0)...
- Downloading plugin for provider "null" (1.0.0)...

The following providers do not have any version constraints in configuration,
so the latest version was installed.

To prevent automatic upgrades to new major versions that may contain breaking
changes, it is recommended to add version = "..." constraints to the
corresponding provider blocks in configuration, with the constraint strings
suggested below.

* provider.null: version = "~> 1.0"
* provider.tls: version = "~> 1.2"

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
ubuntu@TerraformVM:~/oci-cluster-terraform$ terraform version
Terraform v0.11.10
+ provider.null v1.0.0
+ provider.oci v3.10.0
+ provider.tls v1.2.0

ubuntu@TerraformVM:~/oci-cluster-terraform$ ls
LICENSE     compute.tf      export.tf       files            network.tf  provider.tf               userdata
README.rst  datasources.tf  file_system.tf  mount_target.tf  output.tf   terraform.tfvars.example  variables.tf
ubuntu@TerraformVM:~/oci-cluster-terraform$ cp terraform.tfvars.example terraform.tfvars
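
The terraform init output above recommends pinning the null and tls providers. One way to act on that advice is sketched below. The function name write_provider_pins and the file name versions.tf are assumptions, and the sketch presumes the repo does not already declare provider blocks for null and tls; if it does, add the version lines to those blocks instead.

```shell
# Write version constraints for the null and tls providers to the given file.
# The constraint strings are the ones suggested by terraform init above.
write_provider_pins() {
  cat > "$1" <<'EOF'
provider "null" {
  version = "~> 1.0"
}

provider "tls" {
  version = "~> 1.2"
}
EOF
}

# Example (hypothetical file name): write_provider_pins versions.tf
```

After writing the file, rerun terraform init so the constraints are picked up.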

Understanding Terraform OCI Variables

API Key based authentication

Calls to OCI using API key authentication require that you provide the following credentials:

tenancy_ocid – The global identifier for your account, always shown on the bottom of the web console.
user_ocid – The identifier of the user account you will be using for Terraform. For information on setting the correct policies for your user see Managing Users.
private_key_path – The path to the private key stored on your computer. The public key portion must be added to the user account above in the API Keys section of the web console. For details on how to create and configure keys see Required Keys and OCIDs.
fingerprint – The fingerprint of the public key added in the above user’s API Keys section of the web console.
region – The region to target with this provider configuration.

We need to fill in all of these variables from different sources. Let us look at each of them in detail:

  • TF_VAR_tenancy_ocid
  • TF_VAR_region
  • TF_VAR_user_ocid
  • TF_VAR_fingerprint
  • TF_VAR_compartment_ocid
  • TF_VAR_private_key_path ( PEM )
  • TF_VAR_ssh_public_key_ocid ( Required to SSH to VMs created with Terraform )
  • TF_VAR_ssh_private_key_ocid ( Required to SSH to VMs created with Terraform )
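
Terraform reads any exported environment variable named TF_VAR_<name> as the value of the input variable <name>, so the list above can be checked before running a plan. The sketch below only reports which of the listed variables are still unset. The function name missing_tf_vars is hypothetical, and whether the repo declares variables with exactly these names is an assumption taken from the list.

```shell
# Variable names copied from the list above.
REQUIRED_TF_VARS="TF_VAR_tenancy_ocid TF_VAR_region TF_VAR_user_ocid TF_VAR_fingerprint TF_VAR_compartment_ocid TF_VAR_private_key_path TF_VAR_ssh_public_key_ocid TF_VAR_ssh_private_key_ocid"

# Print the name of every required variable that is unset, empty, or not exported.
missing_tf_vars() {
  for name in $REQUIRED_TF_VARS; do
    if [ -z "$(printenv "$name")" ]; then
      echo "$name"
    fi
  done
}

# Example, with a placeholder value:
#   export TF_VAR_region="<your region identifier>"
#   missing_tf_vars
```

An empty result means every variable in the list has a value in the current shell.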

1 Tenancy OCID & Home Region

Copy the tenancy OCID and home region into a text editor; we will need them later.

2 User OCID & FingerPrint

Add a user and add this user to the Administrators group. Then click the username to get the user's OCID.

Add the PEM public key by pasting the one generated earlier. This produces the fingerprint, which should also be copied into the text editor.

3 Compartment ID

4 Private Key Path

private_key_path = "/some directory/keyName.pem"

Check this link for Key Generations
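
For readers who have not generated the API signing key yet, the steps can be sketched with openssl as follows. This is an illustration under assumptions, not the exact procedure used for this post: the function names and file names are hypothetical, and the fingerprint printed locally is expected to match the one the web console shows after the public key is uploaded.

```shell
# Generate a 2048-bit RSA private key and its public key in PEM format.
generate_api_key() {
  private_key="$1"
  public_key="$2"
  openssl genrsa -out "$private_key" 2048 2>/dev/null &&
    chmod 600 "$private_key" &&
    openssl rsa -pubout -in "$private_key" -out "$public_key" 2>/dev/null
}

# Print the MD5 fingerprint of the public key as colon-separated hex pairs.
api_key_fingerprint() {
  openssl rsa -pubout -outform DER -in "$1" 2>/dev/null |
    openssl md5 -c |
    sed 's/^.*= *//'
}

# Example (hypothetical file names):
#   generate_api_key keyName.pem keyName_public.pem
#   api_key_fingerprint keyName.pem
```

The private key path is what goes into private_key_path, and the contents of the public key file are what gets pasted into the API Keys section of the console.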

5 Getting the Image ID

Each image provided by Oracle has a unique OCID in each region; see the Oracle image list at https://docs.cloud.oracle.com/iaas/images/.

For example, the image IDs for Ubuntu 18 look something like this:

Editing the terraform.tfvars File to Add All the Variables

By now we have gathered all the IDs required to create a cluster on OCI: image ID, user ID, region, tenancy ID, fingerprint, and so on. With these, Terraform should be able to connect to the environment and run the script.

We will update the file with the variables that we need.

My updated terraform.tfvars file looks like this:
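
Before running a plan it can help to confirm that none of the credentials was left out of the file. The sketch below checks only the five credential names listed in the API key section. The function name missing_tfvars_keys is hypothetical, and the repo's terraform.tfvars.example may use additional or different names, so treat the key list as an assumption.

```shell
# Print each credential name that has no assignment in the given tfvars file.
missing_tfvars_keys() {
  file="$1"
  for key in tenancy_ocid user_ocid private_key_path fingerprint region; do
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
      echo "$key"
    fi
  done
}

# Example: missing_tfvars_keys terraform.tfvars
```

The check looks only for an assignment line per key; it does not validate the values themselves.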

Running the Terraform Plan
ubuntu@TerraformVM:~$ cd oci-cluster-terraform/
ubuntu@TerraformVM:~/oci-cluster-terraform$ terraform validate
ubuntu@TerraformVM:~/oci-cluster-terraform$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
...
data.tls_public_key.oci_public_key: Refreshing state...
oci_core_virtual_network.ClusterVCN: Refreshing state... (ID: 
.....
null_resource.copy_in_setup_data_mgmt: Refreshing state... (ID: 8183904091393847053)
null_resource.copy_in_setup_data_compute[3]: Refreshing state... (ID: 8235046557467448990)
null_resource.copy_in_setup_data_compute[1]: Refreshing state... (ID: 936366522781033501)
null_resource.copy_in_setup_data_compute[0]: Refreshing state... (ID: 4747213338281297594)
null_resource.copy_in_setup_data_compute[2]: Refreshing state... (ID: 6652985447051023762)
------------------------------------------------------------------------
No changes. Infrastructure is up-to-date.
This means that Terraform did not detect any differences between your
configuration and real physical resources that exist. As a result, no
actions need to be performed.
ubuntu@TerraformVM:~/oci-cluster-terraform$ terraform apply
Output of Terraform Apply to Create the Cluster

Output on the OCI Web Console

The output shows that we created 5 Oracle Linux machines in 2.50 seconds.

Performance Benchmark Using My Stopwatch

I checked with my stopwatch how long it actually took to create these 5 Oracle Linux 7.6 VMs in the Ashburn region on OCI. It took exactly 2.70 seconds from my home network on a 150 Mbps connection. During this test I lost 7 seconds typing yes at the command prompt. Since the provisioning happens in the cloud, the result should not really depend on your local network speed.
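
The time lost typing at the prompt can be taken out of the measurement by letting the shell do the timing and skipping the confirmation. The sketch below is one way to do that, not the method used for the numbers above. The function name elapsed_seconds is hypothetical, and it reports whole seconds only.

```shell
# Run a command and print how many whole seconds it took.
# The command's own output is sent to stderr so that stdout holds only the number.
elapsed_seconds() {
  start=$(date +%s)
  "$@" >&2
  end=$(date +%s)
  echo $((end - start))
}

# Example: elapsed_seconds terraform apply -auto-approve
```

The -auto-approve flag makes terraform apply proceed without asking for yes, so use it only when the plan has already been reviewed.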

Destroying all that was created

terraform destroy

What Next >> Managing the OCI Cluster with Slurm Workload Manager & Grafana

Blog Author Madhusudhan Rao

Appendix : Important

Terraform can be installed on a local laptop or desktop within your home network or corporate intranet, but it may then have trouble connecting to the cloud environment and fail with a timeout error. It is very important that Terraform runs in the same cloud environment, in either the same tenancy or a different tenancy, for it to work effectively.

Expected Error When Running Terraform on On-Premises Networks
terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
data.tls_public_key.oci_public_key: Refreshing state...
data.oci_identity_availability_domains.ADs: Refreshing state...
Error: Error refreshing state: 1 error(s) occurred:
* data.oci_identity_availability_domains.ADs: 1 error(s) occurred:
* data.oci_identity_availability_domains.ADs: data.oci_identity_availability_domains.ADs: Get https://identity.us-ashburn-1.oraclecloud.com/20160918/availabilityDomains?compartmentId=ocid1.tenancy.oc1..aaaaaaaXXXXXX3oaafb6thjew2qXXXX: dial tcp: i/o timeout
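
A quick way to tell whether a machine is likely to hit this timeout is to test the identity endpoint directly before running Terraform. The sketch below is an assumption-laden check, not an official diagnostic: the function names are hypothetical, and the host name pattern is generalized from the single us-ashburn-1 URL in the error above.

```shell
# Build the identity API host name for a region, following the URL in the error above.
oci_identity_host() {
  echo "identity.$1.oraclecloud.com"
}

# Succeed if the endpoint returns any HTTP response within the time limits.
can_reach_oci() {
  curl --silent --output /dev/null --connect-timeout 10 --max-time 20 \
    "https://$(oci_identity_host "$1")/"
}

# Example:
#   if can_reach_oci us-ashburn-1; then echo reachable; else echo blocked; fi
```

If the check fails from an on-premises network but succeeds from a VM inside OCI, the problem is network reachability rather than the Terraform configuration.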

Reference Links

  1. https://cluster-in-the-cloud.readthedocs.io/en/latest/infrastructure.html
  2. https://docs.cloud.oracle.com/iaas/images/
  3. https://help.github.com/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent/