Cloud Blogs

Author – Madhusudhan Rao

Managing the OCI Cluster with Slurm Workload Manager & Grafana

Objective

There are a lot of things you can do to manage the cluster nodes, such as shutting down or starting one or all of the compute nodes, using the Slurm Workload Manager (https://slurm.schedmd.com/).

Assumption: You have already created the cluster by following our previous blog, Create a Highly Scalable Cluster in the cloud using Terraform on OCI.

SSH to Management Compute Node

Connect to the management compute node over SSH from your local machine:

D:\BM>ssh -i bm_ssh_key opc@public_ip
Last login: Sun Dec 16 05:45:13 2018 from public_ip
######################

Welcome to the cluster
In order to create users, run the script "./finish" and follow the instructions

######################
[opc@mgmt ~]$ ls
ansible-pull.log  bm_ssh_key.pub  finish  nodes.yaml       shapes.yaml             
test.slm users.yml.example bm_ssh_key config hosts oci_api_key.pem
slurm-ansible-playbook users.yml

Check if you can access one of the Compute Nodes from Management Node

[opc@mgmt ~]$ ssh -i bm_ssh_key opc@public_ip_compute_1
Last login: Sun Dec 16 04:16:13 2018 from public_ip_compute_1
[opc@compute001 ~]$ ls
ansible-pull.log  hosts
[opc@compute001 ~]$ exit
logout
Connection to public_ip_compute_1 closed.

Check if the Nodes are Running

[opc@mgmt ~]$ cluset --list-all
@compute
@state:drained
@role:mgmt
[opc@mgmt ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
compute*     up   infinite      4  drain compute[001-004]
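
To read the sinfo table at a glance, the node counts can be totalled per state. The following is a minimal sketch and not part of the cluster tooling: the function name summarize_states is hypothetical, and it assumes the default sinfo column layout shown above (NODES in column 4, STATE in column 5).

```shell
#!/bin/sh
# summarize_states: hypothetical helper (not shipped with Slurm or the cluster).
# Reads default-format sinfo output on stdin and prints "<state> <node count>",
# one state per line. Assumes NODES is column 4 and STATE is column 5,
# as in the transcript above.
summarize_states() {
  awk 'NR > 1 { count[$5] += $4 }
       END    { for (s in count) print s, count[s] }' | LC_ALL=C sort
}

# Usage on the management node:
#   sinfo | summarize_states
```

For the sinfo output above this prints a single line, drain 4, because all four compute nodes are in the drain state.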

Slurm Elastic Computing (Cloud Bursting)

Slurm is configured to use its elastic computing mode. This allows Slurm to automatically turn off any nodes which are not currently being used for running jobs and turn on any nodes which are needed for running jobs. This is particularly useful in the cloud as a node which has been shut down will not be charged for.

Reference: https://slurm.schedmd.com/elastic_computing.html
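
Slurm's elastic mode is driven by the power saving settings in slurm.conf (SuspendProgram, ResumeProgram, SuspendTime and related options, as described on the page linked above). A quick way to see what a given cluster uses is to filter the output of scontrol show config. This is only a sketch: the function name power_saving_settings is hypothetical, and the values reported on your cluster will differ.

```shell
#!/bin/sh
# power_saving_settings: hypothetical helper. Reads `scontrol show config`
# output on stdin and keeps only the lines whose key starts with Suspend or
# Resume, which hold Slurm's power saving (elastic computing) settings.
power_saving_settings() {
  grep -Ei '^(Suspend|Resume)'
}

# Usage on the management node:
#   scontrol show config | power_saving_settings
```

On this cluster one would expect the stopnode and startnode scripts used later in this post to show up as the suspend and resume programs, but that is an assumption to verify against the actual output.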

Slurm Commands

[opc@mgmt ~]$ smap

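Stopping a Compute Node from the Management Node

Run the stopnode script as the slurm user. It prints the OCI instance details as JSON, with lifecycle-state set to STOPPING: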
[opc@mgmt ~]$ sudo -u slurm /usr/local/bin/stopnode compute001
{
  "data": {
    "availability-domain": "zULs:US-ASHBURN-AD-1",
    "compartment-id": "ocid1.compartment.oc1..aaaaaaaay6kjvt2udXXXXjyy7rmx4eclxcbya",
    "defined-tags": {},
    "display-name": "compute001",
    "extended-metadata": {},
    "fault-domain": "FAULT-DOMAIN-1",
    "freeform-tags": {
      "cluster": "mycluster",
      "nodetype": "compute"
    },
    "id": "ocid1.instance.oc1.iadXXXXXbd7yez5qaltr2aeanya",
    "image-id": "ocid1.image.oc1.iad.aaaaaaaa2mnepqp7wn3XXXXXXli67z6mktdiq",
    "ipxe-script": null,
    "launch-mode": "NATIVE",
    "launch-options": {
      "boot-volume-type": "ISCSI",
      "firmware": "UEFI_64",
      "is-pv-encryption-in-transit-enabled": true,
      "network-type": "VFIO",
      "remote-data-volume-type": "PARAVIRTUALIZED"
    },
    "lifecycle-state": "STOPPING",
    "metadata": {
      "ssh_authorized_keys": "ssh-rsa AAAAB3NzaC1ycXXXXXXNy4P\n",
      "user_data": "IyEvYmluL2Jhc2gK"
    },
    "region": "iad",
    "shape": "VM.Standard1.2",
    "source-details": {
      "boot-volume-size-in-gbs": null,
      "image-id": "ocid1.image.oc1.iad.aaaaaaaa2mnepXXXXXXwf7uc246tcltg4li67z6mktdiq",
      "kms-key-id": null,
      "source-type": "image"
    },
    "time-created": "2018-12-14T17:16:14.298000+00:00",
    "time-maintenance-reboot-due": null
  },
  "etag": "5aa96088b4555d1820ea42bXXXX28266c9e3b711f0b487ef065b70"
}

[opc@mgmt ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
compute*     up   infinite      1 drain* compute001
compute*     up   infinite      3  drain compute[002-004]
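
The JSON printed by stopnode above is long, and the field that matters here is lifecycle-state. The sketch below pulls it out with sed. It is a plain text match against the pretty-printed layout shown above rather than a real JSON parser, and the function name lifecycle_state is hypothetical.

```shell
#!/bin/sh
# lifecycle_state: hypothetical helper. Reads the pretty-printed JSON that
# stopnode or startnode prints and outputs the value of "lifecycle-state".
# Relies on the key and its value sitting on one line, as in the output above.
lifecycle_state() {
  sed -n 's/.*"lifecycle-state": *"\([^"]*\)".*/\1/p'
}

# Usage on the management node:
#   sudo -u slurm /usr/local/bin/stopnode compute001 | lifecycle_state
```

In the sinfo output above, the trailing * on drain* is Slurm's marker for a node that is not responding, which is what you would expect once compute001 has been stopped.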
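
Starting a Compute Node from the Management Node

Run the startnode script as the slurm user. It prints the same instance details as JSON, now with lifecycle-state set to STARTING: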
[opc@mgmt ~]$ sudo -u slurm /usr/local/bin/startnode compute001
{
  "data": {
    "availability-domain": "zULs:US-ASHBURN-AD-1",
    "compartment-id": "ocid1.compartment.oc1..aaaaaaaay6kjvtXXXXXXXXmx4eclxcbya",
    "defined-tags": {},
    "display-name": "compute001",
    "extended-metadata": {},
    "fault-domain": "FAULT-DOMAIN-1",
    "freeform-tags": {
      "cluster": "mycluster",
      "nodetype": "compute"
    },
    "id": "ocid1.instance.oc1.iad.abuwcljtdymfpnoppXXXXX7yez5qaltr2aeanya",
    "image-id": "ocid1.image.oc1.iad.aaaaaXXXXX246tcltg4li67z6mktdiq",
    "ipxe-script": null,
    "launch-mode": "NATIVE",
    "launch-options": {
      "boot-volume-type": "ISCSI",
      "firmware": "UEFI_64",
      "is-pv-encryption-in-transit-enabled": true,
      "network-type": "VFIO",
      "remote-data-volume-type": "PARAVIRTUALIZED"
    },
    "lifecycle-state": "STARTING",
    "metadata": {
      "ssh_authorized_keys": "ssh-rsa AAAAB3NzaC1yc2EXXX6/Ny4P\n",
      "user_data": "IyEvYmluL2Jhc2gK"
    },
    "region": "iad",
    "shape": "VM.Standard1.2",
    "source-details": {
      "boot-volume-size-in-gbs": null,
      "image-id": "ocid1.image.oc1.iad.aaaaaaaa2mnXXXXg4li67z6mktdiq",
      "kms-key-id": null,
      "source-type": "image"
    },
    "time-created": "2018-12-14T17:16:14.298000+00:00",
    "time-maintenance-reboot-due": null
  },
  "etag": "613d29962d14f0e98eXXXX4239bbcf139475b1e7"
}

[opc@mgmt ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
compute*     up   infinite      4  drain compute[001-004]
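
The objective mentions shutting down or starting all compute nodes, while the commands above handle one node at a time. One way to cover a range is to loop over the node names. This is a sketch under assumptions: the function name node_commands is hypothetical, the node names follow the compute001 to compute004 pattern seen in sinfo, and the function only prints the commands (a dry run) so they can be reviewed before anything is run.

```shell
#!/bin/sh
# node_commands: hypothetical dry-run helper. Prints one stopnode or startnode
# command per node, for nodes named compute001, compute002, and so on.
#   $1 = action: "stop" or "start"
#   $2 = first node number, $3 = last node number (plain integers, e.g. 1 4)
node_commands() {
  action=$1
  case $action in
    stop|start) ;;
    *) echo "usage: node_commands stop|start FIRST LAST" >&2; return 1 ;;
  esac
  i=$2
  while [ "$i" -le "$3" ]; do
    # %03d zero-pads the counter to match names such as compute001
    printf 'sudo -u slurm /usr/local/bin/%snode compute%03d\n' "$action" "$i"
    i=$((i + 1))
  done
}

# Usage: review the printed commands first, then pipe them to sh to run them.
#   node_commands stop 1 4
#   node_commands stop 1 4 | sh
```

On a machine with Slurm installed, scontrol show hostnames 'compute[001-004]' expands a node list expression into individual names and could replace the counter.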

Using Grafana to Manage Your Cluster

Grafana is an open platform for analytics and monitoring, and it serves as the monitoring tool for your cluster.

Log in as the admin user at port 3000 on the public IP of the management compute node (public_ip:3000).

Effectively, Grafana shows you whether MySQL is running or stopped on each of the compute nodes.
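
Before opening the browser, it can help to confirm that Grafana is answering on port 3000. Grafana exposes a health endpoint at /api/health, and the sketch below builds its URL. The function name grafana_health_url is hypothetical, and the example assumes plain HTTP on port 3000, which may not match how your instance is set up.

```shell
#!/bin/sh
# grafana_health_url: hypothetical helper. Builds the URL of Grafana's
# /api/health endpoint for a host, assuming plain HTTP on port 3000.
#   $1 = public IP or hostname of the management compute node
grafana_health_url() {
  printf 'http://%s:3000/api/health\n' "$1"
}

# Usage from your workstation (public_ip is a placeholder, as in the post):
#   curl -s "$(grafana_health_url public_ip)"
```

If the request times out, check that port 3000 is open in the security list of the subnet and in the firewall on the management node.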

Blog Author : Madhusudhan Rao

Previous Blog Link : Create a Highly Scalable Cluster in the cloud using Terraform on OCI

Reference Links :

  • https://cluster-in-the-cloud.readthedocs.io/en/latest/running.html
  • https://slurm.schedmd.com/overview.html
  • https://grafana.com/

 
