Upgrade to 3.4.1 and 7.4.1
1. Upgrade From 7.3/7.4 to 7.4.1
From 7.3:
- RDAF Infra Upgrade: from 1.0.2 to 1.0.3 and 1.0.3.1 (haproxy)
- RDAF Platform: from 3.3 to 3.4.1
- AIOps (OIA) Application: from 7.3 to 7.4.1
- RDAF Deployment rdafk8s CLI: from 1.1.10 to 1.2.1
- RDAF Client rdac CLI: from 3.3 to 3.4.1

From 7.4:
- RDAF Infra Upgrade: from 1.0.3 to 1.0.3.1 (haproxy)
- RDAF Platform: from 3.4 to 3.4.1
- OIA (AIOps) Application: from 7.4 to 7.4.1
- RDAF Deployment rdaf CLI: from 1.2.0 to 1.2.1
- RDAF Client rdac CLI: from 3.4 to 3.4.1
1.1. Prerequisites
Before proceeding with this upgrade, please verify that the below prerequisites are met.
- RDAF Deployment CLI version: 1.1.10
- Infra Services tag: 1.0.2, 1.0.2.1 (nats, haproxy)
- Platform Services and RDA Worker tag: 3.3
- OIA Application Services tag: 7.3, 7.3.0.1 (event_consumer), 7.3.2 (alert-ingester)
- CloudFabrix recommends taking VMware VM snapshots of the VMs where RDA Fabric infra/platform/applications are deployed
- Delete the alert-model dataset from the Datasets reports on the UI before starting the upgrade
- Check that all MariaDB nodes are in sync on an HA setup using the below commands before starting the upgrade
Danger
Upgrading both kafka and mariadb infra services require a downtime to the RDAF platform and application services.
Please proceed to the below steps only after scheduled downtime is approved.
Tip
Please run the below commands on the VM host where RDAF deployment CLI was installed and rdafk8s setup command was run. The mariadb configuration is read from /opt/rdaf/rdaf.cfg file.
MARIADB_HOST=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep datadir | awk '{print $3}' | cut -f1 -d'/'`
MARIADB_USER=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep user | awk '{print $3}' | base64 -d`
MARIADB_PASSWORD=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep password | awk '{print $3}' | base64 -d`
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -e "show status like 'wsrep_local_state_comment';"
Please verify that the mariadb cluster state is in Synced state.
+---------------------------+--------+
| Variable_name | Value |
+---------------------------+--------+
| wsrep_local_state_comment | Synced |
+---------------------------+--------+
Please run the below command and verify that the mariadb cluster size is 3.
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'";
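The credential-extraction one-liners above can be illustrated on a fabricated config snippet. The values below are made up for demonstration only; on a real system the pipeline reads /opt/rdaf/rdaf.cfg.

```shell
# Fabricated sample of the mariadb section of /opt/rdaf/rdaf.cfg; the user and
# password values are base64-encoded, which is why the real pipeline pipes
# them through `base64 -d`.
sample_cfg='mariadb:
user = Y2Z4dXNlcg==
password = Y2Z4cGFzcw==
datadir = 192.168.133.97/mariadb-data'

# Same pipeline as above, run against the sample instead of the real file.
MARIADB_HOST=$(printf '%s\n' "$sample_cfg" | grep -A3 mariadb | grep datadir | awk '{print $3}' | cut -f1 -d'/')
MARIADB_USER=$(printf '%s\n' "$sample_cfg" | grep -A3 mariadb | grep user | awk '{print $3}' | base64 -d)
echo "$MARIADB_HOST $MARIADB_USER"   # -> 192.168.133.97 cfxuser
```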
- RDAF Deployment CLI version: 1.2.0
- Infra Services tag: 1.0.3
- Platform Services and RDA Worker tag: 3.4
- OIA Application Services tag: 7.4
- CloudFabrix recommends taking VMware VM snapshots of the VMs where RDA Fabric infra/platform/applications are deployed
Useful Information
Danger
In this release, all of the RDAF Infrastructure services are upgraded, so it is mandatory to take a VM-level snapshot before proceeding with the upgrade process.
Warning
Make sure all of the above pre-requisites are met before proceeding with the upgrade process.
Warning
Kubernetes: Though Kubernetes-based RDA Fabric deployments support zero-downtime upgrades, it is recommended to schedule a maintenance window for upgrading RDAF Platform and AIOps services to the newer version.
Important
Please make sure full backup of the RDAF platform system is completed before performing the upgrade.
Kubernetes: Please run the below backup command to take the backup of application data.
Note: Please make sure this backup-dir is mounted across all infra and CLI VMs.
Run the below command on the RDAF Management system and make sure the Kubernetes PODs are NOT in a restarting state (applicable only to Kubernetes environments).
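One way to spot restarting pods (assuming kubectl access; the rda-fabric namespace is the one used elsewhere in this guide) is to inspect the RESTARTS column. The filter is shown here against fabricated sample output so the logic is visible; on a live host, pipe the real `kubectl get pods -n rda-fabric --no-headers` output into the same awk filter.

```shell
# Fabricated sample of `kubectl get pods -n rda-fabric --no-headers` output.
sample='rda-api-server-7d9f 1/1 Running 0 21h
rda-scheduler-5b2c 1/1 Running 6 21h'

# Print any pod whose RESTARTS column (field 4) is non-zero; empty output
# means no pod is in a restart loop and it is safe to proceed.
printf '%s\n' "$sample" | awk '$4 > 0 {print $1, "restarts:", $4}'
```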
Danger
In this release, all of the RDAF Infrastructure services are upgraded, so it is mandatory to take a VM-level snapshot before proceeding with the upgrade process.
Warning
Make sure all of the above pre-requisites are met before proceeding with the upgrade process.
Warning
Non-Kubernetes: Upgrading RDAF Platform and AIOps application services is a disruptive operation. Please schedule a maintenance window before upgrading RDAF Platform and AIOps services to the newer version.
Important
Please make sure full backup of the RDAF platform system is completed before performing the upgrade.
Non-Kubernetes: Please run the below backup command to take the backup of application data.
Note: Please make sure this backup-dir is mounted across all infra and CLI VMs.
- Verify that the RDAF deployment rdaf CLI version is 1.2.0 or the rdafk8s CLI version is 1.1.10 on the VM where the CLI was installed for the docker on-prem registry and for managing Kubernetes or non-Kubernetes deployments.
- On-premise docker registry service version is 1.0.2
- RDAF Infrastructure services version is 1.0.2 (rda-nats service version is 1.0.2.1 and rda-minio service version is RELEASE.2022-11-11T03-44-20Z)
Run the below command to get RDAF Infra services details
- RDAF Platform services version is 3.3
Run the below command to get RDAF Platform services details
- RDAF OIA Application services version is 7.3/7.3.0.1/7.3.2
Run the below command to get RDAF App services details
Run the below command to get RDAF Infra services details
- RDAF Platform services version is 3.4
Run the below command to get RDAF Platform services details
- RDAF OIA Application services version is 7.4
Run the below command to get RDAF App services details
RDAF Deployment CLI Upgrade:
Please follow the below given steps.
Note
Upgrade RDAF Deployment CLI on both on-premise docker registry VM and RDAF Platform's management VM if provisioned separately.
Log in to the VM where the rdaf & rdafk8s deployment CLI was installed for the docker on-prem registry and for managing Kubernetes or non-Kubernetes deployments.
- Download the RDAF Deployment CLI's newer version 1.2.1 bundle.
- Upgrade the rdaf & rdafk8s CLI to version 1.2.1.
- Verify the installed rdaf & rdafk8s CLI version is upgraded to 1.2.1.
- Download the RDAF Deployment CLI's newer version 1.2.1 bundle and copy it to the RDAF management VM on which the rdaf & rdafk8s deployment CLI was installed.
- Extract the rdaf CLI software bundle contents.
- Change the directory to the extracted directory.
- Upgrade the rdaf CLI to version 1.2.1.
- Verify the installed rdaf CLI version.
- Extract the rdaf CLI software bundle contents.
- Change the directory to the extracted directory.
- Upgrade the rdaf CLI to version 1.2.1.
- Verify the installed rdaf CLI version.
- Download the RDAF Deployment CLI's newer version 1.2.1 bundle.
- Upgrade the rdaf CLI to version 1.2.1.
- Verify the installed rdaf CLI version is upgraded to 1.2.1.
- Download the RDAF Deployment CLI's newer version 1.2.1 bundle and copy it to the RDAF management VM on which the rdaf & rdafk8s deployment CLI was installed.
- Extract the rdaf CLI software bundle contents.
- Change the directory to the extracted directory.
- Upgrade the rdaf CLI to version 1.2.1.
- Verify the installed rdaf CLI version.
- Extract the rdaf CLI software bundle contents.
- Change the directory to the extracted directory.
- Upgrade the rdaf CLI to version 1.2.1.
- Verify the installed rdaf CLI version.
1.2. Download the new Docker Images
Download the new docker image tags for RDAF Platform and OIA Application services and wait until all of the images are downloaded.
Run the below command to upgrade the registry
To fetch the registry, please use the below command.
Run the below command to verify that the above-mentioned tags are downloaded for all of the RDAF Platform and OIA Application services.
Please make sure 1.0.3.1 image tag is downloaded for the below RDAF Infra service.
- rda-platform-haproxy
Please make sure 1.0.3 image tag is downloaded for the below RDAF Infra service.
- rda-platform-haproxy
- rda-platform-kafka
- rda-platform-zookeeper
- rda-platform-mariadb
- rda-platform-opensearch
- rda-platform-nats
- rda-platform-busybox
- rda-platform-nats-box
- rda-platform-nats-boot-config
- rda-platform-nats-server-config-reloader
- rda-platform-prometheus-nats-exporter
- rda-platform-redis
- rda-platform-redis-sentinel
- rda-platform-arangodb-starter
- rda-platform-kube-arangodb
- rda-platform-arangodb
- rda-platform-kubectl
- rda-platform-logstash
- rda-platform-fluent-bit
Please make sure RELEASE.2023-09-30T07-02-29Z image tag is downloaded for the below RDAF Infra service.
- minio
Please make sure 3.4.1 image tag is downloaded for the below RDAF Platform services.
- rda-client-api-server
- rda-registry
- rda-scheduler
- rda-collector
- rda-identity
- rda-fsm
- rda-stack-mgr
- rda-access-manager
- rda-resource-manager
- rda-user-preferences
- onprem-portal
- onprem-portal-nginx
- rda-worker-all
- onprem-portal-dbinit
- cfxdx-nb-nginx-all
- rda-event-gateway
- rda-chat-helper
- rdac
- rdac-full
- cfxcollector
Please make sure 3.4.1.2 image tag is downloaded for the below RDAF Platform services.
- rda-client-api-server
- rda-scheduler
- onprem-portal
- onprem-portal-nginx
Please make sure 7.4.1 image tag is downloaded for the below RDAF OIA Application services.
- rda-app-controller
- rda-alert-processor
- rda-file-browser
- rda-smtp-server
- rda-ingestion-tracker
- rda-reports-registry
- rda-ml-config
- rda-event-consumer
- rda-webhook-server
- rda-irm-service
- rda-alert-ingester
- rda-collaboration
- rda-notification-service
- rda-configuration-service
- rda-alert-processor-companion
Please make sure 7.4.1.2 image tag is downloaded for the below RDAF OIA Application services.
- rda-event-consumer
- rda-webhook-server
- rda-irm-service
- rda-smtp-server
Downloaded Docker images are stored under the below path.
/opt/rdaf/data/docker/registry/v2 or /opt/rdaf-registry/data/docker/registry/v2
Run the below command to check the filesystem's disk usage on which docker images are stored.
Optionally, if required, older image tags that are no longer used can be deleted to free up disk space using the below command.
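As a hedged sketch of the disk-usage check: the paths are the two registry locations stated above, and `df -h` is standard; adapt as needed for your host.

```shell
# Report free space on the filesystem holding the on-prem registry images.
# Fall back to the alternate location mentioned above if the primary one
# does not exist on this host.
REGISTRY_DIR=/opt/rdaf/data/docker/registry/v2
[ -d "$REGISTRY_DIR" ] || REGISTRY_DIR=/opt/rdaf-registry/data/docker/registry/v2
df -h "$REGISTRY_DIR" 2>/dev/null || echo "registry path not found on this host: $REGISTRY_DIR"
```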
1.3. Upgrade Steps
1.3.1 Upgrade RDAF Infra Services
1.3.1.1 Update RDAF Infra/Platform Services Configuration
Please download the below python script (rdaf_upgrade_1110_121.py).
Warning
Please verify the python binary version using which RDAF deployment CLI was installed.
ls -l /home/rdauser/.local/lib --> this will show the python version as a directory name (e.g. python3.7 or python3.8)
python --version --> the major.minor version (e.g. Python 3.7.4 or 3.8.10) should match the output from the above.
If it doesn't match, please run the below commands.
sudo mv /usr/bin/python /usr/bin/python_backup
sudo ln -s /usr/bin/python3.7 /usr/bin/python --> Please choose the python binary version with which the RDAF deployment CLI was installed. In this example, the python3.7 binary was used.
Note: If the python version is neither 3.7.x nor 3.8.x, please stop the upgrade and contact CloudFabrix support for additional assistance.
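The version-match check described above can be sketched as follows. The two input strings here are fabricated stand-ins for the real command outputs, so the comparison logic is visible and repeatable.

```shell
# Stand-in values; on a real host, populate these from:
#   lib_dir=$(ls /home/rdauser/.local/lib | grep -o 'python3\.[0-9]*' | head -1)
#   ver_out=$(python --version 2>&1)
lib_dir="python3.7"
ver_out="Python 3.7.4"

lib_ver=${lib_dir#python}                                                # -> 3.7
cur_ver=$(printf '%s' "$ver_out" | grep -oE '[0-9]+\.[0-9]+' | head -1)  # -> 3.7
if [ "$lib_ver" = "$cur_ver" ]; then
  echo "python versions match"
else
  echo "MISMATCH: $lib_ver vs $cur_ver"
fi
```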
Please run the downloaded python upgrade script rdaf_upgrade_1110_121.py as shown below.
The below step will generate *values.yaml.latest files for all RDAF Infrastructure services under /opt/rdaf/deployment-scripts directory.
Please run the below commands to take a backup of the values.yaml files of the Infrastructure and Application services.
cp /opt/rdaf/deployment-scripts/values.yaml /opt/rdaf/deployment-scripts/values.yaml.backup
cp /opt/rdaf/deployment-scripts/nats-values.yaml /opt/rdaf/deployment-scripts/nats-values.yaml.backup
cp /opt/rdaf/deployment-scripts/minio-values.yaml /opt/rdaf/deployment-scripts/minio-values.yaml.backup
cp /opt/rdaf/deployment-scripts/mariadb-values.yaml /opt/rdaf/deployment-scripts/mariadb-values.yaml.backup
cp /opt/rdaf/deployment-scripts/opensearch-values.yaml /opt/rdaf/deployment-scripts/opensearch-values.yaml.backup
cp /opt/rdaf/deployment-scripts/kafka-values.yaml /opt/rdaf/deployment-scripts/kafka-values.yaml.backup
cp /opt/rdaf/deployment-scripts/redis-values.yaml /opt/rdaf/deployment-scripts/redis-values.yaml.backup
cp /opt/rdaf/deployment-scripts/arangodb-operator-values.yaml /opt/rdaf/deployment-scripts/arangodb-operator-values.yaml.backup
Update NATs configuration:
Run the below command to copy the upgraded NATs configuration from nats-values.yaml.latest to nats-values.yaml
cp /opt/rdaf/deployment-scripts/nats-values.yaml.latest /opt/rdaf/deployment-scripts/nats-values.yaml
Please update the memory limit value (below highlighted parameters) in /opt/rdaf/deployment-scripts/nats-values.yaml by copying the current value from /opt/rdaf/deployment-scripts/nats-values.yaml.backup file.
Note: Below given values are for a reference only.
Update Minio configuration:
Run the below command to copy the upgraded Minio configuration from minio-values.yaml.latest to minio-values.yaml
cp /opt/rdaf/deployment-scripts/minio-values.yaml.latest /opt/rdaf/deployment-scripts/minio-values.yaml
Please update the memory limit value (below highlighted parameters) in /opt/rdaf/deployment-scripts/minio-values.yaml by copying the current value from /opt/rdaf/deployment-scripts/minio-values.yaml.backup file.
Note: Below given values are for a reference only.
Update Opensearch configuration:
Run the below command to copy the upgraded Opensearch configuration from opensearch-values.yaml.latest to opensearch-values.yaml
cp /opt/rdaf/deployment-scripts/opensearch-values.yaml.latest /opt/rdaf/deployment-scripts/opensearch-values.yaml
Please update the opensearchJavaOpts and memory limit values (below highlighted parameters) in /opt/rdaf/deployment-scripts/opensearch-values.yaml by copying the current value from /opt/rdaf/deployment-scripts/opensearch-values.yaml.backup file.
Note: Below given values are for a reference only.
Update Redis configuration:
Run the below command to copy the upgraded Redis configuration from redis-values.yaml.latest to redis-values.yaml
cp /opt/rdaf/deployment-scripts/redis-values.yaml.latest /opt/rdaf/deployment-scripts/redis-values.yaml
Update MariaDB configuration:
Run the below command to copy the upgraded MariaDB configuration from mariadb-values.yaml.latest to mariadb-values.yaml
cp /opt/rdaf/deployment-scripts/mariadb-values.yaml.latest /opt/rdaf/deployment-scripts/mariadb-values.yaml
Please update the below parameters (highlighted parameters in the given config example) in /opt/rdaf/deployment-scripts/mariadb-values.yaml file.
- memory: update it by copying the current value from the /opt/rdaf/deployment-scripts/mariadb-values.yaml.backup file
- initialDelaySeconds: set the value to 1200 (under the livenessProbe section)
- failureThreshold: set the value to 15 (under the livenessProbe section)
- expire_logs_days: set the value to 1
- innodb_buffer_pool_size: update it by copying the current value from the /opt/rdaf/deployment-scripts/mariadb-values.yaml.backup file
- Comment out the wsrep_replicate_myisam=ON line. Please ignore if it is already commented out.
Note: Below given values are for a reference only.
Update Kafka configuration:
Run the below command to copy the upgraded Kafka configuration from kafka-values.yaml.latest to kafka-values.yaml
cp /opt/rdaf/deployment-scripts/kafka-values.yaml.latest /opt/rdaf/deployment-scripts/kafka-values.yaml
Please update the below parameters (highlighted parameters in the given config example) in /opt/rdaf/deployment-scripts/kafka-values.yaml file.
- memory: update it by copying the current value from the /opt/rdaf/deployment-scripts/kafka-values.yaml.backup file
- nodePorts: update it by copying the current value from the kafka-values.yaml.backup file; please make sure to keep the order of the nodePorts the same as in the current configuration
- initialDelaySeconds: set the value to 1200 (under the livenessProbe section)
- failureThreshold: set the value to 15 (under the livenessProbe section)
Note: Below given values are for a reference only.
Update rda_scheduler Service Configuration:
Please take a backup of the /opt/rdaf/deployment-scripts/values.yaml
Edit /opt/rdaf/deployment-scripts/values.yaml file and update the rda_scheduler service configuration by adding the below environment variable as shown below.
- NUM_SERVER_PROCESSES: Set the value to 4
....
....
rda_scheduler:
  replicas: 1
  privileged: true
  resources:
    requests:
      memory: 100Mi
    limits:
      memory: 2Gi
  env:
    NUM_SERVER_PROCESSES: '4'
    RDA_ENABLE_TRACES: 'no'
    DISABLE_REMOTE_LOGGING_CONTROL: 'no'
    RDA_SELF_HEALTH_RESTART_AFTER_FAILURES: 3
    RDA_GIT_ACCESS_TOKEN: ''
    RDA_GIT_URL: ''
    RDA_GITHUB_ORG: ''
    RDA_GITHUB_REPO: ''
    RDA_GITHUB_BRANCH_PREFIX: ''
    LABELS: tenant_name=rdaf-01
- Download the python script (rdaf_upgrade_120_121.py)
- Please run the downloaded python upgrade script.
- Install the haproxy service using the below command.
Run the below RDAF command to check infra status
+----------------------+----------------+------------+--------------+------------------------------+
| Name | Host | Status | Container Id | Tag |
+----------------------+----------------+------------+--------------+------------------------------+
| haproxy | 192.168.133.97 | Up 2 hours | 342fc1338ba1 | 1.0.3.1 |
| haproxy | 192.168.133.98 | Up 2 hours | ec0de9d45a66 | 1.0.3.1 |
| keepalived | 192.168.133.97 | active | N/A | N/A |
| keepalived | 192.168.133.98 | active | N/A | N/A |
| nats | 192.168.133.97 | Up 4 hours | d2dc79419daa | 1.0.3 |
| nats | 192.168.133.98 | Up 4 hours | ef7c632bdb58 | 1.0.3 |
| minio | 192.168.133.93 | Up 4 hours | 414d2a2351b9 | RELEASE.2023-09-30T07-02-29Z |
| minio | 192.168.133.97 | Up 4 hours | aa0f20af7d70 | RELEASE.2023-09-30T07-02-29Z |
| minio | 192.168.133.98 | Up 4 hours | 91e123f8ba43 | RELEASE.2023-09-30T07-02-29Z |
| minio | 192.168.133.99 | Up 4 hours | 74e74cc328b5 | RELEASE.2023-09-30T07-02-29Z |
| mariadb | 192.168.133.97 | Up 4 hours | c2d71adc09ce | 1.0.3 |
| mariadb | 192.168.133.98 | Up 4 hours | 54615146c0fc | 1.0.3 |
| mariadb | 192.168.133.99 | Up 4 hours | 68e2a6088477 | 1.0.3 |
| opensearch | 192.168.133.97 | Up 3 hours | 7e700c133672 | 1.0.3 |
| opensearch | 192.168.133.98 | Up 3 hours | a582e7b552d6 | 1.0.3 |
| opensearch | 192.168.133.99 | Up 3 hours | f752837167e2 | 1.0.3 |
+----------------------+----------------+------------+--------------+------------------------------+
Run the below RDAF command to check infra healthcheck status
+----------------+-----------------+--------+------------------------------+----------------+--------------+
| Name | Check | Status | Reason | Host | Container Id |
+----------------+-----------------+--------+------------------------------+----------------+--------------+
| haproxy | Port Connection | OK | N/A | 192.168.133.97 | 340d7ce361e0 |
| haproxy | Service Status | OK | N/A | 192.168.133.97 | 340d7ce361e0 |
| haproxy | Firewall Port | OK | N/A | 192.168.133.97 | 340d7ce361e0 |
| haproxy | Port Connection | OK | N/A | 192.168.133.98 | 4a6015c9362a |
| haproxy | Service Status | OK | N/A | 192.168.133.98 | 4a6015c9362a |
| haproxy | Firewall Port | OK | N/A | 192.168.133.98 | 4a6015c9362a |
| keepalived | Service Status | OK | N/A | 192.168.133.97 | N/A |
| keepalived | Service Status | OK | N/A | 192.168.133.98 | N/A |
| nats | Port Connection | OK | N/A | 192.168.133.97 | 991873bb3420 |
| nats | Service Status | OK | N/A | 192.168.133.97 | 991873bb3420 |
| nats | Firewall Port | OK | N/A | 192.168.133.97 | 991873bb3420 |
| nats | Port Connection | OK | N/A | 192.168.133.98 | 016438fe2d17 |
| nats | Service Status | OK | N/A | 192.168.133.98 | 016438fe2d17 |
| nats | Firewall Port | OK | N/A | 192.168.133.98 | 016438fe2d17 |
| minio | Port Connection | OK | N/A | 192.168.133.93 | 0c3c86e896c6 |
| minio | Service Status | OK | N/A | 192.168.133.93 | 0c3c86e896c6 |
| minio | Firewall Port | OK | N/A | 192.168.133.93 | 0c3c86e896c6 |
| minio | Port Connection | OK | N/A | 192.168.133.97 | 604fc5ce14a3 |
| minio | Service Status | OK | N/A | 192.168.133.97 | 604fc5ce14a3 |
| minio | Firewall Port | OK | N/A | 192.168.133.97 | 604fc5ce14a3 |
| minio | Port Connection | OK | N/A | 192.168.133.98 | 0c2ae986076e |
| minio | Service Status | OK | N/A | 192.168.133.98 | 0c2ae986076e |
| minio | Firewall Port | OK | N/A | 192.168.133.98 | 0c2ae986076e |
| minio | Port Connection | OK | N/A | 192.168.133.99 | 67a7681a40b4 |
| minio | Service Status | OK | N/A | 192.168.133.99 | 67a7681a40b4 |
| minio | Firewall Port | OK | N/A | 192.168.133.99 | 67a7681a40b4 |
| mariadb | Port Connection | OK | N/A | 192.168.133.97 | 40e9915a3cf4 |
| mariadb | Service Status | OK | N/A | 192.168.133.97 | 40e9915a3cf4 |
| mariadb | Firewall Port | OK | N/A | 192.168.133.97 | 40e9915a3cf4 |
+----------------+-----------------+--------+------------------------------+----------------+--------------+
1.3.1.2 Upgrade RDAF Infra Services
- Upgrade haproxy service using below command
- Please use the below mentioned command to verify that haproxy is up and in Running state.
Warning
Please verify RDAF portal access to make sure it is accessible after the haproxy service is upgraded, before proceeding to the next step.
- Upgrade nats service using below command
- Please use the below mentioned command and wait till all of the nats pods are in Running state and Ready status is 2/2
Tip
If the nats service upgrade fails with a PodDisruptionBudget policy version error message, please update the apiVersion in the below file to policy/v1beta1
vi /home/rdauser/.local/lib/python3.7/site-packages/rdaf/deployments/helm/rda-nats/files/pod-disruption-budget.yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
{{- include "nats.metadataNamespace" $ | nindent 2 }}
name: {{ .Values.podDisruptionBudget.name }}
labels:
{{- include "nats.labels" $ | nindent 4 }}
....
Run the nats service upgrade command.
- Upgrade minio service using below command
- Please use the below mentioned command and wait till all of the minio pods are in Running state and Ready status is 1/1
- Upgrade redis service using below command
- Please use the below mentioned command and wait till all of the redis pods are in Running state and Ready status is 1/1
- Upgrade opensearch service using below command
- Please use the below mentioned command and wait till all of the opensearch pods are in Running state and Ready status is 1/1
Run the below command to get RDAF Infra services details
Danger
Upgrading both kafka and mariadb infra services require a downtime to the RDAF platform and application services.
Please proceed to the below steps only after scheduled downtime is approved.
Please download the MariaDB upgrade scripts:
wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/AP_7.4.1_migration_ddl_version_from_20_to_22.ql
wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/AP_7.4.1_copy_history_data_version_from_20_to_22.ql
Stop RDAF Application Services:
- Stop the rda-webhook-server application service and wait for 60 seconds. This step stops the incoming webhook alerts and allows the rest of the application services to finish processing the in-transit alerts.
- Stop all of the Application services.
- Check the Application services status. When all of the application services are stopped, it will show an empty output.
Upgrade kafka Service:
- Please run the below upgrade script rdaf_upgrade_1110_121.py. This script will clear all the data of the Kafka and Zookeeper services under the mount points /kafka-logs and /zookeeper, and delete the Kubernetes (k8s) pods, Helm charts, persistent volumes (pv), and persistent volume claims (pvc) configuration. After this step, it will uninstall the Kafka and Zookeeper services.
- Please run the below command to check that the kafka and zookeeper services are uninstalled.
- Install the kafka service using the below command.
- Please run the below command and wait till all of the kafka pods are in Running state and the Ready status is 1/1.
- Please run the below command to create the necessary Kafka Topics and corresponding configuration.
Upgrade mariadb Service:
- To stop mariadb services, run the below command. Wait until all of the services are stopped.
- Please run the below command to check mariadb pods are down
- Upgrade mariadb service using the below command
- Please run the below command and wait till all of the mariadb pods are in Running state and Ready status is 1/1
Warning
Please wait till all of the Kafka and MariaDB infra service pods are in Running state and the Ready status is 1/1
- Run the below commands to check the status of the mariadb cluster. Please verify that the cluster state is in Synced state.
MARIADB_HOST=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep datadir | awk '{print $3}' | cut -f1 -d'/'`
MARIADB_USER=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep user | awk '{print $3}' | base64 -d`
MARIADB_PASSWORD=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep password | awk '{print $3}' | base64 -d`
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -e "show status like 'wsrep_local_state_comment';"
+---------------------------+--------+
| Variable_name | Value |
+---------------------------+--------+
| wsrep_local_state_comment | Synced |
+---------------------------+--------+
Run the below commands to check the cluster size of the mariadb cluster. Please verify that the cluster size is 3.
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'";
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
- Please run the below commands to drop the indexes on two alert tables of AIOps application services.
MARIADB_HOST=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep datadir | awk '{print $3}' | cut -f1 -d'/'`
MARIADB_USER=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep user | awk '{print $3}' | base64 -d`
MARIADB_PASSWORD=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep password | awk '{print $3}' | base64 -d`
TENANT_ID=`cat /opt/rdaf/rdaf.cfg | grep external_user | awk '{print $3}' | cut -f1 -d'.'`
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -D ${TENANT_ID}_alert_processor -e "DROP INDEX IF EXISTS AlertAlternateKey on alert;"
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -D ${TENANT_ID}_alert_processor -e "DROP INDEX IF EXISTS AlertHistoryAlternateKey on alerthistory;"
Warning
Please make sure above commands are executed successfully, before continuing to the below step.
- Please run the below command to upgrade the DB schema configuration of the mariadb service after the 1.0.3 version upgrade.
- Please run the below RDAF command to check infra services status
+--------------------------+----------------+-----------------+--------------+------------------------------+
| Name | Host | Status | Container Id | Tag |
+--------------------------+----------------+-----------------+--------------+------------------------------+
| haproxy | 192.168.131.41 | Up 16 hours | e2b3b46f702d | 1.0.3.1 |
| haproxy | 192.168.131.42 | Up 5 hours | a89fdd2c5299 | 1.0.3.1 |
| keepalived | 192.168.131.41 | active | N/A | N/A |
| keepalived | 192.168.131.42 | active | N/A | N/A |
| rda-nats | 192.168.131.41 | Up 16 Hours ago | 3682271b3b58 | 1.0.3 |
| rda-nats | 192.168.131.42 | Up 4 Hours ago | 1f3599cf7193 | 1.0.3 |
| rda-minio | 192.168.131.41 | Up 16 Hours ago | 80a865d27b2c | RELEASE.2023-09-30T07-02-29Z |
| rda-minio | 192.168.131.42 | Up 4 Hours ago | 22c7da5bc030 | RELEASE.2023-09-30T07-02-29Z |
| rda-minio | 192.168.131.43 | Up 3 Weeks ago | 1af5abda3061 | RELEASE.2023-09-30T07-02-29Z |
| rda-minio | 192.168.131.48 | Up 3 Weeks ago | 7eec14f4ce0e | RELEASE.2023-09-30T07-02-29Z |
| rda-mariadb | 192.168.131.41 | Up 16 Hours ago | 2596eaddb435 | 1.0.3 |
| rda-mariadb | 192.168.131.42 | Up 4 Hours ago | c004da615516 | 1.0.3 |
| rda-mariadb | 192.168.131.43 | Up 2 Weeks ago | b49f33d491d6 | 1.0.3 |
| rda-opensearch | 192.168.131.41 | Up 16 Hours ago | 5595347d56d6 | 1.0.3 |
...
...
+--------------------------+--------------+-----------------+--------------+--------------------------------+
- Please run the below commands to create a copy of the alert and alerthistory tables of the rda-alert-processor service DB as a backup and update the schema.
MARIADB_HOST=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep datadir | awk '{print $3}' | cut -f1 -d'/'`
MARIADB_USER=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep user | awk '{print $3}' | base64 -d`
MARIADB_PASSWORD=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep password | awk '{print $3}' | base64 -d`
TENANT_ID=`cat /opt/rdaf/rdaf.cfg | grep external_user | awk '{print $3}' | cut -f1 -d'.'`
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -D ${TENANT_ID}_alert_processor < AP_7.4.1_migration_ddl_version_from_20_to_22.ql
- Please run the below commands to copy the data from the alert_bak and alerthistory_bak backup tables of the rda-alert-processor service DB back to the primary alert and alerthistory tables.
Note
The copy process may take some time, depending on the amount of historical data in the alerthistory table. Please continue with the rest of the steps while the data is being copied.
MARIADB_HOST=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep datadir | awk '{print $3}' | cut -f1 -d'/'`
MARIADB_USER=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep user | awk '{print $3}' | base64 -d`
MARIADB_PASSWORD=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep password | awk '{print $3}' | base64 -d`
TENANT_ID=`cat /opt/rdaf/rdaf.cfg | grep external_user | awk '{print $3}' | cut -f1 -d'.'`
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -D ${TENANT_ID}_alert_processor < AP_7.4.1_copy_history_data_version_from_20_to_22.ql
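While the copy runs, its progress can be gauged by comparing row counts between the alerthistory and alerthistory_bak tables (for example, with `SELECT COUNT(*)` queries through the same mysql connection used above). The percentage arithmetic, with made-up counts, looks like this:

```shell
# Made-up example counts; on a real system fetch them with, e.g.:
#   mysql ... -D ${TENANT_ID}_alert_processor -e "SELECT COUNT(*) FROM alerthistory;"
#   mysql ... -D ${TENANT_ID}_alert_processor -e "SELECT COUNT(*) FROM alerthistory_bak;"
copied=125000
total=500000
echo "alerthistory copy progress: $(( copied * 100 / total ))%"
```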
Installing GraphDB Service:
Tip
Please skip the below step if GraphDB service is NOT going to be installed.
Warning
For installing the GraphDB service, please add an additional disk to the RDA Fabric Infrastructure VM.
This is a prerequisite and must be completed before installing the GraphDB service.
- Please use the below mentioned command and wait till all of the arangodb pods are in Running state.
1.3.2 Upgrade RDAF Platform Services to 3.4.1
Step-1: Run the below command to initiate upgrading RDAF Platform services.
Because the upgrade procedure is non-disruptive, it puts the currently running PODs into Terminating state and the newer-version PODs into Pending state.
Step-2: Run the below command to check the status of the existing and newer PODs, and make sure at least one instance of each Platform service is in Terminating state.
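Assuming kubectl access, the Terminating instances can be isolated by filtering the STATUS column; the rda-fabric namespace and app_category=rdaf-platform label below are the same ones used by the delete loop in Step-6. The filter is shown against fabricated sample output so the logic is visible.

```shell
# Fabricated sample of:
#   kubectl get pods -n rda-fabric -l app_category=rdaf-platform --no-headers
sample='rda-api-server-new-1 1/1 Running 0 2m
rda-api-server-old-1 1/1 Terminating 0 21h
rda-registry-new-1 0/1 Pending 0 2m'

# List pods currently in Terminating state (field 3 is STATUS).
printf '%s\n' "$sample" | awk '$3 == "Terminating" {print $1}'
```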
Step-3: Run the below command to put all Terminating RDAF platform service PODs into maintenance mode. It will list all of the POD IDs of the platform services, along with the rdac maintenance command that is required to put them into maintenance mode.
Note
If maint_command.py script doesn't exist on RDAF deployment CLI VM, it can be downloaded using the below command.
Step-4: Copy & paste the rdac maintenance command as shown below.
Step-5: Run the below command to verify the maintenance mode status of the RDAF platform services.
Step-6: Run the below command to delete the Terminating RDAF platform service PODs
for i in `kubectl get pods -n rda-fabric -l app_category=rdaf-platform | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
Note
Wait for 120 seconds and repeat the above steps, from Step-2 to Step-6, for the rest of the RDAF Platform service PODs.
Please wait till all of the new platform service PODs are in Running state, then run the below command to verify their status and make sure all of them are running with the 3.4.1 version.
+---------------------+----------------+-----------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+---------------------+----------------+-----------------+--------------+-------+
| rda-api-server | 192.168.131.46 | Up 21 Hours ago | 98ec9561d787 | 3.4.1 |
| rda-api-server | 192.168.131.45 | Up 21 Hours ago | e7b7cdb7d3d2 | 3.4.1 |
| rda-registry | 192.168.131.44 | Up 21 Hours ago | bc2fed4a15f3 | 3.4.1 |
| rda-registry | 192.168.131.46 | Up 21 Hours ago | 1b6da7ff3ce2 | 3.4.1 |
| rda-identity | 192.168.131.45 | Up 21 Hours ago | 30053cf6667e | 3.4.1 |
| rda-identity | 192.168.131.46 | Up 21 Hours ago | 6ee2e6a861f7 | 3.4.1 |
| rda-fsm | 192.168.131.44 | Up 21 Hours ago | c014e84bf197 | 3.4.1 |
| rda-fsm | 192.168.131.46 | Up 21 Hours ago | 6a609f8ab579 | 3.4.1 |
+---------------------+----------------+-----------------+--------------+-------+
Run the below command to check that the rda-fsm service is up and running, and verify that one of the rda-scheduler services is elected as leader under the Site column.
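The Pod-Type/Site table below is typically obtained from the rdac CLI; a sketch, assuming the `rdac pods` subcommand:

```shell
# List RDA Fabric pods; check that Pod-Ready is True for each row and that
# one scheduler row shows *leader* in the Site column.
rdac pods
```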
+-------+----------------------------------------+-------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------+
| Cat | Pod-Type | Pod-Ready | Host | ID | Site | Age | CPUs | Memory(GB) | Active Jobs | Total Jobs |
|-------+----------------------------------------+-------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------|
| Infra | api-server | True | rda-api-server | 5081891f | | 0 :29:54 | 8 | 31.33 | | |
| Infra | api-server | True | rda-api-server | 9fc5db97 | | 0 :29:52 | 8 | 31.33 | | |
| Infra | collector | True | rda-collector- | f9b6a00d | | 0 :30:00 | 8 | 31.33 | | |
| Infra | collector | True | rda-collector- | 0a4eb8cd | | 0 :30:01 | 8 | 31.33 | | |
| Infra | registry | True | rda-registry-7 | 758fc2cb | | 0 :30:51 | 8 | 31.33 | | |
| Infra | registry | True | rda-registry-7 | 3d56a31f | | 0 :28:49 | 8 | 31.33 | | |
| Infra | scheduler | True | rda-scheduler- | 8b570be5 | | 0 :30:44 | 8 | 31.33 | | |
| Infra | scheduler | True | rda-scheduler- | 44930ac7 | *leader* | 0 :30:47 | 8 | 31.33 | | |
| Infra | worker | True | rda-worker-69d | 91615244 | rda-site-01 | 0 :25:30 | 8 | 31.33 | 0 | 9 |
| Infra | worker | True | rda-worker-69d | af99d199 | rda-site-01 | 0 :25:31 | 8 | 31.33 | 2 | 14 |
+-------+----------------------------------------+-------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------+
Run the below command to check that all services have an ok status and do not report any failure messages.
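The health table shown further below is typically produced by the rdac health check; a sketch, assuming the `rdac healthcheck` subcommand:

```shell
# Report per-service health parameters; every row's Status column should
# read 'ok' with no failure messages.
rdac healthcheck
```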
Warning
For Non-Kubernetes deployment, upgrading RDAF Platform and AIOps application services is a disruptive operation. Please schedule a maintenance window before upgrading RDAF Platform and AIOps services to newer version.
- To stop application services, run the below command. Wait until all of the services are stopped.
- To stop RDAF worker services, run the below command. Wait until all of the services are stopped.
- To stop RDAF platform services, run the below command. Wait until all of the services are stopped.
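For non-Kubernetes deployments these stop operations are driven by the rdaf CLI; a hedged sketch (the exact `down` subcommand forms are assumptions based on the rdaf CLI conventions used elsewhere in this guide):

```shell
# Stop services in order: applications first, then workers, then platform.
rdaf app down OIA     # stop OIA application services
rdaf worker down      # stop RDA worker services
rdaf platform down    # stop RDAF platform services
```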
Run the below command to initiate upgrading RDAF Platform services.
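A sketch of the upgrade invocation, mirroring the `rdafk8s platform upgrade` form shown later in this document (the non-Kubernetes rdaf CLI is assumed to take the same flag):

```shell
# Upgrade all RDAF platform services to the 3.4.1 tag.
rdaf platform upgrade --tag 3.4.1
```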
Please wait until all of the new platform services are in the Up state, then run the below command to verify their status and make sure all of them are running version 3.4.1.
+--------------------------+----------------+------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+--------------------------+----------------+------------+--------------+-------+
| rda_api_server | 192.168.133.92 | Up 2 hours | 6366c9717f07 | 3.4.1 |
| rda_api_server | 192.168.133.93 | Up 2 hours | d5b8c2722f72 | 3.4.1 |
| rda_registry | 192.168.133.92 | Up 2 hours | 47f722aab97b | 3.4.1 |
| rda_registry | 192.168.133.93 | Up 2 hours | f5ce662af82f | 3.4.1 |
| rda_scheduler | 192.168.133.92 | Up 2 hours | 28b597777069 | 3.4.1 |
| rda_scheduler | 192.168.133.93 | Up 2 hours | 2d70a4ac184e | 3.4.1 |
| rda_collector | 192.168.133.92 | Up 2 hours | 637a07f4df17 | 3.4.1 |
| rda_collector | 192.168.133.93 | Up 2 hours | 478167b3952a | 3.4.1 |
| rda_asset_dependency | 192.168.133.92 | Up 2 hours | c910651896fe | 3.4.1 |
| rda_asset_dependency | 192.168.133.93 | Up 2 hours | c1ddfde81b13 | 3.4.1 |
| rda_identity | 192.168.133.92 | Up 2 hours | f70beaa486a6 | 3.4.1 |
| rda_identity | 192.168.133.93 | Up 2 hours | a726b0f154c8 | 3.4.1 |
| rda_fsm | 192.168.133.92 | Up 2 hours | 87b26529566a | 3.4.1 |
| rda_fsm | 192.168.133.93 | Up 2 hours | 13891be75c05 | 3.4.1 |
+--------------------------+----------------+------------+--------------+-------+
Run the below command to check that the rda-fsm service is up and running, and verify that one of the rda-scheduler services is elected as leader under the Site column.
Run the below command to check that all services have an ok status and do not report any failure messages.
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app | alert-ingester | 9132494ea9ab | ad43cf79 | | service-status | ok | |
| rda_app | alert-ingester | 9132494ea9ab | ad43cf79 | | minio-connectivity | ok | |
| rda_app | alert-ingester | 9132494ea9ab | ad43cf79 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | 9132494ea9ab | ad43cf79 | | service-initialization-status | ok | |
| rda_app | alert-ingester | 9132494ea9ab | ad43cf79 | | kafka-connectivity | ok | Cluster=ZTRlZGFmZjhkZDFiMTFlZQ, Broker=1, Brokers=[1, 2, 3] |
| rda_app | alert-ingester | f5312c1fc474 | 2a129b31 | | service-status | ok | |
| rda_app | alert-ingester | f5312c1fc474 | 2a129b31 | | minio-connectivity | ok | |
| rda_app | alert-ingester | f5312c1fc474 | 2a129b31 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | f5312c1fc474 | 2a129b31 | | service-initialization-status | ok | |
| rda_app | alert-ingester | f5312c1fc474 | 2a129b31 | | kafka-connectivity | ok | Cluster=ZTRlZGFmZjhkZDFiMTFlZQ, Broker=2, Brokers=[1, 2, 3] |
| rda_app | alert-processor | 2afde67935ac | 33170bc7 | | service-status | ok | |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
1.3.2.1 Upgrade RDAF Platform Services to 3.4.1.2
Step-1: Run the below command to initiate upgrading the below RDAF Platform services:
- rda-scheduler
- rda-api-server
- rda-portal
rdafk8s platform upgrade --tag 3.4.1.2 --service rda-scheduler --service rda-api-server --service rda-portal
As the upgrade procedure is a non-disruptive upgrade, it puts the currently running PODs into Terminating state and newer version PODs into Pending state.
Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each Platform service is in the Terminating state.
Step-3: Run the below command to put all Terminating RDAF platform service PODs into maintenance mode. It lists the POD IDs of the platform services along with the rdac maintenance command that is required to put them into maintenance mode.
Step-4: Copy and paste the rdac maintenance command as shown below.
Step-5: Run the below command to verify the maintenance mode status of the RDAF platform services.
Step-6: Run the below command to delete the Terminating RDAF platform service PODs.
for i in `kubectl get pods -n rda-fabric -l app_category=rdaf-platform | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
Note
Wait for 120 seconds and repeat the above steps, Step-2 through Step-6, for the rest of the RDAF Platform service PODs.
Please wait until all of the new platform service PODs are in the Running state, then run the below command to verify their status and make sure all of them are running version 3.4.1.2.
+---------------------+----------------+-----------------+--------------+---------+
| Name | Host | Status | Container Id | Tag |
+---------------------+----------------+-----------------+--------------+---------+
| rda-api-server | 192.168.131.46 | Up 21 Hours ago | 98ec9561d787 | 3.4.1.2 |
| rda-api-server | 192.168.131.45 | Up 21 Hours ago | e7b7cdb7d3d2 | 3.4.1.2 |
| rda-scheduler | 192.168.131.44 | Up 21 Hours ago | bc2fed4a15f3 | 3.4.1.2 |
| rda-scheduler | 192.168.131.46 | Up 21 Hours ago | 1b6da7ff3ce2 | 3.4.1.2 |
| rda-portal-backend | 192.168.131.45 | Up 21 Hours ago | 30053cf6667e | 3.4.1.2 |
| rda-portal-backend | 192.168.131.46 | Up 21 Hours ago | 6ee2e6a861f7 | 3.4.1.2 |
| rda-portal-frontend | 192.168.131.44 | Up 21 Hours ago | c014e84bf197 | 3.4.1.2 |
| rda-portal-frontend | 192.168.131.46 | Up 21 Hours ago | 6a609f8ab579 | 3.4.1.2 |
+---------------------+----------------+-----------------+--------------+---------+
Run the below command to check and verify that one of the rda-scheduler services is elected as leader under the Site column.
+-------+----------------------------------------+-------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------+
| Cat | Pod-Type | Pod-Ready | Host | ID | Site | Age | CPUs | Memory(GB) | Active Jobs | Total Jobs |
|-------+----------------------------------------+-------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------|
| Infra | api-server | True | rda-api-server | 5081891f | | 0 :29:54 | 8 | 31.33 | | |
| Infra | api-server | True | rda-api-server | 9fc5db97 | | 0 :29:52 | 8 | 31.33 | | |
| Infra | collector | True | rda-collector- | f9b6a00d | | 0 :30:00 | 8 | 31.33 | | |
| Infra | collector | True | rda-collector- | 0a4eb8cd | | 0 :30:01 | 8 | 31.33 | | |
| Infra | registry | True | rda-registry-7 | 758fc2cb | | 0 :30:51 | 8 | 31.33 | | |
| Infra | registry | True | rda-registry-7 | 3d56a31f | | 0 :28:49 | 8 | 31.33 | | |
| Infra | scheduler | True | rda-scheduler- | 8b570be5 | | 0 :30:44 | 8 | 31.33 | | |
| Infra | scheduler | True | rda-scheduler- | 44930ac7 | *leader* | 0 :30:47 | 8 | 31.33 | | |
| Infra | worker | True | rda-worker-69d | 91615244 | rda-site-01 | 0 :25:30 | 8 | 31.33 | 0 | 9 |
| Infra | worker | True | rda-worker-69d | af99d199 | rda-site-01 | 0 :25:31 | 8 | 31.33 | 2 | 14 |
+-------+----------------------------------------+-------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------+
Run the below command to check that all services have an ok status and do not report any failure messages.
1.3.3 Upgrade rdac CLI
1.3.4 Upgrade OIA Application Services to 7.4.1
Step-1: Run the below commands to initiate upgrading RDAF OIA Application services
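A sketch of the invocation, assuming the app upgrade subcommand follows the same pattern as the `rdafk8s platform upgrade` command used earlier:

```shell
# Upgrade the OIA application services to the 7.4.1 tag.
rdafk8s app upgrade OIA --tag 7.4.1
```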
Step-2: Run the below command to check the status of the newly upgraded PODs.
As the upgrade procedure is a non-disruptive upgrade, it puts the currently running PODs into Terminating state and newer version PODs into Pending state.
Step-3: Run the below command to put all Terminating RDAF Application service PODs into maintenance mode. It lists the POD IDs of the application services along with the rdac maintenance command that is required to put them into maintenance mode.
Step-4: Copy and paste the rdac maintenance command as shown below.
Step-5: Run the below command to verify the maintenance mode status of the RDAF application services.
Step-6: Run the below command to delete the Terminating RDAF application service PODs.
for i in `kubectl get pods -n rda-fabric -l app_name=oia | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
Note
Wait for 120 seconds and repeat the above steps, Step-2 through Step-6, for the rest of the RDAF Application service PODs.
Please wait until all of the new OIA application service PODs are in the Running state, then run the below command to verify their status and make sure they are running version 7.4.1.
+-------------------------------+----------------+----------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+-------------------------------+----------------+----------------+--------------+-------+
| rda-alert-ingester | 192.168.131.50 | Up 4 Hours ago | 013e6fb89274 | 7.4.1 |
| rda-alert-ingester | 192.168.131.49 | Up 4 Hours ago | ce269889fe6c | 7.4.1 |
| rda-alert-processor-companion | 192.168.131.49 | Up 4 Hours ago | b4bca9347589 | 7.4.1 |
| rda-alert-processor-companion | 192.168.131.50 | Up 4 Hours ago | 1c530b32c563 | 7.4.1 |
| rda-alert-processor | 192.168.131.47 | Up 4 Hours ago | b0e25a38c72d | 7.4.1 |
| rda-alert-processor | 192.168.131.46 | Up 4 Hours ago | 2a5b0f764cfd | 7.4.1 |
| rda-app-controller | 192.168.131.50 | Up 4 Hours ago | 0261820f6e01 | 7.4.1 |
| rda-app-controller | 192.168.131.46 | Up 4 Hours ago | 134844ff7208 | 7.4.1 |
| rda-collaboration | 192.168.131.50 | Up 4 Hours ago | e5e196b74462 | 7.4.1 |
| rda-collaboration | 192.168.131.46 | Up 4 Hours ago | ed4ec37435b7 | 7.4.1 |
| rda-configuration-service | 192.168.131.46 | Up 4 Hours ago | 74e22e5ddee1 | 7.4.1 |
| rda-configuration-service | 192.168.131.50 | Up 4 Hours ago | b09637691cbd | 7.4.1 |
+-------------------------------+----------------+----------------+--------------+-------+
Run the below command and verify that one of the cfxdimensions-app-irm_service instances has *leader* status under the Site column.
+-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------+
| Cat | Pod-Type | Pod-Ready | Host | ID | Site | Age | CPUs | Memory(GB) | Active Jobs | Total Jobs |
|-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------|
| App | alert-ingester | True | rda-alert-inge | 7861bd4f | | 4:20:52 | 8 | 31.33 | | |
| App | alert-ingester | True | rda-alert-inge | 4abc521f | | 4:20:52 | 8 | 31.33 | | |
| App | alert-processor | True | rda-alert-proc | 9bf94e67 | | 4:20:50 | 8 | 31.33 | | |
| App | alert-processor | True | rda-alert-proc | 4e679139 | | 4:20:48 | 8 | 31.33 | | |
| App | alert-processor-companion | True | rda-alert-proc | 745dfbb9 | | 4:20:39 | 8 | 31.33 | | |
| App | alert-processor-companion | True | rda-alert-proc | 02f6bce0 | | 4:20:41 | 8 | 31.33 | | |
| App | asset-dependency | True | rda-asset-depe | fc6c7a60 | | 4:28:00 | 8 | 31.33 | | |
| App | asset-dependency | True | rda-asset-depe | d3ca4c11 | | 4:27:07 | 8 | 31.33 | | |
| App | authenticator | True | rda-identity-6 | 4cd59d9c | | 4:27:01 | 8 | 31.33 | | |
| App | authenticator | True | rda-identity-6 | 174298c3 | | 4:25:53 | 8 | 31.33 | | |
| App | cfx-app-controller | True | rda-app-contro | 4d923832 | | 4:20:42 | 8 | 31.33 | | |
| App | cfx-app-controller | True | rda-app-contro | b16deafa | | 4:20:25 | 8 | 31.33 | | |
| App | cfxdimensions-app-access-manager | True | rda-access-man | 09d1fada | | 4:27:56 | 8 | 31.33 | | |
| App | cfxdimensions-app-access-manager | True | rda-access-man | e0af2bcc | | 4:27:54 | 8 | 31.33 | | |
| App | cfxdimensions-app-collaboration | True | rda-collaborat | 9e7f7bcb | | 4:20:31 | 8 | 31.33 | | |
| App | cfxdimensions-app-collaboration | True | rda-collaborat | 38db5386 | | 4:20:25 | 8 | 31.33 | | |
| App | cfxdimensions-app-file-browser | True | rda-file-brows | 589e18f8 | | 4:20:20 | 8 | 31.33 | | |
| App | cfxdimensions-app-file-browser | True | rda-file-brows | 853545f8 | | 4:19:59 | 8 | 31.33 | | |
| App | cfxdimensions-app-irm_service | True | rda-irm-servic | d17f8dcd | | 4:20:06 | 8 | 31.33 | | |
| App | cfxdimensions-app-irm_service | True | rda-irm-servic | 44decaa7 | *leader* | 4:19:41 | 8 | 31.33 | | |
| App | cfxdimensions-app-notification-service | True | rda-notificati | 74e58855 | | 4:20:14 | 8 | 31.33 | | |
+-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------+
Run the below command to check that all services have an ok status and do not report any failure messages.
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app | alert-ingester | rda-alert-in | 4abc521f | | service-status | ok | |
| rda_app | alert-ingester | rda-alert-in | 4abc521f | | minio-connectivity | ok | |
| rda_app | alert-ingester | rda-alert-in | 4abc521f | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | rda-alert-in | 4abc521f | | service-initialization-status | ok | |
| rda_app | alert-ingester | rda-alert-in | 4abc521f | | kafka-connectivity | ok | Cluster=IrA5ccri7mBeUvhzvrimEg, Broker=0, Brokers=[0, 1, 2] |
| rda_app | alert-ingester | rda-alert-in | 7861bd4f | | service-status | ok | |
| rda_app | alert-ingester | rda-alert-in | 7861bd4f | | minio-connectivity | ok | |
| rda_app | alert-ingester | rda-alert-in | 7861bd4f | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | rda-alert-in | 7861bd4f | | service-initialization-status | ok | |
| rda_app | alert-ingester | rda-alert-in | 7861bd4f | | kafka-connectivity | ok | Cluster=IrA5ccri7mBeUvhzvrimEg, Broker=2, Brokers=[0, 1, 2] |
| rda_app | alert-processor | rda-alert-pr | 4e679139 | | service-status | ok | |
| rda_app | alert-processor | rda-alert-pr | 4e679139 | | minio-connectivity | ok | |
| rda_app | alert-processor | rda-alert-pr | 4e679139 | | service-dependency:cfx-app-controller | ok | 2 pod(s) found for cfx-app-controller |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
Run the below commands to initiate upgrading the RDA Fabric OIA Application services.
Please wait until all of the new OIA application service containers are in the Up state, then run the below command to verify their status and make sure they are running version 7.4.1.
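A sketch of the invocation for the non-Kubernetes deployment, assuming the rdaf CLI mirrors the rdafk8s upgrade form:

```shell
# Upgrade the OIA application services to the 7.4.1 tag.
rdaf app upgrade OIA --tag 7.4.1
```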
+-----------------------------------+----------------+------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+-----------------------------------+----------------+------------+--------------+-------+
| cfx-rda-app-controller | 192.168.133.96 | Up 2 hours | deab59a554f6 | 7.4.1 |
| cfx-rda-app-controller | 192.168.133.92 | Up 2 hours | 7e3cbfc6d899 | 7.4.1 |
| cfx-rda-reports-registry | 192.168.133.96 | Up 2 hours | 934ef236dde2 | 7.4.1 |
| cfx-rda-reports-registry | 192.168.133.92 | Up 2 hours | 8749187dfb82 | 7.4.1 |
| cfx-rda-notification-service | 192.168.133.96 | Up 2 hours | eaaa0116b25c | 7.4.1 |
| cfx-rda-notification-service | 192.168.133.92 | Up 2 hours | 7f5b91f6b166 | 7.4.1 |
| cfx-rda-file-browser | 192.168.133.96 | Up 2 hours | 62ba48307a89 | 7.4.1 |
| cfx-rda-file-browser | 192.168.133.92 | Up 2 hours | ad83ab7f2611 | 7.4.1 |
| cfx-rda-configuration-service | 192.168.133.96 | Up 2 hours | 6f24b3296c44 | 7.4.1 |
| cfx-rda-configuration-service | 192.168.133.92 | Up 2 hours | ad93c6ddf2bc | 7.4.1 |
| cfx-rda-alert-ingester | 192.168.133.96 | Up 2 hours | 9132494ea9ab | 7.4.1 |
| cfx-rda-alert-ingester | 192.168.133.92 | Up 2 hours | f5312c1fc474 | 7.4.1 |
+-----------------------------------+----------------+------------+--------------+-------+
Run the below command and verify that one of the cfxdimensions-app-irm_service instances has *leader* status under the Site column.
+-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------+
| Cat | Pod-Type | Pod-Ready | Host | ID | Site | Age | CPUs | Memory(GB) | Active Jobs | Total Jobs |
|-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------|
| App | alert-ingester | True | 9132494ea9ab | ad43cf79 | | 1:56:34 | 4 | 31.21 | | |
| App | alert-ingester | True | f5312c1fc474 | 2a129b31 | | 1:56:21 | 4 | 31.21 | | |
| App | alert-processor | True | 2afde67935ac | 33170bc7 | | 1:54:29 | 4 | 31.21 | | |
| App | alert-processor | True | f289e1088a16 | 831fe5c3 | | 1:54:14 | 4 | 31.21 | | |
| App | alert-processor-companion | True | 83ebf4300ac5 | c9dba0df | | 1:47:44 | 4 | 31.21 | | |
| App | alert-processor-companion | True | 9b1b55d78d1a | a66ecf29 | | 1:47:29 | 4 | 31.21 | | |
| App | asset-dependency | True | c1ddfde81b13 | 985fc496 | | 2:20:03 | 4 | 31.21 | | |
| App | asset-dependency | True | c910651896fe | 9c355c7d | | 2:20:06 | 4 | 31.21 | | |
| App | authenticator | True | f70beaa486a6 | 955eb254 | | 2:19:59 | 4 | 31.21 | | |
| App | authenticator | True | a726b0f154c8 | 898c36b4 | | 2:19:57 | 4 | 31.21 | | |
| App | cfx-app-controller | True | 7e3cbfc6d899 | 2097a877 | | 1:58:49 | 4 | 31.21 | | |
| App | cfx-app-controller | True | deab59a554f6 | 3bd4ce27 | | 1:59:02 | 4 | 31.21 | | |
| App | cfxdimensions-app-access-manager | True | f47c6cab13f1 | e0636eea | | 2:19:32 | 4 | 31.21 | | |
| App | cfxdimensions-app-access-manager | True | 02b526adf7f9 | 7a286ce7 | | 2:19:23 | 4 | 31.21 | | |
| App | cfxdimensions-app-collaboration | True | b602c2cddd90 | 836e0134 | | 1:53:02 | 4 | 31.21 | | |
| App | cfxdimensions-app-collaboration | True | 2f02987f249d | c4d4720d | | 1:48:31 | 4 | 31.21 | | |
| App | cfxdimensions-app-file-browser | True | 62ba48307a89 | 48d1d0d2 | | 1:57:34 | 4 | 31.21 | | |
| App | cfxdimensions-app-file-browser | True | ad83ab7f2611 | 93078496 | | 1:57:14 | 4 | 31.21 | | |
| App | cfxdimensions-app-irm_service | True | 56dffc7d6501 | 672ff70a | *leader* | 1:53:57 | 4 | 31.21 | | |
| App | cfxdimensions-app-irm_service | True | b40a96601c73 | 25fe51f5 | | 1:53:42 | 4 | 31.21 | | |
+-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------+
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app | alert-ingester | 9132494ea9ab | ad43cf79 | | service-status | ok | |
| rda_app | alert-ingester | 9132494ea9ab | ad43cf79 | | minio-connectivity | ok | |
| rda_app | alert-ingester | 9132494ea9ab | ad43cf79 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | 9132494ea9ab | ad43cf79 | | service-initialization-status | ok | |
| rda_app | alert-ingester | 9132494ea9ab | ad43cf79 | | kafka-connectivity | ok | Cluster=ZTRlZGFmZjhkZDFiMTFlZQ, Broker=1, Brokers=[1, 2, 3] |
| rda_app | alert-ingester | f5312c1fc474 | 2a129b31 | | service-status | ok | |
| rda_app | alert-ingester | f5312c1fc474 | 2a129b31 | | minio-connectivity | ok | |
| rda_app | alert-ingester | f5312c1fc474 | 2a129b31 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | f5312c1fc474 | 2a129b31 | | service-initialization-status | ok | |
| rda_app | alert-ingester | f5312c1fc474 | 2a129b31 | | kafka-connectivity | ok | Cluster=ZTRlZGFmZjhkZDFiMTFlZQ, Broker=3, Brokers=[1, 2, 3] |
| rda_app | alert-processor | 2afde67935ac | 33170bc7 | | service-status | ok | |
| rda_app | alert-processor | 2afde67935ac | 33170bc7 | | minio-connectivity | ok | |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
1.3.4.1 Upgrade OIA Application Services to 7.4.1.2/7.4.1.3
Step-1: Run the below commands to initiate upgrading the below RDAF OIA Application services:
- rda-webhook-server
- rda-event-consumer
- rda-smtp-server
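A sketch of the service-scoped invocation, following the `--service` flag pattern of the `rdafk8s platform upgrade` command in section 1.3.2.1 (the app subcommand form is an assumption):

```shell
# Upgrade only the listed OIA services to the 7.4.1.2 tag.
rdafk8s app upgrade OIA --tag 7.4.1.2 \
    --service rda-webhook-server \
    --service rda-event-consumer \
    --service rda-smtp-server
```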
Step-2: Run the below command to check the status of the newly upgraded PODs.
As the upgrade procedure is a non-disruptive upgrade, it puts the currently running PODs into Terminating state and newer version PODs into Pending state.
Step-3: Run the below command to put the above-mentioned Terminating RDAF Application service PODs into maintenance mode. It lists the POD IDs of the application services along with the rdac maintenance command that is required to put them into maintenance mode.
Step-4: Copy and paste the rdac maintenance command as shown below.
Step-5: Run the below command to verify the maintenance mode status of the RDAF application services.
Step-6: Run the below command to delete the Terminating RDAF application service PODs.
for i in `kubectl get pods -n rda-fabric -l app_name=oia | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
Note
Wait for 120 seconds and repeat the above steps, Step-2 through Step-6, for the rest of the RDAF Application service PODs.
Please wait until all of the new OIA application service PODs are in the Running state, then run the below command to verify their status and make sure they are running version 7.4.1.2.
+-------------------------------+----------------+----------------+--------------+---------+
| Name | Host | Status | Container Id | Tag |
+-------------------------------+----------------+----------------+--------------+---------+
| rda-event-consumer | 192.168.131.50 | Up 4 Hours ago | 013e6fb89274 | 7.4.1.2 |
| rda-event-consumer | 192.168.131.49 | Up 4 Hours ago | ce269889fe6c | 7.4.1.2 |
| rda-webhook-server | 192.168.131.49 | Up 4 Hours ago | b4bca9347589 | 7.4.1.2 |
| rda-webhook-server | 192.168.131.50 | Up 4 Hours ago | 1c530b32c563 | 7.4.1.2 |
| rda-smtp-server | 192.168.131.47 | Up 4 Hours ago | b0e25a38c72d | 7.4.1.2 |
| rda-smtp-server | 192.168.131.46 | Up 4 Hours ago | 2a5b0f764cfd | 7.4.1.2 |
+-------------------------------+----------------+----------------+--------------+---------+
+-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------+
| Cat | Pod-Type | Pod-Ready | Host | ID | Site | Age | CPUs | Memory(GB) | Active Jobs | Total Jobs |
|-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------|
| App | alert-ingester | True | rda-alert-inge | 7861bd4f | | 4:20:52 | 8 | 31.33 | | |
| App | alert-ingester | True | rda-alert-inge | 4abc521f | | 4:20:52 | 8 | 31.33 | | |
| App | alert-processor | True | rda-alert-proc | 9bf94e67 | | 4:20:50 | 8 | 31.33 | | |
| App | alert-processor | True | rda-alert-proc | 4e679139 | | 4:20:48 | 8 | 31.33 | | |
| App | alert-processor-companion | True | rda-alert-proc | 745dfbb9 | | 4:20:39 | 8 | 31.33 | | |
| App | alert-processor-companion | True | rda-alert-proc | 02f6bce0 | | 4:20:41 | 8 | 31.33 | | |
| App | asset-dependency | True | rda-asset-depe | fc6c7a60 | | 4:28:00 | 8 | 31.33 | | |
| App | asset-dependency | True | rda-asset-depe | d3ca4c11 | | 4:27:07 | 8 | 31.33 | | |
| App | authenticator | True | rda-identity-6 | 4cd59d9c | | 4:27:01 | 8 | 31.33 | | |
| App | authenticator | True | rda-identity-6 | 174298c3 | | 4:25:53 | 8 | 31.33 | | |
| App | cfx-app-controller | True | rda-app-contro | 4d923832 | | 4:20:42 | 8 | 31.33 | | |
| App | cfx-app-controller | True | rda-app-contro | b16deafa | | 4:20:25 | 8 | 31.33 | | |
| App | cfxdimensions-app-access-manager | True | rda-access-man | 09d1fada | | 4:27:56 | 8 | 31.33 | | |
| App | cfxdimensions-app-access-manager | True | rda-access-man | e0af2bcc | | 4:27:54 | 8 | 31.33 | | |
| App | cfxdimensions-app-collaboration | True | rda-collaborat | 9e7f7bcb | | 4:20:31 | 8 | 31.33 | | |
| App | cfxdimensions-app-collaboration | True | rda-collaborat | 38db5386 | | 4:20:25 | 8 | 31.33 | | |
| App | cfxdimensions-app-file-browser | True | rda-file-brows | 589e18f8 | | 4:20:20 | 8 | 31.33 | | |
| App | cfxdimensions-app-file-browser | True | rda-file-brows | 853545f8 | | 4:19:59 | 8 | 31.33 | | |
| App | cfxdimensions-app-irm_service | True | rda-irm-servic | d17f8dcd | | 4:20:06 | 8 | 31.33 | | |
| App | cfxdimensions-app-irm_service | True | rda-irm-servic | 44decaa7 | *leader* | 4:19:41 | 8 | 31.33 | | |
| App | cfxdimensions-app-notification-service | True | rda-notificati | 74e58855 | | 4:20:14 | 8 | 31.33 | | |
+-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------+
Run the below command to check that all services have an ok status and do not report any failure messages.
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app | alert-ingester | rda-alert-in | 4abc521f | | service-status | ok | |
| rda_app | alert-ingester | rda-alert-in | 4abc521f | | minio-connectivity | ok | |
| rda_app | alert-ingester | rda-alert-in | 4abc521f | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | rda-alert-in | 4abc521f | | service-initialization-status | ok | |
| rda_app | alert-ingester | rda-alert-in | 4abc521f | | kafka-connectivity | ok | Cluster=IrA5ccri7mBeUvhzvrimEg, Broker=0, Brokers=[0, 1, 2] |
| rda_app | alert-ingester | rda-alert-in | 7861bd4f | | service-status | ok | |
| rda_app | alert-ingester | rda-alert-in | 7861bd4f | | minio-connectivity | ok | |
| rda_app | alert-ingester | rda-alert-in | 7861bd4f | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | rda-alert-in | 7861bd4f | | service-initialization-status | ok | |
| rda_app | alert-ingester | rda-alert-in | 7861bd4f | | kafka-connectivity | ok | Cluster=IrA5ccri7mBeUvhzvrimEg, Broker=2, Brokers=[0, 1, 2] |
| rda_app | alert-processor | rda-alert-pr | 4e679139 | | service-status | ok | |
| rda_app | alert-processor | rda-alert-pr | 4e679139 | | minio-connectivity | ok | |
| rda_app | alert-processor | rda-alert-pr | 4e679139 | | service-dependency:cfx-app-controller | ok | 2 pod(s) found for cfx-app-controller |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
1.3.5 Upgrade RDA Worker Services
Step-1: Please run the below command to initiate upgrading the RDA Worker service PODs.
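A sketch of the worker upgrade invocation, assuming the worker subcommand follows the same pattern as the platform upgrade earlier in this document:

```shell
# Upgrade the RDA Worker service PODs to the 3.4.1 tag.
rdafk8s worker upgrade --tag 3.4.1
```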
Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each RDA Worker service POD is in the Terminating state.
NAME READY STATUS RESTARTS AGE
rda-worker-69d485f476-99tnv 1/1 Running 0 45h
rda-worker-69d485f476-gwq4f 1/1 Running 0 45h
Step-3: Run the below command to put all Terminating RDAF worker service PODs into maintenance mode. It will list all of the POD Ids of RDA worker services along with rdac maintenance command that is required to be put in maintenance mode.
Step-4: Copy and paste the rdac maintenance command as shown below.
Step-5: Run the below command to verify the maintenance mode status of the RDAF worker services.
Step-6: Run the below command to delete the Terminating RDAF worker service PODs.
for i in `kubectl get pods -n rda-fabric -l app_component=rda-worker | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
Note
Wait 120 seconds between each RDAF worker service upgrade, repeating Step-2 through Step-6 above for the rest of the RDAF worker service PODs.
Step-7: Please wait for 120 seconds to let the newer version of the RDA Worker service PODs join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service PODs.
+------------+----------------+-----------------+--------------+-------+
| Name       | Host           | Status          | Container Id | Tag   |
+------------+----------------+-----------------+--------------+-------+
| rda-worker | 192.168.131.45 | Up 19 Hours ago | 6360f61b4249 | 3.4.1 |
| rda-worker | 192.168.131.44 | Up 19 Hours ago | 806b7b334943 | 3.4.1 |
+------------+----------------+-----------------+--------------+-------+
Step-8: Run the below command to check that all RDA Worker services have an ok status and do not throw any failure messages.
- Upgrade RDA Worker Services
Please run the below command to initiate upgrading the RDA Worker service PODs.
Note
If the worker is deployed in a proxy environment, please add the required proxy environment variables in /opt/rdaf/deployment-scripts/values.yaml, under the rda_worker -> env: section, instead of making changes to worker.yaml (this is needed only if there are new changes required for the worker).
Please wait for 120 seconds to let the newer version of RDA Worker service containers join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service containers.
+------------+----------------+------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+------------+----------------+------------+--------------+-------+
| rda_worker | 192.168.133.96 | Up 2 hours | 03061dd8dfcc | 3.4.1 |
| rda_worker | 192.168.133.92 | Up 2 hours | cbb31b875cf6 | 3.4.1 |
+------------+----------------+------------+--------------+-------+
1.4 Post Upgrade Steps
1.4.1 OIA
1. Deploy latest Alerts and Incidents Dashboard configuration
Go to Main Menu --> Configuration --> RDA Administration --> Bundles --> Select oia_l1_l2_bundle and Click on Deploy action to deploy the latest Dashboards configuration for Alerts and Incidents.
Warning
It is mandatory to deploy the oia_l1_l2_bundle (Alerts and Incidents Dashboards configuration) as the existing dashboard configuration for the same has been deprecated.
After deploying the oia_l1_l2_bundle, the main incident landing page is oia-incidents-os-template. From this page, you can drill down into any incident, which takes you to incident-details-app. Within each Incident dashboard page, the below pages are enabled by default, irrespective of whether the corresponding features are configured.
- Alerts
- Topology
- Metrics
- Insights
- Collaboration
- Diagnostics
- Remediation
- Activities
Within each Incident page, the Alerts and Collaboration pages are mandatory, while the rest of the pages are optional until they are configured within the system.
If you need to remove these optional pages from the default Incident's view dashboard, please follow the below steps.
Go to Main Menu --> Configuration --> RDA Administration --> Dashboards --> User Dashboards --> Edit JSON config of incident-details-app dashboard and delete the below highlighted JSON configuration blocks.
....
....
"dashboard_pages": [
{
"name": "incident-details-alerts",
"label": "Alerts",
"icon": "alert.svg"
},
{
"name": "incident-details-topology",
"label": "Topology",
"icon": "topology.svg"
},
{
"name": "incident-details-metrics",
"label": "Metrics",
"icon": "metrics.svg"
},
{
"name": "incident-details-insights",
"label": "Insights",
"icon": "nextSteps.svg"
},
{
"name": "incident-details-collaboration",
"label": "Collaboration",
"icon": "collaboration.svg"
},
{
"name": "incident-details-diagnostics",
"label": "Diagnostics",
"icon": "diagnostic.svg"
},
{
"name": "incident-details-remediation",
"label": "Remediation",
"icon": "remedial.svg"
},
{
"name": "incident-details-activities",
"label": "Activities",
"icon": "activities.svg"
}
....
....
Note
Please note that these deleted configuration blocks for Topology, Metrics, Insights, Diagnostics, Remediation and Activities can be added back once the corresponding features are configured within the system.
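As an alternative to hand-editing the JSON in the UI, the pruning described above can be sketched offline. This is a minimal sketch, assuming the dashboard JSON is available locally; the inline sample config is abridged from the snippet above, and keeping only the mandatory Alerts and Collaboration pages follows the text:

```shell
# Sketch: keep only the mandatory Alerts and Collaboration entries in the
# "dashboard_pages" list of an incident-details-app JSON config.
# The inline sample is abridged; in practice, load the exported JSON file.
result=$(python3 - <<'EOF'
import json

keep = {"incident-details-alerts", "incident-details-collaboration"}
config = {
    "dashboard_pages": [
        {"name": "incident-details-alerts", "label": "Alerts", "icon": "alert.svg"},
        {"name": "incident-details-topology", "label": "Topology", "icon": "topology.svg"},
        {"name": "incident-details-collaboration", "label": "Collaboration", "icon": "collaboration.svg"},
        {"name": "incident-details-remediation", "label": "Remediation", "icon": "remedial.svg"},
    ]
}
# Drop every optional page that is not in the keep set.
config["dashboard_pages"] = [p for p in config["dashboard_pages"] if p["name"] in keep]
print(json.dumps([p["name"] for p in config["dashboard_pages"]]))
EOF
)
echo "$result"
```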
1.4.2 Migrating Custom Built Dashboards
1.4.2.1 For Alert Dashboards
Below are the changes made to the Alerts Dashboards:
1. Moved the alerts tabular report to oia-alert-os.json and reused it across all the alerts dashboards.
2. Update the Clear Alerts (BULK CLEAR) action context with the below context if an individual alerts tabular report defined in the respective dashboard is being used.
a) Oia-Alert-OS
Add the below variable in template_variables. This is needed to fix the bulk clear alerts issue.
"SOURCE_DASHBOARD_ID": {
"contextId": "sourceDashboardId",
"default": "user-dashboard-oia-alerts-os"
}
- Update Clear Alerts Action Context With Below Context
- Noise Reduction
add/update extra_filter
- Incidents Without Policy
add/update extra_filter
- Alerts without Policy
add/update extra_filter
b) Oia-Alert-Groups-Policy-OS
"SOURCE_DASHBOARD_ID": {
"default": "user-dashboard-oia-alert-groups-policy-os",
"contextId": "sourceDashboardId"
}
c) Oia-View-Alerts-Policy-OS
"SOURCE_DASHBOARD_ID": {
"default": "user-dashboard-oia-view-alerts-policy-os",
"contextId": "sourceDashboardId"
}
d) Alert-Trail
"SOURCE_DASHBOARD_ID": {
"default": "user-dashboard-incident-details-alerts",
"contextId": "sourceDashboardId"
},
e) Incident-Details-Alerts
"SOURCE_DASHBOARD_ID": {
"default": "user-dashboard-incident-details-alerts",
"contextId": "sourceDashboardId"
}
f) Oia-Alert-Group-View-Alerts-OS
"SOURCE_DASHBOARD_ID": {
"default": "user-dashboard-oia-alert-group-view-alerts-os",
"contextId": "sourceDashboardId"
}
g) OIA-Alert-Group-View-details-OS-V2
"SOURCE_DASHBOARD_ID": {
"default": "user-dashboard-oia-alert-group-view-details-os-v2",
"contextId": "sourceDashboardId"
}
h) OIA-Alert-Group-View-Details-OS
"SOURCE_DASHBOARD_ID": {
"default": "user-dashboard-oia-alert-group-view-details-os",
"contextId": "sourceDashboardId"
}
Note
For items (b) through (h): if an individual Alerts tabular report defined in the dashboard definition shows the BULK CLEAR action, please refer to point No. 2 above to update the context.
i) OIA-Alert-Groups-OS
"SOURCE_DASHBOARD_ID": {
"default": "user-dashboard-oia-alert-groups-os",
"contextId": "sourceDashboardId"
}
j) Update Clear Alerts Action Context With Below Context
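A quick way to sanity-check that a custom alert dashboard carries the SOURCE_DASHBOARD_ID variable described in items (a) through (i) is to inspect its template_variables. A minimal sketch; the inline template_variables dict mirrors the Oia-Alert-OS snippet above, and the startswith check on the "user-dashboard-" prefix is an assumption based on the defaults shown in this section:

```shell
# Sketch: verify a dashboard's template_variables contain the
# SOURCE_DASHBOARD_ID entry with the expected contextId and default prefix.
result=$(python3 - <<'EOF'
# Inline sample mirroring the Oia-Alert-OS snippet; in practice, parse the
# dashboard's exported JSON config.
template_variables = {
    "SOURCE_DASHBOARD_ID": {
        "contextId": "sourceDashboardId",
        "default": "user-dashboard-oia-alerts-os",
    }
}
var = template_variables.get("SOURCE_DASHBOARD_ID", {})
ok = (var.get("contextId") == "sourceDashboardId"
      and var.get("default", "").startswith("user-dashboard-"))
print("ok" if ok else "missing")
EOF
)
echo "$result"
```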
1.4.2.2 For Incident Dashboards
Below are the changes made to the Incidents Dashboards:
1. Change "appName" from "incident-details" to "user-dashboard/incident-details-app"
2. Locate the column name "i_cfx_state" and remove it from the list of columns, group filters, etc.
a) Incident-Topology
Add "auto_group": false after "stack_type": "OIA"
b) l1-service-health.json, l2-l3-service-health.json
From contextParamList -> contextParams remove { "paramKey": "project_id", "paramId": "id"}
c) Oia-Incidents-Os-Template
1. Remove the actions with the titles "Collect Data" and "Share".
2. Add "auto_group": false after "stack_type": "OIA".
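The incident-dashboard edits above (renaming appName, removing the i_cfx_state column, and dropping the project_id context param) can be sketched together as one pass over the dashboard JSON. A minimal sketch; the sample column names and the incident_id context param below are hypothetical placeholders, not taken from a real export:

```shell
# Sketch: apply the three incident-dashboard migration edits to a sample
# dashboard structure. Field layout is assumed from the instructions above;
# "severity", "status", and the incident_id param are hypothetical.
result=$(python3 - <<'EOF'
dashboard = {
    "appName": "incident-details",
    "columns": ["i_cfx_state", "severity", "status"],
    "contextParamList": {"contextParams": [
        {"paramKey": "project_id", "paramId": "id"},
        {"paramKey": "incident_id", "paramId": "iid"},
    ]},
}
# 1. Rename appName to the new user-dashboard path.
dashboard["appName"] = "user-dashboard/incident-details-app"
# 2. Remove the deprecated i_cfx_state column.
dashboard["columns"] = [c for c in dashboard["columns"] if c != "i_cfx_state"]
# 3. Drop the project_id context param.
params = dashboard["contextParamList"]["contextParams"]
dashboard["contextParamList"]["contextParams"] = [
    p for p in params if p.get("paramKey") != "project_id"
]
print(dashboard["appName"], len(dashboard["columns"]),
      len(dashboard["contextParamList"]["contextParams"]))
EOF
)
echo "$result"
```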