Reference Guide
High Availability

Watchdog Mechanism

Each watchdog performs the following actions:

  • Controls its own Tomcat service and database, and writes information to its own logs.
  • Communicates with the other watchdogs and notifies them via heartbeat if a Tomcat is down.
  • Logs the connection time to each instance.

If a master watchdog is down:

  • The next member takes on the mastership after checking the down status.

If a master instance Tomcat is down:

  • The watchdog notifies other watchdogs.
  • The next member takes on the mastership directly.
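
The two failover rules above can be sketched as a small counter plus a priority lookup. This is a minimal illustration, not the product's implementation: the class `MasterMonitor`, the function `next_master`, and the instance names are hypothetical, and the sketch assumes that a lower priority number means an earlier position in the priority order. The default of three checks follows the value this guide gives for the watchdog-down case.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MasterMonitor:
    """Counts consecutive failed checks of the master watchdog (hypothetical helper)."""

    down_check_threshold: int = 3  # the guide mentions three checks, configurable
    consecutive_failures: int = 0

    def record_check(self, master_answered: bool) -> bool:
        """Record one check; return True once the member should take on the mastership."""
        if master_answered:
            # Any answer from the master resets the counter.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.down_check_threshold


def next_master(priorities: Dict[str, int], failed: str) -> Optional[str]:
    """Pick the next member by priority, skipping the failed master.

    Assumption: a lower number means a higher priority.
    """
    candidates = {name: prio for name, prio in priorities.items() if name != failed}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)
```

When the master Tomcat is down but its watchdog is still running, the master watchdog notifies the others, so a member could call `next_master` directly without waiting for the counter to reach the threshold.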

Detailed table:

Service                        Master  Member  Action
-----------------------------  ------  ------  ----------------------------------------
Tomcat                         Up      Down    Watchdog takes no extra action.
Tomcat                         Down    Up      1. Master watchdog sends a fail acknowledgment to another watchdog.
                                               2. Member watchdog initiates a failover scenario.
Tomcat                         Down    Down    Watchdog takes no extra action.
Watchdog                       Up      Down    Watchdog takes no extra action.
Watchdog                       Down    Up      1. Member takes on the mastership after checking the down status three times (configurable).
                                               2. Member watchdog initiates a failover scenario.
Watchdog                       Down    Down    Watchdog takes no extra action.
Watchdog, Tomcat (happy path)  Up      Up      Master and member watchdogs send heartbeat and acknowledgment information to each other and log all communication.
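
The table above can be read as a lookup from (service, master state, member state) to an action. The sketch below encodes it that way. The names `State`, `ACTIONS`, and `action_for`, and the short action strings, are illustrative and not part of the product.

```python
from enum import Enum


class State(Enum):
    UP = "Up"
    DOWN = "Down"


NO_ACTION = "no extra action"
NOTIFY_AND_FAILOVER = "master watchdog notifies; member watchdog initiates failover"
CHECK_AND_FAILOVER = "member takes mastership after repeated down checks; initiates failover"
HEARTBEAT = "exchange heartbeat and acknowledgments; log all communication"

# One entry per non-happy-path row of the table: (service, master, member) -> action.
ACTIONS = {
    ("tomcat", State.UP, State.DOWN): NO_ACTION,
    ("tomcat", State.DOWN, State.UP): NOTIFY_AND_FAILOVER,
    ("tomcat", State.DOWN, State.DOWN): NO_ACTION,
    ("watchdog", State.UP, State.DOWN): NO_ACTION,
    ("watchdog", State.DOWN, State.UP): CHECK_AND_FAILOVER,
    ("watchdog", State.DOWN, State.DOWN): NO_ACTION,
}


def action_for(service: str, master: State, member: State) -> str:
    """Return the action for one row of the table."""
    if master is State.UP and member is State.UP:
        # Happy path: applies to both Tomcat and watchdog.
        return HEARTBEAT
    return ACTIONS[(service, master, member)]
```

Failover is triggered only when the master side is down while the member side is up. Every other degraded combination leaves the current roles unchanged.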

Three services support the following actions:

  • A service checks the Tomcat status. It sends a request to the DB; if it receives an answer (that is, if there is a connection), the service returns OK.
  • A service supplies the parameters, including instance information and threshold values related to the watchdog mechanism:
      • Instance name, IP, and priority order from the t_instance table.
      • The threshold for how many times a watchdog checks again when it cannot get an answer from another watchdog, and the interval at which the watchdogs communicate. These two values are stored in the t_system_parameter table and start with the watchdog.
  • A service updates the master instance information.
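
The three services can be sketched as three small functions. Only the table names `t_instance` and `t_system_parameter` come from this guide. Everything else is an assumption made for illustration: the column names (`name`, `ip`, `priority`, `is_master`, `value`), the parameter names, the reading that the watchdog parameter names begin with `watchdog`, and the use of an in-memory SQLite database as a stand-in for the product's real database.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory stand-in database (schema and rows are assumptions)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t_instance "
        "(name TEXT, ip TEXT, priority INTEGER, is_master INTEGER DEFAULT 0)"
    )
    conn.execute("CREATE TABLE t_system_parameter (name TEXT, value TEXT)")
    conn.executemany(
        "INSERT INTO t_instance (name, ip, priority) VALUES (?, ?, ?)",
        [("pam-b", "10.0.0.2", 2), ("pam-a", "10.0.0.1", 1)],
    )
    conn.executemany(
        "INSERT INTO t_system_parameter VALUES (?, ?)",
        [
            ("watchdog.check.threshold", "3"),   # hypothetical parameter name
            ("watchdog.interval.seconds", "10"),  # hypothetical parameter name
            ("other.param", "x"),
        ],
    )
    return conn


def check_status(conn: sqlite3.Connection) -> str:
    """Service 1: return "OK" if the database answers a trivial query."""
    try:
        conn.execute("SELECT 1").fetchone()
        return "OK"
    except sqlite3.Error:
        return "DOWN"


def load_watchdog_parameters(conn: sqlite3.Connection) -> dict:
    """Service 2: return instance information and watchdog thresholds."""
    instances = conn.execute(
        "SELECT name, ip, priority FROM t_instance ORDER BY priority"
    ).fetchall()
    parameters = dict(
        conn.execute(
            "SELECT name, value FROM t_system_parameter WHERE name LIKE 'watchdog%'"
        ).fetchall()
    )
    return {"instances": instances, "parameters": parameters}


def update_master(conn: sqlite3.Connection, instance_name: str) -> None:
    """Service 3: mark exactly one instance as master."""
    conn.execute("UPDATE t_instance SET is_master = (name = ?)", (instance_name,))
    conn.commit()
```

In a real deployment these would be remote service calls between instances rather than local functions. The sketch only shows the data each service reads or writes.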

Watchdogs periodically check replication between the Kron PAM instances. Along with the replication status, they can also check SAPM jobs and stop them if necessary.

Disabled SAPM job scenarios are outlined in the following table:

Service                                         Master  Member  Action
----------------------------------------------  ------  ------  ----------------------------------------
Tomcat & Database                               Up      Down    Each watchdog stops SAPM jobs in its own instance.
Replication Status                              Up      Down    Each watchdog stops SAPM jobs in its own instance.
Watchdog                                        Up      Down    Active watchdogs stop SAPM jobs in their own instances.
Tomcat & Database                               Down    Up      Each watchdog stops SAPM jobs in its own instance.
Replication Status                              Down    Up      Each watchdog stops SAPM jobs in its own instance.
Watchdog                                        Down    Up      Active watchdogs stop SAPM jobs in their own instances.
Tomcat & Database and Replication (happy path)  Up      Up      Master and member watchdogs keep watching services and log all communications.
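
The rule behind the table above can be summarized as: if any monitored service is down on either side, every running watchdog stops the SAPM jobs in its own instance. The sketch below expresses that rule. The names `ClusterView`, `should_stop_sapm_jobs`, and `watchdogs_that_act` are hypothetical. Treating a service that is down on both sides the same as one that is down on one side is an assumption, since the table does not list that case.

```python
from typing import Dict, List, Tuple

# service name -> (up on master, up on member)
ClusterView = Dict[str, Tuple[bool, bool]]

MONITORED = ("tomcat_db", "replication", "watchdog")


def should_stop_sapm_jobs(view: ClusterView) -> bool:
    """True if any monitored service is down on the master or the member."""
    return any(not (view[s][0] and view[s][1]) for s in MONITORED)


def watchdogs_that_act(view: ClusterView) -> List[str]:
    """Return the sides whose watchdog stops its local SAPM jobs.

    Only a running ("active") watchdog can act, so a side whose
    watchdog is down is left out.
    """
    if not should_stop_sapm_jobs(view):
        return []  # happy path: keep watching and logging
    master_wd, member_wd = view["watchdog"]
    return [side for side, up in (("master", master_wd), ("member", member_wd)) if up]
```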