US20050188283A1 - Node management in high-availability cluster - Google Patents

Publication number
US20050188283A1
Authority
US
United States
Prior art keywords
node
status
cluster
inter
heartbeat
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US10/764,244
Other versions
US6928589B1 (en
Inventor
Ken Pomaranski
Andrew Barr
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hewlett Packard Enterprise Development LP
Original Assignee
Hewlett Packard Development Co LP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Development Co LP
Priority to US10/764,244 (US6928589B1)
Assigned to HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P. Assignment of assignors interest (see document for details). Assignors: BARR, ANDREW HARVEY; POMARANSKI, KEN GARY
Priority to GB0501119A (GB2410406B)
Priority to JP2005012196A (JP2005209201A)
Application granted
Publication of US6928589B1
Publication of US20050188283A1
Assigned to HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP. Assignment of assignors interest (see document for details). Assignors: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.
Anticipated expiration
Current legal status: Expired - Lifetime

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04Q SELECTING
    • H04Q3/00 Selecting arrangements
    • H04Q3/0016 Arrangements providing connection between exchanges
    • H04Q3/0062 Provisions for network management
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0811 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking connectivity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04Q SELECTING
    • H04Q3/00 Selecting arrangements
    • H04Q3/0016 Arrangements providing connection between exchanges
    • H04Q3/0062 Provisions for network management
    • H04Q3/0087 Network testing or monitoring arrangements

Definitions

  • the present disclosure relates generally to computer networks. More particularly, the present disclosure relates to clusters of interconnected computer systems.
  • a high-availability (HA) cluster is a parallel or distributed system that comprises a collection of interconnected computer systems or servers that is used as a single, unified computing unit. Members of a cluster are referred to as nodes or systems.
  • the cluster service is the collection of software on each node that manages cluster-related activity.
  • the cluster service sees all resources as identical objects. Resources may include physical hardware devices, such as disk drives and network cards, or logical items, such as logical disk volumes, TCP/IP addresses, entire applications and databases, among other examples.
  • a group is a collection of resources to be managed as a single unit. Generally, a group contains all of the components that are necessary for running a specific application and allowing a user to connect to the service provided by the application. Operations performed on a group typically affect all resources contained within that group. By coupling two or more servers together, clustering increases the system availability, performance, and capacity for network systems and applications.
  • Clustering may be used for parallel processing or parallel computing to simultaneously use two or more CPUs to execute an application or program.
  • Clustering is a popular strategy for implementing parallel processing applications because it allows system administrators to leverage already existing computers and workstations. Because it is difficult to predict the number of requests that will be issued to a networked server, clustering is also useful for load balancing to distribute processing and communications activity evenly across a network system so that no single server is overwhelmed. If one server is running the risk of being swamped, requests may be forwarded to another clustered server with greater capacity. For example, busy Web sites may employ two or more clustered Web servers in order to employ a load balancing scheme. Clustering also provides for increased scalability by allowing new components to be added as the system load increases.
  • clustering simplifies the management of groups of systems and their applications by allowing the system administrator to manage an entire group as a single system.
  • Clustering may also be used to increase the fault tolerance of a network system. If one server suffers an unexpected software or hardware failure, another clustered server may assume the operations of the failed server. Thus, if any hardware or software component in the system fails, the user might experience a performance penalty, but will not lose access to the service.
  • MSCS Microsoft Cluster Server
  • NWCS Novell Netware Cluster Services
  • Clustering may also be implemented in computer networks utilizing storage area networks (SAN) and similar networking environments.
  • SAN networks allow storage systems to be shared among multiple clusters and/or servers.
  • the storage devices in a SAN may be structured, for example, in a RAID configuration.
  • clustered nodes may use a heartbeat mechanism to monitor the health of each other.
  • a heartbeat is a signal that is sent by one clustered node to another clustered node.
  • Heartbeat signals are typically sent over an Ethernet or similar network, where the network is also utilized for other purposes.
  • Failure of a node is detected when an expected heartbeat signal is not received from the node.
  • the clustering software may, for example, transfer the entire resource group of the failed node to another node.
  • a client application affected by the failure may detect the failure in the session and reconnect in the same manner as the original connection.
  • if a heartbeat signal is received from a node of the cluster, then that node is normally defined to be in an “up” state. In the up state, the node is presumed to be operating properly. On the other hand, if the heartbeat signal is no longer received from a node, then that node is normally defined to be in a “down” state. In the down state, the node is presumed to have failed.
  • One embodiment disclosed herein pertains to a method of status generation for a node of a high-availability cluster.
  • a heartbeat signal is sent from the node through a network to the cluster.
  • a current status of the node is determined, and the status is sent out through a specialized interface to a next node.
  • Another embodiment disclosed herein pertains to a method of cluster-wide management performed per node.
  • a heartbeat input received from the previous node is checked.
  • an up/down status input received from the previous node and a degraded status input received from the previous node are also checked.
  • Another embodiment disclosed herein pertains to a system for a high-availability cluster.
  • the system includes a general inter-node communication network that is configured to carry signals including heartbeat signals from the nodes.
  • a separate inter-node communication channel is included for communicating node status signals.
  • FIG. 1 is a schematic diagram depicting a conventional high-availability cluster.
  • FIG. 2 is a schematic diagram depicting a representative high-availability cluster in accordance with an embodiment of the invention.
  • FIG. 3 is a flow chart depicting a node status generation process in accordance with an embodiment of the invention.
  • FIG. 4 is a flow chart depicting a cluster-wide management process in accordance with an embodiment of the invention.
  • FIGS. 5A and 5B are flow charts depicting a logical analysis procedure in accordance with an embodiment of the invention.
  • FIG. 6 is a flow chart depicting a removal procedure in accordance with an embodiment of the invention.
  • FIG. 1 is a schematic diagram depicting a conventional high-availability cluster 100 .
  • the conventional cluster 100 includes multiple nodes 102 and a network or network mesh 104 (typically an Ethernet network) interconnecting the nodes.
  • heartbeat signals are sent from the nodes to the cluster over the network 104 .
  • all nodes provide a heartbeat signal through an Ethernet (or other networking) interface. All nodes in the cluster monitor these signals. If a node determines (or several nodes determine) that a node has stopped sending heartbeat signals, then that node is “removed” from the HA cluster.
  • the communications path uses relatively slow and high-overhead connections between the nodes (Ethernet, for instance).
  • the conventional approach defines an “up” state in which the node is sending heartbeat signals to the cluster, and a “down” state in which the node fails to generate these heartbeat signals. This is disadvantageous in that a node can still send heartbeats even if a target critical application is “down.”
  • a missed heartbeat signal is ambiguous in that it may be due to any number of causes (for instance, either node or interconnect failure).
  • the above problems and disadvantages result in inefficient cluster-level software and sub-optimum uptime.
  • the efficiency (i.e. uptime) of an HA cluster is largely determined by the amount of time it takes for the cluster system to recognize that a node in the cluster is in a “down” state.
  • a node is in a down state when it loses its ability to perform useful computing or storage functions for the HA cluster.
  • the HA clustering software can perform the necessary tasks to keep the rest of the cluster running, with little interruption of user tasks.
  • the efficiency can also be limited by the number of unnecessary switchovers in an HA cluster, as each switchover event ‘costs’ some cluster-level uptime.
  • the ‘split-brain’ situation should be avoided for an HA cluster to perform correctly.
  • ‘Split brain’ is the situation (known by those skilled in the art) that results when a node that is thought to be ‘down’ really is not ‘down’. Such a situation can result in data loss and/or failure of an HA cluster. Accuracy in node state determination is key to assuring that ‘split brain’ does not occur in an HA cluster.
  • the disclosure of the present application addresses some of the problems and disadvantages with the conventional approach.
  • the very harmful ‘split-brain’ situation is avoided since the invention has built-in mechanisms for quickly and accurately double (or triple) checking node status when it looks like a node may be down.
  • FIG. 2 is a schematic diagram depicting a representative high-availability cluster 200 in accordance with an embodiment of the invention.
  • Four nodes 202 are shown in the diagram, but various numbers of nodes may be used within the scope of the invention.
  • each node 202 may send status information over a communication link 206 to the next node 202 in the ring (going clockwise in the illustrated example) and may receive status information over another link 206 from the previous node 202 in the ring.
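The ring arrangement described above can be illustrated with a minimal sketch. It assumes the nodes are simply named A through D and held in an ordered list; the function name `ring_neighbors` is hypothetical and not part of the disclosure.

```python
def ring_neighbors(nodes, name):
    """Return the (previous, next) neighbors of `name` in a ring of node names.

    The list order is assumed to match the clockwise order of the ring.
    """
    i = nodes.index(name)
    previous_node = nodes[(i - 1) % len(nodes)]  # node whose status is received
    next_node = nodes[(i + 1) % len(nodes)]      # node the status is sent to
    return previous_node, next_node


NODES = ["A", "B", "C", "D"]  # four nodes, as in the illustrated example

for node in NODES:
    previous_node, next_node = ring_neighbors(NODES, node)
    print(f"node {node}: receives status from {previous_node}, sends status to {next_node}")
```

The modulo arithmetic closes the ring, so the last node in the list sends its status to the first.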
  • the additional channel for status communications allows for rapid and reliable exchange of node status data.
  • cluster-level software runs on each node 202 .
  • Each node 202 may be configured to provide the following resources to the cluster-level software.
  • Hardware resources include a processor or processors, a memory system, disk-based or similar data storage, an interface to the network 104 interconnecting the nodes, and the dedicated signaling hardware 204 for inter-node status signaling.
  • Software resources include routines to perform the following tasks: updating and transmitting the status of the present node 202 ; monitoring and analyzing status data from another node 202 in the cluster; and taking appropriate action based on the status data from the node 202 being monitored.
  • the computational subsystem of the node may, or may not, be running a mission-critical software application. If it is, then the mission-critical application is listed in a configuration file of the cluster-level software.
  • the node status signals may include the following: an up/down status signal; a degraded status signal, and a heartbeat signal.
  • the heartbeat signal may be transmitted conventionally via the network 104 so that the heartbeat information of all nodes in the HA cluster is on the network 104 .
  • the up/down and degraded status signals may be transmitted and received separately via the additional signaling hardware 204 and independent communication links 206 .
  • up (or GOOD) indicates that the node is operating
  • down (or BAD) indicates that the node has failed.
  • the degraded status signal may comprise a two-state signal having DEGRADED and NOT_DEGRADED states. Alternatively, the degraded status signal may include multiple degradation levels.
  • FIG. 3 is a flow chart depicting a node status generation process 300 in accordance with an embodiment of the invention. This process 300 occurs at each active node of the cluster.
  • Each active node determines 302 its current up/down status. This determination may be accomplished by applying rules in a rule file stored in memory or on disk at the present node. An up (or GOOD) status indicates that the node is operating, and a down (or BAD) status indicates that the present node has failed. The up/down status data is then sent out 304 from the present node through the specialized hardware interface 204 to the next node in the cluster. For example, in the case of the topology of FIG.
  • node A 202 A would send its up/down status data to node B 202 B
  • node B 202 B would send its up/down status data to node C 202 C
  • node C 202 C would send its up/down status data to node D 202 D
  • node D 202 D would send its up/down status data to node A 202 A.
  • Each node also sends 306 its heartbeat signal to the cluster. This is conventionally done via the network 104 .
  • each active node determines 308 its current degraded status (or level). This determination may be accomplished by applying rules in a rule file stored in memory or on disk at the present node. For example, the degraded levels may be indicated by a multiple bit signal wherein all zeroes may indicate a failed (down or BAD) node, all ones may indicate that no degradation was detected, and non-zero values (some zeroes and some ones) may indicate a level of degradation between failure and no degradation.
  • the degraded status data is then sent out 310 from the present node through the specialized hardware interface 204 to the next node in the cluster. For example, in the case of the topology of FIG.
  • node A 202 A would send its degraded status data to node B 202 B
  • node B 202 B would send its degraded status data to node C 202 C
  • node C 202 C would send its degraded status data to node D 202 D
  • node D 202 D would send its degraded status data to node A 202 A.
  • the process 300 then loops from the last step 310 to the first step 302 . Note that, although an exemplary order for the steps in the process 300 is shown, variations of the order are possible with same or similar result.
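One pass through the status generation process 300 might be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the 4-bit width of the degraded signal, the callback-based transport, and all function names are hypothetical. Only the step order (302 to 310) and the all-zeroes/all-ones convention come from the description above.

```python
DEGRADED_BITS = 4                          # assumed width of the degraded signal
NOT_DEGRADED = (1 << DEGRADED_BITS) - 1    # all ones: no degradation detected
FAILED = 0                                 # all zeroes: failed (down or BAD) node


def classify_degraded(level):
    """Interpret a multi-bit degraded level."""
    if level == FAILED:
        return "BAD"
    if level == NOT_DEGRADED:
        return "GOOD"
    if 0 < level < NOT_DEGRADED:
        return "DEGRADED"
    raise ValueError(f"level {level} does not fit in {DEGRADED_BITS} bits")


def run_status_cycle(is_up, degraded_level, send_status, send_heartbeat):
    """Run steps 302-310 once; the caller is expected to loop."""
    up_down = "GOOD" if is_up() else "BAD"   # step 302: determine up/down status
    send_status("up_down", up_down)          # step 304: send via status interface
    send_heartbeat()                         # step 306: heartbeat via the network
    level = degraded_level()                 # step 308: determine degraded level
    send_status("degraded", level)           # step 310: send via status interface


# Record what would be sent, in order, for a healthy but degraded node.
sent = []
run_status_cycle(
    is_up=lambda: True,
    degraded_level=lambda: 0b1011,
    send_status=lambda kind, value: sent.append((kind, value)),
    send_heartbeat=lambda: sent.append(("heartbeat", None)),
)
print(sent)
```

Passing the rule evaluation and the transports in as callables keeps the sketch independent of any particular rule file format or signaling hardware.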
  • FIG. 4 is a flow chart depicting a cluster-wide management process 400 in accordance with an embodiment of the invention.
  • the process 400 involves steps performed at a present node and relates to the management of a previous node in the cluster. For example, node B 202 B would perform steps relating to the management of node A 202 A, node C 202 C would perform steps relating to the management of node B 202 B, and so on.
  • the process 400 is set up by retrieving 402 a configuration file for a previous node from that previous node, and storing 404 that configuration file at the present node.
  • the configuration file includes various information, such as the application(s) needing to be failed over from the previous node in the event that the node is removed from the cluster.
  • the present node checks 405 whether the configuration file for the previous node is up-to-date (i.e. has not been updated since it was last retrieved). If it is not up-to-date, then the process 400 loops back to the step where the file is retrieved 402 . If it is up-to-date, then the process 400 goes on to the following steps.
  • the node removal threshold may be determined from a ruleset of the cluster system. This threshold indicates to the system the level of degradation at which a node will be proactively removed from the HA cluster.
  • the threshold may be set or varied by the user. The threshold may also vary depending on how many nodes have been already removed from the HA cluster.
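A threshold that varies with the number of nodes already removed could look like the sketch below. The specific policy (tolerating one more unit of degradation per removed node, capped at a maximum) is an assumption made for illustration; the description only states that the threshold may vary. The names and the numeric scale are hypothetical.

```python
def removal_threshold(base_threshold, nodes_removed, step=1, max_threshold=14):
    """Return the amount of degradation tolerated before proactive removal.

    Assumed policy: the fewer nodes remain, the more degradation is tolerated,
    so that a shrinking cluster does not remove its last usable nodes.
    """
    return min(base_threshold + step * nodes_removed, max_threshold)


def should_remove(degradation_amount, base_threshold, nodes_removed):
    """True when the degradation amount exceeds the current threshold."""
    return degradation_amount > removal_threshold(base_threshold, nodes_removed)


print(should_remove(6, base_threshold=5, nodes_removed=0))
print(should_remove(6, base_threshold=5, nodes_removed=2))
```

With no nodes removed, a degradation amount of 6 exceeds the base threshold of 5; after two removals the threshold rises to 7 and the same node is kept.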
  • the present node reads 408 the up/down status input received from the previous node.
  • the present node also reads 410 the degraded status input received from the previous node. Both the up/down status signal and the degraded status signal may be received via a dedicated communication link or cable 206 between the nodes.
  • the present node also checks 412 the heartbeat input received from the previous node. The heartbeat signal may be received by way of a conventional network 104 interconnecting the nodes.
  • the present node performs a logical analysis 414 using these status-related inputs.
  • the logical analysis 414 determines, for example, whether the inputs indicate that the preceding node is up, whether they indicate that the preceding node is down (failed), and whether they indicate that there is an interconnect problem.
  • One embodiment for the analysis procedure 414 is described below in relation to FIGS. 5A and 5B .
  • a determination 418 is made as to whether removal of the preceding node was indicated by the analysis. If the preceding node is to be removed, then a removal procedure 420 is run. One embodiment for the removal procedure 420 is described below in relation to FIG. 6 . Otherwise the management process 400 loops back to the step where a check 405 is made as to whether the configuration file for the preceding node has been updated.
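The control flow of the management process 400 can be sketched as a bounded loop. The class `FakePreviousNode`, the dictionary-shaped configuration, and the `max_passes` bound are all hypothetical stand-ins so that the example runs on its own; the step numbers in the comments refer to the description above.

```python
class FakePreviousNode:
    """Hypothetical stand-in for the previous node in the ring."""

    def __init__(self, readings):
        self._readings = iter(readings)
        self.config_version = 1

    def fetch_config(self):
        return {"version": self.config_version, "applications": ["db"]}

    def config_is_current(self, config):
        return config["version"] == self.config_version

    def read_inputs(self):
        # Returns (UP_IN, DEGRADED_IN, HEARTBEAT_IN) for one pass.
        return next(self._readings)


def manage_previous_node(prev, analysis_says_remove, remove, max_passes):
    """Monitor the previous node; return True if it was removed."""
    config = prev.fetch_config()                      # steps 402 and 404
    for _ in range(max_passes):
        if not prev.config_is_current(config):        # step 405
            config = prev.fetch_config()              # back to step 402
            continue
        up_in, degraded_in, heartbeat_in = prev.read_inputs()  # steps 408-412
        if analysis_says_remove(up_in, degraded_in, heartbeat_in):  # 414, 418
            remove(config)                            # removal procedure 420
            return True
    return False


removed = []
prev = FakePreviousNode([("GOOD", 15, "OK"), ("BAD", 0, "OK")])
result = manage_previous_node(
    prev,
    analysis_says_remove=lambda up, degraded, hb: up == "BAD" and degraded == 0,
    remove=lambda config: removed.append(config["applications"]),
    max_passes=5,
)
print(result, removed)
```

A real monitor would loop indefinitely; the bound is there only to keep the example finite.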
  • FIGS. 5A and 5B are flow charts depicting a logical analysis procedure 414 in accordance with an embodiment of the invention.
  • the up/down status input from the previous node is denoted as UP_IN
  • the degraded status input from the previous node is denoted as DEGRADED_IN
  • the heartbeat input from the previous node is denoted as HEARTBEAT_IN.
  • UP_IN can be in two states, GOOD or BAD.
  • DEGRADED_IN can be in multiple degradation levels, including a BAD state, a GOOD state, and levels in between BAD and GOOD.
  • the HEARTBEAT_IN can be either OK or BAD.
  • the degradation level is reported and the analysis procedure is exited 510 . If the amount of degradation is above the removal threshold, then the performance of the previous node is deemed too poor to keep in the cluster. In that case, the previous node is “killed” 512 , then failure of the previous node is indicated and the analysis procedure is exited 514 .
  • both of these status inputs indicate that the previous node is down, so it does not matter what the heartbeat input indicates. In this case, failure is indicated and the analysis procedure exits 520 .
  • the performance level of the previous node is acceptable. In that case, the degradation level is reported and the analysis procedure is exited 510 . If the amount of degradation is above the removal threshold, then the performance of the previous node is deemed too poor to keep in the cluster. In that case, the previous node is “killed” 512 , then failure of the previous node is indicated and the analysis procedure is exited 514 .
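The decisions recoverable from the passages above can be sketched as a single function. This is a partial reconstruction under assumptions, not the procedure of FIGS. 5A and 5B: the 4-bit level encoding, the outcome names, the treatment of a bad heartbeat with good status as an interconnect problem, the order of the checks, and the RECHECK fallback for disagreeing inputs are all hypothetical.

```python
GOOD, BAD = "GOOD", "BAD"
NOT_DEGRADED = 0b1111   # assumed 4-bit encoding: all ones means no degradation
FAILED_LEVEL = 0        # all zeroes means a failed node


def analyze_previous_node(up_in, degraded_in, heartbeat_ok, removal_threshold):
    """Return an outcome name for one set of status inputs."""
    # Both dedicated status inputs say the node is down: the heartbeat
    # input does not matter, and failure is indicated.
    if up_in == BAD and degraded_in == FAILED_LEVEL:
        return "FAILURE"
    if up_in == GOOD and degraded_in != FAILED_LEVEL:
        degradation = NOT_DEGRADED - degraded_in
        if degradation > removal_threshold:
            # Performance is too poor: kill the node, then indicate failure.
            return "KILL_THEN_FAILURE"
        if not heartbeat_ok:
            # Assumption: status is good but the heartbeat is missing, so the
            # general network is suspected rather than the node.
            return "INTERCONNECT_PROBLEM"
        return "REPORT_DEGRADATION"
    # Assumption: the two status inputs disagree, so they are read again.
    return "RECHECK"


print(analyze_previous_node(BAD, 0, True, removal_threshold=5))
print(analyze_previous_node(GOOD, 4, True, removal_threshold=5))
```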
  • FIG. 6 is a flow chart depicting a removal procedure 420 in accordance with an embodiment of the invention.
  • the removal procedure 420 is entered when the analysis 414 indicated failure of the previous node.
  • a determination 602 is made as to the application or applications on the previous node that need to be failed over. This information may be obtained, for example, from the above-discussed configuration file stored 404 at the present node. Fail over 604 is performed on these applications from the previous node to nodes of the cluster that are up and running. After the fail over is completed, success of the failover is signaled 606 to the other nodes of the cluster. The HA cluster is then running with the previous node removed.
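The three steps of the removal procedure (602, 604, 606) might be sketched as below. The round-robin placement of applications, the dictionary-shaped configuration, and the `notify` callback are assumptions for illustration; the description does not specify how target nodes are chosen.

```python
def remove_node(failed_node, config, up_nodes, notify):
    """Fail over the applications of `failed_node` and signal completion.

    Returns a mapping from application name to the node now hosting it.
    """
    applications = config["applications"]          # step 602: what to fail over
    if not up_nodes:
        raise RuntimeError("no running node available for fail over")
    placement = {}
    for i, app in enumerate(applications):         # step 604: fail over
        placement[app] = up_nodes[i % len(up_nodes)]   # assumed round-robin
    for node in up_nodes:                          # step 606: signal success
        notify(node, f"fail over of {failed_node} complete")
    return placement


messages = []
placement = remove_node(
    "A",
    {"applications": ["db", "web", "mail"]},
    up_nodes=["B", "C"],
    notify=lambda node, text: messages.append((node, text)),
)
print(placement)
```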
  • a node join (or re-join) procedure may be applied.
  • the procedure is as follows. If the cables for independent status communications are not connected, then those status communications are temporarily suspended throughout the HA cluster. In other words, the HA cluster falls back to a heartbeat only mode. Next, the cables are connected to the new node. Then, the status communications via the cables are restarted in the cluster.
  • once the node to join or re-join boots, it will start sending out GOOD signals through its specialized cable connection. At that point, the next node's cluster software will re-integrate the newly added node into the HA cluster.
  • the above disclosure provides a novel technique for a node in a high availability cluster to quickly and accurately determine each node's current state and to perform the appropriate action to maximize cluster uptime.
  • the use of the three status indicators (up/down, degraded, and heartbeat) from each node allows for significant improvement in the efficiency (i.e. uptime) of the HA cluster.

Abstract

One embodiment disclosed relates to a method of status generation for a node of a high-availability cluster. A heartbeat signal is sent from the node through a network to the cluster. In addition, a current status of the node is determined, and the status is sent out through a specialized interface to a next node. Another embodiment disclosed relates to a method of cluster-wide management performed per node. A heartbeat input received from the previous node is checked. Furthermore, an up/down status input received from the previous node and a degraded status input received from the previous node are also checked. Another embodiment disclosed relates to a system for a high-availability cluster. The system includes a general inter-node communication network that is configured to carry signals including heartbeat signals from the nodes. In addition, a separate inter-node communication channel is included for communicating node status signals.

Description

    BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present disclosure relates generally to computer networks. More particularly, the present disclosure relates to clusters of interconnected computer systems.
  • 2. Description of the Background Art
  • A high-availability (HA) cluster is a parallel or distributed system that comprises a collection of interconnected computer systems or servers that is used as a single, unified computing unit. Members of a cluster are referred to as nodes or systems. The cluster service is the collection of software on each node that manages cluster-related activity. The cluster service sees all resources as identical objects. Resources may include physical hardware devices, such as disk drives and network cards, or logical items, such as logical disk volumes, TCP/IP addresses, entire applications and databases, among other examples. A group is a collection of resources to be managed as a single unit. Generally, a group contains all of the components that are necessary for running a specific application and allowing a user to connect to the service provided by the application. Operations performed on a group typically affect all resources contained within that group. By coupling two or more servers together, clustering increases the system availability, performance, and capacity for network systems and applications.
  • Clustering may be used for parallel processing or parallel computing to simultaneously use two or more CPUs to execute an application or program. Clustering is a popular strategy for implementing parallel processing applications because it allows system administrators to leverage already existing computers and workstations. Because it is difficult to predict the number of requests that will be issued to a networked server, clustering is also useful for load balancing to distribute processing and communications activity evenly across a network system so that no single server is overwhelmed. If one server is running the risk of being swamped, requests may be forwarded to another clustered server with greater capacity. For example, busy Web sites may employ two or more clustered Web servers in order to employ a load balancing scheme. Clustering also provides for increased scalability by allowing new components to be added as the system load increases. In addition, clustering simplifies the management of groups of systems and their applications by allowing the system administrator to manage an entire group as a single system. Clustering may also be used to increase the fault tolerance of a network system. If one server suffers an unexpected software or hardware failure, another clustered server may assume the operations of the failed server. Thus, if any hardware or software component in the system fails, the user might experience a performance penalty, but will not lose access to the service.
  • Current cluster services include Microsoft Cluster Server (MSCS), designed by Microsoft Corporation for clustering for its Windows NT 4.0 and Windows 2000 Advanced Server operating systems, and Novell Netware Cluster Services (NWCS), among other examples. For instance, MSCS supports the clustering of two NT servers to provide a single highly available server.
  • Clustering may also be implemented in computer networks utilizing storage area networks (SAN) and similar networking environments. SAN networks allow storage systems to be shared among multiple clusters and/or servers. The storage devices in a SAN may be structured, for example, in a RAID configuration.
  • In order to detect system failures, clustered nodes may use a heartbeat mechanism to monitor the health of each other. A heartbeat is a signal that is sent by one clustered node to another clustered node. Heartbeat signals are typically sent over an Ethernet or similar network, where the network is also utilized for other purposes.
  • Failure of a node is detected when an expected heartbeat signal is not received from the node. In the event of failure of a node, the clustering software may, for example, transfer the entire resource group of the failed node to another node. A client application affected by the failure may detect the failure in the session and reconnect in the same manner as the original connection.
  • If a heartbeat signal is received from a node of the cluster, then that node is normally defined to be in an “up” state. In the up state, the node is presumed to be operating properly. On the other hand, if the heartbeat signal is no longer received from a node, then that node is normally defined to be in a “down” state. In the down state, the node is presumed to have failed.
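The conventional heartbeat rule described above can be illustrated with a small timeout-based monitor. The class name, the timeout value, and the use of explicit timestamps in seconds are assumptions made so the sketch is self-contained; the background text does not prescribe any particular timing.

```python
class HeartbeatMonitor:
    """Track the last heartbeat seen from each node (conventional scheme)."""

    def __init__(self, timeout):
        self.timeout = timeout      # seconds allowed between heartbeats
        self.last_seen = {}         # node name -> time of last heartbeat

    def record(self, node, now):
        """Note that a heartbeat from `node` arrived at time `now`."""
        self.last_seen[node] = now

    def state(self, node, now):
        """Return "up" if a heartbeat arrived within the timeout, else "down"."""
        last = self.last_seen.get(node)
        if last is None or now - last > self.timeout:
            return "down"
        return "up"


monitor = HeartbeatMonitor(timeout=3.0)
monitor.record("A", now=10.0)
print(monitor.state("A", now=12.0))   # heartbeat is 2 s old
print(monitor.state("A", now=14.5))   # heartbeat is 4.5 s old
```

Note that the "down" result only means the heartbeat is missing: this scheme cannot tell a failed node from a failed network path.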
  • SUMMARY
  • One embodiment disclosed herein pertains to a method of status generation for a node of a high-availability cluster. A heartbeat signal is sent from the node through a network to the cluster. In addition, a current status of the node is determined, and the status is sent out through a specialized interface to a next node.
  • Another embodiment disclosed herein pertains to a method of cluster-wide management performed per node. A heartbeat input received from the previous node is checked. Furthermore, an up/down status input received from the previous node and a degraded status input received from the previous node are also checked.
  • Another embodiment disclosed herein pertains to a system for a high-availability cluster. The system includes a general inter-node communication network that is configured to carry signals including heartbeat signals from the nodes. In addition, a separate inter-node communication channel is included for communicating node status signals.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a schematic diagram depicting a conventional high-availability cluster.
  • FIG. 2 is a schematic diagram depicting a representative high-availability cluster in accordance with an embodiment of the invention.
  • FIG. 3 is a flow chart depicting a node status generation process in accordance with an embodiment of the invention.
  • FIG. 4 is a flow chart depicting a cluster-wide management process in accordance with an embodiment of the invention.
  • FIGS. 5A and 5B are flow charts depicting a logical analysis procedure in accordance with an embodiment of the invention.
  • FIG. 6 is a flow chart depicting a removal procedure in accordance with an embodiment of the invention.
  • DETAILED DESCRIPTION
  • FIG. 1 is a schematic diagram depicting a conventional high-availability cluster 100. As shown, the conventional cluster 100 includes multiple nodes 102 and a network or network mesh 104 (typically an Ethernet network) interconnecting the nodes. For example, heartbeat signals are sent from the nodes to the cluster over the network 104.
  • In the conventional cluster, all nodes provide a heartbeat signal through an Ethernet (or other networking) interface. All nodes in the cluster monitor these signals. If a node determines (or several nodes determine) that a node has stopped sending heartbeat signals, then that node is “removed” from the HA cluster.
  • However, there are several problems and disadvantages with this conventional approach. First, the communications path uses relatively slow and high-overhead connections between the nodes (Ethernet, for instance). Second, the conventional approach defines an “up” state in which the node is sending heartbeat signals to the cluster, and a “down” state in which the node fails to generate these heartbeat signals. This is disadvantageous in that a node can still send heartbeats even if a target critical application is “down.” Third, a missed heartbeat signal is ambiguous in that it may be due to any number of causes (for instance, either node or interconnect failure). Fourth, there is no means for a node to send a predictive message to the remaining nodes in the cluster. Such a predictive message, for example, would allow for the HA cluster software to pro-actively remove a node before it fails, resulting in increased cluster uptime. The above problems and disadvantages result in inefficient cluster-level software and sub-optimum uptime.
  • It turns out that the efficiency (i.e. uptime) of an HA cluster is largely determined by the amount of time it takes for the cluster system to recognize that a node in the cluster is in a “down” state. A node is in a down state when it is no longer able to perform useful computing or storage functions for the HA cluster. Once it has been determined that a node is “down”, the HA clustering software can perform the necessary tasks to keep the rest of the cluster running, with little interruption of user tasks. The efficiency can also be limited by the number of unnecessary switchovers in an HA cluster, as each switchover event ‘costs’ some cluster-level uptime. Finally, the ‘split-brain’ situation should be avoided for an HA cluster to perform correctly. ‘Split brain’ is the situation (known by those skilled in the art) that results when a node that is thought to be ‘down’ really is not ‘down’. Such a situation can result in data loss and/or failure of an HA cluster. Accuracy in node state determination is key to assuring that ‘split brain’ does not occur in an HA cluster.
  • In HA clusters, the downtime may be represented by the following equation:
    $$\text{Downtime per year} = (\text{num\_unplanned\_yr}) \times (\text{ave\_unplanned\_switchover\_time}) + (\text{num\_planned\_yr}) \times (\text{ave\_planned\_switchover\_time}) + (\text{num\_failed\_switchovers\_yr}) \times (\text{fail\_recovery\_time})$$
    with the following definitions
      • num_unplanned_yr=the number of times a node in an HA cluster fails in a year
      • ave_unplanned_switchover_time=the average time for the HA cluster to “recover” from an unplanned node failure (i.e., a system crash or operating system panic)
      • num_planned_yr=the number of times a node is removed in a planned downtime event in a year
      • ave_planned_switchover_time=the average time for the HA cluster to “recover” from a planned node removal
      • num_failed_switchovers_yr=the number of times a switchover try “fails” and the cluster or critical application crashes
      • fail_recovery_time=the average time for the HA cluster to “recover” from a failed switchover
        Reducing the value of any of the above factors contributes to the uptime of an HA cluster. It turns out that all or most of the above factors are influenced by the cluster's ability to both accurately and rapidly determine the current state of any given node in the cluster and to deal with the current state with the appropriate actions.
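The downtime equation above can be evaluated directly. The following is a minimal sketch; the function name and all numeric figures are hypothetical and are not taken from the disclosure. It only illustrates how shifting unplanned failures to planned switchovers changes the total.

```python
def downtime_per_year(num_unplanned_yr, ave_unplanned_switchover_time,
                      num_planned_yr, ave_planned_switchover_time,
                      num_failed_switchovers_yr, fail_recovery_time):
    """Annual downtime, in the same time unit as the three duration arguments."""
    return (num_unplanned_yr * ave_unplanned_switchover_time
            + num_planned_yr * ave_planned_switchover_time
            + num_failed_switchovers_yr * fail_recovery_time)


# Hypothetical figures, in minutes (assumed for illustration only).
baseline = downtime_per_year(4, 10, 2, 2, 1, 60)
# Same cluster, but two unplanned failures are caught early and
# handled as planned switchovers instead.
improved = downtime_per_year(2, 10, 4, 2, 1, 60)
print(baseline, improved)
```

With these assumed inputs, the planned switchovers are cheaper than the unplanned ones, so the total falls even though the number of switchover events is unchanged.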
  • The disclosure of the present application addresses some of the problems and disadvantages with the conventional approach. First, the number of “false” detections of node failures is reduced. This reduces num_unplanned_yr. Second, predictive means (degradation status signaling) are used to move some unplanned failures to planned switchovers. Moreover, since failed switchovers typically occur under unplanned (uncontrolled) circumstances, this also reduces num_failed_switchovers_yr. Third, the time to detect a node failure is reduced. This reduces ave_unplanned_switchover_time. Finally, the very harmful ‘split-brain’ situation is avoided since the invention has built-in mechanisms for quickly and accurately double (or triple) checking node status when it looks like a node may be down.
  • FIG. 2 is a schematic diagram depicting a representative high-availability cluster 200 in accordance with an embodiment of the invention. Four nodes 202 are shown in the diagram, but various numbers of nodes may be used within the scope of the invention.
  • In addition to inter-node communications via the network 104, independent inter-node communications of status information are enabled by way of a separate communication channel. As shown, the separate communication channel may, for example, utilize additional signaling hardware circuitry 204 in each node to provide point-to-point links 206 in an exemplary ring topology. In the ring topology, each node 202 may send status information over a communication link 206 to the next node 202 in the ring (going clockwise in the illustrated example) and may receive status information over another link 206 from the previous node 202 in the ring. Advantageously, such a configuration having an additional channel for status communications allows for rapid and reliable exchange of node status data.
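The ring arrangement of the status links can be sketched as a simple neighbor table. This is an illustrative sketch only: the function name `ring_neighbors` and the node labels are assumptions, and the list order is taken to be the direction in which status flows.

```python
def ring_neighbors(nodes):
    """Map each node to (previous, next) for a ring built in list order."""
    n = len(nodes)
    return {
        node: (nodes[(i - 1) % n], nodes[(i + 1) % n])
        for i, node in enumerate(nodes)
    }


ring = ring_neighbors(["A", "B", "C", "D"])
for node, (previous, following) in ring.items():
    # Each node receives status from `previous` and sends status to `following`.
    print(f"{node}: receives from {previous}, sends to {following}")
```

The modulo arithmetic closes the ring, so the last node sends to the first and the first receives from the last.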
  • In one embodiment, cluster-level software runs on each node 202. Each node 202 may be configured to provide the following resources to the cluster-level software. Hardware resources include a processor or processors, a memory system, disk-based or similar data storage, an interface to the network 104 interconnecting the nodes, and the dedicated signaling hardware 204 for inter-node status signaling. Software resources include routines to perform the following tasks: updating and transmitting the status of the present node 202; monitoring and analyzing status data from another node 202 in the cluster; and taking appropriate action based on the status data from the node 202 being monitored. The computational subsystem of the node may, or may not, be running a mission-critical software application. If it is, then the mission-critical application is listed in a configuration file of the cluster-level software.
  • In one embodiment, the node status signals may include the following: an up/down status signal, a degraded status signal, and a heartbeat signal. The heartbeat signal may be transmitted conventionally via the network 104 so that the heartbeat information of all nodes in the HA cluster is on the network 104. The up/down and degraded status signals may be transmitted and received separately via the additional signaling hardware 204 and independent communication links 206. For the up/down status signal, up (or GOOD) indicates that the node is operating, and down (or BAD) indicates that the node has failed. The degraded status signal may comprise a two-state signal having DEGRADED and NOT_DEGRADED states. Alternatively, the degraded status signal may include multiple degradation levels.
  • FIG. 3 is a flow chart depicting a node status generation process 300 in accordance with an embodiment of the invention. This process 300 occurs at each active node of the cluster.
  • Each active node determines 302 its current up/down status. This determination may be accomplished by applying rules in a rule file stored in memory or on disk at the present node. An up (or GOOD) status indicates that the node is operating, and a down (or BAD) status indicates that the present node has failed. The up/down status data is then sent out 304 from the present node through the specialized hardware interface 204 to the next node in the cluster. For example, in the case of the topology of FIG. 2, node A 202A would send its up/down status data to node B 202B, node B 202B would send its up/down status data to node C 202C, node C 202C would send its up/down status data to node D 202D, and node D 202D would send its up/down status data to node A 202A.
  • Each node also sends 306 its heartbeat signal to the cluster. This is conventionally done via the network 104.
  • Furthermore, each active node determines 308 its current degraded status (or level). This determination may be accomplished by applying rules in a rule file stored in memory or on disk at the present node. For example, the degraded levels may be indicated by a multiple bit signal wherein all zeroes may indicate a failed (down or BAD) node, all ones may indicate that no degradation was detected, and intermediate values (some zeroes and some ones) may indicate a level of degradation between failure and no degradation. The degraded status data is then sent out 310 from the present node through the specialized hardware interface 204 to the next node in the cluster. For example, in the case of the topology of FIG. 2, node A 202A would send its degraded status data to node B 202B, node B 202B would send its degraded status data to node C 202C, node C 202C would send its degraded status data to node D 202D, and node D 202D would send its degraded status data to node A 202A.
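The multiple-bit encoding described above can be decoded as follows. This is a minimal sketch: the 4-bit width, the function name `classify_degraded`, and the returned labels are assumptions. The text does not say how the intermediate bit patterns are ordered, so the sketch only separates the three categories it names.

```python
def classify_degraded(value, width=4):
    """Classify a multi-bit degraded status value (width is an assumed parameter)."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the signal width")
    all_ones = (1 << width) - 1
    if value == 0:
        return "BAD"            # all zeroes: failed (down) node
    if value == all_ones:
        return "NOT_DEGRADED"   # all ones: no degradation detected
    return "DEGRADED"           # some zeroes and some ones


print(classify_degraded(0b0000), classify_degraded(0b1111), classify_degraded(0b0110))
```

One plausible reason for mapping all zeroes to BAD is that an unpowered or disconnected link would then read as a failed node, although the text does not state this rationale.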
  • The process 300 then loops from the last step 310 to the first step 302. Note that, although an exemplary order for the steps in the process 300 is shown, variations of the order are possible with same or similar result.
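One pass through process 300 might be sketched as below. The function and parameter names are hypothetical; the four callables stand in for the rule-file evaluation, the dedicated link 206, and the network 104, none of which are specified at this level in the text.

```python
def status_cycle(read_up_down, read_degraded, send_link, send_heartbeat):
    """Run steps 302 to 310 once and return the two status values that were sent."""
    up_down = read_up_down()             # step 302: determine up/down status
    send_link("up_down", up_down)        # step 304: send over the dedicated link
    send_heartbeat()                     # step 306: heartbeat over the network
    degraded = read_degraded()           # step 308: determine degraded level
    send_link("degraded", degraded)      # step 310: send over the dedicated link
    return up_down, degraded


# Stand-in transports that simply record what would have been sent.
sent = []
result = status_cycle(
    read_up_down=lambda: "GOOD",
    read_degraded=lambda: 0b1111,
    send_link=lambda kind, value: sent.append((kind, value)),
    send_heartbeat=lambda: sent.append(("heartbeat", None)),
)
print(sent)
```

A real node would call such a function repeatedly, and, as the text notes, the order of the steps could be varied.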
  • FIG. 4 is a flow chart depicting a cluster-wide management process 400 in accordance with an embodiment of the invention. The process 400 involves steps performed at a present node and relates to the management of a previous node in the cluster. For example, node B 202B would perform steps relating to the management of node A 202A, node C 202C would perform steps relating to the management of node B 202B, and so on.
  • The process 400 is set up by retrieving 402 a configuration file for a previous node from that previous node, and storing 404 that configuration file at the present node. The configuration file includes various information, such as the application(s) needing to be failed over from the previous node in the event that the node is removed from the cluster.
  • Subsequent to the setup steps, the following steps are performed. The present node checks 405 whether the configuration file for the previous node is up-to-date (i.e. has not been updated since it was last retrieved). If it is not up-to-date, then the process 400 loops back to the step where the file is retrieved 402. If it is up-to-date, then the process 400 goes on to the following steps.
  • One of the steps involves setting 406 the node removal threshold. The node removal threshold may be determined from a ruleset of the cluster system. This threshold indicates to the system the level of degradation at which a node will be proactively removed from the HA cluster. The threshold may be set or varied by the user. The threshold may also vary depending on how many nodes have been already removed from the HA cluster.
  • Other steps relate to reading or checking various inputs received from the preceding node. The present node reads 408 the up/down status input received from the previous node. The present node also reads 410 the degraded status input received from the previous node. Both the up/down status signal and the degraded status signal may be received via a dedicated communication link or cable 206 between the nodes. Furthermore, the present node also checks 412 the heartbeat input received from the previous node. The heartbeat signal may be received by way of a conventional network 104 interconnecting the nodes.
  • The present node performs a logical analysis 414 using these status-related inputs. The logical analysis 414 determines, for example, whether the inputs indicate that the preceding node is up, whether they indicate that the preceding node is down (failed), and whether they indicate that there is an interconnect problem. One embodiment for the analysis procedure 414 is described below in relation to FIGS. 5A and 5B.
  • After exiting from the analysis procedure 414, a determination 418 is made as to whether removal of the preceding node was indicated by the analysis. If the preceding node is to be removed, then a removal procedure 420 is run. One embodiment for the removal procedure 420 is described below in relation to FIG. 6. Otherwise the management process 400 loops back to the step where a check 405 is made as to whether the configuration file for the preceding node has been updated.
  • FIGS. 5A and 5B are flow charts depicting a logical analysis procedure 414 in accordance with an embodiment of the invention. In the figures, the up/down status input from the previous node is denoted as UP_IN, the degraded status input from the previous node is denoted as DEGRADED_IN, and the heartbeat input from the previous node is denoted as HEARTBEAT_IN. UP_IN can be in two states, GOOD or BAD. DEGRADED_IN can be in multiple degradation levels, including a BAD state, a GOOD state, and levels in between BAD and GOOD. The HEARTBEAT_IN can be either OK or Bad.
  • In a first case, a determination 502 is made that UP_IN=GOOD and DEGRADED_IN=not BAD (either GOOD or a level in between). If so, then the previous node is determined to be up (though perhaps degraded). The condition of HEARTBEAT_IN is then checked 504. If HEARTBEAT_IN=Bad, then the analysis 414 determines that the network connection that normally carries the heartbeat signal is down and reports 506 that the network to the previous node is down. If HEARTBEAT_IN=OK, then no such report is made. In either case, the level of DEGRADED_IN is compared 508 with the node removal threshold. If the amount of degradation is below the removal threshold, then the performance level of the previous node is acceptable. In that case, the degradation level is reported and the analysis procedure is exited 510. If the amount of degradation is above the removal threshold, then the performance of the previous node is deemed too poor to keep in the cluster. In that case, the previous node is “killed” 512, then failure of the previous node is indicated and the analysis procedure is exited 514.
  • In a second case, a determination 516 is made that UP_IN=GOOD and DEGRADED_IN=BAD. The condition of HEARTBEAT_IN is then checked 518. If HEARTBEAT_IN=Bad, then failure of the previous node is indicated (due to two of three inputs showing a down node) and the analysis procedure is exited 514. If HEARTBEAT_IN=OK, then the previous node is deemed to be running okay. In that case, a cable problem is reported (due to the non-matching degraded input) and the analysis procedure exits 520.
  • In a third case, a determination 522 is made that UP_IN=BAD and DEGRADED_IN=BAD. Here, both of these status inputs indicate that the previous node is down, so it does not matter what the heartbeat input indicates. In this case, failure is indicated and the analysis procedure exits 520.
  • In a fourth case, a determination 524 is made (by default since it's the last case) that UP_IN=BAD and DEGRADED_IN=not BAD. The condition of HEARTBEAT_IN is then checked 526. If HEARTBEAT_IN=Bad, then failure of the previous node is indicated (due to two of three inputs showing a down node) and the analysis procedure is exited 514. If HEARTBEAT_IN=OK, then the previous node is deemed to be running okay. In that case, a cable problem is reported 528 (due to the non-matching up/down input), and the level of DEGRADED_IN is compared 508 with the node removal threshold. If the amount of degradation is below the removal threshold, then the performance level of the previous node is acceptable. In that case, the degradation level is reported and the analysis procedure is exited 510. If the amount of degradation is above the removal threshold, then the performance of the previous node is deemed too poor to keep in the cluster. In that case, the previous node is “killed” 512, then failure of the previous node is indicated and the analysis procedure is exited 514.
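The four cases above can be condensed into a single decision function. This is a sketch under stated assumptions, not the patented implementation: the names `Analysis` and `analyze` are hypothetical, DEGRADED_IN is modeled as `None` for the BAD state and otherwise as an integer amount of degradation (0 meaning GOOD, larger meaning worse), and a degradation exactly equal to the threshold is treated as acceptable because the text does not address that case.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Analysis:
    remove: bool                               # True if failure of the previous node is indicated
    reports: List[str] = field(default_factory=list)


def analyze(up_good: bool, degradation: Optional[int],
            heartbeat_ok: bool, threshold: int) -> Analysis:
    """Sketch of analysis 414; degradation=None models DEGRADED_IN=BAD."""
    degraded_bad = degradation is None
    reports: List[str] = []
    if up_good and degraded_bad:               # second case (516)
        if heartbeat_ok:
            return Analysis(False, ["cable problem"])
        return Analysis(True, ["node failure"])
    if not up_good and degraded_bad:           # third case (522)
        return Analysis(True, ["node failure"])
    if up_good:                                # first case (502)
        if not heartbeat_ok:
            reports.append("network down")
    else:                                      # fourth case (524)
        if not heartbeat_ok:
            return Analysis(True, ["node failure"])
        reports.append("cable problem")
    # Threshold comparison (508), shared by the first and fourth cases.
    if degradation > threshold:
        reports.append("node failure")         # node is "killed" (512)
        return Analysis(True, reports)
    reports.append(f"degradation level {degradation}")
    return Analysis(False, reports)


print(analyze(up_good=False, degradation=1, heartbeat_ok=True, threshold=3))
```

In the second and fourth cases the decision follows the majority of the three inputs, which is how the sketch tells a failed node apart from a faulty status cable or a network outage.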
  • FIG. 6 is a flow chart depicting a removal procedure 420 in accordance with an embodiment of the invention. The removal procedure 420 is entered when the analysis 414 indicates failure of the previous node.
  • A determination 602 is made as to the application or applications on the previous node that need to be failed over. This information may be obtained, for example, from the above-discussed configuration file stored 404 at the present node. Fail over 604 is performed on these applications from the previous node to nodes of the cluster that are up and running. After the fail over is completed, success of the failover is signaled 606 to the other nodes of the cluster. The HA cluster is then running with the previous node removed.
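The planning part of the removal procedure might look like the sketch below. The configuration layout (a dictionary with an "applications" list), the function name `plan_failover`, and the round-robin placement are all assumptions made for illustration; the text only says that applications are failed over to nodes that are up and running.

```python
from itertools import cycle


def plan_failover(config, up_nodes):
    """Assign each application of the removed node to an up node, round-robin."""
    if not up_nodes:
        raise ValueError("no running nodes available for failover")
    targets = cycle(up_nodes)
    # Step 602: read the applications from the stored configuration file.
    # Step 604 would then move each application to its assigned target.
    return {app: next(targets) for app in config["applications"]}


# Hypothetical stored configuration for the previous node.
stored_config = {"applications": ["db", "web", "mail"]}
plan = plan_failover(stored_config, ["B", "C"])
print(plan)
```

After the moves complete, the present node would signal success to the rest of the cluster (step 606); that signaling is outside the scope of this sketch.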
  • When a node is getting added (or re-added) to the HA cluster, a node join (or re-join) procedure may be applied. The procedure is as follows. If the cables for independent status communications are not connected, then those status communications are temporarily suspended throughout the HA cluster. In other words, the HA cluster falls back to a heartbeat only mode. Next, the cables are connected to the new node. Then, the status communications via the cables are restarted in the cluster. When the node to join or re-join boots, it will start sending out GOOD signals through its specialized cable connection. At that point, the next node's cluster software will re-integrate the newly added node into the HA cluster.
  • The above disclosure provides a novel technique for a node in a high availability cluster to quickly and accurately determine each node's current state and to perform the appropriate action to maximize cluster uptime. The use of the three status indicators (up/down, degraded, and heartbeat) from each node allows for significant improvement in the efficiency (i.e. uptime) of the HA cluster.
  • In the above description, numerous specific details are given to provide a thorough understanding of embodiments of the invention. However, the above description of illustrated embodiments of the invention is not intended to be exhaustive or to limit the invention to the precise forms disclosed. One skilled in the relevant art will recognize that the invention can be practiced without one or more of the specific details, or with other methods, components, etc. In other instances, well-known structures or operations are not shown or described in detail to avoid obscuring aspects of the invention. While specific embodiments of, and examples for, the invention are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the invention, as those skilled in the relevant art will recognize.
  • These modifications can be made to the invention in light of the above detailed description. The terms used in the following claims should not be construed to limit the invention to the specific embodiments disclosed in the specification and the claims. Rather, the scope of the invention is to be determined by the following claims, which are to be construed in accordance with established doctrines of claim interpretation.

Claims (20)

1. (canceled)
2. (canceled)
3. (canceled)
4. (canceled)
5. (canceled)
6. A method of cluster-wide management performed per node, the method comprising:
checking an up/down status input received from a previous node;
checking a degraded status input received from the previous node, wherein the degraded status input comprises multiple degradation levels with one such level comprising a “bad” state indicating that the previous node appears down;
checking a heartbeat input received from the previous node; and
comparing the degraded status with a node removal threshold for potential removal of the previous node from the cluster if the degraded status shows degradation above the threshold.
7. (canceled)
8. The method of claim 6, further comprising:
determining whether a configuration file at the previous node has been changed; and
if the configuration file has been changed, then retrieving the configuration file from the previous node and storing the retrieved configuration file at the present node.
9. The method of claim 6, further comprising:
performing a logical analysis of the inputs to determine whether a failure of the previous node is indicated.
10. The method of claim 9, wherein the logical analysis comprises determining a failure of the previous node if a majority of the status inputs indicates that the previous node appears down.
11. The method of claim 9, wherein the logical analysis differentiates between the failure of the previous node and a failure of an inter-node communication channel.
12. The method of claim 11, wherein the logical analysis further differentiates between a problem with a first inter-node communication channel and a problem with a second inter-node communication channel.
13. The method of claim 12, wherein the first inter-node communication channel comprises a point-to-point link dedicated for node status information, and wherein the second inter-node communication channel comprises a network for carrying heartbeat signals and other communications.
14. The method of claim 7, further comprising reporting that a network carrying the heartbeat is down if the heartbeat is bad and the two status inputs are not both bad.
15. The method of claim 7, further comprising reporting a problem with an inter-node communication channel carrying the status inputs if the heartbeat is okay and one, but not both, of the two status inputs is bad.
16. (canceled)
17. A system for a high-availability cluster, the system comprising:
a general inter-node communication network that is configured to carry signals including heartbeat signals from the nodes; and
a separate inter-node communication channel for communicating node status signals including at least an up/down status signal and a degraded status signal,
wherein the degraded status signal is compared with a node removal threshold for potential removal of a node from the cluster if the degraded status signal shows degradation above the threshold.
18. (canceled)
19. The system of claim 18, wherein the system is configured with a logical analysis procedure that differentiates between a failure of a node and a problem with inter-node communication.
20. The system of claim 19, wherein the logical analysis further differentiates between a problem with the general inter-node communication network and a problem with the separate inter-node communication channel.
US10/764,244 2004-01-23 2004-01-23 Node management in high-availability cluster Expired - Lifetime US6928589B1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US10/764,244 US6928589B1 (en) 2004-01-23 2004-01-23 Node management in high-availability cluster
GB0501119A GB2410406B (en) 2004-01-23 2005-01-19 Node management in high-availability cluster
JP2005012196A JP2005209201A (en) 2004-01-23 2005-01-20 Node management in high-availability cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/764,244 US6928589B1 (en) 2004-01-23 2004-01-23 Node management in high-availability cluster

Publications (2)

Publication Number Publication Date
US6928589B1 US6928589B1 (en) 2005-08-09
US20050188283A1 true US20050188283A1 (en) 2005-08-25

Family

ID=34227105

Family Applications (1)

Application Number Title Priority Date Filing Date
US10/764,244 Expired - Lifetime US6928589B1 (en) 2004-01-23 2004-01-23 Node management in high-availability cluster

Country Status (3)

Country Link
US (1) US6928589B1 (en)
JP (1) JP2005209201A (en)
GB (1) GB2410406B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060053336A1 (en) * 2004-09-08 2006-03-09 Pomaranski Ken G High-availability cluster node removal and communication
US20060179270A1 (en) * 2005-02-07 2006-08-10 International Business Machines Corporation All row, planar fault detection system
US20080320330A1 (en) * 2005-02-07 2008-12-25 International Business Machines Corporation Row fault detection system
US20100262871A1 (en) * 2007-10-03 2010-10-14 William Bain L Method for implementing highly available data parallel operations on a computational grid
US20100318835A1 (en) * 2005-02-07 2010-12-16 International Business Machines Corporation Bisectional fault detection system
US20110010383A1 (en) * 2009-07-07 2011-01-13 Thompson Peter C Systems and methods for streamlining over-the-air and over-the-wire device management
US20110179170A1 (en) * 2010-01-15 2011-07-21 Andrey Gusev "Local Resource" Type As A Way To Automate Management Of Infrastructure Resources In Oracle Clusterware
US20110179172A1 (en) * 2010-01-15 2011-07-21 Oracle International Corporation Dispersion dependency in oracle clusterware
US20110179173A1 (en) * 2010-01-15 2011-07-21 Carol Colrain Conditional dependency in a computing cluster
US20110179169A1 (en) * 2010-01-15 2011-07-21 Andrey Gusev Special Values In Oracle Clusterware Resource Profiles
US20110179428A1 (en) * 2010-01-15 2011-07-21 Oracle International Corporation Self-testable ha framework library infrastructure
US10154086B1 (en) * 2010-12-23 2018-12-11 EMC IP Holding Company LLC Distributed consumer cloud storage system
CN114844799A (en) * 2022-05-27 2022-08-02 深信服科技股份有限公司 Cluster management method and device, host equipment and readable storage medium

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003345531A (en) * 2002-05-24 2003-12-05 Hitachi Ltd Storage system, management server, and its application managing method
DE10331485A1 (en) * 2003-07-11 2005-02-10 Marconi Communications Gmbh Back-plane with wired coupling between pluggable sites e.g. for communication -technical systems, uses at least two positions of contacts at pluggable sites
US8009556B2 (en) * 2003-10-17 2011-08-30 Ip Infusion, Inc. System and method for providing redundant routing capabilities for a network node
US7539755B2 (en) * 2006-04-24 2009-05-26 Inventec Corporation Real-time heartbeat frequency regulation system and method utilizing user-requested frequency
US20090172155A1 (en) * 2008-01-02 2009-07-02 International Business Machines Corporation Method and system for monitoring, communicating, and handling a degraded enterprise information system
JP4941439B2 (en) * 2008-09-22 2012-05-30 日本電気株式会社 Method for identifying cause of performance degradation in cluster system, cluster system
BRPI1008451A2 (en) * 2009-02-18 2016-02-23 Commw Scient Ind Res Org method and apparatus for displaying a masked bit pulse signal
US8154992B2 (en) * 2009-08-11 2012-04-10 Google Inc. System and method for graceful restart
US20110289342A1 (en) * 2010-05-21 2011-11-24 Schaefer Diane E Method for the file system of figure 7 for the cluster
US9032053B2 (en) * 2010-10-29 2015-05-12 Nokia Corporation Method and apparatus for upgrading components of a cluster
US20120324456A1 (en) * 2011-06-16 2012-12-20 Microsoft Corporation Managing nodes in a high-performance computing system using a node registrar
US9772784B2 (en) 2011-08-10 2017-09-26 Nutanix, Inc. Method and system for maintaining consistency for I/O operations on metadata distributed amongst nodes in a ring structure
US8984325B2 (en) * 2012-05-30 2015-03-17 Symantec Corporation Systems and methods for disaster recovery of multi-tier applications
KR101348453B1 (en) 2012-10-30 2014-01-09 한양대학교 산학협력단 Packet scheduling method and apparatus in high avalability seamless
US20150154048A1 (en) * 2013-12-04 2015-06-04 International Business Machines Corporation Managing workload to provide more uniform wear among components within a computer cluster
US9450852B1 (en) * 2014-01-03 2016-09-20 Juniper Networks, Inc. Systems and methods for preventing split-brain scenarios in high-availability clusters
US9590843B2 (en) 2014-03-12 2017-03-07 Nutanix, Inc. Method and system for providing distributed management in a networked virtualization environment
EP3140734B1 (en) 2014-05-09 2020-04-08 Nutanix, Inc. Mechanism for providing external access to a secured networked virtualization environment
US9733958B2 (en) * 2014-05-15 2017-08-15 Nutanix, Inc. Mechanism for performing rolling updates with data unavailability check in a networked virtualization environment for storage management
US9740472B1 (en) * 2014-05-15 2017-08-22 Nutanix, Inc. Mechanism for performing rolling upgrades in a networked virtualization environment
US10642507B2 (en) 2015-01-30 2020-05-05 Nutanix, Inc. Pulsed leader consensus management
US11218418B2 (en) 2016-05-20 2022-01-04 Nutanix, Inc. Scalable leadership election in a multi-processing computing environment
US10362092B1 (en) 2016-10-14 2019-07-23 Nutanix, Inc. Entity management in distributed systems
US11194680B2 (en) 2018-07-20 2021-12-07 Nutanix, Inc. Two node clusters recovery on a failure
US11770447B2 (en) 2018-10-31 2023-09-26 Nutanix, Inc. Managing high-availability file servers
US11768809B2 (en) 2020-05-08 2023-09-26 Nutanix, Inc. Managing incremental snapshots for fast leader node bring-up
CN111698132B (en) * 2020-06-12 2022-03-01 北京字节跳动网络技术有限公司 Method, apparatus, device and medium for controlling heartbeat events in a cluster

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5390326A (en) * 1993-04-30 1995-02-14 The Foxboro Company Local area network with fault detection and recovery
US5898827A (en) * 1996-09-27 1999-04-27 Hewlett-Packard Co. Routing methods for a multinode SCI computer system
US6072857A (en) * 1996-12-19 2000-06-06 Bellsouth Intellectual Property Management Corporation Methods and system for monitoring the operational status of a network component in an advanced intelligent network
US6304546B1 (en) * 1996-12-19 2001-10-16 Cisco Technology, Inc. End-to-end bidirectional keep-alive using virtual circuits
US6389551B1 (en) * 1998-12-17 2002-05-14 Steeleye Technology, Inc. Method of preventing false or unnecessary failovers in a high availability cluster by using a quorum service
US6389550B1 (en) * 1998-12-23 2002-05-14 Ncr Corporation High availability protocol computing and method
US6609213B1 (en) * 2000-08-10 2003-08-19 Dell Products, L.P. Cluster-based system and method of recovery from server failures
US20030158936A1 (en) * 2002-02-15 2003-08-21 International Business Machines Corporation Method for controlling group membership in a distributed multinode data processing system to assure mutually symmetric liveness status indications
US6654801B2 (en) * 1999-01-04 2003-11-25 Cisco Technology, Inc. Remote system administration and seamless service integration of a data communication network management system
US6725401B1 (en) * 2000-10-26 2004-04-20 Nortel Networks Limited Optimized fault notification in an overlay mesh network via network knowledge correlation
US6728896B1 (en) * 2000-08-31 2004-04-27 Unisys Corporation Failover method of a simulated operating system in a clustered computing environment
US20040100971A1 (en) * 2000-05-09 2004-05-27 Wray Stuart Charles Communication system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6347335B1 (en) * 1995-09-22 2002-02-12 Emc Corporation System using a common and local event logs for logging event information generated by plurality of devices for determining problem in storage access operations
US6502203B2 (en) * 1999-04-16 2002-12-31 Compaq Information Technologies Group, L.P. Method and apparatus for cluster system operation
WO2004107196A1 (en) * 2003-05-27 2004-12-09 Nokia Corporation Data collection in a computer cluster

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7664994B2 (en) * 2004-09-08 2010-02-16 Hewlett-Packard Development Company, L.P. High-availability cluster node removal and communication
US20060053336A1 (en) * 2004-09-08 2006-03-09 Pomaranski Ken G High-availability cluster node removal and communication
US8112658B2 (en) * 2005-02-07 2012-02-07 International Business Machines Corporation Row fault detection system
US20060179270A1 (en) * 2005-02-07 2006-08-10 International Business Machines Corporation All row, planar fault detection system
US20080320330A1 (en) * 2005-02-07 2008-12-25 International Business Machines Corporation Row fault detection system
US20100318835A1 (en) * 2005-02-07 2010-12-16 International Business Machines Corporation Bisectional fault detection system
US8495411B2 (en) 2005-02-07 2013-07-23 International Business Machines Corporation All row, planar fault detection system
US8117502B2 (en) 2005-02-07 2012-02-14 International Business Machines Corporation Bisectional fault detection system
US20100262871A1 (en) * 2007-10-03 2010-10-14 William Bain L Method for implementing highly available data parallel operations on a computational grid
US9880970B2 (en) * 2007-10-03 2018-01-30 William L. Bain Method for implementing highly available data parallel operations on a computational grid
US20110010383A1 (en) * 2009-07-07 2011-01-13 Thompson Peter C Systems and methods for streamlining over-the-air and over-the-wire device management
US20110179169A1 (en) * 2010-01-15 2011-07-21 Andrey Gusev Special Values In Oracle Clusterware Resource Profiles
US20110179428A1 (en) * 2010-01-15 2011-07-21 Oracle International Corporation Self-testable ha framework library infrastructure
US20110179173A1 (en) * 2010-01-15 2011-07-21 Carol Colrain Conditional dependency in a computing cluster
US20110179172A1 (en) * 2010-01-15 2011-07-21 Oracle International Corporation Dispersion dependency in oracle clusterware
US8949425B2 (en) 2010-01-15 2015-02-03 Oracle International Corporation “Local resource” type as a way to automate management of infrastructure resources in oracle clusterware
US9069619B2 (en) 2010-01-15 2015-06-30 Oracle International Corporation Self-testable HA framework library infrastructure
US9098334B2 (en) 2010-01-15 2015-08-04 Oracle International Corporation Special values in oracle clusterware resource profiles
US9207987B2 (en) 2010-01-15 2015-12-08 Oracle International Corporation Dispersion dependency in oracle clusterware
US20110179170A1 (en) * 2010-01-15 2011-07-21 Andrey Gusev "Local Resource" Type As A Way To Automate Management Of Infrastructure Resources In Oracle Clusterware
US10154086B1 (en) * 2010-12-23 2018-12-11 EMC IP Holding Company LLC Distributed consumer cloud storage system
CN114844799A (en) * 2022-05-27 2022-08-02 深信服科技股份有限公司 Cluster management method and device, host equipment and readable storage medium

Also Published As

Publication number Publication date
GB0501119D0 (en) 2005-02-23
US6928589B1 (en) 2005-08-09
GB2410406A (en) 2005-07-27
GB2410406B (en) 2007-03-07
JP2005209201A (en) 2005-08-04

Similar Documents

Publication Publication Date Title
US6928589B1 (en) Node management in high-availability cluster
US11106388B2 (en) Monitoring storage cluster elements
US6609213B1 (en) Cluster-based system and method of recovery from server failures
Oliner et al. What supercomputers say: A study of five system logs
US6757836B1 (en) Method and apparatus for resolving partial connectivity in a clustered computing system
US20030158933A1 (en) Failover clustering based on input/output processors
GB2410346A (en) Multi-state status reporting for high-availability cluster nodes
US9450700B1 (en) Efficient network fleet monitoring
US8117487B1 (en) Method and apparatus for proactively monitoring application health data to achieve workload management and high availability
US20030065760A1 (en) System and method for management of a storage area network
GB2418040A (en) Monitoring a high availability cluster using a smart card
US7464302B2 (en) Method and apparatus for expressing high availability cluster demand based on probability of breach
JP4045282B2 (en) Removing and communicating highly available cluster nodes
US8489721B1 (en) Method and apparatus for providing high availabilty to service groups within a datacenter
US11349964B2 (en) Selective TCP/IP stack reconfiguration
US7228462B2 (en) Cluster node status detection and communication
Lundin et al. Significant advances in Cray system architecture for diagnostics, availability, resiliency and health
KR100753565B1 (en) High availability system and his task devide takeover method
Sharifi et al. Failure Prediction Mechanisms in Cluster Systems
CN112564968A (en) Fault processing method, device and storage medium
Soi On reliability of a computer network
Janardhanan et al. Highly Resilient Network Elements
KR20030056290A (en) A Process error Recovery Technique by the Duplication System and Process

Legal Events

Date Code Title Description
AS Assignment
Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:POMARANSKI, KEN GARY;BARR, ANDREW HARVEY;REEL/FRAME:014930/0820
Effective date: 20040122

STCF Information on status: patent grant
Free format text: PATENTED CASE

FPAY Fee payment
Year of fee payment: 4

FPAY Fee payment
Year of fee payment: 8

AS Assignment
Owner name: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP, TEXAS
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.;REEL/FRAME:037079/0001
Effective date: 20151027

FPAY Fee payment
Year of fee payment: 12