US20040168168A1 - Method for hibernation of host channel adaptors in a cluster - Google Patents

Publication number
US20040168168A1
Authority
US
United States
Prior art keywords
channel adaptor
host channel
hibernation
host
cluster
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/787,883
Inventor
Rajesh Shah
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode

Abstract

A method for causing a host channel adaptor which is involved with a clustered arrangement to hibernate. Before the HCA can hibernate, it is necessary for its clients to hibernate first. Once this is accomplished, all data is stored in memory and the HCA goes into hibernation. It resumes operation when a request is received. The HCA is checked to see if it has been changed and various parameters are examined to determine if an error has occurred which is unrecoverable. If not, the operation of the device is resumed.

Description

    FIELD
  • The present invention is directed to a method for hibernation of a computer system. More particularly, the present invention is directed to a method for hibernation of a host channel adaptor for a computer system included in a cluster arrangement. [0001]
  • BACKGROUND
  • In the computer field, many different types of platforms have been developed by different manufacturers which are not completely compatible with one another. Also, different operating systems which have been developed by different software companies may be utilized in similar platforms. As long as the individual equipment operates only by itself, this is never a problem. However, when systems must interact, it is necessary to have some architecture which allows for the different systems to interact. [0002]
  • While this is possible on a very large scale in an arrangement such as the internet, it is also desirable that it be available on a much smaller scale wherein small numbers of systems can operate together in a cluster. Various attempts have been made to provide such a subnetwork arrangement so that various computer systems and input/output arrangements can interact and work together. [0003]
  • Another capability which is desirable to have in a computer system is a sleep state. That is, if the system or parts of the system are not being utilized, it is desirable to remove power from these devices, to the extent possible. This lowers the amount of power being consumed, which reduces the operating cost of the system, and also means that less heat is being produced, which improves the operation of the device. In addition, it means that the system is in an active state for a much shorter time, which reduces the “wear and tear” on the device. Some systems utilize various kinds of sleep states already. For example, the Windows 2000 system permits a sleep state. However, such systems do not typically describe how to utilize a sleep state for an individual system when it is connected in a cluster arrangement with other systems. [0004]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The foregoing and a better understanding of the present invention will become apparent from the following detailed description of example embodiments and the claims when read in connection with the accompanying drawings, all forming a part of the disclosure of this invention. While the foregoing and following written and illustrated disclosure focuses on disclosing example embodiments of the invention, it should be clearly understood that the same is by way of illustration and example only and the invention is not limited thereto. The spirit and scope of the present invention are limited only by the terms of the appended claims. [0005]
  • The following represents brief descriptions of the drawings, wherein: [0006]
  • FIG. 1 is an example block diagram of an example cluster system having an advantageous arrangement; [0007]
  • FIG. 2 is a diagram of a software arrangement for a host in an example system, such as shown in FIG. 1; [0008]
  • FIG. 3 is an example flow chart of an example system having an advantageous arrangement of the present invention; [0009]
  • FIG. 4 is an example flow chart of an example system having an advantageous arrangement of the present invention. [0010]
  • DETAILED DESCRIPTION
  • Before beginning a detailed description of the subject invention, mention of the following is in order. When appropriate, like reference numerals and characters may be used to designate identical, corresponding or similar components in differing figure drawings. Further, in the detailed description to follow, example sizes/models/values/ranges may be given, although the present invention is not limited to the same. As a final note, well known power/ground connections to ICs and other components may not be shown within the FIGS. for simplicity of illustration and discussion, and so as not to obscure the invention. Further, arrangements may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram arrangements is highly dependent upon the platform within which the present invention is to be implemented, i.e., specifics should be well within purview of one skilled in the art. Where specific details (e.g., circuits, flowcharts) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without these specific details. Finally, it should be apparent that any combination of hard-wired circuitry and software instructions can be used to implement embodiments of the present invention, i.e., the present invention is not limited to any specific combination of hardware circuitry and software instructions. [0011]
  • Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment. [0012]
  • In an effort to form a cluster through which different systems can interact, an arrangement such as shown in FIG. 1 has been suggested. In this arrangement, a number of hosts and I/O devices have been interconnected using a series of switches. Thus, FIG. 1 shows the cluster arrangement 10 as including four hosts 11, 12, 13, 14, I/O enclosures 15 and 16 and three switches 20, 21 and 22. While the example shown in FIG. 1 includes four hosts, two I/O enclosures and three switches, in fact any number of similar devices may be combined in such a cluster. Each host is a computer system such as a server or other computer system. The I/O enclosures may support a simple device such as a display or printer, or may involve a number of connections to other systems such as connections to networks through either metallic or fiber optic connections. [0013]
  • Each host and I/O enclosure contains at least one channel adaptor. This is a device which terminates a link to a switch of the cluster and thus is the device through which each system is connected to the cluster. When channel adaptors are part of a host system, they are referred to as host channel adaptors and when part of an I/O enclosure, they are referred to as target channel adaptors. Each host can have one or more host channel adaptors with each host channel adaptor (HCA) having one or more ports, and each port being connected to a different switch. This arrangement of switches and their connections to the various ports of the different channel adaptors is referred to as an interconnection fabric. [0014]
  • This arrangement of the cluster allows the individual channel adaptors to talk to the channel adaptors in other units through the switches and through the interconnection fabric. As a result, it is possible for different hardware and software arrangements from different providers to interact. The adaptors and switches utilize various protocols in order to make the various units compatible. The cluster arrangement is controlled by the use of a manager unit which keeps track of the topology of the fabric and assigns addresses to the various ports and controls how the data is switched. The manager may be one of the host systems as shown in FIG. 1 or may be a separate unit connected in a similar fashion. [0015]
  • Each host channel adaptor is controlled by a software device driver stack that runs under the host operating system. This driver stack is a collection of one or more device drivers and resides in a program within the host system. This driver stack provides the possibility of causing the host channel adaptor to hibernate and also to resume operations. [0016]
  • The hibernation operation is a sleep state where power is removed from all of the components of the system. When the hibernation is ended, the system must resume its operations where it left off. This is accomplished by storing the memory contents to disk, saving the register state of all processors, saving the settings for applications and saving the hardware state of all I/O controllers and peripheral devices. Once power is restored, applications are resumed from the point where they left off. This is done in such a way that applications do not need to be re-launched and do not go through a new initialization sequence. [0017]
  • This is a difficult procedure when a host is involved with a cluster and connected through an interconnection fabric. It is necessary that every part of a system be put to sleep. Thus, it is necessary that all operations be stopped in a manner so that no data is lost. Thus, the present invention outlines a procedure that can be used to place a host in a hibernated state even while connected in a cluster arrangement. Also, the present method of hibernation can be used even if the host operating system for the host does not support the ability to hibernate the entire system. Thus, the HCA software stack can be utilized to control the hibernation rather than the host operating system. If the HCA is inserted in a hot-plug slot on a host system, the procedure can also be used to hibernate and power down just the HCA. [0018]
  • In addition to saving power, reducing heat and reducing wear and tear, the ability to hibernate allows a part of the system to be replaced while other parts of the cluster are active. Thus, it sometimes happens that one part of the hardware of a system is marginal or faulty. These components can be replaced with identical hardware while the system is in hibernation since all of the operating data is stored before hibernation. The present method provides a system for recognizing that new hardware has been installed and for allowing the switchover if it is possible. [0019]
  • FIG. 2 shows an example of an arrangement of software within a host. The host 30 is shown as having two host channel adaptors including hardware 31 for the first adaptor and hardware 32 for the second adaptor. Each adaptor also has software 33 and 34 for controlling the adaptor hardware. The HCA drivers 33 and 34 take account of the register layout of their associated hardware and provide a procedure to save and restore the associated hardware state. Thus, each HCA driver manages its hardware and is responsible for saving and restoring the relevant hardware context before and after a hibernation procedure. [0020]
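The save and restore responsibility of an HCA driver can be illustrated with a minimal sketch. It is not taken from the disclosure: the class names, the register names in REGISTER_LAYOUT and the behaviour of power_cycle are all hypothetical, chosen only to show a driver that knows its hardware's register layout, copies the registers out before power is removed and writes them back afterwards.

```python
class FakeHcaHardware:
    """Stand-in for adaptor hardware; real register access would be device specific."""

    def __init__(self, registers):
        self.registers = dict(registers)

    def read(self, name):
        return self.registers[name]

    def write(self, name, value):
        self.registers[name] = value

    def power_cycle(self):
        # Assumption: removing power clears every register to zero.
        for name in self.registers:
            self.registers[name] = 0


class RegisterContextDriver:
    # Hypothetical register layout known to this driver.
    REGISTER_LAYOUT = ("control", "port_state", "queue_base")

    def __init__(self, hardware):
        self.hardware = hardware
        self._saved = None

    def save_context(self):
        # Copy each register named in the layout before hibernation.
        self._saved = {name: self.hardware.read(name) for name in self.REGISTER_LAYOUT}

    def restore_context(self):
        # Write the saved values back after power returns.
        if self._saved is None:
            raise RuntimeError("no saved context to restore")
        for name, value in self._saved.items():
            self.hardware.write(name, value)
```

In this sketch a driver would call save_context before the adaptor loses power and restore_context once power is back, so that the registers again hold their pre-hibernation values.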
  • The cluster transport driver 35 is also included in the host and provides uniform and abstracted access to the HCA service for clients that use the services of the host. For example, FIG. 2 shows two such clients, a driver stack for fabric-attached storage controller 36 and the driver stack for fabric-attached network controller 37. The fabric-attached storage controllers are input/output controllers that are attached to units through the fabric. Thus, devices such as hard disks, tapes, CD-ROMs, etc., can be attached to the storage I/O controller. The fabric-attached network controller connects to a local area network and provides an avenue to connect a cluster on one side to a local area network on the other side. Thus, when starting the hibernation procedure, it is necessary for the HCA to take into account clients such as these two driver stacks, which are connected to the host, before hibernation can proceed. [0021]
  • The HCA driver operates in a fashion similar to other device drivers that are responsible for saving and restoring the hardware context upon hibernation. However, because of the interconnections through the cluster fabric arrangement, this type of HCA driver has additional responsibilities. First, the HCA driver must inform the subnet manager that it is going into hibernation. This informs the manager that it is being disconnected in an orderly manner rather than due to an error. In this way, the manager knows that the HCA will resume operations at some point in the future and will need to be reinitialized at that time. The manager reserves the local identifiers assigned to the HCA and will not assign them to any new HCA that is installed during hibernation. This is because the original HCA must be assigned the same local identifiers so that it can function without interruption when it resumes operation. If different local identifiers were assigned, the HCA driver would have to terminate all existing connections and existing clients. The manager must also keep the forwarding entries for the hibernating HCA intact in the switchboarding tables. This is so that the hibernating HCA may have the capability of coming out of hibernation automatically when a packet arrives for it. If switchboarding table entries were removed, it would mean that no packets could reach the hibernating HCA to wake it up. [0022]
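The manager-side bookkeeping described above can be sketched as follows. This is a toy model under stated assumptions, not the manager of the disclosure: the SubnetManager class, its method names, the use of small integers as local identifiers and the lowest-free allocation policy are all hypothetical. It shows only the two properties the text requires: a reserved identifier is never handed to another HCA, and the forwarding entry survives hibernation.

```python
class SubnetManager:
    """Toy model of local-identifier bookkeeping (hypothetical names and policy)."""

    def __init__(self):
        self.assigned = {}    # node identifier -> local identifier currently in use
        self.reserved = {}    # node identifier -> local identifier held during hibernation
        self.forwarding = {}  # local identifier -> switch port

    def _lowest_free_lid(self):
        # Reserved identifiers count as in use, so they are never reissued.
        in_use = set(self.assigned.values()) | set(self.reserved.values())
        lid = 1
        while lid in in_use:
            lid += 1
        return lid

    def attach(self, node_id, switch_port):
        # A returning HCA gets its reserved identifier back; a new one gets a fresh value.
        if node_id in self.reserved:
            lid = self.reserved.pop(node_id)
        else:
            lid = self._lowest_free_lid()
        self.assigned[node_id] = lid
        self.forwarding[lid] = switch_port
        return lid

    def notify_hibernate(self, node_id):
        # Orderly disconnect: reserve the identifier and leave the forwarding entry
        # intact so that a packet can still reach the HCA and wake it.
        lid = self.assigned.pop(node_id)
        self.reserved[node_id] = lid
        return lid
```

For example, if an HCA attaches and is given identifier 1, then hibernates, a second HCA attaching in the meantime receives identifier 2, and the first HCA receives 1 again when it returns.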
  • When the HCA first comes out of hibernation, the HCA driver must first check to see if it can attempt to resume operations. First, it must check to see whether the HCA is the same one as before hibernation. That is, it may have been replaced by a new one of identical capability if the original was marginal or faulty. If it is a different unit, the driver must determine what the differences are and whether they can be managed. Some differences require the driver to immediately flag a non-recoverable error. This would be the case, for example, if an unknown HCA was built by a different manufacturer and the detected unimplemented functionality cannot be emulated. [0023]
  • If the two HCAs are different, but the differences are manageable, the driver must hide the differences between the two. For example, each port has a unique identifier and the HCA has a node identifier, so even identical HCAs from the same vendor will have different port and node identifiers. These identifiers are used to direct communication between units attached through a fabric and must be the same both before and after hibernation. If the HCA has been changed during hibernation, the driver must therefore use the identifiers from the pre-hibernation unit rather than the post-hibernation unit. When the subnet manager detects the operation of the HCA and queries for the identifiers, the HCA driver must report the old ones. To the manager, it will appear that the old unit has returned from hibernation. This will indicate to the manager that the local identifiers should be assigned as before hibernation. If for some reason these cannot be assigned, an unrecoverable error is flagged. If the new HCA supports more capabilities than the old one, the driver must make sure that the old capabilities are provided as before and that clients using the existing HCA capabilities are not affected by the new capabilities. If the HCA driver is not able to restore the entire context to the pre-hibernation state, it must decide whether to fail to resume or to force some clients to reinitialize. For example, where there is a connection to another unit at a remote end, the remote end may have attempted to communicate with the HCA and may have timed out if the HCA remained hibernated longer than the timeout. In such a case, the remote client may have changed its state, and the HCA driver may not be able to fully restore the context to the pre-hibernation state. In this type of situation, the driver must decide whether to refuse to resume operations or to force the client to reinitialize. Should the driver issue an error status indicator, the entire system will fail to resume operations. On the other hand, if the driver successfully returns from hibernation but forces some clients to reinitialize, the rest of the system can still resume operations.
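The identifier handling described above can be illustrated with a minimal sketch. It assumes a driver that records the node and port identifiers before hibernation and keeps reporting them after the hardware has been swapped. The names `HcaIdentity`, `HcaDriver`, `save_identity`, `swap_hardware` and `reported_identity` are hypothetical and do not come from the patent or from any real driver interface.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class HcaIdentity:
    """Node identifier plus one identifier per port (hypothetical model)."""
    node_id: str
    port_ids: Tuple[str, ...]


class HcaDriver:
    """Toy driver that hides an HCA replacement from the subnet manager."""

    def __init__(self, installed: HcaIdentity) -> None:
        self.installed = installed                  # identity burned into the hardware
        self.saved: Optional[HcaIdentity] = None    # identity recorded before hibernation

    def save_identity(self) -> None:
        # Called while saving context on the way into hibernation.
        self.saved = self.installed

    def swap_hardware(self, replacement: HcaIdentity) -> None:
        # Models the adaptor being replaced while power is removed.
        self.installed = replacement

    def reported_identity(self) -> HcaIdentity:
        # Answer manager queries with the pre-hibernation identifiers,
        # so the replacement looks like the old unit returning.
        return self.saved if self.saved is not None else self.installed


driver = HcaDriver(HcaIdentity("node-A", ("port-A1", "port-A2")))
driver.save_identity()
driver.swap_hardware(HcaIdentity("node-B", ("port-B1", "port-B2")))
print(driver.reported_identity().node_id)
```

In this sketch the manager only ever sees the saved identity, which corresponds to the requirement that the identifiers be the same before and after hibernation.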
  • FIG. 3 shows a flow chart of the steps used when a request to hibernate arrives. As indicated above, an HCA can go into hibernation only after its clients have hibernated. The clients in question are the kernel mode clients of the HCA, not user mode applications that use its services indirectly. For example, FIG. 2 shows the direct clients of the driver as the cluster transport driver and the other fabric attached controllers. When the operating system issues a hibernation request, it does so in top down order. This means that the driver stacks for the storage and network drivers must go into hibernation first, followed by the transport driver and then the HCA driver. This ensures that the clients do not make HCA service requests while the driver is in the process of saving context and putting the HCA to sleep. Accordingly, in FIG. 3, step 40 shows the arrival of the request to hibernate. In step 41, active clients are checked to see if they are still using the HCA. If they are, these clients need to hibernate first and a failure is indicated as shown in step 42. If no active clients are using the HCA, the manager is informed that this HCA will be hibernating, as indicated in step 43. In step 44, the state data is saved to memory. In step 45, a return success is indicated.
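The FIG. 3 flow can be sketched as a single function, under the assumption that the active client list, the manager notification and the state save are supplied by the caller. `Status`, `handle_hibernate_request`, `notify_manager` and `save_state` are hypothetical names used only for illustration.

```python
from enum import Enum
from typing import Callable, Sequence


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


def handle_hibernate_request(
    active_clients: Sequence[str],
    notify_manager: Callable[[], None],
    save_state: Callable[[], None],
) -> Status:
    """Sketch of steps 40 to 45 of FIG. 3 (step 40 is the call itself)."""
    # Step 41: are any kernel mode clients still using the HCA?
    if active_clients:
        # Step 42: those clients must hibernate first, so report failure.
        return Status.FAILURE
    # Step 43: tell the subnet manager that this HCA is about to hibernate.
    notify_manager()
    # Step 44: save the relevant state data to memory.
    save_state()
    # Step 45: return success.
    return Status.SUCCESS


events = []
result = handle_hibernate_request(
    active_clients=[],
    notify_manager=lambda: events.append("manager informed"),
    save_state=lambda: events.append("state saved"),
)
print(result, events)
```

Passing the two side effects as callables keeps the ordering of steps 43 and 44 visible without committing to any particular manager or memory interface.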
  • FIG. 4 is a flow chart showing the steps executed by the driver when it resumes operation after hibernation. This procedure proceeds in the reverse, or bottom up, order. This means that the HCA driver is awakened first, followed by the transport driver and then the storage and network driver stacks. This ensures that HCA services are available when clients resume operations.
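The two orderings can be made concrete with a short sketch. The layer names are taken from the description; the list `STACK_TOP_DOWN` and the functions `hibernate_order` and `resume_order` are hypothetical helpers, not part of any operating system interface.

```python
from typing import List, Sequence

# Driver layers listed from the top of the stack down to the HCA driver.
STACK_TOP_DOWN = [
    "storage and network driver stacks",
    "cluster transport driver",
    "HCA driver",
]


def hibernate_order(stack: Sequence[str]) -> List[str]:
    """Hibernation requests travel top down: clients first, HCA driver last."""
    return list(stack)


def resume_order(stack: Sequence[str]) -> List[str]:
    """Resume requests travel bottom up: HCA driver first, clients last."""
    return list(reversed(stack))


print(hibernate_order(STACK_TOP_DOWN))
print(resume_order(STACK_TOP_DOWN))
```

Resuming in the exact reverse of the hibernation order is what guarantees that every layer finds the layer beneath it already awake.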
  • Thus, FIG. 4 shows the arrival of the request to resume in step 50. In step 51, the HCAs are examined to see if they are from the same manufacturer. If not, an error is indicated in step 59. If they are the same, the differences are investigated in step 52. If the differences are manageable, the method proceeds to step 53. If not, an error is flagged in step 59. In step 53, the manager initializes the HCA. In step 54, the values assigned by the manager are examined to see whether unmanageable differences have been generated. If they have been, an unrecoverable error is flagged in step 59. If not, the process proceeds to step 55, where the hardware state is restored. In step 56, the HCA is examined to see if the entire context was restored. If yes, a successful return is indicated in step 60. If not, step 57 examines whether the entire system should be failed. If yes, an unrecoverable error is flagged in step 59. If not, the clients to whom error events must be reported are identified in step 58.
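The decision sequence of FIG. 4 can be sketched as follows. The sketch assumes that each check has already been reduced to a boolean; the real checks, the manager initialization in step 53 and the hardware restore in step 55 are not modeled. `ResumeChecks`, `ResumeResult` and `handle_resume_request` are hypothetical names, and treating step 58 as a resume with client error events is an interpretation of the description rather than something the flow chart text states outright.

```python
from dataclasses import dataclass
from enum import Enum


class ResumeResult(Enum):
    SUCCESS = "success"                                  # step 60
    RESUMED_WITH_CLIENT_ERRORS = "client errors"         # step 58
    UNRECOVERABLE_ERROR = "unrecoverable error"          # step 59


@dataclass
class ResumeChecks:
    same_manufacturer: bool        # outcome of step 51
    differences_manageable: bool   # outcome of step 52
    assigned_values_ok: bool       # outcome of step 54
    context_fully_restored: bool   # outcome of step 56
    fail_whole_system: bool        # outcome of step 57


def handle_resume_request(checks: ResumeChecks) -> ResumeResult:
    """Sketch of steps 50 to 60 of FIG. 4 (step 50 is the call itself)."""
    if not checks.same_manufacturer:         # step 51
        return ResumeResult.UNRECOVERABLE_ERROR
    if not checks.differences_manageable:    # step 52
        return ResumeResult.UNRECOVERABLE_ERROR
    # Step 53: the manager initializes the HCA (not modeled here).
    if not checks.assigned_values_ok:        # step 54
        return ResumeResult.UNRECOVERABLE_ERROR
    # Step 55: the hardware state is restored (not modeled here).
    if checks.context_fully_restored:        # step 56
        return ResumeResult.SUCCESS
    if checks.fail_whole_system:             # step 57
        return ResumeResult.UNRECOVERABLE_ERROR
    # Step 58: identify the clients that must receive error events.
    return ResumeResult.RESUMED_WITH_CLIENT_ERRORS


print(handle_resume_request(ResumeChecks(True, True, True, False, False)))
```

Every failing branch funnels into the single step 59 outcome, which matches the flow chart's use of one shared error exit.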
  • Thus, if a host needs to be hibernated in order for part of the hardware to be replaced, or if hibernation is desired in order to reduce power requirements, the HCA can be hibernated even if the host is part of a cluster through an interconnection fabric. Upon restoring operation, a procedure is provided to determine whether parts have been changed and how to handle the differences.
  • This concludes the description of the example embodiments. Although the present invention has been described with reference to a number of illustrative embodiments thereof, it should be understood that numerous other modifications and embodiments can be devised by those skilled in the art that will fall within the spirit and scope of the principles of this invention. More particularly, reasonable variations and modifications are possible in the component parts and/or arrangements of the subject combination arrangement within the scope of the foregoing disclosure, the drawings and the appended claims without departing from the spirit of the invention. In addition to variations and modifications in the component parts and/or arrangements, alternative uses will also be apparent to those skilled in the art.

Claims (17)

1. A method of hibernating a host channel adaptor in a host connected in a cluster, comprising:
determining that no active client currently is using said host channel adaptor;
informing a subnet manager of said cluster that said host channel adaptor is hibernating;
saving relevant state information to memory; and
removing power from said host channel adaptor.
2. The method according to claim 1, further comprising:
replacing at least part of the hardware of said host channel adaptor after removing power.
3. The method according to claim 1, further comprising restoring power to said host channel adaptor, retrieving said relevant state data from memory and resuming operation.
4. The method according to claim 1, wherein said host channel adaptor includes a host channel adaptor driver, a cluster transport driver and client controllers.
5. The method according to claim 4, wherein hibernation requests are issued in top down order.
6. A method for hibernating a host channel adaptor in a cluster comprising:
removing power from said host channel adaptor to cause hibernation;
receiving a request to resume operations;
initializing the host channel adaptor by the subnet manager of said cluster;
restoring hardware state to a pre-hibernation state from memory; and
restoring operation of said host channel adaptor.
7. The method according to claim 6, further comprising:
determining whether the pre-hibernation and post-hibernation host channel adaptor are the same.
8. The method according to claim 7, further comprising:
determining whether differences between said pre-hibernation and said post-hibernation host channel adaptor are manageable.
9. The method according to claim 6, further comprising:
determining whether the manager has assigned values with unmanageable differences.
10. The method according to claim 6, further comprising:
determining whether a return from hibernation should be failed for the entire system if the entire host channel adaptor context was not restored.
11. The method according to claim 10, further comprising:
identifying clients to whom error events must be reported if the return from hibernation has not failed.
12. A method of replacing hardware within a host connected in a cluster arrangement, comprising:
requesting the hibernation of a host channel adaptor;
saving all relevant state data to memory;
removing power from said host;
replacing said hardware;
restoring power to said host;
restoring state data from memory;
resuming operation of said host channel adaptor.
13. The method according to claim 12, further comprising:
determining whether the pre-replacement hardware and post-replacement hardware are the same.
14. The method according to claim 13, further comprising:
determining whether differences between pre-replacement hardware and post-replacement hardware are manageable.
15. The method according to claim 12, further comprising:
determining whether a resume operation should be failed for the entire host if the entire host channel adaptor was not replaced.
16. A computer program stored on a computer readable memory for hibernating a host channel adaptor and a host connected in a cluster, the computer program comprising instructions that cause a computer to:
determine that no active client currently is using said host channel adaptor;
inform a subnet manager of said cluster that said host channel adaptor is hibernating;
save relevant state information to memory; and
remove power from said host channel adaptor.
17. The computer program of claim 16, further comprising instructions that cause a computer to restore power to said host channel adaptor, retrieve said relevant state data from memory and resume operation.
US10/787,883 2000-09-20 2004-02-25 Method for hibernation of host channel adaptors in a cluster Abandoned US20040168168A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US10/787,883 US20040168168A1 (en) 2000-09-20 2004-02-25 Method for hibernation of host channel adaptors in a cluster

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/666,595 US6725386B1 (en) 2000-09-20 2000-09-20 Method for hibernation of host channel adaptors in a cluster
US10/787,883 US20040168168A1 (en) 2000-09-20 2004-02-25 Method for hibernation of host channel adaptors in a cluster

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US09/666,595 Continuation US6725386B1 (en) 2000-09-20 2000-09-20 Method for hibernation of host channel adaptors in a cluster

Publications (1)

Publication Number Publication Date
US20040168168A1 (en) 2004-08-26

Family

ID=32070190

Family Applications (2)

Application Number Title Priority Date Filing Date
US09/666,595 Expired - Lifetime US6725386B1 (en) 2000-09-20 2000-09-20 Method for hibernation of host channel adaptors in a cluster
US10/787,883 Abandoned US20040168168A1 (en) 2000-09-20 2004-02-25 Method for hibernation of host channel adaptors in a cluster

Country Status (1)

Country Link
US (2) US6725386B1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7039922B1 (en) 1999-11-29 2006-05-02 Intel Corporation Cluster with multiple paths between hosts and I/O controllers
US20030208572A1 (en) * 2001-08-31 2003-11-06 Shah Rajesh R. Mechanism for reporting topology changes to clients in a cluster
US6950885B2 (en) * 2001-09-25 2005-09-27 Intel Corporation Mechanism for preventing unnecessary timeouts and retries for service requests in a cluster
US7194540B2 (en) * 2001-09-28 2007-03-20 Intel Corporation Mechanism for allowing multiple entities on the same host to handle messages of same service class in a cluster
US20030101158A1 (en) * 2001-11-28 2003-05-29 Pinto Oscar P. Mechanism for managing incoming data messages in a cluster
US7099337B2 (en) * 2001-11-30 2006-08-29 Intel Corporation Mechanism for implementing class redirection in a cluster
US7640440B2 (en) * 2006-04-25 2009-12-29 Apple Inc. Method and apparatus for facilitating device hibernation

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5440690A (en) * 1991-12-27 1995-08-08 Digital Equipment Corporation Network adapter for interrupting host computer system in the event the host device driver is in both transmit and receive sleep states
US5796736A (en) * 1994-07-19 1998-08-18 Nec Corporation ATM network topology auto discovery method
US5805910A (en) * 1995-03-28 1998-09-08 Samsung Electronics Co., Ltd. Computer hibernation system for transmitting data and command words between host and controller
US6336161B1 (en) * 1995-12-15 2002-01-01 Texas Instruments Incorporated Computer configuration system and method with state and restoration from non-volatile semiconductor memory
US5784628A (en) * 1996-03-12 1998-07-21 Microsoft Corporation Method and system for controlling power consumption in a computer system
US5983353A (en) * 1997-01-21 1999-11-09 Dell Usa, L.P. System and method for activating a deactivated device by standardized messaging in a network
US5923099A (en) * 1997-09-30 1999-07-13 Lam Research Corporation Intelligent backup power controller
US6523125B1 (en) * 1998-01-07 2003-02-18 International Business Machines Corporation System and method for providing a hibernation mode in an information handling system
US6459705B1 (en) * 1998-03-26 2002-10-01 National Semiconductor Corporation Network interface for transmitting data from a networked computer in a reduced power state
US6098100A (en) * 1998-06-08 2000-08-01 Silicon Integrated Systems Corp. Method and apparatus for detecting a wake packet issued by a network device to a sleeping node
US6209088B1 (en) * 1998-09-21 2001-03-27 Microsoft Corporation Computer hibernation implemented by a computer operating system
US7082521B1 (en) * 2000-08-24 2006-07-25 Veritas Operating Corporation User interface for dynamic computing environment using allocateable resources

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090024991A1 (en) * 2007-07-16 2009-01-22 International Business Machines Corporation Method, system and program product for managing download requests received to download files from a server
US8347286B2 (en) * 2007-07-16 2013-01-01 International Business Machines Corporation Method, system and program product for managing download requests received to download files from a server
US9106627B2 (en) 2007-07-16 2015-08-11 International Business Machines Corporation Method, system and program product for managing download requests received to download files from a server
US9876847B2 (en) 2007-07-16 2018-01-23 International Business Machines Corporation Managing download requests received to download files from a server
US10554730B2 (en) 2007-07-16 2020-02-04 International Business Machines Corporation Managing download requests received to download files from a server
US11012497B2 (en) 2007-07-16 2021-05-18 International Business Machines Corporation Managing download requests received to download files from a server
US20090158242A1 (en) * 2007-12-18 2009-06-18 Kabira Technologies, Inc., Library of services to guarantee transaction processing application is fully transactional
CN102455774A (en) * 2010-10-18 2012-05-16 无锡江南计算技术研究所 Host with host channel adapter (HCA) equipment, and sleeping and awakening methods for host
CN102929618A (en) * 2012-10-18 2013-02-13 中国人民解放军理工大学 Integrated network management meta-synthesis method based on extensive makeup language (XML) configuration files and listener registration

Also Published As

Publication number Publication date
US6725386B1 (en) 2004-04-20

Similar Documents

Publication Publication Date Title
CA2659141C (en) Method and system for supporting wake-on-lan in a virtualized environment
US7085961B2 (en) Redundant management board blade server management system
US7657786B2 (en) Storage switch system, storage switch method, management server, management method, and management program
US6763479B1 (en) High availability networking with alternate pathing failover
US8171119B2 (en) Program deployment apparatus and method
US6718383B1 (en) High availability networking with virtual IP address failover
US8601314B2 (en) Failover method through disk take over and computer system having failover function
US6732186B1 (en) High availability networking with quad trunking failover
US6728780B1 (en) High availability networking with warm standby interface failover
US6948021B2 (en) Cluster component network appliance system and method for enhancing fault tolerance and hot-swapping
US7013462B2 (en) Method to map an inventory management system to a configuration management system
US6738818B1 (en) Centralized technique for assigning I/O controllers to hosts in a cluster
US20040024831A1 (en) Blade server management system
US20030084337A1 (en) Remotely controlled failsafe boot mechanism and manager for a network device
JPH09508727A (en) Distributed chassis agent for network management
US6725386B1 (en) Method for hibernation of host channel adaptors in a cluster
CN102238093A (en) Service interruption prevention method and device
JP2009140194A (en) Method for setting failure recovery environment
CN107688434A (en) Disk array RAID collocation methods and device
CN112165429B (en) Link aggregation convergence method and device for distributed switching equipment
US7039922B1 (en) Cluster with multiple paths between hosts and I/O controllers
WO2020233001A1 (en) Distributed storage system comprising dual-control architecture, data reading method and device, and storage medium
EP3474501B1 (en) Network device stacking
US20030023895A1 (en) Peripheral failover system
US20030145068A1 (en) Appliance server configuration recovery for a highly optimized server configuration profile image

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION