Revision D
Copyright © 2000 K. M. Sorenson
December, 2000
This document describes how to set up and manage a Kimberlite cluster, which provides application availability and data integrity. Send comments to documentation@missioncriticallinux.com.
Copyright © 2000 K. M. Sorenson
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation. A copy of the license is included on the GNU Free Documentation License Web site.
If you have comments on this document, please send them to:
documentation@missioncriticallinux.com
Linux is a trademark of Linus Torvalds.
All product names mentioned herein are trademarks of their respective owners.
This document includes the following modifications since Revision A:
The Kimberlite clustering technology, made available to the open source community by Mission Critical Linux, Inc., provides data integrity and the ability to maintain application availability in the event of a failure. Using redundant hardware, shared disk storage, power management, and robust cluster communication and application failover mechanisms, a cluster can meet the needs of the enterprise market.
Especially suitable for database applications and World Wide Web (Web) servers with dynamic content, a cluster can also be used in conjunction with other Linux availability efforts, such as Linux Virtual Server (LVS), to deploy a highly available e-commerce site that has complete data integrity and application availability, in addition to load balancing capabilities. See Using a Cluster in an LVS Environment for more information.
For real-time management of cluster environments, Mission Critical Linux also provides Secure Service Technology™ (SST), which enables its engineers or other authorized users to securely access cluster systems and remotely diagnose and correct problems. Using both SST and Mission Critical Linux's system analysis tools ensures that any problems in a cluster are resolved quickly and easily, with minimal interruption in business. See www.missioncriticallinux.com/products/sst/ for more information.
The following sections describe:
To set up a cluster, you connect the cluster systems (often referred to as member systems) to the cluster hardware, install the Kimberlite software on both systems, and configure the systems into the cluster environment. The foundation of a cluster is an advanced host membership algorithm. This algorithm ensures that the cluster maintains complete data integrity at all times by using the following methods of inter-node communication:
To make an application and data highly available in a cluster, you configure a cluster service, which is a discrete group of service properties and resources, such as an application and shared disk storage. A service can be assigned an IP address to provide transparent client access to the service. For example, you can set up a cluster service that provides clients with access to highly-available database application data.
Both cluster systems can run any service and access the service data on shared disk storage. However, each service can run on only one cluster system at a time, in order to maintain data integrity. You can set up an active-active configuration in which both cluster systems run different services, or a hot-standby configuration in which a primary cluster system runs all the services, and a backup cluster system takes over only if the primary system fails.
The following figure shows a cluster in an active-active configuration.
If a hardware or software failure occurs, the cluster will automatically restart the failed system's services on the functional cluster system. This service failover capability ensures that no data is lost, and there is little disruption to users. When the failed system recovers, the cluster can re-balance the services across the two systems.
In addition, a cluster administrator can cleanly stop the services running on a cluster system, and then restart them on the other system. This service relocation capability enables you to maintain application and data availability when a cluster system requires maintenance.
A cluster includes the following features:
The following figure shows how systems communicate in a cluster configuration.
This manual contains information about setting up the cluster hardware, and installing the Linux distribution and the cluster software. These tasks are described in Hardware Installation and Operating System Configuration and Cluster Software Installation and Initialization.
For information about setting up and managing cluster services, see Service Configuration and Administration. For information about managing a cluster, see Cluster Administration.
Supplementary Hardware Information contains detailed configuration information for specific hardware devices, in addition to information about shared storage configurations. You should always check for information that is applicable to your hardware.
Supplementary Software Information contains background information on the cluster software and other related information.
To set up the hardware configuration and install the Linux distribution, follow these steps:
After setting up the hardware configuration and installing the Linux distribution, you can install the cluster software.
Kimberlite allows you to use commodity hardware to set up a cluster configuration that will meet the performance, availability, and data integrity needs of your applications and users. Cluster hardware ranges from low-cost minimum configurations that include only the components required for cluster operation, to high-end configurations that include redundant heartbeat channels, hardware RAID, and power switches.
Regardless of your configuration, you should always use high-quality hardware in a cluster, because hardware malfunction is the primary cause of system downtime.
Although all cluster configurations provide availability, only some configurations protect against every single point of failure. Similarly, all cluster configurations provide data integrity, but only some configurations protect data under every failure condition. Therefore, you must fully understand the needs of your computing environment and also the availability and data integrity features of different hardware configurations, in order to choose the cluster hardware that will meet your requirements.
When choosing a cluster hardware configuration, consider the following:
A minimum hardware configuration includes only the hardware components that are required for cluster operation, as follows:
See Example of a Minimum Cluster Configuration for an example of this type of hardware configuration.
The minimum hardware configuration is the most cost-effective cluster configuration; however, it includes multiple points of failure. For example, if a shared disk fails, any cluster service that uses the disk will be unavailable. In addition, the minimum configuration does not include power switches, which protect against data corruption under all failure conditions. Therefore, only development environments should use a minimum cluster configuration.
To improve availability and protect against component failure, and to guarantee data integrity under all failure conditions, you can expand the minimum configuration. The following table shows how you can improve availability and guarantee data integrity:
To protect against: | You can use: |
Disk failure | Hardware RAID to replicate data across multiple disks. |
Storage interconnect failure | RAID array with multiple SCSI buses or Fibre Channel interconnects. |
RAID controller failure | Dual RAID controllers to provide redundant access to disk data. |
Heartbeat channel failure | Point-to-point Ethernet or serial connection between the cluster systems. |
Power source failure | Redundant uninterruptible power supply (UPS) systems. |
Data corruption under all failure conditions | Power switches. |
A no-single-point-of-failure hardware configuration that guarantees data integrity under all failure conditions can include the following components:
See Example of a No-Single-Point-Of-Failure Configuration for an example of this type of hardware configuration.
Cluster hardware configurations can also include other optional hardware components that are common in a computing environment. For example, you can include a network switch or network hub, which enables you to connect the cluster systems to a network, and a console switch, which facilitates the management of multiple systems and eliminates the need for separate monitors, mice, and keyboards for each cluster system.
One type of console switch is a terminal server, which enables you to connect to serial consoles and manage many systems from one remote location. As a low-cost alternative, you can use a KVM (keyboard, video, and mouse) switch, which enables multiple systems to share one keyboard, monitor, and mouse. A KVM is suitable for configurations in which you access a graphical user interface (GUI) to perform system management tasks.
When choosing a cluster system, be sure that it provides the PCI slots, network slots, and serial ports that the hardware configuration requires. For example, a no-single-point-of-failure configuration requires multiple serial and Ethernet ports. Ideally, choose cluster systems that have at least two serial ports. See Installing the Basic System Hardware for more information.
Use the following table to identify the hardware components required for your cluster configuration. In some cases, the table lists specific products that have been tested in a cluster, although a cluster is expected to work with other products.
Cluster System Hardware | |||
Hardware | Quantity | Description | Required |
Cluster system | Two | Kimberlite supports IA-32 hardware platforms. Each cluster system must provide enough PCI slots, network slots, and serial ports for the cluster hardware configuration. Because disk devices must have the same name on each cluster system, it is recommended that the systems have identical I/O subsystems. In addition, it is recommended that each system have at least a 450 MHz CPU and 256 MB of memory. See Installing the Basic System Hardware for more information. | Yes |
Power Switch Hardware | |||
Hardware | Quantity | Description | Required |
Power switch | Two | Power switches enable each cluster system to power-cycle the other cluster system. A recommended power switch is the RPS-10 (model M/HD in the US, and model M/EC in Europe), which is available from www.wti.com/rps-10.htm. See Configuring Power Switches for information about using power switches in a cluster. | Strongly recommended for data integrity under all failure conditions |
Null modem cable | Two | Null modem cables connect a serial port on a cluster system to a power switch. This serial connection enables each cluster system to power-cycle the other system. Some power switches may require different cables. | Only if using power switches |
Mounting bracket | One | Some power switches support rack mount configurations. | Only for rack mounting power switches |
Shared Disk Storage Hardware | |||
Hardware | Quantity | Description | Required |
External disk storage enclosure | One | For production environments, it is recommended that you use single-initiator SCSI buses or single-initiator Fibre Channel interconnects to connect the cluster systems to a single or dual-controller RAID array. To use single-initiator buses or interconnects, a RAID controller must have multiple host ports and provide simultaneous access to all the logical units on the host ports. If a logical unit can fail over from one controller to the other, the process must be transparent to the operating system. | Yes |
Host bus adapter | Two | To connect to shared disk storage, you must install either a parallel SCSI or a Fibre Channel host bus adapter in a PCI slot in each cluster system. See Host Bus Adapter Features and Configuration Requirements and Adaptec Host Bus Adapter Requirement for device features and configuration information. | Yes |
SCSI cable | Two | SCSI cables with 68 pins connect each host bus adapter to a storage enclosure port. Cables have either HD68 or VHDCI connectors. | Only for parallel SCSI configurations |
External SCSI LVD active terminator | Two | For hot plugging support, connect an external LVD active terminator to a host bus adapter that has disabled internal termination. This enables you to disconnect the terminator from the adapter without affecting bus operation. Terminators have either HD68 or VHDCI connectors. Recommended external pass-through terminators with HD68 connectors can be obtained from Technical Cable Concepts, Inc., 350 Lear Avenue, Costa Mesa, California, 92626 (714-835-1081), or www.techcable.com. The part description and number is TERM SSM/F LVD/SE Ext Beige, 396868-LVD/SE. | Only for parallel SCSI configurations that require external termination for hot plugging |
SCSI terminator | Two | For a RAID storage enclosure that uses "out" ports (such as FlashDisk RAID Disk Array) and is connected to single-initiator SCSI buses, connect terminators to the "out" ports in order to terminate the buses. | Only for parallel SCSI configurations and only if necessary for termination |
Fibre Channel hub or switch | One or two | A Fibre Channel hub or switch is required, unless you have a storage enclosure with two ports, and the host bus adapters in the cluster systems can be connected directly to different ports. | Only for some Fibre Channel configurations |
Fibre Channel cable | Two to six | A Fibre Channel cable connects a host bus adapter to a storage enclosure port, a Fibre Channel hub, or a Fibre Channel switch. If a hub or switch is used, additional cables are needed to connect the hub or switch to the storage adapter ports. | Only for Fibre Channel configurations |
Network Hardware | |||
Hardware | Quantity | Description | Required |
Network interface | One for each network connection | Each network connection requires a network interface installed in a cluster system. See Tulip Network Driver Requirement for information about using this driver in a cluster. | Yes |
Network switch or hub | One | A network switch or hub enables you to connect multiple systems to a network. | No |
Network cable | One for each network interface | A conventional network cable, such as a cable with an RJ45 connector, connects each network interface to a network switch or a network hub. | Yes |
Point-To-Point Ethernet Heartbeat Channel Hardware | |||
Hardware | Quantity | Description | Required |
Network interface | Two for each channel | Each Ethernet heartbeat channel requires a network interface installed in both cluster systems. | No |
Network crossover cable | One for each channel | A network crossover cable connects a network interface on one cluster system to a network interface on the other cluster system, creating an Ethernet heartbeat channel. | Only for a redundant Ethernet heartbeat channel |
Point-To-Point Serial Heartbeat Channel Hardware | |||
Hardware | Quantity | Description | Required |
Serial card | Two for each serial channel | Each serial heartbeat channel requires a serial port on both cluster systems. To expand your serial port capacity, you can use multi-port serial PCI cards. Recommended multi-port cards include the following: | No |
Null modem cable | One for each channel | A null modem cable connects a serial port on one cluster system to a corresponding serial port on the other cluster system, creating a serial heartbeat channel. | Only for serial heartbeat channel |
Console Switch Hardware | |||
Hardware | Quantity | Description | Required |
Terminal server | One | A terminal server enables you to manage many systems from one remote location. Recommended terminal servers include the following: | No |
RJ45 to DB9 crossover cable | Two | RJ45 to DB9 crossover cables connect a serial port on each cluster system to a Cyclades terminal server. Other types of terminal servers may require different cables. | Only for terminal server |
Network cable | One | A network cable connects a terminal server to a network switch or hub. | Only for terminal server |
KVM | One | A KVM enables multiple systems to share one keyboard, monitor, and mouse. A recommended KVM is the Cybex Switchview, which is available from www.cybex.com. Cables for connecting systems to the switch depend on the type of KVM. | No |
UPS System Hardware | |||
Hardware | Quantity | Description | Required |
UPS system | One or two | Uninterruptible power supply (UPS) systems provide a highly-available source of power. Ideally, connect the power cables for the shared storage enclosure and both power switches to redundant UPS systems. In addition, a UPS system must be able to provide voltage for an adequate period of time. A recommended UPS system is the APC Smart-UPS 1000VA/670W, which is available from www.apc.com. | Strongly recommended for availability |
The hardware components described in the following table can be used to set up a minimum cluster configuration that uses a multi-initiator SCSI bus and supports hot plugging. This configuration does not guarantee data integrity under all failure conditions, because it does not include power switches. Note that this is a sample configuration; you may be able to set up a minimum configuration using other hardware.
Minimum Cluster Hardware Configuration Example | |
Two servers | Each cluster system includes the following hardware: |
Two network cables with RJ45 connectors | Network cables connect a network interface on each cluster system to the network for client access and Ethernet heartbeats. |
JBOD storage enclosure | The storage enclosure's internal termination is disabled. |
Two pass-through LVD active terminators | External pass-through LVD active terminators connected to each host bus adapter provide external SCSI bus termination for hot plugging support. |
Two HD68 SCSI cables | HD68 cables connect each terminator to a port on the storage enclosure, creating a multi-initiator SCSI bus. |
The following figure shows a minimum cluster hardware configuration that includes the hardware described in the previous table and a multi-initiator SCSI bus, and also supports hot plugging. A "T" enclosed by a circle indicates internal (onboard) or external SCSI bus termination. A slash through the "T" indicates that termination has been disabled.
The components described in the following table can be used to set up a no-single-point-of-failure cluster configuration that includes two single-initiator SCSI buses and power switches to guarantee data integrity under all failure conditions. Note that this is a sample configuration; you may be able to set up a no-single-point-of-failure configuration using other hardware.
No-Single-Point-Of-Failure Configuration Example | |
Two servers | Each cluster system includes the following hardware: |
One network switch | A network switch enables you to connect multiple systems to a network. |
One Cyclades terminal server | A terminal server enables you to manage remote systems from a central location. |
Three network cables | Network cables connect the terminal server and a network interface on each cluster system to the network switch. |
Two RJ45 to DB9 crossover cables | RJ45 to DB9 crossover cables connect a serial port on each cluster system to the Cyclades terminal server. |
One network crossover cable | A network crossover cable connects a network interface on one cluster system to a network interface on the other system, creating a point-to-point Ethernet heartbeat channel. |
Two RPS-10 power switches | Power switches enable each cluster system to power-cycle the other system before restarting its services. The power cable for each cluster system is connected to its own power switch. |
Three null modem cables | Null modem cables connect a serial port on each cluster system to the power switch that provides power to the other cluster system. This connection enables each cluster system to power-cycle the other system. A null modem cable connects a serial port on one cluster system to a corresponding serial port on the other system, creating a point-to-point serial heartbeat channel. |
FlashDisk RAID Disk Array with dual controllers | Dual RAID controllers protect against disk and controller failure. The RAID controllers provide simultaneous access to all the logical units on the host ports. |
Two HD68 SCSI cables | HD68 cables connect each host bus adapter to a RAID enclosure "in" port, creating two single-initiator SCSI buses. |
Two terminators | Terminators connected to each "out" port on the RAID enclosure terminate both single-initiator SCSI buses. |
Redundant UPS Systems | UPS systems provide a highly-available source of power. The power cables for the power switches and the RAID enclosure are connected to two UPS systems. |
The following figure shows an example of a no-single-point-of-failure hardware configuration that includes the hardware described in the previous table, two single-initiator SCSI buses, and power switches to guarantee data integrity under all error conditions.
After you identify the cluster hardware components, as described in Choosing a Hardware Configuration, you must set up the basic cluster system hardware and connect the systems to the optional console switch and network switch or hub. Follow these steps:
After performing the previous tasks, you can install the Linux distribution, as described in Steps for Installing and Configuring the Linux Distribution.
Cluster systems must provide the CPU processing power and memory required by your applications. It is recommended that each system have at least a 450 MHz CPU and 256 MB of memory.
In addition, cluster systems must be able to accommodate the SCSI adapters, network interfaces, and serial ports that your hardware configuration requires. Systems have a limited number of preinstalled serial and network ports and PCI expansion slots. The following table will help you determine how much capacity your cluster systems require:
Cluster Hardware Component | Serial Ports | Network Slots | PCI Slots |
Remote power switch connection (optional, but strongly recommended) | One | | |
SCSI bus to shared disk storage | | | One for each bus |
Network connection for client access and Ethernet heartbeat | | One for each network connection | |
Point-to-point Ethernet heartbeat channel (optional) | | One for each channel | |
Point-to-point serial heartbeat channel (optional) | One for each channel | | |
Terminal server connection (optional) | One | | |
Most systems come with at least one serial port. Ideally, choose systems that have at least two serial ports. If your system has a graphics display capability, you can use the serial console port for a serial heartbeat channel or a power switch connection. To expand your serial port capacity, you can use multi-port serial PCI cards.
In addition, you must be sure that local system disks will not be on the same SCSI bus as the shared disks. For example, you can use two-channel SCSI adapters, such as the Adaptec 3950-series cards, and put the internal devices on one channel and the shared disks on the other channel. You can also use multiple SCSI cards.
See the system documentation supplied by the vendor for detailed installation information. See Supplementary Hardware Information for hardware-specific information about using host bus adapters, multiport serial cards, and Tulip network drivers in a cluster.
The following figure shows the bulkhead of a sample cluster system and the external cable connections for a typical cluster configuration.
Although a console switch is not required for cluster operation, you can use one to facilitate cluster system management and eliminate the need for separate monitors, mice, and keyboards for each cluster system. There are several types of console switches.
For example, a terminal server enables you to connect to serial consoles and manage many systems from a remote location. For a low-cost alternative, you can use a KVM (keyboard, video, and mouse) switch, which enables multiple systems to share one keyboard, monitor, and mouse. A KVM switch is suitable for configurations in which you access a graphical user interface (GUI) to perform system management tasks.
Set up the console switch according to the documentation provided by the vendor, unless this manual provides cluster-specific installation guidelines that supersede the vendor instructions. See Setting Up a Cyclades Terminal Server for information.
After you set up the console switch, connect it to each cluster system. The cables you use depend on the type of console switch. For example, if you have a Cyclades terminal server, use RJ45 to DB9 crossover cables to connect a serial port on each cluster system to the terminal server.
Although a network switch or hub is not required for cluster operation, you may want to use one to facilitate cluster and client system network operations.
Set up a network switch or hub according to the documentation provided by the vendor.
After you set up the network switch or hub, connect it to each cluster system by using conventional network cables. If you are using a terminal server, use a network cable to connect it to the network switch or hub.
After you set up the basic system hardware, install the Linux distribution on both cluster systems and ensure that they recognize the connected devices. Follow these steps:
You must install a Linux distribution on the cluster systems, in addition to the drivers and subsystems that are required by your applications. It is recommended that you install Linux kernel version 2.2.16, unless otherwise instructed.
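You can check the kernel version that is currently running with the uname command. The output shown here is only illustrative; the exact version string depends on your distribution:
# uname -r
2.2.16-22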
Kimberlite supports the following Linux distributions:
When installing the Linux distribution, you must adhere to the following kernel requirements:
# rpm -q pdksh
To check if the Korn shell is installed on a Debian package-based system, use the following command:
# dpkg -S pdksh
Obtain the Korn shell from metalab.unc.edu:/pub/linux/system/shells/pdksh-5.2.8.tar.gz, if necessary. See the README file in the Korn shell distribution for installation instructions.
To build and install only the raw command, invoke the following commands:
# gunzip -cd util-linux-2.10k.tar.gz | tar xvf -
# cd util-linux-2.10k
# ./configure
# cd disk-utils
# make raw
# install -d -m 755 /usr/bin
# install -d -m 755 /usr/man/man8
# install -m 755 raw /usr/bin
# install -m 644 raw.8 /usr/man/man8
In addition, when installing the Linux distribution, it is strongly recommended that you:
In addition, see the following for details on installing different Linux distributions in a cluster:
The following requirements apply to a VA Linux distribution, which includes enhancements to the Red Hat distribution:
The default version of the GNU C compiler (gcc) provided with Red Hat 7.0 cannot be used to build a new kernel. To be able to build a new kernel, you must edit the Makefile in the /usr/src/linux directory and change two gcc references to kgcc, as shown in the following example:
18 HOSTCC        =kgcc
25 CC            =$(CROSS_COMPILE)kgcc -D__KERNEL__ -I$(HPATH)
You could also use an alias or a symbolic link to redirect the gcc references to kgcc.
The /etc/hosts file contains the IP address-to-hostname translation table. The /etc/hosts file on each cluster system must contain entries for the following:
As an alternative to the /etc/hosts file, you could use a naming service such as DNS or NIS to define the host names used by a cluster. However, to limit the number of dependencies and optimize availability, it is strongly recommended that you use the /etc/hosts file to define IP addresses for cluster network interfaces.
The following is an example of an /etc/hosts file on a cluster system:
127.0.0.1      localhost.localdomain   localhost
193.186.1.81   cluster2.linux.com      cluster2
10.0.0.1       ecluster2.linux.com     ecluster2
193.186.1.82   cluster3.linux.com      cluster3
10.0.0.2       ecluster3.linux.com     ecluster3
The previous example shows the IP addresses and host names for two cluster systems (cluster2 and cluster3), and the private IP addresses and host names for the Ethernet interface used for the point-to-point heartbeat connection on each cluster system (ecluster2 and ecluster3).
Note that some Linux distributions (for example, Red Hat 6.2) use an incorrect format in the /etc/hosts file, and include non-local systems in the entry for the local host. An example of an incorrect local host entry that includes a non-local system (server1) is shown next:
127.0.0.1      localhost.localdomain   localhost server1
A heartbeat channel may not operate properly if the format is not correct. For example, the channel will erroneously appear to be "offline." Check your /etc/hosts file and correct the file format by removing non-local systems from the local host entry, if necessary.
Note that each network adapter must be configured with the appropriate IP address and netmask.
The following is an example of a portion of the output from the ifconfig command on a cluster system:
# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:00:BC:11:76:93
          inet addr:192.186.1.81  Bcast:192.186.1.245  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:65508254 errors:225 dropped:0 overruns:2 frame:0
          TX packets:40364135 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:19 Base address:0xfce0
eth1      Link encap:Ethernet  HWaddr 00:00:BC:11:76:92
          inet addr:10.0.0.1  Bcast:10.0.0.245  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:18 Base address:0xfcc0
The previous example shows two network interfaces on a cluster system, eth0 (network interface for the cluster system) and eth1 (network interface for the point-to-point heartbeat connection).
You can reduce the boot time for a cluster system by decreasing the kernel boot timeout limit. During the Linux boot sequence, you are given the opportunity to specify an alternate kernel to boot. The default timeout limit for specifying a kernel depends on the Linux distribution. For Red Hat distributions, the limit is five seconds.
To modify the kernel boot timeout limit for a cluster system, edit the /etc/lilo.conf file and specify the desired value (in tenths of a second) for the timeout parameter. The following example sets the timeout limit to three seconds:
timeout = 30
To apply the changes you made to the /etc/lilo.conf file, invoke the /sbin/lilo command.
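For example, the following command reinstalls the boot loader map after the file has been edited; the output is only illustrative and depends on the labels defined in /etc/lilo.conf:
# /sbin/lilo
Added linux *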
Use the dmesg command to display the console startup messages. See the dmesg.8 manpage for more information.
The following example of dmesg command output shows that a serial expansion card was recognized during startup:
May 22 14:02:10 storage3 kernel: Cyclades driver 2.3.2.5 2000/01/19 14:35:33
May 22 14:02:10 storage3 kernel:  built May 8 2000 12:40:12
May 22 14:02:10 storage3 kernel: Cyclom-Y/PCI #1: 0xd0002000-0xd0005fff, IRQ9, 4 channels starting from port 0.
The following example of dmesg command output shows that two external SCSI buses and nine disks were detected on the system:
May 22 14:02:10 storage3 kernel: scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4
May 22 14:02:10 storage3 kernel:
May 22 14:02:10 storage3 kernel: scsi1 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4
May 22 14:02:10 storage3 kernel:
May 22 14:02:10 storage3 kernel: scsi : 2 hosts.
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE   Model: ST39236LW    Rev: 0004
May 22 14:02:11 storage3 kernel: Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE   Model: ST318203LC   Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sdb at scsi1, channel 0, id 0, lun 0
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE   Model: ST318203LC   Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sdc at scsi1, channel 0, id 1, lun 0
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE   Model: ST318203LC   Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sdd at scsi1, channel 0, id 2, lun 0
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE   Model: ST318203LC   Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sde at scsi1, channel 0, id 3, lun 0
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE   Model: ST318203LC   Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sdf at scsi1, channel 0, id 8, lun 0
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE   Model: ST318203LC   Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sdg at scsi1, channel 0, id 9, lun 0
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE   Model: ST318203LC   Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sdh at scsi1, channel 0, id 10, lun 0
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE   Model: ST318203LC   Rev: 0001
May 22 14:02:11 storage3 kernel: Detected scsi disk sdi at scsi1, channel 0, id 11, lun 0
May 22 14:02:11 storage3 kernel:   Vendor: Dell      Model: 8 BAY U2W CU Rev: 0205
May 22 14:02:11 storage3 kernel:   Type:   Processor                     ANSI SCSI revision: 03
May 22 14:02:11 storage3 kernel: scsi1 : channel 0 target 15 lun 1 request sense failed, performing reset.
May 22 14:02:11 storage3 kernel: SCSI bus is being reset for host 1 channel 0.
May 22 14:02:11 storage3 kernel: scsi : detected 9 SCSI disks total.
The following example of dmesg command output shows that a quad Ethernet card was detected on the system:
May 22 14:02:11 storage3 kernel: 3c59x.c:v0.99H 11/17/98 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/vortex.html
May 22 14:02:11 storage3 kernel: tulip.c:v0.91g-ppc 7/16/99 becker@cesdis.gsfc.nasa.gov
May 22 14:02:11 storage3 kernel: eth0: Digital DS21140 Tulip rev 34 at 0x9800, 00:00:BC:11:76:93, IRQ 5.
May 22 14:02:12 storage3 kernel: eth1: Digital DS21140 Tulip rev 34 at 0x9400, 00:00:BC:11:76:92, IRQ 9.
May 22 14:02:12 storage3 kernel: eth2: Digital DS21140 Tulip rev 34 at 0x9000, 00:00:BC:11:76:91, IRQ 11.
May 22 14:02:12 storage3 kernel: eth3: Digital DS21140 Tulip rev 34 at 0x8800, 00:00:BC:11:76:90, IRQ 10.
To be sure that the installed devices, including serial and network interfaces, are configured in the kernel, use the cat /proc/devices command on each cluster system. You can also use this command to determine if you have raw device support installed on the system. For example:
# cat /proc/devices
Character devices:
  1 mem
  2 pty
  3 ttyp
  4 ttyS
  5 cua
  7 vcs
 10 misc
 19 ttyC
 20 cub
128 ptm
136 pts
162 raw

Block devices:
  2 fd
  3 ide0
  8 sd
 65 sd
#
The previous example shows:
If raw devices are displayed, raw I/O support is included in the system, and you do not need to apply the raw I/O patch, as described in Linux Distribution and Kernel Requirements.
After installing the Linux distribution, you can set up the cluster hardware components and then verify the installation to ensure that the cluster systems recognize all the connected devices. Note that the exact steps for setting up the hardware depend on the type of configuration. See Choosing a Hardware Configuration for more information about cluster configurations.
To set up the cluster hardware, follow these steps:
If you are using Adaptec host bus adapters for shared storage, see Adaptec Host Bus Adapter Requirement for configuration information.
The cluster uses heartbeat channels to determine the state of the cluster systems. For example, if a cluster system stops updating its timestamp on the quorum partitions, the other cluster system will check the status of the heartbeat channels to determine if failover should occur.
A cluster must include at least one heartbeat channel. You can use an Ethernet connection for both client access and a heartbeat channel. However, it is recommended that you set up additional heartbeat channels for high availability. You can set up redundant Ethernet heartbeat channels, in addition to one or more serial heartbeat channels.
For example, if you have an Ethernet and a serial heartbeat channel, and the cable for the Ethernet channel is disconnected, the cluster systems can still check status through the serial heartbeat channel.
To set up a redundant Ethernet heartbeat channel, use a network crossover cable to connect a network interface on one cluster system to a network interface on the other cluster system.
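For example, assuming the crossover cable connects the eth1 interface on each system and that you use the private addresses from the earlier /etc/hosts example (10.0.0.1 for ecluster2 and 10.0.0.2 for ecluster3), you could configure the interface on one system and then verify the channel; the interface name and addresses are only illustrative:
# ifconfig eth1 10.0.0.1 netmask 255.255.255.0 up
# ping -c 3 ecluster3
Run the corresponding ifconfig command with 10.0.0.2 on the other cluster system before testing with ping.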
To set up a serial heartbeat channel, use a null modem cable to connect a serial port on one cluster system to a serial port on the other cluster system. Be sure to connect corresponding serial ports on the cluster systems; do not connect to the serial port that will be used for a remote power switch connection.
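A simple way to confirm that the null modem cable joins the intended ports is to read from the port on one system while writing to it from the other; the device name below assumes the second serial port (ttyS1) on each system and that the port settings on both systems match:
# cat < /dev/ttyS1
Then, on the other cluster system:
# echo test > /dev/ttyS1
If the cabling is correct, the word test appears in the output of the cat command on the first system.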
Power switches enable a cluster system to power-cycle the other cluster system before restarting its services as part of the failover process. The ability to remotely disable a system ensures data integrity under any failure condition. It is recommended that production environments use power switches in the cluster configuration. Only development environments should use a configuration without power switches.
In a cluster configuration that uses power switches, each cluster system's power cable is connected to its own power switch. In addition, each cluster system is remotely connected to the other cluster system's power switch, usually through a serial port connection. When failover occurs, a cluster system can use this connection to power-cycle the other cluster system before restarting its services.
Power switches protect against data corruption if an unresponsive ("hung") system becomes responsive ("unhung") after its services have failed over, and issues I/O to a disk that is also receiving I/O from the other cluster system. In addition, if a quorum daemon fails on a cluster system, the system is no longer able to monitor the quorum partitions. If you are not using power switches in the cluster, this error condition may result in services being run on more than one cluster system, which can cause data corruption.
It is strongly recommended that you use power switches in a cluster. However, if you are fully aware of the risk, you can choose to set up a cluster without power switches.
A cluster system may "hang" for a few seconds if it is swapping or has a high system workload. In this case, failover does not occur because the other cluster system does not determine that the "hung" system is down.
A cluster system may "hang" indefinitely because of a hardware failure or a kernel error. In this case, the other cluster system will notice that the "hung" system is not updating its timestamp on the quorum partitions, and is not responding to pings over the heartbeat channels.
If a cluster system determines that a "hung" system is down, and power switches are used in the cluster, the cluster system will power-cycle the "hung" system before restarting its services. This will cause the "hung" system to reboot in a clean state, and prevent it from issuing I/O and corrupting service data.
If power switches are not used in the cluster, and a cluster system determines that a "hung" system is down, it will set the status of the failed system to DOWN on the quorum partitions, and then restart the "hung" system's services. If the "hung" system becomes "unhung," it will notice that its status is DOWN, and initiate a system reboot. This minimizes the time that both cluster systems may be able to issue I/O to the same disk, but it does not provide the data integrity guarantee of power switches. If the "hung" system never becomes responsive, you will have to manually reboot the system.
If you are using power switches, set up the hardware according to the vendor instructions. However, you may have to perform some cluster-specific tasks to use a power switch in the cluster. See Setting Up an RPS-10 Power Switch for detailed information about using an RPS-10 power switch in a cluster. Note that the cluster-specific information provided in this document supersedes the vendor information.
After you set up the power switches, perform these tasks to connect them to the cluster systems:
After you install the cluster software, but before you start the cluster, test the power switches to ensure that each cluster system can power-cycle the other system. See Testing the Power Switches for information.
Uninterruptible power supply (UPS) systems protect against downtime if a power outage occurs. Although UPS systems are not required for cluster operation, they are recommended. For the highest availability, connect the power switches (or the power cords for the cluster systems if you are not using power switches) and the disk storage subsystem to redundant UPS systems. In addition, each UPS system must be connected to its own power circuit.
Be sure that each UPS system can provide adequate power to its attached devices. If a power outage occurs, a UPS system must be able to provide power for an adequate amount of time.
Redundant UPS systems provide a highly-available source of power. If a power outage occurs, the power load for the cluster devices will be distributed over the UPS systems. If one of the UPS systems fails, the cluster applications will still be available.
If your disk storage subsystem has two power supplies with separate power cords, set up two UPS systems, and connect one power switch (or one cluster system's power cord if you are not using power switches) and one of the storage subsystem's power cords to each UPS system.
A redundant UPS system configuration is shown in the following figure.
You can also connect both power switches (or both cluster systems' power cords) and the disk storage subsystem to the same UPS system. This is the most cost-effective configuration, and provides some protection against power failure. However, if a power outage occurs, the single UPS system becomes a possible single point of failure. In addition, one UPS system may not be able to provide enough power to all the attached devices for an adequate amount of time.
A single UPS system configuration is shown in the following figure.
Many UPS system products include Linux applications that monitor the operational status of the UPS system through a serial port connection. If the battery power is low, the monitoring software will initiate a clean system shutdown. If this occurs, the cluster software will be properly stopped, because it is controlled by a System V run level script (for example, /etc/rc.d/init.d/cluster).
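For example, the cluster software can also be stopped and started manually through this script; the start and stop arguments follow the usual System V init script convention, and the exact script location may vary with the distribution:
# /etc/rc.d/init.d/cluster stop
# /etc/rc.d/init.d/cluster start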
See the UPS documentation supplied by the vendor for detailed installation information.
In a cluster, shared disk storage is used to hold service data and two quorum partitions. Because this storage must be available to both cluster systems, it cannot be located on disks that depend on the availability of any one system. See the vendor documentation for detailed product and installation information.
There are a number of factors to consider when setting up shared disk storage in a cluster:
Note that you must carefully follow the configuration guidelines for multi-initiator and single-initiator buses and for hot plugging, in order for the cluster to operate correctly.
You must adhere to the following shared storage requirements:
You must adhere to the following parallel SCSI requirements, if applicable:
See SCSI Bus Configuration Requirements for more information.
In addition, it is strongly recommended that you connect the storage enclosure to redundant UPS systems for a highly-available source of power. See Configuring UPS Systems for more information.
See Setting Up a Multi-Initiator SCSI Bus, Setting Up a Single-Initiator SCSI Bus, and Setting Up a Single-Initiator Fibre Channel Interconnect for more information about configuring shared storage.
After you set up the shared disk storage hardware, you can partition the disks and then either create file systems or raw devices on the partitions. You must create two raw devices for the primary and the backup quorum partitions. See Configuring the Quorum Partitions, Partitioning Disks, Creating Raw Devices, and Creating File Systems for more information.
A multi-initiator SCSI bus has more than one cluster system connected to it. If you have JBOD storage, you must use a multi-initiator SCSI bus to connect the cluster systems to the shared disks in a cluster storage enclosure. You also must use a multi-initiator bus if you have a RAID controller that does not provide access to all the shared logical units from host ports on the storage enclosure, or has only one host port.
A multi-initiator bus does not provide host isolation. Therefore, only development environments should use a multi-initiator bus.
A multi-initiator bus must adhere to the requirements described in SCSI Bus Configuration Requirements. In addition, see Host Bus Adapter Features and Configuration Requirements for information about terminating host bus adapters and configuring a multi-initiator bus with and without hot plugging support.
In general, to set up a multi-initiator SCSI bus with a cluster system at each end of the bus, you must do the following:
To set host bus adapter termination, you usually must enter the system configuration utility during system boot. To set RAID controller or storage enclosure termination, see the vendor documentation.
The following figure shows a multi-initiator SCSI bus with no hot plugging support.
Multi-Initiator SCSI Bus Configuration
If the onboard termination for a host bus adapter can be disabled, you can configure it for hot plugging. This allows you to disconnect the adapter from the multi-initiator bus, without affecting bus termination, so you can perform maintenance while the bus remains operational.
To configure a host bus adapter for hot plugging, you must do the following:
You can then use the appropriate 68-pin SCSI cable to connect the LVD terminator to the (unterminated) storage enclosure.
The following figure shows a multi-initiator SCSI bus with both host bus adapters configured for hot plugging.
Multi-Initiator SCSI Bus Configuration With Hot Plugging
The following figure shows the termination in a JBOD storage enclosure connected to a multi-initiator SCSI bus.
JBOD Storage Connected to a Multi-Initiator Bus
The following figure shows the termination in a single-controller RAID array connected to a multi-initiator SCSI bus.
Single-Controller RAID Array Connected to a Multi-Initiator Bus
The following figure shows the termination in a dual-controller RAID array connected to a multi-initiator SCSI bus.
Dual-Controller RAID Array Connected to a Multi-Initiator Bus
A single-initiator SCSI bus has only one cluster system connected to it, and provides host isolation and better performance than a multi-initiator bus. Single-initiator buses ensure that each cluster system is protected from disruptions due to the workload, initialization, or repair of the other cluster system.
If you have a single or dual-controller RAID array that has multiple host ports and provides simultaneous access to all the shared logical units from the host ports on the storage enclosure, you can set up two single-initiator SCSI buses to connect each cluster system to the RAID array. If a logical unit can fail over from one controller to the other, the process must be transparent to the operating system.
It is recommended that production environments use single-initiator SCSI buses or single-initiator Fibre Channel interconnects.
Note that some RAID controllers restrict a set of disks to a specific controller or port. In this case, you cannot set up single-initiator buses. In addition, hot plugging is not necessary in a single-initiator SCSI bus, because the private bus does not need to remain operational when you disconnect a host bus adapter from the bus.
A single-initiator bus must adhere to the requirements described in SCSI Bus Configuration Requirements. In addition, see Host Bus Adapter Features and Configuration Requirements for detailed information about terminating host bus adapters and configuring a single-initiator bus.
To set up a single-initiator SCSI bus configuration, you must do the following:
To set host bus adapter termination, you usually must enter a BIOS utility during system boot. To set RAID controller termination, see the vendor documentation.
The following figure shows a configuration that uses two single-initiator SCSI buses.
Single-Initiator SCSI Bus Configuration
The following figure shows the termination in a single-controller RAID array connected to two single-initiator SCSI buses.
Single-Controller RAID Array Connected to Single-Initiator SCSI Buses
The following figure shows the termination in a dual-controller RAID array connected to two single-initiator SCSI buses.
Dual-Controller RAID Array Connected to Single-Initiator SCSI Buses
A single-initiator Fibre Channel interconnect has only one cluster system connected to it, and provides host isolation and better performance than a multi-initiator bus. Single-initiator interconnects ensure that each cluster system is protected from disruptions due to the workload, initialization, or repair of the other cluster system.
It is recommended that production environments use single-initiator SCSI buses or single-initiator Fibre Channel interconnects.
If you have a RAID array that has multiple host ports, and the RAID array provides simultaneous access to all the shared logical units from the host ports on the storage enclosure, you can set up two single-initiator Fibre Channel interconnects to connect each cluster system to the RAID array. If a logical unit can fail over from one controller to the other, the process must be transparent to the operating system.
The following figure shows a single-controller RAID array with two host ports, and the host bus adapters connected directly to the RAID controller, without using Fibre Channel hubs or switches.
Single-Controller RAID Array Connected to Single-Initiator Fibre Channel Interconnects
If you have a dual-controller RAID array with two host ports on each controller, you must use a Fibre Channel hub or switch to connect each host bus adapter to one port on both controllers, as shown in the following figure.
Dual-Controller RAID Array Connected to Single-Initiator Fibre Channel Interconnects
You must create two raw devices on shared disk storage for the primary quorum partition and the backup quorum partition. Each quorum partition must have a minimum size of 2 MB. The amount of data in a quorum partition is constant; it does not increase or decrease over time.
The quorum partitions are used to hold cluster state information. Periodically, each cluster system writes its status (either UP or DOWN), a timestamp, and the state of its services. In addition, the quorum partitions contain a version of the cluster database. This ensures that each cluster system has a common view of the cluster configuration.
To monitor cluster health, the cluster systems periodically read state information from the primary quorum partition and determine if it is up to date. If the primary partition is corrupted, the cluster systems read the information from the backup quorum partition and simultaneously repair the primary partition. Data consistency is maintained through checksums, and any inconsistencies between the partitions are automatically corrected.
If a system is unable to write to both quorum partitions at startup time, it will not be allowed to join the cluster. In addition, if an active cluster system can no longer write to both quorum partitions, the system will remove itself from the cluster by rebooting.
You must adhere to the following quorum partition requirements:
The following are recommended guidelines for configuring the quorum partitions:
See Partitioning Disks and Creating Raw Devices for more information about setting up the quorum partitions.
See Editing the rawio File for information about editing the rawio file to bind the raw character devices to the block devices each time the cluster systems boot.
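For example, assuming the primary and backup quorum partitions are the hypothetical partitions /dev/sdb1 and /dev/sdb2 on shared storage, the corresponding raw device bindings could be created as follows (see Creating Raw Devices for details on the raw command):
# raw /dev/raw/raw1 /dev/sdb1
# raw /dev/raw/raw2 /dev/sdb2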
After you set up the shared disk storage hardware, you must partition the disks so they can be used in the cluster. You can then create file systems or raw devices on the partitions. For example, you must create two raw devices for the quorum partitions, using the guidelines described in Configuring Quorum Partitions.
Invoke the interactive fdisk command to modify a disk partition table and divide the disk into partitions. Use the p command to display the current partition table. Use the n command to create a new partition.
The following example shows how to use the fdisk command to partition a disk:
# fdisk /dev/sde

Command (m for help): p

Disk /dev/sde: 255 heads, 63 sectors, 2213 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sde1             1       262   2104483+  83  Linux
/dev/sde2           263       288    208845   83  Linux
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (289-2213, default 289): 289
Last cylinder or +size or +sizeM or +sizeK (289-2213, default 2213): +2000M
Note that large partitions will increase the cluster service failover time if a file system on the partition must be checked with fsck. Quorum partitions must be at least 2 MB, although 10 MB is recommended.
Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: If you have created or modified any DOS 6.x partitions, please see the fdisk manual page for additional information.
Syncing disks.
After you partition a disk, you can format it for use in the cluster. You must create raw devices for the quorum partitions. You can also format the remainder of the shared disks as needed by the cluster services. For example, you can create file systems or raw devices on the partitions.
See Creating Raw Devices and Creating File Systems for more information.
After you partition the shared storage disks, as described in Partitioning Disks, you can create raw devices on the partitions. File systems are block devices (for example, /dev/sda1) that cache recently-used data in memory in order to improve performance. Raw devices do not utilize system memory for caching. See Creating File Systems for more information.
Linux supports raw character devices that are not hard-coded against specific block devices. Instead, Linux uses a character major number (currently 162) to implement a series of unbound raw devices in the /dev/raw directory. Any block device can have a character raw device front-end, even if the block device is loaded later at runtime.
To create a raw device, use the raw command to bind a raw character device to the appropriate block device. Once bound to a block device, a raw device can be opened, read, and written.
You must create raw devices for the quorum partitions. In addition, some database applications require raw devices, because these applications perform their own buffer caching for performance purposes. Quorum partitions cannot contain file systems because if state data was cached in system memory, the cluster systems would not have a consistent view of the state data.
To enable the cluster systems to access the quorum partitions as raw devices, your kernel must support raw I/O and include the raw command. See Linux Distribution and Kernel Requirements for more information.
Some Linux distributions automatically create raw character devices at installation time in the /dev/raw directory. There are 255 raw character devices available for binding, in addition to a master raw device (with minor number 0) that is used to control the bindings on the other raw devices. Note that the permissions for a raw device are different from those on the corresponding block device. You must explicitly set the mode and ownership of the raw device.
If you need to create raw character devices, follow these steps:
The following example creates four raw character devices on systems that are using the latest version of the raw command:
# mkdir /dev/raw
# mknod /dev/rawctl c 162 0
# mknod /dev/raw/raw c 162 0
# chmod 700 /dev/raw
# mknod /dev/raw/raw1 c 162 1
# mknod /dev/raw/raw2 c 162 2
# mknod /dev/raw/raw3 c 162 3
# mknod /dev/raw/raw4 c 162 4
The following example creates four raw character devices on systems that are using an old version of the raw command:
# mknod /dev/raw c 162 0
# chmod 600 /dev/raw
# mknod /dev/raw1 c 162 1
# mknod /dev/raw2 c 162 2
# mknod /dev/raw3 c 162 3
# mknod /dev/raw4 c 162 4
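Because the mode and ownership of a raw device are independent of the underlying block device, you may also need to set them explicitly on each raw device, as noted above. The group name used here is only an example:
# chown root:disk /dev/raw/raw1
# chmod 600 /dev/raw/raw1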
You can use one of the following raw command formats to bind a raw character device to a block device:
# raw /dev/raw/raw1 8 33
# raw /dev/raw/raw1 /dev/sdc1
You can also use the raw command to:
# raw -aq
/dev/raw/raw1   bound to major 8, minor 17
/dev/raw/raw2   bound to major 8, minor 18
Raw character devices must be bound to block devices each time a system boots. To ensure that this occurs, edit the rawio file and specify the quorum partition bindings. If you are using a raw device in a cluster service, you can also use this file to bind the devices at boot time. See Editing the rawio File for more information.
Note that, for raw devices, there is no cache coherency between the raw device and the block device. In addition, requests must be 512-byte aligned both in memory and on disk. For example, the standard dd command cannot be used with raw devices because the memory buffer that the command passes to the write system call is not aligned on a 512-byte boundary. To obtain a version of the dd command that works with raw devices, see www.sgi.com/developers/oss.
If you are developing an application that accesses a raw device, there are restrictions on the type of I/O operations that you can perform. See Raw I/O Programming Example for an example of application source code that adheres to these restrictions.
Use the mkfs command to create an ext2 file system on a partition. Specify the drive letter and the partition number. For example:
# mkfs /dev/sde3

For optimal performance, use a 4 KB block size when creating shared file systems. Note that some of the mkfs file system build utilities default to a 1 KB block size, which can cause long fsck times.
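For example, assuming the same /dev/sde3 partition, you can request a 4 KB block size explicitly by passing the -b option through mkfs to the ext2 build utility:

# mkfs -t ext2 -b 4096 /dev/sde3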
After you install and configure the cluster hardware, you must install the cluster software and initialize the cluster systems. The following sections describe:
Mission Critical Linux provides the Kimberlite cluster software in two formats: a compressed tar file that contains a snapshot of the cluster software sources, and a complete CVS source tree that contains the latest updates to the cluster software. You must first build the Kimberlite software, and then install it on each cluster system. If desired, you can build the software on a non-cluster system, and then copy the software to the cluster systems and install it there.
By default, Kimberlite is installed in the /opt/cluster directory.
Before installing Kimberlite, be sure that you have installed all the required software and kernel patches, as described in Linux Distribution and Kernel Requirements.
If you are updating the cluster software and want to preserve the existing cluster configuration database, you must back up the cluster database and stop the cluster software before you reinstall Kimberlite. See Updating the Cluster Software for more information.
Before installing Kimberlite, be sure that you have sufficient disk space to accommodate the files. The compressed tar file is approximately 1.3 MB in size. The source files and the uncompressed tar file require approximately 9.0 MB of disk space.
The system on which you build the Kimberlite software must adhere to the following requirements:
# ./configure --disable-symbols --disable-shared

Specify the --prefix=filename option to install Tcl in a directory other than the default, /usr/local. Then, invoke the make clean, make, and make install commands.
To build and install Kimberlite by checking out the source files, perform these tasks:
export CVSROOT=:pserver:anonymous@oss.missioncriticallinux.com:/var/cvsroot
# cvs login

When prompted for a password, specify anonymous.
# cvs co kimberlite
# cd kimberlite
# ./configure
# make
# pushd .. && tar czf filename.tar.gz kimberlite && popd

Copy the filename.tar.gz file to the system on which you want to install the software, use the tar -xvzf command to extract the files, change to the kimberlite directory, and run the make install command.
To build and install Kimberlite by using the tar file that is provided by Mission Critical Linux, follow these steps:
# pushd .. && tar czf filename.tar.gz kimberlite-x.y.z && popd

Copy the filename.tar.gz file to the system on which you want to install the software, use the tar -xvzf command to extract the files, change to the kimberlite-x.y.z directory, and run the make install command.
To initialize and start the cluster software, perform the following tasks:
# chgrp -R cluster /opt/cluster
# chmod 774 /opt/cluster/bin/*
# clu_config --init=/dev/raw/raw1
# /etc/rc.d/init.d/cluster start
After you have initialized the cluster, you can add cluster services. See Using the cluadmin Utility, Configuring and Using the Graphical User Interface, and Configuring a Service for more information.
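Once the cluster daemons are running, you can perform a quick check of cluster status with the cluster status command described in Using the cluadmin Utility. The following sketch assumes that cluadmin was installed in the default /opt/cluster/bin directory:

# /opt/cluster/bin/cluadmin
cluadmin> cluster status
cluadmin> exit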
The rawio file is used to map the raw devices for the quorum partitions each time a cluster system boots. As part of the cluster software installation procedure, you must edit the rawio file on each cluster system and specify the raw character devices and block devices for the primary and backup quorum partitions. You also must set the mode for the raw devices so that all users have read permission. This enables the cluster graphical interface to work correctly.
In addition, ensure that the rawio file has execute permission.
If you are using raw devices in a cluster service, you can also use the rawio file to bind the devices at boot time. Edit the file and specify the raw character devices and block devices that you want to bind each time the system boots.
See Configuring Quorum Partitions for more information about setting up the quorum partitions. See Creating Raw Devices for more information on using the raw command to bind raw character devices to block devices.
The rawio file is located in the System V init directory (for example, /etc/rc.d/init.d/rawio). An example of a rawio file is as follows:
#!/bin/bash
#
# rawio         Map block devices to raw character devices.
#
# description: rawio mapping
# chkconfig: 345 98 01
#
# Bind raw devices to block devices.
# Tailor to match the device special files matching your disk configuration.
# Note: Must be world readable for cluster web GUI to be operational.
#
raw /dev/raw/raw1 /dev/sdb2
chmod a+r /dev/raw/raw1
raw /dev/raw/raw2 /dev/sdb3
chmod a+r /dev/raw/raw2
This section includes an example of the member_config cluster configuration utility, which prompts you for information about the cluster members, and then enters the information into the cluster database, a copy of which is located in the cluster.conf file. See Cluster Database Fields for a description of the contents of the file.
In the example, the information entered at the member_config prompts applies to the following configuration:
# /opt/cluster/bin/member_config ------------------------------------ Cluster Member Configuration Utility ------------------------------------ Version: 1.1.2 Built: Thu Oct 26 12:09:30 EDT 2000 This utility sets up the member systems of a 2-node cluster. It prompts you for the following information: o Hostname o Number of heartbeat channels o Information about the type of channels and their names o Raw quorum partitions, both primary and shadow o Power switch type and device name In addition, it performs checks to make sure that the information entered is consistent with the hardware, the Ethernet ports, the raw partitions and the character device files. After all the information is entered, it initializes the partitions and saves the configuration information to the quorum partitions. - Checking that cluster daemons are stopped: done Your cluster configuration should include power switches for optimal data integrity. - Does the cluster configuration include power switches? (yes/no) [yes]: y ---------------------------------------- Setting information for cluster member 0 ---------------------------------------- Enter name of cluster member [storage0]: storage0 Looking for host storage0 (may take a few seconds)... Host storage0 found Cluster member name set to: storage0 Enter number of heartbeat channels (minimum = 1) [1]: 3 You selected 3 channels Information about channel 0: Channel type: net or serial [net]: net Channel type set to: net Enter hostname of cluster member storage0 on heartbeat channel 0 [storage0]: storage0 Looking for host storage0 (may take a few seconds)... Host storage0 found Hostname corresponds to an interface on member 0 Channel name set to: storage0 Information about channel 1: Channel type: net or serial [net]: net Channel type set to: net Enter hostname this interface responds to [storage0]: cstorage0 Looking for host cstorage0 (may take a few seconds)... Host cstorage0 found Hostname corresponds to an interface on member 0 Channel name set to: cstorage0 Information about channel 2: Channel type: net or serial [net]: serial Channel type set to: serial Enter device name [/dev/ttyS1]: /dev/ttyS1 Device /dev/ttyS1 found and no getty running on it Device name set to: /dev/ttyS1 Setting information about Quorum Partitions Enter Primary Quorum Partition [/dev/raw/raw1]: /dev/raw/raw1 Raw device /dev/raw/raw1 found Primary Quorum Partition set to /dev/raw/raw1 Enter Shadow Quorum Partition [/dev/raw/raw2]: /dev/raw/raw2 Raw device /dev/raw/raw2 found Shadow Quorum Partition set to /dev/raw/raw2 Information about power switch connected to member 0 Enter serial port for power switch [/dev/ttyC0]: /dev/ttyC0 Device /dev/ttyC0 found and no getty running on it Serial port for power switch set to /dev/ttyC0 Specify one of the following switches (RPS10/APC) [RPS10]: RPS10 Power switch type set to RPS10 ---------------------------------------- Setting information for cluster member 1 ---------------------------------------- Enter name of cluster member: storage1 Looking for host storage1 (may take a few seconds)... Host storage1 found Cluster member name set to: storage1 You previously selected 3 channels Information about channel 0: Channel type selected as net Enter hostname of cluster member storage1 on heartbeat channel 0: storage1 Looking for host storage1 (may take a few seconds)... 
Host storage1 found Channel name set to: storage1 Information about channel 1: Channel type selected as net Enter hostname this interface responds to [storage1]: cstorage1 Information about channel 2: Channel type selected as serial Enter device name [/dev/ttyS1]: /dev/ttyS1 Device name set to: /dev/ttyS1 Setting information about Quorum Partitions Enter Primary Quorum Partition [/dev/raw/raw1]: /dev/raw/raw1 Primary Quorum Partition set to /dev/raw/raw1 Enter Shadow Quorum Partition [/dev/raw/raw2]: /dev/raw/raw2 Shadow Quorum Partition set to /dev/raw/raw2 Information about power switch connected to member 1 Enter serial port for power switch [/dev/ttyS0]: /dev/ttyS0 Serial port for power switch set to /dev/ttyS0 Specify one of the following switches (RPS10/APC) [RPS10]: RPS10 Power switch type set to RPS10 ------------------------------------ The following choices will be saved: ------------------------------------ --------------------- Member 0 information: --------------------- Name: storage0 Primary quorum partition set to /dev/raw/raw1 Shadow quorum partition set to /dev/raw/raw2 Heartbeat channels: 3 Channel type: net. Name: storage0 Channel type: net. Name: cstorage0 Channel type: serial. Name: /dev/ttyS1 Power Switch type: RPS10. Port: /dev/ttyC0 --------------------- Member 1 information: --------------------- Name: storage1 Primary quorum partition set to /dev/raw/raw1 Shadow quorum partition set to /dev/raw/raw2 Heartbeat channels: 3 Channel type: net. Name: storage1 Channel type: net. Name: cstorage1 Channel type: serial. Name: /dev/ttyS1 Power Switch type: RPS10. Port: /dev/ttyS0 ------------------------------------ Save changes? yes/no [yes]: yes Writing to output configuration file...done. Changes have been saved to /etc/opt/cluster/cluster.conf ---------------------------- Setting up Quorum Partitions ---------------------------- Quorum partitions have not been set up yet. Run diskutil -I to set up the quorum partitions now? yes/no [yes]: yes
Saving configuration information to quorum partition: ------------------------------------------------------------------ Setup on this member is complete. If errors have been reported, correct them. If you have not already set up the other cluster member, before running member_config, invoke the following command on the other cluster member: # /opt/cluster/bin/clu_config --init=/dev/raw/raw1 After running member_config on the other member system, you can start the cluster daemons on each cluster system by invoking the cluster start script located in the System V init directory. For example: # /etc/rc.d/init.d/cluster start
To ensure that you have correctly configured the cluster software, check the configuration by using the diskutil, pswitch, and release_version tools, which are located in the /opt/cluster/bin directory. The following sections describe these tools.
The quorum partitions must refer to the same physical device on both cluster systems. Invoke the diskutil utility with the -t option to test the quorum partitions and verify that they are accessible.
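For example, assuming the default installation directory, run the test on each cluster system as follows:

# /opt/cluster/bin/diskutil -t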
If the command succeeds, run the diskutil -p command on both cluster systems to display a summary of the header data structure for the quorum partitions. If the output is different on the systems, the quorum partitions do not point to the same devices on both systems. Check to make sure that the raw devices exist and are correctly specified in the rawio file. See Configuring the Quorum Partitions for more information.
The following example shows that the quorum partitions refer to the same physical device on two cluster systems:
[root@devel0 /root]# /opt/cluster/bin/diskutil -p
----- Shared State Header ------
Magic# = 0x39119fcd
Version = 1
Updated on Thu Sep 14 05:43:18 2000
Updated by node 0
--------------------------------
[root@devel0 /root]#

[root@devel1 /root]# /opt/cluster/bin/diskutil -p
----- Shared State Header ------
Magic# = 0x39119fcd
Version = 1
Updated on Thu Sep 14 05:43:18 2000
Updated by node 0
--------------------------------
[root@devel1 /root]#
The Magic# and Version fields will be the same for all cluster configurations. The last two lines of output indicate the date that the quorum partitions were initialized with diskutil -I, and the numeric identifier for the cluster system that invoked the initialization command.
If the output of the diskutil utility with the -p option is not the same on both cluster systems, you can do the following:
After you perform these tasks, re-run the diskutil utility with the -p option.
If you are using power switches, after you install the cluster software, but before starting the cluster, use the pswitch command to test the power switches. Invoke the command on each cluster system to ensure that it can remotely power-cycle the other cluster system.
The pswitch command can accurately test a power switch only if the cluster software is not running, because only one program at a time can access the serial port that connects a power switch to a cluster system. When you invoke the pswitch command, it checks the status of the cluster software. If the cluster software is running, the command exits with a message to stop the cluster software.
The format of the pswitch command is as follows:
pswitch [option] command
The option argument can be one or more of the following:
The command argument can be one of the following:
The following example of the pswitch command output shows that the power switch is operational:
# /opt/cluster/bin/pswitch status
switch status:
   switch is:      On
   error?          No
   timedout?       No
   initialized?    Yes
#
If the error or timedout fields are Yes, or if the initialized field is No, the cluster software will not function correctly. If this occurs, you may be able to correct the problem as follows:
Invoke the release_version command to display the version of the cluster software running on the system and the software's build date. Ensure that both cluster systems are running the same version. This information may be required when you request support services. For example:
[root@storage1 init.d]# /opt/cluster/bin/release_version
Version: 1.1.0
Built: Tue Sep 19 16:05:01 EDT 2000
You should edit the /etc/syslog.conf file to enable the cluster to log events to a file that is different from the /var/log/messages default log file. Logging cluster messages to a separate file will help you diagnose problems.
The cluster systems use the syslogd daemon to log cluster-related events to a file, as specified in the /etc/syslog.conf file. You can use the log file to diagnose problems in the cluster. It is recommended that you set up event logging so that the syslogd daemon logs cluster messages only from the system on which it is running. Therefore, you need to examine the log files on both cluster systems to get a comprehensive view of the cluster.
The syslogd daemon logs messages from the following cluster daemons:
The importance of an event determines the severity level of the log entry. Important events should be investigated before they affect cluster availability. The cluster can log messages with the standard syslog severity levels, listed in order of decreasing severity: emerg, alert, crit, err, warning, notice, info, and debug.
The default logging severity levels for the cluster daemons are warning and higher.
Examples of log file entries are as follows:
May 31 20:42:06 clu2 svcmgr[992]: <info> Service Manager starting
May 31 20:42:06 clu2 svcmgr[992]: <info> mount.ksh info: /dev/sda3 is not mounted
May 31 20:49:38 clu2 clulog[1294]: <notice> stop_service.ksh notice: Stopping service dbase_home
May 31 20:49:39 clu2 svcmgr[1287]: <notice> Service Manager received a NODE_UP event for stor5
Jun 01 12:56:51 clu2 quorumd[1640]: <err> updateMyTimestamp: unable to update status block.
Jun 01 12:34:24 clu2 quorumd[1268]: <warning> Initiating cluster stop
Jun 01 12:34:24 clu2 quorumd[1268]: <warning> Completed cluster stop
Jul 27 15:28:40 clu2 quorumd[390]: <err> shoot_partner: successfully shot partner.
[1]             [2]  [3]           [4]   [5]
Each entry in the log file contains the following information:
[1] Timestamp
[2] Cluster system on which the event was logged
[3] Subsystem that generated the event
[4] Severity level of the event
[5] Description of the event
After you configure the cluster software, you should edit the /etc/syslog.conf file to enable the cluster to log events to a file that is different from the default log file, /var/log/messages. Using a cluster-specific log file facilitates cluster monitoring and problem solving. To log cluster events to both the /var/log/cluster and /var/log/messages files, add lines similar to the following to the /etc/syslog.conf file:
#
# Cluster messages coming in on local4 go to /var/log/cluster
#
local4.*                                                /var/log/cluster
To prevent the duplication of messages and log cluster events only to the /var/log/cluster file, add local4.none to the line in the /etc/syslog.conf file that logs general messages, similar to the following:
# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;news.none;authpriv.none;local4.none    /var/log/messages
To apply the previous changes, you can invoke the killall -HUP syslogd command, or restart syslog with a command similar to /etc/rc.d/init.d/syslog restart.
In addition, you can modify the severity level of the events that are logged by the individual cluster daemons. See Modifying Cluster Event Logging for more information.
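For example, the following cluadmin command, which is described in the command table in Using the cluadmin Utility, sets the logging for the quorum daemon to severity level 7:

cluadmin> cluster loglevel quorumd 7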
The cluadmin utility provides a command-line user interface that enables you to monitor and manage the cluster systems and services. For example, you can use the cluadmin utility to perform the following tasks:
You can also use the browser-based graphical user interface (GUI) to monitor cluster systems and services. See Configuring and Using the Graphical User Interface for more information.
The cluster uses an advisory lock to prevent the cluster database from being simultaneously modified by multiple users on either cluster system. You can only modify the database if you hold the advisory lock.
When you invoke the cluadmin utility, the cluster software checks if the lock is already assigned to a user. If the lock is not already assigned, the cluster software assigns you the lock. When you exit from the cluadmin utility, you relinquish the lock.
If another user holds the lock, a warning will be displayed indicating that there is already a lock on the database. The cluster software gives you the option of taking the lock. If you take the lock, the previous holder of the lock can no longer modify the cluster database.
You should take the lock only if necessary, because uncoordinated simultaneous configuration sessions may cause unpredictable cluster behavior. In addition, it is recommended that you make only one change to the cluster database (for example, adding, modifying, or deleting services) at one time.
You can specify the following cluadmin command line options:
When you invoke the cluadmin utility without the -n option, the cluadmin> prompt appears. You can then specify commands and subcommands. The following table describes the commands and subcommands for the cluadmin utility:
cluadmin Command | cluadmin Subcommand | Description
help | None | Displays help for the specified cluadmin command or subcommand. For example: cluadmin> help service add
cluster | status | Displays a snapshot of the current cluster status. See Displaying Cluster and Service Status for information. For example: cluadmin> cluster status
cluster | monitor | Continuously displays snapshots of the cluster status at five-second intervals. Press the Return or Enter key to stop the display. You can specify the -interval option with a numeric argument to display snapshots at the specified time interval (in seconds). In addition, you can specify the -clear option with a yes argument to clear the screen after each snapshot display or with a no argument to not clear the screen. See Displaying Cluster and Service Status for information. For example: cluadmin> cluster monitor -clear yes -interval 10
cluster | loglevel | Sets the logging for the specified cluster daemon to the specified severity level. See Modifying Cluster Event Logging for information. For example: cluadmin> cluster loglevel quorumd 7
cluster | reload | Forces the cluster daemons to re-read the cluster configuration database. See Reloading the Cluster Database for information. For example: cluadmin> cluster reload
cluster | name | Sets the name of the cluster to the specified name. The cluster name is included in the output of the clustat cluster monitoring command and the GUI. See Changing the Cluster Name for information. For example: cluadmin> cluster name dbasecluster
cluster | backup | Saves a copy of the cluster configuration database in the /etc/opt/cluster/cluster.conf.bak file. See Backing Up and Restoring the Cluster Database for information. For example: cluadmin> cluster backup
cluster | restore | Restores the cluster configuration database from the backup copy in the /etc/opt/cluster/cluster.conf.bak file. See Backing Up and Restoring the Cluster Database for information. For example: cluadmin> cluster restore
cluster | saveas | Saves the cluster configuration database to the specified file. See Backing Up and Restoring the Cluster Database for information. For example: cluadmin> cluster saveas cluster_backup.conf
cluster | restorefrom | Restores the cluster configuration database from the specified file. See Backing Up and Restoring the Cluster Database for information. For example: cluadmin> cluster restorefrom cluster_backup.conf
service | add | Adds a cluster service to the cluster database. The command prompts you for information about service resources and properties. See Configuring a Service for information. For example: cluadmin> service add
service | modify | Modifies the resources or properties of the specified service. You can modify any of the information that you specified when the service was created. See Modifying a Service for information. For example: cluadmin> service modify dbservice
service | show state | Displays the current status of all services or the specified service. See Displaying Cluster and Service Status for information. For example: cluadmin> service show state dbservice
service | show config | Displays the current configuration for the specified service. See Displaying a Service Configuration for information. For example: cluadmin> service show config dbservice
service | disable | Stops the specified service. You must enable a service to make it available again. See Disabling a Service for information. For example: cluadmin> service disable dbservice
service | enable | Starts the specified disabled service. See Enabling a Service for information. For example: cluadmin> service enable dbservice
service | delete | Deletes the specified service from the cluster configuration database. See Deleting a Service for information. For example: service delete dbservice
apropos | None | Displays the cluadmin commands that match the specified character string argument or, if no argument is specified, displays all cluadmin commands. For example: cluadmin> apropos service
clear | None | Clears the screen display. For example: cluadmin> clear
exit | None | Exits from cluadmin. For example: cluadmin> exit
While using the cluadmin utility, you can press the Tab key to help identify cluadmin commands. For example, pressing the Tab key at the cluadmin> prompt displays a list of all the commands. Entering a letter at the prompt and then pressing the Tab key displays the commands that begin with the specified letter. Specifying a command and then pressing the Tab key displays a list of all the subcommands that can be specified with that command.
In addition, you can display the history of cluadmin commands by pressing the up arrow and down arrow keys at the prompt. The command history is stored in the .cluadmin_history file in your home directory.
You can use the browser-based graphical user interface (GUI) to monitor the cluster members and services. Before you can use the GUI, you must perform some tasks on each cluster system. The instructions that follow are based on a generic Apache configuration. The actual directories and files that you should use depend on your Linux distribution and Web server software.
To configure and use the GUI, follow these steps:
Options Indexes Includes FollowSymLinks ExecCGI

In addition, be sure that the following line is not commented out by a preceding number sign (#) character (a consolidated httpd.conf sketch appears after these steps):
AddHandler cgi-script .cgi
# chmod 4755 clumon.cgi
# /etc/rc.d/init.d/httpd restart
Invoke the GUI by using the following URL format, where cluster_system specifies the name of the cluster system on which you are invoking the GUI:
http://cluster_system/clumon.cgi
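For reference, the directives mentioned in the preceding steps typically appear together in an httpd.conf excerpt similar to the following sketch; the document root shown is hypothetical and depends on your Linux distribution and Web server layout:

<Directory "/home/httpd/html">
    Options Indexes Includes FollowSymLinks ExecCGI
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>

AddHandler cgi-script .cgi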
The following example shows the cluster and service status that is displayed when you start the GUI.
The following sections describe how to set up and administer cluster services:
To configure a service, you must prepare the cluster systems for the service. For example, you must set up any disk storage or applications used in the services. You can then add information about the service properties and resources to the cluster database, a copy of which is located in the /etc/opt/cluster/cluster.conf file. This information is used as parameters to scripts that start and stop the service.
To configure a service, follow these steps:
cluadmin> service add
For more information about adding a cluster service, see the following:
See Cluster Database Fields for a description of the service fields in the database. In addition, the /opt/cluster/doc/services/examples/cluster.conf_services file contains an example of a service entry from a cluster configuration file. Note that it is only an example.
Before you create a service, you must gather information about the service resources and properties. When you add a service to the cluster database, the cluadmin utility prompts you for this information.
In some cases, you can specify multiple resources for a service. For example, you can specify multiple IP addresses and disk devices.
The service properties and resources that you can specify are described in the following table.
Service Property or Resource | Description
Service name | Each service must have a unique name. A service name can consist of one to 63 characters and must consist of a combination of letters (either uppercase or lowercase), integers, underscores, periods, and dashes. However, a service name must begin with a letter or an underscore. |
Preferred member | Specify the cluster system, if any, on which you want the service to run unless failover has occurred or unless you manually relocate the service. |
Preferred member relocation policy | If you enable this policy, the service will automatically relocate to its preferred member when that system joins the cluster. If you disable this policy, the service will remain running on the non-preferred member. For example, if you enable this policy and the failed preferred member for the service reboots and joins the cluster, the service will automatically restart on the preferred member. |
Script location | If applicable, specify the full path name for the script that will be used to start and stop the service. See Creating Service Scripts for more information. |
IP address | You can assign one or more Internet protocol (IP) addresses to a service. This IP address (sometimes called a "floating" IP address) is different from the IP address associated with the host name Ethernet interface for a cluster system, because it is automatically relocated along with the service resources when failover occurs. If clients use this IP address to access the service, they do not know which cluster system is running the service, and failover is transparent to the clients. Note that cluster members must have network interface cards configured in the IP subnet of each IP address used in a service. |
Disk partition, owner, group, and access mode | Specify each shared disk partition used in a service. In addition, you can specify the owner, group, and access mode (for example, 755) for each mount point or raw device. |
Mount points, file system type, and mount options | If you are using a file system, you must specify the type of file system, a mount point, and any mount options. Mount options that you can specify are the standard file system mount options that are described in the mount.8 manpage. If you are using a raw device, you do not have to specify mount information. The ext2 file system is the recommended file system for a cluster. Although you can use a different file system in a cluster, log-based and other file systems such as reiserfs and ext3 have not been fully tested. In addition, you must specify whether you want to enable forced unmount for a file system. Forced unmount enables the cluster service management infrastructure to unmount a file system even if it is being accessed by an application or user (that is, even if the file system is "busy"). This is accomplished by terminating any applications that are accessing the file system. |
Disable service policy | If you do not want to automatically start a service after it is added to the cluster, you can choose to keep the new service disabled, until an administrator explicitly enables the service. |
For services that include an application, you must create a script that contains specific instructions to start and stop the application (for example, a database application). The script will be called with a start or stop argument and will run at service start time and stop time. The script should be similar to the scripts found in the System V init directory.
The /opt/cluster/doc/services/examples directory contains a template that you can use to create service scripts, in addition to examples of scripts. See Setting Up an Oracle Service, Setting Up a MySQL Service, Setting Up an Apache Service, and Setting Up a DB2 Service for sample scripts.
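As a starting point, a minimal service script sketch follows. The appuser account and application path are placeholders, not part of the Kimberlite distribution; the cluster calls the script with a start or stop argument:

#!/bin/sh
#
# Minimal cluster service script sketch (placeholder commands).
#
case $1 in
'start')
    su - appuser -c '/path/to/application start'
    ;;
'stop')
    su - appuser -c '/path/to/application stop'
    ;;
esac
exit 0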
Before you create a service, set up the shared file systems and raw devices that the service will use. See Configuring Shared Disk Storage for more information.
If you are using raw devices in a cluster service, you can use the rawio file to bind the devices at boot time. Edit the file and specify the raw character devices and block devices that you want to bind each time the system boots. See Editing the rawio File for more information.
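For example, lines similar to the following in the rawio file (the raw device and partition names are hypothetical) would bind an additional raw device for a service each time the system boots:

raw /dev/raw/raw3 /dev/sdb5
chmod a+r /dev/raw/raw3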
Note that software RAID, SCSI adapter-based RAID, and host-based RAID are not supported for shared disk storage.
You should adhere to these service disk storage recommendations:
Before you set up a service, install any application that will be used in a service on each system. After you install the application, verify that the application runs and can access shared disk storage. To prevent data corruption, do not run the application simultaneously on both systems.
If you are using a script to start and stop the service application, you must install and test the script on both cluster systems, and verify that it can be used to start and stop the application. See Creating Service Scripts for information.
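A minimal manual test might look like the following sketch, which assumes a hypothetical shared partition (/dev/sda3), mount point (/mnt/apptest), and service script (/etc/opt/cluster/appscript); run it on one cluster system at a time:

# mount -t ext2 /dev/sda3 /mnt/apptest
# /etc/opt/cluster/appscript start
# /etc/opt/cluster/appscript stop
# umount /mnt/apptest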
A database service can serve highly-available data to a database application. The application can then provide network access to database client systems, such as Web servers. If the service fails over, the application accesses the shared database data through the new cluster system. A network-accessible database service is usually assigned an IP address, which is failed over along with the service to maintain transparent access for clients.
This section provides an example of setting up a cluster service for an Oracle database. Although the variables used in the service scripts depend on the specific Oracle configuration, the example may help you set up a service for your environment. See Tuning Oracle Services for information about improving service performance.
In the example that follows:
The Oracle service example uses five scripts that must be placed in /home/oracle and owned by the Oracle administration account. The oracle script is used to start and stop the Oracle service. Specify this script when you add the service. This script calls the other Oracle example scripts. The startdb and stopdb scripts start and stop the database. The startdbi and stopdbi scripts start and stop a Web application that has been written by using Perl scripts and modules and is used to interact with the Oracle database. Note that there are many ways for an application to interact with an Oracle database.
The following is an example of the oracle script, which is used to start and stop the Oracle service. Note that the script runs the database commands as the oracle user, rather than as root.
#!/bin/sh
#
# Cluster service script to start/stop oracle
#
cd /home/oracle

case $1 in
'start')
    su - oracle -c ./startdbi
    su - oracle -c ./startdb
    ;;
'stop')
    su - oracle -c ./stopdb
    su - oracle -c ./stopdbi
    ;;
esac
The following is an example of the startdb script, which is used to start the Oracle Database Server instance:
#!/bin/sh # # # Script to start the Oracle Database Server instance. # ########################################################################### # # ORACLE_RELEASE # # Specifies the Oracle product release. # ########################################################################### ORACLE_RELEASE=8.1.6 ########################################################################### # # ORACLE_SID # # Specifies the Oracle system identifier or "sid", which is the name of the # Oracle Server instance. # ########################################################################### export ORACLE_SID=TESTDB ########################################################################### # # ORACLE_BASE # # Specifies the directory at the top of the Oracle software product and # administrative file structure. # ########################################################################### export ORACLE_BASE=/u01/app/oracle ########################################################################### # # ORACLE_HOME # # Specifies the directory containing the software for a given release. # The Oracle recommended value is $ORACLE_BASE/product/<release> # ########################################################################### export ORACLE_HOME=/u01/app/oracle/product/${ORACLE_RELEASE} ########################################################################### # # LD_LIBRARY_PATH # # Required when using Oracle products that use shared libraries. # ########################################################################### export LD_LIBRARY_PATH=/u01/app/oracle/product/${ORACLE_RELEASE}/lib ########################################################################### # # PATH # # Verify that the users search path includes $ORCLE_HOME/bin # ########################################################################### export PATH=$PATH:/u01/app/oracle/product/${ORACLE_RELEASE}/bin ########################################################################### # # This does the actual work. # # The oracle server manager is used to start the Oracle Server instance # based on the initSID.ora initialization parameters file specified. # ########################################################################### /u01/app/oracle/product/${ORACLE_RELEASE}/bin/svrmgrl << EOF spool /home/oracle/startdb.log connect internal; startup pfile = /u01/app/oracle/admin/db1/pfile/initTESTDB.ora open; spool off EOF exit 0
The following is an example of the stopdb script, which is used to stop the Oracle Database Server instance:
#!/bin/sh # # # Script to STOP the Oracle Database Server instance. # ########################################################################### # # ORACLE_RELEASE # # Specifies the Oracle product release. # ########################################################################### ORACLE_RELEASE=8.1.6 ########################################################################### # # ORACLE_SID # # Specifies the Oracle system identifier or "sid", which is the name of the # Oracle Server instance. # ########################################################################### export ORACLE_SID=TESTDB ########################################################################### # # ORACLE_BASE # # Specifies the directory at the top of the Oracle software product and # administrative file structure. # ########################################################################### export ORACLE_BASE=/u01/app/oracle ########################################################################### # # ORACLE_HOME # # Specifies the directory containing the software for a given release. # The Oracle recommended value is $ORACLE_BASE/product/<release> # ########################################################################### export ORACLE_HOME=/u01/app/oracle/product/${ORACLE_RELEASE} ########################################################################### # # LD_LIBRARY_PATH # # Required when using Oracle products that use shared libraries. # ########################################################################### export LD_LIBRARY_PATH=/u01/app/oracle/product/${ORACLE_RELEASE}/lib ########################################################################### # # PATH # # Verify that the users search path includes $ORCLE_HOME/bin # ########################################################################### export PATH=$PATH:/u01/app/oracle/product/${ORACLE_RELEASE}/bin ########################################################################### # # This does the actual work. # # The oracle server manager is used to STOP the Oracle Server instance # in a tidy fashion. # ########################################################################### /u01/app/oracle/product/${ORACLE_RELEASE}/bin/svrmgrl << EOF spool /home/oracle/stopdb.log connect internal; shutdown abort; spool off EOF exit 0
The following is an example of the startdbi script, which is used to start a networking DBI proxy daemon:
#!/bin/sh
#
###########################################################################
#
# This script allows our Web Server application (perl scripts) to
# work in a distributed environment. The technology we use is
# based upon the DBD::Oracle/DBI CPAN perl modules.
#
# This script STARTS the networking DBI Proxy daemon.
#
###########################################################################

export ORACLE_RELEASE=8.1.6
export ORACLE_SID=TESTDB
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=/u01/app/oracle/product/${ORACLE_RELEASE}
export LD_LIBRARY_PATH=/u01/app/oracle/product/${ORACLE_RELEASE}/lib
export PATH=$PATH:/u01/app/oracle/product/${ORACLE_RELEASE}/bin

#
# This line does the real work.
#
/usr/bin/dbiproxy --logfile /home/oracle/dbiproxy.log --localport 1100 &

exit 0
The following is an example of the stopdbi script, which is used to stop a networking DBI proxy daemon:
#!/bin/sh
#
#######################################################################
#
# Our Web Server application (perl scripts) works in a distributed
# environment. The technology we use is based upon the DBD::Oracle/DBI
# CPAN perl modules.
#
# This script STOPS the required networking DBI Proxy daemon.
#
########################################################################

PIDS=$(ps ax | grep /usr/bin/dbiproxy | awk '{print $1}')

for pid in $PIDS
do
    kill -9 $pid
done

exit 0
The following example shows how to use cluadmin to add an Oracle service.
cluadmin> service add oracle The user interface will prompt you for information about the service. Not all information is required for all services. Enter a question mark (?) at a prompt to obtain help. Enter a colon (:) and a single-character command at a prompt to do one of the following: c - Cancel and return to the top-level cluadmin command r - Restart to the initial prompt while keeping previous responses p - Proceed with the next prompt Preferred member [None]: ministor0 Relocate when the preferred member joins the cluster (yes/no/?) [no]: yes User script (e.g., /usr/foo/script or None) [None]: /home/oracle/oracle Do you want to add an IP address to the service (yes/no/?): yes IP Address Information IP address: 10.1.16.132 Netmask (e.g. 255.255.255.0 or None) [None]: 255.255.255.0 Broadcast (e.g. X.Y.Z.255 or None) [None]: 10.1.16.255 Do you want to (a)dd, (m)odify, (d)elete or (s)how an IP address, or are you (f)inished adding IP addresses: f Do you want to add a disk device to the service (yes/no/?): yes Disk Device Information Device special file (e.g., /dev/sda1): /dev/sda1 Filesystem type (e.g., ext2, reiserfs, ext3 or None): ext2 Mount point (e.g., /usr/mnt/service1 or None) [None]: /u01 Mount options (e.g., rw, nosuid): [Return] Forced unmount support (yes/no/?) [no]: yes Device owner (e.g., root): root Device group (e.g., root): root Device mode (e.g., 755): 755 Do you want to (a)dd, (m)odify, (d)elete or (s)how devices, or are you (f)inished adding device information: a Device special file (e.g., /dev/sda1): /dev/sda2 Filesystem type (e.g., ext2, reiserfs, ext3 or None): ext2 Mount point (e.g., /usr/mnt/service1 or None) [None]: /u02 Mount options (e.g., rw, nosuid): [Return] Forced unmount support (yes/no/?) [no]: yes Device owner (e.g., root): root Device group (e.g., root): root Device mode (e.g., 755): 755 Do you want to (a)dd, (m)odify, (d)elete or (s)how devices, or are you (f)inished adding devices: f Disable service (yes/no/?) [no]: no name: oracle disabled: no preferred node: ministor0 relocate: yes user script: /home/oracle/oracle IP address 0: 10.1.16.132 netmask 0: 255.255.255.0 broadcast 0: 10.1.16.255 device 0: /dev/sda1 mount point, device 0: /u01 mount fstype, device 0: ext2 force unmount, device 0: yes device 1: /dev/sda2 mount point, device 1: /u02 mount fstype, device 1: ext2 force unmount, device 1: yes Add oracle service as shown? (yes/no/?) y notice: Starting service oracle ... info: Starting IP address 10.1.16.132 info: Sending Gratuitous arp for 10.1.16.132 (00:90:27:EB:56:B8) notice: Running user script '/home/oracle/oracle start' notice, Server starting Added oracle. cluadmin>
A database service can serve highly-available data to a database application. The application can then provide network access to database client systems, such as Web servers. If the service fails over, the application accesses the shared database data through the new cluster system. A network-accessible database service is usually assigned an IP address, which is failed over along with the service to maintain transparent access for clients.
You can set up a MySQL database service in a cluster. Note that MySQL does not provide full transactional semantics; therefore, it may not be suitable for update-intensive applications.
An example of a MySQL database service is as follows:
Note that if the service fails over, client applications must reconnect to the database; they should be prepared to handle the CR_SERVER_GONE_ERROR and CR_SERVER_LOST error codes that the MySQL client library returns when the connection to the server is lost.
A sample script to start and stop the MySQL database is located in /opt/cluster/doc/services/examples/mysql.server, and is shown below:
#!/bin/sh # Copyright Abandoned 1996 TCX DataKonsult AB & Monty Program KB & Detron HB # This file is public domain and comes with NO WARRANTY of any kind # Mysql daemon start/stop script. # Usually this is put in /etc/init.d (at least on machines SYSV R4 # based systems) and linked to /etc/rc3.d/S99mysql. When this is done # the mysql server will be started when the machine is started. # Comments to support chkconfig on RedHat Linux # chkconfig: 2345 90 90 # description: A very fast and reliable SQL database engine. PATH=/sbin:/usr/sbin:/bin:/usr/bin basedir=/var/mysql bindir=/var/mysql/bin datadir=/var/mysql/var pid_file=/var/mysql/var/mysqld.pid mysql_daemon_user=root # Run mysqld as this user. export PATH mode=$1 if test -w / # determine if we should look at the root config file then # or user config file conf=/etc/my.cnf else conf=$HOME/.my.cnf # Using the users config file fi # The following code tries to get the variables safe_mysqld needs from the # config file. This isn't perfect as this ignores groups, but it should # work as the options doesn't conflict with anything else. if test -f "$conf" # Extract those fields we need from config file. then if grep "^datadir" $conf > /dev/null then datadir=`grep "^datadir" $conf | cut -f 2 -d= | tr -d ' '` fi if grep "^user" $conf > /dev/null then mysql_daemon_user=`grep "^user" $conf | cut -f 2 -d= | tr -d ' ' | head -1` fi if grep "^pid-file" $conf > /dev/null then pid_file=`grep "^pid-file" $conf | cut -f 2 -d= | tr -d ' '` else if test -d "$datadir" then pid_file=$datadir/`hostname`.pid fi fi if grep "^basedir" $conf > /dev/null then basedir=`grep "^basedir" $conf | cut -f 2 -d= | tr -d ' '` bindir=$basedir/bin fi if grep "^bindir" $conf > /dev/null then bindir=`grep "^bindir" $conf | cut -f 2 -d=| tr -d ' '` fi fi # Safeguard (relative paths, core dumps..) cd $basedir case "$mode" in 'start') # Start daemon if test -x $bindir/safe_mysqld then # Give extra arguments to mysqld with the my.cnf file. This script may # be overwritten at next upgrade. $bindir/safe_mysqld --user=$mysql_daemon_user --pid-file=$pid_file --datadir=$datadir & else echo "Can't execute $bindir/safe_mysqld" fi ;; 'stop') # Stop daemon. We use a signal here to avoid having to know the # root password. if test -f "$pid_file" then mysqld_pid=`cat $pid_file` echo "Killing mysqld with pid $mysqld_pid" kill $mysqld_pid # mysqld should remove the pid_file when it exits. else echo "No mysqld pid file found. Looked for $pid_file." fi ;; *) # usage echo "usage: $0 start|stop" exit 1 ;; esac
The following example shows how to use cluadmin to add a MySQL service.
cluadmin> service add The user interface will prompt you for information about the service. Not all information is required for all services. Enter a question mark (?) at a prompt to obtain help. Enter a colon (:) and a single-character command at a prompt to do one of the following: c - Cancel and return to the top-level cluadmin command r - Restart to the initial prompt while keeping previous responses p - Proceed with the next prompt Currently defined services: databse1 apache2 dbase_home mp3_failover Service name: mysql_1 Preferred member [None]: devel0 Relocate when the preferred member joins the cluster (yes/no/?) [no]: yes User script (e.g., /usr/foo/script or None) [None]: /etc/rc.d/init.d/mysql.server Do you want to add an IP address to the service (yes/no/?): yes IP Address Information IP address: 10.1.16.12 Netmask (e.g. 255.255.255.0 or None) [None]: [Return] Broadcast (e.g. X.Y.Z.255 or None) [None]: [Return] Do you want to (a)dd, (m)odify, (d)elete or (s)how an IP address, or are you (f)inished adding IP addresses: f Do you want to add a disk device to the service (yes/no/?): yes Disk Device Information Device special file (e.g., /dev/sda1): /dev/sda1 Filesystem type (e.g., ext2, reiserfs, ext3 or None): ext2 Mount point (e.g., /usr/mnt/service1 or None) [None]: /var/mysql Mount options (e.g., rw, nosuid): rw Forced unmount support (yes/no/?) [no]: yes Device owner (e.g., root): root Device group (e.g., root): root Device mode (e.g., 755): 755 Do you want to (a)dd, (m)odify, (d)elete or (s)how devices, or are you (f)inished adding device information: f Disable service (yes/no/?) [no]: yes name: mysql_1 disabled: yes preferred node: devel0 relocate: yes user script: /etc/rc.d/init.d/mysql.server IP address 0: 10.1.16.12 netmask 0: None broadcast 0: None device 0: /dev/sda1 mount point, device 0: /var/mysql mount fstype, device 0: ext2 mount options, device 0: rw force unmount, device 0: yes Add mysql_1 service as shown? (yes/no/?) y Added mysql_1. cluadmin>
This section provides an example of setting up a cluster service that will fail over IBM DB2 Enterprise/Workgroup Edition on a Kimberlite cluster. This example assumes that NIS is not running on the cluster systems.
To install the software and database on the cluster systems, follow these steps:
10.1.16.182 ibmdb2.class.cluster.com ibmdb2
# mke2fs /dev/sda3
# mkdir /db2home
devel0# mount -t ext2 /dev/sda3 /db2home
devel0% mount -t iso9660 /dev/cdrom /mnt/cdrom
devel0% cp /mnt/cdrom/IBM/DB2/db2server.rsp /root
-----------Instance Creation Settings------------
DB2.UID = 2001
DB2.GID = 2001
DB2.HOME_DIRECTORY = /db2home/db2inst1

-----------Fenced User Creation Settings----------
UDF.UID = 2000
UDF.GID = 2000
UDF.HOME_DIRECTORY = /db2home/db2fenc1

-----------Instance Profile Registry Settings------
DB2.DB2COMM = TCPIP

----------Administration Server Creation Settings---
ADMIN.UID = 2002
ADMIN.GID = 2002
ADMIN.HOME_DIRECTORY = /db2home/db2as

---------Administration Server Profile Registry Settings-
ADMIN.DB2COMM = TCPIP

---------Global Profile Registry Settings-------------
DB2SYSTEM = ibmdb2
devel0# cd /mnt/cdrom/IBM/DB2
devel0# ./db2setup -d -r /root/db2server.rsp 1>/dev/null 2>/dev/null &
devel0# su - db2inst1
devel0# db2stop
devel0# exit
devel0# su - db2as
devel0# db2admin stop
devel0# exit
devel0# umount /db2home
devel1# mount -t ext2 /dev/sda3 /db2home
devel1# mount -t iso9660 /dev/cdrom /mnt/cdrom
devel1# rcp devel0:/root/db2server.rsp /root
devel1# cd /mnt/cdrom/IBM/DB2
devel1# ./db2setup -d -r /root/db2server.rsp 1>/dev/null 2>/dev/null &
DB2 Instance Creation                        FAILURE
  Update DBM configuration file for TCP/IP   CANCEL
  Update parameter DB2COMM                   CANCEL
  Auto start DB2 Instance                    CANCEL
  DB2 Sample Database                        CANCEL
  Start DB2 Instance
Administration Server Creation               FAILURE
  Update parameter DB2COMM                   CANCEL
  Start Administration Server                CANCEL
# mount -t ext2 /dev/sda3 /db2home
# su - db2inst1
# db2start
# db2 connect to sample
# db2 select tabname from syscat.tables
# db2 connect reset
# db2stop
# exit
# umount /db2home
# vi /db2home/ibmdb2
# chmod u+x /db2home/ibmdb2

#!/bin/sh
#
# IBM DB2 Database Cluster Start/Stop Script
#
DB2DIR=/usr/IBMdb2/V6.1

case $1 in
"start")
    $DB2DIR/instance/db2istrt
    ;;
"stop")
    $DB2DIR/instance/db2ishut
    ;;
esac
for DB2INST in ${DB2INSTLIST?}; do
    echo "Stopping DB2 Instance "${DB2INST?}"..." >> ${LOGFILE?}
    find_homedir ${DB2INST?}
    INSTHOME="${USERHOME?}"
    su ${DB2INST?} -c " \
        source ${INSTHOME?}/sqllib/db2cshrc 1> /dev/null 2> /dev/null; \
        ${INSTHOME?}/sqllib/db2profile 1> /dev/null 2> /dev/null; \
>>>>>>> db2 force application all; \
        db2stop " 1>> ${LOGFILE?} 2>> ${LOGFILE?}
    if [ $? -ne 0 ]; then
        ERRORFOUND=${TRUE?}
    fi
done
# db:234:once:/etc/rc.db2 > /dev/console 2>&1 # Autostart DB2 Services
Use the cluadmin utility to create the DB2 service. Add the IP address from Step 1, the shared partition created in Step 2, and the start/stop script created in Step 16.
To install the DB2 client on a third system, invoke these commands:
display# mount -t iso9660 /dev/cdrom /mnt/cdrom
display# cd /mnt/cdrom/IBM/DB2
display# ./db2setup -d -r /root/db2client.rsp
To configure a DB2 client, add the service's IP address to the /etc/hosts file on the client system. For example:
10.1.16.182 ibmdb2.lowell.mclinux.com ibmdb2
Then, add the following entry to the /etc/services file on the client system:
db2cdb2inst1 50000/tcp
Invoke the following commands on the client system:
# su - db2inst1
# db2 catalog tcpip node ibmdb2 remote ibmdb2 server db2cdb2inst1
# db2 catalog database sample as db2 at node ibmdb2
# db2 list node directory
# db2 list database directory
To test the database from the DB2 client system, invoke the following commands:
# db2 connect to db2 user db2inst1 using ibmdb2
# db2 select tabname from syscat.tables
# db2 connect reset
This section provides an example of setting up a cluster service that will fail over an Apache Web server. Although the actual variables that you use in the service depend on your specific configuration, the example may help you set up a service for your environment.
To set up an Apache service, you must configure both cluster systems as Apache servers. The cluster software ensures that only one cluster system runs the Apache software at one time.
When you install the Apache software on the cluster systems, do not configure the cluster systems so that Apache automatically starts when the system boots. For example, if you include Apache in the run level directory such as /etc/rc.d/init.d/rc3.d, the Apache software will be started on both cluster systems, which may result in data corruption.
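On distributions that manage init scripts with chkconfig, a command similar to the following removes the automatic start links; the httpd script name is an assumption and may differ (or may not exist if you installed Apache from source), and on other distributions you can instead remove the S* links from the run level directories by hand:

# chkconfig --del httpd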
When you add an Apache service, you must assign it a "floating" IP address. The cluster infrastructure binds this IP address to the network interface on the cluster system that is currently running the Apache service. This IP address ensures that the cluster system running the Apache software is transparent to the HTTP clients accessing the Apache server.
The file systems that contain the Web content must not be automatically mounted on shared disk storage when the cluster systems boot. Instead, the cluster software must mount and unmount the file systems as the Apache service is started and stopped on the cluster systems. This prevents both cluster systems from accessing the same data simultaneously, which may result in data corruption. Therefore, do not include the file systems in the /etc/fstab file.
Setting up an Apache service involves four steps: setting up the shared file system for the service, installing the Apache software on both cluster systems, configuring both cluster systems as Apache servers, and adding the service to the cluster database. These steps are described in the following sections.
To set up the shared file systems for the Apache service, become root and perform the following tasks on one cluster system:
# mkfs /dev/sde3
# mount /dev/sde3 /opt/apache-1.3.12/htdocs
Do not add this mount information to the /etc/fstab file, because only the cluster software can mount and unmount file systems used in a service.
You must install the Apache software on both cluster systems. Note that the basic Apache server configuration must be the same on both cluster systems in order for the service to fail over correctly. The following example shows a basic Apache Web server installation, with no third-party modules or performance tuning. To install Apache with modules, or to tune it for better performance, see the Apache documentation that is located in the Apache installation directory, or on the Apache Web site, www.apache.org.
On both cluster systems, follow these steps to install the Apache software:
# cd /var/tmp
# ftp ftp.digex.net
ftp> cd /pub/packages/network/apache/
ftp> get apache_1.3.12.tar.gz
ftp> quit
#
# tar -zxvf apache_1.3.12.tar.gz
# cd apache_1.3.12
# mkdir /opt/apache-1.3.12
# ./configure --prefix=/opt/apache-1.3.12
# make
# make install
# groupadd nobody
# useradd -G nobody nobody
# chown -R nobody.nobody /opt/apache-1.3.12
To configure the cluster systems as Apache servers, customize the httpd.conf Apache configuration file, and create a script that will start and stop the Apache service. Then, copy the files to the other cluster system. The files must be identical on both cluster systems in order for the Apache service to fail over correctly.
On one system, perform the following tasks:
MaxKeepAliveRequests n

Replace n with the appropriate value, which must be at least 100. For the best performance, specify 0 for unlimited requests.
MaxClients n

Replace n with the appropriate value. By default, you can specify a maximum of 256 clients. If you need more clients, you must recompile Apache with support for more clients. See the Apache documentation for information.
User nobody Group nobody
DocumentRoot "/opt/apache-1.3.12/htdocs"
ScriptAlias /cgi-bin/ "/opt/apache-1.3.12/cgi-bin/"
<Directory "/opt/apache-1.3.12/cgi-bin">
    AllowOverride None
    Options None
    Order allow,deny
    Allow from all
</Directory>
#!/bin/sh
/opt/apache-1.3.12/bin/apachectl $1
Note that the actual name of the Apache start script depends on the Linux distribution. For example, the file may be /etc/rc.d/init.d/httpd.
# chmod 755 /etc/opt/cluster/apwrap
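Then, copy the customized httpd.conf file and the apwrap script to the other cluster system so that the files are identical on both systems. For example, assuming the other cluster system is named devel1 (an illustrative host name) and that the configuration file resides in the conf directory under the installation prefix, commands similar to the following could be used:
# scp /opt/apache-1.3.12/conf/httpd.conf devel1:/opt/apache-1.3.12/conf/
# scp /etc/opt/cluster/apwrap devel1:/etc/opt/cluster/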
Before you add the Apache service to the cluster database, ensure that the Apache directories are not mounted. Then, on one cluster system, add the service. You must specify an IP address, which the cluster infrastructure will bind to the network interface on the cluster system that runs the Apache service.
The following is an example of using the cluadmin utility to add an Apache service.
cluadmin> service add apache

The user interface will prompt you for information about the service.
Not all information is required for all services.

Enter a question mark (?) at a prompt to obtain help.

Enter a colon (:) and a single-character command at a prompt to do one of the following:

c - Cancel and return to the top-level cluadmin command
r - Restart to the initial prompt while keeping previous responses
p - Proceed with the next prompt

Preferred member [None]: devel0
Relocate when the preferred member joins the cluster (yes/no/?) [no]: yes
User script (e.g., /usr/foo/script or None) [None]: /etc/opt/cluster/apwrap

Do you want to add an IP address to the service (yes/no/?): yes

        IP Address Information

IP address: 10.1.16.150
Netmask (e.g. 255.255.255.0 or None) [None]: 255.255.255.0
Broadcast (e.g. X.Y.Z.255 or None) [None]: 10.1.16.255

Do you want to (a)dd, (m)odify, (d)elete or (s)how an IP address,
or are you (f)inished adding IP addresses: f

Do you want to add a disk device to the service (yes/no/?): yes

        Disk Device Information

Device special file (e.g., /dev/sda1): /dev/sde3
Filesystem type (e.g., ext2, reiserfs, ext3 or None): ext2
Mount point (e.g., /usr/mnt/service1 or None) [None]: /opt/apache-1.3.12/htdocs
Mount options (e.g., rw, nosuid): rw,sync
Forced unmount support (yes/no/?) [no]: yes
Device owner (e.g., root): nobody
Device group (e.g., root): nobody
Device mode (e.g., 755): 755

Do you want to (a)dd, (m)odify, (d)elete or (s)how devices,
or are you (f)inished adding device information: f

Disable service (yes/no/?) [no]: no

name: apache
disabled: no
preferred node: devel0
relocate: yes
user script: /etc/opt/cluster/apwrap
IP address 0: 10.1.16.150
  netmask 0: 255.255.255.0
  broadcast 0: 10.1.16.255
device 0: /dev/sde3
  mount point, device 0: /opt/apache-1.3.12/htdocs
  mount fstype, device 0: ext2
  mount options, device 0: rw,sync
  force unmount, device 0: yes
  owner, device 0: nobody
  group, device 0: nobody
  mode, device 0: 755
Add apache service as shown? (yes/no/?) y
Added apache.
cluadmin>
You can display detailed information about the configuration of a service. This information includes the following:
To display cluster service status, see Displaying Cluster and Service Status.
To display service configuration information, invoke the cluadmin utility and specify the service show config command. For example:
cluadmin> service show config
0) diskmount
1) user_mail
2) database1
3) database2
4) web_home
Choose service: 1
name: user_mail
disabled: no
preferred node: stor5
relocate: no
user script: /etc/opt/cluster/usermail
IP address 0: 10.1.16.200
device 0: /dev/sdb1
  mount point, device 0: /var/cluster/mnt/mail
  mount fstype, device 0: ext2
  mount options, device 0: ro
  force unmount, device 0: yes
cluadmin>
If you know the name of the service, you can specify the service show config service_name command.
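For example, the following command displays the configuration for the user_mail service directly, without prompting you to choose a service:
cluadmin> service show config user_mail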
You can disable a running service to stop the service and make it unavailable. To start a disabled service, you must enable it. See Enabling a Service for information.
There are several situations in which you may need to disable a running service:
To disable a running service, invoke the cluadmin utility on the cluster system that is running the service, and specify the service disable service_name command. For example:
cluadmin> service disable user_home
Are you sure? (yes/no/?) y
notice: Stopping service user_home ...
notice: Service user_home is disabled
service user_home disabled
You can also disable a service that is in the error state. To perform this task, run cluadmin on the cluster system that owns the service, and specify the service disable service_name command. See Handling Services in an Error State for more information.
You can enable a disabled service to start the service and make it available. You can also enable a service that is in the error state to start it on the cluster system that owns the service. See Handling Services in an Error State for more information.
To enable a disabled service, invoke the cluadmin utility on the cluster system on which you want the service to run, and specify the service enable service_name command. If you are starting a service that is in the error state, you must enable the service on the cluster system that owns the service. For example:
cluadmin> service enable user_home
Are you sure? (yes/no/?) y
notice: Starting service user_home ...
notice: Service user_home is running
service user_home enabled
You can modify any property that you specified when you created the service. For example, you can change the IP address. You can also add more resources to a service. For example, you can add more file systems. See Gathering Service Information for information.
You must disable a service before you can modify it. If you attempt to modify a running service, you will be prompted to disable it. See Disabling a Service for more information.
Because a service is unavailable while you modify it, be sure to gather all the necessary service information before you disable the service, in order to minimize service down time. In addition, you may want to back up the cluster database before modifying a service. See Backing Up and Restoring the Cluster Database for more information.
To modify a disabled service, invoke the cluadmin utility on any cluster system and specify the service modify service_name command.
cluadmin> service modify web1
You can then modify the service properties and resources, as needed. The cluster will check the service modifications, and allow you to correct any mistakes. If you submit the changes, the cluster verifies the service modification and then starts the service, unless you chose to keep the service disabled. If you do not submit the changes, the service will be started, if possible, using the original configuration.
In addition to providing automatic service failover, a cluster enables you to cleanly stop a service on one cluster system and then start it on the other cluster system. This service relocation functionality enables administrators to perform maintenance on a cluster system, while maintaining application and data availability.
To relocate a service by using the cluadmin utility, follow these steps:
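Although the exact prompts depend on your configuration, relocation is essentially a clean stop on the current owner followed by a start on the other system. As a sketch, using the user_home service from the earlier examples, you could disable the service on the cluster system that currently runs it, and then enable it on the other cluster system:
cluadmin> service disable user_home
cluadmin> service enable user_home
See Disabling a Service and Enabling a Service for details; remember that the service enable command must be run on the cluster system on which you want the service to run.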
You can delete a cluster service. You may want to back up the cluster database before deleting a service. See Backing Up and Restoring the Cluster Database for information.
To delete a service by using the cluadmin utility, follow these steps:
For example:
cluadmin> service disable user_home
Are you sure? (yes/no/?) y
notice: Stopping service user_home ...
notice: Service user_home is disabled
service user_home disabled
cluadmin> service delete user_home
Deleting user_home, are you sure? (yes/no/?): y
user_home deleted.
cluadmin>
A service in the error state is still owned by a cluster system, but the status of its resources cannot be determined (for example, part of the service has stopped, but some service resources are still configured on the owner system). See Displaying Cluster and Service Status for detailed information about service states.
The cluster puts a service into the error state if it cannot guarantee the integrity of the service. An error state can be caused by various problems; for example, a service start failed, and the subsequent attempt to stop the service also failed.
You must carefully handle services in the error state. If service resources are still configured on the owner system, starting the service on the other cluster system may cause significant problems. For example, if a file system remains mounted on the owner system, and you start the service on the other cluster system, the file system will be mounted on both systems, which can cause data corruption. Therefore, you can only enable or disable a service that is in the error state on the system that owns the service. If the enable or disable fails, the service will remain in the error state.
You can also modify a service that is in the error state. You may need to do this in order to correct the problem that caused the error state. After you modify the service, it will be enabled on the owner system, if possible, or it will remain in the error state. The service will not be disabled.
If a service is in the error state, follow these steps to resolve the problem:
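As a rough sketch (the mount point and service name shown are illustrative, taken from the Apache example), you would typically first confirm that no service resources, such as mounted file systems, remain configured on the owner system, then disable the service on that system, and finally enable it again:
# mount | grep /opt/apache-1.3.12/htdocs
# umount /opt/apache-1.3.12/htdocs
# cluadmin
cluadmin> service disable apache
cluadmin> service enable apache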
After you set up a cluster and configure services, you may need to administer the cluster, as described in the following sections:
Monitoring cluster and service status can help you identify and solve problems in the cluster environment. You can display status by using the following tools:
Note that status is always from the point of view of the cluster system on which you are running a tool. To obtain comprehensive cluster status, run a tool on all cluster systems.
Cluster and service status includes the following information:
The following table describes how to analyze the status information shown by the cluadmin utility, the clustat command, and the cluster GUI.
Member Status | Description
UP | The member system is communicating with the other member system and accessing the quorum partitions. |
DOWN | The member system is unable to communicate with the other member system. |
Power Switch Status | Description
OK | The power switch is operating properly. |
Wrn | Could not obtain power switch status. |
Err | A failure or error has occurred. |
Good | The power switch is operating properly. |
Unknown | The other cluster member is DOWN. |
Timeout | The power switch is not responding to power daemon commands, possibly because of a disconnected serial cable. |
Error | A failure or error has occurred. |
None | The cluster configuration does not include power switches. |
Heartbeat Channel Status | Description
OK | The heartbeat channel is operating properly. |
Wrn | Could not obtain channel status. |
Err | A failure or error has occurred. |
ONLINE | The heartbeat channel is operating properly. |
OFFLINE | The other cluster member appears to be UP, but it is not responding to heartbeat requests on this channel. |
UNKNOWN | Could not obtain the status of the other cluster member system over this channel, possibly because the system is DOWN or the cluster daemons are not running. |
Service Status | Description
running | The service resources are configured and available on the cluster system that owns the service. The running state is a persistent state. From this state, a service can enter the stopping state (for example, if the preferred member rejoins the cluster), the disabling state (if a user initiates a request to disable the service), or the error state (if the status of the service resources cannot be determined). |
disabling | The service is in the process of being disabled (for example, a user has initiated a request to disable the service). The disabling state is a transient state. The service remains in the disabling state until the service disable succeeds or fails. From this state, the service can enter the disabled state (if the disable succeeds), the running state (if the disable fails and the service is restarted), or the error state (if the status of the service resources cannot be determined). |
disabled | The service has been disabled, and does not have an assigned owner. The disabled state is a persistent state. From this state, the service can enter the starting state (if a user initiates a request to start the service), or the error state (if a request to start the service failed and the status of the service resources cannot be determined). |
starting | The service is in the process of being started. The starting state is a transient state. The service remains in the starting state until the service start succeeds or fails. From this state, the service can enter the running state (if the service start succeeds), the stopped state (if the service stop fails), or the error state (if the status of the service resources cannot be determined). |
stopping | The service is in the process of being stopped. The stopping state is a transient state. The service remains in the stopping state until the service stop succeeds or fails. From this state, the service can enter the stopped state (if the service stop succeeds), the running state (if the service stop failed and the service can be started), or the error state (if the status of the service resources cannot be determined). |
stopped | The service is not running on any cluster system, does not have an assigned owner, and does not have any resources configured on a cluster system. The stopped state is a persistent state. From this state, the service can enter the disabled state (if a user initiates a request to disable the service), or the starting state (if the preferred member joins the cluster). |
error | The status of the service resources cannot be determined. For example, some resources associated with the service may still be configured on the cluster system that owns the service. The error state is a persistent state. To protect data integrity, you must ensure that the service resources are no longer configured on a cluster system, before trying to start or stop a service in the error state. |
To display a snapshot of the current cluster status, invoke the cluadmin utility on a cluster system and specify the cluster status command. For example:
cluadmin> cluster status
Thu Jul 20 16:23:54 EDT 2000
Cluster Configuration (cluster_1):

Member status:
   Member       Id    System Status   Power Switch
   ----------   ----  -------------   ------------
   stor4        0     Up              Good
   stor5        1     Up              Good

Channel status:
   Name                          Type      Status
   -------------------------     -------   --------
   stor4 <--> stor5              network   ONLINE
   /dev/ttyS1 <--> /dev/ttyS1    serial    OFFLINE

Service status:
   Service           Status      Owner
   ----------------  ----------  ----------------
   diskmount         disabled    None
   database1         running     stor5
   database2         starting    stor4
   user_mail         disabling   None
   web_home          running     stor4
cluadmin>
To monitor the cluster and display a status snapshot at five-second intervals, specify the cluster monitor command. Press the Return or Enter key to stop the display. To modify the time interval, specify the -interval time command option, where time specifies the number of seconds between status snapshots. You can also specify the -clear yes command option to clear the screen after each display. The default is not to clear the screen.
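For example, the following command displays a status snapshot every 10 seconds and clears the screen between snapshots (the interval value shown is illustrative):
cluadmin> cluster monitor -interval 10 -clear yes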
To display only the status of the cluster services, invoke the cluadmin utility and specify the service show state command. If you know the name of the service whose status you want to display, you can specify the service show state service_name command.
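For example, the following command displays only the status of the web_home service from the earlier status example:
cluadmin> service show state web_home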
You can also use the clustat command to display cluster and service status. To monitor the cluster and display status at specific time intervals, invoke clustat with the -i time command option, where time specifies the number of seconds between status snapshots. For example:
# clustat -i 5
Cluster Configuration (cluster_1): Thu Jun 22 23:07:51 EDT 2000

Member status:
   Member               Id          System      Power
                                    Status      Switch
   -------------------  ----------  ----------  --------
   member2              0           Up          Good
   member3              1           Up          Good

Channel status:
   Name                          Type        Status
   ----------------------------  ----------  --------
   /dev/ttyS1 <--> /dev/ttyS1    serial      ONLINE
   member2 <--> member3          network     UNKNOWN
   cmember2 <--> cmember3        network     OFFLINE

Service status:
   Service              Status      Owner
   -------------------- ----------  ------------------
   oracle1              running     member2
   usr1                 disabled    member3
   usr2                 starting    member2
   oracle2              running     member3
In addition, you can use the GUI to display cluster and service status.
See Configuring and Using the Graphical User Interface for more information.
You can start the cluster software on a cluster system by invoking the cluster start command located in the System V init directory. For example:
# /etc/rc.d/init.d/cluster start
You can stop the cluster software on a cluster system by invoking the cluster stop command located in the System V init directory. For example:
# /etc/rc.d/init.d/cluster stop
The previous command may cause the cluster system's services to fail over to the other cluster system.
You may need to modify the cluster configuration. For example, you may need to correct heartbeat channel or quorum partition entries in the cluster database, a copy of which is located in the /etc/opt/cluster/cluster.conf file.
You must use the member_config utility to modify the cluster configuration. Do not modify the cluster.conf file. To modify the cluster configuration, stop the cluster software on one cluster system, as described in Starting and Stopping the Cluster Software.
Then, invoke the member_config utility, and specify the correct information at the prompts. If prompted whether to run diskutil -I to initialize the quorum partitions, specify no. After running the utility, restart the cluster software.
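For example, assuming the utility is installed in the same directory as the other cluster utilities (the actual path depends on your installation), you would run a command similar to the following:
# /opt/cluster/bin/member_config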
It is recommended that you regularly back up the cluster database. In addition, you should back up the database before making any significant changes to the cluster configuration.
To back up the cluster database to the /etc/opt/cluster/cluster.conf.bak file, invoke the cluadmin utility, and specify the cluster backup command. For example:
cluadmin> cluster backup
You can also save the cluster database to a different file by invoking the cluadmin utility and specifying the cluster saveas filename command.
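For example, the following command saves the database to an alternate file (the file name shown is illustrative):
cluadmin> cluster saveas /etc/opt/cluster/cluster.conf.save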
To restore the cluster database, follow these steps:
# /etc/rc.d/init.d/cluster stop
The previous command may cause the cluster system's services to fail over to the other cluster system.
# /etc/rc.d/init.d/cluster start
You can modify the severity level of the events that are logged by the powerd, quorumd, hb, and svcmgr daemons. You may want the daemons on the cluster systems to log messages at the same level.
To change a cluster daemon's logging level on all the cluster systems, invoke the cluadmin utility, and specify the cluster loglevel command, the name of the daemon, and the severity level. You can specify the severity level by using the name or the number that corresponds to the severity level. The values 0 to 7 refer to the following severity levels:
0 - emerg
1 - alert
2 - crit
3 - err
4 - warning
5 - notice
6 - info
7 - debug
Note that the cluster logs messages with the designated severity level and also messages of higher severity. For example, if the severity level for quorum daemon messages is 2 (crit), then the cluster logs messages of crit, alert, and emerg severity levels. Be aware that setting the logging level to a low severity level, such as 7 (debug), will result in large log files over time.
The following example enables the quorumd daemon to log messages of all severity levels:
# cluadmin
cluadmin> cluster loglevel quorumd 7
cluadmin>
You can update the cluster software, but preserve the existing cluster database. Updating the cluster software on a system can take from 10 to 20 minutes, depending on whether you must rebuild the kernel.
To update the cluster software while minimizing service downtime, follow these steps:
cluadmin> cluster backup
# /etc/rc.d/init.d/cluster stop
Invoke the cluadmin utility and use the cluster reload command to force the cluster to re-read the cluster database. For example:
cluadmin> cluster reload
Invoke the cluadmin utility and use the cluster name cluster_name command to specify a name for the cluster. The cluster name is used in the display of the clustat command and the GUI. For example:
cluadmin> cluster name cluster_1 cluster_1
In rare circumstances, you may want to reinitialize the cluster systems, services, and database. Be sure to back up the cluster database before reinitializing the cluster. See Backing Up and Restoring the Cluster Database for information.
To completely reinitialize the cluster, follow these steps:
# /etc/rc.d/init.d/cluster stop
# /opt/cluster/bin/clu_config --init=/dev/raw/raw1
# /etc/rc.d/init.d/cluster start
In some cases, you may want to temporarily remove a member system from the cluster. For example, if a cluster system experiences a hardware failure, you may want to reboot the system, but prevent it from rejoining the cluster, in order to perform maintenance on the system.
If you are running a Red Hat distribution, use the chkconfig utility so that you can boot a cluster system without allowing it to rejoin the cluster. For example:
# chkconfig --del cluster
When you want the system to rejoin the cluster, use the following command:
# chkconfig --add cluster
If you are running a Debian distribution, use the update-rc.d utility so that you can boot a cluster system without allowing it to rejoin the cluster. For example:
# update-rc.d -f cluster remove
When you want the system to rejoin the cluster, use the following command:
# update-rc.d cluster defaults
You can then reboot the system or run the cluster start command located in the System V init directory. For example:
# /etc/rc.d/init.d/cluster start
To ensure that you can identify any problems in a cluster, you must enable event logging. In addition, if you encounter problems in a cluster, be sure to set the severity level to debug for the cluster daemons. This will log descriptive messages that may help you solve problems.
If you have problems while running the cluadmin utility (for example, you cannot enable a service), set the severity level for the svcmgr daemon to debug. This will cause debugging messages to be displayed while you are running the cluadmin utility. See Modifying Cluster Event Logging for more information.
Use the following table to diagnose and correct problems in a cluster.
Problem | Symptom | Solution
SCSI bus not terminated | SCSI errors appear in the log file | Each SCSI bus must be terminated only at the beginning and end of the bus. Depending on the bus configuration, you may need to enable or disable termination in host bus adapters, RAID controllers, and storage enclosures. If you want to support hot plugging, you must use external termination to terminate a SCSI bus. In addition, be sure that no devices are connected to a SCSI bus using a stub that is longer than 0.1 meter. See Configuring Shared Disk Storage and SCSI Bus Termination for information about terminating different types of SCSI buses. |
SCSI bus length greater than maximum limit | SCSI errors appear in the log file | Each type of SCSI bus must adhere to restrictions on length, as described in SCSI Bus Length. In addition, ensure that no single-ended devices are connected to the LVD SCSI bus, because this will cause the entire bus to revert to a single-ended bus, which has more severe length restrictions than a differential bus. |
SCSI identification numbers not unique | SCSI errors appear in the log file | Each device on a SCSI bus must have a unique identification number. If you have a multi-initiator SCSI bus, you must modify the default SCSI identification number (7) for one of the host bus adapters connected to the bus, and ensure that all disk devices have unique identification numbers. See SCSI Identification Numbers for more information. |
SCSI commands timing out before completion | SCSI errors appear in the log file | The prioritized arbitration scheme on a SCSI bus can result in low-priority devices being locked out for some period of time. This may cause commands to time out, if a low-priority storage device, such as a disk, is unable to win arbitration and complete a command that a host has queued to it. For some workloads, you may be able to avoid this problem by assigning low-priority SCSI identification numbers to the host bus adapters. See SCSI Identification Numbers for more information. |
Mounted quorum partition | Messages indicating checksum errors on a quorum partition appear in the log file | Be sure that the quorum partition raw devices are used only for cluster state information. They cannot be used for cluster services or for non-cluster purposes, and cannot contain a file system. See Configuring the Quorum Partitions for more information. These messages could also indicate that the underlying block device special file for the quorum partition has been erroneously used for non-cluster purposes. |
Service file system is unclean | A disabled service cannot be enabled | Manually run a checking program such as fsck. Then, enable the service. Note that the cluster infrastructure does not automatically repair file system inconsistencies (for example, by using the fsck -y command). This ensures that a cluster administrator intervenes in the correction process and is aware of the corruption and the affected files. |
Quorum partitions not set up correctly | Messages indicating that a quorum partition cannot be accessed appear in the log file | Run the diskutil -t command to check that the quorum partitions are accessible. If the command succeeds, run the diskutil -p command on both cluster systems. If the output is different on the systems, the quorum partitions do not point to the same devices on both systems. Check to make sure that the raw devices exist and are correctly specified in the rawio file. See Configuring the Quorum Partitions for more information. These messages could also indicate that you did not specify yes when prompted by the member_config utility to initialize the quorum partitions. To correct this problem, run the utility again. |
Cluster service operation fails | Messages indicating the operation failed appear on the console or in the log file | There are many different reasons for the failure of a service operation (for example, a service stop or start). To help you identify the cause of the problem, set the severity level for the cluster daemons to debug in order to log descriptive messages. Then, retry the operation and examine the log file. See Modifying Cluster Event Logging for more information. |
Cluster service stop fails because a file system cannot be unmounted | Messages indicating the operation failed appear on the console or in the log file | Use the fuser and ps commands to identify the processes that are accessing the file system. Use the kill command to stop the processes. You can also use the lsof -t file_system command to display the identification numbers for the processes that are accessing the specified file system. You can pipe the output to the kill command. See the example following this table. To avoid this problem, be sure that only cluster-related processes can access shared storage data. In addition, you may want to modify the service and enable forced unmount for the file system. This enables the cluster service to unmount a file system even if it is being accessed by an application or user. |
Incorrect entry in the cluster database | Cluster operation is impaired | On each cluster system, examine the /etc/opt/cluster/cluster.conf file. If an entry in the file is incorrect, modify the cluster configuration by running the member_config utility, as specified in Modifying the Cluster Configuration, and correct the problem. |
Incorrect Ethernet heartbeat entry in the cluster database or /etc/hosts file | Cluster status indicates that an Ethernet heartbeat channel is OFFLINE even though the interface is valid | On each cluster system, examine the /etc/opt/cluster/cluster.conf file and verify that the name of the network interface for chan0 is the name returned by the hostname command on the cluster system. If the entry in the file is incorrect, modify the cluster configuration by running the member_config utility, as specified in Modifying the Cluster Configuration, and correct the problem. If the entries in the cluster.conf file are correct, examine the /etc/hosts file and ensure that it includes entries for all the network interfaces. Also, make sure that the /etc/hosts file uses the correct format. See Editing the /etc/hosts File for more information. In addition, be sure that you can use the ping command to send a packet to all the network interfaces used in the cluster. |
Loose cable connection to power switch | Power switch status is Timeout | Check the serial cable connection. |
Power switch serial port incorrectly specified in the cluster database | Power switch status indicates a problem | On each cluster system, examine the /etc/opt/cluster/cluster.conf file and verify that the serial port to which the power switch is connected matches the serial port specified in the file. If the entry in the file is incorrect, modify the cluster configuration by running the member_config utility, as specified in Modifying the Cluster Configuration, and correct the problem. |
Heartbeat channel problem | Heartbeat channel status is OFFLINE | On each cluster system, examine the /etc/opt/cluster/cluster.conf file and verify that the device special file for each serial heartbeat channel matches the actual serial port to which the channel is connected. If an entry in the file is incorrect, modify the cluster configuration by running the member_config utility, as specified in Modifying the Cluster Configuration, and correct the problem. Verify that the correct type of cable is used for each heartbeat channel connection. Verify that you can "ping" each cluster system over the network interface for each Ethernet heartbeat channel. |
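As noted in the table, when a service stop fails because a file system cannot be unmounted, you can identify and stop the processes that are using the file system and then unmount it. The following is only a sketch that uses the user_mail mount point from the earlier example; substitute your own mount point:
# fuser -v /var/cluster/mnt/mail
# lsof -t /var/cluster/mnt/mail | xargs kill
# umount /var/cluster/mnt/mail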
The information in the following sections can help you set up a cluster hardware configuration. In some cases, the information is vendor specific.
To help you set up a terminal server, this document provides information about setting up a Cyclades terminal server.
The Cyclades terminal server consists of two primary parts:
To set up a Cyclades terminal server, follow these steps:
The first step for setting up a Cyclades terminal server is to specify an Internet protocol (IP) address for the PR3000 router. Follow these steps:
Cyclades-PR3000 (PR3000) Main Menu
1. Config  2. Applications  3. Logout  4. Debug  5. Info  5. Admin
Select option ==> 1

Cyclades-PR3000 (PR3000) Config Menu
1. Interface  2. Static Routes  3. System  4. Security  5. Multilink  6. IP  7. Transparent Bridge  8. Rules List  9. Controller
(L for list) Select option ==> 1

Cyclades-PR3000 (PR3000) Interface Menu
1. Ethernet  2. Slot 1 (Zbus-A)
(L for list) Select option ==> 1

Cyclades-PR3000 (PR3000) Ethernet Interface Menu
1. Encapsulation  2. Network Protocol  3. Routing Protocol  4. Traffic Control
(L for list) Select option ==> 2

(A)ctive or (I)nactive [A]:
Interface (U)nnumbered or (N)umbered [N]:
Primary IP address: 111.222.3.26
Subnet Mask [255.255.255.0]:
Secondary IP address [0.0.0.0]:
IP MTU [1500]:
NAT - Address Scope ( (L)ocal, (G)lobal, or Global (A)ssigned) [G]:
ICMP Port ( (A)ctive or (I)nactive) [I]:
Incoming Rule List Name (? for help) [None]:
Outgoing Rule List Name (? for help) [None]:
Proxy ARP ( (A)ctive or (I)nactive) [I]:
IP Bridge ( (A)ctive or (I)nactive) [I]:

ESC
(D)iscard, save to (F)lash or save to (R)un configuration: F
Changes were saved in Flash configuration !
After you specify an IP address for the PR3000 router, you must set up the network and terminal port parameters.
At the console login prompt, [PR3000], log in to the super account, using the password provided with the Cyclades manual. The console displays a series of menus. Enter the appropriate information. For example:
Cyclades-PR3000 (PR3000) Main Menu 1. Config 2. Applications 3. Logout 4. Debug 5. Info 5. Admin Select option ==> 1 Cyclades-PR3000 (PR3000) Config Menu 1. Interface 2. Static Routes 3. System 4. Security 5. Multilink 6. IP 7. Transparent Bridge 8. Rules List 9. Controller (L for list) Select option ==> 1 Cyclades-PR3000 (PR3000) Interface Menu 1. Ethernet 2. Slot 1 (Zbus-A) (L for list) Select option ==> 1 Cyclades-PR3000 (PR3000) Ethernet Interface Menu 1. Encapsulation 2. Network Protocol 3. Routing Protocol 4. Traffic Control (L for list) Select option ==> 1 Ethernet (A)ctive or (I)nactive [A]: MAC address [00:60:2G:00:08:3B]: Cyclades-PR3000 (PR3000) Ethernet Interface Menu 1. Encapsulation 2. Network Protocol 3. Routing Protocol 4. Traffic Control (L for list) Select option ==> 2 Ethernet (A)ctive or (I)nactive [A]: Interface (U)nnumbered or (N)umbered [N]: Primary IP address [111.222.3.26]: Subnet Mask [255.255.255.0]: Secondary IP address [0.0.0.0]: IP MTU [1500]: NAT - Address Scope ( (L)ocal, (G)lobal, or Global (A)ssigned) [G]: ICMP Port ( (A)ctive or (I)nactive) [I]: Incoming Rule List Name (? for help) [None]: Outgoing Rule List Name (? for help) [None]: Proxy ARP ( (A)ctive or (I)nactive) [I]: IP Bridge ( (A)ctive or (I)nactive) [I]: Cyclades-PR3000 (PR3000) Ethernet Interface Menu 1. Encapsulation 2. Network Protocol 3. Routing Protocol 4. Traffic Control (L for list) Select option ==> Cyclades-PR3000 (PR3000) Interface Menu 1. Ethernet 2. Slot 1 (Zbus-A) (L for list) Select option ==> 2 Cyclades-PR3000 (PR3000) Slot 1 (Zbus-A) Range Menu 1. ZBUS Card 2. One Port 3. Range 4. All Ports (L for list) Select option ==> 4 Cyclades-PR3000 (PR3000) Slot 1 (Zbus-A) Interface Menu 1. Encapsulation 2. Network Protocol 3. Routing Protocol 4. Physical 5. Traffic Control 6. Authentication 7. Wizards (L for list) Select option ==> 1 Cyclades-PR3000 (PR3000) Slot 1 (Zbus-A) Encapsulation Menu 1. PPP 2. PPPCHAR 3. CHAR 4. Slip 5. SlipCHAR 6. Inactive Select Option ==> 3 Device Type ( (T)erminal, (P)rinter or (S)ocket ) [S]: TCP KeepAlive time in minutes (0 - no KeepAlive, 1 to 120) [0]: (W)ait for or (S)tart a connection [W]: Filter NULL char after CR char (Y/N) [N]: Idle timeout in minutes (0 - no timeout, 1 to 120) [0]: DTR ON only if socket connection established ( (Y)es or (N)o ) [Y]: Device attached to this port will send ECHO (Y/N) [Y]: Cyclades-PR3000 (PR3000) Slot 1 (Zbus-A) Encapsulation Menu 1. PPP 2. PPPCHAR 3. CHAR 4. Slip 5. SlipCHAR 6. Inactive Select Option ==> Cyclades-PR3000 (PR3000) Slot 1 (Zbus-A) Interface Menu 1. Encapsulation 2. Network Protocol 3. Routing Protocol 4. Physical 5. Traffic Control 6. Authentication 7. Wizards (L for list) Select option ==> 2 Interface IP address for a Remote Telnet [0.0.0.0]: Cyclades-PR3000 (PR3000) Slot 1 (Zbus-A) Interface Menu 1. Encapsulation 2. Network Protocol 3. Routing Protocol 4. Physical 5. Traffic Control 6. Authentication 7. Wizards (L for list) Select option ==> 4 Speed (? for help) [115.2k]: 9.6k Parity ( (O)DD, (E)VEN or (N)ONE ) [N]: Character size ( 5 to 8 ) [8]: Stop bits (1 or 2 ) [1]: Flow control ( (S)oftware, (H)ardware or (N)one ) [N]: Modem connection (Y/N) [N]: RTS mode ( (N)ormal Flow Control or (L)egacy Half Duplex ) [N]: Input Signal DCD on ( Y/N ) [N]: n Input Signal DSR on ( Y/N ) [N]: Input Signal CTS on ( Y/N ) [N]: Cyclades-PR3000 (PR3000) Slot 1 (Zbus-A) Interface Menu 1. Encapsulation 2. Network Protocol 3. Routing Protocol 4. Physical 5. Traffic Control 6. Authentication 7. 
Wizards (L for list) Select option ==> 6 Authentication Type ( (N)one, (L)ocal or (S)erver ) [N]: ESC (D)iscard, save to (F)lash or save to (R)un configuration: F Changes were saved in Flash configuration
After you set up the network and terminal port parameters, you can configure Linux to send console messages to the console serial port. Follow these steps on each cluster system:
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_SERIAL=y
CONFIG_SERIAL_CONSOLE=y
When specifying kernel options, under Character Devices, select Support for console on serial port.
serial=0,9600n8
To the stanza entries for each bootable kernel, add a line similar to the following to enable kernel messages to go to both the specified console serial port (for example, ttyS0) and to the graphics terminal:
append="console=ttyS0 console=tty1"The following is an example of an /etc/lilo.conf file:
boot=/dev/hda
map=/boot/map
install=/boot/boot.b
prompt
timeout=50
default=scons
serial=0,9600n8
image=/boot/vmlinuz-2.2.12-20
        label=linux
        initrd=/boot/initrd-2.2.12-20.img
        read-only
        root=/dev/hda1
        append="mem=127M"
image=/boot/vmlinuz-2.2.12-20
        label=scons
        initrd=/boot/initrd-2.2.12-20.img
        read-only
        root=/dev/hda1
        append="mem=127M console=ttyS0 console=tty1"
S0:2345:respawn:/sbin/getty ttyS0 DT9600 vt100
ttyS0
# ls -l /dev/console
crw--w--w-   1 joe      root       5,   1 Feb 11 10:05 /dev/console
# mv /dev/console /dev/console.old
# ls -l /dev/ttyS0
crw-------   1 joe      tty        4,  64 Feb 14 13:14 /dev/ttyS0
# mknod /dev/console c 4 64
To connect to the console port, use the following telnet command format:
telnet hostname_or_IP_address port_number
Specify either the terminal server's host name or its IP address, and the port number associated with the serial line to which the cluster system's console is connected. Port numbers range from 1 to 16, and are specified by adding the port number to 31000. For example, you can specify port numbers ranging from 31001 to 31016.
The following example connects to port 1 on the cluconsole terminal server:
# telnet cluconsole 31001
The following example connects to port 16 on the cluconsole terminal server:
# telnet cluconsole 31016
The following example connects to port 2 on the terminal server with the IP address 111.222.3.26:
# telnet 111.222.3.26 31002
After you log in, anything you type is repeated (echoed back a second time). For example:
[root@localhost /root]# date
date
Sat Feb 12 00:01:35 EST 2000
[root@localhost /root]#
To correct this behavior, you must change the operating mode that telnet has negotiated with the terminal server. The following example uses the ^] escape character:
[root@localhost /root]# ^]
telnet> mode character
You can also issue the mode character command by creating a .telnetrc file in your home directory and including the following lines:
cluconsole
        mode character
If you are using an RPS-10 Series power switch in your cluster, you must:
Switch | Function | Up Position | Down Position |
1 | Data rate | X | |
2 | Toggle delay | X | |
3 | Power up default | X | |
4 | Unused | X |
The following figure shows an example of an RPS-10 Series power switch configuration.
See the RPS-10 documentation supplied by the vendor for additional installation information. Note that the information provided in this document supersedes the vendor information.
SCSI buses must adhere to a number of configuration requirements in order to operate correctly. Failure to adhere to these requirements will adversely affect cluster operation and application and data availability.
You must adhere to the following SCSI bus configuration requirements:
To set SCSI identification numbers, disable host bus adapter termination, and disable bus resets, use the system's configuration utility. When the system boots, a message is displayed describing how to start the utility. For example, you may be instructed to press Ctrl-A, and follow the prompts to perform a particular task. To set storage enclosure and RAID controller termination, see the vendor documentation. See SCSI Bus Termination and SCSI Identification Numbers for more information.
See www.scsita.org and the following sections for detailed information about SCSI bus requirements.
A SCSI bus is an electrical path between two terminators. A device (host bus adapter, RAID controller, or disk) attaches to a SCSI bus by a short stub, which is an unterminated bus segment that usually must be less than 0.1 meter in length.
Buses must have only two terminators located at the ends of the bus. Additional terminators, terminators that are not at the ends of the bus, or long stubs will cause the bus to operate incorrectly. Termination for a SCSI bus can be provided by the devices connected to the bus or by external terminators, if the internal (onboard) device termination can be disabled.
Terminators are powered by a SCSI power distribution wire (or signal), TERMPWR, so that the terminator can operate as long as there is one powering device on the bus. In a cluster, TERMPWR must be provided by the host bus adapters, instead of the disks in the enclosure. You can usually disable TERMPWR in a disk by setting a jumper on the drive. See the disk drive documentation for information.
In addition, there are two types of SCSI terminators. Active terminators provide a voltage regulator for TERMPWR, while passive terminators provide a resistor network between TERMPWR and ground. Passive terminators are also susceptible to fluctuations in TERMPWR. Therefore, it is recommended that you use active terminators in a cluster.
For maintenance purposes, it is desirable for a storage configuration to support hot plugging (that is, the ability to disconnect a host bus adapter from a SCSI bus, while maintaining bus termination and operation). However, if you have a single-initiator SCSI bus, hot plugging is not necessary because the private bus does not need to remain operational when you remove a host. See Setting Up a Multi-Initiator SCSI Bus Configuration for examples of hot plugging configurations.
If you have a multi-initiator SCSI bus, you must adhere to the following requirements for hot plugging:
When disconnecting a device from a single-initiator SCSI bus or from a multi-initiator SCSI bus that supports hot plugging, follow these guidelines:
To enable or disable an adapter's internal termination, use the system BIOS utility. When the system boots, a message is displayed describing how to start the utility. For example, you may be instructed to press Ctrl-A. Follow the prompts for setting the termination. At this point, you can also set the SCSI identification number, as needed, and disable SCSI bus resets. See SCSI Identification Numbers for more information.
To set storage enclosure and RAID controller termination, see the vendor documentation.
A SCSI bus must adhere to length restrictions for the bus type. Buses that do not adhere to these restrictions will not operate properly. The length of a SCSI bus is calculated from one terminated end to the other, and must include any cabling that exists inside the system or storage enclosures.
A cluster supports LVD (low voltage differential) buses. The maximum length of a single-initiator LVD bus is 25 meters. The maximum length of a multi-initiator LVD bus is 12 meters. According to the SCSI standard, a single-initiator LVD bus is a bus that is connected to only two devices, each within 0.1 meter from a terminator. All other buses are defined as multi-initiator buses.
Do not connect any single-ended devices to a LVD bus, or the bus will convert to a single-ended bus, which has a much shorter maximum length than a differential bus.
Each device on a SCSI bus must have a unique SCSI identification number. Devices include host bus adapters, RAID controllers, and disks.
The number of devices on a SCSI bus depends on the data path for the bus. A cluster supports wide SCSI buses, which have a 16-bit data path and support a maximum of 16 devices. Therefore, there are sixteen possible SCSI identification numbers that you can assign to the devices on a bus.
In addition, SCSI identification numbers are prioritized. Use the following priority order to assign SCSI identification numbers:
7 - 6 - 5 - 4 - 3 - 2 - 1 - 0 - 15 - 14 - 13 - 12 - 11 - 10 - 9 - 8
The previous order specifies that 7 is the highest priority, and 8 is the lowest priority. The default SCSI identification number for a host bus adapter is 7, because adapters are usually assigned the highest priority. On a multi-initiator bus, be sure to change the SCSI identification number of one of the host bus adapters to avoid duplicate values.
A disk in a JBOD enclosure is assigned a SCSI identification number either manually (by setting jumpers on the disk) or automatically (based on the enclosure slot number). You can assign identification numbers for logical units in a RAID subsystem by using the RAID management interface.
To modify an adapter's SCSI identification number, use the system BIOS utility. When the system boots, a message is displayed describing how to start the utility. For example, you may be instructed to press Ctrl-A, and follow the prompts for setting the SCSI identification number. At this point, you can also enable or disable the adapter's internal termination, as needed, and disable SCSI bus resets. See SCSI Bus Termination for more information.
The prioritized arbitration scheme on a SCSI bus can result in low-priority devices being locked out for some period of time. This may cause commands to time out, if a low-priority storage device, such as a disk, is unable to win arbitration and complete a command that a host has queued to it. For some workloads, you may be able to avoid this problem by assigning low-priority SCSI identification numbers to the host bus adapters.
Not all host bus adapters can be used with all cluster shared storage configurations. For example, some host bus adapters do not support hot plugging or cannot be used in a multi-initiator SCSI bus. You must use host bus adapters with the features and characteristics that your shared storage configuration requires. See Configuring Shared Disk Storage for information about supported storage configurations.
The following table describes some recommended SCSI and Fibre Channel host bus adapters. It includes information about adapter termination and how to use the adapters in single and multi-initiator SCSI buses and Fibre Channel interconnects.
The specific product devices listed in the table have been tested by Mission Critical Linux. However, other devices may also work well in a cluster. If you want to use a host bus adapter other than a recommended one, the information in the table can help you determine if the device has the features and characteristics that will enable it to work in a cluster.
Host Bus Adapter | Features | Single-Initiator Configuration | Multi-Initiator Configuration
Adaptec 2940U2W (minimum driver: AIC7xxx V5.1.28) |
Ultra2, wide, LVD HD68 external connector One channel, with two bus segments Set the onboard termination by using the BIOS utility. Onboard termination is disabled when the power is off. |
Set the onboard termination to automatic (the default). You can use the internal SCSI connector for private (non-cluster) storage. |
This configuration is not supported, because the adapter and its Linux driver do not reliably recover from SCSI bus resets that can be generated by the host bus adapter on the other cluster system. To use the adapter in a multi-initiator bus, the onboard termination must be disabled. This ensures proper termination when the power is off. For hot plugging support, disable the onboard termination for the Ultra2 segment, and connect an external terminator, such as a pass-through terminator, to the adapter. You cannot connect a cable to the internal Ultra2 connector. |
Qlogic QLA1080 (minimum driver: QLA1x160 V3.12, obtained from www.qlogic.com/ bbs-html /drivers.html) |
Ultra2, wide, LVD VHDCI external connector One channel Set the onboard termination by using the BIOS utility. Onboard termination is disabled when the power is off, unless jumpers are used to enforce termination. |
Set the onboard termination to automatic (the default). You can use the internal SCSI connector for private (non-cluster) storage. |
This configuration is not supported, because the adapter and its Linux driver do not reliably recover from SCSI bus resets that can be generated by the host bus adapter on the other cluster system. For hot plugging support, disable the onboard termination, and use an external terminator, such as a VHDCI pass-through terminator, a VHDCI y-cable or a VHDCI trilink connector. You cannot connect a cable to the internal Ultra2 connector. For no hot plugging support, disable the onboard termination, or set it to automatic. Connect a terminator to the end of the internal cable connected to the internal Ultra2 connector. For an alternate configuration without hot plugging support, enable the onboard termination with jumpers, so the termination is enforced even when the power is off. You cannot connect a cable to the internal Ultra2 connector. |
Tekram DC-390U2W (minimum driver SYM53C8xx V1.3G) |
Ultra2, wide, LVD HD68 external connector One channel, two segments Onboard termination for a bus segment is disabled if internal and external cables are connected to the segment. Onboard termination is enabled if there is only one cable connected to the segment. Termination is disabled when the power is off. |
You can use the internal SCSI connector for private (non-cluster) storage. |
Testing has shown that the adapter and its Linux driver reliably recover from SCSI bus resets that can be generated by the host bus adapter on the other cluster system. The adapter cannot be configured to use external termination, so it does not support hot plugging. Disable the onboard termination by connecting an internal cable to the internal Ultra2 connector, and then attaching a terminator to the end of the cable. This ensures proper termination when the power is off. |
Adaptec 29160 (minimum driver: AIC7xxx V5.1.28) |
Ultra160 HD68 external connector One channel, with two bus segments Set the onboard termination by using the BIOS utility. Termination is disabled when the power is off, unless jumpers are used to enforce termination. |
Set the onboard termination to automatic (the default). You can use the internal SCSI connector for private (non-cluster) storage. |
This configuration is not supported, because the adapter and its Linux driver do not reliably recover from SCSI bus resets that can be generated by the host bus adapter on the other cluster system. You cannot connect the adapter to an external terminator, such as a pass-through terminator, because the adapter does not function correctly with external termination. Therefore, the adapter does not support hot plugging. Use jumpers to enable the onboard termination for the Ultra160 segment. You cannot connect a cable to the internal Ultra160 connector. For an alternate configuration, disable the onboard termination for the Ultra160 segment, or set it to automatic. Then, attach a terminator to the end of an internal cable that is connected to the internal Ultra160 connector. |
Adaptec 29160LP (minimum driver: AIC7xxx V5.1.28) |
Ultra160 VHDCI external connector One channel Set the onboard termination by using the BIOS utility. Termination is disabled when the power is off, unless jumpers are used to enforce termination. |
Set the onboard termination to automatic (the default). You can use the internal SCSI connector for private (non-cluster) storage. |
This configuration is not supported, because the adapter and its Linux driver do not reliably recover from SCSI bus resets that can be generated by the host bus adapter on the other cluster system. You cannot connect the adapter to an external terminator, such as a pass-through terminator, because the adapter does not function correctly with external termination. Therefore, the adapter does not support hot plugging. Use jumpers to enable the onboard termination. You cannot connect a cable to the internal Ultra160 connector. For an alternate configuration, disable the onboard termination, or set it to automatic. Then, attach a terminator to the end of an internal cable that is connected to the internal Ultra160 connector. |
Adaptec 39160 (minimum driver: AIC7xxx V5.1.28) Qlogic QLA12160 (minimum driver: QLA1x160 V3.12, obtained from www.qlogic.com/ bbs-html /drivers.html) |
Ultra160 Two VHDCI external connectors Two channels Set the onboard termination by using the BIOS utility. Termination is disabled when the power is off, unless jumpers are used to enforce termination. |
Set onboard termination to automatic (the default). You can use the internal SCSI connectors for private (non-cluster) storage. |
This configuration is not supported, because the adapter and its Linux driver do not reliably recover from SCSI bus resets that can be generated by the host bus adapter on the other cluster system. You cannot connect the adapter to an external terminator, such as a pass-through terminator, because the adapter does not function correctly with external termination. Therefore, the adapter does not support hot plugging. Use jumpers to enable the onboard termination for a multi-initiator SCSI channel. You cannot connect a cable to the internal connector for the multi-initiator SCSI channel. For an alternate configuration, disable the onboard termination for the multi-initiator SCSI channel or set it to automatic. Then, attach a terminator to the end of an internal cable that is connected to the multi-initiator SCSI channel. |
LSI Logic SYM22915 (minimum driver: SYM53c8xx V1.6b, obtained from ftp.lsil.com /HostAdapter Drivers/linux) |
Ultra160 Two VHDCI external connectors Two channels Set the onboard termination by using the BIOS utility. The onboard termination is automatically enabled or disabled, depending on the configuration, even when the module power is off. Use jumpers to disable the automatic termination. |
Set onboard termination to automatic (the default). You can use the internal SCSI connectors for private (non-cluster) storage. |
Testing has shown that the adapter and its Linux driver reliably recover from SCSI bus resets that can be generated by the host bus adapter on the other cluster system. For hot plugging support, use an external terminator, such as a VHDCI pass-through terminator, a VHDCI y-cable, or a VHDCI trilink connector. You cannot connect a cable to the internal connector. For no hot plugging support, connect a cable to the internal connector, and connect a terminator to the end of the internal cable attached to the internal connector. |
Adaptec AIC-7896 on the Intel L440GX+ motherboard (as used on the VA Linux 2200 series) (minimum driver: AIC7xxx V5.1.28) |
One Ultra2, wide, LVD port, and one Ultra, wide port Onboard termination is permanently enabled, so the adapter must be located at the end of the bus. |
Termination is permanently enabled, so no action is needed in order to use the adapter in a single-initiator bus. |
The adapter cannot be used in a multi-initiator configuration, because it does not function correctly in this configuration. |
QLA2200 (minimum driver: QLA2x00 V2.23, obtained from www.qlogic.com /bbs-html /drivers.html) |
Fibre Channel arbitrated loop and fabric One channel |
Can be implemented with point-to-point links or with hubs. Configurations with switches have not been tested. Hubs are required for connection to a dual-controller RAID array or to multiple RAID arrays. |
This configuration has not been tested. |
If you are using Adaptec host bus adapters for the shared disk storage connection, edit the /etc/lilo.conf file and either add the following line or edit the append line to match the following line:
append="aic7xxx=no_reset"
If you are using a Vision Systems VScom 200H PCI card, which provides you with two serial ports, you must bind the I/O port and IRQ of the card's UART to the cluster system. To perform this task, use the vscardcfg utility that is provided by Vision Systems. You can also use the setserial command.
There is a problem with the Tulip network driver that is included in the 2.2.16 Linux kernel, and with network cards that use the PNIC and PNIC-2 Tulip-compatible Ethernet chipset. Examples of these cards include the Netgear FA310tx and the Linksys LNE100TX.
The cards do not re-establish a connection after it has been broken and the Ethernet link beat has been lost. This is a problem in a cluster if one cluster system fails and the card loses the link beat. This problem will be addressed in future Tulip drivers.
If you experience this problem, there are several temporary solutions available:
# /sbin/modprobe tulip options=0x214
See www.scyld.com/network/tulip.html for more information on what options you can use, in addition to recent updates and new drivers.
If you experience this problem with the Tulip network driver, perform an ifdown/ifup on the Ethernet device to reinitialize the driver and make the link active again.
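For example, assuming the affected Ethernet device is eth0 (the actual device name depends on your configuration):
# ifdown eth0
# ifup eth0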
The information in the following sections can help you manage the cluster software configuration:
A cluster uses several intracluster communication mechanisms to ensure data integrity and correct cluster behavior when a failure occurs. The cluster uses these mechanisms to:
The cluster communication mechanisms are as follows:
If a cluster system determines that the quorum timestamp from the other cluster system is not up-to-date, it will check the heartbeat status. If heartbeats to the system are still operating, the cluster will take no action at this time. If a cluster system does not update its timestamp after some period of time, and does not respond to heartbeat pings, it is considered down.
Note that the cluster will remain operational as long as one cluster system
can write to the quorum disk partitions, even if all other communication mechanisms
fail.
The cluster daemons are as follows:
Understanding cluster behavior when significant events occur can help you manage a cluster. Note that cluster behavior depends on whether you are using power switches in the configuration. Power switches enable the cluster to maintain complete data integrity under all failure conditions.
The following sections describe how the system will respond to various failure and error scenarios:
In a cluster configuration that uses power switches, if a system "hangs," the cluster behaves as follows:
In a cluster configuration that does not use power switches, if a system "hangs," the cluster behaves as follows:
A system panic is a controlled response to a software-detected error. A panic attempts to return the system to a consistent state by shutting down the system. If a cluster system panics, the following occurs:
Inaccessible quorum partitions can be caused by the failure of a SCSI adapter that is connected to the shared disk storage, or by a SCSI cable becoming disconnected from the shared disk storage. If one of these conditions occurs, and the SCSI bus remains terminated, the cluster behaves as follows:
A total network connection failure occurs when all the heartbeat network connections between the systems fail. This can be caused by one of the following:
If a total network connection failure occurs, both systems detect the problem, but they also detect that the SCSI disk connections are still active. Therefore, services remain running on the systems and are not interrupted.
If a total network connection failure occurs, diagnose the problem and then do one of the following:
If a query to a remote power switch connection fails, but both systems continue to have power, there is no change in cluster behavior unless a cluster system attempts to use the failed remote power switch connection to power-cycle the other system. The power daemon will continually log high-priority messages indicating a power switch failure or a loss of connectivity to the power switch (for example, if a cable has been disconnected).
If a cluster system attempts to use a failed remote power switch, services running on the system that experienced
the failure are stopped. However, to ensure data integrity, they are not failed
over to the other cluster system. Instead, they remain stopped until the hardware
failure is corrected.
If a quorum daemon fails on a cluster system, the system is no longer able to monitor the quorum partitions. If you are not using power switches in the cluster, this error condition may result in services being run on more than one cluster system, which can cause data corruption.
If a quorum daemon fails, and power switches are used in the cluster, the following occurs:
If a quorum daemon fails, and power switches are not used in the cluster, the following occurs:
If the heartbeat daemon fails on a cluster system, service failover time will increase because the quorum daemon cannot quickly determine the state of the other cluster system. By itself, a heartbeat daemon failure will not cause a service failover.
If the power daemon fails on a cluster system and the other cluster system experiences a severe failure (for example, a system panic), the cluster system will not be able to power-cycle the failed system. Instead, the cluster system will continue to run its services, and the services that were running on the failed system will not fail over. Cluster behavior is the same as for a remote power switch connection failure.
If the service manager daemon fails, services cannot be started or stopped until you restart the service manager daemon or reboot the system.
A copy of the cluster database is located in the /etc/opt/cluster/cluster.conf file. It contains detailed information about the cluster members and services. Do not manually edit the configuration file. Instead, use cluster utilities to modify the cluster configuration.
When you run the member_config script, the site-specific information you specify is entered into fields within the [members] section of the database. The following is a description of the cluster member fields:
start member0
start chan0
    Specifies the tty port that is connected to a null modem cable for a serial heartbeat channel. For example, the serial_port could be /dev/ttyS1.
start chan1
name = interface_name
type = net
end chan1
    Specifies the network interface for one Ethernet heartbeat channel. The interface_name is the host name to which the interface is assigned (for example, storage0).
start chan2
device = interface_name
type = net
end chan2
    Specifies the network interface for a second Ethernet heartbeat channel. The interface_name is the host name to which the interface is assigned (for example, cstorage0). This field can specify the point-to-point dedicated heartbeat network.
Member name and identification number
    Specifies the identification number (either 0 or 1) for the cluster system and the name that is returned by the hostname command (for example, storage0).
powerSerialPort = serial_port
    Specifies the device special file for the serial port to which the power switches are connected, if any (for example, /dev/ttyS0).
powerSwitchType = power_switch
    Specifies the power switch type: RPS10, APC, or None.
quorumPartitionPrimary = raw_disk
quorumPartitionShadow = raw_disk
end member0
    Specifies the raw devices for the primary and backup quorum partitions (for example, /dev/raw/raw1 and /dev/raw/raw2).
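For illustration only, the member fields described above could combine into an entry similar to the following sketch. It uses the example values given above; the serial heartbeat channel and the member name and identification fields are omitted because their exact syntax is not shown here, and the file written by the member_config script remains authoritative.
start member0
    start chan1
        name = storage0
        type = net
    end chan1
    start chan2
        device = cstorage0
        type = net
    end chan2
    powerSerialPort = /dev/ttyS0
    powerSwitchType = RPS10
    quorumPartitionPrimary = /dev/raw/raw1
    quorumPartitionShadow = /dev/raw/raw2
end member0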
When you add a cluster service, the service-specific information you specify is entered into the fields within the [services] section in the database. The following is a description of the cluster service fields.
start service0
name = service_name
disabled = yes_or_no
userScript = path_name
    Specifies the name of the service, whether the service should be disabled after it is created, and the full path name of any script used to start and stop the service.
preferredNode = member_name
relocateOnPreferredNodeBoot = yes_or_no
    Specifies the name of the cluster system on which you prefer to run the service, and whether the service should relocate to that system when it reboots and joins the cluster.
start network0
ipAddress = aaa.bbb.ccc.ddd
netmask = aaa.bbb.ccc.ddd
broadcast = aaa.bbb.ccc.ddd
end network0
    Specifies the IP address, if any, and accompanying netmask and broadcast addresses used by the service. Note that you can specify multiple IP addresses for a service.
start device0
name = device_file
    Specifies the special device file, if any, that is used in the service (for example, /dev/sda1). Note that you can specify multiple device files for a service.
start mount
name = mount_point
fstype = file_system_type
    Specifies the directory mount point, if any, for the device, the type of file system, the mount options, and whether forced unmount is enabled for the mount point.
owner = user_name
group = group_name
mode = access_mode
end device0
end service0
    Specifies the owner of the device, the group to which the device belongs, and the access mode for the device.
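For illustration only, the service fields described above could combine into an entry similar to the following sketch. The service name, script path, IP addresses, mount point, and ownership values are hypothetical, and the closing markers for the mount block are assumed; always create and modify services with the cluster utilities rather than by editing the file directly.
start service0
    name = dbservice
    disabled = no
    userScript = /etc/opt/cluster/dbservice.sh
    preferredNode = storage0
    relocateOnPreferredNodeBoot = no
    start network0
        ipAddress = 10.0.0.10
        netmask = 255.255.255.0
        broadcast = 10.0.0.255
    end network0
    start device0
        name = /dev/sda1
        start mount
            name = /db
            fstype = ext2
        end mount
        owner = dbadmin
        group = dbgroup
        mode = 755
    end device0
end service0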
The Oracle database recovery time after a failover is directly proportional to the number of outstanding transactions and the size of the database. The following parameters control database recovery time:
To minimize recovery time, set the previous parameters to relatively low values. Note that excessively low values will adversely impact performance. You may have to experiment with different values in order to find the optimal ones.
Oracle provides additional tuning parameters that control the number of database transaction retries and the retry delay time. Be sure that these values are large enough to accommodate the failover time in your environment. This will ensure that failover is transparent to database client application programs and does not require programs to reconnect.
For raw devices, there is no cache coherency between the raw device and the block device. In addition, all I/O requests must be 512-byte aligned both in memory and on disk. For example, the standard dd command cannot be used with raw devices because the memory buffer that the command passes to the write system call is not aligned on a 512-byte boundary. To obtain a version of the dd command that works with raw devices, see www.sgi.com/developers/oss/.
If you are developing an application that accesses a raw device, there are restrictions on the type of I/O operations that you can perform. To obtain a read/write buffer that is aligned on a 512-byte boundary in a program, you can do one of the following:
The following is a sample program that gets a read/write buffer aligned on a 512-byte boundary:
#include <stdio.h>
#include <malloc.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    int zfd;
    char *memory;
    int bytes = sysconf(_SC_PAGESIZE);
    int i;

    zfd = open("/dev/zero", O_RDWR);
    if (zfd == -1) {
        perror("open");
        return 1;
    }

    /* A page-aligned mapping is also aligned on a 512-byte boundary. */
    memory = mmap(0, bytes, PROT_READ|PROT_WRITE, MAP_PRIVATE, zfd, 0);
    if (memory == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    printf("mapped one page (%d bytes) at: %lx\n", bytes,
           (unsigned long) memory);

    /* Verify that we can write to the memory. */
    for (i = 0; i < bytes; i++) {
        memory[i] = 0xff;
    }

    return 0;
}
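Assuming the program is saved as rawalign.c (a placeholder file name), you could compile and run it as follows:
# gcc -Wall -o rawalign rawalign.c
# ./rawalign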
You can use a cluster in conjunction with Linux Virtual Server (LVS) to deploy a highly available e-commerce site that has complete data integrity and application availability, in addition to load balancing capabilities. Note that various commercial cluster offerings are LVS derivatives. See www.linuxvirtualserver.org for detailed information about LVS and downloading the software.
The following figure shows how you could use a cluster in an LVS environment. It has a three-tier architecture, where the top tier consists of LVS load-balancing systems to distribute Web requests, the second tier consists of a set of Web servers to serve the requests, and the third tier consists of a cluster to serve data to the Web servers.
In an LVS configuration, client systems issue requests on the World Wide Web. For security reasons, these requests enter a Web site through a firewall, which can be a Linux system serving in that capacity or a dedicated firewall device. For redundancy, you can configure firewall devices in a failover configuration. Behind the firewall are LVS load-balancing systems, which can be configured in an active-standby mode. The active load-balancing system forwards the requests to a set of Web servers.
Each Web server can independently process an HTTP request from a client and send the response back to the client. LVS enables you to expand a Web site's capacity by adding Web servers to the load-balancing systems' set of active Web servers. In addition, if a Web server fails, it can be removed from the set.
This LVS configuration is particularly suitable if the Web servers serve only static Web content, which consists of small amounts of infrequently changing data, such as corporate logos, that can be easily duplicated on the Web servers. However, this configuration is not suitable if the Web servers serve dynamic content, which consists of information that changes frequently. Dynamic content could include a product inventory, purchase orders, or customer database, which must be consistent on all the Web servers to ensure that customers have access to up-to-date and accurate information.
To serve dynamic Web content in an LVS configuration, you can add a cluster behind the Web servers, as shown in the previous figure. This combination of LVS and a cluster enables you to configure a high-integrity, no-single-point-of-failure e-commerce site. The cluster can run a highly-available instance of a database or a set of databases that are network-accessible to the Web servers.
For example, the figure could represent an e-commerce site used for online merchandise ordering through a URL. Client requests to the URL pass through the firewall to the active LVS load-balancing system, which then forwards the requests to one of the three Web servers. The cluster systems serve dynamic data to the Web servers, which forward the data to the requesting client system.
Note that LVS has many configuration and policy options that are beyond the
scope of this document. Contact the Mission Critical Linux Professional Services
organization for assistance in setting up an LVS environment. In addition, see
the packaged versions of LVS from the following vendors: