# This file is part of Clusterix, Copyright (C) 2004 Alessandro Manzini, 
# email: a.manzini@infogroup.it

# Clusterix is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# Clusterix is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Clusterix; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA



		#############   Clusterix(wsd)   #############

Clusterix is cluster software for Unix operating systems (Linux, FreeBSD,
Solaris).

There are two types of Clusterix:
1) Clusterix: this type writes status information on a raw device on a
shared disk.
2) Clusterixwsd: this type writes status information on a file on the
local disk of each node.

It is intended to be used with two machines when you want to put a service
in high availability. By "service" we mean IP addresses, a program (such
as a web server, a database, etc.) and, optionally, shared disks. So we
can use it, for example, to put a web server in high availability on two
nodes, or to put in high availability a database with a shared disk. If
you want to put more than one service in high availability, so that one
service stays on one node and the second on the other node, you can simply
make two installations of Clusterix in two different directories on the
two nodes. When a service fails on one node, the cluster system moves the
service to the other node: it configures the IP address, publishes the MAC
address of the interface, mounts the disks listed in the configuration
file (if any) and finally starts the program (web server, database, etc.).

If you have to mount shared disks and you use these disks very
intensively, it may be better to use Clusterixwsd, because Clusterix
writes its status information on the shared disk (instead of exchanging it
over the network like Clusterixwsd) and can take away resources that are
needed by the service.

- The cluster software is composed of a main script (clusterixwsd.sh),
a configuration file (clusterixwsd.conf), a file that controls the start
and stop sequence of a service (script.sh), a file that starts and stops
the service (service.sh), and two files that check the status of the
service and switch the cluster if the service fails (control_process.sh,
control_control.sh). Only the script clusterixwsd.sh and the configuration
file clusterixwsd.conf are needed for the correct functioning of the
cluster. The other files and scripts are optional and are used to control
and administer the service offered by the cluster.
- All the variables that you have to set are in the configuration file.
- The cluster script and configuration file have to be the same on both
nodes (a sketch for keeping them in sync follows this list).
- Usually you only have to change clusterixwsd.conf, service.sh and the
conffile variable in clusterixwsd.sh. The rest you can leave untouched.
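
For example, assuming the installation directory /opt/clusterix used later
in this README, and "node2" as a placeholder for the name of the other
node, keeping the two nodes aligned can be as simple as:

    # copy the main script and the configuration file to the other node
    scp /opt/clusterix/clusterixwsd.sh /opt/clusterix/clusterixwsd.conf \
        root@node2:/opt/clusterix/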

HOW IT WORKS

a) cluster with status information on shared disk: Clusterix

                                Public Interface
            |-----------------------------------------------|
        ____|____             Private Interface         ____|____
        |       |               <--------->             |       |
        |       |               <--------->             |       |
        ---------      Backup Private Interface         ---------
           |                                                |
   Write   |                                                | Write
           |                    raw device                  |
        ----------------------------------------------------------
        |                                                        |
        |writes host 1          |       |write host 2            |
        |________________________________________________________|

The information is placed in a raw device on a shared disk. Essentially,
every node writes a timestamp that the other node reads. If the timestamp
is unchanged for a certain number of seconds (variable timeoutdisk), the
node that reads decides that the other one is down and takes over the
services if it does not already have them. The cluster also takes
advantage of information taken from the network. It can use zero, one or
two private interfaces for communication with the other node. To activate
or deactivate them, set the variables PRNI and BNI. Before checking the
timestamp of the other node, the cluster checks whether its public
interface is communicating on the network. If yes, the timestamp of the
other node is checked. If not, it checks through the private interfaces
whether the public interface of the other node is up. If yes, it stops the
services here and starts them on the other node. If not, it takes no
action. If, beyond the public interface, the private one is also down, it
tries to communicate through the backup private interface. If this is down
as well, the isolated node stops its services. The private interfaces (or
the public one, if there is no private interface) are used to check
whether the public interfaces of the two nodes are up and to launch
commands from one node to the other. In this type of cluster the status
information (timestamp etc.) is written on the raw device, which is
visible to both nodes at the same time.
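
As a rough illustration of this heartbeat (not the actual code of
clusterixwsd.sh; $myblock and $otherblock are hypothetical placeholders,
the real block layout is defined by the blocknum function), each node
could do something like:

    # write my own timestamp (epoch seconds) into my block of the quorum device
    echo "$($datenum)" | dd of=$quorumdevice bs=$blocksize seek=$myblock count=1 conv=notrunc 2>/dev/null

    # read the timestamp the other node wrote into its block
    otherstamp=$(dd if=$quorumdevice bs=$blocksize skip=$otherblock count=1 2>/dev/null | tr -d '\000')

    # if that timestamp is older than timeoutdisk seconds, consider the other node down
    now=$($datenum)
    if [ $((now - otherstamp)) -gt $timeoutdisk ]; then
        echo "other node considered down: taking over the services"
    fi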

b) Cluster with status information on local disk: Clusterixwsd


                                Public Interface
            |-----------------------------------------------|
        ____|____             Private Interface         ____|____
        |       |               <--------->             |       |
        |       |               <--------->             |       |
        ---------      Backup Private Interface         ---------
           | |    |                                |       | |
   Write   | |    |R                              R|       | | Write
   Host 1  | |    |e                              e|       | | host 2
           | |    |a                              a|       | |
           | |    |d                              d|       | |
           | |    |                                |       | |
           | |    |h                              h|       | |
   --------------------                          ----------------------
           | |    |o                              o|       | |
   Write   | |    |s                              s|       | | Write
   Host 2  | |    |t                              t|       | | Host 1
           | |    |1                              2|       | |
           | |    v                                v       | |
           |_|    v                                v       |_|

    Quorum file host 1                             Quorum file host 2

Every node writes its information and timestamp on certain blocks of a
file on its local disk, and it also writes a timestamp on a specific block
of the file on the other node. Every node reads the timestamps from its
local file. In this case it is essential that the public or private
network works in order to determine the state of the nodes. It can use
zero, one or two private interfaces for communication with the other node.
To activate or deactivate them, check the variables PRNI and BNI. Before
checking the timestamp of the other node, the cluster checks whether its
public interface is communicating on the network. If yes, the timestamp of
the other node is checked. If not, it checks through the private
interfaces whether the public interface of the other node is up. If yes,
it stops the services here and starts them on the other node. If not, it
takes no action. If, beyond the public interface, the private one is also
down, it tries to communicate through the backup private interface. If
this is down as well, the isolated node stops its services. The private
interfaces (or the public one, if there is no private interface) are used
to check whether the public interfaces of the two nodes are up and to
launch commands from one node to the other. In this type of cluster the
status information (timestamp etc.) is exchanged via the network.
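
A minimal sketch of the same idea for Clusterixwsd (again, $myblock is a
hypothetical placeholder and $othernode stands for the name of the other
node; the real script derives these from the configuration and may be
organized differently):

    # write my timestamp into my local quorum file...
    echo "$($datenum)" | dd of=$quorumfile bs=$blocksize seek=$myblock count=1 conv=notrunc 2>/dev/null

    # ...and push the same timestamp into the quorum file of the other node over
    # ssh, so that each node can read the peer's heartbeat from its own local disk
    echo "$($datenum)" | ssh root@$othernode \
        "dd of=$quorumfile bs=$blocksize seek=$myblock count=1 conv=notrunc" 2>/dev/null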

For both types of cluster

For both types, depending on the settings, the start of the virtual
service implies the start of one or more IP addresses, the start of a
program and the mount of one or more disks. The differences in the
variable settings between Linux and FreeBSD are few, and indications on
how to set each variable can be found in the configuration file as
comments before the variable. Variables that can differ according to the
OS you are using are date, ps, dd, ifconfig, the definition of the IP
alias, and the mount/umount of the disks. If you activate the external
control of the services, two processes are started: the first one checks
that the service is up and, if not, switches the services to the other
node. The other checks that the first process is up and, if not, restarts
it. The first process restarts the second one if it is not present, so the
two processes check each other. If you have two hosts with three network
interfaces, you can set the variables PRNI=on and BNI=on; in this case
there is one public interface, one private interface and one private
backup interface. If you have only two interfaces, set PRNI=on and
BNI=off, so there is one public interface and one private interface
without a backup. If the hosts have a single interface, the cluster can
only work with the public interface (but it is better not to use it this
way); in this case set PRNI=off and BNI=off. The status command of the
cluster shows, for each of the two nodes, the timestamp, the status and
whether the node owns the services, when the service was started, and
finally the status of all the interfaces in the configuration file. A
detailed log can be found in the file set in the variable log. It is
possible to set the timeout after which, if the timestamp field has not
changed, the node is considered down by the other one; this variable is
timeoutdisk. The variable checkfreq regulates the frequency with which the
timestamp is written and with which the timestamp written by the other
node is checked. The variable checkprocfreq regulates the frequency with
which the control_process script checks the availability of the service.
The script that checks the availability of the service can be any script
that returns 0 for OK and 1 for KO. When the stop of the service is
executed, the cluster tries to umount the disks listed in the
configuration file, if any. If the umount command fails, the cluster
forces a crash of the node to prevent data corruption (if you want to
disable this, leave the crashcommand variable unset).
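
For example, an availability check in the spirit of the bundled "controllo"
script can be as simple as the following sketch, built around the gotest
test service shipped in the archive:

    #!/bin/bash
    # return 0 if the test service is running, 1 otherwise;
    # any script honouring this contract can be used as controlscript
    if ps auxw | grep gotest | grep -v grep > /dev/null; then
        exit 0      # service is up
    else
        exit 1      # service is down: the cluster will switch it to the other node
    fi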

BEFORE INSTALLING

You need any Unix OS among Linux, FreeBSD and Solaris. You also need the
following commands: bash, hostname, ps, dd, ssh, date, kill, ifconfig,
ping, send_arp (only for Linux, not for FreeBSD). If you want to mount and
umount filesystems you also need: mount, umount, lsof (for Linux and
Solaris), fstat (for FreeBSD). For Solaris you need to use ksh for the
scripts.

For ping, on Linux you have to install the ping program from iputils; on
FreeBSD the default program is fine. This is because ping must be able to
send broadcast requests. The syntax of the ping command used to establish
whether the network is down is "ping -b -c 3 -f -w 2" for Linux, "ping -c
3 -f -t 2" for FreeBSD and "ping -c 1" for Solaris.
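
In the script this roughly means choosing the right invocation for the
configured operatingsystem and probing the broadcast address of the public
network. A hedged sketch, using variables from clusterixwsd.conf (the
exact test in clusterixwsd.sh may be organized differently):

    case "$operatingsystem" in
        Linux)   $ping -b -c 3 -f -w 2 "$broadcastpublicnetwork" ;;   # iputils ping
        FreeBSD) $ping -c 3 -f -t 2 "$broadcastpublicnetwork" ;;
        Solaris) $ping -c 1 "$broadcastpublicnetwork" ;;
    esac
    # exit status 0 means at least one host on the network answered,
    # so the public network is considered reachable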

All these programs are set in variables in the configuration file and can
be substituted with other commands that do the same things.

The ssh program has to be configured so that you can launch commands as
root on the other node without any password, on all the interfaces. If
openssh is installed you can achieve this as follows:

ssh-keygen -t rsa (for dsa it is ssh-keygen -t dsa, which works the same).
This creates a key pair in ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub. The key in
~/.ssh/id_rsa.pub has to be placed in the file ~/.ssh/authorized_keys of
the other node.

examples of the files:
~/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEA1brtV7H9V5A3yLDYxUG71eGO0nvHmJ2g2+U7
n+Ed5cs0C8mW3Ecb5PkQqCHmdErVQFnzs8BllZSoAcmfxMSjbH7DZKmlz/z0V3CcRgIc661o
TfrIFc/xk7GDxQiaNO8+VMw/BjrtWsYxPHT5vkzigPQPdLBhamFWKTYeTJAX7sE= root@be
llatrix.intra.it
~/.ssh/authorized_keys of the other node:
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEA1brtV7H9V5A3yLDYxUG71eGO0nvHmJ2g2+U7
n+Ed5cs0C8mW3Ecb5PkQqCHmdErVQFnzs8BllZSoAcmfxMSjbH7DZKmlz/z0V3CcRgIc661o
TfrIFc/xk7GDxQiaNO8+VMw/BjrtWsYxPHT5vkzigPQPdLBhamFWKTYeTJAX7sE= root@be
llatrix.intra.it

Then you repeat these steps inverting the two nodes. Then you have to edit
the file /etc/ssh/sshd_config, set the PermitRootLogin variable to yes and
restart the ssh daemon. At this point you can launch commands as root on
both nodes without any password.
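
Put together, and assuming openssh on both nodes with "node2" standing for
the name of the other node, the exchange looks roughly like this (run it
on node1, then repeat from node2 towards node1):

    # on node1: generate the key pair (use an empty passphrase)
    ssh-keygen -t rsa

    # append the public key to root's authorized_keys on the other node
    cat ~/.ssh/id_rsa.pub | ssh root@node2 "cat >> ~/.ssh/authorized_keys"

    # on node2: edit /etc/ssh/sshd_config so that it contains
    #   PermitRootLogin yes
    # and restart the ssh daemon

    # quick test: this must print the remote host name without asking a password
    ssh root@node2 hostname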

You also have to tell the node to ignore broadcast requests. This is
because in this way you can establish whether the network is reachable or
not: the test that tries to find out whether the network is reachable
sends a broadcast request to the network, so you do not want the machine
itself to reply to this request. For Linux you have to set
net.ipv4.icmp_echo_ignore_broadcasts = 1. FreeBSD by default does not
reply to broadcast requests.
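
On Linux this can be done with sysctl, for example:

    # ignore ICMP echo requests sent to broadcast addresses
    sysctl -w net.ipv4.icmp_echo_ignore_broadcasts=1

    # to make it persistent across reboots, add this line to /etc/sysctl.conf:
    # net.ipv4.icmp_echo_ignore_broadcasts = 1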

To make sure that when the virtual IP passes from one node to the other
its MAC address is published correctly to all machines, you have to put
the nodes in the same virtual LAN as the router on the switch. In this way
the gratuitous ARP request sent by the send_arp program reaches the router
and therefore all the other networks. Otherwise you will see the virtual
IP only inside its own network.

INSTALLATION

For a cluster with status information on local disk:
create a directory to put the cluster software in:
mkdir /opt/clusterix
create directories for the log and for the file that contains the
status information:
mkdir /opt/clusterix/log
mkdir /opt/clusterix/qf
create the file for the status information:
touch /opt/clusterix/qf/quorumfile
put the following files in the directory /opt/clusterix:

clusterixwsd.sh 		Main cluster script
clusterixwsd.conf		Configuration file
control_control0_4.sh 		Script that controls control_process0_4.sh
control_process0_4.sh		Script that controls control_control0_4.sh and
				that launches the script that tests the availability
				of the service
script.sh			Script that controls the start and stop of
				the service
service.sh 			Script that starts and stops the service (it is
                                only used to test the cluster at the beginning
                                of the installation)
controllo 			Script that has to return 0 for OK and 1 for KO
gotest.sh                       The service used for testing (it is only used to test
				the cluster at the beginning of the installation)

Note: all the file names can be changed, as long as you also change the
values of the corresponding variables in the configuration file.

Note: All the files except the configuration file have to be executable.

Note: The send_arp program has to take the following parameters:
usage: send_arp src_ip_addr src_hw_addr targ_ip_addr targ_hw_addr
I include the send_arp program for Linux in this archive. Note that for
FreeBSD and Solaris you do not need it.

Note: If for some reason you want to change the block numbers in which
the cluster writes information, either in the file or in the raw device,
see the blocknum function in clusterix(wsd).sh.

Note: For Solaris you have to use another date program to return the
number of seconds since 1 January 1970. I include in this tar datenum.c
and the compiled version datenum, which gives the right result.
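
If you prefer not to use the bundled binary, any command that prints the
epoch seconds will do. On Linux (GNU date) and recent FreeBSD, "date +%s"
already returns this value; on Solaris, assuming perl is installed, an
alternative is:

    # prints the number of seconds since 1 January 1970,
    # which is what the datenum variable has to return
    perl -e 'print time'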

Then you have to change the values of the variables according to your
system. In the file clusterixwsd.sh you only have to set the conffile
variable to the path of the configuration file. Then you have to set all
the variables in the configuration file clusterixwsd.conf according to
your needs, following the comment that precedes each variable. While you
are installing it is better to set the variables to use the test service
shipped with the cluster, and only when you are sure that everything works
should you substitute this test service with the real service. After you
have set the variables in the configuration file, copy all these files to
the other node. Then you can give the command
"/opt/clusterix/clusterixwsd.sh initialize"; this initializes the file in
which the status information is stored. Then you start the cluster:
"/opt/clusterix/clusterixwsd.sh start". Now you can check the log to see
if everything is all right, give the command
"/opt/clusterix/clusterixwsd.sh status" to see if everything is all right,
check for the presence of the test service with "ps auxw | grep gotest",
and check for the presence of an IP address and mounted disks, if any are
in the configuration file. Then you start the cluster on the other node:
"/opt/clusterix/clusterixwsd.sh remote start", and check again with
"/opt/clusterix/clusterixwsd.sh status". At this point the cluster is up
and working and, if there is some hardware failure, the cluster switches
the service. But up to now there is no check of the service itself, so if
the service fails or you stop it, the cluster does not know and does not
do anything. So you start the external check of the service:
"/opt/clusterix/clusterixwsd.sh startcontrol", and then check the control
processes with "/opt/clusterix/clusterixwsd.sh statuscontrol".

At this point we can make some tests. For example, you can stop the
service with "/opt/clusterix/service.sh stop" (or "ps auxw | grep gotest"
and then kill the pid) and see that the cluster switches it to the other
node. You can stop the cluster with "/opt/clusterix/clusterixwsd.sh stop"
and see that the cluster starts the service on the other node. You can
reboot the machine and check that the service moves to the other node, and
so on.

The same is valid for a cluster with status information on a shared disk,
with the exception that instead of setting the quorumfile variable to a
file you have to set the quorumdevice variable to a raw device on the
shared disk.

After everything works you can put your service under the control of the
cluster: set the variable start_service_comand to a script that starts the
service, set the variable stop_service_comand to a script that stops the
service, set stop_service_comand_2 to a script that stops the service if
the first script fails to stop it, and set includestringstart,
excludestringstart, includestringstop and excludestringstop according to
your needs.
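
As an illustration only (the paths and patterns below are hypothetical,
and the exact syntax is documented by the comments in clusterixwsd.conf
itself), this part of the configuration could look like:

    # hypothetical example for a web server put under cluster control
    start_service_comand="/usr/local/apache/bin/apachectl start"
    stop_service_comand="/usr/local/apache/bin/apachectl stop"
    stop_service_comand_2="/usr/bin/pkill -9 httpd"
    includestringstart="httpd"
    excludestringstart="grep"
    includestringstop="httpd"
    excludestringstop="grep"

Note that start_service_comand is different from includestringstart, as
required.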

Explanation of the variables in configuration file:

version			is the version of the cluster system
pathcluster		is the path of the main script
operatingsystem		Specify the operating system. Possible choices: Linux, FreeBSD,
			Solaris. If you use Solaris, also change the default
			shell to ksh in all the scripts
hostname		program that has to return the host name without the domain,
			like host1 and not host1.domain.org
node1			Names of the 2 nodes of the cluster, without domain
node2

log			Log file of the cluster
quorumfile		File that contains the status information. For clusters
			with status information on local disk
quorumdevice		Raw device that contains the status information. For clusters
                        with status information on shared disk
blocksize		Block size in bytes inside the status information file
			or partition
timeoutdisk		Timeout on the status information file after which
			a node is considered down by the other node
checkfreq		Frequency of the check and write of information on the
			status information file

timeoutkillremotestop	Timeout to wait before killing the process that
			remotely stops the cluster on the other node.
			This is useful to give the disks time to umount.
			So it is advisable that it is greater than timeoutstop +
			timeoutstop2 + umountfailedsec * umountcountnum. If you
			do so you are sure that when one node mounts the disks
			the other node has already umounted them.

script			 Script that controls the start and the stop of the service
servicename		Name of the service to start. It appears in the log and in
			the email
start_service_comand	Command that starts the service.
			IMPORTANT: the start_service_comand variable has to be different
			from the includestringstart variable
stop_service_comand	Command that stops the service.
stop_service_comand_2	Second stop command to launch if the first does not stop the service

includestringstart
excludestringstart
includestringstop
excludestringstop
			List of string patterns to match or exclude when starting
			and stopping the service. Put the strings separated by a "|",
			for example "pattern1|pattern2". For the include lists, the
			service starts only if all the patterns are matched. For the
			exclude lists, all the patterns are excluded from the matched
			ones, and you can also use regular expressions for the exclude.

timeoutstop		Timeout after which the stop of the process begins without
			waiting for a normal stop
timeoutstop_2		Timeout for the second stop script, after which the stop of
			the process begins without waiting for a normal stop
timeoutstart		Timeout after which the start of the service is tried again
countstartlimit		Number of tries to start the service
begincheck		Number of seconds after the start of the service before
			beginning to check for the presence of the process
trycountnumquorum	Number of tries before deciding that the other member is
			not polling the quorum device
trysecintquorum		Interval between one try and the next to see if the other
			node is polling the quorum device
trymessagequorum	Message to display in the log for the polling quorum device test

date			has to return the date as Wed Sep 21 10:51:35 CEST 2005
datenum			has to return the date as 1127292729 (seconds since 1 Jan 1970)

mailto			Destination address of mail alert

control_process		path to the control_process script
control_control		path to the control_control script

controlscript		Path to the script that checks the availability of the service.
			It has to return 0 for good, 1 for wrong
checkprocfreq		Frequency of the check of the availability of the service
countfailedservicenum	Number of tries before deciding that the service is down
failedtrysec		Seconds between one failed try and the next
trymessageservice	Message to display in the log when trying this service

fsckbeforemount		Put ON if you want to make an fsck of the file system
			before mounting it
umountcountnum		Number of tries to umount the device
umountfailedsec		Seconds between one failed try and the next
umountfaildmessage	Message to display in the log for umount failures
crashcommand		Command that forces a crash of the node:
			reboot -f for Linux, reboot -q for FreeBSD.
			It is needed to be sure that when you mount
			a file system on one node, the file system is not
			mounted on the other node.
			Leave unset if you do not want to force a crash of
			the node in case of umount failure.
numdevice		Number of devices to mount/umount when the cluster starts/stops.
devicetomount1		Device to mount
mountpoint1		Mount point

killprocess		Define the command used to kill the processes that keep
			a file system busy before umounting it.
			For Solaris, Linux:
			for pidopenfile in `$lsof $1 | $awk '{print $2}' | $uniq | $grep -v PID`; do if [ ! -z "$pidopenfile" ]; then $kill -9 $pidopenfile; fi; done
			For FreeBSD:
			for pidopenfile in `$fstat -f $1 | $awk '{print $6}' | $grep -v INUM`; do if [ ! -z "$pidopenfile" ]; then $kill -9 $pidopenfile; fi; done

PRNI				Set to on if you want to use a private network interface
node1ipprivatenetwork1		IP of the private network interface on node1
node2ipprivatenetwork1		IP of the private network interface on node2
BNI				Set BNI (backup network interface) to on if you
				have and want to use a private backup interface
node1ipprivatenetwork2		IP of the backup private network interface on node1
node2ipprivatenetwork2		IP of the backup private network interface on node2
node1ippublicnetwork		IP of the public network interface on node1
node2ippublicnetwork		IP of the public network interface on node2
netmaskpublicnetwork		Netmask of the public network interface
broadcastpublicnetwork		Broadcast of the public network interface
interfacepublicnetwork		Name of the public network interface
trycountnumpubnet		Number of tries before deciding that the public interface
				is down
trysecintpubnet			Interval between one try and the next to see if the
				public interface is down
trymessagepubnet		Message to display in the log for the public interface test

node1macaddress			MAC address of the public interface of node1
node2macaddress			MAC address of the public interface of node2
				(needed for Linux, not for FreeBSD)

numvip				Number of virtual IP addresses to activate on the
				start of the service
useexternalvipfile		Set to "on" if you want to read the VIP addresses
				from an external file
externalvipfile			File containing the VIP addresses:
				if you use an external file you have to put one
				IP per line
vipdeflinux			Set the ip, netmask, broadcast, interface and
				interfacenumber for the virtual IP address.
				If unused, leave the ip variable unset. If you use
				it, also add the other variables (netmask, broadcast,
				interface, interfacenumber). You have to set it when
				you put useexternalvipfile="off".
vipdeffreebsd			The same for FreeBSD
vipdefsolaris			The same for Solaris


OPERATIVE MANUAL

Clusterix 4.6...
Usage: /opt/clusterix1/clusterixwsd.sh {start|stop|startforeground|startall|stopall|status|initialize|startservice|stopservice|stopcluster|stopclusterhere|startcontrol|stopcontrol|statuscontrol|writedatenow|writeactive|version}

start:                  start cluster in background: service and check quorum device.
stop:                   stop cluster: service and check quorum device.
status:                 Retrieve status information of the 2 nodes.
initialize:             Initialize quorum device.
startforeground:        start cluster in foreground: service and check quorum device.
startcontrol:           start processes that control the availability of service.
stopcluster:            stop the cluster system without stopping the service on both nodes.
stopclusterhere:        stop the cluster system without stopping the service on this node.
stopcontrol:            stop processes that control the availability of service.
statuscontrol:          status of processes that control the availability of service.
startservice:           start only the service, not the cluster system.
startserviceifnotactive:        start only the service, not the cluster system, only if
                        the node is not active.
stopservice:            stop only the service, not the check of quorum device.
stopall:                stop service, stop check quorum device and stop processes that
                        control the availability of service.
startall:               start service, start check quorum device and start processes
                        that control the availability of service. Use this only if the
                        cluster is down also in the other node. Otherwise use
                        /opt/clusterix1/clusterixwsd.sh start.
remote {start|stop|startservice|stopservice}: start, stop, startservice, stopservice on
                        the other node.
writeactive {yes|no}: Write yes,no for the status on quorum device.
writedatenow: Write actual date on quorum device.
version:                Program version.

clusterixwsd.sh initialize		- Initialize status information files
clusterixwsd.sh start      		- Start cluster on this node
clusterixwsd.sh remote start		- Start cluster on other node


clusterixwsd.sh stop	 		- Stop cluster on this node
clusterixwsd.sh remote stop		- Stop cluster on other node
clusterixwsd.sh stopcluster		- Stop cluster system without stopping 
					the service
clusterixwsd.sh status			- Check the status of the cluster
clusterixwsd.sh startservice		- Start only the service without cluster 
					(for emergency)
clusterixwsd.sh writeactive yes|no	- Write yes or no in the block that contains 
					active information on the status file
clusterixwsd.sh writedatenow		- Write the date in the block that contains 
					timestamp information on the status file