SafeKit 7 – Knowledge Base
Known Problems, Restrictions or Changes
Id : SK-0001
OS / Release : Linux / All (for SafeKit 7.0.8, see SK-0021)
Problem : File replication does not work if there is a mount point under the replicated directory (error “JUKEBOX”)
Mars Id : 22041
Id : SK-0002
OS / Release : Windows / All
Problem : With SQL Server 2005, SafeKit sometimes stops on the primary server if “Boost SQL server priority” is used (the sqlserver process uses 100% CPU and SafeKit stops with the error “IOS - ReleaseINK kernel->user error”)
Solution : Disable “Boost SQL server priority” (SQL Management Studio => select your server => Properties => Processors)
Mars Id : 21956
Id : SK-0003
OS / Release : Windows 2003 64-bit kernel / 7.0.4
Problem : The “safekit kill” command does not work with the “exit” or “exception” option
Solution : Use the “safekit kill” command with the “terminate” option
Mars Id : 20278
Id : SK-0004
OS / Release : Solaris 10 / All
Problem : The interface checker does not work with the SK-98xx SysKonnect adapter on Solaris 10
Solution : Set <vip check="off"> in the userconfig.xml file
Mars Id : 19395
Id : SK-0005
OS / Release : Linux / All
Problem : “safekit forcestop” does not complete on “nfsbox” death
Solution : Reboot your system
Mars Id : 19565
Id : SK-0006
OS / Release : Linux, Solaris, Aix / All
Problem : When Oracle 10.2 is started by SafeKit, the database startup fails with the error “ORA-00205: error in Identifying control file, check alert log for more info”.
Solution : Set <rfs packetsize="32768"> in the userconfig.xml file and use SafeKit >= 7.0.1.15
Mars Id : 21552
Id : SK-0007
OS / Release : All / All
Problem : “quick configure” does not work for an application module built with SafeKit 6.2
Solution : Save the file “<SAFE>/modules/AM/web/htmllib.lua” (AM is your application module) and replace it with the “web/htmllib.lua” file from a SafeKit 7.0 application module. If “quick configure” still does not work, update the file “<SAFE>/modules/AM/web/index.lua” according to the functions defined in “web/htmllib.lua”.
Mars Id : 22025
Id : SK-0008
OS / Release : All / 7.0.1
Problem : After a migration from SafeKit 6.2 to SafeKit 7.0 (< 7.0.1.21), SafeMonitor fails to add a new server to administer and returns the "received error: forbidden" message. The problem is that SafeMonitor 7.0 tries to work with an “httpd.conf” file that comes from the SafeKit 6.2 installation and is not compatible.
Solution : Replace the “httpd.conf” file (under <SAFE>/web/conf) with the “httpd.conf.default” file.
Other solution : Uninstall SafeKit 7.0, remove the <SAFE>/web directory and re-install SafeKit 7.0.
Fix : Fixed in SafeKit >= 7.0.1.21
Id
: SK-0009
OS
/ Release : Windows
/ All
Problem : File
attributes replication : file encryption and
file compression are not supported
Mars
Id : 20912-20913
Id : SK-0010
OS / Release : Linux / from 7.0.4
Restriction : A replicated directory cannot be the root of a file system when mountover="off" (mandatory on Linux). See SK-0030 for a workaround.
Id
: SK-0011
OS
/ Release : Aix
/ from 7.0.4
Problem : The
name of the process monitored by the software error detector (errd)
must be the one returned by the command “ps -e -o comm”. On Aix,
when the command is a symbolic link, “ps -e -o comm” returns the
value of the symbolic link. Thus, you must set this value into
<errd> configuration, and not the symbolic link name.
Id : SK-0012
OS / Release : Linux / from 7.0.1
Restriction : The NFS server on RedHat 4 Update 3 does not support ACLs. Thus, the acl attribute of a replicated directory cannot be set to “on”.
Id
: SK-0013
OS
/ Release : Linux /
All
Problem : Interface
checker doesn't work with bonding interfaces.
Id : SK-0014
OS / Release : All / from 7.0.4
Restriction : Failover of NFS mounts of replicated directories from remote NFS clients is no longer supported
Id : SK-0015
OS / Release : All / from 7.0.4
Changes : Since SafeKit 7.0.4.13, new attributes are available for the rfs configuration
Configuration sample : <rfs checktime="30000" reitimeout="50" async="second" mountover="off" packetsize="16384" maxnbretrans="50" reicommit="1000">
Id : SK-0016
OS / Release : Solaris / from 7.0.4.18
Changes and Restriction : SafeKit 7.0.4.18 fixes the virtual interface support on recent Solaris 10 kernels. This version needs the "IP stack multiple instances" API level in the Solaris kernel. Those APIs are known to be available in the Solaris 10 update 4 kernel version.
From 7.0.4.18 on, the VIP feature is not available on Solaris 10 kernels that do not export the above APIs.
The incompatibility is detected at SafeKit config time.
This message can be seen during SafeKit installation :
"devfsadm: driver failed to attach: vipdrv
Warning: Driver (vipdrv) successfully added to system but failed to attach"
There is no change for Solaris 9.
Id : SK-0017
OS / Release : All / All
Changes and Restriction : SafeKit start blocks in the wait state when a heartbeat with ident="flow" is configured while there is no replication configuration (<rfs> section).
Solution : Remove the ident attribute.
Id : SK-0018
OS / Release : Linux / from 7.0.8
Problem : Red Hat > 4 freezes with file replication under heavy write load.
In that case, the system hangs, but the other server of the cluster does not detect the error since network communication is still working. You then have to reboot the broken server.
Solution : The kernel freeze is a Linux bug.
You can try changing the kernel parameters as follows:
Insert into the file /etc/sysctl.conf:
vm.dirty_ratio=5
vm.dirty_background_ratio=5
Run sysctl -p
Our tests show that these settings help to solve the problem in many cases.
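Before running sysctl -p, it can help to verify that both lines were really added to the file. The function below is only a sketch of such a check: the name has_sysctl_setting is hypothetical and not part of SafeKit, and it assumes one "key=value" setting per line with no trailing comment.

```shell
# Hypothetical helper (not a SafeKit command).
# Succeeds if FILE contains an uncommented line setting KEY to VALUE.
# Usage: has_sysctl_setting FILE KEY VALUE
has_sysctl_setting() {
    # Remove all whitespace, drop comment lines, then look for an exact
    # "key=value" line (fixed string, whole-line match).
    sed -e 's/[[:space:]]//g' -e '/^#/d' "$1" | grep -q -x -F "$2=$3"
}
```

For example, has_sysctl_setting /etc/sysctl.conf vm.dirty_ratio 5 returns success once the first line above is in place.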
Id :
SK-0019
OS
/ Release : Windows / from
7.0.8.7
Changes and Restriction :
SafeKit SNMP agent (safeagent service) does not
work.
Mars Id :
27387
Solution
: Use SafeKit >= 7.0.8.25
Id : SK-0020
OS / Release : Windows / from 7.0.8.17
Changes : Since SafeKit 7.0.8.17, the new attribute “roflags” of the rfs configuration is used to configure the behavior of file replication when a process accesses a replicated directory on the secondary server.
Values :
Currently, upon notification, nfsbox logs a debug message in the log, containing the pid and the first characters of the executable image name of the offending process, up to 10 messages.
Id : SK-0021
OS / Release : Linux / 7.0.8
Problem : When a replicated directory (e.g. “/Tests/Repli”) contains a mounted file system (e.g. “/Tests/Repli/MyFileSystem”), re-integration fails with a “JUKEBOX” error.
Solution : Use SafeKit >= 7.0.8.26 and apply these changes :
- exportopt="crossmnt" (<rfs> configuration)
- If “/Tests/Repli” is your replicated directory and “MyFileSystem” your file system (under “/Tests/Repli”), use this command to export the file system :
exportfs -i -o rw,wdelay,insecure,no_root_squash,no_subtree_check localhost.localdomain:/Tests/Repli_For_SafeKit_Replication/MyFileSystem
Id : SK-0022
OS / Release : Linux / All
Problem : Correctly exported NFS mounts sometimes fail to mount with a "Permission denied" error. This error prevents a SafeKit module using file replication (with <rfs>) from starting. It fails with the following error in the SafeKit log:
| 2009-06-09 08:35:23:080185 | nfsboxv3 | W | Mount error: 13.
Mars Id : 32204
Solution : This is a known Linux bug, reported in RedHat Bugzilla - Bug 452415. The reason is that nfsd is not mounted on /proc/fs/nfsd while the nfs service is running. Check it by running the mount command, which lists all current mounts. If the line:
nfsd on /proc/fs/nfsd type nfsd (rw)
is not listed, your system is broken for NFS. Adding this mount manually (by running the command: /bin/mount -t nfsd nfsd /proc/fs/nfsd) produces the correct result, and NFS mounts, and thus SafeKit, become available. If you encounter this problem on a Linux SafeKit server, the workaround is to insert the following lines into the SafeKit prestart user script:
is_mounted=`/bin/mount | /usr/bin/awk "\\$1 ~ /^nfsd$/ { print \\$5 }"`
if [ -z "$is_mounted" ] ; then
    /bin/mount -t nfsd nfsd /proc/fs/nfsd
fi
Id : SK-0023
OS / Release : All / All
Problem : How to temporarily disconnect file mirroring from one SafeKit module?
Solution : Stop the SafeKit module and edit its configuration (userconfig.xml) in order to :
<heart>
<heartbeat>
<server addr="192.168.1.16"/>
<server addr="192.168.1.20"/>
</heartbeat>
<heartbeat ident="flow">
<server addr="10.0.0.1"/>
<server addr="10.0.0.2"/>
</heartbeat>
</heart>
Disable the file replication configuration
For this, comment out the whole <rfs> tag section by inserting <!-- (to begin the comment) and --> (to end the comment) as shown below (warning: comments may not be nested) :
<!--
<rfs>
<flow>
<server addr="10.0.0.1"/>
<server addr="10.0.0.2"/>
</flow>
<replicated dir="/safedir" mode="read_only"/>
</rfs>
-->
Then the new module configuration can be applied and the module started.
WARNING: when file mirroring is disabled, only one server must be running, in the ALONE state. The other server must not be started since it could run a failover with data that is not up to date. You can uninstall the module on that server to ensure that it cannot start (and reinstall it later).
Id : SK-0024
OS / Release : All / from 7.0.8.25
Changes : Since SafeKit 7.0.8.25, new degraded mode for the rfs component
When nfsbox, the main rfs component, encounters a severe error, it now goes into degraded mode on the primary server instead of stopping. The secondary server, if there is one, then runs a stopstart and blocks until the other server comes back into default mode. This improves operational continuity since there is no restart or failover of the application. But in degraded mode, file mirroring and high availability are no longer provided. The alone degraded server must be restarted as primary to come back into default mode. This is a manual operation that must be run by the administrator (stop-prim or stopstart via SafeMonitor or the safekit command) when the administrator knows that stopping the application is not critical. The other server will then run data synchronization and become secondary.
You can read the server state to get its mode (state via SafeMonitor or the safekit command). For instance, the following shows the state of a server in degraded mode (ALONE state and up value for resource rfs.degraded):
--------------------- mirror State ---------------------
Local (127.0.0.1) : ALONE (Service : Available)
Resources
Name State Since
heartbeat.0 up 2009-07-23 08:22:32
heartbeat.flow up 2009-07-23 08:22:32
rfs.uptodate up 2009-07-23 08:22:37
rfs.lastprimstate down 2009-07-23 08:22:37
rfs.swapping down 2009-07-23 08:22:32
rfs.degraded up 2009-07-23
Id : SK-0025
OS / Release : All / from 7.0.8
Restriction : Renaming directories between replicated and not replicated trees is not supported
This restriction applies only when you configure not replicated directories in the <rfs> tag.
For instance:
<rfs>
<replicated dir="/repdir" mode="read_only">
<notreplicated path="notrepdir" />
</replicated>
</rfs>
Renaming files between replicated and not replicated trees is supported. For instance, the operations below are allowed:
mv /repdir/file /repdir/notrepdir
mv /repdir/notrepdir/file /repdir
But renaming directories between replicated and not replicated trees may lead to a secondary stop-start and/or to degraded mode (cf. Mars 34165, 63859 and 63864). For instance, the operations below are not supported:
mv /repdir/dir /repdir/notrepdir
mv /repdir/notrepdir/dir /repdir
Id : SK-0026
OS / Release : All / Since 7.0.9.17
Change : Add user scripts argument
This argument can be used, for instance, to send an e-mail on module start and stop.
While transiting from STOP to WAIT
During this transition, the scripts transition and prestart are called in the following manner:
transition STOP WAIT [ start | stopstart | stopwait ]
prestart STOP WAIT [ start | stopstart ]
STOP and WAIT arguments are for the current and next states.
start argument is set on module start (with safekit start | prim | second).
stopstart argument is set on module stop-start (with safekit stopstart either called by the user or a checker).
stopwait argument is set on module stop-start for waiting a resource (wait rules of the failover machine). But only the transition user script is called in that case.
While transiting from WAIT to STOP
During this transition, the scripts poststop and transition are called in the following manner:
poststop WAIT STOP [ stop | stopstart ]
transition WAIT STOP [ stop | stopstart | stopwait ]
WAIT and STOP arguments are for the current and next states.
stop argument is set on module stop, that is, a stop that is not followed by an automatic start.
stopstart argument is set on module stop-start (with safekit stopstart either called by the user or a checker).
stopwait argument is set on module stop-start for waiting a resource (wait rules of the failover machine). But only the transition user script is called in that case.
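As an illustration of how a user script might use these arguments, the sketch below turns the three positional arguments (current state, next state, action) into a one-line message. The function name describe_transition and the message wording are hypothetical; only the argument values come from the description above.

```shell
# Hypothetical helper for a user script (transition, prestart or poststop).
# Usage: describe_transition CURRENT_STATE NEXT_STATE ACTION
describe_transition() {
    case "$3" in
        start)     reason="module start" ;;
        stop)      reason="module stop" ;;
        stopstart) reason="module stop-start" ;;
        stopwait)  reason="stop-start waiting for a resource" ;;
        *)         reason="unknown action" ;;
    esac
    # Print "<current> -> <next> (<reason>)" on standard output.
    echo "$1 -> $2 ($reason)"
}
```

Inside a user script, calling describe_transition "$@" would produce the message for the current invocation; its output could then be piped to a mail command to send the notification.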
Id : SK-0027
OS / Release : Solaris 10 / releases < 7.0.4.18
Problem : The OS does not start after reboot
Solution : You have to remove the loading of the vip kernel module to be able to start your system. For this, follow the procedure below:
bge -1 0 vipmode
Id : SK-0028
OS / Release : Solaris 10 / 7.0.4 and 7.0.9
Restriction : SafeKit must be installed in the "global" Solaris zone
The SafeKit load-balancing and file mirroring features require the package to be installed in the "global" zone.
The zonename command returns global when logged in to the "global" zone.
When logged in to a "non-global" zone, the SafeKit installation returns the following errors:
ERROR attribute verification of /sbin/dlstyle2ify failed
Pathname does not exist
Egrep: can't open /etc/name_to_major
Id : SK-0029
OS / Release : SUSE SLES 11 / All
Problem : Modules in farm mode are unable to start because the SafeKit vip kernel module is not allowed to load
Solution : You have to allow the loading of the vip kernel module. For this, set allow_unsupported_modules to 1 in /etc/modprobe.d/unsupported-modules
Id : SK-0030
OS / Release : Linux / From 7.0.9
Problem : The module configuration fails when a replicated directory is a mount point
Solution : Apply the following manual procedure as a workaround.
This article takes the example of a PostgreSQL module that sets as replicated directories /var/lib/pgsql/var and /var/lib/pgsql/data, which are mount points.
The SafeKit module configuration fails with the error:
Error : Device or resource busy
The procedure is the same for all mount points that must be replicated.
Detect mount points with a command line
On both nodes, check the mount points with the command df -H, which returns for instance:
df -H
/dev/mapper/vg01-lv_pgs_var … /var/lib/pgsql/var
/dev/mapper/vg02-lv_pgs_data … /var/lib/pgsql/data
/var/lib/pgsql/var and /var/lib/pgsql/data are mount points and they must be replicated for PostgreSQL.
But the SafeKit module configuration command /opt/safekit/safekit config -m postgresql returns
Error : Device or resource busy
What to do if a replicated directory is a mount point
- Edit /opt/safekit/modules/postgresql/userconfig.xml :
<rfs … >
<replicated dir="/var/lib/pgsql/var" mode="read_only" />
<replicated dir="/var/lib/pgsql/data" mode="read_only" />
</rfs>
- Unmount /var/lib/pgsql/var and /var/lib/pgsql/data
- Run /opt/safekit/safekit config -m postgresql which should succeed (no errors)
- Check the result with ls -l /var/lib :
ls -l /var/lib
lrwxrwxrwx 1 root root var -> var_For_SafeKit_Replication
lrwxrwxrwx 1 root root data -> data_For_SafeKit_Replication
- Edit /etc/fstab and change the two lines:
/dev/mapper/vg01-lv_pgs_var /var/lib/pgsql/var ext4…
/dev/mapper/vg02-lv_pgs_data /var/lib/pgsql/data ext4…
with
/dev/mapper/vg01-lv_pgs_var /var/lib/pgsql/var_For_SafeKit_Replication ext4…
/dev/mapper/vg02-lv_pgs_data /var/lib/pgsql/data_For_SafeKit_Replication ext4…
- Run mount /var/lib/pgsql/var_For_SafeKit_Replication and mount /var/lib/pgsql/data_For_SafeKit_Replication
Note
To protect against SafeKit starting on a non-mounted, empty directory, you can add to userconfig.xml the checking of a file inside the replicated directory. Example for var/ (do the same for data/ with a file inside this directory that is always present):
<replicated dir="/var/lib/pgsql/var" mode="read_only">
<tocheck path="postgresql.conf" />
</replicated>
What to do for de-configuring the module (or uninstalling the whole SafeKit)
If you want to deconfigure the module (or uninstall the whole SafeKit), you must reverse this procedure:
- Run umount /var/lib/pgsql/var_For_SafeKit_Replication and umount /var/lib/pgsql/data_For_SafeKit_Replication
- Run /opt/safekit/safekit deconfig -m postgresql
- Edit /etc/fstab to undo the previous editing
- Run mount /var/lib/pgsql/var and mount /var/lib/pgsql/data
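The detection step of SK-0030 can also be scripted. The sketch below is only an illustration: the function name is_mount_point is hypothetical, and it assumes a mount table in the /proc/mounts layout (device, mount point, type, ...) rather than the df -H output shown in the article.

```shell
# Hypothetical helper (not a SafeKit command).
# Succeeds if DIR is listed as a mount point in a /proc/mounts-style table.
# Usage: is_mount_point DIR [TABLE]   (TABLE defaults to /proc/mounts)
is_mount_point() {
    # The second field of each line is the mount point; exit 0 on a match.
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}
```

For example, is_mount_point /var/lib/pgsql/var succeeds on a node where that directory is a mount point, which indicates that the workaround above is needed before configuring the module.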
Id : SK-0031
OS / Release : Aix / Since 7.0.9.31
Change : Enable setting of environment variables prior to nfsbox startup
For this, add the nfsboxenv file into /opt/safekit/AM/bin (where AM is the name of your application module) and apply the new configuration. This file contains one line for each environment variable to be exported. For instance, add the following line if your system is running out of paging space because of nfsbox virtual memory requirements:
MALLOCOPTIONS=disclaim
This returns to the operating system the pages of the nfsbox virtual address space that are no longer needed.
Id : SK-0032
OS / Release : Windows 2003 / Since 7.0.10.8
Problem : A module using <virtual_interface> (such as a farm module) does not start
The module is configured with a virtual IP address on a <virtual_interface> and the configuration succeeded. But the module start fails and the log contains a line saying that the vipplug loading failed.
Solution : In Windows 2003, after the module configuration, you have to open the property sheet of the corresponding network interface (the one onto which the new virtual IP address will be added) and click OK to validate the vip driver binding. Then the module should start. On further references to the same network interface (by the same module or other modules), the above procedure is not needed.
In previous releases, the vip driver binding was done during SafeKit install on all network interfaces. Since 7.0.10, vip driver bindings are activated on demand, at configuration time, only on the network interfaces that need the vip driver. This avoids configuration problems on platforms using software VLANs on other network interfaces.
In Windows 2008, the above procedure is not needed.
Id : SK-0033
OS / Release : All / All
Problem : SafeKit servers cannot communicate when the firewall is on
When the firewall is turned on, you have to configure it to allow connections on the SafeKit module ports. The list of used ports is returned by the command:
safekit module getports -m AM
Id : SK-0034
OS / Release : Red Hat Enterprise Linux 6 / Since 7.0.10.23
Problem : If NetworkManager is used to manage network interfaces, SafeKit cannot work properly in case of network failure :
When a network cable is unplugged, the network interface is unconfigured, and a module using <virtual_interface> fails with the error : “vipplug config error: Can't get interface for address ...Error: environment modification need re-configuration”. When the cable is plugged in again, the SafeKit module start fails, and you have to run “safekit config” again.
Problems can also occur with a module using <real_interface>.
OS / Release : Red Hat Enterprise Linux 6 / 7.1.3
Problem : If NetworkManager is used to manage network interfaces :
When a network cable is unplugged, the network interface is unconfigured, and a mirror module using <real_interface> loops with the errors : "nfsboxv3 Internal error: bind failed (99) and heart bind error 99"
Solution : Stop NetworkManager and use system-config-network to configure network interfaces :
On your server run :
service NetworkManager stop
chkconfig NetworkManager off
chkconfig network on
service network start
And run : system-config-network to manage your network interfaces.
Id : SK-0035
OS / Release : Red Hat Enterprise Linux / Since 7.0.11
How to : Enable Oracle Direct NFS with SafeKit file mirroring
Since SafeKit 7.0.11, you can configure SafeKit file mirroring with Oracle 11g Direct NFS.
You first have to configure Oracle for Direct NFS while SafeKit and Oracle are stopped. For this, refer to the Oracle documentation. It consists in changing the ODM library by running:
cd $ORACLE_HOME/lib
cp libodm11.so libodm11.so_stub
ln -s libnfsodm11.so libodm11.so
Then you can start Oracle and check that Direct NFS is enabled. Oracle records the use of Direct NFS in alert.log and also in the internal catalog v$dnfs tables. For instance, you can check the table of servers accessed using Direct NFS by running:
su - oracle
sqlplus
system (login)
system (password)
select * from v$dnfs_servers;
Then configure the module with the attribute pmapset="on". This option can be applied to only one module.
Then apply the new configuration and start the module. You can check that Oracle uses Direct NFS and connects to the nfsbox port instead of the default standard nfsd port 2049.
The nfsbox port is the nfs_port listed by the command safekit module getports -m AM. To check the connections, read alert.log and the v$dnfs tables. You can also run the command lsof -Pnl +M -i4 (for IPv4) or lsof -Pnl +M -i6 (for IPv6), which lists the connections of all processes. You should see oracle processes that connect to nfs_port.
To roll back to the standard Oracle configuration, stop the module, reconfigure it with the attribute pmapset="on" removed, and revert the Oracle configuration for Direct NFS.
Id
: SK-0036
OS
/ Release : All
/ 7.0.11
Problem : Problems in WebConsole with I9 updates.
Id
: SK-0037
OS
/ Release : All / 7.0.11
Problem
: Unable to configure virtual_addr in mirror mode
Solution : Add the following section to the configuration file (userconfig.xml)
:
<farm>
<lan>
<node name="node" addr="127.0.0.1"/>
</lan>
</farm>
Id : SK-0038
OS / Release : All / Since 7.1
Change : The mailsend binary is no longer delivered with the SafeKit package
Since the 7.1 release, mailsend is no longer delivered with the SafeKit package.
For Windows, you can download the Windows binary from the mailsend download area.
For Unix, you can use the mail command instead of mailsend. For instance, the following line, inserted in the poststop script of a module, notifies about the stop of the module:
echo "Running poststop" | mail -s "Stop module $SAFEMODULE on `hostname`" admin@mydomain.com
where "Running poststop" is the mail's body and "Stop module $SAFEMODULE on `hostname`" is the mail's subject.
Id
: SK-0039
OS
/ Release : All / Since
7.0.9
How to : Disable SSL 2 protocol into the SafeKit web server configuration
To disable insecure protocols like SSL 2.0 and weak ciphers:
SSLCipherSuite ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP:+eNULL
SSLProtocol -ALL +SSLv3 +TLSv1
SSLCipherSuite ALL:!ADH:RC4+RSA:+HIGH:+MEDIUM:!LOW:!SSLv2:!EXPORT
safekit webserver stop sync
safekit webserver start sync
Id
: SK-0040
OS
/ Release : Linux / Since 7.1.0.8
Problem
: Address IP conflict loadbalancing problems can occur if the virtual IP address is
an IPv6 address (restriction).
Id
: SK-0041
OS
/ Release : All / Since 7.1.2
Problem
: SafeKit Web console and Internet Explorer 8
The SafeKit web console may not be correctly displayed in IE 8 and returns "xml parse" errors.
Id
: SK-0042
OS
/ Release : All / Since 7.1
How to : Configure a farm module with the spread communication protocol, which has been replaced since SafeKit 7.1 by a proprietary protocol.
<farm spread="on">
<lan>
<node name="node1" addr="192.168.208.5"/>
<node name="node2" addr="192.168.208.6"/>
</lan>
</farm>
<vip>
<interface_list>
<interface arpreroute="off" check="off">
<virtual_interface type="vmac_invisible">
<virtual_addr addr="192.168.208.56" where="alias"/>
</virtual_interface>
</interface>
</interface_list>
<loadbalancing_list>
<group name="FarmProto">
<!-- Set load-balancing rule -->
<rule filter="on_port" proto="tcp" port="9000"/>
</group>
</loadbalancing_list>
</vip>
Id
: SK-0043
OS
/ Release : All / Since 7.1
How to
: Configure a mirror module with a virtual IP address mapped on a virtual MAC address.
<vip>
<interface_list>
<interface check="on">
<virtual_interface type="vmac_invisible">
<virtual_addr addr="192.168.208.56" check="off" where="one_side_alias"/>
</virtual_interface>
</interface>
</interface_list>
<loadbalancing_list>
<group name="mirrorgrp">
<rule filter="on_addr" proto="tcp" port="*"/>
</group>
</loadbalancing_list>
</vip>
<farm>
<lan>
<node name="node1" addr="127.0.0.1"/>
</lan>
</farm>
Id
: SK-0044
OS
/ Release : All / 7.1.1
Errata
: Use a third machine as spare for a mirror module (User's guide section 5.10))
safekit config command must be issued on this machine.
Id : SK-0045
Id : SK-0046
Id : SK-0047
Id : SK-0048
You have secured the SafeKit Web Console with https (see SafeKit User's Guide).
Id : SK-0049
Id : SK-0050
Id : SK-0051
Id : SK-0052
Id : SK-0053
Id : SK-0054
Id : SK-0055
Id : SK-0056
Id : SK-0057
Id : SK-0058
OS / Release : Aix / 7.1.1.7
Problem : Replication and reintegration performance
Solution : To improve file write performance during replication and reintegration, edit the configuration file userconfig.xml and insert the attribute mountopt="hard,combehind" in each <replicated> tag.
See AIX documentation.
OS
/ Release : All / Since 7.1.1
Problem
: Web console problems after SafeKit upgrade
Solution
:
You have to clear your browser's cache so as to get the new web console pages. A quick way to do this is a keyboard shortcut that works on IE, Firefox, and Chrome.
Open the browser to any web page and hold CTRL and SHIFT while tapping the DELETE key. (This is NOT CTRL, ALT, DEL).
The dialog box will open to clear the browser. Set it to clear everything and click Clear Now or Delete at the bottom.
Close the browser, stop the process still running in the background if necessary, and re-open it fresh to test what wasn't working for you previously.
OS
/ Release : All / Since 7.1.2
Problem
: Interface checker "intf" attribute and "-I" parameter are deprecated
Solution
:
If the "intf" attribute is specified in the configuration file userconfig.xml, it is ignored and a message of level "D" "Deprecated argument -I" is emitted at runtime.
If the interface checker process intfcheck.exe is started at the command line with the extra argument "-I", eg :
safekit -r intfcheck <module> <resourcename> -A none -l <ipaddress> -I <interfacename>, the -I argument is ignored and a message of level "D" "Deprecated argument -I" is emitted at runtime.
OS / Release : All / Since 7.1.3 and < 7.1.3.5
How to : Administer, with the web console, modules installed before securing SafeKit servers (with https)
If modules have been installed before securing SafeKit servers, you have to deploy them again to change the administration network URL to protocol https and port 9453 :
Warning : If you connect the web console to another secured server https://anotherservername:9453, you have to repeat the procedure described in (3).
If you prefer, you can instead modify the "servers.xml" file under the Application_Modules/webconsole folder and reload the web console.
From SafeKit 7.1.3.5, when connecting to "https://servername:9453", the web console automatically switches to the secured URL.
OS / Release : All / Since 7.0
Problem : Web Console secured with https: Problem using literal IPv6 address
If you use https://[IPV6]:9453/ or http://[IPV6]:9010/ where IPV6 is a literal IPv6 address, the connection fails with "Internet Explorer cannot display the webpage".
See : Apache-Bugzilla-Bug 52831
Solution : Connecting with https://[IPV6]:9453/deploy.html, https://[IPV6]:9453/monitor.html ... will work. Or don't use literal addresses for IPv6.
Mars Id : 44424
OS / Release : Windows / Since 7.1.2.18
Problem : Process monitoring fails when the process name contains uppercase letters
The User's Guide recommends using the command safekit -r errdpoll_running to get the names of running processes. The displayed name can be used to configure process monitoring in the <errd> section of the configuration file userconfig.xml. Since SafeKit 7.1.2.18, the displayed name keeps its original case, while it should be in lower case. The reason is that the process name comparison for the process monitoring is not case sensitive.
Solution : When defining process monitoring in the <errd> section of userconfig.xml, the value of the attribute name for <proc> must be in lower case. If not, the process name matching will fail.
Mars Id : 53612
Fix : In SafeKit > 7.1.3.6, safekit -r errdpoll_running displays the command name in lower case
OS / Release : All / Since 7.1
Problem : The animated progress bar is not displayed in the web console with IE11
Solution : Follow the options below and check:
OS
/ Release : All / Since 7.1
Problem
: safekit modules fail to start at boot when safeagent is set to automatic start
Solution
: Follow the procedure below
OS
/ Release : Windows / 7.1
Problem
: LPR server : connections on Virtual IP don't work
Mars Id : 54939
Fix : From SafeKit 7.1.3.15 :
Follow the procedure below
( Add "<public name="SAFEVIPDONTSKIP" value="1"/>")
WARNING : this procedure impacts all the modules
OS / Release : All / 7.1.3
Problem : When a custom checker sets the resource state, a message is logged in the module log even if the resource state did not change
Mars Id : 56903
Solution : Edit your custom checker so that it runs the command setting the resource state only if the state has changed
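One possible way to apply this solution in a Unix custom checker is to remember the last state in a file and act only when it differs. The sketch below is an assumption about how such a checker could be organised, not the documented SafeKit method: the function name state_changed and the state file are hypothetical, and the actual command that sets the resource state is left to the caller.

```shell
# Hypothetical helper for a custom checker script.
# Usage: state_changed STATE_FILE NEW_STATE
# Succeeds (and records NEW_STATE) only when NEW_STATE differs from the
# state previously recorded in STATE_FILE, or when nothing was recorded yet.
state_changed() {
    old_state=""
    if [ -f "$1" ]; then
        old_state=$(cat "$1")
    fi
    if [ "$old_state" = "$2" ]; then
        return 1    # unchanged: the caller should not set the resource again
    fi
    printf '%s\n' "$2" > "$1"
}
```

In the checker loop, the command that sets the resource state would then be run only when state_changed succeeds, so that an unchanged state produces no new message in the module log.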
OS / Release : All / 7.1.3
How to : Force a not up-to-date server to automatically start as primary when the up-to-date server is not running
Solution: When all the heartbeats from the up-to-date server are lost and the up-to-date server is not responding to ping requests, the not up-to-date server can fail over or start as primary.
Warning: Use this solution only if you do not mind losing some modifications of the replicated data.
<check>
<!-- arg is the interval in sec between 2 checks -->
<custom ident="pingremote" when="pre" exec="ping_remote" arg="10"/>
<!-- 1st arg is the interval in sec between 2 checks (>=30) -->
<!-- 2nd arg is the accepted elapsed time in min since the last synchronisation time (>1) -->
<custom ident="synced" when="pre" exec="syncedcheck" arg="30 10"/>
</check>
<failover>
<![CDATA[
force_uptodate: if (heartbeat.* == down && custom.pingremote == down && custom.synced == up && rfs.uptodate == down) then rfs.uptodate=up;
]]>
</failover>
This checker checks that the remote server is responding. It sets the resource custom.pingremote to up if responding, to down if not responding.
This checker checks the elapsed time since the replicated data was synchronised on both servers.
It sets the resource custom.synced to up if the data is up-to-date, or if it is not up-to-date but was synchronised at most elapsedtime minutes ago. The value for elapsedtime is the 2nd value of the attribute arg in the custom checker configuration: <custom ident="synced" when="pre" exec="syncedcheck" arg="30 10">
Ask Evidian support for the syncedcheck binary.
OS
/ Release : All/7.1.3
Problem
: Incompatible configuration options <interface arpreroute="on"
and
<virtual_interface type="vmac_invisible"
Mars Id : 57173
Solution :
Set <interface arpreroute="off" when <virtual_interface type="vmac_invisible" is used.
Set <interface arpreroute="on" arpelapse="60" arpinterval="5" only when type="vmac_directed".
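A configuration can be screened for this incompatibility before it is applied. The sketch below is a hedged Python example, not a SafeKit tool: the function name is invented, and it assumes that each <virtual_interface> element is nested somewhere below the <interface> element it belongs to.

```python
import xml.etree.ElementTree as ET


def find_arpreroute_conflicts(xml_text: str) -> list:
    """Return the <interface> elements that combine arpreroute="on" with vmac_invisible.

    Assumption: <virtual_interface> is nested somewhere below its <interface>.
    """
    root = ET.fromstring(xml_text)
    conflicts = []
    for interface in root.iter("interface"):
        if interface.get("arpreroute") != "on":
            continue
        for vif in interface.iter("virtual_interface"):
            if vif.get("type") == "vmac_invisible":
                conflicts.append(interface)
                break  # one offending virtual interface is enough to flag this interface
    return conflicts
```

An empty list means no <interface> was found with the incompatible combination.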
OS
/ Release : Linux RH5 and RH6 / 7.1
How to
: Use of RedHat httpd server instead of the SafeKit httpd server
/opt/safekit/safekit webserver stop
cd /opt/safekit/web
mv -f lib/libcrypto* ../private/bin
mv -f lib/libssl* ../private/bin
mv -f lib lib.safekit
mv -f modules modules.safekit
ln -s /usr/lib64/httpd/modules/ modules
/opt/safekit/safekit webserver start
OS
/ Release : All/ >= 7.1.3.16
Problem
: In a farm module, how to start load-balancing once the application is started and stop load-balancing before stopping the application
Solution :
Enable and disable load balancing with special commands run in the user scripts start_both and stop_both. Script templates are provided for Unix and Windows.
Id
: SK-0059
OS
/ Release : Windows / 7.1
How to
: Use of externally built httpd server instead of the SafeKit built-in httpd server
safekit webserver stop
safekit webserver start
Id
: SK-0060
Id
: SK-0061
Id
: SK-0062
Id
: SK-0063
Id
: SK-0064
Id
: SK-0065
Id
: SK-0066
By default, an NTFS volume has its USN journal active only on the system drive. If the replicated directories are located on a drive other than the system drive, you have to activate the journal explicitly.
The default USN journal maximum size is 512 MB. If your volume contains 400,000 files or fewer, no additional configuration is required. For every 100,000 additional files on a volume containing replicated directories, increase the USN journal size by 128 MB.
If files on the volume are changed or renamed frequently (regardless of whether they are part of the replica set), consider sizing the USN journal larger than these recommendations to prevent USN journal wraps, which can occur when large numbers of files change so quickly that the USN journal must discard the oldest changes to stay within the specified size limit.
The table below gives, for a given number of files, the values of m and a to use when creating the USN journal.
Id
: SK-0067
This occurs when the USN journal has just been created on the drive containing the replicated directories and no access has yet been made to the drive.
Id
: SK-0068
Id
: SK-0069
Id
: SK-0070
Id
: SK-0071
Id
: SK-0072
Id
: SK-0073
Id
: SK-0074
Id
: SK-0075
Id
: SK-0076
Id
: SK-0077
Id
: SK-0078
Id
: SK-0079
Solution
:
Id
: SK-0080
Workaround
:
OS
/ Release : Windows / 7.1.3
Problem
: Checkers start failure on module start after a crash of the server
Mars Id : 57364
Solution :
Apply the following manual procedure as a workaround: before the start of the service safeadmin, insert the line
del "c:\safekit\var\mapper.xml"
OS
/ Release : Windows 2008 and Windows 2008 R2/ All
Problem
: File replication errors that may occur when an application extends a file (most notably, in write_through mode)
This problem is partly due to a misbehaviour of the Microsoft NTFS.sys filesystem driver, described on the Microsoft support site at http://support.microsoft.com/kb/976538/en-us/
Fix : When using file replication, you must update Windows at least to the level indicated at http://support.microsoft.com/kb/976538/en-us/. The update procedure is also described in that Microsoft knowledge base entry.
OS
/ Release : All / 7.2
Problem
: Web console: do not use literal IPv6 addresses (e.g. 3ffe:2a00:100:7031::1)
In the SafeKit 7.2 web console, you have to enter the addresses of the SafeKit servers to configure the web console inventory and the SafeKit clusters. The web console uses these addresses to connect to the servers, but a literal IPv6 address in a URL must be enclosed in square brackets. The web console does not yet handle both formats.
Mars Id : 59106
Solution : The workaround is to use DNS names instead of literal IPv6 addresses.
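To spot the addresses concerned before entering them in the web console, a small check such as the following can help. It is a sketch using only the Python standard library; the function names are invented for the example and nothing here is part of SafeKit.

```python
import ipaddress


def is_ipv6_literal(host: str) -> bool:
    """True if host is a literal IPv6 address, with or without square brackets."""
    candidate = host[1:-1] if host.startswith("[") and host.endswith("]") else host
    try:
        return isinstance(ipaddress.ip_address(candidate), ipaddress.IPv6Address)
    except ValueError:
        return False  # a DNS name, or anything else that is not an IP address


def url_host(host: str) -> str:
    """Format host for use in a URL: bare IPv6 literals get square brackets."""
    if is_ipv6_literal(host) and not host.startswith("["):
        return f"[{host}]"
    return host


if __name__ == "__main__":
    for address in ["3ffe:2a00:100:7031::1", "node1.safe", "172.23.188.101"]:
        print(address, is_ipv6_literal(address), url_host(address))
```

Any address for which is_ipv6_literal returns True should be replaced by a DNS name, as the workaround requires.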
OS
/ Release : Windows 2008 R2/ 7.2
Problem
: 3 nodes replication (3nodesrepli.safe) configuration fails
Mars Id : 59141
The 3nodesrepli.safe configuration relies on PowerShell scripts that require PowerShell 4.0 and a change of the execution policy to run correctly.
Solution :
Follow the procedure described in How to install PowerShell 4.0
OS
/ Release : Windows 2008 R2 / 7.2
Problem
: Loading the SafeKit drivers fails when the Windows 2008 R2 release does not include support for SHA-2 signing and verification
Fix : Update your system to include support for SHA-2.
Refer to the Microsoft Security Advisory at https://technet.microsoft.com/en-us/library/security/2949927.aspx
OS
/ Release : All / 7.2
Problem
: With IE11, a "connection error" can occur after some time when the web console is secured with HTTPS.
The browser must then be stopped and restarted.
Solution : "Internet Options"/"Advanced"; unselect "TLS 1.0" and "TLS 1.2", select only "TLS 1.1".
OS
/ Release : Windows / 7.1 and 7.2
Problem
: How to configure the USN journal in Windows when namespacepolicy="3" is set in the <rfs> tag
Solution : In Windows, to enable zone reintegration after a reboot when the module has been properly stopped, the rfs component uses the NTFS USN change journal to check that the saved information on zones is still valid after the reboot. When the check succeeds, zone reintegration can be applied to the file; otherwise, full reintegration must be used.
To enable the use of the USN change journal, set namespacepolicy="3" in the <rfs> tag.
Run the following command, as an administrator, to check that the USN journal is enabled on your drive:
fsutil usn queryjournal D:
(replace D: with the desired drive).
If the command returns "Error: The volume change journal is not active", run the following command, as an administrator, to create the USN journal:
fsutil usn createjournal m=536870912 a=67108864 D:
(replace D: with the desired drive), where m (maximum size) specifies the maximum size, in bytes, that NTFS allocates for the change journal, and a (allocation delta) specifies the size, in bytes, of the memory allocation that is added to the end and removed from the beginning of the change journal.
See SK-0067 before starting the module after the USN journal creation.
Number of files | m, maximum size in bytes | a, allocation delta in bytes | m in MB
400 000 | 536 870 912 | 67 108 864 | 512
600 000 | 805 306 368 | 100 663 296 | 768
800 000 | 1 073 741 824 | 134 217 728 | 1 024
1 000 000 | 1 342 177 280 | 167 772 160 | 1 280
1 200 000 | 1 610 612 736 | 201 326 592 | 1 536
1 400 000 | 1 879 048 192 | 234 881 024 | 1 792
1 600 000 | 2 147 483 648 | 268 435 456 | 2 048
1 800 000 | 2 415 919 104 | 301 989 888 | 2 304
2 000 000 | 2 684 354 560 | 335 544 320 | 2 560
2 200 000 | 2 952 790 016 | 369 098 752 | 2 816
2 400 000 | 3 221 225 472 | 402 653 184 | 3 072
2 600 000 | 3 489 660 928 | 436 207 616 | 3 328
2 800 000 | 3 758 096 384 | 469 762 048 | 3 584
3 000 000 | 4 026 531 840 | 503 316 480 | 3 840
3 200 000 | 4 294 967 296 | 536 870 912 | 4 096
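The values in this table follow a simple rule: 512 MB up to 400,000 files, plus 128 MB for every additional 100,000 files, with an allocation delta equal to one eighth of the maximum size. For file counts that fall between rows, the arithmetic can be sketched in Python as below. The function names are invented for the example, and rounding up to the next 100,000 files is an assumption.

```python
import math

MB = 1024 * 1024


def usn_journal_size(file_count: int) -> tuple:
    """Return (m, a) in bytes for a volume holding file_count files."""
    extra_files = max(0, file_count - 400_000)
    # 128 MB for every started block of 100,000 additional files (rounding up is an assumption).
    extra_blocks = math.ceil(extra_files / 100_000)
    max_size = (512 + 128 * extra_blocks) * MB
    # In every row of the table, the allocation delta is one eighth of the maximum size.
    return max_size, max_size // 8


def createjournal_command(file_count: int, drive: str = "D:") -> str:
    """Build the fsutil command line shown in this entry for the given file count."""
    m, a = usn_journal_size(file_count)
    return f"fsutil usn createjournal m={m} a={a} {drive}"


if __name__ == "__main__":
    print(createjournal_command(400_000))
    print(createjournal_command(1_000_000))
```

For 400,000 files this reproduces the command given above, with m=536870912 and a=67108864.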
OS
/ Release : Windows / 7.1 and 7.2
Problem
: The start of the module hangs in the WAIT (magenta) state after creating the USN journal on the drive containing the replicated directories
The module remains in the WAIT (magenta) state with the following messages in the module log:
| 2017-02-23 09:05:58:454000 | nfsboxv3 | D | Directory D:\: Filesystem=NTFS (flags 3e700ff), Volume=Data
| 2017-02-23 09:06:00:302000 | rfsplug | D | Retrying nfsbox port lookup
| 2017-02-23 09:06:00:302000 | rfsplug | D | Waiting for nfsbox ready
| 2017-02-23 09:06:00:303000 | log | D | Last message repeated 2 times
| 2017-02-23 09:06:00:333000 | nfsadmin | D | Retrying nfsbox port lookup
| 2017-02-23 09:06:00:333000 | nfsadmin | D | Waiting for nfsbox initialization
Solution : After creating the USN journal and before starting the module, make any modification on the drive so that the USN journal receives a first record. For instance, create and then delete a file.
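The create-then-delete step can be scripted. The following Python sketch is one way to do it under stated assumptions: the function name is invented, and root must be a writable directory on the drive that holds the replicated directories.

```python
import os
import tempfile


def touch_drive(root: str) -> str:
    """Create and immediately delete a temporary file under root; return the path used."""
    fd, path = tempfile.mkstemp(prefix="usn_touch_", dir=root)
    os.close(fd)
    os.remove(path)
    return path


if __name__ == "__main__":
    # Example: pass the root of the replicated drive, e.g. touch_drive("D:\\")
    print(touch_drive(tempfile.gettempdir()))
```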
OS
/ Release : Windows / 7.2
How to
: Use of externally built httpd server instead of the SafeKit built-in httpd server
safekit webserver stop
safekit webserver start
OS
/ Release : Linux / > 7.2.0.29
How to
: Use of Linux httpd server instead of the SafeKit httpd server
/opt/safekit/safekit webserver stop
/opt/safekit/safekit webserver start
OS
/ Release : Linux / 7.3
How to
: Use MySQL with SafeKit when SELinux is in "Enforcing" mode
Solution :
Generate SELinux policy allow rules from logs of denied operations (tested with safekit 7.3 on RedHat 7.2):
mkdir: cannot create directory /var/lib/mysql: File exists
mariadb.service: main process exited, code=exited, status=1/FAILURE
mariadb.service: control process exited, code=exited status=1
Failed to start MariaDB database server.
And/Or :
[Note] /usr/libexec/mysqld (mysqld 5.5.44-MariaDB) starting as process 29039 ...
[Warning] Can't create test file /var/lib/mysql/alambix2.lower-test
/usr/libexec/mysqld: Can't change dir to '/var/lib/mysql/' (Errcode: 13)
170426 8:55:20 [ERROR] Aborting
setenforce 0
/sbin/service auditd rotate (rotates the SELinux log file "/var/log/audit/audit.log")
semodule -DB (removes the "dontaudit" rules from the policy; the log becomes more verbose)
systemctl start mariadb and systemctl stop mariadb
grep mysqld /var/log/audit/audit.log | audit2allow -M NewMySQL (2 files are created : NewMySQL.pp and NewMySQL.te)
semodule -B
semodule -i NewMySQL.pp
setenforce 1
File NewMySQL.te sample :
module NewMySQL 1.0;
require {
type var_lib_t;
type mysqld_safe_t;
type nfs_t;
type mysqld_t;
class process { siginh noatsecure rlimitinh };
class sock_file { create unlink };
class lnk_file { read getattr };
class file { write getattr read lock create unlink open };
class dir { write remove_name getattr add_name };
}
#============= mysqld_safe_t ==============
#!!!! This avc has a dontaudit rule in the current policy
allow mysqld_safe_t mysqld_t:process { siginh rlimitinh noatsecure };
#!!!! This avc has a dontaudit rule in the current policy
allow mysqld_safe_t nfs_t:dir getattr;
allow mysqld_safe_t var_lib_t:lnk_file read;
#============= mysqld_t ==============
allow mysqld_t nfs_t:dir { write remove_name add_name };
allow mysqld_t nfs_t:file { write getattr read lock create unlink open };
allow mysqld_t nfs_t:sock_file { create unlink };
allow mysqld_t var_lib_t:lnk_file { read getattr };
Remarks :
If the ".te" file is manually modified, the ".pp" file must be built again
checkmodule -M -m -o NewMySQL.mod NewMySQL.te
semodule_package -o NewMySQL.pp -m NewMySQL.mod
Then reload the policy module : semodule -i NewMySQL.pp
OS
/ Release : Linux / 7.3
Solution :
drop database MaBase;
ERROR 1010 (HY000): Error dropping database (can't rmdir './MaBase', errno: 13)
or
create database MaBase;
ERROR ...(HY000): Error creating database (can't mkdir './MaBase', errno: 13)
Replace : class dir { write remove_name getattr add_name }
with : class dir { create rmdir write remove_name getattr add_name }
Replace : allow mysqld_t nfs_t:dir { write remove_name add_name }
with : allow mysqld_t nfs_t:dir { create rmdir write remove_name add_name }
checkmodule -M -m -o NewMySQL.mod NewMySQL.te
semodule_package -o NewMySQL.pp -m NewMySQL.mod
semodule -i NewMySQL.pp
OS
/ Release : Linux / 7.3
How to
: Set SELinux to "Permissive" mode OR set only enforcement mode for MySQL to "Permissive"
To set SELinux to "Permissive" mode, execute : setenforce 0
To see the current mode, execute : getenforce
To set the enforcement mode to "Permissive" only for MySQL, execute : semanage permissive -a mysqld_t
OS
/ Release : Windows / 7.2 and < 7.3.0.14
Mars Id : 62147
Fix : Fixed in SafeKit >= 7.3.0.14
Problem
: 3nodesrepli / SafeKit upgrade : After the upgrade procedure, the module does not start and the DR node indicator does not appear.
Set DR node
OS
/ Release : All / All
Mars Id : 63124
Problem
: With IE, the file may be truncated when loaded into the SafeKit Web console editor
Text files created on DOS/Windows machines have different line endings than files created on Unix/Linux. DOS uses carriage return and line feed ("\r\n") as a line ending, while Unix uses just a line feed ("\n").
In IE, the editor of the SafeKit web console may truncate files using DOS line ending format.
Solution : Convert the file to Unix line endings before loading it, for instance:
:set ff=unix ; then save the file
In the "Edit" menu, select "EOL Conversion" -> "UNIX/OSX Format" ; then save the file
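The same conversion can also be done with a short script. The Python sketch below is an illustration, not a SafeKit utility; the function names are invented. It works on bytes so that no other character of the file is altered.

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Convert DOS (CR LF) line endings to Unix (LF); all other bytes are left unchanged."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> None:
    """Rewrite the file at path in place with Unix line endings."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(to_unix_line_endings(data))
```

Keep a copy of the original file before converting it in place.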
OS
/ Release : Linux / > 7.3.0.10
How to
: Configure safewebserver on SLES12
Edit the /opt/safekit/web/bin/safeapachectl script according to the inline comments
Copy /opt/safekit/web/conf/httpd.conf.sles12 to /opt/safekit/web/conf/httpd.conf
/opt/safekit/safekit webserver start
OS
/ Release : Linux / > 7.1
Problem
: Could not configure cluster : got Error:incoherent local name ...
Check the sysctl option net.ipv4.ip_nonlocal_bind : it must be 0.
If it is not, set it with the command sysctl net.ipv4.ip_nonlocal_bind=0 and retry the cluster configuration.
Check /etc/sysctl.conf and /etc/sysctl.d to be sure that this option is not set at boot time.
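The current value of the option can also be read from the /proc filesystem, where Linux exposes it as /proc/sys/net/ipv4/ip_nonlocal_bind. A minimal Python sketch of the check follows; the function name is invented for the example.

```python
from pathlib import Path

NONLOCAL_BIND = "/proc/sys/net/ipv4/ip_nonlocal_bind"


def nonlocal_bind_enabled(path: str = NONLOCAL_BIND) -> bool:
    """Return True if net.ipv4.ip_nonlocal_bind is set to a non-zero value."""
    return Path(path).read_text().strip() != "0"


if __name__ == "__main__":
    if nonlocal_bind_enabled():
        print("net.ipv4.ip_nonlocal_bind is set: run sysctl net.ipv4.ip_nonlocal_bind=0")
    else:
        print("net.ipv4.ip_nonlocal_bind is 0")
```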
OS
/ Release : Linux / 7.3
Problem
: Messages : Error: INVALID_SERVICE: 'safeagent' not among existing services at safekitinstall
Remove the obsolete safeagent firewalld service : firewall-cmd --remove-service=safeagent
OS
/ Release : All / < 7.3.0.24
Problem
: Mirror module stays in the WAIT (magenta) state on both nodes or failover rules do not apply
Check the configuration file of the module, userconfig.xml. Having 2 CDATA sections under <failover> leads to this behavior. For instance:
<failover>
<![CDATA[
is_alone: if(custom.checkaround == down) then restart();
]]>
<![CDATA[
is_isolated: if(custom.checkisolated == down) then stopstart();
]]>
</failover>
Solution : This configuration must be replaced by:
<failover>
<![CDATA[
is_alone: if(custom.checkaround == down) then restart();
is_isolated: if(custom.checkisolated == down) then stopstart();
]]>
</failover>
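To find affected configurations, the number of CDATA sections under <failover> can be counted with a plain text scan. The sketch below is a hedged Python example, not a SafeKit tool: the function name is invented, and it scans the raw text because XML parsers such as ElementTree do not expose CDATA boundaries.

```python
import re

# Matches the body of each <failover> element in the raw configuration text.
FAILOVER_RE = re.compile(r"<failover\b[^>]*>(.*?)</failover>", re.DOTALL)


def max_cdata_per_failover(xml_text: str) -> int:
    """Return the largest number of CDATA sections found in a single <failover> element."""
    counts = (body.count("<![CDATA[") for body in FAILOVER_RE.findall(xml_text))
    return max(counts, default=0)


if __name__ == "__main__":
    with open("userconfig.xml", encoding="utf-8") as handle:
        if max_cdata_per_failover(handle.read()) > 1:
            print("Merge the CDATA sections under <failover> into a single one")
```

A result greater than 1 indicates the configuration described in this entry.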
OS
/ Release : Windows / >= 7.4.0.16
Problem
: Module not starting correctly if cluster configuration contains DNS names
Mars Id : 69307
On Windows, if:
Then it is possible that the Windows resolver returns an IP address that is NOT the first address returned by the DNS server for the specified FQDN, and the module(s) may not start correctly (heartbeat lost, reintegration timeout …)
If you are using DNS names in cluster.xml, please check on all nodes that the address displayed by the “ping” command for the local DNS address is the same as the address displayed by the “nslookup” command.
If it is not the case, you need to alter the node’s Windows network configuration interface and route metric so that the above condition is fulfilled.
OS
/ Release : All / < 7.4.0.16
Problem
: Module communication failures if cluster configuration contains DNS names
Some bugs in the DNS name resolution lead to module internal communication failures if the cluster configuration contains DNS names.
A workaround consists in setting only IP addresses. If you require DNS names for accessing the SafeKit web console, the workaround consists in setting 2 lan sections in the cluster configuration: one lan definition with DNS names, used only by the SafeKit web console, and one lan definition with IP addresses, used for the framework communications. For instance, the cluster configuration may look like the following one :
<cluster>
<lans>
<lan name="default" connect="on" console="on" framework="off">
<node name="node1" addr="node1.safe"/>
<node name="node2" addr="node2.safe"/>
</lan>
<lan name="private" connect="off" console="off" framework="on">
<node name="node1" addr="172.23.188.101"/>
<node name="node2" addr="172.23.188.102"/>
</lan>
</lans>
</cluster>
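A cluster configuration can be checked against this workaround with a short script. The Python sketch below is an illustration only: the function name is invented, and it assumes that only lan sections carrying an explicit framework="on" attribute are used for the framework communications.

```python
import ipaddress
import xml.etree.ElementTree as ET


def framework_lans_with_dns_names(cluster_xml: str) -> list:
    """Return (lan name, node name, addr) for framework lans whose addr is not an IP address."""
    findings = []
    for lan in ET.fromstring(cluster_xml).iter("lan"):
        # Assumption: only an explicit framework="on" marks a lan as used by the framework.
        if lan.get("framework") != "on":
            continue
        for node in lan.iter("node"):
            addr = node.get("addr", "")
            try:
                ipaddress.ip_address(addr)
            except ValueError:
                findings.append((lan.get("name"), node.get("name"), addr))
    return findings
```

An empty list means every framework lan uses IP addresses only, as in the example configuration above.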