SafeKit 7 & 8 – Knowledge Base

Known Problems, Restrictions or Changes

List of Items for main SafeKit Releases

All Releases >= 7.4.0

SK-0006,SK-0009,SK-0010,SK-0013,SK-0017,SK-0025,SK-0029,SK-0030,SK-0033,SK-0085,SK-0099

All Releases < 7.4.0

SK-0002,SK-0005,SK-0006,SK-0007,SK-0009,SK-0013,SK-0017,SK-0022,SK-0023,SK-0025,SK-0029,SK-0030,SK-0033,SK-0049,SK-0074,SK-0085,SK-0086,SK-0093,SK-0099

SafeKit 8.2

SK-0096,SK-0097,SK-0098

SafeKit 7.5

SK-0088,SK-0089,SK-0090,SK-0091,SK-0094,SK-0095,SK-0096

SafeKit 7.4

SK-0079,SK-0080,SK-0081,SK-0082,SK-0083,SK-0084,SK-0087,SK-0088,SK-0091,SK-0092,SK-0094

SafeKit 7.3

SK-0062,SK-0063,SK-0065,SK-0066,SK-0067,SK-0068,SK-0069,SK-0070,SK-0071,SK-0072,SK-0073,SK-0075,SK-0076,SK-0077,SK-0078,SK-0079,SK-0080,SK-0084

SafeKit 7.2

SK-0062,SK-0063,SK-0064,SK-0065,SK-0066,SK-0067,SK-0068,SK-0069,SK-0070,SK-0071,SK-0072,SK-0073,SK-0078,SK-0084

SafeKit 7.1

SK-0038,SK-0039,SK-0040,SK-0041,SK-0042,SK-0043,SK-0044,SK-0046,SK-0047,SK-0048,SK-0050,SK-0051,SK-0052,SK-0053,SK-0054,SK-0055,SK-0056,SK-0057,SK-0058,SK-0059,SK-0060,SK-0065,SK-0066,SK-0067,SK-0078

SafeKit 7.0.11

SK-0018,SK-0035,SK-0036,SK-0037,SK-0039,SK-0078

SafeKit 7.0.10

SK-0018,SK-0032,SK-0034,SK-0039,SK-0078

SafeKit 7.0.9

SK-0018,SK-0025,SK-0026,SK-0039,SK-0028,SK-0078

SafeKit 7.0.8

SK-0010,SK-0014,SK-0015,SK-0018,SK-0019,SK-0020,SK-0021,SK-0024,SK-0025,SK-0078

SafeKit 7.0.4

SK-0003,SK-0010,SK-0014,SK-0015,SK-0028,SK-0078

SafeKit 7.0.1

SK-0001,SK-0008,SK-0012,SK-0078

List of Items ordered by Id


  1. Id : SK-0001
    OS / Release : Linux / All (for SafeKit 7.0.8 see SK-0021)
    Problem : File replication doesn't work if there is a mount point under the replicated directory (error “JUKEBOX”)

    Mars Id : 22041


  2. Id : SK-0002
    OS / Release : Windows / All
    Problem : With SQL Server 2005, SafeKit sometimes stops on the primary if “Boost SQL server priority” is used
    (the sqlserver process uses 100% CPU and SafeKit stops with IOS - ReleaseINK kernel->user error)
    Solution : Disable “Boost SQL server priority” (SQL Management Studio => select your server => Properties => Processors)
    Mars Id : 21956


  3. Id : SK-0003
    OS / Release : Windows 2003 64-bit kernel / 7.0.4
    Problem : The “safekit kill” command doesn't work with the “exit” or “exception” option
    Solution : Use the “safekit kill” command with the “terminate” option
    Mars Id : 20278


  4. Id : SK-0005
    OS / Release : Linux / All
    Problem : “safekit forcestop” doesn't complete on “nfsbox” death
    Solution : Reboot your system
    Mars Id : 19565


  5. Id : SK-0006
    OS / Release : Linux / All
    Problem : When Oracle 10.2 is started by SafeKit, the database startup fails with the error “ORA-00205: error in identifying control file, check alert log for more info”.
    Solution : Set <rfs packetsize="32768"> in the userconfig.xml file and use SafeKit >= 7.0.1.15
    Mars Id : 21552


  6. Id : SK-0007
    OS / Release : All / All
    Problem : “quick configure” doesn't work for an application module built with SafeKit 6.2
    Solution : Save a copy of the file “<SAFE>/modules/AM/web/htmllib.lua” (AM is your application module) and replace it with the “web/htmllib.lua” file from a SafeKit 7.0 application module. If “quick configure” still doesn't work, update the file “<SAFE>/modules/AM/web/index.lua” according to the functions defined in “web/htmllib.lua”.
    Mars Id : 22025


  7. Id : SK-0008
    OS / Release : All / 7.0.1
    Problem : After a migration from SafeKit 6.2 to SafeKit 7.0 (< 7.0.1.21), SafeMonitor fails to add a new server to administer and returns the message "received error: forbidden". The cause is that SafeMonitor 7.0 tries to work with an “httpd.conf” file left over from the SafeKit 6.2 installation, which is not compatible.
    Solution : Replace the “httpd.conf” file (under <SAFE>/web/conf) with the “httpd.conf.default” file.
    Other solution : Uninstall SafeKit 7.0, remove the <SAFE>/web directory and reinstall SafeKit 7.0.
    Fix : Fixed in SafeKit >= 7.0.1.21


  8. Id : SK-0009
    OS / Release : Windows / All
    Problem : File attributes replication : file encryption and file compression are not supported
    Mars Id : 20912-20913


  9. Id : SK-0010
    OS / Release : Linux / from 7.0.4
    Restriction : A replicated directory cannot be the root of a file system when mountover="off" (mandatory on Linux)

    See SK-0030 for a workaround


  10. Id : SK-0012
    OS / Release : Linux / from 7.0.1
    Restriction : The NFS server on RedHat 4 Update 3 does not support ACLs. Thus the acl attribute of a replicated directory cannot be set to “on”.


  11. Id : SK-0013
    OS / Release : Linux / All
    Problem : Interface checker doesn't work with bonding interfaces.


  12. Id : SK-0014
    OS / Release : All / from 7.0.4
    Restriction : Failover of NFS mounts of replicated directories from remote NFS clients is no longer supported


  13. Id : SK-0015
    OS / Release : All / from 7.0.4
    Changes : Since SafeKit 7.0.4.13, new attributes are available for the rfs configuration
    Configuration sample :
    <rfs checktime="30000" reitimeout="50" async="second" mountover="off" packetsize="16384" maxnbretrans="50" reicommit="1000">


  14. Id : SK-0017
    OS / Release : All / All
    Changes and Restriction : SafeKit start blocks in the wait state when a heartbeat with ident="flow" is configured while there is no replication configuration (<rfs> section).
    Solution : Fixed in 7.5.0.11 for Linux and 7.5.0.12 for Windows. For previous releases, remove the ident attribute.


  15. Id : SK-0018
    OS / Release : Linux / from 7.0.8
    Problem : Red Hat > 4 freezes with file replication under heavy write load.
    In that case, the system hangs but the other server of the cluster does not detect the error since network communication is still working. You then have to reboot the broken server.
    Solution : The kernel freeze is a Linux bug.
    You can try changing the kernel parameters as follows:

    Our tests show that these settings help to solve the problem in many cases.


  16. Id : SK-0019
    OS / Release : Windows / from 7.0.8.7
    Changes and Restriction : SafeKit SNMP agent (safeagent service) does not work.

    Mars Id : 27387
    Solution : Use SafeKit >= 7.0.8.25


  17. Id : SK-0020
    OS / Release : Windows / from 7.0.8.17
    Changes : Since SafeKit 7.0.8.17, the new attribute “roflags” of the rfs configuration sets the behavior of file replication when a process accesses a replicated directory on the secondary. Values :

    Currently, upon notification, nfsbox logs a debug message containing the pid and the first characters of the executable image name of the offending process, up to 10 messages.


  18. Id : SK-0021
    OS / Release : Linux / 7.0.8
    Problem : When a replicated directory (e.g. “/Tests/Repli”) contains a mounted file system (e.g. “/Tests/Repli/MyFileSystem”), re-integration fails with the “JUKEBOX” error.
    Solution : Use SafeKit >= 7.0.8.26 and apply these changes :


  19. Id : SK-0022
    OS / Release : Linux / All
    Problem : Correctly exported NFS mounts sometimes fail to mount with a "Permission denied" error. This error prevents a SafeKit module using file replication (with <rfs>) from starting. The start fails with the following error in the SafeKit log:
    | 2009-06-09 08:35:23:080185 | nfsboxv3 | W | Mount error: 13.

    Mars Id : 32204

    Solution : This is a known Linux bug, reported in RedHat Bugzilla - Bug 452415. The reason is that nfsd is not mounted on /proc/fs/nfsd while the nfs service is running. Check it by running the mount command, which lists all current mounts. If the line:
    nfsd on /proc/fs/nfsd type nfsd (rw)
    is not listed, your system is broken for NFS. Adding this mount manually, by running the command:
    /bin/mount -t nfsd nfsd /proc/fs/nfsd
    produces the correct result, and NFS mounts, and thus SafeKit, become available. If you encounter this problem on a Linux SafeKit server, the workaround is to insert the following lines into the SafeKit prestart user script:

    is_mounted=`/bin/mount | /usr/bin/awk "\\$1 ~ /^nfsd$/ { print \\$5 }"`
    if [ -z "$is_mounted" ] ; then
    /bin/mount -t nfsd nfsd /proc/fs/nfsd
    fi

    If you prefer, you can try to run this at init time, after nfs start and before safeadmin start, but you must check that this workaround works at boot time.


  20. Id : SK-0023
    OS / Release : All / All
    Problem : How to temporarily disconnect file mirroring from one SafeKit module ?
    Solution : Stop the SafeKit module and edit its configuration (userconfig.xml) in order to :

    WARNING: When file mirroring is disabled, only one server must be running, in the alone state. The other server must not be started since it could run a failover with data that is not up to date. You can uninstall the module on that server to ensure that it cannot start (and reinstall it later).


  21. Id : SK-0024
    OS / Release : All / from 7.0.8.25
    Changes : Since SafeKit 7.0.8.25, new degraded mode for rfs component

    When nfsbox, the main rfs component, encounters a severe error, it now goes into degraded mode on the primary server instead of stopping. The secondary server, if there is one, then runs a stopstart and blocks until the other server comes back into default mode. This improves operational continuity since there is no restart or failover of the application. But in degraded mode, file mirroring and high availability are no longer provided. The alone degraded server must be restarted as primary to come back into default mode. This is a manual operation that must be run by the administrator (stop-prim or stopstart via SafeMonitor or the safekit command) at a time when stopping the application is not critical. The other server will then run data synchronization and become secondary.
    You can read the server state to get its mode (state via SafeMonitor or the safekit command). For instance, the following shows the state of a server in degraded mode (ALONE state and up value for the resource rfs.degraded):

    --------------------- mirror State ---------------------
    Local (127.0.0.1) : ALONE (Service : Available)
    Resources
    Name State Since
    heartbeat.0 up 2009-07-23 08:22:32
    heartbeat.flow up 2009-07-23 08:22:32
    rfs.uptodate up 2009-07-23 08:22:37
    rfs.lastprimstate down 2009-07-23 08:22:37
    rfs.swapping down 2009-07-23 08:22:32
    rfs.degraded up 2009-07-23
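
    A monitoring script can look for this condition in the state listing. The sketch below is only an illustration: it assumes the listing has the column layout of the sample above (resource name, then state) and reads it from standard input. The helper name is_degraded is hypothetical and not part of SafeKit.

```shell
#!/bin/sh
# is_degraded: hypothetical helper, not a SafeKit command.
# Reads a state listing on stdin and succeeds (exit status 0) when the
# resource rfs.degraded is reported with the state "up".
is_degraded() {
    awk '$1 == "rfs.degraded" && $2 == "up" { found = 1 }
         END { exit !found }'
}
```

    For example, pipe the state listing of the module into is_degraded and raise an alert when it succeeds.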


  22. Id : SK-0025
    OS / Release : All / from 7.0.8
    Restriction : Renaming a directory between replicated and not replicated trees is not supported

    This restriction applies only when you configure not replicated directories in the <rfs> tag. For instance:
    <rfs>
    <replicated dir="/repdir" mode="read_only">
    <notreplicated path="notrepdir" />
    </replicated>
    </rfs>

    Renaming files between replicated and not replicated trees is supported. For instance, the operations below are allowed:
    mv /repdir/file /repdir/notrepdir
    mv /repdir/notrepdir/file /repdir


    But renaming directories between replicated and not replicated trees may lead to a secondary stop-start and/or to degraded mode (cf. Mars 34165, 63859 and 63864). For instance, the operations below are not supported:
    mv /repdir/dir /repdir/notrepdir
    mv /repdir/notrepdir/dir /repdir
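
    A maintenance script can test a move before running it. The following is a minimal sketch of the rule above, not a SafeKit tool: the names tree_of and move_is_supported are hypothetical, REPDIR and NOTREP default to the paths of the example configuration, and only absolute paths are handled.

```shell
#!/bin/sh
# Paths taken from the example configuration; override them if needed.
REPDIR="${REPDIR:-/repdir}"
NOTREP="${NOTREP:-$REPDIR/notrepdir}"

# tree_of PATH: prints "notrep", "rep" or "other" for an absolute path.
tree_of() {
    case "$1" in
        "$NOTREP"|"$NOTREP"/*) echo notrep ;;
        "$REPDIR"|"$REPDIR"/*) echo rep ;;
        *) echo other ;;
    esac
}

# move_is_supported SRC DST: fails (exit status 1) only when SRC is a
# directory and the move crosses the replicated / not replicated boundary.
move_is_supported() {
    src="${1%/}"
    dst="${2%/}"
    [ -d "$src" ] || return 0          # plain files may cross the boundary
    s=$(tree_of "$src")
    d=$(tree_of "$dst")
    if [ "$s" = rep ] && [ "$d" = notrep ]; then return 1; fi
    if [ "$s" = notrep ] && [ "$d" = rep ]; then return 1; fi
    return 0
}
```

    A wrapper can then run mv only when move_is_supported succeeds, for example: move_is_supported /repdir/dir /repdir/notrepdir || echo "unsupported directory move".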


  23. Id : SK-0026
    OS / Release : All/Since 7.0.9.17
    Change : Add user scripts argument

    This argument can be used for instance to send an e-mail on module start and stop.


  24. Id : SK-0029
    OS / Release : SUSE SLES 11/ All
    Problem : Modules in farm mode are unable to start because safekit vip kernel module is not allowed to load
    Solution : You have to allow the loading of vip kernel module. For this, set allow_unsupported_modules to 1 in /etc/modprobe.d/unsupported-modules
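
    The setting can also be applied from a script. The sketch below assumes the file uses the "allow_unsupported_modules <value>" line syntax; the helper name enable_unsupported_modules is hypothetical. It keeps a backup copy next to the file before rewriting it.

```shell
#!/bin/sh
# enable_unsupported_modules [FILE]: hypothetical helper.
# Sets allow_unsupported_modules to 1 in FILE (default path from the article).
enable_unsupported_modules() {
    conf="${1:-/etc/modprobe.d/unsupported-modules}"
    if grep -q '^[[:space:]]*allow_unsupported_modules[[:space:]]' "$conf"; then
        # Rewrite the existing setting, keeping a backup copy.
        cp -p "$conf" "$conf.bak" &&
        sed 's/^[[:space:]]*allow_unsupported_modules[[:space:]].*/allow_unsupported_modules 1/' \
            "$conf.bak" > "$conf"
    else
        # No setting yet: append one.
        echo 'allow_unsupported_modules 1' >> "$conf"
    fi
}
```

    Run the function as root, then start the module again so that the vip kernel module can be loaded.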


  25. Id : SK-0030
    OS / Release : Linux / From 7.0.9
    Problem : The module configuration fails when a replicated directory is a mount point
    Solution : Apply the following manual procedure as a workaround.

    This article takes the example of a PostgreSQL module whose replicated directories, /var/lib/pgsql/var and /var/lib/pgsql/data, are mount points. The SafeKit module configuration fails with the error:
    Error : Device or resource busy

    The procedure is the same for all mount points that must be replicated.

    Detect mount points with a command line
    On both nodes, check mount points with the command df -H that returns for instance:
    df -H
    /dev/mapper/vg01-lv_pgs_var … /var/lib/pgsql/var
    /dev/mapper/vg02-lv_pgs_data … /var/lib/pgsql/data

    /var/lib/pgsql/var and /var/lib/pgsql/data are mount points and they must be replicated for PostgreSQL. But the SafeKit module configuration command /opt/safekit/safekit config -m postgresql returns Error : Device or resource busy
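
    The same check can be scripted. The sketch below lists the mount points located at or below a given directory by reading a mount table in the /proc/mounts format (mount point in the second field). The helper name list_mounts_under is hypothetical, and paths containing spaces are not handled.

```shell
#!/bin/sh
# list_mounts_under DIR [MOUNTS_FILE]: hypothetical helper.
# Prints every mount point equal to DIR or located below it.
list_mounts_under() {
    dir="${1%/}"
    mounts="${2:-/proc/mounts}"
    # Field 2 of the mount table is the mount point.
    awk -v d="$dir" '$2 == d || index($2, d "/") == 1 { print $2 }' "$mounts"
}
```

    For example, list_mounts_under /var/lib/pgsql prints /var/lib/pgsql/var and /var/lib/pgsql/data on the servers described above.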

    What to do if a replicated directory is a mount point


    Apply this procedure on both nodes if the replicated directories are mount points on both nodes.
    After this procedure, you can use SafeKit as usual (safekit start, stop, etc.).

    Note
    To protect the start of SafeKit on a non-mounted and empty directory, you can insert in userconfig.xml the checking of a file inside the replicated directory. Example for var/ (do the same for data/ with a file inside this directory which is always present):

    	
    <replicated dir="/var/lib/pgsql/var" mode="read_only">
        <tocheck path="postgresql.conf" />
    </replicated>
    

    What to do for de-configuring the module (or uninstall whole SafeKit)
    If you want to deconfigure the module (or uninstall whole safekit), you must reverse this procedure by:


  26. Id : SK-0032
    OS / Release : Windows 2003 / Since 7.0.10.8
    Problem : A module using <virtual_interface> (such as a farm) does not start
    The module is configured with a virtual IP address on a <virtual_interface> and the configuration succeeded. But the module start fails and the log contains a line saying that vipplug loading failed.
    Solution : In Windows 2003, after the module configuration, you have to open the property sheet of the corresponding network interface (the one to which the new virtual IP address will be added) and click OK to validate the vip driver binding. Then the module should start. On further references to the same network interface (by the same module or other modules), the above procedure is not needed.
    In previous releases, vip driver binding was done during SafeKit install on all network interfaces. Since 7.0.10, vip driver bindings are activated on demand at configuration time, only on the network interfaces that need the vip driver. This avoids configuration problems on platforms using software VLANs on other network interfaces.
    In Windows 2008, the above procedure is not needed.


  27. Id : SK-0033
    OS / Release : All / All
    Problem : SafeKit servers cannot communicate when the firewall is on

    When the firewall is turned on, you have to configure it to allow connections on the SafeKit module ports. The list of used ports is returned by the command: safekit module getports -m AM


  28. Id : SK-0034
    OS / Release : Red Hat Enterprise Linux 6 / Since 7.0.10.23
    Problem : If NetworkManager is used to manage network interfaces, SafeKit cannot work properly in case of network failure :
    When a network cable is unplugged, the network interface is unconfigured, and a module using <virtual_interface> fails with the error : “vipplug config error: Can't get interface for address ...Error: environment modification need re-configuration”. When the cable is plugged in again, the SafeKit module start fails, and “safekit config” has to be run again.
    Problems can also occur with a module using <real_interface>.

    OS / Release : Red Hat Enterprise Linux 6 / 7.1.3
    Problem : If NetworkManager is used to manage network interfaces :
    When a network cable is unplugged, the network interface is unconfigured, and a mirror module using <real_interface> loops with the errors : "nfsboxv3 Internal error: bind failed (99) and heart bind error 99"

    Solution : Stop NetworkManager and use system-config-network to configure network interfaces :
    On your server run :

    And run : system-config-network to manage your network interfaces.


  29. Id : SK-0035
    OS / Release : Red Hat Enterprise Linux / Since 7.0.11
    How to : Enable Oracle Direct NFS with SafeKit file mirroring

    Since SafeKit 7.0.11, you can configure SafeKit file mirroring with Oracle 11g Direct NFS.

    You first have to configure Oracle for Direct NFS while SafeKit and Oracle are stopped. For this, refer to the Oracle documentation. It consists in changing the ODM library by running:

    cd $ORACLE_HOME/lib
    cp libodm11.so libodm11.so_stub
    ln -s libnfsodm11.so libodm11.so

    Then you can start Oracle and check that Direct NFS is enabled. Oracle records the use of Direct NFS in alert.log and also in internal catalog v$dnfs tables. For instance, you can check the table of servers accessed using Direct NFS by running:
    su - oracle
    sqlplus
    system (login)
    system (password)
    select * from v$dnfs_servers;

    When Oracle is properly configured for Direct NFS, you can configure SafeKit file mirroring to enable Oracle NFS connections to the SafeKit nfsbox process. Edit the module configuration file userconfig.xml and insert into the <rfs> tag the attribute: pmapset="on". This option can be applied to only one module. Then apply the new configuration and start the module. You can check that Oracle uses Direct NFS and connects to the nfsbox port instead of the standard nfsd port 2049. The nfsbox port is the nfs_port listed by the command safekit module getports -m AM. To check connections, read the alert.log and the v$dnfs tables. You can also run the command lsof -Pnl +M -i4 (for IPv4) or lsof -Pnl +M -i6 (for IPv6), which lists the connections of all processes. You should see oracle processes connected to nfs_port.

    To roll back to the standard Oracle configuration, stop the module, reconfigure it with the attribute pmapset="on" removed, and revert the Oracle configuration for Direct NFS.
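
    The lsof listing can be long, so a small filter helps. The sketch below is an illustration only: the helper name ora_connections_on_port is hypothetical, it assumes that the Oracle process names shown by lsof start with "ora", and it simply keeps the lines that mention the given port number.

```shell
#!/bin/sh
# ora_connections_on_port PORT: hypothetical helper.
# Reads lsof output on stdin and prints the lines whose command name starts
# with "ora" (assumption) and that contain ":PORT" as a complete port number.
ora_connections_on_port() {
    awk -v port="$1" '$1 ~ /^ora/ && $0 ~ (":" port "([^0-9]|$)")'
}
```

    For example, run lsof -Pnl +M -i4 and pipe its output into ora_connections_on_port followed by the nfs_port value returned by safekit module getports -m AM.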


  30. Id : SK-0036
    OS / Release : All / 7.0.11
    Problem : Problems in WebConsole with I9 updates.


    Mars Id : 46316,46365


  31. Id : SK-0037
    OS / Release : All / 7.0.11
    Problem : Unable to configure virtual_addr in mirror mode
    Solution : Add the following section to the configuration file (userconfig.xml) :

    <farm>
    <lan>
    <node name="node" addr="127.0.0.1"/>
    </lan>
    </farm>


  32. Id : SK-0038
    OS / Release : All / Since 7.1
    Change : mailsend binary no longer delivered with the SafeKit package

    Since the 7.1 release, mailsend is no longer delivered with the SafeKit package.
    For Windows, you can download the Windows binary from the mailsend download area.
    For Unix, you can use the mail command instead of mailsend. For instance, the following line, inserted in the poststop script of a module, notifies about the stop of the module:
    echo "Running poststop" | mail -s "Stop module $SAFEMODULE on `hostname`" admin@mydomain.com
    where "Running poststop" is the mail's body and "Stop module $SAFEMODULE on `hostname`" is the mail's subject.


  33. Id : SK-0039
    OS / Release : All / Since 7.0.9
    How to : Disable the SSL 2 protocol in the SafeKit web server configuration

    To disable insecure protocols like SSL 2.0 and weak ciphers:


  34. Id : SK-0040
    OS / Release : Linux / Since 7.1.0.8
    Problem : Address IP conflict loadbalancing problems can occur if the virtual IP address is an IPv6 address (restriction).

    Solution : fixed in SafeKit > 7.1.1.0
    Mars Id : 48110


  35. Id : SK-0041
    OS / Release : All / Since 7.1.2
    Problem : SafeKit Web console and Internet Explorer 8
    The SafeKit web console may not be correctly displayed in IE 8 and returns "xml parse" errors.

    Solution : Set the checkbox "Enable (not secure)" for "Initialize and script ActiveX controls not marked as safe for scripting (not secure)" into the "Security Settings" panel of the Internet Zone
    Warning: IE8 is no longer supported since SafeKit 7.1.3


  36. Id : SK-0042
    OS / Release : All / Since 7.1
    How to : Configure a farm module with the spread communication protocol, which has been replaced since SafeKit 7.1 by a proprietary protocol.

    Solution : The following is an extract of a farm configuration file userconfig.xml, set to use the spread protocol. Beware that in this case, the type of the virtual interface must be set to "vmac_invisible".
    <farm spread="on">
    <lan>
    <node name="node1" addr="192.168.208.5"/>
    <node name="node2" addr="192.168.208.6"/>
    </lan>
    </farm>
    <vip>
    <interface_list>
    <interface arpreroute="off" check="off">
    <virtual_interface type="vmac_invisible">
    <virtual_addr addr="192.168.208.56" where="alias"/>
    </virtual_interface>
    </interface>
    </interface_list>
    <loadbalancing_list>
    <group name="FarmProto">
    <!-- Set load-balancing rule -->
    <rule filter="on_port" proto="tcp" port="9000"/>
    </group>
    </loadbalancing_list>
    </vip>

  37. Id : SK-0043
    OS / Release : All / Since 7.1
    How to : Configure a mirror module with a virtual IP address mapped on a virtual MAC address.

    Solution : The following is an extract of a mirror configuration file userconfig.xml, set to use a virtual IP address mapped on a virtual MAC address. Beware that in this case, the type of the virtual interface must be set to "vmac_invisible". The load-balancing rule is mandatory when defining a virtual MAC address, but all the traffic goes to the primary server.
    <vip>
    <interface_list>
    <interface check="on">
    <virtual_interface type="vmac_invisible">
    <virtual_addr addr="192.168.208.56" check="off" where="one_side_alias"/>
    </virtual_interface>
    </interface>
    </interface_list>
    <loadbalancing_list>
    <group name="mirrorgrp">
    <rule filter="on_addr" proto="tcp" port="*"/>
    </group>
    </loadbalancing_list>
    </vip>
    <farm>
    <lan>
    <node name="node1" addr="127.0.0.1"/>
    </lan>
    </farm>

  38. Id : SK-0044
    OS / Release : All / 7.1.1
    Errata : Use a third machine as spare for a mirror module (User's Guide section 5.10)

    • Deploying the module on the three machines results in a configuration error on the spare machine (the one that does not have a locally defined address). This error will be fixed in a future SafeKit version. However, deployment on the three machines is still necessary.
    • Before starting the spare machine, the safekit config command must be issued on this machine.


  39. Id : SK-0046
    OS / Release : All / Since 7.1.1
    Problem : Web console problems after SafeKit upgrade
    Solution : You have to clear your browser's cache so as to get the new web console pages. A quick way to do this is a keyboard shortcut that works on IE, Firefox, and Chrome. Open the browser to any web page and hold CTRL and SHIFT while tapping the DELETE key. (This is NOT CTRL, ALT, DEL). The dialog box will open to clear the browser. Set it to clear everything and click Clear Now or Delete at the bottom. Close the browser, stop the process still running in the background if necessary, and re-open it fresh to test what wasn't working for you previously.


  40. Id : SK-0047
    OS / Release : All / Since 7.1.2
    Problem : Interface checker "intf" attribute and "-I" parameter are deprecated
    Solution : If the "intf" attribute is specified in the configuration file userconfig.xml, it is ignored and a message of level "D" "Deprecated argument -I" is emitted at runtime. If the interface checker process intfcheck.exe is started at the command line with the extra argument "-I", eg : safekit -r intfcheck <module> <resourcename> -A none -l <ipaddress> -I <interfacename>, the -I argument is ignored and a message of level "D" "Deprecated argument -I" is emitted at runtime.


  41. Id : SK-0048
    OS / Release : All / Since 7.1.3 and < 7.1.3.5
    How to : Administer, with the web console, modules that were installed before securing SafeKit servers (with https)

    You have secured the SafeKit Web Console with https (see SafeKit User's Guide).
    If modules have been installed before securing SafeKit servers, you have to deploy them again to change the administration network URL to protocol https and port 9453 :

    1. clear your browser's cache and SSL cache
    2. connect to https://servername:9453, where servername is the name or IP address of one of your secured SafeKit servers, and choose the "Admin" role
    3. Under "Administration Network", for each secured server, replace "http://servername:9010" with "https://servername:9453", then "Confirm" => The state of modules installed before securing SafeKit servers is now "Connection error"
    4. Under "Advanced Configuration" tab, and for each concerned module:
      • edit the "userconfig.xml" file
      • under <application_module> tag, for each <clusternode> replace "uri=http://servername:9010" with "uri=https://servername:9453"
      • apply the configuration
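
    When many <clusternode> entries are involved, the replacement in step 4 can be prepared with a filter. This is only a sketch under stated assumptions: the helper name secure_uris is hypothetical, the uri values are expected in the form http://servername:9010, and literal IPv6 addresses are not handled. The result must still be applied through the web console or safekit config.

```shell
#!/bin/sh
# secure_uris: hypothetical helper.
# Reads a userconfig.xml on stdin and writes it to stdout with every
# http://<server>:9010 URL rewritten to https://<server>:9453.
secure_uris() {
    sed -E 's#http://([^":/]+):9010#https://\1:9453#g'
}
```

    For example: secure_uris < userconfig.xml > userconfig.xml.new, then compare the two files before replacing the original.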

    Warning : If you connect the web console to another secured server https://anotherservername:9453, you have to repeat the procedure described in (3).
    If you prefer, you can instead modify the "servers.xml" file under Application_Modules/webconsole folder and reload the web console.

    From SafeKit 7.1.3.5, when connecting to "https://servername:9453", the web console automatically switches to the secured URL.


  42. Id : SK-0049
    OS / Release : All / Since 7.0
    Problem : Web Console secured with https: Problem using literal IPv6 address
    If you use https://[IPV6]:9453/ or http://[IPV6]:9010/ where IPV6 is a literal IPv6 address, the connection fails with "Internet Explorer cannot display the webpage"
    See : Apache-Bugzilla-Bug 52831
    Solution : Connecting with https://[IPV6]:9453/deploy.html, https://[IPV6]:9453/monitor.html ... will work. Alternatively, do not use literal addresses for IPv6.
    Mars Id : 44424


  43. Id : SK-0050
    OS / Release : Windows / Since 7.1.2.18
    Problem : Process monitoring fails when the process name contains uppercase letters
    The User's Guide recommends using the command safekit -r errdpoll_running to get the names of running processes. The displayed name can be used to configure the process monitoring in the <errd> section of the configuration file userconfig.xml. Since SafeKit 7.1.2.18, the displayed name is case sensitive while it should be in lower case. The reason is that the process name comparison for the process monitoring is not case sensitive.
    Solution : When defining process monitoring in the <errd> section of userconfig.xml, the value of the attribute name for <proc> must be in lower case. If not, the process name matching will fail.
    Mars Id : 53612

    Fix : In SafeKit > 7.1.3.6:


  44. Id : SK-0051
    OS / Release : All / Since 7.1
    Problem : The animated progress bar is not displayed in the web console with IE11
    Solution : Follow the options below and check:

    1. Click on tools and open Internet Options
    2. Click on Advanced tab and browse down to category Multimedia
    3. Check the option Play animations in web pages
    4. IE will need to be restarted for the changes to take effect

  45. Id : SK-0052
    OS / Release : All / Since 7.1
    Problem : safekit modules fail to start at boot when safeagent is set to automatic start
    Solution : Follow the procedure below

    1. Start the "service control manager" control panel applet
    2. In the right-click contextual menu of the "Safeagent" service, select "Properties"
    3. Set the "safeagent" service "startup type" to "Automatic (Delayed Start)"
    4. Click OK
    5. The setting will be active at the next boot.

  46. Id : SK-0053
    OS / Release : Windows / 7.1
    Problem : LPR server : connections on Virtual IP don't work
    Mars Id : 54939

    Fix : From SafeKit 7.1.3.15 :
    Follow the procedure below

    1. Edit %SYSTEMROOT%\safeini.xml file and define SAFEVIPDONTSKIP variable
      ( Add "<public name="SAFEVIPDONTSKIP" value="1"/>")
    2. stop and start "safeadmin" service
    3. start the application module
    4. use the command "netsh interface ip show ipaddresses level=verbose" to verify the "Skip as Source" flag value
      WARNING : this procedure impacts all the modules

  47. Id : SK-0054
    OS / Release : All/7.1.3
    Problem : When setting the resource state in a custom checker, a message is logged in the module log even if the resource state has not changed
    Mars Id : 56903

    Solution : Edit your custom checker so that it runs the command setting the resource state only if the state has changed
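
    One way to do this on Unix is to remember the last state in a file. The sketch below is an illustration, not the SafeKit checker template: set_resource_state is a placeholder for the command your checker already uses to set the resource, and the STATE_FILE location and the function names are hypothetical.

```shell
#!/bin/sh
# File that remembers the last state reported by this checker (hypothetical path).
STATE_FILE="${STATE_FILE:-/tmp/mychecker.state}"

# Placeholder: replace the body with the command your checker already
# uses to set the resource state.
set_resource_state() {
    echo "set custom resource to $1"
}

# report_state up|down: calls set_resource_state only when the state
# differs from the one recorded at the previous call.
report_state() {
    new="$1"
    old=$(cat "$STATE_FILE" 2>/dev/null)
    if [ "$new" != "$old" ]; then
        set_resource_state "$new" && printf '%s\n' "$new" > "$STATE_FILE"
    fi
}
```

    In the checker loop, call report_state up or report_state down after each test; repeated identical results then produce no new message in the module log.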


  48. Id : SK-0055
    OS / Release : All / 7.1.3
    How to : Force a not up-to-date server to automatically start as primary when the up-to-date server is not running

    Solution: When all the heartbeats from the up-to-date server are lost and the up-to-date server is not responding to ping requests, the not up-to-date server can fail over or start as primary.
    Warning: Use this solution only if you can accept losing some modifications of the replicated data.

    1. Edit the configuration file SAFE/modules/AM/conf/userconfig.xml to insert a new failover rule and custom checkers
      <check>
      <!-- arg is the interval in sec between 2 checks -->
      <custom ident="pingremote" when="pre" exec="ping_remote" arg="10"/>

      <!-- 1st arg is the interval in sec between 2 checks (>=30) -->
      <!-- 2nd arg is the accepted elapsed time in min since the last synchronisation time (>1) -->
      <custom ident="synced" when="pre" exec="syncedcheck" arg="30 10"/>
      </check>

      <failover>
      <![CDATA[
      force_uptodate: if (heartbeat.* == down && custom.pingremote == down && custom.synced == up && rfs.uptodate == down) then rfs.uptodate=up;
      ]]>
      </failover>


    2. Add the ping_remote checker into SAFE/modules/AM/bin/
      This checker checks that the remote server is responding. It sets the resource custom.pingremote to up if responding, to down if not responding.
      [Show Unix script]
      [Show Windows script]

    3. Add the syncedcheck checker into SAFE/modules/AM/bin/
      This checker checks the elapsed time since the replicated data was last synchronised on both servers. It sets the resource custom.synced to up if the data is up-to-date, or if it is not up-to-date but was synchronised no more than elapsedtime minutes ago. The value of elapsedtime is the 2nd value of the attribute arg in the custom checker configuration: <custom ident="synced" when="pre" exec="syncedcheck" arg="30 10">
      Ask Evidian support for the syncedcheck binary.


  49. Id : SK-0056
    OS / Release : All/7.1.3
    Problem : Incompatible configuration options <interface arpreroute="on" and <virtual_interface type="vmac_invisible"
    Mars Id : 57173

    Solution :


  50. Id : SK-0057
    OS / Release : Linux RH5 and RH6 / 7.1
    How to : Use of RedHat httpd server instead of the SafeKit httpd server

    Solution :
    • Install RedHat httpd server 2.2 package. If you intend to use the secured web console with HTTPS, you must have the mod_ssl and openssl packages installed in addition to the httpd package.
    • Stop safewebserver : /opt/safekit/safekit webserver stop
    • cd /opt/safekit/web
    • mv -f lib/libcrypto* ../private/bin
    • mv -f lib/libssl* ../private/bin
    • mv -f lib lib.safekit
    • mv -f modules modules.safekit
    • ln -s /usr/lib64/httpd/modules/ modules
    • Edit the /opt/safekit/web/bin/safeapachectl script : change the HTTPD variable to /usr/sbin/httpd
    • Edit the /opt/safekit/web/bin/envvars file : replace the path /opt/safekit/web/lib with /opt/safekit/private/bin everywhere it appears
    • Start safewebserver : /opt/safekit/safekit webserver start


  51. Id : SK-0058
    OS / Release : All/ >= 7.1.3.16
    Problem : In a farm module, how to start load-balancing once the application is started and stop load-balancing before stopping the application

    Solution : Enable/disable load balancing with special commands run in the user scripts start_both and stop_both. The script templates are given below.
    1. Edit the user script SAFE/modules/AM/bin/start_both
      [Show Unix script]
      [Show Windows script]

    2. Edit the user script SAFE/modules/AM/bin/stop_both
      [Show Unix script]
      [Show Windows script]


  52. Id : SK-0059
    OS / Release : Windows / 7.1
    How to : Use of externally built httpd server instead of the SafeKit built-in httpd server

    Solution :
    • Download the appropriate version of the httpd server 2.2 package x64 binaries from a source you trust, or build it yourself. For example, http://www.apachelounge.com provides x64 binaries for Windows. If you intend to use the secured web console with HTTPS, ensure that the mod_ssl and openssl packages are also delivered.
    • Download and install the associated Microsoft C Runtime redistributable package
    • Stop safewebserver : safekit webserver stop
    • Take a copy of the SAFE/web/bin and SAFE/web/modules directories
    • Copy the content of the "bin" directory from the httpd server package to the SAFE/web/bin directory
    • Copy the content of the "modules" directory from the httpd server package to the SAFE/web/modules directory
    • Start safewebserver : safekit webserver start


  53. Id : SK-0060
    OS / Release : Windows / 7.1.3
    Problem : Checkers start failure on module start after a crash of the server
    Mars Id : 57364

    Solution : Apply the following manual procedure as a workaround.

    1. Edit the script SAFE\private\bin\safekitbootstart.cmd
      Before the start of the service safeadmin, insert the line
      del "c:\safekit\var\mapper.xml"
    2. Change Windows settings for calling scripts on start-up/shutdown as described in the SafeKit User's Guide
      • set manual start for safeadmin service
      • start the MMC console with the mmc command line
      • File - Add/Remove Snap-in Add - "Group Policy Object Editor" – OK
      • under "Console Root"/"Local Computer Policy"/"Computer Configuration"/"Windows Settings"/"Scripts (Start-up/Shutdown)", double click on "Start-up". Click on Add then set for "Script Name:" c:\safekit\private\bin\safekitbootstart.cmd. This script launches the safeadmin service.
      • under "Console Root"/"Local Computer Policy"/"Computer Configuration"/"Windows Settings"/"Scripts (Start-up/Shutdown)", double click on "Shutdown". Click on Add then set for "Script Name:" c:\safekit\private\bin\safekitshutdown.cmd. This script shuts down all running modules.

  54. Id : SK-0061
    OS / Release : Windows 2008 and Windows 2008 R2/ All
    Problem : File replication errors that may occur when an application extends a file (most notably, in write_through mode)
    This problem is in part due to a misbehaviour of the Microsoft NTFS.sys filesystem driver described on the Microsoft support site at http://support.microsoft.com/kb/976538/en-us/
    Fix : When using file replication, it is mandatory to update the Windows OS to at least the level indicated at http://support.microsoft.com/kb/976538/en-us/. The update procedure is also described in that Microsoft knowledge base article.


  55. Id : SK-0062
    OS / Release : All / 7.2
    Problem : Web console: do not use literal IPv6 addresses (e.g. 3ffe:2a00:100:7031::1)
    In the SafeKit 7.2 web console, you have to fill in the addresses of the SafeKit servers to configure the web console inventory and the SafeKit clusters. The web console uses these addresses to connect to the servers, but a literal IPv6 address in a URL must be surrounded by square brackets. The web console does not yet manage both formats.
    Mars Id : 59106

    Solution : The work around is to use DNS names instead of literal IPv6 addresses.
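The restriction above can be checked before addresses are entered in the web console. The following is a minimal sketch, not part of SafeKit: the helper name is_ipv6_literal is hypothetical, and it only relies on the Python standard library to recognise a literal IPv6 address, with or without square brackets.

```python
import ipaddress


def is_ipv6_literal(address: str) -> bool:
    """Return True if `address` is a literal IPv6 address (bracketed or not)."""
    candidate = address.strip()
    # URLs wrap IPv6 literals in square brackets; strip them before parsing.
    if candidate.startswith("[") and candidate.endswith("]"):
        candidate = candidate[1:-1]
    try:
        parsed = ipaddress.ip_address(candidate)
    except ValueError:
        # Not an IP literal at all, for example a DNS name.
        return False
    return isinstance(parsed, ipaddress.IPv6Address)
```

Any address for which this helper returns True would have to be replaced by a DNS name in SafeKit 7.2.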


  56. Id : SK-0063
    OS / Release : Windows 2008 R2/ 7.2
    Problem : 3 nodes replication (3nodesrepli.safe) configuration fails
    Mars Id : 59141

    The 3nodesrepli.safe configuration relies on PowerShell scripts that require, for correct execution, a change of the execution policy and PowerShell version 4.0.
    Solution :


  57. Id : SK-0064
    OS / Release : Windows 2008 R2 / 7.2
    Problem : SafeKit drivers load fails when Windows 2008 R2 release does not include the support for SHA-2 signing and verification functionality
    Fix : You have to update your system for including the support for SHA-2. Refer to the Microsoft Security Advisory at https://technet.microsoft.com/en-us/library/security/2949927.aspx


  58. Id : SK-0065
    OS / Release : All / 7.2
    Problem : With IE11, a "connection error" can occur after some time when the web console is secured with HTTPS. The browser must then be stopped and restarted.
    Solution : "Internet Options"/"Advanced"; unselect "TLS 1.0" and "TLS 1.2", select only "TLS 1.1".


  59. Id : SK-0066
    OS / Release : Windows / 7.1 and 7.2
    Problem : How to configure the USN journal in Windows when namespacepolicy="3" in <rfs> tag
    Solution : In Windows, to enable zone reintegration after reboot when the module has been properly stopped, the rfs component uses the NTFS USN change journal to check that the saved information on zones is still valid after reboot. When the check succeeds, zone reintegration can be applied on the file; otherwise, full reintegration must be used. To enable the use of the USN change journal, set namespacepolicy="3" in the <rfs> tag.

    By default, NTFS activates the USN journal only on the system drive. If the replicated directories are located on a drive other than the system drive, you have to explicitly activate the journal.
    Run the following command, as an administrator, to check that the USN journal is enabled on your drive:
    fsutil usn queryjournal D: (replace D: with the desired drive).
    If the command returns "Error: The volume change journal is not active", run the following command, as an administrator, to create the USN journal:
    fsutil usn createjournal m=536870912 a=67108864 D: (replace D: with the desired drive) ; where m, for maximum size, specifies the maximum size, in bytes, that NTFS allocates for the change journal and a, for allocation delta, specifies the size, in bytes, of memory allocation that is added to the end and removed from the beginning of the change journal.
    See SK-0067 before starting the module after the USN journal creation.

    The default USN journal maximum size is 512 MB. If your volume contains 400,000 files or fewer, no additional configuration is required. For every 100,000 additional files on a volume containing replicated directories, increase the USN journal size by 128 MB. If files on the volume are changed or renamed frequently (regardless of whether they are part of the replica set), consider sizing the USN journal larger than these recommendations to prevent USN journal wraps, which can occur when large numbers of files change so quickly that the USN journal must discard the oldest changes to stay within the specified size limit.

    The table below gives the m and a values to pass to fsutil usn createjournal for various numbers of files.

    Number of files | m, maximum size in bytes | a, allocation delta in bytes | m in MB
    400 000         | 536 870 912              | 67 108 864                   | 512
    600 000         | 805 306 368              | 100 663 296                  | 768
    800 000         | 1 073 741 824            | 134 217 728                  | 1 024
    1 000 000       | 1 342 177 280            | 167 772 160                  | 1 280
    1 200 000       | 1 610 612 736            | 201 326 592                  | 1 536
    1 400 000       | 1 879 048 192            | 234 881 024                  | 1 792
    1 600 000       | 2 147 483 648            | 268 435 456                  | 2 048
    1 800 000       | 2 415 919 104            | 301 989 888                  | 2 304
    2 000 000       | 2 684 354 560            | 335 544 320                  | 2 560
    2 200 000       | 2 952 790 016            | 369 098 752                  | 2 816
    2 400 000       | 3 221 225 472            | 402 653 184                  | 3 072
    2 600 000       | 3 489 660 928            | 436 207 616                  | 3 328
    2 800 000       | 3 758 096 384            | 469 762 048                  | 3 584
    3 000 000       | 4 026 531 840            | 503 316 480                  | 3 840
    3 200 000       | 4 294 967 296            | 536 870 912                  | 4 096
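For file counts that fall between or beyond the rows of the table, the sizing rule can be computed directly. The sketch below is an illustration under stated assumptions: the function names are hypothetical, partial blocks of 100,000 files are rounded up (an assumption, since the text does not say), and the allocation delta is taken as one eighth of the maximum size, which is the ratio observed in every row of the table rather than a documented rule.

```python
from typing import Tuple

MB = 1024 * 1024


def usn_journal_size(file_count: int) -> Tuple[int, int]:
    """Return (m, a) in bytes for the fsutil usn createjournal command.

    m: 512 MB up to 400,000 files, plus 128 MB per additional 100,000 files.
    a: m / 8, the ratio observed in the table (assumption).
    """
    extra_files = max(0, file_count - 400_000)
    # Ceiling division: a partial block of 100,000 files counts as a full one.
    extra_blocks = -(-extra_files // 100_000)
    m = (512 + 128 * extra_blocks) * MB
    a = m // 8
    return m, a


def createjournal_command(file_count: int, drive: str) -> str:
    """Build the fsutil command line shown in the text for a given drive."""
    m, a = usn_journal_size(file_count)
    return f"fsutil usn createjournal m={m} a={a} {drive}"
```

For example, createjournal_command(1_000_000, "D:") builds the command with m=1342177280 and a=167772160, matching the 1 000 000 row of the table.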


  60. Id : SK-0067
    OS / Release : Windows / 7.1 and 7.2
    Problem : The start of the module hangs into the WAIT(magenta) state after creating the USN journal on the drive containing the replicated directories
    The start of the module hangs into the WAIT(magenta) state with the following messages into the log of the module:
    | 2017-02-23 09:05:58:454000 | nfsboxv3 | D | Directory D:\: Filesystem=NTFS (flags 3e700ff), Volume=Data
    | 2017-02-23 09:06:00:302000 | rfsplug | D | Retrying nfsbox port lookup
    | 2017-02-23 09:06:00:302000 | rfsplug | D | Waiting for nfsbox ready
    | 2017-02-23 09:06:00:303000 | log | D | Last message repeated 2 times
    | 2017-02-23 09:06:00:333000 | nfsadmin | D | Retrying nfsbox port lookup
    | 2017-02-23 09:06:00:333000 | nfsadmin | D | Waiting for nfsbox initialization

    This occurs when the USN journal has just been created on the drive containing the replicated directories and the drive has not been accessed yet.
    Solution : After creating the USN journal and before starting the module, make any modification on the drive so as to fill the USN journal. For instance, you can create then delete a file.
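The create-then-delete step can be scripted. This is a minimal sketch with a hypothetical helper name; it assumes the directory passed in is located on the drive where the USN journal was just created and that the caller has write access to it.

```python
import os
import tempfile


def touch_drive(directory: str) -> None:
    """Create then delete a temporary file in `directory`.

    Any modification on the drive is expected to produce USN journal records.
    """
    fd, path = tempfile.mkstemp(prefix="usn_fill_", dir=directory)
    os.close(fd)
    os.remove(path)
```

On a real system the argument would be a directory such as D:\ (replace D: with the drive holding the replicated directories).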


  61. Id : SK-0068
    OS / Release : Windows / 7.2
    How to : Use of externally built httpd server instead of the SafeKit built-in httpd server

    Solution :
    • Download the appropriate version of the httpd server 2.4 package x64 binaries from a source you trust, or build it yourself. For example, http://www.apachelounge.com provides x64 binaries for Windows. If you intend to use the secured web console with HTTPS, ensure that the mod_ssl and openssl packages are also delivered.
    • Download and install the associated Microsoft C Runtime redistributable package
    • Stop safewebserver : safekit webserver stop
    • Take a copy of the SAFE/web/bin and SAFE/web/modules directories
    • Copy the content of the "bin" directory from the httpd server package to the SAFE/web/bin directory
    • Copy the content of the "modules" directory from the httpd server package to the SAFE/web/modules directory
    • Start safewebserver : safekit webserver start


  62. Id : SK-0069
    OS / Release : Linux / > 7.2.0.29
    How to : Use of Linux httpd server instead of the SafeKit httpd server

    Solution :
    • Install httpd server 2.4 package. If you intend to use the secured web console with HTTPS, you must have the mod_ssl and openssl packages installed in addition to the httpd package.
    • Stop safewebserver : /opt/safekit/safekit webserver stop
    • Edit the /opt/safekit/web/bin/safeapachectl script according to the inline comments
    • Start safewebserver : /opt/safekit/safekit webserver start


  63. Id : SK-0070
    OS / Release : Linux / 7.3
    How to : Use mySQL with Safekit when SELinux is "Enforcing"

    Problem : The start of the module mySQL.safe fails with mysql errors:
      mkdir: cannot create directory /var/lib/mysql: File exists
      mariadb.service: main process exited, code=exited, status=1/FAILURE
      mariadb.service: control process exited, code=exited status=1
      Failed to start MariaDB database server.

      And/Or :
      [Note] /usr/libexec/mysqld (mysqld 5.5.44-MariaDB) starting as process 29039 ...
      [Warning] Can't create test file /var/lib/mysql/alambix2.lower-test
      /usr/libexec/mysqld: Can't change dir to '/var/lib/mysql/' (Errcode: 13)
      170426 8:55:20 [ERROR] Aborting

    Solution : Generate SELinux policy allow rules from the logs of denied operations (tested with SafeKit 7.3 on RedHat 7.2):
    • First deploy mySQL.safe module
    • Set SELinux in Permissive mode : setenforce 0
    • Execute: /sbin/service auditd rotate, to rotate the SELinux log file "/var/log/audit/audit.log"
    • Execute : semodule -DB, to remove the dontaudit rules from the policy (the log becomes more verbose)
    • Start and Stop mySQL in command line : systemctl start mariadb and systemctl stop mariadb
    • Start and stop mySQL.safe module (command line or safekit web console)
    • Now, use "audit2allow" to build a policy module "NewMySQL.pp" from the denials and associated system calls logged to /var/log/audit/audit.log:
    • grep mysqld /var/log/audit/audit.log | audit2allow -M NewMySQL , 2 files are created : NewMySQL.pp and NewMySQL.te
    • Set audit in initial mode: semodule -B
    • Load the new policy module: semodule -i NewMySQL.pp
    • Set SELinux in Enforcing mode: setenforce 1
    • Start mySQL.safe module : it works !

    File NewMySQL.te sample :

      module NewMySQL 1.0;
      require {
      type var_lib_t;
      type mysqld_safe_t;
      type nfs_t;
      type mysqld_t;
      class process { siginh noatsecure rlimitinh };
      class sock_file { create unlink };
      class lnk_file { read getattr };
      class file { write getattr read lock create unlink open };
      class dir { write remove_name getattr add_name };
      }
      #============= mysqld_safe_t ==============
      #!!!! This avc has a dontaudit rule in the current policy
      allow mysqld_safe_t mysqld_t:process { siginh rlimitinh noatsecure };
      #!!!! This avc has a dontaudit rule in the current policy
      allow mysqld_safe_t nfs_t:dir getattr;
      allow mysqld_safe_t var_lib_t:lnk_file read;
      #============= mysqld_t ==============
      allow mysqld_t nfs_t:dir { write remove_name add_name };
      allow mysqld_t nfs_t:file { write getattr read lock create unlink open };
      allow mysqld_t nfs_t:sock_file { create unlink };
      allow mysqld_t var_lib_t:lnk_file { read getattr };

    Remarks :

    If the ".te" file is manually modified, the ".pp" file must be built again:
    checkmodule -M -m -o NewMySQL.mod NewMySQL.te
    semodule_package -o NewMySQL.pp -m NewMySQL.mod

    Then reload the policy module with semodule -i NewMySQL.pp.


  64. Id : SK-0071
    OS / Release : Linux / 7.3

    Problem : The "create" or "drop" command on a MySQL replicated database fails when SELinux is "Enforcing"
      drop database MaBase;
      ERROR 1010 (HY000): Error dropping database (can't rmdir './MaBase', errno: 13)

      or create database MaBase;
      ERROR ...(HY000): Error creating database (can't mkdir './MaBase', errno: 13)

    Solution :
    • Edit your policy rules for MySQL (".te" file see SK-0070) and add rules for "create" and "rmdir" directory :
    • Replace the line : class dir { write remove_name getattr add_name } with : class dir { create rmdir write remove_name getattr add_name }
    • Replace the line : allow mysqld_t nfs_t:dir { write remove_name add_name }
      with : allow mysqld_t nfs_t:dir { create rmdir write remove_name add_name }
    • Then compile and load the policy module :
      checkmodule -M -m -o NewMySQL.mod NewMySQL.te
      semodule_package -o NewMySQL.pp -m NewMySQL.mod
      semodule -i NewMySQL.pp


  65. Id : SK-0072
    OS / Release : Linux / 7.3
    How to : Set SELinux to "Permissive" mode OR set only enforcement mode for MySQL to "Permissive"

    Solution :
    To set SELinux in "Permissive" mode execute : setenforce 0 , to see the current mode : getenforce
    To set enforcement mode to "Permissive" only for MySQL execute : semanage permissive -a mysqld_t
  66. Id : SK-0073
    OS / Release : Windows / 7.2 and < 7.3.0.14
    Mars Id : 62147

    Fix : Fixed in SafeKit >= 7.3.0.14
    Problem : 3nodesrepli / SafeKit upgrade : After upgrade procedure, the module does not start and DR node indicator does not appear.

    Workaround :
    • Start the webconsole
    • In the web console, go to "Configuration" tab
    • Click on the module toplevel menu (the "wheel"), then on Set DR node
    • Execute the steps of the "Set DR node" wizard procedure
    • Start the 3nodesrepli module as usual

  67. Id : SK-0074
    OS / Release : All / All
    Mars Id : 63124

    Problem : With IE, the file may be truncated when loaded into the SafeKit Web console editor
    Text files created on DOS/Windows machines have different line endings than files created on Unix/Linux. DOS uses carriage return and line feed ("\r\n") as a line ending, while Unix uses just line feed ("\n").
    In IE, the editor of the SafeKit web console may truncate files using DOS line ending format.

    Solution : A workaround consists in converting end of lines in the file from Windows format to Unix format. For this either use :
    • the dos2unix command
    • the vim editor, apply the command :set ff=unix ; then save the file
    • the Notepad++ editor, in the "Edit" menu, select "EOL Conversion" -> "UNIX/OSX Format" ; then save the file
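When none of these tools is at hand, the same conversion can be sketched in a few lines of Python. The function names are hypothetical; the file is read and written in binary mode so that no other byte is altered.

```python
def dos_to_unix(data: bytes) -> bytes:
    """Replace DOS line endings (CR LF) with Unix line endings (LF)."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> None:
    """Rewrite the file at `path` in place with Unix line endings."""
    with open(path, "rb") as source:
        data = source.read()
    with open(path, "wb") as target:
        target.write(dos_to_unix(data))
```

Files that already use Unix line endings are left unchanged by this conversion.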

  68. Id : SK-0075
    OS / Release : Linux / > 7.3.0.10
    How to : Configure safewebserver on SLES12

    • Install httpd server 2.4 package.
    • Edit the /opt/safekit/web/bin/safeapachectl script according to the inline comments
    • Copy /opt/safekit/web/conf/httpd.conf.sles12 to /opt/safekit/web/conf/httpd.conf
    • Start safewebserver : /opt/safekit/safekit webserver start


  69. Id : SK-0076
    OS / Release : Linux / > 7.1
    Problem : Could not configure cluster : got Error:incoherent local name ...


    Check the sysctl option net.ipv4.ip_nonlocal_bind; it must be 0.
    If not, set it with the command sysctl net.ipv4.ip_nonlocal_bind=0 and retry the cluster configuration.
    Check /etc/sysctl.conf and /etc/sysctl.d to be sure that this option is not set at boot time.
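The boot-time check can be partly automated by scanning the content of the sysctl configuration files. The sketch below is an assumption-laden illustration: the function name is hypothetical, it only recognises the dotted key form with an "=" separator, and it ignores lines starting with "#" or ";". Reading the files themselves is left to the caller.

```python
from typing import List


def nonlocal_bind_settings(config_text: str) -> List[str]:
    """Return every value assigned to net.ipv4.ip_nonlocal_bind in the text."""
    values = []
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith(("#", ";")):
            continue
        key, separator, value = line.partition("=")
        if separator and key.strip() == "net.ipv4.ip_nonlocal_bind":
            values.append(value.strip())
    return values
```

Any returned value other than "0" points to a file that sets the option at boot time.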


  70. Id : SK-0077
    OS / Release : Linux / 7.3
    Problem : Messages : Error: INVALID_SERVICE: 'safeagent' not among existing services at safekitinstall


    Remove obsolete safeagent firewalld service : firewall-cmd --remove-service=safeagent

  71. Id : SK-0078
    OS / Release : All / < 7.3.0.24
    Problem : Mirror module stays into WAIT-magenta state on both nodes or failover rules do not apply


    Check the configuration file of the module userconfig.xml. Having 2 CDATA sections under <failover> leads to this behavior. For instance:
    <failover>
    <![CDATA[
    is_alone: if(custom.checkaround == down) then restart();
    ]]>
    <![CDATA[
    is_isolated: if(custom.checkisolated == down) then stopstart();
    ]]>
    </failover>

    Solution : This configuration must be replaced by:
    <failover>
    <![CDATA[
    is_alone: if(custom.checkaround == down) then restart();
    is_isolated: if(custom.checkisolated == down) then stopstart();
    ]]>
    </failover>
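A quick way to detect the faulty layout is to count the CDATA sections inside the <failover> element. This is a minimal sketch with a hypothetical function name; it works on the raw text with a regular expression because XML parsers such as xml.etree.ElementTree merge adjacent CDATA sections into a single text node, which hides the problem. It assumes a single <failover> element in the file.

```python
import re


def count_failover_cdata(xml_text: str) -> int:
    """Count the CDATA sections found inside the first <failover> element."""
    match = re.search(r"<failover\b[^>]*>(.*?)</failover>", xml_text, re.DOTALL)
    if match is None:
        return 0
    return match.group(1).count("<![CDATA[")
```

A result greater than 1 indicates that the rules should be merged into one CDATA section as shown above.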


  72. Id : SK-0079
    OS / Release : Windows / >= 7.4.0.16
    Problem : Module not starting correctly if cluster configuration contains DNS names
    Mars Id : 69307


    On Windows, if:
    • Addresses of nodes are specified as DNS names (FQDN) in cluster.xml
    • On one or several nodes, the FQDN specified in cluster.xml corresponds to a local name (e.g. the first part of the FQDN is the hostname and the rest of the FQDN is the DNS suffix specified in the Windows network configuration)
    • The node(s) is/are multihomed

    Then it is possible that the Windows resolver returns an IP address that is NOT the first address returned by the DNS server for the specified FQDN, and the module(s) may not start correctly (heartbeat lost, reintegration timeout …)

    Solution :
    If you are using DNS names in cluster.xml, please check on all nodes that the address displayed by the “ping” command for the local DNS address is the same as the address displayed by the “nslookup” command. If it is not the case, you need to alter the node’s Windows network configuration interface and route metric so that the above condition is fulfilled.

    Since SafeKit 7.4.0.54, DNS names are resolved during the cluster configuration and IP addresses are stored into the file c:/safekit/var/cluster/cluster_ip.xml. You can check that the DNS name resolution is correct by verifying the content of this file.


  73. Id : SK-0080
    OS / Release : All / < 7.4.0.16
    Problem : Module communication failures if cluster configuration contains DNS names


    Some bugs in the DNS name resolution lead to module internal communication failures if the cluster configuration contains DNS names.

    Workaround :
    A workaround consists in setting only IP addresses. But if you require DNS names for accessing the SafeKit web console, the workaround consists in setting 2 lan sections in the cluster configuration: one lan definition with DNS names used only by the SafeKit web console, and one lan definition with IP addresses used for the framework communications. For instance, the cluster configuration may look like the following one:
    <cluster>
    <lans>
    <lan name="default" connect="on" console="on" framework="off">
    <node name="node1" addr="node1.safe"/>
    <node name="node2" addr="node2.safe"/>
    </lan>
    <lan name="private" connect="off" console="off" framework="on">
    <node name="node1" addr="172.23.188.101"/>
    <node name="node2" addr="172.23.188.102"/>
    </lan>
    </lans>
    </cluster>
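A cluster configuration laid out this way can be checked with a short script. The sketch below is hypothetical and not a SafeKit tool: it assumes the element and attribute names shown in the example (lan, node, framework, addr), and it only inspects lan sections that carry framework="on" explicitly, since the default value of the attribute is not stated here.

```python
import ipaddress
import xml.etree.ElementTree as ET
from typing import List, Tuple


def framework_nodes_using_names(cluster_xml: str) -> List[Tuple[str, str, str]]:
    """List (lan, node, addr) entries of framework lans whose addr is not an IP."""
    root = ET.fromstring(cluster_xml)
    offenders = []
    for lan in root.iter("lan"):
        if lan.get("framework") != "on":
            continue
        for node in lan.iter("node"):
            addr = node.get("addr", "")
            try:
                ipaddress.ip_address(addr)
            except ValueError:
                # addr is a name (or is empty), not a numerical IP address.
                offenders.append((lan.get("name", ""), node.get("name", ""), addr))
    return offenders
```

An empty result means that every lan used for framework communications is defined with numerical IP addresses only.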


  74. Id : SK-0081
    OS / Release : Windows 10 Pro / 7.4
    Problem : Hyper-V module (hyperv.safe) start fails with plugwait error


    hyperv.safe relies on PowerShell scripts that require, for a correct execution, the change of the execution policy.

    Solution :
    Change the execution policy as follows:

    • start a Windows PowerShell session
    • run Set-ExecutionPolicy RemoteSigned
    • reply yes when prompted


  75. Id : SK-0082
    OS / Release : Windows / 7.4
    Problem : Hyper-V module (hyperv.safe) failover fails with VM import failure


    The failover on node2 fails and the user script output file SAFEVAR/modules/AM/userlog.ulog (where SAFEVAR=c:\safekit and AM is your module name) contains the following message:
    Import-VM: Unable to import the virtual machine due to configuration errors. Use Compare-VM to repair the virtual machine.

    The VM import during the failover is equivalent to the virtual machine (VM) migration that consists in moving the VM from physical server node1 to node2. The import may fail when the migration requirements are not met.

    Solution :
    Check the common requirements for Hyper-V VM migration for your Windows release. These requirements apply to the physical server settings (processor, Active Directory domain, ...) and to the VM settings (virtual hard disks, virtual networks, ...). To check for incompatibilities, you can try to manually import the VM on node2 while SafeKit is stopped; the import logs incompatibility error messages. One common error occurs because the host hardware is not compatible: a virtual machine has one or more snapshots, and the hosts have different processor versions. To fix this problem, shut down the virtual machine on node1 and turn on the processor compatibility setting as follows:

    • From Hyper-V Manager, in the Virtual Machines pane, right-click the virtual machine and click Settings.
    • In the navigation pane, expand Processors and click Compatibility.
    • Check Migrate to a computer with a different processor version.
    • Click OK.

    Be aware that it is in any case recommended to use identical physical servers with hyperv.safe, since other incompatibility issues may arise.


  76. Id : SK-0083
    OS / Release : Windows / 7.4
    Problem : Some SafeKit components and modules fail on Windows


    SafeKit relies on PowerShell scripts that require, for correct execution, a change of the execution policy. When the policy is not changed, for instance:
    • When the SafeKit web console is configured for HTTPS, the cluster configuration fails with error on server certificate
    • Hyper-V module (hyperv.safe) start fails with plugwait error

    Solution :
    Change the execution policy as follows:

    • start a Windows PowerShell session
    • run Set-ExecutionPolicy RemoteSigned
    • reply yes when prompted


  77. Id : SK-0084
    OS / Release : All />= 7.2
    Mars Id : 71535

    Problem : In mirror modules, data reintegration fails on expiration of cryptographic keys

    SafeKit relies on a certificate for securing module internal communications. With SafeKit <= 7.4.0.31, the validity period for this certificate is 1 year.

    When the certificate expires, the module goes to ALONE/STOP with the application still running on the ALONE. The secondary fails to reintegrate with the following message:

    reintegre | D | XXX clnttcp_create: socket=7 TLS handshake failed

    To check that your module is using encrypted communication, verify that the file named modulekey.p12 is present in SAFE/modules/AM/conf/ (where AM is the module name). The certificate expiration date is, most of the time, 1 year after the creation date of this file. For a more precise date, please contact support.
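The rule of thumb above (expiration about 1 year after the file was created) can be turned into a rough estimate. This sketch is only an approximation under stated assumptions: the function names are hypothetical, one year is counted as 365 days, and the file modification time is used as a stand-in for the creation date, which may differ if the file was copied or restored.

```python
import datetime
import os


def estimated_expiry(created: datetime.datetime, validity_days: int = 365) -> datetime.datetime:
    """Estimate the certificate expiration date from the key file date."""
    return created + datetime.timedelta(days=validity_days)


def estimated_expiry_for_file(path: str) -> datetime.datetime:
    """Estimate the expiration from the modification time of modulekey.p12."""
    modified = datetime.datetime.fromtimestamp(os.path.getmtime(path))
    return estimated_expiry(modified)
```

The path to pass is the modulekey.p12 file of the module; the result remains an estimate, and support can provide the precise date.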

    Solution :
    The solution consists in generating a new certificate (but this new one will still expire in 1 year). It can be done either with:

    • the web console
      • Stop the module on both nodes
      • Launch the Configuration wizard of the module
      • Go to "Edit the Configuration" tab
      • Check the "Generate Keys" checkbox
      • Click on the "Validate" button
      • Then "Apply the configuration" on both nodes

    • the command lines (where AM is the module name)
      • Stop the module on both nodes
      • Login as administrator/root on one node and open a command shell window
      • Generate keys with the command safekit module genkey -m AM
      • Then apply the configuration on both nodes with the command safekit -H "*" -E AM

    In SafeKit 7.4, you can generate new keys while the module is not fully stopped, with the command lines (where AM is the module name)
    • Stop the module only on the secondary node
    • Login as administrator/root on one node and open a command shell window
    • Generate keys with the command safekit module genkey -m AM
    • Then apply the configuration on both nodes with the command safekit -H "*" -E AM

    If you prefer to run with a certificate that has a longer validity period, upgrade to a SafeKit release > 7.4.0.31, which sets the validity period to 20 years.


  78. Id : SK-0085
    OS / Release : All / All
    Problem : SafeKit may not run properly when relying on host name resolution service that is not itself highly available

    If node addresses are specified using names instead of numerical IP addresses in the cluster.xml file, then the name resolution service must be highly available. For instance, when using DNS service (for FQDN, fully qualified domain name, such as node1.mysubdomain.mydomain.com), the local resolver should have a cache that can cope with DNS server(s) short failures. A long lasting name resolver failure will prevent SafeKit cluster nodes from communicating with each other, potentially leading to splitbrain situations.

    Solution :
    To avoid that, you must implement a robust DNS resolution policy on the nodes participating in the cluster, such as :

    • Configuring more than 1 DNS server on the nodes.
    • Increasing resolver cache retention time (for Windows, see https://docs.microsoft.com/en-us/windows-server/networking/dns/troubleshoot/disable-dns-client-side-caching )
    • Implementing a local DNS on the nodes (master or cache with forwarding to the zone’s master)
    • Adding cluster FQDNs and corresponding IP addresses in the hosts file of the nodes

    As last resort, use numerical IP addresses in cluster.xml.
  79. Id : SK-0086
    OS / Release : 7.4
    Problem : Before 7.4.0.48, Safewebserver service fails to start in https mode after installing certificates from external PKI


    The https configuration needs a proxy.crtkey file, located in SAFE/web/conf to start. This file is built by the integrated PKI server (safecaserv service).

    Solution :
    Execute SAFE/web/bin/openssl rsa -in SAFE/web/conf/admin.key -out SAFE/web/conf/rsa-admin.key to convert admin.key to RSA format.
    In the SAFE/web/conf directory, concatenate the content of admin.crt and rsa-admin.key files into the proxy.crtkey file using a text editor or CLI, and restart the safewebserver service. Apply this procedure on each SafeKit node.
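The concatenation step can be scripted instead of done by hand. The sketch below uses a hypothetical function name and assumes that the certificate comes first and the RSA key second, which is the order in which the two files are listed above; it also adds a newline between the two parts when the certificate file does not end with one.

```python
from pathlib import Path


def build_proxy_crtkey(conf_dir: str) -> Path:
    """Concatenate admin.crt and rsa-admin.key into proxy.crtkey in conf_dir."""
    conf = Path(conf_dir)
    certificate = (conf / "admin.crt").read_text()
    key = (conf / "rsa-admin.key").read_text()
    # Keep the two PEM blocks on separate lines.
    if not certificate.endswith("\n"):
        certificate += "\n"
    target = conf / "proxy.crtkey"
    target.write_text(certificate + key)
    return target
```

The argument is the SAFE/web/conf directory of the node; the safewebserver service still has to be restarted afterwards.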

    Since 7.4.0.48, this procedure is only necessary to use the webconsole with proxy=true.


  80. Id : SK-0087
    OS / Release : RedHat 8/ 7.4
    Problem : User scripts executed within the SafeKit environment return, into the application log, an error with openssl version


    For instance, you get errors like the following into the application log:

    PAM unable to dlopen(/usr/lib64/security/pam_unix.so): /lib64/libk5crypto.so.3: undefined symbol: EVP_KDF_ctrl, version OPENSSL_1_1_1b

    or

    symbol lookup error: /lib64/librpmio.so.8: undefined symbol: EVP_md2, version OPENSSL_1_1_0

    Workaround :
    This problem is probably due to a bad linking with the openssl library delivered with SafeKit, that is, in some cases, not compatible with the one delivered with RH 8 and used by the application or commands.

    User scripts are executed with LD_LIBRARY_PATH environment variable set to SafeKit libraries.

    The workaround is to execute commands after unsetting LD_LIBRARY_PATH. Below is an example that starts Oracle in start_prim:

    replace

    /bin/su - oracle19 -c "/usr/local/bin/startDb"

    by

    (unset LD_LIBRARY_PATH ; /bin/su - oracle19 -c "/usr/local/bin/startDb" )

    Apply the same changes for all user scripts (start_xxx, stop_xxx, ...)
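When a user script delegates work to a Python helper rather than to a shell subshell, the same effect can be obtained by launching the command with a copy of the environment that omits LD_LIBRARY_PATH. This is a minimal sketch with a hypothetical function name, not something delivered with SafeKit.

```python
import os
import subprocess
from typing import List


def run_without_ld_library_path(command: List[str]) -> subprocess.CompletedProcess:
    """Run `command` with LD_LIBRARY_PATH removed from its environment."""
    # Copy the current environment, leaving out the SafeKit library path.
    env = {name: value for name, value in os.environ.items() if name != "LD_LIBRARY_PATH"}
    return subprocess.run(command, env=env, capture_output=True, text=True)
```

For the Oracle example above, the command list would be ["/bin/su", "-", "oracle19", "-c", "/usr/local/bin/startDb"].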

  81. Id : SK-0088
    OS / Release : Windows / 7.4, 7.5
    Problem : Hyper-V module (hyperv.safe) failover prerequisite

    During the Hyper-V module failover, the virtual machine (VM) is imported on the new primary node before being started. It is equivalent to the VM migration that consists in moving the VM from physical server node1 to node2. The import may fail when the migration requirements are not met. On failure, the user script output file SAFEVAR/modules/AM/userlog.ulog (where SAFEVAR=c:\safekit and AM is your module name), contains the following message:
    Import-VM : Unable to import virtual machine due to configuration errors. Please use Compare-VM to repair the virtual machine.

    Solution :
    Before testing the Hyper-V module failover, check the common requirements for Hyper-V VM migration for your Windows release. These requirements apply to the physical server settings (processor, Active Directory domain, ...) and to the VM settings (virtual hard disks, virtual networks, ...).
    To check for incompatibilities, we recommend the following procedure:

    • Configure the hyperv.safe module and start the module on both nodes
    • When PRIM (node1)/SECOND (node2) green, shut down the VM on node1
    • Stop the module on node2
    • On node2, start a PowerShell as administrator and run
      compare-vm -path "D:\Repli-Hyper-V\VM1\Virtual Machines\8CB619CE-CFB4-45BD-908B-F123A2E0AA24.vmcx" -Register
      Change the path to the location of your VM configuration file (extension may be xml instead of vmcx)
      For details, see Compare-VM Hyper-V

      This command lists incompatibilities, if any. To get details on incompatibilities, run
      $report = compare-vm -path "D:\Repli-Hyper-V\VM2\Virtual Machines\8CB619CE-CFB4-45BD-908B-F123A2E0AA24.XML" -Register
      $report.Incompatibilities | FL

    When incompatibilities are found (e.g., hardware incompatibilities, different name for virtual switch), fix them before running the failover of the module.
    One common error is due to host hardware incompatibility. This occurs when a virtual machine has one or more snapshots, and hosts have different processor versions. To fix this problem, shut down the virtual machine on node1 and turn on the processor compatibility setting as follows:
    • From Hyper-V Manager, in the Virtual Machines pane, right-click the virtual machine and click Settings.
    • In the navigation pane, expand Processors and click Compatibility.
    • Check Migrate to a computer with a different processor version.
    • Click OK.

    Note that using identical physical servers is recommended in any case with hyperv.safe, since other incompatibility issues may arise.


  82. Id : SK-0089
    OS / Release : All / 7.5.0.16
    Problem : Default failover rule for tcp checkers set to wait instead of restart
    Solution : Do not configure tcp checker or upgrade to SafeKit > 7.5.0.16
    Mars Id : 74378


  83. Id : SK-0090
    OS / Release : All / 7.5.0.16
    Change : Failover machine may generate a wakeup before checkers, with wait rules, have time to set the associated resource state to up or down.
    Mars Id : 74340


  84. Id : SK-0091
    OS / Release : Windows / 7.4, 7.5
    Problem : Timeout during reintegration of big files (>50Gb) such as vhd files in Hyper-V module
    During file synchronisation, disk space may need to be allocated for new or extended files. In Windows, when the file is large or zero filled, a timeout may occur during the synchronisation if the primary or the reintegration process writes at the end of the file. This leads to synchronisation failures.
    This problem may occur with the Hyper-V module (hyperv.safe) where VM disks are implemented by big vhd files.

    Solution :
    Edit the module XML configuration file SAFE/modules/AM/conf/userconfig.xml (replace AM by the name of the module) and add the option allocthreshold to the <rfs> section as follows:
    <rfs allocthreshold="50"

    When allocthreshold > 0, fast allocation of disk space is enabled for files to be synchronized on the secondary node. The allocation is applied only:

    • for new files (files that do not exist on the secondary when reintegration starts)
    • for a full synchronization (for example, during the first reintegration or when the secondary is started with safekit second fullsync)
    • when the file size on the primary is >= allocthreshold (size in Gb)
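
    The configuration change described in the solution can also be scripted. The following is a minimal Python sketch, not a SafeKit tool: it sets allocthreshold on the first <rfs> element it finds. The sample XML is a hypothetical, reduced stand-in for a real userconfig.xml, and the function name set_allocthreshold is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, reduced stand-in for a module's userconfig.xml.
# A real file contains more elements and attributes than shown here.
SAMPLE_CONFIG = """<safe>
  <service mode="mirror">
    <rfs>
      <replicated dir="/some/replicated/dir"/>
    </rfs>
  </service>
</safe>"""


def set_allocthreshold(xml_text: str, threshold_gb: int) -> str:
    """Return xml_text with allocthreshold set on the first <rfs> element."""
    root = ET.fromstring(xml_text)
    rfs = next(root.iter("rfs"), None)
    if rfs is None:
        raise ValueError("no <rfs> section found in the configuration")
    rfs.set("allocthreshold", str(threshold_gb))
    return ET.tostring(root, encoding="unicode")


updated = set_allocthreshold(SAMPLE_CONFIG, 50)
print(updated)
```

    Note that ElementTree drops XML comments when it parses a file, so editing the file by hand remains preferable when the configuration is commented.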

  85. Id : SK-0092
    OS / Release : Linux / >= 7.4.0.50 & < 7.5
    Problem : SafeKit web server does not start when using LDAP/AD basic authentication on some Linux distributions (RedHat/CentOS 8)


    Due to openssl linking problems, the SafeKit web server is unable to start on some Linux distributions (in particular RedHat/CentOS 8) when the mod_ldap Apache module is used.

    Solution :
    The solution consists in using the Apache HTTP server provided by the Linux distribution.
    On SafeKit version > 7.4.0.50 :
    A new option has been added to safekitinstall: -extsafewebserver for switching to external web server during the SafeKit install.
    A new script has been added: SAFE/web/bin/setsafewebserver for switching between internal and external web server:

    • setsafewebserver internal: switch to the SafeKit built-in Apache HTTP server
    • setsafewebserver external: switch to the Linux distribution Apache HTTP server
    • -n : do not start the web server after switching

    In case of using an external Apache HTTP server, ensure that:
    • the httpd package is installed (at least release 2.4.37) and that httpd binary is present under /usr/sbin
    • the mod_ssl package is installed
    • the mod_ldap package is installed if you need LDAP/AD basic authentication
    • the mod_session package is installed (this package is needed for SafeKit 7.5)

    On RedHat/CentOS 8, the command yum install httpd mod_ssl mod_ldap fulfills these conditions.

    Note : apr, apr-util, apr-util-ldap and apr-util-openssl packages are also to be installed if they have not been installed as dependencies.
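
    The minimum httpd release (2.4.37) must be compared numerically, field by field, because a plain string comparison would rank 2.4.6 above 2.4.37. The following Python sketch illustrates such a check. It assumes that the version text has the form reported by httpd -v (for example "Server version: Apache/2.4.37 (centos)"); the function names are hypothetical.

```python
import re

MINIMUM_HTTPD = (2, 4, 37)  # minimum release stated in the requirements above


def parse_httpd_version(text: str) -> tuple:
    """Extract (major, minor, patch) from text such as 'Apache/2.4.37'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(text: str, minimum: tuple = MINIMUM_HTTPD) -> bool:
    """Compare versions as integer tuples, not as strings."""
    return parse_httpd_version(text) >= minimum


print(meets_minimum("Server version: Apache/2.4.37 (centos)"))  # True
print(meets_minimum("Server version: Apache/2.4.6 (CentOS)"))   # False
```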


  86. Id : SK-0093
    OS / Release : Linux / All
    Problem : SafeKit web server does not start when using port 80

    Port 80 is a reserved port that can be bound only by root processes or by processes that have the required capability.

    Solution :
    As root , run the command : setcap 'cap_net_bind_service=+ep' /opt/safekit/web/bin/httpd


  87. Id : SK-0094
    OS / Release : Windows / 7.4, 7.5
    Problem : SafeKit Replication of anti-ransomware folders
    To configure protected folders, use Windows Security; select Virus & threat protection and Manage ransomware protection.
    Set Controlled folder access to on and select Protected folders to add folders.

    Solution :
    To use SafeKit to replicate such directories, you have to allow SafeKit apps to access the protected folders.
    Select Allow an app through Controlled folder access, Add an allowed app and Browse all apps.
    Then add the following apps :

    • c:\safekit\private\bin\nfsbox.exe
    • c:\safekit\private\bin\reintegre.exe
    • c:\safekit\private\bin\sync.exe
    • c:\safekit\private\plugin\heart\heartplug.exe

    Replace c:\safekit by the SafeKit root install path if you changed the default one.
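
    When SafeKit is installed under a non-default root, the four paths above change accordingly. The following Python sketch builds the list for a given root. The relative paths come from the list above; the function name and the custom root used in the example are hypothetical.

```python
from pathlib import PureWindowsPath

# Relative paths taken from the list of apps above.
SAFEKIT_APPS = (
    r"private\bin\nfsbox.exe",
    r"private\bin\reintegre.exe",
    r"private\bin\sync.exe",
    r"private\plugin\heart\heartplug.exe",
)


def allowed_apps(safekit_root: str = r"c:\safekit") -> list:
    """Return the full paths of the apps to allow for a given install root."""
    root = PureWindowsPath(safekit_root)
    return [str(root / relative) for relative in SAFEKIT_APPS]


for path in allowed_apps(r"d:\tools\safekit"):  # hypothetical custom root
    print(path)
```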


  88. Id : SK-0095
    OS / Release : Linux / 7.5.2
    Problem : one_side VIP and src routes limitations
    On the PRIM server where a one_side VIP is configured, the route src is set to the VIP for:

    • the VIP subnet
    • via routes using the VIP interface that do not have an explicit src
    • via routes to the localip of the vip subnet

    So, if the VIP interface has more than one subnet, and if there are routes for subnets to which the VIP does not belong, then these subnets must have explicit routes. Otherwise their src will be set to the VIP, which is not expected.


  89. Id : SK-0096
    OS / Release : Linux / SafeKit >= 7.5.2.11
    Problem : Zone reintegration is not operational in Linux
    JIRA Id : ES-659
    This is a regression. It is not critical, but more data than necessary is recopied during reintegration because the zone-based reintegration optimization is disabled.

    Fix : Fixed in SafeKit 8.2.2.7


  90. Id : SK-0097
    OS / Release : All / SafeKit 8.2.0 to 8.2.2.2
    Problem : Nodes sometimes show "Connection error" even when only one is down
    JIRA Id : ES-650
    When the console loads while connected to node2 and node1 is down (the alphabetical order of the node names matters), the console displays "connection error" for all nodes, although only node1 should be displayed in this state. The issue does not occur when the console is already loaded and node1 then goes down.
    Fix : Fixed in SafeKit 8.2.2.3


  91. Id : SK-0098
    OS / Release : All / SafeKit >= 8.2.3
    Problem : Unable to log in to the web console after the OpenId connection has expired
    JIRA Id : ES-723
    Once the OpenId connection has expired, the web console does not present the login page but only an unauthorized page.

    Workaround : There are two workarounds:
    • clear the browser's cache ; then reload the SafeKit web console
    • change the SafeKit web server configuration as described below ; then restart the SafeKit web server
      • Edit the configuration file SAFE/web/conf/httpd.webconsoleopenidauth.conf and uncomment the lines
        # Circumvent Console quirks: worker fetches index.html with header Sec-Fetch-Dest set to 'empty' ... So it would get 401 instead of going to the login screen.
        OIDCUnAuthAction 401 "%{HTTP:X-Requested-With} == 'XMLHttpRequest' \
            || ( -n %{HTTP:Sec-Fetch-Mode} && %{HTTP:Sec-Fetch-Mode} != 'navigate' ) \
            || ( -n %{HTTP:Sec-Fetch-Dest} && %{HTTP:Sec-Fetch-Dest} != 'document' && %{HTTP:Sec-Fetch-Dest} != 'empty' ) \
            || ( ( %{HTTP_ACCEPT} !~ m#text/html# ) \
            && ( %{HTTP_ACCEPT} !~ m#application/xhtml\+xml# ) \
            && ( %{HTTP_ACCEPT} !~ m#\*/\*# ) )"
      • Restart the web server with SAFE/safekit webserver restart
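
    To help read the OIDCUnAuthAction expression, the following Python sketch mirrors its four OR-ed clauses. It models only the boolean expression, not the behavior of the web server or of its OpenId module. The function name and the sample requests are hypothetical, and header names are looked up exactly as written.

```python
import re


def gets_401(headers: dict) -> bool:
    """Return True when the expression would select the 401 action.

    A missing header is treated as an empty string, which is what the
    '-n' (non-empty) tests in the expression rely on.
    """
    requested_with = headers.get("X-Requested-With", "")
    fetch_mode = headers.get("Sec-Fetch-Mode", "")
    fetch_dest = headers.get("Sec-Fetch-Dest", "")
    accept = headers.get("Accept", "")

    # Clause 1: explicit XMLHttpRequest marker.
    is_xhr = requested_with == "XMLHttpRequest"
    # Clause 2: Sec-Fetch-Mode is present and is not a navigation.
    non_navigate_mode = bool(fetch_mode) and fetch_mode != "navigate"
    # Clause 3: Sec-Fetch-Dest is present and is neither 'document' nor 'empty'.
    non_document_dest = bool(fetch_dest) and fetch_dest not in ("document", "empty")
    # Clause 4: Accept matches none of text/html, application/xhtml+xml, */*.
    accepts_no_html = (
        re.search(r"text/html", accept) is None
        and re.search(r"application/xhtml\+xml", accept) is None
        and re.search(r"\*/\*", accept) is None
    )
    return is_xhr or non_navigate_mode or non_document_dest or accepts_no_html


# Hypothetical sample requests.
navigation = {
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Dest": "document",
    "Accept": "text/html,application/xhtml+xml",
}
xhr = {"X-Requested-With": "XMLHttpRequest", "Accept": "*/*"}

print(gets_401(navigation))  # False
print(gets_401(xhr))         # True
```

    In this model, a regular browser navigation does not match any clause, so it is expected to follow the normal login flow instead of receiving a 401.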

  92. Id : SK-0099
    OS / Release : All / All
    How to : Configure promiscuous mode in hypervisor network

    When using a SafeKit farm module that is configured on two VMs with the SafeKit vmac_invisible virtual interface option, it is required that the network interfaces of the machines on which SafeKit is installed support the promiscuous mode. For the promiscuous mode to work, it must be configured in the hypervisor settings of the virtual switch or of the virtual network cards, depending on the hypervisor.

    In order to configure the promiscuous mode in Hyper-V:
    • go to the Hyper-V Manager console, and for each virtual machine of the module:
    • edit the settings of the virtual machine,
    • then edit the advanced features of the network adapter to which the virtual IP address of the module corresponds,
    • then enable MAC address spoofing

    In order to configure the promiscuous mode in VMware:
    • go to the VMware console,
    • then edit the settings of the virtual switch whose network the virtual IP address is on,
    • then allow Promiscuous mode.

    If the promiscuous mode is not configured in the hypervisor settings, the virtual IP address will be unreachable (thus the application that is configured at this IP address will be unreachable; the ping command will not work either). Note that if the type of the SafeKit virtual_interface is not vmac_invisible, but instead is vmac_directed, the virtual IP address will be reachable regardless of whether the promiscuous mode is configured or not.