Internal Production: Last Updated 2009-04-03

Overview of chk_sanity.ksh


Purpose

This document describes the usage and report format of the chk_sanity.ksh script, which validates configuration files for a platform:

execute from:   for:
dispatch        LinuxFiles
dispatch        SunFiles
dispatch        XTSadmFiles (ssh, krb, ...)
piboot          CrayFiles (piman/pingo)
ogboot          CrayFiles (ogman/ognip)
mn1sm           LinuxFiles (midnight)
node            a node with access to *Files.long can validate itself



Introduction

The chk_sanity.ksh script runs from the sysmon crontab on the appropriate management server hosts and can also be invoked manually to validate one or more nodes on demand. The [Config|Linux|Sun]Files are conceptually identical but platform-specific. The *Files directories map managed files for all platform nodes identified in the related machines.* file.

Related files (in /usr/local/adm/etc or /var/local/*Files unless noted):

file name:                  alternate names:              purpose:
etc/machines.*              .list|.linux|.sun|.cray|...   managed systems
ConfigFiles/List.txt        [Linux|Sun|Cray]Files         managed files
ConfigFiles/Long.txt        [Linux|Sun|Cray]Files         file mode,ownership,sum
/var/local/ConfigFiles/     [Linux|Sun|Cray]Files         configuration files

Related commands and tools (in /usr/local/adm/[s]bin/):

tool name:                      purpose:
push [...] config [...]         propagate ConfigFiles
push [...] one [...]            propagate ConfigFiles
push -s [...] rcp -rq [...]     back-copy ConfigFiles
get_ConfigFiles.ksh             identify *Files for push
get_Machines.ksh                parse entries from machines.list
upd_ConfigFiles.ksh             update ConfigFiles/List.txt,Long.txt
upd_LinuxFiles.ksh              update LinuxFiles/List.txt,Long.txt
upd_SunFiles.ksh                update SunFiles/List.txt,Long.txt
upd_CrayFiles.ksh               update CrayFiles/List.txt,Long.txt
chk_sanity_Files                generate file mode|ownership|sum on a node
cfsanity                        validate node file lists against *Files.long
cmp_sanity.ksh                  back-copy and compare *Files
edconfig                        edit *Files making backout
mkbko                           make backout copy of file
ckbko                           check backout copy of file
cksumnode                       compare files between systems

Reference ARSC Configuration Push Process for further information on these files and tools.
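
The three attributes tracked for each managed file (mode, ownership, sum) can be gathered on a node with standard tools. The following is a minimal sketch only. The function name file_attrs and its output layout are hypothetical; this is not the chk_sanity_Files implementation, nor the Long.txt format. It assumes an ls whose long listing puts owner and group in columns 3 and 4, and a sum that accepts -r.

```shell
#!/bin/sh
# Hypothetical helper: print "mode owner group sum path" for one file.
# Not the real chk_sanity_Files; the output layout is an assumption.
file_attrs() {
    f=$1
    [ -f "$f" ] || { echo "missing $f" >&2; return 1; }
    # ls -ld: column 1 is the symbolic mode, 3 the owner, 4 the group.
    meta=$(ls -ld "$f" | awk '{ print substr($1, 1, 10), $3, $4 }')
    # sum -r: the first column of its output is the checksum.
    cksum_r=$(sum -r < "$f" | awk '{ print $1 }')
    echo "$meta $cksum_r $f"
}
```

Running file_attrs on the same path on two nodes and comparing the output lines shows which of the three attributes differs.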

Usage Summary

dispatch: /usr/local/adm/bin/chk_sanity.ksh -?

Usage:  chk_sanity.ksh [-options] [hostname [...]]

Options:
  -h sun|  linux| cray
||  -sun| -linux|-cray
  -u # run upd_*Files.ksh -u,   current: 
  -F # force upd_*Files update, current: 
  -l # run on local node only,  current: 
  -x # use 'set -x',            current: 
  -q # non-quiet (verbose),     current: tty
  +q # quiet,                   current: 

  -node|-w|-m          node1[,node2...]  # same as 'hostname [...]'
  -type                type(hardware)    # 
  -status              status            # 
  -group|-usage        usage(group)      # 
  -version|-os         version[.sub]     # 
  -other|-frame|-rack  other(frame|rack) # 

See: /usr/local/adm/etc/machines.*
     /var/local/*Files/List.txt

Report Format

Sample Report:

        Mis-matched mode, ownership, sum, or type:
        ----------- (ConfigFiles)           (actual)
*1  ochre     :0644  root     root :0600     .      .    :/etc/X11/xorg.conf
*2  kappa     :0640  root       95 :   .     .      3    :/etc/cups/printers.conf
    
*3  kappa    !:      sum .template :   .     .      .    :/bin/ksh
*4  lemon    !:      sum .lemon    :0644     .      0    :/etc/sysctl.conf
    
*5  kappa    #: c=none   .         : t=link              :/boot/grub/menu.lst

*6  neladm   $: c=none   .         : t=directory         :/etc/sysconfig/network

    
*7  kappa    -: c=link   .kappa    :0600  root   root    :/boot/grub/grub.conf
*8a kappa    -: c=none   .         :0644  root   root    :/etc/hosts
*8b puppychow-: c=nofile .template :0640  root   linuxman:/etc/log.d/conf/services/named.conf
    
*9  puppychow?: c=file   .template : t=. (missing)       :/etc/profile.d/zARSC.sh

Report lines are formatted in colon-delimited sections:
  1. Node and error type.
    The error type character is essentially arbitrary; it is used merely to separate the report into error types. It is a single character in column 10. The characters used are:
    ' ' node mode|ownership and  ConfigFiles do not match
    '!' node sum -r         and  ConfigFiles do not match
    '#' node is a symlink   and  ConfigFiles entry is not or does not match
    '$' node is a directory and  ConfigFiles entry is not 
    '-' node is a file      and  ConfigFiles entry is not
    '?' node has no file    and  ConfigFiles entry exists
    
  2. ConfigFiles information, which can be:
    1. mode user|uid group|gid
      These values represent what ConfigFiles expects on the target node. The numeric uid|gid is displayed when there is a mismatch with the node. Do not trust the alphabetic user|group representation in ConfigFiles if the target node uses a different group file.
    2. sum .entry
      When ConfigFiles and node 'sum -r' disagree, the word 'sum' is shown.
      The .entry indicates what was matched in ConfigFiles (of .node, .type, .usage, .version, .major.minor, .other, or .template).
    3. c=value .entry
      This format indicates that the ConfigFiles and node inode types do not agree. The c=value represents what ConfigFiles believes the target node should be. Note that 'c=none' means there is no matching ConfigFiles entry, while 'c=nofile' means ConfigFiles explicitly states there should be no file on the node.
  3. Node information, which can be:
    1. mode|. user|uid|. group|gid|.
      A period, '.', is displayed when the node matches ConfigFiles for mode|uid|gid. A value is displayed only when there is a difference.
    2. t=value
      The t=value represents what the node (target) file actually is. '(missing)' is also printed when the file does not exist on the node.
  4. ConfigFiles name (full path of file).
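
The section layout described above can be picked apart with awk. This is a sketch under stated assumptions, not part of chk_sanity.ksh: the function names parse_report and describe_type are hypothetical, and it assumes the node name fills columns 1-9, the error type character sits in column 10, the *N annotations in the sample are not part of real output, and file names contain no ':'.

```shell
#!/bin/sh
# Sketch: split report lines into their colon-delimited sections.
# Assumed layout: node in columns 1-9, error type in column 10, then
# ConfigFiles information, node information, and the full path.
parse_report() {
    awk -F':' '
        # Strip leading and trailing blanks, squeeze inner runs to one space.
        function tidy(s) {
            gsub(/^[ \t]+|[ \t]+$/, "", s)
            gsub(/[ \t]+/, " ", s)
            return s
        }
        # Header lines have fewer than four sections and are skipped.
        NF >= 4 {
            node = tidy(substr($1, 1, 9))
            type = substr($1, 10, 1)
            if (type == " " || type == "") type = "mode"
            print type "|" node "|" tidy($2) "|" tidy($3) "|" tidy($4)
        }'
}

# Map an error type (as printed by parse_report) to the meaning listed above.
describe_type() {
    case $1 in
        mode) echo "mode or ownership mismatch" ;;
        '!')  echo "sum -r mismatch" ;;
        '#')  echo "node is a symlink, ConfigFiles entry is not or does not match" ;;
        '$')  echo "node is a directory, ConfigFiles entry is not" ;;
        '-')  echo "node is a file, ConfigFiles entry is not" ;;
        '?')  echo "node has no file, ConfigFiles entry exists" ;;
        *)    echo "unknown error type" ;;
    esac
}
```

Given a saved report on standard input, parse_report prints one type|node|ConfigFiles|actual|path record per discrepancy line, with the blank error type rewritten as the word mode so that every field is non-empty.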

To make the errors easier to resolve, the report is sorted by:

  1. Error type.
  2. ConfigFiles name
  3. Node name
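
The sort order above can be reproduced on a saved report with sort(1). This is a sketch under assumptions, not the script's own code. It relies on the error type being column 10 of the first colon-delimited section, and on C-locale byte order, in which the type characters collate as ' ', '!', '#', '$', '-', '?', the same order in which they appear in the sample report. The function name resort_report is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: re-sort report lines by error type, file name, node.
resort_report() {
    # -t:          sections are colon-delimited
    # -k1.10,1.10  error type character (column 10 of section 1)
    # -k4,4        ConfigFiles name (full path)
    # -k1.1,1.9    node name (columns 1-9 of section 1)
    LC_ALL=C sort -t: -k1.10,1.10 -k4,4 -k1.1,1.9
}
```

Sorting by file name before node name groups every node that disagrees on the same file, which suits fixing one file across many nodes at a time.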

Correcting errors requires knowledge of the systems. Treat discrepancies seriously: a checksum or permissions change may indicate that a file or system has been compromised. Most discrepancies, however, are due to software updates, procedural errors, or temporary changes for testing. It may be necessary to work with the platform ISSO and administrator, or to identify who was working on a particular file or product.

Explanations of the errors in the sample report above, and possible corrections, follow: