Administration

Assuming both controllers are up and running, make sure they are connected and can ping each other. As previously discussed, there is no Ethernet backplane, so you have to make sure yourself that they can reach each other, or nothing will work.
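A quick way to verify this from the shell of one controller is sketched below. It is only a sketch: the function name check_peer is made up for illustration, the peer address is whatever you configured on the other controller, and it assumes the ping on the controller accepts -c (busybox ping does).

```shell
# check_peer ADDRESS
# Sends three echo requests to the other controller and reports the result.
# ADDRESS is the IP of the peer controller (you have to supply it yourself).
check_peer() {
    if ping -c 3 "$1" >/dev/null 2>&1; then
        echo "peer $1 reachable"
    else
        echo "peer $1 NOT reachable" >&2
        return 1
    fi
}
```

Run it once on each controller, pointing at the other one, for example check_peer 192.0.2.2 (a documentation address, replace it with your own).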

Figure: DS6800 FC connection architecture, from IBM Redbook

Keep the schematic above in mind when thinking about how data flows from FCP/FICON to the disks. Requests arrive at the disk array on either controller 1 or controller 2, and can take the sub-optimal path through the other controller if they have to. It is important to configure the array and the clients so that the optimal path is used.

Model Determination

I have what I think is one 522 controller and one 511 controller. I believe that because /proc/cpuinfo shows 1.18 on one and 1.2 on the other, with the MR1750K card ID reported as 0801 and 0702 respectively. I would love to hear about more ways to figure out the generation of a controller.

Sanity Checking

The sanity checker runs a series of tests against the configuration. You can run it either on the local controller only, by passing -current, or on both controllers. Beware that running it on both controllers is quite slow, because the current controller will SSH into the other controller for every check.

Example output:

[Mon Mar 07 05:58:59] root@noname:~ # smsanity -nc   

Sanity Checker v0.30 invoked on c1 (noname)
------------------------------------------ Kona 0 --------- Kona 1 ---------
Checking free memory...................... FAILED!          Passed           
Verifying RW partitions................... FAILED!          Passed           
Verifying Kona replacement is enabled..... FAILED!          Passed           
Checking running processes................ FAILED!          Passed           
Checking disk space....................... FAILED!          Passed           
Checking SBR status....................... skipped          skipped          
Verifying four online DA partitions....... FAILED!          Passed           
Verifying certain files do not exist...... Passed           Passed           
Verifying that LCPSS is in Dual mode...... FAILED!          FAILED!          
Verifying no open hardware problems....... FAILED!          FAILED!          
Verifying no open software problems....... FAILED!          FAILED!          
Checking file permissions................. FAILED!          Passed           
Verifying no open cabling problems........ FAILED!          Passed           
Verifying no open data loss problems...... FAILED!          Passed           
Checking symbolic links................... FAILED!          Passed           
Checking number of IML retries............ FAILED!          Passed           
Verifying no CF R/W errors................ FAILED!          Passed           
Scanning ranks............................ FAILED!          Passed           
Checking serials in ncipl (strict)........ FAILED!          FAILED!          
Checking serials in ncipl (vote).......... skipped          skipped          
Checking PDM ISS consistency.............. skipped          skipped          
Checking PDM corruption................... skipped          Passed           
Checking Pulled out BANJO................. FAILED!          Passed           
----------------------------------------------------------------------------

(*) Detailed information about the failed checks:

  date: Failed to execute '' on Kona 0
  memory: Kona 0: Failed to execute 'cat /proc/meminfo'
  mount-rw: Kona 0: Failed to execute 'busybox mount'
  kona-replacement: Kona 0: Remote file /lic/sm/bin/ncutils.conf does not exist or is not readable
  processes: Kona 0: Failed to determine which processes are running
  diskspace: Kona 0: Failed to execute 'df -a'
  4-dapart: Kona 0: Failed to execute 'df -a'
  dual-lcpss: Kona 0: Failed to execute catreef status/opmode
  dual-lcpss: Kona 1: Bad LCPSS status 'Single Cluster Operational'
  hw-problems: Kona 0: Failed to execute /lic/sm/bin/rss_displayProblem
  hw-problems: Kona 1: Open problem of type 0 (hardware) found, id=2005-01-01-00.54.48.548951
  sw-problems: Kona 0: Failed to execute /lic/sm/bin/rss_displayProblem
  sw-problems: Kona 1: Open problem of type 1 (software) found, id=2016-07-21-19.12.06.936383
  permissions: Kona 0: Failed to execute 'stat -c '%a' /persist/scratch'
  cabling-problems: Kona 0: Failed to execute /lic/sm/bin/rss_displayProblem
  dataloss-problems: Kona 0: Failed to execute /lic/sm/bin/rss_displayProblem
  sym-links: Kona 0: Failed to execute 'ls -l /dc /linuxrc /lic /home/shark/log /home/shark/tmp /home/shark/statesave /home/shark/config'
  num-iml-retries: Kona 0: Remote file /home/shark/config/imlretry does not exist or is not readable
  cf-rw-errors: Kona 0: Failed to execute 'grep _intr /var/log/messages | grep -v "grep" | wc -l'
  scanrank: Kona 0: Failed to execute 'dacmd -x scanrank'
  ncipl-strict: C0 serial number was found in only 2 files: /dapart/s1/ncipl.da,/dapart/s3/ncipl.da
Box serial number was found in only 2 files: /dapart/s1/ncipl.da,/dapart/s3/ncipl.da
C1 serial number was found in only 2 files: /dapart/s1/ncipl.da,/dapart/s3/ncipl.da
  pulled_out_banjo: Kona 0: Failed to execute /lic/sm/bin/rss_displayProblem

(*) This situation is OK for the following scenarios:

  - Node config status report
  - Nonconcurrent code load (before quiesce)
  - Concurrent code load (before quiesce)

RAID details

You can use the utility issraid to explore the applied RAID configuration.

# Find out name of manager (seems to be always iss001 on kona 0, iss011 on kona 1)
issraid -M
# Show details on primary manager (kona 0)
issraid -Izl iss001

Setting the time

If you want to manually set the time, you can do it from the shell like this:

date --set="20190706 11:16"
/sbin/hwclock --set --date="$(date)"

Factory Reset

You can factory reset either a whole enclosure or a single Kona.

Enclosure

To factory reset the enclosure, both controllers need to be online and able to reach each other. Do the following as root on the primary controller (Kona 0).

[Fri Jan 10 19:42:11] root@noname:~ # cmdmenu.pl 
+------------------------------------------------------------------------+
| Text Based Menu v0.03 running on noname                                |
+------------------------------------------------------------------------+
|  1) Clear Message Router Files                                         |
|  2) Check and Clear Failed Controller Flag (Window Files)              |
|  3) Display and Reset Controller Reboot Count (imlretry)               |
|  4) Display/Modify Controller Autoboot Flag (norsStart)                |
|  5) Delete Nonvolatile Write Cache Data (CST)                          |
|  6) Rebuild Configuration Database (Clean PDM)                         |
|  7) Delete Config and Return to Factory Defaults (Clear&Pave part 1/2) |
|  8) Delete Config and Return to Factory Defaults (Clear&Pave part 2/2) |
|  9) Force CPSS Dump                                                    |
| 10) PE_Package (...)                                                   |
| 11) Statesaves (...)                                                   |
| 12) Arrowhead_Dumps (...)                                              |
| 13) FTP a File (from current node)                                     |
| 14) Exit                                                               |
+------------------------------------------------------------------------+
>>> Your choice? 7
Fri Jan 10 19:43:26 2020 Begin Procedure: Delete Config and Return to Factory Defaults (Clear&Pave part 1/2)
Fri Jan 10 19:43:26 2020 /lic/sm/bin/clear_and_pave1.pl on kona 0
>>> Are you sure you want to delete all data (y/n)? y
*********************************************************************************
Fri Jan 10 19:43:33 2020 All data will be lost, this operation is not reversible!
*********************************************************************************
>>> To continue type 'continue and delete all data' (without the single quotes):  continue and delete all data
>>> Really? Are you *SURE* (y/n)? y
>>> Please enter your full name : something-clever-here-maybe
Fri Jan 10 19:44:00 2020 Shutting down LCPSS on kona 0

After a while you’ll be asked to confirm a reboot. Do that, and once the reboot has finished and the console has gone mostly quiet, continue with part 2 from the same menu.

[Fri Jan 10 19:51:10] root@noname:~ # cmdmenu.pl
+------------------------------------------------------------------------+
| Text Based Menu v0.03 running on noname                                |
+------------------------------------------------------------------------+
|  1) Clear Message Router Files                                         |
|  2) Check and Clear Failed Controller Flag (Window Files)              |
|  3) Display and Reset Controller Reboot Count (imlretry)               |
|  4) Display/Modify Controller Autoboot Flag (norsStart)                |
|  5) Delete Nonvolatile Write Cache Data (CST)                          |
|  6) Rebuild Configuration Database (Clean PDM)                         |
|  7) Delete Config and Return to Factory Defaults (Clear&Pave part 1/2) |
|  8) Delete Config and Return to Factory Defaults (Clear&Pave part 2/2) |
|  9) Force CPSS Dump                                                    |
| 10) PE_Package (...)                                                   |
| 11) Statesaves (...)                                                   |
| 12) Arrowhead_Dumps (...)                                              |
| 13) FTP a File (from current node)                                     |
| 14) Exit                                                               |
+------------------------------------------------------------------------+
>>> Your choice? 8
Fri Jan 10 19:51:22 2020 Begin Procedure: Delete Config and Return to Factory Defaults (Clear&Pave part 2/2)
Fri Jan 10 19:51:22 2020 /lic/sm/bin/clear_and_pave2.pl on kona 0
>>> Are you sure you want to continue with clear and pave (y/n)? y
***********************************************************************************************
Fri Jan 10 19:51:29 2020 This is Part 2/2 of the procedure, make sure you already did Part 1/2.
***********************************************************************************************
>>> Have you already ran clear and pave part 1? y
*********************************************************************************
Fri Jan 10 19:51:34 2020 All data will be lost, this operation is not reversible!
*********************************************************************************
>>> To continue type 'continue and delete all data' (without the single quotes):  continue and delete all data
>>> Really? Are you *SURE* (y/n)? y
>>> Please enter your full name : something-clever-here-maybe
Fri Jan 10 19:51:45 2020 daDestroyGD on kona 1

This formats all drives, which will likely take a few hours.

Single Kona

If you want to reset a single controller, you can use the fresh_kona.sh script that is available on disk. It puts the controller in a mode where it is prepared to act as a replacement for a failed controller.

Licensing

There are some files that are interesting for licensing and product enablement. I haven’t looked too much into this yet as my array is not fully up and running, but expect this section to grow. If you’re so inclined, the code that handles feature activation seems to be called libSm.so and comes with debugging symbols baked in.

The files I know so far to be critical to licensing are /persist/etc/fea${CHASSIS_SERIAL}.bin, /persist/etc/nc_mnta.cfg, and /persist/etc/ncipl.da.
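Before experimenting with anything in this area it seems wise to check which of these files actually exist on a controller. Below is a small sketch for that. The function name list_license_files and the optional ROOT argument are made up for illustration, the serial is something you pass in yourself, and I am assuming all three files live under /persist/etc.

```shell
# list_license_files SERIAL [ROOT]
# Reports which of the licensing-related files are present and readable.
# SERIAL is the chassis serial used in the fea*.bin file name.
# ROOT is an optional prefix, handy when inspecting a copy of the filesystem.
list_license_files() {
    serial="$1"
    root="${2:-}"
    for f in "/persist/etc/fea${serial}.bin" \
             /persist/etc/nc_mnta.cfg \
             /persist/etc/ncipl.da; do
        if [ -r "${root}${f}" ]; then
            echo "present ${f}"
        else
            echo "missing ${f}"
        fi
    done
}
```

Anything reported as present is worth copying off the controller before a factory reset or a code load.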

MTS serial number

The serial number for DS6800 is shown in the SSH banner. This is an example:

 
                         IBM SystemStorage DS6000 (TM)
 
        1750 Model 511            SN 13-00715         Controller Card 1 (CL0)
 
 MCP 2007-06-26-203005         Code EC 5.2.2.514      Installed on: Sep 12 2010 
 
 
                                R11f.5b090514a 
                           SEA.rte level = 5.2.2-514
 
                   Licensed Internal Code - Property of IBM. 
 
                          1750 Licensed Internal Code 
             (C) IBM Corporation 1997, 2005.  All rights reserved. 
 
    US Government Users Restricted Rights - Use, duplication or disclosure 
          Restricted by GSA ADP Schedule Contract with IBM Corporation. 

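The banner shows the machine type as 1750 and the serial as SN 13-00715, so the MTS (a.k.a. essID) serial number here is IBM.1750-1300715, that is, the SN with the hyphen dropped. When adding the array to DSSM, 1300715 is the part you need to enter.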
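The conversion can be scripted if you have several arrays to register. This is a minimal sketch, assuming the banner line always has the same "<type> Model <model> SN <xx>-<yyyyy>" layout as the example above. The function name essid_from_banner is made up for illustration.

```shell
# essid_from_banner
# Reads the SSH banner text on stdin and prints IBM.<machine type>-<serial>,
# with the hyphen removed from the serial.
essid_from_banner() {
    sed -n 's/^ *\([0-9][0-9]*\) Model [0-9A-Za-z]*  *SN \([0-9A-Za-z]*\)-\([0-9A-Za-z]*\).*$/IBM.\1-\2\3/p'
}
```

Feed it a saved copy of the banner, for example essid_from_banner < banner.txt. If the result is stored in a variable named id, the part DSSM wants is ${id##*-}.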