GMS NOC Docs: Crashed machine

Bringing up a crashed machine from firmware mode (shutdown -i5, or system crash into firmware hard boot)

First move the crashed machine's file systems to their proper backup machines. Run outage to see where each abandoned file system should go:

tjau_root > outage 
 
========================================== 
The following auth processes should be active on the tj node 
NO AUTH PROCESS'S FOUND 
========================================== 
The following filesystems can be moved off of tjau TO: 
05000     Mir001 Sa=62,94     Move to: tjbu Sb=82,74     ... 
15000     Mir003 Sa=72,84     Move to: tjcu Sc=92,64     ... 
rtj1      Mir002 Sa=63,95     Move to: tjbu Sb=83,75     ... 
rtj2      Mir004 Sa=73,85     Move to: tjcu Sc=93,65     ... 
=========================================== 
The following filesystems can be moved off of tjbu TO: 
25000     Mir005 Sb=62,94     Move to: tjcu Sc=82,74     ... 
35000     Mir007 Sb=72,84     Move to: tjau Sa=92,64     ... 
rtj3      Mir006 Sb=63,95     Move to: tjcu Sc=83,75     ... 
rtj4      Mir008 Sb=73,85     Move to: tjau Sa=93,65     ... 
============================================ 
The following filesystems can be moved off of tjcu TO: 
45000     Mir009 Sc=62,94     Move to: tjau Sa=82,74     ... 
55000     Mir011 Sc=72,84     Move to: tjbu Sb=92,64     ... 
rtj5      Mir010 Sc=63,95     Move to: tjau Sa=83,75     ... 
rtj6      Mir012 Sc=73,85     Move to: tjbu Sb=93,65     ... 

This shows that file system 45000, which normally runs on TJCU, should be moved to TJAU if TJCU is not functioning. No release is needed, because the machine has crashed and its file systems are not currently in use.

tjau_root > activate -y -f tj45000  

This activates the dumped file system on its temporary machine, TJAU (listed in the outage output above).

Repeat this step for the other orphaned file systems. In this case that is 55000, which needs to be moved to TJBU.

<!> Spoolers (rtj1, etc.) don't need to be moved to a temporary machine unless the machine is to be out of service for a while.
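
The lookup described above (find the crashed machine's section in the outage output, then read off each "Move to:" target) can be sketched in Python. This is only an illustration, not a NOC tool: the function names parse_outage and orphaned are hypothetical, the sample text is copied from the listing on this page, and treating every name that starts with "r" as a spooler is an assumption based on the rtj1 to rtj6 entries.

```python
import re

# Sample text in the layout of the outage listing on this page
# (the trailing "..." is kept exactly as shown there).
SAMPLE = """\
The following filesystems can be moved off of tjau TO:
05000     Mir001 Sa=62,94     Move to: tjbu Sb=82,74     ...
rtj1      Mir002 Sa=63,95     Move to: tjbu Sb=83,75     ...
The following filesystems can be moved off of tjcu TO:
45000     Mir009 Sc=62,94     Move to: tjau Sa=82,74     ...
55000     Mir011 Sc=72,84     Move to: tjbu Sb=92,64     ...
rtj5      Mir010 Sc=63,95     Move to: tjau Sa=83,75     ...
rtj6      Mir012 Sc=73,85     Move to: tjbu Sb=93,65     ...
"""

HEADER = re.compile(r"moved off of (\S+) TO:")
ENTRY = re.compile(r"^(\S+)\s+(Mir\d+)\s+\S+\s+Move to:\s+(\S+)")


def parse_outage(text):
    """Map each source machine to a list of (file_system, mirror, backup_machine)."""
    moves = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.search(line)
        if header:
            current = header.group(1)
            moves[current] = []
            continue
        entry = ENTRY.match(line)
        if entry and current is not None:
            moves[current].append(entry.groups())
    return moves


def orphaned(moves, crashed, include_spoolers=False):
    """List (file_system, backup_machine) pairs for the crashed machine."""
    result = []
    for name, _mirror, target in moves.get(crashed, []):
        # Assumption: spoolers are the entries whose names start with "r".
        if name.startswith("r") and not include_spoolers:
            continue
        result.append((name, target))
    return result


print(orphaned(parse_outage(SAMPLE), "tjcu"))
# → [('45000', 'tjau'), ('55000', 'tjbu')]
```

The names returned are the ones printed by outage. The activate example on this page uses tj45000 rather than 45000, so the name passed to activate may differ from the listing; the sketch does not try to guess that mapping.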

After activating the file systems on the proper machines, bring up the downed machine. A phone call to the location may be needed to arrange a hard reboot.

  1. Enter mcp at the firmware mode prompt.
  2. At the prompt, run filletd.
  3. Press Enter at the next 2 prompts (for the SCSI and disk system scans).
  4. At the next prompt, run /etc/system to bring up the system. ( /!\ It may prompt you for /unix; use /etc/system instead.)
  5. Press Enter at the next 2 prompts for the SCSI system scans.
  6. Log in to the machine as root.
  7. Run /etc/startup to get the machine fully loaded for network use.
  8. After you get the INITILIZATION COMPLETE prompt, run service -B to start all the Easylink Services.
  9. Release the file systems from the backup machines and activate them on the original machine, one at a time.
    1. Run release followed by the file system name on the machine the file system was moved to.
    2. Activate the file systems (including the spoolers, rtj1 etc.) on the original machine.
  10. Check processes by running setsvc;showstatus. If the machine is an international node, make sure AUTH is running on the proper machine by running outage.
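
The ordering in step 9 can be written out as a small Python sketch. It is a planning aid under stated assumptions, not the actual procedure: restore_plan is a hypothetical helper, it assumes the spoolers were never moved (so they only need activating), and it lists only the command names because this page does not show the exact arguments used when moving file systems back.

```python
def restore_plan(moved, spoolers, original):
    """Return ordered (machine, command, file_system) steps for step 9.

    moved:    (file_system, backup_machine) pairs that were activated elsewhere.
    spoolers: spooler names to reactivate on the original machine.
    original: the machine that crashed and is now back up.
    """
    steps = []
    for fs, backup in moved:
        # One file system at a time: release on the backup machine,
        # then activate on the original machine.
        steps.append((backup, "release", fs))
        steps.append((original, "activate", fs))
    for spooler in spoolers:
        # Assumption: spoolers stayed put, so there is nothing to release.
        steps.append((original, "activate", spooler))
    return steps


for machine, command, fs in restore_plan(
    [("45000", "tjau"), ("55000", "tjbu")], ["rtj5", "rtj6"], "tjcu"
):
    print(f"on {machine}: {command} {fs}")
```

If a spooler was moved to a temporary machine, pass it in moved instead so that it gets a release step as well.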

Last modified 2003-07-16 20:53:28