This shows you the differences between two versions of the page.
| | resource:sc_lab_info:sclab:boot [2016/04/12 22:03] rim [Booting SC Lab Machines] | | resource:sc_lab_info:sclab:boot [2023/02/15 12:46] (current) |
|---|---|---|---|
| | Line 4: | | Line 4: |
| | The process for rebooting the machines is very specific and needs to be done in a particular order. Here it is: | | The process for rebooting the machines is very specific and needs to be done in a particular order. Here it is: |
| - | | | |
| - | | | |
| - | ====== SC ====== | | |
| - | sc is the top machine in the rack | | |
| - | - Check that the external backup disk is turned on (it might not restart after a crash) - sc can boot without the backup disk, but it won't be able to back up properly if the disk is turned on after it boots | | |
| - | - Boot the machine (this can take 15 minutes, and for the first 5 minutes there may not be anything showing on the screen while it does its memory check) | | |
| - | - Restart scteach (preferably by logging into sc on an nx console and running scteach in that) - normally, Bob will look after this | | |
| | ====== SC1 ====== | | ====== SC1 ====== |
| - | sc1 is the second machine from the bottom of the rack | + | sc1 needs to be rebooted first. It is the second machine from the bottom of the rack |
| - | - Turn on the external RAID array (the bottom machine in the rack) - sc1 _cannot boot without this_. | + | - Turn on the external RAID array (the bottom machine in the rack) - //sc1 cannot boot without this//. |
| | - Check that the external backup disk is turned on (it might not restart after a crash) - sc1 can boot without the backup disk, but it won't be able to back up properly if the disk is turned on after it boots | | - Check that the external backup disk is turned on (it might not restart after a crash) - sc1 can boot without the backup disk, but it won't be able to back up properly if the disk is turned on after it boots |
| | - Turn on at least one blade (it won't boot properly, but if none are booted, sc1's cluster control software won't come up properly) | | - Turn on at least one blade (it won't boot properly, but if none are booted, sc1's cluster control software won't come up properly) |
| | - Boot the machine (this can take a long time, though not as long as sc) | | - Boot the machine (this can take a long time, though not as long as sc) |
| - | - If, after about ten minutes, the system is still syncing the RAID disks - all the RAID lights are flashing - then the boot has probably failed, and you should reboot again | + | - If, after about ten minutes, the system is still syncing the RAID disks - all the RAID lights are flashing - then the boot has probably failed, and you should reboot |
| | - This problem might have been due to a flaky disk that eventually failed completely and had to be replaced, but we aren't completely sure | | - This problem might have been due to a flaky disk that eventually failed completely and had to be replaced, but we aren't completely sure |
| - | - Boot or reboot the blades once sc1 is running fully. You should do this even if pbsmon shows them as running. | + | - Boot or reboot the blades once sc1 is running fully. You should do this even if pbsmon shows them as running. Specifically, you need to reboot the blade you started a couple of steps ago because it won't be running properly. |
| | - Run pbsmon (in the education menu on sc1) to check that all the blades boot properly | | - Run pbsmon (in the education menu on sc1) to check that all the blades boot properly |
| | - Restart sc1a (preferably by logging into sc on an nx console and running scteach in that) - normally, Bob will look after this | | - Restart sc1a (preferably by logging into sc on an nx console and running scteach in that) - normally, Bob will look after this |
| | | + | |
| | | + | ====== SC ====== |
| | | + | sc is the top machine in the rack. Its HTTP server depends on some resources from sc1, so it won't boot properly until sc1 has completely rebooted |
| | | + | - Check that the external backup disk is turned on (it might not restart after a crash) - sc can boot without the backup disk, but it won't be able to back up properly if the disk is turned on after it boots |
| | | + | - Boot the machine (this can take 15 minutes, and for the first 5 minutes there may not be anything showing on the screen while it does its memory check) |
| | | + | - Restart scteach (preferably by logging into sc on an nx console and running scteach in that) - normally, Bob will look after this |
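The ordering rules in the current revision (sc1 first, its RAID array and one blade before it, the blades rebooted after it, sc only once sc1 is up) can be read as a dependency graph. The snippet below is a minimal sketch, assuming Python 3.9+ for the standard-library `graphlib` module, that derives one valid boot order from that graph. The step names are hypothetical labels for the manual steps, not commands or hostnames on the lab machines. Two further assumptions: the backup disks are treated as hard prerequisites (the page says the machines can boot without them, just not back up properly), and "sc1 has completely rebooted" is read as sc1's own boot having finished.

```python
from graphlib import TopologicalSorter

# Hypothetical labels for the manual steps in the current revision.
# Each step maps to the set of steps that must be finished before it.
# Assumption: backup disks are treated as hard prerequisites so that
# backups work, even though the machines can boot without them.
BOOT_DEPENDENCIES: dict[str, set[str]] = {
    # sc1 must come up first
    "sc1_raid_array_on": set(),
    "sc1_backup_disk_on": set(),
    "one_blade_on": set(),
    "sc1_boot": {"sc1_raid_array_on", "sc1_backup_disk_on", "one_blade_on"},
    "reboot_all_blades": {"sc1_boot"},
    "check_pbsmon": {"reboot_all_blades"},
    "restart_sc1a": {"check_pbsmon"},
    # sc's HTTP server needs resources from sc1 (assumed: sc1_boot finished)
    "sc_backup_disk_on": set(),
    "sc_boot": {"sc1_boot", "sc_backup_disk_on"},
    "restart_scteach": {"sc_boot"},
}


def boot_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one order of the steps that respects every dependency.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for number, step in enumerate(boot_order(BOOT_DEPENDENCIES), start=1):
        print(f"{number}. {step}")
```

Any order this prints satisfies every constraint in the graph, but `static_order()` is free to choose among independent steps (for example, which of sc1's prerequisites comes first), so the printed order may differ from the order the steps are written on the page.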