In close collaboration with Seoul National University's Structural Complexity Laboratory

 

Upgrading Fedora

General issues in upgrading Fedora

  • Before Upgrading
    • yum update
    • package-cleanup --problems
    • package-cleanup --dupes
    • package-cleanup --orphans
    • rpmconf -a (using meld to merge configuration changes)
  • After Upgrading
    • yum distro-sync
    • yum update
    • package-cleanup --problems
    • package-cleanup --dupes
    • package-cleanup --orphans
    • rpmconf -a (using meld to merge configuration changes)
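If you do this often, the two checklists can be wrapped in a small shell script. The sketch below is my own illustration rather than a tested tool. The function names and the DRY_RUN switch are hypothetical, and it assumes yum, package-cleanup (from yum-utils) and rpmconf are installed. By default it only prints the commands, so you can check them before running them for real.

```sh
#!/bin/sh
# Sketch of the "Before Upgrading" and "After Upgrading" checklists.
# Commands are only printed while DRY_RUN is 1 (the default);
# set DRY_RUN=0 to execute them (as root).

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Checks shared by both phases.
cleanup_checks() {
    run package-cleanup --problems   # report dependency problems
    run package-cleanup --dupes      # list duplicate packages
    run package-cleanup --orphans    # list packages not available from any enabled repo
    run rpmconf -a                   # interactive: handle .rpmnew/.rpmsave files
}

before_upgrade() {
    run yum update
    cleanup_checks
}

after_upgrade() {
    run yum distro-sync
    run yum update
    cleanup_checks
}
```

For example, source the file and call before_upgrade to see the commands, then DRY_RUN=0 before_upgrade to run them. rpmconf asks questions, so run the real thing from a terminal.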

Some useful scripts

  • When you want to remove a package from the rpm database without touching its files or any dependencies (usually so that you can reinstall it after yum screw-ups):

rpm -e --justdb --nodeps packagename
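A minimal sketch of the same trick followed by the reinstall, assuming the package is still available from an enabled repo. reinstall_db_only is a hypothetical name. Because --justdb leaves the package's files on disk, the yum install simply writes them again.

```sh
#!/bin/sh
# Sketch: drop packages from the rpm database only (their files and any
# dependencies are left alone), then install them again with yum.
# Commands are only printed while DRY_RUN is 1 (the default).

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

reinstall_db_only() {  # usage: reinstall_db_only PACKAGE...
    local pkg
    for pkg in "$@"; do
        run rpm -e --justdb --nodeps "$pkg"
        run yum install -y "$pkg"
    done
}
```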

Special Issues, Upgrading VirtualBox Hosts

Uninstall VirtualBox before the upgrade. Upgrade, then reinstall VirtualBox. (You could try the VirtualBox yum repo provided by RPM Fusion, but so far I've found it too painful, with too many glitches in the rpms; installing the rpms directly from www.virtualbox.org is much easier.)

Special Issues, Upgrading VirtualBox Guests

Guest additions may fail badly in the upgraded guests (you may only be able to boot into init 3), and the guest may not be able to access the CD. Handle this by downloading the corresponding Guest Additions .iso and attaching it to the CD drive.

Special Issues, Upgrading Cluster

  1. Create new INSTALL and RUN directories, and export them via NFS
  2. Download the appropriate DVD iso, and copy its contents to INSTALL from the command line (don't use the file manager: it has file-name length limitations, and the symptom is that the install fails to find repodata).
  3. Set the permissions of the INSTALL subdirectories
  4. Copy the INSTALL scripts from the previous Fedora version
  5. Copy the RUN contents from the previous Fedora version
  6. Use yum via kaya, with routing on kaya switched on
    1. Do make sure to disable the updates repo in all yum commands; we can't handle the load of updating all cluster machines
      1. Probably should set up a yum proxy on kaya, but I don't have time
    2. Shouldn't need to do this any more: Download outdated binaries in RUN
  7. Copy the previous version's directory in tftpboot to a new directory for the new version
  8. Copy the new TRANS.TBL, vmlinuz and initrd.img from nnn/INSTALL/isolinux into it
  9. cd to pxelinux.cfg, and edit ANACONDA_TEMPLATE.cfg and makecfgs to update the Fedora version and the location of the install contents
  10. Try a test installation of a basic system (same HD structure) from the DVD; save the kickstart file that Anaconda leaves in /root (anaconda-ks.cfg) to a USB drive, and compare it with the previous ANACONDA_TEMPLATE.cfg
  11. Run makecfgs to update the Fedora version and the location of the install contents
  12. Test by installing on a cluster machine. If it reboots OK, leave it running and proceed to upgrade the cluster controller
  13. If the updated client appears in xpbsmon after the upgrade, upgrade all the other cluster clients (otherwise, debug)

Special Issues, Upgrading sc and sc1

  1. Use dd to copy all the system partitions to the backup boot disk
    1. dd bs=64k if=/dev/vg_sc1_f15plus/lv_xxx of=/dev/vg_sc1_f14plus/lv_xxx, for xxx in var, root, usr, usrlocal
  2. Update the boot config on /dev/sda1 to reflect this
  3. Test that the backup boots directly from the BIOS
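The dd step above, written out as a loop. This is a sketch that assumes the LV names follow the lv_xxx pattern shown above; backup_lvs is a hypothetical name. Ideally run it with the source filesystems unmounted (e.g. from a rescue boot), since dd of a live read-write filesystem can give an inconsistent copy.

```sh
#!/bin/sh
# Sketch of the dd step: copy each system LV to the same-named LV in the
# backup volume group. Commands are only printed while DRY_RUN is 1.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

backup_lvs() {  # usage: backup_lvs SRC_VG DST_VG
    local src_vg=$1 dst_vg=$2 lv
    for lv in var root usr usrlocal; do
        run dd bs=64k if="/dev/$src_vg/lv_$lv" of="/dev/$dst_vg/lv_$lv"
    done
}
```

For the sc1 case above, that would be DRY_RUN=0 backup_lvs vg_sc1_f15plus vg_sc1_f14plus.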

Now trying a new approach (for both sc and sc1), documented here in case it helps someone else. Basically, our servers have a slow but reasonably large 2.5" disk as the boot disk, and a RAID array. (The detailed type largely doesn't matter, but for the record: software RAID 0 striped over hardware RAID 1 on sc, done this way because of hardware limitations, and hardware RAID 5 on sc1.) The basic idea is to keep a base system on the 2.5" disk (sda for concreteness), and the production system on the RAID array (md for concreteness). This isn't working yet, but I'll describe it anyway.

sda1 contains the typical Fedora boot partition. sda2 contains an LVM volume group, sc, which contains the logical volumes root and home (with the obvious meanings). The detailed structure is actually more complex, but this will do for now. The RAID array contains another VG, scfast, also with LVs root and home. In normal running, the system boots from sda1 to scfast/root and scfast/home, and these are yum-updated normally.

When an upgrade is needed, /boot is dd-ed to a partition on md so that we have an emergency backup. The system is then booted from sc/root and sc/home, with all other mounts from scfast commented out in fstab (i.e. the system boots entirely from sc), and scfast is physically disconnected to ensure that no screw-ups can occur. sc is upgraded in the normal fashion (if anything goes wrong, we still have a running production system on scfast). When sc is running and stable, we go through this process (partially courtesy of Andy Botting):

  1. Add a snapshot volume for sc/root: lvcreate --snapshot --size 5G --name root_snap /dev/sc/root (this gives a frozen, consistent copy of the running root to dd from; note that dd copies everything, including the filesystem UUID, and lvchange provides no mechanism for changing it, hence steps 4 and 5)
  2. dd bs=128M if=/dev/sc/root_snap of=/dev/scfast/root
  3. lvremove /dev/sc/root_snap (we no longer need it)
  4. uuidgen (to create a brand spanking new UUID for the ext4 filesystem on /dev/scfast/root)
  5. tune2fs -U <value generated by uuidgen> /dev/scfast/root
  6. Make sure that /etc/default/grub contains all the modules needed for any RAID (e.g. md)
  7. grub2-mkconfig -o /boot/grub2/grub.cfg
    1. This will create updated grub entries for booting from sc
  8. Copy the top entry (the whole entry) from /boot/grub2/grub.cfg to /etc/grub.d/40_custom
  9. Edit as appropriate (i.e. changing sc to scfast as appropriate, and updating UUIDs - blkid is your friend to find these out)
  10. Edit fstab correspondingly
  11. grub2-mkconfig -o /boot/grub2/grub.cfg
    1. This will create a menuentry for booting to scfast
  12. dd /dev/sda1 to a safe backup
  13. Try to boot
  14. If it succeeds
    1. Edit /etc/default/grub as appropriate
    2. grub2-mkconfig -o /boot/grub2/grub.cfg
    3. Check /boot/grub2/grub.cfg: somehow you need to make scfast the default boot system
    4. Repeat the process, to generate alternate boot entries for booting to sc
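Steps 1 to 5, plus the blkid lookup for step 9, can be collected into one function. A sketch under these assumptions: the VG names default to sc and scfast as in the text, clone_root and make_default are hypothetical names, and tune2fs -U random stands in for the separate uuidgen step (it generates the new UUID itself). The grub2-set-default line is an untested suggestion for step 14.3; it only takes effect with GRUB_DEFAULT=saved in /etc/default/grub.

```sh
#!/bin/sh
# Sketch of steps 1-5 (and the blkid lookup in step 9).
# Commands are only printed while DRY_RUN is 1 (the default).

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

clone_root() {  # usage: clone_root [SRC_VG] [DST_VG]
    local src=${1:-sc} dst=${2:-scfast}
    # Point-in-time view of the running root; 5G of copy-on-write space
    # must be enough to absorb changes to sc/root while dd runs.
    run lvcreate --snapshot --size 5G --name root_snap "/dev/$src/root"
    run dd bs=128M if="/dev/$src/root_snap" of="/dev/$dst/root"
    run lvremove -f "/dev/$src/root_snap"
    # dd copied the ext4 UUID too. tune2fs may refuse to change it until
    # the filesystem has been freshly checked, hence the e2fsck first.
    run e2fsck -f "/dev/$dst/root"
    run tune2fs -U random "/dev/$dst/root"
    run blkid "/dev/$dst/root"   # note the new UUID for 40_custom and fstab
}

# Untested suggestion for step 14.3: make a menu entry the saved default.
make_default() {  # usage: make_default "MENU ENTRY TITLE"
    run grub2-set-default "$1"
}
```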

Special Issues, Upgrading sc1

  1. Disable the unbacked mount in fstab
  2. Reboot
  3. Upgrade (e.g. by preupgrade)
  4. cd /root/RocketRAID/rr232x-linux-src-v1.10/product/rr232x/linux
  5. make
  6. make install
  7. Reboot
  8. Re-enable the unbacked mount in fstab
  9. Reboot
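The fstab and driver steps can be scripted like this. A sketch only: /unbacked is a hypothetical mount point for the unbacked filesystem, the fstab helpers assume GNU sed, and make -C is used in place of the cd in step 4 (it changes into that directory before building).

```sh
#!/bin/sh
# Sketch of steps 1, 4-6 and 8 for sc1.
# Commands passed to run are only printed while DRY_RUN is 1 (the default).

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Comment out the fstab line whose mount point (second field) is MOUNTPOINT.
fstab_disable() {  # usage: fstab_disable FSTAB MOUNTPOINT
    sed -i -E "s|^([^#[:space:]][^[:space:]]*[[:space:]]+$2[[:space:]])|#\1|" "$1"
}

# Undo fstab_disable.
fstab_enable() {   # usage: fstab_enable FSTAB MOUNTPOINT
    sed -i -E "s|^#([^[:space:]]+[[:space:]]+$2[[:space:]])|\1|" "$1"
}

# Rebuild and install the HighPoint rr232x driver for the running kernel.
rebuild_rr232x() {  # usage: rebuild_rr232x [SOURCE_DIR]
    local dir=${1:-/root/RocketRAID/rr232x-linux-src-v1.10/product/rr232x/linux}
    run make -C "$dir"
    run make -C "$dir" install
}
```

For example: fstab_disable /etc/fstab /unbacked before the upgrade, then after it DRY_RUN=0 rebuild_rr232x, reboot, and fstab_enable /etc/fstab /unbacked.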

Special Issues, Upgrading to Fedora 21

I've only just started on this, and I expect this list to grow…

  1. A number of packages haven't been properly updated for F21, so they don't get upgraded properly. As of Dec. 23, the list includes wget, samba and sqlite. The cause seems to be a misunderstanding between the package maintainers and the fedup maintainers about how F20 and F21 version numbers must relate for things to work properly. You can fix this after the upgrade by running "yum downgrade" on the affected packages.
  2. The ATI Rage 128 video driver doesn't get upgraded properly; you may be able to fix this by booting into single user mode and issuing yum downgrade xorg-x11-server-Xorg (if you can't, you will be left with a system that can't get to a login screen). However I can't test this because…
  3. There is a further problem with the assignment of network interfaces, so the network may need further configuration (especially if you need fixed IP addresses). You can either do this manually or, if you have a working video interface (see above), use the NetworkManager GUI.
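Item 1's fix as a small helper. A sketch with a hypothetical function name: with no arguments it downgrades the three packages named above, and you can pass other package names instead (for example xorg-x11-server-Xorg from item 2).

```sh
#!/bin/sh
# Sketch: pull the F21 builds of packages that were left at F20 versions.
# Commands are only printed while DRY_RUN is 1 (the default).

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

f21_downgrade() {  # usage: f21_downgrade [PACKAGE...]
    if [ "$#" -eq 0 ]; then
        set -- wget samba sqlite   # the list as of Dec. 23
    fi
    run yum downgrade "$@"
}
```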

Bottom line: currently there are too many bugs that may interact, creating catch-22s as above. Fortunately, so far I have only been upgrading virtual machines. One is running F21 OK; I think I'm going to have to revert the other. I won't be trying to upgrade real machines till things are a bit more stable.

Special Issues, Upgrading to Fedora 20

  1. The biggie here seems to be that it's best to disable all external repos (specifically, rpmfusion) before running fedup:
    1. yum repolist all | less
    2. yum-config-manager --disable 'rpmfusion*'
    3. Run fedup
    4. yum-config-manager --enable 'rpmfusion*20*'
    5. yum repolist all | less (just to check)
  2. And there's yet another screwup if you have a separate /var partition. To fix it,
    1. sudo sed -i 's# -a -d /system-upgrade##' /lib/systemd/system-generators/system-upgrade-generator
  3. The network interface naming protocol has changed _yet again_; I needed to reconstruct the firewall to match
  4. On one system running a dhcp server, dhcp startup was failing (probably a timing race), and I needed to put this into rc.local:
    1. systemctl restart dhcpd
  5. On the same system I had a similar problem with a tftp server: tftp startup was failing (probably also a timing race), and I needed to put this into rc.local:
    1. systemctl restart tftp.socket
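The F20 steps as shell functions. This sketch assumes a network upgrade, for which fedup is run as fedup --network 20; the function names are hypothetical. fix_upgrade_generator applies the same sed as item 2 and takes an optional file argument, so it can be tried on a copy first.

```sh
#!/bin/sh
# Sketch of the F20 repo juggling and the separate-/var fix.
# Commands passed to run are only printed while DRY_RUN is 1 (the default).

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

f20_before_fedup() {
    run yum-config-manager --disable 'rpmfusion*'
    run fedup --network 20
}

# After the upgrade has completed and the system is back up.
f20_after_fedup() {
    run yum-config-manager --enable 'rpmfusion*20*'
    run yum repolist all
}

# Separate /var: remove "-a -d /system-upgrade" from the generator script.
fix_upgrade_generator() {  # usage: fix_upgrade_generator [FILE]
    sed -i 's# -a -d /system-upgrade##' \
        "${1:-/lib/systemd/system-generators/system-upgrade-generator}"
}
```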

Special Issues, Upgrading to Fedora 19

  1. F19 VirtualBox guests: fstab mounting of VirtualBox shared folders (type vboxsf) seems to have stopped working (this may be just an initial problem), so they need to be mounted manually, say from rc.local. However, the shares seem to take some time to become available for mounting, so I've had to put a 'sleep 30' in the initscript. And unfortunately…
  2. rc.local seems to have been definitively removed from the default list. To enable it again:
    1. create rc.local in /etc/rc.d, executable and with the initrc_exec_t SELinux context (ls -Z should show):
      1. -rwxr-xr-x. root root system_u:object_r:initrc_exec_t:s0 rc.local
    2. systemctl enable rc-local.service
    3. systemctl start rc-local.service (to test)
  3. The old system-config-network gui seems to have definitively stopped working (I found some guides to getting it working again around the net; unfortunately I wasn't able to get their recipes working for me). So it looks like it's command-line for the future
    1. Unless you want to switch to NetworkManager, which seems to be fine for desktops, but still has too many bugs for server use in my experience…
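A sketch of the rc.local steps, including the delayed vboxsf mount from item 1. The share name myshare, the mount point /mnt/myshare and the function names are all hypothetical. The chcon line sets the context shown in the listing above. On newer systemd versions rc-local.service may be started automatically once rc.local is executable, in which case systemctl enable may only print a warning.

```sh
#!/bin/sh
# Sketch of re-enabling rc.local on F19.
# Commands passed to run are only printed while DRY_RUN is 1 (the default).

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Create rc.local with a shebang if it is missing, and make it executable.
ensure_rc_local() {  # usage: ensure_rc_local [FILE]
    local f=${1:-/etc/rc.d/rc.local}
    [ -e "$f" ] || printf '#!/bin/sh\n' > "$f"
    chmod 755 "$f"
}

# Append a command to rc.local unless that exact line is already there.
add_rc_line() {  # usage: add_rc_line FILE LINE
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Label as in the listing above, then enable and start the unit.
activate_rc_local() {  # usage: activate_rc_local [FILE]
    local f=${1:-/etc/rc.d/rc.local}
    run chcon -u system_u -r object_r -t initrc_exec_t "$f"
    run systemctl enable rc-local.service
    run systemctl start rc-local.service
}
```

For example: ensure_rc_local, then add_rc_line /etc/rc.d/rc.local 'sleep 30' and add_rc_line /etc/rc.d/rc.local 'mount -t vboxsf myshare /mnt/myshare', and finally DRY_RUN=0 activate_rc_local.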
  4. Fedora 19 has dropped support for the old-style procfs based drivers - which is what the older HighPoint RocketRaid drivers are built on. This was a big problem for us, as we didn't have a larger PCIE slot available (so we couldn't use any of the more modern RAID controllers), yet rewriting the driver to use sysfs was a very daunting prospect. So all kudos to HighPoint. They came to the rescue by rewriting the driver to use sysfs, and supplied me with rr232x-linux-src-v1.10.1, despite having told us for at least three years that they were not providing support for these drivers any longer. _Thank you_. I'd love to be able to provide copies here, but unlike their newer drivers, it doesn't seem to have been GPL'ed, so I can't. But I'm sure if you write to them nicely they'll be helpful. Cross fingers, we won't face any more driver changes in Fedora for the 18 months or so that these machines need to last us…

Special Issues, Upgrading to Fedora 18

Fortunately, F18 doesn't seem to have many special issues beyond…

  1. If you are running an iptables firewall (for example, one configured by fwbuilder), be aware that upgrading to F18 installs firewalld as an alternative, and that it becomes the default in F19. If you want to continue to use your old firewall rules, somewhere between F18 and F19 you need to do:
    1. systemctl disable firewalld.service
    2. systemctl stop firewalld.service
      1. Unfortunately you can't easily remove it completely because NetworkManager has a yum dependency on it
    3. systemctl status firewalld.service
    4. systemctl status iptables.service
      1. just to make sure
  2. If you are using specific eth<n> style names for interfaces, via udev rules and config files, this will probably stop working somewhere between F18 and F19 because of the repeated stuff-ups with this code. Fedora 18 interface naming lasted one whole release; the new naming convention in F19 is just as bad (just as susceptible to slot reconfiguration). If you want to stick to the earlier (and more reliable) approach based on MAC addresses, you can still do it via udev rules, but you need to change the naming base from eth to something else (because of conflicts with the kernel's own interface naming). To do this,
    1. Rename the interfaces in /etc/udev/rules.d/70-persistent-net.rules to some other base name (I renamed eth<n> to sceth<n>; this simple change meant that the names got attached to the interfaces specified in the rules, whereas the original eth<n> names got all confused).
    2. Rename the corresponding files in /etc/sysconfig/network-scripts, /etc/sysconfig/networking/devices and /etc/sysconfig/networking/profiles/default appropriately. I found this loop useful (run it in each of those directories): for i in *eth[0-9] ; do j=$(echo "$i" | sed -e 's;eth;sceth;'); mv "$i" "$j"; done
    3. Make corresponding changes in all firewalls
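Both items as a sketch. The renaming loop is a slightly more careful version of the one-liner above (quoted names, and files already renamed are skipped). The function names are hypothetical and GNU sed is assumed. Run rename_eth_files once for each of the three directories listed in step 2.

```sh
#!/bin/sh
# Sketch: keep the iptables firewall, and move eth<n> names to sceth<n>.
# Commands passed to run are only printed while DRY_RUN is 1 (the default).

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

keep_iptables() {
    run systemctl disable firewalld.service
    run systemctl stop firewalld.service
    run systemctl status firewalld.service iptables.service
}

# ifcfg-eth0 -> ifcfg-sceth0 (only the first "eth" is replaced).
sceth_name() {
    printf '%s\n' "$1" | sed 's/eth/sceth/'
}

# Rename every *eth<digit>* file in DIR, skipping names already converted.
rename_eth_files() {  # usage: rename_eth_files DIR
    local f base
    for f in "$1"/*eth[0-9]*; do
        [ -e "$f" ] || continue            # the glob matched nothing
        base=${f##*/}
        case $base in *sceth*) continue ;; esac
        mv -- "$f" "$1/$(sceth_name "$base")"
    done
}

# NAME="eth0" -> NAME="sceth0" in a 70-persistent-net.rules style file.
rename_udev_rules() {  # usage: rename_udev_rules RULES_FILE
    sed -i -E 's/NAME="eth([0-9]+)"/NAME="sceth\1"/' "$1"
}
```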

Special Issues, Upgrading to Fedora 17

Please see the F16 issues below (many of them are repeated for F17). In particular, for RocketRaid, note that the kernel changes to 3.4 soon after the upgrade, so you will need to change the sources as described below to reflect this. Bugs: there seem to be a lot.

  1. The real biggie is bug 820351. It means that upgrading from a DVD doesn't work, and will leave your yum configuration in a seriously screwed-up state unless you enable the network, and the (F17) updates repo, during the upgrade. My recommendation: if you don't know yum reasonably well, don't upgrade from a DVD.
    1. If you do get caught by this, do a yum remove of the package causing the problem (this may change as the packages get updated; at the moment, the problems are turning up with one of the lib-sane packages and with one of the cups libraries; both can safely be removed. If you see a whole slew of dependent packages being removed, better not to do it…). Then do yum --skip-broken distro-sync. Finally, run package-cleanup --orphans and manually yum remove all the orphaned packages (check carefully first: some of them might be packages you installed manually).
  2. If you have a separate /usr partition, you're in for more excitement - exactly which depends on your upgrade method.
    1. If you use preupgrade, you'll find that the boot fails (because essential bits are in /usr). To fix this, add rd.lvm.lv=vg_sc4/lv_usr to the boot command line on the first boot. Then edit GRUB_CMDLINE_LINUX in /etc/default/grub to something like GRUB_CMDLINE_LINUX="rd.lvm.lv=vg_sc4/lv_root rd.lvm.lv=vg_sc4/lv_usr rd.lvm.lv=vg_sc4/lv_swap", and finally run grub2-mkconfig -o /boot/grub2/grub.cfg, after which your system will probably run somewhat normally
    2. If you use an install DVD for the upgrade, somehow /usr gets set to read-only during the update. This shows up as the system saying that there isn't enough space to install something. You can manually run mount -o remount,rw /usr, and then run yum normally.
  3. Some binaries seem to have disappeared from the installation repo; I needed to omit:
    1. libsigc++
    2. hdparm
  4. For the initial pxe/tftp boot config (which pivots to boot from nfs), I needed to change the kernel append command:
    1. Original:
      1. append initrd=initrd.img ks=nfs:192.168.<NETNUM>.1:/tangof17/INSTALL/C0A80<NETNUM>.cfg ramdisk_size=500000 devfs=nomount text dns=192.168.<NETNUM>.1 ip=dhcp ksdevice=eth0
    2. New:
      1. append initrd=initrd.img ks=nfs:192.168.<NETNUM>.1:/tangof17/INSTALL/C0A80<NETNUM>.cfg ramdisk_size=500000 devfs=nomount text dns=192.168.<NETNUM>.1 ksdevice=bootif
      2. ipappend 2
  5. For pxe/nfs installs, there is a problem on the reboot after the install: NetworkManager may be stopped before nfs is unmounted (probably depending on the relative speeds of different bits of hardware), leaving the system hanging. To fix this, I needed to put this img file (see the Fedora 17 problems page) into <Expanded ISO directory>/images.
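Items 2 and 4 as a sketch, with hypothetical function names. pxe_append substitutes NETNUM literally, exactly as in the template. set_cmdline_linux hard-codes the vg_sc4 names from the example, so adjust them for your own volume group. ipappend 2 makes pxelinux add BOOTIF=<MAC of the boot interface> to the kernel command line, which is what ksdevice=bootif relies on.

```sh
#!/bin/sh
# Sketch for the F17 /usr and pxe items.
# Commands passed to run are only printed while DRY_RUN is 1 (the default).

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Print the new F17 append and ipappend lines for one NETNUM.
pxe_append() {  # usage: pxe_append NETNUM
    local n=$1
    printf 'append initrd=initrd.img ks=nfs:192.168.%s.1:/tangof17/INSTALL/C0A80%s.cfg ramdisk_size=500000 devfs=nomount text dns=192.168.%s.1 ksdevice=bootif\n' "$n" "$n" "$n"
    printf 'ipappend 2\n'
}

# /usr left read-only during a DVD upgrade.
remount_usr_rw() {
    run mount -o remount,rw /usr
}

# Make the rd.lvm.lv arguments permanent, then regenerate grub.cfg.
set_cmdline_linux() {  # usage: set_cmdline_linux [FILE]
    sed -i 's|^GRUB_CMDLINE_LINUX=.*|GRUB_CMDLINE_LINUX="rd.lvm.lv=vg_sc4/lv_root rd.lvm.lv=vg_sc4/lv_usr rd.lvm.lv=vg_sc4/lv_swap"|' \
        "${1:-/etc/default/grub}"
    run grub2-mkconfig -o /boot/grub2/grub.cfg
}
```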

system-config-firewall has a serious bug: in a newly installed Fedora 17, the initial state of the firewall doesn't match what system-config-firewall shows. You first have to save it, for example by setting ssh to disabled, saving, setting it back to enabled, and saving again…

A change from previous versions: sshd isn't enabled at installation. You have to run

systemctl enable sshd.service

after installation

Also some positives:

  1. Realtek rt2500 wireless card support seems to be working again (for the first time in about a year)

Special Issues, Upgrading to Fedora 16

Fedora 16 seems to have had the buggiest upgrade process yet (though it looks quite a nice system _once_ you get it going). This is my list of needed fixes:

  1. If you have a separate partition for /var, then before starting the process, you need to put a copy of /var/lib/rpm into your root partition (because the upgrade process will look for it there)
  2. If you are using a 64-bit system, you will need to boot first into runlevel 3 (add init 3 to your boot parameters) and use yum to uninstall the 32-bit version of caribou that got installed in the upgrade, and install the 64-bit version instead
  3. If you want to use openvpn, you will need to:
    1. Client:
      1. Create your openvpn.conf, e.g. /etc/openvpn/scopenvpn.conf
      2. ln -s /lib/systemd/system/openvpn@.service /etc/systemd/system/multi-user.target.wants/openvpn@scopenvpn.service
        1. You should be able to do the above with systemctl enable openvpn@scopenvpn.service, but for some reason it fails
      3. systemctl start openvpn@scopenvpn.service to check it works
      4. (You might think you could do this using nmcli. Ha, gotcha - nmcli now depends on org/freedesktop, and so can't be run from rc.local or crontab)
    2. Host:
      1. Exactly as above, but the enable doesn't seem to work (I think openvpn is being started too early, so it fails)
      2. In /etc/rc.d/rc.local, put systemctl restart openvpn@scopenvpn.service
  4. If you are hosting nfs
    1. The nfs server probably won't start; fix this with systemctl start nfs-server.service followed by systemctl enable nfs-server.service
    2. If you have ever edited /etc/sysconfig/nfs, it will no longer work. You need to back up /etc/sysconfig/nfs to /etc/sysconfig/nfs.bak, move /etc/sysconfig/nfs.rpmnew to /etc/sysconfig/nfs, then edit /etc/sysconfig/nfs to carry over whatever changes you had in /etc/sysconfig/nfs.bak, and finally run systemctl restart nfs-server.service. (To check whether you have this problem, run rpcinfo -p <nfs-server-name>; if you do, you won't see the nfs, nfs_acl or nlockmgr services.)
  5. The level of duplicate rpms from yum seems to be much higher than usual. You should probably try to remove them (package-cleanup --dupes lists them)
  6. In this upgrade, rhgb and quiet (and probably other kernel options) get re-enabled in grub. To fix this, edit /etc/default/grub, then grub2-mkconfig -o /boot/grub2/grub.cfg. However this will probably mess up the grub defaults. If you want it to behave in the natural way, and reboot into the previously-selected kernel, you need to add GRUB_SAVEDEFAULT=true as well to /etc/default/grub, before doing grub2-mkconfig. If you are running F16 in a virtualbox guest, you probably need to add divider=10 as well.
  7. If you are using nx, the keyfile /var/lib/nxserver/home/.ssh/authorized_keys2 gets moved to authorized_keys2.disabled and needs to be moved back for nx to work
  8. If you don't like packagekit running (I much prefer yum myself), edit /etc/PackageKit/PackageKit.conf to comment out the [Daemon] line
  9. If you are using an older Highpoint RocketRaid adapter, Highpoint haven't updated the drivers in quite a while. You will need to apply previous mods for 2.6.30 kernels (thanks to Niels Horn), and then add further modifications to tell the scripts to compile for a 3.1 kernel. Please note that this is very risky; you use this based on your own expertise, it is quite possible that even for my specific hardware, there could be problems down the track - and I have no idea about yours. If you aren't familiar at minimum with C, shell scripting and linux structure, please don't try, the risk is far greater than any possible benefit. Anyway, in my case, the changes were reasonably extensive, so please go through by hand and compare. Here is what I think was the original version I started from. Here is my changed (and apparently working) version. To the best of my recall, the files I changed were:
    1. in /root/RocketRaid3.1/rr232x-linux-src-v1.10/inc/linux: Makefile.def
    2. in /root/RocketRaid3.1/rr232x-linux-src-v1.10/osm/linux: patch.sh, osm_linux.c, install.sh
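Items 3 and 6 as a sketch (GNU sed assumed; the function names are hypothetical). One assumption goes beyond the text: the GRUB manual says GRUB_SAVEDEFAULT=true only has an effect when GRUB_DEFAULT=saved, so the sketch sets both.

```sh
#!/bin/sh
# Sketch for the F16 grub and openvpn items.
# Commands passed to run are only printed while DRY_RUN is 1 (the default).

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Set KEY=VALUE in FILE, replacing an existing KEY= line or appending one.
# VALUE must not contain '|' or '&' (both are special in this sed command).
set_default_var() {  # usage: set_default_var FILE KEY VALUE
    if grep -q "^$2=" "$1"; then
        sed -i "s|^$2=.*|$2=$3|" "$1"
    else
        printf '%s=%s\n' "$2" "$3" >> "$1"
    fi
}

# Remove the words rhgb and quiet from GRUB_CMDLINE_LINUX.
strip_rhgb_quiet() {  # usage: strip_rhgb_quiet FILE
    sed -i -E '/^GRUB_CMDLINE_LINUX=/{s/[[:space:]]*\b(rhgb|quiet)\b//g; s/="[[:space:]]+/="/;}' "$1"
}

# Item 6: plain boot messages, and boot the previously selected kernel.
fix_grub_defaults() {  # usage: fix_grub_defaults [FILE]
    local f=${1:-/etc/default/grub}
    strip_rhgb_quiet "$f"
    set_default_var "$f" GRUB_DEFAULT saved
    set_default_var "$f" GRUB_SAVEDEFAULT true
    run grub2-mkconfig -o /boot/grub2/grub.cfg
}

# Item 3: enable an openvpn instance by hand (for /etc/openvpn/NAME.conf).
enable_openvpn() {  # usage: enable_openvpn NAME
    run ln -s /lib/systemd/system/openvpn@.service \
        "/etc/systemd/system/multi-user.target.wants/openvpn@$1.service"
    run systemctl start "openvpn@$1.service"
}
```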

The recent upgrade to kernel 3.6 causes further problems. All kudos to ZoZo on the Ubuntu forums, who discovered that:

I succeeded in compiling the drivers, but they're the rr2340, not rr62x. What I did was look for calls to kmap_atomic and kunmap_atomic in the source code (under os_linux.c and osm_linux.c in my case), and removed their second argument (HPT_something). Then I deleted the #define lines referring to KM_BIO_SRC_IRQ (under osm_linux.h in my case), they weren't needed anymore. Then 'make install' and voilà. You can try the same technique on your side.

Worked for me too…
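ZoZo's edits can be roughly automated with sed. This is a sketch that assumes GNU sed and calls written on one line as kmap_atomic(ptr, HPT_SOMETHING). A first argument with one level of parentheses (such as sg_page(sg)) is handled; anything more complex still needs editing by hand. Keep a copy of the sources and compare before running make install. The function names are hypothetical.

```sh
#!/bin/sh
# Rough sketch of the kernel 3.6 fix (GNU sed assumed).

# kmap_atomic(x, HPT_FOO) -> kmap_atomic(x); the same for kunmap_atomic.
# The first argument may contain one level of parentheses, e.g. sg_page(sg).
drop_km_arg() {  # usage: drop_km_arg FILE...
    sed -i -E 's/(k(un)?map_atomic)\(([^,()]*(\([^()]*\))?[^,()]*),[[:space:]]*[A-Za-z0-9_]+\)/\1(\3)/g' "$@"
}

# Delete #define lines that mention KM_BIO_SRC_IRQ.
drop_km_defines() {  # usage: drop_km_defines FILE...
    sed -i -E '/^[[:space:]]*#[[:space:]]*define.*KM_BIO_SRC_IRQ/d' "$@"
}
```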

…and 3.7 brings even more joy. For some unknown reason, the kernel maintainers have decided to move things around, so I found it necessary to make the following changes around line 80 of RocketRAID3.7/rr232x-linux-src-v1.10/inc/linux/Makefile.def:

#
# change KERNELDIR according to your system
# Kernel 3.7 moved all the directories around... https://lkml.org/lkml/2012/7/20/419
#
ifndef KERNELDIR
KERNELDIR := /lib/modules/$(shell uname -r)/build
endif
KERNELSRC := /usr/src/kernels/$(shell uname -r)

#KERNEL_VER := 2.$(shell expr `grep LINUX_VERSION_CODE $(KERNELDIR)/include/linux/version.h | cut -d\  -f3` / 256 % 256)
#KERNEL_VER := 3.$(shell expr `grep LINUX_VERSION_CODE $(KERNELDIR)/include/linux/version.h | cut -d\  -f3` / 256 % 256)
KERNEL_VER := 3.$(shell expr `grep LINUX_VERSION_CODE $(KERNELSRC)/include/generated/uapi/linux/version.h | cut -d\  -f3` / 256 % 256)

ifeq ($(KERNEL_VER),)
#$(error Cannot find kernel version. Check $(KERNELDIR)/include/linux/version.h.)
$(error Cannot find kernel version. Check $(KERNELSRC)/include/generated/uapi/linux/version.h.)
endif
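For reference, the KERNEL_VER line decodes LINUX_VERSION_CODE, which the kernel defines as (major << 16) + (minor << 8) + patch. The Makefile keeps only the minor number (code / 256 % 256) and hard-codes the "3." prefix. The same decoding in shell, as a sketch with hypothetical function names, recovers the major number too:

```sh
#!/bin/sh
# Decode LINUX_VERSION_CODE = (major << 16) + (minor << 8) + patch
# into "major.minor", e.g. 198400 -> 3.7.
kernel_ver_from_code() {  # usage: kernel_ver_from_code CODE
    local code=$1
    printf '%d.%d\n' $(( code >> 16 )) $(( (code >> 8) & 255 ))
}

# Read LINUX_VERSION_CODE from a version.h, such as
# /usr/src/kernels/$(uname -r)/include/generated/uapi/linux/version.h
kernel_ver_from_header() {  # usage: kernel_ver_from_header VERSION_H
    local code
    code=$(awk '$1 == "#define" && $2 == "LINUX_VERSION_CODE" { print $3 }' "$1")
    [ -n "$code" ] || return 1
    kernel_ver_from_code "$code"
}
```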
  1. If you are using a Ralink wireless card, check whether you are using the rt2x00 driver. If so, it has been broken since kernel 2.6.40, giving system crashes (RT2500 and perhaps other cards) and very slow, unreliable connections (RT2800 and probably other cards). While in F15, you can regress to the 2.6.38 kernel, which seems to be fine. If you upgrade to F16, this option is removed. I would strongly recommend not upgrading till the kernel/driver issues are fixed (see bug 731672 and bug 753648).
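One way to check whether the rt2x00 driver is in use: its chip drivers all depend on the rt2x00lib module. A sketch (hypothetical function name) that reads /proc/modules, the same data lsmod shows, and prints each loaded rt2x00 core module with the modules using it. No output means no rt2x00 modules are loaded.

```sh
#!/bin/sh
# Print loaded rt2x00 core modules and their users, e.g.
# "rt2x00lib rt2500pci,rt2x00pci". A modules file can be passed for testing.
rt2x00_loaded() {  # usage: rt2x00_loaded [MODULES_FILE]
    awk '$1 ~ /^rt2x00/ { users = $4; sub(/,$/, "", users); print $1, users }' \
        "${1:-/proc/modules}"
}
```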

Replacing GNOME with LXDE

I'm sure Gnome 3 has its good points. So far, I haven't had a chance to find them, because it fails to work properly on most of our hardware, so that systems become unusable. I've found it necessary to switch to LXDE instead. My guess is that Gnome 3 is fine if you happen to have a gamer-style machine with a high-end graphics card. If your machine is a scientific machine, optimised for computation, it's just luck whether you have a graphics card Gnome 3 supports properly. Of course, if you are installing Fedora from scratch, I would strongly recommend using the LXDE spin. If you have a Gnome system that you need to convert to LXDE, here are the steps I used:

  1. Install the right software:
    yum install imsettings-lxde lxde-common lxde-icon-theme lxdm lxmenu-data lxpanel lxsession lxappearance lxinput lxlauncher lxmusic lxpolkit lxrandr lxsession-edit lxshortcut lxsplit lxtask lxterminal
    1. Actually, I don't think this is everything, but it's enough to work, and I haven't been able to find the missing modules (feedback on this would be greatly appreciated).
  2. So that the system uses the LXDE login manager (important: the GNOME login manager often times out on scientific systems), and so that users get LXDE desktop sessions by default, modify (or create) /etc/sysconfig/desktop to contain:
PREFERRED="startlxde"
DISPLAYMANAGER="/sbin/lxdm"
  1. Just to be sure, chcon -u system_u /etc/sysconfig/desktop
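The GNOME-to-LXDE steps as a sketch, with the package list copied from step 1 and hypothetical function names. Rather than overwriting the file, set_desktop_var keeps any other settings already in /etc/sysconfig/desktop.

```sh
#!/bin/sh
# Sketch of switching a GNOME system to LXDE.
# Commands passed to run are only printed while DRY_RUN is 1 (the default).

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

LXDE_PACKAGES="imsettings-lxde lxde-common lxde-icon-theme lxdm lxmenu-data lxpanel lxsession lxappearance lxinput lxlauncher lxmusic lxpolkit lxrandr lxsession-edit lxshortcut lxsplit lxtask lxterminal"

install_lxde() {
    # Deliberately unquoted so the list splits into separate package names.
    run yum install $LXDE_PACKAGES
}

# Set KEY=VALUE in FILE (created if missing), replacing any existing KEY= line.
set_desktop_var() {  # usage: set_desktop_var FILE KEY VALUE
    touch "$1"
    if grep -q "^$2=" "$1"; then
        sed -i "s|^$2=.*|$2=$3|" "$1"
    else
        printf '%s=%s\n' "$2" "$3" >> "$1"
    fi
}

configure_lxde_desktop() {  # usage: configure_lxde_desktop [FILE]
    local f=${1:-/etc/sysconfig/desktop}
    set_desktop_var "$f" PREFERRED '"startlxde"'
    set_desktop_var "$f" DISPLAYMANAGER '"/sbin/lxdm"'
    run chcon -u system_u "$f"
}
```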