Bare metal provisioning

Introduction

In this article I attempt to outline a workflow that can be used to deploy hundreds of machines with relative ease.

Preparing the template

Use LXC to set up a template machine, configured with Puppet, Salt or your favourite configuration management software. Make sure the container uses the btrfs backing store, because btrfs is what provides the differential snapshots. Clean up cached packages and anything else you do not want to see on the production machines. Once the template container is ready, stop it and create a snapshot using lxc-snapshot. We use LDAP for authentication, so /etc/passwd is basically empty and the same container can be used to deploy hundreds of machines.
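The clean-up, stop and snapshot steps can be sketched roughly as follows. The container name "template", the DRY_RUN switch and the use of apt-get (a Debian or Ubuntu based container) are my assumptions for illustration; by default the sketch only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: clean, stop and snapshot a template container.
# TEMPLATE is a hypothetical container name; DRY_RUN=1 only prints commands.
TEMPLATE="${TEMPLATE:-template}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

snapshot_template() {
    # Drop cached packages inside the container (apt based distro assumed)
    run lxc-attach -n "$TEMPLATE" -- apt-get clean
    # Stop the container before taking the snapshot
    run lxc-stop -n "$TEMPLATE"
    # Create the next snapshot of the container
    run lxc-snapshot -n "$TEMPLATE"
}

snapshot_template
```

Run it with DRY_RUN=0 on the LXC host once the printed commands look right.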

Buildroot based provisioning image

Use Buildroot to generate a self-contained Linux image to be booted over PXE, or optionally download the one I've prepared.

git clone git://git.buildroot.net/buildroot

We managed to cram the image below 10 MB, so no external root filesystem is needed for the provisioning stage. Use menuconfig to tweak the Buildroot build and linux-menuconfig to tweak the kernel build:

make menuconfig
make linux-menuconfig

Enable Kernel -> Linux kernel to have the kernel built by Buildroot. Enable Filesystem images -> initial RAM filesystem linked into linux kernel to build an initramfs; the initial RAM filesystem is then built into the kernel image.
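To double check that both menu entries made it into the Buildroot configuration, something like the sketch below may help. The function name check_buildroot_config is made up for this example; BR2_LINUX_KERNEL and BR2_TARGET_ROOTFS_INITRAMFS are the symbols these two menu entries set in .config, as far as I can tell.

```shell
#!/bin/sh
# Sketch: verify that a Buildroot .config enables the kernel build
# and the initramfs linked into the kernel.
check_buildroot_config() {
    config="$1"
    status=0
    for symbol in BR2_LINUX_KERNEL BR2_TARGET_ROOTFS_INITRAMFS; do
        if grep -q "^${symbol}=y\$" "$config"; then
            echo "ok: $symbol"
        else
            echo "missing: $symbol"
            status=1
        fi
    done
    return $status
}
```

Call it as check_buildroot_config .config from the Buildroot directory; a non-zero exit status means at least one option is still disabled.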

Use System configuration -> Root filesystem overlay directories to point to the root overlay folder, which first of all contains a customized init at /sbin/init:

#!/bin/sh
# Note that /dev should be handled by devtmpfs
# Make sure this file will be executable

mount -t proc none /proc
mount -t sysfs sysfs /sys
udhcpc

while [ 1 ]; do
    dialog --menu "What do you want to do" 0 0 0 \
        provision       "Provision this machine" \
        shell           "Drop to shell" \
        reboot          "Reboot" \
        poweroff        "Shutdown" 2> /tmp/what_next
    clear
    case $(cat /tmp/what_next) in
        shell)
            sh
        ;;
        reboot)
            reboot -f
        ;;
        poweroff)
            poweroff -f
        ;;
        provision)
            provision
            sh
        ;;
    esac
done

And the provisioning tool itself, at /sbin/provision for instance:

#!/bin/sh

CURL="curl -s"
URL="https://mgmt.koodur.com/api"

TARGET_MOUNTPOINT=/mnt/target
TARGET_DIRECTORY=$TARGET_MOUNTPOINT/rootfs

# Allow the API URL to be overridden from the kernel command line
for chunk in $(cat /proc/cmdline); do
    case "$chunk" in
        butterknife_api_url=*)
            URL="${chunk#butterknife_api_url=}"
        ;;
    esac
done

$CURL $URL/container/ \
  | jq '.containers[] | .name + " \"" + .description + "\""' -r \
  > /tmp/available_templates
dialog --menu "Select template to deploy from $URL" 0 0 0 \
    --file /tmp/available_templates 2>/tmp/selected_template
TEMPLATE=$(cat /tmp/selected_template)

$CURL $URL/container/$TEMPLATE/snapshot/ \
  | jq '.snapshots[] | .name+" \""+.comment + "\""' -r \
  | sort -r \
  > /tmp/available_snapshots
dialog --menu "Select snapshot to deploy" 0 0 0 \
    --file /tmp/available_snapshots 2>/tmp/selected_snapshot
SNAPSHOT=$(cat /tmp/selected_snapshot)

STREAM="$URL/container/$TEMPLATE/snapshot/$SNAPSHOT/stream"

#exit 0

# Determine target disk
for disk in /dev/sd?; do
    slug=$(echo $disk | cut -d "/" -f 3)
    # /sys/block/*/size is always in 512-byte units, whatever the hardware sector size
    echo "$disk \"$(cat /sys/block/$slug/device/model | xargs) ($(expr $(cat /sys/block/$slug/size) \* 512 / 1000000000)G)\"";
done > /tmp/disks

dialog --menu "Target disk" 0 0 0 --file /tmp/disks 2> /tmp/selected_disk

DISK=$(cat /tmp/selected_disk)
DISK_SLUG=$(echo $DISK | cut -d "/" -f 3)

dialog --menu "Partitioning $DISK" 0 0 0 \
    purge           "Overwrite whole disk" \
    reformat        "Reformat partition" \
    receive         "Receive into existing btrfs filesystem" \
    unpartitioned   "Use unpartitioned area" 2> /tmp/partitioning_method

clear

# TODO: EFI way is not currently covered!
case $(cat /tmp/partitioning_method) in
    "unpartitioned")
        clear
        echo "Attempting to create new partition in unpartitioned space"
        echo -e "n\np\n1\n\n\nw" | fdisk $DISK
        sleep 3
    ;;
    "purge")
        clear
        echo "Purging whole disk"
        echo -e "o\nn\np\n1\n\n\nw" | fdisk $DISK
        sleep 3
    ;;
esac

# Determine target partition
for partition in $DISK?; do
    partition_slug=$(echo $partition | cut -d "/" -f 3)
    echo "$partition \"$(cat /sys/block/$DISK_SLUG/$partition_slug/size)\"";
done > /tmp/partitions

dialog --menu "Target partition" 0 0 0 --file /tmp/partitions 2> /tmp/selected_partition
clear

PARTITION=$(cat /tmp/selected_partition)
PARTITION_SLUG=$(echo $PARTITION | cut -d "/" -f 3)

case $(cat /tmp/partitioning_method) in
    "receive")
        echo "Skipping filesystem creation"
    ;;
    *)
        echo "Creating clean btrfs filesystem on $PARTITION"
        mkfs.btrfs -f $PARTITION
    ;;
esac

mkdir -p $TARGET_MOUNTPOINT
mount $PARTITION $TARGET_MOUNTPOINT
if [ $? -ne 0 ]; then
    dialog --msgbox "Mounting $PARTITION at $TARGET_MOUNTPOINT failed, are you sure the kernel has btrfs support built in?" 0 0
    exit 255
fi

mkdir -p $TARGET_DIRECTORY

echo "Mountpoints:"

# Determine transfer method
dialog --menu "Select transfer method" 0 0 0 \
    multicast "Multicast receive" \
    http "HTTP-only" \
    tee "HTTP and multicast" 2>/tmp/transfer_method
clear

TRANSFER_METHOD=$(cat /tmp/transfer_method)

case $TRANSFER_METHOD in
    multicast)
        udp-receiver | zcat | pv | btrfs receive $TARGET_DIRECTORY
    ;;
    http)
        $CURL $STREAM | zcat | pv | btrfs receive $TARGET_DIRECTORY
    ;;
    tee)
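        # POSIX sh has no process substitution, so duplicate the stream via a FIFO
        mkfifo /tmp/stream.fifo
        zcat < /tmp/stream.fifo | btrfs receive $TARGET_DIRECTORY &
        $CURL $STREAM | tee /tmp/stream.fifo | udp-sender
        wait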
    ;;
esac

echo "Flushing buffers"
sync
sleep 1
echo "Rebooting machine"
reboot -f
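The tee transfer method has to feed one compressed stream to two consumers at once. In a plain sh without process substitution, one way to do that is a named pipe. The sketch below shows the pattern with stand-ins so it can be tried on any machine: gzip -c plays the HTTP server, gzip -dc into a file plays zcat and btrfs receive, and wc -c plays udp-sender. The temporary paths are made up for the example.

```shell
#!/bin/sh
# Sketch: duplicate one compressed stream to two consumers through a FIFO.
workdir=$(mktemp -d)
fifo="$workdir/stream.fifo"
mkfifo "$fifo"

# Consumer 1: decompress in the background, as "zcat | btrfs receive" would
gzip -dc < "$fifo" > "$workdir/received" &
consumer=$!

# Producer: tee copies the compressed stream into the FIFO and
# passes the same bytes on to consumer 2 through the pipe
printf 'snapshot payload\n' | gzip -c | tee "$fifo" | wc -c > "$workdir/forwarded_bytes"

# Wait for the background consumer to drain the FIFO
wait "$consumer"
cat "$workdir/received"
```

In the provisioning script the same shape would read the stream with curl, write it to the FIFO with tee and hand the pipe to udp-sender, while zcat and btrfs receive read from the FIFO in the background.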

If you proxy the API through HTTPS make sure you place your web server certificate bundle at /etc/ssl/bundle.crt of the Buildroot filesystem overlay.
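Whether curl picks up that bundle on its own depends on how it was built, so passing the path explicitly is the safer bet. A minimal sketch, assuming the bundle path above; the api_get function and the CURL_BIN override are hypothetical names introduced for this example only.

```shell
#!/bin/sh
# Sketch: query the API with an explicit CA bundle.
URL="${URL:-https://mgmt.koodur.com/api}"
CA_BUNDLE="${CA_BUNDLE:-/etc/ssl/bundle.crt}"
CURL_BIN="${CURL_BIN:-curl}"   # hypothetical override, handy for dry runs

api_get() {
    # $1 is the path below the API root, for example "container/"
    "$CURL_BIN" -s --cacert "$CA_BUNDLE" "$URL/$1"
}
```

With this in place api_get container/ would replace the bare $CURL $URL/container/ call in the provisioning script.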

Run make to build the Buildroot image:

make -j16

Serving provisioning image

To boot machines using PXE you need control over the DHCP server settings: the server has to provide BOOTP options to machines that boot over PXE and request their boot arguments via DHCP. I've used an OpenWrt based router to serve DHCP in the local area network. On an OpenWrt managed network, add the following options to have pxelinux.0 loaded from the TFTP server sitting at 192.168.72.146:

mkdir -p /var/lib/tftp
uci set dhcp.@dnsmasq[0].dhcp_boot=pxelinux.0,,192.168.72.146
uci commit dhcp
/etc/init.d/dnsmasq restart

The Ubuntu 14.04 based TFTP server sits at 192.168.72.146 and we've prepared buildroot at ~/buildroot:

apt-get install pxelinux atftpd openbsd-inetd
mkdir -p /srv/tftp/pxelinux.cfg
ln -s \
    /usr/lib/PXELINUX/pxelinux.0 \
    /usr/lib/syslinux/modules/bios/ldlinux.c32 \
    /usr/lib/syslinux/modules/bios/libutil.c32 \
    /usr/lib/syslinux/modules/bios/menu.c32 \
    /srv/tftp/
ln -s ~/buildroot/output/images/bzImage /srv/tftp/
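Since everything in the TFTP root is a symlink, a typo or a Buildroot image that has not been built yet leaves a dangling link behind. A small sanity check could look like the sketch below; check_tftp_root is a made-up helper name, and the file list simply mirrors the links created above.

```shell
#!/bin/sh
# Sketch: report files missing from the TFTP root.
check_tftp_root() {
    root="${1:-/srv/tftp}"
    missing=0
    for name in pxelinux.0 ldlinux.c32 libutil.c32 menu.c32 bzImage; do
        # -e follows symlinks, so a dangling link counts as missing
        if [ ! -e "$root/$name" ]; then
            echo "missing: $root/$name"
            missing=1
        fi
    done
    return $missing
}
```

Running check_tftp_root without arguments inspects /srv/tftp and exits non-zero if any of the files cannot be read through its link.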

Also create /srv/tftp/pxelinux.cfg/default. Here you may customize butterknife_api_url to have the container snapshots fetched from another server:

DEFAULT menu.c32
PROMPT 0
TIMEOUT 50
MENU TITLE Butterknife provisioning tool

label Ubuntu
        MENU LABEL Butterknife provisioning (i386)
        KERNEL bzImage
        APPEND butterknife_api_url=https://mgmt.koodur.com/api/
        TEXT HELP
                Start the Butterknife provisioning tool
        ENDTEXT
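The APPEND line ends up in /proc/cmdline of the booted image, which is where the provisioning script picks the URL up. The parsing can be tried out in isolation with a sketch like this one. The parse_api_url function and the fallback URL are illustrative names, and stripping a trailing slash is my addition so that paths appended later do not start with a double slash.

```shell
#!/bin/sh
# Sketch: extract butterknife_api_url from a kernel command line string.
parse_api_url() {
    # $1: command line text, $2: fallback URL
    url="$2"
    for chunk in $1; do
        case "$chunk" in
            butterknife_api_url=*)
                url="${chunk#butterknife_api_url=}"
                ;;
        esac
    done
    # Strip one trailing slash, if present
    echo "${url%/}"
}

parse_api_url "BOOT_IMAGE=bzImage butterknife_api_url=https://mgmt.koodur.com/api/" "https://example.invalid/api"
```

On the provisioning image the first argument would be "$(cat /proc/cmdline)".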

Booting the image

Press F12 or whatever key the PC requires to boot from the network. The SYSLINUX menu should appear; press Enter to boot the provisioning image and follow the instructions on the screen.

img/butterknife1.png

The main menu has convenience entries for shell, reboot and shutdown.

img/butterknife2.png

Target disk selection lists /dev/sd[a-z] entries.
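The size shown next to each disk is derived from /sys/block/<disk>/size, which counts 512-byte units. The arithmetic, as a small sketch with a made-up function name:

```shell
#!/bin/sh
# Sketch: convert a sysfs sector count into decimal gigabytes.
disk_size_gb() {
    # $1: contents of /sys/block/<disk>/size (512-byte units)
    echo $(( $1 * 512 / 1000000000 ))
}

# 1953525168 sectors is a typical "1 TB" drive
disk_size_gb 1953525168
```

Integer division rounds down, which is good enough for telling disks apart in a menu.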

img/butterknife3.png

Several partitioning options are provided, nothing fancy though.

img/butterknife4.png

Partition selection still needs polishing.

img/butterknife5.png

The last option, tee, pipes the HTTP stream to multicast.

img/butterknife6.png

Final screen before reboot
