Bare metal provisioning
Introduction
This article outlines a workflow that can be used to deploy hundreds of machines with relative ease.
Preparing the template
Use LXC to set up a template machine using Puppet, Salt or your favourite configuration management software. Make sure you use the btrfs backing store, as btrfs is used for differential snapshots. Clean up cached packages and anything else you do not want to end up on the production machines. Once the template container is ready, stop it and create a snapshot using lxc-snapshot. We use LDAP for authentication, so /etc/passwd is basically empty and we can use the same container to deploy hundreds of machines.
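The template steps above can be sketched as a small shell function. This is only an outline under assumptions: the container name, the ubuntu LXC template, the apt-get clean cleanup step and the DRY_RUN switch are all illustrative and not part of the original setup.

```shell
#!/bin/sh
# Sketch only: container name, distribution template and DRY_RUN are assumptions.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

prepare_template() {
    name=$1
    # Create the container on a btrfs backing store so snapshots stay differential.
    run lxc-create -n "$name" -t ubuntu -B btrfs
    run lxc-start -n "$name" -d
    # Apply Puppet or Salt configuration here, then drop cached packages
    # (apt-get clean assumes a Debian or Ubuntu based template).
    run lxc-attach -n "$name" -- apt-get clean
    # Stop the container and snapshot it.
    run lxc-stop -n "$name"
    run lxc-snapshot -n "$name"
}
```

Setting DRY_RUN=1 before calling prepare_template with a container name prints the commands without touching any container, which is a cheap way to review the sequence first.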
Buildroot based provisioning image
Use Buildroot to generate a self-contained Linux image to be booted using PXE, or optionally download the one I've prepared.
git clone git://git.buildroot.net/buildroot
We managed to cram the image below 10MB, so no external root filesystem is needed for the provisioning stage. Use menuconfig to tweak the Buildroot build and linux-menuconfig to tweak the kernel build:
make menuconfig
make linux-menuconfig
Enable Kernel -> Linux kernel to have the kernel built by Buildroot. Enable Filesystem images -> initial RAM filesystem linked into linux kernel so that the initial RAM filesystem is built into the kernel image.
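After leaving menuconfig, it can be worth confirming that the options really ended up in Buildroot's .config. The helper below is a minimal sketch; the function name check_config is made up for illustration, while BR2_LINUX_KERNEL and BR2_TARGET_ROOTFS_INITRAMFS are the Buildroot symbols behind the two menu entries above.

```shell
#!/bin/sh
# Sketch: report Buildroot options that are not enabled in a .config file.
# Usage: check_config FILE SYMBOL...
check_config() {
    config=$1
    shift
    missing=0
    for symbol in "$@"; do
        # An enabled boolean option appears as SYMBOL=y on its own line.
        if ! grep -q "^${symbol}=y\$" "$config"; then
            echo "missing: $symbol"
            missing=1
        fi
    done
    return $missing
}
```

From the Buildroot directory this would be called as check_config .config BR2_LINUX_KERNEL BR2_TARGET_ROOTFS_INITRAMFS; a non-zero exit status means at least one option is not set.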
Use System configuration -> Root filesystem overlay directories to point to the root overlay folder. First, the overlay contains a customized init at /sbin/init:
#!/bin/sh
# Note that /dev should be handled by devtmpfs
# Make sure this file will be executable
mount -t proc none /proc
mount -t sysfs sysfs /sys
udhcpc
while [ 1 ]; do
dialog --menu "What do you want to do" 0 0 0 \
provision "Provision this machine" \
shell "Drop to shell" \
reboot "Reboot" \
poweroff "Shutdown" 2> /tmp/what_next
clear
case $(cat /tmp/what_next) in
shell)
sh
;;
reboot)
reboot -f
;;
poweroff)
poweroff -f
;;
provision)
provision
sh
;;
esac
done
And the provisioning tool itself, at /sbin/provision for instance:
#!/bin/sh
CURL="curl -s"
URL="https://mgmt.koodur.com/api"
TARGET_MOUNTPOINT=/mnt/target
TARGET_DIRECTORY=$TARGET_MOUNTPOINT/rootfs
# Allow the API URL to be overridden from the kernel command line
for chunk in $(cat /proc/cmdline); do
case "$chunk" in
butterknife_api_url=*)
URL="${chunk#butterknife_api_url=}"
;;
esac
done
$CURL $URL/container/ \
| jq '.containers[] | .name + " \"" + .description + "\""' -r \
> /tmp/available_templates
dialog --menu "Select template to deploy from $URL" 0 0 0 \
--file /tmp/available_templates 2>/tmp/selected_template
TEMPLATE=$(cat /tmp/selected_template)
$CURL $URL/container/$TEMPLATE/snapshot/ \
| jq '.snapshots[] | .name+" \""+.comment + "\""' -r \
| sort -r \
> /tmp/available_snapshots
dialog --menu "Select snapshot to deploy" 0 0 0 \
--file /tmp/available_snapshots 2>/tmp/selected_snapshot
SNAPSHOT=$(cat /tmp/selected_snapshot)
STREAM="$URL/container/$TEMPLATE/snapshot/$SNAPSHOT/stream"
#exit 0
# Determine target disk
for disk in /dev/sd?; do
slug=$(echo $disk | cut -d "/" -f 3)
# /sys/block/<disk>/size is always counted in 512-byte sectors
echo "$disk \"$(cat /sys/block/$slug/device/model | xargs) ($(expr $(cat /sys/block/$slug/size) \* 512 / 1000000000)G)\"";
done > /tmp/disks
dialog --menu "Target disk" 0 0 0 --file /tmp/disks 2> /tmp/selected_disk
DISK=$(cat /tmp/selected_disk)
DISK_SLUG=$(echo $DISK | cut -d "/" -f 3)
dialog --menu "Partitioning $DISK" 0 0 0 \
purge "Overwrite whole disk" \
reformat "Reformat partition" \
receive "Receive into existing btrfs filesystem" \
unpartitioned "Use unpartitioned area" 2> /tmp/partitioning_method
clear
# TODO: EFI way is not currently covered!
case $(cat /tmp/partitioning_method) in
"unpartitioned")
clear
echo "Attempting to create new partition in unpartitioned space"
echo -e "n\np\n1\n\n\nw" | fdisk $DISK
sleep 3
;;
"purge")
clear
echo "Purging whole disk"
echo -e "o\nn\np\n1\n\n\nw" | fdisk $DISK
sleep 3
;;
esac
# Determine target partition
for partition in $DISK?; do
partition_slug=$(echo $partition | cut -d "/" -f 3)
echo "$partition \"$(cat /sys/block/$DISK_SLUG/$partition_slug/size)\"";
done > /tmp/partitions
dialog --menu "Target partition" 0 0 0 --file /tmp/partitions 2> /tmp/selected_partition
clear
PARTITION=$(cat /tmp/selected_partition)
PARTITION_SLUG=$(echo $PARTITION | cut -d "/" -f 3)
case $(cat /tmp/partitioning_method) in
"receive")
echo "Skipping filesystem creation"
;;
*)
echo "Creating clean btrfs filesystem on $PARTITION"
mkfs.btrfs -f $PARTITION
;;
esac
mkdir -p $TARGET_MOUNTPOINT
mount $PARTITION $TARGET_MOUNTPOINT
if [ $? -ne 0 ]; then
dialog --msgbox "Mounting $PARTITION at $TARGET_MOUNTPOINT failed, are you sure the kernel has btrfs support built in?" 0 0
exit 255
fi
mkdir -p $TARGET_DIRECTORY
echo "Mountpoints:"
# Determine transfer method
dialog --menu "Select transfer method" 0 0 0 \
multicast "Multicast receive" \
http "HTTP-only" \
tee "HTTP and multicast" 2>/tmp/transfer_method
clear
TRANSFER_METHOD=$(cat /tmp/transfer_method)
case $TRANSFER_METHOD in
multicast)
udp-receiver | zcat | pv | btrfs receive $TARGET_DIRECTORY
;;
http)
$CURL $STREAM | zcat | pv | btrfs receive $TARGET_DIRECTORY
;;
tee)
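# Process substitution is not available in plain sh, so split the stream through a FIFO
mkfifo /tmp/stream_fifo
zcat < /tmp/stream_fifo | btrfs receive $TARGET_DIRECTORY &
$CURL $STREAM | tee /tmp/stream_fifo | udp-sender
wait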
;;
esac
echo "Flushing buffers"
sync
sleep 1
echo "Rebooting machine"
reboot -f
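The kernel command line handling at the top of the provisioning script can be pulled into a function and exercised on its own. The sketch below follows the same loop; the function name and the fallback argument are illustrative, and trimming a trailing slash is an addition not present in the script above (it avoids a double slash when the script appends /container/ to a URL such as https://mgmt.koodur.com/api/).

```shell
#!/bin/sh
# Sketch: extract butterknife_api_url from a kernel command line string.
# Usage: api_url_from_cmdline "CMDLINE" FALLBACK_URL
api_url_from_cmdline() {
    cmdline=$1
    url=$2   # used when the parameter is absent
    # Unquoted on purpose: the command line is split on whitespace.
    for chunk in $cmdline; do
        case "$chunk" in
            butterknife_api_url=*)
                # Strip the "butterknife_api_url=" prefix (POSIX parameter expansion).
                url=${chunk#butterknife_api_url=}
                ;;
        esac
    done
    # Addition for illustration: drop a single trailing slash.
    echo "${url%/}"
}
```

In the provisioning image this would be used as URL=$(api_url_from_cmdline "$(cat /proc/cmdline)" https://mgmt.koodur.com/api).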
If you proxy the API through HTTPS, make sure you place your web server certificate bundle at /etc/ssl/bundle.crt in the Buildroot filesystem overlay.
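Depending on how curl was built, it may not pick up that bundle path on its own. One way to be explicit is to pass the bundle with --cacert; the helper below is a sketch under that assumption, and the function name curl_command is made up for illustration.

```shell
#!/bin/sh
# Sketch: build the curl invocation, adding --cacert when a bundle is readable.
# Usage: curl_command [BUNDLE_PATH]
curl_command() {
    bundle=${1:-/etc/ssl/bundle.crt}
    if [ -r "$bundle" ]; then
        echo "curl -s --cacert $bundle"
    else
        # No bundle in the overlay: fall back to curl's built-in defaults.
        echo "curl -s"
    fi
}
```

The provisioning script could then set CURL=$(curl_command) instead of hard-coding curl -s.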
Run make to actually build the Buildroot image:
make -j16
Serving provisioning image
To boot machines using PXE you need control over the DHCP server settings: the server must be able to provide the BOOTP options that PXE clients request when they boot. I've used an OpenWrt based router to serve DHCP in the local area network. To enable PXE in an OpenWrt managed network, add the following options so that clients load pxelinux.0 from the TFTP server sitting at 192.168.72.146:
mkdir -p /var/lib/tftp
uci set dhcp.@dnsmasq[0].dhcp_boot=pxelinux.0,,192.168.72.146
uci commit dhcp
/etc/init.d/dnsmasq restart
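The value passed to dhcp_boot follows dnsmasq's dhcp-boot syntax: boot filename, server name, server address. The middle field is left empty here, hence the double comma. As a sketch for networks that run plain dnsmasq without UCI (an assumption, not the setup described above), the equivalent configuration line can be generated like this; the function name is illustrative.

```shell
#!/bin/sh
# Sketch: print the dnsmasq.conf line equivalent to the uci setting.
# Usage: dnsmasq_boot_option FILENAME TFTP_SERVER_IP
dnsmasq_boot_option() {
    filename=$1
    server_ip=$2
    # Empty middle field: no server name, only the server address.
    echo "dhcp-boot=${filename},,${server_ip}"
}
```

Calling dnsmasq_boot_option pxelinux.0 192.168.72.146 prints a line that could be appended to dnsmasq.conf.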
The Ubuntu 14.04 based TFTP server sits at 192.168.72.146 and we've prepared buildroot at ~/buildroot:
apt-get install pxelinux atftpd openbsd-inetd
mkdir -p /srv/tftp/pxelinux.cfg
ln -s \
/usr/lib/PXELINUX/pxelinux.0 \
/usr/lib/syslinux/modules/bios/ldlinux.c32 \
/usr/lib/syslinux/modules/bios/libutil.c32 \
/usr/lib/syslinux/modules/bios/menu.c32 \
/srv/tftp/
ln -s ~/buildroot/output/images/bzImage /srv/tftp/
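Since everything in the TFTP root is a symlink, a missing package file or an unfinished Buildroot build leaves a dangling link that only shows up when a client fails to boot. The check below is a small sketch for catching that earlier; the function name is made up, and the file list simply mirrors the links created above.

```shell
#!/bin/sh
# Sketch: verify that the boot files exist in the TFTP root.
# Usage: check_tftp_root [TFTP_ROOT]
check_tftp_root() {
    root=${1:-/srv/tftp}
    status=0
    for file in pxelinux.0 ldlinux.c32 libutil.c32 menu.c32 bzImage; do
        # -e follows symlinks, so a dangling link is reported as missing.
        if [ ! -e "$root/$file" ]; then
            echo "missing: $root/$file"
            status=1
        fi
    done
    return $status
}
```

Running check_tftp_root with no arguments checks /srv/tftp and returns a non-zero status if anything is missing.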
Also create /srv/tftp/pxelinux.cfg/default. Here you may customize butterknife_api_url to have the container snapshots fetched from another server:
DEFAULT menu.c32
PROMPT 0
TIMEOUT 50
MENU TITLE Butterknife provisioning tool
label Ubuntu
MENU LABEL Butterknife provisioning (i386)
KERNEL bzImage
APPEND butterknife_api_url=https://mgmt.koodur.com/api/
TEXT HELP
Start the Butterknife provisioning tool
ENDTEXT
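If several sites point at different API servers, the menu file can be generated instead of edited by hand. The sketch below reproduces the configuration above with the API URL as a parameter; the function name pxe_config is an illustrative choice.

```shell
#!/bin/sh
# Sketch: print a pxelinux menu configuration for a given API URL.
# Usage: pxe_config API_URL > /srv/tftp/pxelinux.cfg/default
pxe_config() {
    api_url=$1
    # TIMEOUT is in tenths of a second, so 50 boots the default after 5 seconds.
    cat <<EOF
DEFAULT menu.c32
PROMPT 0
TIMEOUT 50
MENU TITLE Butterknife provisioning tool
label Ubuntu
MENU LABEL Butterknife provisioning (i386)
KERNEL bzImage
APPEND butterknife_api_url=$api_url
TEXT HELP
Start the Butterknife provisioning tool
ENDTEXT
EOF
}
```

For example, pxe_config https://mgmt.koodur.com/api > /srv/tftp/pxelinux.cfg/default writes the same menu as shown above, without the trailing slash on the URL.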
Booting the image
Press F12, or whatever key the PC requires, to boot from the network. The SYSLINUX menu should appear; press Enter to boot the provisioning image and follow the instructions on the screen.