I have an Xserve that’s still on 10.4. It runs a web app with a typical MySQL database backend. Both on metal.
This morning the DB got so slow that the app server started returning “timed out” error pages.
Ungood.
So I dumped the database, created a new Debian VMware instance and loaded up the data. I ssh port-forwarded metal’s 3306 to the new VM and restarted the apps.
Yesterday it took around seven seconds to vend a page. Now it’s back under half a second.
I wonder what I’m doing wrong that MySQL on OS X metal is an order of magnitude slower than MySQL on virtualized Linux.*
Could it be the default settings on Mac OS X Server 10.4 are that much worse than the defaults on Debian 5 Lenny?
Mac OS X’s file system (HFS+) has a reputation for being a lot slower than Linux’s, but I don’t think that alone could explain this much of a difference.
*You may assume I did the obvious things before migrating the DB from metal to VMware: restart the server, dump+reload the DB, etc.
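One way to test the default-settings theory would be to dump the server variables on both machines (for example with `mysql --batch --skip-column-names -e 'SHOW VARIABLES'`) and compare the two dumps. Here is a minimal Ruby sketch of that comparison. The helper names `parse_variables` and `diff_variables` are hypothetical, and the sample values are made up for illustration; they are not measurements from either host.

```ruby
# Parse tab-separated "name<TAB>value" lines, as printed by
#   mysql --batch --skip-column-names -e 'SHOW VARIABLES'
# into a Hash of variable name => value.
def parse_variables(dump)
  vars = {}
  dump.each_line do |line|
    name, value = line.chomp.split("\t", 2)
    next if name.nil? || name.empty?   # skip blank lines
    vars[name] = value.to_s            # a missing value becomes ""
  end
  vars
end

# Return [name, old_value, new_value] for every variable that differs.
# A variable present on only one side is reported with nil on the other.
def diff_variables(old_vars, new_vars)
  (old_vars.keys | new_vars.keys).sort.each_with_object([]) do |name, diffs|
    a = old_vars[name]
    b = new_vars[name]
    diffs << [name, a, b] unless a == b
  end
end

# Made-up sample dumps, for illustration only.
metal = parse_variables("key_buffer_size\t8388608\nquery_cache_size\t0\n")
vm    = parse_variables("key_buffer_size\t16777216\nquery_cache_size\t16777216\ntable_cache\t64\n")

diff_variables(metal, vm).each do |name, a, b|
  puts "#{name}: metal=#{a.inspect} vm=#{b.inspect}"
end
```

Any variable that shows up in the diff is a candidate explanation; buffer and cache sizes would be the first things I’d look at.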
As Jeff Atwood learned the hard way, you unfortunately need to take additional steps in order to back up live virtual machines.
If you have a WinXP VM that you just boot up from time to time to check IE6 compatibility, Time Machine and SuperDuper have you covered there. That’s because your image spends most of its time suspended, with all its bits on-disk in a consistent, restorable state.
Sadly, such is not the case for long-running VMs (such as server VMs).
When Time Machine or SuperDuper copies your live VM’s files, it may or may not get a copy of the VM’s files in a consistent state. Inconsistent state == failed restoration.
That’s bad.
My theoretical solution is to take a “backup snapshot” before backing up a live VM, with the belief that doing so will force the on-disk representation to be consistent prior to backup.
I must emphasize this is only my theory; I don’t have VMware’s source code. While taking a snapshot must force the virtual machine image to be serialized, for all I know that snapshot is stored on-disk in an inconsistent fashion until the virtual machine is shut down or suspended.
I’d be surprised if that were the case, but there’s your disclaimer.
※ ※ ※
My theory detailed above is only one piece of the puzzle: with backup, automation is king.
Fortunately VMware Fusion 2 and later come with a vmrun command that gives us a basis to automate the taking of snapshots for backup purposes.
Here’s a small Ruby script that dynamically discovers all running VMs and, for each one, deletes the previous backup snapshot (if present) before creating a new one, unimaginatively named “backup snapshot”:
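#!/usr/bin/ruby

VERBOSE = true
BACKUP_SNAPSHOT_NAME = 'backup snapshot'

# Run a vmrun subcommand and return its output, minus the trailing newline.
def vmrun(subcmd, *args)
  # Single-quote all args (note: breaks on a path containing an apostrophe).
  args.map! {|arg| "'#{arg}'"}
  cmd = "'/Library/Application Support/VMware Fusion/vmrun' #{subcmd} #{args.join(' ')}"
  puts "$ #{cmd}" if VERBOSE
  cmdresult = %x[#{cmd}].chomp
  puts "=> #{$?} #{cmdresult}\n\n" if VERBOSE
  cmdresult
end

# The first line of `vmrun list` is a count header; the rest are .vmx paths.
# The `|| []` guards against empty output, where [1..-1] would return nil.
vmListStr = vmrun('list')
(vmListStr.split("\n")[1..-1] || []).each {|vmxPath|
  # Drop the previous backup snapshot (if any), then take a fresh one.
  vmrun('deleteSnapshot', vmxPath, BACKUP_SNAPSHOT_NAME)
  vmrun('snapshot', vmxPath, BACKUP_SNAPSHOT_NAME)
}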
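The script above single-quotes its arguments, which would break on a VM path containing an apostrophe. Here is a sketch of a more defensive variant of the two fiddly bits, parsing the list output and building the command line, using Ruby’s standard `shellwords` library. The helper names `running_vmx_paths` and `vmrun_command` and the sample paths are hypothetical. The sketch also assumes `vmrun list` prints a “Total running VMs: N” header followed by one .vmx path per line, which is the layout the script above relies on.

```ruby
require 'shellwords'

# Extract .vmx paths from the text printed by `vmrun list`.
# Assumed format: a "Total running VMs: N" header, then one path per line.
def running_vmx_paths(list_output)
  list_output.each_line.map(&:strip).reject do |line|
    line.empty? || line.start_with?('Total running VMs:')
  end
end

# Build a shell-safe command line. Shellwords.escape copes with spaces and
# apostrophes in paths, which plain single-quoting does not.
def vmrun_command(vmrun_path, subcmd, *args)
  ([vmrun_path, subcmd] + args).map { |a| Shellwords.escape(a) }.join(' ')
end

# Made-up sample output, for illustration only; nothing is executed here.
sample = "Total running VMs: 2\n" \
         "/VMs/Bob's Server.vmwarevm/server.vmx\n" \
         "/VMs/web.vmwarevm/web.vmx\n"

running_vmx_paths(sample).each do |vmx|
  puts vmrun_command('/Library/Application Support/VMware Fusion/vmrun',
                     'snapshot', vmx, 'backup snapshot')
end
```

The resulting strings can be passed to `%x[...]` exactly as the original `cmd` is.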
I’ve hooked up my servers’ nightly SuperDuper Copy Script to use this script (Options… > Advanced > Before Copy > Run shell script before copy starts).
Now my SuperDuper backups should all contain cleanly restorable copies of all my active VMs, no further configuration required.