Drive Crash 4: Now That's What I Call rsync
At the end of the 2016/2017 academic year, during weeks 9 and 10, Sam Willcocks, Tom Lee and Matthew Stratford decided it would be a good idea to tear everything out of the AV and Computing racks. Unfortunately, some of the servers didn't like being turned off and moved around and decided to fail.
The first server to complain was Backup, which reported a degraded array. This was caused by a High Impedance Air Gap between the hard drive power supply and the drive.
Shortly after the drive was plugged back into Backup, Web started to complain of a drive reporting SMART errors. That drive was replaced, and Backup, not to be outdone by Web, decided to destroy one of its drives. That drive was replaced too, and all was well.
The Attenborough Disaster
Matt and Tim Bradgate were happily patching cat5 when Tom (who was patching SDI in the AV rack at the time) noticed a suspiciously familiar beeping noise.
The general reaction was "oh crap, not again".
Edwin Barnes was asked to stop editing so we could shut down Attenborough, and the dead drive was identified as one of the OS disks. The disk was replaced and the array began to rebuild, so Tim turned his attention to determining why the drive had failed, while Matt and Tom went back to patching the AV rack.
Tim determined that the drive was healthy, which was slightly concerning, but more concerning was the beeping that started to come from Attenborough.
At this point #Computing moved up a level of emergency.
Attenborough disaster
- Patching the AV Rack
- Tom hears a beeping
- OhShit.jpeg
- Dead OS drive
- No problem, we'll bung another drive in
- After sorting through all the other dead OS drives, we found a healthy (we think) 500GB drive
- Tried rebuilding onto the new drive
- RAID status optimal
- Reboot
- RAID status degraded
- Old drive passing SMART tests
- Bung old drive into server
- RAID card crashes
- And again
- Optimal
- Degraded
- Katherine Bell and Hui-Ling Phillips arrive with biscuits
- RAID card crashes
- Try rebuild again
- Phone call to Sam Willcocks, who is in Sheffield after a job interview in London
- Discussions about transferring data to Bruce
- Decision is made to transfer data off Attenborough to Backup
- Pirates of the Caribbean soundtrack used to complement the atmosphere (and keep spirits high)
- Pizza ordered
- Debian installed on another 500GB drive
- Given the host name "TomScott" due to the new OS disk being a temporary bodge
- ZFS pool mounted
- The rsyncing begins
- We rsync the most recent (and most important) productions to Backup
- Edwin Barnes starts transferring footage from Pending Edits backup to Edit 2 to edit
- All seems well
- Tim Bradgate starts working on cron jobs to auto backup
- Tom starts working on a media cache PC/Ingest station
- Matt continues working on patching/routing
- Backup starts reporting SMART errors
- Goddammit
- Scramble to retrieve data from Backup onto Edwin's SSD and one of YSTV's SSDs
- Tech team take a break to have a shower while they have an opportunity
Attendees
- Katherine - Nervous and there
- Hui-Ling - Cable Monkey
- Tom - Chief Bodger
- Tim - Linux Wrangler
- Matt - Cat5 Patcher
- Edwin - Stressed Out Editor