Drive Crash 4: Now That's What I Call rsync: Difference between revisions

From YSTV History Wiki
Attenborough disaster
  - Patching the AV Rack
  - Tom hears a beeping
  - OhShit.jpeg
  - Dead OS drive
  - No problem, we'll bung another drive in
  - After sorting through all the other dead OS drives, we found a healthy (we think) 500GB drive
  - Tried rebuilding onto the new drive
  - RAID status optimal
  - Reboot
  - RAID status degraded
  - Old drive passing SMART tests
  - Bung old drive into server
  - RAID card crashes
  - And again
  - Optimal
  - Degraded
  - [[Katherine Bell]] and [[Hui-Ling Phillips]] arrive with biscuits
  - RAID card crashes
  - Try rebuild again
  - Phone call to [[Sam Willcocks]], who is in Sheffield after a job interview in London
  - Discussions about transferring data to [[Bruce]]
  - Decision is made to transfer data off [[Attenborough]] to [[Backup]]
  - Pirates of the Caribbean soundtrack used to complement the atmosphere (and keep spirits high)
  - Pizza ordered
  - Debian installed on another 500GB drive
  - Given the hostname "TomScott" due to the new OS disk being a temporary bodge
  - ZFS pool mounted
  - The rsyncing begins
  - We rsync the most recent (and most important) productions to Backup
  - [[Edwin Barnes]] starts transferring footage from the Pending Edits backup to Edit 2 to edit
  - [[Tim Bradgate]] starts working on cron jobs to auto backup

Latest revision as of 11:49, 25 October 2018

At the end of the 2016/2017 academic year, during weeks 9 and 10, Sam Willcocks, Tom Lee and Matthew Stratford decided it would be a good idea to tear everything out of the AV and Computing racks (See The Great Tech Redo 2017). Unfortunately, some of the servers didn't like being turned off and moved around and decided to fail.

The first server to complain was Backup, which reported a degraded array. This was caused by a High Impedance Air Gap between the hard drive power supply and the drive.

Shortly after the drive was plugged back into Backup, Web started to complain of a drive reporting SMART errors. This drive was replaced and Backup, not to be outdone by Web, decided to destroy one of its drives. This drive was replaced and all was well.

The Attenborough Disaster

Matt and Tim Bradgate were happily patching Cat5 when Tom (who was patching SDI in the AV rack at the time) noticed a suspiciously familiar beeping noise.

The general reaction was "oh crap, not again".

Edwin Barnes was asked to stop editing so we could shut down Attenborough, and the dead drive was identified as one of the OS disks. The disk was replaced and the array began to rebuild, so Tim turned his attention to determining why the drive had failed, while Matt and Tom went back to patching the AV rack.

Tim determined that the drive was healthy, which was slightly concerning, but more concerning was the beeping that started to come from Attenborough.

#Computing moved up a level of emergency.

4 Attenborough Dies.png

Rebuilding onto the replacement OS drive failed, so the team decided to give the old "failed" drive a try. This started to rebuild fine, so everyone went to the pub.

One Courtyard meal later, the OS drive claimed to be rebuilt, but just to be sure, Attenborough was rebooted into the RAID BIOS. Much to the disappointment of all present, Attenborough promptly started to beep and reported the RAID array as degraded. Just in case something other than the drive had failed, Attenborough was set to rebuilding his RAID array again. This failed.

At this point there was only one thing to do: call Sam. Sam suggested installing Ubuntu on another, non-RAID drive in order to dump all of the data on Attenborough onto Backup.

The new Ubuntu-based temporary Attenborough was given the hostname "TomScott" after York alumnus Tom Scott, who is in part known for bodging together the Emoji keyboard.

Hui-Ling Phillips and Katherine Bell had arrived bearing the gift of biscuits, and everyone waited for data to start pouring onto Backup through the magic of rsync, with the Pirates of the Caribbean soundtrack playing in the background to match the atmosphere and keep morale high.

Pizza was then ordered.

The decision was made to prioritise current and paid productions from Pending Edits during the transfer, so these projects were synced to Backup first. Edwin then set about copying these projects from Backup to his SSD so that YSTV definitely, absolutely, without a doubt had a copy. In the meantime Kenric, Hui-Ling, Katherine and Edwin started crimping some Cat5 cables to help while the computers were being dealt with.
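
For anyone facing a similar evacuation, the prioritised sync can be sketched roughly as follows. This is an illustration rather than a record of the commands actually run on the night: the paths, the `backup:` destination and the project names are all made up, and the only rsync options used are `-a` (archive mode) and `--partial` (keep partially transferred files). The script only builds and prints the commands, so nothing is copied until someone runs them.

```python
import shlex


def build_rsync_commands(source_root, dest_root, priority, everything):
    """Return one rsync command per project, with priority projects first.

    All names and paths passed in are hypothetical examples.
    """
    # Priority projects keep their given order; the rest follow, skipping duplicates.
    ordered = list(priority) + [p for p in everything if p not in priority]
    commands = []
    for project in ordered:
        src = f"{source_root.rstrip('/')}/{project}/"
        dst = f"{dest_root.rstrip('/')}/{project}/"
        # -a preserves permissions and timestamps; --partial keeps partially
        # transferred files so a rerun does not start them from scratch.
        commands.append(["rsync", "-a", "--partial", src, dst])
    return commands


if __name__ == "__main__":
    # Hypothetical layout: adjust to wherever the pool is actually mounted.
    for cmd in build_rsync_commands(
        "/mnt/attenborough/pending_edits",
        "backup:/srv/pending_edits",
        priority=["paid_production", "current_production"],
        everything=["old_project", "paid_production", "current_production"],
    ):
        print(shlex.join(cmd))
```

Running one command per project, rather than a single sync of the whole share, means the important footage is safely across before the rest is even attempted, which matters when the destination might not live long enough to finish.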

All seemed well, so the Tech/Computing teams went back to completing small jobs about the studio. Tim started working on cron jobs to automate backups, Tom started to assemble a media cache for the edit PCs as a way to combat drive failure, and Matt continued work on patching/routing various cables. All was fine, until Backup started reporting SMART errors.

Goddammit.

Now there was a mad scramble to retrieve data from Backup onto Edwin's SSD.

As there was not much else to do other than wait for files to sync and pray that Backup lived long enough, Tom took this opportunity to go home, have a shower, and change, shortly followed by Katherine and Hui-Ling. Meanwhile, Matt and Tim pulled Backup out of the Computing rack. Upon Tom's return, Tim took his shift of showering and changing, and Tom pulled the 2TB drive out of Obriain to be sacrificed to the great Backup RAID array. Matt and Tom took the opportunity during the rebuild to continue the ongoing attempt to tidy up the studio.

After Tim's return, he and Tom continued setting up the media cache, while Tom also continued his effort to write the wiki article for the fourth in the series of Drive Crashes as the crash unfolded around him.

Attendees and Roles

Person      Role
Katherine   Nervous and there
Hui-Ling    Cable Monkey
Tom         Chief Bodger
Tim         Linux Wrangler
Matt        Cat5 Patcher
Edwin       Stressed out Editor
Sam         Remote Tech Support
Kenric      Crimping Party Starter

Drive Crash 4 as Told by #Computing

1 Tempting Fate 1 Rob.png

2 Tempting Fate 2 Rob.png

3 Tempting Fate 3 Matt.png

4 Attenborough Dies.png

5 Optimism Peter.png

6 Realism Edwin.png

8 Optimism Tim.png

9 Backup Kills A Drive.png

10 All Is Well.png

Lessons Learned

  • Keep regular backups
  • Don't unplug the servers
    • No, that's not a good reason to
    • Seriously, they will fail
  • Drives cling to life until powered off (mostly)
  • Bring biscuits
  • Sleep is good

The Final Fatality

After returning home to recover from the ordeal, Tom sat down at his desktop to find it frozen. After months of neglected maintenance and several days of being left on, the OS drive in Tom's desktop had failed. The final victim of Drive Crash 4.