Drive Crash 2: The fscking

From YSTV History Wiki
Revision as of 17:33, 2 April 2014 by Rw776 (talk | contribs) (Rw776 moved page Drive Crash 2 to Drive Crash 2: The fscking)

During Easter 2014, then-Computing Officer Lloyd Wallis did some semi-planned and somewhat-thought-through work to increase the storage available in YSTV and improve its resilience.

Potentially also to cover in this document is the shortly-proceeding replacement of fsrv's OS drive after a kerfuffle.

The Filling of Fsrv

In the months leading up to the Easter break, there were several occasions where the Pending Edits share on fsrv was completely full. This needed fixing.

So, a plan was devised to add another TB of storage to both fsrv and backup and grow their respective RAID5 arrays. We were in possession of one spare 1TB SATA drive, so at the start of Easter it was proposed that we grow the fsrv array during the holiday while it was not in heavy use, then pass the money to grow backup at the beginning of the Summer term.
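As a rough illustration of why one extra 1TB drive yields one extra TB in a RAID5 array, here is a minimal sketch of the capacity arithmetic. The function name `raid5_usable_tb` and the drive counts in the example are assumptions for illustration; the page does not record how many drives fsrv actually had.

```python
def raid5_usable_tb(drive_count: int, drive_size_tb: float) -> float:
    """Usable capacity of a RAID5 array built from equal-sized drives."""
    if drive_count < 3:
        raise ValueError("RAID5 needs at least three drives")
    # One drive's worth of space is consumed by parity, spread across all drives.
    return (drive_count - 1) * drive_size_tb


# Hypothetical drive counts: growing a 4-drive array of 1TB disks to 5 drives.
before = raid5_usable_tb(4, 1.0)
after = raid5_usable_tb(5, 1.0)
print(f"before: {before}TB, after: {after}TB, gained: {after - before}TB")
```

The same formula applies in the other direction: three 1TB drives in RAID5 give 2TB usable.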

Of course, as with any plan to touch the file server since the computing team played Drive Crash classic, the first step was a complete test restore of the data on backup, to ensure that if it all went wrong we still had a copy.

The Borking of Backup

Of course, it turned out that running a restore of the entirety of Finished Shows did not go well. After being left running overnight, the restore was still copying the first file, 2013's Live on the Lawn, having successfully copied 430GB of the file so far. This was quite worrying as the file was only 2GB.

So, after a few more tries and a few more wiggles, scrapping BackupPC and replacing it with a flat copy of all the files sounded like a good idea.
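A flat copy of this kind could be sketched as follows. This is only an illustration, not the commands actually used: the helper names `sha256_of` and `flat_copy_verified` are hypothetical, and the size and checksum check is an assumed extra step that would catch a copy that does not match its source.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large video files are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def flat_copy_verified(src_root: Path, dst_root: Path) -> list:
    """Copy every file under src_root into dst_root, keeping the directory
    layout, and return the source paths whose copies failed verification."""
    mismatched = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies contents plus timestamps
        # Cheap size check first, then a full content comparison.
        if dst.stat().st_size != src.stat().st_size or sha256_of(src) != sha256_of(dst):
            mismatched.append(src)
    return mismatched
```

An empty return value means every copied file matched its source by size and SHA-256.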

Then at some point, we noticed something interesting with one of the files - the system would sit for ages on that one file. Applying smartctl gave us good news - a potentially failing hard disk. By removing said disk and rebuilding as a 2TB array, the copy could continue. Later testing of the HDD showed that it was not actually failing.

After ~18 hours of copying, a fresh and working backup was obtained.