Over the years with a digital camera I’ve never got anywhere near a professional standard, with proper workflow tools which allow me to sift out the good from the bad (and dispose of the latter). This has left me with approaching 500GB of image data from all manner of places & events: weddings, parties, trips abroad, my family growing up, aircraft, racing cars and, in more recent times, a growing set of attempts to be arty or “wildlifey”.
That’s a lot of data. And it’s growing – every photo I take ends up on there. Last time I took my camera out (to an air show) I came back with almost 3000 shots!
A few weeks ago, despite taking regular backups to an external disk, my wife and I realised that we’d really quite like to be able to back this up off-site. When I checked the SMART power-on time data of the drives it was somewhere in the region of 54000 hours each. Six and a bit years. Ouch! Can anyone smell an MTBF period approaching?
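For reference, that figure comes straight out of the drives’ SMART attribute table. Something like this would pull it out (smartctl is part of smartmontools; /dev/sda is an example device name, and reading SMART data normally needs root):

```shell
# Print the drive's Power_On_Hours SMART attribute; the raw value is
# the tenth whitespace-separated field of the attribute line.
smartctl -A /dev/sda | awk '/Power_On_Hours/ {print $10 " hours"}'
```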
So… I could have taken my NAS to work, where we have $SILLY_GIGABITS_PER_SEC internet connectivity, but I’m loath to power it down and move it given the age and runtime of the drives. Thankfully my ISP (hi Virgin Media, I’m looking at you here) has some fairly preposterous packages available with reasonably decent upload speed, subject to usage caps – which meant that, despite the prospect of it taking days, it was possible to do the upload to some cloud storage provider from home.
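As a sanity check on “taking days”: the back-of-envelope arithmetic looks like this (the 12 Mbit/s upstream rate is an assumed figure for illustration, not the actual package speed):

```python
# Rough upload-time estimate for the 500GB archive at an assumed
# upstream rate; real-world throughput will be lower than this.
size_gb = 500                   # total photo archive
upload_mbit_s = 12              # assumed upstream rate
megabits = size_gb * 8 * 1000   # GB -> megabits (decimal units)
seconds = megabits / upload_mbit_s
print(f"about {seconds / 86400:.1f} days at full rate")
```

At full rate that works out to nearly four days, so once overheads and usage caps bite, a week or more is entirely plausible.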
Google’s storage pricing policy changed recently (although everyone else is playing race to the bottom with them) to provide 1TB of storage for a relatively reasonable USD9.99 per month. That brought a large amount of storage down into “accessible” land, so I decided (having been a Google Apps for Domains user for years, since it was free) to shell out for that after doing some testing.
But… How on earth to sync data from a NAS without having an intermediary client to sync it?
My tool of choice at work for that would be rsync – I’ve shifted giant quantities of data from place to place with that, but it has no native support for cloud “filesystems” (which actually aren’t filesystems; they’re almost all massive object stores with a filesystem abstraction at the client end). A bit of searching around kept pointing me back to one place, a tool called gsync.
(there are two tools by that name, actually. This one is written in Python and seems reasonably well maintained)
On a CentOS box I ran
[graeme@centos ~]$ sudo yum -y install python-pip
…and then, because it’s not just available on github but also in the Python Package Index…
[graeme@centos ~]$ sudo pip install gsync
Voila! A working version – or so I thought. I did a quick test, which failed, and discovered I might need a few patches to make it all work properly. Thankfully, github being what it is, they were mainly in the issues list for the project so I could pick them out and make the app work as I expected. I’ll put some more details in here if anyone asks about them.
Having done some tests with small datasets limited to one, tens or hundreds of files, it transpired that one thing I was missing – badly – was “magic” to do with MIME types (and yes, it is called “magic” – look in /usr/share/magic or similar on a Linux system and see this Wikipedia entry). Files kept being written up to Google Drive repeatedly, so I had to use the --debug switch to find out why, and largely found lines telling me that the local MIME type – often “application/octet-stream” – didn’t match the type that Google had provided at the far end – say “image/x-panasonic-rw2”. If they didn’t match, the whole file got copied over again.
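The mismatch is easy to reproduce with nothing more than Python’s standard library (gsync itself goes through a magic-based lookup, so this is an approximation of the failure mode, not its exact code path):

```python
import mimetypes

def local_mime(path: str) -> str:
    """MIME type as a client without camera-format magic would see it."""
    return mimetypes.guess_type(path)[0] or "application/octet-stream"

# A JPEG is in every default type table, so local and remote types
# agree and the file is skipped; a Panasonic .rw2 usually isn't, so
# it falls back to application/octet-stream – a mismatch with the
# "image/x-panasonic-rw2" Google reports, triggering a re-upload.
print(local_mime("IMG_0001.jpg"))  # image/jpeg
print(local_mime("IMG_0002.rw2"))  # typically application/octet-stream
```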
Digital cameras produce a plethora of different media types – JPEG, a bazillion different variations of RAW, and even more variations of video files. MPEG4, AVI, Quicktime MOV, MPEG2TS, AVCHD… loads of them, some of which are actually containers with streams of other formats inside. What a pain.
All things being equal, one of the things you get with a commercial OS such as Windows or MacOS X is the magic data already included and regularly updated, both by the OS vendor and application software vendors. The Linux world, of course, is a little different in that most distributions come with a file – /etc/magic – to which you can add your own definitions. There are many out there, which a judicious bit of searching will turn up for you. I found a few collections for digital media formats so added them to my /etc/magic, and the problem was solved.
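An entry along these lines is what went in (this is a sketch for Panasonic RW2 files, whose signature bytes are “IIU\0”; it’s written to a private magic file here, but the same two lines can be appended to /etc/magic – check the signature against your own files before trusting it):

```shell
# Custom magic entry: match the RW2 signature at offset 0 and attach
# the MIME type via the !:mime directive from magic(5).
cat > camera.magic <<'EOF'
0	string	IIU\0	Panasonic RW2 raw image
!:mime	image/x-panasonic-rw2
EOF
# verify with:  file -m camera.magic -b --mime-type IMG_0001.RW2
```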
Then it was time for a first sync.
It took over a week!
But it worked, with a few wrinkles:
- it doesn’t catch 500 Internal Server Errors from Google very gracefully – rather than retrying, it quits
- predictably it can consume a lot of RAM
- I had to stop and add a few more MIME types
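The first wrinkle can be papered over from the outside: since gsync quits rather than retrying, a small POSIX-shell wrapper can do the retrying for it (the attempt count and delay here are arbitrary choices, and the gsync invocation in the comment is an example):

```shell
# Retry a command until it succeeds, up to 5 attempts, sleeping $1
# seconds between tries; a transient Google 500 then costs one retry
# instead of killing the whole run.
try_sync() {
    delay=$1; shift
    attempts=0
    until "$@"; do
        attempts=$((attempts + 1))
        if [ "$attempts" -ge 5 ]; then
            echo "giving up after $attempts failed attempts" >&2
            return 1
        fi
        echo "attempt $attempts failed, retrying in ${delay}s"
        sleep "$delay"
    done
}

# e.g.:  try_sync 60 gsync -r /mnt/HD_a2/photos drive://photos
```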
So the next question was: would it run on my DNS323?
And the answer…
A huge, great, big fat hairy yes.
I had to do some more tweaking with the Alt-F package list: installing Python, using it to bootstrap pip so I could get all the dependencies installed (like the third-party magic.py), adding entries to /etc/magic, and making a small final tweak to the underlying gsync code so that it actually looks at that file (because it wasn’t).
It’s now running – albeit at about half the speed of the 1.6GHz CentOS box it was running on, but given that the device has only 64MB RAM and a nominal 500MHz Marvell Feroceon ARM CPU (reporting 332 BogoMIPS at boot) I’m very pleasantly surprised that it even starts up. I’ve had to wrap it in a script which basically says “foreach item in a list of directories in a predictable directory structure; do gsync blah…; done” rather than “gsync $massive_heap_o’data” to keep the memory usage down (and the logging easier to digest), but running it is.
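For the curious, the wrapper amounts to something like this (the paths, the drive:// destination and the -r flag are assumptions based on gsync’s rsync-like usage – adjust to your own layout):

```shell
#!/bin/sh
# Sync one top-level directory at a time rather than the whole archive,
# keeping gsync's memory footprint survivable on a 64MB box and giving
# one digestible chunk of output per directory.
sync_photos() {
    src=$1
    for dir in "$src"/*/; do
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        echo "=== syncing $name ==="
        gsync -r "$dir" "drive://photos/$name"
    done
}

sync_photos /mnt/HD_a2/photos
```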
It’ll be interesting to see how long it takes before it finishes. Or crashes. Either way I’ll come back and update this post in a day or two.