Personal Data Curation – Part 2

January 5, 2011

by — Posted in Personal Writing

I did the first part of “Personal Data Curation” back in October and I promised a follow-up.  Three months later, here it is.  First, a final count of the email.  I sent out this tweet yesterday:

Emails saved from 2010 – 21209- size 1979536 KB – this is after spam filtering, deduplication, and removing and all mailing list emails…..

So over the last year I have saved just around 2GB of data.  You can understand why I can’t throw it all into a single inbox.  I save around 1.5k-2k emails a month on average.  Because the archive is split by month, it is easy to add up the sizes and get a total.  I haven’t looked at how many emails I have sent, for two reasons.  The first is that I have a few automated emails, collections of RSS feeds that I generate and save to email (my tweets, Facebook updates, etc.), so the count is not going to be correct without a bit of digging.  The other reason is I’m scared – I don’t really want to know how many emails I have written.
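For anyone who wants to check the math on the tweet, here is a quick sketch in Python.  The two totals come straight from the tweet; the function name and the assumption that a "KB" is 1024 bytes are mine, so treat the size figure as approximate.

```python
# Totals taken from the tweet above.
TOTAL_EMAILS = 21209
TOTAL_KB = 1979536


def summarize(total_emails, total_kb, months=12):
    """Return the monthly average and the total size in GiB.

    Assumes 1 KB = 1024 bytes, which may not match how the mail
    client reported the size.
    """
    return {
        "emails_per_month": total_emails / months,
        "size_gib": total_kb / 1024 ** 2,
    }


if __name__ == "__main__":
    stats = summarize(TOTAL_EMAILS, TOTAL_KB)
    print(f"{stats['emails_per_month']:.0f} emails per month")
    print(f"{stats['size_gib']:.2f} GiB saved")
```

The monthly average lands inside the 1.5k-2k range, and the total comes out a little under 2GB, which fits the "just around 2GB" estimate.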

Moving on…

I have consolidated a few of my sets on Flickr by removing the sets for photos taken on a certain day.  For example, if I went to the park on May 5, I would create a set named 05-05-10.  I would then upload all the pictures to that set.  That was all fine and dandy until I reached the point where I was dealing with hundreds of sets.  These hundreds of sets made it even harder to use third-party tools, and they were also extraneous information.  The photos already include the metadata of when they were taken, so I really had no reason to narrow them down into those sets for organization.  I moved all those sets into the same Year – Month arrangement I used for my email.  I made a collection for each year, 2006-2011.  Under each collection there are twelve sets (January 2010, February 2010, etc.).  I have managed to combine about 300 sets into 60 so far.
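To show the idea, here is a small Python sketch of how day-named sets could be mapped onto the Year – Month arrangement.  It only builds a plan in memory and does not talk to Flickr at all; the function name and the sample set names (other than 05-05-10) are made up for illustration, and it assumes every daily set follows the MM-DD-YY pattern.

```python
from datetime import datetime

# Spelled out so the result does not depend on the system locale.
MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def plan_consolidation(daily_set_names):
    """Group MM-DD-YY set names by year collection and monthly set.

    Returns {year: {"Month YYYY": [daily set names, ...]}}.
    """
    plan = {}
    for name in daily_set_names:
        day = datetime.strptime(name, "%m-%d-%y")
        monthly_name = f"{MONTHS[day.month - 1]} {day.year}"
        plan.setdefault(day.year, {}).setdefault(monthly_name, []).append(name)
    return plan


if __name__ == "__main__":
    # Hypothetical daily sets; only 05-05-10 appears in the post.
    sets = ["05-05-10", "05-19-10", "06-01-10", "12-25-09"]
    for year, months in sorted(plan_consolidation(sets).items()):
        print(year)
        for monthly_name, originals in months.items():
            print(f"  {monthly_name}: {', '.join(originals)}")
```

Each year key corresponds to a collection and each "Month YYYY" key to a set inside it, so three May and June 2010 daily sets collapse into two monthly ones.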

I still have other sets outside of these, and I still need to go through and clean up the metadata.  I need to tag photos.  I need to properly title photos.  I just need to dig in more and organize.  I’m sure you can understand how scary a prospect that is.  I’m looking at 15k photos.  It will take me weeks, but I need to dig through a little at a time.  Heck, I’m pushing 4k pictures that include my son alone.

Once everything is nice and organized, I plan on using Bulkr to download all the photos.  Then I’ll have an organized local copy.  At the end of each month, after all the photos are uploaded, I can just re-run Bulkr and update my local cache.

The problem really isn’t getting organized; the problem is staying organized by keeping on top of all of this data.  Once I have the process in place it should all be a piece of cake.  Email has been organized and archived like clockwork for months now and it is fabulous.  Everything is easier to find and reference.  In the next part I’ll give you an update on how the pictures are going and on document organization.
