I have been saving my online data pretty much for three years. Most of it is backed up in email via some fashion or another. Some of it is stored in my blog. All of it is important. It is my history of that time span. It’s my thoughts, my communications, my media consumption. It is not a complete picture since it is only as good as what I place out in public or email via private. It is however, a picture that echoes who I am. It is something that I will continue doing and will hopefully leave in good company to my family after I am gone in buried. They may however consider it trash and the only glimpses further generations out will get of me is via the web wayback machine – or the future equivalent. This opens up the question, what good is this data to me now.
I’ve been trying to find a way (other than WordPress) to throw everything into a single database of some sort and see what type of information I can get out of it about myself. My own naiveté is that no good software exists for that. You can export your twitter stream and get key word analysis done after a long drawn out process of curating down the data, but shouldn’t this be something that can be done automatically? Facebook, Twitter, Google, and every other company underneath the sun that has access to your personal data has software that can do this. Why? Am I the only one who cares about looking and analyzing my own data? Isn’t this type of thing going to be the equivalent to going to the psychologist today? You would be able to self exam yourself by narrowing down queries.
Me: “Database, tell me about when I was three”
Database: “You have these 15 marked photographs from when you were three. You also have written these 25 blog articles that mention age three. Here are major life events that occurred to you when you were three. Here are the major world events that happened when you were three.”
Me: “Subtract “my son” from the blog articles”
Database:”13 blog articles left”
This would need some major tagging to cross-reference information to get stuff out. I would also like it to have hooks into other databases for simple cross-reference information that would be relational to the data (Wikipedia, IMDB, etc.). I know that even if I had a generic database and fed it movie I had seen, it wouldn’t be able to make recommendations without an algorithm that could calculate it. I’m not worried about recommendations so much at this time as a way to sort. It should however be able to cache data in from other databases, so if I wrote about watching Star Wars – it could bring up cultural significance and the IMDB stats. This way you could see how a certain book, movie, or song had relevance in changing your path slightly 3, 5 or 20 years down the road.
This database should also be self growing – learning without manual input. It should be pulling in data from RSS feeds, google search, your email account, your documents, and anything else you can think that may have information about you. Your database in the end should know more information about you than you know about yourself. It knows the emails you haven’t replied to, and may know which ones are important. At the least it could tell you how often you talk about “paranoia” in your email conversations.
Why can’t the average person start taking back his own data. Data portability is great. The current state of data portability (very limited) doesn’t offer much you can do with the data. It allows you to take it to another service. It allows you to back it up locally. Unless you are a ubergeek there isn’t much you can do with it yourself. I’ll keep watching for this personal databasing software that one day my mythically rise from the sea of internet web sites. Knowing most people and how much they care about this stuff though – I won’t hold my breath.