I’ve been following (rather sporadically) some of the information about copyright restructuring (aka reform) in New Zealand. Unfortunately, I’ve been rather too busy lately to be proactive, so I have turned into a reactive person.
Over the weekend I discovered that the National Library of New Zealand had been trawling the net and bulk downloading (sorry, “harvesting”) every publicly accessible website that falls under the .nz country code, as well as certain other websites that are owned by New Zealanders or legally considered New Zealand publications.
Their explanation:
“The internet is always changing, and uses a myriad of technologies, so it is impossible to make a perfect copy. Despite this, we are hoping to harvest 100 million URLs during October 2008, giving us a snapshot of the internet at that time.”
I also discovered that if I took the usual measure to avoid being crawled and archived, a robots.txt file on my website, the National Library would ignore my wishes (or rather, my right to object to outside crawling of my site) and take it anyway, “to enhance the likelihood of our being able to harvest as full a snapshot of the .nz domain as possible”.
That’s OK though because I know that the National Library has my best interests at heart and knows better than I do about the archiving of my website. They know that although I might consider my few pages of rants, raves and links to be temporary, my grandchildren will want to benefit from the crazy thoughts that I had way back in October 2008.
But what about the people who have developed applications that generate dynamic webpages from entered data? I’ve seen countless webpages like this. The astute developer will have done the right thing and placed an anti-crawl robots.txt file on their website to stop crawlers from generating huge amounts of traffic on their server. I know you can opt in to opt out of being crawled, but you have to have seen it at the right time, followed up the user agents, worked out what was happening and put in your request. I didn’t notice until it was a bit late. Ho hum, life got in my way.
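For anyone who hasn’t met robots.txt: it is just a plain text file of requests, and whether those requests are honoured is entirely up to the crawler. Here is a minimal Python sketch using the standard library’s urllib.robotparser. The site (example.nz), its dynamic paths (/search/ and /entries/) and the crawler name (ExampleHarvester) are all hypothetical, made up for illustration; they are not the Library’s actual setup.

```python
import urllib.robotparser

# Hypothetical robots.txt for a site whose dynamic, data-driven pages
# live under /search/ and /entries/ (paths are illustrative assumptions).
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /entries/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt above permits user_agent to fetch url.

    This check is purely advisory: a crawler only obeys robots.txt if
    its operator chooses to run a check like this before fetching.
    """
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleHarvester" is a hypothetical crawler user agent.
    for url in ["https://example.nz/about.html",
                "https://example.nz/search/?q=kiwi"]:
        print(url, is_allowed("ExampleHarvester", url))
```

The static page comes back as allowed and the dynamic search page as disallowed. A harvester configured to ignore robots.txt simply never makes the can_fetch check, and that is all it takes to crawl those dynamic pages anyway.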
I’m in two minds about this idea of the National Library archiving everything in the .nz webosphere. As much as anyone, I want the stories collected and collated for the future, but couldn’t they respect some of our wishes for autonomy over our material? I have no robots.txt file on any of my websites at the moment, but as a relatively savvy citizen I do want the right to have that option. As it happens, the biggest internet archive in the world, the Internet Archive, appears to agree with me: it respects robots.txt exclusions.
EDIT: Aha! A rewrite rule with a Rickroll. Nice. Very. Nice.