The importance of preservation

This file was last modified on January 01 1970 00:00:00.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512


The Great Archive

I am sure you are familiar with the tale of the Great Library of Alexandria. If not, it goes as follows: after Alexander the Great founded the city (guess why it's called "Alexandria"), he fell in love with it and wanted to make it special. His vision - though it only really came to fruition when his successor Ptolemy II took power - was to have Alexandria become a cultural powerhouse where all knowledge in the ancient world could be found. This would attract scholars to it and make the city even more important than it already was. The way he planned to achieve this was by having a library that would collect any written works it could and archive them.

Alexandria, being one of the most important points of commerce in the Mediterranean, was bound to have a lot of ships passing through, and so the local government would start requiring all ships that stopped by to hand over all of the scrolls they had on board for copying. Then a scribe would copy the work, store the original in the Great Library and hand the ship the copy. Yeah, that is how the story goes. It is said that great minds like Archimedes and Eratosthenes, among other scholars, studied in the halls of what quickly became known as "the capital of wisdom".

However, as the story goes, during the Roman civil war the library suffered greatly. During a counter-attack, Julius Caesar ordered the ships on the docks of Alexandria to be set on fire in a strategic move. Unfortunately, the fire spread from the docks to the library, burning all the books inside (scrolls, really; the style of book we have today is called a "codex").

Authors in the following centuries regarded this as one of the greatest losses ever suffered by humanity, and while both the story and its conclusion aren't true (as in, there's a lot more nuance to it than people think), I think the tale of the Great Library is incredibly useful.

The Library of Internetia

The Internet Archive was founded on May 10th, 1996. Even at this early point of the internet, there was a need for - or at least an academic interest in - archiving the internet and its websites. Especially back in the '90s, Web 1.0 was novel and rapidly changing. The World Wide Web was special, but as would soon become apparent, lots of pages were ephemeral. People lose interest in their websites, so they stop paying for hosting (be that hosting on some other computer or just economizing on bandwidth when self-hosted). Websites and data in general can be lost in an instant, and so it became crucial that someone archive this new special thing we had in our hands. The Internet Archive has since expanded into other territories, but the original intent to archive as much of the internet as possible stayed the same.

However, this modern Library of Alexandria is not immune to fire either.

Keeping the Internet Archive up is expensive. They have over 100 petabytes of data (that's 100,000 drives of 1TB). As far as I can tell, this is not counting redundancy measures to prevent data loss when faced with hard drive rot or an attack, which adds up (assuming they're even taking HD rot into account - if they aren't, a lot of the archive could already be lost to time). Plus all the bandwidth that is necessary when you're running a website with as much traffic as the Internet Archive. Plus the servers themselves have to be fast enough for the task - bandwidth is not the only speed bottleneck. These are all expenses the Internet Archive has to pay for, and all of them are growing rapidly (let's not forget that they're not erasing archived snapshots unless the original owner demands it).
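
To get a feel for the scale, here is a rough back-of-the-envelope sketch of the drive count. It assumes decimal units (1 PB = 1000 TB, the way drive vendors count), and the number of full copies and the larger drive size are hypothetical parameters of mine, not figures published by the Internet Archive.

```python
import math

# Assumption: decimal units, as used on drive labels (1 PB = 1000 TB).
TB_PER_PB = 1000


def drives_needed(archive_pb: float, drive_tb: float, copies: int = 1) -> int:
    """Number of drives needed to hold `copies` full copies of the archive."""
    total_tb = archive_pb * TB_PER_PB * copies
    # Round up: a partially filled drive is still a drive you have to buy.
    return math.ceil(total_tb / drive_tb)


print(drives_needed(100, 1))             # one copy on 1TB drives
print(drives_needed(100, 1, copies=2))   # hypothetical: two full copies
print(drives_needed(100, 20, copies=2))  # hypothetical: two copies on 20TB drives
```

Bigger drives shrink the count, but every extra copy multiplies it, and none of this covers the servers, power or bandwidth around those drives.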

On top of the constant expenses, the Internet Archive has also come across its own set of very, very expensive lawsuits. Hachette v. Internet Archive, where the registered library found a way to work as a library online and got sued for it, and the time Universal Music Group sued them for preserving phonograph records, were and still are (that last one is still ongoing) very expensive endeavours.

Both of these lawsuits happened during a time when the Internet Archive, which within hacker culture is a sort of "hallowed ground", ironically got hacked. They were already being hit with DDoS attacks before this and have been since, but on the night of October 9th 2024 the website got a JavaScript pop-up saying "Have you ever felt like the Internet Archive runs on sticks and is constantly on the verge of suffering a catastrophic security breach? It just happened. See 31 million of you on HIBP!"

After this the Internet Archive got shut down for a few days, and in the days following the attack the database became read-only. No new content could be archived, so websites that could have been lost were not saved, or at least not in their current forms.

They have since started to take pre-emptive measures to counter future attacks, and the site is back up to full functionality, but what would have happened if the malicious actors had actually deleted the entire database?

Beyond Alexandria - True Preservation

As I have explained in the side-note, the Great Library was not as big of a loss as the stories tell us. However, it tells us that whenever we concentrate knowledge in one major archive, we stand to have a single point of failure. For the ancient world, it was the fire that took with it most of the library's records. For the modern world, it might be a particularly malicious attack on the Internet Archive, be that through a script kiddie that wanted to put a target on his back, or a copyright holder who could not care less about its cultural impact. Whatever the case, we should fight to keep the Internet Archive up for as long as possible, but we should not assume it will exist forever. We should take matters into our own hands.

One of the reasons the loss of the Great Library was not as big as told is because there were other libraries at the time. Some attempted to be competitors, some were there first, and some were just normal libraries that had the books. If the Great Library burnt down, the others were unaffected and the works were bound to also be found somewhere else.

In the end, true preservation did not come in the form of keeping alive one archive of all human knowledge but instead in the form of many smaller archives, each with only a small part of human knowledge. Of course, it is still a great loss if the one big archive is lost forever, but if there are other, smaller archives known to hold the same works, that information can outlive both the original and the big archive.

This concept was understood on the World Wide Web back in the late '90s and early 2000s - maybe not in such a poetic way, and maybe not with a bird's eye view, but it was definitely understood that if there was something cool out there to check out, you had to preserve it or else you might never see it again.

Look in the mirror - What do you see?

Eventually, people started hosting copies of websites or files they found interesting or important. These clones were called "mirrors". There were many reasons for these: some more and some less reprehensible.

First I want to go over the most common use these days: offloading bandwidth and downloads. These mirrors will usually be controlled by the same person who owns the site, but host files for you to download on other servers, to keep the main website from spending more on bandwidth than needed and to keep download speeds as high as possible by having mirrors on servers in different parts of the world. For examples of this, you can check most Linux distros. In smaller projects, partner websites will sometimes also be listed among these mirrors. Basically, if you have bandwidth to spare and a project you want to help out with, hosting a mirror of the download file with the owner's permission can really help!
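
From the downloader's side, a mirror list boils down to "try the next server if this one fails". Here is a minimal sketch of that idea using only Python's standard library. The function names and the example.org addresses in the usage note are placeholders I made up, not how any particular project does it.

```python
import urllib.request
from typing import Callable, Iterable, Tuple


def fetch_url(url: str, timeout: float = 10.0) -> bytes:
    """Download one URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_from_mirrors(
    mirrors: Iterable[str],
    path: str,
    fetch: Callable[[str], bytes] = fetch_url,
) -> Tuple[str, bytes]:
    """Try each mirror in order; return (url, data) from the first that works."""
    errors = []
    for base in mirrors:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            return url, fetch(url)
        except OSError as exc:
            # urllib's URLError and socket timeouts are both OSError subclasses.
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all mirrors failed:\n" + "\n".join(errors))
```

Calling it could look like `download_from_mirrors(["https://mirror-a.example.org", "https://mirror-b.example.org"], "files/distro.iso")`. The `fetch` parameter is only there so the fallback logic can be tried out without touching the network.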

However, it's not just downloads that should be preserved: websites should be as well (and sometimes even more so). The preservationists and archivists split into two groups: those who archived websites and those who archived information, and from here there was a spectrum of consent on the part of the original owner. On one end we have archivists that would always ask for permission before hosting a mirror. These were obviously the most well-liked, and often, to stop people from asking over and over again, website owners would write their mirror policy in their "about" or "contact" pages. Sanctioned archivists would be legally fine in case of a takedown request: they had permission, after all. However, what if you wanted to take your website down?

That's where the more radical archivists would come into play. These did not care about your permission: you have something worth sharing, they will preserve it at all costs. I myself am pretty split between this position and the one prior to this: sure, consent is great, but if you change as a person, realise your old posts don't reflect who you are anymore, and decide to take it all down, mirror hosters will either be contacted or find that the dynamic mirror they had has suddenly stopped working. It's generally a loss for humanity, despite being a win for the ego of the original poster.

But of course, that's it for the website archivists. The information archivists differentiate themselves from these because they didn't mirror a whole website but rather a section of it. Some would even try to pass it off as their own work, putting their own branding on top (sometimes on purpose, sometimes not). This generally looked bad, but the truth is that at the time there was little you could do about it (to be fair, that's kinda still the case without a lawyer). Sometimes you want some credit for making the mirror available, and sometimes you want all the credit because no one is going to notice if you rip off some other guy's site (everyone is going to notice).

The last group is the script kiddies. Hackers in name only, these kids would host mirrors of websites that looked credible to either steal credentials (as in they save the password you provide on the fake login page) or spread malware (in cases where they replicate a download page for software). Either way, these are despicable. Fortunately, we have largely managed to learn to read URLs before we click links, I would hope.

The argument

With so much bad and so much morally gray area, it's hard to see why mirroring would be good. However, I still think you should mirror websites, or at least mirror the pages you find important.

We do not know when our favorite website will be taken down. I myself was distraught when, after Razorback95 became Drevonor, even that website became just a black screen. The original Razorback95 got taken down because of a lack of attendance (bandwidth is expensive), and Drevonor is now being worked on as its successor, but for a while the files hosted there were seemingly lost to anyone who hadn't made a backup of the site. The ISO of Windows95D could have been lost if Kugee had decided to just not work on Drevonor anymore.

So please, if you see something worth telling the world about, don't just link to it: mirror it. Do not make any changes to the original: preserve it as it is, even to your detriment. Also, maybe ask for consent? The guy you're preserving might think you're ripping him off, when that's the total opposite of your intent! That said, I'm not your dad, I can't stop you from just making a mirror without asking for permission. Hell, I myself am a bit torn between these two positions, I really can't tell you what to do.
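
If you want some reassurance that your mirror really is still "as it is", one option is to record a checksum for every file when you make the copy and compare again later. The sketch below is just one way of doing that with Python's standard library. The function names are mine, and SHA-256 is an assumption I picked because it is widely available.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map every file under `root` (by relative path) to its SHA-256 hash."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: Dict[str, str]) -> List[str]:
    """List files that were added, removed or modified since the manifest."""
    current = build_manifest(root)
    names = manifest.keys() | current.keys()
    return sorted(n for n in names if manifest.get(n) != current.get(n))
```

You would call `build_manifest` right after mirroring, keep the result somewhere safe (saved as JSON, for example), and run `changed_files` whenever you want to check that nothing has drifted.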

Final considerations

This is an opinion piece. I'll write a tutorial on a different page. Stay tuned for that! Or maybe just read the documentation for wget?

That said, I hope I made clear why I think mirrors are not only useful but important in our little Web 1.0 community. I also understand if it's not a great argument, especially when faced with the disrespectful or even malicious kinds of mirrors. I would love to hear your thoughts, so please send them through any of my contacts! Either that or meet me later on aftersleep: I made a thread specifically about this!

-----BEGIN PGP SIGNATURE-----

iHUEARYKAB0WIQRwQBNgtLaog/IeOuVJNXnu82UrRwUCZ5+X+gAKCRBJNXnu82Ur
RyXEAP9My6JMPb4FAK9Dn3nMWUKSzCi6HHh8PIQG3stLXXwYpAD9FKcx+/EGPdE0
yjpqyurZ+xy7zAj7FsA5/81Ojarm9wE=
=qwLp
-----END PGP SIGNATURE-----


