Friday 22 November 2013

What file formats will persevere?


Frankly, I am a bit embarrassed by how little I know about digital preservation, and I think I am going to be even more embarrassed by the end of this post, because it is going to be filled with assumptions and likely inaccuracies.

Since I know so little, I will stick to my personal experience with digital preservation, which is limited to saving files in different formats on my computer. For a bit of history: growing up, I used Microsoft Word to write my papers, then OpenOffice (when my mom lost our Microsoft Word CD), next Pages (when I got my first MacBook), and now I use a combination of Google Docs, LibreOffice, and Mou. I made these choices based on a combination of hardware, cost, and my beliefs, never on the file formats they support or how those formats are preserved.

Now that I think about these choices, Word and Pages seem the most problematic for preservation because both use proprietary formats that are not human readable. If Microsoft and Apple go out of business, there may be no software left that can open .doc and .pages files, leaving me without those documents. OpenOffice and LibreOffice seem like slightly better options because the source code for both word processors is open and their .odt format is a published standard, so if a .odt file fails to open, a programmer can fix the word processor or write a new reader. However, these programs are still problematic because recovery requires programming knowledge: a .odt file is a compressed bundle of XML, not something you can read by opening it in a plain text editor. Lastly, I think Mou is the best option of the ones listed here because it uses a simple markup language (Markdown), saves plain text, and can export to .html, so its files are human readable. Thus, even if HTML and Markdown become obsolete, researchers can still read my documents, albeit without the added style.
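To make the .odt point a little more concrete, here is a rough sketch in Python. As far as I understand, an .odt file is a ZIP archive whose text lives in an XML file called content.xml, so getting the words back takes some code but not the original word processor. The toy document and the function names (make_toy_odt, extract_paragraphs) are my own inventions for illustration, and a real .odt contains many more files and attributes than this one.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace used for text elements in OpenDocument XML.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hypothetical, stripped-down stand-in for the content.xml of a real .odt.
CONTENT_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<office:document-content '
    'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
    'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    "<office:body><office:text>"
    "<text:p>What file formats will persevere?</text:p>"
    "<text:p>Plain text is a safe bet.</text:p>"
    "</office:text></office:body>"
    "</office:document-content>"
)


def make_toy_odt():
    """Build an in-memory ZIP archive shaped like a very minimal .odt."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        archive.writestr("content.xml", CONTENT_XML)
    return buffer.getvalue()


def extract_paragraphs(odt_bytes):
    """Recover paragraph text from content.xml without any office software."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as archive:
        xml_bytes = archive.read("content.xml")
    root = ET.fromstring(xml_bytes)
    # Collect the text of every <text:p> element, in document order.
    return ["".join(p.itertext()) for p in root.iter("{%s}p" % TEXT_NS)]


if __name__ == "__main__":
    odt = make_toy_odt()
    print(odt[:4])  # raw bytes of the archive: not something a person can read
    for paragraph in extract_paragraphs(odt):
        print(paragraph)
```

The takeaway, if my understanding is right, is that the text is recoverable as long as someone can still unpack a ZIP file and parse XML, whereas a Markdown or .tex file needs none of this: opening it in any text editor shows the words directly.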

As you can probably see from my discussion thus far, even the best of the programs that I have used will leave only a barely readable product in the future, so what is the solution? As far as saving text goes, I can only think of one file format that will stand the test of time (definitely not forever, but for a long time), and that format is .tex, the standard source format for documents written in Donald Knuth's typesetting language, TeX. I think this format will last for a long time because of the spirit in which the language was developed: Knuth has always focused on stability and has not fundamentally changed TeX's features since around 1990. Now that I have described, very briefly, why .tex is a good file format for preservation, I will mention that I do not use .tex files because it is extremely difficult and time consuming to learn TeX or LaTeX (a widely used macro package built on top of TeX). Therefore, the moral of this story is that there are no free fonts!


1 comment:

  1. Interesting. I am a user of OpenOffice (I am too cheap to buy Microsoft Office), and I save my documents to .odt for myself and .pdf or .doc when I need to share them. It is a little bit frustrating when you have to keep track of all these different formats on your computer, but I never really thought about the detailed differences between the file formats. It is a good point to note the ones that are human readable as being most likely to persevere. I never thought of a word-based document as being anything but human-readable, but I suppose that if you do not have the source code for the program that opens the document, you would not be able to read it.
