Historical databases – what comes first, the Database or the Taxonomy?

By Roger Davies / 14 October 2013 / Data Manipulation, History

My blog post below about Fouche’s use of a card index system to counter crime and terrorism in the early 1800s led me down an interesting path to the inventor of card indexing, Carl Linnaeus, in about 1760. What’s fascinating is that Linnaeus was faced with a very modern problem: volume of data. At the time, the stream of scientific discoveries of new flora and fauna caused an information overload, and Linnaeus and other scientists were overwhelmed by the sheer quantity of new information. “Big data” is only a relative term.

Linnaeus needed a system that could record the data so that it could be retrieved from a number of directions, so he “invented” the card index: filling in three cards for each new artefact, cross-referencing them, then storing them in a particular order to allow easy retrieval. This is not too dissimilar from any modern database, of course, but at the time it was quite a conceptual step, and in a sense it required a taxonomy or ontology. What comes first – the concept of the database or the concept of a taxonomy? Is it the need to organise the data that demands a taxonomy, or does the taxonomy allow the creation of a database?
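To make the "retrieved from a number of directions" idea concrete, here is a minimal sketch in Python. It is an illustration only, not a reconstruction of Linnaeus's actual cards: the three keys (name, genus, region), the sample records, and the helper names `build_index` and `lookup` are all assumptions chosen for the example.

```python
from collections import defaultdict

# Hypothetical specimen records; the field names are illustrative assumptions,
# not the headings Linnaeus actually used on his cards.
specimens = [
    {"name": "Bellis perennis", "genus": "Bellis", "region": "Europe"},
    {"name": "Rosa canina", "genus": "Rosa", "region": "Europe"},
    {"name": "Rosa rugosa", "genus": "Rosa", "region": "Asia"},
]


def build_index(records, key):
    """Build one 'card drawer': map each value of `key` to record positions."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[key]].append(position)
    # Store the cards in a fixed (alphabetical) order, as in a physical drawer.
    return dict(sorted(index.items()))


# Three cross-referenced indexes over the same records, one per "direction".
by_name = build_index(specimens, "name")
by_genus = build_index(specimens, "genus")
by_region = build_index(specimens, "region")


def lookup(index, value):
    """Follow the cross-reference from an index entry back to the records."""
    return [specimens[i]["name"] for i in index.get(value, [])]


print(lookup(by_genus, "Rosa"))
print(lookup(by_region, "Asia"))
```

Each index only makes sense once the categories (genus, region and so on) have been agreed, which is one way of seeing why the taxonomy and the database are hard to separate.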

An early Linnaeus database system

Linnaeus started off using large sheets of paper but in the 1760s moved to smaller cards, as they were physically more manageable to sort and order. Actually, I think there is more to Linnaeus’s simple index card concept than meets the eye, and there are similarities with modern analytical tools that are not immediately apparent. I’m neither a database specialist nor an IT expert, so these are somewhat amateur comments. I’d be interested in the views of some of my colleagues who do have their heads around modern tools such as Palantir:

  • Linnaeus’s and Fouche’s card indexes demanded useful summaries of information in digestible quantities, so they force some “thought” before data is ingested.
  • Linnaeus could cope with different “formats”: he included sketches on his cards to aid comprehension. Modern database analysis tools also allow different formats.
  • Input is limited by the size of the card, demanding appropriate brevity and conciseness. The identity of the “inputter” can be inferred either from the handwriting or from the author’s initials. Dates too can be recorded.
  • Moving index cards around a desk (easy because of their size) adds a visual component to the analysis process.

Examples of index cards generated by Linnaeus

In summary, of course modern tools like Palantir are so much better, but there are conceptual parts of the 250-year-old card index system that are remarkably similar to modern tools, in ways that are not immediately obvious.

Data visualization by Linnaeus

One comment on Historical databases – what comes first, the Database or the Taxonomy?

  1. Tom Hankey

    Hi Roger

    I actually enjoy looking at the “first principles” of such concepts and am always astonished that someone can think so far ahead of the curve and develop concepts that become the basis of wide-reaching and often global solutions. The methodology of “indexing” was a genius idea and, as you rightly say, the modern database often uses many indexes.
    The big problem with indexing, though, is that every index increases the size of the database, since each record (or tuple, as they call them) must appear in the index. Some indexes also contain several keys, which again adds to the size of the file.
    Other indexes, such as word searches, are particularly size-intensive. Take this short note, for example. It would be very quick to scan each word to find the word “size”, but how about searching the Encyclopaedia Britannica? Even a computer would take a fair bit of time to search that way. Instead, one would index every word against a coded location, put the words in alphabetical order in that index, and then cross-reference the location. The search is then like the “pick a number between 1 and a million” trick: it should take only a few simple tests to find the number by halving the range each time, i.e. > or < 500000, etc. I think about 20 tests are required at most. Because the index is an ordered set, that enables an extremely quick search by computer, but the index file will be of considerable size.
    I am hopefully not teaching egg sucking here but I thought I would add a small bit to your post.
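
The word index and halving search described in the comment above can be sketched roughly as follows. This is a minimal illustration in Python, not how any particular database implements it: the function names `build_word_index`, `find_word` and `guesses_needed` are hypothetical, and word positions stand in for the "coded location". The halving search itself is done by the standard library's `bisect` module.

```python
import bisect
import math
import re


def build_word_index(text):
    """Index every word by its position, with the words in alphabetical order."""
    positions = {}
    for n, word in enumerate(re.findall(r"[a-z]+", text.lower())):
        positions.setdefault(word, []).append(n)
    words = sorted(positions)  # the ordered set that makes halving possible
    return words, positions


def find_word(words, positions, target):
    """Binary-search the sorted word list, then cross-reference the locations."""
    i = bisect.bisect_left(words, target)
    if i < len(words) and words[i] == target:
        return positions[target]
    return []


def guesses_needed(n):
    """Worst-case number of halving tests needed to single out one of n items."""
    return math.ceil(math.log2(n))


note = "the size of the index grows with the size of the file"
words, positions = build_word_index(note)
print(find_word(words, positions, "size"))
print(guesses_needed(1_000_000))
```

The number of halving tests grows with the base-2 logarithm of the number of entries, so a million entries need at most 20 tests. The price is the one the comment points out: the index holds an entry for every distinct word plus all of its locations, so it can rival the text itself in size.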
