It is interesting to see what the Canadian Museum of Nature (CMN) has accomplished over the years by looking at the museum’s collaborations with organisations worldwide. The museum has partnerships with over 600 universities and research organisations in 52 countries. We could not achieve our mission alone!
There were more than 3000 acquisitions over the past 25 years, and they can be grouped under:
- Botany 1348
- Mineralogy 584
- Palaeontology 55
- Zoology 1072
Acquisitions can be in the form of donations, exchanges, purchases, or staff field trips.
Sometimes a donation is large. For example, in 2007 a donation of beetles (Coleoptera: Scarabaeidae) included 64392 specimens! In 2018, a donation from Université de Montréal of “Collection exhaustive des invertébrés (et poissons) de fond du St-Laurent maritime.” was almost as large.
The Canadian Museum of Nature’s CSIM department maintains a spreadsheet with a row for each acquisition. Information was entered into this spreadsheet by museum staff over the course of 25 years. The museum is much older than 25 years, but the spreadsheet only goes back that far. Some examples from the spreadsheet from six continents are:
- Smithsonian Institution, Department of Paleobiology, Collections, Washington D.C., Section: Palaeontology, Collection: Fossil Vertebrate, Description: Plaster cast of holotype of Enaliarctic emlongi including skull and both mandibles, Transaction type: exchange, Feb 2014
- Herbario Nacional de Bolivia, La Paz, Bolivia, Collection: Lichen (CANL), Description: Lichens of Bolivia, Transaction type: donation, 1993
- Herbarium (CHR), Landcare Research – Manaaki Whenua, Lincoln New Zealand, Collection: Vascular Plant (CAN), Section: Botany, Description: Pseudognaphalium duplicates of CHR 582880 & CHR 582881, Transaction type: exchange, 2013
- National Herbarium, South African National Biodiversity Institute, Pretoria, South Africa, Section: Botany, Collection: Vascular Plant (CAN), Description: Vascular plants (list provided), Transaction type: donation, 2013
- Instituut voor Systematische Plantkunde, Rijksuniversiteit Utrecht, Utrecht, Netherlands, Section: Botany, Collection: Bryophyte (CANM), Description: Bryophyta Neotropica Exsiccata, Fasc. VI, No. 251 – 300, Transaction type: exchange, 1992
- Department of Biology, Sri Venkateswara University, Tirupati, An.Pra, India, Section: Zoology, Collection: General Invertebrate, Description: Poecilobdella granulosa (Savigny), 1988
How did we investigate the information in this spreadsheet? It is large, over 3000 rows, so it is not something you can read directly. The information is easier to visualise with the help of the interactive map HERE to get a better feel for the extent of the worldwide collaboration. We created the map using custom computer programs which were written for this purpose over the past few weeks. The programs read in the spreadsheet and tabulate the rows and fields.
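For readers curious what "read in the spreadsheet and tabulate the rows and fields" can look like, here is a minimal sketch. It is not the museum's actual program: it assumes the spreadsheet has been exported to CSV, and the column names and the three sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def tabulate_by_field(csv_file, field):
    """Count how many rows share each value of one column."""
    reader = csv.DictReader(csv_file)
    return Counter(row[field].strip() for row in reader)


# Tiny made-up sample standing in for the real CSV export (assumption).
SAMPLE = """Section,Collection,Transaction type
Botany,Vascular Plant (CAN),exchange
Botany,Lichen (CANL),donation
Zoology,General Invertebrate,donation
"""

counts = tabulate_by_field(io.StringIO(SAMPLE), "Section")
for section, n in counts.most_common():
    print(section, n)
```

Run against the full export, a count like this would produce the per-section totals listed near the top of this post.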
The spreadsheet was created by wetware (a computer geek term for humans), so as expected there are normal variations in spelling. For example, ‘Ontario’ was sometimes entered as ‘Ont’ or ‘ON’, with or without capitalisation (that is 6 variations so far). Add an incorrect spelling or two for a total of 7 or so variations. The variations are only a problem when we need to use a software program to process the information: the program thinks there are 7 different provinces where there is in fact just one Ontario. The problems were not limited to ‘Ontario’: the words University and Department have a few abbreviations and variations, not to mention some Quebec names with accented characters. Did I mention typos? So we needed to do some ‘cleanup’ by hand before the programs could be used.
We ‘cleaned’ the data using OpenRefine, an open source program which makes the task easy. OpenRefine presents the information for view like a spreadsheet, but it does not work like a spreadsheet. It allows you to facet a column so you see something like this for the Province/State column:
- Oklahoma 5
- Ontario 600
- Ontartio 1
- Ont 400
- ont 2
- ON 45
- Oregon 15
Looking at this, the variations stand out. OpenRefine then allows you to correct the typos. When a problem appears in several rows (like ‘ON’ appearing in 45 rows), they can all be corrected in one action.
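The idea behind faceting and bulk correction can be sketched in a few lines of Python. This is only an illustration of the concept, not how OpenRefine works internally, and the sample values and the correction table are invented for the example.

```python
from collections import Counter


def facet(values):
    """Count each distinct value, like a text facet on one column."""
    return Counter(values)


def apply_corrections(values, corrections):
    """Replace every value found in the correction table; keep the rest."""
    return [corrections.get(v, v) for v in values]


# Made-up sample of a Province/State column (assumption).
provinces = ["Ontario", "Ont", "ont", "ON", "Ontartio", "Oregon", "Ont"]

# One entry per variant; each fixes every row with that variant at once.
CORRECTIONS = {
    "Ont": "Ontario",
    "ont": "Ontario",
    "ON": "Ontario",
    "Ontartio": "Ontario",
}

print(facet(provinces))   # the variants show up as separate entries
cleaned = apply_corrections(provinces, CORRECTIONS)
print(facet(cleaned))     # only Ontario and Oregon remain
```

The correction table is built by a person looking at the facet counts, which is the same judgement call we made in OpenRefine.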
Then we wrote a simple Python program to find the latitude and longitude coordinates for each city by invoking a Google web service. The program stores the location data in a file. The program also reformats the pertinent spreadsheet information into a file that our interactive map program can read.
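A rough sketch of the "look up each city once and store the result in a file" step follows. It is not our actual program. The real web service call is deliberately left out: `lookup` is a hypothetical function you would supply, and `fake_lookup` below is a stand-in with approximate coordinates so the example runs without a network connection. The JSON file layout is also an assumption.

```python
import json
import os
import tempfile


def geocode_cities(cities, lookup, cache_path):
    """Return {city: [lat, lon]}, calling lookup only for uncached cities."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            cache = json.load(f)
    for city in cities:
        if city not in cache:
            result = lookup(city)          # hypothetical web service call
            if result is not None:
                cache[city] = list(result)
    # Save everything found so far, so later runs skip these cities.
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(cache, f, ensure_ascii=False, indent=2)
    return {city: cache[city] for city in cities if city in cache}


def fake_lookup(city):
    """Stand-in for the real service; coordinates are approximate."""
    table = {
        "Ottawa, Canada": (45.42, -75.70),
        "Pretoria, South Africa": (-25.75, 28.19),
    }
    return table.get(city)


cache_file = os.path.join(tempfile.mkdtemp(), "geocode_cache.json")
coords = geocode_cities(
    ["Ottawa, Canada", "Pretoria, South Africa", "Nowhere"],
    fake_lookup,
    cache_file,
)
print(coords)
```

Keeping the results in a file means a city is only sent to the web service once, however many acquisitions came from it, and cities the service cannot find are simply left out for a person to check.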
The interactive map program presents a world map showing the locations of the organizations which we collaborated with. You can zoom and pan the map, and click on a location to see the name of the organization. The interactive map shows coastline and border mapping information from OpenStreetMap.
In some locations, we collaborated with several organizations, so when you click on the icon in the map you see a list of them. At this point we encountered more variations, such as ‘Biology Department’ vs ‘Department of Biology’, which result in two list entries where there should be only one. We went back to step one to remove duplication from the names.
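We removed these duplicates by hand, but a program can at least point out likely candidates. The sketch below is one possible approach, not what we did: it treats two names as the same when they contain the same words in any order, ignoring a few small words. The word lists and function names are invented for the example.

```python
import re

# Small words to ignore and abbreviations to expand (illustrative lists).
STOPWORDS = {"of", "the", "for"}
ABBREVIATIONS = {"dept": "department", "univ": "university"}


def name_key(name):
    """Reduce a name to its sorted significant words, lower-cased."""
    words = re.findall(r"\w+", name.lower())
    words = [ABBREVIATIONS.get(w, w) for w in words]
    return tuple(sorted(w for w in words if w not in STOPWORDS))


def merge_names(names):
    """Keep the first spelling seen for each group of equivalent names."""
    seen = {}
    for name in names:
        seen.setdefault(name_key(name), name)
    return list(seen.values())


names = [
    "Department of Biology",
    "Biology Department",
    "Dept. of Biology",
    "Department of Geology",
]
merged = merge_names(names)
print(merged)
```

A rule this simple can also merge names that are genuinely different, so the groups it finds should be reviewed by a person before anything is changed.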
Now, looking at the map, you can see where in the world we have been collaborating with scientific organizations! Our next step is to look at the partnerships over the years, and present this in charts. We also plan to go back further than 25 years, as the museum has been active since the 1850s. It will be interesting to see the archives, and I am looking forward to it.
Bio: Rick Leir is a volunteer who worked many years in IT but regrets not being a scientist!