CMN Acquisitions

One way to see what the Canadian Museum of Nature (CMN) has accomplished over the years is to investigate the museum’s collaborations with organisations worldwide. The museum has partnerships with over 600 universities and research organisations in 52 countries. We could not achieve our mission alone!

There were more than 3000 acquisitions over the past 25 years, and they can be grouped under:

  • Botany      1348
  • Mineralogy   584
  • Palaeontology 55
  • Zoology     1072

Acquisitions can be in the form of donations, exchanges, purchases, or staff field trips.

Sometimes a donation is large. For example, in 2007 a donation of beetles (Coleoptera: Scarabaeidae) included 64392 specimens! In 2018, a donation from Université de Montréal of “Collection exhaustive des invertébrés (et poissons) de fond du St-Laurent maritime” was almost as large.

The Canadian Museum of Nature’s CSIM department maintains a spreadsheet with a row for each acquisition. Information was entered into this spreadsheet by museum staff over the course of 25 years. The museum is much older than 25 years, but the spreadsheet only goes back that far. Some examples from the spreadsheet from six continents are:

  • Smithsonian Institution, Department of Paleobiology, Collections, Washington D.C., Section: Palaeontology, Collection: Fossil Vertebrate, Description: Plaster cast of holotype of Enaliarctos emlongi including skull and both mandibles, Transaction type: exchange, Feb 2014
  • Herbario Nacional de Bolivia, La Paz, Bolivia, Collection: Lichen (CANL), Description: Lichens of Bolivia,  Transaction type: donation, 1993
  • Herbarium (CHR), Landcare Research – Manaaki Whenua, Lincoln New Zealand, Collection: Vascular Plant (CAN), Section: Botany, Description: Pseudognaphalium duplicates of CHR 582880 & CHR 582881, Transaction type: exchange,   2013
  • National Herbarium, South African National Biodiversity Institute, Pretoria, South Africa, Section: Botany, Collection: Vascular Plant (CAN), Description: Vascular plants (list provided),  Transaction type: donation, 2013
  • Instituut voor Systematische Plantkunde, Rijksuniversiteit Utrecht, Utrecht, Netherlands, Section: Botany, Collection: Bryophyte (CANM), Description: Bryophyta Neotropica Exsiccata, Fasc. VI, No. 251 – 300, Transaction type: exchange, 1992
  • Department of Biology, Sri Venkateswara University, Tirupati, An.Pra, India, Section: Zoology, Collection: General Invertebrate, Description: Poecilobdella granulosa (Savigny), 1988

How did we investigate the information in this spreadsheet? It is large, with over 3000 rows, so it is not something you can read directly. The information is easier to visualise with the help of the interactive map HERE, which gives a better feel for the extent of the worldwide collaboration. We created the map using custom computer programs which were written for this purpose over the past few weeks. The programs read in the spreadsheet and tabulate the rows and fields.

The spreadsheet was created by wetware (a computer geek term for humans), so as expected there are normal variations in spelling. For example, ‘Ontario’ was sometimes entered as ‘Ont’ or ‘ON’, each with or without capitalisation (that is 6 variations so far). Add an incorrect spelling or two for a total of 7 or so variations. The variations are only a problem when we need a software program to process the information: the program thinks there are 7 different provinces where there is in fact just one Ontario. The problems were not just in ‘Ontario’: the words University and Department have a few abbreviations and variations, not to mention some Quebec names with accented characters. Did I mention typos? So we needed to do some ‘cleanup’ by hand before the programs could be used.

We ‘cleaned’ the data using OpenRefine, an open source program which makes the task easy. OpenRefine presents the information for view like a spreadsheet, but it does not work like a spreadsheet. It allows you to facet a column so you see something like this for the Province/State column:

  • Oklahoma 5
  • Ontario 600
  • Ontartio 1
  • Ont 400
  • ont 2
  • ON 45
  • Oregon 15

Looking at this, the variations stand out. OpenRefine then allows you to correct the typos. When a problem appears in several rows (like ‘ON’ appearing in 45 rows), they can all be corrected in one action.
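To make the idea concrete, here is a minimal Python sketch of what a facet and a bulk correction amount to. It is not how OpenRefine is implemented, and we did the real cleanup in OpenRefine, not with this code. The sample rows and the correction table are made up for illustration.

```python
from collections import Counter

# Hypothetical sample of the Province/State column before cleanup.
rows = ["Ontario", "Ont", "ON", "ont", "Ontartio", "Oregon", "Ontario", "Oklahoma"]

# Hand-built correction table (an assumption for illustration):
# each lower-cased variant maps to the one canonical spelling.
CANONICAL = {
    "ontario": "Ontario",
    "ont": "Ontario",
    "on": "Ontario",
    "ontartio": "Ontario",
}


def facet(values):
    """Count how often each distinct value appears, like a text facet."""
    return Counter(values)


def clean(value):
    """Return the canonical spelling if we know one, else the trimmed value."""
    trimmed = value.strip()
    return CANONICAL.get(trimmed.lower(), trimmed)


before = facet(rows)
after = facet(clean(v) for v in rows)

print(len(before), "distinct values before cleanup")
print(len(after), "distinct values after cleanup")
```

One entry in the correction table fixes every row that holds that variant, which is the same effect as correcting a facet value once in OpenRefine.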

Then we wrote a simple Python program to find the Latitude and Longitude coordinates for each city, by invoking a Google web service. The program stores the location data in a file. The program also reformats the pertinent spreadsheet information into a file that our interactive map program can read. 
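A rough sketch of such a lookup follows. It assumes the Google Geocoding API's JSON endpoint, which is my guess at the web service meant here; the function names, the cache layout and the API key parameter are hypothetical and are not taken from the original program.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Google Geocoding API JSON endpoint.
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_request_url(city, api_key):
    """Build the lookup URL for one city name."""
    query = urllib.parse.urlencode({"address": city, "key": api_key})
    return f"{GEOCODE_URL}?{query}"


def parse_location(payload):
    """Pull (lat, lng) out of a geocoding response, or None if nothing matched."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def geocode(city, api_key, cache):
    """Look up a city, calling the web service only for cities not yet cached."""
    if city not in cache:
        with urllib.request.urlopen(build_request_url(city, api_key)) as resp:
            cache[city] = parse_location(json.load(resp))
    return cache[city]


def save_cache(cache, path):
    """Store the location data in a JSON file so later runs can reuse it."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cache, f, indent=2)
```

Keeping the results in a file means that each city is only looked up once, however many acquisitions came from it.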

The interactive map program presents a world map showing the locations of the organizations which we collaborated with. You can zoom and pan the map, and click on a location to see the name of the organization. The interactive map shows coastline and border mapping information from OpenStreetMap.

In some locations, we collaborated with several organizations, so when you click on the icon in the map you see a list of them. At this point we encountered more variations, such as ‘Biology Department’ vs ‘Department of Biology’, which result in two list entries where there should only be one. We went back to step one to remove duplication from the names.
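Duplicates of this kind can also be spotted by program. The sketch below is one possible approach and not what we actually did: it reduces each name to a key made of its sorted, lower-cased words, ignoring a few filler words, so that names which differ only in word order share a key. The filler-word list is an assumption.

```python
import re

# Filler words to ignore when comparing names (an assumption for illustration).
STOPWORDS = {"of", "the", "for"}


def name_key(name):
    """Reduce a name to its sorted, lower-cased significant words."""
    tokens = re.findall(r"\w+", name.lower())
    return " ".join(sorted(set(tokens) - STOPWORDS))


def group_names(names):
    """Group names that share a key; groups with several members are suspects."""
    groups = {}
    for name in names:
        groups.setdefault(name_key(name), []).append(name)
    return groups


names = ["Biology Department", "Department of Biology", "Department of Geology"]
for key, members in group_names(names).items():
    if len(members) > 1:
        print("possible duplicates:", members)
```

A key like this only catches reordered words. Abbreviations and typos would still need a human eye, or a fuzzier comparison.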

Now, looking at the map, you can see where in the world we have been collaborating with scientific organizations! Our next step is to look at the partnerships over the years, and present this in charts. We also plan to go back further than 25 years, as the museum has been active since the 1850s. It will be interesting to see the archives, and I am looking forward to it.

Bio: Rick Leir is a volunteer who worked many years in IT but regrets not being a scientist!

CMN Loans

One way to see what the Canadian Museum of Nature (CMN) has accomplished over the years is to investigate the museum’s collaborations with organisations worldwide. The museum has partnerships with over 600 universities and research organisations in 52 countries. We could not achieve our mission alone!

The Canadian Museum of Nature’s CSIM department maintains a spreadsheet with a row for each loan. Information was entered into this spreadsheet by museum staff over the course of 25 years. The museum is much older than 25 years, but the spreadsheet only goes back that far.

The map above shows loans during a five-year period. Click ‘next’ to see the next five years. You can zoom and pan. Loans from the CMN are shown with short-dashed lines and loans to the CMN are shown with long-dashed lines. When there are both to and from loans in a five-year period, they are shown with a solid line. Hover over a marker to see loan details.
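The line-style rule can be written down in a few lines. This is only a sketch of the rule as described above; the record layout (partner, year, direction) and the style names are hypothetical and are not taken from the map program.

```python
def period_start(year, first_year, span=5):
    """Return the first year of the five-year period that contains `year`."""
    return first_year + ((year - first_year) // span) * span


def line_style(directions):
    """Pick a line style from the set of loan directions seen in one period."""
    if {"to", "from"} <= directions:
        return "solid"
    if "from" in directions:
        return "short-dash"  # loans from the CMN
    return "long-dash"       # loans to the CMN


def styles_by_period(loans, first_year):
    """Map (period start, partner) to a line style.

    `loans` is an iterable of (partner, year, direction) tuples, where
    direction is "to" or "from" relative to the CMN.
    """
    seen = {}
    for partner, year, direction in loans:
        key = (period_start(year, first_year), partner)
        seen.setdefault(key, set()).add(direction)
    return {key: line_style(dirs) for key, dirs in seen.items()}
```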

The Longest Bike Trail in the Gatineau

Here’s a great place to ride a bike. It used to be the railway from Ottawa north to Maniwaki, but now it is a ‘linear park’. Kazabazua had the longest bar in the Gatineau, so I suppose it should have the longest park too. Start in a little village named Low, which is a bit north of Wakefield. The starting spot in Low was not easy to find; it did not jump out at me. Look for a lumberjack’s boat resting until the end of its days in the park parking lot. (I parked at the hockey arena before seeing the park parking.) Do not attempt to bike on the roads up to here from Ottawa; it would be quite dangerous.

The parking at the start in Low

From Low north, the trail is gently uphill (the railway grade). In places the trail follows a contour through the woods, with the hillside dropping off steeply on one side. Further on, you find it level for miles. Once on the trail, expect some easy riding. Bring a bike with wide tires, because the surface is soft sand in places. My 1.5 inch wide slicks worked fine. If it is too easy for you, you could explore the side roads around Low.

In Low there are many anglophones, but Quebec is mostly francophone. Try your best to speak French: it is a great chance to practice, and in any case it is only decent to try.

It was a railroad many years ago

The trail is restricted to bicycles and hikers in the summer, and snowmobiles in the winter. Horse riding is for some reason prohibited. I for one would not mind sharing the trail with horse riders.

Is it a park .. or a bike trail?

This is a very quiet trail; there was hardly anyone enjoying it on a May 24 weekend. One couple brought a music system with them. Pack some supplies with you, because it is a long ride to Maniwaki!

There are lots of birds along the trail: all the common types (blue jays, goldfinches, warblers) and also several that I could not name. What is the same size and shape as a Pileated woodpecker but mostly coloured in shades of brown? What has yellow on the back of the head, and a white rump?

Venosta Station!

The railway station at Venosta is a bit run down these days.

Venosta

Venosta also has a government office, a church, and a few houses.

Some natural attractions include the gorge at Kazabazua, and the flatlands with sand ridges left by the ice age glaciers. Happy trails!

Backyard Wilderness 3d Movie

An image from the Backyard Wilderness 3d movie

The Nature Museum in Ottawa is playing an amazing movie showing wildlife in North American forests. Backyard Wilderness has a feel-good, family-oriented theme.

It is amazing how close-up the photography is. The foxes, wood ducks, mice and salamanders are at times just inches from the camera. Clearly the photographer would be using a sensor to trigger the camera, but still, how could she set up the equipment in such a perfect position?

The theatre’s 3D vision is much better than what you see in the Cineplex with their disposable polarised glasses. Nature.ca has non-disposable glasses which give a bright, clear 3D view. The glasses use active shutter technology. The left lens ‘opens’ for a fraction of a second, and the projector/screen shows the left view. Then the right lens ‘opens’ and the projector/screen shows the right view, and the cycle repeats. You cannot see any flickering or hear any mechanism. The 3D effect is more striking than what you get with polarised glasses, because polarised lenses do not perfectly block your left eye from seeing the right image and vice versa.

There are some human actors in the movie, but they are upstaged by the wild animals. The human family lives in a rather nice two-storey house, which is in a forested setting and harmonious with nature. Any neighbouring house seems to be out of sight. Any vehicles or roads are almost out of sight. This is an ideal living arrangement that would be difficult for most of us to attain!

The cost is $4 in addition to the regular museum entrance fee.

Solr Resources

Where do you look for information on the Apache Solr project? There is good information, but it is a bit scattered.

The colorful front page introduces all the features.

For reference info, first look at the Reference Documents. This link is release specific; you may want a different version.

There is also the Reference Guide (a large pdf). It is release specific; you may want a different version.

The latest pre-release version of the above reference is in the Confluence Wiki.

See also the Solr Community Wiki.

The users mailing list archives are here. The developer mailing list archives are also available.

More approachable info can often be found on personal blogs:

The issue trackers are packed with good info:

StackOverflow has good questions, answers, and discussion.

Google Books can be useful. To find out about “solr shard” you can try this query then select a book. For “solr nested”, try this query.

Solr Specific Search
I added a custom search widget to http://leirtech.com (in the footer, so scroll way down, with any tab). It is a site-specific search, and its results are from the above listed sites only. For example, enter ‘nested’ and press enter: the best results are from the likes of Yonik, Lucidworks, and the Cwiki. Please excuse Google’s promoted links at the top; Google feels a need to make some money.

Isaac Asimov’s Foundation

(or Pinky and the Brain??)

A presentation for grade 10 graduation

Hello Everyone, I am here to discuss the last step of your journey: conquering the world. However, before we dive into that, I believe some introductions are in order. I am the man who is most commonly known as “The Mule”, but my position is “First Officer” so please refer to me as that. I have, in my time, had the accomplishment of nearly conquering the entire galaxy and that is what I am here to talk to you about. By this point, if you succeeded in the prior steps, you have accumulated quite a bit of power. Your goal for this next step is to use that power to achieve total domination.

But don’t be too hasty; unless you have military strength that puts the rest of the world combined to shame, or mind control powers similar to my own, you will probably just end up dead if you attempt to win with brute force. From this point on, advance slowly, capturing small territories as peacefully as possible. Avoid bringing attention to the threat you pose as this will get you eliminated. After capturing territory, be certain to purge it of any threats and convert all of its inhabitants to your cause. At a certain point, no matter how careful you are, your advances will be noticed. Go on the offensive. Capture large targets and let the small ones fall in later. Be fast: if you can strike before your enemy can mobilise, you will be able to take down strong targets with little resistance. If you play your cards right, you will successfully dominate the world and its inhabitants will fall under your control. Do as you wish with them. It is hardly their decision to make.

I believe this marks the end of our presentation, so I wish the best of luck to all of you. I have to be getting back to my empire, so good day.

Solr heap size

Solr’s default heap size needs to be increased.

The following info is from Shawn Heisey’s post to the Solr mailing list. I copied it here as a note to myself, and in the hopes of helping Solr newcomers.

There are exactly two solutions to the OutOfMemoryError regarding heap
space:

1) Increase the heap size.
2) Decrease the memory requirements.

The default heap size that Solr 5.0 and later starts with is 512MB.
This is a very small heap size. We are aware that the default is very
small — this is intentional, so that the default install is runnable on
virtually any hardware.

Almost all production Solr installs will require increasing the heap
size. If you get a little bit of data in a Solr install and then make
complex query requests, it can easily require more than 512MB of total heap.

Solr Search

Look, the Apache Solr search server is installed on the Blinkmonitor.com site now!

You will be thinking “big whup” perhaps, because WordPress (WP) already has Search built into it.

But .. WP’s search speed is limited by MySQL’s text search speed. That is fine for a few thousand posts, but when there are millions you will find yourself waiting for search results. Solr has its own inverted index, which covers all the words in the posts or pages.
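If the term is new to you, an inverted index maps each word to the set of posts that contain it, so a search is a dictionary lookup and not a scan of every post. Here is a toy Python version for illustration only. Solr's real index (built on Lucene) is far more sophisticated, and the sample posts are made up.

```python
import re
from collections import defaultdict


def build_index(posts):
    """Map each lower-cased word to the set of post ids that contain it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(post_id)
    return index


def search(index, query):
    """Return the ids of posts that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical posts, keyed by post id.
posts = {
    1: "Charlie rides a bike",
    2: "Solr search for Charlie",
    3: "Bike trails in the Gatineau",
}
index = build_index(posts)
print(search(index, "charlie"))
print(search(index, "charlie bike"))
```

The work is done once, when a post is indexed, which is why lookups stay fast as the number of posts grows.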

Better still, Solr has faceted search (not so for WP). Looking at the search results, you can select a category like ‘books’, or confine your search to a tag such as ‘Heritage’. You can order the results by relevance, or list the newest ones first.

And highlighting is a big benefit. Looking at the search results, you will see snippets of the pages you were searching for, with the searched text highlighted.

How does this all work? It might seem unlikely that a complex PHP project like WP could be integrated with a complex Java project like Solr. This is all ‘easy’ because Solr has a RESTful interface. It responds to HTTP requests (such as GET, POST et al). When you type, say, “charlie” into the search box, WP does a GET to Solr. Solr accepts the search argument “charlie”, checks its index to find out which pages contain “charlie”, and returns a list of pages in the GET result. WP displays the list as links you can click on to see the pages.
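The request and response are simple enough to sketch in Python. This is an illustration of the exchange and not the plugin's code. The core URL and the field names ‘content’ and ‘categories’ are assumptions, and a real schema will differ; the query parameters (q, fq, hl, hl.fl, facet, facet.field) are standard Solr ones.

```python
import json
import urllib.parse
import urllib.request


def build_select_url(core_url, text, category=None):
    """Build a GET URL for Solr's select handler.

    `core_url` is something like http://localhost:8983/solr/posts (hypothetical).
    """
    params = [
        ("q", text),
        ("wt", "json"),
        ("hl", "true"),                 # ask for highlighted snippets
        ("hl.fl", "content"),           # assumed name of the body field
        ("facet", "true"),
        ("facet.field", "categories"),  # assumed name of the category field
    ]
    if category:
        # A filter query confines the results to one category.
        params.append(("fq", f'categories:"{category}"'))
    return f"{core_url}/select?{urllib.parse.urlencode(params)}"


def extract_hits(payload):
    """Return (id, snippets) pairs from a Solr JSON response."""
    highlights = payload.get("highlighting", {})
    return [
        (doc["id"], highlights.get(doc["id"], {}).get("content", []))
        for doc in payload["response"]["docs"]
    ]


def search(core_url, text, category=None):
    """Send the GET request and return the matching ids with their snippets."""
    with urllib.request.urlopen(build_select_url(core_url, text, category)) as resp:
        return extract_hits(json.load(resp))
```

Because the whole exchange is plain HTTP and JSON, it does not matter that one side is PHP and the other is Java.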

When an author writes a WP page, WP sends it to Solr to be indexed. And when a page is updated, it gets sent to Solr again.

WordPress (WP) needs a plugin for this all to work.  There are several WP Solr plugins, and I chose the great WPSOLR plugin. It is free, but there are paid options that you might want to consider (disclosure: I am just a user, and am not paid for this mention). Paid installation support is available, but this will not be necessary if you are familiar with Solr.

Solr is quick enough to provide ‘autocomplete’ suggestions in the search box. I have that configured using the older spellchecker method. There is a suggester module, new as of last year, but I have not yet persuaded it to build its index. Soon..

Angular WordPress

The WordPress folks have been busy creating a RESTful API, and that is good because it is the way of the future. This API will be the basis for Single Page Applications (SPA).

The API could also (hand waving here) be used for communications between web services.

The RESTful API makes it possible for a SPA web site client (the JavaScript in your browser) to make AJAX calls to the web server API, and display the returned information. Then the web page can be snappy fast. When you click on a button, there is no need to refresh the whole page, just the part that needs to change. I say button, but this applies to menus and other controls in the page. Likewise, when the user interacts with the SPA, the information she enters would get POSTed to WordPress via the API.

My interest is in the heart of the WordPress page: the posts, as seen by a visitor (the admin UI is important too, but it can wait). Traditional WordPress displays a list of posts, and you can scroll down to see more. I would prefer to see a list of excerpts, each with a ‘more’ control so you can see the whole post. I do not want to be inserting ‘Read More’ tags manually. And I want the ‘more’ button to be snappy: no page refresh needed. This becomes possible with the RESTful API. How is it done? There are a few things needed:

  • Install your own WordPress. I suspect this is needed, because I am adding files to the WordPress root directory.
  • Install the WP REST API plugin as instructed at http://v2.wp-api.org/
  • Write some JavaScript for AJAX calls. I like to organize it using AngularJS, as instructed at https://angularjs.org/
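Before writing the JavaScript, it helps to see what the API returns. The sketch below uses Python only to show the shape of the request and response; the site itself would make the same call from AngularJS. It assumes the version 2 posts route (/wp-json/wp/v2/posts) and its ‘rendered’ fields, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def posts_url(site, per_page=5, page=1):
    """Build the URL that lists posts through the WP REST API (v2 route)."""
    query = urllib.parse.urlencode({"per_page": per_page, "page": page})
    return f"{site.rstrip('/')}/wp-json/wp/v2/posts?{query}"


def fetch_posts(site, per_page=5, page=1):
    """GET one page of posts as a list of dictionaries."""
    with urllib.request.urlopen(posts_url(site, per_page, page)) as resp:
        return json.load(resp)


def excerpt_list(posts):
    """Keep only what the list of excerpts needs: id, title and excerpt HTML."""
    return [
        {
            "id": post["id"],
            "title": post["title"]["rendered"],
            "excerpt": post["excerpt"]["rendered"],
        }
        for post in posts
    ]


def full_post(posts, post_id):
    """Return the full HTML of one post: what a 'more' control would reveal."""
    for post in posts:
        if post["id"] == post_id:
            return post["content"]["rendered"]
    return None
```

Since each post arrives with both its excerpt and its full content, a ‘more’ control can swap one for the other in the browser without another request or a page refresh.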

WordPress uses a single-entry Front Controller pattern: the default route is handled by index.php. An Apache rewrite rule directs all viewing requests to index.php. For example, this post is viewed at http://blinkmonitor.com/2016/04/angular-wordpress/.
But it is easy to add a SPA web app on a separate web site by using your own custom HTML and JavaScript. This JavaScript would contain AJAX calls to populate the post viewing area. For example, this post can be viewed at http://leirtech.com/ (several posts appear on this page, and this post is a few down from the top one). In both sites, the blog content is being supplied from the same back-end database. But the leirtech.com site does not provide any way to edit the post contents. It could provide this if I took the time to develop the front-end.

The standard WordPress root directory looks like this:

$ ls -l /usr/share/wordpress
total 164
-rw-r--r-- 1 root root 418 Sep 24 2013 index.php
-rw-r--r-- 1 root root 5035 Oct 6 2015 wp-activate.php
drwxr-xr-x 9 root root 4096 Feb 29 14:18 wp-admin
-rw-r--r-- 1 root root 271 Jan 8 2012 wp-blog-header.php
-rw-r--r-- 1 root root 1369 Oct 3 2015 wp-comments-post.php
lrwxrwxrwx 1 root root 36 Feb 3 08:48 wp-config.php -> ../../../etc/wordpress/wp-config.php
-rw-r--r-- 1 root root 2853 Dec 16 04:58 wp-config-sample.php
drwxr-xr-x 6 root root 4096 Feb 29 14:18 wp-content
-rw-r--r-- 1 root root 3286 May 24 2015 wp-cron.php
drwxr-xr-x 14 root root 12288 Feb 29 14:18 wp-includes
...

We will create the separate web site with these files and directories:

-rw-r--r-- 1 rleir rleir 5953 Apr 14 10:48 index.html
drwxrwxr-x 2 rleir rleir 4096 Apr 14 10:44 js
drwxrwxr-x 2 rleir rleir 4096 Apr 12 15:01 img
drwxrwxr-x 2 rleir rleir 4096 Apr 12 15:01 css

The directories contain, for a start:

js/app.js
js/services.js
js/controllers.js
css/app.css
img/quotes-meta.gif

The WordPress dashboard will still be accessed via domain.com/wp-admin/something.php, but when you are reading posts you will be using domain.com/index.html.
The index.html contains, for a start:
<!doctype html>
<html lang="en" ng-app="ricksiteApp">
<head>
<meta charset="utf-8">
<title>LeirTech Consulting</title>
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css">
<link rel="stylesheet" href="css/app.css">
<script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.4.9/angular.min.js"></script>
<script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.4.9/angular-sanitize.js"></script>
<script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.4.9/angular-resource.js"></script>
<script src="js/app.js"></script>
<script src="js/controllers.js"></script>
<script src="js/services.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>

More to come as I sort this out.