Solr Resources

Where do you look for information on the Apache Solr project? There is good information, but it is a bit scattered.

The colorful front page introduces all the features.

For reference info, first look at the Reference Documents. This link is release specific; you may want a different version.

There is also the Reference Guide (a large pdf). It is release specific; you may want a different version.

The latest pre-release version of the above reference is in the Confluence Wiki.

See also the Solr Community Wiki.

The users mailing list archives are here, and so are the developer mailing list archives.

More approachable info can often be found on personal blogs:

The issue trackers are packed with good info:

StackOverflow has good questions, answers, and discussion.

Google Books can be useful. To find out about “solr shard” you can try this query, then select a book. For “solr nested”, try this query.

Solr Specific Search
I added a custom search widget to http://leirtech.com (in the footer, so scroll way down, with any tab). It is a site-specific search, and its results are from the above listed sites only. For example, enter ‘nested’ and press enter: the best results are from the likes of Yonik, Lucidworks, and the Cwiki. Please excuse Google’s promoted links at the top; Google feels a need to make some money.

Isaac Asimov’s Foundation

(or Pinky and the Brain??)

A presentation for grade 10 graduation

Hello Everyone, I am here to discuss the last step of your journey: conquering the world. However, before we dive into that, I believe some introductions are in order. I am the man who is most commonly known as “The Mule”, but my position is “First Officer” so please refer to me as that. I have, in my time, had the accomplishment of nearly conquering the entire galaxy and that is what I am here to talk to you about. By this point, if you succeeded in the prior steps, you have accumulated quite a bit of power. Your goal for this next step is to use that power to achieve total domination.

But don’t be too hasty; unless you have military strength that puts the rest of the world combined to shame, or mind control powers similar to my own, you will probably just end up dead if you attempt to win with brute force. From this point on, advance slowly, capturing small territories as peacefully as possible. Avoid bringing attention to the threat you pose, as this will get you eliminated. After capturing territory, be certain to purge it of any threats and convert all of its inhabitants to your cause. At a certain point, no matter how careful you are, your advances will be noticed. Go on the offensive. Capture large targets and let the small ones fall in later. Be fast: if you can strike before your enemy can mobilise, you will be able to take down strong targets with little resistance. If you play your cards right, you will successfully dominate the world and its inhabitants will fall under your control. Do as you wish with them. It is hardly their decision to make.

I believe this marks the end of our presentation, so I wish the best of luck to all of you. I have to be getting back to my empire, so good day.

Solr heap size

Solr’s default heap size needs to be increased.

The following info is from Shawn Heisey’s post to the Solr mailing list. I copied it here as a note to myself, and in the hopes of helping Solr newcomers.

There are exactly two solutions to the OutOfMemoryError regarding heap
space:

1) Increase the heap size.
2) Decrease the memory requirements.

The default heap size that Solr 5.0 and later starts with is 512MB.
This is a very small heap size. We are aware that the default is very
small — this is intentional, so that the default install is runnable on
virtually any hardware.

Almost all production Solr installs will require increasing the heap
size. If you get a little bit of data in a Solr install and then make
complex query requests, it can easily require more than 512MB of total heap.

Solr Search

Look, the Apache Solr search server is installed on the Blinkmonitor.com site now!

You will be thinking “big whup” perhaps, because WordPress (WP) already has Search built into it.

But WP’s search speed is limited by MySQL’s text search speed. That is fine for a few thousand posts, but when there are millions you will find yourself waiting for search results. Solr maintains its own inverted index, which covers all the words in the posts or pages.

Better still, Solr has faceted search (not so for WP). Looking at the search results, you can select a category like ‘books’, or confine your search to a tag such as ‘Heritage’. You can order the results by relevance, or list the newest ones first.

And highlighting is a big benefit. Looking at the search results, you will see snippets of the pages you were searching for, with the searched text highlighted.
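As a rough sketch of what such a request looks like, the function below builds the query string for a faceted, highlighted search. The parameter names (q, wt, facet, facet.field, fq, sort, hl, hl.fl) are standard Solr query parameters; the field names categories, tags, content and date are assumptions for illustration and will differ depending on the schema the plugin sets up.

```javascript
// Build the query string for a faceted, highlighted Solr search.
// Field names (categories, tags, content, date) are hypothetical;
// substitute the fields defined in your own schema.
function buildSearchParams(term, options) {
  var opts = options || {};
  var params = new URLSearchParams();
  params.append('q', term);
  params.append('wt', 'json');

  // Ask Solr to count matches per category and per tag.
  params.append('facet', 'true');
  params.append('facet.field', 'categories');
  params.append('facet.field', 'tags');

  // Ask Solr for highlighted snippets from the content field.
  params.append('hl', 'true');
  params.append('hl.fl', 'content');

  // Narrow the results to one category with a filter query.
  if (opts.category) {
    var escaped = opts.category.replace(/(["\\])/g, '\\$1');
    params.append('fq', 'categories:"' + escaped + '"');
  }

  // Relevance is Solr's default order; only override it on request.
  if (opts.newestFirst) {
    params.append('sort', 'date desc');
  }
  return params.toString();
}
```

Calling buildSearchParams('heritage', { category: 'books', newestFirst: true }) gives a string that can be appended to the select URL of a Solr core.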

How does this all work? It might seem unlikely that a complex PHP project like WP could be integrated with a complex Java project like Solr. This is all ‘easy’ because Solr has a RESTful interface: it responds to HTTP requests (such as GET and POST). When you type, say, “charlie” into the search box, WP does a GET to Solr. Solr accepts the search argument “charlie”, checks its index to find out which pages contain “charlie”, and returns a list of pages in the GET response. WP displays the list as links you can click on to see the pages.
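The round trip can be sketched in a few lines. The plugin does this in PHP; JavaScript is used here only for illustration. The select handler, the q and wt parameters, and the response.docs layout are standard Solr, while the core name and the title and permalink fields are assumptions.

```javascript
// Build the GET URL for a simple Solr search.
// solrBase is something like 'http://localhost:8983/solr';
// the core name is whatever your install uses (hypothetical here).
function searchUrl(solrBase, core, term) {
  return solrBase + '/' + encodeURIComponent(core) +
    '/select?q=' + encodeURIComponent(term) + '&wt=json';
}

// Turn Solr's JSON response into a list of links.
// The title and permalink field names are assumptions about the schema.
function toLinks(solrResponse) {
  var docs = (solrResponse.response && solrResponse.response.docs) || [];
  return docs.map(function (doc) {
    return { title: doc.title, href: doc.permalink };
  });
}
```

A response with two matching documents yields two link objects, ready to be rendered as anchors in the results page.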

When an author writes a WP page, WP sends it to Solr to be indexed. And when a page is updated, it gets sent to Solr again.

WP needs a plugin for this all to work. There are several WP Solr plugins, and I chose the great WPSOLR plugin. It is free, but there are paid options that you might want to consider (disclosure: I am just a user, and am not paid for this mention). Paid installation support is available, but this will not be necessary if you are familiar with Solr.

Solr is quick enough to provide ‘autocomplete’ suggestions in the search box. I have that configured using the older spellchecker method. There is a suggester module, new as of last year, but I have not yet persuaded it to build its index. Soon..

Angular WordPress

The WordPress folks have been busy creating a RESTful API, and that is good because it is the way of the future. This API will be the basis for Single Page Applications (SPA).

The API could also (hand waving here) be used for communications between web services.

The RESTful API makes it possible for a SPA web site client (the JavaScript in your browser) to make AJAX calls to the web server API, and display the returned information. Then the web page can be snappy: when you click on a button, there is no need to refresh the whole page, just the part that needs to change. I say button, but this applies to menus and other controls in the page. Likewise, when the user interacts with the SPA, the information she enters would get POSTed to WordPress via the API.
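A minimal sketch of such a call follows. The /wp-json/wp/v2/posts route and the page and per_page parameters come from the WP REST API; the function names are made up for illustration, and the browser's fetch is used as a stand-in for whatever AJAX helper you prefer.

```javascript
// Build the URL for one page of posts from the WP REST API.
function postsUrl(siteRoot, page, perPage) {
  var root = siteRoot.replace(/\/+$/, ''); // drop any trailing slash
  return root + '/wp-json/wp/v2/posts?page=' + page + '&per_page=' + perPage;
}

// Fetch the first ten posts and resolve with the parsed JSON array.
// fetchImpl is optional and exists so a fake can be passed in for testing.
function loadPosts(siteRoot, fetchImpl) {
  var doFetch = fetchImpl || fetch;
  return doFetch(postsUrl(siteRoot, 1, 10)).then(function (response) {
    if (!response.ok) {
      throw new Error('WordPress API returned HTTP ' + response.status);
    }
    return response.json();
  });
}
```

In the page, loadPosts('http://example.com') returns a promise; its result can be handed to whatever code redraws the post list, with no page reload.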

My interest is in the heart of the WordPress page: the posts, as seen by a visitor (the admin UI is important too, but it can wait). Traditional WordPress displays a list of posts, and you can scroll down to see more. I would prefer to see a list of excerpts, each with a ‘more’ control so you can see the whole post. I do not want to be inserting ‘Read More’ tags manually. And I want the ‘more’ button to be snappy: no page refresh needed. This becomes possible with the RESTful API. How is it done? There are a few things needed:

  • Install your own WordPress. I suspect this is needed, because I am adding files to the WordPress root directory.
  • Install the WP REST API plugin as instructed at http://v2.wp-api.org/
  • Write some JavaScript for AJAX calls. I like to organize it using AngularJS, as instructed at https://angularjs.org/
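The ‘more’ control described above might be sketched as a controller like the one below. It assumes the posts have already been fetched from the REST API, which returns both excerpt.rendered and content.rendered for each post, so expanding a post needs no further request. The controller name and the shape of the view model are my own choices, not something the API or AngularJS prescribes.

```javascript
// Controller sketch: show excerpts, expand to full content on demand.
// posts is the array returned by the WP REST API posts route.
function PostListController($scope, posts) {
  $scope.posts = posts.map(function (post) {
    return {
      id: post.id,
      title: post.title.rendered,
      excerpt: post.excerpt.rendered,
      content: post.content.rendered,
      expanded: false // every post starts collapsed
    };
  });

  // Called by the 'more' (or 'less') control of one post.
  $scope.toggle = function (item) {
    item.expanded = !item.expanded;
  };

  // The HTML to display for a post in its current state.
  $scope.bodyOf = function (item) {
    return item.expanded ? item.content : item.excerpt;
  };
}
```

In the template, each post would bind its body with ng-bind-html="bodyOf(post)" and its button with ng-click="toggle(post)".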

WordPress uses a single-entry Front Controller pattern: the default route is handled by index.php. An Apache rewrite rule directs all viewing requests to index.php.
But it is easy to add a SPA web app by using an index.html file instead. This file contains AJAX calls to populate the post viewing area.

The standard WordPress root directory looks like this:

$ ls -l /usr/share/wordpress
total 164
-rw-r--r-- 1 root root 418 Sep 24 2013 index.php
-rw-r--r-- 1 root root 5035 Oct 6 2015 wp-activate.php
drwxr-xr-x 9 root root 4096 Feb 29 14:18 wp-admin
-rw-r--r-- 1 root root 271 Jan 8 2012 wp-blog-header.php
-rw-r--r-- 1 root root 1369 Oct 3 2015 wp-comments-post.php
lrwxrwxrwx 1 root root 36 Feb 3 08:48 wp-config.php -> ../../../etc/wordpress/wp-config.php
-rw-r--r-- 1 root root 2853 Dec 16 04:58 wp-config-sample.php
drwxr-xr-x 6 root root 4096 Feb 29 14:18 wp-content
-rw-r--r-- 1 root root 3286 May 24 2015 wp-cron.php
drwxr-xr-x 14 root root 12288 Feb 29 14:18 wp-includes
...

We will add these files and directories:

-rw-r--r-- 1 rleir rleir 5953 Apr 14 10:48 index.html
drwxrwxr-x 2 rleir rleir 4096 Apr 14 10:44 js
drwxrwxr-x 2 rleir rleir 4096 Apr 12 15:01 img
drwxrwxr-x 2 rleir rleir 4096 Apr 12 15:01 css

The directories contain, for a start:

js/app.js
js/services.js
js/controllers.js
css/app.css
img/quotes-meta.gif

The WordPress dashboard will still be accessed via domain.com/wp-admin/something.php, but when you are reading posts you will be using domain.com/index.html.
The index.html contains, for a start:
<!doctype html>
<html lang="en" ng-app="ricksiteApp">
<head>
<meta charset="utf-8">
<title>LeirTech Consulting</title>
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css">
<link rel="stylesheet" href="css/app.css">
<script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.4.9/angular.min.js"></script>
<script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.4.9/angular-sanitize.js"></script>
<script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.4.9/angular-resource.js"></script>
<script src="js/app.js"></script>
<script src="js/controllers.js"></script>
<script src="js/services.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>

 
 
More to come as I sort this out.

AngularJS First Steps

AngularJS in combination with Bootstrap is such an improvement over the standard trio of HTML, CSS and JavaScript. Under the covers, the trio is still there of course. But the AngularJS framework provides a structure which helps organize all the actions in the JavaScript, so you can have a more complex web app without a huge increase in development effort. The most noticeable change is the speed of actions, because there is no page reload whenever you click on a control (try the navigation buttons above). Most actions can be handled in the browser, and only occasionally is a page reload needed.

Bootstrap’s big win for me is in responsive layouts. Responsive, meaning that you can visit the site with a small mobile screen or with a large PC screen, and the site will have a similar appearance. There is one package of source code supporting all device sizes. Bootstrap provides a grid layout system, which simplifies the code. It is more straightforward than the underlying relative and absolute positioning, and obsoletes the old methods of layout by table or (egad) frames.

The responsive layout on my site is not perfect. On a desktop, I would prefer to have the navigation controls in the left column, but on a mobile I prefer them at the top of the main column as above. There is probably an easy way to do that; check back soon.

But what about SEO? Will Google be unhappy that the content is conditionally visible? Google had problems with sites which had deceptive SEO in the form of hidden or small-font text. This page is not deceptive in that way, but it could trigger the alarms at Google. This issue is TBD.

What about browsers with JavaScript disabled? Then all pages will appear, and that is a good fallback.

What if content is loaded dynamically using AJAX XHR from a WordPress server? In this case, the HTML file does not contain content, but instead has (view the source to verify!):

<div ng-bind-html="pages[0].content.rendered" ></div>

Then SEO might be a challenge, because we would depend on the Google spider bot to execute our JavaScript. That might be OK; I hear that it does so. Yes! See Google’s rendering here.
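For context, the pages array that the ng-bind-html binding reads could be filled by a controller along these lines. This is a sketch, not the code running on this site: the controller name is made up, and it assumes the ricksiteApp module has already been created in js/app.js with ngSanitize as a dependency, which ng-bind-html needs in order to render the HTML.

```javascript
// Controller sketch: load the WordPress pages with one XHR and
// expose them to the template as $scope.pages.
function PagesController($scope, $http) {
  $scope.pages = []; // empty until the XHR completes

  $http.get('/wp-json/wp/v2/pages').then(function (response) {
    // Each entry carries its HTML in content.rendered.
    $scope.pages = response.data;
  });
}

// Register with AngularJS when it is present (it is not under Node).
// Assumes js/app.js has already created the ricksiteApp module.
if (typeof angular !== 'undefined') {
  angular.module('ricksiteApp')
    .controller('PagesController', ['$scope', '$http', PagesController]);
}
```

The single GET to /wp-json/wp/v2/pages is the XHR a crawler would have to issue and wait for before it could see any content.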

I will check my server logs in the next few days to see when the XHR gets requested. But will Bing and other search engines also execute our JavaScript? I will provide updates on this ASAP.

What about Google’s Webmaster Tools Data Highlighter? Unfortunately, it does not work for me at the moment.

What about the Back button? Site navigation using the menu buttons is not saved to the browser history, and our JavaScript will need to update the history by making calls to the browser API. This is also TBD.
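One possible approach, sketched below, uses the browser's History API: pushState records each menu click, and the popstate event fires when the visitor presses Back or Forward. The createNavigator and showPage names are hypothetical, and the '#/' URL scheme is just one choice. AngularJS's own routing module is another option.

```javascript
// Wrap the History API so menu navigation is recorded and can be replayed.
// historyApi is window.history in the browser; showPage is whatever
// function makes the named page visible (hypothetical here).
function createNavigator(historyApi, showPage) {
  return {
    // Call this from a menu button instead of calling showPage directly.
    go: function (pageName) {
      historyApi.pushState({ page: pageName }, '', '#/' + pageName);
      showPage(pageName);
    },
    // Handler for the popstate event (Back and Forward buttons).
    onPopState: function (event) {
      if (event.state && event.state.page) {
        showPage(event.state.page);
      }
    }
  };
}

// Browser wiring, left as a comment so the sketch also loads outside one:
//   var nav = createNavigator(window.history, showPage);
//   window.addEventListener('popstate', nav.onPopState);
```

Passing the history object in as a parameter keeps the logic testable with a stand-in object.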

What about older browsers? Then all pages will appear, and again that is a good fallback. Some browsers showing this problem are MSIE 6, Konqueror 4.8, and Lynx.