How to Prepare to Write Your Own Solr Plugin
I wanted to write a custom Solr plugin, so I googled for a guide, a tutorial, or even some basic information on where to begin. I could not find any, so I figured it out myself and am writing this post so that others will not have the same trouble.
The purpose of this tutorial is to show how to set up the environment to build and deploy your own plugins. This will not cover what you can do in your plugin since that’s already covered in the Solr and Lucene Java docs.
This tutorial should show you how to set up the environment to be able to write your own filter, tokenizer, analyzer, query parser, etc.
Also, note that this all pertains to Solr 3.6. You need a working copy of Solr 3.6, Ant, and Java installed before proceeding.
1) Make a new directory for your Solr plugins anywhere you feel like putting it:
$ mkdir ~/solr-plugins
2) Make the lib and src directories:
$ mkdir ~/solr-plugins/lib ~/solr-plugins/src
3) Obtain the library .jar files. There should be a .war file inside the dist directory of the Solr directory. Unzip that .war file into a temporary directory anywhere, and then copy the apache-solr-*.jar and lucene-core-3.6.0.jar files into your lib directory.
$ mkdir ~/solr-unzipped
$ unzip ~/apache-solr-3.6.0/dist/apache-solr-3.6.0.war -d ~/solr-unzipped
$ cp ~/solr-unzipped/WEB-INF/lib/apache-solr-*.jar ~/solr-unzipped/WEB-INF/lib/lucene-core-3.6.0.jar ~/solr-plugins/lib
$ rm -rf ~/solr-unzipped
(Note: You may want to copy other .jar files to have access to other libraries. You can also find other Solr .jar files in the dist directory of Solr. These libraries should be enough to get you started. Also, remember that you can view the contents of a .jar file with `jar tf path/to/file.jar`.)
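If you would rather script the extraction than unzip the whole .war, here is a minimal Python sketch of the same step. The function name `extract_jars` and its parameters are my own invention for illustration; it only assumes that the jars live under WEB-INF/lib inside the .war, as in the commands above.

```python
import fnmatch
import zipfile
from pathlib import Path


def extract_jars(war_path, lib_dir,
                 patterns=("apache-solr-*.jar", "lucene-core-3.6.0.jar")):
    """Copy the matching jars from WEB-INF/lib inside the .war into lib_dir."""
    lib_dir = Path(lib_dir)
    lib_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    with zipfile.ZipFile(war_path) as war:
        for name in war.namelist():
            if not name.startswith("WEB-INF/lib/"):
                continue
            base = name.rsplit("/", 1)[-1]
            if any(fnmatch.fnmatch(base, pattern) for pattern in patterns):
                # Write the jar flat into lib_dir, dropping the WEB-INF/lib prefix.
                (lib_dir / base).write_bytes(war.read(name))
                copied.append(base)
    return sorted(copied)


# Example call (paths taken from the steps above, with ~ expanded by hand):
# extract_jars(Path.home() / "apache-solr-3.6.0/dist/apache-solr-3.6.0.war",
#              Path.home() / "solr-plugins/lib")
```

Nothing is unpacked to a temporary directory, so there is nothing to clean up afterwards.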
4) Select a directory relative to your Solr instance where you can deploy your plugin. I made a directory in my contrib directory called custom:
$ mkdir ~/apache-solr-3.6.0/contrib/custom
5) Modify your solrconfig.xml file to point to the location of your custom plugins. Open solrconfig.xml in your favorite editor:
$ emacs ~/apache-solr-3.6.0/example/solr/conf/solrconfig.xml
And add the following line:
<lib dir="../../contrib/custom" regex=".*\.jar" />
Adjust the above path (and all of the other lib paths) if you have your Solr instance in a different directory than the default example.
6) Now, you can make the Ant build.xml file. I originally set this up with Eclipse, so I used Eclipse to generate the build.xml file for me, then made some changes to it and added the ability to deploy the .jar file. You can do it that way too, or you can simply copy and paste my build.xml file and change the paths to match yours:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<project basedir="." default="build" name="CustomTokenizerFactory">
<property environment="env"/>
<property name="debuglevel" value="source,lines,vars"/>
<property name="target" value="1.6"/>
<property name="source" value="1.6"/>
<path id="CustomTokenizerFactory.classpath">
<pathelement location="bin"/>
<pathelement location="lib/apache-solr-core-3.6.0.jar"/>
<pathelement location="lib/lucene-core-3.6.0.jar"/>
<pathelement location="lib/apache-solr-solrj-3.6.0.jar"/>
</path>
<target name="init">
<mkdir dir="bin"/>
<copy includeemptydirs="false" todir="bin">
<fileset dir="src">
<exclude name="**/*.launch"/>
<exclude name="**/*.java"/>
</fileset>
</copy>
</target>
<target name="clean">
<delete dir="bin"/>
</target>
<target depends="clean" name="cleanall"/>
<target depends="build-subprojects,build-project" name="build"/>
<target name="build-subprojects"/>
<target depends="init" name="build-project">
<echo message="${ant.project.name}: ${ant.file}"/>
<javac debug="true" debuglevel="${debuglevel}" destdir="bin" source="${source}" target="${target}">
<src path="src"/>
<classpath refid="CustomTokenizerFactory.classpath"/>
</javac>
</target>
<target depends="build" name="deploy">
<echo message="${ant.project.name}: ${ant.file}"/>
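<!-- Ant does not expand "~", so the home directory is taken from the user.home property -->
<jar destfile="${user.home}/apache-solr-3.6.0/contrib/custom/Custom.jar" basedir="bin" includes="**/*.class" />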
</target>
</project>
7) Now that everything is in place, you can finally start to work on your custom Solr plugin. It is best, especially when getting started, to work from an existing plugin. I recommend that you download the Solr 3.6 source, find a plugin that is close to what you want to do, and copy it into your new plugin.
Change the package name to something unique (e.g. org.mycompany.solr.analysis).
8) To build and deploy (using the build.xml file above with the correct paths set up), call “ant deploy” from the plugins directory:
$ cd ~/solr-plugins/
$ ant deploy
If there are no errors then your plugin should now be deployed.
9) Modify your schema to include the new plugin. Be sure to use the correct package name (e.g. org.mycompany.solr.analysis.MyCustomFilterFactory).
You can find examples of how to apply a custom plugin in your schema here: http://wiki.apache.org/solr/SolrPlugins
10) Restart the Solr instance and confirm that it is running by trying to access it. If you get an error saying that the plugin cannot be found, review the steps above, in particular the lib path in solrconfig.xml and the package name in the schema.
11) Test your new plugin. If it is an analyzer that’s assigned to a field, you can test with the analysis tool: http://localhost:8983/solr/admin/analysis.jsp
Now, any new Solr plugins that you want to make and deploy can be done from the same place and you can build and deploy them with a single command (ant deploy) and then just update the schema and restart Solr to start using them.
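If Solr reports that the plugin class cannot be found, one quick check is whether the deployed .jar actually contains that class. This is the same idea as `jar tf`, sketched in Python. The helper name `jar_contains_class` is hypothetical, and the only assumption is the standard .jar layout, where a class org.example.Foo is stored as org/example/Foo.class.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar holds the .class file for a fully qualified class name."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Example call, using the example names from this post:
# jar_contains_class("/path/to/apache-solr-3.6.0/contrib/custom/Custom.jar",
#                    "org.mycompany.solr.analysis.MyCustomFilterFactory")
```

If this returns False, the problem is in the build (wrong package declaration or nothing compiled) rather than in the Solr configuration.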
MailWarnMe – A Warn Before Sending Tool for Mac Mail
I needed a simple plugin for Mac Mail that would prevent me from sending email to certain recipients from the wrong account. So I did a little research on the undocumented Mail API and went through Code Samples, relying heavily on the groundwork laid by the GrowlMail plugin, and came up with this:
http://code.google.com/p/mail-warn-me/
It’s not complete, but it IS functional and already useful to me. I still need to make an installer for it, and add some features so the general public might find it user-friendly, but in the meantime, if anybody else is scouring the web looking for information on how to swizzle Mail’s methods, the source code might be of use to you.
Test multiple IE versions on OS X (or Ubuntu)
http://shapeshed.com/journal/testing_with_ie6_ie7_and_ie8_on_virtualbox/
Amazing! I don’t know how I haven’t found this before. I am about to try it out. Will post back here if it DOESN’T work. Otherwise assume that it does and enjoy!
(for Mac users) Open any man page in Preview
man -t "sudo" | open -f -a /Applications/Preview.app
That’s it.
Search your PHP includes
If you ever find yourself working on a large PHP project with many includes, you may come to a point where you need to quickly locate where a particular function or global variable is defined. Ideally, this should be easy to find, but when multiple people are working on a project, it can sometimes be very time-consuming to hunt down where something occurs.
Here are a few lines that you should be able to insert somewhere in your code. Refresh the page and they will find whatever you put in $search.
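$search = "function do_something";
// Quote every included path so that spaces or shell metacharacters in file names are safe
$files = implode(" ", array_map('escapeshellarg', get_included_files()));
$cmd = "grep ".escapeshellarg($search)." ".$files;
// Escape the output so that matched lines containing HTML are shown as text
echo "<pre>".htmlspecialchars((string) `$cmd`)."</pre>";
exit();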
The above example can be worked into something more useful. Enjoy!
How to Index a Site with Python Using solrpy and a Sitemap
I wrote this entry on the TNR Global Blog:
http://www.tnrglobal.com/blog/2010/07/how-to-index-a-site-with-python-using-solrpy-and-a-sitemap/
Small PHP Script to Load All Javascript in One Request
It is generally a good idea to try to reduce the number of HTTP requests for a web page to load more quickly. If you have many javascript files, it is ideal to put them into a single file and load that one file. Unfortunately, this can be difficult to manage especially when using scripts that have upgrades periodically (such as Prototype and Scriptaculous, jQuery, etc.). A simple solution is to have a fast script that puts all of the javascript into a single file on the server side. Here is that script:
<?php
$time_start = microtime_float();
$files = $_GET['load'];
// Make sure the input is not unreasonably large or contains any characters that would allow access of files outside of this directory or hosted elsewhere (i.e. no colons or slashes)
if (strlen($files)>200 || preg_match('/[^a-zA-Z0-9.,_-]/',$files))
exit("Error with input");
$files_array = explode(',',$files);
$output = '';
foreach($files_array as $file) {
$file = $file.'.js';
if (!file_exists($file))
exit("File not found: $file");
$output .= "// ---------------- $file ----------------".PHP_EOL;
$output .= file_get_contents($file);
$output .= PHP_EOL;
}
$time_end = microtime_float();
$time = $time_end - $time_start;
$output .= "// Rendered by js_load in $time seconds";
header("Content-Type: text/javascript");
header("Content-Length: ".strlen($output));
echo $output;
function microtime_float() {
list($usec, $sec) = explode(" ", microtime());
return ((float)$usec + (float)$sec);
}
?>
This file should be called js_load.php and it should be placed in the same directory as all other js files. Usage would be:
<script type="text/javascript" src="/global/js/js_load.php?load=prototype,scriptaculous,effects,dragdrop,my_script"></script>
This runs nice and fast and does the trick. An improvement would be to store a cached version of the output and to check the file dates when accessed and only update the cache when there is an update, rather than have to read through and output the actual files each time.
Note, some testing showed almost no difference in speed between file_get_contents() and fpassthru().
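The caching improvement mentioned above could look roughly like this. It is a Python sketch of the idea rather than a drop-in replacement for js_load.php, and the names `build_bundle`, `js_dir` and `cache_dir` are made up for illustration. The cache file is keyed by the requested list of scripts, and it is rebuilt only when one of the source files is newer than the cached copy.

```python
import hashlib
import os


def build_bundle(names, js_dir, cache_dir):
    """Concatenate <name>.js files, reusing a cached bundle while it is still fresh."""
    sources = [os.path.join(js_dir, name + ".js") for name in names]
    # One cache file per distinct list of scripts (order matters).
    key = hashlib.md5(",".join(names).encode("utf-8")).hexdigest()
    cache_path = os.path.join(cache_dir, key + ".js")

    newest_source = max(os.path.getmtime(src) for src in sources)
    if os.path.exists(cache_path) and os.path.getmtime(cache_path) >= newest_source:
        with open(cache_path, encoding="utf-8") as cached:
            return cached.read()

    parts = []
    for src in sources:
        with open(src, encoding="utf-8") as handle:
            header = "// ---------------- %s ----------------\n" % os.path.basename(src)
            parts.append(header + handle.read() + "\n")
    bundle = "".join(parts)

    os.makedirs(cache_dir, exist_ok=True)
    with open(cache_path, "w", encoding="utf-8") as cached:
        cached.write(bundle)
    return bundle
```

The input validation from the PHP script still has to happen before the names reach a function like this. The freshness check relies on file modification times, so a source file changed in the same instant the cache was written could be missed.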
Scriptaculous Draggable In a Div with “overflow:auto”
How many times does this happen to you? You have a Scriptaculous Draggable object and it is inside a <div> that has a style of “overflow:auto;”. You start dragging that Draggable around only to find that when you leave the bounds of its containing <div> the element disappears! What is a web developer to do?!
I had this very problem and started to tackle it myself by adding some code to the onStart and onEnd callbacks of my Draggable objects to move the Draggable to be a child of document.body temporarily while dragging. This was effective unless the original div scrolled. There were some other issues as well, so I started searching around for a better approach. I found some discussion of it here:
http://dev.rubyonrails.org/ticket/5771
I then found a solution here:
http://groups.google.com/group/prototype-scriptaculous/browse_thread/thread/b614e856a9aa9d05/b89629e07d1a2f8c?#b89629e07d1a2f8c
The code below is the solution that I found as presented by Google Groups user Christophe Boulain:
function getDragElement(element) {
var el = element.cloneNode(true);
el.id = 'sub'+element.id;
el.style.position = 'relative';
document.body.appendChild(el);
return el;
}
var SubsDraggable = Class.create(Draggable, {
initialize:function($super, element) {
var options = arguments[2] || {};
$super(element,options);
if( typeof(this.options.dragelement) == 'undefined' )
this.options.dragelement = false;
},
initDrag:function(event) {
if(!Object.isUndefined(Draggable._dragging[this.element]) && Draggable._dragging[this.element])
return;
if(Event.isLeftClick(event)) {
// abort on form elements, fixes a Firefox issue
var src = Event.element(event);
if((tag_name = src.tagName.toUpperCase()) && (
tag_name=='INPUT' ||
tag_name=='SELECT' ||
tag_name=='OPTION' ||
tag_name=='BUTTON' ||
tag_name=='TEXTAREA')
)
return;
var pointer = [Event.pointerX(event), Event.pointerY(event)];
// HERE are my modifications to calculate the new clone position. I'm not sure if there is an easier method, but this one seems to work.
var pos = this.element.cumulativeOffset();
var scroll = this.element.cumulativeScrollOffset();
var vpscroll = document.viewport.getScrollOffsets();
this.offset = [0,1].map( function(i) {
return (pointer[i] - pos[i] + scroll[i] - vpscroll[i])
});
Draggables.activate(this);
Event.stop(event);
}
},
startDrag: function($super, event) {
if( this.options.dragelement ){
this._originalElement = this.element;
this.element = this.options.dragelement(this.element);
Position.absolutize(this.element);
Position.clone(this._originalElement, this.element);
}
$super(event);
},
finishDrag: function($super, event, success) {
$super(event, success);
if(this.options.dragelement){
Element.remove(this.element);
this.element = this._originalElement;
this._originalElement = null;
}
}
});
To use it, initialize your Draggable like this:
new SubsDraggable('elementid', { dragelement: getDragElement, ...});
CodeIgniter’s Var Cache
I wrote a library for CodeIgniter called “Var_cache”. It is a simple solution for quickly storing and retrieving a variable that does not need to be recalculated regularly (a typical example being the result of a COUNT(*) query). It is available here:
http://codeigniter.com/wiki/Var_Cache/
Finding Nearby Locations
Let’s say you have a list of 60 or so locations and you need a fast way to sort them by distance from a user’s current location. You might have a form where a user can enter their current address or zip code, etc. and you want it to output the different locations sorted by distance. Here’s one way to do that:
First, you will need a Google Maps API Key. That is very easy to obtain here: http://code.google.com/apis/maps/signup.html
Now that you have that, you have a couple of options. If you want to work on the client side (and assuming this is a web application), you can use Google’s Javascript API’s Geocoding Object. I am going to talk about doing it on the server side though…
Google also has an HTTP Geocoding Service which can be called as follows:
http://maps.google.com/maps/geo?q=1600+Amphitheatre+Parkway,+Mountain+View,+CA&output=json&sensor=true_or_false&key=your_api_key
In PHP, you can take the user’s query, send it to this web service, and read back the result:
<?php
$query = $_POST['q'];
$result = file_get_contents('http://maps.google.com/maps/geo?q='.urlencode($query).'&output=json&sensor=false&key=your_api_key');
?>
Now, parse the result for the geocode object. Note that the result can be requested in JSON or in other formats such as XML. Choose whichever fits your purpose.
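As a rough illustration of the parsing step, here is a Python sketch. The response shape is an assumption on my part, recalled from the JSON this service returned: a Status object with a numeric code, and a Placemark list whose Point.coordinates are ordered longitude, latitude, altitude. Check it against a real response before relying on it. The function name `parse_geocode` is hypothetical.

```python
import json


def parse_geocode(body):
    """Return (lat, lon) from a geocoder JSON body, or None if nothing usable was found."""
    data = json.loads(body)
    placemarks = data.get("Placemark") or []
    # ASSUMPTION: a Status code of 200 means success and the first Placemark is the best match.
    if data.get("Status", {}).get("code") != 200 or not placemarks:
        return None
    # ASSUMPTION: coordinates are ordered [longitude, latitude, altitude].
    lon, lat = placemarks[0]["Point"]["coordinates"][:2]
    return lat, lon
```

Returning None for a failed lookup lets the caller decide whether to ask the user for a different address.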
At this point, you will also need to know the geocode of all of your locations. Assuming that you have them in a database, you can add a field called “geocode”, or two fields called “lat” and “lon”. If you do not have a geocode stored for a certain location, you can look it up once using the web service and store it so that it never has to be looked up again. Obviously, it would be a large operation to geocode every location on every search.
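The look-up-once-and-store idea can be sketched like this in Python with SQLite. The table name `locations`, its columns, and the function names are all hypothetical; `geocode_fn` stands for whatever function calls the web service and returns a (lat, lon) pair.

```python
import sqlite3


def get_geocode(conn, location_id, geocode_fn):
    """Return (lat, lon) for a location, calling geocode_fn only if nothing is stored yet."""
    row = conn.execute(
        "SELECT address, lat, lon FROM locations WHERE id = ?", (location_id,)
    ).fetchone()
    if row is None:
        raise KeyError(location_id)
    address, lat, lon = row
    if lat is None or lon is None:
        # First request for this location: geocode it once and remember the answer.
        lat, lon = geocode_fn(address)
        conn.execute(
            "UPDATE locations SET lat = ?, lon = ? WHERE id = ?", (lat, lon, location_id)
        )
        conn.commit()
    return lat, lon
```

After the first call for each location, searches only touch the database and never the web service.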
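Assuming that you have all of the locations, compute the great-circle distance between each location’s geocode and the entered location’s geocode from the web service. The Haversine formula is the usual choice. (Note: for much more accuracy you can use Vincenty’s formula, but that is not necessary in this application.) Here is an implementation that I found on stackoverflow.com and have tested. Strictly speaking, it uses the spherical law of cosines (the dot product of the two points’ unit vectors) rather than the haversine form, but it gives the same great-circle distance: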
// pass the latitudes and longitudes in as degrees
function getDistance($lat1,$long1,$lat2,$long2)
{
    $r = 3963.1; //3963.1 statute miles; 3443.9 nautical miles; 6378 km
    // convert the degrees to radians
    $lat1 = deg2rad($lat1);
    $lat2 = deg2rad($lat2);
    $long1 = deg2rad($long1);
    $long2 = deg2rad($long2);
    // dot product of the two unit vectors = cosine of the central angle
    $cos = cos($lat1)*cos($long1)*cos($lat2)*cos($long2) + cos($lat1)*sin($long1)*cos($lat2)*sin($long2) + sin($lat1)*sin($lat2);
    // rounding can push the value just outside [-1, 1], which would make acos() return NAN
    $cos = max(-1.0, min(1.0, $cos));
    return acos($cos) * $r;
}
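For comparison, here is what the haversine form itself looks like, sketched in Python. The function name `haversine_miles` and the default radius (the same 3963.1 statute miles used above) are my choices for illustration. The haversine form behaves better than acos() when the two points are very close together.

```python
import math


def haversine_miles(lat1, lon1, lat2, lon2, radius=3963.1):
    """Great-circle distance between two points given in degrees (Haversine formula)."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # a is the squared half-chord length between the two points on the unit sphere.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing a slightly above 1 for antipodal points.
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))
```

Pass a different `radius` (for example 6378 for kilometers) to change the unit of the result.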
Put each of those distances in an array keyed by location, then sort the array in ascending order (in PHP, asort() does this while preserving the keys).
Then display the corresponding stores to the user, nearest first.
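Putting the distance and sorting steps together, a minimal Python sketch might look like this. `great_circle_miles` is a port of the PHP getDistance function above, and `sort_by_distance` and the (name, lat, lon) tuple layout are assumptions made for the example.

```python
import math


def great_circle_miles(lat1, lon1, lat2, lon2, radius=3963.1):
    """Port of getDistance: spherical law of cosines, inputs in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(dlam))
    # Clamp to [-1, 1] so rounding errors cannot make acos() fail.
    return math.acos(max(-1.0, min(1.0, cos_angle))) * radius


def sort_by_distance(origin, locations):
    """Return (distance, name) pairs sorted nearest first.

    origin is a (lat, lon) pair; locations is an iterable of (name, lat, lon).
    """
    lat0, lon0 = origin
    pairs = [(great_circle_miles(lat0, lon0, lat, lon), name)
             for name, lat, lon in locations]
    return sorted(pairs)
```

With 60 or so locations, computing every distance and sorting on each search is cheap, so no spatial index is needed at this scale.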