We’ve been helping with the development of the PRISMS resource review utility and public interface.
Our most recent task was to find a way to provide access to the data in an exchange format compatible with the NSDL. To do this we were looking for a PHP-based OAI provider that could interface with our MySQL database. We found a capable application in the work of Heinrich Stamerjohanns. He has put together a set of PHP scripts which provides exactly the kind of functionality we needed.
The code is a bit old, with the last update made in June 2005. Functionally, however, it is more than adequate for our needs. Well, mostly: I had to make at least one change to accommodate the nsdl_dc metadata format.
The change starts with the configuration variable $METADATAFORMATS (defined in oaidp-config.php). Originally the information stored in this array did not allow for multiple namespace declarations inside the metadata container. I added a new property (xml_namespaces) that consists of an associative array containing additional namespace declarations. The format of this new array is “prefix” => “namespace”.
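As a rough illustration, an entry using the new property might look like the sketch below. Only $METADATAFORMATS and xml_namespaces are taken from the description above; the other keys, the URIs, and the handler file name are assumptions made for the example and may not match the actual configuration file.

```php
<?php
// Sketch of a metadata format entry with the added xml_namespaces property.
// Only $METADATAFORMATS and xml_namespaces come from the post; every other
// key, URI, and file name below is an assumption for illustration.
$METADATAFORMATS = array(
    'nsdl_dc' => array(
        'metadataPrefix'    => 'nsdl_dc',
        'schema'            => 'http://ns.nsdl.org/schemas/nsdl_dc/nsdl_dc_v1.02.xsd',
        'metadataNamespace' => 'http://ns.nsdl.org/nsdl_dc_v1.02/',
        'myhandler'         => 'record_nsdl_dc.php', // hypothetical handler file
        // New property, in the form "prefix" => "namespace"
        'xml_namespaces'    => array(
            'dc'  => 'http://purl.org/dc/elements/1.1/',
            'dct' => 'http://purl.org/dc/terms/',
        ),
    ),
);
```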
Next I had to modify the metadataHeader() function (defined in oaidp-util.php) to enable the inclusion of the additional namespaces in the XML. This function creates the metadata container node. I added a new section that checks for the presence of the xml_namespaces property and then adds these namespaces to the container node.
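The check can be sketched roughly as follows. This is not the actual metadataHeader() code; openMetadataContainer() is a hypothetical helper, and the array keys other than xml_namespaces are assumed for the example.

```php
<?php
/**
 * Hypothetical helper: build the opening tag of a metadata container,
 * appending one xmlns:prefix attribute per entry in xml_namespaces.
 */
function openMetadataContainer(array $format): string
{
    $prefix = $format['metadataPrefix'];
    $tag = '<' . $prefix . ':' . $prefix;
    $tag .= ' xmlns:' . $prefix . '="'
          . htmlspecialchars($format['metadataNamespace'], ENT_QUOTES) . '"';

    // Only add extra declarations when the optional property is present.
    if (isset($format['xml_namespaces']) && is_array($format['xml_namespaces'])) {
        foreach ($format['xml_namespaces'] as $nsPrefix => $nsUri) {
            $tag .= ' xmlns:' . $nsPrefix . '="'
                  . htmlspecialchars($nsUri, ENT_QUOTES) . '"';
        }
    }
    return $tag . '>';
}

// Example input; the key names mirror the assumed configuration layout.
$format = array(
    'metadataPrefix'    => 'nsdl_dc',
    'metadataNamespace' => 'http://ns.nsdl.org/nsdl_dc_v1.02/',
    'xml_namespaces'    => array('dc' => 'http://purl.org/dc/elements/1.1/'),
);
$containerTag = openMetadataContainer($format);
```

Formats that do not define xml_namespaces produce the same container tag as before, so existing entries keep working unchanged.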
The final step was handily included by Stamerjohanns. When specifying a metadata format you also indicate the PHP file that will generate the output. You can write your own, but it’s easier to base this on the record_dc.php file. As long as the XML format you’re using is flat you can just copy the original file and modify the database field and XML node references.
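To show why a flat format is easy to adapt, here is a minimal sketch in which each database column maps to exactly one XML element. It is not the contents of record_dc.php; renderFlatRecord(), the column names, and the sample row are all hypothetical.

```php
<?php
/**
 * Hypothetical sketch of a flat record handler: each database column maps to
 * exactly one XML element, so adapting it means editing only the map.
 */
function renderFlatRecord(array $row, array $fieldMap): string
{
    $xml = '';
    foreach ($fieldMap as $column => $element) {
        if (!isset($row[$column]) || $row[$column] === '') {
            continue; // skip empty columns instead of emitting empty elements
        }
        // Escape XML special characters in the field value.
        $value = htmlspecialchars($row[$column], ENT_QUOTES, 'UTF-8');
        $xml .= '<' . $element . '>' . $value . '</' . $element . ">\n";
    }
    return $xml;
}

// Hypothetical column-to-element map and sample row.
$fieldMap = array('title' => 'dc:title', 'url' => 'dc:identifier');
$row = array('title' => 'Waves & Sound', 'url' => 'http://example.org/r/1', 'notes' => '');
$recordXml = renderFlatRecord($row, $fieldMap);
```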
Character set madness
One thing that always seems to snag me is dealing with character sets. Here are my suggestions and thoughts on using this code with UTF-8 character data.
We use UTF-8 character encoding for our data. This is fine, and the script supports UTF-8. If you use this character set, make sure you specify it in the configuration file; otherwise you are likely to encounter XML processing errors.
Related to our use of UTF-8, I updated an if condition in the xmlstr() function that appeared to be encoding strings as UTF-8 data when the format specified is ISO-8859-1.
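The exact condition is not reproduced here, but the general idea of guarding the conversion can be sketched as below. toUtf8() is a hypothetical stand-in rather than the script's xmlstr(), and this version assumes the mbstring extension is installed.

```php
<?php
/**
 * Hypothetical stand-in for the charset check: convert to UTF-8 only when
 * the source data is declared as ISO-8859-1; pass UTF-8 data through as-is.
 * This sketch requires the mbstring extension.
 */
function toUtf8(string $value, string $sourceCharset): string
{
    // Accept spellings such as "iso8859-1", "ISO-8859-1", or "iso_8859_1".
    $normalized = strtolower(str_replace(array('-', '_'), '', $sourceCharset));
    if ($normalized === 'iso88591') {
        return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
    }
    return $value; // already UTF-8: converting again would double-encode it
}
```

Running UTF-8 data through an ISO-8859-1 to UTF-8 conversion turns each multi-byte character into several garbage characters, which is the kind of problem a wrong condition here would cause.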
I’m not sure if the code specifies UTF-8 as the character set for the database connection. Also, I don’t know if it is important to ensure that the mbstring PHP module is available.
A view on MySQL
Originally we had planned on using a view to generate the flat table structure necessary for enabling this script. Unfortunately our web host is still on MySQL 4 and so views are not available. Luckily the script appears perfectly happy if you specify a virtual table as your source. The nice thing about this is that if our host ever updates to MySQL 5+ we can create the view and replace the table reference.
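Assuming "virtual table" here means a derived table (a subquery with an alias, available from MySQL 4.1), the idea can be sketched as follows. Every table, column, and variable name in this example is hypothetical.

```php
<?php
// Sketch: a derived table (subquery with an alias) standing in for a view.
// All table, column, and variable names here are hypothetical.
$tableSource = '(SELECT r.id AS oai_id, r.title, r.url, r.updated AS datestamp'
             . ' FROM resources r WHERE r.published = 1) AS oai_records';

// Queries can then treat the alias like an ordinary table.
$query = 'SELECT * FROM ' . $tableSource . ' WHERE oai_records.datestamp >= ?';
```

Once a real view named oai_records exists, $tableSource can be reduced to just that name and the rest of the query stays the same.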
Get the code
I’ve uploaded the code (with modifications) to our site: PHP-based OAI Provider