All posts by Brian

Apache Log Format Update

A note way past due.

I updated the log format for flora.p2061.org (but not the other sites on flora). The default “combined” format omits some information that could help with debugging (such as user cookies), so I set up a new format similar to the IIS W3C extended format. Two values not included that could prove useful are bytes received (%I) and bytes sent (%O); the logio_module has to be enabled to record them. It would probably be worthwhile to add this once we start placing more public content on the server. For now the following format suffices:

#Fields: date    time    c-ip    cs-host    cs-version    cs-method    cs-uri-stem    cs-uri-query    cs(Referer)    cs(User-Agent)    cs(Cookie)    sc-status    time-taken
LogFormat "%{%Y-%m-%d    %H:%M:%S}t    %h    %V    %H    %m    %U    %q    %{Referer}i    %{User-Agent}i    %{Cookie}i    %>s    %D" webapp
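If we do enable mod_logio later, the format could be extended with the byte counts mentioned above. A sketch of what that would look like (the LoadModule path varies by distribution, so treat it as illustrative):

```
# Enable byte-count logging; module path depends on the build
LoadModule logio_module modules/mod_logio.so

#Fields: date    time    c-ip    cs-host    cs-version    cs-method    cs-uri-stem    cs-uri-query    cs(Referer)    cs(User-Agent)    cs(Cookie)    sc-status    time-taken    sc-bytes    cs-bytes
LogFormat "%{%Y-%m-%d    %H:%M:%S}t    %h    %V    %H    %m    %U    %q    %{Referer}i    %{User-Agent}i    %{Cookie}i    %>s    %D    %O    %I" webapp
```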

Playing nice with CPU usage

I needed to do a mass resampling of around 280,000 images. There are a number of ways to do this, but I settled on doing it via PHP because the images are stored on our web server, the total size of the images is large (~10GB), and I didn’t want to kill my machine trying to get it done.

PHP is ideal for a task such as this: parsing directories and subdirectories for images is easy, resampling using the built-in library (GD) is a breeze, and specifying the destination as a subdirectory is simple. The one downside is processor usage: image manipulation eats up the CPU in a big way.
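The approach can be sketched as a recursive walk plus GD's resampling call. This is an illustration, not the actual script: the directory names, the JPEG-only assumption, and the target width are all stand-ins.

```php
<?php
// Sketch: walk a source tree, resample each JPEG, write the result to a
// parallel destination tree. Paths and the 800px target are illustrative.
function resample_dir($src, $dest, $maxWidth = 800) {
    if (!is_dir($dest)) {
        mkdir($dest, 0755, true);
    }
    foreach (scandir($src) as $entry) {
        if ($entry === '.' || $entry === '..') continue;
        $path = "$src/$entry";
        if (is_dir($path)) {
            resample_dir($path, "$dest/$entry", $maxWidth); // recurse into subdirectories
            continue;
        }
        $info = @getimagesize($path);
        if ($info === false || $info[2] !== IMAGETYPE_JPEG) continue; // JPEGs only in this sketch

        $srcImg = imagecreatefromjpeg($path);
        $ratio  = min(1, $maxWidth / imagesx($srcImg));   // never upscale
        $w = (int) round(imagesx($srcImg) * $ratio);
        $h = (int) round(imagesy($srcImg) * $ratio);

        $dstImg = imagecreatetruecolor($w, $h);
        imagecopyresampled($dstImg, $srcImg, 0, 0, 0, 0,
                           $w, $h, imagesx($srcImg), imagesy($srcImg));
        imagejpeg($dstImg, "$dest/$entry", 85);           // 85 = JPEG quality

        imagedestroy($srcImg);
        imagedestroy($dstImg);
    }
}

resample_dir('originals', 'resampled');
```

The real script writes into a subdirectory rather than a parallel tree; if you do that, be sure to skip the destination directory during the walk or the script will recurse into its own output.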

Luckily, Linux systems have a built-in utility for exactly this situation: nice, which will “run a program with modified scheduling priority.” I’m running the image manipulation script using the following command:

nice --adjustment=19 php script.php

If nothing else is going on, the script will use whatever resources are available; when anything with a higher priority executes, it takes precedence over the script with regard to system resources. The script should thus not affect the responsiveness of the web server, which is exactly the behavior I was looking for.

Items Utility: Data Conversion Ready

After creating a new structure for the current year’s piloting data there has been a bit of a disconnect in development between the two data sets. There’s just not enough time to ensure that everything works across both. There were basically two choices for moving forward: 1) construct a view of the new data structure that mimics the old data structure; 2) port the old data structure to the new one and continue redevelopment of the scripts. I chose the latter, mainly because there are some improvements I’d like to make to the interface in the process.

I created a series of SQL statements to run in MySQL that will convert the data from the packet_item_records and miscon_packet_refs tables to the packet_students, packet_data, and miscon_packetdata_refs tables. After the conversion I updated any pages that referenced the old data structure. So far in testing the data seems to have converted perfectly.
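The conversion statements follow an INSERT … SELECT pattern. A simplified sketch of the shape, with hypothetical column names standing in for the real schema:

```sql
-- Illustrative only: the actual column names and keys differ.

-- 1) one row per student per packet
INSERT INTO packet_students (packet_id, student_id)
  SELECT DISTINCT packet_id, student_id
    FROM packet_item_records;

-- 2) the per-item responses
INSERT INTO packet_data (packet_id, student_id, item_id, response)
  SELECT packet_id, student_id, item_id, response
    FROM packet_item_records;

-- 3) misconception references re-keyed to the new data rows
INSERT INTO miscon_packetdata_refs (packetdata_id, miscon_id)
  SELECT pd.id, mpr.miscon_id
    FROM miscon_packet_refs mpr
    JOIN packet_data pd
      ON pd.packet_id = mpr.packet_id
     AND pd.item_id   = mpr.item_id;
```

Running the steps in this order matters, since step 3 depends on the rows created in step 2.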

With one exception. Multiple selections for the answer choice questions (A, B, C, D) from 2006 were recorded as a generic Multiple rather than Y+NS, N+NS, etc. This value is not represented in the updated data format or on the data entry forms or summary tables. Rather than spend too much time addressing this issue I’m going to leave these values empty for now. I don’t expect this to be a problem since the researchers are focused on data for the current pilot and field tests. Also, I’m keeping the old version of the piloting data and the scripts that interact with it online in case it’s needed.

Creating Graphs in PHP

I was tasked with creating a demonstration of a dynamically generated graph for one of the grant proposals being worked on at present. This isn’t something we’ve really done before in PHP, but I had a feeling it would not be a unique requirement. A quick search revealed that a PEAR package exists for just this purpose: Image_Graph.

Installation was not too difficult, though I did have to ensure that a few dependencies were installed before PEAR would load the package. Documentation for the module is not that great, but there are a number of samples and a fairly active discussion forum. (There used to be a decent community, but the site appears to have gone under.)

I did not find the graphing object very intuitive. It seems to me that a lot of the properties and functions are abstracted from the objects they affect. I suspect this was a design decision to allow for greater flexibility, but it may also be due to an older coding style. The current branch appears to have been developed in earnest starting in 2005, and it’s been a few years since a new release. Even so, my initial exposure leads me to believe the package may be stable enough for even a production-level project so long as sufficient QA has been performed.
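For reference, a minimal use of the package, reconstructed from the bundled samples (the titles and data points are made up, and the calls should be double-checked against the package's own examples):

```php
<?php
// Minimal Image_Graph sketch, based on the PEAR package's samples.
// Data points are illustrative.
require_once 'Image/Graph.php';

$Graph =& Image_Graph::factory('graph', array(400, 300));
$Plotarea =& $Graph->addNew('plotarea');

$Dataset =& Image_Graph::factory('dataset');
$Dataset->addPoint('Mon', 32);
$Dataset->addPoint('Tue', 41);
$Dataset->addPoint('Wed', 38);

// Note that the plot attaches to the plot area, not the graph itself --
// an example of the abstraction described above.
$Plot =& $Plotarea->addNew('line', array(&$Dataset));

$Graph->done(); // renders and outputs the image
```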

It took me only a couple of days to hack together the demonstration. Though the requirements of the script were few, I decided to push the limits of my object-oriented PHP and create the graphing component as a class. Overall it was a good first try at producing something of this nature. You can find the demonstrator at http://flora.p2061.org/weather_data/.

Items Utility: Piloting Data Entry Updates

Piloting packets for this year are starting to come in and the data from these packets needs to be entered into the items utility. Because the questions in the packet have changed from previous years, and due to the rigid structure of the piloting data entry form and data tables, I decided to rewrite the data entry form. The new format allows more flexibility on the front end without requiring modification of the back end. In addition I modified the way PHP interacts with the packet data from MySQL by utilizing a structured array to hold the data.
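The structured array might look something like this; the field names below are hypothetical, not the utility's actual schema:

```php
<?php
// Hypothetical shape of the structured array holding one packet's data.
// Keys and field names are illustrative.
$packet = array(
    'packet_id' => 42,
    'students'  => array(
        101 => array(   // keyed by student id, then by item id
            'A1' => array('answer' => 'Y',  'misconceptions' => array(3, 7)),
            'A2' => array('answer' => 'NS', 'misconceptions' => array()),
        ),
    ),
);

// The entry form is generated by walking the array, so a change in the
// packet's question set doesn't require changes to the back end.
foreach ($packet['students'] as $studentId => $items) {
    foreach ($items as $itemId => $data) {
        echo "$studentId/$itemId: {$data['answer']}\n";
    }
}
```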

Continue reading Items Utility: Piloting Data Entry Updates

Subversion Repository for Items

I’m currently cleaning up the items utility so that it can be brought into subversion. I’ve already deleted unused files and done some reorganization of the site files. Next up I’m going to work on some interface modifications that I hope will simplify the utility as well as create some common navigational elements across pages. The final step will involve unifying the code in a control structure similar to other recent projects.

Because this is more of a long-term project it is being performed alongside any current development of the items utility. As a result any modifications to the items utility should be brought to my attention so that they can be ported to the SVN working copy if necessary.
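Once the cleanup is done, bringing the utility into subversion is just an import plus a fresh working copy. Roughly, with a placeholder repository URL standing in for our actual layout:

```shell
# Repository URL is a placeholder; adjust to our actual SVN layout.
svn import items_utility http://svn.example.org/repos/items/trunk \
    -m "Initial import of the items utility"

# Check out a working copy to continue development against.
svn checkout http://svn.example.org/repos/items/trunk items_wc
```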

Using Subversion for Application Development and Deployment

I recently wanted to update our install of WordPress to the latest version. WordPress is a fairly easy install, and we could learn a thing or two about application set-up by examining their code, but I recently switched to using subversion to deploy and maintain our install. Based on just the little I’ve used subversion so far, I believe development and deployment of our internal applications would be simplified by employing it for all our projects. Here’s a quick outline of the process, with examples based on my WordPress deployment.
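The core of the process is checking out a tagged release directly into the web directory, then switching tags to upgrade. Roughly (the tag numbers here are illustrative):

```shell
# Initial deployment: check a tagged release out into the web directory.
svn checkout http://svn.automattic.com/wordpress/tags/2.3.1/ .

# Upgrading later is a single switch to the new tag; svn merges the
# changes in place and leaves local files (wp-config.php, etc.) alone.
svn switch http://svn.automattic.com/wordpress/tags/2.3.2/ .
```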

Continue reading Using Subversion for Application Development and Deployment

Expanding Search Terms for More Inclusive Results

While working on the Benchmarks search I wanted to try to provide a feature I find useful on Google and other search engines: word form expansion (lemmatisation). A little research showed me that this would require more work than we really should be spending on search functionality, especially considering that the built-in MySQL full-text search capability is sufficient for our needs. So I decided to focus on a feature that would still provide value but require little time: word stem expansion.
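MySQL's boolean-mode truncation operator (*) does most of the work here: reduce a term to its stem, append the wildcard, and the match covers the word's other forms. A sketch with hypothetical table and column names:

```sql
-- "effect*" matches effect, effects, effective, effectiveness, ...
-- Table and column names below are illustrative.
SELECT id, title
  FROM benchmarks
 WHERE MATCH (title, body)
       AGAINST ('+effect*' IN BOOLEAN MODE);
```

The stemming itself (turning "effectiveness" into "effect") would happen in PHP before the query is built.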

Continue reading Expanding Search Terms for More Inclusive Results

Items Utility: School Reports Update

I made a few updates to the school reports at the request of the researchers.

  • Added a save feature so that packet/item selections for a report can be retrieved at a later date.
  • Added the ability to set thresholds that must be met before the script will include statistics in the report.
  • Updated tables to reflect changes in displayed information and formatting.

Of the above changes, the save feature is of particular note for its inscrutability. While the technique used to save/retrieve the report settings is adequate, the code enabling this functionality is not well implemented. Unfortunately, time constraints required a fast, rather than best, implementation.

As modifications are requested this script is getting to be a little harder to work with. The code was created over the course of a week or two back in January. I was able to make quick work of it by using some concepts I initially worked out for summary table generation (and based on discussions with Brian W). While the data is stored in a pseudo object-oriented format, the script itself is fairly linear in design. If more modifications are requested the script may need some rewrites to enable a bit more flexibility.

Best Practices for Setting MySQL Server Runtime Parameters

Since we’ll be exposing MySQL to significantly more traffic (due mainly to the transition to a database-driven version of Benchmarks Online [dbBOL]) I decided to spend some time optimizing the server’s settings. There are a number of settings that can be tweaked to improve performance. I based my decisions on the information available from the references cited and the performance statistics reported by MySQL (via SHOW STATUS and SHOW VARIABLES, or phpMyAdmin). MySQL has been running for 131 days as of the writing of this post (see cached copy of the runtime stats), so I expect the data will be a fairly good indication of the performance of MySQL under its current usage. Unfortunately, I expect the usage pattern to change significantly once dbBOL is released. As a result some of the settings used will be based on expected usage patterns. At specific intervals after dbBOL is released we should examine the performance of MySQL based on the runtime stats to determine if additional tweaking needs to be performed. I recommend the following schedule: 1 week, 1 month, 3 months, then every 6 months.
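For the record, the kinds of settings involved look like the fragment below. The values shown are illustrative only and should be derived from the runtime stats and the references cited, not copied:

```
# Illustrative my.cnf fragment; values must come from actual measurements.
[mysqld]
key_buffer_size     = 64M    # index blocks for MyISAM tables
table_cache         = 256    # open-table cache
query_cache_size    = 32M    # worthwhile for read-heavy dbBOL traffic
tmp_table_size      = 32M    # in-memory temp tables before spilling to disk
max_heap_table_size = 32M    # keep in step with tmp_table_size
```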

Continue reading Best Practices for Setting MySQL Server Runtime Parameters