Category Archives: Web Development

YUI caching breaks editing

From the AJAX settings screen (/admin/settings.php?section=ajax) you can modify various parameters regarding how Moodle loads the YUI libraries. One of the options is to cache the JavaScript output. This has obvious performance benefits and should normally be enabled. However, during a recent update it appears that caching has broken the ability to edit. Whenever an editing page loads the browser throws the following error:

TypeError: M.core_formchangechecker is undefined

The form actually loads but because of the JavaScript error the rendering of the forum stops before it is complete, leaving the user unable to edit a number of options for the component being edited.

To work around the issue I have disabled caching. This isn’t ideal, but until a solution is found it is the most workable option.

Optimizing database queries in CakePHP

We’ve been using CakePHP for a while now. I’ve found it to be very helpful for quick prototyping and development of core site functionality. This is nice because it allows more time for tweaking a site’s user interface, functionality, and performance. That last one, performance, can occasionally be a challenge. One of the means of optimizing a site’s performance is through the crafting of database queries that take into account indexes and conditions. But because CakePHP is building the SQL we have to keep in mind how CakePHP does this as we construct our query parameters.

Let’s review one of those situations. Say we have the following (simplified) models:

class Item extends AppModel {
  var $hasMany = array('Answer');
  var $belongsTo = array('Packet');
}

class Student extends AppModel {
  var $hasMany = array('Answer');
} 

class Answer extends AppModel {
  var $belongsTo = array('Student','Item','Packet');
}

class Packet extends AppModel {
  var $hasMany = array('Answer','Student');
  var $hasAndBelongsToMany = array('Item');
}

Let me explain the model relationships. We have test items that are assigned to packets. Packets are sent out for testing to a group of student. So each answer is unique based on the packet, item, and student. The (somewhat circular) relationships between these models provides for easy access to specific data. For example, let’s say we want to know all the answers from a particular state for a couple of our items. Here’s an example query that present some optimization issues:

$queryOpts = array(
  'fields' => array('Item.*');
  'conditions' => array('Item.id' => array(1995, 1726, 1971, 1707, 1972));
  'contain' => array(
    'Answer' => array(
      'fields' => array('Answer.*');
      'conditions' => array(
        'Answer.packet_id IN (SELECT id FROM packets WHERE state = "CA")'
      )
    )
  );
);
$items = $this->Item->find('all',$queryOpts);

You’ll note that I’m not using CakePHP syntax for the Answer model conditional. We had been using this conditional in our web application prior to moving to CakePHP. Since there is not, so far as I can tell, an easy way to work with SQL subqueries we decided to start by forcing a query similar to the original.

Running the above creates the following SQL statement when querying the answers table:

SELECT `Answer`.* FROM `answers` AS `Answer` WHERE `Answer`.`packet_id` IN (SELECT id FROM packets WHERE packet_type = "F") AND `Answer`.`item_id` IN (1995, 1726, 1971, 1707, 1972)

We have a fairly large data set so the above query takes ~32 seconds to run on my system. But that query is not optimized; reordering the conditions in the WHERE clause produces the following query:

SELECT `Answer`.* FROM `answers` AS `Answer` WHERE `Answer`.`item_id` IN (1995, 1726, 1971, 1707, 1972) AND `Answer`.`packet_id` IN (SELECT id FROM packets WHERE packet_type = "F")

The optimized query performs significantly better, <1 second.

Since for this example we know which items we want we can update the query to manually force the desired ordering:

$queryOpts = array(
  'fields' => array('Item.*');
  'conditions' => array('Item.id' => array(1995, 1726, 1971, 1707, 1972));
  'contain' => array(
    'Answer' => array(
      'fields' => array('Answer.*');
      'conditions' => array(
        'Item.id' => array(1995, 1726, 1971, 1707, 1972)
        'Answer.packet_id IN (SELECT id FROM packets WHERE state = "CA")'
      )
    )
  );
);
$items = $this->Item->find('all',$queryOpts);

But what if we don’t know what items we want. Say we’re pulling items (and answers) as part of another containable query. We have no way to specify the items in the Answer model conditionals. But we can improve the query so that we’re not using a subquery each time (i.e. we’re doing it the “CakePHP” way):

$fields = array('Packet.id');
$conditions = array('Packet.packet_type' => 'F');
$packet_list = $this->Item->Packet->find('list',array('fields'=>$fields,'conditions'=>$conditions));

$queryOpts = array(
  'fields' => array('Item.*');
  'conditions' => array('Item.id' => array(1995, 1726, 1971, 1707, 1972));
  'contain' => array(
    'Answer' => array(
      'fields' => array('Answer.*');
      'conditions' => array(
        'Answer.packet_id' => $packet_list
      )
    )
  );
);
$items = $this->Item->find('all',$queryOpts);

Which produces the following two queries:

SELECT `Packet`.`id` FROM `packets` AS `Packet` WHERE `Packet`.`packet_type` = 'F';
SELECT `Answer`.*, `Answer`.`item_id` FROM `answers` AS `Answer` WHERE `Answer`.`packet_id` IN (1278, 1277, 1276, 1274,...,1726, 1971, 1972, 1995);

Again we’re seeing the kind of performance we want, <1 second for both queries. As good as when we knew all our conditions in advance. The main drawback to doing it this way is the size of the query string sent to your database. If your first query is pulling a large number of results then your second query will be quite large. And most databases have limits on how large a query can be.

This isn’t the only way to get to the end point either. As with any programming task there are a variety of ways to arrive at the desired result. For example, you could break up the query into multiple queries that build off of the previous query. Sticking with our items/answers query you could first get the items in one query. Then foreach through the results and run a query to get the answers, integrating the results manually. It’s more work, and likely would decrease performance since you’re increasing the number of database queries, but if you can’t seem to optimize your all-in-one queries it’s worthwhile to try other options to see if you can get better performance.

Aside: performance problems due to poorly optimized queries can be somewhat mitigated by query caching. Even so, you’re better of with an optimized query to begin with since it saves you from a performance hit every time the relevant cached result is flushed.

ASP goes on a diet

I recently modified our custom mass mailer script to accept attachments. After the modifications the new functionality worked nicely, except when it didn’t. Sometimes the script would error out when it accessed the Request object. When this happened I would receive the following message:

Request object error ‘ASP 0104 : 80004005’ Operation not Allowed

Though the error messages produced by ASP tend to be a bit cryptic, Microsoft sometimes does a decent job of documenting them on their web site. A quick search revealed the problem: IIS 5.x and 6 have a default request limit of 200K. That’s sufficient in most situations where you would attach something to an email, but once you start attaching multiple documents there’s a chance you’ll need a higher limit. The fix is easy enough, but does require editing the IIS metabase. Setting the AspMaxRequestEntityAllowed property to an appropriate limit (in bytes, up to 1GB max) addresses the issue. I like to use metaedit to edit the metabase, but you can also use the command-line metabase editing script:

cscript adsutil.vbs set w3svc/ASPMaxRequestEntityAllowed 20971520

20971520 is the limit of the request size in bytes (20M in this case).

References:

WP redirects confuse IE

Some users were having problems on a community site that was implemented using WordPress+BuddyPress. After some testing the issue involved users visiting the site with Internet Explorer (IE). These users were receiving error pages instead of site content. Other web browsers did not have similar issues.

The cause of the problem turns out to be a confluence of issues:

  • The site is hosted on a Windows+IIS server, a rare platform for WordPress (and even more so for BuddyPress) and one that probably doesn’t receive full attention during quality assurance testing.
  • How WordPress performs redirects on IIS is a bit quirky. BuddyPress issues a lot of redirects so this redirect quirk comes into play quite often. The issue is that when WordPress needs to perform a redirect on IIS it returns a “refresh” header pointing to the new page rather than a “location” header.
  • IE’s attempt to make the Internet more friendly; specifically IE’s use of “friendly error pages.” These friendly error pages replace the content delivered by the server (if that content falls below a certain size in KB).

Normally none of these issue are a problem by themselves and a web browser (including IE) will load the page indicated by the redirect. However, all three of the above issues taken together result in a situation where IE never sees the header refresh and so doesn’t redirect the user to the correct location.

The fix is fairly simple: change the headers that WordPress sends to include the standard “location” header. To do this you modify wordpress/wp-includes/pluggable.php@wp_redirect() so that it reads as follows (line 14 is new):

function wp_redirect($location, $status = 302) {
	global $is_IIS;

	$location = apply_filters('wp_redirect', $location, $status);
	$status = apply_filters('wp_redirect_status', $status, $location);

	if ( !$location ) // allows the wp_redirect filter to cancel a redirect
		return false;

	$location = wp_sanitize_redirect($location);

	if ( $is_IIS ) {
		header("Refresh: 0;url=$location");
		header("Location: $location", true, $status);
	} else {
		if ( php_sapi_name() != 'cgi-fcgi' )
			status_header($status); // This causes problems on IIS and some FastCGI setups
		header("Location: $location", true, $status);
	}
}

Integrating calculated fields and model data in CakePHP

(This is mostly a summary of Dealing with calculated fields in CakePHP’s find().)

One of the great things about CakePHP is that if it doesn’t have some core functionality you want/need there are easy ways to add it. More and more I’m taking advantage of this ability. This all comes about because I wanted to have calculated fields available inline with the model data. By calculated fields I mean results that are not data columns (e.g. SELECT *, CURDATE() AS current_date FROM users … yes, that’s a fairly contrived example).

By default Cake places results from calculated fields outside the model data, like this:

Array
(
    [0] => Array
        (
            [User] => Array
                (
                    [id] => 1
                    [username] => aaas
                )

            [0] => Array
                (
                    [current_date] => '2011-05-13'
                )

        )
)

What we want is to place the “current_date” calculated field inside the User model, so it’s more naturally accessed with $users[0]['User']['current_date'] instead of $users[0]['User'][0]['current_date']. Easy enough to do through a model’s afterFind() callback method (to make it widely available place the function in app_model.php).

function afterFind($results, $primary=false) {
	if($primary == true) {
		if(Set::check($results, '0.0') && Set::check($results, "0." . $this->alias)) {
			$fields = array_keys( $results[0][0] );
			foreach($results as $key=>$value) {
				foreach( $fields as $fieldName ) {
					$results[$key][$this->alias][$fieldName] = $value[0][$fieldName];
				}
				unset($results[$key][0]);
			}
		}
	}
	return $results;
}

And yet, even as I add such enhanced functionality to my web apps I’m finding limits. The above logic is complicated a little because you don’t know what type of results you’re getting. The results you get from calling, for example, find('all') versus find('count') are not the same. But Cake doesn’t give any hints to the afterFind() callback, and as a result additional logic is included to try and guess at our data structure. The above adds a quick but incomplete hack by a) checking that the results are for the primary model queried (e.g. not from contained model), b) checking for the presence of nested numerical keys, and c) checking that there is model data to integrate with.

The take away is that while the code produces the desired data structure, in its current form it does so only for specific results.

Align two columns in Excel

I recently had two sets of data, one a full list of records and the other a list of identifiers for the records that needed to be extracted. Extracting the relevant records from the full list would be a fairly easy programming task, but the data was in Excel and I wanted to try and accomplish the task in that environment. Thankfully this problem has already been solved and the answer posted to the web.

If I have one column of identifiers for the records that need to be extracted (column A, 100 records) and one column with the full list of identifiers (column B, 1000 records) the following formula will indicate which identifiers from column B match an identifier from column A:

=IF(ISNA(MATCH(B1,$A$1:$A$100,0)),"","X")

Just place the formula in its own column and copy down for the length of the full data set. The columns do not need to be in any particular order and you can create a separate worksheet that contains your filter list, keeping your data and your filter separate. You can then AutoFilter on your search column to see the results (and copy/paste to another worksheet if necessary).

What file was that in?

One of the more annoying things about coding is finding the right file. Or, even worse, finding a file you didn’t know you needed to look at. Especially when the number of files you’re parsing is in the thousands. If you’re on Windows the built-in search can help, but you never know if all the files in the target directory have been indexed.

Luckily, one of the nice things about coding is that you’re often dealing with plain text files. And you are typically searching for a particular string. As they say, there’s an app for that. Both *nix and Windows are capable of searching through the contents an entire hierarchy of files using command-line programs. Each OS has a variety of commands that can do the job, but I’ll highlight the two I use most often here.

In Windows you would use findstr.exe, and it’s as simple as running the following from the containing directory:

findstr.exe /MIS "searchtext" filetype

On *nix systems the grep command is your friend:

grep -lr "searchtext" filetype

Of course, a quick google search will get you all the help you need in refining your search.

Update 2012-08-22: What file wasn’t that in?

I recently had a need to find files that did not contain a specific string. A bit more difficult of a prospect. I suspect there’s probably a way to do this on the command line, but so far I’ve only come up with a work-around (in Windows). Use findstr.exe in a loop to show how many times each file contains the string, then filter for files with zero instances:

for /R %f in (filetypes) do find /C /I "searchtext" %f >> ..find.txt

(filetypes in this instance can be a space-separated list of possible file types.)

Careful with Google Analytics filters

I made a change to the filters attached to our profiles in Google Analytics (GA) so that we would capture only relevant traffic. Namely, I was attempting to avoid capturing traffic to our development servers. Unfortunately, instead of a domain-based filter I created one that resulted in GA capturing zero traffic. Fail.

The problem stems from my attempt to use a predefined filter to block out the unwanted traffic. I tried to set up a “include only traffic from the domains that include” filter, thinking this was a filter on the server’s domain. What this filter actually checks against is the visitor’s domain. I guess a little more attention to the “from/to” part of the filter would have made this apparent, but Google doesn’t provide much documentation (or even better, examples) on their filters.

After a few days I noticed the sudden lack of data and realized my error. Turns out what I really wanted was a custom filter with the following settings:

  • filter type: include
  • fiter field: hostname
  • filter pattern: regex for the domain (e.g. www.project2061.org
  • case sensitive: no

Even after looking at this again, it’s still not clear to me why these two are different. The descriptive language is almost identical. Google needs to do some work here.

Of course, there are also other solutions to this problem, listed below:

  • Use an include to pull in the GA code and only populate the file on the servers where tracking should occur. The only significant problem here would be forgetting to set up the include.
  • Create advanced segments in GA that filters out traffic to anything other than the production domains. This would have the benefit of being able to track both production and development server use in the same report. But if there are any significant difference between the two sites then this wouldn’t be all that helpful.

References:

AbleCommerce Tax Zone Modification

One of the features of AbleCommerce is the ability to charge taxes based on address. The feature is nice, but has one major flaw … the tax rate has to managed manually. We actually have only a few states where we are required to charge taxes, but one of those is California which is horribly convoluted. The problem boils down to, essentially, the need to track the tax rate for each zip code individually. You can’t even really use blocks of zip codes because a locality’s zip codes may not be consecutive.

AbleCommerce uses the concept of zones for assigning tax rates. For our purposes a zone can be applied based on state or a combination of state and a list of zip codes. Because of California’s tax rules and AbleCommerce’s tax rule management functionality we have to create a separate zone for each tax rate. The zip code list is just a comma-separated string of zip codes. Maintaining the information for the >1000 zip codes is difficult to do in this format.

I created a simple PHP-based page that keeps track of each zip code and its associated tax rate. While not perfect, this goes a long way towards simplifying maintenance of this information. A link to this page can be located in the AbleCommerce administrative section under Configure->Regions->Zones.

AbleCommerce has one other issue that has to be addressed. The default AbleCommerce installation limits a zone’s zip code filter to 255 characters (or about 40 zip codes) in the database and on the web form. Due to the situation with California taxes we have some zones that have significantly more than 40 zip codes and would require the creation of over 70 zones. To get around this problem I removed the web form limit and modified the database field ac_shipZones@PostalCodeFilter to be varchar(8000).

Rewriting URLs on IIS6

One of the nice things about CakePHP is that it attempts to make a site more friendly to the average web site visitor by creating easier-to-remember URLs. There are a few techniques that CakePHP uses, but for the sake of this conversation we’ll focus on one way in particular: apache’s mod_rewrite functionality. Using mod_rewrite, URLs that would normally include the controller file (index.php) and a querystring can be rewritten as a simple file path.

Microsoft doesn’t include mod_rewrite-style functionality in IIS by default, though it has created an extension for IIS7. Since we are currently using IIS6 the extension isn’t an option for us. Fortunately there are a few other options available, including an open-source project called Ionics Isapi Rewrite Filter (IIRF). IIRF is an ISAPI filter that is very similar to mod_rewrite in terms of functionality. After some testing I’ve found this filter works almost perfectly for enabling CakePHP’s friendly URLs.

Getting Started

Installation is painless and requires only a few steps. The following is based on the IIRF 2.0.1.15 release which does not include an installer:

  1. Extract the files from the IIRF archive to a folder on the server.
  2. Open the properties of the IIS root “Web Sites” folder or the specific site that needs IIRF.
  3. In the “ISAPI Filters” tab create a new entry for IIRF; name the new filter (e.g. “IIRF”); specify the IIRF DLL located with the extracted IIRF files at binIIRF.dll.
  4. Ensure that the IIS user has read/execute access to the IIRF DLL and read/write access to any directories that will be used for logging.
  5. Restart IIRF.

IIRF should now be installed, but I’ve found that sometimes the filter won’t appear to be active in the “ISAPI Filters” tab until it has been used the first time. To enable IIRF all you have to do is create a file called IIRF.ini and place it in the web site root folder or in the root of a virtual directory. This file contains any local configuration directives (such as log directory) and URL rewriting rules.

The obligatory gotcha

I mentioned before that the filter works almost perfectly. The only situation where I’ve had problems up to now is if the URL contains one or more space characters. The IIRF log indicates that a URL with a space is being parsed correctly and returning a valid URL. And yet the web server reports a 404 error.

I’ve only done testing when the final URL references a physical file, but based on a conversation in the support forum I suspect this problem also affects parameters in the querystring. I haven’t yet had a chance to fully investigate the issue, so I don’t have a work-around yet when the issue affects physical files. There does, however, appear to be a work-around for spaces in the querystring.

For now I have decided to ensure that any URLs parsed by IIRF that will point to a physical file/directory does not contain spaces.

References

CakePHP usage:

Space-in-URL problems: