Troubleshooting SUSE

Troubleshooting a problem that has no immediately discernible cause can be difficult. But I’ve found that asking certain questions, looking at certain log files, and performing a few other checks will usually reveal enough information to decide which steps are likely to resolve the issue. The following is a list of questions and steps that should help you out.

Can you duplicate the problem?

If you’re not sure where to start, the best thing to do might be to try to reproduce the problem. Not only does this indicate that a problem is general to the server (i.e., not just a user error), but it can also give you more information about a problem than a user will likely be able to provide.

What is the nature of the problem encountered?

Most often the problem will be noticed because one of our web-based applications is misbehaving. Beyond this, we may be able to figure out what’s wrong by examining how the problem manifests itself. The trick is to follow visible errors back to the source: by looking at how the problem is experienced, we can narrow down the likely sources.

Here’s some guidance in this regard:

  • The browser says the web site cannot be accessed. Is the server running? Is Apache running? Is the site’s name resolution set up correctly?
  • The server can be accessed via the web browser, but the incorrect content is shown. Is Apache loading the correct .conf files? Are there any problems with the Apache configuration?
  • The web site comes up, but the page doesn’t display correctly. Is PHP throwing errors? Is MySQL running? Are there any file system errors?
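
The "is it running?" questions above can be answered quickly from a shell. The following is a minimal sketch, not a definitive procedure: the helper names is_running and report are hypothetical, and it assumes the daemons appear in the process table under names containing "httpd" and "mysql", as the ps commands in the Command Reference suggest.

```shell
#!/bin/sh
# is_running NAME: succeed if any process's command name contains NAME.
# Hypothetical helper; similar in spirit to `ps -Al | grep NAME`.
is_running() {
    ps -A -o comm= 2>/dev/null | grep -q -- "$1"
}

# report NAME: print a one-line status for a daemon.
report() {
    if is_running "$1"; then
        echo "$1: running"
    else
        echo "$1: NOT running"
    fi
}

# Example:
#   report httpd
#   report mysql
```

If both daemons are reported as running, name resolution (nslookup) and the Apache configuration are the next things to look at.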

Have you made any changes to the system?

Whereas initially we tried to determine the problem source by how it was experienced, the next step is to try to determine the source based on actions we have taken. Though this question may seem pointless, remember that it is easy to dismiss changes to a system that seem unrelated to a problem. The full effect of a change may not be revealed until after a problem is encountered.

Here are a few questions to help you along:

  • Have you modified any PHP pages?
  • Have you made any changes to the database structure?
  • Have you made any changes to the database permissions (user access modification or password change)?
  • Have you made any changes to the SUSE user permissions (filesystem access modification or password change)?
  • Have you changed any settings related to PHP (php.ini), MySQL (my.cnf), or Apache (.conf files)?
  • Have you added, removed, or modified any software or services?
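
When nobody remembers making a change, file modification times can help jog memories. The sketch below lists files changed within the last few days. The helper name recently_changed is hypothetical, and the paths in the example are assumptions that should be replaced with wherever php.ini, my.cnf, the Apache .conf files, and the PHP pages actually live on this server.

```shell
#!/bin/sh
# recently_changed DAYS PATH...: list regular files under the given paths
# that were modified less than DAYS days ago.
recently_changed() {
    days=$1
    shift
    find "$@" -type f -mtime "-$days" 2>/dev/null
}

# Example (paths are assumptions; adjust them to this server's layout):
#   recently_changed 7 /etc/php.ini /etc/my.cnf /inet/www
```

This only covers files. Changes to database permissions or structure will not show up here and still have to be checked in MySQL itself.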

Check the PHP error log

If the error manifests itself on a web page on our production server, you won’t see the errors generated by PHP. Client error output is disabled and all errors are logged to a file.

PHP itself will rarely be the source of a problem since debugging should have occurred on the development server. However, the errors it throws (such as inability to access a file, the fact that a MySQL query did not return a result set, or even that a module has not been loaded) can lead you to the true source of the problem.

tail /var/tmp/phperr.log
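
If the log is long, a count of entries per error level can show at a glance what kind of trouble PHP is reporting. This is a rough sketch that assumes PHP's default log line format (a timestamp followed by "PHP Warning:", "PHP Fatal error:", and so on). The helper name php_error_summary is hypothetical.

```shell
#!/bin/sh
# php_error_summary FILE: count log entries by PHP error level,
# most frequent first. Assumes lines that look roughly like:
#   [timestamp] PHP Warning:  some message in /path/page.php on line 12
php_error_summary() {
    grep -o -E 'PHP (Fatal error|Parse error|Warning|Notice)' "$1" |
        sort | uniq -c | sort -rn
}

# Example:
#   php_error_summary /var/tmp/phperr.log
```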

Check the MySQL error log

If the problem is with MySQL, you should be able to tell from the server’s error log. The most likely error you’ll find there is an indication of table corruption.

sudo tail /inet/db/flora.err
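
To pull just the corrupt tables out of the log, something like the following may help. It is a sketch that assumes corruption is reported with MySQL's usual wording, "Table '...' is marked as crashed and should be repaired". Other forms of corruption may be logged differently and will not be caught. The helper name crashed_tables is hypothetical.

```shell
#!/bin/sh
# crashed_tables: read a MySQL error log on stdin and print each table
# that was reported as crashed, once.
crashed_tables() {
    sed -n "s/.*Table '\([^']*\)' is marked as crashed.*/\1/p" | sort -u
}

# Example:
#   sudo cat /inet/db/flora.err | crashed_tables
```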

Repair corrupt database tables

Corrupt tables will typically be inaccessible. If a table is having problems and you attempt to access it, MySQL will let you know. The first step in attempting to fix corrupt tables is to run the online table checker. In all likelihood this will fix the problem tables.

In some instances you may not be able to repair the table. This may indicate that the tables are corrupt because of corruption in the underlying file system. If this is the case, you should first fix the file system, then fix the table.

For more information on fixing corrupted tables, see Recovering from MySQL Table Corruption.

/usr/local/mysql/bin/mysqlcheck --all-databases --auto-repair --force --password

Run SQL statements directly

We should be performing error handling on as much of the non-logic-based code as possible (connecting to databases, accessing file systems, etc.). Unfortunately, a number of our older scripts do not do this. Sometimes even when we do report or log errors there’s not enough information to tell exactly what’s happening.

If the source of the problem is MySQL, one way of gleaning additional information is to run the query interactively and examine the recordset or errors returned by MySQL.
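
As a rough illustration, a single statement can be run from the shell with the mysql command-line client. The client path below is an assumption made by analogy with the mysqlcheck path used on this page, the helper name run_query is hypothetical, and the database and table names in the example are placeholders.

```shell
#!/bin/sh
# Path to the mysql client. An assumption, by analogy with the mysqlcheck
# path used elsewhere on this page; override it if the client lives elsewhere.
MYSQL_BIN="${MYSQL_BIN:-/usr/local/mysql/bin/mysql}"

# run_query DATABASE SQL: run one statement, print the result set as a
# table, and show any warnings. The password is prompted for.
run_query() {
    "$MYSQL_BIN" --password --table --show-warnings \
        --database="$1" --execute="$2"
}

# Example (names are placeholders):
#   run_query some_database 'SELECT COUNT(*) FROM some_table'
```

Copying the exact SQL out of the PHP script, with real values substituted for any variables, gives the most faithful reproduction of what the page is doing.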

Check the warn log

If the source of the problem is in the operating system, you should be able to find useful information in the warn log. For example, errors related to a problem with the file system will show up here.

sudo tail /var/log/warn
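
The warn log collects messages from many sources, so it can help to filter for file system trouble specifically. The pattern below is a guess at what such messages contain (the word "ReiserFS" or the phrase "I/O error") and is a starting point rather than a complete list. The helper name fs_warnings is hypothetical.

```shell
#!/bin/sh
# fs_warnings: read log lines on stdin and keep those that mention
# ReiserFS or a disk I/O error (case-insensitive).
fs_warnings() {
    grep -i -E 'reiserfs|I/O error'
}

# Example:
#   sudo tail -n 200 /var/log/warn | fs_warnings
```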

Repair filesystem corruption

Our system uses ReiserFS, which is fairly robust in terms of recovery, but it won’t repair itself automatically upon finding an error, mainly because it would be bad to fix a problem with the file system while it is in use. While it is possible to run a file system check interactively, it is best (and easiest) to do so as the system boots.

A page outlining more interactive procedures will be provided at a later date.

sudo /sbin/shutdown -rF now

Check the web server logs

While it is rare to find information useful for troubleshooting in the web server logs, you shouldn’t ignore them completely. Particularly helpful is the general error log, which will contain any errors produced as Apache starts or interacts with the system. The site error logs will most likely contain events related to page requests.

tail /inet/www/log/error_log

tail /inet/www/site/logs/eYYYYMM.log
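
The YYYYMM in the site log name has to be filled in by hand. Assuming it stands for the four-digit year and two-digit month of the log, the current month's file name can be built with date. The helper name site_error_log is hypothetical, and the site directory is passed in rather than guessed.

```shell
#!/bin/sh
# site_error_log SITE_DIR: print the path of the current month's
# site error log, assuming the eYYYYMM.log naming shown above.
site_error_log() {
    printf '%s/logs/e%s.log\n' "$1" "$(date +%Y%m)"
}

# Example:
#   tail "$(site_error_log /inet/www/site)"
```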

Command Reference

Check for the Apache process

ps -Al | grep httpd

Check the domain names Apache is responding to

/etc/init.d/apachectl -S

Syntax check the Apache .conf files

/etc/init.d/apachectl -t

Check the Apache error logs

tail /inet/www/log/error_log

tail /inet/www/site/logs/eYYYYMM.log

Check domain name resolution

nslookup sitename

Check for the MySQL process

ps -Al | grep mysql

Check the MySQL error log

sudo tail /inet/db/flora.err

Repair corrupt MySQL tables

/usr/local/mysql/bin/mysqlcheck --all-databases --auto-repair --force --password

Check the PHP error log

tail /var/tmp/phperr.log

Check for system errors

sudo tail /var/log/warn

Repair file system errors

sudo /sbin/shutdown -rF now