May 08, 2012
Slow Plone site? Let's use haufe.requestmonitoring
Your Plone site suddently become slow, or maybe sometimes it gets slack. How to investigate it (and prevent future problems)
One of the worst times for a web programmer or administrator comes when he gets an e-mail like this one: "Hi! The site (you are in charge of) is very slow today! I really cannot work!"
But it can get even worst if you read the mail later, when the problem has already disappeared on his own. So you'll never know if it was a problem of huge (unexpected) traffic or if it's something you'll face again somedays.
If this will arise again soon, you will surely get another e-mail from the same guy, less polite... probably.
Let's examine the problem in depth. This is what you have to do for a Plone site.
Always rely on monitor tools! Some example?
This kind of tool is very helpful: it warns you when something is going wrong (high memory consumption? disk space down? CPU at 100%?) and collect statistical data, that can be analyzed later.
Monitor your running Zope instance
If you can reach Plone site ZMI in the very moment it slows down, you can sometimes collect some information from it.
The first (old) product that give us this is DeadlockDebugger (z3c.deadlockdebugger is more recent). This product allows you to access a special URL where you can see in real time what Zope threads are doing. If you think this is cool, you have to know that it is commonly suitable only on development environment.
... so what?
If you can access your ZMI root (one level above your Plone site, reach it as a Manager user in the main acl_user, commonly "admin") you can get some general information on thread status from the "Debug Information" section.
You can always reach the top ZMI level using acquisition, even if Zope main folder is not published. Just type http://youssite.com/Control_Panel/manage_main.
At the bottom level of "Debug Information" view, you can see the connections section.
However, this is not a view for real monitor but when you have no better tool you can find useful information there (sometimes you need to be lucky!).
The tool you really need to have in your Zope installation is ZopeHealthWatcher, an evolution of the deadlockdebugging stuff.
When configured in your installation you can run a command which dumps the threads status: this way, you will see "what Zope is doing now".
Monitoring long running requests
The solutions I just mentioned, however, suffer a big limitation: you must perform analysis during the slow-down time (commonly, while you are at the telephone with the this-time-polite-no-more guy that can't work on the site).
Let's change the subject slightly. You can get some benefit enabling the request response time on your front-end HTTP Web server (for example Apache).
This way you can collect additional HTTP information about how much time is needed to execute each request, then you'll need a tool for parse results (or manually look at them).
Also: this amount of time is not the time that Zope actually needed to perform the request, but the total time of the network behind Apache (net latency, reverse proxy, load balancing software and so on).
Monitoring (only) long running requests
If the monitor of HTTP response time is good, this is the best solution: haufe.requestmonitoring.
So, although the Plone site suddenly goes fast again and you were not able to run your real-time analysis, your Zope instance will contain a set entries like this:
------ 2012-04-22T11:51:52 WARNING RequestMonitor.DumpTrace Long running request Request 1 "/VirtualHostBase/http/yoursite.com:80/siteid/VirtualHostRoot/foo/foo" running in thread 1096747328 since 10.0001170635s Python call stack (innermost first) ...
With thi tool, you have the path request called, the time consumed and a traceback of what Python is doing.
Reading this log is not so simple. All slow-down entries are in the same log of the normal instance activities.
Sometimes if you see that calling an URL takes too much time, it doesn't necessarely mean that that URL is slow: another request probably freezed Zope.
Also: the same request can be found more than once, so you can read that calling "/mycontent/myview" takes "only" 10 second, while later you have the same record from the same request with a bigger time.
Recently we needed to find the bottleneck on a huge installation (with a lot of running instances). The installation of haufe.requestmonitoring was good, but we needed to obtain a report of slow requests.
To get this we spent a little amount of time writing a parser for Zope instance log, where haufe.requestmonitoring is installed: the Zope Long Running Request Analyzer.
It's quick and dirty, but you can use it to find the really-slow request called in your Zope site, and find the bottlenecks.
When you have a real bottleneck or problem in your code, logging what Plone is doing can be your silver bullet.
However, do not discard the simple HTTP response time analysis discussed above: remember that not-too-slow request called hundreds of times can make your Plone site slow.
I hope you found here a good tool for you, so that you don't have to meet the never-polite-again guy on the phone!