Load Testing BOSH on Openfire
Finally, I have some results :). While preparing and running the tests, I ran into many more difficulties than I expected. Some of these were due to a lack of adequate hardware resources, while others were caused by the limitations of the testing tool, The Grinder.
Test Environment
For the tests, I used two machines on the same LAN, one as the server and the other as the client. Openfire was installed on the server and, using the User Creation Plugin, the database was populated with 10,000 users (user0, user1, ...), each with 24 roster entries.
The test script acts as a client, and each running instance of the script gets a different user id, starting from 0. The script first logs in to the server (initiates a BOSH session, authenticates, binds a resource), retrieves the roster, and then starts sending messages and changing presence. It is supposed to send a one-to-one message every 5 seconds and change presence every 30 seconds until it receives the stop signal from the console.
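
To make the first login step concrete, here is a minimal sketch of the request body that opens a BOSH session, following the attribute names defined in XEP-0124 and XEP-0206. It is not the actual test script: the function name `session_request` and the `wait` and `hold` values are assumptions chosen for illustration.

```python
import random
from xml.sax.saxutils import quoteattr

BOSH_NS = "http://jabber.org/protocol/httpbind"  # namespace from XEP-0124
XBOSH_NS = "urn:xmpp:xbosh"                      # namespace from XEP-0206


def initial_rid():
    """Pick a large random starting request id, as XEP-0124 recommends."""
    return random.randint(1, 2 ** 32)


def session_request(domain, rid, wait=60, hold=1):
    """Build the <body/> element a client POSTs to open a BOSH session.

    wait and hold are illustrative defaults, not values from the test.
    """
    attrs = [
        ("rid", str(rid)),
        ("to", domain),
        ("wait", str(wait)),
        ("hold", str(hold)),
        ("ver", "1.6"),
        ("xml:lang", "en"),
        ("xmpp:version", "1.0"),
        ("xmlns", BOSH_NS),
        ("xmlns:xmpp", XBOSH_NS),
    ]
    # quoteattr escapes each value and wraps it in quotes
    return "<body " + " ".join(k + "=" + quoteattr(v) for k, v in attrs) + "/>"


body = session_request("example.com", 12345)
print(body)
```

Every later request in the same session reuses the session id returned by the server and increments `rid` by one, which is how the connection manager detects lost or out-of-order requests.
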
The test starts with 10 threads (that is, 10 running instances of the test script, i.e. 10 users), and every 2 seconds 10 new threads are added.
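
The ramp-up described above can be sketched as a small calculation. This assumes that no thread exits before the stop signal, that the ramp is capped by the 10,000 users in the database, and that every active thread really does send one message per 5 seconds; the function names are hypothetical.

```python
INITIAL_THREADS = 10    # threads at t = 0 (from the post)
THREADS_PER_STEP = 10   # threads added per step (from the post)
STEP_SECONDS = 2        # seconds between steps (from the post)
TOTAL_USERS = 10000     # assumed cap: one thread per user in the database
MESSAGE_INTERVAL = 5    # seconds between one-to-one messages (from the post)


def active_threads(elapsed_seconds):
    """Nominal number of running threads after elapsed_seconds."""
    steps = int(elapsed_seconds // STEP_SECONDS)
    return min(INITIAL_THREADS + THREADS_PER_STEP * steps, TOTAL_USERS)


def messages_per_second(elapsed_seconds):
    """Nominal aggregate message rate if every thread keeps its schedule."""
    return active_threads(elapsed_seconds) / float(MESSAGE_INTERVAL)


for t in (0, 2, 60, 350):
    print(t, active_threads(t), messages_per_second(t))
```

These are the numbers the schedule asks for, not measured figures: once the client machine is saturated, the real thread count and message rate can fall well short of them.
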
For those interested in running the test at home ;), I have prepared a detailed document on setting up the test environment and running the tests, available on IgniteRealtime.
Results and Conclusions
After analyzing the test results, I obtained the following charts:



| Transaction Name | Tests Passed | Tests w/ Errors | Pass Rate | Mean Response Time (ms) | Response Time Std. Dev. (ms) | Mean Response Length (bytes) | Mean Time to Resolve Host (ms) | Mean Time to Establish Connection (ms) | Mean Time to First Byte (ms) |
|---|---|---|---|---|---|---|---|---|---|
| Initiate a BOSH session | 10 | 0.0 | 1,000 | 577.5 | 45.6 | 661.0 | 60.7 | 69.0 | 507.6 |
| Authenticate | 10 | 0.0 | 1,000 | 216.6 | 71.48 | 108.0 | 0.0 | 0.0 | 194.6 |
| Bind resource | 10 | 0.0 | 1,000 | 187.7 | 42.63 | 208.0 | 0.0 | 0.0 | 185.5 |
| Request a session from the server | 10 | 0.0 | 1,000 | 194.3 | 39.86 | 199.0 | 0.0 | 0.0 | 193.3 |
| Get roster | 10 | 0.0 | 1,000 | 277.7 | 112.97 | 2098.0 | 0.0 | 0.0 | 276.9 |
| Change presence | 206 | 0.0 | 1,000 | 307.32 | 476.13 | 1190.1 | 0.38 | 46.58 | 301.93 |
| Send one to one message | 1000 | 0.0 | 1,000 | 532.63 | 3211.44 | 938.88 | 3.95 | 89.31 | 500.1 |
| Totals | 1256 | 0.0 | 1,000 | 486.05 | 2873.67 | 968.77 | -1.0 | -1.0 | -1.0 |
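
As a quick sanity check on the table, the mean response time in the Totals row can be reproduced as the mean of the per-transaction means, weighted by the number of tests. The sketch below copies the counts and means from the rows above; the times are presumably in milliseconds, which is the unit The Grinder reports.

```python
# (tests passed, mean response time) copied from the table rows
ROWS = {
    "Initiate a BOSH session": (10, 577.5),
    "Authenticate": (10, 216.6),
    "Bind resource": (10, 187.7),
    "Request a session from the server": (10, 194.3),
    "Get roster": (10, 277.7),
    "Change presence": (206, 307.32),
    "Send one to one message": (1000, 532.63),
}


def weighted_mean(rows):
    """Mean response time over all tests, weighting each row by its test count."""
    total_tests = sum(count for count, _ in rows.values())
    total_time = sum(count * mean for count, mean in rows.values())
    return total_time / total_tests


total_tests = sum(count for count, _ in ROWS.values())
overall = weighted_mean(ROWS)
print(total_tests, round(overall, 2))  # prints: 1256 486.05
```

Because 1000 of the 1256 tests are one-to-one messages, that single row dominates both the overall mean and the overall standard deviation.
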
I have also analyzed the output of Openfire's loadstats plugin and got these charts:

As these graphs clearly show, everything went fine on the server side: the server handled all the load gracefully (max db connections were set to 100). Then again, it was never likely that a single load-injecting machine would be enough to crash the server.
There seems to be a critical point somewhere around the 350th second. This is the point where the client machine ran out of CPU. After this point, client threads started to lose their connections because they couldn't get CPU time within the inactivity period, which was 30 seconds. When a client doesn't make a request to the server during the inactivity period, the server kills its session. Jetty then responds with a 404 - Not Found message to clients that have lost their session.
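
The failure mode above can be modelled in a few lines. This is a simplified sketch, not Openfire's or Jetty's actual logic: the function names are hypothetical, the exact boundary behaviour at 30 seconds is an assumption, and real BOSH only counts inactivity while the server holds no pending request for the session.

```python
INACTIVITY_SECONDS = 30  # inactivity period reported in the post


def session_alive(last_request_at, now, inactivity=INACTIVITY_SECONDS):
    """True while the client's last request is within the inactivity window."""
    return (now - last_request_at) <= inactivity


def classify_response(status_code):
    """Map an HTTP status from the BOSH endpoint to a coarse test outcome."""
    if status_code == 200:
        return "ok"
    if status_code == 404:
        # The post reports that Jetty answers 404 once the session is gone.
        return "session-lost"
    return "error"


# A thread that is starved of CPU for 45 seconds misses the window.
print(session_alive(last_request_at=300, now=345))  # prints: False
print(classify_response(404))                       # prints: session-lost
```

Counting "session-lost" outcomes separately from other errors makes it easier to tell client-side starvation apart from genuine server failures.
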