OK, so we didn’t quite get to 100 as originally planned – but this time it wasn’t OpenSim’s fault. Yes, by the end you could tell the sim was straining, and at about 65 avatars the physics engine finally choked on trying to solve a 15-avatar capsule interpenetration (or at least, that’s my interpretation of the bug – analysis pending). But it kept on accepting logins and people kept arriving, and very quickly we hit 70, … 75, … 80, then peaked at 85 before running out of people, slipping back to 79 and manually shutting the sim down to grab the all-important debug dump.
It’s important to note here – these were real clients, using SL-derived viewers. By comparison, libsl is a lot friendlier on the packet engine than the full viewer, so bots tend to be a less effective test. (Plus users introduce randomness that bots can’t quite emulate.) Wright Plaza with 85 avatars and their attachments weighs in at a healthy 15,400 prims, so there was no shortage of texture or prim data to be sent to each client. It’s actually probably one of the nastiest sims to do load tests in, which makes it great for this. Furthermore, the hardware it is located on isn’t exactly top of the line, or even middle-of-the-line.
The short news is – we’ve made some really impressive progress in the last week. Earlier we got up to 50; with the tweaking, tailoring and adjusting since then, 100 or even 150 isn’t really that out of the question anymore. There are three big causes for this. First, abandoning OpenJpeg for decoding J2K textures made some very noticeable improvements to stability (abandoning it for encoding too is in progress); this means we’re not crashing on the way up, which means we can hit higher concurrencies more reliably. Second, John Hurliman from Intel rewrote our throttle routines and some low-level packeting code, which delivered a big boost to packet performance. Third, multiple efforts to reduce memory use in key places have at least halved operating memory requirements: at 85 concurrent, memory was peaking at a mere 1.7GB (~20MB/user).
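For anyone who wants to check the ~20MB/user figure, here is a rough back-of-the-envelope sketch in Python. It uses only the two numbers quoted above (85 avatars, 1.7GB peak). The function names, the 1000MB-per-GB convention and the linear projection to 100 and 150 avatars are illustrative assumptions, not measurements; charging the sim's whole memory footprint to avatars also overstates the real per-avatar cost.

```python
# Figures quoted in the post: 85 concurrent avatars, memory peaking at ~1.7 GB.
PEAK_MEMORY_GB = 1.7
PEAK_AVATARS = 85


def memory_per_avatar_mb(total_gb, avatars, mb_per_gb=1000):
    """Average memory per avatar in MB (assumes 1 GB = 1000 MB)."""
    return total_gb * mb_per_gb / avatars


def projected_memory_gb(avatars, per_avatar_mb, mb_per_gb=1000):
    """Naive linear projection: assumes every extra avatar costs the same.

    This is an assumption for illustration only; real usage also depends
    on prims, textures and attachments, and need not scale linearly.
    """
    return avatars * per_avatar_mb / mb_per_gb


per_avatar = memory_per_avatar_mb(PEAK_MEMORY_GB, PEAK_AVATARS)
print(f"About {per_avatar:.1f} MB per avatar at {PEAK_AVATARS} concurrent")

for target in (100, 150):
    estimate = projected_memory_gb(target, per_avatar)
    print(f"{target} avatars: roughly {estimate:.1f} GB if scaling stays linear")
```

Treat the projected numbers as a sanity check on whether a 4GB box could plausibly hold the next targets, not as a prediction.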
A result of these improvements is that memory IO is no longer such a major bottleneck. We’re actually beginning to hit the point where CPU usage is becoming the more important bottleneck (we were hitting 90% CPU at peak, although the physics interpenetration mentioned above might be distorting this, since it could lead to run-away CPU use). That is a refreshing change, since CPU use is a lot easier to optimise around, and the tools for CPU profiling are a lot better than those for memory IO profiling and produce a lot more meaningful information.
We’d like to continue these load tests – the information the devs have gotten in the last week has been absolutely invaluable. Having a big pool of testers able to jump in at a moment’s notice has resulted in performance fixes getting tested and integrated a lot faster than usual. It’s also helped stability, since each crash has been diagnosed and debugged in series as it is encountered. It’d be very easy to say that, performance and stability wise, more has happened in the last week than in the last 6 months – and we still need your help to keep going. We’re going to be continuing these load tests next week; there will probably be another major effort at getting 100+ avatars in a sim next Friday (same time, 1PM PST). If you want to know when the next test is planned, and help out, either hang around in #opensim on Freenode, or follow @osgrid or @adamfrisby, where I’ll announce them as they come.
Next stop, 150.



This is frickin’ awesome. I’ve been watching the devs knock themselves out on crashes and fixes, and people should know just how much hard work has been done.
Apropos of nothing, I just love the view of the mini-map in the picture. Reminds me of something out of a movie, where they show the ground-zero of an epidemic outbreak. Hehe.
Marcus Llewellyn
10 Oct 09 at 12:02 am
Congrats Team OSGrid on stealing the single sim avatar count title! We shall be back! Let’s see if next week we can get 100. I think I could add 25-50 real clients next week over what we added to the test this round. Amazing work by the core devs, Intel, IBM and others making this advancement happen. 150 reliably per sim on Opensimulator & things really start getting interesting.
Fellow Grid RG
Kyle G
10 Oct 09 at 4:38 am
Yeah, next week Intel has asked to run the test on some of their hardware (pending an internal confirmation), which should be very interesting. I’m hoping with a little more publicity and notice, we can really seriously aim for 150. We were only 15 shy of 100 this week; if we can add another 25-50, we’d be there.
The great thing is – this is a worst-case sim to be profiling; Mono and Wright Plazas are both notorious for bad behaviour. Getting them running, and running well, is an achievement.
Adam Frisby
10 Oct 09 at 4:53 am
Interesting, what are you using instead of OpenJPEG for J2K images?
thought
10 Oct 09 at 6:35 am
I see comments that the machine used for this test was not the best or biggest available. We at PMgrid are running on very old kit and would like to compare what we have running to the machine you used for the test, to better understand why we have concurrency issues. We are running on SVN 9971 so I am hoping it’s just that.
Would it be possible to publish the full spec of the machine used for the 85 avatar test along with the broadband speeds that its connected on?
If it is as bad as our kit, maybe we can look forward to a brighter future in PMgrid once we next bring Opensim up to the latest release too.
Thanks for all this sterling work guys!
Bob Wellman
10 Oct 09 at 8:20 am
Thought: CSJ2K – a port of the Jpeg2K Java reference to C# done by the libomv team.
Bob Wellman: It’s a 2007-era Core2Duo w/ 4GB of RAM on a 100mbit colocated pipe.
Adam Frisby
10 Oct 09 at 8:46 am
To be more specific, it’s a 2.2GHz Core2Duo with 4GB RAM running Fedora 9 32-bit PAE. It has a 7200 RPM SATA II 250GB drive. It’s running inside Mono 2.4.2.3, a stock Mono install with no fancy switches or patches.
Nebadon Izumi
10 Oct 09 at 7:48 pm
Well done!
Keystone Bouchard
11 Oct 09 at 5:09 pm
Great job, Team OSGrid!
My team also tested this and got some results to share with you; I hope your team gets more good test results.
But I worry that my test client may not make for a good test, because it cannot download data from the server for every login the way a real SL viewer does. As you know, OpenSim now has a bottleneck in network transfer, not only in CPU, memory, etc.
You can try logging in with a real SL viewer wherever you can, and don’t forget to delete your cache before every login. I think you may get a different result.
Here is my result:
(1) Conditions
- Upstream revision
- Test servers with UGRM servers and region server
- Spec. of test server:
Xeon 3040 @ 1.86GHz and 3.00GB RAM
- Test data: loading about 2000 objects and 12000 prims
- One user logs in every 20 seconds
(2) Results:
- Can log in more than 61 users (kept walking for 15 mins); in fact, more can log in (maybe about 65 users)
- Some performance data from the test server side
users   CPU   Memory(G)   Network(Gbyte)
30      10%   1.904       0.517
61      31%   1.904       1.761
and Memory: (average available)
Network: (average Bytes Total/sec)
Cheers,
caocao
12 Oct 09 at 8:19 pm