Adam Frisby

Inside the Asset Server


The single most misunderstood bit of technology in both OpenSim and Second Life is the thing called the asset server; while my understanding of the backend of Second Life’s asset cluster is limited (Squid+Isilon AFAIK) – my understanding of OpenSim’s is fairly comprehensive.

Let’s start with the basics – the asset server is a gigantic document storage server. In our cases, documents are things such as primitives, textures, sounds, avatars – the works.

Each document is referred to by a specific unique filename called a UUID – which is a 128-bit number (think 3 with 38 zeros behind it – big number) – a UUID looks something like this “d61c1990-79b9-11de-8a39-0800200c9a66” when represented in hexadecimal notation. You have probably seen them around a lot – just using OpenSim or Second Life.
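To make the representation concrete, here is a minimal sketch using Python's standard `uuid` module. It parses the example ID above, shows that it is a 128-bit integer underneath, and generates a fresh random ID. The variable names are purely illustrative; nothing here is OpenSim code.

```python
import uuid

# The example ID from the text, parsed from its hexadecimal form.
asset_id = uuid.UUID("d61c1990-79b9-11de-8a39-0800200c9a66")

# Underneath the dashes and hex digits, a UUID is 16 bytes (128 bits).
as_int = asset_id.int
print(len(asset_id.bytes) * 8)   # 128
print(as_int < 2**128)           # True
print(f"{2**128:.2e}")           # 3.40e+38, the "3 with 38 zeros"

# A random (version 4) UUID can be made up locally, with nobody handing it out.
new_id = uuid.uuid4()
print(new_id)
```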

A UUID is an excellent choice for a filename; because you can make them up randomly and statistically guarantee it hasn’t been used before (the chance of a UUID duplicate for a good random UUID is about 1 in a very very very very large number). This means you can have multiple asset servers on the one grid – each upload can be given an ID randomly, without the need for a central authority to give them out.

They also optimize extremely well – each “bit” is a single yes or no question; to find out exactly where an asset is located, you can play a game of 20 questions, but in this case, with 128 bits – you can ask 128 questions. For example; finding which server an asset is located on, in a server farm with 4 machines (labelled 1,2,3 & 4) could be found with the following question chain…

  1. Is the asset on a server labelled closer to 4 than 1? (yes – servers 3,4 still possibilities)
  2. Is the asset on an odd numbered server? (yes – located on server 3)

Asset found on server #3. It’s not very long, is it? Even with say 300 servers, you can find the answer within 9 questions (9 yes/no answers distinguish 512 possibilities, which is more than 300). Some further searching is needed to locate the asset on the physical disk itself; but even then you will be asking a very small number of questions – far fewer than the maximum 128 allowed.
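The question game above can be sketched in a few lines of Python. This is a hypothetical placement rule, not how any particular grid does it: `questions_needed` counts the bits required to pick one of N servers, and `server_for` maps a UUID to a server by taking its 128-bit value modulo the server count. The modulo rule assumes randomly generated UUIDs, so that the low bits are evenly spread.

```python
import uuid

def questions_needed(server_count: int) -> int:
    """Number of yes/no questions (bits) needed to single out one of N servers."""
    return (server_count - 1).bit_length()

def server_for(asset_id: uuid.UUID, server_count: int) -> int:
    """Pick a server index in [0, server_count) from the UUID alone.

    Hypothetical rule: plain modulo on the 128-bit integer. Every machine
    can compute this locally, so no central lookup table is required.
    """
    return asset_id.int % server_count

print(questions_needed(4))     # 2
print(questions_needed(300))   # 9

example = uuid.UUID("d61c1990-79b9-11de-8a39-0800200c9a66")
print(server_for(example, 4))  # 2 (zero-based index)
```

A real deployment would more likely use something like consistent hashing, so that adding a server does not reshuffle every asset; the modulo rule is only the simplest thing that shows the idea.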

To put this into a diagram, fetching an asset directly by its indexed UUID is equivalent to something like this:

Fig 1. How asset UUIDs divide and conquer

Given that computers are capable of asking and answering questions very very quickly (several billion per second); and that none of the above questions require a central server; you can expand your asset server farm in a fairly linear fashion without scaling constraints; providing that you structure your data appropriately.

128 questions also means that you can define very very precise questions; so many that if you created 100 trillion assets every second; the sun would be lifeless before you ran out of address space that can be answered by that many questions. (Although a piece of statistical mathematics called the Birthday Paradox does make the address space significantly less usable the closer you get to that point – but for other reasons)
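The Birthday Paradox remark can be made concrete with the standard approximation p ≈ 1 − exp(−n² / (2·2^b)) for n random IDs of b random bits. The sketch below assumes version 4 UUIDs, which carry 122 random bits (6 of the 128 are fixed by the format); the function name is illustrative.

```python
import math

RANDOM_BITS = 122  # a version 4 UUID: 128 bits minus 6 fixed version/variant bits

def collision_probability(n_assets: float, bits: int = RANDOM_BITS) -> float:
    """Approximate chance that at least two of n random IDs collide.

    Birthday approximation: p ≈ 1 - exp(-n² / (2 · 2^bits)).
    expm1 keeps precision when the probability is extremely small.
    """
    exponent = -(n_assets ** 2) / (2.0 * 2.0 ** bits)
    return -math.expm1(exponent)

for n in (1e9, 1e12, 1e15, 2.7e18):
    print(f"{n:.1e} assets -> p = {collision_probability(n):.3e}")
```

The probability stays negligible for any realistic grid and only approaches a coin flip at a few quintillion assets, which is far below the raw address space; that is the sense in which the usable space is smaller than 128 bits suggests.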

At this point, we know how assets are stored and retrieved – files are uploaded into the server farm, and given a UUID; that UUID can be used to find out where the asset is located precisely, and return the data later. However, when you access an item off the asset server; there’s only a modest chance you are accessing the asset by its UUID directly.

A lot of the time, you will be accessing the asset through a layer of redirection called the Inventory Server – if you have a texture within your inventory; there are actually two components; the first component is the data itself (the asset), and the second component is the inventory shell (uploader, time uploaded, permissions, reference to the asset, etc). When you lose an item of inventory (such as during a transaction); it is often not the asset server’s fault, but instead the inventory server’s.

The inventory server exists so that if you give a copy of a texture to a friend; you and that friend do not make a duplicate copy of the heavier portion (the asset) – both inventory items, yours and your friend’s, will have the same underlying asset ID. This also means that copying an item within your inventory will not protect it from being lost during an asset glitch (as both copies point at the same data).
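As a rough illustration of the shell-versus-data split, the sketch below models an inventory item as a small record that merely points at an asset ID. The class, field and function names are made up for the example and do not reflect OpenSim's actual schema.

```python
import uuid
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class InventoryItem:
    """The lightweight inventory shell: metadata plus a pointer to the asset."""
    item_id: uuid.UUID
    owner: str
    name: str
    asset_id: uuid.UUID

# Stand-in for the asset server: asset UUID -> raw data.
assets: dict[uuid.UUID, bytes] = {}

def upload(data: bytes) -> uuid.UUID:
    """Store the heavy data once and return its asset ID."""
    asset_id = uuid.uuid4()
    assets[asset_id] = data
    return asset_id

def give_copy(item: InventoryItem, new_owner: str) -> InventoryItem:
    """Copy only the shell; the new item points at the same asset."""
    return replace(item, item_id=uuid.uuid4(), owner=new_owner)

texture = upload(b"...texture bytes...")
mine = InventoryItem(uuid.uuid4(), "alice", "Brick wall", texture)
theirs = give_copy(mine, "bob")

print(mine.asset_id == theirs.asset_id)  # True: one asset, two inventory items
print(len(assets))                       # 1
```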

In Second Life®, the inventory server is using MySQL (and probably InnoDB), and the asset server uses IsilonFS (a custom hardware appliance based system); in OpenSim this will vary a lot depending on the provider’s configuration – most grids use a SQL backend for storing both assets and inventory. By default, OpenSim will use a small embedded database called SQLite for both Inventory and Assets. For grids, we recommend something more robust.

In grid mode, MySQL is still an appropriate solution for Inventory; however asset data will often exceed the normal operating tolerances of MySQL and lead to frequent table corruption, table locking issues (which in turn make performance suck) and other nasties. No large grid should be using MySQL for assets (small home and private grids are however fine.)

Grids such as ReactionGrid which use MS-SQL can get a bit further than MySQL when storing asset data (for inventory they are fundamentally the same); MS-SQL, like most sane database engines, stores BLOB data separately in a B-tree indexed system; this means it can scale up pretty far (although at about the point of clustering, things degrade); Postgres can do the same – however as best I am aware, Postgres is not yet properly supported in OpenSim.

For bigger grids like OSGrid (where we have hundreds of gigabytes of assets) a dedicated solution is ideal – unlike Second Life, we can’t afford to purchase high-end NAS equipment for our storage solutions; so we have built a distributed system called Fragstore which has two big features.

  1. A configurable backend – it can use distributed storage systems (such as Project Voldemort or Hadoop), or export to a filesystem (which can be a distributed filesystem via projects like KosmosFS.)
  2. A duplicate detection system – so files uploaded twice only get stored once.

The duplicate detection is achieved via a secondary layer of indirection; when we allocate a UUID, we hash the incoming data (with a SHA-family cryptographic hash) and store the UUID and the hash into a small database table (currently MySQL based); when we receive future uploads, if the hash matches an existing entry, the data is only stored once on disk.
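The idea can be sketched as follows. This is not Fragstore's code: the class name, the in-memory dictionaries standing in for the MySQL table and the disk, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import uuid

class DedupStore:
    """Toy content-addressed store: many UUIDs may point at one stored blob."""

    def __init__(self):
        self.uuid_to_hash = {}  # stands in for the small UUID -> hash table
        self.blobs = {}         # stands in for the disk: hash -> data

    def put(self, data: bytes) -> uuid.UUID:
        digest = hashlib.sha256(data).hexdigest()  # assumed hash choice
        # Only write the data if this content has never been seen before.
        self.blobs.setdefault(digest, data)
        # Every upload still gets its own UUID.
        asset_id = uuid.uuid4()
        self.uuid_to_hash[asset_id] = digest
        return asset_id

    def get(self, asset_id: uuid.UUID) -> bytes:
        return self.blobs[self.uuid_to_hash[asset_id]]

store = DedupStore()
first = store.put(b"same texture")
second = store.put(b"same texture")

print(first != second)   # True: two distinct asset IDs
print(len(store.blobs))  # 1: the bytes are stored once
```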

The final point I would like to touch on is asset transmission and security – assets are transmitted in both SL and OpenSim over HTTP; the reason this works and is secure is because the UUIDs used in the requests are unguessable. Even by a computer making millions of guesses per second; the chance of hitting a valid address is very very small (same probability as generating a duplicate UUID).

From the asset server to the region server, the request is something in the form of http://assetserver.com/<uuid>/data – which will return the asset in an encoded container (usually packaging a little bit more information about the asset such as the content type, etc.) – using /metadata instead of /data will get you a JSON-encoded package with a bit more information about the asset (creation date, specific asset type, etc.)
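As a small sketch of that URL scheme, the helper below builds the two request forms described above. The host is the placeholder from the text, and the function name and validation are illustrative only; it does not perform the request or assume anything about the response format.

```python
import uuid

ASSET_SERVER = "http://assetserver.com"  # placeholder host from the text

def asset_url(asset_id: uuid.UUID, part: str = "data",
              base: str = ASSET_SERVER) -> str:
    """Build <base>/<uuid>/data or <base>/<uuid>/metadata."""
    if part not in ("data", "metadata"):
        raise ValueError(f"unknown asset part: {part!r}")
    return f"{base.rstrip('/')}/{asset_id}/{part}"

example = uuid.UUID("d61c1990-79b9-11de-8a39-0800200c9a66")
print(asset_url(example))
print(asset_url(example, "metadata"))
```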

From the region servers to the users – this transmission can vary a little. In both SL and OpenSim, the region servers act as proxy caches for the asset server; because assets cannot be updated (and instead are replaced) – this works fairly well. If twenty users in a region see a texture – it only needs to be fetched off the core asset service once; because the region will send it twenty times to the users; rather than twenty users hitting the core server.
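Because an asset never changes once it has an ID, the region-side cache can be very simple: no expiry and no invalidation. A minimal sketch, with a made-up class name and a fake fetch function standing in for the core asset service:

```python
class RegionAssetCache:
    """Toy proxy cache: fetch each asset from the core service at most once."""

    def __init__(self, fetch_from_core):
        self._fetch = fetch_from_core  # callable: asset_id -> bytes
        self._cache = {}
        self.core_requests = 0

    def get(self, asset_id):
        if asset_id not in self._cache:
            # Cache miss: this is the only time the core server is hit.
            self.core_requests += 1
            self._cache[asset_id] = self._fetch(asset_id)
        return self._cache[asset_id]

def fake_core_fetch(asset_id):
    """Hypothetical stand-in for an HTTP request to the asset server."""
    return f"data for {asset_id}".encode()

region = RegionAssetCache(fake_core_fetch)

# Twenty users in the region all request the same texture.
for _ in range(20):
    region.get("d61c1990-79b9-11de-8a39-0800200c9a66")

print(region.core_requests)  # 1
```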

In the case of OSGrid, this means we have 2,500+ reverse proxies sitting in front of the asset server (albeit in a somewhat suboptimal layout); it reduces the bandwidth on the core asset server by roughly 90% or more (you can see our asset stats here); which means we can get away with much lower operating costs (since asset delivery costs are shared with region operators).


Written by Adam Frisby

July 26th, 2009 at 9:43 am

6 Responses to 'Inside the Asset Server'


  1. I can verify that Linden Lab is using InnoDB, not MyISAM (or other). Good choice. :)

    I haven’t looked at the DB schema for OpenSim in a while – while I agree UUIDs are great for unique identifiers, they are several orders of magnitude slower than integers when doing things within a DB implementation, especially when joining tables, etc. Are there plans to keep UUIDs for unique lookup, while using integers for things like joins within a DB instance? Great work, and way to break it down in English, Adam.

    Regards,

    -Tim / Flip

  2. One other question I forgot to ask: any plans to support memcache (or something similar) in the future for frequently accessed assets? While it wouldn’t show a huge performance increase with what OpenSim is being used for today, down the road, it could be quite beneficial.

  3. Whilst I think GUIDs are still a bit shite for UIDs, I can see why you’ve used them here. I’m still not convinced they’re ideal for the backend storage layer however (as FlipperPA mentions).

    Perhaps it would be appropriate to have the backend database node keep an index of its internal (highly optimised and indexed) ids (straight INT/BIGINT UNSIGNED), mapped to the GUIDs that the requesting application would use.

    Again, I’m still interested in helping out with this side of things – but I think with my other commitments (for now) I’d need to be given a specific problem and scope. You have my number Adam :)

    Incidentally, What would happen if a new asset server came online with objects with the same GUID (via chance or even a malicious user)? It’s vital that the platform can handle this gracefully, especially for a place such as OSGrid.

    Will Dowling

    29 Jul 09 at 1:33 am

  4. @Will: Asset IDs should be issued by the asset server itself as part of the data upload, which avoids collision; however for compatibility purposes you can upload an asset with a specific destination ID. If that ID is already occupied, the upload is ignored.

    UUIDs on the backend are actually fine when you aren’t using MySQL. MySQL optimises UUIDs very badly; treating them the same as any other text field (which aren’t good for indexes) – Postgres, MS-SQL and others either have a GUID class which treats it as a 128-bit integer, or handle text indexes better.

    Inserting into MySQL; it’s actually best to convert 128-bit into 4×32-bit integers, then make one ‘unique’ key from the composite of all four fields. That hints MySQL enough to treat it as a 128-bit int and gets you the performance you want; albeit at the cost of not being able to read the UUIDs easily anymore.

    @Flip: We don’t use UUIDs for table joins anywhere – OpenSim is fairly naive and doesn’t need them; the transactions we do are mostly on a key/value store basis (which works well for scaling into distributed systems).

    Table joins are only used for things like analytics — and we have to be careful with some of those, since it’s very easy to cause a nasty table lock which pauses the whole grid.

    As for memcache – I can tell you OSGrid does this already; not memcache itself – but we do rely on a frequently accessed files cache to boost performance on the top requested 3GB of assets.

    Adam Frisby

    29 Jul 09 at 7:49 am

  5. I’d be interested to read an article detailing the inventory process and explaining the reasoning behind not limiting inventory references on the Second Life grid, if you know those reasons.

    The duplication prevention system is brilliant and brings to mind, though I’m not sure why, the fact that SL notecards are “recreated” meaning, I believe, that they’re assigned a new UUID every time they’re updated. That’s what I’ve been told is the complication in allowing scripts to write to them which is an interesting point you might add to your inventory article when you write it because I told you to so there.

    Khamon Fate

    31 Aug 09 at 3:11 pm

  6. The reason LL don’t ‘update’ existing assets is that it would make the assets uncacheable – which is a huge performance bottleneck.

    Adam Frisby

    8 Sep 09 at 10:54 am
