Adam Frisby

Creating effective fingerprints from Primitive Groups

with one comment

I briefly touched on previously the concept of fingerprint registration as a method of verifying object legitimacy before. What I’d like to now go into is the technical side of things, first answering whether it’s possible, and secondly answering how much “tamper-proofing” one of these signatures can withstand before.

This post is aimed at researchers and programmers in the field. It contains lots of unashamedly technical language. You have been warned. Second warning is - we’re only going to cover Primitive Groups (”Objects” in Second Life) as things such as sound and texture fingerprinting have been covered in far more detail by researchers far more knowledgeable than myself.

Firstly: Is it possible?

The short answer here is yes - the long answer is still yes - but the solution isnt very good if a single change is enough to break the entire fingerprint. Most fingerprinting schemes such as MD5/SHA are designed to signify if any slight change has occured, but in our case we dont want to know if a slight change has occured so much as if it’s still similar to the original.

In the cases above, you can make very “short” fingerprints since you have very specific criteria you are matching against for tampering. In our case, if someone resizes the object slightly it shouldnt break the entire scheme.

So, onto some ideas on how to measure similarity between objects - any good fingerprint is going to take into account a number of these measurements and decide on how many are similar. The fingerprints should be easily comparable too - because searching a database of a million such fingerprints should be doable quickly and easily without too much database load.

Volume to container volume ratio

The idea here is to measure the volume of the entire object (that is, the space it would displace if dipped into a bucket of water), compared to the volume of a box big enough to fit it exactly. A square object would leave no water remaining, and hence have a ratio of “1.00″, but a sphere leaves a much more distinct mark.

Objects which are very similar are going to have very similar volume displacement ratios, resizing a single component (or primitive) of a larger set is going to do very little at changing the ratio unless it is a very significant change.

It is worth noting that you need a minimum complexity for this metric to be of much value - very simple objects are likely to generate lots of collisions and false objects (as there is only so many spheres and boxes that can be described), which brings us to point #2.

Caveat: The bounding box needs to be the smallest possible bounding box for any possible rotation of the object to be effective at comparison. Computing the optimal rotation may be expensive (although something that might in theory be doable with a boolean search through rotations)

Simple facts about the group - Minimum Complexity

Things such as the number of primitives, the types of primitives used, etc all form a group of simple facts - unfortunatley these are the most distortable and easily changeable - but again if you change too many you end up with a very different looking object.

It’s important to note however, it’s possible to add a lot of “invisible” primitives onto it to add numbers to this, but not change the object, so it’s key that we use this metric simply one way - the minimum complexity must be close to equal or exceed the original fingerprinted object (give it a say 20% fudge-factor for people who can clean off bits and pieces trying to dodge this metric).

Primitive “Levenshtein” Distance

In computer science, the Levenshtein distance is the number of characters you need to alter, delete or insert between two pieces of text to get the same string. It’s used in spell checkers to try correct common typos (ie it picks the thing closest to what you had).

I think it could have a practical application here too - if we consider two seperate objects as pieces of text, then we calculate the number of primitives that need to be changed, inserted or deleted to match the other object. If we consider each change seperately (size, rotation, shape, etc), an object derived from another object would have a fairly small distance, however this solution does break down when we consider objects with a very small number of primitives to begin with.

Creating a signature with these

It’s best if we consider each of these a seperate signature that is never combined, rather when you compare the signature, you actually compare a set of signatures like the ones above seperately, then you calculate how many of them hit a collision vs how many did not.

The ultimate caveat here is that none of the solutions work very well when the object is not very complex to begin with. I suspect on any object with less than 20 primitives this is not going to work too well (although the effectiveness of the measure will increase dramatically with each additional primitive in the group.)

It is also worthwhile to take watermarks of any associated assets such as textures and materials and handle those seperately as this should try to survive an object being retextured, or in the case that someone rips other peoples textures for an unassociated product.

For computational expense purposes, each signature should produce a number - ideally a nice integer number, a database table can then be indexed by each signature so that you can search for a range within say 10% of each and every index quickly and easily with minimal of lookup expense.

Final notes

The above can be used fairly indiscriminantly as checks that can be done on any client anywhere since the algorithms do not rely on any form of obfuscation. An agency setting up something to mark signatures of popular items would likely want to employ these style signatures, plus a bunch of hidden ones so that an attacker did not know exactly what they were looking for — however any good long-term solution should survive public scrunity of the algorithm as well, it just may be difficult to do so due to the lack of large amounts of data to compare (unlike say sounds of textures).

Written by Adam Frisby

July 26th, 2008 at 11:44 pm

One Response to 'Creating effective fingerprints from Primitive Groups'

Subscribe to comments with RSS or TrackBack to 'Creating effective fingerprints from Primitive Groups'.

  1. [...] is original or not - that much can be done with a form of ‘object hashing’ (or fingerprinting), determining whether something is identical to as it was shipped is actually a lot easier than [...]

Leave a Reply