Creating effective fingerprints from Primitive Groups
I briefly touched on the concept of fingerprint registration previously, as a method of verifying object legitimacy. What I’d like to go into now is the technical side of things: first, whether it’s possible at all, and second, how much tampering one of these signatures can withstand before it breaks.
This post is aimed at researchers and programmers in the field. It contains lots of unashamedly technical language. You have been warned. Second warning: we’re only going to cover Primitive Groups (“Objects” in Second Life), as things such as sound and texture fingerprinting have been covered in far more detail by researchers far more knowledgeable than myself.
Firstly: Is it possible?
The short answer here is yes - the long answer is still yes - but the solution isn’t very good if a single change is enough to break the entire fingerprint. Most fingerprinting schemes, such as MD5/SHA hashes, are designed to signal whether even the slightest change has occurred. In our case we don’t want to know whether a slight change has occurred so much as whether the object is still similar to the original.
With hashes like these, you can make very “short” fingerprints, since you have very specific criteria for tampering to match against. In our case, if someone resizes the object slightly, it shouldn’t break the entire scheme.
So, onto some ideas on how to measure similarity between objects. Any good fingerprint is going to take a number of these measurements into account and decide how many of them are similar. The fingerprints should be easily comparable too: searching a database of a million such fingerprints should be quick and easy, without too much database load.
Volume to container volume ratio
The idea here is to measure the volume of the entire object (that is, the water it would displace if dipped into a bucket), compared to the volume of a box just big enough to fit it. A box-shaped object fills its container completely and hence has a ratio of “1.00”, but a sphere leaves a much more distinct mark (a sphere fills π/6, or about 0.52, of its bounding cube).
Objects which are very similar are going to have very similar volume displacement ratios. Resizing a single component (or primitive) of a larger set changes the ratio very little unless the change is a significant one.
It is worth noting that you need a minimum complexity for this metric to be of much value. Very simple objects are likely to generate lots of collisions and false positives (there are only so many spheres and boxes that can be described), which brings us to point #2.
Caveat: the bounding box needs to be the smallest possible bounding box over every possible rotation of the object for comparisons to be effective. Computing the optimal rotation may be expensive, although in theory it could be approximated with a search through candidate rotations.
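A minimal sketch of the ratio, under several stated assumptions: the object's prims do not overlap (so their volumes can simply be summed), the prim volumes and the object's vertex positions are already known, and the rotation search only turns the object about the vertical axis on a fixed grid of angles. A full solution would also search rotations about the other two axes. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rotate_z(p: Point, theta: float) -> Point:
    """Rotate a point about the vertical (z) axis by theta radians."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)


def box_volume(points: Sequence[Point]) -> float:
    """Volume of the axis-aligned box that exactly encloses the points."""
    xs, ys, zs = zip(*points)
    return (max(xs) - min(xs)) * (max(ys) - min(ys)) * (max(zs) - min(zs))


def min_box_volume(points: Sequence[Point], steps: int = 180) -> float:
    """Smallest enclosing-box volume over `steps` rotations about z.

    Rotating by 90 degrees just swaps the x and y extents, so only the
    range [0, 90) degrees needs to be sampled.
    """
    best = math.inf
    for i in range(steps):
        theta = (math.pi / 2) * i / steps
        best = min(best, box_volume([rotate_z(p, theta) for p in points]))
    return best


def volume_ratio(prim_volumes: Sequence[float], vertices: Sequence[Point],
                 steps: int = 180) -> float:
    """Object volume divided by its (approximately) smallest bounding box.

    Assumes prims do not overlap, so their volumes can simply be summed.
    """
    return sum(prim_volumes) / min_box_volume(vertices, steps)


cube = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
tilted = [rotate_z(p, math.radians(30)) for p in cube]
print(round(volume_ratio([1.0], tilted), 6))  # 1.0
```

Without the rotation search, the cube tilted by 30 degrees would sit in an axis-aligned box of roughly 1.87 cubic units and score about 0.54, which is exactly the kind of false difference the caveat warns about.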
Simple facts about the group - Minimum Complexity
Things such as the number of primitives, the types of primitives used, etc. all form a group of simple facts. Unfortunately these are the most distortable and easily changed, but again, if you change too many of them you end up with a very different-looking object.
It’s important to note, however, that it’s possible to add a lot of “invisible” primitives to inflate these numbers without changing how the object looks, so it’s key that we use this metric in one direction only: the candidate’s complexity must be close to or exceed that of the original fingerprinted object (allowing, say, a 20% fudge factor for people who strip off bits and pieces trying to dodge this metric).
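One way to make the check one-directional is sketched below, assuming prims are summarised as counts per prim type (a hypothetical representation). Taking the multiset intersection means padding a copy with extra invisible prims can never raise its score, while stripping more than the fudge factor allows makes it fail.

```python
from collections import Counter


def meets_min_complexity(original: Counter, candidate: Counter,
                         fudge: float = 0.20) -> bool:
    """One-way check: does the candidate still contain most of the original?

    `original` and `candidate` count prims by type. The multiset
    intersection (&) keeps, per type, only the smaller of the two counts,
    so padding the candidate with extra (e.g. invisible) prims can never
    raise the score above what the original actually contained.
    """
    shared = sum((original & candidate).values())
    total = sum(original.values())
    return shared >= (1 - fudge) * total


original = Counter(box=10, sphere=5)
print(meets_min_complexity(original, Counter(box=110, sphere=5)))  # True
print(meets_min_complexity(original, Counter(box=8, sphere=3)))    # False
```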
Primitive “Levenshtein” Distance
In computer science, the Levenshtein distance between two pieces of text is the minimum number of single-character substitutions, deletions or insertions needed to turn one into the other. It’s used in spell checkers to try to correct common typos (i.e. to suggest the word closest to what you typed).
I think it could have a practical application here too. If we treat two separate objects as pieces of text, we can count the number of primitives that need to be changed, inserted or deleted to match the other object. If we consider each kind of change separately (size, rotation, shape, etc.), an object derived from another object would have a fairly small distance. However, this solution does break down for objects with a very small number of primitives to begin with.
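A sketch of the standard Levenshtein dynamic program applied to lists of prims instead of characters. It assumes both objects list their prims in a consistent order (for example, link order) and that each prim is a fixed-length tuple of attributes; the `(shape, size, rotation)` layout is hypothetical. Changing a prim costs the fraction of its attributes that differ, so a resized prim counts for less than a replaced one.

```python
from typing import Sequence, Tuple

# Hypothetical prim description: (shape, size, rotation). Any fixed-length
# tuple of comparable attributes works.
Prim = Tuple[str, float, float]


def prim_cost(a: Prim, b: Prim) -> float:
    """Substitution cost: the fraction of attributes that differ (0.0 to 1.0)."""
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def prim_levenshtein(a: Sequence[Prim], b: Sequence[Prim]) -> float:
    """Edit distance between two prim lists (two-row dynamic programming).

    Insertions and deletions cost 1; changing a prim costs the fraction of
    its attributes that changed.
    """
    prev = [float(j) for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [float(i)] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = min(
                prev[j] + 1.0,                                # delete a[i-1]
                cur[j - 1] + 1.0,                             # insert b[j-1]
                prev[j - 1] + prim_cost(a[i - 1], b[j - 1]),  # change or keep
            )
        prev = cur
    return prev[-1]


chair = [("box", 1.0, 0.0), ("cylinder", 0.5, 90.0), ("box", 0.2, 0.0)]
resized = [("box", 1.0, 0.0), ("cylinder", 0.6, 90.0), ("box", 0.2, 0.0)]
print(prim_levenshtein(chair, chair[:2]))  # 1.0 (one prim deleted)
```

Comparing `chair` with `resized` gives 1/3, since only one attribute of one prim changed.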
Creating a signature with these
It’s best to treat each of these as a separate signature that is never combined. When you compare signatures, you actually compare a set of signatures like the ones above separately, then count how many of them hit a collision versus how many did not.
The ultimate caveat here is that none of these solutions work very well when the object is not very complex to begin with. I suspect that on any object with fewer than 20 primitives this is not going to work too well (although the effectiveness of the measure will increase dramatically with each additional primitive in the group).
It is also worthwhile to take watermarks of any associated assets such as textures and materials, and handle those separately. This should help the fingerprint survive an object being retextured, and catch cases where someone rips other people’s textures for an unassociated product.
For computational efficiency, each signature should produce a single number, ideally an integer. A database table can then be indexed by each signature, so that you can search for a range within, say, 10% on each and every index quickly and easily, with minimal lookup expense.
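The comparison step described above can be sketched as a simple vote: each named signature is checked on its own against a relative tolerance, and the result is a count of hits rather than a single combined score. The signature names and the 10% tolerance are illustrative assumptions.

```python
from typing import Dict, Tuple


def compare_signatures(original: Dict[str, float], candidate: Dict[str, float],
                       tolerance: float = 0.10) -> Tuple[int, int]:
    """Compare each signature separately; return (hits, signatures compared).

    A signature "hits" when the candidate's value is within `tolerance`
    (relative) of the original's. Signatures missing from the candidate
    count as misses.
    """
    hits = 0
    for name, value in original.items():
        other = candidate.get(name)
        if other is not None and abs(other - value) <= tolerance * abs(value):
            hits += 1
    return hits, len(original)


original = {"volume_ratio": 0.52, "prim_count": 40, "sphere_count": 12}
candidate = {"volume_ratio": 0.50, "prim_count": 60, "sphere_count": 12}
print(compare_signatures(original, candidate))  # (2, 3)
```

Keeping the signatures separate means an attacker who pads the prim count still has to defeat the other measurements as well.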
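A sketch of that storage scheme using Python's built-in `sqlite3` module and an in-memory database. The table layout, the two signature columns and the fixed-point scale for ratios are assumptions for illustration. Each indexed column gets its own `BETWEEN` range query, and hits are counted per registered item, so the caller sees how many signatures each item collided on.

```python
import math
import sqlite3
from collections import Counter

# Hypothetical signature columns; each holds an integer so it can be indexed.
METRICS = ("volume_ratio", "prim_count")
RATIO_SCALE = 10_000  # store ratios as fixed-point integers (0.5236 -> 5236)


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE fingerprints "
               "(item TEXT, volume_ratio INTEGER, prim_count INTEGER)")
    for m in METRICS:  # one index per signature
        db.execute(f"CREATE INDEX idx_{m} ON fingerprints ({m})")
    return db


def register(db: sqlite3.Connection, item: str, ratio: float, prims: int) -> None:
    db.execute("INSERT INTO fingerprints VALUES (?, ?, ?)",
               (item, round(ratio * RATIO_SCALE), prims))


def lookup(db: sqlite3.Connection, ratio: float, prims: int,
           tol: float = 0.10) -> Counter:
    """Range-query each index separately and count hits per registered item."""
    query = {"volume_ratio": round(ratio * RATIO_SCALE), "prim_count": prims}
    hits: Counter = Counter()
    for m in METRICS:
        lo = math.floor(query[m] * (1 - tol))
        hi = math.ceil(query[m] * (1 + tol))
        # Column names come from the fixed METRICS tuple, never from user input.
        sql = f"SELECT item FROM fingerprints WHERE {m} BETWEEN ? AND ?"
        for (item,) in db.execute(sql, (lo, hi)):
            hits[item] += 1
    return hits


db = open_db()
register(db, "chair", 0.52, 40)
register(db, "lamp", 0.30, 12)
print(lookup(db, 0.50, 41))  # Counter({'chair': 2})
```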
Final notes
The above checks can be run fairly indiscriminately on any client anywhere, since the algorithms do not rely on any form of obfuscation. An agency set up to record signatures of popular items would likely want to employ signatures of this style, plus a number of hidden ones so that an attacker does not know exactly what they are looking for. However, any good long-term solution should survive public scrutiny of the algorithm as well; that may just be difficult to achieve here, due to the lack of large amounts of data to compare (unlike, say, sounds or textures).
Practical alternatives to “Copy Protection”
So, in my previous few posts on this topic, I have somewhat neglected the practical alternatives: things that can be made to work, and that can be difficult if not impossible to break. I’ve mentioned some of them before, but I’m going to elaborate on them here.
The Good, the Bad, and the Ugly.
To begin with, we’re going to need to make a divide between ‘good’ and ‘bad’ consumers - good consumers are going to be defined as your standard consumers - the people who like to purchase legitimate content from the legitimate sellers - and like to know that they have bought legitimate content.
The second group are those who don’t really mind if they purchase pirated content (or get it for free). This group is somewhat of a lost cause: they don’t tend to buy content today, and they probably won’t change that habit in the future.
What you want to target is not minimising the size of the second group (that will only waste time and is unlikely to get you any extra revenue), but preventing as many of the first group as possible from slipping into the second group, whether intentionally or unintentionally.
Signing content
Just as a signed copy of a book is worth more than a plain hardcover, it’s possible to sign a purchase with “To <buyer>, I, <content creator here>, affirm this is a legitimate copy that was sold to you.” There are a few ways of doing this. Number one:
Verifying purchases via a server
Have a registration server: anyone can look up the signature of your item and check it against the server to see whether the person who has it legitimately bought it. This does have the downside that you need to maintain your server ad infinitum if you want people to be able to verify your content.
Verifying purchases via cryptography
This is a niftier solution, and should work for all time as long as people have a copy of something called your “public key”. When you sell the item to someone, you attach a message such as “XYZ bought this from me.” to the purchase and sign that message with something called your “private key”. As long as your public key is public, anyone can use it to verify that it really was you who signed it.
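The sign-with-private, verify-with-public flow can be illustrated with textbook RSA using only the standard library. This is a toy under loud assumptions: the key is built from two well-known Mersenne primes, it is far too small, and it uses no padding, so it must never be used for real signatures (real deployments would use a vetted cryptography library and a padded or modern signature scheme). It only shows why anyone holding the public pair `(N, E)` can check a purchase message that only the seller could have signed.

```python
import hashlib

# Toy textbook-RSA keypair built from two known Mersenne primes. This is far
# too small and too naive (no padding) for real use; it only shows the
# sign-with-private / verify-with-public flow.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def digest(message: str) -> int:
    """Hash the message and reduce it into the range [0, N)."""
    return int.from_bytes(hashlib.sha256(message.encode()).digest(), "big") % N


def sign(message: str) -> int:
    """Seller side: only the holder of the private exponent D can do this."""
    return pow(digest(message), D, N)


def verify(message: str, signature: int) -> bool:
    """Anyone with the public key (N, E) can check the signature."""
    return pow(signature, E, N) == digest(message)


receipt = "XYZ bought this from me."
sig = sign(receipt)
print(verify(receipt, sig))                     # True
print(verify("ABC bought this from me.", sig))  # False
```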
Pros of Signing Content
- People can verify that a purchase they made came from the original creator legitimately.
- Other people can verify it too - lowering the social value of possessing fakes.
- Helps build up a brand
Cons of Signing Content
- Relies on people recognising content to be able to say it was a fake of designer X.
- You will probably need to rely on a mix of both cryptographic signatures and verification services, which will likely involve a cost: for an identity-verified cryptographic keypair (such as the ones Verisign provide), and for hosting the service.
Fingerprinting (“Watermarking”)
It’s possible to take a digital asset and produce a fingerprint of it. Fingerprints, like their physical counterparts, are very good signatures of someone, but they aren’t someone themselves. In digital terms this means producing a smaller representation of the asset that is unique to it, and registering it so that if any “clone” shows up, it can be shown to be derived from the original asset.
Services exist already for print media which register these fingerprints so that if they are ever used elsewhere, someone can verify who originally made the asset.
Pros of fingerprints
- You can verify a fingerprint with a third party to see the original creator of the item.
- Helps when filing copyright infringement notices, because the registration acts as an “I did this first”.
Cons of fingerprints
- Fingerprints alone cannot tell whether something is legitimate.
- Fingerprints can be “smudged” by tampering with the asset; the more “smudge-resistant” you make them, the higher the chance of false positives.
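The smudging trade-off can be seen in a small perceptual fingerprint. The sketch below swaps in the well-known "average hash" technique for images (not necessarily what any commercial registration service uses) and assumes the texture has already been downscaled to a small grayscale grid. A uniform brightness change leaves the hash untouched, a local edit flips a few bits, and the match threshold decides how much smudging is tolerated.

```python
from typing import List, Sequence


def average_hash(pixels: Sequence[Sequence[int]]) -> List[int]:
    """One bit per pixel: 1 if brighter than the image's mean, else 0.

    Assumes `pixels` is a small grayscale grid (e.g. a texture already
    downscaled to 8x8), so the hash is tiny compared with the asset.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of bit positions where two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def same_origin(a: Sequence[int], b: Sequence[int], threshold: int) -> bool:
    """A larger threshold survives more smudging but risks false positives."""
    return hamming(a, b) <= threshold


grid = [[(r * 4 + c) * 10 for c in range(4)] for r in range(4)]
smudged = [row[:] for row in grid]
smudged[0][0] = 200  # a local edit to one pixel
print(hamming(average_hash(grid), average_hash(smudged)))  # 2
```

With a threshold of 3 the smudged copy still matches; with a threshold of 1 it does not.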
Make it easy to buy legitimate content, and reward those who do.
This one is more of a business opportunity for some individual or group: make it possible to buy your content on an Amazon/iTunes equivalent which is quick and easy to purchase from, and guarantees legitimate content.
If your content is a pain to purchase, the chances of someone getting frustrated and either not purchasing, or getting it via less-than-legitimate means, increase. Reward the consumers who do purchase legitimate content with updates and other services that people with fake copies won’t get. As a side bonus, this will instill some brand loyalty and likely get them buying more content from you in future.
None of these ideas are mutually exclusive - they work best together.
Fingerprinting is nicely complemented by attached signatures. Combined, they let you say “This is not a legitimate item; the original was created by XYZ, whose signature is missing.” By doing so, you can place social pressure on people to purchase the real thing.
While there will always be a group (mentioned above) who don’t care, the majority (the good consumer group) will, and will likely try to purchase legitimately whenever possible. If merchants present their digital signatures and a third-party verification as part of the purchase process, it becomes significantly more difficult to buy a fake unintentionally.
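The combined check can be sketched as a small decision function. It assumes two earlier steps, both hypothetical here: a fingerprint lookup that returns the registered creator of the original (or `None`), and a signature check that returns whose signature verified on this copy (or `None`).

```python
from typing import Optional


def verdict(registered_creator: Optional[str], valid_signer: Optional[str]) -> str:
    """Combine a fingerprint lookup with a signature check.

    registered_creator: who the fingerprint registry says made the original
        (None if the fingerprint matched nothing).
    valid_signer: whose signature verified on this copy (None if there was
        no signature or it failed to verify).
    """
    if registered_creator is None:
        return "unregistered: no fingerprint match, nothing to compare against"
    if valid_signer == registered_creator:
        return f"legitimate: signed by original creator {registered_creator}"
    if valid_signer is None:
        return (f"not legitimate: the original was created by "
                f"{registered_creator}, whose signature is missing")
    return (f"suspect: signed by {valid_signer}, but the original was "
            f"created by {registered_creator}")


print(verdict("XYZ", None))
# not legitimate: the original was created by XYZ, whose signature is missing
```

Neither input is trusted on its own: the fingerprint says who made the original, and the signature says who vouches for this particular copy.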
One last thing
This is not an exhaustive list; it’s what I thought of in five minutes. There are plenty of other ideas which can be made to work. A lot of them require third-party verification from reputable services, but thankfully neither of these is a new thing. Digimarc already provide watermarking/fingerprinting services with registration for print and web media today, and Verisign provide the cryptographic keys necessary for signing content. (The underlying algorithms are very well documented, having been invented at least thirty years ago.)
Copy Protection Nuances.
I had a very interesting discussion with David Levine (SL: Zha Ewry) last night at the Metaverse Meetup; several luminaries were present, including Prokofy Neva, Tish Shute, and others. We had a varied discussion ranging from the possible future of Virtual Worlds to an informative discussion on the feasibility of copy protection in open standards and worlds.
Reuters has some interesting coverage, however I do feel the need to correct a few points. While Eric has covered a lot of interesting points, some of them are more nuanced than they first appear, and I’d like to cover a few of them.
In OpenSim, by default, no copy protection will exist at all. “You cannot know what a foreign piece of software will do with a piece of digital content once it receives it,” Levine said. To insert a digital rights management tool into OpenSim is to invite criminal hackers to find ways to circumvent it and undermine the credibility of the software, he argued.
This isn’t quite true, at least not all of it. He’s spot on in relaying David’s comment that you can’t tell what a foreign system will do with a piece of data, but OpenSim does support permissions by default. The nuance here is that permissions do not equal copy protection. Copy protection (also known as DRM) is something I’ve covered in more detail previously.
Right now, OpenSim ships your standard SL-flavoured permissions as the default permission module; it’s there today. Yes, you can swap that permission module for one that doesn’t respect those permissions, and yes, you could remove it entirely.
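Why permissions are not copy protection can be sketched with a pluggable permission module. This is not OpenSim's actual API (OpenSim is written in C#); `Perm`, `PermissionModule`, `DefaultPermissions`, `PermissiveModule` and `try_copy` are hypothetical names, and the bit values are illustrative rather than Second Life's real permission masks. The point is that the flags only mean something if the host's chosen module honours them.

```python
from enum import IntFlag
from typing import Protocol


class Perm(IntFlag):
    """SL-style permission bits (values are illustrative only)."""
    NONE = 0
    COPY = 1
    MODIFY = 2
    TRANSFER = 4


class PermissionModule(Protocol):
    def can_copy(self, item_perms: Perm) -> bool: ...


class DefaultPermissions:
    """Respects whatever the creator set on the item."""

    def can_copy(self, item_perms: Perm) -> bool:
        return bool(item_perms & Perm.COPY)


class PermissiveModule:
    """A host can swap in a module that ignores the flags entirely."""

    def can_copy(self, item_perms: Perm) -> bool:
        return True


def try_copy(module: PermissionModule, item_perms: Perm) -> str:
    return "copied" if module.can_copy(item_perms) else "denied"


no_copy = Perm.MODIFY | Perm.TRANSFER
print(try_copy(DefaultPermissions(), no_copy))  # denied
print(try_copy(PermissiveModule(), no_copy))    # copied
```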
Unfortunately as I’ve stated before, there’s no rule of computer science that stops someone from modifying something. Good or bad it is always possible, even if you need to go down to the level where you have a soldering iron installing a “mod chip”. With open source software this is admittedly easier - but any professionally schooled programmer will have all the grounding needed to defeat a copy protection system.
This is why both David Levine and I believe that the solution is to engineer something that assists and speeds up legal systems. Modern societies choose to respect copyright laws, and therefore they build institutions such as courts to handle disputes. However, Prokofy does raise the point that lawyers tend to be expensive, and if the only way to sell content is to retain a professional lawyer, then we’re back to the old media conglomerates.
As I have stated before, I’m not entirely sure this will be the case, and there are a number of reasons for that. First, a protection scheme being broken is fairly black and white: if there is any way to get content under terms not licensed to you, then you can do it. It doesn’t really matter that there is suddenly an additional method for doing so, because it was already possible.
The presence of the Open Grid Protocols allows one more potential avenue of attack, but to a malicious individual this is more difficult than just grabbing the asset from the local cache, or using a tool such as GL Intercept, because it requires connecting to additional servers and dealing with a lot more than you absolutely have to.
Returning to my point, I think we will find that people actually want to be legitimate and purchase content from legitimate providers, and hosting companies (who actually power the systems running the World) will have big financial incentives to obey the law and not host copyright-infringing content (since it makes them liable, and corporate lawyers really don’t like that).
The solutions I’ve mentioned before still hold. First, you can keep on keeping on; in all probability sales will increase rather than decrease, because you will be dealing with a much wider audience. Second, hosting providers will want to be allowed to receive content from top creators, and that means signing contracts which commit them to enforcing the permission models creators want (moderated by consumer demand).
I think for us developers, the key is to make it possible for people to say “Well, I want my content handled in these five ways.” and be able to host a world that interoperates while obeying those rules. Likewise, we need to make the inverse easy too, so that people who want to share their content can do so easily. This part comes down to tools, which is in the domain of the technical; however, if someone violates that contract, that’s the moment social systems need to be employed.
Social solutions do not necessarily mean legal systems; it may be as simple as “Well, you violated our contract, therefore we’re never sharing any more content with you.” Legal contracts will likely be the mainstay at the higher levels (as they always are), but there is nothing stopping the establishment of guilds or other bodies which represent content creators and enforce terms en masse.
Certainly commercial pressures will cause people doing hosting services to enforce these, because if they do not, their customers will be denied access to new content which will hurt business.
It’s also possible for people to consider alternative models of distribution, such as subscriptions to content providers: paying a regular fee for access to a content creator’s library (either per user or per region; I can see plenty of uses for this).
For those of you interested in hearing more, and exactly what myself and David discussed, a video of the presentation has gone online - you can hear our exact words and all the nuances therein (and unfortunately with a topic this complex, there’s a lot.).
Virtual Worlds: Why DRM cannot protect you [for long].
There’s a very fundamental problem facing many content creators in Virtual Worlds these days (such as Second Life™, IMVU™ and others), and that is the problem of Piracy - where one unscrupulous individual takes content from a designer or developer, and then attempts to resell it as their own.
It’s a problem - no-one can deny that - but the solution is not ‘deep’ DRM. There are a few reasons for this, especially when it comes to content (scripts and backend programming are another matter entirely, and something I will get to in a moment).
Three reasons why this won’t work for visual content
First, the obvious one: content must be displayed on the user’s screen. This means it must be presented to the video card in an unencrypted form. I’ve heard a few silly ideas for preventing this, such as encrypting the texture and using a shader to decrypt it on the video card (defeated by simply running the shader in a virtual machine and reading its output).
At a very fundamental level, the laws of mathematics do not allow you to say “This number cannot be copied.”, computers which are based on very high level mathematics are still subject to these immutable laws. There’s a parallel law here which states that you can always modify something - sure you can make it a house of cards that breaks if you make a change, but someone can always employ superglue to prevent that.
It’s technical, but it’s worth reading the examination of the Skype binary (PDF) done by a security analysis team. The Skype developers know their stuff: exactly how to use cryptography properly, how to try to prevent debuggers from being run, etc. Every single one of their protections has been examined and detailed in that document. No matter how clever you think you are, there are cleverer people out there, and not all of them have good motivations.
The second reason this won’t work: you hand the legitimate user both the content and the key to decrypt it for display. There’s no way to avoid this without preventing the user from viewing the item (which defeats the purpose of content). There’s nothing stopping them from making a copy of both parts, and once the scheme is broken, there’s no going back; it’s out there. You can’t revise the encryption scheme after it’s been broken, because your content is already available unencrypted.
This has been a big problem with things like DVD encryption, because to release a new encryption scheme you need to get every user to update, and titles released under the old scheme are still broken. DRM used in popular products tends to last somewhere between a week and three months. So even if point #1 didn’t hold, you would still have to assume that all of your content more than three months old is piratable. How many content producers produce enough content every month to make their old lines completely redundant from a sales perspective?
The third reason: DRM tends to annoy customers. Consider the case where you want to teleport your avatar around a hypothetical super-grid the size of the internet. You enter a sim which hasn’t been authorised (and I’d say that in the long term most will fall into this class, much as only a small percentage of websites have SSL certificates), and bam, your avatar vanishes.
Well, what can you do? Not much, but you certainly aren’t likely to buy avatars from this seller again. There is likely going to be a commercial incentive towards content which, once bought, you are free to do what you want with (with copyright law used against violators and pirates).
So - how the hell do you protect your revenue/sales in an environment where anything goes?
This is the real question that should be asked, and the answer hasn’t yet been determined (market forces will likely be the ones to figure out which models work and which don’t).
- Custom Content - in a world where everything is mass produced and cloned, unique content that has been hand crafted for what you want is a drawcard. It’s unique, it’s yours, it’s $50.00/hour design fees.
- Keep on keeping on - The current model is unlikely to collapse, brands seem to matter and people like being able to say they have legitimate content. Systems will likely appear that allow you to verify whether someone has paid for a piece of content or not. Piracy goes on in virtual worlds today, but sellers seem to keep making sales (I’d like to know more from specific sellers how their sales have gone when a piece of content has been pirated significantly).
- Mark your intent - Tying in with the above point is the idea that you can mark your intent - this is ‘shallow’ DRM - it’s nothing that cannot be removed, but it does signify what the creator wanted you to do with this content and has licensed you to do. If someone violates these terms, you can deal with them the same way copyright infringement is handled in the real world, courts. For all the complaints that go on about the DMCA, the act does provide a relatively sane way to deal with IP infringement from a content creator perspective (however beware, filing a false DMCA claim IS perjury).
So what about scripts?
Well, if your script is going to be transmitted from host to host, you have the same problems that commercial web scripts have, and all of the above applies. With sufficient bandwidth and processor time, however, it is possible to run scripts on your own servers on behalf of other people (the “hosted” model). OpenSim supports this hosted model via the ScriptEngine, which can be run as a grid server. Hopefully these kinds of things will become easier to set up and maintain, and perhaps a giant such as Akamai will take on the role for other people.