Tuesday, 1 May 2007

Opening up Second Life content to Web tools and services

I strongly suspect that there is some kind of universal truth that whenever two or more librarians are gathered together they will try and catalogue the world around them.

This was certainly the case with the early Web, leading most notably to developments like the Dublin Core and, in a UK-specific context, activities like the ROADS project, and more recently Intute (previously the Resource Discovery Network).

Whether these kinds of attempts to catalogue the digital world around us are of any lasting value is up for debate of course, particularly in the light of more, err…, mainstream activities like Google, del.icio.us and other social tagging services.

So it was with some trepidation that I read the report of the Second Life Cataloguers & Collection specialists meeting, sent recently to the AllianceSecond mailing list.

Here we go again! What happened in the early days of the Web is going to happen all over again in Second Life. Now, on many levels I hate comparisons being drawn between Second Life and the early Web. As I’ve argued in my recent presentations, there are some very, very fundamental differences between the two that make simplistic comparisons dangerous. But one has to accept there are also some similarities, and this is one of them.

I fully expect someone, somewhere to suggest that we need a SL-DC or an SL-MARC in the near future.

Now, I have to confess that part of me, probably the weaker part, also wants to do this kind of thing. Oh my god… somebody stop me. Puleeze. We don’t need this.

That said, we do need something if we are going to open up virtual worlds like Second Life to competing, alternative, and ultimately better approaches to resource discovery than the built-in search engine.

OK, so what kinds of things need to happen?

Well, firstly, we need to open up the virtual world to the Web much better than it is now.

I’ve argued previously that we need a way of persistently citing (linking to) things in Second Life, independently of their current location.

But we also need to go much further than this.

Recently the Electric Sheep Company announced their beta SL search engine. This uses an in-world robot to collect information about SL objects flagged as being for sale, offering the resulting information thru a Web-based search engine.

Impressive stuff, but I wonder if it's the right approach? Instead, how about if Linden Lab exposed on the Web every single SL object with a name other than 'Object', every avatar, every landmark, every sim, every classified ad, every event, every group - assigning each a persistent and unique ‘http’ URI and making available representations of those things in such a way that a robot-traversable, indexable ‘web’ of SL objects would be made available? Such a 'web' could be crawled and indexed by any of the existing Web search engines.

How would such a ‘web’ be made? What are the key relationships between the objects and other entities in SL that would allow us to build the links between things? Well, we have some very obvious relationships that we could build on. The relationships between the world as a whole and the sims it contains. The relationships between sims and other sims because of their geographic co-location. The relationships between sims and the objects within those sims. And between objects and other objects, in terms of containment. There are also the relationships between avatars and objects such as ownership and creation. And between avatars and landmarks. I’m sure I could go on.
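To make the idea a little more concrete, here is a minimal sketch in Python of entities that each carry an ‘http’ URI and a set of named relationships, plus a tiny crawler that follows those links. Everything in it is an assumption for illustration: the base address sl.example.org, the URI pattern, the Entity class and the relationship names are all hypothetical, and nothing like this is offered by Linden Lab.

```python
from collections import deque
from dataclasses import dataclass, field

BASE = "http://sl.example.org"  # hypothetical base address, not a real service


@dataclass
class Entity:
    kind: str    # e.g. "world", "sim", "object", "avatar"
    ident: str   # assumed to be a persistent identifier
    name: str
    links: dict = field(default_factory=dict)  # relationship name -> list of URIs

    @property
    def uri(self):
        # Hypothetical URI pattern: <base>/<kind>/<identifier>
        return f"{BASE}/{self.kind}/{self.ident}"


def link(source, relation, target):
    """Record a named relationship from one entity to another."""
    source.links.setdefault(relation, []).append(target.uri)


def crawl(start_uri, index):
    """Breadth-first walk over the links, as a search robot might do."""
    seen, queue = [], deque([start_uri])
    while queue:
        uri = queue.popleft()
        if uri in seen or uri not in index:
            continue
        seen.append(uri)
        for targets in index[uri].links.values():
            queue.extend(targets)
    return seen


# Illustrative data only.
world = Entity("world", "main", "Second Life")
sim = Entity("sim", "1", "Example Sim")
table = Entity("object", "42", "Oak table")
avatar = Entity("avatar", "7", "Example Avatar")

link(world, "contains", sim)
link(sim, "contains", table)
link(table, "owned-by", avatar)
link(avatar, "owns", table)

index = {e.uri: e for e in (world, sim, table, avatar)}
visited = crawl(world.uri, index)
```

Starting from the world, the crawler reaches the sim, the object and the avatar in turn, which is all a robot needs: every entity is reachable by following links from somewhere else.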

Exposing all of this information to Google in the form of a robot traversable ‘web’ of HTML pages could be very powerful, though I guess the signal to noise ratio might be questionable? The Electric Sheep engine currently offers a search of about 2 million 'products' (objects marked for sale) - small by Web search engine standards. What would the HTML representations of each SL entity look like? I don’t know. We’d need to think about representing any text associated with the entity (the name and description for example) in the body of the page. But there may be other techniques… I’m not sure.
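As one possible answer, and no more than a sketch, the page for an entity could put its name in the title and heading, its description in the body, and its relationships in a list of ordinary links for robots to follow. The function name render_page, the page layout and the example URI below are assumptions for illustration, not an existing format.

```python
from html import escape


def render_page(name, description, links):
    """Build a very plain HTML page for one entity.

    links maps a relationship name to a list of target URIs.
    """
    items = "\n".join(
        f'    <li>{escape(rel)}: <a href="{escape(uri)}">{escape(uri)}</a></li>'
        for rel, uris in links.items()
        for uri in uris
    )
    return (
        "<html>\n"
        f"  <head><title>{escape(name)}</title></head>\n"
        "  <body>\n"
        f"    <h1>{escape(name)}</h1>\n"
        f"    <p>{escape(description)}</p>\n"
        "    <ul>\n"
        f"{items}\n"
        "    </ul>\n"
        "  </body>\n"
        "</html>\n"
    )


# Illustrative data only; the URI is hypothetical.
page = render_page(
    "Tea & cake table",
    "A <b>small</b> table",
    {"contained in": ["http://sl.example.org/sim/1"]},
)
```

Escaping the name and description matters here because they are free text typed by residents and could otherwise break the markup.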

We could also think about developing a simple tagging mechanism for in-world objects. How about using a construct like:

tags: tag1, tag2, tag3, ...

in the Description field of any object. For the die-hard librarian-types like myself, that would allow us to use conventions like dctagged to add more than simple keywords if we wanted to, e.g.

tags: dctagged, dc:creator=powellandy, dcterms:educationlevel=ukel3

or whatever.
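A harvester could pick such a line apart with very little code. The following is a minimal sketch, assuming the Description field holds nothing but the tags line; the function name parse_tags and the choice to split on commas and on the first '=' are assumptions for illustration rather than part of any agreed convention.

```python
def parse_tags(description):
    """Split a 'tags: a, b, key=value' string into plain tags and key/value pairs."""
    prefix = "tags:"
    text = description.strip()
    if not text.lower().startswith(prefix):
        # Not a tags line at all, so nothing to report.
        return [], {}
    plain, props = [], {}
    for item in text[len(prefix):].split(","):
        item = item.strip()
        if not item:
            continue
        if "=" in item:
            # dctagged-style entry such as dc:creator=powellandy
            key, value = item.split("=", 1)
            props[key.strip()] = value.strip()
        else:
            plain.append(item)
    return plain, props


plain, props = parse_tags(
    "tags: dctagged, dc:creator=powellandy, dcterms:educationlevel=ukel3"
)
```

Simple keywords and the richer key=value entries come out separately, so a tool that only understands plain tags can ignore the rest.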

OK, I won’t push it because I'm getting onto dangerous ground in terms of complexity!

But you get the idea. This kind of general approach would expose SL content in a way that could be exploited by existing Web search engines and other tools (and by completely new, and as yet unimagined, tools and services) out there on the mainstream Web. It seems to me that this is a much more open approach than simply trying to improve the in-world search engine per se.

I’d welcome people’s thoughts on this. Particularly those of librarians!


PeteJ said...

I agree with the core point that it would be very useful to have reasonably persistent identifiers for things other than locations in SL (and yes, not just objects, but avatars, events etc).

And I also agree that http URIs would provide a good and useful means of identifying them, and would facilitate making available representations of them: using http URIs would "bring SL space into Web space", if you like.

You touch on the signal-to-noise ratio, and I guess I'm a bit sceptical about that too if every single object was identified and exposed in this way - bearing in mind that every copy of an object will be a new object. (Oh, FRBR for SL, anyone? Only joking... well, maybe...)

But perhaps more importantly, I think there are significant "privacy" issues here: as I understand it at least, the objects in my avatar's inventory aren't accessible/"visible" to others until I place a copy somewhere in the world or wear them, and even then I might choose to restrict access to the parcel(s) of land on which they are placed or choose the time and place where I wear them. By indexing only the objects for sale, the Electric Sheep harvester is effectively limiting itself to those objects which people have decided they do want to be "visible" (because they want to sell them).

So I'd be inclined to say that automatically exposing every object on the Web in the way you suggest would run counter to some of the in-world constraints and controls that are already in use. For such an approach to work, I think it would probably need some sort of opt-in approach ("Check here for this object to be visible on the Web"), but that would need some careful thought about how that works when objects are transferred/purchased. The seller of the virtual sex aid may be quite happy to advertise their ownership of the object to the world on the Web; some of the purchasers may prefer to limit knowledge of their ownership to a more select audience!

Re tagging: I like the idea, but I don't like "overloading" the "description" field for that purpose! ;-)

David Viney said...

Currently developing a Second Life Search Engine at SL Crawler. Will keep you posted on how we get on. D.