Lazyweb: can anyone tell me about symmetric classification?

Classification involves dividing a set of items into classes based on their attributes. I've been trying to apply this to modelling autism, without much success. I now suspect that part of the reason for my failure is that the traditional assumptions in classification problems often do not hold in the real world.

Traditional computer-aided classification (see e.g. http://www.csse.monash.edu.au/~dld/Snob.html) assumes a large number of items and a small number of attributes. But consider:

• Classification of supermarket customers based on buying habits.
• Classification of bacteria based on the genes they carry.
• Face recognition from digital images.
• Classification of documents by word frequency.

The traditional model of classification breaks down in the following ways:

• There are often as many attributes by which to classify items as there are items.
• The items may be best classified one way for one subset of items, and another way for another subset.
• Given the large number of attributes, the data is necessarily sparse.

Furthermore, these problems are often symmetric: the problem is just as meaningful if you call the attributes items and the items attributes.

• Classification of supermarket goods by who buys them.
• Classification of patients by their responses to drugs.
• Classification of users by their movie ratings.
• Classification of genes by which bacteria they are found in.
• Classification of words by the documents they are found in.

I'm interested in classification strategies that treat the problem symmetrically. Ring any bells for you?

Responses

Lee points me to this Clay Shirky essay, in which Shirky points out the flaws of hierarchical classification and advocates tagging as an alternative.

Jiri pointed me to one of his blog entries which touches on similar concerns, and suggests the keyword "bigraph".

pfh notes that Singular Value Decomposition is suitably symmetric. I know this has been applied to at least the words-and-documents problem.
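As a rough sketch of why SVD counts as symmetric, the following uses NumPy's `np.linalg.svd` on a made-up document-by-word count matrix. The data and the `svd_embed` helper are assumptions for illustration only. Decomposing the transposed matrix simply swaps the roles of the two sets of singular vectors, so documents and words get low-dimensional coordinates from the same factorisation.

```python
import numpy as np

# Hypothetical toy data: rows are documents, columns are words.
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
], dtype=float)

def svd_embed(matrix, k):
    """Return k-dimensional coordinates for the rows and for the columns."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    row_coords = u[:, :k] * s[:k]      # one row of coordinates per matrix row
    col_coords = vt[:k, :].T * s[:k]   # one row of coordinates per matrix column
    return row_coords, col_coords, s

# Documents as items, words as attributes.
doc_coords, word_coords, s = svd_embed(counts, 2)

# Swap the roles: words as items, documents as attributes.
word_coords_t, doc_coords_t, s_t = svd_embed(counts.T, 2)

# The singular values are identical, and the coordinates agree up to the
# sign of each component, which SVD leaves arbitrary.
print(np.allclose(s, s_t))
print(np.allclose(np.abs(doc_coords), np.abs(doc_coords_t)))
print(np.allclose(np.abs(word_coords), np.abs(word_coords_t)))
```

Items with similar coordinates can then be grouped by any ordinary clustering method, for rows and columns alike. Whether that grouping is meaningful for a given data set is a separate question that this sketch does not address.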
