Setting this up in proper Bayesian fashion we have:

- prior beliefs about their gender, say p=0.5 that they are female, and p=0.5 that they are male

- conditional probabilities of the chance they will make a certain choice given that they are female or given that they are male

We then do the usual Bayesian trick of combining the prior probability distribution with the conditional probability distribution to get a posterior probability distribution: our belief that they are female or male given that they have made a certain choice. As we observe them making more and more choices, we become more and more certain of their gender.
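As a concrete sketch of this update, here is the two-hypothesis version in a few lines of Python. The conditional choice probabilities are invented purely for illustration:

```python
# A minimal sketch of the two-hypothesis Bayesian update described above.
# The choice probabilities below are invented for illustration only.

P_FEMALE_PRIOR = 0.5

# P(choice | gender) for two hypothetical choices "a" and "b".
P_CHOICE = {
    "female": {"a": 0.7, "b": 0.3},
    "male":   {"a": 0.4, "b": 0.6},
}

def update(p_female, choice):
    """One step of Bayes' rule: posterior from prior and likelihoods."""
    num = p_female * P_CHOICE["female"][choice]
    den = num + (1 - p_female) * P_CHOICE["male"][choice]
    return num / den

p = P_FEMALE_PRIOR
for choice in ["a", "a", "b", "a"]:
    p = update(p, choice)
print(f"P(female | choices) = {p:.3f}")
```

Each observed choice multiplies the odds by a likelihood ratio, so even subtly different conditional probabilities accumulate into near-certainty given enough observations.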

This seems quite nice! We don't require that males *always* act one way and females *always* act another way. The differences can be quite subtle.

But now suppose we encounter someone who is not entirely playing female or entirely playing male. Their choices are a *blend* of female and male. Say they are 25% female, 75% male in their choices. We would initially be uncertain of their gender, and this would serve us well, but over time the evidence would lean toward them being male. Given enough observations we would become certain that they are male. **And then we would be constantly surprised** as they continue to do some things that males very rarely do.
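The constant-surprise failure can be simulated directly. Here a hypothetical person makes each choice in the female style 25% of the time and the male style 75% of the time; the per-style choice probabilities are again invented for the example:

```python
import random

random.seed(1)

# Hypothetical observation model: a "female" player makes the
# female-typical choice "f" 90% of the time, a "male" player only 10%.
P_F = {"female": 0.9, "male": 0.1}

def blended_choice(frac_female=0.25):
    """A person who is 25% female / 75% male in their choices."""
    style = "female" if random.random() < frac_female else "male"
    return "f" if random.random() < P_F[style] else "m"

p_female = 0.5
surprises = 0
for step in range(200):
    c = blended_choice()
    lf = P_F["female"] if c == "f" else 1 - P_F["female"]
    lm = P_F["male"] if c == "f" else 1 - P_F["male"]
    p_female = p_female * lf / (p_female * lf + (1 - p_female) * lm)
    # Once we are "certain" they are male, each female-typical choice
    # is an event the model says should almost never happen.
    if p_female < 0.01 and c == "f":
        surprises += 1

print(f"final P(female) = {p_female:.2e}, surprises after certainty: {surprises}")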

A better model might be that people have a degree of maleness between zero and one. Our beliefs are now a probability density function on [0,1], and observing a choice that they make changes the shape of this function.

If most people are either mostly playing female or mostly playing male, as is currently the case, then we might reasonably put most of the mass in our prior belief density function near zero and one. But we should make sure that the function is nowhere zero, so that whatever might be going on we will eventually work it out.
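A minimal sketch of this continuous model, using a grid approximation to the density on [0,1]. The observation model (with maleness q, a male-typical choice has probability 0.1 + 0.8q) and the Jeffreys-like prior are assumptions made up for the example:

```python
import math

# Grid approximation to a belief density over maleness q in [0,1].
N = 1000
qs = [(i + 0.5) / N for i in range(N)]  # grid midpoints

# Prior with most mass near 0 and 1, but nowhere zero (Jeffreys-like).
prior = [1 / math.sqrt(q * (1 - q)) for q in qs]
total = sum(prior)
belief = [p / total for p in prior]

def update(belief, male_typical):
    """Reweight the density by the likelihood of one observed choice."""
    lik = [(0.1 + 0.8 * q) if male_typical else (0.9 - 0.8 * q) for q in qs]
    post = [b * l for b, l in zip(belief, lik)]
    z = sum(post)
    return [p / z for p in post]

# A 25%-female / 75%-male pattern: three male-typical choices
# for every female-typical one.
for obs in [True, True, True, False] * 25:
    belief = update(belief, obs)

mean_q = sum(q * b for q, b in zip(qs, belief))
print(f"posterior mean maleness = {mean_q:.3f}")
```

Instead of piling up at q=1, the posterior settles around an intermediate maleness, and the model stops being surprised by the minority of female-typical choices.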

This type of error might have some generality. I've seen it come up in phylogenetic tree inference: due to horizontal gene transfer, inheritance might not be strictly a tree. We may have bootstrapped the data to buggery and think we know exactly what the tree is, when it's not a tree at all. I've also seen it in calling the genomic sequence of a strain of bacteria, where the population might actually contain a mixture of different sequences. Caste or class inference is also quite similar to gender inference.

**7/12/2010:** Lars Jermiin pointed out to me that there is a general way to detect this kind of error without needing a more correct model to compare against, one which bears a passing similarity to the Parametric Bootstrap. Being constantly surprised is the clue. Sample possible models from the posterior distribution (here "person is male", "person is female"), and from each sampled model, sample a set of observations of the same size and kind as the one you actually have. For each sampled set of observations, calculate the probability of observing it given the prior model distribution, producing a distribution of probabilities. If your actual observations have a prior probability of being observed that is an outlier on this distribution (either too unlikely or too likely), then you should suspect that your model is insufficiently general.
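Here is a sketch of this check applied to the two-hypothesis gender model, with invented choice probabilities: fit the posterior, simulate data sets from models drawn from it, then compare the prior-predictive (marginal) probability of the actual data against those of the simulated data:

```python
import math
import random

random.seed(0)

# Invented observation model: P(female-typical choice | gender).
P_F = {"female": 0.9, "male": 0.1}

def log_marginal(data):
    """log P(data) under the prior: 0.5 * P(data|F) + 0.5 * P(data|M)."""
    def loglik(gender):
        p = P_F[gender]
        return sum(math.log(p if c else 1 - p) for c in data)
    lf, lm = loglik("female"), loglik("male")
    m = max(lf, lm)
    return m + math.log(0.5 * math.exp(lf - m) + 0.5 * math.exp(lm - m))

def posterior_female(data):
    p = 0.5
    for c in data:
        lf = P_F["female"] if c else 1 - P_F["female"]
        lm = P_F["male"] if c else 1 - P_F["male"]
        p = p * lf / (p * lf + (1 - p) * lm)
    return p

# "Actual" data from a 25%-female / 75%-male blend, so each choice is
# female-typical with probability 0.25*0.9 + 0.75*0.1 = 0.3.
n = 100
actual = [random.random() < 0.3 for _ in range(n)]

# Sample models from the posterior, then sample data sets from them.
p_f = posterior_female(actual)
sims = []
for _ in range(500):
    gender = "female" if random.random() < p_f else "male"
    sims.append(log_marginal([random.random() < P_F[gender] for _ in range(n)]))

actual_lp = log_marginal(actual)
frac_below = sum(s < actual_lp for s in sims) / len(sims)
print(f"actual log P(data) = {actual_lp:.1f}, "
      f"fraction of simulations below actual = {frac_below:.3f}")
```

For blended data the actual marginal probability lands far below essentially all of the simulated ones: the model family cannot produce data that looks like this, which is exactly the "insufficiently general" signal.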

**5/8/2012:** A further problem is that gender is not a single dimension. Regarding it as a spectrum, we will at least not be constantly surprised by a person who is "male" in one aspect but "female" in an orthogonal aspect, but we won't be able to understand them as well as we could with a multi-dimensional model. So really we should unpack gender into a whole lot of different aspects (physical, social, sexual, etc, etc), and while we may have a prior expectation of some of these being correlated, evidence should be able to disabuse us of this.