The other sparrow drops


A while back I was thinking about a medical prediction application. The idea was:

collect data points -> create model -> use model to plausibly fill in gaps in data, and predict future

The data points would be collected from a large number of people, and combined.

The sampling algorithm I came up with to perform the modelling proved impractical: it took too long to explore the full possibility space. (Note: in my previous post, I'm essentially proposing that humans do something similar... only they make it work.)

A simpler (obvious, even) alternative has occurred to me. To predict the future, find points in recorded stories that are similar to the current state. Their recorded progressions then give an idea of one's own future (or hypothetical futures). Refuse to predict if there are too few similar stories -- ideally the current state point would be surrounded by past recorded states.
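A minimal sketch of this lookup, assuming each "story" is a list of numeric state vectors sampled at regular intervals (all names and thresholds here are illustrative, not a real implementation):

```python
import math

def distance(a, b):
    """Plain Euclidean distance between two state vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_next(current, stories, k=3, max_dist=1.0):
    """Find the k recorded states closest to `current` across all
    stories and average the state that followed each of them.
    Refuse to predict (return None) if fewer than k similar states
    exist within max_dist."""
    candidates = []
    for story in stories:
        for t in range(len(story) - 1):  # the last state has no successor
            d = distance(current, story[t])
            if d <= max_dist:
                candidates.append((d, story[t + 1]))
    if len(candidates) < k:
        return None  # too few similar stories: refuse to predict
    candidates.sort(key=lambda c: c[0])
    nearest = [succ for _, succ in candidates[:k]]
    dims = len(current)
    return [sum(s[i] for s in nearest) / k for i in range(dims)]
```

The refusal branch is the "surrounded by past recorded states" condition: with no dense neighbourhood of precedents, the method says nothing rather than extrapolating.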

What does similar mean? We need a distance function. The most important decision here is how much weight to give each dimension of the state vector. One possibility: pick some key dimension you wish to predict (e.g. happiness at T plus one month), then choose the weights so that the best fit to the current data point (and some number of past points) predicts that dimension as well as possible.

This is a straightforward optimization problem. Since the parameters to be estimated (the dimension weights) will be far outnumbered by the data points, overfitting should not be a problem.
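One hedged sketch of that optimization, using leave-one-out error on the key dimension and a brute-force grid over the weights (the grid, the single-nearest-neighbour fit, and the toy data shape are all illustrative assumptions):

```python
import itertools
import math

def wdist(a, b, w):
    """Weighted Euclidean distance: w holds one weight per dimension."""
    return math.sqrt(sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b)))

def loo_error(pairs, w, target):
    """pairs: list of (state, next_state). For each held-out pair,
    predict its next value on the target dimension from the single
    nearest neighbour under weights w; accumulate squared error."""
    err = 0.0
    for i, (s, nxt) in enumerate(pairs):
        rest = [p for j, p in enumerate(pairs) if j != i]
        nearest = min(rest, key=lambda p: wdist(s, p[0], w))
        err += (nearest[1][target] - nxt[target]) ** 2
    return err

def best_weights(pairs, target, grid=(0.0, 0.5, 1.0)):
    """Brute-force search over a small weight grid -- fine for a few
    dimensions in a sketch, not for a real high-dimensional state."""
    dims = len(pairs[0][0])
    return min(itertools.product(grid, repeat=dims),
               key=lambda w: loo_error(pairs, w, target))
```

On data where one dimension carries the signal and another is noise, the search should drive the noise dimension's weight toward zero.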

Better still: optimize not only the weights, but also the credence given to each fit in rank order of closeness, so as to minimize the probability of error.

Final note: my previous sparrowfall implementation used continuous-valued dimensions. That was never a very good fit to the data being collected. The new method would work just as well, or better, with discrete data.