Random network search


A flood search of limited depth in a completely randomly connected network is fairly efficient, in that it is unlikely that a query will be relayed to any node that had already seen that query.

In gnutella, a random set of connections might be achieved by a node periodically choosing a peer, querying it for a list of its peers, and replacing that peer with a peer chosen from that list.

The depth of a query would be represented as simply the number n of nodes to be queried. When a node is propagates a query, n-1 is divided amongst the outgoing queries. If there are more than n-1 connections, only n-1 are passed the query.

If a query has already been seen, it is not propagated further.

This all should allow accurate estimation of the thoroughness a query achieves (so long as the size of the network is known).

A couple of further optimizations: Give priority to low-depth queries, in order to encourage low-depth queries. Cache results, and allow queries to specify the freshness of results required. Queries accepting low freshness results would also be given priortity. Cached negative results also record a generalization of the negative, eg a query for "Britney Spears" might result in a cached result of "nothing containing 'Bri'".