- Generate your data in any order, one element per line. Suggested format:
<table identifier><key><delimiter><value>
Prefixing each line with a table identifier lets you put multiple tables in one file (or multiple indexes of one table).
- Sort your data line by line in plain byte order, since the lookup compares raw lines. If it's too large to fit in memory, use this script by Nicholas Lehuen.
- Look up data using the class below.
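    class Searcher:
        """Binary-search a sorted text file for lines starting with a prefix."""

        def __init__(self, filename):
            self.f = open(filename, 'rb')
            self.f.seek(0, 2)              # jump to the end to measure the file
            self.length = self.f.tell()

        def _seek_line_start(self, p):
            # Scan backwards from offset p to the nearest newline at or before
            # p and leave the file positioned just after it. If there is no
            # such newline, position at the start of the file.
            while p >= 0:
                self.f.seek(p)
                if self.f.read(1) == b'\n':
                    return
                p -= 1
            self.f.seek(0)

        def find(self, string):
            # The file is read as bytes, so compare against bytes.
            # Assumes the file is UTF-8 encoded.
            if isinstance(string, str):
                string = string.encode('utf-8')

            # Binary search over byte offsets for the first line >= string.
            low = 0
            high = self.length
            while low < high:
                mid = (low + high) // 2
                self._seek_line_start(mid)
                line = self.f.readline()
                if line < string:
                    low = mid + 1
                else:
                    high = mid

            # Collect every consecutive line that starts with the prefix,
            # returning only the part after the prefix.
            self._seek_line_start(low)
            result = []
            while True:
                line = self.f.readline()
                if not line or not line.startswith(string):
                    break
                if line[-1:] == b'\n':
                    line = line[:-1]
                result.append(line[len(string):].decode('utf-8'))
            return result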
This class can be used to quickly find all lines in a sorted file that start with a given string. Pass find() a string containing <table identifier><key><delimiter> and it will return a list of all associated values.
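
To make the line format and the expected lookup results concrete, here is a minimal sketch of the first two steps plus an in-memory stand-in for the lookup. It is not the file-seeking class itself: it uses the standard bisect module on a list, so it only suits data that fits in memory. The helper names (format_line, build_sorted_lines, lookup), the table identifiers "user:" and "email:", and the "=" delimiter are all illustrative assumptions, not part of the recipe.

```python
import bisect

DELIM = "="  # hypothetical delimiter; pick one that never occurs in your keys


def format_line(table, key, value):
    # One element per line: <table identifier><key><delimiter><value>
    return f"{table}{key}{DELIM}{value}"


def build_sorted_lines(records):
    # Python orders str by code point, which matches UTF-8 byte order,
    # so this list is also sorted the way a byte-wise file search expects.
    return sorted(format_line(*r) for r in records)


def lookup(lines, prefix):
    # Same idea as a prefix search on the sorted file, done in memory:
    # bisect to the first line >= prefix, then collect lines while they
    # still start with the prefix, keeping only the value part.
    i = bisect.bisect_left(lines, prefix)
    values = []
    while i < len(lines) and lines[i].startswith(prefix):
        values.append(lines[i][len(prefix):])
        i += 1
    return values


# Records may be generated in any order; a second table identifier
# ("email:") acts as another index over the same data.
records = [
    ("user:", "bob", "bob@example.com"),
    ("user:", "alice", "alice@example.com"),
    ("email:", "alice@example.com", "alice"),
    ("user:", "alice", "alice@example.org"),
]
lines = build_sorted_lines(records)

print(lookup(lines, "user:alice" + DELIM))
# prints ['alice@example.com', 'alice@example.org']
```

Writing these lines to a file, one per line with a trailing newline, gives input in the format the Searcher class expects, and the same prefix should yield the same values from its find method.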