Code-typed databases

A running program consists of code and data. Data type information may be part of either the code or the data. When the type information is part of the data, the program is "dynamically typed". When the type information is part of the code, we generally call it "statically typed". This assumes that the code will not change while the program is running. The assumption breaks down when the data is persistent (for example, when it is stored in a database), so that the code may change yet still use the same data. For this reason, I'll refer to "statically typed" programs with persistent data as "code typed".

Well, except that there aren't really any "code typed" programs. The minute you start using an SQL database, your type information -- your table column types -- is stored in the database. Same goes for object-oriented databases such as ZODB.

C++ is a good example of a statically typed language with an expressive type system. Consider:

#include <vector>

struct My_structure {};  // placeholder element type

class My_class {
public:
    std::vector<My_structure> somethings;
};

My_class everything;

Note here that merely by declaring the data types, and one global variable, the whole structure is created automatically. You don't have to say "everything = My_class(); everything.somethings = []" as you would in a dynamically typed language such as Python. You don't have to say "CREATE TABLE somethings ( ... );" as you would in SQL. You don't have to say "if 'somethings' not in root: create_somethings()" as you would with ZODB.

The whole thing pops into existence implicitly. Zero install. I like that. I think databases should work like that too.

Note that C++ does have constructors, and they are sometimes necessary, which breaks the paradigm a little. To do away with them as well, I'd like to borrow a little-used trick from Python. If you declare:

class My_class:
    somethings = ()

"somethings" behaves exactly like it is an instance variable. Assigning a new value to "somethings" in an instance will mask the default value declared in the class. It behaves just as though you did that assignment in the constructor.

Of course, this falls apart when you have mutable data structures. If I had said "somethings = []", then one instance could append an object to the list, and all the other instances would see it. So don't allow mutable data structures, functional-programming style. This is good for other reasons too (see previous blog entries).
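
Here is a small sketch of the problem, next to the immutable version that avoids it. The class names "Shared" and "Safe" are made up for the comparison:

```python
class Shared:
    somethings = []   # mutable default: one list object shared by every instance

class Safe:
    somethings = ()   # immutable default: can only be replaced, never mutated

x, y = Shared(), Shared()
x.somethings.append("oops")   # mutates the class-level list in place
leaked = y.somethings         # y sees the change as well

p, q = Safe(), Safe()
p.somethings = p.somethings + ("fine",)   # builds a new tuple, stored on p only
untouched = q.somethings
```

With the tuple, the only way to "change" the value is to assign a new one, and assignment always lands on the instance.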

Ok, now think about what this means. Type depends on the location in the database (path from the root object), as determined by the code. You can add or remove fields from types, and (with a little care) not have to touch the database itself in doing so. Hell, you can add and remove whole tables.
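
To make that a bit more concrete, here is a very rough sketch, assuming the database hands back plain untyped values (a dict here) and the code decides what they mean. The helper "load" and the classes "Person" and "PersonV2" are hypothetical, not part of any real library, and nested paths are left out:

```python
def load(cls, raw):
    """Interpret raw stored values as an instance of cls (hypothetical helper)."""
    obj = cls()
    for name, value in raw.items():
        # Keep only fields the current code declares; skip private names.
        if not name.startswith("_") and hasattr(cls, name):
            setattr(obj, name, value)
    return obj

class Person:        # the code as it was when the data was written
    name = ""

class PersonV2:      # the code later on, with one field added
    name = ""
    email = None

stored = {"name": "Ada"}   # written under the old code, never migrated

old = load(Person, stored)
new = load(PersonV2, stored)   # the missing field falls back to the class default

# Going the other way: data written with a field the code no longer declares.
trimmed = load(Person, {"name": "Ada", "email": "ada@example.org"})
```

The stored values are never touched. Adding a field just means old records pick up the class default, and removing one means the stale value is ignored on load.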

Python persistence is kind of half baked, actually. Most persistence schemes are: you store the name of the class with an object, but you don't store the object's behaviour. Maybe the class you need won't even be there any more when you want to load the instance into memory again. Renaming a class, or moving it to a different file or directory, breaks every object already stored under the old name unless you leave some kind of alias behind. Better hope you structured your code right the first time! For a code-typed database, moving stuff around is not a problem.
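
You can watch this happen with the standard pickle module, used here as a stand-in for a typical scheme. The module name "models_demo" and the name "Renamed_class" are invented for the demonstration; the fake module just simulates the file the class lives in:

```python
import pickle
import sys
import types

# A made-up module standing in for the source file that defines the class.
models = types.ModuleType("models_demo")
sys.modules["models_demo"] = models

class My_class:
    somethings = ()

# Pretend My_class was defined in models_demo.
My_class.__module__ = "models_demo"
My_class.__qualname__ = "My_class"
models.My_class = My_class

data = pickle.dumps(My_class())
# The pickle holds the module and class names, not the class itself.
name_is_stored = b"models_demo" in data and b"My_class" in data

# Simulate renaming the class: the old name disappears from the module.
models.Renamed_class = My_class
del models.My_class

try:
    pickle.loads(data)
    load_error = None
except AttributeError as exc:
    load_error = exc   # the stored name no longer resolves

# Leaving an alias under the old name makes the stored data loadable again.
models.My_class = models.Renamed_class
restored = pickle.loads(data)
```

The alias at the end is the usual workaround, and it means the old name has to live on in the code for as long as old data exists.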

Or consider SQL persistence. You're putting some of the type information (how objects behave, how they display themselves on the web, etc.) in one place (source code) and other bits of the type information (column types, primary keys, foreign keys, deletion cascade behaviour) in another (SQL declarations). More often than not, you end up declaring the column types twice: once in SQL, and once in the source code. Duplicating information is really bad practice.
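
A small sketch of that duplication, using Python's standard sqlite3 module with an in-memory database. The table layout and the "Something" class are invented for the example:

```python
import sqlite3
from dataclasses import dataclass, fields

# Declaration 1: the column names and types, stored in the database.
SCHEMA = "CREATE TABLE somethings (id INTEGER PRIMARY KEY, label TEXT NOT NULL)"

# Declaration 2: the same fields and types again, this time in the source code.
@dataclass
class Something:
    id: int
    label: str

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute("INSERT INTO somethings (id, label) VALUES (?, ?)", (1, "first"))

rows = [Something(*row) for row in conn.execute("SELECT id, label FROM somethings")]

# The two declarations have to be kept in step by hand.
db_columns = [info[1] for info in conn.execute("PRAGMA table_info(somethings)")]
code_fields = [f.name for f in fields(Something)]
```

Nothing ties "db_columns" and "code_fields" together except discipline. Change one declaration and forget the other, and you find out at run time.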

With a code-typed database, all your type information is declared once in one place, the code. You're not artificially splitting it into stuff that changes often (mostly object behaviours) and stuff that changes rarely (SQL stuff). Code-typing means persistence does not break the object-oriented programming style.