PyMML

PyMML is a Python package for statistical analysis and automatic classification of data. It implements a selection of the Minimum Message Length (MML) estimators described by Professor Chris Wallace in the book "Statistical and Inductive Inference by Minimum Message Length", including estimators for hidden factor analysis and mixture modelling.

Example:

>>> from numarray import array
>>> import mml
>>> data = array([[9.42,9.20], [1.47,1.15], [5.45,5.85], [7.09,7.04], [9.15,9.39],
...               [30.26,0.53], [30.84,0.82], [30.21,0.32], [30.53,0.52], [30.78,0.45]])
>>> mml.Mixture_estimate(data, ( mml.Hidden_factor_estimate,
...   array([[-100.0,100.0],[-100.0,100.0]]), array([0.01,0.01]), array([100.0,100.0]) ))
Mixture_estimate: 151.95 nits
 50% Hidden_factor_estimate: 64.15 nits
                mean=[ 30.52   0.53]
               sigma=[ 0.29  0.18]
          has_factor=False
        factor_loads=[ 0.  0.]
 50% Hidden_factor_estimate: 79.76 nits
                mean=[ 6.52  6.53]
               sigma=[ 0.28  0.29]
          has_factor=True
        factor_loads=[ 3.24  3.34]
       factor_scores=[ 0.84 -1.57 -0.26  0.16  0.83  2.75  2.88  2.71  2.79  2.82]
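The printed parameters of the second component can be checked by hand. Assuming they follow the single-factor model of Wallace's book, where attribute k of item n is modelled as mean_k + score_n * load_k plus Gaussian noise of scale sigma_k, the sketch below recomputes the residuals for the first five data points (the cluster around 6.5). The values are copied from the output above; the function name `factor_residuals` is hypothetical and not part of PyMML.

```python
# Parameters of the has_factor=True component, copied from the example output.
MEAN = [6.52, 6.53]
LOADS = [3.24, 3.34]
SCORES = [0.84, -1.57, -0.26, 0.16, 0.83]   # first five factor_scores only
DATA = [[9.42, 9.20], [1.47, 1.15], [5.45, 5.85], [7.09, 7.04], [9.15, 9.39]]


def factor_residuals(data, mean, loads, scores):
    """Hypothetical helper: residual x_nk - (mean_k + score_n * load_k) per item."""
    out = []
    for x, v in zip(data, scores):
        out.append([xk - (mk + v * ak) for xk, mk, ak in zip(x, mean, loads)])
    return out


res = factor_residuals(DATA, MEAN, LOADS, SCORES)
for row in res:
    print(" ".join(f"{r:+.3f}" for r in row))
```

Every residual comes out smaller than the corresponding printed sigma (0.28 and 0.29), which is consistent with that reading of the output. Only the first five scores are used, since the last five points are described by the other component.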

Download:

PyMML uses the numarray and SciPy packages.



A variety of MML software written in C and FORTRAN may also be downloaded from here.

The book "Statistical and Inductive Inference by Minimum Message Length" describes the theory behind MML in detail.
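As a small taste of that theory, the sketch below applies the Wallace-Freeman (MML87) approximation to the simplest case, a univariate Gaussian. It is not PyMML code: the function name `mml87_gaussian` and the priors (mean uniform on a range, sigma with density proportional to 1/sigma on a range) are illustrative assumptions. The message length is the sum of the negative log prior, half the log of the Fisher information (F = 2n^2/sigma^4 here), the negative log likelihood of data stated to accuracy eps, and the lattice term (D/2)(1 + log kappa_D) with D = 2.

```python
import math


def mml87_gaussian(xs, mu_range, sigma_range, eps):
    """Return (mean, sigma, message length in nits) for univariate data xs.

    Assumed priors (hypothetical, for illustration only): the mean is uniform
    on mu_range = (lo, hi); sigma has density proportional to 1/sigma on
    sigma_range = (smin, smax). eps is the accuracy of each stated datum.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    lo, hi = mu_range
    smin, smax = sigma_range
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)      # sum of squared deviations
    sigma = math.sqrt(ss / (n - 1))            # MML divides by n-1, not n
    # Note: sigma is not clamped to sigma_range in this sketch.

    neg_log_prior = (math.log(hi - lo) + math.log(sigma)
                     + math.log(math.log(smax / smin)))
    half_log_fisher = 0.5 * math.log(2.0 * n * n / sigma ** 4)
    neg_log_lik = (n * math.log(sigma * math.sqrt(2 * math.pi))
                   - n * math.log(eps)
                   + ss / (2 * sigma ** 2))
    kappa2 = 5 / (36 * math.sqrt(3))           # optimal 2-D quantizing lattice constant
    lattice = 1 + math.log(kappa2)             # (D/2)(1 + log kappa_D) with D = 2
    return mean, sigma, neg_log_prior + half_log_fisher + neg_log_lik + lattice


# Arbitrary example values, chosen only to exercise the function.
mean, sigma, nits = mml87_gaussian([1.0, 2.0, 3.0, 4.0, 5.0],
                                   (-100.0, 100.0), (0.01, 100.0), 0.01)
print(f"mean={mean:.3f} sigma={sigma:.3f} length={nits:.2f} nits")
```

The n-1 divisor falls out of the formula: the sigma-dependent terms are log sigma (prior), -2 log sigma (Fisher) and n log sigma + ss/(2 sigma^2) (likelihood), which together give (n-1) log sigma + ss/(2 sigma^2), minimized at sigma^2 = ss/(n-1).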


PyMML was written on behalf of the School of Computer Science and Software Engineering at Monash University, under the supervision of Dr. David Albrecht.
