Warning

This is all very old. Save for a cosmetic change in October 2007, it's as it was written in 1994.

Summary of the dissertation

A dissertation submitted in candidature for the degree of Doctor of Philosophy in the University of Cambridge.

Probability theory might seem a queer subject to study in a physics laboratory; some people think it better suited to a statistics department. However, this dissertation demonstrates the essential utility of the subject; put simply, it is the framework within which we should analyse data. Accordingly it is appropriate that it should be studied, be we physicists, engineers or indeed anyone who does experiments, collects data and makes inferences from them.

Despite this, probability theory still lies some distance from the mainstream of data analysis. The first chapter is written to address this problem and takes the form of a tutorial. It contains two novel examples which demonstrate clearly the basics of the subject. I hope that it achieves a wider readership than most PhD dissertations.

The second chapter of this dissertation discusses the role of non-specific models in the form of probabilistic networks. I begin with a handbook of traditional techniques in an attempt to make the work accessible to a lay audience, and extend existing methods of calculating the Hessian to produce a positive definite approximation; this is an important step towards robust network programs.

I then turn to the subject of marginalization and show how to moderate the output of classifiers to take account of their non-linear nature. I improve existing techniques to do this for binary classifiers and extend the idea to handle softmax networks. Here an essentially probabilistic idea leads to better predictions when judged by a traditional technique, the error on an independent test set.
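
As an illustration of what moderating a binary classifier's output can look like, here is a minimal sketch in Python. It is not the improved method developed in the dissertation; it assumes the widely used approximation in which the activation is treated as Gaussian with most-probable value `a_mp` and variance `s2`, and the sigmoid's argument is shrunk by a factor `kappa`. The function and parameter names are hypothetical.

```python
import math


def sigmoid(a):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-a))


def moderated_output(a_mp, s2):
    """Approximate the sigmoid averaged over a Gaussian activation.

    a_mp -- most-probable activation (assumed name)
    s2   -- variance of the activation under the weight posterior (assumed name)

    Uses the standard approximation kappa = 1 / sqrt(1 + pi * s2 / 8),
    which pulls uncertain predictions towards 0.5.
    """
    kappa = 1.0 / math.sqrt(1.0 + math.pi * s2 / 8.0)
    return sigmoid(kappa * a_mp)


if __name__ == "__main__":
    # The same activation, reported with increasing uncertainty.
    for s2 in (0.0, 1.0, 4.0, 16.0):
        print(s2, moderated_output(2.0, s2))
```

With zero variance the moderated output equals the ordinary sigmoid; as the variance grows, the prediction moves towards 0.5 without ever crossing it, so the most-probable class is unchanged but the stated confidence is reduced.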

In the third chapter I explore hyper-parameter control for large networks and show how a combination of random sampling techniques and vector algebra obviates the need to compute and store the weights' Hessian matrix. This is a significant advance, as many practical applications of network-based models require many more weights than traditional methods allow.
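
To make the idea of avoiding the Hessian concrete, the following Python sketch estimates the trace of a Hessian from random probe vectors and Hessian-vector products alone. It is a sketch under stated assumptions, not the dissertation's algorithm: the choice of random ±1 probes, the finite-difference Hessian-vector product, the toy quadratic error and all names are illustrative.

```python
import random


def grad(w):
    """Gradient of a toy quadratic error E(w) = 0.5 * w^T A w,
    with A = [[2, 1], [1, 3]] (an assumed example; trace(A) = 5)."""
    return [2.0 * w[0] + 1.0 * w[1], 1.0 * w[0] + 3.0 * w[1]]


def hessian_vector_product(grad_fn, w, v, eps=1e-4):
    """Approximate H v by a central difference of the gradient along v.
    Only gradient evaluations are needed; H itself is never formed."""
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus = grad_fn(w_plus)
    g_minus = grad_fn(w_minus)
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]


def estimate_trace(grad_fn, w, n_samples=2000, seed=0):
    """Estimate trace(H) as the average of v^T H v over random +/-1 vectors v."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in w]
        hv = hessian_vector_product(grad_fn, w, v)
        total += sum(vi * hi for vi, hi in zip(v, hv))
    return total / n_samples


if __name__ == "__main__":
    print(estimate_trace(grad, [0.3, -0.7]))
```

Storage here grows linearly with the number of weights, whereas the full Hessian grows quadratically; the price is a statistical error that shrinks as more probe vectors are averaged.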

In this dissertation, the aim is always to provide things of practical worth, and so I proceed to applications of these ideas. All the demonstrations in this dissertation have been performed using the Backprob program, whose thirty thousand lines of code have taken a significant time to write and debug. By making this available to other researchers, I hope to facilitate more general acceptance of the Bayesian approach.

Chapter four begins by illustrating the use of networks on a toy problem; I then explore networks with large numbers of hidden units. Such networks are conventionally held to be only of limited usefulness as they are believed to have a poor ability to generalize. I show that this is not the case; probability theory both suggests a better error measure than the test error and gives us insight into the large network limit.

Chapter five introduces a second synthetic example, which allows us to compare the performance of model-based and non-specific probabilistic analyses. The example is a simple one, yet it demonstrates that non-specific models are never completely general. This is both good and bad. On the positive side it means that we can interpret some of the model's parameters, while on the negative side it means that we must still be careful when we choose a model for a problem.

The last two chapters return to the opening theme: probability as a tool for solving practical problems. I first address the problem of reading characters taken from vehicle number plates, and show that probabilistic networks are a suitable solution, though as the number of weights is large we need to apply the theory developed in chapter three. A number of different approaches are considered and I show that by paying due attention to data preparation and analysis the reading accuracy may be significantly improved.

Finally, I analyse the data collected by the number plate reader and use Bayesian techniques to make inferences about the nation's driving habits along three roads near Cambridge. In particular I find that travelling time, rather than distance, better explains the observed distribution of traffic, and furthermore that as vehicles age the journeys they undertake change significantly.

There is one unifying concept throughout this work: regardless of whether we analyse number plates, playing cards, traffic distributions or robot arms, probability theory is the right way to do it. The approach is simple to learn and only ignorance and inertia prevent its more widespread acceptance. This dissertation is my attempt to spread the gospel a little wider.