Data as Justification?

Way back when, I spent a couple of years doing research on intelligent software agents, which included a chunk of time looking at formal agent logics.

One of the mantras of a particular flavour of epistemic logic we relied on comes in the form of the following definition: “knowledge is justified true belief”.

This unpacks as follows: you know something if you believe it AND your belief is true AND your belief is justified. So for example, I toss a coin and you believe it falls heads up. It is heads-up but I haven’t shown you that, so you don’t know it, even though you believe it and that belief is a true belief (the thing you believe – that the coin is heads up – is true). However, if I show you the coin is heads-up, you now know it because your belief is justified as well as being true. (I seem to recall it gets a bit more fiddly as you introduce time into this explicitly too…)

When we start to look at what data can do for us, one of those things is to provide justification for our beliefs. Hans Rosling’s ever amusing ignorance tests demonstrate why we sometimes need our beliefs challenging and his data rich presentations (such as the OU co-produced Don’t Panic shows on BBC2) use data to either confirm our beliefs – reinforcing our knowledge – or show them to be false beliefs (that is, beliefs we have, but that don’t correspond to the state of the world, i.e. beliefs that are untrue).

As well as acting as justification for a belief, data can also create beliefs. But even if the data is true, we still need to take care that any beliefs we generate from the data are justified.

For example, you may or may not find this sentence confusing – more than half of UK wage earners earner less than the average salary. If you think of an average as a mean value, then a quick example easily demonstrates this: four office workers are sat in a bar with “average”-ish incomes, and in walks Mark Zuckerberg. Add the respective incomes together and divide by five. How many people in the bar now have an income higher than that average (mean) value?

However, if you regard an average in terms of the median – mid-point – value, then one person will have the median income and, assuming the original four had slightly different incomes, two will have an income below it, and two will have an income above it.

So when your data point is an average, even if it is correctly calculated (i.e. the data is true), you need to take care what sort of belief you take away from it… Because even if you correctly identify which average is being talked about, you may still come away with a false belief about how the values are distributed. (Not all distributions are, erm, normal…)

And it goes without saying that you also need to be critical of the data itself. Because it may or may not be true…