Statistics often lie

It is pointless attempting a definition of a discipline as broad as statistics. All that such an attempt would achieve would be to attract disagreement. Instead I want to focus on a few of the properties of the discipline which stand in contrast to those of data mining. One such difference is related to the remarks in the last paragraph of the previous section. This is that statistics as a discipline has a certain conservativeness. There is a tendency to avoid the ad hoc, and prefer the rigorous. Of course, this is not of itself bad: only through rigour can mistakes be avoided and truth be unearthed. However, it can be detrimental to discovery if it promotes an overcautious attitude. This conservativeness may derive from the perspective that statistics is a part of mathematics – a perspective with which I do not agree. Although statistics clearly has mathematics at its base (as do physics and engineering, for example, and likewise neither is regarded as a ‘part’ of mathematics), it also has very strong links with each of the disciplines which generate the data to which statistical ideas are applied.

The mathematical background and the emphasis on rigour have encouraged a tendency to require proof that a proposed method will work before that method is used, in contrast to the more experimental attitude which is at home in computer science and machine learning work. As a result, researchers in those other disciplines, looking at the same problems as statisticians, have sometimes produced methods which apparently work, even if they cannot be (or have not yet been) proven to work. The statistical journals, in general, tend to avoid publishing ad hoc methods in favour of those which have been established, by relatively rigorous mathematics, to work. Data mining, being an offspring of several parents, has inherited the adventurous attitude of its machine learning progenitor. This does not mean that data mining practitioners do not value rigour, merely that they are prepared to forgo it if doing so can be seen to give results.
