1 Introduction

Imagine if life were deterministic! The commute to the gym, university or work would always take exactly the same length of time; weather predictions would always be accurate; you would know exactly which lotto numbers would be drawn on the weekend…

However, life is full of unpredictable variation.

Variation may be unpredictable, but patterns still emerge. We may not know what the next toss of a coin will produce… but we see a pattern in the long run: a Head appears about half the time.

Probability is one of the tools used to describe and understand this unpredictability. Distribution theory is about describing the patterns in this unpredictability using probability. Statistics is about data collection and extracting information from data, using probability and distribution theory to make decisions based on the data.

In most fields of study, certainty is rarely attainable, and so probability is necessary:

  • What is the chance that a particular share price will crash next month?
  • What are the odds that a medical patient will suffer from a dangerous side-effect?
  • How likely is it that a dam will overflow next year?
  • What is the chance of finding a rare bird species in a given forest?

To answer these questions, a framework is needed: concepts like probability need defining, and notation and theory are required. These tools are important for modelling real-world phenomena, but also for providing a firm mathematical foundation for the theory of statistics.

Computers and software packages are essential tools in the application of statistics to real problems. In this book, the statistical package R is used throughout, both for computation and to illustrate concepts that help you understand the theory.

One use of R is to compute probabilities for specific distributions easily. R can also be used to verify (not prove) theoretical results, using a technique called computer simulation.
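
As a minimal sketch of both uses, consider an illustrative example that is not taken from the text: let X be the number of Heads in 10 tosses of a fair coin, and assume X follows a binomial distribution. The built-in functions dbinom() and pbinom() give the theoretical probabilities, while rbinom() generates simulated values whose observed proportions should be close to (but will not exactly equal) the theory. The seed and the number of simulations are arbitrary choices.

```r
# Assumed example (not from the text): X = number of Heads in
# 10 tosses of a fair coin, so X ~ Binomial(size = 10, prob = 0.5).
set.seed(1)  # arbitrary seed, so the simulation is reproducible

# Theoretical probabilities from R's built-in distribution functions
p_exact_5  <- dbinom(5, size = 10, prob = 0.5)  # P(X = 5)
p_atmost_3 <- pbinom(3, size = 10, prob = 0.5)  # P(X <= 3)

# Simulation: generate many values of X and use observed proportions
n_sims <- 100000
x <- rbinom(n_sims, size = 10, prob = 0.5)
p_sim_5      <- mean(x == 5)   # proportion of simulations with X = 5
p_sim_atmost <- mean(x <= 3)   # proportion of simulations with X <= 3

# Compare theory with simulation
print(c(exact = p_exact_5,  simulated = p_sim_5))
print(c(exact = p_atmost_3, simulated = p_sim_atmost))
```

Close agreement between the two columns supports the theoretical result but does not prove it; increasing n_sims should usually bring the simulated proportions closer to the exact values.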

Simulation can also be used to solve problems for which a theoretical result is difficult (or impossible) to obtain. Such numerical solutions to analytically intractable problems are sometimes termed Monte Carlo simulation.
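
To illustrate the idea, here is a hedged sketch using a question chosen purely for illustration: what is the chance that 100 tosses of a fair coin contain a run of at least 7 Heads in a row? An exact answer exists but is awkward to obtain by hand, so the probability is estimated by simulation instead. The helper longest_head_run() is a hypothetical function written for this example; the seed, the number of tosses, the run length and the number of simulations are all arbitrary assumptions.

```r
# Illustrative Monte Carlo sketch (the question is an assumption,
# not taken from the text).
set.seed(2)  # arbitrary seed, so the simulation is reproducible

# Hypothetical helper: length of the longest run of Heads in a
# vector of tosses coded as 1 = Head, 0 = Tail.
longest_head_run <- function(tosses) {
  runs <- rle(tosses)                          # run-length encoding
  head_runs <- runs$lengths[runs$values == 1]  # lengths of Head runs only
  if (length(head_runs) == 0) 0 else max(head_runs)
}

# Repeat the experiment (100 fair tosses) many times
n_sims <- 10000
longest <- replicate(
  n_sims,
  longest_head_run(rbinom(100, size = 1, prob = 0.5))
)

# Estimated probability, with an approximate standard error
p_hat <- mean(longest >= 7)
se    <- sqrt(p_hat * (1 - p_hat) / n_sims)
print(c(estimate = p_hat, std_error = se))
```

The standard error gives a rough guide to the precision of the estimate; it shrinks as n_sims grows, at the cost of longer computing time.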