You don't need to be a statistics expert to appraise research papers, but a basic understanding of common statistical terms will help in analysing the author's interpretation of a study's findings. We've included some frequently used terms below, using definitions adapted from the NICE Glossary.
Absolute Risk
Absolute risk is the likelihood or probability of a specific event occurring in the group being studied, e.g. an adverse reaction to the drug being tested. Absolute risk increase refers to the increase in the likelihood of an event occurring as a result of an intervention, and absolute risk reduction refers to the reduction in that likelihood. Both are sometimes called the 'risk difference'.
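As a quick sketch of the arithmetic (all counts below are invented for illustration), absolute risk is simply events divided by group size, and the risk difference is the gap between the two groups' absolute risks:

```python
# Hypothetical trial: 200 patients per arm (numbers invented for illustration).
events_treatment = 12   # adverse events in the treated group
events_control = 30     # adverse events in the control group
n = 200                 # patients per arm

ar_treatment = events_treatment / n   # absolute risk in the treated group (0.06)
ar_control = events_control / n       # absolute risk in the control group (0.15)

# Risk difference: here a reduction, since the treated group fares better
absolute_risk_reduction = ar_control - ar_treatment

print(f"treated: {ar_treatment:.0%}, control: {ar_control:.0%}, "
      f"absolute risk reduction: {absolute_risk_reduction:.0%}")
```

Here the event rate drops from 15% to 6%, so the absolute risk reduction is 9 percentage points.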
Confidence Interval
A confidence interval is the range of results that is likely to include the 'true' value for the population. A wide confidence interval (CI) indicates a lack of certainty about the true effect of the test or treatment - often because a small group of patients has been studied. A narrow CI indicates a more precise estimate (for example, if a large number of patients have been studied).
The CI is usually stated as '95% CI'. This means that the range of values has a 95 in 100 chance of including the 'true' value. For example, a study may state 'based on our sample findings, we are 95% certain that the true population blood pressure is not higher than 150 and not lower than 110'. In such a case the 95% CI would be 110 to 150.
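One common way a 95% CI is calculated for a sample mean is mean ± 1.96 standard errors. The sketch below uses invented blood pressure readings and the normal approximation; for a sample this small a t-based interval would be slightly wider, but the idea is the same:

```python
import math
import statistics

# Hypothetical systolic blood pressure readings (values invented for illustration)
readings = [118, 125, 131, 142, 127, 135, 120, 138, 129, 133]

n = len(readings)
mean = statistics.mean(readings)
sem = statistics.stdev(readings) / math.sqrt(n)  # standard error of the mean

# 95% CI using the normal approximation (z = 1.96)
lower = mean - 1.96 * sem
upper = mean + 1.96 * sem
print(f"mean {mean:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```

Note how the standard error shrinks as n grows: studying more patients narrows the interval, matching the point above that a narrow CI reflects a more precise estimate.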
Confounding
A confounder is a variable whose presence affects the variables being studied. Confounding occurs when an intervention's apparent effect is distorted because the confounder is associated with both the population (or the intervention) and the outcome, and can influence the outcome independently of the intervention under investigation.
For example, a study of heart disease may look at a group of people who exercise regularly and a group who do not exercise. If the ages of the people in the 2 groups are different, any difference in heart disease rates between the 2 groups could be because of age rather than exercise. So, age is a confounding factor.
Heterogeneity / Homogeneity
Heterogeneity is a term used in meta-analyses and systematic reviews to describe when the results of a treatment (or estimates of its effect) differ significantly in different studies. Such differences may occur as a result of differences in the populations studied, the outcome measures used or because of different definitions of the variables involved. The opposite of this is homogeneity, which indicates that the results of different studies included in the review are similar.
Intention-to-treat analysis
Refers to the assessment of the people taking part in a trial, based on the group they were randomly allocated to and regardless of whether or not they dropped out or fully completed the treatment. Intention-to-treat analyses are used to assess clinical effectiveness because they mirror actual practice, when not everyone adheres to the treatment, or the treatment changes based on how a person responds to it.
Number needed to harm
A measurement of the chance of experiencing a specified harm in a specified time because of the treatment or intervention. Ideally, this number should be as large as possible.
Number needed to treat
Refers to the average number of patients who need to have a treatment or intervention for one of them to get the positive outcome in the time specified. The closer the number needed to treat (NNT) is to 1, the more effective the treatment.
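The NNT is the reciprocal of the absolute risk reduction, and the NNH is likewise the reciprocal of the absolute risk increase. A minimal sketch, with figures invented for illustration:

```python
# Hypothetical figures (invented for illustration): treatment cuts the
# event rate from 20% to 15%, an absolute risk reduction of 5 percentage points.
risk_control = 0.20
risk_treatment = 0.15
arr = risk_control - risk_treatment

nnt = 1 / arr  # number needed to treat
print(round(nnt))  # 20: treat 20 patients for one extra good outcome

# The number needed to harm works the same way, but uses the absolute
# risk INCREASE of a specified harm; a larger NNH is better.
```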
Odds ratio
An odds ratio compares the odds of something happening in 1 group with the odds of it happening in another. An odds ratio of 1 shows that the odds of the event happening (for example, a person developing a disease or a treatment working) are the same for both groups. An odds ratio greater than 1 means that the event is more likely in the first group than in the second; an odds ratio of less than 1 means that the event is less likely in the first group than in the second.
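From a standard 2x2 table of counts (the figures below are invented for illustration), the odds in each group are events divided by non-events, and the odds ratio is the ratio of those two odds:

```python
# Hypothetical 2x2 table (counts invented for illustration):
#              event   no event
# group 1        20        80
# group 2        10        90
a, b = 20, 80   # group 1
c, d = 10, 90   # group 2

odds_group1 = a / b   # 0.25
odds_group2 = c / d   # about 0.11

# Odds ratio: equivalent to (a/b) / (c/d), computed as a cross-product
odds_ratio = (a * d) / (b * c)
print(odds_ratio)  # 2.25: the event is more likely in group 1
```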
P value
The p value is a statistical measure that indicates whether or not an effect is statistically significant.
When a study that compares two treatments finds that one is more effective than the other, the p value is the probability of obtaining these results by chance.
If the p value is below 0.05 (i.e. there is less than a 5% probability that the results occurred by chance), it is considered that there probably is a real difference between treatments. If the p value is 0.001 or less (less than a 0.1% probability), the result is seen as highly significant.
It is worth noting that a statistically significant difference is not necessarily clinically significant. For example, drug A might relieve pain and stiffness statistically significantly more than drug B. But, if the difference in average time taken is only a few minutes, it may not be clinically significant.
If the p value shows that there is likely to be a difference between treatments, the confidence interval describes how big the difference in effect might be.
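To make this concrete, here is a rough sketch of how a p value might be calculated for a comparison of two response rates, using the standard two-proportion z-test with a normal approximation. All counts are invented for illustration:

```python
import math

# Hypothetical trial counts (invented for illustration):
# 60 of 200 patients improved on drug A vs 40 of 200 on drug B.
x1, n1 = 60, 200
x2, n2 = 40, 200

p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)          # pooled proportion under 'no difference'
se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

# Two-sided p value from the standard normal distribution
p_value = math.erfc(abs(z) / math.sqrt(2))
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With these counts the p value comes out below 0.05, so the difference would conventionally be called statistically significant; whether a 30% vs 20% response rate matters clinically is a separate judgement.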
Relative risk
Sometimes referred to as 'risk ratio', relative risk refers to the probability of an event occurring in the study group compared with the probability of the same event occurring in the control group, described as a ratio. If both groups face the same level of risk, the relative risk is 1. If the first group had a relative risk of 2, subjects in that group would be twice as likely to have the event happen. A relative risk of less than 1 means the outcome is less likely in the first group.
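The calculation is just the ratio of the two groups' absolute risks. A minimal sketch with invented counts:

```python
# Hypothetical counts (invented for illustration)
events_study, n_study = 30, 100       # study group
events_control, n_control = 15, 100   # control group

risk_study = events_study / n_study         # 0.30
risk_control = events_control / n_control   # 0.15
relative_risk = risk_study / risk_control   # about 2.0

print(f"relative risk: {relative_risk:.1f}")
# 2.0: the event is twice as likely in the study group
```

Note the contrast with the odds ratio above: relative risk divides probabilities (events out of everyone in the group), while the odds ratio divides odds (events against non-events).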
Statistical power
Statistical power is the ability of a study to demonstrate an association or causal relationship between 2 variables, if such an association exists. The statistical power of a study depends largely on the number of people included: if too few people are studied, real differences in the outcomes may not reach statistical significance.
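One way to see how power grows with sample size is to simulate many hypothetical trials and count how often the usual significance test detects a real difference. The sketch below assumes true event rates of 30% vs 15% (figures invented for illustration) and reuses the two-proportion z-test:

```python
import math
import random

random.seed(1)

def trial_significant(n, risk_a, risk_b):
    """Simulate one two-arm trial and report whether a two-proportion
    z-test comes out significant at the 5% level."""
    x1 = sum(random.random() < risk_a for _ in range(n))
    x2 = sum(random.random() < risk_b for _ in range(n))
    pooled = (x1 + x2) / (2 * n)
    if pooled in (0, 1):          # no variation: test cannot be computed
        return False
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (x1 / n - x2 / n) / se
    return math.erfc(abs(z) / math.sqrt(2)) < 0.05

# Estimated power = fraction of 2,000 simulated trials that detect
# the true 30% vs 15% difference, for several sample sizes per arm.
powers = {}
for n in (25, 100, 400):
    powers[n] = sum(trial_significant(n, 0.30, 0.15) for _ in range(2000)) / 2000
    print(f"n = {n:3d} per arm -> estimated power ~ {powers[n]:.2f}")
```

With 25 patients per arm the difference is usually missed; with 400 per arm it is detected almost every time, which is exactly the point about underpowered studies above.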