
Introduction to Critical Appraisal

Critical Appraisal is the process of carefully and systematically examining research to judge its trustworthiness, and its value and relevance in a particular context. It is an essential skill for evidence-based medicine because it allows people to find and use research evidence reliably and efficiently. 

Developing a research question and locating research papers are the first steps in getting evidence into practice. Understanding the content of those papers and determining the strength of the evidence is the essential next stage of evidence-based medicine, before any findings can inform clinical decision-making.

Whether you are studying or working in healthcare, developing your critical appraisal skills will help you to assess the relevance, validity and reliability of research papers not only for use in academic assignments such as essays, reports and dissertations, but also in journal clubs, service improvements and clinical practice.

In this guide, we'll show you some of the basic principles involved in critical appraisal and introduce you to a range of resources, e-learning and support available from the Knowledge and Library Services at UHCW.

There are three basic principles to consider when appraising the evidence:

Validity

Validity refers to the soundness of the study design and research methods. That is, a study can be said to be valid if it has been conducted in such a way that the results are unbiased. 

Internal validity refers to the degree of confidence that any cause-and-effect relationships detected in the results are trustworthy because the study has been designed to eliminate bias wherever possible. The internal validity of a study can be threatened by many different factors - see the Bias tab for examples.

External validity refers to the extent to which results from a study can be applied to other situations, groups or events. In many cases, you might ask the question: Could this paper be applicable to the patients in my care?

Reliability

Where the concept of validity is concerned with the accuracy of the study's design and results, reliability is the measure of how consistently a research method measures something. If the same results can be reproduced consistently under the same conditions, we can say that method is reliable.

Relevance

Assessing the relevance of a research paper means asking whether the study addresses your research question. If it doesn't, a full appraisal might not be necessary. However, other considerations for relevance include:

  • Whether the paper addresses a gap in the current research landscape
  • Whether the paper offers real-world implications for clinical practice

A range of critical appraisal checklists and tools offer a series of questions and prompts to help you assess the validity, reliability and relevance of your chosen paper. You'll find links to them later in this guide. However, the broad questions we are attempting to answer with critical appraisal include:

  • Is the study relevant to the research question?
  • Did the study use appropriate methods to address the research question?
  • Are the results of the study valid?
  • Are the valid results of the study generalisable to other populations?

    Bias in research refers to a systematic error that can occur during the design, conduct, or interpretation of a study, leading to inaccurate conclusions.


    Bias can be introduced at any stage of the research process, from the way the study was designed to the decision to publish the findings. Because there are so many opportunities for bias to affect the reliability and validity of the results, it is important that:

    • Researchers are aware of how bias can impact their work, take steps to avoid or minimise it, and transparently report their efforts in their writing.
    • Consumers - or readers - are familiar with different types of bias, their impact on different study designs and how to spot examples in research papers.

    The Centre for Evidence Based Medicine (Oxford) have produced a useful resource to help navigate this problem. The Catalogue of Bias is a comprehensive repository of definitions, examples and preventative measures for researchers and appraisers alike. However, here are some common examples that critical appraisal tools encourage you to interrogate when appraising a paper:

    Selection bias can occur when the participants selected for a study differ systematically from the population the study intends to investigate. This can happen when participants are recruited to the study, or when they are allocated to a treatment or control group; bias at the allocation stage is known as allocation bias.

    • How to detect selection bias: check the paper for information on how participants were recruited, screened and included in the trial. Ask yourself if the authors give details on how participants were randomised, and if they provide any baseline comparisons of the groups.

    Observer bias can occur when researchers allow their expectations or perspectives to influence how they perceive and record the data generated during a study. This is more likely to occur when observers are aware of the research aims and hypotheses. Therefore, blinding (or masking) of observers can help reduce this risk.

    • How to detect observer bias: check the paper for information on how participants and/or researchers were blinded (if appropriate for the study design). Do the authors offer any acknowledgement of the potential impact of their biases?

    Information bias can occur when information is not accurately recorded during the study. This could be due to observer bias - a subjective interpretation of the data. However, measurement bias can also occur if there are mistakes or inconsistencies in collecting data, or the wrong measurement tools are used. Similarly, mistakes can also be made if data collection relies on patient or observer recall.

    • How to detect information bias: assess whether the authors have used a double-blind approach. If that's not possible, have they developed a protocol for the data collection and reported it? Are the questionnaires or instruments used appropriate for the intervention?

    Attrition bias can occur when the participants who drop out of a study differ systematically from those who complete it. Some loss of participants is normal: people may no longer have the motivation or time to participate; others may experience unwanted side-effects or outcomes different from those they anticipated. Attrition bias is a particular threat to experimental study designs with a treatment and a control group, as unequal loss from one group relative to the other can distort the results. Unreliable results can lead to incorrect conclusions that an intervention is effective.

    • How to detect attrition bias: check the paper for an analysis of the participant data, usually including baseline demographics. Information on the number of participants lost and their reasons for leaving the study should be clearly reported.  

    Reporting bias can occur when researchers selectively report - or deliberately omit - relevant data from their study. This might involve a failure to declare conflicts of interest, attempts to change study outcomes or under/over reporting benefits and harms (this can be linked to publication bias).

    • Detecting reporting bias is difficult. It's hard to know when data is missing or being selectively reported. However, the CEBM suggest one method: obtaining the protocol from a trial registry (via databases such as clinicaltrials.gov or the WHO clinical trials database) and comparing the intended outcomes to those published in the final paper.

    Publication bias is similar to reporting bias, but refers to the decision not to publish a study because its findings are negative or inconclusive. When data is missing from the wider body of evidence on a particular topic, clinicians have an inaccurate picture of the efficacy of that intervention. This TED Talk from Ben Goldacre provides an interesting overview of this pervasive problem in healthcare research. If only positive findings make it to publication, can we rely on the conclusions being drawn from the research?

    • Combating publication bias is difficult. There are trial registries and reporting standards that aim to combat it, but this relies on researchers complying with them. Some journals, like Trials, have made publishing 'null' results part of their publishing ethos. Information Professionals and Librarians advise systematic reviewers to look beyond published journal articles and also interrogate data in trial registries in their work.

    Before you choose a tool to appraise your papers, you will need to know what type of research you have in front of you. We've outlined below some of the most common study designs, using definitions adapted from the e-LfH's Critically Appraising the Evidence Base programme.

    Case-Control Studies
    A case-control study is an analytical observational research design which aims to determine if there's a relationship between an outcome (e.g. lung cancer) and a past exposure (e.g. smoking). One group of patients - cases - are selected based on whether they have the outcome of interest. Another group of patients - controls - are selected based on having similar characteristics, but without the outcome of interest. The main difference between this and cohort studies is that with case-control studies, researchers are starting from a known outcome and retrospectively analysing those individuals who already have it.

    Case Report / Case-Series
    A case report is an uncontrolled descriptive study where the symptoms, diagnosis and management of a single patient are described in detail. Reports of several patients with a given condition, usually covering the course of the condition and the response to treatment are called a Case Series. There is no comparison (control) group of patients.

    Cohort Studies
    A cohort study is an analytical observational research design which follows patients over a period of time to see if there is an association between an exposure of interest (e.g. smoking) and an outcome (e.g. lung cancer). They may also follow a comparison group to compare rates of the outcome of interest. Generally cohort studies will look forward in time, but some studies can be retrospective if they are examining data where the exposure and disease have already occurred.

    Cross-sectional Studies
    Cross-sectional studies are observational research designs that provide a 'snapshot' observation of a set of people at a specific time, by measuring the prevalence of exposures and outcomes in a given population. Qualitative cross-sectional studies might capture data about a defined population's experience: e.g. staff surveys or the census.

    Qualitative Studies
    Qualitative research is a descriptive design that explores people's beliefs, experiences and attitudes, usually about a particular phenomenon. It asks questions about how and why and generates non-numerical data, such as a person's description of their pain rather than a measure of pain. Qualitative research techniques include focus groups and in-depth interviews and the data that they yield is transcribed and coded, with codes being linked together to identify themes.

    Randomised Controlled Trials
    A randomised controlled trial (RCT) is an experimental primary research design that aims to assess the effectiveness of an intervention compared to a control. A sample population is taken from the target population before randomly being allocated to two groups. One will receive the intervention (e.g. a drug, surgical procedure, or service) and the other receives the control (e.g. current treatment, placebo or no intervention). These groups are followed over a period of time and researchers will collect data on any outcomes and compare them. If bias has been minimised / avoided and there is a difference in outcomes between the groups, this can then be attributed to the intervention.

    Systematic Reviews
    A systematic review is a secondary research design that aims to locate, appraise and synthesise all the evidence on a particular topic. They feature a clearly defined review question and must establish specific eligibility criteria in order for studies to be included. Comprehensive searches of databases, trial registries, conference papers and grey literature are carried out to identify all of the evidence, after which the studies are screened by at least 2 independent reviewers to see if they meet the inclusion criteria. Data extraction, quality assessment (e.g. risk of bias) and synthesis of the findings follow in order to produce the review.

    You don't need to be a statistics expert to appraise research papers, but a basic understanding of common statistical terms will help in analysing the author's interpretation of a study's findings. We've included some frequently used terms below, using definitions adapted from the NICE Glossary.

    Absolute Risk
    Absolute risk is the likelihood or probability of a specific event occurring in the group being studied, e.g. an adverse reaction to the drug being tested. Absolute risk increase refers to the increase in the likelihood of an event occurring as a result of an intervention, and absolute risk reduction refers to the reduction in likelihood. These are both sometimes called the 'risk difference'.
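
    To make the arithmetic concrete, here is a minimal sketch in Python using entirely hypothetical event counts:

        # Hypothetical figures: 120 of 1,000 control patients experience the
        # event, versus 80 of 1,000 patients receiving the intervention.
        control_events, control_n = 120, 1000
        intervention_events, intervention_n = 80, 1000

        risk_control = control_events / control_n                 # absolute risk: 0.12
        risk_intervention = intervention_events / intervention_n  # absolute risk: 0.08

        # Absolute risk reduction (the 'risk difference'): 0.04, i.e. four
        # fewer events for every 100 patients treated.
        arr = risk_control - risk_intervention
        print(f"Absolute risk reduction: {arr:.2f}")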

    Confidence Interval
    A confidence interval is the range of results that is likely to include the 'true' value for the population. A wide confidence interval (CI) indicates a lack of certainty about the true effect of the test or treatment - often because a small group of patients has been studied. A narrow CI indicates a more precise estimate (for example, if a large number of patients have been studied).

    The CI is usually stated as '95% CI'. This means that the range of values has a 95 in 100 chance of including the 'true' value. For example, a study may state that, based on its sample findings, the researchers are 95% certain that the 'true' population blood pressure is not higher than 150 and not lower than 110. In such a case the 95% CI would be 110 to 150.
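
    As a rough illustration of how sample size drives the width of a CI, here is a minimal sketch using the common normal ('Wald') approximation for a proportion, with made-up numbers:

        import math

        # Hypothetical sample: 80 of 400 patients respond to treatment.
        events, n = 80, 400
        p = events / n  # observed proportion: 0.20

        # 95% CI via the normal approximation; 1.96 is the z-value for 95%.
        se = math.sqrt(p * (1 - p) / n)
        print(f"95% CI: {p - 1.96 * se:.3f} to {p + 1.96 * se:.3f}")
        # Prints roughly 0.161 to 0.239. The same proportion observed in
        # 4,000 patients would give a much narrower CI (about 0.188 to 0.212).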

    Confounding
    A confounder is a variable whose presence affects the variables being studied. Confounding occurs when an intervention's effect is distorted because the population, intervention or outcome is associated with another factor (the confounder) that can influence the outcome independently of the intervention under investigation.

    For example, a study of heart disease may look at a group of people who exercise regularly and a group who do not exercise. If the ages of the people in the 2 groups are different, any difference in heart disease rates between the 2 groups could be because of age rather than exercise. So, age is a confounding factor.

    Heterogeneity / Homogeneity
    Heterogeneity is a term used in meta-analyses and systematic reviews to describe when the results of a treatment (or estimates of its effect) differ significantly in different studies. Such differences may occur as a result of differences in the populations studied, the outcome measures used or because of different definitions of the variables involved. The opposite of this is homogeneity, which indicates that the results of different studies included in the review are similar.

    Intention-to-treat analysis
    Refers to the assessment of the people taking part in a trial, based on the group they were randomly allocated to and regardless of whether or not they dropped out or fully completed the treatment. Intention-to-treat analyses are used to assess clinical effectiveness because they mirror actual practice, when not everyone adheres to the treatment, or the treatment changes based on how a person responds to it.
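
    As a minimal sketch of the idea, using entirely made-up participant records:

        # Hypothetical records: (allocated group, completed treatment?, good outcome?)
        participants = [
            ("treatment", True, 1), ("treatment", False, 0), ("treatment", True, 1),
            ("control", True, 0), ("control", False, 0), ("control", True, 1),
        ]

        # Intention-to-treat: analyse everyone by allocated group, regardless
        # of whether they completed the treatment.
        for group in ("treatment", "control"):
            outcomes = [o for g, _completed, o in participants if g == group]
            print(group, sum(outcomes) / len(outcomes))

        # A 'per-protocol' analysis would instead keep only records where the
        # treatment was completed, which risks bias if dropouts differ
        # systematically between the groups.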

    Number needed to harm
    A measurement of the chance of experiencing a specified harm in a specified time because of the treatment or intervention. Ideally, this number should be as large as possible. 

    Number needed to treat
    Refers to the average number of patients who need to have a treatment or intervention for one of them to get the positive outcome in the time specified. The closer the number needed to treat (NNT) is to 1, the more effective the treatment.
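
    Continuing the hypothetical absolute-risk figures used earlier, a minimal sketch:

        # Absolute risks of 0.12 (control) and 0.08 (intervention) give an
        # absolute risk reduction (ARR) of 0.04.
        arr = 0.12 - 0.08

        # NNT is the reciprocal of the ARR: about 25 patients must be treated
        # for one additional patient to benefit.
        print(f"NNT = {1 / arr:.0f}")

        # The number needed to harm (NNH) is calculated the same way, but from
        # the absolute risk increase for the harm in question.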

    Odds ratio
    An odds ratio compares the odds of something happening in one group with the odds of it happening in another. An odds ratio of 1 shows that the odds of the event happening (for example, a person developing a disease or a treatment working) are the same for both groups. A ratio greater than 1 means that the event is more likely in the first group than the second; a ratio of less than 1 means that the event is less likely in the first group than in the second.
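
    A minimal sketch with a hypothetical 2x2 table:

        # Hypothetical 2x2 table:
        #               event   no event
        # exposed          30         70
        # unexposed        15         85
        odds_exposed = 30 / 70    # odds of the event in the first group
        odds_unexposed = 15 / 85  # odds of the event in the second group

        # About 2.43: the event is more likely in the exposed group.
        print(f"Odds ratio = {odds_exposed / odds_unexposed:.2f}")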

    P value
    The p value is a statistical measure that indicates whether or not an effect is statistically significant.
    When a study that compares two treatments finds that one is more effective than the other, the p value is the probability of obtaining results at least as extreme as these by chance alone - that is, if there were really no difference between the treatments.

    If the p value is below 0.05 (i.e. there is less than a 5% probability that the results occurred by chance), it is considered that there probably is a real difference between treatments. If the p value is 0.001 or less (less than a 0.1% probability), the result is seen as highly significant. 

    It is worth noting that a statistically significant difference is not necessarily clinically significant. For example, drug A might relieve pain and stiffness statistically significantly faster than drug B; but if the difference in average time to relief is only a few minutes, it may not be clinically significant.

    If the p value shows that there is likely to be a difference between treatments, the confidence interval describes how big the difference in effect might be.
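
    As one common example of how a p value is produced in practice, here is a minimal sketch of a chi-squared test on a hypothetical 2x2 table (this assumes the scipy library is available):

        from scipy.stats import chi2_contingency

        # Hypothetical outcomes: improved vs not improved, by group.
        table = [[60, 40],   # treatment group
                 [45, 55]]   # control group

        chi2, p, dof, expected = chi2_contingency(table)
        print(f"p = {p:.3f}")  # about 0.047 for these made-up numbers

        # p < 0.05 is conventionally called statistically significant, but it
        # says nothing about whether the difference matters clinically.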

    Relative risk
    Sometimes referred to as 'risk ratio', relative risk refers to the probability of an event occurring in the study group compared with the probability of the same event occurring in the control group, described as a ratio. If both groups face the same level of risk, the relative risk is 1. If the first group had a relative risk of 2, subjects in that group would be twice as likely to have the event happen. A relative risk of less than 1 means the outcome is less likely in the first group.
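
    A minimal sketch with made-up risks:

        # Hypothetical figures: the event occurs in 10% of the study group
        # and 5% of the control group.
        risk_study, risk_control = 0.10, 0.05

        # 2.0: the event is twice as likely in the study group.
        print(f"Relative risk = {risk_study / risk_control:.1f}")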

    Statistical power
    Statistical power is the ability of a study to demonstrate an association or causal relationship between two variables, if such an association exists. Power depends largely on the number of people included in the study; with too few participants, real differences in the outcomes will not reach statistical significance.
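
    For illustration, here is a minimal sketch of a pre-study sample-size calculation (this assumes the statsmodels library is available):

        from statsmodels.stats.power import TTestIndPower

        # How many participants per group are needed to detect a medium effect
        # (Cohen's d = 0.5) with 80% power at the 5% significance level?
        n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
        print(f"About {n_per_group:.0f} participants per group")  # roughly 64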

    There is a wide variety of checklists available, each written for different types of study design and with varying levels of complexity. There is no "gold standard" set of checklists: try exploring different providers and choose checklists that suit you, the way you work and the nature of your research. For consistency, stick to your chosen checklist for every article you review for a given piece of work, rather than switching between different checklists.

    AMSTAR
    AMSTAR (A MeaSurement Tool to Assess systematic Reviews) can be used to assess the methodological quality of a systematic review, and also serves as a guide for carrying out one. When all items on the checklist have been addressed, AMSTAR considers a review "well done".

    CASP
    This set of eleven critical appraisal tools is designed to be used when reading research. It includes checklists for Systematic Reviews, Randomised Controlled Trials, Cohort Studies, Case Control Studies, Economic Evaluations, Diagnostic Studies, Qualitative Studies and Clinical Prediction Rules.

    Centre for Evidence Based Medicine (CEBM)
    Produced by the CEBM at Oxford, this series of critical appraisal worksheets covers a range of different types of research studies and includes versions in multiple languages.

    Joanna Briggs Institute
    The Joanna Briggs Institute publish a wide range of checklists that cover the major study design types, as well as other sources including expert opinion, policy and narrative evidence.

    SIGN (Scottish Intercollegiate Guidelines Network)
    SIGN have published six checklists covering the major study designs; these have been evaluated to provide a balance between methodological rigour and practicality of use.

    Mixed Methods Appraisal Tool (MMAT)
    The MMAT is intended to be used as a checklist for concomitantly appraising and/or describing studies included in systematic mixed studies reviews (reviews including original qualitative, quantitative and mixed methods studies).

    AACODS grey literature checklist
    The AACODS checklist is a tool for evaluating the Authority, Accuracy, Coverage, Objectivity, Date and Significance of grey literature.

    Critically Appraising for Antiracism

    Racial bias in research affects a study's relevance, validity and reliability, yet this aspect is not presently addressed in critical appraisal tools, and consequently appraisers may not take racial bias into account when assessing a paper's quality.

    Visit criticallyappraisingantiracism.org for a supplementary tool (and other useful information and resources) to support you in addressing racial bias in your own appraisal of the literature. 

    Knowledge Skills Training from UHCW KLS

    Our free Knowledge Skills Training is available to all UHCW staff and students on placement. As well as sessions on searching for evidence, we can also help you understand it. We currently have two sessions available on critical appraisal: an introduction based on the information in this guide, and a practical session:

    Introduction to Critical Appraisal
    If you are new to critical appraisal, this session will introduce you to its basic principles and the anatomy of a research paper. We’ll also examine different types of study design and explore the tools available to support you in analysing the validity and reliability of healthcare research.
    Length: 60 mins
    Suitable for: Anyone new to critical appraisal or in need of a refresher.
    Book this session by e-mailing: Beth Jackson (Knowledge Skills Librarian) via Beth.Jackson@uhcw.nhs.uk

    Critical Appraisal in Practice
    Meet with a member of Library staff to critically appraise a research paper of your choice using the appropriate CASP checklist. If you don’t have a paper in mind, we’re happy to share examples of our own with you.
    Length: 60-90 mins
    Suitable for: All levels – although complete beginners may wish to attend the session above in advance.
    Book this session by e-mailing: Beth Jackson (Knowledge Skills Librarian) via Beth.Jackson@uhcw.nhs.uk
     


    e-Learning Opportunities

    Critically Appraising the Evidence Base

    This programme aims to support NHS staff with understanding the different methods and tools used to carry out critical appraisal of research, through 8 bite-sized modules developed by the NHS Knowledge for Healthcare Learning Academy in partnership with recognised subject matter experts. It is suitable for healthcare staff at any level, whether new to the topic or just in need of a refresher.

    Critical Appraisal Techniques for Healthcare Literature

    This online course was developed by City St George's, University of London, and focuses on the principles of critical appraisal and the techniques that can be used when appraising healthcare papers. It is hosted on the FutureLearn platform, which also runs other online courses in healthcare from a range of providers.

    Finding and Appraising the Evidence

    This course has been written for those who wish to improve their skills in critical appraisal and does not assume prior knowledge or training in this area. The course has been structured in bite-sized chunks to allow users to learn at their own pace.

    Cochrane Evidence Essentials
    A free online resource for both healthcare staff and patients; four modules of 30–45 minutes provide an introduction to evidence based medicine, clinical trials and Cochrane Evidence.

    Online Resources

    The BMJ - How to Read a Paper
    A reputable series of articles on reading and interpreting different kinds of scientific papers, covering common study designs and statistics.

    Catalogue of Bias
    A collaborative project featuring a database of the different biases that affect healthcare research.

    JAMA Evidence
    A collection of resources and publications to support decision-makers in assessing the validity, importance and applicability of healthcare research. The JAMA Network also produces the Users' Guides to the Medical Literature, to help readers understand and interpret clinical research.

    NICE Glossary and CASP Glossary
    Two helpful compilations of definitions related to critical appraisal and evidence-based medicine.

    Understanding Health Research
    This tool will guide you through a series of questions to help you to review and interpret a published health research paper.

    Study Designs - Deakin University Library
    A series of useful LibGuides on different types of study designs. Also includes specific guides on:
    Quantitative Study Designs and Qualitative Study Designs

    Students 4 Best Evidence
    Network of students across the world interested in evidence-based healthcare. The website features useful blogs, reviews and resources covering EBM topics, including critical appraisal and study designs.


    Video Tutorials

    HRP Statistics Portal
    YouTube playlist of statistics and research topics, produced by the World Health Organization (WHO) Human Reproduction Programme. An online portal is also available via the WHO HRP page, but users will need to request access.

    Critical Appraisal Modules 2019
    This YouTube playlist includes seven modules that address critical appraisal concepts and methods for six different research designs. Published by Cochrane Mental Health.

    Critical Appraisal of an RCT using CASP
    A series of bite-sized videos taking the viewer through the process of appraising a randomised controlled trial using the CASP checklist, question by question. Produced by the team at Barts Health Knowledge and Library Services.


    Related sources

    PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analysis)

    The PRISMA documents are designed to help authors transparently report why their systematic review was done, what methods they used, and what they found. The flow diagram helps authors describe the different phases of a literature review, including the number of records identified and screened, and the reasons for inclusion or exclusion from the final paper.

    The Equator Network

    The EQUATOR (Enhancing the QUAlity and Transparency Of health Research) Network seeks to improve the reliability of health research by promoting accurate reporting and the use of reporting guidelines. This resource contains a comprehensive database of reporting guidelines that covers the main study types.
     

    Books