Notes on the use of Methodology Checklist 5: Diagnostic studies

Section 1

Section 1 identifies the study and makes a series of statements that you can use to assess the internal validity of the study. This is to help you check that the study has been carried out carefully, and that the results reflect the accuracy of the test being evaluated. Each statement covers an aspect that research has shown makes a significant difference to the conclusions of a study. These notes are based on the QUADAS tool: Whiting P, Rutjes AW, Dinnes J, Reitsma JB, Bossuyt PM, Kleijnen J. Development and validation of methods for assessing the quality of diagnostic accuracy studies. Health Technol Assess 2004;8(25).

Statement 1.1 The spectrum of patients is representative of the patients who will receive the test in practice

What does this statement mean?

This statement is about spectrum bias.
You should have a clear idea of the population, or spectrum, of patients you would expect to see in practice, taking into account factors such as disease prevalence and severity, age, and gender.
Different demographic and clinical features between populations may lead to considerable differences in measures of diagnostic accuracy. It is difficult to generalise from reported estimates of diagnostic accuracy if the spectrum of tested patients is not similar to the patients on whom the test will be used in practice.
A description of the spectrum of patients should refer to the severity of the target condition, demographic features, and the presence of differential diagnosis and/or comorbidity. Diagnostic test evaluations should include an appropriate spectrum of patients for the test under investigation. Inclusion criteria for patients should be clearly defined.

When does this statement apply?

Always applies.

Studies should be scored as:

Well addressed if you believe, based on the information provided by the authors, that the spectrum of patients included in the study was representative of those on whom the test will be used in practice. This judgement should be based on both the method of recruitment and the characteristics of those recruited.
Adequately addressed if it seems likely that the spectrum of patients was representative of those seen in practice but the paper is unclear or lacking some information.
Poorly addressed where a group of patients known to have the target disorder are recruited along with a group of healthy controls.
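
As a rough illustration of why the patient spectrum matters, the sketch below computes the overall sensitivity a study would observe when the test detects severe disease more readily than mild disease. The function name, the severity groups and every number are hypothetical and chosen only for illustration; they do not come from QUADAS or from any study.

```python
def pooled_sensitivity(cases_by_severity, sensitivity_by_severity):
    """Overall sensitivity for a given mix of disease severities.

    cases_by_severity: number of diseased patients in each severity group.
    sensitivity_by_severity: assumed sensitivity of the test in each group.
    """
    total_cases = sum(cases_by_severity.values())
    detected = sum(
        cases_by_severity[group] * sensitivity_by_severity[group]
        for group in cases_by_severity
    )
    return detected / total_cases


# Hypothetical sensitivities: the test misses more mild cases than severe ones.
sensitivity = {"mild": 0.60, "severe": 0.95}

# Hypothetical study that recruited mostly severe cases.
study_mix = {"mild": 20, "severe": 80}
# Hypothetical practice population in which most cases are mild.
practice_mix = {"mild": 80, "severe": 20}

study_sens = pooled_sensitivity(study_mix, sensitivity)
practice_sens = pooled_sensitivity(practice_mix, sensitivity)
print(f"Sensitivity in the study spectrum:    {study_sens:.2f}")
print(f"Sensitivity in the practice spectrum: {practice_sens:.2f}")
```

With these assumed figures the same test has a sensitivity of about 0.88 in the study and about 0.67 in practice, which is the kind of gap this statement asks you to look for.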

Statement 1.2 Selection criteria are clearly described

What does this statement mean?

Have the authors provided a clear definition of the criteria used to select patients for entry into the study?

When does this statement apply?

Always applies.

Studies should be scored as:

Well covered if you think that all relevant information regarding how participants were selected for inclusion in the study has been provided.
Adequately addressed if some information is provided, but not enough to make you confident you understand what the selection criteria were and how they were applied.
Poorly addressed if some information is provided but you are unclear about what the criteria were or how they were applied.
Not addressed or Not reported if there is no discussion of selection criteria. In this case, reject the study.

Statement 1.3 The reference standard is likely to classify the condition correctly.

What does this statement mean?

The reference standard is the method or test used to determine the presence or absence of the target condition. The choice of reference standard depends on the defined target condition and the purpose of the study.
To assess the diagnostic accuracy of the new or “index test”, results from the index test are compared with results from the reference standard.
If no single reference test is available, then careful clinical follow-up, a consensus between observers, or the results of two or more combined tests may be used to determine the presence or absence of the target condition.
Estimates of the performance of the index test are based on the assumption that the reference standard is 100% sensitive and specific. If there are any disagreements between the reference standard and the index test then it is assumed that the index test is incorrect.

When does this statement apply?

Always applies. Your key question may specify the use of a particular reference standard. In this case, exclude all studies that do not use your specified reference standard.

Studies should be scored as:

Well covered if you believe that the reference standard is likely to classify the target condition correctly.
Adequately addressed if you think the authors have not fully justified their choice of reference standard.
Poorly addressed if you do not think that the reference standard was likely to have classified the target condition correctly.
Not addressed if there is insufficient information to make a judgement.
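
The assumption of a perfect reference standard can be made concrete with a small calculation. The sketch below gives the sensitivity and specificity that would be measured for an index test when the reference standard itself makes errors. It assumes that index and reference errors are independent given the true disease status, which is a simplifying assumption and not something stated in these notes. The function name and all figures are hypothetical.

```python
def apparent_accuracy(prevalence, index_sens, index_spec, ref_sens, ref_spec):
    """Expected (sensitivity, specificity) of the index test as measured
    against a possibly imperfect reference standard.

    Assumes the errors of the two tests are independent given true status.
    """
    diseased, healthy = prevalence, 1 - prevalence
    # Probability that both tests are positive, and that the reference is positive.
    both_pos = (diseased * index_sens * ref_sens
                + healthy * (1 - index_spec) * (1 - ref_spec))
    ref_pos = diseased * ref_sens + healthy * (1 - ref_spec)
    # Probability that both tests are negative, and that the reference is negative.
    both_neg = (diseased * (1 - index_sens) * (1 - ref_sens)
                + healthy * index_spec * ref_spec)
    ref_neg = diseased * (1 - ref_sens) + healthy * ref_spec
    return both_pos / ref_pos, both_neg / ref_neg


# Hypothetical index test with true sensitivity and specificity of 0.90.
perfect = apparent_accuracy(0.2, 0.90, 0.90, ref_sens=1.0, ref_spec=1.0)
imperfect = apparent_accuracy(0.2, 0.90, 0.90, ref_sens=0.90, ref_spec=0.95)
print("Against a perfect reference:    sens=%.2f spec=%.2f" % perfect)
print("Against an imperfect reference: sens=%.2f spec=%.2f" % imperfect)
```

Under these assumptions the index test appears to have a sensitivity of about 0.75 and a specificity of about 0.88 against the imperfect reference, although its true values are both 0.90. Every disagreement is attributed to the index test, even when the reference standard was wrong.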

Statement 1.4 The period between reference standard and index test is short enough to be reasonably sure that the target condition did not change between the two tests.

What does this statement mean?

This statement is about disease progression bias.
Ideally, results from the index test and the reference standard are collected from the same patients at the same time. Delay between the two measurements could allow either spontaneous recovery or disease progression to occur.
The length of time causing such bias will depend on the condition. A delay of a few days is unlikely to be a problem for chronic conditions. For some diseases a delay between tests may be critical.
This type of bias may occur in chronic conditions in which the reference standard involves clinical follow-up of several years.

When does this statement apply?

Usually applies.

Studies should be scored as:

Well covered if the delay is acceptable for the condition. For rapidly developing conditions, delays of hours to a few days are acceptable. For chronic conditions, disease status is less likely to change rapidly and a delay of weeks is acceptable.
Adequately addressed if you think the delay is lengthy, but still acceptable. You should decide when you set your key questions what constitutes an acceptable delay.
Poorly addressed if you think the period between the index test and the reference standard was long enough to allow disease status to change between the two tests.
Not addressed if insufficient information is provided.

Statement 1.5 The whole sample, or a random selection of the sample, was verified using a reference standard of diagnosis.

What does this statement mean?

This statement is about partial verification bias, also known as work-up bias, (primary) selection bias or sequential ordering bias.
If only some of the study group receive confirmation of the diagnosis by a reference standard, and the results of the index test influence the decision to perform the reference standard, then biased estimates of test performance may arise. True random selection of patients to receive the reference standard will address this problem.

When does this statement apply?

Generally only occurs when patients are tested by the index test before the reference standard.

Studies should be scored as:

Well addressed if it is clear that all patients who received the index test went on to receive verification of their disease status using the same reference standard.
Adequately addressed if the reference standard was not the same for all patients.
Poorly addressed if not all of the patients who received the index test received verification of their true disease state.
Not applicable if the reference standard was applied first, and you are confident that verification bias could not have occurred.
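
The effect of partial verification can be shown with expected counts. The sketch below is a minimal illustration, with hypothetical names and figures, of what happens to sensitivity and specificity when the analysis includes only patients who received the reference standard.

```python
def verified_estimates(n, prevalence, sens, spec, verify_pos, verify_neg):
    """Expected (sensitivity, specificity) computed from verified patients only.

    verify_pos / verify_neg: fraction of index-positive / index-negative
    patients who go on to receive the reference standard.
    """
    diseased = n * prevalence
    healthy = n - diseased
    true_pos = diseased * sens * verify_pos
    false_neg = diseased * (1 - sens) * verify_neg
    false_pos = healthy * (1 - spec) * verify_pos
    true_neg = healthy * spec * verify_neg
    return (true_pos / (true_pos + false_neg),
            true_neg / (true_neg + false_pos))


# Hypothetical test: true sensitivity 0.80, true specificity 0.90.
full = verified_estimates(1000, 0.2, 0.80, 0.90, verify_pos=1.0, verify_neg=1.0)
# Verification driven by the index result: all positives, 20% of negatives.
partial = verified_estimates(1000, 0.2, 0.80, 0.90, verify_pos=1.0, verify_neg=0.2)
# Random selection: half of all patients, regardless of index result.
random_half = verified_estimates(1000, 0.2, 0.80, 0.90, verify_pos=0.5, verify_neg=0.5)

for label, (se, sp) in [("full", full), ("partial", partial), ("random", random_half)]:
    print(f"{label:8s} sens={se:.2f} spec={sp:.2f}")
```

With these assumed figures, verifying all index-positive patients but only 20% of index-negative patients inflates sensitivity to about 0.95 and deflates specificity to about 0.64. Random selection leaves both estimates unchanged in expectation, which is why random selection addresses the problem.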

Statement 1.6 Patients received the same reference standard regardless of the index test result.

What does this statement mean?

This statement is about differential verification bias.
This occurs when different reference standards are used to verify the index test results. Different reference standards may vary in their definition of the target condition (e.g. histopathology of the appendix and natural history for the detection of appendicitis). It often occurs when patients testing positive on the index test receive a more accurate, often invasive, reference standard than those with negative test results. The correlation between a particular (negative) test result and being verified by a less accurate reference standard will affect measures of test accuracy in a similar way to partial verification, but less seriously.

When does this statement apply?

Generally only occurs when all patients are tested by the index test before the reference standard.

Studies should be scored as:

Well addressed if it is clear that all patients who received the index test had their disease status verified using the same reference standard.
Adequately addressed if the reference standard was not the same for all patients.
Poorly addressed if some of the patients who received the index test did not have their true disease state verified.
Not applicable in case-control designs where the order of the tests is reversed (ie reference standard first).

Statement 1.7 The reference standard was independent of the index test (ie the index test did not form part of the reference standard).

What does this statement mean?

This statement is about incorporation bias.
Incorporation bias may occur when the result of the index test is used to establish the final diagnosis. This will probably increase the agreement between index test results and the reference standard, and hence overestimate the measure of diagnostic accuracy.
Note: knowledge of the results of the index test does not automatically mean that the results are incorporated in the reference standard. For example, a study investigating magnetic resonance imaging (MRI) for diagnosing multiple sclerosis could have a reference standard composed of clinical follow-up, cerebrospinal fluid analysis and MRI. In this case the index test forms part of the reference standard. If the same study used a reference standard of clinical follow-up and the results of the MRI were known when the clinical diagnosis was made but were not specifically included as part of the reference, then the index test does not form part of the reference standard.

When does this statement apply?

Only applies when a composite reference standard is used to verify disease status.

Studies should be scored as:

Poorly addressed if the index test formed part of the reference standard.
Not applicable if it is clear that the index test did not form part of the reference standard.
Note: “Poorly addressed” does not refer to whether or not incorporation bias is described or discussed, as it may be quite clearly described. “Poorly addressed” refers to the fact that including the index test in the reference standard introduces a potential bias.

Statements 1.8 and 1.9 The execution of the index test was described in sufficient detail to permit replication of the test.
The execution of the reference standard was described in sufficient detail to permit replication of the test.

What does this statement mean?

A sufficient description of the execution of index test and reference standards is important for two reasons. First, variation in measures of diagnostic accuracy can sometimes be traced back to differences in the execution of index/reference standards. Second, a clear and detailed description (or references) is needed to implement the test in another setting. If tests are executed in different ways then this could affect test performance. The extent to which this would alter results would depend on the type of test.

When does this statement apply?

Usually applies.

Studies should be scored as:

Well addressed if the study reports sufficient details to permit replication of the index test and reference standard.
Adequately addressed if only the bare minimum of information has been provided.
Not reported if detail is insufficient.

Statements 1.10 and 1.11 Index test results were interpreted without knowledge of the results of the reference standard.
Reference standard results were interpreted without knowledge of the results of the index test.

What does this statement mean?

This statement is about review bias.
Review bias is similar to blinding in intervention studies. Interpretation of the results of the index test may be influenced by knowledge of the results of the reference standard, and vice versa. The effect on results will depend on the degree of subjectivity in the interpretation of the test result. The more subjective the interpretation, the more likely that the interpreter can be influenced by the results of the index test in interpreting the reference standard, and vice versa.

When does this statement apply?

If the index test is always performed first then interpretation of the results of the index test will usually be without knowledge of the results of the reference standard. If the reference standard is always performed first then the results of the reference standard will be interpreted without knowledge of the index test. In certain situations the results of both the index test and reference standard are blinded in both directions before being interpreted.

Studies should be scored as:

Well addressed if the study clearly states that the test results (index or reference standard) were interpreted blind to the results of the other test.
Adequately addressed if you are uncertain of the reliability of the blinding procedure.
Poorly addressed if you regard the blinding procedure as inadequate.
Not applicable where test results are entirely objective or tests were carried out in an independent laboratory.

Statement 1.12 Uninterpretable or intermediate test results are reported

What does this statement mean?

A diagnostic test can produce an uninterpretable/indeterminate/intermediate result with varying frequency, depending on the test. Uninterpretable results are often removed from the analysis, which may lead to biased assessment of the test characteristics. Any bias will depend on the correlation between uninterpretable test results and true disease status. If uninterpretable results occur randomly then they should not affect test performance. Whatever the cause of uninterpretable results it is important for them to be reported so that their impact on test performance can be determined.

When does this statement apply?

Always applies.

Studies should be scored as:

Well addressed if it is clear that all test results are reported.
Poorly addressed if it is clear that such results occurred, but it is not clear to what extent they have been reported.
Not addressed if there is no mention of whether such results occurred, or how they were handled.
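
A short calculation shows why uninterpretable results need to be reported. In the sketch below, the counts are hypothetical, and counting uninterpretable results among diseased patients as missed cases is only one possible worst-case convention, assumed here for illustration.

```python
def sensitivity(true_pos, false_neg, uninterpretable=0):
    """Sensitivity among diseased patients.

    Uninterpretable results passed in are counted as missed cases
    (a worst-case assumption); leave the argument at 0 to exclude them.
    """
    return true_pos / (true_pos + false_neg + uninterpretable)


# Hypothetical results among 100 patients with the target condition.
true_pos, false_neg, uninterpretable = 80, 10, 10

excluded = sensitivity(true_pos, false_neg)
worst_case = sensitivity(true_pos, false_neg, uninterpretable)
print(f"Uninterpretable results excluded:         {excluded:.2f}")
print(f"Uninterpretable results counted as missed: {worst_case:.2f}")
```

With these assumed counts, sensitivity is about 0.89 when the uninterpretable results are dropped and 0.80 when they are counted against the test. A reader can only see that range if the paper reports how many such results occurred.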

Statement 1.13 An explanation is provided for withdrawals from the study.

What does this statement mean?

This occurs when patients withdraw from the study before the results of both the index test and reference standard are known. If patients lost to follow-up differ systematically from those who remain, for whatever reason, then estimates of test performance may be biased.

When does this statement apply?

Always applies.

Studies should be scored as:

Well addressed if it is clear what happened to all patients who entered the study (eg a flow diagram of study participants is reported).
Poorly addressed if some of the participants who entered the study did not complete it and are not accounted for.
Not reported if it is not clear whether all patients who entered the study are accounted for.

Statement 1.14 The same clinical data were available when test results were interpreted as would be available when the test is used in practice.

What does this statement mean?

The availability of clinical data (anything relating to the patient that can be obtained by direct observation) during the interpretation of test results may affect estimates of test performance. Such knowledge can influence the test result if it involves an interpretative component. If clinical data will be available when the test is interpreted in practice then it should be available when the test is evaluated.

When does this statement apply?

Does not apply to tests which are fully automated and involve no interpretation, or where the index test is intended to replace other clinical tests.

Studies should be scored as:

Well addressed if it is clear that the index test was evaluated in circumstances identical to those that apply in routine practice.
Adequately addressed if there is discussion of any differences between the circumstances of test evaluation and routine practice.
Not reported if the circumstances of test evaluation and routine practice are not discussed.

Section 2

Section 2 relates to the overall assessment of the paper. It rates the methodological quality of the study, based on the responses in section 1, using the following coding system:
++ All or most of the criteria have been fulfilled. Where they have not been fulfilled the conclusions of the study or review are thought very unlikely to alter.
+ Some of the criteria have been fulfilled. Those criteria that have not been fulfilled or not adequately described are thought unlikely to alter the conclusions.
- Few or no criteria fulfilled. The conclusions of the study are thought likely or very likely to alter.
The code allocated here, coupled with the study type, will decide the level of evidence that this study provides.

Section 3

Section 3 asks for any general comments that you might want to incorporate into an evidence table at the next stage of the process.

Scottish Intercollegiate Guidelines Network
SIGN 50: A guideline developer's handbook