Article
The reliability, usability, and applicability of tools to appraise quality and risk of bias in systematic reviews: a prospective evaluation of AMSTAR, AMSTAR 2 and ROBIS
Published: February 12, 2020
Background/research question: Readers of systematic reviews (SRs) and overview authors require valid, reliable, and practical means to evaluate the methodological quality and risk of bias of SRs.
Objective: To evaluate and compare the interrater and inter-centre reliability, usability, and applicability of three available tools for appraising SRs: AMSTAR, AMSTAR 2, and ROBIS.
Methods: Using a random sample of 30 SRs of randomized trials, two reviewers at each of three collaborating centres (Canada, Germany, and Portugal) independently applied AMSTAR, AMSTAR 2, and ROBIS and then reached consensus. To test interrater reliability between pairs of reviewers, and inter-centre reliability between the centres’ consensus decisions, we used Gwet’s AC1 statistic. To estimate usability, we calculated the median and interquartile range (IQR) of the time taken to complete the appraisal and to reach consensus for each tool.
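As an illustration of the two summary statistics named in the Methods, the following is a minimal sketch, not the authors' analysis code. It assumes the standard two-rater form of Gwet's AC1 for nominal ratings and reports the IQR as a width (third quartile minus first quartile). The function names `gwet_ac1` and `median_iqr` are hypothetical, the quantile method is Python's default rather than anything stated in the abstract, and no confidence intervals are computed.

```python
from collections import Counter
from statistics import median, quantiles


def gwet_ac1(rater_a, rater_b, categories):
    """Gwet's AC1 for two raters scoring the same items on a nominal scale.

    `categories` lists every possible response option, including any that
    neither rater happened to use (assumed known from the appraisal tool).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    q = len(categories)
    if q < 2:
        raise ValueError("at least two response categories are required")

    n = len(rater_a)
    # Observed agreement: share of items given the same rating by both raters.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: based on the average marginal proportion per category.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = 0.0
    for category in categories:
        pi_k = (counts_a[category] + counts_b[category]) / (2 * n)
        p_e += pi_k * (1 - pi_k)
    p_e /= q - 1

    return (p_a - p_e) / (1 - p_e)


def median_iqr(times):
    """Median and IQR width of appraisal times (default quantile method)."""
    q1, _, q3 = quantiles(times, n=4)
    return median(times), q3 - q1


# Hypothetical ratings for one item across ten reviews (not study data).
reviewer_1 = ["yes"] * 8 + ["no"] * 2
reviewer_2 = ["yes"] * 7 + ["no"] * 3
print(round(gwet_ac1(reviewer_1, reviewer_2, ["yes", "no"]), 2))
```

In this made-up example the reviewers agree on 9 of 10 reviews, and the chance-agreement term is 0.375, so AC1 is (0.9 - 0.375) / (1 - 0.375).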
Results: The median (IQR) time for reviewers to complete the assessments was 15.7 (11.3), 19.7 (12.1), and 28.7 (17.4) minutes for AMSTAR, AMSTAR 2, and ROBIS, respectively. The median (IQR) time to reach consensus was 2.6 (3.2), 4.6 (5.3), and 10.9 (10.8) minutes for AMSTAR, AMSTAR 2, and ROBIS, respectively. Interrater reliability varied by centre but, across all centres, was substantial to almost perfect for 8/11 (73%) AMSTAR, 8/16 (50%) AMSTAR 2, and 12/24 (50%) ROBIS items. Inter-centre reliability was substantial to almost perfect for 6/11 (55%) AMSTAR, 10/16 (63%) AMSTAR 2, and 7/24 (29%) ROBIS items. Agreement on confidence in the results of the review (AMSTAR 2) ranged from slight (AC1 0.05, 95% CI -0.17 to 0.27) to perfect (AC1 1.00) between reviewers and from moderate (AC1 0.58, 95% CI 0.30 to 0.85) to substantial (AC1 0.74, 95% CI 0.30 to 0.85) between centres. Agreement on overall risk of bias in the SR (ROBIS) ranged from moderate (AC1 0.47, 95% CI 0.17 to 0.77) to almost perfect (AC1 0.96, 95% CI 0.89 to 1.00) between reviewers and from poor (AC1 -0.21, 95% CI -0.55 to 0.13) to moderate (AC1 0.56, 95% CI 0.30 to 0.83) between centres.
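The verbal labels in the Results (poor, slight, moderate, substantial, almost perfect) match the widely used Landis and Koch benchmarks for agreement coefficients. The abstract does not name the scale, so the mapping below is an assumption, and the function name `agreement_label` is hypothetical. A minimal sketch:

```python
def agreement_label(coefficient):
    """Map an agreement coefficient to a Landis and Koch style label.

    Assumed cut-points: <0 poor, 0-0.20 slight, 0.21-0.40 fair,
    0.41-0.60 moderate, 0.61-0.80 substantial, 0.81-1.00 almost perfect.
    """
    if coefficient < 0:
        return "poor"
    if coefficient <= 0.20:
        return "slight"
    if coefficient <= 0.40:
        return "fair"
    if coefficient <= 0.60:
        return "moderate"
    if coefficient <= 0.80:
        return "substantial"
    return "almost perfect"


# AC1 point estimates quoted in the Results, labelled under the assumed scale.
for ac1 in (-0.21, 0.05, 0.47, 0.56, 0.58, 0.74, 0.96):
    print(ac1, agreement_label(ac1))
```

Under these cut-points, each point estimate quoted above receives the same label the abstract gives it. The labels apply to point estimates only and ignore the width of the confidence intervals.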
Conclusion: Reviewers completed AMSTAR appraisals more quickly than AMSTAR 2 or ROBIS appraisals, and AMSTAR was the only tool for which reviewers reached substantial or better agreement on most items. The low inter-centre reliability, particularly for the overall AMSTAR 2 and ROBIS ratings, is concerning because it limits readers’ ability to interpret ratings applied by different review groups. Improved documentation may be needed to help reviewers interpret and apply each tool’s supporting guidance consistently.