21. Jahrestagung des Deutschen Netzwerks Evidenzbasierte Medizin e. V.

Deutsches Netzwerk Evidenzbasierte Medizin e. V.

13-15 February 2020, Basel, Switzerland

The reliability, usability, and applicability of tools to appraise quality and risk of bias in systematic reviews: a prospective evaluation of AMSTAR, AMSTAR 2 and ROBIS

Meeting Abstract

Michelle Gates - University of Alberta, Alberta Research Centre for Health Evidence, Department of Pediatrics, Alberta, Canada
Allison Gates - University of Alberta, Alberta Research Centre for Health Evidence, Department of Pediatrics, Alberta, Canada
Barbara Prediger - Universität Witten/Herdecke, Institut für Forschung in der Operativen Medizin, Department für Humanmedizin, Germany
Monika Becker - Universität Witten/Herdecke, Institut für Forschung in der Operativen Medizin, Department für Humanmedizin, Germany
Gonçalo Duarte - University of Lisbon, Clinical Pharmacology Unit, Instituto de Medicina Molecular, Lisbon, Portugal
Maria Cary - University of Lisbon, Clinical Pharmacology Unit, Instituto de Medicina Molecular, Lisbon, Portugal
Ben Vandermeer - University of Alberta, Alberta Research Centre for Health Evidence, Department of Pediatrics, Alberta, Canada
Ricardo Fernandes - University of Lisbon, Clinical Pharmacology Unit, Instituto de Medicina Molecular, Lisbon, Portugal; Santa Maria Hospital, Department of Pediatrics, Portugal
Dawid Pieper - Universität Witten/Herdecke, Institut für Forschung in der Operativen Medizin, Department für Humanmedizin, Germany
Lisa Hartling - University of Alberta, Alberta Research Centre for Health Evidence, Department of Pediatrics, Alberta, Canada

Nützliche patientenrelevante Forschung. 21. Jahrestagung des Deutschen Netzwerks Evidenzbasierte Medizin. Basel, Schweiz, 13.-15.02.2020. Düsseldorf: German Medical Science GMS Publishing House; 2020. Doc20ebmPP8-03

doi: 10.3205/20ebm106, urn:nbn:de:0183-20ebm1062

Published: February 12, 2020

© 2020 Gates et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.

Text

Background/research question: Readers of systematic reviews (SRs) and overview authors require valid, reliable, and practical means to evaluate the methodological quality and risk of bias of SRs.

We aimed to evaluate and compare the inter-rater and inter-centre reliability, usability, and applicability of three available tools for appraising SRs: AMSTAR, AMSTAR 2, and ROBIS.

Methods: Using a random sample of 30 SRs of randomized trials, two reviewers at each of three collaborating centres (Canada, Germany, and Portugal) independently applied AMSTAR, AMSTAR 2, and ROBIS to each SR and then reached consensus. We used Gwet’s AC1 statistic to assess inter-rater reliability between the pairs of reviewers and inter-centre reliability between the centres’ consensus decisions. To estimate usability, we calculated the median (interquartile range, IQR) time to complete the appraisal and to reach consensus for each tool.
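As a rough illustration of the agreement statistic named in the Methods, the following sketch computes Gwet's AC1 for two raters who score the same items on a nominal scale. It is not the authors' analysis code: the function name `gwet_ac1`, the example ratings, and the category labels are hypothetical, and confidence intervals are not computed.

```python
from collections import Counter


def gwet_ac1(ratings_a, ratings_b, categories=None):
    """Gwet's AC1 for two raters and nominal categories (illustrative sketch)."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must score the same non-empty set of items")
    if categories is None:
        # Assumption: if the scale is not given, use the categories actually observed.
        categories = sorted(set(ratings_a) | set(ratings_b))
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs a scale with at least two categories")

    n = len(ratings_a)
    # Observed agreement: share of items on which the two raters gave the same rating.
    p_a = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: based on the mean proportion pi_k of ratings in each category.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    p_e = 0.0
    for c in categories:
        pi_k = (counts_a[c] + counts_b[c]) / (2 * n)
        p_e += pi_k * (1 - pi_k)
    p_e /= (q - 1)

    return (p_a - p_e) / (1 - p_e)


# Hypothetical ratings of four SRs on one yes/no item by two reviewers.
reviewer_1 = ["yes", "yes", "no", "yes"]
reviewer_2 = ["yes", "no", "no", "yes"]
print(round(gwet_ac1(reviewer_1, reviewer_2, categories=["yes", "no"]), 3))
```

Passing the full scale through `categories` matters when a category is never used in the sample, because the number of categories enters the chance-agreement term.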

Results: The median (IQR) time for reviewers to complete the assessments was 15.7 (11.3), 19.7 (12.1), and 28.7 (17.4) minutes for AMSTAR, AMSTAR 2, and ROBIS, respectively. The median (IQR) time to reach consensus was 2.6 (3.2), 4.6 (5.3), and 10.9 (10.8) minutes for AMSTAR, AMSTAR 2, and ROBIS, respectively. Inter-rater reliability varied by centre, but across all centres it was substantial to almost perfect for 8/11 (73%) AMSTAR, 8/16 (50%) AMSTAR 2, and 12/24 (50%) ROBIS items. Inter-centre reliability was substantial to almost perfect for 6/11 (55%) AMSTAR, 10/16 (63%) AMSTAR 2, and 7/24 (29%) ROBIS items. Agreement on confidence in the results of the review (AMSTAR 2) ranged from slight (AC1 0.05, 95% CI -0.17 to 0.27) to perfect (AC1 1.00) between reviewers and from moderate (AC1 0.58, 95% CI 0.30 to 0.85) to substantial (AC1 0.74, 95% CI 0.30 to 0.85) between centres. Agreement on overall risk of bias in the SR (ROBIS) ranged from moderate (AC1 0.47, 95% CI 0.17 to 0.77) to almost perfect (AC1 0.96, 95% CI 0.89 to 1.00) between reviewers and from poor (AC1 -0.21, 95% CI -0.55 to 0.13) to moderate (AC1 0.56, 95% CI 0.30 to 0.83) between centres.
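The verbal labels used in the Results (poor, slight, moderate, substantial, almost perfect) appear consistent with the commonly used Landis and Koch benchmarks. The abstract does not name the scale, so the cut-offs in this sketch are an assumption, and the function name `agreement_label` is hypothetical.

```python
def agreement_label(ac1):
    """Map an agreement coefficient to a verbal label.

    Assumption: Landis and Koch style cut-offs; the abstract does not state its scale.
    """
    if ac1 < 0:
        return "poor"
    if ac1 >= 1.0:
        return "perfect"
    # Upper bounds of each band, checked in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]
    for upper, label in bands:
        if ac1 <= upper:
            return label
    return "almost perfect"


# AC1 point estimates quoted in the Results, paired with the labels this mapping gives.
for value in (-0.21, 0.05, 0.47, 0.56, 0.58, 0.74, 0.96, 1.00):
    print(value, agreement_label(value))
```

Under these cut-offs the mapping reproduces the label attached to each point estimate quoted in the Results.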

Conclusion: Reviewers completed AMSTAR appraisals more quickly than AMSTAR 2 or ROBIS appraisals and reached substantial agreement on a larger share of items (most AMSTAR items). Low levels of inter-centre reliability, particularly on overall AMSTAR 2 and ROBIS ratings, are concerning because they limit readers’ ability to interpret ratings applied by different review groups. Improved documentation may be needed to help reviewers interpret and apply each tool’s supporting guidance consistently.