gms | German Medical Science

68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

17.09. - 21.09.23, Heilbronn

A step toward anonymized medical image pipeline

Meeting Abstract

  • Hamidreza Naderi Boldaji - Institut für Medizinische Informatik Universitätsklinikum Heidelberg, Heidelberg, Germany
  • Fleur Fritz-Kebede - Institut für Medizinische Informatik Universitätsklinikum Heidelberg, Heidelberg, Germany
  • Martin Dugas - Institut für Medizinische Informatik Universitätsklinikum Heidelberg, Heidelberg, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS). Heilbronn, 17.-21.09.2023. Düsseldorf: German Medical Science GMS Publishing House; 2023. DocAbstr. 119

doi: 10.3205/23gmds166, urn:nbn:de:0183-23gmds1669

Published: 15 September 2023

© 2023 Naderi Boldaji et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information see http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: Data anonymization refers to the removal or encoding of identifiers that link individuals to the stored data in order to preserve private or confidential information. This protects the privacy of individuals and corporations while ensuring that the data collected and exchanged is credible. The use of anonymized data allows data to be shared with a secondary audience, such as other institutions, research organizations, or individuals.

Anonymization algorithms can be applied to research documents and datasets, such as trial results. Patient records are often in document form, whereas clinical trial data is usually in quantitative tabular form. Images and videos of patients, however, need to be considered and anonymized separately before they can be used for research.

Although several studies and projects have addressed the anonymization of tabular medical data [1] and the semi-automatic anonymization of DICOM images [2], to the best of our knowledge there is no work on an automatic anonymization pipeline for non-tabular medical datasets. In this work, we focus on medical image datasets and introduce a concept for an automatic DICOM anonymization pipeline for modalities such as CT and ultrasound.

Methods: The proposed method involves three main parts:

  • Pipeline structure: Talend Studio is used to read the images, call the anonymization methods, and transfer the results.
  • Anonymization of DICOM headers: In the US, for instance, the Health Insurance Portability and Accountability Act (HIPAA) provides the standard for the de-identification of protected health information and lists 18 identifiers that must be treated with special care [3]. The list varies between organizations and countries. In this study, DICOM header anonymization is performed by algorithms that delete sensitive personal data based on a customizable input list of header attributes.
  • Fading out information burned into image pixels: Some types of images, particularly ultrasound images, contain additional, partly sensitive information burned into the pixel data, which must be faded out or blurred.
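The header step can be illustrated with a minimal sketch. It is not the authors' implementation: the header is modeled as a plain Python dictionary keyed by DICOM attribute keywords, the function name `anonymize_header` and the sample values are hypothetical, and a real pipeline would read and write the attributes with a DICOM library.

```python
from typing import Dict, Iterable


def anonymize_header(header: Dict[str, str], identifiers: Iterable[str]) -> Dict[str, str]:
    """Return a copy of `header` without the attributes named in `identifiers`.

    Assumption: the header is a flat mapping of attribute keyword to value.
    Keyword matching is case-insensitive; the input mapping is not modified.
    """
    blocked = {name.lower() for name in identifiers}
    return {key: value for key, value in header.items() if key.lower() not in blocked}


# Hypothetical header values, for illustration only.
header = {
    "PatientName": "DOE^JANE",
    "PatientID": "123456",
    "PatientBirthDate": "19700101",
    "Modality": "US",
    "Rows": "480",
}

# The customizable input list; its content would depend on the organization and country.
identifier_list = ["PatientName", "PatientID", "PatientBirthDate"]

clean = anonymize_header(header, identifier_list)
print(clean)  # {'Modality': 'US', 'Rows': '480'}
```

Passing the identifier list as an argument keeps the deletion logic independent of any particular regulation, which matches the point that the list varies between organizations and countries.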

Results: The proposed method was successfully tested on two publicly available datasets [4], [5]. First, the DICOM headers are anonymized based on an input list of identifiers. Next, the bounding boxes and coordinates of all text in the image are extracted with the text-detection method CRAFT [6] and saved as a CSV file. As a last step, the information inside the extracted bounding boxes is faded out in the original image.
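The last two steps can be sketched as follows, under stated assumptions. The CSV column names (`x_min`, `y_min`, `x_max`, `y_max`), the helper names, and the representation of the image as a nested list of gray values are all hypothetical. The abstract does not specify the CSV layout or the fade-out operation, so here the boxes are simply overwritten with a constant value.

```python
import csv
import io
from typing import List, Tuple

# (x_min, y_min, x_max, y_max); the max coordinates are exclusive.
Box = Tuple[int, int, int, int]


def read_boxes(csv_text: str) -> List[Box]:
    """Parse bounding boxes from CSV text with the assumed column layout."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (int(row["x_min"]), int(row["y_min"]), int(row["x_max"]), int(row["y_max"]))
        for row in reader
    ]


def fade_out(image: List[List[int]], boxes: List[Box], fill: int = 0) -> List[List[int]]:
    """Return a copy of `image` with every pixel inside a box set to `fill`."""
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    for x_min, y_min, x_max, y_max in boxes:
        # Clamp each box to the image so out-of-range coordinates are ignored.
        for y in range(max(0, y_min), min(height, y_max)):
            for x in range(max(0, x_min), min(width, x_max)):
                result[y][x] = fill
    return result


# A 4x3 toy image and one detected text region, for illustration only.
csv_text = "x_min,y_min,x_max,y_max\n1,0,3,2\n"
image = [[9] * 4 for _ in range(3)]
masked = fade_out(image, read_boxes(csv_text))
print(masked)  # [[9, 0, 0, 9], [9, 0, 0, 9], [9, 9, 9, 9]]
```

Storing the boxes in a file between detection and masking keeps the two steps decoupled, so the detected regions can be inspected or corrected before any pixel is changed.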

Discussion: Although the introduced method was successfully tested on two publicly available datasets, some important points should be considered in future studies to address real-world problems. First, the anatomical structures in an image can themselves be used for identification; manipulating anatomical connectivity might be a possible solution to this issue. Furthermore, since there is no standard for the data burned into images, a further step should be designed to modify selected bounding boxes and prevent text from being detected incorrectly.
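One possible form of such a box-modification step is sketched below. This is purely an assumption, since the abstract leaves the step as future work. The function name `refine_boxes`, the `margin` and `min_area` parameters, and their default values are hypothetical. Dropping small detections reduces false positives but risks leaving real text unmasked, so the threshold would need validation on real data.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max); the max coordinates are exclusive.
Box = Tuple[int, int, int, int]


def refine_boxes(
    boxes: List[Box], width: int, height: int, margin: int = 2, min_area: int = 16
) -> List[Box]:
    """Drop implausibly small boxes and pad the remaining ones by `margin` pixels."""
    refined = []
    for x_min, y_min, x_max, y_max in boxes:
        area = (x_max - x_min) * (y_max - y_min)
        if area < min_area:
            # Treated here as a likely false detection (assumption).
            continue
        refined.append(
            (
                max(0, x_min - margin),
                max(0, y_min - margin),
                min(width, x_max + margin),
                min(height, y_max + margin),
            )
        )
    return refined


# Two hypothetical detections on a 64x48 image: one text line and one tiny speck.
boxes = [(1, 1, 11, 6), (20, 20, 22, 22)]
refined = refine_boxes(boxes, width=64, height=48)
print(refined)  # [(0, 0, 13, 8)]
```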

Conclusion: We proposed a pipeline to anonymize medical images that covers both DICOM headers and data burned into the images, making the data suitable for use by research institutes.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Prasser F, Kohlmayer F. Putting statistical disclosure control into practice: The ARX data anonymization tool. In: Gkoulalas-Divanis A, Loukides G, editors. Medical data privacy handbook. 2015. p. 111-48.
2. DicomCleaner software. PixelMed Publishing. Available from: http://www.dclunie.com/pixelmed/software/webstart/DicomCleanerUsage.html
3. Health Insurance Portability and Accountability Act. Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. Available from: https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html#standard
4. Al-Dhabyani W, Gomaa M, Khaled H, Fahmy A. Dataset of breast ultrasound images. Data Brief. 2020;28:104863.
5. Albertina B, Watson M, Holback C, Jarosz R, Kirk S, Lee Y, Lemmerman J. Radiology data from The Cancer Genome Atlas Lung Adenocarcinoma [TCGA-LUAD] collection. The Cancer Imaging Archive. 2016;10:K9.
6. Baek Y, Lee B, Han D, Yun S, Lee H. Character region awareness for text detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019. p. 9365-9374.