Article
Systematically evaluating interfaces for RNA-seq analysis from a life scientist perspective
Published: August 27, 2015
Introduction: Massively parallel RNA sequencing (RNA-seq) has become the instrument of choice for transcriptome analysis and a cornerstone of modern life science laboratories, but the analysis is generally carried out by bioinformaticians. One reason is the huge amount of data generated: sequencing platforms can produce terabytes of data in a single run, and many datasets are often sequenced within a single project. Furthermore, a complete RNA-seq analysis requires many working steps. To test for differential expression, the sequenced short reads first need to be filtered and mapped to the genome. The next steps are quantification followed by a test for differential expression. Subsequently, a gene set enrichment analysis can be performed. For every step many tools are available, and each one needs input files in a specific format, so results from different programs in different formats have to be combined. Extracting useful information from this massive amount of data therefore requires substantial computational skills and resources.
With sequencing costs falling steadily and sequencing speed and efficiency rising exponentially, it may become increasingly important to shift at least some analysis steps from core facilities to life scientists, using standardized tools with easy-to-use interfaces.
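The analysis steps listed in the introduction, and the point that each tool expects its input in a specific format, can be illustrated with a minimal sketch. The step names follow the text; the format labels (FASTQ, BAM, "count table", and so on) and the helper `format_mismatches` are assumptions chosen for illustration and do not describe any particular interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One working step of an RNA-seq analysis (illustrative only)."""
    name: str
    input_format: str
    output_format: str


# Assumed format labels; real tools may expect other or additional files.
WORKFLOW = [
    Step("filter reads", "FASTQ", "FASTQ"),
    Step("map to genome", "FASTQ", "BAM"),
    Step("quantify", "BAM", "count table"),
    Step("test differential expression", "count table", "gene list"),
    Step("gene set enrichment", "gene list", "enrichment report"),
]


def format_mismatches(steps):
    """Return (producer, consumer) name pairs whose formats do not line up."""
    return [
        (a.name, b.name)
        for a, b in zip(steps, steps[1:])
        if a.output_format != b.input_format
    ]


if __name__ == "__main__":
    # An empty list means every step can consume its predecessor's output.
    print(format_mismatches(WORKFLOW))
```

When the steps are run by hand, each such handover between tools is a place where a file may need to be converted; an integrated interface is expected to take care of these handovers for the user.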
Methods: We performed a systematic search and evaluation of such interfaces to investigate to what extent they can indeed facilitate RNA-seq data analysis, even for users without an extensive computer science background.
Material and methods: We performed a systematic search in PubMed using the keywords “RNA”, “seq”/“RNA-seq” and “pipeline”/“workflow”/“integrated solution”, and a complementary search using Google and Wikipedia. In consultation with biologists and bioinformaticians, we defined criteria for a detailed evaluation of the more widely used interfaces. The central criteria were ease of configuration, documentation, usability, computational demand, and reporting.
Results: We found a total of 29 open source interfaces, and 6 of the more widely used ones were evaluated in detail. The interfaces differ in the number of working steps they cover. Only 13 interfaces allow for a complete analysis, while 20 of the 29 integrate the mapping and quantification steps. Other interfaces focus only on quantification and differential expression analysis. At first glance, most of the evaluated interfaces make RNA-seq analyses easier to perform than manual execution of the analysis steps, but most of them require a moderate to considerable amount of time to become familiar with, and problems that arise cannot be solved without considerable effort or IT skills.
Discussion: No interface scored best in all of the criteria, indicating that the final choice will depend on the user's specific perspective and the corresponding weighting of the criteria. Considerable technical hurdles had to be overcome in our evaluation. For many users this will diminish the potential benefits compared to command-line tools, leaving room for future improvement of the interfaces.
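How the weighting of criteria can change the preferred interface can be sketched with a simple weighted average. The criteria names follow the abstract; the two interfaces, their scores, and both weight profiles are hypothetical values made up for illustration and are not results of the evaluation.

```python
CRITERIA = (
    "configuration",
    "documentation",
    "usability",
    "computational demand",
    "reporting",
)

# Hypothetical scores (1 = poor, 5 = good); NOT taken from the evaluation.
SCORES = {
    "interface_A": dict(zip(CRITERIA, (5, 4, 2, 3, 2))),
    "interface_B": dict(zip(CRITERIA, (2, 3, 5, 3, 4))),
}

# Hypothetical weight profiles for two kinds of users.
LIFE_SCIENTIST = dict(zip(CRITERIA, (1, 1, 3, 1, 2)))
CORE_FACILITY = dict(zip(CRITERIA, (3, 2, 1, 1, 1)))


def weighted_score(scores, weights):
    """Weighted mean of the criterion scores."""
    total_weight = sum(weights.values())
    return sum(scores[c] * weights[c] for c in weights) / total_weight


def best_interface(all_scores, weights):
    """Name of the interface with the highest weighted score."""
    return max(all_scores, key=lambda name: weighted_score(all_scores[name], weights))


if __name__ == "__main__":
    print(best_interface(SCORES, LIFE_SCIENTIST))
    print(best_interface(SCORES, CORE_FACILITY))
```

With these made-up numbers, the profile that emphasizes usability and reporting prefers one interface, while the profile that emphasizes configuration and documentation prefers the other, which is the kind of dependence on perspective that the discussion describes.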