dbHT-Trans is an efficient tool for filtering protein-encoding transcripts assembled by RNA-Seq, based on a search for homologous proteins.
In RNA-Seq studies, post-assembly quality filtering is necessary before performing downstream analyses. In theory, falsely assembled transcripts of protein-encoding genes are expected to fail to translate into authentic amino acid sequences, even though they contain many deduced open reading frames (ORFs), long or short. Here, we present dbHT-Trans, an automatic and efficient tool for filtering the protein-encoding transcripts assembled by RNA-Seq. For each candidate transcript, we first deduce all potential ORFs and translate them into amino acid sequences. These sequences are then searched against a reference protein database, and a transcript is predicted to be false when none of them has a homologous sequence. This method is expected to filter out the falsely assembled transcripts of protein-encoding genes. Applying dbHT-Trans to the annotated mouse transcriptome showed a sensitivity of almost 90% for recalling protein-encoding transcripts. After this quality filtering, the numbers of assembled genes became more consistent between Cufflinks and Trinity. To significantly reduce data storage, we transformed all intermediate data into descriptive metadata and stored them in a MySQL database, where downstream analyses can query them in real time.
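The filtering idea described above can be illustrated with a minimal Python sketch. It is not the dbHT-Trans source code: the function names, the ATG-to-stop ORF definition, the `min_len` threshold, and the `has_homolog` callback (a stand-in for the search against a reference protein database) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Standard genetic code, built in the conventional TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate_frame(seq: str, frame: int) -> str:
    """Translate one reading frame; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


def find_orfs(seq: str, min_len: int = 30) -> List[str]:
    """Deduce candidate ORFs in all six frames as amino acid sequences.

    Assumption: an ORF runs from the first methionine after a stop codon
    to the next stop codon (or the end of the transcript), and min_len is
    a hypothetical minimum length in amino acids.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            for piece in translate_frame(strand, frame).split("*"):
                start = piece.find("M")
                if start >= 0 and len(piece) - start >= min_len:
                    orfs.append(piece[start:])
    return orfs


def filter_transcripts(
    transcripts: Dict[str, str],
    has_homolog: Callable[[str], bool],
    min_len: int = 30,
) -> Tuple[List[str], List[str]]:
    """Split transcripts into kept and rejected lists.

    has_homolog is a hypothetical callback that reports whether a protein
    sequence has a hit in the reference protein database.
    """
    kept, rejected = [], []
    for name, seq in transcripts.items():
        if any(has_homolog(p) for p in find_orfs(seq, min_len)):
            kept.append(name)
        else:
            rejected.append(name)
    return kept, rejected
```

In practice, `has_homolog` would wrap whatever homology search is run against the reference protein database; passing it in as a callback keeps the ORF deduction independent of the search tool.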
Schematic view of dbHT-Trans implementation
dbHT-Trans is implemented in Python/MySQL, with source code freely available here.
Deng F, Chen SY. dbHT-Trans: An Efficient Tool for Filtering the Protein-Encoding Transcripts Assembled by RNA-Seq According to Search for Homologous Proteins. Journal of Computational Biology. 2016, 23(1):1-9.