TAfinder: a web-based tool to identify Type II toxin-antitoxin loci in bacterial genome sequences

Type II TA loci in prokaryotes
Bacterial type II toxin-antitoxin (TA) loci typically consist of two tandem protein-coding genes. They have been either demonstrated or hypothesized to play key roles in the stabilization of plasmids and other mobile genetic elements (Van Melderen et al., PLoS Genet., 2009), stress responses (Gerdes et al., Nat Rev Microbiol., 2005), virulence (Lobato-Márquez et al., FEMS Microbiology Reviews, 2016), and persister cell formation (Harms et al., Science, 2016). In addition, type II TA systems have become a research hotspot in synthetic biology because of their wide application in genetic manipulation. In recent years, bioinformatics analyses have shown that type II TA modules are widespread not only on plasmids but also on the chromosomes of bacteria and archaea (Pandey et al., NAR, 2005; Makarova et al., Biol Direct, 2009; Leplae et al., NAR, 2011).

TAfinder predicts type II TA systems
The TAfinder web server was designed to quickly predict and compare type II TA loci in newly sequenced bacterial genomes. It combines a homology search module and an operon detection module to improve prediction performance. First, the backend database has been newly updated to TADB2, the successor of the TADB database developed by our group in 2011 (Shao et al., NAR, 2011; 77 citations in total). TAfinder then searches the full set of protein-coding regions for homologs of toxin or antitoxin proteins using HMMER3 and BLASTp. The search draws on 6119 type II TA pairs and manually curated HMM profiles (108 for toxins and 201 for antitoxins). TAfinder subsequently identifies instances of co-localization, in which two short (90–600 bp) toxin and antitoxin genes lie on the same strand and either overlap or are separated by a few bases (a distance of -20 to 30 bp). Finally, it applies a TA family-specific scoring system to define the putative TA loci with high prediction accuracy. Putative TA pairs are displayed in tabular form on the web, and hyperlinks to other public databases, such as NCBI and TADB, are provided.
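
The co-localization step described above can be illustrated with a minimal Python sketch. It is not TAfinder's implementation: the `Gene` class, the function names, and the definition of distance as the number of bases between the end of the upstream gene and the start of the downstream gene (negative when the genes overlap) are assumptions made for illustration. The homology search and the family-specific scoring are left out; only the length, strand, and distance criteria stated in the text are applied.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Thresholds taken from the text above.
MIN_LEN, MAX_LEN = 90, 600   # gene length in bp
MIN_GAP, MAX_GAP = -20, 30   # distance in bp; negative means overlap


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def intergenic_distance(upstream: Gene, downstream: Gene) -> int:
    """Bases between two genes (assumed definition); negative if they overlap."""
    return downstream.start - upstream.end - 1


def candidate_pairs(genes: List[Gene]) -> List[Tuple[Gene, Gene]]:
    """Return adjacent gene pairs that meet the length, strand and distance criteria."""
    ordered = sorted(genes, key=lambda g: g.start)
    pairs = []
    for upstream, downstream in zip(ordered, ordered[1:]):
        if upstream.strand != downstream.strand:
            continue  # both genes must lie on the same strand
        if not all(MIN_LEN <= g.length <= MAX_LEN for g in (upstream, downstream)):
            continue  # both genes must be short
        if MIN_GAP <= intergenic_distance(upstream, downstream) <= MAX_GAP:
            pairs.append((upstream, downstream))
    return pairs


if __name__ == "__main__":
    # Made-up coordinates, for demonstration only.
    demo = [
        Gene("geneA", 100, 399, "+"),
        Gene("geneB", 396, 695, "+"),
        Gene("geneC", 800, 1099, "-"),
    ]
    for up, down in candidate_pairs(demo):
        print(up.name, down.name, intergenic_distance(up, down))
```

In a full pipeline, the candidate pairs would presumably be restricted to genes with toxin or antitoxin homology hits before scoring; that restriction is omitted here to keep the sketch short.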

TA domain pair-based classification
We have also classified TADB entries using a second, toxin-antitoxin domain-based classification system, as suggested recently by Makarova et al. (Biol Direct, 2009, 4:19). Through the TADB 'Browse by toxin/antitoxin-related domain' page, users can retrieve TA protein domain pairs, for example, Xre-MazF, Xre-RelE and Xre-HipA. The relationship, where recognizable, between the TA family and TA domain pair classification systems is shown in Table S2. A description of these two TA classification systems is also available on the newly added 'Introduction' webpage in TADB.

Table S2. The relationship of TA family (a) and TA domain pair (b) classification systems of identified and/or predicted TA loci.

(a) The TA family classification system is based on toxin protein sequence similarity as described in the reviews by Gerdes et al. (Nat Rev Microbiol, 2005, 3(5):371-82) and Van Melderen et al. (PLoS Genet. 2009, 5(3):e1000437).
(b) The TA domain pair classification system, as suggested recently by Makarova et al. (Biol Direct, 2009, 4:19), is based on the identification of TA pairs sharing cognate toxin and antitoxin domains and is independent of wider protein-level similarity.
(c) The relationships noted in this table were defined by analysis of 793 TA loci that had previously been classified by BOTH systems into individual TA families and TA domain pairs.
(d) The remaining 3,032 TA loci presently in TADB that had been classified by the TA domain pair system alone were then classified into corresponding TA families using this same mapping approach. Numbers indicated by curly brackets represent TA loci of known TA domain pair assignment that could not be matched reliably to a unique TA family.
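
Notes (c) and (d) can be read as a two-step procedure: derive a domain pair to family mapping from the loci classified by both systems, then apply it to loci that carry only a domain pair assignment. The Python sketch below shows one way this might work. The function names and the placeholder labels (`pair_1`, `family_A`, and so on) are hypothetical, and the rule that a domain pair is assigned only when it maps to exactly one family is an assumption based on the wording "could not be matched reliably to a unique TA family".

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def build_mapping(dual_classified: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Collect, for each domain pair, the TA families seen among loci classified by both systems."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for domain_pair, family in dual_classified:
        mapping[domain_pair].add(family)
    return dict(mapping)


def assign_family(domain_pair: str, mapping: Dict[str, Set[str]]) -> Optional[str]:
    """Return the family only if the domain pair maps to exactly one (assumed rule)."""
    families = mapping.get(domain_pair, set())
    if len(families) == 1:
        return next(iter(families))
    return None  # unknown or ambiguous: left without a family assignment


if __name__ == "__main__":
    # Placeholder labels, not real TADB data.
    dual = [
        ("pair_1", "family_A"),
        ("pair_1", "family_A"),
        ("pair_2", "family_A"),
        ("pair_2", "family_B"),
    ]
    mapping = build_mapping(dual)
    for pair in ("pair_1", "pair_2", "pair_3"):
        print(pair, assign_family(pair, mapping))
```

Under this rule, loci whose domain pair maps to several families, or to none, stay unassigned, which would correspond to the counts shown in curly brackets.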