We are interested in:

Molecular evolutionary patterns and processes of protein family or superfamily
Molecular evolutionary mechanisms of protein structure and function
Protein-protein interaction network and metabolic network
Peptide design
Mechanism of complex disease
Computational approach for drug target discovery

Molecular evolutionary patterns and processes of protein family or superfamily

Our study reveals the great differences between soft and hard ticks in the Kunitz/BPTI family. Compared with hard ticks, soft ticks do not possess group II and III proteins and multi-domain proteins (three-, four-, five- and seven-KU). Many multi-domain Kunitz/BPTI proteins were created in hard ticks using the group I domain as a module after the split between hard and soft ticks. Groups II and III, which exhibit significantly higher expression during long-term blood feeding, are only present and expanded in the genus Ixodes. In Ixodes, positive selection drove the expansion of the Kunitz/BPTI family and the evolution of new functions in group II such as ion channel-modulating ability. The two groups may play a profound role in the long-term blood feeding of hard ticks by enhancing blood-feeding efficiency in the late stages of long-term blood feeding, which is beneficial for the rapid engorgement of hard ticks. Therefore, our results suggest that the differences in the Kunitz/BPTI family between soft and hard ticks may be linked to the evolution of long-term blood feeding in hard ticks. Finally, we propose that the six genes (Isc.218, Isc.190, Isc.255, Isc.196, Isc.180 and Isc.179) identified in our study may be candidate target genes for tick control.


Figure 1. The evolutionary scenario and structural variations of the Kunitz/BPTI family in ticks

Molecular evolutionary mechanisms of protein structure and function

The BRCT domain (BRCA1 C-terminal domain) is an important signaling and protein targeting motif in the DNA damage response system. The BRCT domain, which mainly occurs as a singleton (single BRCT) or tandem pair (double BRCT), contains a phosphate-binding pocket that can bind the phosphate from either a DNA end or a phosphopeptide. In this work, we performed a database search, phylogeny reconstruction, and phosphate-binding pocket comparison to analyze the functional evolution of the BRCT domain. We identified new BRCT-containing proteins in bacteria and eukaryotes, and found that the number of BRCT-containing proteins per genome is correlated with genome complexity. Phylogenetic analyses revealed two groups of single BRCT domains (sGroup I and sGroup II) and two groups of double BRCT domains (dGroup I and dGroup II). These four BRCT groups differ in their phosphate-binding pockets. In eukaryotes, the evolution of the BRCT domain can be divided into three phases. In the first phase, the sGroup I BRCT domain, whose phosphate-binding pocket can bind the phosphate of nicked DNA, entered the eukaryotic genome. In the second phase, the phosphate-binding pocket changed from a DNA-binding type to a protein-binding type in sGroup II. Tandem duplication of the sGroup II BRCT domain gave rise to the double BRCT domain, from which two structurally and functionally distinct groups evolved. The third phase followed the divergence between animals and plants. Both sGroup I and sGroup II BRCT domains originating in this phase lost the phosphate-binding pocket, and many evolved protein-binding sites. Many dGroup I members evolved in this phase, but few dGroup II members were observed. The results further suggest that BRCT domain expansion and functional change in eukaryotes may have been driven by the evolution of the DNA damage response system.


Figure 2. Single and double BRCT domain functional sites.

Protein-protein interaction network and metabolic network

We reconstructed a human heart-specific metabolic network based on transcriptome and proteome data. The resulting model consists of 2803 reactions and 1880 metabolites, which correspond to 1721 active enzymes in the human heart. Using the model, we detected 24 epistatic interactions in the human heart, which are useful in understanding both the structure and function of cardiovascular systems. In addition, we identified a set of 776 potential biomarkers for cardiovascular disease (CVD), whose concentrations are predicted to be either elevated or reduced because of 278 possible dysfunctional cardiovascular-associated genes. The model could also be applied to predict selective drug targets for eight subtypes of CVD. The human heart-specific model provides valuable information for studies of cardiac activity and the development of CVD.


Figure 3. Reconstruction of human heart-specific metabolic network
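The summary above does not say how epistatic interactions were scored. As a rough illustration only, one definition commonly used with metabolic models compares the relative fitness of a double knockout with the product of the two single-knockout fitness values. The function names, the tolerance, and the toy fitness values below are all hypothetical and are not taken from the heart model.

```python
def epistasis(w_a: float, w_b: float, w_ab: float) -> float:
    """Multiplicative epistasis score (an assumed definition, not the study's).

    w_a, w_b: relative fitness of each single knockout (wild type = 1.0).
    w_ab:     relative fitness of the double knockout.
    """
    return w_ab - w_a * w_b


def classify(eps: float, tol: float = 1e-6) -> str:
    """Label an interaction by the sign of its epistasis score."""
    if eps > tol:
        return "positive"   # double knockout is fitter than expected
    if eps < -tol:
        return "negative"   # double knockout is less fit than expected
    return "none"           # the two knockouts act independently


# Toy values: two knockouts that are mild alone but lethal together.
print(classify(epistasis(0.9, 0.8, 0.0)))
```

In practice the fitness values would come from simulating the model with each gene or gene pair removed; that simulation step is outside this sketch.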

Peptide design

Based on analyses of AMPA receptor (AMPAR) evolution, we designed a peptide that selectively disrupts the long-term potentiation (LTP) induced by AMPAR; no such peptide had been reported before this study. Experiments show that the peptide effectively regulates LTP in mice and has a persistent effect on anxiety and depression. This disrupting peptide can therefore be a useful tool for exploring disease mechanisms.


Figure 4. The peptide can regulate LTP

Mechanism of complex disease

Animal models have been extensively used in the study of cardiovascular disease (CVD) and have provided important insights into disease pathogenesis and drug development. However, the level of conservation of gene expression patterns of orthologous genes between human and animal models was unclear. To address this issue, we compared the expression of orthologous genes in human and four animal models (rhesus, rat, mouse and dog), based on 42 normal heart samples with high-quality gene expression data. The results show that the global expression profiles of orthologous genes are highly preserved between the animal models and human. The phylogenetic tree inferred from the gene expression profiles has a topology similar to that of the species tree. However, differentially expressed genes (DEGs) between human and each model were identified, and these four gene sets are enriched in different molecular functions, including hormone–receptor binding and geranyl transferase activity. The 65 DEGs shared by all four sets are involved in thyroid cancer, proteasome systems, aminoacyl-tRNA biosynthesis and glycine, serine and threonine (GST) metabolism, functions that diverge between the models and humans. In addition, 46.2% (30/65) of the shared genes have been experimentally shown to be associated with cardiovascular disease. Next, we constructed a co-expression network based on intra- and inter-species variation to elucidate the altered network organization. The network indicates that these DEGs evolved as modules rather than independently. The integrated heart transcriptome data should provide a valuable resource for the in-depth understanding of cardiology and the development of cardiovascular disease models.


Figure 5. Comparison of expression patterns between human and animal models
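The summary above does not state how the co-expression network was built. A minimal sketch of one common approach, assumed here for illustration, links two genes when the Pearson correlation of their expression profiles across samples exceeds a threshold. The gene names, expression values, and the 0.9 threshold are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no co-variation signal
    return cov / (sd_x * sd_y)


def coexpression_edges(expr, threshold=0.9):
    """Return (gene1, gene2, r) for gene pairs with |r| >= threshold.

    expr maps a gene name to its expression values across samples.
    """
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Toy data: geneA and geneB rise together; geneC does not track them.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
edges = coexpression_edges(toy)
print(edges)
```

Groups of genes that are densely connected in such a network are what a module-level analysis would then look for.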

Computational approach for drug target discovery

An accurate algorithm, CMASA (Contact MAtrix based local Structural Alignment algorithm), has been developed to predict unknown protein functions based on local structural similarity. The algorithm was evaluated on a test set of 164 enzyme families and compared with other methods. The evaluation shows that CMASA is highly accurate (0.96), sensitive (0.86), and fast enough for large-scale functional annotation. Compared with both sequence-based and global structure-based methods, CMASA can find not only remote homologous proteins but also active-site convergence. CMASA was applied to annotate the enzyme catalytic sites of the non-redundant PDB, and it suggested at least 166 putative catalytic sites that cannot be found with the Catalytic Site Atlas (CSA). CMASA can be used in drug target discovery.


Figure 6. The superposition output of CMASA
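To give a feel for what a contact matrix captures, the sketch below builds one from residue coordinates and compares two small sites. It is not the CMASA implementation: the 8.0 Å cutoff, the function names, and the toy coordinates are assumptions, and the residues of the two sites are taken to be already in correspondence, so the search and alignment steps are omitted.

```python
import math


def contact_matrix(coords, cutoff=8.0):
    """Binary matrix: 1 where two distinct residues lie within `cutoff`."""
    n = len(coords)
    return [
        [1 if i != j and math.dist(coords[i], coords[j]) <= cutoff else 0
         for j in range(n)]
        for i in range(n)
    ]


def contact_similarity(m1, m2):
    """Fraction of residue pairs with the same contact state in both sites."""
    n = len(m1)
    if n != len(m2):
        raise ValueError("sites must contain the same number of residues")
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 1.0
    same = sum(1 for i, j in pairs if m1[i][j] == m2[i][j])
    return same / len(pairs)


# Toy sites (hypothetical coordinates, one point per residue).
site_a = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
site_b = [(x + 10.0, y + 10.0, z + 10.0) for x, y, z in site_a]  # translated copy
site_c = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 20.0, 0.0)]    # one residue moved away

print(contact_similarity(contact_matrix(site_a), contact_matrix(site_b)))
print(contact_similarity(contact_matrix(site_a), contact_matrix(site_c)))
```

Because contacts depend only on pairwise distances, the matrix is unchanged by translating or rotating a site, which is what makes this kind of representation convenient for comparing local structure without a prior global superposition.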