A revolution in genomics over the last decade has largely streamlined the process of genome assembly and gene identification, leading to the deposition of hundreds of plant genomes in databases. However, predicting and validating the functions of the identified genes is still a major challenge. In most plant genomes, genes, especially those involved in metabolism, lie in large gene families with dozens of members and are poorly annotated. For example, >80 of the 101 BAHD acyltransferases in cultivated tomato are annotated with just their domain architecture, which is not informative for dissecting the genetics of metabolic traits such as lignin/cuticular wax production, fruit ripening, stress response, and mutualist interactions.
There are two critical bottlenecks here:
The FuncZyme project seeks to address these challenges by:
This project is funded by the National Science Foundation IOS-Plant Genome Research Program Award #2310395.
Enzyme activity data curated from research publications, including various outputs of the FuncFetch pipeline.
We will develop enzyme function prediction models based on Curated Data. These models will be applied to various high-quality sequenced plant genomes, and the resulting predictions will be stored here.