Curated Data

FuncFetch is a workflow that integrates NCBI E-Utilities, OpenAI's GPT-4, and Zotero to screen thousands of manuscripts and extract enzyme activities. This process involves querying PubMed, screening abstracts with GPT-4, and collecting PDFs using Zotero. GPT-4 extracts enzyme information from the PDFs and deposits it in a tabular file. The extracted data is then manually curated to ensure accuracy.
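
As a rough illustration of the first step (querying PubMed through NCBI E-Utilities), the sketch below builds a request URL for the public ESearch endpoint. It is not the FuncFetch code: the function name `build_pubmed_query`, the search term, and the `retmax` value are hypothetical and chosen only for this example.

```python
from urllib.parse import urlencode

# Public NCBI E-Utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(term: str, retmax: int = 100) -> str:
    """Return an ESearch URL that looks up `term` in PubMed.

    Hypothetical helper for illustration; the real pipeline may build
    its queries differently.
    """
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": term,        # free-text query
        "retmax": retmax,    # maximum number of PMIDs to return
        "retmode": "json",   # ask for a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Example search term (an assumption, not taken from the pipeline).
    print(build_pubmed_query("acyltransferase enzyme activity", retmax=50))
```

Fetching the resulting URL returns a list of PubMed IDs, whose abstracts could then be passed on to the screening step.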

The table below contains three key outputs from the FuncFetch pipeline:

  • Flagged Abstracts: Text files from Step 2.
  • Minimally Curated Set: Tabular files from Step 5.
  • Verified Set: Fully verified entries from Step 6.

This project is funded by the National Science Foundation IOS-Plant Genome Research Program, Award #2310395 (link). For more information, read our paper: bioRxiv.