Amaranthus

This is the longest-running genomics project using NGS technologies, ongoing since early 2011. The figure shows the timeline along the bottom and the accomplishments above it, with publications in green.
Initially, finding the grains was a challenge. It was a further revelation that grains bought from the market could be sown to grow plants. However, the lack of literature on the taxonomic classification of grain amaranths in India required us to procure seeds of all three species, hypochondriacus, cruentus and caudatus, from a known vendor using their ornamental names: princess’s feather, autumn touch and love lies bleeding, respectively. Comparing the three grain amaranth species with the plants raised from market grain, all grown to maturity on the campus grounds (see embedded figure, bottom left), made it clear that the grain procured from the market was Amaranthus hypochondriacus (princess’s feather). Meeta, one of our PhD students and a botanist by training, harvested seeds from all three species and maintained a herbarium.
By then, enthusiasm for assembling draft genomes from short reads was spreading through the NGS community worldwide. Tools such as SOAPdenovo had matured enough to assemble genomes from short reads using both paired-end and mate-pair libraries. Paired-end and mate-pair reads with varying insert sizes were generated from the chromosomal DNA of Amaranthus hypochondriacus and assembled by Meeta in 2013. In 2013, long reads from PacBio were becoming popular for obtaining better-quality genome assemblies of eukaryotes, but PacBio sequencing services were then unavailable in India. Subha and Vibha managed to send leaves of A. hypochondriacus to Pullman, Washington, to obtain 25X coverage of PacBio reads. However, IBAB did not have the high-RAM computer required for error correction of PacBio reads, a prerequisite for assembly. Subha recalls using AWS instances late in 2013 to run ECtools, which corrects errors in PacBio reads using contigs assembled from short reads. Believe it or not, that month the AWS bill went as high as $2000.
This incident and other server failures caused enough frustration and tears to convince the Director to obtain a computer with 1 TB of RAM early in 2014. It should be mentioned that tools for assembling error-prone PacBio reads were still in their infancy, and the learning curve was steep. Several interns, including Nivedita, Sowmya and Savita, learnt to assemble and annotate genomes by helping Meeta. Short-read assembly using SOAPdenovo, however, ran much faster on this computer. Soon a manuscript reporting the draft genome and developmental transcriptome of the first C4 dicot and second member of the order Caryophyllales was published (PMID: 25071079). A very proud moment for IBAB!
The next undertaking was to assemble the transcriptomes using reads from 16 samples representing 4 developmental stages. The 1 TB RAM server made this achievable. Using chimeric transcripts from the assembly, we were able to show that one of the key genes in the lysine biosynthetic pathway, DHDPS, lay within 4000 bases of a glycosidase gene transcribed in the opposite direction, such that their 3’UTRs overlapped. This proximity was unique to the genome of Amaranthus hypochondriacus, suggesting a potential role for the glycosidase gene in regulating lysine in seeds. This work was published in 2016 (PMID: 28786999).
By 2015, the PIs of the amaranth project had secured a DBT grant funding both sequencing and manpower. This helped in generating sequences of many other landraces of grain amaranth from India with unknown taxonomy, including Suvarna, marketed by GKVK. By 2016, improved tools to correct errors in PacBio reads were available. Meeta was able to obtain 12.5% corrected reads using a tool called CANU. The corrected reads were assembled with two tools, CANU and FLYE, and the two assemblies were then merged using the QuickMerge tool, improving the L50 from 1885 to 624. By then, a chromosome-level assembly of another strain grown in the US, Plainsman, had become publicly available.
Using mate-pair reads of increasing insert sizes simulated from the Plainsman genome, we reduced the L50 to 56. We also used publicly available HiC data from Plainsman to improve the L50 to 20. This assembly was published in 2020 along with an extensive genomic classification of the IBAB variety (Ah-white) alongside other accessions of grain amaranths from India, which were sequenced at the BioIT center (PMID: 33262776). For the first time, many landraces of grain amaranths from India could be classified along with the known accessions from a collection of amaranth varieties maintained at the Amaranth Institute. Most importantly, our work showed that Suvarna, marketed by GKVK, is Amaranthus cruentus.
In 2018, the PIs of the amaranth project at IBAB secured a grant in collaboration with IISER, Tirupati and the University of Hyderabad to identify genes implicated in oil content, seed size and other desirable phenotypes using a technology called TILLING. This work is ongoing. Some landraces that showed differential phenotypes have now been sequenced.
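For readers unfamiliar with the L50 figures quoted in this account: L50 is the smallest number of contigs or scaffolds whose combined length covers at least half of the assembly, and N50 is the length of the last sequence counted. A lower L50 therefore means a more contiguous assembly, which is why a drop from 1885 to 20 counts as an improvement. The following is a minimal sketch of that calculation, not the pipeline used in the project; the function name `n50_l50` and the toy contig lengths are assumptions made purely for illustration.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig or scaffold lengths.

    Hypothetical helper for illustration only; it is not part of
    SOAPdenovo, CANU, FLYE or QuickMerge.
    """
    ordered = sorted(lengths, reverse=True)  # longest sequence first
    if not ordered:
        raise ValueError("at least one sequence length is required")

    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # 'length' is the N50, 'count' is the L50
            return length, count

    raise AssertionError("unreachable: running sum always reaches half the total")


# Toy lengths (made up): a fragmented assembly and a merged version of it
fragmented = [80, 70, 50, 40, 30, 20, 10]
merged = [150, 90, 30, 20, 10]

print(n50_l50(fragmented))
print(n50_l50(merged))
```

Both toy assemblies total 300 bases, but the merged one reaches the halfway point with fewer, longer sequences, so its L50 is lower and its N50 is higher.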