PREreview of Giant genes are rare but implicated in cell wall degradation by predatory bacteria

by Craig McCormick

Published: March 29, 2024
DOI: 10.5281/zenodo.10895367
License: CC BY 4.0

We, the students of MICI5029/5049, a Graduate Level Molecular Pathogenesis Journal Club at Dalhousie University in Halifax, NS, Canada, hereby submit a review of the following bioRxiv preprint:

Giant genes are rare but implicated in cell wall degradation by predatory bacteria

Jacob West-Roberts, Luis Valentin-Alvarado, Susan Mullen, Rohan Sachdeva, Justin Smith, Laura A. Hug, Daniel S. Gregoire, Wentso Liu, Tzu-Yu Lin, Gabriel Husain, Yuki Amano, Lynn Ly, Jillian F. Banfield

bioRxiv 2023.11.21.568195; doi: https://doi.org/10.1101/2023.11.21.568195

We will adhere to the Universal Principled (UP) Review guidelines proposed in:

Universal Principled Review: A Community-Driven Method to Improve Peer Review. Krummel M, Blish C, Kuhns M, Cadwell K, Oberst A, Goldrath A, Ansel KM, Chi H, O'Connell R, Wherry EJ, Pepper M; Future Immunology Consortium. Cell. 2019 Dec 12;179(7):1441-1445. doi: 10.1016/j.cell.2019.11.029

SUMMARY: Ultra-small bacteria in the phylum Omnitrophota contain some open reading frames (ORFs) greater than 20 kbp that should theoretically encode giant proteins (>7000 amino acids). There has been speculation that some features of these giant proteins might allow Omnitrophota to engage in parasitic or predatory activities, but in the absence of isolation and culture of these bacteria, little concrete evidence supports this idea. Here, the authors provide a survey of predicted giant proteins in Omnitrophota and compare the phylogenetic distribution of giant proteins in bacteria and archaea. They found archaea to have fewer giant genes than bacteria, and amongst the bacteria, Omnitrophota was found to have the highest average copy number of ORFs predicted to encode protein products greater than 10,000 amino acids. Most giant protein sequences contained multiple transmembrane domains as well as peptidase domains, which suggest localization to the cell membrane and proteolytic activity. Additionally, similarity between Omnitrophota giant genes and those of other predatory bacteria indicates that these proteins might be involved in the degradation of cell walls of prey species. Finally, in silico structural analysis predicted that the encoded giant proteins form novel tubule-like structures of unknown function. This study highlights the need for further work to understand what, if any, utility is provided by possessing giant genes and proteins.

OVERALL ASSESSMENT: Reader engagement with this fascinating research topic was undermined by poor connections between the figures and the text, with some figures not discussed in the text at all. We noted several instances where figure annotation and legends could be improved for clarity. Take-home messages should be expressed clearly throughout the manuscript. Our readers would have benefitted from a longer introduction that could provide more context and field-specific evidence for giant proteins and their functions. Considering the large size of the putative ORFs, we noted with concern that the authors identified metagenomic assembly quality as a hurdle; accordingly, we wanted to see stronger evidence that the authors had achieved the metagenomic assembly quality necessary to confidently assign the reported giant ORFs.

DETAILED U.P. ASSESSMENT:

OBJECTIVE CRITERIA (QUALITY)

1. Quality: Experiments (1–3 scale; note: 1 is best on this scale) SCORE = 2.5

· Figure by figure, do experiments, as performed, have the proper controls? Do analyses use the best-possible (most unambiguous) available methods quantified via appropriate statistical comparisons?

· In general, the authors provide transparency regarding the methodology and limitations. This aspect of the manuscript could be improved by providing some clarity about the source of the 46 giant genes from the 1873 genomes. The authors described how they curated 3 whole genomes. But of the 46 giant genes, how many came from the curated whole genomes and how many from the metagenomic data?

· We have concerns about the methodology used to predict protein structure. For these predictions, the authors split the amino acid sequences into chunks of 1000 residues, at 500 amino acid intervals. This means that most residues are covered by only 2 overlapping segments when building a picture of the entire protein structure. We would have appreciated an effort to bootstrap this process to improve the “depth” of coverage, for example by using 1000-residue chunks at every 100 amino acids, or a combination of overlapping long and short sequences. In general, we struggled to understand how the authors stitched the sequences together.
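
To illustrate the coverage point above, the following is a minimal sketch (not the authors' pipeline) that tiles a sequence with fixed-size windows and counts how many windows cover each residue. The function names `tile` and `depth` and the 10,000-residue protein length are hypothetical choices made only for illustration.

```python
def tile(length, size, step):
    """Return half-open (start, end) windows covering a sequence of `length` residues."""
    spans = []
    start = 0
    while True:
        end = min(start + size, length)
        spans.append((start, end))
        if end == length:  # the last window has reached the C-terminus
            break
        start += step
    return spans


def depth(length, spans):
    """Count how many windows cover each residue position."""
    counts = [0] * length
    for start, end in spans:
        for i in range(start, end):
            counts[i] += 1
    return counts


# Hypothetical 10,000-residue giant protein (an assumption for illustration).
protein_length = 10_000

# Scheme described in the preprint: 1000-residue chunks every 500 residues.
coarse = tile(protein_length, size=1000, step=500)
# Scheme suggested in this review: 1000-residue chunks every 100 residues.
fine = tile(protein_length, size=1000, step=100)

print(len(coarse), max(depth(protein_length, coarse)))
print(len(fine), max(depth(protein_length, fine)))
```

Under these assumptions, a 500-residue step gives an interior depth of 2 windows per residue, whereas a 100-residue step gives an interior depth of 10, at the cost of roughly five times as many structure predictions.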

2. Quality: Completeness (1–3 scale) SCORE = 2.5

· Does the collection of experiments and associated analysis of data support the proposed title- and abstract-level conclusions? Typically, the major (title- or abstract-level) conclusions are expected to be supported by at least two experimental systems.

· The current dataset does not sufficiently support the authors’ title- and abstract-level conclusions. While the authors acknowledge that metagenomic assembly quality is a hurdle to accurately identifying a giant ORF, they do not give the reader confidence that they have achieved the metagenomic assembly quality necessary to predict genes of this size. For example, hybrid sequencing was done on only a minority of samples, and the language describing sequence polishing suggests a lack of expertise with the subject.

· The links between the giant genes, the Omnitrophota phylum, and cell wall degradation in support of a predatory lifestyle remain underdeveloped. Including hypotheses and model diagrams in the Discussion about how giant genes are thought to contribute to bacterial fitness and predation in their ecological niche (given that there have been other papers published about this phylum and the authors collected DNA from environmental samples) may help the reader. See the following paper: https://ami-journals.onlinelibrary.wiley.com/doi/full/10.1111/1462-2920.16170

· Are there experiments or analyses that have not been performed but if ‘‘true’’ would disprove the conclusion (sometimes considered a fatal flaw in the study)? In some cases, a reviewer may propose an alternative conclusion and abstract that is clearly defensible with the experiments as presented, and one solution to ‘‘completeness’’ here should always be to temper an abstract or remove a conclusion and to discuss this alternative in the discussion section.

· Higher quality assembly of this region could reveal stop codons, cleavage sites, or other genetic determinants suggesting this region is not transcribed and translated as a single protein.
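
As a rough illustration of the kind of check described above, the sketch below scans a nucleotide ORF for in-frame stop codons that occur before the final codon. It assumes the three stop codons of the standard bacterial genetic code (TAA, TAG, TGA); lineages with an alternative genetic code would need a different set. The function name `internal_stops` and the toy sequences are hypothetical and are not taken from the preprint.

```python
# Assumption: stop codons of the standard bacterial genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stops(orf):
    """Return 0-based nucleotide positions of in-frame stop codons before the final codon."""
    orf = orf.upper()
    # Start index of the last complete codon; it is excluded because a
    # terminal stop codon is expected there.
    last_codon_start = len(orf) - len(orf) % 3 - 3
    return [
        i
        for i in range(0, last_codon_start, 3)
        if orf[i:i + 3] in STOP_CODONS
    ]


# Toy sequences (hypothetical): the first is an uninterrupted ORF,
# the second carries a premature in-frame TAG at nucleotide position 3.
print(internal_stops("ATGAAATAA"))
print(internal_stops("ATGTAGAAATAA"))
```

A non-empty result on a re-assembled or polished sequence would suggest that the region is not translated as a single protein.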

3. Quality: Reproducibility (1–3 scale) SCORE = 2

· Figure by figure, were experiments repeated per a standard of 3 repeats or 5 mice per cohort, etc.?

· Is there sufficient raw data presented to assess the rigor of the analysis?

· More long- and short-read sequencing at higher depths could provide the confidence needed to support the authors’ claims.

· Newly sequenced genomes are not being uploaded to NCBI until publication, which makes it difficult to assess their analysis.

· Are methods for experimentation and analysis adequately outlined to permit reproducibility?

· “Predicted folded structures calculated using Alphafold2 (see Methods).” There is some description of how the authors used AlphaFold, but the Methods do not name it and it is not cited.

· We appreciate that the authors have made their code available on FigShare.

· If a ‘‘discovery’’ dataset is used, has a ‘‘validation’’ cohort been assessed and/or has the issue of false discovery been addressed? N/A

4. Quality: Scholarship (1–4 scale but generally not the basis for acceptance or rejection) SCORE = 1.5

· Has the author cited and discussed the merits of the relevant data that would argue against their conclusion?

· Has the author cited and/or discussed the important works that are consistent with their conclusion and that a reader should be especially familiar when considering the work?

· Specific (helpful) comments on grammar, diction, paper structure, or data presentation (e.g., change a graph style or color scheme) go in this section, but scores in this area should not be significant basis for decisions.

· The Introduction is quite brief and does not properly equip the reader to appreciate the major findings of the manuscript or place them in the context of the field, particularly with regard to these bacteria and fitness in their ecological niche (i.e. predation).

· We recommend that the authors cite relevant literature throughout the Results and Discussion sections to properly support statements and provide the reader with the tools to investigate further.

· Headings should be more informative throughout.

· Figure-by-figure:

· Fig 1: To improve comprehension, we suggest the authors add labels to Fig 1A (the rings are not labelled, so the reader must find this information in the figure legend). Additionally, increasing the font size on the x-axis and providing axis labels for Figures 1B & 1C would make them easier to understand. The scales differ between B & C, which gives the false impression that these values are similar (unless the reader can zoom in to read the x-axis scales).

· Figs 2 and 3: These figures should be deleted because they are not mentioned in the Results text.

· Fig 7: This figure is mis-referenced in the “Conserved functionalities encoded in proximity to giant genes” section of the Results.

MORE SUBJECTIVE CRITERIA (IMPACT):

1. Impact: Novelty/Fundamental and Broad Interest (1–4 scale) SCORE= 2.5

A score here should be accompanied by a statement delineating the most interesting and/or important conceptual finding(s), as they stand right now with the current scope of the paper. A ‘‘1’’ would be expected to be understood for the importance by a layperson but would also be of top interest (have lasting impact) on the field.

How big of an advance would you consider the findings to be if fully supported but not extended?

· Insufficiently supported conclusions and the poor quality of presentation greatly undermine the potential impact of research that should be of great interest to a diverse readership. Furthermore, we suggest that the authors more clearly emphasize their main conclusions and include some discussion of concrete future directions.

2. Impact: Extensibility (1–4 or N/A scale) SCORE = N/A

Has an initial result (e.g., of a paradigm in a cell line) been extended to be shown (or implicated) to be important in a bigger scheme (e.g., in animals or in a human cohort)? This criterion is only valuable as a scoring parameter if it is present, indicated by the N/A option if it simply doesn’t apply. The extent to which this is necessary for a result to be considered of value is important. It should be explicitly discussed by a reviewer why it would be required. What work (scope and expected time) and/or discussion would improve this score, and what would this improvement add to the conclusions of the study? Care should be taken to avoid casually suggesting experiments of great cost (e.g., ‘‘repeat a mouse-based experiment in humans’’) and difficulty that merely confirm but do not extend (see Bad Behaviors, Box 2).

· N/A.

Competing interests

The authors declare that they have no competing interests.