Genomics is seeing a steep increase in model development initiatives that employ advanced machine learning methodologies. Of particular interest are models that predict certain genomic characteristics based solely on DNA sequence. These models, however, treat DNA as a mere string of four letters (A, T, G and C), disregarding earlier scientific advances that make it possible to use more intricate information from nucleic acid sequences. Here, we provide a comprehensive database of quantum mechanical (QM) and geometric features for all possible 7-meric DNA sequences in their representative B, A and Z conformations. The database was generated using the applicable QM methodologies, which are computationally expensive and time-consuming. It thus makes it straightforward to associate a wealth of novel molecular features with any DNA sequence, by scanning the sequence with a matching k-meric window and pulling the pre-computed values from our database for further use in modelling. We demonstrate the usefulness of the deposited features through their exclusive use in developing a model for A->C mutation rates.
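The window-scanning lookup described in the abstract can be illustrated with a minimal Python sketch. It assumes the pre-computed values have already been loaded into an in-memory dictionary keyed by 7-mer. The database's actual file format and feature names are not given here, so the table contents, the `feature_x` key, and the function names `iter_kmers` and `annotate` are all hypothetical, and the numbers are made up for illustration.

```python
from typing import Dict, Iterator, List, Optional, Tuple

K = 7  # window length matching the 7-meric entries described in the abstract

Features = Dict[str, float]


def iter_kmers(sequence: str, k: int = K) -> Iterator[Tuple[int, str]]:
    """Yield (start index, k-mer) for every full window along the sequence."""
    seq = sequence.upper()
    for start in range(len(seq) - k + 1):
        yield start, seq[start:start + k]


def annotate(
    sequence: str, table: Dict[str, Features], k: int = K
) -> List[Tuple[int, str, Optional[Features]]]:
    """Attach pre-computed features to the central position of each window.

    Windows missing from the table (for example, ones containing N)
    are reported with None instead of raising an error.
    """
    rows = []
    for start, kmer in iter_kmers(sequence, k):
        centre = start + k // 2  # 0-based index of the window's middle base
        rows.append((centre, kmer, table.get(kmer)))
    return rows


# Hypothetical stand-in for the real database: made-up values, made-up feature name.
toy_table: Dict[str, Features] = {
    "ACGTACG": {"feature_x": 0.1},
    "CGTACGT": {"feature_x": 0.2},
}

for centre, kmer, features in annotate("acgtacgtn", toy_table):
    print(centre, kmer, features)
```

Reporting unmatched windows as `None` rather than failing is a design choice of this sketch; a real pipeline might instead skip them or handle ambiguous bases explicitly. Positions within three bases of either end of the sequence have no full 7-mer window and therefore receive no features.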

Original publication

DOI

10.1038/s41597-024-03772-5

Type

Journal article

Journal

Sci Data

Publication Date

22/08/2024

Volume

11

Keywords

Machine Learning, DNA, Quantum Theory