The aim of this study is to analyze viruses using parameters obtained from distributions of nucleotide sequences in the viral RNA. To keep the input data homogeneous, we analyze single-stranded RNA viruses only. Two approaches are used to obtain the nucleotide sequences. In the first, chunks of equal length (four nucleotides) are considered. In the second, the whole RNA genome is divided into parts by treating adenine or the most frequent nucleotide as a “space”.
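The two ways of obtaining nucleotide sequences can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `fixed_chunks` and `split_by_space` are hypothetical, and two details the text does not specify are assumed here, namely that a trailing remainder shorter than the chunk length is dropped and that empty pieces between adjacent "space" nucleotides are discarded.

```python
from collections import Counter
from typing import List, Optional


def fixed_chunks(genome: str, size: int = 4) -> List[str]:
    """First approach: cut the genome into non-overlapping chunks of equal length."""
    # Assumption: a trailing remainder shorter than `size` is dropped.
    usable = len(genome) - len(genome) % size
    return [genome[i:i + size] for i in range(0, usable, size)]


def split_by_space(genome: str, space: Optional[str] = None) -> List[str]:
    """Second approach: divide the genome at every occurrence of a 'space' nucleotide."""
    if space is None:
        # Default to the most frequent nucleotide in this genome.
        space = Counter(genome).most_common(1)[0][0]
    # Assumption: empty pieces between adjacent separators are discarded.
    return [part for part in genome.split(space) if part]


# Toy sequence for illustration only, not real viral data.
toy_genome = "AUGGCAUUAGCA"
chunks = fixed_chunks(toy_genome)            # ['AUGG', 'CAUU', 'AGCA']
parts = split_by_space(toy_genome, "A")      # ['UGGC', 'UU', 'GC']
```

Passing `space="A"` corresponds to using adenine as the separator; omitting the argument falls back to the most frequent nucleotide of the given genome.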
Rank-frequency distributions are studied in both cases. The nucleotide sequences so defined are signs comparable, to a certain extent, to syllables or words, as seen from the nature of their rank-frequency distributions. Within the first approach, the Pólya and the negative hypergeometric distributions yield the best fit. For the distributions obtained within the second approach, we calculated a set of parameters, including entropy, mean sequence length, and its dispersion.
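As a rough illustration of the quantities named above, the following Python sketch builds a rank-frequency list and computes entropy, mean sequence length, and dispersion for a list of sequences. The exact definitions are assumptions, since the text does not give them: entropy is taken as Shannon entropy with the natural logarithm over relative frequencies of distinct sequences, the mean length is averaged over all occurrences (tokens), and dispersion is taken as the population variance of the lengths. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def rank_frequency(sequences: List[str]) -> List[Tuple[str, int]]:
    """Return (sequence, count) pairs ordered from most to least frequent."""
    return Counter(sequences).most_common()


def distribution_parameters(sequences: List[str]) -> Dict[str, float]:
    """Compute entropy, mean sequence length, and its dispersion."""
    n = len(sequences)
    counts = Counter(sequences)
    # Assumption: Shannon entropy with the natural logarithm.
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    # Assumption: lengths are averaged over all occurrences, not distinct types.
    lengths = [len(s) for s in sequences]
    mean_length = sum(lengths) / n
    # Assumption: dispersion is the population variance of the lengths.
    dispersion = sum((length - mean_length) ** 2 for length in lengths) / n
    return {"entropy": entropy, "mean_length": mean_length, "dispersion": dispersion}


# Toy input for illustration only, not real viral data.
toy_sequences = ["UGGC", "UU", "GC", "UU"]
ranked = rank_frequency(toy_sequences)
params = distribution_parameters(toy_sequences)
```

Each virus then yields one point per pair of parameters, for example (`entropy`, `mean_length`), which is the kind of plane on which genomes can be compared.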
The calculated parameters became the basis for the classification of viruses. We observed that proximity of viruses on planes spanned by various pairs of parameters corresponds to related species. In certain cases, such proximity is also observed for unrelated species, which calls for expanding the set of parameters used in the classification.
We also observed that the fifth most frequent nucleotide sequences obtained within the second approach are of a different nature in the case of human coronaviruses (different nucleotides for MERS, SARS-CoV, and SARS-CoV-2 versus identical nucleotides for four other coronaviruses). We expect that our findings will be useful as a supplementary tool in the classification of diseases caused by RNA viruses with respect to severity and contagiousness.