Bio Saga

Wednesday, July 27, 2011

New Algorithm for detection of viral sequence fragments of HIV-1 subfamilies

Background

Methods for determining whether a particular HIV-1 sequence stems, completely or in part, from some unknown HIV-1 subtype are important for the design of vaccines and molecular detection systems, as well as for epidemiological monitoring. Nevertheless, only one algorithm, the Branching Index (BI), has been developed for this task so far. Moving along the genome of a query sequence in a sliding window, the BI computes a ratio quantifying how closely the query sequence clusters with a subtype clade. In its current version, however, the BI does not provide predicted boundaries of unknown fragments.

Results

We have developed Unknown Subtype Finder (USF), an algorithm based on a probabilistic model, which automatically determines which parts of an input sequence originate from an as yet unknown subtype. The underlying model is based on a simple profile hidden Markov model (pHMM) for each known subtype and an additional pHMM for an unknown subtype. The emission probabilities of the latter are estimated from the emission frequencies of the known subtypes by means of a (position-wise) probabilistic model for the emergence of new subtypes. We have applied USF to SIV and HIV-1 sequences formerly classified as having emerged from an unknown subtype. Moreover, we have evaluated its performance on artificial HIV-1 recombinants and non-recombinant HIV-1 sequences. The results have been compared with the corresponding results of the BI.
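To make the idea a bit more concrete, here is a toy sketch in Python. It is not the USF implementation. It assumes gap-free, pre-aligned sequences over A/C/G/T, replaces each full pHMM with a plain per-column frequency table (no insert or delete states), and uses a made-up rule for the unknown subtype: the mean of the known frequencies blended with a uniform distribution. The names column_profile, unknown_profile, viterbi_labels and classify, and the parameters pseudocount, mix and stay, are all hypothetical.

```python
import math

ALPHABET = "ACGT"

def column_profile(seqs, pseudocount=0.1):
    """Per-column nucleotide frequencies for one subtype's aligned sequences."""
    profile = []
    for i in range(len(seqs[0])):
        counts = {b: pseudocount for b in ALPHABET}
        for s in seqs:
            if s[i] in counts:
                counts[s[i]] += 1.0
        total = sum(counts.values())
        profile.append({b: c / total for b, c in counts.items()})
    return profile

def unknown_profile(profiles, mix=0.5):
    """Assumed stand-in for the unknown-subtype emissions:
    mean of the known frequencies, blended with a uniform distribution."""
    out = []
    for i in range(len(profiles[0])):
        col = {}
        for b in ALPHABET:
            mean = sum(p[i][b] for p in profiles) / len(profiles)
            col[b] = (1.0 - mix) * mean + mix / len(ALPHABET)
        out.append(col)
    return out

def viterbi_labels(query, models, stay=0.9):
    """Most likely model label per position; switching models is penalised."""
    names = list(models)
    switch = (1.0 - stay) / (len(names) - 1)
    # Uniform start distribution over all models (known and unknown).
    score = {n: math.log(1.0 / len(names)) + math.log(models[n][0][query[0]])
             for n in names}
    back = []
    for i in range(1, len(query)):
        new_score, ptr = {}, {}
        for n in names:
            best = max(names,
                       key=lambda m: score[m] + math.log(stay if m == n else switch))
            trans = math.log(stay if best == n else switch)
            new_score[n] = score[best] + trans + math.log(models[n][i][query[i]])
            ptr[n] = best
        score = new_score
        back.append(ptr)
    # Trace the best path back from the last position.
    state = max(names, key=lambda n: score[n])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def classify(query, subtypes):
    """Label each query position with a known subtype name or 'unknown'."""
    models = {name: column_profile(seqs) for name, seqs in subtypes.items()}
    models["unknown"] = unknown_profile(list(models.values()))
    return viterbi_labels(query, models)

if __name__ == "__main__":
    subtypes = {"A": ["AAAAAAAAAAAA"] * 2, "B": ["CCCCCCCCCCCC"] * 2}
    print(classify("AAAAAAGGGGGG", subtypes))
```

With these toy inputs the first six positions are labelled "A" and the last six "unknown", so the change of label marks a predicted boundary of the unknown fragment. The stay parameter controls how reluctant the model is to switch labels; a real analysis would need alignment handling and properly estimated parameters.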

Conclusions

Our results demonstrate that USF is suitable for detecting segments in HIV-1 sequences stemming from yet unknown subtypes. Comparing USF with the BI shows that our algorithm performs as well as the BI or better.

Figure 1. Method outline. Outline of the steps of the method, assuming 2 subtypes, each composed of 2 sequences. The following color code for columns and nucleotides, respectively, is used in the topmost part: green - completely conserved columns (with respect to all subtype sequences and the query sequence), red - columns removed due to an insertion in the query sequence or too many gaps in the alignment, yellow - minority nucleotides in the column. The part of the sequence coloured in gray (at the bottom) indicates a subtype yet unknown. The last row gives the classification of the query sequence into known and unknown positions.

Do you wish to know more?

Life Science and Informatics

What is this?
Is this a new industry?
Or old wine in a new bottle?

Well, Life Science and Informatics can be anything from computational biology, all the -omes and -omics, and core bioinformatics to curation, literature mining, and database creation in the biology, chemistry, and biochemistry space.

There are a number of such companies in India, and Bangalore is at the forefront as a major bio-cluster, with 20 to 30 companies in this sphere.

Now, how well are these companies doing?
How good are they in terms of the international markets, and how profitable is their business?
What do they do?
Who are their clients?

These are some interesting things that could be discussed in this blog page...
