The following code example takes a 454scaffold file that was used by genemark to predict genes and returns a collection of ChromosomeSequences.
To locate exon-intron positions in previously uncharacterised genes (or predicted genes) that have not yet been associated with cDNAs, we performed spliced alignment, maintaining maximum similarity between homologues, as described in Methods.
GACK, however, predicts additional divergent genes in these previously identified variable loci, suggesting that the level of divergence may be even greater than determined by the original authors.
Despite generating a large number of models, gene-finding algorithms were less successful at predicting heterochromatic genes than euchromatic genes.
"There are stretches of sequence identity in regions where there are not predicted genes, and much more than you would predict from regulatory regions," said Dr. J. Craig Venter, president of Celera.
The Trichoplax genome contains about 98 million base pairs and 11,514 predicted protein-coding genes.
Instead of probing for sequences of known or predicted genes that may be dispersed throughout the genome, tiling arrays probe intensively for sequences which are known to exist in a contiguous region.
They have difficulty predicting genes with unusual structures, as well as those with a weak translation start signal, weak splice sites or single exon genes.
Dr. Cooke said his next project was to test whether all 40,000 of these predicted human genes were real.
These data show that predicted genes tend to have a higher rate of amino-acid substitution than known genes in the genus Drosophila .