
seqs: the maximum number of aligned sequences to keep.

This operation allows performing different BLAST queries using the selected FASTA files. Regarding the database used in the queries, there are two possible modes: querying against all the selected FASTA files or querying against each FASTA file separately. Regarding the query, there are also two possibilities: using the sequences in one of the selected FASTA files as queries or using the sequences in an external FASTA file as queries. In either case, one BLAST query is executed for each sequence in the query FASTA file. The figure below illustrates the process followed when a query against all selected FASTA files is performed.

Consensus bases: the strategy used to select the bases of the consensus sequence at each position. Two options are available:

Most frequent: considers the most frequent nucleotide (DNA) or amino acid (protein) bases at each position. Positions where the most frequent base is below the Minimum presence threshold are represented by an N (nucleotide sequences) or an X (protein sequences) in the consensus sequence.

Above threshold: considers all nucleotide (DNA) or amino acid (protein) bases with a frequency above the Minimum presence threshold at each position. Positions where all base frequencies are below the Minimum presence threshold are represented by an N (nucleotide sequences) or an X (protein sequences) in the consensus sequence.

Minimum presence: the minimum presence that a given nucleotide or amino acid must reach in order to be part of the consensus sequence. Read the Consensus bases description to understand how this option is used in each case.

Verbose: in protein sequences, when this option is unselected, X is used for ambiguous positions in the consensus sequence. When this option is selected, all amino acids in such positions are reported.

Reformat output file: allows specifying the format parameters of the output FASTA file containing the consensus sequence (see section Reformat file to learn more about this formatting).
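The two Consensus bases modes and the Minimum presence threshold can be illustrated with a minimal Python sketch. It is not SEDA's implementation, and the function name `consensus` and its parameters are hypothetical. Several behaviours are assumptions, because the text does not specify them:

- Minimum presence is a fraction between 0 and 1.
- A frequency exactly equal to the threshold counts as passing.
- Gap characters (`-`) count toward the column size but are never chosen.
- Ties under Most frequent keep every tied base.
- Nucleotide positions with several selected bases use the standard IUPAC ambiguity codes.
- Verbose protein output lists the amino acids in brackets.

```python
from collections import Counter

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they represent.
# Using them for multi-base DNA positions is an assumption of this sketch.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus(aligned, mode="most_frequent", min_presence=0.5, protein=False, verbose=False):
    """Hypothetical consensus builder for equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned (non-empty and of equal length)")
    missing = "X" if protein else "N"  # symbol for positions with no accepted base
    out = []
    for column in zip(*(s.upper() for s in aligned)):
        # Gaps are counted in the column size but are never candidate bases (assumption).
        counts = Counter(b for b in column if b != "-")
        freqs = {b: c / len(column) for b, c in counts.items()}
        if mode == "most_frequent":
            top = max(freqs.values(), default=0.0)
            # Keep the most frequent base(s), but only if they reach the threshold.
            chosen = {b for b, f in freqs.items() if f == top and f >= min_presence}
        elif mode == "above_threshold":
            # Keep every base whose frequency reaches the threshold.
            chosen = {b for b, f in freqs.items() if f >= min_presence}
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        if not chosen:
            out.append(missing)
        elif len(chosen) == 1:
            out.append(next(iter(chosen)))
        elif protein:
            # Verbose lists all candidate amino acids; otherwise the position becomes X.
            out.append("[" + "".join(sorted(chosen)) + "]" if verbose else "X")
        else:
            out.append(IUPAC.get(frozenset(chosen), "N"))
    return "".join(out)
```

For example, `consensus(["ACGT", "ACGA", "ATGA"])` keeps the majority base in each column. Switching to `mode="above_threshold", min_presence=0.3` lets the minority bases through as well, so ambiguous columns become IUPAC codes.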
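For the BLAST operation described earlier, a minimal Python sketch can show how the two database modes differ in terms of the NCBI BLAST+ command-line tools. This is not SEDA's code, and the following are all hypothetical: the helper `plan_blast_commands`, the database naming scheme, and the output file names.

The `makeblastdb` and `blastn` flags used below (`-in`, `-dbtype`, `-out`, `-query`, `-db`, `-outfmt`, `-max_target_seqs`) are standard BLAST+ options. Some points rest on assumptions:

- Mapping the "seqs" parameter (the maximum number of aligned sequences to keep) to `-max_target_seqs` is an assumption.
- Paths are assumed to contain no spaces, because BLAST+ splits a multi-database `-db` value on spaces.
- A multi-sequence query FASTA is searched one record at a time by BLAST+, which matches the "one query per sequence" behaviour.

```python
from pathlib import Path


def plan_blast_commands(query_fasta, selected_fastas, mode, program="blastn",
                        dbtype="nucl", max_target_seqs=500):
    """Hypothetical planner returning (makeblastdb commands, search commands).

    mode == "all":  a single search of the queries against every selected FASTA.
    mode == "each": one search per selected FASTA file.
    Assumes paths contain no spaces.
    """
    if mode not in ("all", "each"):
        raise ValueError(f"unknown mode: {mode!r}")
    # One BLAST database per selected FASTA file, named after the file without its suffix.
    db_names = [str(Path(f).with_suffix("")) for f in selected_fastas]
    makedb = [["makeblastdb", "-in", str(f), "-dbtype", dbtype, "-out", db]
              for f, db in zip(selected_fastas, db_names)]
    # Options shared by every search; the query file may be one of the selected
    # FASTA files or an external one.
    common = [program, "-query", str(query_fasta), "-outfmt", "6",
              "-max_target_seqs", str(max_target_seqs)]
    if mode == "all":
        # BLAST+ accepts several databases as one space-separated -db value.
        searches = [common + ["-db", " ".join(db_names), "-out", "all.tsv"]]
    else:
        searches = [common + ["-db", db, "-out", Path(db).name + ".tsv"]
                    for db in db_names]
    return makedb, searches
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` on a machine with BLAST+ installed. The sketch only builds the commands, which keeps the difference between the two modes easy to inspect.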
