Data formats

Multiple Sequence Alignment

Protein or nucleic acid sequences are accepted and returned as a JSON dictionary. VisuaLife generally does not check whether the sequences are properly aligned, so any sequence database may be stored in this format. The dictionary was, however, designed to hold data parsed from one of the following Multiple Sequence Alignment (MSA) file formats:

  • ALN (Clustal-W output)
  • FASTA
  • MSF

Each sequence read from a file is loaded into a JSON-like dictionary with the following keys:

  • sequence: the sequence itself, in the single-letter code
  • description: longer free-text description of a sequence
  • id: short string identifying a sequence in a database, e.g. PAHAL_7G158700
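
To make the layout concrete, here is a minimal sketch of how FASTA text could be turned into such dictionaries. It is not VisuaLife's own parser: the function name parse_fasta, the rule that the first header token becomes the id, and the sample header text are all assumptions made for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of entry dictionaries (hypothetical helper)."""
    entries = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: first token of the header is the id,
            # anything after it is the free-text description.
            parts = line[1:].strip().split(None, 1)
            current = {"sequence": ""}
            if parts:
                current["id"] = parts[0]
            if len(parts) > 1:
                current["description"] = parts[1]
            entries.append(current)
        elif current is not None:
            # Sequence lines may be wrapped; join them into one string.
            current["sequence"] += line
    return entries


# Made-up sample data; only the id comes from the text above.
fasta = """>PAHAL_7G158700 example description
MKT-AYIAKQR
QISFVKSHFS
>seq2
MKTAAYLAKQR
"""
entries = parse_fasta(fasta)
```

In this sketch the second entry has no description key at all, rather than an empty string, which mirrors the fact that a key may simply be absent.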

It is not guaranteed that all three keys will be present in every entry.
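
Because a key may be missing, code that consumes these entries is safer reading them with dict.get than with direct indexing. The sketch below shows one way to do that; the helper names entry_label and entry_length, the fallback order, and the sample entries are hypothetical and not part of VisuaLife.

```python
def entry_label(entry, fallback="unnamed"):
    """Pick a display label: id first, then description, then a fallback."""
    return entry.get("id") or entry.get("description") or fallback


def entry_length(entry):
    """Length of the sequence, or 0 when the entry has no sequence key."""
    return len(entry.get("sequence", ""))


# Made-up entries, each missing a different key.
entries = [
    {"id": "PAHAL_7G158700", "sequence": "MKTAYIAKQR"},
    {"description": "sequence without an id", "sequence": "MKTAYLAKQR"},
    {"id": "seq3"},
]
labels = [entry_label(e) for e in entries]
lengths = [entry_length(e) for e in entries]
```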