package biocaml

The OCaml Bioinformatics Library

Sources

biocaml-0.11.2.tbz
sha256=fae219e66db06f81f3fd7d9e44717ccf2d6d85701adb12004ab4ae6d3359dd2d
sha512=f6abd60dac2e02777be81ce3b5acdc0db23b3fa06731f5b2d0b32e6ecc9305fe64f407bbd95a3a9488b14d0a7ac7c41c73a7e18c329a8f18febfc8fd50eccbc6


Module Biocaml_unix.Phred_score

PHRED quality scores.

A PHRED score is defined as -10 * log_10(p) rounded to an integer, where p is the estimated probability that a base call is incorrect.

To conserve space, the integer value of a PHRED score is encoded as an ASCII character in fastq files. Unfortunately two encodings have been used, one that increments the value by 33 and the other by 64. Most fastq files use 33 and that is the default in this module.

However, Illumina used an offset of 64 for a brief period of time, so you must know whether your fastq files use this encoding. For details see "The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants". Using an offset of 33 or 64 in this module corresponds to using the fastq-sanger or fastq-illumina encodings, respectively, as defined in that paper. Note, however, that the term fastq-illumina is now misleading, since Illumina has also switched to an offset of 33.
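The effect of the two offsets can be sketched in plain OCaml. This is a stand-alone illustration that uses only the standard library; the names `int_of_offset` and `decode` are hypothetical, and this is not Biocaml's implementation.

```ocaml
(* Hypothetical stand-alone sketch; not taken from Biocaml. *)
type offset = [ `Offset33 | `Offset64 ]

(* Numeric value subtracted from the ASCII code. *)
let int_of_offset : offset -> int = function
  | `Offset33 -> 33
  | `Offset64 -> 64

(* Decode one fastq quality character into an integer score.
   Scores are non-negative, so a character below the offset is an error. *)
let decode ?(offset = `Offset33) (c : char) : (int, string) result =
  let q = Char.code c - int_of_offset offset in
  if q < 0 then Error (Printf.sprintf "character %C is below the offset" c)
  else Ok q
```

The same character decodes to different scores depending on the offset: `decode 'I'` gives `Ok 40`, while `decode ~offset:`Offset64 'I'` gives `Ok 9`. This is why the encoding of a file has to be known before its qualities are interpreted.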

type t = private int
include Sexplib0.Sexpable.S with type t := t
val t_of_sexp : Sexplib0.Sexp.t -> t
val sexp_of_t : t -> Sexplib0.Sexp.t
type offset = [
  | `Offset33
  | `Offset64
]
val sexp_of_offset : offset -> Sexplib0.Sexp.t
val offset_of_sexp : Sexplib0.Sexp.t -> offset
val __offset_of_sexp__ : Sexplib0.Sexp.t -> offset
val of_char : ?offset:offset -> char -> t Core.Or_error.t

of_char ~offset x returns the PHRED score encoded by ASCII character x.

val to_char : ?offset:offset -> t -> char Core.Or_error.t

to_char t encodes t as a visible ASCII character (codes 33 - 126) if possible.
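A minimal sketch of the visible-ASCII constraint, written against the standard library only. The function `encode` and its plain `int` offset are hypothetical simplifications, not this module's API.

```ocaml
(* Hypothetical stand-alone sketch; not taken from Biocaml.
   [offset] is a plain int here (33 or 64) to keep the example short. *)
let encode ?(offset = 33) (q : int) : (char, string) result =
  let code = q + offset in
  (* Only codes 33 to 126 are visible ASCII characters. *)
  if q < 0 || code < 33 || code > 126 then
    Error
      (Printf.sprintf "score %d has no visible ASCII encoding at offset %d"
         q offset)
  else Ok (Char.chr code)
```

Under these assumptions `encode 40` yields `Ok 'I'` and `encode ~offset:64 40` yields `Ok 'h'`, while `encode 94` is an error because 94 + 33 = 127 falls outside the visible range.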

val of_int : int -> t Core.Or_error.t

of_int x returns the PHRED score with the same value x, or returns Error if x is negative.

val to_int : t -> int

Convert a PHRED score to an integer.

val of_probability : ?f:(float -> int) -> float -> t Core.Or_error.t

of_probability ~f x returns -10 * log_10(x), which is the definition of PHRED scores.

PHRED scores are integral, and it is only loosely specified that the float value returned by the above formula should be "rounded to the closest integer". That statement is imprecise, since there is more than one way to do such a rounding. A reasonable choice is made by default, but you can control the behavior by providing f.

Returns Error if the given probability x is not between 0.0 and 1.0.
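How the choice of rounding function changes the result can be sketched as follows. This is a stand-alone illustration using only the standard library; `phred_of_probability` is a hypothetical name, the default rounding (nearest integer, halves away from zero) is an assumption rather than this module's documented default, and the sketch rejects 0.0 because its logarithm is infinite.

```ocaml
(* Hypothetical stand-alone sketch; not taken from Biocaml. *)
let phred_of_probability
    ?(f = (fun x -> Float.to_int (Float.round x)))
    (p : float) : (int, string) result =
  (* Assumption: valid probabilities are 0 < p <= 1; NaN is rejected too. *)
  if Float.is_nan p || p <= 0.0 || p > 1.0 then
    Error "probability must satisfy 0 < p <= 1"
  else Ok (f (-10.0 *. log10 p))
```

For p = 0.0004 the formula gives about 33.98, so the default rounding returns `Ok 34` while truncation (`~f:Float.to_int`) returns `Ok 33`. For p = 0.05 the formula gives about 13.01, so rounding returns `Ok 13` and `~f:(fun x -> Float.to_int (Float.ceil x))` returns `Ok 14`.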

val to_probability : t -> float

to_probability x converts x to a probability. Note that this is not the inverse of of_probability, due to the rounding done by the latter.
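The loss of information from rounding can be seen in a short stand-alone sketch. The function names are hypothetical, only the standard library is used, and round-to-nearest is assumed for the forward direction.

```ocaml
(* Hypothetical stand-alone sketch; not taken from Biocaml. *)

(* Forward direction: -10 * log_10(p), rounded to the nearest integer. *)
let phred_of_probability (p : float) : int =
  Float.to_int (Float.round (-10.0 *. log10 p))

(* Backward direction: p = 10^(-q / 10). No rounding is involved. *)
let probability_of_phred (q : int) : float =
  10.0 ** (-.(float_of_int q) /. 10.0)

(* Probability -> score -> probability. *)
let round_trip (p : float) : float =
  probability_of_phred (phred_of_probability p)
```

Starting from p = 0.0004, the score is 34 and the probability recovered from it is about 0.000398, not 0.0004. Going the other way is stable: converting that recovered probability back to a score gives 34 again.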

val of_solexa_score : ?f:(float -> int) -> Solexa_score.t -> t

of_solexa_score x converts Solexa score x to a PHRED score.

The conversion produces a float, and it is unclear what convention is used to convert the resulting float value to an integer. As in of_probability, the optional f parameter is provided to dictate this.

val to_solexa_score : ?f:(float -> int) -> t -> Solexa_score.t

to_solexa_score t converts PHRED score t to a Solexa score.

The conversion produces a float, and it is unclear what convention is used to convert the resulting float value to an integer. As in of_probability, the optional f parameter is provided to dictate this.
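For orientation, the conversions commonly given in the FASTQ literature are Q_phred = 10 * log_10(10^(Q_solexa / 10) + 1) and Q_solexa = 10 * log_10(10^(Q_phred / 10) - 1). The sketch below applies them to plain integers with the standard library only. Whether this module uses exactly these formulas is an assumption, and the names `round_nearest`, `phred_of_solexa` and `solexa_of_phred` are hypothetical.

```ocaml
(* Hypothetical stand-alone sketch; not taken from Biocaml.
   Scores are plain ints here rather than Solexa_score.t / Phred_score.t. *)

(* Assumed default rounding: nearest integer, halves away from zero. *)
let round_nearest (x : float) : int = Float.to_int (Float.round x)

(* Solexa -> PHRED: 10 * log_10(10^(s/10) + 1). *)
let phred_of_solexa ?(f = round_nearest) (s : int) : int =
  f (10.0 *. log10 ((10.0 ** (float_of_int s /. 10.0)) +. 1.0))

(* PHRED -> Solexa: 10 * log_10(10^(q/10) - 1).
   Requires q > 0: for q = 0 the logarithm's argument is zero. *)
let solexa_of_phred ?(f = round_nearest) (q : int) : int =
  f (10.0 *. log10 ((10.0 ** (float_of_int q /. 10.0)) -. 1.0))
```

With these definitions a Solexa score of 10 maps to about 10.41, which is 10 under round-to-nearest and 11 under `~f:(fun x -> Float.to_int (Float.ceil x))`. The two scales differ most at the low end (PHRED 1 corresponds to roughly Solexa -6) and nearly coincide for high-quality calls (PHRED 40 corresponds to Solexa 40 after rounding).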

val min_as_char : offset -> t

The min and max PHRED scores when encoded as ASCII characters. Since PHRED scores are virtually always ASCII encoded, you are unlikely to see values outside this range. However, this module allows creating values outside this range, e.g. of_probability 1e-13 exceeds max_as_char, and of_probability 0.9 is smaller than min_as_char (for either offset).

val max_as_char : t