package biocaml
Module Biocaml_unix.Fastq
FASTQ files. The FASTQ file format is a repeated sequence of 4 lines:
  @name
  sequence
  +comment
  qualities
  ...
The name line begins with an @ character, which is omitted in the parsed item type provided by this module. Any spaces after the @ are retained, but the specification implies that there shouldn't be any such spaces. Trailing whitespace is also retained since you should not normally have such files.
The comment line, which begins with a +, is handled similarly. The purpose of the comment line is unclear and it is rarely used. Also, "comment" may not be the correct term for this line.
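As an illustration of the four-line layout described above, here is a minimal sketch that builds a record from four consecutive lines, dropping the leading @ and + characters and retaining everything else verbatim. The record type and the names strip_prefix and item_of_lines are hypothetical and use only the OCaml standard library; the actual item type and parser of this module may differ.

```ocaml
(* Hypothetical record mirroring the four lines of one FASTQ entry.
   This is an assumption for illustration, not this module's item type. *)
type item = {
  name : string;
  sequence : string;
  comment : string;
  qualities : string;
}

(* Drop the expected leading character, or report an error.
   Everything after that character is kept as-is, including spaces. *)
let strip_prefix (c : char) (line : string) : (string, string) result =
  let n = String.length line in
  if n > 0 && line.[0] = c then Ok (String.sub line 1 (n - 1))
  else Error (Printf.sprintf "expected line to start with %c" c)

(* Build an item from four consecutive lines, each given without its
   end-of-line characters. *)
let item_of_lines l1 l2 l3 l4 : (item, string) result =
  match (strip_prefix '@' l1, strip_prefix '+' l3) with
  | Ok name, Ok comment -> Ok { name; sequence = l2; comment; qualities = l4 }
  | Error e, _ | _, Error e -> Error e
```

For example, item_of_lines "@r1 desc" "ACGT" "+" "IIII" yields an item whose name is "r1 desc" and whose comment is the empty string.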
The name line may be structured into two parts: a sequence identifier and an optional description. We provide a function split_name to parse such a value. However, an item's name field contains the unparsed string because it is unclear whether fastq files really follow this convention. Also, the format of the description is unspecified. When it is provided, it usually has some additional structure, so the minimal amount of parsing done by split_name isn't too useful anyway.
Illumina uses a systematic format for the name line that serves as a unique sequence identifier. Use Illumina.sequence_id_of_string to parse an item's name field when you have fastq files produced by Casava version >= 1.8. Earlier versions of Casava returned a different format, which is not currently supported in this module (it could be easily added).
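For orientation, the Casava >= 1.8 identifier is commonly documented as instrument:run:flowcell:lane:tile:x:y followed by a space and read:filtered:control:index. The following is a rough sketch of parsing that layout with the standard library only. The record type, its field names, and illumina_id_of_string_sketch are hypothetical; they are not the types or the behavior of Illumina.sequence_id_of_string.

```ocaml
(* Hypothetical record for a Casava >= 1.8 style identifier.
   Field names are assumptions chosen for this sketch. *)
type illumina_id = {
  instrument : string;
  run_number : int;
  flowcell_id : string;
  lane : int;
  tile : int;
  x_pos : int;
  y_pos : int;
  read : int;
  is_filtered : bool;
  control_number : int;
  index : string;
}

(* Parse a name field (without the leading '@'). Returns None when the
   string does not have exactly the expected shape. *)
let illumina_id_of_string_sketch (s : string) : illumina_id option =
  match String.split_on_char ' ' s with
  | [ left; right ] -> (
      match (String.split_on_char ':' left, String.split_on_char ':' right) with
      | ( [ instrument; run; flowcell_id; lane; tile; x; y ],
          [ read; filtered; control; index ] ) -> (
          (* The filter flag is a single letter: Y or N. *)
          let flag =
            match filtered with
            | "Y" -> Some true
            | "N" -> Some false
            | _ -> None
          in
          match
            ( int_of_string_opt run,
              int_of_string_opt lane,
              int_of_string_opt tile,
              int_of_string_opt x,
              int_of_string_opt y,
              int_of_string_opt read,
              flag,
              int_of_string_opt control )
          with
          | ( Some run_number,
              Some lane,
              Some tile,
              Some x_pos,
              Some y_pos,
              Some read,
              Some is_filtered,
              Some control_number ) ->
              Some
                { instrument; run_number; flowcell_id; lane; tile; x_pos;
                  y_pos; read; is_filtered; control_number; index }
          | _ -> None)
      | _ -> None)
  | _ -> None
```

Returning an option keeps the sketch short; a fuller version would report which field failed to parse.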
The qualities line is returned as a plain string, but it is required to be decodable as either Phred or Solexa scores. Modules Phred_score and Solexa_score can be used to parse as needed.
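To make the decoding step concrete, here is a small sketch that turns a qualities string into integer Phred scores by subtracting an ASCII offset, and converts a score Q into an error probability 10^(-Q/10). The default offset of 33 is an assumption (older files may use 64), and the function names are hypothetical; this is not the Phred_score API.

```ocaml
(* Decode each character as (ASCII code - offset). Returns None if any
   character falls below the offset, i.e. would give a negative score.
   The default offset of 33 is an assumption about the file's encoding. *)
let phred_scores_of_string ?(offset = 33) (qualities : string) : int list option =
  let scores =
    List.init (String.length qualities) (fun i ->
        Char.code qualities.[i] - offset)
  in
  if List.for_all (fun q -> q >= 0) scores then Some scores else None

(* Probability that a base call is wrong, for Phred score q. *)
let error_probability (q : int) : float = 10. ** (-.float_of_int q /. 10.)
```

With the default offset, the character 'I' decodes to 40 and '!' decodes to 0.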
Older FASTQ files allowed the sequence and qualities strings to span multiple lines. This is discouraged and is not supported by this module.
split_name splits a name string into a sequence identifier and an optional description. It is assumed that the given string is from an item's name field, i.e. that it doesn't contain a leading @ char.
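A minimal sketch of this kind of split, assuming the identifier ends at the first space and everything after it is the description. The name split_name_sketch is hypothetical, and the exact behavior of the module's own function (for example on repeated spaces) is not confirmed here.

```ocaml
(* Split at the first space: the part before it is the identifier and
   the remainder, if any, is the description. No trimming is done. *)
let split_name_sketch (name : string) : string * string option =
  match String.index_opt name ' ' with
  | None -> (name, None)
  | Some i ->
      let id = String.sub name 0 i in
      let descr = String.sub name (i + 1) (String.length name - i - 1) in
      (id, Some descr)
```

Under this assumption, "SRR001666.1 071112_SLXA length=36" splits into the identifier "SRR001666.1" and the description "071112_SLXA length=36", while "read1" has no description.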
include sig ... end
val write :
Future_unix.Writer.t ->
item Future_unix.Pipe.Reader.t ->
unit Future_unix.Deferred.t
val write_file :
?perm:int ->
?append:bool ->
string ->
item Future_unix.Pipe.Reader.t ->
unit Future_unix.Deferred.t
Low-level Printing
This function converts item values to strings that can be dumped to a file, i.e. they contain full lines, including all end-of-line characters.
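The conversion described here can be sketched as follows: the @ and + characters omitted during parsing are restored, and every line ends with a newline. The record type and item_to_string_sketch are hypothetical stand-ins written against the standard library, not this module's definitions.

```ocaml
(* Hypothetical record mirroring the four lines of one FASTQ entry. *)
type item = {
  name : string;
  sequence : string;
  comment : string;
  qualities : string;
}

(* Render one item as four complete lines, each terminated by '\n'. *)
let item_to_string_sketch (r : item) : string =
  String.concat ""
    [ "@"; r.name; "\n"; r.sequence; "\n"; "+"; r.comment; "\n"; r.qualities; "\n" ]
```

Because each result already ends in a newline, the strings for successive items can be concatenated or written one after another without adding separators.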
Low-level Parsing
qualities sequence line: parses the given qualities line in the context of a previously parsed sequence. The sequence is needed to ensure that the correct number of quality scores is provided. If the sequence is not provided, this check is omitted.
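The length check described above can be sketched like this, with the sequence passed as an optional argument. The name qualities_sketch, the optional-argument shape, and the string-based error are assumptions made for this example; the module's actual signature and error type may differ.

```ocaml
(* Accept the qualities line as-is, unless a sequence is supplied and
   its length differs from the number of quality characters. *)
let qualities_sketch ?sequence (line : string) : (string, string) result =
  match sequence with
  | Some s when String.length s <> String.length line ->
      Error
        (Printf.sprintf "sequence has length %d but qualities has length %d"
           (String.length s) (String.length line))
  | _ -> Ok line
```

Calling qualities_sketch ~sequence:"ACGT" "III" returns an error, whereas qualities_sketch "III" succeeds because no sequence is available to compare against.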