package biocaml
Sources
md5=e292efa2f61fec33dad63ec897106f59
sha512=35519bf3b1e67a9191ef9bb74eba0dae941e0d05bad89076a36f507dc5c2d105a03c1c917d5a3f7ed9d1da4acbf3199582f78c308aa2a5a22c21f743945c852b
Module Biocaml_unix
Affymetrix's BAR files. Their Tiling Analysis Software (TAS) produces BAR files in binary format but this module supports only the text format generated by selecting the "Export probe analysis as TXT" option.
Extension of Core's Result. Internal use only.
Affymetrix's BPMAP files. Only text format supported. Binary BPMAP files must first be converted to text using Affymetrix's probe exporter tool.
Affymetrix's CEL files. Only text format supported. Binary files must first be converted using Affymetrix's conversion tool. This tool does not change the file extension, so be sure your file really is in text format.
Chromosome names. A chromosome name, as defined by this module, consists of two parts: an optional prefix "chr" (case-insensitive), followed by a suffix identifying the chromosome. The possible suffixes (case-insensitive) are:
FASTQ files. The FASTQ file format is a repeated sequence of 4 lines:
Data structures to represent sets of (possibly annotated) genomic regions
Interval tree (data structure)
Consistent printing of errors, warnings, and bugs. An error is a user mistake that prevents continuing program execution, a warning is a milder problem that the program continues to execute through, and a bug is a mistake in the software.
PHRED quality scores.
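As a reminder of what a PHRED score encodes, the standard definition is Q = -10 * log10(p), where p is the estimated probability that a base call is wrong. The sketch below only illustrates that formula; the function names are hypothetical and are not taken from this module's interface.

```ocaml
(* Hypothetical helpers illustrating the PHRED definition Q = -10 * log10 p.
   These names are assumptions, not part of biocaml's API. *)

(* Error probability -> PHRED score, rounded to the nearest integer. *)
let phred_of_probability p =
  if p <= 0.0 || p > 1.0 then invalid_arg "phred_of_probability: p must be in (0, 1]";
  int_of_float (Float.round (-10.0 *. log10 p))

(* PHRED score -> error probability. *)
let probability_of_phred q = 10.0 ** (-.(float_of_int q) /. 10.0)
```

With these definitions, an error probability of 0.001 corresponds to a score of 30, and a score of 20 corresponds to an error probability of 0.01.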
Efficient integer sets when many elements are expected to form large contiguous sequences of integers.
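One way to see why contiguous runs make such sets cheap is to store the set as a sorted list of disjoint inclusive ranges instead of individual elements. The following is a minimal sketch of that idea under this assumption; it is not the module's actual representation, and all names are hypothetical.

```ocaml
(* Hypothetical representation: a sorted list of disjoint, non-adjacent
   inclusive ranges (lo, hi). Not biocaml's actual implementation. *)
type t = (int * int) list

(* Add the inclusive range (lo, hi), assuming lo <= hi, merging it with any
   overlapping or adjacent ranges already in the set. *)
let rec add_range (lo, hi) = function
  | [] -> [ (lo, hi) ]
  | (a, b) :: rest when b < lo - 1 -> (a, b) :: add_range (lo, hi) rest
  | (a, _) :: _ as s when hi < a - 1 -> (lo, hi) :: s
  | (a, b) :: rest -> add_range (min a lo, max b hi) rest

(* Membership test: linear in the number of ranges, not of elements. *)
let mem x s = List.exists (fun (a, b) -> a <= x && x <= b) s

(* Number of integers in the set. *)
let cardinal s = List.fold_left (fun n (a, b) -> n + (b - a + 1)) 0 s
```

A set holding the integers 1 through 1,000,000 occupies a single list cell in this representation, which is the kind of saving the description refers to.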
Ranges of contiguous integers (integer intervals). A range is a contiguous sequence of integers from a lower bound to an upper bound. For example, [2, 10] is the set of integers from 2 through 10, inclusive of 2 and 10.
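The inclusive-bounds convention can be made concrete with a small sketch. The record type and function names below are assumptions chosen for illustration and do not claim to match this module's signature.

```ocaml
(* Hypothetical range type with inclusive bounds; not biocaml's API. *)
type range = { lo : int; hi : int }

(* Smart constructor rejecting ranges whose lower bound exceeds the upper. *)
let make lo hi =
  if lo > hi then
    Error (Printf.sprintf "lower bound %d exceeds upper bound %d" lo hi)
  else Ok { lo; hi }

(* Number of integers in the range; both bounds are counted. *)
let size r = r.hi - r.lo + 1

(* Membership test. *)
let mem r x = r.lo <= x && x <= r.hi

(* Intersection of two ranges, or None when they do not overlap. *)
let intersect a b =
  let lo = max a.lo b.lo and hi = min a.hi b.hi in
  if lo <= hi then Some { lo; hi } else None
```

Under this convention the range [2, 10] has size 9, because both endpoints belong to it.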
Roman numerals. Values greater than or equal to 1 are valid roman numerals.
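For illustration, the usual greedy conversion from a positive integer to its roman numeral string can be sketched as follows. The function name is hypothetical, and the module's own conversion functions may differ in name and error handling.

```ocaml
(* Hypothetical conversion; not biocaml's API. Uses the standard greedy
   algorithm over the subtractive-notation table. *)
let to_roman n =
  if n < 1 then invalid_arg "to_roman: value must be >= 1";
  let table =
    [ (1000, "M"); (900, "CM"); (500, "D"); (400, "CD"); (100, "C");
      (90, "XC"); (50, "L"); (40, "XL"); (10, "X"); (9, "IX");
      (5, "V"); (4, "IV"); (1, "I") ]
  in
  let buf = Buffer.create 16 in
  (* Repeatedly emit the largest symbol that still fits. *)
  let rec go n = function
    | [] -> ()
    | (v, s) :: rest as tbl ->
      if n >= v then (Buffer.add_string buf s; go (n - v) tbl)
      else go n rest
  in
  go n table;
  Buffer.contents buf
```

Here `to_roman 14` yields "XIV", while zero and negative inputs are rejected, matching the rule that only values of at least 1 are valid.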
SAM files. Documentation here assumes familiarity with the SAM specification.
Nucleic acid sequences. A nucleic acid code is any of A, C, G, T, U, R, Y, K, M, S, W, B, D, H, V, N, or X. See IUB/IUPAC standards for further information. Gaps are not supported. Internal representation uses uppercase, but constructors are case-insensitive. By convention the first nucleic acid in a sequence is numbered 1.
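The rules above (a fixed code alphabet, case-insensitive construction, uppercase storage, 1-based positions) can be sketched with plain strings. This is only an illustration under the assumption that a sequence is stored as an uppercase string; the names are hypothetical and not the module's interface.

```ocaml
(* Hypothetical helpers; not biocaml's API. *)

(* True if [c] is one of the nucleic acid codes listed above, in either case. *)
let is_nucleic_acid_code c =
  String.contains "ACGTURYKMSWBDHVNX" (Char.uppercase_ascii c)

(* Case-insensitive constructor: normalizes to uppercase and reports the
   first invalid character with its 1-based position. *)
let of_string s =
  let s = String.uppercase_ascii s in
  let bad = ref None in
  String.iteri
    (fun i c ->
      if !bad = None && not (is_nucleic_acid_code c) then bad := Some (i, c))
    s;
  match !bad with
  | None -> Ok s
  | Some (i, c) ->
    Error (Printf.sprintf "invalid code %C at position %d" c (i + 1))

(* 1-based access: the first nucleic acid is at position 1. *)
let nth s i =
  if i < 1 || i > String.length s then
    invalid_arg "nth: position out of range (1-based)";
  s.[i - 1]
```

Since gaps are not supported, an input such as "AC-G" is rejected by this sketch rather than silently accepted.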
Range on a sequence, where the sequence is represented by an identifier.
Solexa quality scores.
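Solexa scores differ from PHRED scores in using the odds p / (1 - p) rather than the probability p itself: Q_solexa = -10 * log10(p / (1 - p)). A minimal sketch of that definition and of the resulting conversion to the PHRED scale follows; the function names are hypothetical and not taken from this module.

```ocaml
(* Hypothetical helpers illustrating the Solexa score definition;
   not biocaml's API. Scores are kept as floats to avoid rounding choices. *)

(* Error probability -> Solexa score, for p strictly between 0 and 1. *)
let solexa_of_probability p =
  if p <= 0.0 || p >= 1.0 then
    invalid_arg "solexa_of_probability: p must be in (0, 1)";
  -10.0 *. log10 (p /. (1.0 -. p))

(* Solexa score -> equivalent PHRED score:
   Q_phred = 10 * log10 (10^(Q_solexa / 10) + 1). *)
let phred_of_solexa s = 10.0 *. log10 ((10.0 ** (s /. 10.0)) +. 1.0)
```

The two scales nearly coincide for high-quality calls and diverge for low-quality ones, where Solexa scores can become negative.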
Strand names. There are various conventions for referring to the two strands of DNA. This module provides an of_string function that parses the various conventions into a canonical representation, which we define to be '-' or '+'.
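To illustrate the idea of normalizing several spellings to a canonical character, here is a minimal sketch. The set of accepted spellings below is an assumption made for the example; the module's real of_string may accept a different set and may return a different type.

```ocaml
(* Hypothetical parser; the accepted spellings are illustrative assumptions,
   not the list actually recognized by biocaml. *)
let strand_of_string s =
  match String.lowercase_ascii (String.trim s) with
  | "+" | "plus" | "fwd" | "forward" | "sense" -> Some '+'
  | "-" | "minus" | "rev" | "reverse" | "antisense" -> Some '-'
  | _ -> None
```

Returning an option keeps unrecognized input explicit, so callers must decide how to handle a strand name outside the known conventions.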
Buffered transforms. A buffered transform represents a method for converting a stream of inputs to a stream of outputs. However, inputs can also be buffered, i.e. you can feed inputs to the transform and pull out outputs later. There is no requirement that 1 input produces exactly 1 output. It is common that multiple input values are needed to construct a single output, and vice versa.
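The feed-then-pull behaviour described above can be sketched with a small record of closures. This is an illustration only: the type, the field names feed and next, and the `Not_ready result are assumptions for the example and do not claim to reproduce this module's interface.

```ocaml
(* Hypothetical buffered transform; not biocaml's actual type. *)
type ('input, 'output) transform = {
  feed : 'input -> unit;  (* push one input into the internal buffer *)
  next : unit -> [ `Output of 'output | `Not_ready ];  (* pull one output *)
}

(* A transform that needs [n] inputs to build one output (a list of them),
   showing that inputs and outputs need not correspond one to one. *)
let group n =
  if n < 1 then invalid_arg "group: n must be >= 1";
  let pending = Queue.create () in
  let feed x = Queue.add x pending in
  let next () =
    if Queue.length pending < n then `Not_ready
    else begin
      (* Pop exactly [n] buffered inputs, preserving arrival order. *)
      let rec take k acc =
        if k = 0 then List.rev acc else take (k - 1) (Queue.pop pending :: acc)
      in
      `Output (take n [])
    end
  in
  { feed; next }
```

With `group 2`, feeding a single value leaves the transform not ready; after a second value is fed, `next ()` yields both values as one output, and any further fed values stay buffered until enough have accumulated.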
Track files in UCSC Genome Browser format. The following documentation assumes knowledge of concepts explained on the UCSC Genome Browser's website. Basically, a track file is one of several types of data (WIG, GFF, etc.), possibly preceded by comments, browser lines, and a track line. This module allows only a single data track within a file, although the UCSC specifies that multiple tracks may be provided together.
Transcripts are integer intervals containing a list of exons. Exons are themselves defined as a list of integer intervals.
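The shape described here, an enclosing interval plus a list of exon intervals, might be modelled as below. This is a sketch under the assumption of inclusive integer bounds; the type and field names are hypothetical and not taken from this module.

```ocaml
(* Hypothetical types; not biocaml's API. Bounds are assumed inclusive. *)
type interval = { lo : int; hi : int }

type transcript = {
  span : interval;        (* the interval covered by the whole transcript *)
  exons : interval list;  (* the exons, each an integer interval *)
}

(* True if every exon lies within the transcript's span. *)
let exons_within_span t =
  List.for_all (fun e -> t.span.lo <= e.lo && e.hi <= t.span.hi) t.exons

(* Total number of positions covered by exons, assuming they do not overlap. *)
let exonic_length t =
  List.fold_left (fun acc e -> acc + (e.hi - e.lo + 1)) 0 t.exons
```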