package biocaml
Module Biocaml_unix
Affymetrix's BAR files. Their Tiling Analysis Software (TAS) produces BAR files in binary format but this module supports only the text format generated by selecting the "Export probe analysis as TXT" option.
Extension of Core's Result. Internal use only.
Affymetrix's BPMAP files. Only text format supported. Binary BPMAP files must first be converted to text using Affymetrix's probe exporter tool.
Affymetrix's CEL files. Only text format supported. Binary files must be converted using Affymetrix's conversion tool. This tool does not change the file extension, so make sure your file really is in text format.
Chromosome names. A chromosome name, as defined by this module, consists of two parts: an optional prefix "chr" (case-insensitive), followed by a suffix identifying the chromosome. The possible suffixes (case-insensitive) are:
FASTQ files. The FASTQ file format is a repeated sequence of 4 lines:
Data structures to represent sets of (possibly annotated) genomic regions
Interval tree (data structure)
Consistent printing of errors, warnings, and bugs. An error is a user mistake that prevents continuing program execution, a warning is a milder problem that the program continues to execute through, and a bug is a mistake in the software.
PHRED quality scores.
Efficient integer sets for cases where many elements are expected to form large contiguous sequences of integers.
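To illustrate why contiguous runs make such a set cheap to store, here is a minimal sketch that keeps the set as a sorted list of disjoint inclusive ranges. The names (range_set, add_range, mem, cardinal) are hypothetical and this is not Biocaml's implementation; it also assumes every inserted range satisfies lo <= hi.

```ocaml
(* Hypothetical sketch, not Biocaml's implementation: a set of integers
   stored as a sorted list of disjoint, non-adjacent inclusive ranges. *)
type range_set = (int * int) list

let empty : range_set = []

(* Insert the inclusive range (lo, hi), assuming lo <= hi.
   Overlapping or adjacent ranges are merged into one. *)
let rec add_range (lo, hi) (s : range_set) : range_set =
  match s with
  | [] -> [ (lo, hi) ]
  | (a, b) :: rest ->
    if hi < a - 1 then (lo, hi) :: s
    else if lo > b + 1 then (a, b) :: add_range (lo, hi) rest
    else add_range (min lo a, max hi b) rest

(* Membership only needs to inspect range bounds, not individual elements. *)
let mem x (s : range_set) = List.exists (fun (a, b) -> a <= x && x <= b) s

(* Number of integers in the set. *)
let cardinal (s : range_set) =
  List.fold_left (fun acc (a, b) -> acc + (b - a + 1)) 0 s
```

With this layout, a set holding a million consecutive integers occupies a single list cell.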
Ranges of contiguous integers (integer intervals). A range is a contiguous sequence of integers from a lower bound to an upper bound. For example, [2, 10] is the set of integers from 2 through 10, inclusive of 2 and 10.
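The inclusive convention can be sketched as follows. The record type and function names here are illustrative assumptions, not the module's actual interface.

```ocaml
(* Hypothetical sketch of an inclusive integer range; not the Biocaml API. *)
type range = { lo : int; hi : int }

(* A range is only valid when the lower bound does not exceed the upper. *)
let make lo hi = if lo <= hi then Some { lo; hi } else None

(* Both bounds are included, hence the + 1. *)
let size r = r.hi - r.lo + 1

let mem x r = r.lo <= x && x <= r.hi

let to_list r = List.init (size r) (fun i -> r.lo + i)

(* Intersection of two ranges, or None when they do not overlap. *)
let intersect a b = make (max a.lo b.lo) (min a.hi b.hi)
```

Under these assumptions the range [2, 10] has size 9 and contains both 2 and 10.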
Roman numerals. Values greater than or equal to 1 are valid roman numerals.
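As an illustration of the "greater than or equal to 1" rule, the following sketch converts an integer to a roman numeral with the usual greedy subtraction method. The function name to_roman and the option-based error handling are assumptions made for this example, not the module's interface.

```ocaml
(* Hypothetical sketch; not the Biocaml API. *)
let numerals =
  [ 1000, "M"; 900, "CM"; 500, "D"; 400, "CD"; 100, "C"; 90, "XC";
    50, "L"; 40, "XL"; 10, "X"; 9, "IX"; 5, "V"; 4, "IV"; 1, "I" ]

(* Returns None for values below 1, which have no roman numeral. *)
let to_roman n =
  if n < 1 then None
  else begin
    let buf = Buffer.create 16 in
    let rest = ref n in
    (* Greedily emit the largest symbol that still fits. *)
    List.iter
      (fun (value, symbol) ->
        while !rest >= value do
          Buffer.add_string buf symbol;
          rest := !rest - value
        done)
      numerals;
    Some (Buffer.contents buf)
  end
```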
SAM files. Documentation here assumes familiarity with the SAM specification.
Nucleic acid sequences. A nucleic acid code is any of A, C, G, T, U, R, Y, K, M, S, W, B, D, H, V, N, or X. See IUB/IUPAC standards for further information. Gaps are not supported. Internal representation uses uppercase, but constructors are case-insensitive. By convention the first nucleic acid in a sequence is numbered 1.
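The conventions described above (case-insensitive construction, uppercase storage, no gaps, 1-based positions) can be sketched as below. The names is_code, of_string, and nth are hypothetical and chosen only for this illustration.

```ocaml
(* Hypothetical sketch; not the Biocaml API. *)

(* The accepted nucleic acid codes, as listed in the description. *)
let codes = "ACGTURYKMSWBDHVNX"

let is_code c = String.contains codes (Char.uppercase_ascii c)

(* Case-insensitive constructor: stores uppercase, rejects anything
   that is not a valid code, including the gap character '-'. *)
let of_string s =
  let ok = ref true in
  String.iter (fun c -> if not (is_code c) then ok := false) s;
  if !ok then Some (String.uppercase_ascii s) else None

(* Positions are 1-based: the first nucleic acid is number 1. *)
let nth seq i =
  if i >= 1 && i <= String.length seq then Some seq.[i - 1] else None
```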
Range on a sequence, where the sequence is represented by an identifier.
Solexa quality scores.
Strand names. There are various conventions for referring to the two strands of DNA. This module provides an of_string function that parses the various conventions into a canonical representation, which we define to be '-' or '+'.
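A parser of this kind might look like the sketch below. The set of accepted spellings ("f", "fwd", "forward", "r", "rev", "reverse") is an assumption made for illustration; the conventions the module actually recognizes are not listed here.

```ocaml
(* Hypothetical sketch; the accepted spellings are assumptions,
   not the list recognized by Biocaml's own of_string. *)
let of_string s =
  match String.lowercase_ascii (String.trim s) with
  | "+" | "f" | "fwd" | "forward" -> Some '+'
  | "-" | "r" | "rev" | "reverse" -> Some '-'
  | _ -> None
```

Returning an option keeps unrecognized input explicit instead of silently picking a strand.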
Buffered transforms. A buffered transform represents a method for converting a stream of inputs to a stream of outputs. However, inputs can also be buffered, i.e. you can feed inputs to the transform and pull out outputs later. There is no requirement that 1 input produces exactly 1 output. It is common that multiple input values are needed to construct a single output, and vice versa.
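The feed-then-pull idea can be sketched with a small record of closures. The type and the names feed, next, and pairs are hypothetical and do not reproduce the module's interface; the example transform needs two inputs to build one output.

```ocaml
(* Hypothetical sketch of a buffered transform; not the Biocaml API. *)
type ('input, 'output) transform = {
  feed : 'input -> unit;            (* push one input into the buffer *)
  next : unit -> 'output option;    (* pull one output, if any is ready *)
}

(* A transform that groups every two inputs into one output pair. *)
let pairs () =
  let pending = ref None in
  let ready = Queue.create () in
  let feed x =
    match !pending with
    | None -> pending := Some x
    | Some first ->
      pending := None;
      Queue.add (first, x) ready
  in
  let next () = Queue.take_opt ready in
  { feed; next }
```

After feeding a single value, next returns None because no pair is complete yet; the output only becomes available once a second value has been fed.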
Track files in UCSC Genome Browser format. The following documentation assumes knowledge of concepts explained on the UCSC Genome Browser's website. Basically, a track file is one of several types of data (WIG, GFF, etc.), possibly preceded by comments, browser lines, and a track line. This module allows only a single data track within a file, although UCSC specifies that multiple tracks may be provided together.
Transcripts are integer intervals containing a list of exons. Exons are themselves defined as a list of integer intervals.
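One way to picture this structure is the sketch below. The type and field names are hypothetical, and it assumes inclusive coordinates with every exon lying inside the transcript's span; neither assumption is stated by the module.

```ocaml
(* Hypothetical sketch; not the Biocaml API. Intervals are assumed inclusive. *)
type interval = int * int

type transcript = {
  span : interval;        (* the transcript's own integer interval *)
  exons : interval list;  (* exons, each an integer interval *)
}

(* Checks that each exon is a valid interval contained in the span. *)
let well_formed t =
  let lo, hi = t.span in
  List.for_all (fun (a, b) -> a <= b && lo <= a && b <= hi) t.exons

(* Total number of positions covered by exons, assuming they do not overlap. *)
let exonic_length t =
  List.fold_left (fun acc (a, b) -> acc + (b - a + 1)) 0 t.exons
```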