OCaml Outreachy Internships
This is a record of all past OCaml Community Outreachy Internship Projects.
Summer 2023
Persistent Storage in MirageOS Unikernels
Every operating system, even a unikernel, needs a way to persist data across reboots, so persistent storage is a valuable capability for MirageOS. Developing it involves building libraries for partitioning disks, filesystems for these partitions, and a simple, intuitive, programmatic way to interact with these storage devices from a user's viewpoint. This project pushes this vision one step further by building a library for GPT partitioning.
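To give a flavour of what a GPT library has to handle, here is a minimal sketch that reads a few fields of a GPT header from a raw sector. It is not the API of the library built in this project: the names `gpt_header` and `parse_header` are hypothetical, and the sketch assumes the caller has already read the sector at LBA 1. The field offsets follow the standard GPT header layout, and the header CRC32 is deliberately not verified here.

```ocaml
(* Hypothetical record holding a subset of the GPT header fields. *)
type gpt_header = {
  revision : int32;
  header_size : int32;
  current_lba : int64;
  backup_lba : int64;
  first_usable_lba : int64;
  last_usable_lba : int64;
  partition_entry_lba : int64;
  num_partition_entries : int32;
  partition_entry_size : int32;
}

(* Every GPT header starts with this 8-byte ASCII signature. *)
let signature = "EFI PART"

(* Parse the header from the sector stored at LBA 1.
   All multi-byte fields are little-endian. The CRC32 is NOT checked. *)
let parse_header (sector : bytes) : (gpt_header, string) result =
  if Bytes.length sector < 92 then
    Error "sector too short for a GPT header"
  else if Bytes.sub_string sector 0 8 <> signature then
    Error "missing EFI PART signature"
  else
    Ok
      {
        revision = Bytes.get_int32_le sector 8;
        header_size = Bytes.get_int32_le sector 12;
        current_lba = Bytes.get_int64_le sector 24;
        backup_lba = Bytes.get_int64_le sector 32;
        first_usable_lba = Bytes.get_int64_le sector 40;
        last_usable_lba = Bytes.get_int64_le sector 48;
        partition_entry_lba = Bytes.get_int64_le sector 72;
        num_partition_entries = Bytes.get_int32_le sector 80;
        partition_entry_size = Bytes.get_int32_le sector 84;
      }
```

A real implementation would also validate the header and partition-array checksums and fall back to the backup header when the primary one is damaged.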
Improving Error Reporting in Existing PPXLIB-Based PPXs
In the past, when `ppxlib` encountered an exception in a transformation, it stopped the rewriting process, so the rewriters after it were never run. Multiple errors also could not be reported at once when several rewriters failed: only the first rewriter to raise was reported, and compilation stopped there. Now a raising rewriter no longer stops the subsequent rewriters from running, so multiple errors can be reported, both in the context-free phase and in all the other phases.
MIDI Over Ethernet With MirageOS
MIDI, which stands for Musical Instrument Digital Interface, is a widely used protocol in the world of music and audio technology. MirageOS is a library operating system, written in OCaml, that specialises in creating lightweight, secure, and efficient unikernels. Unikernels are highly specialised, single-purpose virtual machine images designed for specific applications. The project focussed on implementing the rtpMIDI protocol for serialising and deserialising MIDI messages over Ethernet, and on implementing use cases such as a publisher-subscriber server-client model for MIDI messages.
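As a rough illustration of what serialising and deserialising MIDI messages involves, the sketch below encodes and decodes only the two simplest channel messages, Note On and Note Off. It does not cover the rtpMIDI framing itself, and the names `message`, `encode`, and `decode` are hypothetical rather than taken from the project's code.

```ocaml
(* Two basic MIDI channel messages. Channels are 0-15; note and
   velocity are 7-bit values (0-127). *)
type message =
  | Note_on of { channel : int; note : int; velocity : int }
  | Note_off of { channel : int; note : int; velocity : int }

(* Serialise a message to its 3-byte wire form:
   status byte (message kind in the high nibble, channel in the low
   nibble), followed by two data bytes. Out-of-range values are masked. *)
let encode msg =
  let status, channel, note, velocity =
    match msg with
    | Note_on { channel; note; velocity } -> (0x90, channel, note, velocity)
    | Note_off { channel; note; velocity } -> (0x80, channel, note, velocity)
  in
  let b = Bytes.create 3 in
  Bytes.set_uint8 b 0 (status lor (channel land 0x0F));
  Bytes.set_uint8 b 1 (note land 0x7F);
  Bytes.set_uint8 b 2 (velocity land 0x7F);
  b

(* Deserialise a 3-byte buffer; returns None for anything that is not
   a Note On or Note Off message. *)
let decode b =
  if Bytes.length b <> 3 then None
  else
    let status = Bytes.get_uint8 b 0 in
    let channel = status land 0x0F in
    let note = Bytes.get_uint8 b 1 in
    let velocity = Bytes.get_uint8 b 2 in
    match status land 0xF0 with
    | 0x90 -> Some (Note_on { channel; note; velocity })
    | 0x80 -> Some (Note_off { channel; note; velocity })
    | _ -> None
```

For example, `encode (Note_on { channel = 1; note = 60; velocity = 100 })` yields the bytes `0x91 0x3C 0x64`, and `decode` turns them back into the same value.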
Winter 2022
Implement a Non-Blocking, Streaming Codec for TopoJSON
TopoJSON is an extension to GeoJSON that encodes topology. This allows redundant data to be removed and file sizes to be greatly reduced, which is often very desirable, especially when working with data in the browser. In a previous Outreachy internship, a new OCaml library for TopoJSON was implemented. This project will build on that work, adding more functionality to the library and providing a non-blocking, streaming codec similar to the geojsone library.
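One source of the size reduction is that a quantized TopoJSON topology stores each arc as integer deltas plus a shared `transform` (a scale and a translate), rather than as full floating-point coordinates. The sketch below shows how such an arc can be decoded. The function name `decode_arc` and its signature are illustrative assumptions, not the API of the OCaml TopoJSON library.

```ocaml
(* Decode one quantized, delta-encoded arc into absolute coordinates.
   Each point of the arc is stored as an integer offset from the
   previous point (the first is an offset from (0, 0)). The running
   sum is then mapped through the topology's transform:
     x' = x * scale_x + translate_x
     y' = y * scale_y + translate_y *)
let decode_arc ~scale:(sx, sy) ~translate:(tx, ty) (arc : (int * int) list)
    : (float * float) list =
  let _, _, rev_points =
    List.fold_left
      (fun (x, y, acc) (dx, dy) ->
        let x = x + dx and y = y + dy in
        let point = (float_of_int x *. sx +. tx, float_of_int y *. sy +. ty) in
        (x, y, point :: acc))
      (0, 0, []) arc
  in
  List.rev rev_points
```

With `~scale:(0.5, 0.5)` and `~translate:(10., 10.)`, the arc `[(2, 4); (2, -2)]` decodes to `[(11., 12.); (12., 11.)]`.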
Summer 2022
Expand OCaml 5.0 Parallel Benchmark Suite
OCaml 5.0 will be live soon! It ships with the support for shared-memory parallelism and concurrency that OCaml has lacked all these years. This will be accompanied by a robust set of Multicore libraries useful for parallel programming. The Multicore compiler and libraries are under active development and will continue to evolve as the OCaml ecosystem moves towards Multicore. To assess the impact of new features in the OCaml compiler and Multicore libraries, we have a set of sequential and parallel benchmarks in our benchmark suite. While the sequential benchmarks contain many real-world applications, a wider set of parallel benchmarks would be useful. This project entails gathering the parallel benchmarks available in various places, such as https://github.com/ckoparkar/ocaml-benchmarks, and making them available in the benchmark suite.
Extend OCaml's GeoJSON Library to Support TopoJSON
TopoJSON is an extension to GeoJSON that encodes topology. This allows redundant data to be removed and file sizes to be greatly reduced, which is often very desirable, especially when working with data in the browser. This project looks to extend `ocaml-geojson` to support TopoJSON.
Winter 2021
Build a Monitoring Dashboard for OCaml.org
We currently have no visibility into the performance of the server serving v3.ocaml.org, which pages are most visited, whether errors happen, etc. To offer some visibility, we can implement a basic monitoring dashboard that would provide metrics such as memory, CPU, and open file descriptors; statistics (check for GDPR compliance first!) such as requested URIs, user agents, and language; and logs. This project consists mainly of two parts: a frontend and a backend. The backend consists of building a high-level library to collect data and compute statistics on it. The frontend will use this library to display graphs of the metrics, statistics, and other data we want to collect.
Improve the OCaml Meta-Programming Ecosystem
It's common for programming languages to provide some way to meta-program in order to preprocess code before reaching the last compilation step, for example, in the form of macros or templates. The OCaml compiler doesn't provide a full built-in macro system, but the OCaml parser does provide syntax for preprocessing purposes: attributes and extension points. We, the OCaml community, also have an official framework called `ppxlib` for writing preprocessors (called PPXs) based on that syntax and integrating them into the compilation process. However, it's up to the OCaml community to write important PPXs and provide them to OCaml developers. We've noticed that having the most important PPXs under the official PPX GitHub organisation, next to `ppxlib`, is helpful. Developers can easily find them; developers can trust them; and they're well-written and hygienic, so developers can use them as how-to guides for writing other PPXs. In this project, you'll write one or more of those official standard PPXs.
Support `.eml` Files in OCaml's VSCode Extension
Dream, the OCaml web framework, uses `.eml` files to embed HTML in OCaml files. At the moment, opening these files in VSCode with the official OCaml VSCode extension provides no syntax highlighting or diagnostics, because `.eml` files are not supported. The goal of the project is to add support for the syntax in the extension itself as a first step and, eventually, to add support for the language in the OCaml Language Server (LSP) as a second step.
Summer 2021
Create opam Package Search
Opam is the source-based package manager for OCaml code. This project consists of writing a new web client for rendering output from the opam package database. There is a JSON endpoint on opam.ocaml.org that provides metadata about packages. We can extend this JSON metadata to include all the opam packages (not just the top 10) and use that to power a search frontend for the website. This may include presenting the data as a GraphQL endpoint, with the frontend querying that endpoint using GraphQL.
Improve the OCaml.org Website
OCaml.org is the main website for OCaml, a functional, typed, high-level programming language. This project revolves around improving the website on multiple different fronts including: layout, accessibility, and content.
Summer 2020
Reducing Global Mutable State in the OCaml Compiler Codebase
Structured Output Format for the OCaml Compiler Messages
The compiler's output messages are currently hard for a machine to read, so it is time-consuming for tools to find the warnings, errors, and other messages, along with their origin. By producing structured output for compiler messages, other tools can more easily interoperate with them and build tooling on top of the messages.
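To make the idea concrete, here is a minimal sketch of what a structured message could look like: a record describing one diagnostic, rendered as a JSON object that another tool can parse. The type `diagnostic`, its field names, and the JSON shape are all hypothetical and chosen for illustration only; they are not the format produced by the OCaml compiler or by this project.

```ocaml
(* Hypothetical structured representation of one compiler message. *)
type severity = Sev_error | Sev_warning

type diagnostic = {
  severity : severity;
  file : string;
  line : int;
  column : int;
  message : string;
}

(* Render a string as a JSON string literal, escaping quotes,
   backslashes, newlines, and other control characters. *)
let json_string s =
  let buf = Buffer.create (String.length s + 2) in
  Buffer.add_char buf '"';
  String.iter
    (function
      | '"' -> Buffer.add_string buf "\\\""
      | '\\' -> Buffer.add_string buf "\\\\"
      | '\n' -> Buffer.add_string buf "\\n"
      | c when Char.code c < 0x20 ->
          Buffer.add_string buf (Printf.sprintf "\\u%04x" (Char.code c))
      | c -> Buffer.add_char buf c)
    s;
  Buffer.add_char buf '"';
  Buffer.contents buf

(* Render a diagnostic as a single-line JSON object. *)
let to_json d =
  let severity =
    match d.severity with Sev_error -> "error" | Sev_warning -> "warning"
  in
  Printf.sprintf
    "{\"severity\":%s,\"file\":%s,\"line\":%d,\"column\":%d,\"message\":%s}"
    (json_string severity) (json_string d.file) d.line d.column
    (json_string d.message)
```

An editor plugin or CI tool could then read the `file`, `line`, and `column` fields directly instead of pattern-matching on human-oriented text.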
Summer 2019
Test the OCaml Compiler With Code Coverage Tools
This project improves the compiler testing process using code coverage tools. The core OCaml system has a large test suite, and it would be very useful to see which parts of the system are well tested and which are not. Coverage data shows developers where new tests are needed, and the process of improving coverage may uncover previously unnoticed bugs so they can be fixed. This should help make OCaml and its libraries more reliable.
Test the OCaml Compiler With Random Tests and a Reference Interpreter
The aim of this project is to extend an existing test-case generator for the OCaml compiler, using a reference interpreter (existing or newly developed), to find bugs in the compiler and fix as many of them as possible.
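The underlying technique is often called differential testing: generate random programs, run each one through both a simple reference interpreter and the implementation under test, and report any program on which the two disagree. The sketch below applies the idea to a toy arithmetic language, not to OCaml itself. Every name in it (`expr`, `eval`, `compile`, `run`, `find_mismatch`) is hypothetical, and the "compiler" is a tiny stack-machine translator standing in for the real one.

```ocaml
(* A toy expression language standing in for real programs. *)
type expr = Int of int | Add of expr * expr | Mul of expr * expr | Neg of expr

(* Reference interpreter: as simple and obviously correct as possible. *)
let rec eval = function
  | Int n -> n
  | Add (a, b) -> eval a + eval b
  | Mul (a, b) -> eval a * eval b
  | Neg a -> - (eval a)

(* Implementation under test: compile to stack-machine code, then run it. *)
type instr = Push of int | IAdd | IMul | INeg

let rec compile = function
  | Int n -> [ Push n ]
  | Add (a, b) -> compile a @ compile b @ [ IAdd ]
  | Mul (a, b) -> compile a @ compile b @ [ IMul ]
  | Neg a -> compile a @ [ INeg ]

let run code =
  let step stack instr =
    match (instr, stack) with
    | Push n, _ -> n :: stack
    | IAdd, b :: a :: rest -> (a + b) :: rest
    | IMul, b :: a :: rest -> (a * b) :: rest
    | INeg, a :: rest -> (-a) :: rest
    | _ -> failwith "stack underflow"
  in
  match List.fold_left step [] code with
  | [ v ] -> v
  | _ -> failwith "malformed program"

(* Random generator for expressions of bounded depth. *)
let rec gen st depth =
  if depth = 0 then Int (Random.State.int st 100)
  else
    match Random.State.int st 4 with
    | 0 -> Int (Random.State.int st 100)
    | 1 -> Add (gen st (depth - 1), gen st (depth - 1))
    | 2 -> Mul (gen st (depth - 1), gen st (depth - 1))
    | _ -> Neg (gen st (depth - 1))

(* Generate [count] random programs and return the first one on which
   the reference interpreter and the compiled code disagree, if any. *)
let find_mismatch ~seed ~count =
  let st = Random.State.make [| seed |] in
  let rec loop i =
    if i = count then None
    else
      let e = gen st 4 in
      if eval e = run (compile e) then loop (i + 1) else Some e
  in
  loop 0
```

Any `Some e` result is a concrete, reproducible test case that can be minimised and turned into a bug report.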
Summer 2016
Winter 2015
Summer 2014
MirageOS Contributions and Improvements
MirageOS Cloud API Support
MirageOS (see http://xenproject.org/developers/teams/mirage-os.html, http://www.openmirage.org/) is a type-safe unikernel written in OCaml which generates highly specialised "appliance" VMs that run directly on Xen without requiring an intervening kernel. A MirageOS application typically runs via several communicating kernel instances on the cloud. Today these instances are difficult to manage; we would like to explore strategies for managing these distributed computations using common public cloud APIs such as those exposed by Amazon EC2 and Rackspace. First we need to create pure OCaml API bindings for (e.g.) EC2 and Rackspace (purity is needed to ensure portability). These API bindings can then be used to provide operating-system-level abstractions to the unikernels. For example, a traditional VM might hotplug a vCPU; while a MirageOS application would request a "VM create" using the cloud API and "connect" the new instance to the existing network. We should be able to spin up 1000s of "CPUs" by using such APIs in a cluster environment. As well as helping Xen/Mirage, the public cloud API bindings will be very useful to other people in other contexts -- a nice side-effect.