Parsing and serializing MIDI files in Rust

Fast, allocation-free MIDI parsing

Pedro Tacla Yamada
3 May 2022 · 6 min read

This is a super simple post about augmented-midi, an unpublished crate for MIDI event and file parsing and serialization.

The library:

  • Parses standard MIDI files and messages
  • Serializes MIDI messages
  • Performs no allocations on parse (results are either owned or input references)

It is part of the augmented-audio repository and is used by audio-processor-standalone to feed MIDI input into generated processor CLIs.

You implement the AudioProcessor and MIDIEventHandler traits and wrap your implementation with audio-processor-standalone.

Now you have a CLI that can process audio and MIDI, both in real-time and offline.

When rendering offline, your CLI will be able to process MIDI input:

cd ./crates/apps/synth
cargo run \
    --release -- \
    --input-file=../../../input-files/synthetizer-loop.mp3 \
    --output-file=synth.wav \
    --midi-input-file=../../augmented/data/augmented-midi/bach_846.mid

This outputs synth.wav (⚠️ levels are off, so this is loud):

I wrote about what I’m trying to achieve with these abstractions in more detail in “Generic AudioProcessors in Rust”.

Usage example

Using the TryFrom trait implementation:

let input_buffer = [0x9_8, 0x3C, 0x44];
let midi_message = MIDIMessage::<&[u8]>::try_from(input_buffer).unwrap();
assert_eq!(
   midi_message,
   MIDIMessage::NoteOn(MIDIMessageNote {
      channel: 8,
      note: 60,
      velocity: 68
   })
);

Using the nom functions, which are more verbose:

use augmented_midi::{MIDIMessage, MIDIMessageNote, MIDIParseResult, parse_midi_event, ParserState};

// Initialize parser state. This is here to support running status on MIDI files.
// Not relevant for single events.
let mut state = ParserState::default();

// We'll parse this &[u8] buffer
let input_buffer = [0x9_8, 0x3C, 0x44];

// We parse a message borrowing from the input buffer. We could use `MIDIMessage<Vec<u8>>`
// to allocate owned messages.
//
// This will determine how variable size messages (SysEx) will be allocated.
//
// Parsing is otherwise only using the stack.
let parse_result: MIDIParseResult<MIDIMessage<&[u8]>> =
    parse_midi_event(&input_buffer, &mut state);
let (_remaining_input, midi_message) = parse_result.unwrap();

assert_eq!(midi_message, MIDIMessage::NoteOn(MIDIMessageNote { channel: 8, note: 60, velocity: 68 }));
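match midi_message {
    MIDIMessage::NoteOn(MIDIMessageNote { channel, note, velocity }) => {
        // The backslash has to be escaped inside a Rust string literal.
        println!("\\o/ channel={} note={} velocity={}", channel, note, velocity);
    },
    // MIDIMessage::... variants contain all the messages
    _ => {}
}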
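
The parser state in the example above exists because standard MIDI files may omit a repeated status byte, which the MIDI specification calls running status. Here is a minimal std-only sketch of the idea. The `RunningStatus` type and `parse_channel_message` function are hypothetical names made up for this illustration, not the crate's API, and the sketch assumes every message has two data bytes:

```rust
/// Hypothetical parser state: remembers the last status byte seen.
#[derive(Default)]
struct RunningStatus {
    last_status: Option<u8>,
}

/// Parses one channel message with two data bytes, honoring running status.
/// Returns the remaining input plus (status, data1, data2), or None on malformed input.
/// Simplification: real MIDI also has messages with one data byte.
fn parse_channel_message<'a>(
    input: &'a [u8],
    state: &mut RunningStatus,
) -> Option<(&'a [u8], (u8, u8, u8))> {
    let first = *input.first()?;
    let (status, rest) = if first & 0x80 != 0 {
        // High bit set: an explicit status byte, so remember it.
        state.last_status = Some(first);
        (first, &input[1..])
    } else {
        // High bit clear: this is already a data byte, so reuse the previous status.
        (state.last_status?, input)
    };
    if rest.len() < 2 {
        return None;
    }
    Some((&rest[2..], (status, rest[0], rest[1])))
}

fn main() {
    // Note-on on channel 8, then a second note that omits the status byte.
    let bytes = [0x98, 0x3C, 0x44, 0x3E, 0x40];
    let mut state = RunningStatus::default();
    let (rest, first) = parse_channel_message(&bytes, &mut state).unwrap();
    let (rest, second) = parse_channel_message(rest, &mut state).unwrap();
    assert_eq!(first, (0x98, 0x3C, 0x44));
    assert_eq!(second, (0x98, 0x3E, 0x40));
    assert!(rest.is_empty());
}
```

For a single event there is no previous status to fall back on, which is why the state is not relevant there.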

Parser combinators: nom

The parser is implemented using nom parser combinators.

nom is not required for MIDI, which is a simple protocol, but it made it easier to start with a good structure.

We have some type aliases:

pub type Input<'a> = &'a [u8];
pub type MIDIParseResult<'a, Output> = IResult<Input<'a>, Output>;

Our inputs are slices of bytes and our outputs are nom results. These are Result types where success is a tuple of the remaining input and the parsed value, and failure is an error that references the input position where parsing failed.

Here’s the function to parse the MIDI file format from MIDI files’ header chunk as an example:

fn parse_file_format(input: Input) -> MIDIParseResult<MIDIFileFormat> {
    let (input, format) = be_u16(input)?;
    let format = match format {
        0 => MIDIFileFormat::Single,
        1 => MIDIFileFormat::Simultaneous,
        2 => MIDIFileFormat::Sequential,
        _ => MIDIFileFormat::Unknown,
    };
    Ok((input, format))
}

Our function takes Input, which is just &[u8] and returns Result<(&[u8], MIDIFileFormat), Error>.

It takes a stream of bytes, parses a MIDIFileFormat enum and returns it with the remaining stream of bytes. Otherwise, it returns an error.
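
To make the "remaining input plus parsed value" shape concrete without pulling in nom, here is a rough std-only sketch of the same function. `FileFormat`, `ParseError`, `ParseResult` and this hand-written `be_u16` are stand-ins I made up for the illustration; nom's real error type carries more information than this one:

```rust
#[derive(Debug, PartialEq)]
enum FileFormat {
    Single,
    Simultaneous,
    Sequential,
    Unknown,
}

/// Stand-in for nom's error type; only records how many bytes were left at failure.
#[derive(Debug, PartialEq)]
struct ParseError {
    remaining: usize,
}

/// Same shape as a nom result: remaining input plus parsed value, or an error.
type ParseResult<'a, Output> = Result<(&'a [u8], Output), ParseError>;

/// Reads a big-endian u16 from the front of the input.
fn be_u16(input: &[u8]) -> ParseResult<u16> {
    match input {
        [hi, lo, rest @ ..] => Ok((rest, u16::from_be_bytes([*hi, *lo]))),
        _ => Err(ParseError { remaining: input.len() }),
    }
}

fn parse_file_format(input: &[u8]) -> ParseResult<FileFormat> {
    let (input, format) = be_u16(input)?;
    let format = match format {
        0 => FileFormat::Single,
        1 => FileFormat::Simultaneous,
        2 => FileFormat::Sequential,
        _ => FileFormat::Unknown,
    };
    Ok((input, format))
}

fn main() {
    // Format 1, followed by two bytes the parser leaves untouched.
    let bytes = [0x00, 0x01, 0x00, 0x02];
    let (rest, format) = parse_file_format(&bytes).unwrap();
    assert_eq!(format, FileFormat::Simultaneous);
    assert_eq!(rest, &[0x00u8, 0x02][..]);
    // Too little input is reported as an error rather than a panic.
    assert_eq!(parse_file_format(&[0x00]), Err(ParseError { remaining: 1 }));
}
```

Because each parser hands back the rest of the input, the next parser can be called on it directly, which is what makes these functions easy to chain.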

MIDI events and files are very simple and compact, but there’s a bit of bit fiddling. For example, most events have 3 bytes, and the first byte (the status byte) encodes both the type of the event and, for channel messages, which channel it refers to. The upper 4 bits of the status byte are the message type and the lower 4 bits may contain the channel.

The crate is useful to avoid repeating this in application code. It uses bit-masks and returns nice ‘rusty’ types other code can pattern match over.
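
As a quick illustration of the masking involved, this is roughly what splitting a status byte looks like. `split_status` and the two constants are names invented for this sketch, not part of the crate:

```rust
/// Splits a channel status byte into (message type, channel).
fn split_status(status: u8) -> (u8, u8) {
    let kind = status & 0xF0; // upper 4 bits: message type
    let channel = status & 0x0F; // lower 4 bits: channel, 0 to 15
    (kind, channel)
}

const NOTE_OFF: u8 = 0x80;
const NOTE_ON: u8 = 0x90;

fn main() {
    // 0x98 is the status byte from the usage example: note-on, channel 8.
    assert_eq!(split_status(0x98), (NOTE_ON, 8));
    assert_eq!(split_status(0x80), (NOTE_OFF, 0));
    // System messages (0xF0 to 0xFF) carry no channel; the lower bits select the message instead.
    assert_eq!(split_status(0xF0), (0xF0, 0));
}
```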

The entry-points to the library are the parse_midi_event and parse_midi_file functions and the MIDIMessage or MIDIFile types.

The functions and types are generic: MIDIMessage<Buffer: Borrow<[u8]>>.

Most messages have static sizes and will be stack-allocated. For example, note-on messages always have 3 bytes while program change messages always have 2 bytes. The generic argument is used to determine how the variable size messages are handled (just SysEx).

If MIDIMessage<&'a [u8]> is parsed, the sys-ex messages will point to the input buffer. On the other hand, if MIDIMessage<Vec<u8>> is parsed, the sys-ex messages will be owned and the buffers copied into vecs.

The same goes for parts of the MIDI file spec that require strings, such as chunk names (these are only stored for unknown chunks). These strings are always exactly 4 bytes long, though, so maybe this could be removed.

Here is some nice Rust, the SysEx parsing:

pub fn parse_sysex_event<'a, Buffer: Borrow<[u8]> + From<Input<'a>>>( // [1]
    input: Input<'a>,
) -> MIDIParseResult<'a, MIDISysExEvent<Buffer>> {                    // [2]
    let (input, _) = alt((tag([0xF7]), tag([0xF0])))(input)?;
    let (input, bytes) = take_till(|b| b == 0xF7)(input)?;
    let (input, _) = take(1u8)(input)?;
    Ok((
        input,
        MIDISysExEvent {
            message: bytes.into(),                                    // [3]
        },
    ))
}

  1. The type is generic over a Buffer type argument, which will be the output storage:
     a. We require this type to be borrowable as a [u8] slice, so that we can read it back in a standard way as a slice of bytes.
     b. We require that this type be convertible from our input type, which is &[u8].
  2. We return the input/output tuple result as before; lifetime annotations are carried with the input type.
  3. Conversion from the input slice into either a plain reference to it (no copy) or a Vec<u8> is handled by .into(), via the Into trait.
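
To see the two storage choices side by side, here is a small std-only sketch using the same trait bounds. `SysEx` and `parse_sysex` are simplified stand-ins written for this post, not the crate's types, and unlike the real parser this one only accepts 0xF0 as the start byte:

```rust
use std::borrow::Borrow;

/// Hypothetical stand-in for the crate's SysEx event type.
#[derive(Debug, PartialEq)]
struct SysEx<Buffer: Borrow<[u8]>> {
    message: Buffer,
}

/// Extracts the payload between 0xF0 and 0xF7. The caller picks the storage type.
fn parse_sysex<'a, Buffer>(input: &'a [u8]) -> Option<(&'a [u8], SysEx<Buffer>)>
where
    Buffer: Borrow<[u8]> + From<&'a [u8]>,
{
    let (&start, body) = input.split_first()?;
    if start != 0xF0 {
        return None;
    }
    // Everything up to (not including) the 0xF7 terminator is the payload.
    let end = body.iter().position(|&b| b == 0xF7)?;
    let (payload, rest) = body.split_at(end);
    // `rest` starts with the terminator, so skip one byte.
    Some((&rest[1..], SysEx { message: payload.into() }))
}

fn main() {
    let bytes = [0xF0, 0x7E, 0x7F, 0x06, 0x01, 0xF7];

    // Borrowed: the message is a view into `bytes`, no allocation.
    let (_, borrowed) = parse_sysex::<&[u8]>(&bytes).unwrap();
    // Owned: the payload is copied into a Vec and can outlive `bytes`.
    let (_, owned) = parse_sysex::<Vec<u8>>(&bytes).unwrap();

    assert_eq!(borrowed.message, &bytes[1..5]);
    assert_eq!(owned.message, vec![0x7E, 0x7F, 0x06, 0x01]);
}
```

The function body is identical in both cases; only the type the caller asks for decides whether a copy happens.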

cookie-factory is used for serializing MIDI events. Currently, only MIDI events can be serialized, not full standard MIDI files.

This maps closely to the nom parser combinators. I remember something about isomorphic parser/pretty-printer code, but this is a very tiny serializer.
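
To give a feel for how small the serializing side is, here is a std-only sketch that writes a note-on message with `std::io::Write`. It does not use cookie-factory, and `Note` and `write_note_on` are hypothetical names for this illustration only:

```rust
use std::io::{self, Write};

/// Hypothetical note message, mirroring the fields from the usage example.
#[derive(Debug, PartialEq)]
struct Note {
    channel: u8,
    note: u8,
    velocity: u8,
}

/// Writes a note-on message: status byte first, then the two data bytes.
fn write_note_on<W: Write>(out: &mut W, msg: &Note) -> io::Result<()> {
    // The reverse of parsing: combine the type and channel bits into one byte,
    // and mask the data bytes to 7 bits as MIDI requires.
    let status = 0x90 | (msg.channel & 0x0F);
    out.write_all(&[status, msg.note & 0x7F, msg.velocity & 0x7F])
}

fn main() -> io::Result<()> {
    let mut bytes: Vec<u8> = Vec::new();
    write_note_on(&mut bytes, &Note { channel: 8, note: 60, velocity: 68 })?;
    // The same three bytes the usage example parses.
    assert_eq!(bytes, [0x98, 0x3C, 0x44]);
    Ok(())
}
```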

Performance

augmented-midi is very fast; these are the benchmarks.

Generally:

  • Performance isn’t an issue for this
  • Parsing without allocating is 1 order of magnitude faster than parsing with allocations 🤷
  • My computer can parse billions of MIDI messages a second; so it doesn’t matter

Conclusion

That is all! Thank you for reading and give me feedback on Twitter @yamadapc.