Parsing and serializing MIDI files in Rust
Fast, allocation-free MIDI parsing
This is a super simple post about augmented-midi, an unpublished crate for MIDI event and file parsing and serialization.
The library:
- Parses standard MIDI files and messages
- Serializes MIDI messages
- Performs no allocations on parse (results are either owned or input references)
It is part of the augmented-audio repository and is used by audio-processor-standalone to feed MIDI input into generated processor CLIs.
You implement the AudioProcessor and MIDIEventHandler traits and wrap your implementation with audio-processor-standalone.
Now you have a CLI that can process audio and MIDI, both in real-time and offline.
When rendering offline, your CLI will be able to process MIDI input from a file:
cd ./crates/apps/synth
cargo run \
--release -- \
--input-file=../../../input-files/synthetizer-loop.mp3 \
--output-file=synth.wav \
--midi-input-file=../../augmented/data/augmented-midi/bach_846.mid
This outputs synth.wav (⚠️ levels are off, so this is loud).
I wrote about what I’m trying to do with these abstractions in more detail in “Generic AudioProcessors in Rust”.
Usage example
Using the TryFrom trait implementation:
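use augmented_midi::{MIDIMessage, MIDIMessageNote};
let input_buffer = [0x9_8, 0x3C, 0x44];
// The parsed value is bound to `midi_message`, the name the assertion below checks.
let midi_message = MIDIMessage::<&[u8]>::try_from(input_buffer).unwrap();
assert_eq!(
    midi_message,
    MIDIMessage::NoteOn(MIDIMessageNote {
        channel: 8,
        note: 60,
        velocity: 68
    })
);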
Using the nom functions, which are more verbose:
use augmented_midi::{MIDIMessage, MIDIMessageNote, MIDIParseResult, parse_midi_event, ParserState};
// Initialize parser state. This is here to support running status in MIDI files.
// Not relevant for single events.
let mut state = ParserState::default();
// We'll parse this &[u8] buffer
let input_buffer = [0x9_8, 0x3C, 0x44];
// We parse a message borrowing from the input buffer. We could use `MIDIMessage<Vec<u8>>`
// to allocate owned messages.
//
// This will determine how variable size messages (SysEx) will be allocated.
//
// Parsing is otherwise only using the stack.
let parse_result: MIDIParseResult<MIDIMessage<&[u8]>> =
    parse_midi_event(&input_buffer, &mut state);
let (_remaining_input, midi_message) = parse_result.unwrap();
assert_eq!(midi_message, MIDIMessage::NoteOn(MIDIMessageNote { channel: 8, note: 60, velocity: 68 }));
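match midi_message {
    MIDIMessage::NoteOn(MIDIMessageNote { channel, note, velocity }) => {
        // The backslash must be escaped inside a Rust string literal.
        println!("\\o/ channel={} note={} velocity={}", channel, note, velocity);
    },
    // MIDIMessage::... variants contain all the messages
    _ => {}
}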
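For context on why the parser needs state at all: with running status, an event in a MIDI stream may omit its status byte and reuse the one from the previous event. Below is a minimal sketch of that idea using only the standard library. RunningStatus and resolve_status are hypothetical names for illustration, not the crate's API, and the sketch ignores the rule that system messages reset the remembered status.

```rust
/// Hypothetical parser state: remembers the last status byte seen.
#[derive(Default)]
struct RunningStatus {
    last_status: Option<u8>,
}

/// Returns the status byte that applies to the next event, plus the input
/// that remains after the status byte (if one was present) is consumed.
fn resolve_status<'a>(state: &mut RunningStatus, input: &'a [u8]) -> Option<(u8, &'a [u8])> {
    let (&first, rest) = input.split_first()?;
    if first & 0x80 != 0 {
        // Highest bit set: this is a real status byte. Remember and consume it.
        state.last_status = Some(first);
        Some((first, rest))
    } else {
        // A data byte: reuse the previous status and consume nothing.
        state.last_status.map(|status| (status, input))
    }
}
```

A data byte seen before any status byte yields None, since there is nothing to reuse yet.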
Parser combinators: nom
The parser is implemented using nom parser combinators. nom is not required for MIDI, which is a simple protocol, but it made it easier to start with a good structure.
We have some type aliases:
pub type Input<'a> = &'a [u8];
pub type MIDIParseResult<'a, Output> = IResult<Input<'a>, Output>;
Our inputs are slices of bytes and our outputs are nom results. These are Result types where success is a tuple of the remaining input and the parsed value, and failure is an error that references the input position where parsing failed.
Here’s the function to parse the MIDI file format from MIDI files’ header chunk as an example:
fn parse_file_format(input: Input) -> MIDIParseResult<MIDIFileFormat> {
    // Read the 16-bit big-endian format field; fails if fewer than 2 bytes remain.
    let (input, format) = be_u16(input)?;
    // Every value maps to a variant, so the match itself cannot fail.
    let format = match format {
        0 => MIDIFileFormat::Single,
        1 => MIDIFileFormat::Simultaneous,
        2 => MIDIFileFormat::Sequential,
        _ => MIDIFileFormat::Unknown,
    };
    Ok((input, format))
}
Our function takes Input, which is just &[u8], and returns Result<(&[u8], MIDIFileFormat), Error>. It takes a stream of bytes, parses a MIDIFileFormat enum and returns it with the remaining stream of bytes. Otherwise, it returns an error.
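To make the "remaining input plus parsed value" shape concrete without pulling in nom, here is a rough standard-library-only sketch of the same function. FileFormat, NotEnoughInput, ParseResult, read_be_u16 and parse_format are hypothetical stand-ins invented for this example; they are not the crate's types, and nom's real error type carries more information.

```rust
/// Hypothetical stand-in for the crate's MIDIFileFormat enum.
#[derive(Debug, PartialEq)]
enum FileFormat {
    Single,
    Simultaneous,
    Sequential,
    Unknown,
}

/// Hypothetical error type: the input ended before the value was complete.
#[derive(Debug, PartialEq)]
struct NotEnoughInput;

/// Same shape as a nom result: on success, the remaining input and the value.
type ParseResult<'a, T> = Result<(&'a [u8], T), NotEnoughInput>;

/// Reads a big-endian u16 and returns it with the rest of the input.
fn read_be_u16(input: &[u8]) -> ParseResult<'_, u16> {
    match input {
        [hi, lo, rest @ ..] => Ok((rest, u16::from_be_bytes([*hi, *lo]))),
        _ => Err(NotEnoughInput),
    }
}

/// Parses the file format field, passing the unconsumed bytes along.
fn parse_format(input: &[u8]) -> ParseResult<'_, FileFormat> {
    let (input, raw) = read_be_u16(input)?;
    let format = match raw {
        0 => FileFormat::Single,
        1 => FileFormat::Simultaneous,
        2 => FileFormat::Sequential,
        _ => FileFormat::Unknown,
    };
    Ok((input, format))
}
```

Each parser hands the unconsumed bytes to the next one, which is what lets small parsers like this be chained into a parser for a whole file.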
MIDI events/files are very simple and compact, but there’s a bit of bit fiddling. For example, most events have 3 bytes, with the first byte being both the type of the event and sometimes what channel it refers to. The upper 4 bits of this status byte are the status and the lower 4 bits might contain the channel information.
The crate is useful to avoid repeating this in application code. It uses bit-masks and returns nice ‘rusty’ types other code can pattern match over.
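As an illustration of the bit fiddling involved, this is roughly what the masking looks like when done by hand. The function names are made up for this sketch and are not part of augmented-midi.

```rust
/// Returns true if `byte` is a status byte (its highest bit is set).
fn is_status_byte(byte: u8) -> bool {
    byte & 0x80 != 0
}

/// Splits a channel message status byte into (message type, channel).
fn split_status_byte(status: u8) -> (u8, u8) {
    let message_type = status & 0xF0; // upper 4 bits, e.g. 0x90 for note-on
    let channel = status & 0x0F; // lower 4 bits, in the range 0..=15
    (message_type, channel)
}
```

For the 0x98 status byte used in the examples earlier, this gives a message type of 0x90 (note-on) and channel 8.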
The entry-points to the library are the parse_midi_event and parse_midi_file functions and the MIDIMessage or MIDIFile types.
The functions and types are generic: MIDIMessage<Buffer: Borrow<[u8]>>.
Most messages have static sizes and will be stack-allocated. For example, note-on messages always have 3 bytes while program change messages always have 2 bytes. The generic argument is used to determine how the variable size messages are handled (just SysEx).
If MIDIMessage<&'a [u8]> is parsed, the sys-ex messages will point to the input buffer. On the other hand, if MIDIMessage<Vec<u8>> is parsed, the sys-ex messages will be owned and the buffers copied into vecs.
The same goes for parts of the MIDI file spec which require strings, such as chunk names (this is only used for unknown chunks found). These strings always have exactly 4 bytes, though, so maybe this could be removed.
Here is some nice Rust, the SysEx parsing function:
pub fn parse_sysex_event<'a, Buffer: Borrow<[u8]> + From<Input<'a>>>( // [1]
    input: Input<'a>,
) -> MIDIParseResult<'a, MIDISysExEvent<Buffer>> { // [2]
    let (input, _) = alt((tag([0xF7]), tag([0xF0])))(input)?;
    let (input, bytes) = take_till(|b| b == 0xF7)(input)?;
    let (input, _) = take(1u8)(input)?;
    Ok((
        input,
        MIDISysExEvent {
            message: bytes.into(), // [3]
        },
    ))
}
- [1] The type is generic over a Buffer type argument, which will be the output storage:
  a. we require this type to be borrow-able as a [u8] slice, so that we can read it back in a standard way as a slice of bytes
  b. we require that this type be convertible from our input type, which is &[u8]
- [2] We return the input/output tuple result as before; lifetime annotations are carried with the input type
- [3] Conversion from the input slice into just a reference to it (no conversion) or into a Vec<u8> is handled by .into(), calling the Into trait
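The same pair of trait bounds can be tried out in isolation, without nom. In the sketch below, Payload, wrap_payload and payload_len are hypothetical names chosen for illustration; only the Borrow and From bounds mirror the signature above.

```rust
use std::borrow::Borrow;

/// Hypothetical payload type, generic over its storage.
#[derive(Debug, PartialEq)]
struct Payload<Buffer: Borrow<[u8]>> {
    message: Buffer,
}

/// Wraps `input` in a Payload; the caller picks borrowed or owned storage.
fn wrap_payload<'a, Buffer>(input: &'a [u8]) -> Payload<Buffer>
where
    Buffer: Borrow<[u8]> + From<&'a [u8]>,
{
    // For `&[u8]` this is the identity conversion; for `Vec<u8>` it copies.
    Payload { message: input.into() }
}

/// Reads the payload back as a byte slice, whatever the storage type is.
fn payload_len<Buffer: Borrow<[u8]>>(payload: &Payload<Buffer>) -> usize {
    let bytes: &[u8] = payload.message.borrow();
    bytes.len()
}
```

Annotating the result as Payload<&[u8]> keeps a reference to the input, while Payload<Vec<u8>> copies the bytes into a new vector. The call site decides, and the function body stays the same.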
Serialization: cookie_factory
cookie-factory is used for serializing MIDI events. Currently, only MIDI events can be serialized, rather than full standard MIDI files.
This maps closely to the nom parser combinators. I remember something about isomorphic parser/pretty-printer code, but this is a very tiny serializer.
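To give a feel for how small serialization is for a single event, here is a hand-rolled sketch that writes a note-on message into a byte buffer. It does not use cookie-factory and is not the crate's serializer; write_note_on is a hypothetical helper, and it is the inverse of the masking done when parsing.

```rust
/// Writes a note-on message for `channel` (0..=15) into `output`.
/// Returns the number of bytes written, or None if the buffer is too small.
fn write_note_on(output: &mut [u8], channel: u8, note: u8, velocity: u8) -> Option<usize> {
    // Status byte: note-on type in the upper 4 bits, channel in the lower 4.
    // Data bytes are masked to 7 bits, since their highest bit must be clear.
    let bytes = [0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F];
    output.get_mut(..bytes.len())?.copy_from_slice(&bytes);
    Some(bytes.len())
}
```

Writing channel 8, note 60 and velocity 68 produces the bytes 0x98, 0x3C, 0x44, the same buffer that the parsing examples start from.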
Performance
augmented-midi is very fast in its benchmarks.
Generally:
- Performance isn’t an issue for this
- Parsing without allocating is 1 order of magnitude faster than parsing with allocation 🤷
- My computer can parse billions of MIDI messages a second; so it doesn’t matter
Conclusion
That is all! Thank you for reading, and give me feedback on Twitter @yamadapc.