MIDI

By the late 1970s and early 1980s there were quite a few companies manufacturing voltage-controlled electronic synthesizers. Many of those companies were increasingly interested in the potential power of computers and digital communication as a way of providing control voltages for their synthesizers. Computers could be used to program sequences of numbers that could be converted into specific voltages, thus providing a new level of control for the synthesized sound.

In the early 1980s a consortium of synthesizer manufacturers, led by Dave Smith (founder and CEO of Sequential Circuits, the company that made the Prophet synthesizer), developed a communication protocol for electronic instruments that would enable instruments to transmit and receive control information. They called this new protocol the Musical Instrument Digital Interface (MIDI).

The MIDI protocol consists of a hardware specification and a software specification. For the hardware, it was decided to use 5-pin DIN plugs and jacks, with 3-conductor cable to connect devices. The cable establishes a circuit between two devices. Communication through the cable is unidirectional; one device transmits and the other receives. Thus, MIDI-capable devices have jacks labeled MIDI Out (for transmitting) and MIDI In (for receiving), and sometimes also a jack labeled MIDI Thru which transmits a copy of whatever is received in the MIDI In jack. MIDI communication is serial, meaning that the bits of information are transmitted sequentially, one after the other. The transmitting device communicates a digital signal by sending current to mean 0 and no current to mean 1, at a rate of 31,250 bits per second.

For the software specification, it was decided that each word of information would consist of ten bits: a start bit (0), an 8-bit byte, and a stop bit (1). Thus MIDI is theoretically capable of transmitting up to 3,125 bytes per second. A MIDI message consists of one status byte declaring what kind of message it is, followed by zero or more data bytes giving parameter information. In the next paragraphs discussing the actual contents of those bytes, we’ll ignore the start and stop bits, since they’re always the same for every byte and don’t really contain meaningful information.
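The throughput figure above is simple arithmetic, and it can be worked through in a few lines of Python. This is only an illustrative sketch: the variable names are made up for the example, and the three-byte message is just one possible message length.

```python
BAUD_RATE = 31_250     # bits per second, from the hardware specification
BITS_PER_WORD = 10     # start bit + 8-bit byte + stop bit

# Every byte costs ten bits on the wire.
bytes_per_second = BAUD_RATE // BITS_PER_WORD

# Time needed to send one byte, and one hypothetical three-byte message.
ms_per_byte = 1000 / bytes_per_second
ms_per_three_byte_message = 3 * ms_per_byte

print(bytes_per_second)                     # 3125
print(ms_per_byte)                          # 0.32
print(round(ms_per_three_byte_message, 2))  # 0.96
```

In other words, a three-byte message occupies the cable for a little under one millisecond.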

A status byte always starts with the digit 1 (distinguishing it as a status byte), which means that its decimal value is in the range from 128 to 255. (In binary representation, an 8-bit byte can signify one of 256 different integers, from 0 to 255. The most significant digit is the 128s place in the binary representation.) A data byte always starts with the digit 0 (to distinguish it from a status byte), which means that its decimal value is in the range from 0 to 127.
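The distinction between status bytes and data bytes comes down to testing a single bit. A minimal sketch in Python, where the function name is an assumption made for this example:

```python
def is_status_byte(byte: int) -> bool:
    """Return True if the most significant bit (the 128s place) is 1."""
    if not 0 <= byte <= 255:
        raise ValueError("a byte must be in the range 0 to 255")
    # Mask off everything except the leading binary digit.
    return (byte & 0b10000000) != 0


print(is_status_byte(0b10010000))  # True  (decimal 144)
print(is_status_byte(0b00111111))  # False (decimal 63)
```

Testing the leading bit is equivalent to asking whether the decimal value is 128 or greater.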

What do the numbers actually mean? You can get a complete listing of all the messages in the official specification of MIDI messages. Briefly, there are two main categories of messages: system messages and channel messages. System messages contain information that is assumed to be of interest to all connected devices; channel messages have some identifying information coded within them that allows receiving devices to discern between 16 different message “channels”, which can be used to pay attention only to certain messages and not others. (A good metaphor for understanding this would be a television signal broadcast on a particular channel. Devices that are tuned to receive on that channel will pay attention to that broadcast, while other devices that are not tuned to receive on that channel will ignore it.) Let’s take a look at channel messages.

Channel messages are used to convey performance information such as what note is played on a keyboard, whether a pedal is up or down, etc. The types of channel messages include: note-off and note-on (usually triggered by a key on a pianolike keyboard), pitchbend (usually a series of messages produced by moving a wheel, to indicate pitch inflections from the main pitch of a note), continuous control (usually a series of messages produced by a fader, knob, pedal, etc., to describe some kind of curve of change over time such as volume, panning, vibrato depth, etc.), aftertouch (a measurement of the pressure applied to a key after it’s initially pressed), and program messages (telling the receiving device to switch to a different timbre). Rather than try to describe all of these in detail, we’ll look at the format of one particular type of message: note-on.

When a key on a synthesizer keyboard is pressed, two sensors at different heights underneath the key are triggered, one at the beginning and the other at the end of the key’s descent. The synthesizer clocks the time between the triggering of the two sensors; since the distance between them is known, the velocity of the key’s descent can be calculated as distance divided by time (v=d/t). The synthesizer then sends out a MIDI note-on message telling which key was pressed and with what velocity.

A MIDI note-on message therefore consists of three bytes: message type, key number, and velocity. The first byte is the status byte saying, “I’m a note-on message.” Since the format of a note-on message is specified as having three bytes, the receiving device knows to treat the next two bytes as data bytes stating the number of the key that was pressed and the velocity with which it was pressed. Let’s look closely at the anatomy of each byte. (We’ll ignore the start and stop bits in this discussion, and will focus only on the 8-bit byte between them.)

A status byte for a note-on message might look like this: 10010000. The first digit is always 1, meaning “I’m a status byte.” The next three digits say what kind of message it is. (For example, 000 means “note-off”, 001 means “note-on”, and so on.) The final four digits tell what channel the message should be considered to be on. A receiving device can use the channel information to decide whether or not it wants to pay attention to the message; it can pay attention to all messages, or it can pay attention only to messages on a specific channel. Although these four digits together can express decimal numbers from 0 to 15, it’s conventional to refer to MIDI channels as being numbered 1 to 16. (That’s just a difference between computer numbering, which almost always starts at 0, and human counting, which usually starts at 1.) So, the four digits 0000 mean “MIDI channel 1”.
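The anatomy of a channel-message status byte can be unpacked with a shift and a mask. The sketch below is illustrative only: the function and table names are assumptions, and of the type codes listed, only 000 and 001 are given in the text above; the rest are filled in from the MIDI specification's channel message types.

```python
# Three type bits that follow the leading 1 of a channel status byte.
MESSAGE_TYPES = {
    0b000: "note-off",
    0b001: "note-on",
    0b010: "polyphonic aftertouch",
    0b011: "continuous control",
    0b100: "program change",
    0b101: "channel aftertouch",
    0b110: "pitchbend",
}


def describe_status(status: int) -> tuple[str, int]:
    """Split a channel status byte into (message type, channel 1 to 16)."""
    if not 0b10000000 <= status <= 0b11111111:
        raise ValueError("not a status byte")
    type_bits = (status >> 4) & 0b111
    if type_bits == 0b111:
        # Type bits 111 mark system messages, which carry no channel.
        raise ValueError("system message, not a channel message")
    channel = (status & 0b1111) + 1  # stored as 0-15, spoken of as 1-16
    return MESSAGE_TYPES[type_bits], channel


print(describe_status(0b10010000))  # ('note-on', 1)
print(describe_status(0b10000011))  # ('note-off', 4)
```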

The first data byte that follows the status byte might look like this: 00111111. The first digit of a data byte is always 0, so the range of possible values that can be stated by the remaining seven digits is 0 to 127. By convention, for key numbers the decimal number 60 means piano middle C. The number shown in this byte is 63, so it’s indicating that the key D# above middle C is the key that was pressed. (Each integer designates a semitone on the equal-tempered twelve-tone chromatic scale, so counting up from middle C (60) we see that 61=C#, 62=D, 63=D#, and so on.) The next data byte might look like this: 01101101. This byte designates the velocity of the key press on a scale from 0 to 127. The number is calculated by the keyboard device based on the actual velocity with which the key was pressed. The value shown here is 109 in decimal, so that means that on a scale from 0 to 127, the velocity with which the key was pressed was pretty high. Commonly the receiving device will use that number to determine the loudness of the note that it plays (and it might also use that number for timbral effect, because many acoustic instruments change timbre depending on how hard they’re played).
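The two data bytes above can be checked with a short Python sketch. The helper names are invented for this example, and the frequency calculation assumes the common tuning reference in which the A above middle C (key number 69) is 440 Hz; a receiving device is free to tune itself differently.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def key_name(key: int) -> str:
    """Name of the pitch class for a key number (60 is middle C)."""
    return NOTE_NAMES[key % 12]


def key_frequency(key: int, a_hz: float = 440.0) -> float:
    """Equal-tempered frequency, assuming key 69 is tuned to a_hz."""
    return a_hz * 2 ** ((key - 69) / 12)


key = int("00111111", 2)       # first data byte
velocity = int("01101101", 2)  # second data byte

print(key, key_name(key))             # 63 D#
print(velocity)                       # 109
print(round(key_frequency(key), 3))   # 311.127
```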

So the whole stream of binary digits, including the start and stop bits, would be 010010000100011111110011011011 (that is, 0 10010000 1, then 0 00111111 1, then 0 01101101 1, with each byte framed by its start bit and stop bit). The first byte says, “I’m a note-on message on channel 1, so the next two bytes will be the key number and velocity.” The second byte says, “The key that was pressed is D# above middle C.” The third byte says, “On a scale from 0 to 127, the velocity of the key press is rated 109.” The device that receives this message would begin playing a sound with the pitch D# (fundamental frequency 311.127 Hz), probably fairly loud. Some time later, when the key is released, the keyboard device might send out a stream that looks like this: 010010000100011111110000000001. The first two bytes are the same as before, but the third byte (velocity) is now 0. This says, “The key D# above middle C has now been played with a velocity of 0.” Some keyboards use the MIDI note-off message, and some simply use a note-on message with a velocity of 0, which also means “off” for that note.
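The two bit streams above can be reproduced by framing each byte with its start and stop bits. This is a sketch for illustration: the function names are assumptions, and each byte is written most significant digit first to match the notation used in this discussion rather than to model the electrical signal on the cable.

```python
def frame_byte(byte: int) -> str:
    """Wrap an 8-bit byte in a start bit (0) and a stop bit (1)."""
    if not 0 <= byte <= 255:
        raise ValueError("a byte must be in the range 0 to 255")
    return "0" + format(byte, "08b") + "1"


def note_on_stream(channel: int, key: int, velocity: int) -> str:
    """Build the 30-digit stream for a note-on message (channel 1 to 16)."""
    if not 1 <= channel <= 16:
        raise ValueError("channel must be in the range 1 to 16")
    if not (0 <= key <= 127 and 0 <= velocity <= 127):
        raise ValueError("data bytes must be in the range 0 to 127")
    status = 0b10010000 | (channel - 1)  # 1, then 001 (note-on), then channel
    return "".join(frame_byte(b) for b in (status, key, velocity))


print(note_on_stream(1, 63, 109))  # 010010000100011111110011011011
print(note_on_stream(1, 63, 0))    # 010010000100011111110000000001
```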

Notice that the MIDI note-on message does not contain any timing information regarding the duration of the note. Since MIDI is meant to be used in real time—in live performance, with the receiving device responding immediately—we can’t know the duration of the note until later when the key is released. So, it was decided that a note would require two separate messages, one when it is started and another when it is released. Any knowledge of the duration would have to be calculated by measuring the time elapsed between those two messages.
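Measuring durations from pairs of messages might look something like the following Python sketch. The event format (a tuple of arrival time in seconds, status byte, key number, and velocity) and the function name are assumptions made for this example; MIDI itself does not supply the arrival times, so the receiving program has to record them.

```python
def note_durations(events):
    """Pair each note start with its release and return (key, seconds)."""
    started = {}    # (channel, key) -> time the note began
    durations = []
    for time, status, key, velocity in events:
        kind = status & 0b11110000
        channel = status & 0b00001111
        is_on = kind == 0b10010000 and velocity > 0
        # A note ends with a note-off, or with a note-on of velocity 0.
        is_off = kind == 0b10000000 or (kind == 0b10010000 and velocity == 0)
        if is_on:
            started[(channel, key)] = time
        elif is_off and (channel, key) in started:
            durations.append((key, time - started.pop((channel, key))))
    return durations


events = [
    (0.00, 0b10010000, 63, 109),  # D# pressed
    (0.25, 0b10010000, 60, 80),   # middle C pressed
    (0.50, 0b10010000, 63, 0),    # D# released (note-on, velocity 0)
    (1.00, 0b10000000, 60, 64),   # middle C released (note-off)
]
print(note_durations(events))  # [(63, 0.5), (60, 0.75)]
```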

A complete musical performance might consist of very many such messages, plus other messages indicating pedal presses, movement of a pitchbend wheel, etc. If we use a computer to measure the time each message is received, and store that timing information along with the MIDI messages, we can make a file that contains all the MIDI messages, tagged with the time they occurred. That will allow us to play back those messages later with the exact same timing, recreating the performance. The MIDI specification includes a description of exactly how a MIDI file should be formatted, so that programmers have a common format with which to store and read files of MIDI. This is known as Standard MIDI File (SMF) format, and such files usually have a .mid suffix on their name to indicate that they conform to that format.

It’s important to understand this: MIDI is not audio. The MIDI protocol is intended to communicate performance information and other specifications about how the receiving device should behave, but MIDI does not transmit actual representations of an audio signal. It’s up to the receiving device to generate (synthesize) the audio itself, based on the MIDI information it receives. (When you think about it, the bit rate of MIDI is way too low to transmit decent quality audio. The best it could possibly do is transmit frequencies up to only about 1,500 Hz, with only 7 bits of precision.)
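The rough estimate in the parenthetical can be reproduced as follows. This sketch assumes, purely for the sake of argument, that every byte on the cable is a data byte carrying one 7-bit audio sample, and it relies on the sampling theorem, under which the highest representable frequency is half the sampling rate.

```python
BAUD_RATE = 31_250    # bits per second
BITS_PER_WORD = 10    # start bit + 8-bit byte + stop bit

# Hypothetical best case: one audio sample per transmitted byte.
samples_per_second = BAUD_RATE // BITS_PER_WORD

# Highest frequency a sampled signal can represent (half the sample rate).
max_frequency_hz = samples_per_second / 2

# A data byte has 7 usable bits, so only 128 distinct amplitude levels.
amplitude_levels = 2 ** 7

print(samples_per_second)  # 3125
print(max_frequency_hz)    # 1562.5
print(amplitude_levels)    # 128
```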

The simplicity and compactness of MIDI messages mean that performance information can be transmitted quickly, and that an entire music composition can be described in a very small file, because all the actual sound information will be generated later by the device that receives the MIDI data. Thus, MIDI files are a good way to transmit information about a musical performance or composition, although the quality of the resulting sound depends on the synthesizer or computer program that is used to replicate the performance.