The machine that goes ping (or, Being stuck in the MIDI of it all)
April 10, 2003
Mac OS X has good support for audio applications. Apple has provided some fundamental technologies that applications can build upon, as well as developer APIs. Additionally, third parties are adding to the available tools.
First, the Apple technologies ...
The Cocoa toolkit includes the NSSound class, which offers very basic playback functions. NSSound supports loading and playing of AIFF, WAV and the NeXT “.snd” file formats (but not MP3). Some of the features NSSound supports:
- creating a sound from a file or via a URL
- creating a sound from data on the clipboard (also termed “pasteboard” in Cocoa parlance)
- pasting a sound to the clipboard
- playing the sound
- pausing, resuming and stopping the sound
There is also a delegate method which lets you know when the sound is finished playing.
NSSound lets you associate sounds with names, giving you a basic sound dictionary. If there are sound files in one of your library directories (like /Library/Sounds or ~/Library/Sounds), you can use the named-sound features to load those sounds without having to determine their paths.
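Here's a minimal sketch of the named-sound and delegate features described above, assuming a sound file named “Ping” lives in one of the sound directories:

```objc
#import <Cocoa/Cocoa.h>

@interface PingPlayer : NSObject
@end

@implementation PingPlayer

- (void)playNamedSound
{
    // +soundNamed: searches the app bundle and the Library/Sounds
    // directories, so no path-fiddling is needed.
    NSSound *sound = [NSSound soundNamed:@"Ping"];
    [sound setDelegate:self];
    [sound play];
}

// Delegate method invoked when the sound is finished playing.
- (void)sound:(NSSound *)sound didFinishPlaying:(BOOL)finished
{
    NSLog(@"finished playing: %d", finished);
}

@end
```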
NSSound doesn't support other sound formats (particularly the popular MP3 format) and it only supports sampled sounds, so it won't be able to play MIDI. Pausing and resuming of sounds doesn't work very well.
Where NSSound is simple and straightforward (and limited in features), the Core Audio frameworks are complex and feature-rich. For Mac OS X, Apple has implemented an audio architecture that spans from low-level device banging to playing and sequencing music. In particular, Apple has taken responsibility for providing audio and MIDI services so that individual applications won't have to create their own technologies for handling those aspects of sound.
At the bottom, you have the HAL (hardware abstraction layer), which talks to the IOKit in the kernel to read and write to the actual physical devices, such as synthesizers, microphones and speakers. The HAL presents a uniform programming interface to applications, removing applications from the details of device access.
Above the HAL are Audio Units. These units can be either a source, like a software synthesizer, or a destination, like a speaker. Some Audio Units can be both a source and a destination, like special effects units that provide delay or reverb.
Specific Audio Units are collected together into a graph topology, the AUGraph. This graph describes the particular units and their connections. The graph can be serialized, so it can be written out to disk and then reconstructed in memory at a later time.
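As a sketch of building such a graph (error checking elided, and the exact function and constant names should be checked against the current AudioToolbox headers), this wires Apple's DLS software synthesizer to the default output unit:

```objc
#include <AudioToolbox/AudioToolbox.h>

AUGraph graph;
AUNode synthNode, outputNode;
ComponentDescription desc = { 0 };

NewAUGraph(&graph);

// A source unit: Apple's software synthesizer.
desc.componentType = kAudioUnitType_MusicDevice;
desc.componentSubType = kAudioUnitSubType_DLSSynth;
desc.componentManufacturer = kAudioUnitManufacturer_Apple;
AUGraphNewNode(graph, &desc, 0, NULL, &synthNode);

// A destination unit: the default output device.
desc.componentType = kAudioUnitType_Output;
desc.componentSubType = kAudioUnitSubType_DefaultOutput;
AUGraphNewNode(graph, &desc, 0, NULL, &outputNode);

// Output 0 of the synth feeds input 0 of the output unit.
AUGraphConnectNodeInput(graph, synthNode, 0, outputNode, 0);

AUGraphOpen(graph);
AUGraphInitialize(graph);
AUGraphStart(graph);
```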
On top of the Audio Units are Music Sequences, which collect tracks of music events (like MIDI events). These events can be edited, and applications can iterate over the data to determine what's actually in there. There are features in the framework to load MIDI files and create sequences from them.
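Loading a MIDI file into a sequence and playing it looks roughly like this (the file path is hypothetical, error checking is elided, and the call names should be verified against the AudioToolbox headers):

```objc
#include <AudioToolbox/AudioToolbox.h>
#import <Foundation/Foundation.h>

MusicSequence sequence;
MusicPlayer player;

NewMusicSequence(&sequence);

// Read a standard MIDI file and build the sequence's tracks from it.
NSData *midiData = [NSData dataWithContentsOfFile:@"/tmp/song.mid"];
MusicSequenceLoadSMFData(sequence, (CFDataRef)midiData);

NewMusicPlayer(&player);
MusicPlayerSetSequence(player, sequence);
MusicPlayerPreroll(player);
MusicPlayerStart(player);
```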
The Core Audio API has a Carbon flavor, with functions returning result codes, and callback functions for dealing with incoming and outgoing audio data.
QuickTime is a venerable and complex technology, but it is very complete and well supported. If you want to play almost any audio format (yes, including MP3), you can use QuickTime.
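From Cocoa, the easiest way in is the NSMovie / NSMovieView pair, which wrap QuickTime playback. A sketch (the file path is hypothetical, and movieView is assumed to be an NSMovieView outlet wired up in the nib):

```objc
#import <Cocoa/Cocoa.h>

// QuickTime happily plays MP3s, which NSSound can't.
NSURL *url = [NSURL fileURLWithPath:@"/tmp/tune.mp3"];
NSMovie *movie = [[NSMovie alloc] initWithURL:url byReference:YES];
[movieView setMovie:movie];
[movieView start:nil];
```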
Moving on to some third-party tools: even though these are “non-Apple” technologies, they're still ultimately based on the foundation provided by Core Audio. Mike Thornburgh has created the MTCoreAudio framework, a set of Objective-C wrappers around the low levels of the Core Audio frameworks (the HAL layer), giving Cocoa applications a natural way to access the lower-level features.
The primary class in this framework is the MTCoreAudioDevice class, which provides features for determining the default input and output devices, as well as locating all of the audio devices attached to the system. Class delegate methods are provided to monitor hardware changes, as in the default input or output devices, or if devices are added or removed. Audio device delegate methods are provided to notify interested parties when low-level device properties have changed: channel configuration changes, the device being removed, sample rate changes or the device getting overloaded. Classes are also provided for accessing the data streams (input and output) for specific audio devices.
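For flavor, something like this should find the default output device (the method names here are inferred from the description above, so check the framework headers for the exact API):

```objc
#import <MTCoreAudio/MTCoreAudio.h>

// Ask the class for the system's default output device,
// then report its human-readable name.
MTCoreAudioDevice *output = [MTCoreAudioDevice defaultOutputDevice];
NSLog(@"default output device: %@", [output deviceName]);
```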
SndKit and MusicKit
Back in the mists of time, the original NeXT systems came with two frameworks dedicated to audio programming. There was the SndKit for dealing with sampled sounds and pre-recorded special effects, and the MusicKit, which provided tools for composing, storing and performing music. When the NeXTcube first came out, these two kits were some of the most compelling features of the machines for me. (At that point in time, they were only selling machines to designated educational institutions, and unfortunately the Little Liberal Arts School in central Arkansas that I went to wasn't one of those institutions, so I could only sit back and drool.) NeXT released the source code to Stanford in 1994, and now these kits are undergoing development to make them work under OS X. Some features work, and some are still in progress.
- developer.apple.com/audio/ : Apple's Audio Developer page – news and technical resources
- developer.apple.com/audio/coreaudio.html : Core Audio architecture page
- aldebaran.armory.com/~zenomt/macosx/MTCoreAudio/ : MTCoreAudio home
- www.musickit.org/ : home of the SndKit and the MusicKit
- www.mat.ucsb.edu:8000/CoreAudio : CoreAudio and CoreMIDI Swiki – a Wiki dedicated to CoreAudio
The SndKit, so sayeth the documentation, lets you examine and manipulate sound data “with microscopic precision.” It will load and play sound files, it supports named sound files like NSSound, and also provides access to the underlying raw sound data via the SndSoundStruct. Other classes support editing of the sound data via the standard clear, copy and paste commands. When playing sounds, the SndKit supports sound data that is fragmented in memory, which means that the actual editing features are pretty low-cost – no slinging huge amounts of data around to edit part of a sound. The original SndKit provided a visual sound editor (almost an oxymoron) that let the end user record sounds and then edit them. (The original NeXT email program would let you record and attach them to emails. Not bad for 15 years ago.)
While the SndKit is for sound effects, the MusicKit is for music. The MusicKit has a number of classes to represent music, such as a Note class for the actual note along with a list of attributes like frequency, amplitude and duration. Notes are organized into Parts, and those parts are organized into a Score. There are also classes to support sound synthesis, like wave tables that describe the timbral information of the notes, envelopes describing the attack and decay for sounds, and tuning systems (in case you wanted to see just how well-tempered that clavier is). The MusicKit supports a textual representation of music known as a ScoreFile. ScoreFiles support some C-style programming features for describing the music. The kit can read in ScoreFiles to create the parts and score, and can take a score and write out ScoreFiles and standard MIDI files.
Once you have a representation, the Conductor class schedules the notes to play, and tells the Performer to play them. The Performer has Instruments that ultimately render the sound. The MusicKit has been used in several colleges' computer-music curricula, and there is a set of course materials. They're based on OpenStep, so some work will be involved to get the examples running under OS X.
That's a brief overview of some of the audio toolkits out there. Next month we'll dig into these toolkits a little deeper and explore how to perform basic audio tasks.
Mark Dalrymple (firstname.lastname@example.org) has been wrangling Mac and Unix systems for entirely too many years. In addition to random consulting and custom app development at Borkware, he also teaches the Core Mac OS X and Unix Programming class for the Big Nerd Ranch.