The AV Macs Arrive

They’re designed for digital media

Apple used Macworld Boston to unveil its much-discussed (and much anticipated) “Audio Visual” Macintoshes: the Centris 660AV and the Quadra 840AV. These are intended for applications that involve heavy processing of digitized data: image editing, digital video, music, sound and voice input/output.

They include a digital signal processor (DSP) integrated into the machine à la NeXT, composite video and S-video input connectors, a special microphone optimized for voice input (available as a separate unit or built into the housing of Apple’s new AudioVision 14-inch color monitor) and Apple’s “Casper” voice-recognition technology.

We’ve often criticized Apple for the confusion created by its mind-boggling array of product lines and product permutations. The AV Macs add a new twist. They carry the Centris and Quadra names, yet technologically they bear little resemblance to the earlier Centris and Quadra models; they even use different motherboards. There are virtually no interchangeable parts between the old Quadras and the AV models, and the integrated DSP gives them a hardware architecture significantly different from that of any previous Macintosh. We don’t know why Apple didn’t choose a new name for this new family.

Casper. We are impressed with the Casper voice recognition. In the demos we have seen at Apple’s development labs over the past couple of years, what has always struck us about Casper is how speaker-independent it is. Unlike with most voice recognition systems, any native English speaker with a North American accent can walk up to a Casper-equipped machine and talk to it, and the machine will recognize a remarkable number of words.

For the released product (officially named PlainTalk), Apple has chosen to capitalize on this attribute. Unlike other voice recognition systems, this one is not trainable. It comes with a phonetic dictionary that lets it recognize an (almost) full range of Macintosh commands as spoken by a North American adult. (Other dictionaries will be needed for other parts of the world.) In short, as supplied by Apple, it is intended as an alternative way to command your computer rather than as a dictation-driven typist.

It controls the computer remarkably well. Voice commands (except for really dangerous ones such as “erase disk”) are interchangeable with mouse or typed commands, and they can be incorporated into scripts and macros. At Macworld Expo, CE Software was touting its latest version of QuicKeys, which is now AppleScript-compliant. In a glass cubicle, CE had set up 10 sample QuicKeys macros to be triggered by voice commands. We walked in and tried them, and every one worked on the first try. Other show-goers reported the same result.

Speaker-independent voice recognition has long been a difficult technical challenge. There is, of course, a tradeoff: you can have speaker independence with a tiny recognition vocabulary, or a large vocabulary that has been “trained” to one particular speaker, but not both. CE Software’s demo falls at the speaker-independent, small-vocabulary extreme of that spectrum, an approach with commercial possibilities for kiosks and games.

We have heard rumors (but no announcements) that developers are writing dictation and speech transcription applications, which would fall at the other, large-vocabulary extreme. Such products would be of interest to publishers, but we suspect that they will be a while in coming.

Voice output. The other side of Apple’s PlainTalk speech technology is a new set of routines for synthesizing speech from text. We first saw this demonstrated in Apple’s labs a couple of years ago, with a reading of Moby Dick. This is still computer-generated speech, and it can still mangle specialized words in amusing ways. However, it is a dramatic improvement over most previous systems.

What’s it good for? We have trouble seeing the value of voice input and output for normal office applications. The mouse and keyboard are faster for command input, and reading is faster than listening. However, voice I/O makes a lot of sense in situations where your hands are tied up doing other things, or where you cannot look at the screen and would rather have the computer read things to you. The technology also makes sense for kiosk and other consumer applications.

Voice input can be a big help for people who have trouble using a mouse or a keyboard. Voice output can be vital for people who are visually impaired. However, the Apple software out of the box does not provide the kind of comprehensive facilities for visually impaired people found in systems such as the one IBM developed in conjunction with Recording for the Blind.

Video processing. The AV Macs will accept video input in NTSC, PAL and SECAM formats. As you would expect, QuickTime movies can be captured and played with larger frames and faster frame rates than are possible on non-AV Macs. However, these machines are still not up to processing full-frame, 30-frame-per-second video.

A new Digital Audio Video (DAV) connector allows external add-on devices to access the Macintosh digital audio and video information. This will almost certainly encourage a new crop of add-on compression and digital audio/video processing cards.

Sound. The DSP allows the AV Macs to process 16-bit stereo sound at the full 44.1-kHz audio-CD sampling rate. These are the first desktop computers able to do this.

Telephony. If the computer can handle CD-quality sound, fooling a phone system should be a snap. AV Macs have the built-in ability to emulate telephones, data modems and fax machines. An optional external box called the GeoPort Telecom Adapter performs the necessary analog-to-digital conversions, and accompanying software running on the Macintosh does the rest. The software will support standard modem protocols up to V.32, which runs at 9,600 bps.

Applications of this technology include fax software (for both sending and receiving) bundled with the machines, and the ability to use a Macintosh as a digital answering machine and speakerphone. (Just to put things in perspective, AT&T makes a very nice digital answering machine, the System 1337, that sells for less than $100.)

The AV Macs have been eagerly awaited by many of the key developers of publishing applications. The machines are expensive, but the developers who have been using them are convinced that high-end users will flock to them, especially for imaging and video applications. It is less clear how widely the voice recognition and speech synthesis features will be used until they find their way down to less pricey computers. But the AV machines will probably be good test-beds for voice technology, and once you have gone to the expense of building in the DSP technology, the voice capabilities add very little incremental cost.

Peter Dyson