N E G R O P O N T E    Issue 2.07 - July 1996

Object-Oriented Television

By Nicholas Negroponte



Message 37:
Date: 1.7.96
From: nicholas@media.mit.edu
To: oem@wired.co.uk

The Media Lab's Michael Bove believes that a television set should be more like a movie set. But movies require locations, actors, budgets, scripts, producers and directors. What would it mean, Bove wonders, if your TV worked with sets instead of scan lines?

Sets and actors

For too long, TV has taken its lead from photography, which collapses the three-dimensional world onto a plane. Except for the image-sensing mechanism attached to the back, today's TV camera is very similar to a Renaissance camera obscura. This long-standing construct is perhaps the wrong way to think about television. Maybe there is a way to capture the scene as a collection of objects moving in three dimensions, rather than as a single viewpoint on it. Think of it as a computer graphics process, more like Toy Story than Seinfeld.

The networked virtual reality language VRML has such a model behind it. But it's difficult to author good virtual worlds from thin air, so there aren't any on the Web that are as funny as Seinfeld or as appealing to the public as basketball. What we need is "real virtuality", the ability to point a computer or camera at something and later look at it from any point of view.

This is particularly important to Hollywood, because most of the cost of a movie is in front of the camera, not behind it. Object-oriented television should cost less both in front of and behind the camera, and not look cartoon-like. It will still involve cameras, but instead of giving the post-production people (or the viewers of an interactive programme) a switch that flips between cameras one and two, these cameras will contribute what they observe to a database from which any viewpoint can be constructed.
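
To make the idea concrete, here is a toy sketch in Python - not Bove's system, and with every name and number invented - in which cameras deposit what they observe into a shared scene database, and any viewpoint is then served from that database rather than chosen from a fixed camera:

    from dataclasses import dataclass, field

    @dataclass
    class SceneObject:
        name: str
        position: tuple          # (x, y, z) in scene coordinates

    @dataclass
    class SceneDatabase:
        objects: dict = field(default_factory=dict)

        def observe(self, camera_id, name, position):
            # A camera reports where it currently sees a named object.
            self.objects[name] = SceneObject(name, position)

        def view_from(self, viewpoint):
            # Stand-in for rendering: order objects by distance from the chosen viewpoint.
            def dist(obj):
                return sum((a - b) ** 2 for a, b in zip(obj.position, viewpoint)) ** 0.5
            return sorted(self.objects.values(), key=dist)

    scene = SceneDatabase()
    scene.observe("camera-1", "anchor-desk", (0.0, 0.0, 3.0))
    scene.observe("camera-2", "weather-map", (2.0, 0.0, 5.0))
    for obj in scene.view_from((1.0, 1.5, 0.0)):
        print(obj.name, obj.position)

A real system would accumulate geometry and appearance rather than bare coordinates, but the shape of the pipeline - observe, accumulate, then render on demand - is the point.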

Similarly, TV sound should be object-oriented. Instead of left and right channels, sound can be represented as individual sound sources in an acoustically modelled space. So on playback, we can resynthesise the sound to correspond with the arrangement of things on the screen and the viewer's path through them.
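
Again purely as an invented illustration: if each source is an object with a position, a stereo mix can be recomputed for wherever the listener happens to stand.

    import math

    def stereo_gains(source_pos, listener_pos):
        # Return (left, right) gains from simple distance attenuation and panning.
        dx = source_pos[0] - listener_pos[0]
        dz = source_pos[2] - listener_pos[2]
        distance = max(math.hypot(dx, dz), 1.0)
        attenuation = 1.0 / distance              # farther sources are quieter
        pan = max(-1.0, min(1.0, dx / distance))  # -1 = hard left, +1 = hard right
        left = attenuation * (1.0 - pan) / 2.0
        right = attenuation * (1.0 + pan) / 2.0
        return left, right

    sources = {"dialogue": (0.0, 0.0, 2.0), "crowd": (-4.0, 0.0, 6.0)}
    for name, pos in sources.items():
        print(name, stereo_gains(pos, listener_pos=(0.0, 0.0, 0.0)))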

The bit budget

TV is a bandwidth pig. Ten years ago, a common assumption was that 45 million bits per second were needed to obtain studio-quality television. Today, that level of performance is possible at four million bps - quite an improvement, but compared with the 28,800 bps you get when connecting to the Internet (if you're lucky), we still have a long way to go.
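
Some back-of-the-envelope arithmetic with the figures above shows just how far:

    studio_1986 = 45_000_000   # bits per second assumed a decade ago for studio quality
    studio_1996 = 4_000_000    # bits per second, roughly today's equivalent
    modem = 28_800             # bits per second over a good dial-up connection

    for label, rate in [("45 Mbps", studio_1986), ("4 Mbps", studio_1996)]:
        print(f"One second of {label} video takes {rate / modem:.0f} seconds over the modem")

Roughly 139 seconds of downloading for every second of today's 4 Mbps video.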

There is one fundamental reason for this profligate use of bandwidth. TV receivers are dumb - in particular, they are forgetful. On a per-cubic-centimetre basis, your microwave oven may be smarter. A TV set is spoon-fed pixels - line by line, frame by frame. Even if you compress them by removing the enormous redundancy that occurs within and between frames and by taking advantage of the characteristics of human vision, video today still uses many more bits than a computer graphics database that can synthesise the same images.
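
A toy example of the inter-frame part of that redundancy - this is not MPEG, just the underlying intuition that unchanged pixels need not be sent again:

    def frame_delta(previous, current):
        # Return the list of (index, new_value) pairs that actually changed.
        return [(i, c) for i, (p, c) in enumerate(zip(previous, current)) if p != c]

    frame_a = [10] * 95 + [200] * 5   # a mostly static 100-pixel "frame"
    frame_b = [10] * 95 + [210] * 5   # only the last five pixels move

    changes = frame_delta(frame_a, frame_b)
    print(f"{len(changes)} of {len(frame_b)} pixels changed; send just those")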

Inefficiency also results from a lack of memory. Your TV doesn't remember that the set of the local news changes only about once every three years, it doesn't remember the architecture of sports arenas and it doesn't remember the commercials we've seen six times each hour.

The digital TV sets about to hit the market are able to do a lot more multiplications per second than your microwave oven, but they still aren't "clever". They decode a closed-form standard, known as MPEG-2 (developed by the Moving Picture Experts Group), which may be among the last standards for which anyone bothers to develop a dedicated chip. Why? Because a single data standard for digital video, one that is always best, just does not exist.

We need a flexible decoder capable of interpreting whatever the originator (or an automatic process) decides is the best way to encode a given scene. For example, it would be more efficient (and legible!) to transmit the fine print during car-lease commercials as PostScript (a common standard for typography and printers) instead of MPEG. Your TV's decoding capabilities might be updated as often as your PC's Web browser is now.
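
One way to picture such a receiver, purely as a hypothetical sketch, is a registry of decoders keyed by whatever encoding the stream declares, with new decoders installable later the way a browser gains plug-ins:

    decoders = {}

    def register(encoding):
        # Decorator that installs a decoder for a named encoding.
        def wrap(fn):
            decoders[encoding] = fn
            return fn
        return wrap

    @register("mpeg-2")
    def decode_mpeg(payload):
        return f"<video frames decoded from {len(payload)} bytes>"

    @register("postscript")
    def decode_postscript(payload):
        return f"<crisp text rendered from {len(payload)} bytes>"

    def present(segment):
        encoding, payload = segment
        decoder = decoders.get(encoding)
        if decoder is None:
            raise ValueError(f"no decoder installed for {encoding!r}; fetch an update")
        return decoder(payload)

    print(present(("postscript", b"0.9% APR for 36 months...")))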

Storytelling

Having actors and sets hang around in our TVs isn't going to do us a lot of good unless we can tell them to do something interesting. So, in addition to objects, we need a script that tells the receiver what to do with the objects in order to tell a story.

TV conceived as objects and scripts can be very responsive. Consider hyperlinked TV, where touching an athlete produces relevant statistics, or touching an actor reveals that his necktie is on sale this week. We can embed bits that carry more information about pixels than their colour - bits that tell them how to behave and where to look for further instructions.
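
As a hypothetical sketch of those embedded bits, think of each on-screen object carrying a small record of what to do when it is touched:

    athlete = {
        "label": "point guard",
        "on_touch": {"action": "show_statistics", "source": "league-stats-feed"},
    }
    actor = {
        "label": "lead actor",
        "on_touch": {"action": "show_offer", "item": "necktie", "note": "on sale this week"},
    }

    def touch(obj):
        # The receiver looks up the object's behaviour instead of just repainting pixels.
        behaviour = obj["on_touch"]
        return f"{obj['label']}: {behaviour['action']} -> {behaviour}"

    for obj in (athlete, actor):
        print(touch(obj))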

These bits-about-the-bits will resolve a problem that has beleaguered Hollywood directors faced with one-version-fits-all screens and made them envious of graphic designers, who can design postage stamps, magazine ads and highway billboards using different rules of visual organisation. Television programmes could react according to the originator's intention when viewed under different circumstances (for instance, more close-ups and cuts on a small screen).
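
A sketch of what "reacting to circumstances" might mean in practice - the thresholds here are invented purely for illustration:

    def frame_scene(scene, screen_diagonal_inches):
        # Choose shot style from screen size, following the originator's stated intent.
        if screen_diagonal_inches < 15:
            return {"scene": scene, "shot": "close-up", "cut_rate": "fast"}
        return {"scene": scene, "shot": "wide", "cut_rate": "slow"}

    print(frame_scene("courtroom confrontation", screen_diagonal_inches=10))
    print(frame_scene("courtroom confrontation", screen_diagonal_inches=50))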

You think Java is important - wait until we have a similar language for storytelling. TV is, after all, an entertainment medium. Its technology will be judged by the richness of the connection between creator and viewer. As Bran Ferren of Disney has said, "We need dialogue lines, not scan lines."

This article was co-authored by V. Michael Bove (vmb@media.mit.edu), Alexander Dreyfoos Career Development Professor, MIT Media Lab.