VIRTUAL reality never quite lived up to the hype. In the 1990s films such as “The Lawnmower Man” and “The Matrix” depicted computer-generated worlds in which people could completely immerse themselves. In some respects this technology has become widespread: think of all those video-game consoles capable of depicting vivid, photorealistic environments, for example. What is missing, however, is a convincing sense of immersion. Virtual reality (VR) doesn't feel like reality.
One way to address this is to use fancy peripherals—gloves, helmets and so forth—to make immersion in a virtual world seem more realistic. But there is another approach: that taken by VR's sibling, augmented reality (AR). Rather than trying to create an entirely simulated environment, AR starts with reality itself and then augments it. “In augmented reality you are overlaying digital information on top of the real world,” says Jyri Huopaniemi, director of the Nokia Research Centre in Tampere, Finland. Using a display, such as the screen of a mobile phone, you see a live view of the world around you—but with digital annotations, graphics and other information superimposed upon it.
The data can be as simple as the names of the mountains visible from a high peak, or the names of the buildings visible on a city skyline. At a historical site, AR could superimpose images showing how buildings used to look. On a busy street, AR could help you choose a restaurant: wave your phone around and read the reviews that pop up. In essence, AR provides a way to blend the wealth of data available online with the physical world—or, as Dr Huopaniemi puts it, to build a bridge between the real and the virtual.
AR, me hearties
It all sounds rather distant and futuristic. The idea of AR has, in fact, been around for a few years without making much progress. But the field has recently been energised by the ability to implement AR using advanced mobile handsets, rather than expensive, specialist equipment. Several AR applications are already available. Wikitude, an AR travel-guide application developed for Google's Android G1 handset, has already been downloaded by 125,000 people. Layar is a general-purpose AR browser that also runs on Android-powered phones. Nearest Tube, an AR application for Apple's iPhone 3GS handset, can direct you in London to the nearest Underground station. Nokia's “mobile augmented reality applications” (MARA) software is being tested by staff at the world's largest handset-maker, with a public launch imminent.
What has made all this possible is the emergence of mobile phones equipped with satellite-positioning (GPS) functions, tilt sensors, cameras, fast internet connectivity and, crucially, a digital compass. This last item is vital, and until recently it was the one bit of hardware that was missing from the iPhone, says Philipp Breuss-Schneeweis of Mobilizy, the Austrian software house which developed Wikitude. (A compass is standard on the Android G1 handset.) But the launch of the compass-equipped iPhone 3GS handset in June is expected to trigger a deluge of AR apps.
The combination of GPS, tilt sensors and a compass enables a handset to determine where it is, its orientation relative to the ground, and which direction it is being pointed in. The camera allows it to see the world, and the wireless-internet link allows it to retrieve information relating to its surroundings, which is combined with the live view from the camera and displayed on the screen. All this is actually quite simple, says Mr Breuss-Schneeweis. In the case of Wikitude, the AR software works out the longitudes and latitudes of objects in the camera's field of view so that they can be tagged accordingly, he says.
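The geometry Mr Breuss-Schneeweis describes can be sketched in a few lines. The following is an illustrative reconstruction, not Wikitude's actual code: given the phone's position and compass heading, it computes the bearing to a landmark and, if the landmark falls within the camera's horizontal field of view, the pixel column where its label should be drawn. All function names and parameters are assumptions for the sketch.

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from the phone to a landmark, in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360

def screen_x(phone_lat, phone_lon, heading_deg, fov_deg, width_px, lm_lat, lm_lon):
    """Horizontal pixel position for a landmark's tag, or None if off-screen."""
    # Signed angle between where the camera points and where the landmark lies.
    offset = (bearing_deg(phone_lat, phone_lon, lm_lat, lm_lon)
              - heading_deg + 180) % 360 - 180
    if abs(offset) > fov_deg / 2:
        return None  # landmark is outside the camera's field of view
    return int((offset / fov_deg + 0.5) * width_px)

# Standing south of the Eiffel Tower, pointing due north with a 60° lens:
print(screen_x(48.85, 2.2945, 0, 60, 480, 48.8584, 2.2945))  # label lands mid-screen
```

Tilt-sensor data would place the label vertically in the same way; the wireless link then fetches the text to draw at that point.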
Precisely which items in the real world are labelled varies from one AR application to another. Wikitude, as its name implies, draws information from Wikipedia, the online encyclopedia, by scouring it for entries that list a longitude and latitude—which includes everything from the Lincoln Memorial to the Louvre. Using the application a tourist can stroll through the streets of a city and view the names of the landmarks in the vicinity. The full Wikipedia entry on any landmark can then be summoned with a click. There are 600,000 Wikipedia entries that include longitude and latitude co-ordinates, says Mr Breuss-Schneeweis, and the number is increasing all the time.
Another way to identify nearby landmarks is to draw upon existing databases, such as those used in satellite-navigation systems. That is how Nokia's MARA system works. It is doubly clever: because it harvests local points of interest from the NAVTEQ mapping software built into many Nokia phones, no wireless-internet connection is needed to look them up.
However it is done, the result of both approaches is to present detailed information about the user's surroundings. That said, the precision of the tagging can vary somewhat, because satellite-positioning technology is only accurate to within a few metres at best. This can cause problems when standing very close to a landmark. “The farther you are away from the buildings the more accurate it seems to be,” says Mr Breuss-Schneeweis.
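Mr Breuss-Schneeweis's observation that tagging seems more accurate at a distance follows directly from geometry: a fixed positioning error of a few metres corresponds to a large angular error for a nearby building but a negligible one for a distant peak. A back-of-the-envelope sketch (the function name is ours, for illustration):

```python
import math

def tag_error_deg(gps_error_m, distance_m):
    """Worst-case angular mislabelling caused by a given GPS position error."""
    return math.degrees(math.atan2(gps_error_m, distance_m))

# A 10-metre GPS error barely matters for a mountain two kilometres away...
print(round(tag_error_deg(10, 2000), 2))  # a fraction of a degree
# ...but can swing a label wildly for a building 20 metres across the street.
print(round(tag_error_deg(10, 20), 1))    # tens of degrees
```

With a typical camera field of view of around 60 degrees, an error of tens of degrees is enough to pin a label on the wrong building entirely.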
But there is a way to improve the accuracy of AR tagging at close quarters. Total Immersion, a firm based in Paris, is one of several companies using object recognition. By looking for a known object in the camera's field of view, and then analysing that object's position and orientation, it can seamlessly overlay graphics so that they appear in the appropriate position relative to the object in question.
Together with Alcatel-Lucent, a telecoms-equipment firm, Total Immersion is developing a mobile-phone service that allows users to point their phone's camera at an object, such as the Mona Lisa. The software recognises the object and automatically retrieves related information, such as a video about Leonardo da Vinci. The same approach will also allow advertisements in newspapers and on billboards to be augmented. Point your camera at a poster of a car, for example, and you might see a 3-D rendering of the vehicle floating in space, which can be viewed from any angle by moving around.
The simplest way to make all this work, says Greg Davis of Total Immersion, is to put 2-D bar-codes on posters and advertisements, which are detected and used to retrieve content which is then superimposed on the device's screen. But the trend is towards “markerless” tracking, where image recognition is used to identify targets. Putting a 2-D bar-code on the Mona Lisa, after all, is not an option.
Nokia's Point-and-Find software uses the markerless approach. It is a mobile-phone application, currently in development, that lets you point your phone at a film poster in order to call up local viewing times and book tickets. In theory this approach should also be able to recognise buildings and landmarks, such as the Eiffel Tower, although recognising 3-D objects is much more difficult than identifying static 2-D images, says Mr Davis. The way forward may be to combine image-recognition with satellite-positioning, to narrow down the possibilities when trying to identify a nearby building. The advantage of the image-recognition approach, says Mr Davis, is that graphics can be overlaid on something no matter where it is, or how many times it gets moved.
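The idea of combining image recognition with satellite positioning can be illustrated simply: before running the expensive recogniser, shortlist only those landmarks plausibly in view of the phone. This is a hypothetical sketch of that filtering step, not Nokia's or Total Immersion's implementation; the names and the 500-metre radius are assumptions.

```python
import math

EARTH_RADIUS_M = 6371000

def nearby_candidates(landmarks, phone_lat, phone_lon, radius_m=500):
    """Shortlist landmarks within radius_m of the phone, so the image
    recogniser only has to distinguish a handful of plausible targets."""
    def dist_m(lat, lon):
        # Equirectangular approximation: adequate at city scale.
        dx = math.radians(lon - phone_lon) * math.cos(math.radians(phone_lat))
        dy = math.radians(lat - phone_lat)
        return EARTH_RADIUS_M * math.hypot(dx, dy)
    return [name for name, (lat, lon) in landmarks.items()
            if dist_m(lat, lon) <= radius_m]

landmarks = {
    "Eiffel Tower": (48.8584, 2.2945),
    "Louvre":       (48.8606, 2.3376),
    "Notre-Dame":   (48.8530, 2.3499),
}
# Standing beside the Eiffel Tower, only one candidate survives:
print(nearby_candidates(landmarks, 48.8580, 2.2950))
```

Reducing the candidate set from every landmark on Earth to a handful nearby turns an intractable 3-D recognition problem into a feasible one.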
One category of moving objects that should be easy to track is people, or at least those carrying mobile phones. Information from social networks, such as Facebook, can then be overlaid on the real world. Clearly there are privacy concerns, but Latitude, a social-networking feature of Google Maps, has tested the water by letting people share their locations with their friends, on an opt-in basis. The next step is to let people hold up their handsets to see the locations and statuses of their friends, says Dr Huopaniemi, who says Nokia is working on this very idea.
As well as being able to see what your friends are up to now, it can be useful to see into the past. Nokia has developed an AR system called Image Space which lets users record messages, photos and videos and tag them with both place and time. When someone else goes to a particular location, they can then scroll back through the messages that people have left in the vicinity. More practically, Wikitude can also link virtual messages to real places by overlaying user-generated reviews of bars, hotels and restaurants from a website called Qype onto the establishments in question.
Other obvious uses for AR are turn-by-turn navigation, in which the route to a particular destination is painted onto the world; house-hunting, using AR to indicate which houses are for sale in a particular street; and providing additional information at sporting events, such as biographies of individual players and on-the-spot instant replays. Some of those attending this year's Wimbledon tennis tournament got a taste of things to come with a special version of Wikitude, called Seer, developed for the Android G1 handset in conjunction with IBM and Ogilvy, an advertising agency. It could direct users to courts, restaurants and loos, provide live updates from matches, and even show if there was a queue in the bar or at the taxi rank.
These sorts of application really are just the beginning, says Dr Huopaniemi. Virtual reality never really died, he says—it just divided itself in two, with AR enhancing the real world by overlaying information from the virtual realm, and VR becoming what he calls “augmented virtuality”, in which real information is overlaid onto virtual worlds, such as players' names in video games. AR may be a relatively recent arrival, but its potential is huge, he suggests. “It's a very natural way of exploring what's around you.” But trying to imagine how it will be used is like trying to forecast the future of the web in 1994. The building-blocks of the technology have arrived and are starting to become more widely available. Now it is up to programmers and users to decide how to use them.
This article appeared in the Technology Quarterly section of the print edition
In this chapter from Augmented Reality: Principles and Practice, Dieter Schmalstieg and Tobias Höllerer provide an introduction to the research field and practical occurrences of augmented reality, present a brief history of the field, take you on a whirlwind tour of AR application examples, and conclude with a discussion of related fields.
Virtual reality is becoming increasingly popular, as computer graphics have progressed to a point where the images are often indistinguishable from the real world. However, the computer-generated images presented in games, movies, and other media are detached from our physical surroundings. This is both a virtue—everything becomes possible—and a limitation.
The limitation comes from the main interest we have in our daily life, which is not directed toward some virtual world, but rather toward the real world surrounding us. Smartphones and other mobile devices provide access to a vast amount of information, anytime and anywhere. However, this information is generally disconnected from the real world. Consumers with an interest in retrieving online information from and about the real world, or linking up online information with the real world, must do so individually and indirectly, which, in turn, requires constant cognitive effort.
In many ways, enhancing mobile computing so that the association with the real world happens automatically seems an attractive proposition. A few examples readily illustrate this idea’s appeal. Location-based services can provide personal navigation based on the Global Positioning System (GPS), while barcode scanners can help identify books in a library or products in a supermarket. These approaches require explicit actions by the user, however, and are rather coarse grained. Barcodes are useful for identifying books, but not for naming mountain peaks during a hiking trip; likewise, they cannot help in identifying tiny parts of a watch being repaired, let alone anatomic structures during surgery.
Augmented reality holds the promise of creating direct, automatic, and actionable links between the physical world and electronic information. It provides a simple and immediate user interface to an electronically enhanced physical world. The immense potential of augmented reality as a paradigm-shifting user interface metaphor becomes apparent when we review the most recent few milestones in human–computer interaction: the emergence of the World Wide Web, the social web, and the mobile device revolution.
The trajectory of this series of milestones is clear: First, there was an immense increase in access to online information, leading to a massive audience of information consumers. These consumers were subsequently enabled to also act as information producers and communicate with one another, and finally were given the means to manage their communications from anywhere, in any situation. Yet, the physical world, in which all this information retrieval, authoring, and communication takes place, was not readily linked to the users’ electronic activity. That is, the model was stuck in a world of abstract web pages and services without directly involving the physical world. A lot of technological advancement has occurred in the field of location-based computing and services, which is sometimes referred to as situated computing. Even so, the user interfaces to location-based services remain predominantly rooted in desktop-, app-, and web-based usage paradigms.
Augmented reality can change this situation, and, in doing so, redefine information browsing and authoring. This user interface metaphor and its enabling technologies form one of today’s most fascinating and future-oriented areas of computer science and application development. Augmented reality can overlay computer-generated information on views of the real world, amplifying human perception and cognition in remarkable new ways.
After providing a working definition of augmented reality, we will briefly review important developments in the history of the research field, and then present examples from various application areas, showcasing the power of this physical user interface metaphor.
Definition and Scope
Whereas virtual reality (VR) places a user inside a completely computer-generated environment, augmented reality (AR) aims to present information that is directly registered to the physical environment. AR goes beyond mobile computing in that it bridges the gap between virtual world and real world, both spatially and cognitively. With AR, the digital information appears to become part of the real world, at least in the user’s perception.
Achieving this connection is a grand goal—one that draws upon knowledge from many areas of computer science, yet can lead to misconceptions about what AR really is. For example, many people associate the visual combination of virtual and real elements with the special effects in movies such as Jurassic Park and Avatar. While the computer graphics techniques used in movies may be applicable to AR as well, movies lack one crucial aspect of AR—interactivity. To avoid such confusion, we need to set a scope for the topics discussed in this book. In other words, we need to answer a key question: What is AR?
The most widely accepted definition of AR was proposed by Azuma in his 1997 survey paper. According to Azuma, AR must have the following three characteristics:
Combines real and virtual
Interactive in real time
Registered in 3D
This definition does not require a specific output device, such as a head-mounted display (HMD), nor does it limit AR to visual media. Audio, haptics, and even olfactory or gustatory AR are included in its scope, even though they may be difficult to realize. Note that the definition does require real-time control and spatial registration, meaning precise real-time alignment of corresponding virtual and real information. This mandate implies that the user of an AR display can at least exercise some sort of interactive viewpoint control, and the computer-generated augmentations in the display will remain registered to the referenced objects in the environment.
While opinions on what qualifies as real-time performance may vary depending on the individual and on the task or application, interactivity implies that the human–computer interface operates in a tightly coupled feedback loop. The user continuously navigates the AR scene and controls the AR experience. The system, in turn, picks up the user’s input by tracking the user’s viewpoint or pose. It registers the pose in the real world with the virtual content, and then presents to the user a situated visualization (a visualization that is registered to objects in the real world).
We can see that a complete AR system requires at least three components: a tracking component, a registration component, and a visualization component. A fourth component—a spatial model (i.e., a database)—stores information about the real world and about the virtual world (Figure 1.1). The real-world model is required to serve as a reference for the tracking component, which must determine the user’s location in the real world. The virtual-world model consists of the content used for the augmentation. Both parts of the spatial model must be registered in the same coordinate system.
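The interplay of the three components and the spatial model can be sketched as a single step of the feedback loop. This is a minimal illustrative skeleton under our own naming assumptions, not code from the book: real trackers and renderers are far more involved.

```python
class SpatialModel:
    """Database holding the real-world reference and the virtual content,
    registered in one shared coordinate system."""
    def __init__(self, real_anchors, virtual_content):
        self.real_anchors = real_anchors        # known real-world reference points
        self.virtual_content = virtual_content  # augmentations to draw

class Tracker:
    """Estimates the user's pose (position + orientation) each frame."""
    def pose(self, sensor_frame, model):
        # Placeholder: a real tracker would match sensor_frame against
        # model.real_anchors (e.g. GPS + compass, or visual features).
        return sensor_frame["pose"]

def ar_loop_step(tracker, model, sensor_frame):
    """One iteration of the feedback loop: track, register, visualize."""
    pose = tracker.pose(sensor_frame, model)               # tracking
    registered = [c for c in model.virtual_content
                  if c["anchor"] in model.real_anchors]    # registration
    return [(c["label"], pose) for c in registered]        # situated visualization

model = SpatialModel(
    real_anchors={"doorway"},
    virtual_content=[{"anchor": "doorway", "label": "Exit"},
                     {"anchor": "unmapped", "label": "Ignored"}])
frame = {"pose": (0.0, 0.0, 0.0)}
print(ar_loop_step(Tracker(), model, frame))  # only the registered label survives
```

Running this step once per camera frame closes the loop of Figure 1.1: the user's viewpoint changes, the tracker re-estimates the pose, and the situated visualization is redrawn in register with the world.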
Figure 1.1 AR uses a feedback loop between human user and computer system. The user observes the AR display and controls the viewpoint. The system tracks the user’s viewpoint, registers the pose in the real world with the virtual content, and presents situated visualizations.