Augmented Reality


Brief Description:

Figure 1 illustrates an embodiment of a superimposing logic 102.

Detailed Description:

Figure 1 illustrates an embodiment of an augmented reality environment 100. A user 110 wearing a headset 114 interacts with physical objects virtualized in the augmented reality environment 100. In this example, the user 110 interacts either with a purely virtual document or with a physical document that is virtualized as a virtual document 112 on a virtual surface 104 in the augmented reality environment 100. In this embodiment, an imaging sensor 108 is directed toward a physical surface 106, and superimposing logic 102 receives a sensor output 116 (e.g., an image or video) from the imaging sensor 108. Superimposing logic 102 transforms the sensor output 116 into a virtual document 112 superimposed on a virtual surface 104 that represents the physical surface 106 in the augmented reality environment 100.
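The superimposing step can be illustrated with a minimal sketch that places the captured image as a textured rectangle on the virtual surface. It assumes, purely for illustration, that the virtual surface is horizontal (the x-z plane at the height of a chosen centre point) and that the virtual document keeps the aspect ratio of the sensor image; the function `document_quad` and its parameters are hypothetical and are not part of superimposing logic 102 as described.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
UV = Tuple[float, float]


def document_quad(image_w: int, image_h: int, center: Vec3,
                  doc_width: float) -> Tuple[List[Vec3], List[UV]]:
    """Return corner vertices and texture coordinates for a virtual document.

    Hypothetical sketch: the virtual surface is assumed horizontal (the
    x-z plane at the centre's y height), and the document keeps the
    aspect ratio of the sensor image.
    """
    if image_w <= 0 or image_h <= 0 or doc_width <= 0:
        raise ValueError("dimensions must be positive")
    doc_depth = doc_width * image_h / image_w  # preserve the image aspect ratio
    cx, cy, cz = center
    hw, hd = doc_width / 2, doc_depth / 2
    # Corners in order: top-left, top-right, bottom-right, bottom-left
    vertices = [(cx - hw, cy, cz - hd), (cx + hw, cy, cz - hd),
                (cx + hw, cy, cz + hd), (cx - hw, cy, cz + hd)]
    # Matching texture coordinates stretch the full image over the quad
    uvs = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    return vertices, uvs
```

For example, a 1000 x 500 pixel capture rendered 0.4 units wide yields a 0.4 x 0.2 rectangle whose four corners map to the four corners of the image.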

In other embodiments, there may be no physical surface 106 and no physical document on the physical surface 106, in which case the environment is a purely virtual reality (VR) environment rather than an augmented reality environment 100. There are thus many possibilities for the environment: it could be purely virtual, a physical surface 106 could be virtualized and augmented with a virtual document, or both the physical surface 106 and a physical document could be virtualized.

Brief Description:

Figure 2 illustrates an AR or VR system 200 in accordance with one embodiment.

Detailed Description:

Figure 2 illustrates an AR or VR system 200 in accordance with one embodiment. A virtual environment 202 receives input from the user 214 and, in response, sends an interaction signal to a virtual object 206, a virtual surface 210, or an application 212. The virtual object 206, virtual surface 210, or application 212 sends an action to an operating system 204, and in response the operating system 204 operates the hardware 208 to implement the action in the augmented or virtual environment.
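The signal flow of Figure 2 can be sketched as follows. This is a minimal, hypothetical model: the class names mirror the blocks in the figure, a virtual surface or application would expose the same `on_interaction` method as the virtual object shown, and the `Hardware` class merely records the actions it would carry out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Action:
    target: str   # hypothetical hardware target, e.g. "display"
    command: str


class Hardware:
    """Stand-in for hardware 208; records the actions it is asked to perform."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def execute(self, action: Action) -> None:
        self.log.append(f"{action.target}:{action.command}")


class OperatingSystem:
    """Stand-in for operating system 204; operates the hardware for each action."""

    def __init__(self, hardware: Hardware) -> None:
        self.hardware = hardware

    def perform(self, action: Action) -> None:
        self.hardware.execute(action)


class VirtualObject:
    """Stand-in for virtual object 206 (a virtual surface or application would be similar)."""

    def __init__(self, name: str, operating_system: OperatingSystem) -> None:
        self.name = name
        self.operating_system = operating_system

    def on_interaction(self, signal: str) -> None:
        # Turn the interaction signal into an action for the operating system
        self.operating_system.perform(Action("display", f"update {self.name} on {signal}"))


class VirtualEnvironment:
    """Stand-in for virtual environment 202; routes user input to its targets."""

    def __init__(self) -> None:
        self.targets: Dict[str, VirtualObject] = {}

    def register(self, target: VirtualObject) -> None:
        self.targets[target.name] = target

    def handle_user_input(self, target_name: str, signal: str) -> None:
        self.targets[target_name].on_interaction(signal)
```

After registering `VirtualObject("document", OperatingSystem(hardware))` and calling `handle_user_input("document", "tap")`, `hardware.log` holds `["display:update document on tap"]`.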

Brief Description:

Figure 3 illustrates a device 300 in accordance with one embodiment.

Detailed Description:

Figure 3 illustrates a perspective view of a wearable augmented reality (“AR”) device (device 300), from the perspective of a wearer of the device 300 (“AR user”). The device 300 is a computer device in the form of a wearable headset.

The device 300 comprises a headpiece 302, which is a headband, arranged to be worn on the wearer’s head. The headpiece 302 has a central portion 304 intended to fit over the nose bridge of a wearer, and has an inner curvature intended to wrap around the wearer’s head above their ears.

The headpiece 302 supports a left optical component 306 and a right optical component 308, which are waveguides. For ease of reference herein an optical component will be considered to be either a left or right component, because in the described embodiment the components are essentially identical apart from being mirror images of each other. Therefore, all description pertaining to the left-hand component also pertains to the right-hand component. The device 300 comprises augmented reality device logic 400 that is depicted in Figure 4.

The augmented reality device logic 400 comprises a graphics engine 402, which may comprise a micro display and imaging optics in the form of a collimating lens (not shown). The micro display can be any type of image source, such as liquid crystal on silicon (LCOS) displays, transmissive liquid crystal displays (LCD), matrix arrays of LEDs (whether organic or inorganic), and any other suitable display. The display is driven by circuitry known in the art to activate individual pixels of the display to generate an image. Substantially collimated light from each pixel falls on an exit pupil of the graphics engine 402. At the exit pupil, the collimated light beams are coupled into the left optical component 306 and the right optical component 308 at a respective left in-coupling zone 310 and right in-coupling zone 312. In-coupled light is then guided, through a mechanism that involves diffraction and total internal reflection (TIR), laterally through the optical component in a respective left intermediate zone 314 and right intermediate zone 316, and also downward into a respective left exit zone 318 and right exit zone 320, where it exits towards the user’s eyes.

The collimating lens collimates the image into a plurality of beams, which form a virtual version of the displayed image, the virtual version being a virtual image at infinity in the optics sense. The light exits as a plurality of beams, corresponding to the input beams and forming substantially the same virtual image, which the lens of the eye projects onto the retina to form a real image visible to the user. In this manner, the left optical component 306 and the right optical component 308 project the displayed image onto the wearer’s eyes. 
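The statement that each pixel becomes a beam forming a virtual image at infinity can be made concrete with standard paraxial optics: for an ideal thin collimating lens of focal length f with the micro display in its focal plane, a pixel displaced x from the optical axis produces a collimated beam at angle θ = arctan(x/f), and a display of width w spans a field of view of 2·arctan(w/2f). The sketch below assumes such an ideal lens; the function names, focal lengths and display widths are hypothetical, not parameters of device 300.

```python
import math


def beam_angle_deg(pixel_offset_mm: float, focal_length_mm: float) -> float:
    """Angle of the collimated beam produced by a pixel offset from the axis.

    Assumes an ideal thin collimating lens with the micro display in its
    focal plane, so each pixel maps to one beam direction (image at infinity).
    """
    return math.degrees(math.atan2(pixel_offset_mm, focal_length_mm))


def field_of_view_deg(display_width_mm: float, focal_length_mm: float) -> float:
    """Full field of view spanned by a display of the given width."""
    return 2 * beam_angle_deg(display_width_mm / 2, focal_length_mm)
```

For a hypothetical 10 mm wide display behind a 15 mm lens, `field_of_view_deg(10, 15)` gives about 36.9 degrees; the on-axis pixel produces a beam at 0 degrees.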

The various optical zones can, for example, be suitably arranged diffraction gratings or holograms. Each optical component has a refractive index n such that total internal reflection takes place to guide the beam from the light engine along the respective intermediate expansion zone and down towards the respective exit zone.
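The refractive-index condition can be checked with Snell’s law: light inside a medium of index n meeting a boundary with a medium of index n_out is totally internally reflected when its angle of incidence, measured from the surface normal, exceeds the critical angle θc = arcsin(n_out / n). The minimal sketch below assumes the waveguide is surrounded by air (n_out = 1.0 by default); the function names are hypothetical.

```python
import math


def critical_angle_deg(n_waveguide: float, n_outside: float = 1.0) -> float:
    """Critical angle (from the surface normal) for total internal reflection."""
    if n_waveguide <= n_outside:
        raise ValueError("TIR requires the waveguide index to exceed the outside index")
    return math.degrees(math.asin(n_outside / n_waveguide))


def is_guided(incidence_deg: float, n_waveguide: float, n_outside: float = 1.0) -> bool:
    """True if a ray striking the waveguide face at this angle is totally internally reflected."""
    return incidence_deg > critical_angle_deg(n_waveguide, n_outside)
```

For a typical glass with n = 1.5, the critical angle is about 41.8 degrees, so rays striking the faces at 45 degrees stay inside the waveguide while rays at 30 degrees escape.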

Each optical component is substantially transparent, whereby the wearer can see through it to view a real-world environment in which they are located simultaneously with the projected image, thereby providing an augmented reality experience.

To provide a stereoscopic image, i.e. one that is perceived by the user as having 3D structure, slightly different versions of a 2D image can be projected onto each eye, for example from multiple graphics engines 402 (i.e. two micro displays), or from the same light engine (i.e. one micro display) using suitable optics to split the light output from the single display.

The device 300 is just one exemplary configuration. For instance, where two light-engines are used, these may instead be at separate locations to the right and left of the device (near the wearer’s ears). Moreover, whilst in this example, the input beams that form the virtual image are generated by collimating light from the display, an alternative light engine based on so-called scanning can replicate this effect with a single beam, the orientation of which is fast modulated whilst simultaneously modulating its intensity and/or colour. A virtual image can be simulated in this manner that is equivalent to a virtual image that would be created by collimating light of a (real) image on a display with collimating optics. Alternatively, a similar AR experience can be provided by embedding substantially transparent pixels in a glass or polymer plate in front of the wearer’s eyes, having a similar configuration to the left optical component 306 and right optical component 308 though without the need for the zone structures.

Other headpiece 302 embodiments are also within the scope of the subject matter. For instance, the display optics can equally be attached to the user’s head using a frame (in the manner of conventional spectacles), a helmet, or another fit system. The purpose of the fit system is to support the display and provide stability to the display and other head-borne systems such as tracking systems and cameras. The fit system can be designed to accommodate the anthropometric range and head morphology of the user population and to provide comfortable support for the display system.

The device 300 also comprises one or more cameras 404, for example a left stereo camera 322 and a right stereo camera 324 mounted on the headpiece 302 and configured to capture an approximate view (“field of view”) from the user’s left and right eyes, respectively. The cameras are located towards either side of the user’s head on the headpiece 302, and thus capture images of the scene forward of the device from slightly different perspectives. In combination, the stereo cameras capture a stereoscopic moving image of the real-world environment as the device moves through it. A stereoscopic moving image means two moving images showing slightly different perspectives of the same scene, each formed of a temporal sequence of frames to be played out in quick succession to replicate movement. When combined, the two images give the impression of moving 3D structure.

A left microphone 326 and a right microphone 328 are located at the front of the headpiece 302 (from the perspective of the wearer), and left and right channel speakers, earpieces, or other audio output transducers are located to the left and right of the headpiece 302. These are in the form of a pair of bone conduction audio transducers functioning as a left speaker 330 and a right speaker 332 audio channel output.

Brief Description:

Figure 4 illustrates an augmented reality device logic 400 in accordance with one embodiment.

Detailed Description:

Figure 4 illustrates components of an exemplary augmented reality device logic 400. The augmented reality device logic 400 comprises a graphics engine 402, a camera 404, processing units 406 including one or more CPUs 408 and/or GPUs 410, a WiFi 412 wireless interface, a Bluetooth 414 wireless interface, speakers 416, microphones 418, and one or more memories 420.

The processing units 406 may in some cases comprise programmable devices such as bespoke processing units optimized for a particular function, such as AR related functions. The augmented reality device logic 400 may comprise other components that are not shown, such as dedicated depth sensors, additional interfaces etc.


Some or all of the components in Figure 4 may be housed in an AR headset. In some embodiments, some of these components may be housed in a separate housing connected to, or in wireless communication with, the components of the AR headset. For example, a separate housing for some components may be designed to be worn on a belt or to fit in the wearer’s pocket, or one or more of the components may be housed in a separate computer device (smartphone, tablet, laptop or desktop computer, etc.) that communicates wirelessly with the display and camera apparatus in the AR headset, whereby the headset and the separate device together constitute the full augmented reality device logic 400.

The memory 420 comprises logic 422 to be executed by the processing units 406. In some cases, different parts of the logic 422 may be executed by different components of the processing units 406. The logic 422 typically comprises code of an operating system, as well as code of one or more applications configured to run on the operating system to carry out aspects of the processes disclosed herein.

Brief Description:

Figure 5 illustrates an AR device 500 that may implement aspects of the machine processes described herein.

Detailed Description:

Figure 5 illustrates further aspects of an AR device 500 according to one embodiment. The AR device 500 comprises processing units 502, input devices 504, memory 506, output devices 508, storage devices 510, a network interface 512, and various logic to carry out the processes disclosed herein.

The input devices 504 comprise transducers that convert physical phenomena into machine-internal signals, typically electrical, optical, or magnetic signals. Signals may also be wireless, in the form of electromagnetic radiation in the radio frequency (RF) range but also potentially in the infrared or optical range. Examples of input devices 504 are keyboards, which respond to touch or physical pressure from an object or proximity of an object to a surface; mice, which respond to motion through space or across a plane; microphones, which convert vibrations in the medium (typically air) into device signals; and scanners, which convert optical patterns on two- or three-dimensional objects into device signals. The signals from the input devices 504 are provided via various machine signal conductors (e.g., busses or network interfaces) and circuits to the memory 506.

The memory 506 provides for storage (via configuration of matter or states of matter) of signals received from the input devices 504, instructions and information for controlling operation of the processing units 502, and signals from storage devices 510. The memory 506 may in fact comprise multiple memory devices of different types, for example random access memory devices and non-volatile (e.g., FLASH memory) devices.

Information stored in the memory 506 is typically directly accessible to the processing units 502 of the device. Signals input to the AR device 500 cause the reconfiguration of the internal material/energy state of the memory 506, creating logic that in essence forms a new machine configuration, influencing the behavior of the AR device 500 by affecting the behavior of the processing units 502 with control signals (instructions) and data provided in conjunction with the control signals. 

The storage devices 510 may provide a slower but higher capacity machine memory capability. Examples of storage devices 510 are hard disks, optical disks, large capacity flash memories or other non-volatile memory technologies, and magnetic memories. 

The processing units 502 may cause the configuration of the memory 506 to be altered by signals in the storage devices 510. In other words, the processing units 502 may cause data and instructions to be read from the storage devices 510 into the memory 506, from which they may then influence the operations of the processing units 502 as instructions and data signals, and from which they may also be provided to the output devices 508. The processing units 502 may alter the content of the memory 506 by signaling to a machine interface of the memory 506 to alter its internal configuration, and may then send converted signals to the storage devices 510 to alter their material internal configuration. In other words, data and instructions may be backed up from the memory 506, which is often volatile, to the storage devices 510, which are often non-volatile.

Output devices 508 are transducers that convert signals received from the memory 506 into physical phenomena such as vibrations in the air, patterns of light on a machine display, vibrations (i.e., haptic devices), or patterns of ink or other materials (i.e., printers and 3-D printers).

The network interface 512 receives signals from the memory 506 or processing units 502 and converts them into electrical, optical, or wireless signals for other machines, typically via a machine network. The network interface 512 also receives signals from the machine network and converts them into electrical, optical, or wireless signals for the memory 506 or processing units 502.

Brief Description:

Figure 6 illustrates an AR device logic 600 in accordance with one embodiment.

Detailed Description:

Figure 6 illustrates a functional block diagram of an embodiment of AR device logic 600. The AR device logic 600 comprises the following functional modules: a rendering engine 616, local augmentation logic 614, local modeling logic 608, device tracking logic 606, an encoder 612, and a decoder 620. Each of these functional modules may be implemented in software, dedicated hardware, firmware, or a combination of these logic types.

The rendering engine 616 controls the graphics engine 618 to generate a stereoscopic image visible to the wearer, i.e. to generate slightly different images that are projected onto different eyes by the optical components of a headset substantially simultaneously, so as to create the impression of 3D structure.

The stereoscopic image is formed by the rendering engine 616 rendering at least one virtual display element (“augmentation”), which is perceived by the user as a 3D element, i.e. having perceived 3D structure, at a real-world location in 3D space.

An augmentation is defined by an augmentation object stored in the memory 602. The augmentation object comprises: location data defining a desired location in 3D space for the virtual element (e.g. as (x,y,z) Cartesian coordinates); structural data defining 3D surface structure of the virtual element, i.e. a 3D model of the virtual element; and image data defining 2D surface texture of the virtual element to be applied to the surfaces defined by the 3D model. The augmentation object may comprise additional information, such as a desired orientation of the augmentation.
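The contents of an augmentation object can be summarised as a small data structure. The sketch below is one hypothetical layout: the field names, the triangle-mesh representation of the 3D model, the nested-list RGB texture, and the (pitch, roll, yaw) orientation are illustrative choices, not a format defined by this description.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
RGB = Tuple[int, int, int]


@dataclass
class Mesh:
    """Structural data: a triangle mesh (vertices plus vertex-index triples)."""
    vertices: List[Vec3]
    triangles: List[Tuple[int, int, int]]


@dataclass
class AugmentationObject:
    """Hypothetical layout of an augmentation object held in memory."""
    location: Vec3                      # desired (x, y, z) location in 3D space
    model: Mesh                         # 3D model of the virtual element
    texture: List[List[RGB]]            # 2D surface texture as rows of RGB pixels
    orientation: Optional[Vec3] = None  # optional extra, e.g. (pitch, roll, yaw)

    def __post_init__(self) -> None:
        # Reject triangles that point at vertices the model does not have
        n = len(self.model.vertices)
        for tri in self.model.triangles:
            if any(i < 0 or i >= n for i in tri):
                raise ValueError(f"triangle {tri} references a missing vertex")
```

Keeping the orientation optional mirrors the text, where a desired orientation is additional information rather than a required part of every augmentation.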

The perceived 3D effects are achieved through suitable rendering of the augmentation object. To give the impression of the augmentation having 3D structure, a stereoscopic image is generated based on the 2D surface texture and 3D model data in the augmentation object, with the augmentation being rendered to appear at the desired location in the stereoscopic image.
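One way to see why rendering slightly different images for each eye conveys depth is to project the augmentation’s location into each eye with a pinhole camera model. The sketch assumes (hypothetically) parallel eyes separated by an interpupillary distance `ipd`, identical intrinsics for both eyes, and a head frame with x to the right, y down, and z forward in metres; the function name and default values are illustrative. The resulting horizontal disparity equals focal_px · ipd / z, so nearer augmentations are drawn further apart in the two images.

```python
from typing import Tuple

Pixel = Tuple[float, float]


def project_stereo(point: Tuple[float, float, float], ipd: float = 0.063,
                   focal_px: float = 1000.0, cx: float = 640.0,
                   cy: float = 360.0) -> Tuple[Pixel, Pixel]:
    """Project a head-frame point into left- and right-eye pixel coordinates.

    Assumptions (hypothetical): both eyes share identical pinhole
    intrinsics, look straight ahead, and sit at x = -ipd/2 and x = +ipd/2.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the viewer")
    views = []
    for eye_x in (-ipd / 2, ipd / 2):
        # Shift the point into this eye's frame, then apply the pinhole model
        u = cx + focal_px * (x - eye_x) / z
        v = cy + focal_px * y / z
        views.append((u, v))
    return views[0], views[1]
```

With the default parameters, a point 2 m ahead at (0, 0, 2) lands 31.5 pixels apart in the two images; at 1 m the separation doubles to 63 pixels.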

A 3D model of a physical object is used to give the impression of the real world having expected tangible effects on the augmentation, in the way that it would on a real-world object. The 3D model represents structure present in the real world, and the information it provides about this structure allows an augmentation to be displayed as though it were a real-world 3D object, thereby providing an immersive augmented reality experience. The 3D model is in the form of a 3D mesh.

For example, based on the model of the real world, an impression can be given of the augmentation being obscured by a real-world object that is in front of its perceived location from the perspective of the user; dynamically interacting with a real-world object, e.g. by moving around the object; or statically interacting with a real-world object, e.g. by sitting on top of it.

Whether or not real-world structure should affect an augmentation can be determined based on suitable rendering criteria. For example, a 3D model of the perceived AR world can be created that includes the real-world surface structure and any augmentations, and projected onto a plane along the AR user’s line of sight as determined using pose tracking (see below). A suitable criterion for determining whether a real-world object should be perceived as partially obscuring an augmentation is then whether the projection of the real-world object in the plane overlaps with the projection of the augmentation; this could be further refined to account for transparent or opaque real-world structures. Generally, the criteria can depend on the location and/or orientation of the augmented reality device and/or the real-world structure in question.
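The overlap criterion can be sketched as below. As simplifications that are assumptions rather than part of the described method, each object is represented by a few 3D points already expressed in the viewer’s frame (z pointing along the line of sight), each projection is approximated by its 2D bounding box, and the real object counts as in front when its nearest point is closer than the augmentation’s nearest point. All function names are hypothetical.

```python
from typing import Iterable, Tuple

Point3 = Tuple[float, float, float]
Box = Tuple[float, float, float, float]  # (min_u, min_v, max_u, max_v)


def projected_box(points: Iterable[Point3], focal: float = 1.0) -> Box:
    """Bounding box of points projected onto the image plane (viewer frame, z forward)."""
    uv = [(focal * x / z, focal * y / z) for x, y, z in points if z > 0]
    if not uv:
        raise ValueError("no points in front of the viewer")
    us, vs = zip(*uv)
    return min(us), min(vs), max(us), max(vs)


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def occludes(real_points: Iterable[Point3], aug_points: Iterable[Point3]) -> bool:
    """Coarse test: should the real-world object partially obscure the augmentation?"""
    real_pts, aug_pts = list(real_points), list(aug_points)
    if not boxes_overlap(projected_box(real_pts), projected_box(aug_pts)):
        return False
    # Overlapping projections only matter if the real object is nearer
    nearest_real = min(z for _, _, z in real_pts if z > 0)
    nearest_aug = min(z for _, _, z in aug_pts if z > 0)
    return nearest_real < nearest_aug
```

Bounding boxes are conservative: they can report overlap where the true silhouettes do not touch, which a finer per-pixel depth comparison would avoid.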

An augmentation can also be mapped to the mesh, in the sense that its desired location and/or orientation is defined relative to one or more structures in the mesh. Should such a structure move and/or rotate, causing a corresponding change in the mesh, then, when rendered properly, this will cause a corresponding change in the location and/or orientation of the augmentation. For example, the desired location of an augmentation may be on, and defined relative to, a table top structure; should the table be moved, the augmentation moves with it. Object recognition can be used to this end, for example to recognize a known shape of table and thereby detect when the table has moved using its recognizable structure. Such object recognition techniques are known in the art.
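Defining an augmentation relative to a structure in the mesh amounts to storing its offset in that structure’s local frame and recomputing its world position whenever the structure’s pose changes. The sketch below assumes, for brevity, that the anchor (such as the table top in the example) only translates and rotates about the vertical y axis; `anchored_position` is a hypothetical helper.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def anchored_position(anchor_pos: Vec3, anchor_yaw_rad: float, offset: Vec3) -> Vec3:
    """World position of an augmentation defined relative to an anchor structure.

    Hypothetical simplification: the anchor (e.g. a table top) can only
    translate and rotate about the vertical y axis.
    """
    c, s = math.cos(anchor_yaw_rad), math.sin(anchor_yaw_rad)
    ox, oy, oz = offset
    # Rotate the local offset by the anchor's yaw (rotation about y) ...
    wx = c * ox + s * oz
    wz = -s * ox + c * oz
    # ... then translate by the anchor's current position
    ax, ay, az = anchor_pos
    return (ax + wx, ay + oy, az + wz)
```

If the table top moves from (1, 0, 0) to (3, 0, 2) without rotating, an augmentation with offset (0.2, 0.75, 0) moves from (1.2, 0.75, 0) to (3.2, 0.75, 2) with it.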

An augmentation that is mapped to the mesh in this manner, or is otherwise associated with a particular piece of surface structure embodied in a 3D model, is referred to as an “annotation” to that piece of surface structure. In order to annotate a piece of real-world surface structure, that surface structure must be represented by the 3D model in question; without this, the real-world structure cannot be annotated.

The local modeling logic 608 generates a local 3D model “LM” of the environment in the memory 602, using the AR device’s own sensor(s), e.g. cameras 610 and/or any dedicated depth sensors. The local modeling logic 608 and the sensor(s) constitute sensing apparatus.
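The description does not say how the local model is computed from the cameras. One standard starting point, shown here only as an illustrative assumption, is rectified stereo triangulation: with the left and right cameras sharing focal length f (in pixels) and separated by a baseline B, a pixel whose match in the other image is shifted by a disparity d lies at depth Z = f·B/d, and back-projecting many such pixels gives 3D points that a meshing step could then turn into the local model. The function name and parameters are hypothetical.

```python
from typing import Optional, Tuple


def stereo_point(u: float, v: float, disparity_px: float, focal_px: float,
                 baseline_m: float, cx: float, cy: float
                 ) -> Optional[Tuple[float, float, float]]:
    """Back-project a left-image pixel with known stereo disparity to 3D.

    Assumes rectified left/right cameras with identical pinhole intrinsics;
    returns (X, Y, Z) in metres in the left camera frame, or None when the
    disparity is not positive (no valid match, or a point at infinity).
    """
    if disparity_px <= 0:
        return None
    z = focal_px * baseline_m / disparity_px  # depth from disparity
    x = (u - cx) * z / focal_px               # back-project horizontally
    y = (v - cy) * z / focal_px               # back-project vertically
    return (x, y, z)
```

For example, with a hypothetical 700-pixel focal length, a 0.1 m baseline and principal point (640, 360), a pixel at (740, 360) with 50 pixels of disparity maps to a point 1.4 m away and 0.2 m to the right of the left camera.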

The device tracking logic 606 tracks the location and orientation of the AR device, e.g. a headset, using local sensor readings captured from the AR device. The sensor readings can be captured in a number of ways, for example using the cameras 610 and/or other sensor(s) such as accelerometers. The device tracking logic 606 determines the current location and orientation of the AR device and provides this information to the rendering engine 616, for example by outputting a current “pose vector” of the AR device. The pose vector is a six-dimensional vector, for example (x, y, z, P, R, Y), where (x, y, z) are the device’s Cartesian coordinates with respect to a suitable origin, and (P, R, Y) are the device’s pitch, roll and yaw with respect to suitable reference axes.
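A pose vector is commonly turned into a 4x4 homogeneous transform before use in rendering. The description does not fix an angle convention, so the sketch below assumes one: angles in radians and rotation R = Rz(Y)·Ry(P)·Rx(R) (yaw about z, pitch about y, roll about x, as in the common aircraft convention), with the resulting matrix mapping device coordinates to world coordinates. `pose_to_matrix` is a hypothetical name.

```python
import math
from typing import List, Sequence


def pose_to_matrix(pose: Sequence[float]) -> List[List[float]]:
    """Convert a pose vector (x, y, z, P, R, Y) to a 4x4 device-to-world transform.

    Assumed convention (hypothetical): angles in radians and
    R = Rz(yaw) @ Ry(pitch) @ Rx(roll).
    """
    x, y, z, pitch, roll, yaw = pose
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    cw, sw = math.cos(yaw), math.sin(yaw)
    # Upper-left 3x3 block is the rotation; last column is the translation
    return [
        [cw * cp, cw * sp * sr - sw * cr, cw * sp * cr + sw * sr, x],
        [sw * cp, sw * sp * sr + cw * cr, sw * sp * cr - cw * sr, y],
        [-sp, cp * sr, cp * cr, z],
        [0.0, 0.0, 0.0, 1.0],
    ]
```

With all angles zero the rotation block is the identity, so the matrix simply translates by (x, y, z); a 90 degree yaw turns the device’s x axis onto the world y axis.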

The rendering engine 616 adapts the local model based on the tracking to account for the movement of the device, i.e. to maintain the perception of the augmentations as 3D elements occupying the real world, for example to ensure that static augmentations appear to remain static (which is in fact achieved by scaling or rotating them since, from the AR user’s perspective, the environment is moving relative to them).
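One common way to keep a static augmentation visually fixed (an assumption here, not necessarily how rendering engine 616 works) is to store its location in world coordinates and, every frame, re-express it in the device’s frame using the inverse of the tracked pose. For a rigid pose with rotation R and position t, that inverse is Rᵀ(p - t), because R is orthonormal. `world_to_device` is a hypothetical helper.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def world_to_device(p_world: Vec3, rotation: Sequence[Sequence[float]],
                    position: Vec3) -> Vec3:
    """Express a world-fixed point in the device (head) frame.

    `rotation` (3x3) and `position` describe the tracked device pose as a
    device-to-world transform; its inverse is R^T (p - t).
    """
    d = [p_world[i] - position[i] for i in range(3)]
    # Multiply by the transpose: component j = sum_i R[i][j] * d[i]
    return tuple(sum(rotation[i][j] * d[i] for i in range(3)) for j in range(3))
```

If the device steps 1 m towards an augmentation placed 2 m ahead, with identity rotation, the augmentation’s device-frame position changes from (0, 0, 2) to (0, 0, 1) while its world position stays the same, which is what makes it appear to stay put.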

The encoder 612 receives image data from the cameras 610, audio data from the microphones 604, and possibly other types of data (e.g., annotations or text generated by the user of the AR device using the local augmentation logic 614), and transmits that information to other devices, for example the devices of collaborators in the AR environment. The decoder 620 receives an incoming data stream from other devices and extracts audio, video, and possibly other types of data (e.g., annotations, text) from it.
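The description does not specify how the encoder 612 packages its outputs. As a purely hypothetical illustration, the sketch below multiplexes several data types into one byte stream using a simple type-length-value framing (a 1-byte type code and a 4-byte big-endian length before each payload), and the matching decoder splits them back out. The type codes and function names are invented for the example, and a practical encoder would also compress audio and video, which this sketch omits.

```python
import struct
from typing import Dict, Iterable, List, Tuple

# Hypothetical stream type codes; not a format defined by this description.
AUDIO, VIDEO, ANNOTATION = 1, 2, 3
_HEADER = struct.Struct(">BI")  # 1-byte type, 4-byte big-endian payload length


def encode(items: Iterable[Tuple[int, bytes]]) -> bytes:
    """Pack (type, payload) items into a single byte stream."""
    out = bytearray()
    for kind, payload in items:
        out += _HEADER.pack(kind, len(payload))
        out += payload
    return bytes(out)


def decode(stream: bytes) -> Dict[int, List[bytes]]:
    """Split a byte stream back into payloads grouped by type."""
    result: Dict[int, List[bytes]] = {}
    offset = 0
    while offset < len(stream):
        if offset + _HEADER.size > len(stream):
            raise ValueError("truncated header")
        kind, length = _HEADER.unpack_from(stream, offset)
        offset += _HEADER.size
        if offset + length > len(stream):
            raise ValueError("truncated payload")
        result.setdefault(kind, []).append(stream[offset:offset + length])
        offset += length
    return result
```

Because each payload carries its own length, the decoder can extract audio, video and annotation items from one stream without knowing their sizes in advance.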

Parts List

augmented reality environment

superimposing logic

virtual surface

physical surface

imaging sensor

virtual document

sensor output

AR or VR system

virtual environment

operating system

virtual object

virtual surface

central portion

left optical component

right optical component

left in-coupling zone

right in-coupling zone

left intermediate zone

right intermediate zone

left exit zone

right exit zone

left stereo camera

right stereo camera

left microphone

right microphone

left speaker

right speaker

augmented reality device logic

graphics engine

processing units

AR device

processing units

input devices

output devices

storage devices

network interface

AR device logic

device tracking logic

local modeling logic

local augmentation logic

rendering engine

graphics engine

virtual surface

projection location

imaging sensor

texture image

filtered texture

virtual environment

virtual reality

the computer-generated simulation of a three-dimensional environment that can be interacted with in a seemingly real or physical way by a person using special electronic equipment, such as a headset with a display and gloves fitted with sensors.

augmented reality

technology that superimposes computer-generated imagery on a user’s view of the real world, thus providing a composite view.


converting a physical thing to a computer-generated simulation of that thing.