|Image source: Adrian Boeing blog|
In the context of AR and VR systems, predictive tracking refers to the process of predicting the future orientation and/or position of an object or body part. For instance, one might want to predict the orientation of the head or the position of the hand.
Why is predictive tracking useful?
One common use of predictive tracking is to reduce the apparent “motion to photon” latency, meaning the time between movement and when that movement is reflected in the drawn scene. Since there is some delay between movement and an updated display (more on the sources of that delay below), using an estimated future orientation and position as the data used in updating the display, could shorten that perceived latency.
While a lot of attention has been focused on predictive tracking in virtual reality applications, it is also very important in augmented reality. For instance, if you are displaying a graphical overlay to appear on top of a physical object that you see with an augmented reality goggles, it is important that the overlay stays on the object even when you rotate your head. The object might be recognized with a camera, but it takes time for the camera to capture the frame, for a processor to determine where the object is in the frame and for a graphics chip to render the new overlay. By using predictive tracking, you can get better apparent registration between the overlay and the physical object.
How does it work?
If you saw a car travelling at a constant speed and you wanted to predict where that car will be one second in the future, you could probably make a fairly accurate prediction. You know the current position of the car, you might know (or can estimate) the current velocity, and thus you can extrapolate the position into the near future.
Of course if you compare your prediction with where the car actually is in one second, your prediction is unlikely to be 100% accurate every time: the car might change direction or speed during that time. The farther out you are trying to predict, the less accurate your prediction will be: predicting where the car will be in one second is likely much more accurate than predicting where it will be in one minute.
The more you know about the car and its behavior, the better chance you have of making an accurate prediction. For instance, if you were able to measure not only the velocity but also the acceleration, you can make a more accurate prediction.
If you have additional information about the behavior of the tracked body, this can also improve prediction accuracy. For instance, when doing head tracking, understand how fast the head can possibly rotate and what are common rotation speeds, can improve the tracking model. Similarly, if you are doing eye tracking, you can use the eye tracking information to anticipate head movements as discussed in this post
Sources of latency
- Sensing delays. The sensors (e.g. gyroscope) may be bandwidth-limited and do not instantaneously report orientation or position changes. Similarly, camera-based sensors may exhibit delay between when the pixel on the camera sensor receives light from the tracked object to that frame being ready to be sent to the host processor.
- Processing delays. Sensors are often combined using some kind of sensor fusion algorithm, and executing this algorithm can add latency.
- Data smoothing. Sensor data is sometimes noisy and to avoid erroneous jitter, software or hardware-based low pass algorithms are executed.
- Transmission delays. For example, if orientation sensing is done using a USB-connected device, there is some non-zero time between the data available to be ready by the host processor and the time data transfer over USB is completed.
- Rendering delays. When rendering a non-trivial scene, it takes some time to have the image ready to be sent to the display device.
- Frame rate delays. If a display is operating at 100 Hz, for instance, there is a 10 mSec time between successive frames. Information that is not precisely current to when a particular pixel is drawn may need to wait until the next time that pixel is drawn on the display.
How much to track into the future?
- There are objects with different end-to-end delays. For instance, a hand tracked with a camera may be have different latency than a head tracker, but both need to be drawn in sync in the same scene, so predictive tracking with different ‘look ahead’ times will be used.
- In configurations where a single screen – such as a cell phone screen – is used to provide imagery to both eyes, it is often the case that the image for one eye appears with a delay of half a frame (e.g. half of 1/60 seconds, or approx 8 mSec) relative to the other eye. In this case, it is best to use predictive tracking that looks ahead 8 mSec more for that delayed half of the screen.
Common prediction algorithms
- Dead reckoning. This is a very simple algorithm: if the position and velocity (or angular position and angular velocity) is known at a given time, the predicted position assumes that the last know position and velocity are correct and the velocity remains the same. For instance, if the last known position is 100 units and the last known velocity is 10 units/sec, then the predicted position 10 mSec (0.01 seconds) into the future is 100 + 10 x 0.01 = 100.1. While this is very simple to compute, it assumes that the last position and velocity are accurate (e.g. not subject to any measurement noise) and that the velocity is constant. Both these assumptions are often incorrect.
- Kalman predictor. This is based on a popular Kalman filter that is used to reduce sensor noise in systems where there exists a mathematical model of the system’s operation. See here for more detailed explanation of the Kalman filter.
- Alpha-beta-gamma. The ABG predictor is closely related to the Kalman predictor, but is less general and has simpler math, which we can explain here at a high level. ABG tries to continuously estimate both velocity and acceleration and use them in prediction. Because the estimates take into account actual data, they provide some measurement noise reduction. Configuring the parameters (alpha, beta and gamma) provide the ability to emphasize responsiveness as opposed to noise reduction. If you’d like to follow the math, here it goes: