Coursera Visual Perception for Self-Driving Cars 02

2019-08-06

From Coursera, Visual Perception for Self-Driving Cars by University of Toronto
https://www.coursera.org/learn/visual-perception-self-driving-cars

Visual Features - Detection, Description and Matching

Introduction to Image features and Feature Detectors

Introduction to feature detectors

Feature extraction

Application: image stitching
- Given two images from two different cameras, we would like to stitch them together to form a panorama.
- First, we need to identify distinctive points in our images. We call these points image features.
- Second, we associate a descriptor for each feature from its neighborhood.
- Finally, we use these descriptors to match features across two or more images.
Features are points of interest in an image
Points of interest should have the following characteristics:
- Saliency: distinctive, identifiable, and different from its immediate neighborhood
- Repeatability: can be found in multiple images using same operations
- Locality: occupies a relatively small subset of image space
- Quantity: enough points represented in the image
- Efficiency: reasonable computation time
How to choose the points of interest:
- Repetitive, textureless patches are challenging to detect consistently
- Patches with large contrast changes (gradients) are easier to detect (edges)
- Gradients in at least two (significantly) different orientations are the easiest to detect (corners)
- The most famous corner detector is the Harris Corner Detector, which uses image gradient information to identify pixels that have a strong change in intensity in both x and y directions.
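As a rough illustration of how gradient information singles out corners, the sketch below computes the standard Harris response $R = \det(M) - k \, \mathrm{trace}(M)^2$, where $M$ sums the products of the x and y gradients over a small window. The function name, the unweighted 3x3 window, the central-difference gradients, and k = 0.05 are assumptions made for this example, not details taken from the course.

```python
def harris_response(img, k=0.05, win=1):
    """Harris response R = det(M) - k * trace(M)^2 for a 2D list of intensities."""
    h, w = len(img), len(img[0])
    # Central-difference gradients; border pixels are left at zero.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for v in range(1, h - 1):
        for u in range(1, w - 1):
            ix[v][u] = (img[v][u + 1] - img[v][u - 1]) / 2.0
            iy[v][u] = (img[v + 1][u] - img[v - 1][u]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for v in range(win, h - win):
        for u in range(win, w - win):
            sxx = sxy = syy = 0.0
            # Accumulate the structure tensor M over the window.
            for dv in range(-win, win + 1):
                for du in range(-win, win + 1):
                    gx, gy = ix[v + dv][u + du], iy[v + dv][u + du]
                    sxx += gx * gx
                    sxy += gx * gy
                    syy += gy * gy
            resp[v][u] = sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
    return resp


# Synthetic 10x10 image: a bright square whose top-left corner is at (v=5, u=5).
img = [[1.0 if v >= 5 and u >= 5 else 0.0 for u in range(10)] for v in range(10)]
resp = harris_response(img)
print(resp[5][5], resp[5][7], resp[2][2])  # corner, edge, flat region
```

In this toy image the response is positive at the corner, negative along the edge (gradient in one direction only), and zero in the flat region, which matches the intuition in the list above.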

Algorithms used

Harris [corners]: Easy to compute, but not scale invariant.
- That is, a corner can look different depending on the distance between the camera and the object generating the corner.
Harris-Laplace [corners]: Same procedure as the Harris detector, with the addition of scale selection based on the Laplacian. Scale invariant.
Features from Accelerated Segment Test (FAST) [corners]: Machine learning approach for fast corner detection.
- High computational efficiency and solid detection performance
Laplacian of Gaussian (LOG) detector [blobs]: Uses the concept of scale space in a large neighborhood (blob). Somewhat scale invariant.
Difference of Gaussian (DOG) detector [blobs]: Approximates LOG but is faster to compute

Feature Descriptors

Feature descriptor definition

Feature: Point of interest in an image defined by its image pixel coordinates [u,v]
Descriptor f: an n dimensional vector associated with each feature.
- The descriptor has the task of providing a summary of the image information in the vicinity of the feature itself, and can take on many forms.
Feature descriptors should have the following characteristics:
- Repeatability: manifested as robustness and invariance to translation, rotation, scale, and illumination changes
- Distinctiveness: should allow us to distinguish between two nearby features, which is very important for matching later on
- Compactness & Efficiency: reasonable computation time
Designing Invariant Descriptors: SIFT
- Scale Invariant Feature Transform (SIFT) descriptors
- Given a feature in the image, the SIFT descriptor takes a 16 by 16 window of pixels around it. We call this window the feature’s local neighborhood.
- We then separate this window into sixteen 4 by 4 cells, so that each cell contains 16 pixels.
- Next, we compute the edges and edge orientation of each pixel in each cell using gradient filters.
- For stability of the descriptor, we suppress weak edges using a predefined threshold, as they are likely to vary significantly in orientation with small amounts of noise between images.
- Finally, we compute an 8-bin histogram of orientations for each cell and concatenate the histograms of all sixteen cells to get a final 128-dimensional histogram for the feature at hand. We call this histogram the descriptor.
Scale Invariant Feature Transform
- SIFT is an example of a very well human-engineered feature descriptor, and is used in many state-of-the-art systems
- The above process is usually computed on rotated and scaled versions of the 16x16 window, allowing for better scale robustness
- Combined with the DOG feature detector, SIFT descriptors provide a scale, rotation, and illumination invariant detector/descriptor pair.

Algorithms
Other Descriptors:

Speeded-Up Robust Features (SURF)
Gradient Location-Orientation Histogram (GLOH)
Binary Robust Independent Elementary Features (BRIEF)
Oriented FAST and Rotated BRIEF (ORB): free to use commercially
Many more!

Feature Matching

Match features based on a predefined distance function

Feature Matching: Given a feature and its descriptor in image 1, find the best match in image 2
Define a distance function $d(f_i,f_j)$ that compares the two descriptors
For every feature $f_i$ in Image 1:
- Compute $d(f_i,f_j)$ with all features $f_j$ in image 2
- Find the closest match $f_c$, the match that has the minimum distance. This feature is known as the nearest neighbor.
- Keep this match only if $d(f_i,f_c)$ is below threshold $\delta$
Distance Function
- Sum of Squared Differences (SSD): $d(f_i, f_j) =\sum^D_{k=1} (f_{i,k} - f_{j,k})^2$
- Sum of Absolute Differences (SAD): $d(f_i, f_j) = \sum^D_{k=1} |f_{i,k} - f_{j,k}|$
- Hamming Distance (for binary descriptors): $d(f_i, f_j) = \sum^D_{k=1} XOR(f_{i,k}, f_{j,k})$
Brute force feature matching might not be fast enough for extremely large numbers of features
- Use a multidimensional search tree, usually a k-d tree, to speed up the search by constraining it spatially
- Both of these matchers are implemented in OpenCV: cv2.BFMatcher() is the brute force matcher; cv2.FlannBasedMatcher() is the k-d tree based approximate nearest neighbor matcher
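The brute force procedure and the three distance functions can be sketched in plain Python as follows. This is only an illustration: the function names, the tuple layout of a match, and the threshold value are assumptions, and descriptors are taken to be plain lists of numbers (or 0/1 bits for the Hamming distance).

```python
def ssd(f_i, f_j):
    """Sum of squared differences between two descriptors."""
    return sum((a - b) ** 2 for a, b in zip(f_i, f_j))


def sad(f_i, f_j):
    """Sum of absolute differences between two descriptors."""
    return sum(abs(a - b) for a, b in zip(f_i, f_j))


def hamming(f_i, f_j):
    """Number of differing bits between two binary (0/1) descriptors."""
    return sum(a ^ b for a, b in zip(f_i, f_j))


def match_brute_force(desc1, desc2, dist=ssd, delta=0.5):
    """For each descriptor in desc1, keep its nearest neighbor in desc2
    if the distance is below the threshold delta (an assumed value)."""
    matches = []
    for i, f_i in enumerate(desc1):
        distances = [dist(f_i, f_j) for f_j in desc2]
        c = min(range(len(desc2)), key=distances.__getitem__)
        if distances[c] < delta:
            matches.append((i, c, distances[c]))
    return matches


desc1 = [[0.0, 1.0], [5.0, 5.0]]
desc2 = [[4.0, 4.0], [0.1, 0.9], [9.0, 9.0]]
print(match_brute_force(desc1, desc2))  # only the first descriptor finds a close match
```

The cost is one distance evaluation per pair of descriptors, which is why tree-based approximate search becomes attractive for large feature sets.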

Feature Matching: Handling Ambiguity in Matching

Ambiguous matches: during feature matching, more than one feature point can be found at (nearly) the minimum distance.
The solution is to use the distance ratio:
- Compute $d(f_i,f_j)$ for each feature, $f_i$, in image 1, with all features, $f_j$, in image 2
- Find the closest match $f_c$
- Find the second closest match $f_s$
- Find how much better the closest match is than the second closest match. This can be done through the distance ratio: $0 \leq \frac{d(f_i,f_c)}{d(f_i,f_s)} \leq 1$
- If the distance ratio is close to one, it means that according to our descriptor and distance function, $f_i$ matches both $f_s$ and $f_c$. In this case, we don’t want to use this match in our processing later on, as our matcher clearly cannot tell which location in image 2 corresponds to the feature in image 1.
So the Updated Brute Force Feature Matching:
- Define a distance function $d(f_i,f_j)$ that compares the two descriptors
- Define distance ratio threshold $\rho$
- For every feature $f_i$ in Image 1:
  - Compute $d(f_i,f_j)$ with all features $f_j$ in image 2
  - Find the closest match $f_c$ and the second closest match $f_s$
  - Compute the distance ratio
  - Keep matches with distance ratio $< \rho$
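A minimal sketch of the updated matcher, assuming SSD as the distance function and a ratio threshold of 0.8 (both are choices made for this example, as are the function names):

```python
def ssd(f_i, f_j):
    """Sum of squared differences between two descriptors."""
    return sum((a - b) ** 2 for a, b in zip(f_i, f_j))


def match_with_ratio_test(desc1, desc2, rho=0.8):
    """Brute force matching that keeps a match only when the closest
    descriptor is clearly better than the second closest one.
    Requires at least two descriptors in desc2."""
    matches = []
    for i, f_i in enumerate(desc1):
        ranked = sorted((ssd(f_i, f_j), j) for j, f_j in enumerate(desc2))
        (d_c, c), (d_s, _) = ranked[0], ranked[1]
        if d_s == 0.0:
            # Two candidates are identical to f_i: maximally ambiguous.
            continue
        if d_c / d_s < rho:
            matches.append((i, c))
    return matches


desc1 = [[0.0, 0.0], [10.0, 10.0]]
desc2 = [[0.1, 0.0], [5.0, 5.0], [9.0, 10.0], [11.0, 10.0]]
print(match_with_ratio_test(desc1, desc2))
```

Here the second descriptor of image 1 is equally close to two candidates in image 2 (ratio 1), so it is discarded, while the first one has a single clear nearest neighbor and is kept.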

Outlier Rejection

Three-step feature extraction framework for the real-world problem of vehicle localization

Localization problem is defined as follows:
- Given any two images of the same scene from different perspectives, find the translation $T=[t_u, t_v]$ between the coordinate system of the first image and the coordinate system of the second image.
- In general, we also need to solve for the scale and skew due to the different viewpoints.
Matched feature pairs in images 1 and 2:
- $f^{(1)}_i, f^{(2)}_i, \; i \in [0…N]$
- $f^{(1)}_i = (u^{(1)}_i, v^{(1)}_i)$
- Model: $u^{(1)}_i + t_u = u^{(2)}_i$, $v^{(1)}_i + t_v = v^{(2)}_i$
- Solve using least squares. Minimizing the sum of squared residuals of the model gives the mean displacement of the matched features: $t_u = \frac 1N \sum_i (u^{(2)}_i - u^{(1)}_i)$, $t_v = \frac 1N \sum_i (v^{(2)}_i - v^{(1)}_i)$

Outliers and RANSAC algorithm

Outliers can be handled using a model-based outlier rejection method called Random Sample Consensus (RANSAC)
RANSAC algorithm:
Initialization:
- Given a model, find the smallest number M of data points or samples needed to compute the parameters of this model. In the case above, the parameters are $t_u, t_v$, and a single matched pair is enough to compute them (M = 1).
Main loop:
- Randomly select M samples from the data.
- Compute the model parameters using only the M samples selected from the data set.
- Use the computed parameters and count how many of the remaining data points agree with this computed solution. The accepted points are retained and referred to as inliers.
- If the number of inliers C is satisfactory, or if the algorithm has reached a preset maximum number of iterations, terminate and return the computed solution and the inlier set.
- Else, go back to the random sampling step and repeat.
Finally, recompute and return the model parameters from the best inlier set: the one with the largest number of features.
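The RANSAC loop for the translation model can be sketched as below. It assumes the least-squares estimate of a pure translation is the mean displacement of the matched pairs, uses M = 1, and treats the inlier tolerance, iteration cap, and function names as illustrative choices rather than values from the course.

```python
import random


def estimate_translation(pairs):
    """Least-squares translation: the mean displacement over the pairs."""
    n = len(pairs)
    t_u = sum(p2[0] - p1[0] for p1, p2 in pairs) / n
    t_v = sum(p2[1] - p1[1] for p1, p2 in pairs) / n
    return t_u, t_v


def ransac_translation(pairs, tol=2.0, max_iters=100, min_inliers=None, seed=0):
    """Estimate [t_u, t_v] from matched pixel pairs while rejecting outliers."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(max_iters):
        sample = rng.sample(pairs, 1)  # M = 1 pair defines a translation
        t_u, t_v = estimate_translation(sample)
        # A pair agrees with the model if its residual is within tol pixels.
        inliers = [(p1, p2) for p1, p2 in pairs
                   if abs(p1[0] + t_u - p2[0]) <= tol
                   and abs(p1[1] + t_v - p2[1]) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
        if min_inliers is not None and len(best_inliers) >= min_inliers:
            break
    # Final step: recompute the model from the best inlier set.
    return estimate_translation(best_inliers), best_inliers


points = [(0, 0), (7, 13), (14, 26), (21, 39), (28, 12), (35, 25), (42, 38), (49, 11)]
pairs = [(p, (p[0] + 10, p[1] - 5)) for p in points]    # true translation (10, -5)
pairs += [((5, 5), (40, 30)), ((20, 10), (3, 38))]      # two gross mismatches
t, inliers = ransac_translation(pairs)
print(t, len(inliers))
```

A plain least-squares fit over all ten pairs would be pulled away from (10, -5) by the two mismatches; restricting the final fit to the consensus set avoids that.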

Visual Odometry

Visual Odometry (VO) is the process of incrementally estimating the pose of the vehicle by examining the changes that motion induces on the images of its onboard cameras

VO Pros:
- Not affected by wheel slip in uneven terrain, rainy/snowy weather, or other adverse conditions.
- More accurate trajectory estimates compared to wheel odometry.
VO Cons:
- Usually need an external sensor to estimate absolute scale
- The camera is a passive sensor, so it might not be very robust against weather conditions and illumination changes
- Any form of odometry (incremental state estimation) drifts over time
Problem Formulation:
- Estimate the camera motion $T_k$ between consecutive images $I_{k-1}$ and $I_k$
- Concatenating these single movements allows the recovery of the full trajectory of the camera, given frames $C_1, …, C_m$
General process of visual odometry:
- Given: two consecutive image frames $I_{k-1}$ and $I_k$
- First, perform feature detection and description. We end up with a set of features $f_{k-1}$ in image $k-1$ and $f_k$ in image $k$.
- We then proceed to match the features between the two frames to find the ones occurring in both of our target frames.
- After that, we use the matched features to estimate the motion between the two camera frames represented by the transformation $T_k$.
Motion estimation:
- depends on what type of feature representation we have:
  - 2D-2D: both $f_{k-1}$ and $f_k$ are defined in Image coordinates
  - 3D-3D: both $f_{k-1}$ and $f_k$ are specified in 3D
  - 3D-2D: $f_{k-1}$ is specified in 3D and $f_k$ is its corresponding projection in 2D
Perspective-n-Point (PnP)
- Given feature locations in 3D, their corresponding projections in 2D, and the camera intrinsic calibration matrix K,
- Solve for an initial guess of [R|t] using the Direct Linear Transform (DLT): it forms a linear model and solves for [R|t] with methods such as singular value decomposition (SVD)
- Improve the solution using the Levenberg-Marquardt algorithm (LM)
- Need at least 3 points to solve (P3P), 4 if we don’t want ambiguous solutions.
- Finally, RANSAC can be incorporated into PnP by assuming that the transformation generated by PnP on four points is our model.
- We then choose a subset of all feature matches to evaluate this model and check the percentage of inliers that result to confirm the validity of the point matches selected.


Coursera Visual Perception for Self-Driving Cars 01

2019-08-01

From Coursera, Visual Perception for Self-Driving Cars by University of Toronto
https://www.coursera.org/learn/visual-perception-self-driving-cars

Basics of 3D Computer Vision

The Camera Sensor

Pinhole Camera Model:

focal length: the distance between the pinhole and the image plane. It defines the size of the object projected onto the image
camera center: the coordinates of the center of the pinhole. These coordinates define the location on the imaging sensor that the object projection will inhabit.

Camera Projective Geometry

Let’s define the problem we need to solve: a point $O_{world}$ defined at a particular location in the world coordinate frame. We want to project this point from the world frame to the camera image plane.

Light travels from the $O_{world}$ on the object through the camera aperture to the sensor surface.
The projection onto the sensor surface through the aperture results in flipped images of the objects in the world.
We need to develop a model for how to project a point from the world frame coordinates x, y, and z to image coordinates u and v:
- First, select a world frame in which to define the coordinates of all objects and the camera.
- Define the camera coordinate frame as the coordinate frame attached to the center of our lens aperture, known as the optical center.
- We refer to the parameters of the camera pose as the extrinsic parameters, as they are external to the camera and specific to the location of the camera in the world coordinate frame.
- Define the image coordinate frame as the coordinate frame attached to our virtual image plane emanating from the optical center. The image pixel coordinate system is attached to the top left corner of the virtual image plane.
- So we need to adjust the pixel locations to the image coordinate frame.
- We define the focal length as the distance between the camera and the image coordinate frames along the z-axis of the camera coordinate frame.
Finally, the projection problem reduces to two steps.
- We first need to project from the world to the camera coordinates, then we project from the camera coordinates to the image coordinates.
- We can then transform image coordinates to pixel coordinates through scaling and offset.
Computing the projection:
- World -> Camera: $$o_{camera} = [R|t]O_{world}$$
- Camera -> Image: $$o_{image} = \begin{bmatrix} f & 0 & u_0 \\ 0 & f & v_0 \\ 0 & 0 & 1 \end{bmatrix} o_{camera} = Ko_{camera}$$
  - K is a 3x3 matrix, which depends on camera intrinsic parameters: camera geometry and the camera lens characteristics
- World -> Image: $$P = K[R|t]$$
- Therefore, $o_{image} = PO_{world} = K[R|t]O_{world}$
Image coordinates to pixel coordinates: $[x \; y \; z]^T \to [u \; v \; 1]^T = \frac 1z[x \; y \; z]^T$

The digital image:

an image is represented digitally as an M by N by three array of pixels, with each pixel representing the projection of a 3D point onto the 2D image plane.

Camera Calibration

The camera calibration problem is defined as finding these unknown intrinsic and extrinsic camera parameters, given n known 3D point coordinates and their corresponding projection to the image plane.

Our approach will consist of getting the P matrix first and then decomposing it into the intrinsic parameters K, the extrinsic rotation parameters R, and the translation parameters t.
Use scenes with known geometry to:
- Correspond 2D image coordinates to 3D world coordinates
- Find the least-squares solution (or non-linear solution) of the parameters of P
The most commonly used example would be a 3D checkerboard, with squares of known size providing a map of fixed point locations to observe.
If we have N 3D points and their corresponding N 2D projections, we can set up a homogeneous linear system
- Solved with Singular Value Decomposition(SVD)

Visual Depth Perception - Stereopsis

Stereo Camera Model

A stereo sensor is usually created by two cameras with parallel optical axes.
Given a known rotation and translation between the two cameras and a known projection of a point $O$ in 3D to the two camera frames resulting in pixel locations $O_L$ and $O_R$ respectively, we can formulate the necessary equations to compute the 3D coordinates of the point $O$.
Assumptions:
- First, we assume that the two cameras used to construct the stereo sensors are identical.
- Second, we will assume that while manufacturing the stereo sensor, we tried as hard as possible to keep the two cameras’ optical axes aligned.
- project the previous figure to bird’s eye view for easier visualization.
Some parameters:
- focal length: the distance between the camera center and the image plane.
- the baseline is defined as the distance along the shared x-axis between the left and right camera centers.
- By defining a baseline to represent the transformation between the two camera coordinate frames, we are assuming that the rotation matrix is identity and there is only a non-zero x component in the translation vector. The $[R|t]$ transformation therefore boils down to a single baseline parameter $b$.
Define the quantities to compute:
- We want to compute the X and Z coordinates of the point $O$ with respect to the left camera frame.
- The Y coordinate can be estimated easily after the X and Z coordinates are computed.
- From similar triangles, we get $\frac Zf = \frac X{x_L}$ and $\frac Zf = \frac {X-b}{x_R}$
- Subtracting the two relations eliminates X and gives the depth in terms of the disparity $d = x_L - x_R$: $Z = \frac{fb}{d}$
- Finally we can get: $X=\frac{Zx_L}{f}$, $Y=\frac{Zy_L}{f}$
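A small numeric sketch of these relations: eliminating X from the two similar-triangle equations gives $Z = \frac{fb}{x_L - x_R}$, which the function below uses. It assumes $x_L$, $y_L$, $x_R$ are image coordinates measured from the principal point, in the same units as f; the function name and the sample numbers are illustrative.

```python
def triangulate(x_l, y_l, x_r, f, b):
    """Recover (X, Y, Z) in the left camera frame from a stereo correspondence."""
    d = x_l - x_r  # disparity
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = f * b / d
    return z * x_l / f, z * y_l / f, z


# f in pixels, baseline in meters, image coordinates in pixels.
X, Y, Z = triangulate(x_l=50.0, y_l=-10.0, x_r=25.0, f=500.0, b=0.5)
print(X, Y, Z)  # 1.0 -0.2 10.0
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant points (small d) than for nearby ones.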

Derive the location of a point in 3D

Two main problems:
- We need to know $f,b,u_0,v_0$: Use stereo camera calibration
- We need to find corresponding $x_R$ for each $x_L$: Use disparity computation algorithms

Visual Depth Perception - Computing the Disparity

Disparity: The difference in image location of the same 3D point under perspective to two different cameras

Correspond pixels in the left image to those in the right image to find matches

Estimate the disparity through stereo matching

If we move our 3D point along the line connecting it with the left camera’s center:
- Its projection on the left camera image plane does not change, but its projection on the right camera image plane moves along a horizontal line.
- This line is called an epipolar line, and it follows directly from the fixed lateral offset and image plane alignment of the two cameras in a stereo pair. We can constrain our correspondence search to be along the epipolar line, reducing the search from 2D to 1D.
- Note that horizontal epipolar lines only occur if the optical axes of the two cameras are parallel.
- In the case of non-parallel optical axes, the epipolar lines are skewed.
- We can use stereo rectification to warp images originating from two cameras with non-parallel optical axes to force epipolar lines to be horizontal.
A Basic Stereo Algorithm
- Given: Rectified Images and Stereo Calibration.
- For each epipolar line, take a pixel on this line in the left image and compare it to every pixel in the right image on the same epipolar line.
- Select the right image pixel that matches the left pixel most closely, which can be done by minimizing a cost.
- A very simple cost can be the squared difference in pixel intensities.
- Finally, we can compute the disparity by subtracting the right image location from the left one.
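For a single pair of rectified scanlines, the basic algorithm can be sketched as follows. Two simplifications are assumptions of this example: only non-negative disparities up to `max_disparity` are searched instead of the whole line, and the cost compares single pixel intensities rather than patches.

```python
def scanline_disparity(left_row, right_row, max_disparity):
    """Per-pixel disparity along one rectified epipolar line."""
    disparities = []
    for u_l, intensity in enumerate(left_row):
        best_d, best_cost = 0, float("inf")
        # The matching pixel in the right image lies at u_l - d.
        for d in range(0, min(max_disparity, u_l) + 1):
            cost = (intensity - right_row[u_l - d]) ** 2  # squared intensity difference
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities


right_row = [10, 20, 30, 40, 50, 60, 70, 80]
left_row = [0, 5, 10, 20, 30, 40, 50, 60]  # right_row shifted by two pixels
print(scanline_disparity(left_row, right_row, max_disparity=4))
```

Single-pixel costs are very sensitive to noise and repeated intensities, so practical matchers aggregate the cost over a window around each pixel.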

Image filtering

Cross correlation

The idea behind reducing salt-and-pepper noise is to compute the mean of the whole neighborhood and replace the outlier pixel with this mean value: $$G[u,v] = \frac 1{(2k+1)^2} \sum^k_{i=-k} \sum^k_{j=-k} I[u-i, v-j]$$
- where (2k+1) is the filter size, (u,v) is the center pixel coordinates
The mean equation can be generalized by adding a weight to every pixel in the neighborhood, resulting in cross-correlation. The weight matrix H is called a kernel.
- The kernel could be a mean filter or a Gaussian filter
Applications:
- Template matching: the pixel with the highest response from cross-correlation is the location of the template in the image
- Image gradient computation: define a finite difference kernel, and apply it to the image to get the image gradient

Convolution

A convolution is a cross-correlation where the filter is flipped both horizontally and vertically before being applied to the image
Unlike cross-correlation, convolution is associative. If H and F are filter kernels, then: $H * (F * I) = (H * F) * I$
Precompute the filter convolution (H*F), then apply it once to the image to reduce runtime.
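The relationship between the two operations can be made concrete with a small pure-Python sketch. The function names and the "valid" output size (no padding) are choices made for this example.

```python
def cross_correlate(image, kernel):
    """Valid-mode 2D cross-correlation: slide the kernel over the image
    and take the weighted sum at every position where it fully fits."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(kernel[i][j] * image[v + i][u + j]
                 for i in range(kh) for j in range(kw))
             for u in range(w - kw + 1)]
            for v in range(h - kh + 1)]


def convolve(image, kernel):
    """Convolution: cross-correlation with the kernel flipped in both directions."""
    flipped = [row[::-1] for row in kernel[::-1]]
    return cross_correlate(image, flipped)


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]  # a simple diagonal finite-difference kernel
print(cross_correlate(image, kernel))  # [[-4, -4], [-4, -4]]
print(convolve(image, kernel))         # [[4, 4], [4, 4]]
```

For symmetric kernels such as the mean or Gaussian filter the two results coincide; the difference only shows up for asymmetric kernels like the one above.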


Coursera State Estimation and Localization for Self-Driving Cars 05

2019-08-01

From Coursera, State Estimation and Localization for Self-Driving Cars by University of Toronto
https://www.coursera.org/learn/state-estimation-localization-self-driving-cars

An Autonomous Vehicle State Estimator

State Estimation in Practice

Accuracy Requirements

How accurate does the estimator need to be for safe self-driving?
Typically less than a meter for highway lane keeping
Less for driving in dense traffic
GPS accuracy is 1-5 meters in optimal conditions
Need additional sensors!

Speed Requirements

How fast do we need to update the vehicle state to ensure safe driving?
How much computation power does the vehicle have on-board?
How much power can our computing resources consume?

Localization Failures

How can localization fail?
Sensors fail or provide bad data (e.g., GPS in a tunnel)
Estimation error (e.g., linearization error in the EKF)
Large state uncertainty (e.g., relying on IMU for too long)

Multisensor Fusion for State Estimation

Develop an error state extended Kalman Filter for estimating position, velocity and orientation using an IMU, GNSS sensor, and LIDAR.

Why use GNSS with IMU & LIDAR?

Error dynamics are completely different and uncorrelated
IMU provides ‘smoothing’ of GNSS, fill-in during outages due to jamming or maneuvering
Wheel odometry is also possible (if only 2D position and orientation are desired)
GNSS provides absolute positioning information to mitigate IMU drift
LIDAR provides accurate local positioning within known maps

Types of EKF coupling

Tightly coupled:
- Use the raw pseudo-range and point cloud measurements from our GNSS and LIDAR as observations
- GNSS/LIDAR Measurement: pseudo-ranges to satellites, LIDAR point clouds
- Accuracy: potentially higher
- Complexity: higher
Loosely coupled:
- Assume the data has already been preprocessed to produce a noisy position estimate
- GNSS/LIDAR Measurement: position
- Accuracy: potentially lower
- Complexity: lower

EKF - IMU+GNSS+LIDAR

Use the IMU measurements as noisy inputs to the motion model. This gives us our predicted state, which is updated every time we have an IMU measurement. This can happen hundreds of times a second.
Then we incorporate GNSS and LIDAR measurements whenever they become available (at a much slower rate, say once a second or slower), and use them to correct our predicted state.

What is state?
- we’ll use a 10-dimensional state vector that includes a 3D position, a 3D velocity, and a 4D unit quaternion that will represent the orientation of our vehicle with respect to a navigation frame. $$x_k=[p_k \; v_k \; q_k]^T \in R^{10}$$
- Assume that the IMU outputs specific forces and rotational rates in the sensor frame, and combine them into a single input vector u.
- Note that we’re not going to track accelerometer or gyroscope biases. These are often put into the state vector, estimated, and then subtracted off of the IMU measurements. For clarity, we’ll omit them here and assume our IMU measurements are unbiased.
- therefore, the motion model input will consist of specific force and rotational rates from IMU: $$u_k = [f_k \; \omega_k]^T \in R^6$$
Loop
- Update state with IMU inputs
- Propagate uncertainty
- If GNSS or LIDAR position available:
  - 1. Compute Kalman gain
  - 2. Compute error state
  - 3. Correct predicted state
  - 4. Compute corrected covariance

Sensor Calibration - A Necessary Evil

Intrinsic Calibration

Deals with sensor-specific parameters
Ways to get the parameters:
- Manufacturer specifications
- Measure by hand
- Estimate as part of the state

Extrinsic Calibration

deals with how the sensors are positioned and oriented on the vehicle

Temporal Calibration

deals with the time offset between different sensor measurements
Ways to deal with it:
- Assume zero
- Hardware synchronization
- Estimate as part of the state

Loss of One or More Sensors


Coursera State Estimation and Localization for Self-Driving Cars 04

2019-07-31

From Coursera, State Estimation and Localization for Self-Driving Cars by University of Toronto
https://www.coursera.org/learn/state-estimation-localization-self-driving-cars

LIDAR Sensing

Light Detection and Ranging Sensors

The operating principles of LIDAR sensors

For a basic LIDAR in one dimension, it has three components: a laser, a photodetector, and a very precise stopwatch.
The laser first emits a short pulse of light usually in the near infrared frequency band along some known ray direction. At the same time, the stopwatch begins counting. The laser pulse travels outwards from the sensor at the speed of light and hits a distant target.
As long as the surface of the target isn’t too polished or shiny, the laser pulse will scatter off the surface in all directions, and some of that reflected light will travel back along the original ray direction. The photodetector catches that return pulse and the stopwatch tells you how much time has passed between when the pulse first went out and when it came back. That time is called the round-trip time.
The distance from the LIDAR to the target is then simply half of the round-trip distance, computed from the round-trip time $t$ and the speed of light $c$: $r = \frac 12 ct$
This technique is called time-of-flight ranging.
The photodetector also tells you the intensity of the return pulse relative to the intensity of the pulse that was emitted. This intensity information is less commonly used for self-driving, but it provides some extra information about the geometry of the environment and the material the beam is reflecting off of.
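As a quick worked example of time-of-flight ranging, the range is half the round-trip time multiplied by the speed of light. The function name and the 200 ns sample value are made up for illustration, and the vacuum speed of light is used as an approximation for air.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s, in vacuum


def tof_range(round_trip_time_s):
    """Range to the target from a measured round-trip time."""
    return 0.5 * SPEED_OF_LIGHT * round_trip_time_s


print(tof_range(200e-9))  # roughly 30 m for a 200 ns round trip
```

The numbers show why the stopwatch has to be very precise: a timing error of a single nanosecond already shifts the range by about 15 cm.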

The basic LIDAR sensor models in 2D and 3D

But how do we use the above knowledge to measure a whole bunch of distances in 2D or in 3D?
- The trick is to build a rotating mirror into the LIDAR that directs the emitted pulses along different directions.
- As the mirror rotates, you can measure distances at points in a 2D slice around the sensor.
- If you then add an up and down nodding motion to the mirror along with the rotation, you can use the same principle to create a scan in 3D.
Measurement Models for 3D LIDAR Sensors
- LIDARs measure the position of points in 3D using spherical coordinates: the range, or radial distance from the center origin to the 3D point; the elevation angle, measured up from the sensor’s XY plane; and the azimuth angle, measured counterclockwise from the sensor’s x-axis.
- The azimuth and elevation angles are measured using encoders that tell you the orientation of the mirror, and the range is measured using the time of flight as we’ve seen before.
- Suppose we want to determine the Cartesian XYZ coordinates of our scanned point in the sensor frame, which is something we often want to do when we’re combining multiple LIDAR scans into a map. To convert from spherical to Cartesian coordinates, we use the inverse sensor model: $$[x \; y \; z]^T = h^{-1}(r,\alpha, \varepsilon) = [r\cos\alpha \cos \varepsilon \quad r \sin \alpha \cos \varepsilon \quad r \sin \varepsilon]^T$$ where $\alpha$ is the azimuth angle and $\varepsilon$ is the elevation angle
- Therefore, the forward sensor model (from Cartesian to spherical coordinates) is: $$[r \; \alpha \; \varepsilon]^T = h(x,y,z) = [\sqrt{x^2+y^2+z^2} \quad \tan^{-1}(\frac yx) \quad \sin^{-1}(\frac{z}{\sqrt{x^2+y^2+z^2}})]^T$$
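Both sensor models translate directly into code. In this sketch the function names are illustrative, and `math.atan2(y, x)` is used in place of $\tan^{-1}(y/x)$ so that the azimuth lands in the correct quadrant; that substitution is a choice of this example.

```python
import math


def spherical_to_cartesian(r, alpha, eps):
    """Inverse sensor model: (range, azimuth, elevation) -> (x, y, z)."""
    return (r * math.cos(alpha) * math.cos(eps),
            r * math.sin(alpha) * math.cos(eps),
            r * math.sin(eps))


def cartesian_to_spherical(x, y, z):
    """Forward sensor model: (x, y, z) -> (range, azimuth, elevation)."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("angles are undefined at the sensor origin")
    return r, math.atan2(y, x), math.asin(z / r)


# Round trip for a point behind and to the left of the sensor (azimuth 2 rad).
x, y, z = spherical_to_cartesian(10.0, 2.0, 0.3)
print(cartesian_to_spherical(x, y, z))
```

With a plain arctangent of y/x, the azimuth of 2 rad in this example would come back shifted by pi, because the ratio alone cannot distinguish opposite quadrants.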

The major sources of measurement error for LIDAR sensors

Uncertainty in determining the exact time of arrival of the reflected signal
Uncertainty in measuring the exact orientation of the mirror
Interaction with the target (surface absorption, specular reflection, etc.). For example, if the surface is completely black, it might absorb most of the laser pulse. Or if it’s very shiny like a mirror, the laser pulse might be scattered completely away from the original pulse direction.
Variation of propagation speed (e.g., through materials)
These factors are commonly accounted for by assuming additive zero-mean Gaussian noise on the spherical coordinates with an empirically determined or manually tuned covariance.
Motion Distortion
- Typical scan rate for a 3D LIDAR is 5-20 Hz
- For a moving vehicle, each point in a scan is taken from a slightly different place
- Need to account for this if the vehicle is moving quickly, otherwise motion distortion becomes a problem

LIDAR Sensor Models and Point Clouds

The basic point cloud data structure

assign an index to each of the points, say point 1 through point n, and store the x, y, z coordinates of each point as a 3 by 1 column vector

Common spatial operations on point clouds

Translation
Rotation
Scaling
Put them together

Least squares to fit a plane to a point cloud

Plane fitting
One of the most common and important applications of plane-fitting for self-driving cars is figuring out where the road surface is and predicting where it’s going to be as the car continues driving.
- We have a bunch of measurements of x, y, and z from our LIDAR point cloud, and we want to find the parameters a, b, and c of the plane defined by $z=a+bx+cy$
- To find the best fit, we use least-squares estimation
- Define a measurement error $e$ for each point in the point cloud: $e_j = \hat z_j - z_j = (\hat a + \hat bx_j + \hat c y_j) - z_j, \quad j=1,…n$
- We can stack all of the measurement errors into matrix form and minimize the squared-error criterion to get the least-squares solution for the parameters
The open-source Point Cloud Library (PCL) has many useful functions for doing basic and advanced operations on point clouds in C++
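The plane fit described above can be sketched with NumPy (used here instead of PCL purely for brevity). Stacking one row $[1, x_j, y_j]$ per point gives a linear system in the parameters, which `numpy.linalg.lstsq` solves in the least-squares sense. The function name and the synthetic data are assumptions of this example.

```python
import numpy as np


def fit_plane(points):
    """Least-squares fit of z = a + b*x + c*y to an (n, 3) array of points."""
    pts = np.asarray(points, dtype=float)
    # One row [1, x_j, y_j] per point, so that A @ [a, b, c] predicts z_j.
    A = np.column_stack([np.ones(len(pts)), pts[:, 0], pts[:, 1]])
    params, *_ = np.linalg.lstsq(A, pts[:, 2], rcond=None)
    return params  # array([a, b, c])


# Noise-free points sampled from z = 1 + 2x - 3y.
points = [(x, y, 1 + 2 * x - 3 * y) for x in range(4) for y in range(4)]
print(fit_plane(points))
```

This parameterization cannot represent vertical planes, which is acceptable for road surfaces but would need a different plane model for walls.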

Pose Estimation from LIDAR Data

Point set registration problem

One of the most important problems in computer vision and pattern recognition, used here to estimate the motion of a self-driving car from point clouds.
The point set registration problem asks: given 2 point clouds in two different coordinate frames, and with the knowledge that they correspond to or contain the same object in the world, how shall we align them to determine how the sensor must have moved between the two scans?
More specifically, we want to figure out the optimal translation and the optimal rotation between the two sensor reference frames that minimizes the distance between the 2 point clouds.
ICP is the most popular algorithm to solve this problem.

Iterative Closest Point(ICP) algorithm

Intuition: When the optimal motion is found, corresponding points will be closer to each other than to other points
Heuristic: For each point, the best candidate for a corresponding point is the point that is closest to it right now
Steps of ICP:
- Get an initial guess for the transformation $\{\check C_{S'S}, \check r^{S'S}_{S}\}$:
  - the initial guess can come from a number of sources.
  - One of the most common sources is a motion model, which could be supplied by an IMU or by wheel odometry or something really simple like a constant velocity or even a zero velocity model
  - How complex the motion model needs to be to give us a good initial guess really depends on how smoothly the car is driving.
  - If the car is moving slowly relative to the scanning rate of the LIDAR sensor, one may even use the last known pose as the initial guess.
- Associate each point in $P_{S'}$ with the nearest point in $P_S$: transform the coordinates of the points in one cloud into the reference frame of the other
- Solve for the optimal transformation $\{\hat C_{S'S}, \hat r^{S'S}_S\}$
- Repeat until convergence
Solving for the Optimal Transformation
- use least-squares:
- Special attention needs to be paid to the rotation matrix, as adding two rotation matrices does not necessarily result in a valid rotation matrix. 3D rotations belong to the special orthogonal group, SO(3)
Steps:
- Compute the centroids of each point cloud:
  - $\mu_S = \frac 1n \sum^n_{j=1}P^{(j)}_S$
  - $\mu_{S'} = \frac 1n\sum^n_{j=1}P^{(j)}_{S'}$
- Compute a matrix capturing the spread of the two point clouds: $$W_{S'S} = \frac 1n \sum^n_{j=1}(P^{(j)}_{S} - \mu_{S})(P^{(j)}_{S'}-\mu_{S'})^T$$ This W matrix can be regarded as something like an inertia matrix you might encounter in mechanics.
- Find the optimal rotation matrix using the SVD of the W matrix: $$USV^T = W_{S'S}$$ where U and V are rotations and S is the scaling matrix. As we’re dealing with rigid body motion in this problem, we don’t want any scaling in the rotation estimate, so we replace the S matrix with something like the identity matrix to remove the scaling.
- Use the optimal rotation to get the optimal translation by aligning the centroids
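The closed-form step above can be sketched with NumPy for the case where correspondences are already known (a full ICP would re-associate points and repeat). The sketch assumes the convention $p_{S'} = C_{S'S}(p_S - r)$; with W defined as above, maximizing the alignment then gives $C = V D U^T$ with $D = \mathrm{diag}(1, 1, \det U \det V)$, where D plays the role of the identity-like matrix replacing S. The function name, the convention, and the test data are assumptions of this example.

```python
import numpy as np


def solve_rigid_transform(P_s, P_sp):
    """Given corresponding (n, 3) point sets, return (C, r) such that
    p_sp ~= C @ (p_s - r) in the least-squares sense."""
    P_s = np.asarray(P_s, dtype=float)
    P_sp = np.asarray(P_sp, dtype=float)
    mu_s = P_s.mean(axis=0)    # centroid of cloud S
    mu_sp = P_sp.mean(axis=0)  # centroid of cloud S'
    # W = (1/n) * sum_j (p_s^j - mu_s)(p_sp^j - mu_sp)^T
    W = (P_s - mu_s).T @ (P_sp - mu_sp) / len(P_s)
    U, _, Vt = np.linalg.svd(W)
    V = Vt.T
    # Replace the scaling by an identity-like matrix that forces det(C) = +1.
    D = np.diag([1.0, 1.0, np.linalg.det(U) * np.linalg.det(V)])
    C = V @ D @ U.T
    # Align the centroids: mu_sp = C @ (mu_s - r).
    r = mu_s - C.T @ mu_sp
    return C, r


theta = np.deg2rad(30.0)
C_true = np.array([[np.cos(theta), np.sin(theta), 0.0],
                   [-np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
r_true = np.array([1.0, -2.0, 0.5])
P_s = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 2, 3]], dtype=float)
P_sp = (C_true @ (P_s - r_true).T).T
C_hat, r_hat = solve_rigid_transform(P_s, P_sp)
print(np.allclose(C_hat, C_true), np.allclose(r_hat, r_true))
```

If W is defined with the two factors swapped, the roles of U and V swap as well, so it is worth checking any implementation against synthetic data like this.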
Estimate the uncertainty:
- We can obtain an estimate of the covariance matrix of the ICP solution using a formula
- This expression tells us how the covariance of the estimated motion parameters is related to the covariance of the measurements in the two point clouds using certain second-order derivatives of the least squares cost function.
Variants
- Point-to-point ICP minimizes the Euclidean distance between each point in $P_{S'}$ and the nearest point in $P_S$
- Point-to-plane ICP minimizes the perpendicular distance between each point in $P_{S'}$ and the nearest plane in $P_S$
  - This tends to work well in structured environments like cities or indoors
  - first fit a number of planes to the first point cloud and then minimize the distance between each point in the second cloud and its closest plane in the first cloud.

Common pitfalls of the ICP algorithm

Outliers - Objects in Motion:
- be careful to exclude or mitigate the effects of outlying points that violate our assumptions of a stationary world
- One way to do this is by fusing ICP motion estimates with GPS and INS estimates.
- Another option is to identify and ignore moving objects, which we could do with some of the computer vision techniques.
- But an even simpler approach for dealing with outliers like these is to choose a different loss function that is less sensitive to large errors induced by outliers than our standard squared error loss, Robust Loss Functions:


Coursera State Estimation and Localization for Self-Driving Cars 03

2019-07-29

From Coursera, State Estimation and Localization for Self-Driving Cars by University of Toronto
https://www.coursera.org/learn/state-estimation-localization-self-driving-cars

GNSS/INS Sensing for Pose Estimation

3D Geometry and Reference Frames

Reference frame and vector coordinates

Vectors can be expressed in different coordinate frames
The coordinates of the vector are related through a rotation matrix: $r_b = C_{ba}r_a$, where $C_{ba}$ is the rotation matrix that takes coordinates in frame a and rotates them into frame b

Rotation representations

rotation matrix (direction cosine matrix):
- $C_{ba}=[b_1 \;b_2\; b_3]^T[a_1 \; a_2\; a_3] \in R^{3\times 3}$
- $r_b = C_{ba}r_a$
- $C_{ba}C^T_{ba} = C_{ba}C_{ab} = 1$
Unit quaternions
- $q=[q_w \; q_v]^T = [\cos \frac \phi 2 \quad \hat u \sin \frac \phi 2]^T$
- $\| q\| = 1$
- $r_b = C(q_{ba})r_a$
- $C(q) = (q^2_w-q^T_vq_v)1+2q_vq^T_v + 2q_w[q_v]_\times$
- Quaternion multiplication and rotations
Euler angles
- $C(\theta_3,\theta_2,\theta_1) = C_3(\theta_3)C_2(\theta_2)C_1(\theta_1) $
- suffer from singularity
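To make the unit-quaternion formula for $C(q)$ above concrete, here is a minimal NumPy sketch that evaluates it term by term. The function names `skew` and `quat_to_rot` and the scalar-first ordering `[q_w, q_x, q_y, q_z]` are assumptions of this sketch; other libraries use different orderings and sign conventions.

```python
import numpy as np


def skew(v):
    """Skew-symmetric matrix [v]_x such that skew(v) @ w == np.cross(v, w)."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])


def quat_to_rot(q):
    """Evaluate C(q) = (qw^2 - qv.qv) 1 + 2 qv qv^T + 2 qw [qv]_x."""
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)  # enforce the unit-norm constraint
    qw, qv = q[0], q[1:]
    return ((qw**2 - qv @ qv) * np.eye(3)
            + 2.0 * np.outer(qv, qv)
            + 2.0 * qw * skew(qv))


# Quaternion for an angle phi = 90 degrees about the unit axis u = z.
phi = np.pi / 2
q = np.array([np.cos(phi / 2), 0.0, 0.0, np.sin(phi / 2)])
C = quat_to_rot(q)
```

Whatever the convention, the result should always be orthonormal with determinant +1, which is a cheap sanity check after any conversion.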

The importance of the ECEF, ECIF and Navigation reference frames

Reference Frames

ECIF: Earth-Centred Inertial Frame
- ECIF coordinate frame is fixed, Earth rotates about the z axis.
ECEF: Earth-Centred Earth-Fixed Frame
- ECEF coordinate frame rotates with the Earth.
- x axis is aligned with the prime meridian
Navigation
- NED frame
- ENU frame
Sensor/Vehicle frame

The Inertial Measurement Unit (IMU)

Components

gyroscopes
- measures a rotational rate in the sensor frame
- Microelectromechanical systems(MEMS) are much smaller and cheaper
- Measure rotational rates instead of orientation directly
- Measurements are noisy and drift over time
accelerometers
- measures a specific force (or acceleration relative to free-fall) in the sensor frame
- Cheaper MEMS based accelerometers use a miniature cantilever beam with a proof mass attached to it. When the sensor is accelerated, the beam deflects.
- More expensive sensors may also use Piezoelectric materials
- Accelerometers measure acceleration relative to free-fall. This is also called the proper acceleration or specific force: $$a_{meas} = f = \frac{F_{non-gravity}}{m}$$
- In localization, we typically require the acceleration relative to a fixed reference frame
  - 'coordinate' acceleration
  - computed using fundamental equation for accelerometers in a gravity field: $f+g=\ddot r_i$

Global Navigation Satellite System (GNSS) is a catch-all term for a satellite system(s) that can be used to pinpoint a receiver’s position

GPS - Computing Position:

Each GPS satellite transmits a signal that encodes
- its position (via accurate ephemeris information)
- time of signal transmission (via onboard atomic clock)
To compute a GPS position fix in the Earth-centred frame, the receiver uses the speed of light to compute distances to each satellite based on time of signal arrival
At least four satellites are required to solve for 3D position, three if only 2D is required

GPS - Error Sources:

Ephemeris & clock errors
- A clock error of $1\times10^{-6}$ s gives a 300 m position error!
Geometric Dilution of Precision (GDOP)
- The configuration of the visible satellites affects position precision
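The 300 m figure for a one-microsecond clock error follows from multiplying the timing error by the speed of light (taking $c \approx 3\times10^{8}$ m/s as a rounded value):

$$\Delta d = c\,\Delta t \approx (3\times10^{8}\ \text{m/s})\,(1\times10^{-6}\ \text{s}) = 300\ \text{m}$$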

Improvements of GPS:

Basic GPS:
- mobile receiver
- no error correction
- ~ 10m accuracy
Differential GPS (DGPS):
- mobile receiver + fixed base station(s)
- estimate error caused by atmospheric effects
- ~10cm accuracy
Real-Time Kinematic (RTK) GPS
- mobile receiver + fixed base station(s)
- estimate relative position using phase of carrier signal
- ~2cm accuracy


Coursera State Estimation and Localization for Self-Driving Cars 02

2019-07-25

From Coursera, State Estimation and Localization for Self-Driving Cars by University of Toronto
https://www.coursera.org/learn/state-estimation-localization-self-driving-cars

Linear and Nonlinear Kalman Filters

The (Linear) Kalman Filter

The Kalman Filter requires the following motion and measurement models:
- Motion model: $x_k=F_{k-1}x_{k-1}+G_{k-1}u_{k-1} + w_{k-1}$
- Measurement model: $y_k = H_kx_k+v_k$
with the following noise properties: $$v_k \sim \mathcal N(0,R_k) \quad w_k \sim \mathcal N(0,Q_k)$$
- $v_k$ is the measurement noise, $w_k$ is the process/motion noise

The Kalman filter can be regarded as a recursive least squares estimator that also includes a motion model.
It has two stages: the prediction and the correction.

Prediction: $$\check x_k = F_{k-1}\hat x_{k-1}+G_{k-1}u_{k-1}$$ $$\check P_k = F_{k-1}\hat P_{k-1}F^T_{k-1} + Q_{k-1}$$
Optimal gain: $$K_k = \check P_k H^T_k(H_k\check P_k H^T_k + R_k)^{-1}$$
Correction: $$\hat x_k = \check x_k + K_k(y_k-H_k\check x_k)$$ $$\hat P_k = (1-K_kH_k)\check P_k$$
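The prediction, gain, and correction equations above map almost one-to-one onto NumPy. The following is a minimal sketch; the function names `kf_predict` and `kf_correct` and all numeric values in the example (a 1D position/velocity state with an acceleration input and a position measurement) are illustrative assumptions, not values from these notes.

```python
import numpy as np


def kf_predict(x_hat, P_hat, F, G, u, Q):
    """Prediction: propagate mean and covariance through the motion model."""
    x_check = F @ x_hat + G @ u
    P_check = F @ P_hat @ F.T + Q
    return x_check, P_check


def kf_correct(x_check, P_check, y, H, R):
    """Correction: compute the optimal gain, then fuse the measurement y."""
    K = P_check @ H.T @ np.linalg.inv(H @ P_check @ H.T + R)
    x_hat = x_check + K @ (y - H @ x_check)
    P_hat = (np.eye(len(x_check)) - K @ H) @ P_check
    return x_hat, P_hat


# Hypothetical example: state = [position, velocity], input = acceleration.
dt = 0.5
F = np.array([[1.0, dt], [0.0, 1.0]])
G = np.array([[0.0], [dt]])
H = np.array([[1.0, 0.0]])          # only position is measured
Q = 0.1 * np.eye(2)
R = np.array([[0.05]])

x_hat0 = np.array([0.0, 5.0])
P_hat0 = np.diag([0.01, 1.0])

x_check, P_check = kf_predict(x_hat0, P_hat0, F, G, np.array([-2.0]), Q)
x_hat1, P_hat1 = kf_correct(x_check, P_check, np.array([2.2]), H, R)
```

Explicitly inverting the innovation covariance keeps the code close to the equations; for larger systems a linear solve is usually preferred for numerical reasons.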

Summary:

The Kalman Filter is very similar to RLS but includes a motion model that tells us how the state evolves over time
The Kalman Filter updates a state estimate through two stages:
- prediction using the motion model
- correction using the measurement model

Kalman Filter and The Bias BLUEs

Bias

Bias = $E(\hat p_k) - p_k$, where $p_k$ is the true position
A filter is unbiased if for all k: $E[\hat e_k] = E[\hat p_k -p_k] = E[\hat p_k] - p_k = 0$
How to compute bias analytically?
- consider the error dynamics:
- predicted state error: $\check e_k = \check x_k - x_k$
- corrected estimate error: $\hat e_k = \hat x_k - x_k$
- using the Kalman Filter equations we can derive: $$\check e_k = F_{k-1}\hat e_{k-1} - w_{k-1}$$ $$\hat e_k = (1-K_kH_k)\check e_k + K_kv_k$$
For the Kalman filter, for all k: $$E[\check e_k] = E[F_{k-1}\hat e_{k-1} - w_{k-1}] = F_{k-1} E[\hat e_{k-1}] - E[w_{k-1}] = 0$$ $$E[\hat e_k] = E[(1-K_kH_k)\check e_k +K_kv_k] = (1-K_kH_k)E[\check e_k]+K_kE[v_k]=0 $$
- unbiased prediction!
- so long as $E[\hat e_0]=0$, $E[v]=0$, $E[w]=0$, i.e. the initial estimate is unbiased and the noise is white, uncorrelated, and zero-mean

consistency

the filter is consistent if for all k: $E[\hat e^2_k]=E[(\hat p_k-p_k)^2]=\hat P_k $
- that is to say: for all time steps k, the filter covariance $\hat P_k$ matches the expected value of the square of our error.
Practically, this means that our filter is neither overconfident nor underconfident in the estimate it has produced:
- A filter that is overconfident, and hence inconsistent, will report a covariance that is optimistic: the filter will essentially place too much emphasis on its own estimate and will be less sensitive to future measurement updates.
So long as the initial estimate is consistent and we have white, zero-mean noise, all the estimates will be consistent

Kalman filter is the Best Linear Unbiased Estimator (BLUE)

Given white, uncorrelated, zero-mean noise, the Kalman Filter is unbiased and consistent.
In general, if we have white, uncorrelated, zero-mean noise, the Kalman filter is the best (i.e., lowest variance) unbiased estimator that uses only a linear combination of measurements

The Extended Kalman Filter

EKF: Uses first-order linearization to turn a nonlinear problem into a linear one

As discussed above, the Kalman filter is the best linear unbiased estimator for linear systems.
However, truly linear systems don't exist in reality.
Therefore, we need a variant of the Kalman filter that can be applied to nonlinear systems: the Extended Kalman Filter.
The key concept in the Extended Kalman Filter is the idea of linearizing a nonlinear system

Role of Jacobian matrices in the EKF and how to compute them

Linearizing a nonlinear system

Linearization means choosing an operating point $a$ and finding a linear approximation to the nonlinear function in the neighborhood of $a$
For a scalar function plotted in two dimensions, this means finding the tangent line to $f(x)$ at $x = a$.
Mathematically, we do this by taking the Taylor series expansion of the function: $$f(x)\approx f(a) + \frac{\partial f(x)}{\partial x}\Big|_{x=a}(x-a) + \frac{1}{2!}\frac{\partial^2f(x)}{\partial x^2}\Big|_{x=a}(x-a)^2 + \dots $$
For linearization, we’re only interested in the first order terms of the Taylor series expansion.

Pick the most recent state estimate as the operating point:

we can get the linearized motion and measurement models
the models involve Jacobian matrices:
- In vector calculus, a Jacobian matrix is the matrix of all first-order partial derivatives of a vector-valued function
- Intuitively, the Jacobian matrix tells you how fast each output of the function is changing along each input dimension
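As a hedged illustration of what a Jacobian looks like in practice, the sketch below uses a hypothetical range-and-bearing measurement function `h` (not taken from these notes), writes its Jacobian by hand, and compares it with a central-difference approximation. It also shows the first-order Taylor approximation around an operating point `a`.

```python
import numpy as np


def h(x):
    """Hypothetical nonlinear measurement: range and bearing to a 2D point."""
    return np.array([np.hypot(x[0], x[1]), np.arctan2(x[1], x[0])])


def h_jacobian(x):
    """Analytical Jacobian of h: rows are outputs, columns are inputs."""
    r2 = x[0] ** 2 + x[1] ** 2
    r = np.sqrt(r2)
    return np.array([[x[0] / r, x[1] / r],
                     [-x[1] / r2, x[0] / r2]])


def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference approximation, one input dimension at a time."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        cols.append((f(x + dx) - f(x - dx)) / (2.0 * eps))
    return np.column_stack(cols)


a = np.array([3.0, 4.0])             # operating point
x = a + np.array([0.01, -0.01])      # a nearby point

# First-order (linearized) prediction of h(x) around a.
h_linear = h(a) + h_jacobian(a) @ (x - a)
```

Checking a hand-derived Jacobian against a numerical one like this is a common way to catch differentiation mistakes before they reach the filter.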

Apply the EKF to a simple nonlinear tracking problem

The Error State Extended Kalman Filter (ES-EKF)

Error-state formulation of the Extended Kalman Filter

We can think of the vehicle state as composed of two parts: a large part called the nominal state, $\hat x$, and a small part called the error state, $\delta x$: $x=\hat x+\delta x$
We can think of the error state as the place where all of these modelling errors and process noise accumulate over time, so that the error state is just the difference between the nominal state and the true state at any given time.
If we can figure out what the error state is, we can actually use it as a correction to the nominal state to bring us closer to the true state.
So in the ES-EKF, instead of doing Kalman filtering on the full state which might have lots of complicated non-linear behaviors, we’re going to use the EKF to estimate the error state instead, and then use the estimate of the error state as a correction to the nominal state
Mathematically, we’re going to rearrange our linearized motion model so that we now have an equation that can tell us how the difference between the true state at time, k, and our predicted state at time, k, is related to the same difference at time, $k-1$.
- that means we can build the equations called the error state kinematics.

Loop:

update the nominal state using the non-linear motion model and the current best estimate of the state.
keep track of the state covariance, which grows as we integrate more and more process noise from the motion model.
repeat the loop updating the nominal state and the error state covariance for as long as we like until we receive the measurement and want to do a correction.
- then compute the Kalman gain
- compute the best estimate of the error state using the Kalman gain, the measurement, and our nonlinear measurement model
- update the nominal state by adding our estimate of the error state to it, which gives the corrected estimate of the full state
- finally update the state covariance using the usual equations.

Advantages of the Error-state EKF over the vanilla EKF:

Better performance compared to the vanilla EKF
- The "small" error state is more amenable to linear filtering than the "large" nominal state, which can be integrated nonlinearly
Easy to work with constrained quantities (e.g., rotations in 3D)
- We can also break down the state using a generalized composition operator

Limitations of the EKF

Linearization error

The EKF works by linearizing the nonlinear motion and measurement models to update the mean and covariance of the state
The difference between the linear approximation and the nonlinear function is called linearization error
In general, the linearization error depends on two things:
- how non-linear the original function is to begin with. If our nonlinear function varies very slowly or is quite flat much of the time, a linear approximation is going to be a pretty good fit.
- how far away from the operating point the linear approximation is being used. The further away you move from the operating point, the more likely the linear approximation is to diverge from the true function.

Computing Jacobians

Analytical differentiation is prone to human error
Numerical differentiation can be slow and unstable
Automatic differentiation (e.g., at compile time) can also behave unpredictably

An Alternative to the EKF - The Unscented Kalman Filter

The Unscented Kalman Filter is an alternative approach to nonlinear Kalman filtering that relies on something called the unscented transform to pass probability distributions through nonlinear functions.

Use the Unscented transform to pass a probability distribution through a nonlinear function

The basic idea in the unscented transform has three steps.
- First, we choose a set of sample points from our input distribution. These aren’t random samples, but deterministic samples chosen to be a certain number of standard deviations away from the mean.
- these samples are called sigma points, and the unscented transform is sometimes called the sigma point transform.
- After getting the sigma points, pass each sigma point through the nonlinear function, producing a new set of sigma points belonging to the output distribution.
- Finally, compute the sample mean and covariance of the output sigma points with some carefully chosen weights, and these will give us a good approximation of the mean and covariance of the true output distribution.
The unscented transform - Choosing Sigma Points
- In general, for an $n$ dimensional probability distribution, we need $2n+1$ sigma points, one for the mean and the rest symmetrically distributed about the mean.
- The first step in determining where the sigma point should be is taking something called the Cholesky decomposition of the covariance matrix associated with the input distribution: $LL^T=\Sigma_{xx}$ (L is the lower triangular)
- In fact, if the input PDF is one dimensional, the Cholesky decomposition is just the square root of the variance, which is the standard deviation.
The unscented transform - Transforming and recombining
- pass each of the sigma points through nonlinear function to get a new set of transformed sigma points.
- And finally compute the mean and covariance of the output PDF: each of the points gets a specific weight in the mean and covariance calculations, and that weight depends on the parameter kappa and the dimension of the input distribution N.
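The three steps of the unscented transform can be sketched as follows. The notes above do not spell out the sigma-point scaling or the weights, so this sketch assumes a commonly used choice: points at $\mu \pm \sqrt{N+\kappa}\,\mathrm{col}_i(L)$, weight $\kappa/(N+\kappa)$ for the mean point, and $1/(2(N+\kappa))$ for the others. The function name `unscented_transform` and the default $\kappa = 3 - N$ are likewise assumptions.

```python
import numpy as np


def unscented_transform(mu, Sigma, f, kappa=None):
    """Approximate the mean and covariance of y = f(x) for x ~ (mu, Sigma)."""
    mu = np.atleast_1d(mu).astype(float)
    Sigma = np.atleast_2d(Sigma).astype(float)
    n = mu.size
    if kappa is None:
        kappa = 3.0 - n  # assumed default, often quoted for Gaussian inputs

    # Step 1: 2n + 1 deterministic sigma points from the Cholesky factor L.
    L = np.linalg.cholesky(Sigma)  # lower triangular, L @ L.T == Sigma
    scale = np.sqrt(n + kappa)
    points = [mu]
    points += [mu + scale * L[:, i] for i in range(n)]
    points += [mu - scale * L[:, i] for i in range(n)]

    # Weights (assumed form): one for the mean point, equal for the rest.
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)

    # Step 2: pass every sigma point through the nonlinear function.
    Y = np.array([np.atleast_1d(f(p)) for p in points])

    # Step 3: weighted sample mean and covariance of the outputs.
    mu_y = weights @ Y
    d = Y - mu_y
    Sigma_y = (weights[:, None] * d).T @ d
    return mu_y, Sigma_y
```

A quick sanity check is to pass a linear function through it: the result should then match the exact mean $A\mu + b$ and covariance $A\Sigma A^T$.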

How the Unscented Kalman Filter (UKF) uses the unscented transform in the prediction and correction steps

We can easily use the Unscented Transform in our Kalman Filtering framework with nonlinear models;
Prediction step:
- To propagate the state from time (k-1) to time k, apply the Unscented Transform using the current best guess for the mean and covariance
- decompose the estimated state covariance from time k-1 and compute sigma points centered around the estimated mean state from time k-1
- propagate our sigma points through our nonlinear motion model to get a new set of sigma points for the predicted state at time k.
- finally, calculate the predicted mean and covariance for the state at time k. At this point it's important to account for the process noise by adding its covariance to the covariance of the transformed sigma points to get the final predicted covariance.
Correction step
- To correct the state estimate using measurements at time k, use the nonlinear measurement model and the sigma points from the prediction step to predict the measurements
- First, redraw our sigma points using the predicted covariance matrix. We need to do this a second time because we added process noise at the end of the last step, and this will modify the positions of some of the sigma points.
- plug these new sigma points one by one into our nonlinear measurement model to get another set of sigma points for the predicted measurements, then we can estimate the mean and covariance of the predicted measurements using the sample mean and covariance formulas.
- To compute the Kalman gain, we're also going to need the cross covariance between the predicted state and the predicted measurements, which tells us how the measurements are correlated with the state.
- Then use the Kalman gain to optimally correct the mean and covariance of the predicted state

Advantages of the UKF over the EKF

The UKF is the recommended choice
It addresses the limitations of the EKF

Apply the UKF to a simple nonlinear tracking problem


Coursera State Estimation and Localization for Self-Driving Cars 01

2019-07-23

From Coursera, State Estimation and Localization for Self-Driving Cars by University of Toronto
https://www.coursera.org/learn/state-estimation-localization-self-driving-cars

Least Squares

Ordinary and weighted least squares

Ordinary least squares

example from estimating resistance:

let the measurement model be: $$y_i = x + v_i$$
- $x$ is the actual resistance
- $v_i$ is the measurement noise
- then the squared error is : $e^2_i = (y_i-x)^2$
The squared error criterion: $$ \hat x_{LS}=\arg\min_x L_{LS}(x), \quad L_{LS}(x) = e^2_1+e^2_2+e^2_3+e^2_4$$
- the ‘best’ estimate of resistance is the one that minimizes the sum of squared errors.
- rewrite into vector form: $$ e= y- H x$$
- $H$ is called the Jacobian: $[1\;1\;1\;1]^T$
- therefore, $$L_{LS}(x) = (e^2_1+e^2_2+e^2_3+e^2_4) = e^Te$$ $$=(y-Hx)^T(y-Hx)$$ $$=y^Ty - x^TH^Ty - y^THx + x^TH^THx$$
- what remains is to minimize the expression above
- get partial derivative with respect to the parameter, set to 0:
  $$\frac{\partial L}{\partial x}|_{x=\hat x} = -y^TH-y^TH+2\hat x^T H^TH=0$$ $$-2y^TH+2\hat x^TH^TH=0$$
- re-arrange, we get (when H is full-column rank) $$\hat x_{LS} = (H^TH)^{-1} H^Ty$$
Assumptions:
- Our measurement model, $y=x+v$, is linear
- Measurements are equally weighted (we do not suspect that some have more noise than others)
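The closed-form solution $\hat x_{LS} = (H^TH)^{-1}H^Ty$ can be checked numerically. The sketch below uses four made-up resistance readings (the values are hypothetical, chosen only for illustration); with $H = [1\;1\;1\;1]^T$ the estimate reduces to the sample mean.

```python
import numpy as np


def ordinary_least_squares(H, y):
    """Solve the normal equations (H^T H) x = H^T y for x."""
    return np.linalg.solve(H.T @ H, H.T @ y)


# Four hypothetical multimeter readings of the same resistor, in ohms.
y = np.array([1068.0, 988.0, 1002.0, 996.0])
H = np.ones((4, 1))  # each measurement observes x directly

x_hat = ordinary_least_squares(H, y)
print(x_hat)  # one-element array, equal to the sample mean of y
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse, which is the usual recommendation when $H^TH$ is well conditioned.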

weighted least squares

The same example: suppose we take measurements with multiple multimeters, some of which are better than others

Consider the general linear measurement model for m measurements and n unknowns: $$y = Hx+v$$
In regular least squares, we implicitly assumed that each noise term was of equal variance: $$E(v^2_i) = \sigma^2 \quad (i=1,\dots,m)$$ $$R=E(vv^T) = diag(\sigma^2,\dots,\sigma^2)$$
If we assume each noise term is independent, but of different variance:
$E(v^2_i) = \sigma_i^2 \; (i=1,\dots,m)$, $R=E(vv^T) = diag(\sigma_1^2,\dots,\sigma_m^2)$
Then we can define a weighted least squares criterion as: $$L_{WLS}(x)=e^TR^{-1}e = \frac{e^2_1}{\sigma^2_1} + \frac{e^2_2}{\sigma^2_2} + \dots + \frac{e^2_m}{\sigma^2_m}$$ where $e = y-Hx$
expanding the new criterion: $$L_{WLS}(x) = e^TR^{-1}e = (y-Hx)^TR^{-1}(y-Hx)$$
- get the derivative: $$\frac{\partial L}{\partial x}|_{x=\hat x} = 0 = -y^TR^{-1}H + \hat x^T H^TR^{-1}H$$
- get $$H^T R^{-1} H \hat x_{WLS} = H^TR^{-1}y$$
- solving these weighted normal equations gives: $$\hat x_{WLS} = (H^TR^{-1}H)^{-1} H^T R^{-1}y$$
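A minimal sketch of the weighted solution, assuming a diagonal $R$ built from per-measurement variances. The readings and variances below are hypothetical: the first two are imagined to come from a noisier multimeter, so they should pull the estimate less than they would in the unweighted case.

```python
import numpy as np


def weighted_least_squares(H, y, variances):
    """Solve (H^T R^-1 H) x = H^T R^-1 y with R = diag(variances)."""
    R_inv = np.diag(1.0 / np.asarray(variances, dtype=float))
    return np.linalg.solve(H.T @ R_inv @ H, H.T @ R_inv @ y)


# Hypothetical readings in ohms and their noise variances in ohms^2.
y = np.array([1068.0, 988.0, 1002.0, 996.0])
variances = np.array([400.0, 400.0, 4.0, 4.0])
H = np.ones((4, 1))

x_wls = weighted_least_squares(H, y, variances)
x_ols = weighted_least_squares(H, y, np.ones(4))  # equal weights
```

With equal variances the weights cancel and the result matches ordinary least squares, which is an easy way to test an implementation.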

Comparison between ordinary and weighted least squares:

Summary:

Measurements can come from sensors that have different noise characteristics
Weighted least squares lets us weight each measurement according to noise variance

Recursive least squares

In the above example, what if we have a stream of data?
We can use linear recursive estimator

Suppose we have an optimal estimate, $\hat x_{k-1}$, of our unknown parameters at time k-1
Then we obtain a new measurement at time k: $y_k = H_kx+v_k$
We can use a linear recursive update: $\hat x_k = \hat x_{k-1} + K_k(y_k-H_k \hat x_{k-1})$
- We update our new state as a linear combination of the previous best guess and the current measurement residual (or error), weighted by a gain matrix $K_k$
What is the gain matrix?
- We can compute the gain matrix by minimizing a similar least squares criterion, but this time we’ll use a probabilistic formulation
We wish to minimize the expected value of the sum of squared errors of our current estimate at time step k: $$L_{RLS} = E[(x_k-\hat x_k)^2]=\sigma_k^2$$
If we have n unknown parameters at time step k, we generalize this to $$L_{RLS} = E[(x_{1k}-\hat x_{1k})^2 + \dots +(x_{nk}-\hat x_{nk})^2] = Trace(P_k)$$
- $P_k$ is the estimator covariance
Using linear recursive formulation, we can express covariance as a function of $K_k$: $$P_k = (1-K_kH_k)P_{k-1} (1-K_kH_k)^T + K_kR_kK^T_k$$
We can show(through matrix calculus) that this is minimized when $$K_k = P_{k-1}H^T_k(H_kP_{k-1}H^T_k + R_k)^{-1}$$
With this expression, we can also simplify our expression for $P_k$: $$P_k = P_{k-1} - K_kH_kP_{k-1} = (1-K_kH_k)P_{k-1}$$

Recursive Least Squares - Algorithm

Initialize the estimator: $$\hat x_0 = E[x]$$ $$P_0 = E[(x-\hat x_0)(x-\hat x_0)^T]$$
Set up the measurement model, defining the Jacobian and the measurement covariance matrix: $$y_k = H_kx+v_k$$
Update the estimate of $\hat x_k$ and the covariance $P_k$ using: $$K_k = P_{k-1}H^T_k(H_kP_{k-1}H^T_k +R_k)^{-1}$$ $$\hat x_k = \hat x_{k-1} + K_k(y_k-H_k\hat x_{k-1}) $$ $$P_k = (1-K_kH_k)P_{k-1}$$
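The update step of the algorithm above can be written as a small function and applied to a stream of measurements. This is a minimal sketch: the name `rls_update`, the readings, the measurement variance, and the large initial covariance (standing in for a very uncertain prior) are all illustrative assumptions.

```python
import numpy as np


def rls_update(x_hat, P, y, H, R):
    """One recursive least squares step for a new measurement y."""
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # gain matrix
    x_new = x_hat + K @ (y - H @ x_hat)            # correct with the residual
    P_new = (np.eye(len(x_hat)) - K @ H) @ P       # shrink the covariance
    return x_new, P_new


# Hypothetical stream of resistance readings (ohms), processed one at a time.
readings = [1068.0, 988.0, 1002.0, 996.0]
H = np.ones((1, 1))
R = np.array([[1.0]])

x_hat = np.array([0.0])        # deliberately poor initial guess
P = np.array([[1e6]])          # large covariance: we trust it very little

for y_k in readings:
    x_hat, P = rls_update(x_hat, P, np.array([y_k]), H, R)
```

With a very uncertain prior and equal measurement noise, the running estimate ends up close to the batch least squares answer (the mean of the readings), without ever storing the full measurement history.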

Summary

RLS produces a 'running estimate' of parameter(s) for a stream of measurements
RLS is a linear recursive estimator that minimizes the (co)variance of the parameter(s) at the current time

Maximum likelihood and the method of least squares

The maximum likelihood estimate, given additive Gaussian noise, is equivalent to the least squares or weighted least squares solutions we derived earlier.

Summary:

LS and WLS produce the same estimates as maximum likelihood assuming Gaussian noise
Central Limit Theorem states that complex errors will tend towards a Gaussian distribution
Least squares estimates are significantly affected by outliers


Coursera Self-Driving 04 Vehicle Dynamic Modeling

2019-07-22

From Coursera, Introduction to Self-Driving Cars by University of Toronto
https://www.coursera.org/specializations/self-driving-cars?action=enroll

Creating a good vehicle model is essential for model-based control development.

Kinematic Vs Dynamic Modeling:

kinematics: positions and velocities
dynamics: forces and torques of a car and how they connect
Kinematic modeling: at low speeds, it is often sufficient to look only at kinematic models of vehicles
- Examples: Two wheeled robot, Bicycle model
Dynamic modeling is more involved, but captures vehicle behavior more precisely over a wide operating range
- Examples: Dynamic vehicle model

Kinematic Modeling in 2D

Coordinate Frames

Right handed by convention
Inertial frame (global world coordinate)
- Fixed reference frame, usually relative to earth
- we often represent this coordinate frame as East North Up, ENU, relative to a reference point nearby.
- Or Earth-Centered Earth Fixed, ECEF, as is used in GNSS systems.
Body frame
- Attached to vehicle, origin at vehicle center of gravity, or center of rotation
- is moving and rotating with respect to the fixed inertial frame as the vehicle moves about.
Sensor frame
- Attached to sensor, convenient for expressing sensor measurements
Coordinate transformation e.g.:
- Location of point (P) in Body Frame(B):
  $$P_B = C_{EB}(\theta) P_E + O_{EB}$$
- Location of point (P) in Inertial Frame(E):
  $$P_E = C_{BE}(\theta) P_B + O_{BE}$$
- $O_{BE}$ or $O_{EB}$ is the translation term, expressed in corresponding frame

2D Kinematic Modeling

An example of a ball in the sky: The kinematic constraint is nonholonomic
- A constraint on rate of change of degrees of freedom
- Vehicle velocity always tangent to current path
  $$\frac{dy}{dx} = tan \theta = \frac{sin \theta}{cos \theta}$$
- Nonholonomic constraint: $$ \dot y cos \theta - \dot x sin \theta = 0 $$
- Velocity components: $$\dot x = v cos \theta, \quad \dot y = v sin \theta$$

state-based models

A state is a set of variables often arranged in the form of a vector that fully describe the system at the current time.

Two-Wheeled Robot Kinematic Model

Assume control inputs are wheel speeds
- Center: p
- Wheel to center: l
- Wheel radius: r
- Wheel rotation rates: $w_1$, $w_2$
Kinematic constraint $$v_i = r w_i$$
Velocity is the average of the two wheel velocities $$v=\frac{v_1 + v_2}{2} = \frac {r w_1 + r w_2} {2}$$
Use the instantaneous center of rotation(ICR)
Equivalent triangles give the angular rate of rotation $$w = \frac{-v_2}{\rho} = \frac{-(v_2 - v_1)}{2l}$$ $$w = \frac{rw_1 - rw_2}{2l}$$
continuous time model: $$\dot x = [(\frac{rw_1+rw_2}{2})cos \theta]$$ $$\dot y = [(\frac{rw_1+rw_2}{2})sin \theta]$$ $$\dot \theta = (\frac{rw_1-rw_2}{2l})$$
discrete time model: $$ x_{k+1} = x_k + [(\frac{rw_{1,k}+rw_{2,k}}{2})cos \theta_k] \delta t$$ $$ y_{k+1} = y_k + [(\frac{rw_{1,k}+rw_{2,k}}{2})sin \theta_k] \delta t$$ $$ \theta_{k+1} = \theta_k + (\frac{rw_{1,k}-rw_{2,k}}{2l}) \delta t$$
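The discrete-time model above translates directly into a one-step update function. This is a minimal sketch; the function name `two_wheel_step` and the numeric values in the example are assumptions for illustration.

```python
import math


def two_wheel_step(x, y, theta, w1, w2, r, l, dt):
    """Advance the two-wheeled robot state by one time step dt.

    w1, w2: wheel rotation rates; r: wheel radius; l: wheel-to-center distance.
    """
    v = (r * w1 + r * w2) / 2.0            # forward speed of the center
    omega = (r * w1 - r * w2) / (2.0 * l)  # rotation rate of the body
    x_next = x + v * math.cos(theta) * dt
    y_next = y + v * math.sin(theta) * dt
    theta_next = theta + omega * dt
    return x_next, y_next, theta_next


# Equal wheel speeds: the robot drives straight along its heading.
straight = two_wheel_step(0.0, 0.0, 0.0, w1=2.0, w2=2.0, r=0.1, l=0.25, dt=1.0)

# Opposite wheel speeds: the robot turns on the spot.
spin = two_wheel_step(0.0, 0.0, 0.0, w1=2.0, w2=-2.0, r=0.1, l=0.25, dt=1.0)
```

Calling the function repeatedly with the previous output as the next input simulates a trajectory; smaller `dt` gives a closer match to the continuous-time model.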

source: https://www.coursera.org/lecture/intro-self-driving-cars/lesson-1-kinematic-modeling-in-2d-pScZH

The Kinematic Bicycle Model

2D bicycle model (simplified car model)
Front wheel steering

Rear Wheel Reference Point:

Apply the instantaneous center of rotation (ICR): $$\dot \theta = \omega = \frac vR$$
Similar triangles: $$tan \delta = \frac LR$$
Rotation rate equation: $$\dot \theta = \omega = \frac vR = \frac{v tan \delta}{L}$$

Rear Axle Bicycle Model

If the desired point is at the center of the rear axle: $$\dot x_r = v cos \theta$$ $$\dot y_r = v sin \theta$$ $$\dot \theta = \frac {v tan \delta}{L}$$
If the desired point is at the center of the front axle: $$\dot x_f = v cos (\theta + \delta)$$ $$\dot y_f = v sin (\theta+\delta)$$ $$\dot \theta = \frac {v sin \delta}{L}$$
If the desired point is at the center of gravity (cg): $$\dot x_c = v cos (\theta + \beta)$$ $$\dot y_c = v sin (\theta+\beta)$$ $$\dot \theta = \frac {v cos \beta tan \delta}{L}$$ $$\beta = tan^{-1} (\frac {l_r tan \delta}{L})$$

State-space Representation

Modify CG kinematic bicycle model to use steering rate input
- state: $[x, y, \theta, \delta]^T$
- inputs: $[v,\phi]^T$ ($\phi$ is the modified input, the rate of the change of steering angle)
  $$\dot x_c = v cos(\theta + \beta)$$
  $$\dot y_c = v sin(\theta + \beta)$$
  $$\dot \theta = \frac {v cos \beta tan \delta}{L}$$ $$\dot \delta = \phi$$
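The state-space model above can be simulated with a simple forward-Euler step. The notes give only the continuous-time equations, so the Euler discretization, the function name `bicycle_cg_step`, and the example parameter values are assumptions of this sketch; `l_r` is the distance from the rear axle to the center of gravity, as in the expression for $\beta$.

```python
import math


def bicycle_cg_step(state, v, phi, L, l_r, dt):
    """One forward-Euler step of the CG kinematic bicycle model.

    state: (x, y, theta, delta); inputs: speed v and steering rate phi.
    """
    x, y, theta, delta = state
    beta = math.atan(l_r * math.tan(delta) / L)  # slip angle at the CG
    x_next = x + v * math.cos(theta + beta) * dt
    y_next = y + v * math.sin(theta + beta) * dt
    theta_next = theta + (v * math.cos(beta) * math.tan(delta) / L) * dt
    delta_next = delta + phi * dt
    return (x_next, y_next, theta_next, delta_next)


# Hypothetical vehicle: wheelbase 2.5 m, CG 1.2 m ahead of the rear axle.
state = (0.0, 0.0, 0.0, 0.0)
state = bicycle_cg_step(state, v=10.0, phi=0.2, L=2.5, l_r=1.2, dt=0.1)
```

Because the steering angle is part of the state, a step change in steering can no longer happen instantaneously; the input `phi` limits how fast `delta` moves.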

Dynamic Modeling in 2D

mainly on Newton’s second law

Longitudinal Vehicle Modeling

Dynamic force balance on a vehicle
Powertrain component models
Connect models to create a full longitudinal motion model
Total resistance load: $$F_{load} = F_{aero} + R_x + mg\alpha$$
- $\alpha$ is the road incline angle; $mg\alpha$ is the small-angle approximation of the gravity term $mg\sin\alpha$
The aerodynamic force can depend on air density, frontal area, and the speed of the vehicle: $$F_{aero} = \frac{1}{2}C_a \rho A\dot x^2 = c_a \dot x^2$$
The rolling resistance can depend on the tire normal force, tire pressures, and vehicle speed: $$R_x = N(\hat c_{r,0}+\hat c_{r,1}|\dot x| + \hat c_{r,2}\dot x^2) \approx c_{r,1}|\dot x|$$

Lateral Dynamics of Bicycle Model

Vehicle Actuation

Tire Slip and Modeling

- Basics of kinematics and coordinates
- Kinematic model development of a bicycle
- Basics of dynamic modeling
- Vehicle longitudinal dynamics and modeling
- Vehicle lateral dynamics and modeling
- Vehicle actuation system
- Tire slips and modeling


MATLAB for Automated Driving

2019-07-21

MATLAB EXPO 2019: New Features in MATLAB and Simulink for Developing Automated Driving

Creating virtual driving scenarios

Sensor Fusion using synthetic radar and vision data:
- https://www.mathworks.com/help/driving/examples/sensor-fusion-using-synthetic-radar-and-vision-data.html
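Simulate roads and vehicles
Add statistical vision and radar sensor models
Test sensor fusion and object tracking
Visualize sensor coverage areas, detection lists, and object track lists

Graphical driving scenario designer: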

Driving Scenario Designer: https://www.mathworks.com/help/driving/ref/drivingscenariodesigner-app.html
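Create roads and lane markings
Add vehicles and driving trajectories
Set vehicle dimensions and radar cross-section (RCS)
Provides predefined ADAS scenarios
Supports importing road network files in OpenDRIVE format

Simulating driving scenarios in Simulink: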

Test Open-Loop ADAS Algorithm Using Driving Scenario: https://www.mathworks.com/help/driving/ug/test-open-loop-adas-algorithm-using-driving-scenario.html
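- Edit the driving scenario
- Read the scenario in Simulink
- Add sensor models
- Visualize sensor outputs
- Adjust the simulation speed

Designing lateral and longitudinal vehicle control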

Lane following control with sensor fusion: https://www.mathworks.com/help/mpc/ug/lane-following-control-with-sensor-fusion-and-lane-detection.html
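- Integrate the scenario into Simulink
- Design lateral (lane keeping) and longitudinal (spacing control) model predictive controllers
- Design sensor fusion
- Generate C/C++ code
- Software-in-the-loop (SIL) testing
Visualize sensor detection lists and object track lists

Automated testing of control algorithms: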

Testing a Lane-Following Controller with Simulink Test: https://www.mathworks.com/help/sltest/examples/testing-a-lane-following-controller.html
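Specify test requirements and the model under test
Specify test pass criteria
Plot test results and generate reports
Automate the entire test process

Generating driving scenarios from recorded vehicle data: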

Scenario Generation from Recorded Vehicle Data: https://www.mathworks.com/help/driving/examples/scenario-generation-from-recorded-vehicle-data.html
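Play back recorded video
Import OpenDRIVE road networks
Import GPS data (ego vehicle position)
Import sensor object lists (positions of other vehicles)

Integrated simulation of a lane-following controller and vision algorithms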

Lane-Following Control with Monocular Camera Perception: https://www.mathworks.com/help/mpc/ug/lane-following-control-with-monocular-camera-perception.html
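Integrate Simulink controller blocks
- Lane following
- Spacing control
Integrate MATLAB image processing algorithms
- Lane boundary detection
- Vehicle detection
Synthesize ideal vision sensor images with the Unreal Engine

Sensor fusion and object tracking

Perception

Multi-object trackers
- GNN tracker (Global Nearest Neighbor tracker)
- JPDA tracker (Joint Probabilistic Data Association tracker)
- TOMHT tracker (Track-Oriented Multi-Hypothesis Tracker)
- PHD tracker (Probability Hypothesis Density tracker)
Tracking filters:
- Linear, extended, and unscented Kalman filters
- Particle filter
- Gaussian-sum filter
- Interacting multiple model (IMM) filter

Point object tracking and extended object tracking: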

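Extended Object Tracking: https://www.mathworks.com/help/fusion/examples/extended-object-tracking.html
- Custom extended object tracker
- Uses the multiple detections that a high-resolution sensor generates for a single object
- Provides additional object attributes: size, shape, orientation, etc.
- Evaluate tracking performance and error metrics
- Evaluate the algorithm's execution time on a desktop computer
Point object tracking
- Point object tracker: multiObjectTracker
- The sensor generates a single detection per object, or multiple detections are clustered into a single one
- The object is reduced to a single point for tracking

Converting lidar point clouds into object lists: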

Track Vehicles Using Lidar: https://www.mathworks.com/help/vision/ug/track-vehicles-using-lidar.html
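- Design a 3-D bounding box detector
- Design an object tracker
- Generate C/C++ code

Planning

Connecting to HERE HD Live Map: reading road and speed limit attributes: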

Use HERE HD Live Map Data to Verify Lane Configurations: https://www.mathworks.com/help/driving/examples/use-here-hd-live-map-data-to-verify-lane-configurations.html
- Load camera and GPS data
- Read road speed limits
- Read lane configurations
- Visualize the combined data

Designing a path planner:

Automated Parking Valet: https://www.mathworks.com/help/driving/examples/automated-parking-valet.html
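- Create a costmap of the environment
- Inflate the costmap for collision checking
- Specify the goal position
- Plan a path using the rapidly-exploring random tree (RRT*) algorithm

Designing a path planner and vehicle controller: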

Automated Parking Valet in Simulink: https://www.mathworks.com/help/driving/examples/automated-parking-valet-in-simulink.html
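- Path planner (RRT* algorithm)
- Lateral and longitudinal vehicle controllers (kinematics-based Stanley algorithm)
- Closed-loop simulation combined with a vehicle dynamics model

Generating C/C++ code for planning and control algorithms: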

https://www.mathworks.com/help/driving/examples/code-generation-for-path-planning-and-vehicle-control.html
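- Standalone model files
- Configure code generation options
- Generate C/C++ code
- Software-in-the-loop (SIL) testing
- Evaluate code execution time

Control

Designing practical ADAS functions through closed-loop simulation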

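Autonomous Emergency Braking with Sensor Fusion: https://www.mathworks.com/help/driving/examples/autonomous-emergency-braking-with-sensor-fusion.html
- Specify the driving scenario
- Design the AEB logic
- Design the sensor fusion algorithm
- Simulate the complete system
- Generate C/C++ code
- Software-in-the-loop (SIL) testing

Training reinforcement learning networks for ADAS control: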

Train DDPG Agent for Adaptive Cruise Control: https://www.mathworks.com/help/reinforcement-learning/ug/train-ddpg-agent-for-adaptive-cruise-control.html
- Create the environment interface
- Create the agent
- Train the agent
- Simulate the trained agent

Integrating other resources

Three ways to integrate with ROS:
- Play back data recorded with ROS: Work with rosbag Logfiles https://www.mathworks.com/help/robotics/examples/work-with-rosbag-logfiles.html
- Connect to a live ROS system: Exchange Data with ROS Publishers and Subscribers https://www.mathworks.com/help/robotics/examples/exchange-data-with-ros-publishers.html
- Generate a standalone ROS node: Generate a Standalone ROS Node from Simulink https://www.mathworks.com/help/robotics/examples/generate-a-standalone-ros-node-in-simulink.html
Calling C++, Python, and OpenCV from MATLAB:
- Import C++ Library Functionality into MATLAB: https://www.mathworks.com/help/matlab/matlab_external/what-you-need-to-import-cpp-library-functions-into-matlab.html
- Call Python from MATLAB: https://www.mathworks.com/help/matlab/matlab_external/call-python-from-matlab.html
- Install and Use Computer Vision Toolbox OpenCV Interface: https://www.mathworks.com/help/vision/ug/opencv-interface.html

Toolboxes used:
Model Predictive Control Toolbox
Automated Driving Toolbox
Embedded Coder
Sensor Fusion and Tracking Toolbox
Reinforcement Learning Toolbox

Reference:
https://www.matlabexpo.com/content/dam/mathworks/mathworks-dot-com/images/events/matlabexpo/cn/2019/whats-new-in-matlab-and-simulink-for-adas.pdf


Coursera Self-Driving 03 Safety Assurance for Autonomous Vehicles

2019-07-21

From Coursera, Introduction to Self-Driving Cars by University of Toronto
https://www.coursera.org/specializations/self-driving-cars?action=enroll

Safety Assurance for Autonomous Vehicles

Safety Assurance for Self-Driving Vehicles

Autonomous driving crashes
Formal definitions
- Safety: absence of unreasonable risk of harm
- Hazard: potential source of unreasonable risk of harm
Major hazard sources
- Mechanical
- Electrical
- Hardware
- Software
- Sensors
- Behavior
- Fallback
- Cyber
Safety requirements
- NHTSA: safety framework
  - Systems engineering approach to safety
  - Autonomy design: ODD, OEDR, Fallback, Traffic laws, cybersecurity, HMI
  - Testing & Crash mitigation: crashworthiness, post crash, data recording, consumer education

Industry Methods for Safety Assurance and Testing

Industry perspectives on self driving safety

Waymo:
- Behavior safety, Functional safety, Crash safety, Operational safety, Non-collision safety, Approaches to demonstrating autonomy safety
- Safety process:
  - Identify hazard scenarios & potential mitigations
  - Use hazard assessment methods to define safety requirements
    - Preliminary analysis
    - Fault tree
    - Design Failure Modes & Effects Analyses
- Levels of testing to ensure safety
  - Simulation testing: Test rigorously with simulation, thousands of variations, fuzzing of neighbouring vehicles
  - Closed-course testing:
    - Follow 28 core + 19 additional scenario competencies on private test tracks
    - Focus on the four most common crashes: rear-end, intersection, road departure, lane change
  - Real-world driving
GM:
- Address all 12 elements of NHTSA Safety Framework
- Iterative Design: Analyze, build, simulate, drive
- Safety through Comprehensive Risk Management and Deep Integration:
  - Identify and address risks, validate solutions
  - Prioritize elimination of risks, not just mitigation
- All hardware and software systems meet self-set standards for performance, crash protection, reliability, serviceability, security, and safety
- Safety process:
  - Deductive Analysis: fault tree analysis
  - Inductive Analysis: Design & Process FMEA
  - Exploratory Analysis: HAZOP (Hazard & Operability Study)
- Safety Thresholds
  - All GM vehicles are equipped with two key safety thresholds
  - Fail safes: There is redundant functionality (second controllers, backup systems, etc.) such that even if primary systems fail, the vehicle can stop normally
  - SOTIF: All critical functionalities are evaluated for unpredictable scenarios
- Testing:
  - Performance testing at different levels
  - Requirements validation of components, levels
  - Fault injection testing of safety critical functionality
  - Intrusive testing such as electromagnetic interference, etc.
  - Durability testing and simulation based testing

Approaches to demonstrating autonomy safety

Analytical vs Data Driven: Definitions
- Analytical Safety: Ensuring the system works in theory and meets safety requirements found by hazard assessment
- Data-Driven Safety: A safety guarantee based on the system having driven autonomously on the roads without failure for a very large number of kilometers

Safety Frameworks for Self-Driving

Generic Safety Frameworks

Fault Tree Analysis
- Top down deductive failure analysis
- Boolean logic
Probabilistic Fault Tree Analysis
- Assign probabilities to fault “leaves”
- Use logic gates to construct failure tree
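
A probabilistic fault tree can be evaluated bottom-up by combining leaf probabilities through its gates. The Python sketch below assumes that all basic events are statistically independent, which the notes do not state; the tree structure and the probabilities in the example are hypothetical.

```python
from math import prod


def and_gate(probabilities):
    """Output fails only if every input fails (independent inputs assumed)."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output fails if at least one input fails (independent inputs assumed)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


if __name__ == "__main__":
    # Hypothetical leaves: two redundant sensors and one software fault.
    p_sensor_a = 0.01
    p_sensor_b = 0.01
    p_software = 0.001

    # Perception is lost only if both redundant sensors fail.
    p_perception_lost = and_gate([p_sensor_a, p_sensor_b])
    # The top event occurs if perception is lost or the software faults.
    p_top = or_gate([p_perception_lost, p_software])
    print(p_top)
```

The AND gate shows why redundancy helps: two 1% failures combine into a 0.01% failure, so in this made-up tree the software fault dominates the top event.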
Failure Mode and Effects Analyses (FMEA)
- Bottom up process to identify all the effects of faults in a system
- Failure Mode:
  - Modes or ways in which a component of the system may fail
- Effects Analysis:
  - Analyzing effects of the failure modes on the operation of the system
HAZOP - a variation on FMEA
- Hazard and operability study (HAZOP)
- Qualitative brainstorming process, needs “imagination”
- Uses guide words to trigger brainstorming (not, more, less, etc.)
- Applied to complex “processes”
  - Sufficient design information is available, and not likely to change significantly

Functional safety frameworks

FuSa HARA: safety requirements through risk analysis
SOTIF: behavior risk assessment

