Probabilistic Movement Primitives Part 1: Introduction
In the previous post, we talked about Dynamic Movement Primitive (DMP) framework. While DMP is an attractive MP architecture for generating stroke-based and rhythmic movements, it is a deterministic approach that can only represent the mean solution, which is known to be suboptimal. This creates a need for a more sophisticated MP architecture that not can not only overcome these problems but can also exhibit various useful properties like conditioning of trajectories and adaptation to new situations. Probabilistic Movement Primitives (ProMPs) is the only existing approach that exhibits many such properties in one unified framework. In this ProMP series, we will be talking about the theory behind this framework and simultaneously code each part of this framework in Python.
Probabilistic Movement Primitives is a probabilistic formulation of MPs that maintains a distribution over trajectories. Such a distribution over trajectories automatically encodes the variance of the movements, hence modeling the stochastic behavior (ensuring optimality). This probabilistic approach facilitates the generation of many properties by manipulating probabilistic distributions of the trajectories.
Probabilistic Trajectory Representation
For simplicity we start with a case of single degree of freedom. Consider a scalar joint angle $q$ defined over time (which for multi DOF extends to multiple joint angles). One movement execution is modeled as a trajectory $\tau = \{q_t\}_{t=0...T}$, defined by the joint angle $q_t$ over time. A single trajectory is compactly represented as,
\begin{equation} y_t = \begin{bmatrix} q_t\\ \dot{q}_t \end{bmatrix} = \Phi_t w + \epsilon_y = \begin{bmatrix} \phi_t\\ \dot{\phi}_t \end{bmatrix}w + \epsilon_y, \label{eq9} \end{equation}
where $w$ is the weight vector, $\Phi_t = [\phi_t, \dot{\phi}_t]^T$ defines the 2 × $K$ (=number of basis functions) dimensional time-dependent basis function matrix corresponding to both joint positions $q_t$ and velocities $\dot{q}_t$. The basis functions for velocities ($\dot{\phi}_t$) is the time derivative of $\phi_t$ and $\epsilon_y \sim \mathcal{N}(0,\,\Sigma_y)$ represents zero-mean i.i.d. Gaussian noise. The probability of observing a state $y_t$ given the weight vector $w$ can now be represented as,
\begin{equation} p(y_t| w) = \mathcal{N}(y_t| \Phi_t w, \, \Sigma_y), \label{eq10} \end{equation}
A distribution $p(w; \theta)$ is introduced over the weights vector $w$ with parameter $\theta$ to capture the variance in the data. By marginalizing out the weight vector $w$, the probability distribution of a state $y_t$ can now be calculated as,
\begin{equation} p(y_t| \theta) = \int p(y_t| w) \; p(w; \theta) dw. \label{eq11} \end{equation}
Let's Code
In this series of posts, we will write a code to learn a stroke-based ProMP over a custom one-dimensional dynamical system. Consider a system having the following dynamics $\dot{x} = Ax + Bu$, where $x$ is the state of the system, $A$ is the system matrix, $B$ is the control matrix and $u$ the control commands. $x = [p \; \dot{p}]^T$ represents the vector containing the one dimensional position and velocity. Dimension of $A = (2 \times 2)$ (no of states $\times$ no of states) and dimension of $B = (2 \times 1)$ (no of states $\times$ control dims). The next state $x_t$ is by integrating the dynamics discretely: \begin{equation} x_t = (I + A dt) + (B dt \times u) \end{equation}
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
num_trajs = 100 # number of demo trajectories to be generated
num_points = 200 # number of points in each trajectory.
A = np.array([ [0.,1.], [0., 0.] ])
B = np.array([ [0.], [1.] ])
start_x = np.array([[0],[0]]) #initial state (pos, vel)
def next_x(prev_x, cmd):
dt = 0.005
return np.dot(np.eye(2) + A*dt, prev_x) + np.dot(B*dt, cmd)
We generate a set of 100 demonstration trajectories using these dynamics with some noise for variations. Each trajectory is set such that it starts at 0 positions and has 200 points. The change in time variable $dt$ is set to be $0.005$.
for _ in range(num_trajs):
prev_x = start_x
# arrays to hold the position and velocity points in each trajectory
pos_arr = np.zeros(num_points)
vel_arr = np.zeros(num_points)
pos_arr[0] = start_x[0]
vel_arr[0] = start_x[1]
# control commands in the form of sine wave
x = (np.linspace(0,4*np.pi,num_points))
cmd = np.sin(x) + np.random.randn(num_points)*0.1 #noise added for variations
for i in range(1,num_points):
x = next_x(prev_x, cmd[i-1])
pos_arr[i] = x[0]
vel_arr[i] = x[1]
prev_x = x
The control commands are computed using the sine wave from $0$ to $4\pi$. The 100 demonstration trajectories generated from the above code are shown in the Figure below. The demonstration trajectory distribution generated using the dynamics is shown in red. The solid red line is the mean of the distribution and the shaded area denotes two times the standard deviation.
P.S.: The code is explained and written in fragments in this series. You will have to combine everything in one script one-after-another to have the ProMP running.
References:
Comments
Post a Comment