Probabilistic Movement Primitives Part 3: Supervised Learning

In this post, we describe how a Probabilistic Movement Primitive can be learnt from demonstrations using supervised learning.


Learning from Demonstrations

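To simplify learning the parameters $\theta$, the weight vector $w$ is assumed to follow a Gaussian distribution, $p(w; \theta) = \mathcal{N}(w \mid \mu_w, \Sigma_w)$. The distribution $p(y_t \mid \theta)$ at time step $t$ is then obtained by marginalising out $w$,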

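$$
\begin{aligned}
p(y_t \mid \theta) &= \int p(y_t \mid w)\, p(w; \theta)\, dw \\
&= \int \mathcal{N}(y_t \mid \Phi_t w, \Sigma_y)\, \mathcal{N}(w \mid \mu_w, \Sigma_w)\, dw \\
&= \mathcal{N}(y_t \mid \Phi_t \mu_w,\; \Phi_t \Sigma_w \Phi_t^T + \Sigma_y).
\end{aligned}
\tag{1}
$$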

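It can be observed from the above equation that the learnt ProMP distribution at time step $t$ is Gaussian with mean and covariance

$$
\mu_t = \Phi_t \mu_w, \qquad \Sigma_t = \Phi_t \Sigma_w \Phi_t^T + \Sigma_y. \tag{2}
$$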
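As a quick numerical check of equations (1) and (2), the sketch below compares the closed-form mean and variance at one time step with Monte Carlo estimates for a single joint. The basis row `phi_t` and the values of `mu_w`, `Sigma_w` and `sigma_y` are made-up illustrative numbers, not values from this post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values for a single joint with 3 basis functions.
phi_t = np.array([0.2, 0.5, 0.3])        # row of the basis matrix at time step t
mu_w = np.array([1.0, -0.5, 2.0])        # mean of the weights
A = rng.normal(size=(3, 3))
Sigma_w = A @ A.T + 0.1 * np.eye(3)      # an arbitrary positive-definite covariance
sigma_y = 0.01                           # observation noise variance (Sigma_y)

# Closed form, equation (2).
mu_t = phi_t @ mu_w
var_t = phi_t @ Sigma_w @ phi_t + sigma_y

# Monte Carlo: sample w from its Gaussian, then y_t = phi_t w + noise.
n = 200_000
w = rng.multivariate_normal(mu_w, Sigma_w, size=n)
y = w @ phi_t + rng.normal(scale=np.sqrt(sigma_y), size=n)

print("mean:    ", mu_t, "vs sampled", y.mean())
print("variance:", var_t, "vs sampled", y.var())
```

The sampled statistics should agree with the closed form up to Monte Carlo error, which is what the marginalisation in equation (1) predicts.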


Learning Stroke-based Movements

For stroke-based movements, the parameters $\theta = \{\mu_w, \Sigma_w\}$, which specify the mean and covariance of $w$, can be learnt by maximum likelihood estimation from multiple demonstrations. The weights of each trajectory are first estimated individually with linear ridge regression,

$$
w_i = (\Phi^T \Phi + \lambda I)^{-1} \Phi^T Y_i \tag{3}
$$

where $Y_i$ represents the positions of all joints from demonstration $i$ for all time steps, $\Phi$ represents the basis function matrix and $I$ is the identity matrix. The demonstrations are aligned by varying the phase variable: for each demonstration, it is assumed that $z_{begin} = 0$ and $z_{end} = 1$. The ridge factor $\lambda$ is generally set to a very small value ($\lambda = 10^{-12}$), as a larger value degrades the estimation of the trajectory distribution.
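To make equation (3) concrete, here is a minimal sketch that fits the weights of a single one-joint demonstration. The helper names `gaussian_basis` and `fit_weights`, the normalised Gaussian basis with its width, and the sine-wave demonstration are all assumptions made for illustration. Here `Phi` has one row per time step, as in the equation.

```python
import numpy as np

def gaussian_basis(z, n_bfs, width=0.01):
    """Normalised Gaussian basis functions, one row per phase value (T x n_bfs)."""
    centres = np.linspace(0.0, 1.0, n_bfs)
    feats = np.exp(-0.5 * (z[:, None] - centres[None, :]) ** 2 / width)
    return feats / feats.sum(axis=1, keepdims=True)

def fit_weights(Phi, y, lam=1e-12):
    """Ridge regression of equation (3): w = (Phi^T Phi + lam I)^-1 Phi^T y."""
    n_bfs = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_bfs), Phi.T @ y)

z = np.linspace(0.0, 1.0, 100)       # phase running from z_begin = 0 to z_end = 1
Phi = gaussian_basis(z, n_bfs=10)
y_demo = np.sin(2.0 * np.pi * z)     # synthetic demonstration, not real data

w = fit_weights(Phi, y_demo)
y_fit = Phi @ w                      # trajectory reconstructed from the weights
print("max reconstruction error:", np.max(np.abs(y_fit - y_demo)))
```

Solving the linear system with `np.linalg.solve` instead of forming the inverse explicitly is a numerical preference of this sketch; both evaluate the same expression.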

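The mean $\mu_w$ and covariance $\Sigma_w$ are computed from the weights $w_i$ as,

$$
\mu_w = \frac{1}{N}\sum_{i=1}^{N} w_i, \qquad \Sigma_w = \frac{1}{N-1}\sum_{i=1}^{N} (w_i - \mu_w)(w_i - \mu_w)^T \tag{4}
$$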

where $N$ is the number of demonstrations. Since one trajectory is generated from each demonstration, $N$ also denotes the number of trajectories.
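Equation (4) is the ordinary sample mean and the unbiased sample covariance. The sketch below writes both out by hand and checks them against NumPy's built-ins. The weight matrix `W` is random placeholder data, with one row per demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder weights: N = 5 demonstrations, 4 basis functions each.
W = rng.normal(size=(5, 4))
N = W.shape[0]

mu_w = W.sum(axis=0) / N                 # (1/N) * sum_i w_i
diff = W - mu_w                          # row i holds (w_i - mu_w)
Sigma_w = diff.T @ diff / (N - 1)        # sum of outer products over N - 1

# np.cov divides by N - 1 by default, so it matches equation (4).
assert np.allclose(mu_w, np.mean(W, axis=0))
assert np.allclose(Sigma_w, np.cov(W, rowvar=False))
```

Note that when there are fewer demonstrations than weights, $\Sigma_w$ has rank at most $N - 1$ and is therefore singular.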


Let's code


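import numpy as np
from scipy.interpolate import interp1d

# Phi (n_bfs x T basis matrix), z (T phase values from 0 to 1), n_bfs and
# pos_list (one 1-D position array per demonstration) are assumed to be
# already defined, for example from the earlier parts of this series.

W = []  # list that stores the weights of every demonstration

# Regularised Gram matrix of equation (3). Phi is stored as (n_bfs x T), i.e.
# the transpose of the matrix in the equation. Only positions are regressed,
# so the Gram matrix is built from the position basis alone.
gram = np.dot(Phi, Phi.T) + 1e-12 * np.eye(n_bfs)

for demo_traj in pos_list:
    # Stretch the trajectory so that its phase runs from 0 to 1
    demo_phase = np.linspace(0.0, 1.0, len(demo_traj))
    interpolate = interp1d(demo_phase, demo_traj, kind='cubic')
    stretched_demo = interpolate(z)

    # Weights for this trajectory; solve() avoids forming an explicit inverse
    w_d = np.linalg.solve(gram, np.dot(Phi, stretched_demo))
    W.append(w_d)

W = np.asarray(W)  # shape (N, n_bfs)

mean_W = np.mean(W, axis=0)
sigma_W = np.cov(W, rowvar=False)  # normalised by N - 1, as in equation (4)

# Mean and covariance of the trajectory distribution, equation (2)
mean_of_sample_traj = np.dot(Phi.T, mean_W)
sigma_y = 1e-10 * np.eye(len(z))  # observation noise Sigma_y
sigma_of_sample_traj = np.dot(np.dot(Phi.T, sigma_W), Phi) + sigma_y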

The learnt stroke-based ProMP distribution is shown in blue over the demonstration distribution in red.
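The listing above relies on `Phi`, `z`, `n_bfs` and `pos_list` being defined elsewhere. As a self-contained illustration, the sketch below fills them in with assumptions: normalised Gaussian basis functions, four synthetic sine-wave demonstrations of different lengths, and linear interpolation with `np.interp` as a dependency-free stand-in for cubic interpolation. None of these choices come from the data used in this post.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bfs, T = 10, 100
z = np.linspace(0.0, 1.0, T)                     # common phase grid

# Normalised Gaussian basis, stored as (n_bfs, T) like Phi in the listing.
centres = np.linspace(0.0, 1.0, n_bfs)
Phi = np.exp(-0.5 * (z[None, :] - centres[:, None]) ** 2 / 0.01)
Phi /= Phi.sum(axis=0, keepdims=True)

# Synthetic one-joint demonstrations recorded with different lengths.
pos_list = []
for length in (80, 100, 120, 150):
    t = np.linspace(0.0, 1.0, length)
    amp = 1.0 + 0.1 * rng.normal()
    pos_list.append(amp * np.sin(2.0 * np.pi * t))

gram = Phi @ Phi.T + 1e-12 * np.eye(n_bfs)
W = []
for demo in pos_list:
    phase = np.linspace(0.0, 1.0, len(demo))     # z_begin = 0, z_end = 1
    aligned = np.interp(z, phase, demo)          # resample onto the common grid
    W.append(np.linalg.solve(gram, Phi @ aligned))
W = np.asarray(W)                                # shape (N, n_bfs)

mean_W, sigma_W = W.mean(axis=0), np.cov(W, rowvar=False)
mean_traj = Phi.T @ mean_W                       # equation (2) for all time steps
std_traj = np.sqrt(np.diag(Phi.T @ sigma_W @ Phi) + 1e-10)
print(mean_traj.shape, std_traj.shape)           # (100,) (100,)
```

Plotting `mean_traj` with a band of, say, two `std_traj` on either side over the aligned demonstrations gives the kind of comparison described for the figure.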


References: 
1. Paraschos, Alexandros, et al. "Using probabilistic movement primitives in robotics." Autonomous Robots 42.3 (2018): 529-551.
