Publication Details
Issue: Vol 4, No 5 (2026)
ISSN: 2993-2637

Abstract

Manipulation tasks often call for complex combinations of compliance control objectives, such as trajectory tracking, energy efficiency, and smoothness, which a single fixed-parameter impedance controller cannot jointly satisfy across different scenarios. Here we introduce a hybrid deep learning and reinforcement learning (DL+RL) approach for learning variable impedance control policies for a 7-DOF robot manipulator. We train an LSTM with a multi-head self-attention module to refine reference trajectories via behavior cloning, and we train a PPO agent to continuously adjust per-joint stiffness at runtime. Our physics-informed auto-damping formulation, based on critical damping theory, automatically derives the damping coefficients from the stiffness. This reduces the dimensionality of the action space while yielding mechanically principled impedance behaviour. We train the method on the DROID robotic manipulation dataset and benchmark it against fixed-parameter impedance controllers, DL-only models, and RL-only models. Relative to the fixed-parameter baseline, our DL+RL method reduces control energy by 25.9% and motion jerk by 96%, with statistical significance assessed by paired t-tests. An ablation study highlights the contribution of each architectural choice, including the proposed physics-informed damping formulation and the attention mechanism.
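The auto-damping idea can be illustrated with a minimal sketch. It assumes a decoupled per-joint mass-spring-damper model with a known effective inertia m_i for each joint. Under that assumption, setting the damping ratio ζ = 1 gives the critical damping d_i = 2 sqrt(k_i m_i), so a policy only needs to output the seven stiffness values k_i instead of fourteen stiffness and damping gains. The function names, the inertia values, and the joint-space impedance law used here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def critical_damping(stiffness, inertia, zeta=1.0):
    """Damping d_i = 2 * zeta * sqrt(k_i * m_i); zeta = 1.0 is critical damping.

    Assumes a decoupled per-joint model with effective inertia m_i (hypothetical).
    """
    stiffness = np.asarray(stiffness, dtype=float)
    inertia = np.asarray(inertia, dtype=float)
    return 2.0 * zeta * np.sqrt(stiffness * inertia)


def impedance_torque(q, dq, q_ref, dq_ref, stiffness, inertia):
    """Illustrative joint-space impedance law: tau = K (q_ref - q) + D (dq_ref - dq).

    Only the stiffness K is supplied (e.g. by a policy); D is derived from K.
    """
    stiffness = np.asarray(stiffness, dtype=float)
    damping = critical_damping(stiffness, inertia)
    return stiffness * (np.asarray(q_ref) - np.asarray(q)) + damping * (
        np.asarray(dq_ref) - np.asarray(dq)
    )


# Usage: a hypothetical 7-DOF action containing only per-joint stiffness (N*m/rad).
k_action = np.full(7, 100.0)
m_eff = np.ones(7)  # hypothetical effective joint inertias (kg*m^2)
tau = impedance_torque(np.zeros(7), np.zeros(7), np.full(7, 0.1), np.zeros(7), k_action, m_eff)
```

With this coupling, the characteristic equation m s^2 + d s + k = 0 of each joint has a repeated real root, so each joint returns to its reference without overshoot for any stiffness the policy selects.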

Keywords
Variable impedance control; compliance tuning; deep reinforcement learning; LSTM; attention; PPO; energy efficiency; robotic manipulation; mechatronic systems