Recent successes in robot learning have significantly enhanced autonomous systems across a wide range of tasks. However, learned policies tend to generate similar or identical solutions, limiting the user's ability to control the robot according to their intentions. Such limited robot behaviors may lead to collisions and potential harm to humans. In this paper, we introduce a semi-autonomous teleoperation framework in which the user operates a robot by selecting a high-level command, referred to as an $\textit{option}$, generated by the learned policy. To generate effective and diverse options, we propose a quality-diversity (QD) based sampling method that simultaneously optimizes both the quality and the diversity of options using reinforcement learning (RL). Additionally, we propose a mixture of latent variable models to learn a policy function that represents multiple option distributions. In experiments, we show that the proposed method achieves superior performance in terms of the success rate and diversity of the generated options in simulation environments. We further demonstrate that our method outperforms manual keyboard control in terms of task completion time in cluttered real-world environments.
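To make the quality-diversity idea behind option sampling concrete, the following is a minimal sketch that scores a set of candidate options by a weighted sum of a quality estimate and their mean pairwise distance, then offers the highest-scoring options to the teleoperator. The embedding space, the `value_fn` interface, and the trade-off weight `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of quality-diversity (QD) option scoring.
# The paper's actual objective and policy architecture may differ;
# value_fn and the distance metric here are illustrative assumptions.
import numpy as np

def qd_scores(options: np.ndarray, value_fn, alpha: float = 0.5) -> np.ndarray:
    """Score each candidate option by quality plus diversity.

    options : (K, D) array of K candidate option embeddings.
    value_fn: callable mapping an option to a scalar quality estimate
              (e.g., a learned value function) -- assumed interface.
    alpha   : trade-off between quality and diversity.
    """
    quality = np.array([value_fn(o) for o in options])  # task quality per option
    # Diversity of each option: mean distance to the other candidates.
    dists = np.linalg.norm(options[:, None, :] - options[None, :, :], axis=-1)
    diversity = dists.sum(axis=1) / max(len(options) - 1, 1)
    return alpha * quality + (1.0 - alpha) * diversity

# Example: present the top-3 options (by QD score) to the user.
rng = np.random.default_rng(0)
candidates = rng.normal(size=(8, 4))  # 8 sampled option embeddings
scores = qd_scores(candidates, value_fn=lambda o: -np.sum(o ** 2))
top3 = np.argsort(scores)[::-1][:3]
print("options offered to the user:", top3)
```

In this sketch, higher `alpha` favors options the value estimate deems effective, while lower `alpha` spreads the offered options apart so the user has meaningfully different behaviors to choose from.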
We compare the performance of MLPG against Proximal Policy Optimization (PPO) [1], Soft Actor-Critic (SAC) [2], and Deep Latent Policy Gradient (DLPG) [3]. To ensure a fair comparison, we use stochastic policies for both PPO and SAC.