publications
* denotes equal contribution. You can also find my articles on Google Scholar.
2024
- [preprint] Robust Direct Data-Driven Control for Probabilistic Systems. Alexander von Rohr, Dmitrii Likhachev, and Sebastian Trimpe. arXiv, 2024.
We propose a data-driven control method for systems with aleatoric uncertainty, for example, robot fleets with variations between agents. Our method leverages shared trajectory data to increase the robustness of the designed controller and thus facilitate transfer to new variations without the need for prior parameter and uncertainty estimation. In contrast to existing work on experience transfer for performance, our approach focuses on robustness and uses data collected from multiple realizations to guarantee generalization to unseen ones. Our method is based on scenario optimization combined with recent formulations for direct data-driven control. We derive lower bounds on the amount of data required to achieve quadratic stability for probabilistic systems with aleatoric uncertainty and demonstrate the benefits of our data-driven method through a numerical example. We find that the learned controllers generalize well to high variations in the dynamics even when based on only a few short open-loop trajectories. Robust experience transfer enables the design of safe and robust controllers that work out of the box, without any additional learning during deployment.
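The scenario approach builds on the direct data-driven stabilization LMI of De Persis and Tesi, adding one data-consistency constraint per sampled realization. Here is a hedged, minimal Python/cvxpy sketch of that single-realization building block (not the paper's code); the toy double-integrator system and all names are illustrative:

```python
# Single-trajectory direct data-driven stabilization (De Persis & Tesi style);
# the scenario extension in the paper adds one such LMI per sampled realization.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # true dynamics, unknown to the designer
B = np.array([[0.0], [0.1]])
T = 20
U0 = rng.standard_normal((1, T))         # persistently exciting open-loop inputs
X = np.zeros((2, T + 1))
for t in range(T):
    X[:, t + 1] = A @ X[:, t] + B[:, 0] * U0[0, t]
X0, X1 = X[:, :-1], X[:, 1:]             # state data, shifted by one step

Q = cp.Variable((T, 2))
P = cp.Variable((2, 2), symmetric=True)
constraints = [X0 @ Q == P,
               cp.bmat([[P, X1 @ Q], [(X1 @ Q).T, P]]) >> 1e-6 * np.eye(4)]
cp.Problem(cp.Minimize(0), constraints).solve()

K = U0 @ Q.value @ np.linalg.inv(P.value)  # stabilizing gain from data alone
print("closed-loop spectral radius:", max(abs(np.linalg.eigvals(A + B @ K))))
```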
- Discovering Model Structure of Dynamical Systems with Combinatorial Bayesian Optimization. Lucas Rath, Alexander von Rohr, Andreas Schultze, Sebastian Trimpe, and Burkhard Corves. Transactions on Machine Learning Research, 2024.
Deciding on a model structure is a fundamental problem in machine learning. In this paper, we consider the problem of building a data-based model for dynamical systems from a library of discrete components. In addition to optimizing performance, we consider crash and inequality constraints that arise from additional requirements, such as real-time capability and model complexity. We address this task of model structure selection with a focus on dynamical systems and propose to search over potential model structures efficiently using a constrained combinatorial Bayesian Optimization (BO) algorithm. We propose expressive surrogate models suited for combinatorial domains and an acquisition function that can handle inequality and crash constraints. We provide simulated benchmark problems within the domain of equation discovery of nonlinear dynamical systems. Our method outperforms the state-of-the-art in constrained combinatorial optimization of black-box functions and has a favorable computational overhead compared to other BO methods. As a real-world application example, we apply our method to optimize the configuration of an electric vehicle’s digital twin while ensuring its real-time capability for use in one of the world’s largest driving simulators.
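To make the combinatorial search concrete, the sketch below runs a plain GP-UCB loop over binary component configurations using a Hamming-distance kernel. The paper's surrogates, acquisition, and constraint handling are considerably more expressive; the toy objective and all names are invented for illustration:

```python
# Hypothetical GP-UCB loop over a combinatorial domain of binary configurations.
import numpy as np
from itertools import product

def hamming_kernel(A, B, theta=0.5):
    d = (A[:, None, :] != B[None, :, :]).sum(-1)   # pairwise Hamming distances
    return np.exp(-theta * d)

configs = np.array(list(product([0, 1], repeat=6)))          # library of 6 components
truth = lambda c: -np.abs(c.sum(-1) - 3) - 0.3 * c[..., 0]   # toy model quality

rng = np.random.default_rng(0)
idx = list(rng.choice(len(configs), 5, replace=False))       # initial design
for _ in range(15):
    X, y = configs[idx], truth(configs[idx]).astype(float)
    K = hamming_kernel(X, X) + 1e-6 * np.eye(len(X))
    k_star = hamming_kernel(configs, X)
    mu = k_star @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum('ij,ij->i', k_star @ np.linalg.inv(K), k_star)
    ucb = mu + 2.0 * np.sqrt(np.clip(var, 0.0, None))
    ucb[idx] = -np.inf                                       # do not re-query
    idx.append(int(np.argmax(ucb)))

best = configs[idx][np.argmax(truth(configs[idx]))]
print("best configuration found:", best)
```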
- Local Bayesian Optimization for Controller Tuning with Crash Constraints. Alexander von Rohr, David Stenger, Dominik Scheurenberg, and Sebastian Trimpe. at - Automatisierungstechnik, 2024.
Controller tuning is crucial for closed-loop performance but often involves manual adjustments. Although Bayesian optimization (BO) has been established as a data-efficient method for automated tuning, applying it to large and high-dimensional search spaces remains challenging. We extend a recently proposed local variant of BO to include crash constraints, where the controller can only be successfully evaluated in an a priori unknown feasible region. We demonstrate the efficiency of the proposed method through simulations and hardware experiments. Our findings showcase the potential of local BO to enhance controller performance and reduce the time and resources necessary for tuning.
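As a rough illustration of the mechanism (not the paper's implementation), one can model the unknown feasible region with a GP classifier and screen candidates around the incumbent before the acquisition step. The toy evaluation, thresholds, and names below are made up:

```python
# Crash-aware local BO sketch: GP regressor on successful evaluations,
# GP classifier on the (a priori unknown) feasible region.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier, GaussianProcessRegressor

def evaluate(theta):
    """Toy controller evaluation: NaN cost signals a crash."""
    crashed = np.linalg.norm(theta) > 1.5            # hidden feasibility boundary
    return np.nan if crashed else float(np.sum(theta ** 2))

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(20, 2))
y = np.array([evaluate(x) for x in X])
ok = ~np.isnan(y)

gp_cost = GaussianProcessRegressor().fit(X[ok], y[ok])
gp_feas = GaussianProcessClassifier().fit(X, ok)

# Local step: perturb the incumbent, keep candidates the classifier trusts,
# then minimize a lower-confidence bound of the cost model.
incumbent = X[ok][np.argmin(y[ok])]
cands = incumbent + 0.3 * rng.standard_normal((200, 2))
feasible = cands[gp_feas.predict_proba(cands)[:, 1] > 0.7]
mu, sd = gp_cost.predict(feasible, return_std=True)
next_theta = feasible[np.argmin(mu - sd)]
```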
- [preprint] Latent Action Priors From a Single Gait Cycle Demonstration for Online Imitation Learning. Oliver Hausdörfer, Alexander von Rohr, Éric Lefort, and Angela P. Schoellig. arXiv, 2024.
Deep Reinforcement Learning (DRL) in simulation often results in brittle and unrealistic learning outcomes. To push the agent towards more desirable solutions, prior information can be injected into the learning process through, for instance, reward shaping, expert data, or motion primitives. We propose an additional inductive bias for robot learning: latent actions learned from expert demonstration as priors in the action space. We show that these action priors can be learned from only a single open-loop gait cycle using a simple autoencoder. Combining these latent action priors with established style rewards for imitation in DRL achieves performance above the level of the expert demonstration and leads to more desirable gaits. Further, the action priors substantially improve performance on transfer tasks, even leading to gait transitions at higher target speeds.
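A minimal sketch of the core ingredient, with a synthetic stand-in for the expert gait cycle: a small autoencoder compresses joint targets, and the DRL policy then acts in the latent space while the decoder maps its output back to joint space. Dimensions and architecture are illustrative, not the paper's:

```python
# Learn a latent action prior from a single (here: synthetic) gait cycle.
import torch
import torch.nn as nn

n_joints, latent_dim, T = 12, 4, 50
t = torch.linspace(0, 2 * torch.pi, T).unsqueeze(1)
gait = torch.sin(t + torch.linspace(0, 2 * torch.pi, n_joints))  # (T x n_joints) stand-in

enc = nn.Sequential(nn.Linear(n_joints, 32), nn.ReLU(), nn.Linear(32, latent_dim))
dec = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, n_joints))
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)

for _ in range(2000):                       # fit the autoencoder on the single cycle
    opt.zero_grad()
    loss = nn.functional.mse_loss(dec(enc(gait)), gait)
    loss.backward()
    opt.step()

# During DRL, the policy outputs z in the latent space and the decoder maps
# it to joint targets: a learned, gait-shaped action prior.
z = torch.zeros(latent_dim)
joint_targets = dec(z)
```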
- [EWRL] Viability of Future Actions: Robust Reinforcement Learning via Entropy Regularization. Pierre-François Massiani*, Alexander von Rohr*, Lukas Haverbeck, and Sebastian Trimpe. In Seventeenth European Workshop on Reinforcement Learning, 2024.
Despite the many recent advances in reinforcement learning (RL), the question of learning policies that robustly satisfy state constraints under disturbances remains open. This paper reveals how robustness arises naturally by combining two common practices in unconstrained RL: entropy regularization and constraint penalization. Our results provide a model-free method to learn robust policies with standard, popular algorithms. We begin by showing how entropy regularization biases the constrained RL problem towards maximizing the number of future viable actions, which is a form of robustness. Then, we relax the safety constraints via penalties to obtain an unconstrained RL problem, which we show approximates its constrained counterpart arbitrarily closely. We support our findings with illustrative examples and with experiments on popular RL benchmarks.
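The recipe the paper analyzes can be tried with off-the-shelf tools: penalize state-constraint violations in the reward and train with an entropy-regularized algorithm. The sketch below uses Stable-Baselines3's SAC on a toy task; the constraint and penalty weight are hypothetical placeholders:

```python
# Constraint penalization + entropy-regularized RL, using standard tooling.
import gymnasium as gym
from stable_baselines3 import SAC

class ConstraintPenalty(gym.Wrapper):
    def __init__(self, env, penalty=10.0):
        super().__init__(env)
        self.penalty = penalty

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if abs(obs[0]) > 0.9:               # hypothetical state constraint
            reward -= self.penalty          # relax the constraint via a penalty
        return obs, reward, terminated, truncated, info

env = ConstraintPenalty(gym.make("Pendulum-v1"))
model = SAC("MlpPolicy", env, ent_coef="auto")   # entropy regularization
model.learn(total_timesteps=10_000)
```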
2022
- On Controller Tuning with Time-Varying Bayesian Optimization. Paul Brunzema*, Alexander von Rohr*, and Sebastian Trimpe. In Proceedings of the IEEE Conference on Decision and Control, 2022.
Changing conditions or environments can cause system dynamics to vary over time. To ensure optimal control performance, controllers should adapt to these changes. When the underlying cause and time of change are unknown, we need to rely on online data for this adaptation. In this paper, we use time-varying Bayesian optimization (TVBO) to tune controllers online in changing environments using appropriate prior knowledge on the control objective and its changes. Two properties are characteristic of many online controller tuning problems: First, they exhibit incremental and lasting changes in the objective due to changes to the system dynamics, e.g., through wear and tear. Second, the optimization problem is convex in the tuning parameters. Current TVBO methods do not explicitly account for these properties, resulting in poor tuning performance and many unstable controllers through over-exploration of the parameter space. We propose a novel TVBO forgetting strategy using Uncertainty-Injection (UI), which incorporates the assumption of incremental and lasting changes. The control objective is modeled as a spatio-temporal Gaussian process (GP) with UI through a Wiener process in the temporal domain. Further, we explicitly model the convexity assumptions in the spatial dimension through GP models with linear inequality constraints. In numerical experiments, we show that our model outperforms the state-of-the-art method in TVBO, exhibiting reduced regret and fewer unstable parameter configurations.
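One way to picture uncertainty injection, under simplifying assumptions that are not the paper's exact model: combine a spatial kernel with a term containing a Wiener-process kernel in time, so that posterior uncertainty about old observations grows and never shrinks back:

```python
# Simplified spatio-temporal kernel with uncertainty injection via a
# Wiener process in time (an illustration, not the paper's exact model).
import numpy as np

def k_spatial(x, xp, ell=0.5):
    return np.exp(-0.5 * (x - xp) ** 2 / ell ** 2)

def k_wiener(t, tp, sigma_w=0.1):
    return sigma_w ** 2 * np.minimum(t, tp)    # drift variance accumulates

def k(x, t, xp, tp):
    return k_spatial(x, xp) * (1.0 + k_wiener(t, tp))

# One observation of the objective at parameter x=0, time t=1; querying the
# same parameter later shows the injected uncertainty growing monotonically.
noise = 1e-4
K11 = k(0.0, 1.0, 0.0, 1.0) + noise
for t_query in (1.0, 5.0, 20.0):
    k_star = k(0.0, t_query, 0.0, 1.0)
    prior = k(0.0, t_query, 0.0, t_query)
    var = prior - k_star ** 2 / K11
    print(f"t = {t_query:4.0f}: posterior variance {var:.3f}")
```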
- Improving the Performance of Robust Control through Event-Triggered Learning. Alexander von Rohr, Friedrich Solowjow, and Sebastian Trimpe. In Proceedings of the IEEE Conference on Decision and Control, 2022.
Robust controllers ensure stability in feedback loops designed under uncertainty but at the cost of performance. Model uncertainty in time-invariant systems can be reduced by recently proposed learning-based methods, thus improving the performance of robust controllers using data. However, in practice, many systems also exhibit uncertainty in the form of changes over time, e.g., due to weight shifts or wear and tear, leading to decreased performance or instability of the learning-based controller. We propose an event-triggered learning algorithm that decides when to learn in the face of uncertainty in the LQR problem with rare or slow changes. Our key idea is to switch between robust and learned controllers. For learning, we first approximate the optimal length of the learning phase via Monte-Carlo estimations using a probabilistic model. We then design a statistical test for uncertain systems based on the moment-generating function of the LQR cost. The test detects changes in the system under control and triggers re-learning when control performance deteriorates due to system changes. We demonstrate improved performance over a robust controller baseline in a numerical example.
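A heavily simplified stand-in for the trigger logic: the paper derives its threshold from the moment-generating function of the LQR cost, whereas here an empirical quantile of simulated window means plays that role. All names and distributions are illustrative:

```python
# Simplified re-learning trigger: compare the windowed average cost against
# a false-alarm threshold simulated from the current model (the paper uses
# a statistical test based on the cost's moment-generating function instead).
import numpy as np

def relearn_trigger(costs, threshold, window=100):
    """Fire when the recent average stage cost is implausibly high."""
    return np.mean(costs[-window:]) > threshold

# Hypothetical threshold from Monte-Carlo rollouts of the nominal model:
rng = np.random.default_rng(0)
window_means = rng.gamma(2.0, 1.0, size=(1000, 100)).mean(axis=1)
threshold = np.quantile(window_means, 0.99)   # 1% false-alarm level
```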
- [preprint] Event-Triggered Time-Varying Bayesian Optimization. Paul Brunzema, Alexander von Rohr, Friedrich Solowjow, and Sebastian Trimpe. arXiv, 2022.
We consider the problem of sequentially optimizing a time-varying objective function using time-varying Bayesian optimization (TVBO). Here, the key challenge is to cope with old data. Current approaches to TVBO require prior knowledge of a constant rate of change. However, the rate of change is usually neither known nor constant. We propose an event-triggered algorithm, ET-GP-UCB, that detects changes in the objective function online. The event-trigger is based on probabilistic uniform error bounds used in Gaussian process regression. The trigger automatically detects when a significant change in the objective function occurs. The algorithm then adapts to the temporal change by resetting the accumulated dataset. We provide regret bounds for ET-GP-UCB and show in numerical experiments that it is competitive with state-of-the-art algorithms even though it requires no knowledge about the temporal changes. Further, ET-GP-UCB outperforms these baselines if the rate of change is misspecified, and we demonstrate that it is readily applicable to various settings without tuning hyperparameters.
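The reset logic can be sketched in a few lines, assuming a generic GP regressor and a placeholder bound constant `beta` in place of the paper's probabilistic uniform error bound:

```python
# Minimal ET-GP-UCB-style reset logic; beta stands in for the paper's bound.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

class EventTriggeredGP:
    def __init__(self, beta=3.0):
        self.gp = GaussianProcessRegressor(alpha=1e-2)
        self.X, self.y, self.beta = [], [], beta

    def observe(self, x, y_new):
        if self.X:
            mu, sd = self.gp.predict(np.atleast_2d(x), return_std=True)
            if abs(y_new - mu[0]) > self.beta * sd[0]:   # trigger fires
                self.X, self.y = [], []                  # reset the dataset
        self.X.append(x)
        self.y.append(y_new)
        self.gp.fit(np.reshape(self.X, (len(self.X), -1)), np.asarray(self.y))
```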
2021
- Probabilistic robust linear quadratic regulators with Gaussian processes. Alexander von Rohr, Matthias Neumann-Brosig, and Sebastian Trimpe. In Proceedings of the 3rd Conference on Learning for Dynamics and Control, 2021.
Probabilistic models such as Gaussian processes (GPs) are powerful tools to learn unknown dynamical systems from data for subsequent use in control design. While learning-based control has the potential to yield superior performance in demanding applications, robustness to uncertainty remains an important challenge. Since Bayesian methods quantify uncertainty of the learning results, it is natural to incorporate these uncertainties in a robust design. In contrast to most state-of-the-art approaches that consider worst-case estimates, we leverage the learning methods’ posterior distribution in the controller synthesis. The result is a more informed and thus efficient trade-off between performance and robustness. We present a novel controller synthesis for linearized GP dynamics that yields robust controllers with respect to a probabilistic stability margin. The formulation is based on a recently proposed algorithm for linear quadratic control synthesis, which we extend by giving probabilistic robustness guarantees in the form of credibility bounds for the system’s stability. Comparisons to existing methods based on worst-case and certainty-equivalence designs reveal superior performance and robustness properties of the proposed method.
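While the paper performs controller synthesis with probabilistic stability margins, the evaluation side of the idea is easy to illustrate: sample dynamics from a posterior, close the loop with a fixed gain, and estimate the credibility of stability. The Gaussian posterior below is a crude stand-in for a linearized GP model:

```python
# Monte-Carlo estimate of the posterior probability of closed-loop stability.
import numpy as np
from scipy.linalg import solve_discrete_are

A_mean = np.array([[1.0, 0.1], [0.0, 1.0]])   # posterior mean dynamics
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)

P = solve_discrete_are(A_mean, B, Q, R)        # certainty-equivalent LQR design
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A_mean)

rng = np.random.default_rng(0)
stable = 0
for _ in range(1000):                          # samples from a toy posterior on A
    A_s = A_mean + 0.05 * rng.standard_normal((2, 2))
    stable += max(abs(np.linalg.eigvals(A_s + B @ K))) < 1
print(f"estimated stability credibility: {stable / 1000:.2%}")
```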
- Local policy search with Bayesian optimization. Sarah Müller*, Alexander von Rohr*, and Sebastian Trimpe. In Advances in Neural Information Processing Systems, 2021.
Reinforcement learning (RL) aims to find an optimal policy by interaction with an environment. Consequently, learning complex behavior requires a vast number of samples, which can be prohibitive in practice. Nevertheless, instead of systematically reasoning and actively choosing informative samples, policy gradients for local search are often obtained from random perturbations. These random samples yield high variance estimates and hence are sub-optimal in terms of sample complexity. Actively selecting informative samples is at the core of Bayesian optimization, which constructs a probabilistic surrogate of the objective from past samples to reason about informative subsequent ones. In this paper, we propose to join both worlds. We develop an algorithm utilizing a probabilistic model of the objective function and its gradient. Based on the model, the algorithm decides where to query a noisy zeroth-order oracle to improve the gradient estimates. The resulting algorithm is a novel type of policy search method, which we compare to existing black-box algorithms. The comparison reveals improved sample complexity and reduced variance in extensive empirical evaluations on synthetic objectives. Further, we highlight the benefits of active sampling on popular RL benchmarks.
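The surrogate-gradient step can be sketched as follows, under the simplifying assumption of a plain squared-exponential GP on function values only (the paper models values and gradients jointly and actively selects queries):

```python
# Gradient of a GP posterior mean as a low-variance search direction.
import numpy as np

def se_kernel(A, B, ell=0.3):
    d = A[:, None, :] - B[None, :, :]
    return np.exp(-0.5 * np.sum(d ** 2, axis=-1) / ell ** 2)

def posterior_mean_grad(x, X, y, ell=0.3, noise=1e-2):
    K = se_kernel(X, X, ell) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, y)
    k = se_kernel(x[None, :], X, ell)[0]
    # d/dx k(x, x_i) = -(x - x_i) / ell^2 * k(x, x_i)
    return -((x - X) / ell ** 2 * k[:, None]).T @ alpha

rng = np.random.default_rng(0)
f = lambda x: -np.sum(x ** 2, axis=-1)         # toy objective (to maximize)
x0 = np.array([0.8, -0.5])                     # incumbent policy parameters
X = x0 + 0.2 * rng.standard_normal((15, 2))    # local zeroth-order queries
y = f(X) + 0.01 * rng.standard_normal(15)      # noisy evaluations

grad = posterior_mean_grad(x0, X, y)
x1 = x0 + 0.1 * grad / np.linalg.norm(grad)    # policy-gradient-style step
```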
2020
- A Learnable Safety Measure. Steve Heim*, Alexander von Rohr*, Sebastian Trimpe, and Alexander Badri-Spröwitz. In Proceedings of the Conference on Robot Learning, 2020.
Failures are challenging for learning to control physical systems since they risk damage, require time-consuming resets, and often provide little gradient information. Adding safety constraints to exploration typically requires a lot of prior knowledge and domain expertise. We present a safety measure which implicitly captures how the system dynamics relate to a set of failure states. This measure can not only serve as a safety function, but also be used to directly compute the set of safe state-action pairs. Further, we show a model-free approach to learn this measure by active sampling using Gaussian processes. While safety can only be guaranteed after learning the safety measure, we show that failures can already be greatly reduced by using the estimated measure during learning.
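As a loose illustration of the active-sampling loop (the measure in the paper is defined through the dynamics and failure set, not a plain classifier), one can train a GP classifier on observed failures and query where it is most uncertain. The failure condition below is invented:

```python
# Active sampling of an unknown failure boundary with a GP classifier.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier

rng = np.random.default_rng(0)
fails = lambda sa: sa[:, 0] + sa[:, 1] > 1.2       # hidden failure condition

SA = rng.uniform(0, 1, (15, 2))                    # initial (state, action) samples
labels = fails(SA)
for _ in range(20):                                # active sampling loop
    gpc = GaussianProcessClassifier().fit(SA, labels)
    cand = rng.uniform(0, 1, (500, 2))
    p = gpc.predict_proba(cand)[:, 1]
    nxt = cand[np.argmin(np.abs(p - 0.5))]         # most uncertain candidate
    SA = np.vstack([SA, nxt])
    labels = np.append(labels, fails(nxt[None])[0])
```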
- [preprint] Excursion Search for Constrained Bayesian Optimization under a Limited Budget of Failures. Alonso Marco, Alexander von Rohr, Dominik Baumann, José Miguel Hernández-Lobato, and Sebastian Trimpe. arXiv, 2020.
When learning to ride a bike, a child falls down a number of times before achieving the first success. As falling down usually has only mild consequences, it can be seen as a tolerable failure in exchange for a faster learning process, as it provides rich information about an undesired behavior. In the context of Bayesian optimization under unknown constraints (BOC), typical strategies for safe learning explore conservatively and avoid failures by all means. On the other side of the spectrum, non-conservative BOC algorithms that allow failing may fail an unbounded number of times before reaching the optimum. In this work, we propose a novel decision maker grounded in control theory that controls the amount of risk we allow in the search as a function of a given budget of failures. Empirical validation shows that our algorithm uses the failure budget more efficiently than state-of-the-art methods in a variety of optimization experiments and generally achieves lower regret. In addition, we propose an original algorithm for unconstrained Bayesian optimization inspired by the notion of excursion sets in stochastic processes, upon which the failure-aware algorithm is built.
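The core decision-making idea, reduced to a caricature: allow more risk while plenty of the failure budget remains, and become conservative as it depletes. The linear schedule below is purely illustrative; the paper's decision maker is grounded in control theory:

```python
# Hypothetical risk schedule keyed to the remaining budget of failures.
def risk_threshold(failures_left: int, budget: int,
                   greedy: float = 0.5, safe: float = 0.05) -> float:
    """Maximum tolerated probability of failure for the next query."""
    frac = failures_left / budget
    return safe + (greedy - safe) * frac   # full budget -> greedy, empty -> safe

# Example: with 10 allowed failures, tolerance shrinks as they are spent.
for left in (10, 5, 1, 0):
    print(left, f"{risk_threshold(left, 10):.3f}")
```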
2018
- Gait Learning for Soft Microrobots Controlled by Light Fields. Alexander von Rohr, Sebastian Trimpe, Alonso Marco, Peer Fischer, and Stefano Palagi. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2018.
Soft microrobots based on photoresponsive materials and controlled by light fields can generate a variety of different gaits. This inherent flexibility can be exploited to maximize their locomotion performance in a given environment and used to adapt them to changing conditions. However, because accurate locomotion models are lacking and individual microrobots vary intrinsically, analytical control design is not possible. Common data-driven approaches, on the other hand, require running prohibitive numbers of experiments and lead to very sample-specific results. Here we propose a probabilistic learning approach for light-controlled soft microrobots based on Bayesian Optimization (BO) and Gaussian Processes (GPs). The proposed approach results in a learning scheme that is data-efficient, enabling gait optimization with a limited experimental budget, and robust against differences among microrobot samples. These features are obtained by designing the learning scheme through the comparison of different GP priors and BO settings on a semi-synthetic data set. The developed learning scheme is validated in microrobot experiments, resulting in a 115% improvement in a microrobot’s locomotion performance with an experimental budget of only 20 tests. These encouraging results pave the way toward self-adaptive microrobotic systems based on light-controlled soft microrobots and probabilistic learning control.
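For a feel of the experimental budget, here is a generic BO loop with scikit-optimize standing in for the paper's tailored GP priors and settings; the "experiment" is a synthetic speed function of two hypothetical gait parameters:

```python
# Gait optimization with a 20-evaluation budget, sketched with scikit-optimize.
import numpy as np
from skopt import gp_minimize

def negative_speed(params):
    """Placeholder for one locomotion experiment (BO minimizes)."""
    freq, amplitude = params
    speed = amplitude * np.sin(freq) * np.exp(-0.1 * freq)
    return -speed + 0.01 * np.random.randn()   # noisy measurement

res = gp_minimize(negative_speed,
                  dimensions=[(0.5, 10.0), (0.1, 1.0)],   # frequency, amplitude
                  n_calls=20, noise=0.01 ** 2, random_state=0)
print("best gait parameters:", res.x)
```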