RL-based space manipulator trajectory planning for efficient eddy current debris detumbling

Space debris has become a significant challenge in recent years. In response, research in Active Debris Removal (ADR) has intensified, with capture methods gaining prominence. Capturing debris is difficult, however: these objects often tumble at rates between 3 and 30 degrees per second, so grasping them with a space manipulator risks collisions that could generate additional fragments. Detumbling the debris beforehand reduces this risk. Among the detumbling methods proposed, eddy current-based strategies have shown considerable promise because they require no direct contact with the debris. Previous research has demonstrated their feasibility, typically with a chaser spacecraft carrying an electromagnet oriented in the along-track direction during rendezvous maneuvers. While this design is relatively straightforward, studies have shown that varying the magnetic field direction can significantly shorten detumbling times. This research proposes a novel approach employing robotic arms equipped with an electromagnetic end-effector, which can later be replaced with a grasping tool once the debris's angular velocity is reduced to a safe level. This strategy allows the induction of eddy currents to be optimized by adjusting the magnetic field direction. Maximizing the Eddy Current Torque (ECT) requires keeping the magnetic field perpendicular to the relative angular velocity (RAV) vector, which demands an end-effector trajectory that rarely lies within the manipulator's workspace. Therefore, the best feasible compromise must be determined. In this context, a Deep Reinforcement Learning (Deep RL) approach, specifically the Deep Deterministic Policy Gradient (DDPG) algorithm, is explored to determine the optimal trajectory for the end-effector.
The results demonstrate that the agent can learn a policy that reduces detumbling times by 71.73% compared to previous methods. Additionally, the agent exhibits strong robustness against stochastic uncertainties in sensor measurements of the RAV, further validating the effectiveness of this approach.
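The perpendicularity condition can be illustrated with a small numerical sketch. It assumes a simplified uniform-field torque model, T = m (ω × B) × B, in which the debris's magnetic tensor is reduced to a hypothetical scalar `m_scalar`; this model, the function names, and the numerical values are illustrative assumptions and are not taken from this work. Under that assumption the torque magnitude scales with the sine of the angle between the RAV and the field, so it peaks at 90 degrees and vanishes when the two are aligned.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def norm(v):
    """Euclidean norm of a 3-vector."""
    return math.sqrt(sum(x * x for x in v))


def eddy_torque(omega, b_field, m_scalar=1.0):
    """Assumed simplified model: T = m * (omega x B) x B.

    m_scalar is a hypothetical scalar stand-in for the magnetic tensor.
    """
    induced = cross(omega, b_field)
    torque = cross(induced, b_field)
    return tuple(m_scalar * x for x in torque)


def field_at_angle(theta_deg, magnitude=1e-3):
    """Field of fixed magnitude in the x-z plane, theta_deg away from the z axis."""
    theta = math.radians(theta_deg)
    return (magnitude * math.sin(theta), 0.0, magnitude * math.cos(theta))


# Illustrative RAV: 10 deg/s about the z axis (inside the 3-30 deg/s range quoted above).
omega = (0.0, 0.0, math.radians(10.0))

# Torque magnitude for several angles between the RAV and the field.
magnitudes = {
    theta: norm(eddy_torque(omega, field_at_angle(theta)))
    for theta in (0, 30, 60, 90)
}

for theta, value in magnitudes.items():
    print(f"angle = {theta:2d} deg, |T| = {value:.3e} (arbitrary units)")
```

In this sketch the torque at 90 degrees points exactly opposite to the RAV, which is the braking direction; at intermediate angles only part of the torque opposes the rotation, which is why a constrained end-effector has to trade off field direction against reachability.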

To mitigate the space debris crisis, research in Active Debris Removal (ADR) has intensified, with capture methods gaining prominence. However, capturing space debris is complex; these objects often rotate at rates between 3 and 30 degrees per second, which makes collecting them with manipulators risky because of potential collisions that could generate additional fragments. One way to reduce this risk is to detumble the debris beforehand. Several detumbling methods have been proposed, among which strategies based on eddy currents have shown considerable potential. This method offers a crucial advantage by eliminating the need for direct contact with the debris, thereby reducing the risk of collision. Previous research has demonstrated the feasibility of such methods, which generally involve a chaser satellite using an electromagnet oriented along the direction of orbital motion (along-track) during rendezvous maneuvers. Although this design is relatively simple, studies have shown that varying the magnetic field direction can significantly reduce detumbling times. This research proposes an innovative approach that uses a robotic arm equipped with an electromagnetic end-effector. This strategy allows the induction of eddy currents to be optimized by adjusting the magnetic field direction. Maximizing the Eddy Current Torque (ECT) requires maintaining perpendicularity between the relative angular velocity (RAV) vector and the magnetic field, a trajectory that rarely falls within the manipulator's workspace. Therefore, an optimal feasible compromise must be determined.
In this context, a Deep Reinforcement Learning (Deep RL) approach, specifically using the Deep Deterministic Policy Gradient (DDPG) algorithm, is explored to determine the optimal trajectory for the end-effector. The results demonstrate that the agent can learn a policy that reduces detumbling times by 71.73% compared to previous methods. In addition, the agent shows strong robustness against stochastic uncertainties in the sensor measurements of the RAV, further confirming the effectiveness of the approach.