The unmanned aerial vehicle (UAV) has been applied in unmanned air combat because of its flexibility and practicality. The short-range air combat situation changes rapidly, and the UAV has to make autonomous maneuver decisions as quickly as possible. In this paper, a short-range air combat maneuver decision method based on deep reinforcement learning is proposed. Firstly, the combat environment, including the UAV motion model and the position and velocity relationships between the aircraft, is described, and on this basis the combat process is established. Secondly, several improvements to proximal policy optimization (PPO) are proposed to enhance the maneuver decision-making ability. A gated recurrent unit (GRU) helps PPO make decisions from continuous timestep data. The actor network's input is the UAV's observation, whereas the critic network's input, called the state, also includes the blood values, which cannot be observed directly. In addition, an action space of 15 basic actions and a well-designed reward function are proposed to couple the air combat environment with PPO; the reward function is divided into a dense reward, an event reward, and an end-game reward to ensure training feasibility. Then, a phased training method called "basic-confrontation", based on the idea that human beings gradually learn from simple to complex, is proposed: the training process is composed of three phases, which reduces the training time while obtaining suboptimal but efficient results. Finally, one-to-one short-range air combats are simulated under different target maneuver policies, and the designed maneuver decision method is verified through an ablation study and confrontation tests. Simulation results show that the proposed maneuver decision model and training method help the UAV achieve autonomous decision-making in air combat and obtain an effective decision policy to defeat the opponent.
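The split into dense, event, and end-game rewards can be illustrated with a minimal sketch. The weights, the geometry-advantage inputs, and the trigger values below are assumptions for illustration only; the abstract does not give the actual terms the authors use.

```python
# Illustrative sketch of a three-part reward: dense (per-step shaping),
# event (sparse combat events), and end-game (terminal outcome).
# All coefficients are hypothetical, not the paper's values.

def dense_reward(angle_adv: float, range_adv: float) -> float:
    """Per-step shaping signal from the relative geometry (assumed form)."""
    return 0.5 * angle_adv + 0.5 * range_adv

def event_reward(hit_opponent: bool, got_hit: bool) -> float:
    """Sparse bonus/penalty for discrete combat events (assumed values)."""
    return (5.0 if hit_opponent else 0.0) - (5.0 if got_hit else 0.0)

def endgame_reward(own_blood: float, enemy_blood: float, done: bool) -> float:
    """Terminal reward based on remaining blood values (assumed values)."""
    if not done:
        return 0.0
    return 50.0 if own_blood > enemy_blood else -50.0

def total_reward(angle_adv, range_adv, hit, got_hit,
                 own_blood, enemy_blood, done) -> float:
    """Sum of the three components fed to the PPO agent each step."""
    return (dense_reward(angle_adv, range_adv)
            + event_reward(hit, got_hit)
            + endgame_reward(own_blood, enemy_blood, done))
```

The dense term keeps the gradient signal alive between rare combat events, which is the stated purpose of the decomposition: without it, PPO would see almost-constant zero reward for most of an engagement.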
With the development of artificial intelligence and integrated sensor technologies, unmanned aerial vehicles (UAVs) are increasingly applied in air combat. A bottleneck that constrains the capability of UAVs against manned vehicles is autonomous maneuver decision-making, a very challenging problem in short-range air combat, where the enemy's maneuvers are highly dynamic and uncertain. In this paper, an autonomous maneuver decision model is proposed for UAV short-range air combat based on reinforcement learning, which mainly includes the aircraft motion model, a one-to-one short-range air combat evaluation model, and a maneuver decision model based on a deep Q network (DQN). However, such a model involves high-dimensional state and action spaces, which impose a huge computational load on DQN training with traditional methods.
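The DQN component maps a combat state to one Q-value per discrete maneuver and picks actions epsilon-greedily. A minimal NumPy sketch of that mapping is below; the state dimension, layer width, and action count are illustrative assumptions (the abstract does not specify the network architecture).

```python
# Hedged sketch of a DQN-style Q-network for maneuver selection in
# plain NumPy. Sizes are assumptions, not the authors' architecture.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, HIDDEN, N_ACTIONS = 7, 32, 15  # all three values assumed

# Randomly initialized weights for a one-hidden-layer network.
W1 = rng.normal(0.0, 0.1, (STATE_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.1, (HIDDEN, N_ACTIONS))
b2 = np.zeros(N_ACTIONS)

def q_values(state: np.ndarray) -> np.ndarray:
    """Forward pass: combat state -> one Q-value per discrete maneuver."""
    h = np.tanh(state @ W1 + b1)
    return h @ W2 + b2

def select_action(state: np.ndarray, eps: float = 0.1) -> int:
    """Epsilon-greedy choice over the discrete maneuver set."""
    if rng.random() < eps:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q_values(state)))
```

Because every forward pass evaluates all maneuvers at once, the cost grows with both the state dimension and the action count, which is exactly the computational-load issue the abstract raises for high-dimensional spaces.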