Citation: CHEN Siwei, LI Jianjun, ZOU Xinxun, LUO Xu, CUI Xi. Low-carbon Economic Dispatch of Electric-thermal Coupling System Based on Twin Delayed DDPG Reinforcement Learning[J]. Modern Electric Power. DOI: 10.19725/j.cnki.1007-2322.2023.0058

Low-carbon Economic Dispatch of Electric-thermal Coupling System Based on Twin Delayed DDPG Reinforcement Learning

  • Abstract: A reinforcement learning method is proposed for the low-carbon economic dispatch of electric-thermal coupling systems with renewable energy access. First, a low-carbon economic dispatch model of the electric-thermal coupling system is established that accounts for both operating economy and carbon emissions. The dispatch process of the system containing renewable energy is then formulated as a Markov decision process (MDP). With the aim of minimizing both operating cost and carbon emissions, a multi-objective reward function is designed that incorporates a penalty mechanism for constraint violations. The reinforcement learning agent is trained interactively with the twin delayed DDPG algorithm, an improved variant of deep deterministic policy gradient (DDPG). Finally, case study results show that the agent trained by the proposed method can respond in real time to the uncertainty of renewable energy output and electric/thermal load, and can optimize the low-carbon economic dispatch of the electric-thermal coupling system containing renewable energy online.
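The abstract names two ingredients without giving their form: a multi-objective reward with a penalty for constraint violations, and the twin delayed DDPG update. The following is only a minimal sketch of how such pieces are commonly written, not the authors' formulation. The names `StepOutcome`, `reward`, and `td3_target`, the weights `w_cost` and `w_carbon`, and the `penalty` coefficient are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    """Quantities observed after one dispatch step (hypothetical fields)."""
    operating_cost: float        # cost incurred in this step
    carbon_emission: float       # emissions in this step, already priced or scaled
    constraint_violation: float  # total magnitude of violated constraints, >= 0


def reward(outcome: StepOutcome,
           w_cost: float = 1.0,
           w_carbon: float = 1.0,
           penalty: float = 10.0) -> float:
    """Weighted-sum reward: the agent maximizes the negative of cost,
    emissions, and a penalty on constraint violations.
    The weights and penalty are illustrative assumptions."""
    return -(w_cost * outcome.operating_cost
             + w_carbon * outcome.carbon_emission
             + penalty * outcome.constraint_violation)


def td3_target(r: float, gamma: float,
               q1_next: float, q2_next: float, done: bool) -> float:
    """Clipped double-Q target used by twin delayed DDPG:
    y = r + gamma * (1 - done) * min(Q1', Q2').
    q1_next and q2_next are the two target critics evaluated at the
    next state and the (noise-smoothed) target action."""
    bootstrap = 0.0 if done else min(q1_next, q2_next)
    return r + gamma * bootstrap


if __name__ == "__main__":
    feasible = StepOutcome(operating_cost=100.0, carbon_emission=20.0,
                           constraint_violation=0.0)
    infeasible = StepOutcome(operating_cost=100.0, carbon_emission=20.0,
                             constraint_violation=2.0)
    print(reward(feasible), reward(infeasible))
    print(td3_target(-1.0, 0.9, 5.0, 3.0, done=False))
```

Taking the minimum of the two critics is what counteracts the value overestimation that plain DDPG is prone to. A weighted sum is only one way to combine the two objectives; the paper may scale or price emissions differently.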

     
