
Operation Optimization Using Reinforcement Learning with Integrated Artificial Reasoning Framework...

by Junyung Kim, Daniel Mikkelson, Xinyan Wang, Xingang Zhao, Hyun Gook Kang
Publication Type: Conference Paper
Book Title: Proceedings of the 13th International Topical Meeting on Nuclear Plant Instrumentation, Control and Human-Machine Interface Technologies (NPIC&HMIT 2023)
Publication Date:
Page Numbers: 1678 to 1687
Publisher Location: Illinois, United States of America
Conference Name: 13th International Topical Meeting on Nuclear Plant Instrumentation, Control and Human-Machine Interface Technologies (NPIC&HMIT 2023)
Conference Location: Knoxville, Tennessee, United States of America
Conference Sponsor: American Nuclear Society
Conference Date: -

In large and complex systems, operational decision-making requires systematic analysis of a vast amount of data from both process parameters and component status monitoring. In this paper, we present an integrated artificial reasoning approach for system state transition models that supports operational decision-making with explainable and traceable reasoning. The integrated artificial reasoning framework is a physics-based approach that defines the system structure as a Bayesian network, and we leverage it within a Markov decision process (MDP) to find optimal operational solutions. In the proposed framework, the MDP is implemented on a dynamic Bayesian network (DBN) that represents the causalities in a system. Multilevel flow modeling (MFM) is used to extract these causalities in an efficient and objective manner. Because MFM is grounded in the fundamental laws of mass and energy conservation, the target system is decomposed into mass, energy, and information structures, which serve as the basis for the DBN. Solving the MDP amounts to solving the Bellman equation, whose terms can be derived from the conditional probability equations of the constructed DBN. System operators can thus capture stochastic system dynamics as transitions among multiple subsystem states, reflecting their physical relations as well as uncertainties arising from component degradation or random failures. We analyze a simplified example system to illustrate how an optimal operational policy is found with this approach.
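As a rough illustration of the solution step described in the abstract (not code from the paper), the following minimal Python sketch runs value iteration, i.e., repeated application of the Bellman optimality backup, on a toy two-state, two-action MDP whose transition matrices stand in for the conditional probability tables that would be derived from a DBN. All state names, actions, probabilities, and rewards here are illustrative assumptions.

# Minimal sketch: value iteration on a toy MDP whose transition probabilities
# stand in for DBN conditional probability tables. All values are assumptions
# for illustration, not results or data from the paper.
import numpy as np

states = ["nominal", "degraded"]          # hypothetical subsystem states
actions = ["continue", "reconfigure"]     # hypothetical operator actions

# P[a][s, s'] = Pr(s' | s, a): stand-in for DBN conditional probabilities
P = {
    "continue":    np.array([[0.95, 0.05],
                             [0.00, 1.00]]),
    "reconfigure": np.array([[0.90, 0.10],
                             [0.70, 0.30]]),
}
# R[a][s]: immediate reward for taking action a in state s (illustrative)
R = {
    "continue":    np.array([1.0, -5.0]),
    "reconfigure": np.array([0.5, -1.0]),
}

gamma = 0.95                  # discount factor
V = np.zeros(len(states))     # state-value estimates

# Value iteration: repeatedly apply the Bellman optimality backup until convergence
for _ in range(1000):
    Q = np.array([R[a] + gamma * P[a] @ V for a in actions])  # Q[a, s]
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

policy = {s: actions[int(np.argmax(Q[:, i]))] for i, s in enumerate(states)}
print("Optimal values:", dict(zip(states, V.round(3))))
print("Optimal policy:", policy)

In the framework described above, the transition probabilities would instead come from the DBN constructed via the multilevel flow modeling decomposition, and the state space would span the combinations of subsystem states rather than a single two-state variable.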