Abstract
Optimal sub-sampling of large datasets from fluid dynamics simulations is essential for training reduced-order machine-learned models. A method using Shannon entropy was developed to weight flow features according to their information content, so that the most informative features can be extracted and used to train a surrogate model. The method is demonstrated on the canonical problem of flow over a cylinder, simulated with OpenFOAM. Both time-independent prediction and temporal forecasting were investigated, along with two types of prediction targets: local per-grid-point predictions and global per-time-step predictions. When used to train a surrogate model, our entropy-based sampling method typically outperforms random sampling and yields more reproducible results in fewer iterations. Finally, the method was used to train a surrogate model of turbulence in magnetohydrodynamic flows, which revealed various challenges and opportunities for future research.