2021

M. K. Ballard, R. Amici, V. Shankar, L. A. Ferguson, M. Braginsky, R. M. Kirby.
**“Towards an Extrinsic, CG-XFEM Approach Based on Hierarchical Enrichments for Modeling Progressive Fracture,”** Subtitled **“arXiv preprint arXiv:2104.14704,”** 2021.

We propose an extrinsic, continuous-Galerkin (CG), extended finite element method (XFEM) that generalizes the work of Hansbo and Hansbo to allow multiple Heaviside enrichments within a single element in a hierarchical manner. This approach enables complex, evolving XFEM surfaces in 3D that cannot be captured using existing CG-XFEM approaches. We describe an implementation of the method for 3D static elasticity with linearized strain for modeling open cracks as a salient step towards modeling progressive fracture. The implementation includes a description of the finite element model, hybrid implicit/explicit representation of enrichments, numerical integration method, and novel degree-of-freedom (DoF) enumeration algorithm. This algorithm supports an arbitrary number of enrichments within an element, while simultaneously maintaining a CG solution across elements. Additionally, our approach easily allows an implementation suitable for distributed computing systems. Enabled by the DoF enumeration algorithm, the proposed method lays the groundwork for a computational tool that efficiently models progressive fracture. To facilitate a discussion of the complex enrichment hierarchies, we develop enrichment diagrams to succinctly describe and visualize the relationships between the enrichments (and the fields they create) within an element. This also provides a unified language for discussing extrinsic XFEM methods in the literature. We compare several methods, relying on the enrichment diagrams to highlight their nuanced differences.

M. Carlson, X. Zheng, H. Sundar, G. E. Karniadakis, R. M. Kirby.
**“An open-source parallel code for computing the spectral fractional Laplacian on 3D complex geometry domains,”** In *Computer Physics Communications*, Vol. 261, North-Holland, pp. 107695. 2021.

We present a spectral element algorithm and open-source code for computing the fractional Laplacian defined by the eigenfunction expansion on finite 2D/3D complex domains with both homogeneous and nonhomogeneous boundaries. We demonstrate the scalability of the spectral element algorithm on large clusters by constructing the fractional Laplacian based on computed eigenvalues and eigenfunctions using up to thousands of CPUs. To demonstrate the accuracy of this eigen-based approach for computing the fractional Laplacian, we approximate the solutions of the fractional diffusion equation using the computed eigenvalues and eigenfunctions on a 2D quadrilateral, and on a 3D cubic and cylindrical domain, and compare the results with the contrived solutions to demonstrate fast convergence. Subsequently, we present simulation results for a fractional diffusion equation on a hand-shaped domain discretized with 3D hexahedra, as well as on a domain constructed from the Hanford site geometry corresponding to nonzero Dirichlet boundary conditions. Finally, we apply the algorithm to solve the surface quasi-geostrophic (SQG) equation on a 2D square with periodic boundaries. Simulation results demonstrate the accuracy, efficiency, and geometric flexibility of our algorithm and that our algorithm can capture the subtle dynamics of anomalous diffusion modeled by the fractional Laplacian on complex geometry domains. The included open-source code is the first of its kind.

J. Chilleri, Y. He, D. Bedrov, R. M. Kirby.
**“Optimal allocation of computational resources based on Gaussian process: Application to molecular dynamics simulations,”** In *Computational Materials Science*, Vol. 188, Elsevier, pp. 110178. 2021.

Simulation models have been utilized in a wide range of real-world applications for behavior predictions of complex physical systems or material designs of large structures. While extensive simulation is mathematically preferable, external limitations such as available resources are often necessary considerations. With a fixed computational resource (i.e., total simulation time), we propose a Gaussian process-based numerical optimization framework for optimal time allocation over simulations at different locations, so that a surrogate model with uncertainty estimation can be constructed to approximate the full simulation. The proposed framework is demonstrated first via two synthetic problems, and later using a real test case of a glass-forming system with divergent dynamic relaxations where a Gaussian process is constructed to estimate the diffusivity and its uncertainty with respect to the temperature.

V. Keshavarzzadeh, M. Alirezaei, T. Tasdizen, R. M. Kirby.
**“Image-Based Multiresolution Topology Optimization Using Deep Disjunctive Normal Shape Model,”** In *Computer-Aided Design*, Vol. 130, Elsevier, pp. 102947. 2021.

We present a machine learning framework for predicting the optimized structural topology designs using multiresolution data. Our approach primarily uses optimized designs from inexpensive coarse mesh finite element simulations for model training and generates high resolution images associated with simulation parameters that are not previously used. Our cost-efficient approach enables the designers to effectively search through possible candidate designs in situations where the design requirements rapidly change. The underlying neural network framework is based on a deep disjunctive normal shape model (DDNSM) which learns the mapping between the simulation parameters and segments of multiresolution images. Using this image-based analysis we provide a practical algorithm which enhances the predictability of the learning machine by determining a limited number of important parametric samples (i.e., samples of the simulation parameters) on which the high resolution training data is generated. We demonstrate our approach on benchmark compliance minimization problems including the 3D topology optimization where we show that the high-fidelity designs from the learning machine are close to optimal designs and can be used as effective initial guesses for the large-scale optimization problem.

V. Keshavarzzadeh, R. M. Kirby, A. Narayan.
**“Multilevel Designed Quadrature for Partial Differential Equations with Random Inputs,”** In *SIAM Journal on Scientific Computing*, Vol. 43, No. 2, Society for Industrial and Applied Mathematics, pp. A1412-A1440. 2021.

We introduce a numerical method, multilevel designed quadrature, for computing the statistical solution of partial differential equations with random input data. Similar to multilevel Monte Carlo methods, our method relies on hierarchical spatial approximations in addition to a parametric/stochastic sampling strategy. A key ingredient in multilevel methods is the relationship between the spatial accuracy at each level and the number of stochastic samples required to achieve that accuracy. Our sampling is based on flexible quadrature points that are designed for a prescribed accuracy, which can yield less overall computational cost compared to alternative multilevel methods. We propose a constrained optimization problem that determines the number of samples to balance the approximation error with the computational budget. We further show that the optimization problem is convex and derive analytic formulas for the optimal number of points at each level. We validate the theoretical estimates and the performance of our multilevel method via numerical examples on a linear elasticity and a steady state heat diffusion problem.

V. Keshavarzzadeh, R. M. Kirby, A. Narayan.
**“Robust topology optimization with low rank approximation using artificial neural networks,”** In *Computational Mechanics*, 2021.

DOI: 10.1007/s00466-021-02069-3

We present a low rank approximation approach for topology optimization of parametrized linear elastic structures. The parametrization is considered on loading and stiffness of the structure. The low rank approximation is achieved by identifying a parametric connection among coarse finite element models of the structure (associated with different design iterates) and is used to inform the high fidelity finite element analysis. We build an Artificial Neural Network (ANN) map between low resolution design iterates and their corresponding interpolative coefficients (obtained from low rank approximations) and use this surrogate to perform high resolution parametric topology optimization. We demonstrate our approach on robust topology optimization with compliance constraints/objective functions and develop error bounds for the parametric compliance computations. We verify these parametric computations with more challenging quantities of interest such as the p-norm of von Mises stress. To conclude, we use our approach on a 3D robust topology optimization and show significant reduction in computational cost via quantitative measures.

A.S. Krishnapriyan, A. Gholami, S. Zhe, R.M. Kirby, M.W. Mahoney.
**“Characterizing possible failure modes in physics-informed neural networks,”** Subtitled **“arXiv preprint arXiv:2109.01050,”** 2021.

Recent work in scientific machine learning has developed so-called physics-informed neural network (PINN) models. The typical approach is to incorporate physical domain knowledge as soft constraints on an empirical loss function and use existing machine learning methodologies to train the model. We demonstrate that, while existing PINN methodologies can learn good models for relatively trivial problems, they can easily fail to learn relevant physical phenomena even for simple PDEs. In particular, we analyze several distinct situations of widespread physical interest, including learning differential equations with convection, reaction, and diffusion operators. We provide evidence that the soft regularization in PINNs, which involves differential operators, can introduce a number of subtle problems, including making the problem ill-conditioned. Importantly, we show that these possible failure modes are not due to the lack of expressivity in the NN architecture, but that the PINN's setup makes the loss landscape very hard to optimize. We then describe two promising solutions to address these failure modes. The first approach is to use curriculum regularization, where the PINN's loss term starts from a simple PDE regularization, and becomes progressively more complex as the NN gets trained. The second approach is to pose the problem as a sequence-to-sequence learning task, rather than learning to predict the entire space-time at once. Extensive testing shows that we can achieve up to 1-2 orders of magnitude lower error with these methods as compared to regular PINN training.

E. Laughton, V. Zala, A. Narayan, R. M. Kirby, D. Moxey.
**“Fast Barycentric-Based Evaluation Over Spectral/hp Elements,”** Subtitled **“arXiv preprint arXiv:2103.03594,”** 2021.

As the use of spectral/*hp* element methods, and high-order finite element methods in general, continues to spread, community efforts to create efficient, optimized algorithms associated with fundamental high-order operations have grown. Core tasks such as solution expansion evaluation at quadrature points, stiffness and mass matrix generation, and matrix assembly have received tremendous attention. With the expansion of the types of problems to which high-order methods are applied, and correspondingly the growth in types of numerical tasks accomplished through high-order methods, the number and types of these core operations broaden. This work focuses on solution expansion evaluation at arbitrary points within an element. This operation is core to many postprocessing applications such as evaluation of streamlines and pathlines, as well as to field projection techniques such as mortaring. We expand barycentric interpolation techniques developed on an interval to 2D (triangles and quadrilaterals) and 3D (tetrahedra, prisms, pyramids, and hexahedra) spectral/*hp* element methods. We provide efficient algorithms for their implementations, and demonstrate their effectiveness using the spectral/*hp* element library Nektar++.
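
For readers unfamiliar with the interval technique this abstract builds on, the following is a minimal sketch of classical barycentric Lagrange interpolation in 1D. It is not the paper's 2D/3D algorithm and does not use Nektar++; the function names `barycentric_weights` and `barycentric_eval` are hypothetical and chosen only for illustration.

```python
import numpy as np


def barycentric_weights(nodes):
    """Return w_j = 1 / prod_{k != j} (x_j - x_k) for distinct nodes."""
    nodes = np.asarray(nodes, dtype=float)
    diffs = nodes[:, None] - nodes[None, :]
    np.fill_diagonal(diffs, 1.0)  # skip the k == j factor in each product
    return 1.0 / diffs.prod(axis=1)


def barycentric_eval(nodes, values, weights, x):
    """Evaluate the interpolant at a scalar x using the second barycentric form."""
    nodes = np.asarray(nodes, dtype=float)
    values = np.asarray(values, dtype=float)
    delta = x - nodes
    hits = np.flatnonzero(delta == 0.0)
    if hits.size:  # x coincides with a node: return the nodal value directly
        return float(values[hits[0]])
    terms = weights / delta
    return float(terms @ values / terms.sum())


# Example: six Chebyshev-Lobatto nodes reproduce a cubic exactly (up to rounding).
n = 5
nodes = np.cos(np.pi * np.arange(n + 1) / n)
values = nodes**3 - 2.0 * nodes
weights = barycentric_weights(nodes)
approx = barycentric_eval(nodes, values, weights, 0.3)
```

Once the weights are precomputed, each evaluation costs a number of operations proportional to the number of nodes, which is one reason this form is attractive for repeated evaluation at arbitrary points.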

M. Penwarden, S. Zhe, A. Narayan, R. M. Kirby.
**“Multifidelity Modeling for Physics-Informed Neural Networks (PINNs),”** Subtitled **“arXiv preprint arXiv:2106.13361,”** 2021.

Multifidelity simulation methodologies are often used in an attempt to judiciously combine low-fidelity and high-fidelity simulation results in an accuracy-increasing, cost-saving way. Candidates for this approach are simulation methodologies for which there are fidelity differences connected with significant computational cost differences. Physics-informed Neural Networks (PINNs) are candidates for these types of approaches due to the significant difference in training times required when different fidelities (expressed in terms of architecture width and depth as well as optimization criteria) are employed. In this paper, we propose a particular multifidelity approach applied to PINNs that exploits low-rank structure. We demonstrate that width, depth, and optimization criteria can be used as parameters related to model fidelity, and show numerical justification of cost differences in training due to fidelity parameter choices. We test our multifidelity scheme on various canonical forward PDE models that have been presented in the emerging PINNs literature.

M. Rasouli, R. M. Kirby, H. Sundar.
**“A Compressed, Divide and Conquer Algorithm for Scalable Distributed Matrix-Matrix Multiplication,”** In *The International Conference on High Performance Computing in Asia-Pacific Region*, pp. 110-119. 2021.

Matrix-matrix multiplication (GEMM) is a widely used linear algebra primitive common in scientific computing and data sciences. While several highly-tuned libraries and implementations exist, these typically target either sparse or dense matrices. The performance of these tuned implementations on unsupported types can be poor, and this is critical in cases where the structure of the computations is associated with varying degrees of sparsity. One such example is Algebraic Multigrid (AMG), a popular solver and preconditioner for large sparse linear systems. In this work, we present a new divide and conquer sparse GEMM, that is also highly performant and scalable when the matrix becomes dense, as in the case of AMG matrix hierarchies. In addition, we implement a lossless data compression method to reduce the communication cost. We combine this with an efficient communication pattern during distributed-memory GEMM to provide 2.24 times (on average) better performance than the state-of-the-art library PETSc. Additionally, we show that the performance and scalability of our method surpass PETSc even more when the density of the matrix increases. We demonstrate the efficacy of our methods by comparing our GEMM with PETSc on a wide range of matrices.

N. Truong, C. Yuksel, C. Watcharopas, J. A. Levine, R. M. Kirby.
**“Particle Merging-and-Splitting,”** In *IEEE Transactions on Visualization and Computer Graphics*, IEEE, 2021.

Robustly handling collisions between individual particles in a large particle-based simulation has been a challenging problem. We introduce particle merging-and-splitting, a simple scheme for robustly handling collisions between particles that prevents inter-penetrations of separate objects without introducing numerical instabilities. This scheme merges colliding particles at the beginning of the time-step and then splits them at the end of the time-step. Thus, collisions last for the duration of a time-step, allowing neighboring particles of the colliding particles to influence each other. We show that our merging-and-splitting method is effective in robustly handling collisions and avoiding penetrations in particle-based simulations. We also show how our merging-and-splitting approach can be used for coupling different simulation systems using different and otherwise incompatible integrators. We present simulation tests …

W. W. Xing, A. A. Shah, P. Wang, S. Zhe, Q. Fu, R. M. Kirby.
**“Residual Gaussian process: A tractable nonparametric Bayesian emulator for multi-fidelity simulations,”** In *Applied Mathematical Modelling*, Vol. 97, Elsevier, pp. 36-56. 2021.

Challenges in multi-fidelity modelling relate to accuracy, uncertainty estimation and high-dimensionality. A novel additive structure is introduced in which the highest fidelity solution is written as a sum of the lowest fidelity solution and residuals between the solutions at successive fidelity levels, with Gaussian process priors placed over the low fidelity solution and each of the residuals. The resulting model is equipped with a closed-form solution for the predictive posterior, making it applicable to advanced, high-dimensional tasks that require uncertainty estimation. Its advantages are demonstrated on univariate benchmarks and on three challenging multivariate problems. It is shown how active learning can be used to enhance the model, especially with a limited computational budget. Furthermore, error bounds are derived for the mean prediction in the univariate case.
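
The additive structure described in this abstract can be written out as below. This is only a transcription of the sentence above into symbols; the notation ($f_k$ for the solution at fidelity level $k$, $r_k$ for the residuals, $F$ for the number of levels) is assumed here and may differ from the paper's.

```latex
\begin{aligned}
  f_F(x) &= f_1(x) + \sum_{k=2}^{F} r_k(x),
  \qquad r_k(x) = f_k(x) - f_{k-1}(x), \\
  f_1 &\sim \mathcal{GP}, \qquad r_k \sim \mathcal{GP}, \quad k = 2, \dots, F .
\end{aligned}
```

The sum telescopes, so the identity holds by construction; the modelling content lies in placing a separate Gaussian process prior on $f_1$ and on each residual $r_k$.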

W. W. Xing, R. M. Kirby, S. Zhe.
**“Deep coregionalization for the emulation of simulation-based spatial-temporal fields,”** In *Journal of Computational Physics*, Academic Press, pp. 109984. 2021.

Data-driven surrogate models are widely used for applications such as design optimization and uncertainty quantification, where repeated evaluations of an expensive simulator are required. For most partial differential equation (PDE) simulations, the outputs of interest are often spatial or spatial-temporal fields, leading to very high-dimensional outputs. Despite the success of existing data-driven surrogates for high-dimensional outputs, most methods require a significant number of samples to cover the response surface in order to achieve a reasonable degree of accuracy. This demand makes the idea of surrogate models less attractive considering the high-computational cost to generate the data. To address this issue, we exploit the multifidelity nature of a PDE simulation and introduce deep coregionalization, a Bayesian nonparametric autoregressive framework for efficient emulation of spatial-temporal fields. To effectively extract the output correlations in the context of multifidelity data, we develop a novel dimension reduction technique, residual principal component analysis. Our model can simultaneously capture the rich output correlations and the fidelity correlations and make high-fidelity predictions with only a small number of expensive, high-fidelity simulation samples. We show the advantages of our model in three canonical PDE models and a fluid dynamics problem. The results show that the proposed method can not only approximate simulation results with significantly less cost (by about 10%-25%) but also further improve model accuracy.

Y. Xu, V. Keshavarzzadeh, R. M. Kirby, A. Narayan.
**“A bandit-learning approach to multifidelity approximation,”** Subtitled **“arXiv preprint arXiv:2103.15342,”** 2021.

Multifidelity approximation is an important technique in scientific computation and simulation. In this paper, we introduce a bandit-learning approach for leveraging data of varying fidelities to achieve precise estimates of the parameters of interest. Under a linear model assumption, we formulate a multifidelity approximation as a modified stochastic bandit, and analyze the loss for a class of policies that uniformly explore each model before exploiting. Utilizing the estimated conditional mean-squared error, we propose a consistent algorithm, adaptive Explore-Then-Commit (AETC), and establish a corresponding trajectory-wise optimality result. These results are then extended to the case of vector-valued responses, where we demonstrate that the algorithm is efficient without the need to worry about estimating high-dimensional parameters. The main advantage of our approach is that we require neither hierarchical model structure nor *a priori* knowledge of statistical information (e.g., correlations) about or between models. Instead, the AETC algorithm requires only knowledge of which model is a trusted high-fidelity model, along with (relative) computational cost estimates of querying each model. Numerical experiments are provided at the end to support our theoretical findings.

V. Zala, R. M. Kirby, A. Narayan.
**“Structure-preserving Nonlinear Filtering for Continuous and Discontinuous Galerkin Spectral/hp Element Methods,”** Subtitled **“arXiv preprint arXiv:2106.08316,”** 2021.

Finite element simulations have been used to solve various partial differential equations (PDEs) that model physical, chemical, and biological phenomena. The resulting discretized solutions to PDEs often do not satisfy requisite physical properties, such as positivity or monotonicity. Such invalid solutions pose both modeling challenges, since the physical interpretation of simulation results is not possible, and computational challenges, since such properties may be required to advance the scheme. We, therefore, consider the problem of computing solutions that preserve these structural solution properties, which we enforce as additional constraints on the solution. We consider in particular the class of convex constraints, which includes positivity and monotonicity. By embedding such constraints as a postprocessing convex optimization procedure, we can compute solutions that satisfy general types of convex constraints. For certain types of constraints (including positivity and monotonicity), the optimization is a filter, i.e., a norm-decreasing operation. We provide a variety of tests on one-dimensional time-dependent PDEs that demonstrate the method's efficacy, and we empirically show that rates of convergence are unaffected by the inclusion of the constraints.

2020

T. A. J. Ouermi, R. M. Kirby, M. Berzins.
**“Numerical Testing of a New Positivity-Preserving Interpolation Algorithm,”** Subtitled **“arXiv,”** 2020.

An important component of a number of computational modeling algorithms is an interpolation method that preserves the positivity of the function being interpolated. This report describes the numerical testing of a new positivity-preserving algorithm that is designed to be used when interpolating from a solution defined on one grid to a different spatial grid. The motivating application is a numerical weather prediction (NWP) code that uses spectral elements as the discretization choice for its dynamics core and Cartesian product meshes for the evaluation of its physics routines. This combination of spectral elements, which use nonuniformly spaced quadrature/collocation points, and uniformly-spaced Cartesian meshes combined with the desire to maintain positivity when moving between these necessitates our work. This new approach is evaluated against several typical algorithms in use on a range of test problems in one or more space dimensions. The results obtained show that the new method is competitive in terms of observed accuracy while at the same time preserving the underlying positivity of the functions being interpolated.

2018

Y. He, M. Razi, C. Forestiere, L. Dal Negro, R.M. Kirby.
**“Uncertainty quantification guided robust design for nanoparticles' morphology,”** In *Computer Methods in Applied Mechanics and Engineering*, Elsevier BV, pp. 578-593. July, 2018.

DOI: 10.1016/j.cma.2018.03.027

The automatic inverse design of three-dimensional plasmonic nanoparticles enables scientists and engineers to explore a wide design space and to maximize a device's performance. However, due to the large uncertainty in the nanofabrication process, we may not be able to obtain a deterministic value of the objective, and the objective may vary dramatically with respect to a small variation in uncertain parameters. Therefore, we take into account the uncertainty in simulations and adopt a classical robust design model for a robust design. In addition, we propose an efficient numerical procedure for the robust design to reduce the computational cost of the process caused by the consideration of the uncertainty. Specifically, we use a global sensitivity analysis method to identify the important random variables and consider the non-important ones as deterministic, and consequently reduce the dimension of the stochastic space. In addition, we apply the generalized polynomial chaos expansion method for constructing computationally cheaper surrogate models to approximate and replace the full simulations. This efficient robust design procedure is performed by varying the particles' material among the most commonly used plasmonic materials such as gold, silver, and aluminum, to obtain different robust optimal shapes for the best enhancement of electric fields.

A. Jallepalli, J. Docampo-Sánchez, J.K. Ryan, R. Haimes, R.M. Kirby.
**“On the treatment of field quantities and elemental continuity in fem solutions,”** In *IEEE Transactions on Visualization and Computer Graphics*, Vol. 24, No. 1, IEEE, pp. 903-912. Jan, 2018.

DOI: 10.1109/tvcg.2017.2744058

As the finite element method (FEM) and the finite volume method (FVM), both traditional and high-order variants, continue their proliferation into various applied engineering disciplines, it is important that the visualization techniques and corresponding data analysis tools that act on the results produced by these methods faithfully represent the underlying data. To state this in another way: the interpretation of data generated by simulation needs to be consistent with the numerical schemes that underpin the specific solver technology. As the verifiable visualization literature has demonstrated: visual artifacts produced by the introduction of either explicit or implicit data transformations, such as data resampling, can sometimes distort or even obfuscate key scientific features in the data. In this paper, we focus on the handling of elemental continuity, which is often only C0 continuous or piecewise discontinuous, when visualizing primary or derived fields from FEM or FVM simulations. We demonstrate that traditional data handling and visualization of these fields introduce visual errors. In addition, we show how the use of the recently proposed line-SIAC filter provides a way of handling elemental continuity issues in an accuracy-conserving manner with the added benefit of casting the data in a smooth context even if the representation is element discontinuous.

V. Keshavarzzadeh, R.M. Kirby, A. Narayan.
**“Numerical integration in multiple dimensions with designed quadrature,”** In *CoRR*, 2018.

We present a systematic computational framework for generating positive quadrature rules in multiple dimensions on general geometries. A direct moment-matching formulation that enforces exact integration on polynomial subspaces yields nonlinear conditions and geometric constraints on nodes and weights. We use penalty methods to address the geometric constraints, and subsequently solve a quadratic minimization problem via the Gauss-Newton method. Our analysis provides guidance on requisite sizes of quadrature rules for a given polynomial subspace, and furnishes useful user-end stability bounds on error in the quadrature rule in the case when the polynomial moment conditions are violated by a small amount due to, e.g., finite precision limitations or stagnation of the optimization procedure. We present several numerical examples investigating optimal low-degree quadrature rules, Lebesgue constants, and 100-dimensional quadrature. Our capstone examples compare our quadrature approach to popular alternatives, such as sparse grids and quasi-Monte Carlo methods, for problems in linear elasticity and topology optimization.
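
To make the moment-matching conditions mentioned in this abstract concrete, here is a minimal 1D sketch that only evaluates the residual of those conditions for a given rule on [-1, 1] with unit weight function. It is not the paper's method: the multi-dimensional setting, the penalty terms, and the Gauss-Newton solve are all omitted. The function name `moment_residual` and the choice of a Legendre basis are assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial.legendre import legvander


def moment_residual(nodes, weights, degree):
    """Residual of sum_j w_j P_k(x_j) = integral of P_k over [-1, 1], k = 0..degree.

    P_k are Legendre polynomials, whose integrals over [-1, 1] are 2 for k = 0
    and 0 otherwise. A zero residual means the rule integrates every polynomial
    of degree <= `degree` exactly.
    """
    V = legvander(np.asarray(nodes, dtype=float), degree)  # V[j, k] = P_k(x_j)
    exact = np.zeros(degree + 1)
    exact[0] = 2.0
    return V.T @ np.asarray(weights, dtype=float) - exact


# Example: the two-point Gauss-Legendre rule satisfies the conditions up to degree 3
# and violates the degree-4 condition.
gauss_nodes = np.array([-1.0, 1.0]) / np.sqrt(3.0)
gauss_weights = np.array([1.0, 1.0])
r3 = moment_residual(gauss_nodes, gauss_weights, 3)
r4 = moment_residual(gauss_nodes, gauss_weights, 4)
```

A nonlinear least-squares solver would drive this residual toward zero by adjusting both nodes and weights; the abstract's stability bounds concern the case where the residual is small but not exactly zero.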

T. A. J. Ouermi, R. M. Kirby, M. Berzins.
**“Performance Optimization Strategies for WRF Physics Schemes Used in Weather Modeling,”** In *International Journal of Networking and Computing*, Vol. 8, No. 2, IJNC, pp. 301-327. 2018.

DOI: 10.15803/ijnc.8.2_301

Performance optimization in the petascale era and beyond in the exascale era has and will require modifications of legacy codes to take advantage of new architectures with large core counts and SIMD units. The Numerical Weather Prediction (NWP) physics codes considered here are optimized using thread-local structures of arrays (SOA). High-level and low-level optimization strategies are applied to the WRF Single-Moment 6-Class Microphysics Scheme (WSM6) and Global Forecast System (GFS) physics codes used in the NEPTUNE forecast code. By building on previous work optimizing WSM6 on the Intel Knights Landing (KNL), it is shown how to further optimize WSM6 and GFS physics, and GFS radiation on Intel KNL, Haswell, and potentially on future micro-architectures with many cores and SIMD vector units. The optimization techniques used herein employ thread-local structures of arrays (SOA), an OpenMP directive, OMP SIMD, and minor code transformations to enable better utilization of SIMD units, increase parallelism, improve locality, and reduce memory traffic. The optimized versions of WSM6, GFS physics, GFS radiation run 70×, 27×, and 23× faster (respectively) on KNL and 26×, 18× and 30× faster (respectively) on Haswell than their respective original serial versions. Although this work targets WRF physics schemes, the findings are transferable to other performance optimization contexts and provide insight into the optimization of codes with complex physical models for present and near-future architectures with many core and vector units.