1 Introduction
Uncovering dynamical models that explain physical phenomena and dynamic behaviors has been an active research area for centuries (for example, Isaac Newton developed his fundamental laws on the basis of measured data). When a model describing the underlying dynamics is available, it can be used for several engineering studies such as process design, optimization, prediction, and control. Conventional approaches based on physical laws and empirical knowledge are often used to derive dynamical models. However, this is intractable for many complex systems, e.g., the dynamics of the Arctic ice pack and sea ice, power grids, neuroscience, or finance, to name only a few applications. Data-driven methods to discover models have enormous potential to better understand transient behaviors in the latter cases. Furthermore, data acquired using imaging devices or sensors are contaminated with measurement noise. Therefore, systematic approaches that learn a dynamic model with proper treatment of noise are required. In this work, we discuss a deep learning-based approach to learn a dynamic model by attenuating noise with a Runge-Kutta scheme, thus allowing us to learn models quite accurately even when data are highly corrupted with measurement noise.
Data-driven methods to learn the governing equations of dynamic models have been studied for several decades, see, e.g., [juang1994applied, ljung1999system, billings2013nonlinear]. Learning linear models from input-output data goes back to Ho and Kalman [ho1966effective]. There have been several algorithmic developments for linear systems, for example, the eigensystem realization algorithm (ERA) [juang1985eigensystem, longman1989recursive], and Kalman filter-based approaches [juang1993identification, phan1993linear, phan1992identification]. Dynamic mode decomposition (DMD) has also emerged as a promising approach to construct models from input-output data and has been widely applied in fluid dynamics applications, see, e.g., [kalman1960new, schmid2010dynamic, tu2014dynamic]. Furthermore, there has been a series of developments to learn nonlinear dynamic models. This includes, for example, equation-free modeling [kevrekidis2003equation], nonlinear regression [voss1999amplitude], dynamic modeling [ye2015equation], and automated inference of dynamics [schmidt2011automated, daniels2015automated, daniels2015efficient]. Utilizing symbolic regression and an evolutionary algorithm
[bongard2007automated, schmidt2009distilling], learning compact nonlinear models becomes possible. Moreover, leveraging sparsity (also known as sparse regression), several approaches have been proposed [brunton2016sparse, mangan2016inferring, tran2017exact, schaeffer2020extracting, mangan2017model, morGoyB21a]. We also mention the work [raissi2018hidden] that learns models using Gaussian process regression. All these methods have particular approaches to handle noise in the data. For example, sparse regression methods, e.g., [brunton2016sparse, mangan2016inferring, morGoyB21a], often utilize smoothing methods before identifying models, and the work [raissi2018hidden] handles measurement noise by representing the data as a Gaussian process. Even though the aforementioned nonlinear modeling methods are appealing and powerful in providing analytic expressions for models, they are often built upon model hypotheses. For example, the success of sparse regression techniques relies on the fact that the nonlinear basis functions describing the dynamics lie in a library of candidate features. For many complex dynamics, such as the melting Arctic ice, the utilization of these methods is not trivial. Thus, machine learning techniques, particularly deep learning-based ones, have emerged as powerful methods capable of expressing any complex function in a black-box manner given enough training data. Neural network-based approaches in the context of dynamical systems have been discussed in [chen1990non, rico1993continuous, gonzalez1998identification, milano2002neural]
decades ago. A particular type of neural networks, namely recurrent neural networks, intrinsically models sequences and is often used for forecasting
[lu2018attractor, pan2018long, pathak2017using, pathak2018hybrid, vlachas2018data]. Deep learning is also utilized to identify a coordinate transformation so that the dynamics in the transformed coordinates are almost linear or sparse in a highdimensional feature basis, see, e.g., [lusch2018deep, takeishi2017learning, yeung2019learning, champion2019data]. Furthermore, we mention that classical numerical schemes are incorporated with feedforward neural networks to have discretetime steppers for predictions, see
[gonzalez1998identification, raissi2018multistep, raissi2019physics, raissi2020hidden]. The approaches in [gonzalez1998identification, raissi2018multistep] can be interpreted as nonlinear autoregressive models [billings2013nonlinear]. A crucial feature of deep learning-based approaches that integrate numerical integration schemes is that the vector field is estimated using neural networks, while time-stepping is done using a numerical integration scheme. However, measurement data are often corrupted with noise, and the mentioned approaches do not perform any specific noise treatment. The work in
[rudy2019deep] proposes a framework that explicitly incorporates the noise into a numerical time-stepping method. Though the approach has shown promising directions, its scalability remains unclear since the approach needs explicit noise estimates and aims to decompose the signal explicitly into noise and ground truth. Our work introduces a framework to learn dynamic models from noisy and sparse measurements by innovatively blending deep learning with numerical integration methods. Precisely, we aim at learning two networks: one that implicitly represents the given measurement data, and a second that approximates the vector field; we connect these two networks by enforcing a numerical integration scheme, as depicted in Figure 1.1. The appeal of the approach is that we do not require an explicit estimate of the noise to learn a model. Furthermore, the approach is applicable even if the dependent variables are sampled on different time grids. The remaining structure of the paper is as follows. In Section 2, we present our deep learning-based framework for learning dynamics from noisy measurements by combining two networks. One of these networks implicitly represents the measurement data, and the other one approximates the vector field. These two networks are then connected by enforcing a numerical integration scheme. We briefly discuss suitable neural network architectures for our framework in Section 4. In the subsequent section, we demonstrate the effectiveness of the proposed methodology using various synthetic data sets, describing various physical phenomena, with increasing levels of noise. We conclude the paper with a summary and future research directions.
2 Learning Dynamical Models using Deep Learning Constrained by a Runge-Kutta Scheme
Data-driven methods to learn dynamic models have flourished significantly in the last couple of decades. For these methods, the quality of the measurement data plays a significant role in ensuring the accuracy of the learned models. When dealing with real-world measurements, sensor noise in the collected data is inevitable. Thus, before employing any data-driven method, denoising the data is a vital step and is typically done using classical methods, e.g., smoothing techniques or moving averages, or the noise is explicitly estimated along with the dynamics, which imposes a challenge in a large-scale setting. In this section, we discuss our framework to learn dynamic models from noisy measurements without explicitly estimating the noise. To achieve this goal, we utilize the powerful approximation capabilities of deep neural networks and their automatic differentiation feature together with a numerical integration scheme. In this work, we focus on the fourth-order Runge-Kutta (RK4) scheme; however, the framework is flexible enough to use any other numerical integration scheme, including higher-order Runge-Kutta schemes. Before we proceed further, we briefly outline the RK4 scheme. For this, let us consider an autonomous nonlinear differential equation:
(2.1)  $\dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t)), \qquad \mathbf{x}(t_0) = \mathbf{x}_0,$
where $\mathbf{x}(t) \in \mathbb{R}^n$ denotes the solution at time $t$, and the continuous function $\mathbf{f}: \mathbb{R}^n \rightarrow \mathbb{R}^n$ defines the vector field. Furthermore, the solution at time $t_{k+1}$ can be explicitly given in terms of the solution at time $t_k$ as follows:
(2.2)  $\mathbf{x}(t_{k+1}) = \mathbf{x}(t_k) + \int_{t_k}^{t_{k+1}} \mathbf{f}(\mathbf{x}(\tau))\,\mathrm{d}\tau.$
Furthermore, we approximate the integral term using the RK4 scheme, which can be determined by a weighted sum of the vector field computed at specific locations as follows:
(2.3)  $\int_{t_k}^{t_{k+1}} \mathbf{f}(\mathbf{x}(\tau))\,\mathrm{d}\tau \approx \dfrac{h_k}{6}\left(\mathbf{k}_1 + 2\mathbf{k}_2 + 2\mathbf{k}_3 + \mathbf{k}_4\right),$
where $h_k = t_{k+1} - t_k$, and
$\mathbf{k}_1 = \mathbf{f}\left(\mathbf{x}(t_k)\right), \quad \mathbf{k}_2 = \mathbf{f}\left(\mathbf{x}(t_k) + \tfrac{h_k}{2}\mathbf{k}_1\right), \quad \mathbf{k}_3 = \mathbf{f}\left(\mathbf{x}(t_k) + \tfrac{h_k}{2}\mathbf{k}_2\right), \quad \mathbf{k}_4 = \mathbf{f}\left(\mathbf{x}(t_k) + h_k\mathbf{k}_3\right).$
Consequently, we can write
(2.4)  $\mathbf{x}(t_{k+1}) \approx \mathbf{x}(t_k) + \dfrac{h_k}{6}\left(\mathbf{k}_1 + 2\mathbf{k}_2 + 2\mathbf{k}_3 + \mathbf{k}_4\right) =: \mathrm{RK4}\left(\mathbf{f}, \mathbf{x}(t_k), h_k\right).$
In what follows, we assume that the ground-truth (or denoised) sequence $\{\mathbf{x}(t_k)\}$ approximately follows the RK4 steps. We emphasize that the information of the vector field at $\mathbf{x}(t_k)$ is directly utilized in the RK4 scheme.
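For illustration, a single RK4 step as in (2.4) can be sketched as follows (a minimal NumPy implementation; the function and variable names are ours):

```python
import numpy as np

def rk4_step(f, x, h):
    """One step of the classical fourth-order Runge-Kutta scheme,
    cf. Eq. (2.4), for an autonomous system x' = f(x)."""
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
```

For instance, integrating $\dot{x} = -x$ from $x_0 = 1$ with ten steps of size $h = 0.1$ approximates $e^{-1}$ to roughly fourth-order accuracy.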
Having described the RK4 scheme, we are now ready to discuss our framework to learn dynamical models from noisy measurements by blending deep neural networks with the RK4 scheme. The approach involves two networks. The first network implicitly represents the variable $\mathbf{x}$, as shown in Figure 1.1(b), and the second network approximates the vector field, or the function $\mathbf{f}$. These two networks are connected by enforcing the RK4 constraints. That is, the output of the implicit network is not only in the vicinity of the measurement data but also approximately follows the RK4 scheme, as depicted in Figure 1.1(c). To make things mathematically precise, let us denote the noisy measurement data at time $t_k$ by $\mathbf{y}(t_k)$. Furthermore, we consider a feedforward neural network, denoted by $\mathcal{N}_x$ and parameterized by $\Theta_x$, that approximately yields an implicit representation of the measurement data, i.e.,
(2.5)  $\hat{\mathbf{x}}(t_k) := \mathcal{N}_x(t_k; \Theta_x) \approx \mathbf{y}(t_k),$
where $k \in \{1, \ldots, N\}$ with $N$ being the total number of measurements. Additionally, let us denote another neural network by $\mathcal{N}_f$, parameterized by $\Theta_f$, that approximates the vector field $\mathbf{f}$. We connect these two networks by enforcing the output of the network $\mathcal{N}_x$ to respect the RK4 scheme, i.e.,
(2.6)  $\hat{\mathbf{x}}(t_{k+1}) \approx \mathrm{RK4}\left(\mathcal{N}_f, \hat{\mathbf{x}}(t_k), h_k\right).$
As a result, our goal becomes to determine the network parameters $\Theta_x$ and $\Theta_f$ such that the following loss is minimized:
(2.7)  $\mathcal{L}(\Theta_x, \Theta_f) = \mathcal{L}_{\mathbf{y}} + \lambda_1\,\mathcal{L}_{\mathrm{RK4}} + \lambda_2\,\mathcal{L}_{\mathrm{grad}},$
where $\mathcal{L}_{\mathbf{y}}$ denotes the root mean square error between the output of the implicit network and the noisy measurements, i.e.,
(2.8)  $\mathcal{L}_{\mathbf{y}} = \sqrt{\dfrac{1}{N}\sum_{k=1}^{N} \left\|\mathcal{N}_x(t_k; \Theta_x) - \mathbf{y}(t_k)\right\|^2}.$
The loss $\mathcal{L}_{\mathbf{y}}$ enforces the measurement data to be close to the output of the implicit network.

The term $\mathcal{L}_{\mathrm{RK4}}$ links the two networks by the RK4 scheme. Precisely, the term penalizes the mismatch between $\hat{\mathbf{x}}(t_{k+1})$ and $\mathrm{RK4}\left(\mathcal{N}_f, \hat{\mathbf{x}}(t_k), h_k\right)$, i.e.,
(2.9)  $\mathcal{L}_{\mathrm{RK4}} = \dfrac{1}{N-1}\sum_{k=1}^{N-1} \left\|\hat{\mathbf{x}}(t_{k+1}) - \mathrm{RK4}\left(\mathcal{N}_f, \hat{\mathbf{x}}(t_k), h_k\right)\right\|^2,$
and the parameter $\lambda_1$ defines its weight in the total loss.

The vector field at the output of the implicit network can be computed directly using automatic differentiation, and it can also be computed using the network $\mathcal{N}_f$. The term $\mathcal{L}_{\mathrm{grad}}$ penalizes the mismatch between the two as follows:
(2.10)  $\mathcal{L}_{\mathrm{grad}} = \dfrac{1}{N}\sum_{k=1}^{N} \left\|\dfrac{\mathrm{d}}{\mathrm{d}t}\mathcal{N}_x(t; \Theta_x)\Big|_{t=t_k} - \mathcal{N}_f\left(\hat{\mathbf{x}}(t_k); \Theta_f\right)\right\|^2,$
and $\lambda_2$ is its corresponding regularization parameter.
The total loss can be minimized using a gradient-based optimizer such as Adam [kingma2014adam]. Once the networks are trained and the parameters minimizing the loss have been found, we can generate the denoised variables using the implicit network $\mathcal{N}_x$ and the vector field using the network $\mathcal{N}_f$. Note that due to the implicit nature of the network $\mathcal{N}_x$, the measurement data can be given at variable time steps, and we can estimate the solution at any arbitrary time. Moreover, we also obtain the network $\mathcal{N}_f$ that approximately provides the vector field at a given state; hence, one can use it to make predictions.
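The structure of the total loss (2.7) can be sketched as follows. This is a NumPy illustration with our own function and argument names; in practice, the two networks are differentiable modules, and the time derivative of the implicit network is obtained via automatic differentiation rather than passed in as an array.

```python
import numpy as np

def rk4_step(f, x, h):
    # One RK4 step, cf. Eq. (2.4), for an autonomous field f.
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def total_loss(f, x_hat, y, dxdt_hat, t, lam1=1.0, lam2=1.0):
    """Sketch of the total loss in Eq. (2.7), assuming:
    x_hat    -- outputs of the implicit network at the times t, shape (N, n)
    y        -- noisy measurements at the times t, shape (N, n)
    f        -- the vector-field network, as a callable x -> f(x)
    dxdt_hat -- time derivatives of the implicit network at the times t
    lam1/lam2 -- the weights of the RK4 and gradient-matching terms."""
    N = len(t)
    # Data-fidelity term (Eq. (2.8)): RMSE between implicit output and data.
    loss_y = np.sqrt(np.mean(np.sum((x_hat - y) ** 2, axis=1)))
    # RK4 term (Eq. (2.9)): consecutive outputs should match one RK4 step.
    loss_rk4 = np.mean([
        np.sum((x_hat[k + 1] - rk4_step(f, x_hat[k], t[k + 1] - t[k])) ** 2)
        for k in range(N - 1)])
    # Gradient term (Eq. (2.10)): derivative of implicit network vs. field.
    f_vals = np.array([f(xk) for xk in x_hat])
    loss_grad = np.mean(np.sum((dxdt_hat - f_vals) ** 2, axis=1))
    return loss_y + lam1 * loss_rk4 + lam2 * loss_grad
```

If `x_hat` samples an exact trajectory of `f` and matches the data and its derivative, all three terms are (numerically) close to zero, which is the fixed point the training drives toward.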
3 Possible Extensions of the Approach
In many instances, dynamical processes involve system parameters, and by varying them, the processes exhibit different dynamics. Also, on several occasions, the dynamics are governed by underlying partial differential equations. In this section, we briefly discuss extensions of the proposed approach to these two cases.
3.1 Parametric models
The approach discussed in the previous section readily extends to parametric cases. Let us consider a parametric differential equation as follows:
(3.1)  $\dot{\mathbf{x}}(t) = \mathbf{f}\left(\mathbf{x}(t), \boldsymbol{\mu}\right),$
where $\boldsymbol{\mu}$ is the system parameter. To handle the parameter $\boldsymbol{\mu}$, we can simply take it as an additional input to the implicit network, which then yields the dependent variables at a given time and parameter. Furthermore, to learn the function $\mathbf{f}$, we take the parameter $\boldsymbol{\mu}$ as an input as well, along with $\mathbf{x}$, to obtain a parameterized dynamical model that predicts the vector field at a given state and parameter.
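The parametric case reduces to the machinery of Section 2 once the parameter is fixed, as the following sketch shows. The concrete vector field (a linear oscillator with damping $\mu$) is a hypothetical example of ours, chosen only for illustration:

```python
import numpy as np

# A hypothetical parametric vector field as in Eq. (3.1): a linear
# oscillator whose damping mu plays the role of the system parameter.
def f(x, mu):
    return np.array([x[1], -x[0] - mu * x[1]])

def freeze(mu):
    """Fix the parameter mu to recover an autonomous field x -> f(x, mu),
    to which the RK4-constrained learning of Section 2 applies unchanged;
    in the learned model, mu is instead simply an additional input to
    both the implicit and the vector-field networks."""
    return lambda x: f(x, mu)
```

For example, `freeze(0.5)` and `freeze(2.0)` are two different autonomous systems sharing one parameterized model.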
3.2 Partial differential equations
In many cases, for example, the dynamics of flows, the dynamical behavior is governed by partial differential equations; thus, the dependent variable is highly influenced by its neighbors. In such a case, we construct an implicit representation for the measurement data such that the implicit network takes the time $t$ and the spatial coordinates as inputs and yields the dependent variable. Then, we assemble a vector containing the values of the dependent variable at user-specified spatial locations. This can be used to learn a dynamic model that describes the dynamics at these spatial locations. Consequently, with these alterations, one can employ the approach discussed in the previous section. The strength of the approach is that the collected measurement data can be at arbitrary spatial locations. These locations can also vary with time since we construct an implicit network that is independent of any structure in the collected measurements.
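The key point, that the implicit representation can be queried at any spatial locations, can be sketched as follows; the travelling-wave stand-in for the trained network is an illustrative assumption:

```python
import numpy as np

def u(t, x):
    """Stand-in for the trained implicit network: any callable of time t
    and spatial coordinate(s) x; here a travelling wave for illustration."""
    return np.sin(x - t)

def state_on_grid(u, t, x_locs):
    """Evaluate the implicit representation at user-specified spatial
    locations to assemble the dependent-variable vector at time t; the
    locations need not coincide with the measurement grid and may even
    change from one time instance to the next."""
    return u(t, np.asarray(x_locs, dtype=float))
```

The resulting vector of values at the chosen locations is what the vector-field network acts on in the PDE setting.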
4 Suitable Neural Network Architectures
Here, we briefly discuss neural network architectures suitable for our proposed approach. We require two neural networks for our framework: one for learning the implicit representation and a second one for learning the vector field. For the implicit representation, we use a fully connected multilayer perceptron (MLP), as depicted in Figure 4.1(a), with periodic activation functions (e.g., $\sin$) [sitzmann2020implicit], which have shown the ability to capture finely detailed features as well as the gradients of a function. To approximate the vector field, we consider two possibilities depending on the application. If the data do not have any spatial dependency, then we consider a simple residual-type network, as illustrated in Figure 4.1(b), with the exponential linear unit (ELU) as activation function [clevert2015fast]. We choose the ELU activation function since it is continuous and differentiable and resembles a widely used activation function, namely the rectified linear unit (ReLU). On the other hand, when the data have spatial correlations, e.g., when the dynamics in the data are governed by a partial differential equation, then it is more intuitive to use a convolutional neural network (CNN) with residual connections, as depicted in
Figure 4.1(c). It explicitly makes use of the spatial correlation. For the CNN, we also employ batch normalization [ioffe2015batch] after each convolution step for a better distribution of the input to the next layer and use ELU as activation function.

5 Numerical Experiments
In this section, we investigate the performance of the approach discussed in Section 2 to denoise measurement data as well as to learn a model for estimating the vector field. To that aim, we consider data obtained by solving several (partial) differential equations, which are then corrupted with white Gaussian noise of varying noise level. For a given percentage of noise, we determine the noise as follows:
(5.1) 
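A typical construction of such percentage noise can be sketched as follows. The scaling convention below, which ties the noise standard deviation to the standard deviation of the clean data, is an assumption on our part; Eq. (5.1) fixes the exact definition used in the experiments:

```python
import numpy as np

def add_noise(x, percent, rng):
    """Corrupt clean data with zero-mean white Gaussian noise whose
    standard deviation is the given percentage of the standard deviation
    of the clean data (an assumed, commonly used convention)."""
    sigma = (percent / 100.0) * np.std(x)
    return x + rng.normal(0.0, sigma, size=x.shape)
```

With this convention, a "10% noise" setting yields additive noise whose empirical standard deviation is close to one tenth of that of the clean signal.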
We have implemented our framework using the deep learning library PyTorch
[paszke2019pytorch] and have optimized all networks together using the Adam optimizer [kingma2014adam]. Furthermore, to train the implicit networks, we map the input data to $[-1, 1]$, as recommended in [sitzmann2020implicit]. Additionally, to avoid overfitting, we add regularization (also referred to as weight decay) of the parameters of the networks and set the regularization parameter to the same value for all examples. All the networks are trained for a fixed number of epochs and batch size, and the learning rates used to train the networks are stated for each example in the respective subsections. We have run all our experiments on an A100 GPU.

Table 5.1: Network configurations for the first two examples.

Example           | Networks                       | Neurons | Layers | Learning rates
FHN               | For implicit representation    | 20      | 4      |
                  | For approximating vector field | 20      | 4      |
Cubic oscillator  | For implicit representation    | 20      | 4      |
                  | For approximating vector field | 20      | 4      |
5.1 FitzHugh–Nagumo model
In the first example, we discuss the FitzHugh–Nagumo (FHN) model, which explains neural dynamics in a simplistic way [fitzhugh1955mathematical]. It has also been used as a test case for discovering models using dictionary-based sparse regression [morGoyB21a]. The dynamics are given as follows:
(5.2)  $\dot{v}(t) = v(t) - \dfrac{v(t)^3}{3} - w(t) + I_{\mathrm{ext}}, \qquad \dot{w}(t) = \varepsilon\left(v(t) + a - b\,w(t)\right),$
where $v$ and $w$ describe the dynamics of activation and deactivation of neurons, and $\varepsilon$, $a$, $b$, and $I_{\mathrm{ext}}$ are constants. We collect measurements in time at a regular interval by simulating the model for a fixed initial condition. We then corrupt the data artificially by adding various levels of noise. We build two networks with the information provided in Table 5.1 and set both parameters $\lambda_1$ and $\lambda_2$ in the loss function (2.7) to the same value. Having trained the networks, we obtain denoised measurement data using the implicit network and estimate the vector field using the second neural network. The results are shown in Figure 5.1. The figure demonstrates the robustness of the approach with respect to noise. The method can recover data very close to the clean data (see the first two columns of the figure) even when the measurements are corrupted with relatively significant noise. Furthermore, the vector field is estimated quite accurately, at least in the regime of the collected measurements; see the third and fourth columns of the figure. However, as expected, the vector field estimates are inadequate away from the measurements, thus showing the limitation in extrapolating to regimes where no data are available. Nevertheless, this can be improved by collecting more measurements in a different regime by varying the initial conditions.
5.2 Cubic damped model
In the second example, we consider a damped cubic system, which is described by
(5.3)  $\dot{x}(t) = -0.1\,x(t)^3 + 2\,y(t)^3, \qquad \dot{y}(t) = -2\,x(t)^3 - 0.1\,y(t)^3.$
It has been one of the benchmark examples for discovering models from data, see, e.g., [brunton2016discovering, morGoyB21a], but there, it is assumed that the dynamics can be represented sparsely in a high-dimensional feature dictionary. Here, we do not make any such assumption and instead learn the vector field using a neural network, along the lines of [rudy2019data]. For this example, we collect data points at a regular time interval by simulating the model for a fixed initial condition, as done in [rudy2019data]. We synthetically add various levels of noise to the clean data to obtain noisy measurements. We again perform experiments similar to those in the previous example and construct neural networks for the implicit representation and the vector field with the parameters given in Table 5.1.
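The generation of such clean training data can be sketched as follows. The cubic coefficients below follow the common benchmark choice in the sparse-regression literature and are an assumption here; Eq. (5.3) fixes the exact system:

```python
import numpy as np

# Cubic damped system in its common benchmark form; the coefficients
# are an assumed standard choice (cf. [brunton2016discovering]).
def cubic(z):
    x, y = z
    return np.array([-0.1 * x**3 + 2.0 * y**3,
                     -2.0 * x**3 - 0.1 * y**3])

def rk4_step(f, x, h):
    # One RK4 step, cf. Eq. (2.4).
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def simulate(x0, h, steps):
    """Generate clean snapshots on a uniform time grid; these are then
    corrupted with synthetic noise to form the measurement data."""
    traj = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        traj.append(rk4_step(cubic, traj[-1], h))
    return np.array(traj)
```

For this system, $\frac{\mathrm{d}}{\mathrm{d}t}(x^4 + y^4) = -0.4\,(x^6 + y^6) \le 0$, so simulated trajectories decay toward the origin, which is the damped behavior the learned vector field should reproduce.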
Having trained the networks with the parameters $\lambda_1$ and $\lambda_2$ in the loss function (2.7), we obtain an implicit network yielding the denoised signal and a neural network approximating the vector field. We plot the results in Figure 5.2, where we show noisy, clean, and denoised data in the first two columns; in the third and fourth columns, we plot the streamlines of the vector field obtained using the trained neural network. We observe that the denoised data faithfully match the clean data even for a high noise level, and the vector field is also close to the ground truth, at least in the region where the measurement data are sampled. In the region where no data are available, the vector field approximation is poor, as one can expect. However, richer data covering a larger training regime can improve the performance of the neural network approximating the vector field.
5.3 Burgers equation
Next, we examine a case where the collected measurements have spatial correlation as well, meaning there is an underlying partial differential equation describing the dynamics. Here, we consider the 1D viscous Burgers' equation. It explains several phenomena occurring in fluid dynamics and is governed by
(5.4)  $v_t(t,x) = -\,v(t,x)\,v_x(t,x) + \nu\,v_{xx}(t,x),$
where $\nu$ is the viscosity; $v_x$ and $v_{xx}$ denote the first and second derivatives with respect to the spatial variable $x$, and the equation is also subject to a boundary condition. We have taken the data from [rudy2017data]
, followed by artificially corrupting them using various levels of Gaussian white noise. In brief, the measurements are collected on a spatial grid in a fixed domain and over a fixed time interval. For more details, we refer to [rudy2017data].

Table 5.2: Network configurations for the Burgers' and Kuramoto–Sivashinsky examples.

Example               | Networks                       | Neurons | Layers | Learning rates
Burgers example       | For implicit representation    | 10      | 4      |
                      | For approximating vector field | 8       | 4      |
Kuramoto–Sivashinsky  | For implicit representation    | 50      | 4      |
                      | For approximating vector field | 16      | 4      |
Since the data have spatial correlations, we make use of a convolutional neural network to learn the vector field, instead of a classical MLP, as shown in Figure 4.1. Thus, we build an MLP for the implicit representation and a CNN with the details given in Table 5.2. Once we have trained the networks, we plot in Figure 5.3 the performance of the proposed approach in denoising the spatio-temporal data for an increasing level of noise. We observe that the proposed methodology is able to recover the data faithfully even with significant noise in the data. Furthermore, in the last columns of Figure 5.3, we observe the approximation capability of the convolutional neural network for the vector field. We observe that the model predicts the vector field with good accuracy. We note that the vector field of the clean data is estimated using a finite-difference scheme on the clean data since the true function is not known to us.
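The reference vector field from clean snapshot data can be obtained with such a finite-difference scheme; a minimal sketch using central differences in time (the function name is ours):

```python
import numpy as np

def fd_vector_field(X, dt):
    """Approximate time derivatives of snapshot data X (time along the
    first axis) by second-order central differences, with one-sided
    differences at the boundaries; used only to obtain a reference
    vector field from clean data when the true right-hand side is
    unknown."""
    return np.gradient(X, dt, axis=0)
```

For smooth data on a sufficiently fine time grid, the interior error decays quadratically with the step size, which is adequate for the visual comparisons in the figures.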
5.4 Kuramoto–Sivashinsky equation
In our last test case, we take data of chaotic dynamics obtained by simulating the Kuramoto–Sivashinsky equation, which is of the form
(5.5)  $u_t = -\,u\,u_x - u_{xx} - u_{xxxx},$
where $u_x$, $u_{xx}$, and $u_{xxxx}$ denote the first, second, and fourth derivatives with respect to the spatial variable $x$. The equation explains several physical phenomena such as instabilities of dissipative trapped-ion modes in plasmas or fluctuations in fluid films, see, e.g., [kuramoto1978diffusion]. We again use the data provided in [rudy2017data], which are simulated using a spectral method on a spatial grid over a number of time steps. Since the dynamics present in the data are very rich, complex, and chaotic, we require networks that are more expressive than in the previous example; the details about the networks are provided in Table 5.2.
In Figure 5.4, we report the ability of our method to remove the noise from the spatio-temporal data. We observe that the proposed methodology removes the noise from the data very effectively. Also, the vector field is approximated very well using the learned CNN (see the last column of the figure); the vector field of the clean data is computed using a finite-difference method. We draw particular attention to the last row of Figure 5.4: the algorithm recovers several minor details that are damaged due to the presence of high-level noise.
6 Discussion
In this work, we have presented a new paradigm for learning dynamical models from highly noisy (spatio-)temporal measurement data. Our framework blends the powerful approximation capabilities of deep neural networks with a numerical integration scheme, namely the fourth-order Runge-Kutta scheme. The proposed scheme involves two networks that learn an implicit representation of the measurement data and of the vector field, respectively. These networks are combined by enforcing that the output of the implicit network respects the integration scheme. Furthermore, we highlight that the proposed approach can readily handle arbitrarily sampled points in space and time. In fact, the dependent variables need not be collected at the same time instances or at the same locations. This is because we first construct an implicit representation of the data that does not require the data to have a particular structure.
We note that the approach becomes computationally expensive when the spatial dimension increases. Indeed, it becomes impracticable when the data are collected in 2D or 3D space. A large system parameter space imposes additional challenges. However, we know that the dynamics often lie on a low-dimensional manifold. Therefore, in our future work, we aim to utilize the concept of low-dimensional embeddings to make learning computationally more efficient. Furthermore, we learn a dynamic model as a black-box neural network; hence, interpretability and generalizability remain limited. In the future, it could be interesting to combine the denoised data with sparse or symbolic regression, as, e.g., in [rudy2017data, cranmer2020discovering, both2021deepmod], to obtain an analytic expression for the (partial) differential equations explaining the data.