Deep learning wavefront sensing and aberration correction in atmospheric turbulence

Deep learning neural networks are used for wavefront sensing and aberration correction in atmospheric turbulence without any wavefront sensor, i.e., the wavefront aberration phase is reconstructed from the distorted image of the object. We compared two reconstruction approaches and characterized each: (i) direct reconstruction of the aberration phase; (ii) indirect reconstruction, in which the Zernike coefficients are reconstructed and then used to calculate the aberration phase. We verified the generalization ability and performance of the network for a single object and for multiple objects. Furthermore, we verified the correction effect in a turbulence pool and the feasibility of the method in a real atmospheric turbulence environment.


Introduction
In general, the wavefront aberrations induced by fluids (such as atmospheric turbulence) or biological tissues in an imaging system can be corrected by a deformable mirror (DM) or a spatial light modulator (SLM) [1]. There are two types of methods for obtaining the appropriate DM or SLM control signal: optimization methods and wavefront sensing methods. The former searches for the control signal with a stochastic, local or global search algorithm [2], which is time-consuming because it requires a large number of iterations and measurements. The latter measures the wavefront distortion with a wavefront sensor (such as a Hartmann-Shack sensor) to guide the control signal of the DM or SLM [3], which suffers from costly optical elements, multiple measurements and strict calibration requirements. In an imaging system without wavefront aberration, the object is imaged clearly. When atmospheric turbulence or other wavefront-distorting media are present in the imaging path, the image of the object is distorted. Different wavefront aberrations lead to different image distortions, which means that there is a mapping relationship between them. Supervised deep learning has played an important role in computer vision [4,5]. For example, convolutional neural networks are used for classification and recognition [6,7], i.e., learning the mapping from images to categories and locations; encoder-decoder neural networks are used for semantic segmentation [8,9], i.e., learning the mapping from an image to the category of each pixel. It is natural to ask: can a deep learning neural network learn the mapping from image distortion to wavefront aberration?
In fact, deep learning has become a powerful tool for solving various inverse problems in computational imaging by learning the corresponding mapping relationship, such as digital holography (from hologram to the phase and intensity images of objects) [10,11], phase unwrapping (from wrapped phase to absolute phase) [12,13], and imaging through scattering media (from speckle pattern to object image) [14,15]. In addition, the phase distribution of an object can be restored directly from a single intensity image by a deep learning neural network [16,17]. Similarly, deep learning neural networks have been used to reconstruct the wavefront aberration phase [18,19] or its Zernike coefficients [20-28] from the distorted intensity image, an approach called deep learning wavefront sensing. As an end-to-end method, deep learning wavefront sensing requires only a camera and no traditional wavefront sensor, which is of great significance in free-space optical communications, astronomical observations and laser weapons. However, these works [18-28] have two limitations. First, each studied only one of the two ways, either aberration phase reconstruction or Zernike coefficient reconstruction. Second, none included real-environment experiments: they were either purely numerical simulations [18,20,21,25-27] or used an SLM or lens movement to simulate aberration phases [19,22-24,28].
In this paper, we use a deep learning neural network to test the generalization ability in the single-object and multiple-object cases, and compare the performance of using the wavefront aberration phase or its corresponding Zernike coefficients as the ground truth (GT) in simple and complex cases. Furthermore, we verify the correction effect in a turbulence pool and the feasibility of the method in real atmospheric turbulence.

Method
As shown in Fig. 1(a), atmospheric turbulence induces a wavefront aberration ϕ(x, y) into the object field O(x, y), where x and y are the transverse spatial coordinates. The distorted intensity distribution I(x, y) is then given by

$$I(x, y) = \left| \mathrm{FT}\{ O(x, y) \exp[i\phi(x, y)] \} \right|^2, \qquad (1)$$

where FT{·} represents the Fourier transform. That is, there exists a mapping relationship among the wavefront aberration, the object field and the intensity distribution:

$$\phi(x, y) = f\{ I(x, y), O(x, y) \}. \qquad (2)$$

A deep learning neural network can learn this mapping from a large dataset and reconstruct the wavefront aberration phase (or its Zernike coefficients) from the intensity distribution, as indicated by the red part of Fig. 1; this is the main focus of this paper. A DM or SLM can then correct the wavefront aberration under the guidance of the network output, as shown in Fig. 1.
The convolutional neural network (CNN) architectures, illustrated in Fig. 2, are inspired by U-Net [8], the Residual block [29] and the Inception module [30]. CNN1 consists of an encoding path (left), a decoding path (right) and a bridge path (middle). The encoding and decoding paths each contain four Residual blocks; each Residual block in the encoding path is followed by max pooling for downsampling, and each Residual block in the decoding path is preceded by a transposed convolution for upsampling. CNN2 consists of an encoding path and two fully connected layers. The numbers in Fig. 2(a) and (b) are the number of channels in each convolutional layer and the number of neurons in each fully connected layer. We incorporate the idea of the Inception module into the Residual block, as shown in Fig. 2(c), where the four right paths use one, two, three and four 3 × 3 convolutions, respectively, to extract features at different scales more effectively. The concatenations in Fig. 2(a) pass gradients and information across the network to improve the convergence speed, while the concatenations in Fig. 2(c) merge the features from different scales. CNN1 and CNN2 reconstruct the wavefront aberration phase and its Zernike coefficients, respectively, from the intensity distribution. CNN2 has approximately half as many parameters as CNN1. All networks are trained with the adaptive moment estimation (ADAM) optimizer. The batch size is 64 and the initial learning rate is 0.01, reduced by 75 % after each epoch as long as it is greater than 10⁻⁷. Training runs for 200 epochs on 10,000 data pairs. The L2-norm loss and the cross-entropy loss are used for CNN1 and CNN2, respectively.
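The learning-rate rule above is compact, so the sketch below spells out one reading of it, which is an assumption: at the end of each epoch the rate is multiplied by 0.25 (a 75 % drop) whenever it is still above 10⁻⁷. Under that reading the rate reaches its floor within the first ten of the 200 epochs. The helper `learning_rate_schedule` is hypothetical; in PyTorch such a list could drive training, for example by returning `rates[epoch] / 0.01` from the `lr_lambda` of `torch.optim.lr_scheduler.LambdaLR`.

```python
def learning_rate_schedule(initial_lr=0.01, drop=0.75, threshold=1e-7, epochs=200):
    """Return the learning rate used in each epoch under an assumed per-epoch decay rule."""
    rates = []
    lr = initial_lr
    for _ in range(epochs):
        rates.append(lr)
        # Assumption: "75 % drop" means multiplying by (1 - 0.75) = 0.25,
        # applied only while the current rate is still above the threshold.
        if lr > threshold:
            lr *= 1.0 - drop
    return rates


rates = learning_rate_schedule()
print(len(rates), rates[0], rates[1], rates[-1])
```

Because the drop is conditional, the rate stops decreasing once it first falls below 10⁻⁷ and stays constant for the remaining epochs.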
All networks are implemented in PyTorch 1.0 with Python 3.6.1 and run on a PC with a Core i7-8700K CPU (3.8 GHz), 16 GB of RAM and an NVIDIA GeForce GTX 1080Ti GPU. The training time is about 6 h for CNN1 and 4 h for CNN2, while the testing time is about 0.05 s for CNN1 and 0.04 s for CNN2.
Three metrics are used to evaluate the accuracy of the neural networks:
i. SSIM: Structural similarity index.
ii. RMSE: Root mean square error.
iii. MAE: Mean absolute error (and its percentage of the aberration phase range).
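The paper does not state how SSIM is computed. The sketch below implements RMSE, MAE and the MAE percentage of the GT phase range directly and, for brevity, a single-window (global) SSIM with the usual constants K1 = 0.01 and K2 = 0.03; the standard windowed SSIM is available, for example, as `skimage.metrics.structural_similarity`. All function names are hypothetical.

```python
import numpy as np


def rmse(pred, gt):
    """Root mean square error between predicted and ground-truth phase maps."""
    return float(np.sqrt(np.mean((pred - gt) ** 2)))


def mae(pred, gt):
    """Mean absolute error and its percentage of the ground-truth phase range."""
    err = float(np.mean(np.abs(pred - gt)))
    phase_range = float(gt.max() - gt.min())
    return err, 100.0 * err / phase_range


def global_ssim(pred, gt, data_range=None):
    """Single-window SSIM over the whole map (a simplification of windowed SSIM)."""
    if data_range is None:
        data_range = float(gt.max() - gt.min())
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = pred.mean(), gt.mean()
    var_x, var_y = pred.var(), gt.var()
    cov = np.mean((pred - mu_x) * (gt - mu_y))
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)


# Usage on synthetic maps (illustrative only):
rng = np.random.default_rng(0)
gt_demo = rng.uniform(-20.0, 20.0, size=(64, 64))
pred_demo = gt_demo + rng.normal(0.0, 1.0, size=gt_demo.shape)
print(global_ssim(pred_demo, gt_demo), rmse(pred_demo, gt_demo), mae(pred_demo, gt_demo))
```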

Simulation
In the simulation, the samples are distorted by wavefront aberration phases generated from Zernike polynomials with the 2nd to 15th coefficients. The coefficients are randomly set within the range [-5, 5]. The aberration phases (or Zernike coefficients) and the corresponding distorted images are used as the GT and the input, respectively. For the mapping relationship in Eq. (2), there are two cases: the same object with different wavefront aberration phases (single object), or different objects with different wavefront aberration phases (multiple objects). It is therefore necessary to compare the performance of the network in these two cases.
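A minimal sketch of the aberration generator, assuming that the 2nd to 15th coefficients are Noll indices 2 to 15 (14 coefficients, consistent with M = 14 for the simple case), that the coefficients are drawn uniformly from [-5, 5], and that the phase is defined on a unit-disk pupil sampled on a square grid. The Noll-index conversion and the radial-polynomial sum follow the standard definitions; the grid size and the helper names are hypothetical.

```python
import math

import numpy as np


def noll_to_nm(j):
    """Convert a Noll index j (starting at 1) to radial order n and azimuthal frequency m."""
    n = int((-1 + math.sqrt(8 * (j - 1) + 1)) / 2)
    p = j - n * (n + 1) // 2
    k = n % 2
    m = ((p + k) // 2) * 2 - k
    if m != 0 and j % 2 == 1:
        m = -m  # odd Noll indices carry the sine terms
    return n, m


def zernike(j, rho, theta):
    """Noll-normalized Zernike polynomial Z_j evaluated at polar coordinates (rho, theta)."""
    n, m = noll_to_nm(j)
    am = abs(m)
    radial = np.zeros_like(rho)
    for s in range((n - am) // 2 + 1):
        coef = ((-1) ** s * math.factorial(n - s)
                / (math.factorial(s) * math.factorial((n + am) // 2 - s)
                   * math.factorial((n - am) // 2 - s)))
        radial += coef * rho ** (n - 2 * s)
    if m == 0:
        return math.sqrt(n + 1) * radial
    norm = math.sqrt(2 * (n + 1))
    return norm * radial * (np.cos(am * theta) if m > 0 else np.sin(am * theta))


def random_aberration(size=128, j_min=2, j_max=15, amplitude=5.0, rng=None):
    """Random aberration phase from Zernike modes j_min..j_max (assumed uniform coefficients)."""
    rng = np.random.default_rng() if rng is None else rng
    coords = np.linspace(-1.0, 1.0, size)
    x, y = np.meshgrid(coords, coords)
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    coeffs = rng.uniform(-amplitude, amplitude, size=j_max - j_min + 1)
    phase = np.zeros((size, size))
    for c, j in zip(coeffs, range(j_min, j_max + 1)):
        phase += c * zernike(j, rho, theta)
    phase[rho > 1.0] = 0.0  # keep the phase only inside the unit-disk pupil
    return phase, coeffs
```

The coefficient vector returned alongside the phase is what would serve as the GT for a coefficient-regressing network, while the phase map itself would be the GT for direct reconstruction.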
For the single object, we use a grid as the object to generate 11,000 data pairs (10,000 for training and 1,000 for testing), some of which are shown in Fig. 3. The shape of the grid deforms according to the aberration phase, which guides the convergence of the neural network.
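The sketch below shows one way the single-object (grid) data pairs could be assembled, assuming the imaging model I = |FT{O exp(iϕ)}|² described in the Method section, a centered unitary FFT, and per-image normalization. The grid geometry, the placeholder tilt-plus-defocus aberration (used instead of the full Zernike set to keep the block short) and the scaled-down 100/10 split are illustrative assumptions, and all helper names are hypothetical.

```python
import numpy as np


def grid_object(size=128, period=16, line_width=2):
    """Hypothetical binary grid target: bright lines on a dark background."""
    obj = np.zeros((size, size))
    for start in range(0, size, period):
        obj[start:start + line_width, :] = 1.0
        obj[:, start:start + line_width] = 1.0
    return obj


def distorted_intensity(obj, phase):
    """Assumed imaging model I = |FT{O * exp(i*phi)}|^2 with a centered, unitary FFT."""
    field = obj * np.exp(1j * phase)
    spectrum = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(field), norm="ortho"))
    return np.abs(spectrum) ** 2


def make_pairs(n_pairs, size=128, rng=None):
    """Build (input image, GT phase) pairs for a single fixed object."""
    rng = np.random.default_rng() if rng is None else rng
    obj = grid_object(size)
    coords = np.linspace(-1.0, 1.0, size)
    x, y = np.meshgrid(coords, coords)
    inputs = np.empty((n_pairs, size, size))
    targets = np.empty((n_pairs, size, size))
    for i in range(n_pairs):
        # Placeholder aberration: random tilt and defocus instead of Zernike modes 2-15.
        a, b, c = rng.uniform(-5.0, 5.0, size=3)
        phase = a * x + b * y + c * (2 * (x**2 + y**2) - 1)
        targets[i] = phase
        img = distorted_intensity(obj, phase)
        inputs[i] = img / img.max()  # normalize each input image to a peak of 1
    return inputs, targets


# Scaled-down version of the 10,000 / 1,000 split (same 10:1 ratio).
inputs, targets = make_pairs(110, size=64, rng=np.random.default_rng(0))
x_train, x_test = inputs[:100], inputs[100:]
y_train, y_test = targets[:100], targets[100:]
```

With the unitary FFT the total recorded energy equals the object energy, since |exp(iϕ)| = 1; the aberration only redistributes intensity.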
For the multiple objects, the Number (from EMNIST [31]) and LFW [32] datasets are separately used as objects to generate 10,000 data pairs for training and 1,000 data pairs for testing, while Letter (from EMNIST) and ImageNet [33] are each used to generate 1,000 data pairs for testing only. Note that the aberration phases used to generate these datasets are the same as those used for the single object. After training, the three CNN1s are tested. The accuracy evaluation of the networks is shown in Table 1 and Fig. 4, from which the following can be observed:
i. Whether for a single object or multiple objects, the neural network is able to learn the mapping relationship.
ii. The accuracy of the neural network for a single object is higher than that for multiple objects.
iii. A network trained with one type of dataset (Number or LFW) also works on a similar type of dataset (Letter or ImageNet).
iv. However, when the EMNIST-trained network is used to reconstruct distorted LFW or ImageNet images, wrong results are obtained, and vice versa. Therefore, in practical applications, it is recommended to create the dataset with objects similar to the target objects.
In addition to reconstructing the aberration phase directly, one can reconstruct the Zernike coefficients and then use them to calculate the aberration phase. We compare these two ways in two cases: an aberration phase without details (simple) and an aberration phase with internal details (complex).
For the simple case, the Zernike coefficients of the aberration phases from the Grid dataset in Sect. 3.2 are used as the GT of CNN2 (M = 14, the number of output coefficients). For the complex case, as shown in Fig. 5, a random phase is added to the sample aberration phase to generate the complex aberration phase used as the GT of CNN1; the Zernike coefficients (orders 2-101) calculated from the complex aberration phase are then used as the GT of CNN2 (M = 100). After training, the three networks are tested, and the coefficients output by CNN2 are converted to phases for comparison with CNN1. The accuracy evaluation of the networks is shown in Table 2 and Fig. 6, from which the following can be observed:
i. For the simple case, CNN1 and CNN2 have the same accuracy.
ii. For the complex case, the accuracy of CNN2 drops considerably due to the loss of detailed information (lower resolution).
iii. Since an SLM generally has a higher resolution than a DM, CNN1 (direct reconstruction of the aberration phase), which offers higher resolution, is more suitable for an SLM, while CNN2 (reconstruction of Zernike coefficients), which has fewer network parameters but lower resolution, is more suitable for a DM.
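The paper does not say how the Zernike coefficients are calculated from the complex phase; a common choice, assumed here, is a least-squares fit over the pupil. The sketch uses `np.linalg.lstsq` with a deliberately small, hypothetical basis (tilt, defocus and astigmatism terms) in place of modes 2-101. It also illustrates observation ii: detail that the basis cannot represent is lost when the phase is rebuilt from the coefficients.

```python
import numpy as np


def fit_and_rebuild(phase, basis, pupil):
    """Least-squares fit of `phase` onto basis modes inside `pupil`, then rebuild the phase."""
    A = np.stack([mode[pupil] for mode in basis], axis=1)  # shape: (pupil pixels, modes)
    coeffs, *_ = np.linalg.lstsq(A, phase[pupil], rcond=None)
    rebuilt = np.zeros_like(phase)
    rebuilt[pupil] = A @ coeffs
    return coeffs, rebuilt


size = 64
coords = np.linspace(-1.0, 1.0, size)
x, y = np.meshgrid(coords, coords)
pupil = x**2 + y**2 <= 1.0
# Hypothetical low-order basis (unnormalized tilt, defocus and astigmatism terms).
basis = [x, y, 2 * (x**2 + y**2) - 1, x**2 - y**2, 2 * x * y]

rng = np.random.default_rng(0)
true_coeffs = rng.uniform(-5.0, 5.0, size=len(basis))
simple = sum(c * m for c, m in zip(true_coeffs, basis)) * pupil
detail = rng.normal(0.0, 1.0, size=(size, size)) * pupil  # stand-in for "internal details"
complex_phase = simple + detail

c_simple, rebuilt_simple = fit_and_rebuild(simple, basis, pupil)
c_complex, rebuilt_complex = fit_and_rebuild(complex_phase, basis, pupil)
err_simple = np.abs(rebuilt_simple - simple)[pupil].mean()
err_complex = np.abs(rebuilt_complex - complex_phase)[pupil].mean()
print(err_simple, err_complex)
```

The simple phase lies in the span of the basis and is recovered essentially exactly, whereas the fine random detail of the complex phase remains in the residual; adding more modes reduces, but cannot eliminate, that loss at a fixed mode count.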

Correction experiment
To verify the correction effect of this method, we trained and tested CNN1 in the turbulence pool using direct reconstruction of the wavefront aberration phase. As shown in Fig. 7, the setup contains five parts: the aberration phase acquisition part, the distorted image acquisition part, the correction part, the calculation part and the turbulence generating part:
i. The aberration phase acquisition part includes a laser source (532 nm), a Mach-Zehnder interferometer for generating the hologram, a telecentric lens (TL) for conjugating the calibration plane and the CCD1 target plane, and CCD1 for recording the hologram.
ii. The distorted image acquisition part includes an ISLM (intensity-type SLM) with a white LED for generating objects (grid), a double lens for adjusting the beam size, and CCD2 with a lens (focal length 300 mm) for imaging.
iii. The correction part includes a triplet lens for adjusting the beam size while conjugating the calibration plane and the PSLM (phase-type SLM) target plane, and a PSLM for correction.
When collecting the dataset, a constant phase is loaded on the PSLM. We use CCD1 to record the hologram and reconstruct the aberration phase as the GT, and CCD2 to record the distorted image of the object loaded on the ISLM as the input; part of the dataset is shown in Fig. 8.
After training, the computer controls the PSLM in real time with the aberration phase reconstructed by the network to correct the turbulence (the correction frequency is about 100 Hz). To verify the correction effect, we use CCD1 to continuously record the hologram (phase) and then turn on the correction system. As shown in Fig. 9, we calculate the standard deviation (StdDev) of the phase recorded by CCD1 and display below it the phases of frames 1, 101, 201, 301, 401, 501, 601, 701, 801 and 901. The average StdDev of the phase is 7.51 over the first 500 frames (before correction) and 1.79 over the next 500 frames (after correction).
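The StdDev figure of merit can be sketched as the per-frame standard deviation of the recorded phase, averaged over frames 1-500 (correction off) and over frames 501-1000 (correction on). Whether the paper removes piston or tilt, or restricts the statistic to a pupil mask, is not stated, so the optional `mask` argument is an assumption; the helper name is hypothetical and the frames below are synthetic random data, not measurements.

```python
import numpy as np


def mean_phase_std(frames, mask=None):
    """Average over frames of each frame's phase standard deviation (optionally inside a mask)."""
    stds = [f[mask].std() if mask is not None else f.std() for f in frames]
    return float(np.mean(stds))


# Synthetic stand-in for 1,000 recorded phase frames (illustrative values only).
rng = np.random.default_rng(0)
frames = np.concatenate([
    rng.normal(0.0, 4.0, size=(500, 32, 32)),  # frames 1-500: correction off
    rng.normal(0.0, 1.0, size=(500, 32, 32)),  # frames 501-1000: correction on
])
before_std = mean_phase_std(frames[:500])
after_std = mean_phase_std(frames[500:])
display_frames = np.arange(1, 1001, 100)  # 1-based frame numbers shown below the curve
print(before_std, after_std, before_std / after_std)
```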
To further test the correction effect, we blocked the reference beam, replaced the TL with a convex lens, and moved CCD1 to the focal plane of the convex lens. The focal spots before and after correction were then recorded and are compared in Fig. 10. Fig. 10(a) and (b) show that the energy of the corrected spot is more concentrated. More quantitatively, Fig. 10(c) plots the intensity along the horizontal lines of Fig. 10(a) and (b), showing that the maximum intensity of the focal spot after correction is about 2.5 times that before correction.
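The focal-spot comparison can be mimicked numerically: a lens maps the pupil field P exp(iϕ) to its Fourier transform in the focal plane, so the peak of the zero-padded FFT intensity plays the role of the measured maximum intensity, and the horizontal line through the brightest pixel plays the role of the plotted profile. The pupil, the aberration and the residual phase below are hypothetical; the residual stands in for an imperfect correction, and the helper names are illustrative.

```python
import numpy as np


def focal_spot(pupil, phase, pad=4):
    """Focal-plane intensity of the pupil field via a zero-padded FFT."""
    m = pupil.shape[0]
    field = np.zeros((m * pad, m * pad), dtype=complex)
    field[:m, :m] = pupil * np.exp(1j * phase)
    return np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2


def horizontal_profile(spot):
    """Intensity along the horizontal line through the brightest pixel."""
    row = np.unravel_index(np.argmax(spot), spot.shape)[0]
    return spot[row]


def peak_gain(spot_before, spot_after):
    """Ratio of the profile maximum after correction to that before correction."""
    return float(horizontal_profile(spot_after).max() / horizontal_profile(spot_before).max())


size = 64
coords = np.linspace(-1.0, 1.0, size)
x, y = np.meshgrid(coords, coords)
pupil = (x**2 + y**2 <= 1.0).astype(float)
rng = np.random.default_rng(0)
aberration = 3.0 * (2 * (x**2 + y**2) - 1) + 2.0 * (x**2 - y**2)  # defocus + astigmatism
residual = 0.3 * rng.normal(size=(size, size))  # hypothetical residual after correction
before = focal_spot(pupil, aberration)
after = focal_spot(pupil, residual)
print(peak_gain(before, after))
```

An aberration-free pupil gives the largest possible peak, (ΣP)², so any residual phase keeps the corrected peak below that bound, while a good correction moves it much closer to it.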

Real atmospheric turbulence experiment
To verify the feasibility of this method in real atmospheric turbulence, we moved the setup in Fig. 7 to an open-air environment. Since the reference beam of the holographic part is not stable enough over a long distance, a Hartmann-Shack sensor is used to measure the wavefront aberration phase as the GT. The focal length of the CCD2 lens is increased to 600 mm to photograph, as the input, a stationary object near the Hartmann-Shack sensor. The atmospheric turbulence path is about 130 m long. The Hartmann-Shack sensor and the camera are triggered synchronously via an optical fiber at 10 Hz. A total of 11,000 data pairs are collected to train and test CNN1 (10,000 for training and 1,000 for testing).
Partial results of the network are shown in Fig. 11; the SSIM, RMSE and MAE are 0.961, 2.912 and 2.419 (5.69 %), respectively, which means that the network can reconstruct the real turbulent phase, although its performance is lower than in the single-object case in Sect. 3.2. As indicated by the red arrow in the second column, relatively large errors occur in the reconstruction results of individual samples (8 %). We attribute this performance degradation to the larger number of constantly changing factors in the real environment, such as ambient light intensity, wind speed and humidity. A more in-depth investigation will be carried out in our future work.

Conclusions
In this paper, we have verified the feasibility of deep learning wavefront distortion sensing for single and multiple objects. Compared with a single object, the network performance for multiple objects is slightly reduced. We compared the two ways of reconstruction by the network, direct phase reconstruction and Zernike coefficient reconstruction, and found that the direct way is more accurate for complex aberration phases. In addition, the correction effect of this method has been verified in a turbulence pool, and its feasibility has been verified in a real atmospheric turbulence environment.
Fig. 10 Focus spots before and after correction. (a) Focus spot before correction; (b) focus spot after correction; (c) intensity across the middle horizontal lines of (a) and (b).