### SAsLFM optical setup

Our approach is applicable to various phase-space imaging schemes, such as LFM and sLFM. In this work, we chose unfocused sLFM for experimental demonstration (Supplementary Fig. 11). sLFM is a wide-field-like imaging method with a microlens array (MLA) placed at the native image plane. A camera is placed at the back focal plane of the MLA. At the conjugate pupil plane, a piezo tilt platform was placed to rapidly scan the image plane at small intervals, which were chosen according to the experimental conditions to balance the trade-off between spatial and temporal sampling rates. The periodic scan was synchronized with image acquisition. Detailed imaging conditions, including the microlens array, cameras, number of sub-apertures, and scan trajectory, are listed in Supplementary Table 1.

### Stochastic path integration of pupil phases

Even the most elaborate optical systems inevitably have system aberrations, which degrade imaging performance. Although the DAO capability of sLFM mitigates the distortion caused by aberrations to a certain extent, the imaging quality still suffers from higher-order optical aberrations. To lift the burden on DAO and to circumvent the laborious process of capturing 3D phase-space PSFs, we used a phase-retrieval-based algorithm to generate and calibrate the PSFs with the wave optics model [38], as previously described [27]. For each optical system, we calibrated the PSFs twice, once for sLFM and once for SAsLFM. We first captured images of sub-diffraction-limited fluorescence beads (with or without a certain amount of water) as the target distribution and simulated the PSFs of sLFM without any additional aberration. During each iteration, we calculated the correlations between the simulated PSFs and the captured images along all viewing directions. The residual wavefront on the pupil plane can then be obtained by integrating the correlation map. Here we adopted an integration method based on stochastic paths, which provides a smoother and more accurate estimation. There are a total of \({C}_{r+c}^r\) possible integration paths connecting the central point to the point with coordinates (*r*, *c*) on the pupil plane. For each point, we randomly selected 1000 paths and integrated the correlation map along these paths. We then averaged the integration values as the final estimate of the residual wavefront at (*r*, *c*). The estimated wavefront is then fitted with Zernike polynomials and added to the pupil plane to generate a new simulated PSF. This iterative procedure progressively reduces the disparities between the generated PSFs and the captured ones until the residual wavefront converges. We show the effectiveness of our method in Supplementary Fig. 12.
Across experiments, the actual PSFs differ slightly from the calibrated ones because the thickness of the water layer used in the experiments is not the same as that used during calibration. However, the distortion induced by this slight difference can be eliminated by the DAO capability of light-field imaging.
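
The stochastic path integration described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the correlation map has already been converted into two local slope maps (`d_row`, `d_col`, hypothetical names) holding the wavefront difference between neighbouring pupil points, and it restricts paths to monotone lattice paths, whose count is the binomial coefficient quoted in the text.

```python
import numpy as np
from math import comb


def count_paths(r, c):
    """Number of monotone lattice paths from the centre to an offset of (r, c)."""
    return comb(abs(r) + abs(c), abs(r))


def integrate_random_paths(d_row, d_col, center, target, n_paths=1000, rng=None):
    """Average the path integral of a slope field over random monotone paths.

    Assumed conventions (not from the source):
    d_row[i, j] ~ phi[i + 1, j] - phi[i, j], d_col[i, j] ~ phi[i, j + 1] - phi[i, j].
    """
    rng = np.random.default_rng() if rng is None else rng
    r0, c0 = center
    r1, c1 = target
    n_r, n_c = abs(r1 - r0), abs(c1 - c0)
    s_r = 1 if r1 >= r0 else -1
    s_c = 1 if c1 >= c0 else -1
    estimates = np.empty(n_paths)
    for p in range(n_paths):
        # A path is a random ordering of n_r row steps and n_c column steps.
        steps = rng.permutation(np.array([True] * n_r + [False] * n_c))
        i, j, total = r0, c0, 0.0
        for is_row in steps:
            if is_row:
                total += d_row[i, j] if s_r > 0 else -d_row[i - 1, j]
                i += s_r
            else:
                total += d_col[i, j] if s_c > 0 else -d_col[i, j - 1]
                j += s_c
        estimates[p] = total
    # Averaging over paths suppresses noise that a single path would accumulate.
    return estimates.mean()
```

For a noise-free slope field every path returns the same value, so averaging only matters when the measured slopes are noisy; in that case different paths accumulate different errors and the mean is smoother than any single-path estimate.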

### Phase-space deconvolution with a circular trajectory

The phase-space deconvolution method proposed by Zhi Lu et al. [39] increases the convergence speed. Here we further improve this algorithm by adopting a circular update trajectory and computing the projections in Fourier space. Based on the ADMM algorithm [40], the update process can be represented as:

$${\displaystyle \begin{array}{c}V{\left(\boldsymbol{x},z\right)}^{\left(j,k\right)}={\alpha}_{\boldsymbol{u}}\beta \frac{\frac{I\left(\boldsymbol{u}\right)}{\delta +{\sum}_zV{\left(\boldsymbol{x},z\right)}^{\left(j-1,k\right)}\ast W\left(\boldsymbol{x},z,\boldsymbol{u}\right)}\ast {W}^T\left(\boldsymbol{x},z,\boldsymbol{u}\right)}{J\ast {W}^T\left(\boldsymbol{x},z,\boldsymbol{u}\right)}V{\left(\boldsymbol{x},z\right)}^{\left(j-1,k\right)}\\ {}+\left(1-{\alpha}_{\boldsymbol{u}}\beta \right)V{\left(\boldsymbol{x},z\right)}^{\left(j-1,k\right)}\end{array}}$$

Where *I*(*u*) is the sub-aperture component, *δ* is a small number that avoids division by zero, ∗ denotes the 2D convolution operation, and *V*^{(*j*, *k*)} is the estimated volume at the *j*^{th} iteration of the inner loop and the *k*^{th} iteration of the outer loop. During the inner loop, each iteration uses information from a single angular component to update *V*. After all the sub-aperture components have been traversed, the algorithm enters the (*k* + 1)^{th} iteration of the outer loop. *W*(*x*, *z*, *u*)^{T} is the adjoint of the point spread function *W*(*x*, *z*, *u*). *J* is an all-ones matrix that compensates for energy loss at the edges of the images. *β* is an empirical coefficient that determines the convergence speed. \({\alpha}_{\boldsymbol{u}}\) is an update weight calculated according to the energy distribution of each angular component:

$${\alpha}_{\boldsymbol{u}}=\frac{{\left\Vert W\left(\boldsymbol{u}\right)\right\Vert}_1}{\sum_{m=1}^N{\left\Vert W(m)\right\Vert}_1}$$
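
To make the inner-loop update concrete, the following is a minimal NumPy sketch of one update for a single angular component together with the weight \({\alpha}_{\boldsymbol{u}}\). It is an illustration under stated assumptions rather than the authors' code: convolutions are circular and computed with FFTs, the adjoint *W*^{T} is implemented as a correlation, the PSF stack is assumed to be zero-padded to the size of the volume, and all function names are hypothetical.

```python
import numpy as np


def conv2(a, b):
    """2D circular convolution via FFT (boundary handling is an assumption)."""
    return np.real(np.fft.ifft2(np.fft.fft2(a) * np.fft.fft2(b)))


def corr2(a, b):
    """Adjoint of conv2(., b) for real b, i.e. 2D circular correlation."""
    return np.real(np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))))


def update_weights(psfs):
    """alpha_u = ||W(u)||_1 / sum_m ||W(m)||_1 for a list of PSF stacks."""
    norms = np.array([np.abs(W).sum() for W in psfs])
    return norms / norms.sum()


def inner_update(V, I_u, W_u, alpha_u, beta=1.0, delta=1e-9):
    """One inner-loop update of the volume V from a single sub-aperture image I_u.

    V and W_u have shape (nx, ny, nz); I_u has shape (nx, ny).
    """
    nz = V.shape[2]
    # Forward projection: sum over z of the 2D convolution of each slice with its PSF.
    forward = sum(conv2(V[:, :, z], W_u[:, :, z]) for z in range(nz))
    ratio = I_u / (delta + forward)
    ones = np.ones_like(I_u)  # plays the role of the all-ones matrix J
    V_new = np.empty_like(V)
    for z in range(nz):
        num = corr2(ratio, W_u[:, :, z])   # back-projected ratio
        den = corr2(ones, W_u[:, :, z])    # edge-energy compensation
        V_new[:, :, z] = (alpha_u * beta * num / den * V[:, :, z]
                          + (1.0 - alpha_u * beta) * V[:, :, z])
    return V_new
```

A quick sanity check of this sketch is that a volume whose forward projection already matches `I_u` is a fixed point of the update: the ratio is then close to one everywhere and `V` is returned essentially unchanged.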

The aforementioned method may lead to oscillation during convergence and to crosstalk between adjacent layers. As the focus depth of each component is determined by the relative distance between the position of the sub-aperture and the center of the pupil plane, we propose an update scheme based on a circular trajectory. The volume is updated by the formula:

$${\displaystyle \begin{array}{l}V{\left(\boldsymbol{x},z\right)}^{\left(j,k\right)}=\beta {\sum}_{\boldsymbol{u}\in {G}_i}\frac{\frac{\alpha_{\boldsymbol{u}}I\left(\boldsymbol{u}\right)}{\delta +{\sum}_zV{\left(\boldsymbol{x},z\right)}^{\left(j-1,k\right)}\ast W\left(\boldsymbol{x},z,\boldsymbol{u}\right)}\ast {W}^T\left(\boldsymbol{x},z,\boldsymbol{u}\right)}{M_iJ\ast {W}^T\left(\boldsymbol{x},z,\boldsymbol{u}\right)}V{\left(\boldsymbol{x},z\right)}^{\left(j-1,k\right)}p(z)\\ {}+\frac{M_i-{\sum}_{\boldsymbol{u}\in {G}_i}{\alpha}_{\boldsymbol{u}}\beta }{M_i}V{\left(\boldsymbol{x},z\right)}^{\left(j-1,k\right)}\left(1-p(z)\right)\end{array}}$$

Here we divided the sub-aperture components into several groups according to their position relative to the center of the pupil plane. *i* is the index of each group, *G*_{i} is the index set of angular components belonging to the *i*^{th} group, *M*_{i} is the total number of components in *G*_{i}, and *p*(*z*) denotes an update rate that varies with depth. Many choices of *p*(*z*) are possible, such as a sigmoid function. Starting from the outermost or the innermost ring, the new reconstruction method with a circular trajectory uses all the components in one group to update the volume during each inner iteration. With our approach, as the focus depth of each group is assigned to a different position, an empirical *p*(*z*) is used to merge all the information together to generate a large-scale high-resolution volume. To accelerate the update process, we derived the forward and backward projections in Fourier space. Owing to the properties of the Fourier transform, the forward projection process can be represented as:

$${\displaystyle \begin{array}{l}{\sum}_zV\left(\boldsymbol{x},z\right)\ast W\left(\boldsymbol{x},z,\boldsymbol{u}\right)={\left.{IFT}^3\left({FT}^3\left(V\left(\boldsymbol{x},z\right)\right)\cdot {FT}^3\left(W\left(\boldsymbol{x},z,\boldsymbol{u}\right)\right)\right)\right|}_{z=0}\\ {}={\left.{IFT}^1\left({IFT}^2\left({FT}^3\left(V\left(\boldsymbol{x},z\right)\right)\cdot {FT}^3\left(W\left(\boldsymbol{x},z,\boldsymbol{u}\right)\right)\right)\right)\right|}_{z=0}\\ {}={sum}_z\left({IFT}^2\left({FT}^3\left(V\left(\boldsymbol{x},z\right)\right)\cdot {FT}^3\left(W\left(\boldsymbol{x},z,\boldsymbol{u}\right)\right)\right)\right)\\ {}={IFT}^2\left({sum}_{fz}\left({FT}^3\left(V\left(\boldsymbol{x},z\right)\right)\cdot {FT}^3\left(W\left(\boldsymbol{x},z,\boldsymbol{u}\right)\right)\right)\right)\end{array}}$$

Where *FT*^{n} and *IFT*^{n} denote the n-dimensional Fourier transform and inverse Fourier transform, respectively. *sum*_{z} signifies the sum operation along the z-axis in the spatial domain and *sum*_{fz} is the sum operation along the *fz* axis in the frequency domain. To demonstrate the speed improvement, we first assume that the size of *V* is *N*_{x} × *N*_{y} × *N*_{z} and that the size of the sub-aperture PSF is *n*_{x} × *n*_{y} × *n*_{z}. For simplicity, we assume that *N*_{x} and *N*_{y} are larger than *n*_{x} and *n*_{y}. The computational complexity of the forward projection in the spatial domain is

$$O\left({N}_x{N}_y{n}_x{n}_y{N}_z+{N}_x{N}_y{N}_z\right)=O\left({N}_x{N}_y{n}_x{n}_y{N}_z\right)$$

The corresponding computational complexity in Fourier space is

$${\displaystyle \begin{array}{l}O\left({N}_x{N}_y{N}_z\left(\log \left({N}_x\right)+\log \left({N}_y\right)+\log \left({N}_z\right)\right)+{N}_x{N}_y{N}_z+{N}_x{N}_y\left(\log \left({N}_x\right)+\log \left({N}_y\right)\right)\right)\\ {}=O\left({N}_x{N}_y{N}_z\left(\log \left({N}_x\right)+\log \left({N}_y\right)+\log \left({N}_z\right)\right)\right)\end{array}}$$

This comparison shows that calculating the projection in frequency space significantly reduces the computational cost, especially when the size of *V* is large. For the backward projection, the computational complexity is equivalent to that of the forward projection. As the projection operations are repeated for each sub-aperture component during the iterations, deconvolution in Fourier space provides orders-of-magnitude reductions in computational cost.
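
The Fourier-space forward projection derived above can be checked numerically on a small example. The sketch below is an illustration only and makes several assumptions that the text does not state: boundaries are circular, the PSF stack is padded to the size of the volume, and NumPy's DFT conventions are used. Under these conventions, taking the *z* = 0 slice of a 3D convolution pairs *V*(*z*) with *W*(−*z*), so the PSF is flipped along *z* beforehand, and the sum over *fz* carries a factor 1/*N*_{z} from the inverse transform.

```python
import numpy as np


def forward_projection_spatial(V, W):
    """Reference: sum over z of the 2D circular convolution, done in the spatial domain."""
    nx, ny, nz = V.shape
    out = np.zeros((nx, ny))
    for z in range(nz):
        for i in range(nx):
            for j in range(ny):
                # Shifted copy of the PSF slice weighted by the voxel value.
                out += V[i, j, z] * np.roll(W[:, :, z], shift=(i, j), axis=(0, 1))
    return out


def forward_projection_fourier(V, W):
    """Same projection via one 3D FFT per array, a sum along fz and one 2D inverse FFT."""
    nz = V.shape[2]
    # W_flip[..., z] = W[..., (-z) mod nz], so that the z = 0 slice of the
    # 3D convolution pairs V[..., z] with W[..., z].
    W_flip = np.roll(W[:, :, ::-1], 1, axis=2)
    spectrum = np.fft.fftn(V) * np.fft.fftn(W_flip)
    # Evaluating the inverse transform along z at z = 0 is a mean over fz.
    return np.real(np.fft.ifft2(spectrum.sum(axis=2))) / nz
```

The spatial reference costs on the order of *N*_{x}*N*_{y}*n*_{x}*n*_{y}*N*_{z} operations, whereas the Fourier version is dominated by the 3D FFTs, which mirrors the complexity comparison in the text.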

### Evaluation metrics

We used the SSIM to quantitatively evaluate the reconstruction performance. SSIM is defined as

$$SSIM\left(A,B\right)=\frac{\left(2{\mu}_A{\mu}_B+{C}_1\right)\left(2{\sigma}_{AB}+{C}_2\right)}{\left({\mu_A}^2+{\mu_B}^2+{C}_1\right)\left({\sigma_A}^2+{\sigma_B}^2+{C}_2\right)}$$

Where *μ*_{A} and *μ*_{B} are the local means of *A* and *B*, *σ*_{A} and *σ*_{B} are the standard deviations of *A* and *B*, and *σ*_{AB} is their cross-covariance. *C*_{1} and *C*_{2} are constants that avoid division by zero. *A* and *B* are converted to grayscale images with a range from 0 to 1.
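
As an illustration of the formula, the sketch below evaluates SSIM with statistics taken over the whole image, i.e. a single window, rather than the local windows used in practice. The constants *C*_{1} = 0.01² and *C*_{2} = 0.03² are the conventional defaults for images scaled to [0, 1] and are an assumption here, since the values used in this work are not stated.

```python
import numpy as np


def ssim_global(A, B, C1=0.01 ** 2, C2=0.03 ** 2):
    """SSIM of two images in [0, 1] using whole-image (single-window) statistics."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    mu_a, mu_b = A.mean(), B.mean()
    var_a, var_b = A.var(), B.var()               # sigma_A^2 and sigma_B^2
    cov_ab = ((A - mu_a) * (B - mu_b)).mean()     # sigma_AB
    numerator = (2 * mu_a * mu_b + C1) * (2 * cov_ab + C2)
    denominator = (mu_a ** 2 + mu_b ** 2 + C1) * (var_a + var_b + C2)
    return numerator / denominator
```

Identical images give a value of 1, and the value decreases as the structure of the two images diverges.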

We used a sharpness metric to evaluate the amount of high-frequency information contained in a single image, which can be calculated by:

$${s}_I=\frac{\sum_{v=1}^V{\sum}_{u=1}^U Gradient{\left(u,v\right)}^2}{UV}$$

Where *Gradient*(*u*, *v*) is the numerical gradient of the 2D image *I*(*x*, *y*) evaluated at position (*u*, *v*). *U* and *V* are the height and width of the gradient map, respectively.
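
A minimal sketch of the sharpness metric is given below. It assumes that the squared gradient in the formula means the squared gradient magnitude (the sum of the squared row and column derivatives) and uses `numpy.gradient` for the numerical derivative; both choices are assumptions.

```python
import numpy as np


def sharpness(image):
    """Mean squared gradient magnitude of a 2D image."""
    g_row, g_col = np.gradient(np.asarray(image, dtype=float))
    return float(np.mean(g_row ** 2 + g_col ** 2))
```

A constant image scores zero, and an image with more high-frequency content scores higher.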

### Network architecture

U-net is a generic deep-learning solution for various quantification tasks such as cell segmentation and biomedical image deconvolution. The network architecture in this work was inspired by previous research [41] (Supplementary Fig. 10). We captured ~120 sets of confocal images and used a wave optics model to generate synthetic SAsLFM images. To mimic the camera acquisition process, we added independent Poisson noise to the generated SAsLFM data. The data pairs were divided into two groups: ~100 pairs were used for network training and the rest served as the validation set to prevent overfitting. The loss function of the network consists of a pixel-wise L1-norm loss term, an L2-norm loss term and an SSIM term. The size of the input data was set to 513 × 513 × 21 pixels (*x*-*y*-*u*), where the last dimension is the angular index. The size of the output data was 513 × 513 × 201 pixels (*x*-*y*-*z*). Of note, among all the 225 phase-space components, we selected 21 components, considering the data redundancy and the computational cost. Before training the network, all data were normalized to the range from 0 to 0.9. The network was trained using the Adam optimizer with the learning rate set to 0.0002, and the exponential decay rates for the first-moment and second-moment estimates were 0.5 and 0.99, respectively. Training took about 10 h to converge on a single GPU (GeForce Titan RTX, Nvidia).
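
The data preparation steps mentioned above (Poisson noise followed by normalization to the range 0 to 0.9) can be sketched as follows. The photon budget `peak_photons` and the min-max form of the normalization are hypothetical choices made for illustration; the actual noise level and normalization procedure are not specified in the text.

```python
import numpy as np


def add_poisson_noise(image, peak_photons=200.0, rng=None):
    """Simulate shot noise for a non-negative image scaled so that 1.0 = peak_photons."""
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image, dtype=float)
    return rng.poisson(image * peak_photons) / peak_photons


def normalize(data, upper=0.9):
    """Min-max normalization to the range [0, upper]."""
    data = np.asarray(data, dtype=float)
    return upper * (data - data.min()) / (data.max() - data.min())
```

In this sketch a lower `peak_photons` gives noisier training inputs, so it would have to be matched to the photon counts of the real camera data.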

### Calcium trace extraction of mouse brain

We employed a CNMF framework to decompose the MIPs of the volumetric data of the mouse brain into a matrix that encodes the footprints of segmented neurons. In the algorithm, the regions of interest (ROIs) were set to ~15 × 15 × 15 μm^{3} to match the size of neurons in the mouse brain. Because blood vessels have a significant effect on the selection of ROIs, resulting in a biased segmentation of neurons, we manually excluded the affected ROIs by visual inspection. The calcium responses were then extracted directly from the abovementioned ROIs in the volumetric time-lapse stacks. The temporal traces of the calcium activities were calculated by the formula:

$$\Delta F/{F}_0=\left(F-{F}_0\right)/{F}_0$$

Where *F* is the raw averaged intensity of the extracted ROI, and *F*_{0} is the corresponding intensity baseline, which was calculated by averaging the intensity of the signal samples that fall below 120% of the mean value of the entire trace.
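
The baseline definition and the ΔF/F_0 formula translate directly into a few lines of NumPy. The function name and the `threshold` parameter are illustrative; the value 1.2 corresponds to the 120% criterion stated above.

```python
import numpy as np


def delta_f_over_f0(trace, threshold=1.2):
    """Compute (F - F0) / F0, with F0 the mean of samples below threshold * mean(trace)."""
    F = np.asarray(trace, dtype=float)
    baseline_samples = F[F < threshold * F.mean()]
    F0 = baseline_samples.mean()
    return (F - F0) / F0
```

For example, the trace `[1, 1, 1, 1, 5]` has a mean of 1.8, so the baseline is computed from the four samples below 2.16, giving *F*_{0} = 1 and ΔF/F_0 = `[0, 0, 0, 0, 4]`.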

### Semi-automated tracking of *Drosophila* embryo cells

For cell tracking in *Drosophila* embryos, we used a semi-automated framework rather than manual tracking, because the cells are densely labeled and so numerous that manual labeling would be very time-consuming and cumbersome. We calculated the MIPs of two volumes separated by a time interval of ~255 s. During this period, the distribution of the cells changes slightly while the overall morphology remains almost the same. Then, we adopted an optical flow estimation method based on the conjugate gradient [31] to calculate the velocity distribution of the moving bright patterns (the cells). Segmentation was applied using ImageJ’s threshold and binary functions to find the contours of the cells, together with the fill holes function to compensate for over-segmentation. Large areas were excluded by threshold filtering. Finally, 554 connected domains were segmented, and we extracted the values of the optical flow map at these coordinates as the estimate of the cell motions.
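
The last two steps, excluding large areas and reading out the optical flow on each connected domain, can be sketched in pure NumPy as below. This is not the ImageJ-based pipeline used in this work: the 4-connectivity, the `max_area` threshold and the choice of averaging the flow over each domain are assumptions, and the flow map and binary mask are taken as given inputs.

```python
import numpy as np
from collections import deque


def label_components(mask):
    """Label 4-connected components of a boolean mask; labels start at 1."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue
        current += 1
        labels[seed] = current
        queue = deque([seed])
        while queue:  # breadth-first flood fill from the seed pixel
            i, j = queue.popleft()
            for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                inside = 0 <= ni < mask.shape[0] and 0 <= nj < mask.shape[1]
                if inside and mask[ni, nj] and not labels[ni, nj]:
                    labels[ni, nj] = current
                    queue.append((ni, nj))
    return labels, current


def cell_motions(flow, mask, max_area):
    """Mean flow vector of every component whose area does not exceed max_area.

    flow has shape (H, W, 2); mask is a boolean array of shape (H, W).
    """
    labels, n_components = label_components(mask)
    motions = []
    for k in range(1, n_components + 1):
        region = labels == k
        if region.sum() <= max_area:  # discard large, non-cell areas
            motions.append(flow[region].mean(axis=0))
    return np.array(motions)
```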

### Data analysis

All data analyses were performed with customized MATLAB (MathWorks, 2020b) programs, open-source ImageJ and Amira (Thermo Fisher Scientific, Amira 2019). The hardware was controlled by LabVIEW 2018. The 3D tracking of the 8 tentacles of the freely swimming jellyfish was carried out manually in MATLAB. Details of the parameters and rendering models are listed in Supplementary Table 2.

### Imaging of *Drosophila* embryos

All *Drosophila* experimental procedures were conducted with ethical approval from the Animal Care and Use Committee of Tsinghua University. All *Drosophila* in the experiments expressed His2Av-mrfp1. *Drosophila* embryos were dechorionated with 50% (vol/vol) sodium hypochlorite solution. During live imaging, *Drosophila* embryos were embedded in 0.4% low-melting agarose in a 35-mm petri dish with the temperature kept at 25 °C. For SAsLFM, we filled the petri dish with water to introduce the spherical aberration.

### Zebrafish vascular system imaging

All zebrafish experimental procedures were conducted with ethical approval from the Animal Care and Use Committee of Tsinghua University. We cultured *flk:EGFP* transgenic zebrafish embryos at 28.5 °C in Holtfreter’s solution. The zebrafish larvae were anesthetized with ethyl 3-aminobenzoate methanesulfonate salt (100 mg/L) at 4–5 days postfertilization (dpf) and mounted in 1% low-melting-point agarose in a petri dish filled with water at 26–27 °C during the imaging process.

### In vivo mouse experiments

All procedures involving mice were approved by the Institutional Animal Care and Use Committee of Tsinghua University. We used both male and female C57BL/6 mice 10 weeks to 6 months old without randomization or blinding. Mice were group-housed under a cycle of 12 h light/dark (lights on at 7 a.m.) and provided with water and food ad libitum. The relative humidity was 50% at 20–22 °C.

The craniotomy surgery was performed on the stereotaxic apparatus (RWD, China). Mice were anesthetized with 1.5–2% isoflurane. After the surgery, flunixin meglumine (Sichuan Dingjian Animal Medicine Co., Ltd) was injected subcutaneously (1.25 mg/kg) for at least 3 days to reduce inflammation.

The scalp was removed with sterile surgical scissors to expose the entire dorsal skull. The skull was thoroughly cleaned with saline to remove all fascia above the skull. Then, a piece of skull (8 mm in diameter) was removed using a cranial drill and replaced with a crystal skull. The edge of the crystal skull and the skin incision were sealed with a thin layer of cyanoacrylate adhesive (Krazy glue, Elmer’s Products Inc). A custom-made head-post was implanted above the skull and fixed with dental cement.

For acute imaging, we used adult double-transgenic Rasgrf2-2A-dCre/Ai148D mice (JAX No.: 022864 and 030328) to specifically label cortical layer 2/3 neurons. Trimethoprim saline solution (solution concentration: 5 mg/ml, dose: 10 μl/kg) was injected for 3 days to induce Cre expression. For chronic imaging, adult C57BL/6 mice injected with diluted AAV9-hSyn-GCaMP6s virus (from BrainVTA Technology Co., Ltd., China) were allowed to recover for at least 2 weeks after craniotomy. During imaging, awake mice were placed in a tube with the head restrained under the objective to minimize vibration. For SAsLFM, a container was mounted on the head of the mouse to hold water (Supplementary Fig. 13).

### Cubic-MACS clearing

First, the mice were anesthetized with a 0.5% pentobarbital sodium solution (0.4 ml/30 g body weight). To flush the blood vessels, the mice were transcardially perfused with 0.01 M PBS (Sigma-Aldrich Inc., St. Louis, MO, United States), followed by 4% paraformaldehyde (PFA, Sigma-Aldrich Inc., St. Louis, MO, United States) in PBS (pH 7.4) for fixation. The sample was then post-fixed with 4% PFA for 2 days at 4 °C. The brain samples were washed with PBS for 1 day with the solution replaced at 8 and 16 h. Then the samples were delipidated with a CUBIC-1 solution (~50 ml) for 6 days at room temperature. The brain samples were washed again with PBS for 1.5 days with the solution changed every 8 h at room temperature and then immersed in CUBIC-X_{1} swelling solution for 2.5 days with the solution replaced every 12 h. Finally, for refractive index matching, the samples were immersed in CUBIC-X_{2} for 1.5 days with the solution replaced every 12 h. During imaging, the samples were dissected and embedded in 4% agarose mixed with CUBIC-X_{2}.