Methods | Data processing | Advantages | Limitations | Suitable application |
---|---|---|---|---|
Peak intensity analysis | • Peak intensity: Inormal vs Icancer • Peak intensity ratio (R = I12xx/I16xx): Rnormal vs Rcancer | • Straightforward • Simple | • Low accuracy • Accuracy drops as data size increases | • Obvious characteristic peaks and large differences • Small-scale sample data (10 ~ 100) |
Multivariate statistical analysis | - | • High interpretability • Easy to implement | • Limited accuracy on larger amounts of data | • Medium-scale sample data (100 ~ 1000) |
PCA | • Reduces the original Raman spectra to PCs while preserving the features that contribute most to the difference in the Raman spectra | • Reduces data dimensionality to PCs • Retains important data information • Removes background noise | • Relatively low classification accuracy | • Unsupervised method • Exploratory study • Data-reduction algorithm |
PLS | • Regression modeling for independent and dependent variables of Raman spectra | • Reduces data dimensionality to key factors • Better selects characteristic variables | • Relatively low classification accuracy | • Supervised method • Data-reduction algorithm |
KCA | • Clusters Raman spectra by repeatedly assigning each spectrum to its nearest seed and updating each seed to the mean of its assigned points | • Simple algorithm principle • Fast processing speed | • K value is difficult to determine • May converge to a local rather than the global optimum | • Unsupervised clustering technique • Exploratory study • Samples with large differences between groups |
LDA | • Projects the Raman spectra into the vector space with the maximum between-class distance and the minimum within-class distance | • Most common classification method • High accuracy | • Overfits when data are insufficient | • Powerful supervised technique for classification • Often combined with PCA |
QDA | • Estimates a separate covariance matrix for each class of Raman spectra | • Variant of LDA • High accuracy | • Cannot be used for data dimension reduction | • Supervised technique for classification • Sample analysis |
GA | • Feature extraction of Raman spectra, as a stage prior to classification | • Feature selection • Strong robustness | • Low computation speed • Complex programming process | • General optimization technique • Feature extraction of data |
Classical machine learning | - | • Higher accuracy • Easy to implement | • Poor training efficiency when processing large-scale data | • Large-scale sample data (1000 ~ 10,000) |
SVM | • Seeks to determine the optimal hyperplane that maximizes the distance between the hyperplane and the nearest Raman spectra data sample in a high-dimensional space | • Less prone to overfitting • Avoids local optimum and “curse of dimensionality” | • Poor training efficiency when processing large-scale data | • Nonlinear, multi-dimensional problems • Small sample learning problems |
BT | • Changes the weight of Raman spectra data, learns multiple classifiers, and combines these classifiers linearly to improve the performance of classification | • Ensemble learning method • No need to do feature normalization | • Sensitive to abnormal data • Easy to overfit | • Low dimensional data • Layers not too high |
RF | • Uses multiple trees to train and predict Raman spectra data | • Ensemble learning method • Low risk of overfitting | • Relatively low learning speed | • Limited samples |
KNN | • Uses proximity of a single Raman spectral data point to classify or predict groupings | • High precision • Insensitive to outliers | • Relatively large time complexity • Large space complexity | • Small-size samples • Low-dimensional data |
Deep learning | - | • Higher accuracy • Good portability | • Large amounts of computation • Complex model design | • Larger-scale sample data (1000+, 10,000+, …) |
CNN | • Takes Raman spectra or figures as input data, with Raman figures preferred • Extracts features from input data directly and classifies the observed objects | • Directly extracts features from input data • Classifies the observed objects • Simple architecture • Ease of use | • Depends on quality and features of the data | • Most of the modeling tasks (classification and regression) |
RNN | • Raman spectra as input data • Mines wavenumber and intensity information in the Raman spectra data | • Strong learning ability of time series nonlinear data behavior • Stores more long-term sequence information • Mines temporal and semantic information in the data | • Risks of gradient exploding and gradient vanishing | • Sequence data • Time series nonlinear data behavior • Classification and prediction |
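
The peak intensity ratio in the first row of the table (R = I12xx/I16xx) can be illustrated with a minimal Python sketch. The band windows (1200 to 1300 cm⁻¹ and 1600 to 1700 cm⁻¹), the function names `band_peak` and `peak_ratio`, and the toy spectra are all assumptions made for illustration; they are not taken from the source, and a real analysis would choose the windows from the assigned peaks of the tissue under study.

```python
def band_peak(wavenumbers, intensities, lo, hi):
    """Return the highest intensity among points with lo <= wavenumber <= hi."""
    in_band = [i for w, i in zip(wavenumbers, intensities) if lo <= w <= hi]
    if not in_band:
        raise ValueError(f"no data points between {lo} and {hi} cm^-1")
    return max(in_band)


def peak_ratio(wavenumbers, intensities,
               band_12xx=(1200.0, 1300.0),   # assumed window for the "12xx" peak
               band_16xx=(1600.0, 1700.0)):  # assumed window for the "16xx" peak
    """Compute R = I12xx / I16xx from the peak intensity in each band."""
    i_12xx = band_peak(wavenumbers, intensities, *band_12xx)
    i_16xx = band_peak(wavenumbers, intensities, *band_16xx)
    if i_16xx == 0:
        raise ZeroDivisionError("peak intensity in the 16xx band is zero")
    return i_12xx / i_16xx


# Synthetic toy spectra (not measured data), sampled at five wavenumbers.
wavenumbers = [1100.0, 1250.0, 1400.0, 1650.0, 1800.0]
spectrum_a = [0.2, 0.8, 0.3, 0.4, 0.1]
spectrum_b = [0.2, 0.5, 0.3, 1.0, 0.1]

ratio_a = peak_ratio(wavenumbers, spectrum_a)
ratio_b = peak_ratio(wavenumbers, spectrum_b)
print(f"R_a = {ratio_a:.2f}, R_b = {ratio_b:.2f}")
```

Comparing R between two groups of spectra in this way needs no model fitting, which matches the "Straightforward" and "Simple" entries in the table, but it relies on a single pair of peaks and so fits best where the characteristic peaks differ clearly.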