Recent application of Raman spectroscopy in tumor diagnosis: from conventional methods to artificial intelligence fusion

Qi, Yafeng; Liu, Yuhong; Luo, Jianbin

doi:10.1186/s43074-023-00098-0

PhotoniX

Table 2 Data processing, advantages, limitations and suitable application of classification methods


Methods	Data processing	Advantages	Limitations	Suitable application
Peak intensity analysis	• Peak intensity: I_normal vs I_cancer • Peak intensity ratio (R = I_12xx/I_16xx): R_normal vs R_cancer	• Straightforward • Simple	• Low accuracy • Accuracy drops as data size increases	• Obvious characteristic peaks and large differences • Small-scale sample data (10 ~ 100)
Multivariate statistical analysis	-	• High interpretability • Easy to implement	• Accuracy is limited on larger data sets	• Medium-scale sample data (100 ~ 1000)
PCA	• Reduces the original Raman spectra to PCs while preserving the features that contribute most to the difference in the Raman spectra	• Reduces data dimensionality to PCs • Retains important data information • Removes background noise	• Relatively low classification accuracy	• Unsupervised method • Exploratory study • Data-reduction algorithm
PLS	• Regression modeling for independent and dependent variables of Raman spectra	• Reduces data dimensionality to key factors • Better selects characteristic variables	• Relatively low classification accuracy	• Supervised method • Data-reduction algorithm
KCA	• Repeatedly assigns each Raman spectrum to its nearest seed and updates each seed to the mean of its assigned spectra, in order to cluster the spectra	• Simple algorithm principle • Fast processing speed	• K value is difficult to determine • May converge to a local rather than the global optimum	• Unsupervised clustering technique • Exploratory study • Samples with large differences between groups
LDA	• Projects the Raman spectra into the vector space with the maximum between-class distance and the minimum within-class distance	• Most common classification method • High accuracy	• Overfits if data are insufficient	• Powerful supervised technique for classification • Integrates with PCA method
QDA	• Estimates a separate covariance matrix for each class of Raman spectra	• Variant of LDA • High accuracy	• Cannot be used for data dimensionality reduction	• Supervised technique for classification • Sample analysis
GA	• Feature extraction of Raman spectra, as a stage prior to classification	• Feature selection • Strong robustness	• Low computation speed • Complex programming process	• General optimization technique • Feature extraction of data
Classical machine learning	-	• Higher accuracy • Easy to implement	• Poor training efficiency when processing large-scale data	• Large-scale sample data (1000 ~ 10,000)
SVM	• Seeks to determine the optimal hyperplane that maximizes the distance between the hyperplane and the nearest Raman spectra data sample in a high-dimensional space	• Less prone to overfitting • Avoids local optimum and “curse of dimensionality”	• Poor training efficiency when processing large-scale data	• Nonlinear, multi-dimensional problems • Small sample learning problems
BT	• Changes the weight of Raman spectra data, learns multiple classifiers, and combines these classifiers linearly to improve the performance of classification	• Ensemble learning method • No need to do feature normalization	• Sensitive to abnormal data • Easy to overfit	• Low dimensional data • Layers not too high
RF	• Uses multiple trees to train and predict Raman spectra data	• Ensemble learning method • Low risk of overfitting	• Relatively low learning speed	• Limited samples
KNN	• Uses proximity of a single Raman spectral data point to classify or predict groupings	• High precision • Insensitive to outliers	• Relatively large time complexity • Large space complexity	• Small-size samples • Low-dimensional data
Deep learning	-	• Higher accuracy • Good portability	• Large amounts of computation • Complex model design	• Larger-scale sample data (1000+, 10,000+, …)
CNN	• Takes Raman spectra or figures as input data; Raman figures are preferred • Extracts features from input data directly and classifies the observed objects	• Directly extracts features from input data • Classifies the observed objects • Simple architecture • Ease of use	• Depends on quality and features of the data	• Most of the modeling tasks (classification and regression)
RNN	• Raman spectra as input data • Mines wavenumber and intensity information in the Raman spectra data	• Strong learning ability of time series nonlinear data behavior • Stores more long-term sequence information • Mines temporal and semantic information in the data	• Risks of gradient exploding and gradient vanishing	• Sequence data • Time series nonlinear data behavior • Classification and prediction

BT Boosted tree, CNN Convolutional neural network, GA Genetic algorithm, KCA k-means cluster analysis, KNN k-nearest neighbors, LDA Linear discriminant analysis, PCA Principal component analysis, PLS Partial least squares, QDA Quadratic discriminant analysis, RF Random forest, RNN Recurrent neural network, SVM Support vector machine
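
The peak intensity ratio in the first row of the table (R = I_12xx/I_16xx, compared between normal and cancer spectra) can be illustrated with a minimal Python sketch. The table does not specify the exact bands, so the band limits, the toy spectra, and the helper names band_intensity and peak_ratio below are all hypothetical and chosen only for illustration; taking the maximum within a band as the peak intensity is likewise an assumption, not the authors' stated procedure.

```python
def band_intensity(wavenumbers, intensities, lo, hi):
    """Return the maximum intensity within the band [lo, hi] (cm^-1)."""
    values = [i for w, i in zip(wavenumbers, intensities) if lo <= w <= hi]
    if not values:
        raise ValueError(f"no data points in band {lo}-{hi} cm^-1")
    return max(values)


def peak_ratio(wavenumbers, intensities, band_a, band_b):
    """Return R = I(band_a) / I(band_b) for one spectrum."""
    i_a = band_intensity(wavenumbers, intensities, *band_a)
    i_b = band_intensity(wavenumbers, intensities, *band_b)
    return i_a / i_b


# Hypothetical band limits standing in for "12xx" and "16xx" (assumption).
BAND_12XX = (1200, 1300)
BAND_16XX = (1600, 1700)

# Toy spectra (made-up numbers, not measured data).
wavenumbers = [1200, 1250, 1300, 1600, 1650, 1700]
normal = [1.0, 2.0, 1.0, 1.0, 4.0, 1.0]
cancer = [1.0, 3.0, 1.0, 1.0, 2.0, 1.0]

r_normal = peak_ratio(wavenumbers, normal, BAND_12XX, BAND_16XX)
r_cancer = peak_ratio(wavenumbers, cancer, BAND_12XX, BAND_16XX)

print(f"R_normal = {r_normal}, R_cancer = {r_cancer}")
```

With real data, R would be computed per spectrum after baseline correction and normalization, and the two groups compared statistically; a single-ratio comparison like this is only suitable where the characteristic peaks differ clearly, as the table notes.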
