Liver-Metastasis-Related Genes are Potential Biomarkers for Predicting the Clinical Outcomes of Patients with Pancreatic Adenocarcinoma

It is widely acknowledged that metastasis determines the prognosis of pancreatic adenocarcinoma (PAAD), and the liver is the most primary distant metastatic location of PAAD. It is worth exploring the value of liver-metastasis-related genetic prognostic signature (LM-PS) in predicting the clinical outcomes of PAAD patients post R0 resection. We collected 65 tumors and 165 normal pancreatic data from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression project (GTEx), respectively. Differentially expressed genes (DEGs) between primary tumor and normal pancreatic samples were intersected with DEGs between primary tumor samples with liver metastasis and those without new tumor events. The intersected 45 genes were input into univariate Cox regression analysis to identify the prognostic genes. Thirty-three prognostic liver-metastasis-related genes were identified and included in least absolute shrinkage and selection operator (LASSO) analysis to develop a seven-gene LM-PS, which included six risk genes (ANO1, FAM83A, GPR87, ITGB6, KLK10, and SERPINE1) and one protective gene (SMIM32). The PAAD patients were grouped into low- and high-risk groups based on the median value of risk scores. The LM-PS harbored an independent predictive ability to distinguish patients with a high-risk of death and liver metastasis after R0 resection. Moreover, a robust prognostic nomogram based on LM-PS, the number of positive lymph nodes, and histologic grade were established to predict the overall survival of PAAD patients. Besides, a transcription factor‐microRNA coregulatory network was constructed for the seven LM-PS genes, and the immune infiltration and genomic alterations were systematically explored in the TGCA-PAAD cohort.


INTRODUCTION
Pancreatic adenocarcinoma (PAAD) is one of the most lethal malignancies with strong metastatic ability. It is predicted to become the second most common cause of cancer-related death within the next decade [1,2]. Despite advances in diagnostic techniques, radiotherapies and systemic therapies for PAAD, the five-years overall survival (OS) remains 10%, because 80-85% of the patients are initially diagnosed with either unresectable or metastatic tumors [3,4]. For the small fraction of the patients with a resectable and localized tumor, the five-years survival rate post-operation was 20% [3]. It is widely accepted that metastasis determines the prognosis of patients with pancreatic cancer [5]. Compared with patients with localized tumors, the survival of patients with metastasis is only 6-8 months [6]. The most primary distant metastatic location of PAAD is the liver [7]. Therefore, it is meaningful to explore the potential biomarkers with the ability to distinguish patients with unfavorable prognosis and high-risk of liver metastasis.
The hepatic metastasis process of PAAD involves complicated steps such as the adhesion of dissociated pancreatic cancer cells toward the liver, the formation of the remodeled extra-cellular matrix (ECM), the angiogenesis for micro-metastasis, and the construction of immune escape [7]. Previous studies have revealed that pancreatic tumors are highly heterogeneous at both cellular and molecular levels [8,9]. Molecular biomarkers play increasingly important roles in predicting the prognosis of patients with PAAD [10]. Multiple prognostic models with reliably predictive value have been established based on the mining of public databases such as The Cancer Genome Atlas (TCGA), International Cancer Genome Consortium (ICGC), and Gene Expression Omnibus (GEO) [11][12][13]. Recently, Venkat et al. reported that the alternative polyadenylation promoted the expression of the protumorigenic gene in pancreatic ductal adenocarcinoma (PDAC) by mining the integrated data of the Genotype-Tissue Expression (GTEx) project and TCGA [14]. There are few studies on mRNA combination biomarkers for the liver metastasis of PAAD. In the current study, we hypothesized that the differentially expressed genes associated with liver metastasis might harbor the potential for predicting the prognosis of patients with PAAD.
In this study, we integrated the mRNA expression data of normal pancreas and PAAD tissues from the GTEx and the TCGA datasets, respectively. A seven-gene prognostic signature was constructed with the ability to predict both OS and liver metastasis for PAAD patients following R0 resection.

Patient Selection and Data Retrieval
Patient inclusion criteria in the TCGA dataset were [1] PAAD patients with a surgical operation of R0 resection [2], patients with no new tumor event or with liver metastasis during followup. The exclusion criteria were [1] patients with survival period <30 days or with incomplete survival data [2], patients with missing data of gene expression. Sixty-five patients were included in this study. During the follow-up period, the numbers of patients with liver metastasis and with no new tumor event were 15 and 50, respectively. The corresponding clinicopathological information of the PAAD patients was obtained via Genomic Data Commons Data Portal (https:// portal.gdc.cancer.gov/).
The mRNA expression data [Fragments Per Kilobase of exon model per Million mapped fragments (FPKM)] of the PAAD samples in the TCGA dataset and the normal pancreas samples in the GTEx dataset were reprocessed using UCSC xena (https://xenabrowser.net/) to avoid data imbalance and transformed into log 2 (FPKM+1). The batch effects between the TCGA and the GTEx cohorts were minimized by "limma" R package. To assess the batch effects, the variation of housekeeping gene expression between the TCGA and the GTEx datasets was evaluated as Eisenberg et al. reported [15]. The mRNA expression profiles of the 65 PAAD and 165 normal pancreas samples were obtained for downstream analysis. The GRCh38 file was downloaded from the Ensembl website (https://asia.ensembl.org/index.html) for annotation of the mRNAs.

Identification of Differentially Expressed Genes Associated with Liver Metastasis
Based on the threshold of |logFC| > 1 and p value <0.05, the differentially expressed genes (DEGs) of normal pancreas samples vs. primary tumor samples, and DEGs of primary tumor samples without new tumor event vs. primary tumor samples with liver metastasis during follow-up were screened by Wilcoxon test. Venn diagram tool (http://bioinformatics.psb.ugent.be/webtools/Venn/) was used to identify the intersection of DEGs in PAAD tissues and PAAD tissues with liver metastasis.
Prognostic Gene Signature Construction Based on Livermetastasis-related mRNAs.
Univariate regression Cox analysis was performed to filter mRNAs associated with OS. Then the identified mRNAs were included in the least absolute shrinkage and selection operator (LASSO) Cox regression model to develop a liver-metastasisrelated prognostic signature (LM-PS) for the PAAD patients involving seven liver-metastasis-related genes and derive the regression coefficient of each gene by "glmnet" and "survival" R packages. Thereafter, risk scores for patients were calculated according to the following formula: where Exp i represents the expression of each prognostic mRNA in the LM-PS, and Coef i represents the coefficient of the corresponding mRNA. PAAD patients were grouped into lowand high-risk groups based on the median value of the risk scores. Principal component analysis (PCA) was performed for the two groups using the R package "factoextra".

Independence of the Liver-Metastasis-Related Prognostic Signature for Predicting Overall Survival and Liver Metastasis
In the TCGA dataset, the liver-metastasis-related prognostic signature and corresponding clinicopathological factors of the PAAD patients were included in univariate/multivariate Cox proportional hazards analysis and logistic regression model analysis to identify the independent predictors of OS and liver metastasis, respectively. A nomogram was established based on the independent predictors of OS with the "rms" R package. The receiver operating characteristic (ROC) curves and the area under the curve (AUC) values were used to evaluate the predictive ability of the established models in this study.

Functional Enrichment Analysis
Based on the threshold of |logFC| > 1 and p value <0.05, 408 DEGs between high-and low-risk groups were identified and uploaded to the "Metascape" website (https://metascape.org/) for enrichment analysis involving Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG).

The Immune Infiltration Analysis of PAAD
The infiltration levels of the 22 immune cells were determined to assess the tumor microenvironment between high-and low-risk groups and between liver-metastasis and non-new-tumor groups in the TGCA dataset by the CIBERSORT R script v1.03. A p value <0.05 was set as the significance threshold.

Gene Mutation Profiles of the PAAD Patients in the TCGA Dataset
The cBioPortal website (http://www.cbioportal.org/) is a large depository that provides access to cancer genomics data carried out by many institutions including the TCGA database [16]. In this study, the cBioPortal was used to explore the connection between the gene mutation and the prognosis of the 65 PAAD patients in the TCGA dataset.

Transcription Factor-microRNA Coregulatory Network Construction
The seven liver-metastasis-related genes identified by the Lasso Cox regression model were uploaded to NetworkAnalyst (https://www. networkanalyst.ca/faces/home.xhtml) to construct a transcription factor-microRNA coregulated network. The transcription factor-microRNA coregulatory data were predicted from the Regulatory Network Repository. The transcription factor-microRNA coregulatory network was visualized by Cytoscape 3.8.1.

Statistical Analysis
PAAD patients in the TCGA cohort were divided into high-and low-risk groups based on the median value of the risk scores. Mean ± standard deviation or medians (with interquartile range), and counts with percentages were used to present continuous variables and categorical variables, respectively. Continuous variables were compared by Student's t-test or Mann-Whitney U-test, and categorical variables by Chi-square test. Kaplan-Meier curves were plotted to analyze patient survival. The log-rank tests were performed to analyze the differences in patient survival. The statistical analysis was carried out with the R software 4.0.2, SPSS 22.0, and GraphPad Prism 8.0.1. p values <0.05 were considered to indicate statistical significance (two-sided).

Identification of Liver-Metastasis-Related Genes in PAAD Patients
This study included 65 PAAD patients with R0 resection from the TCGA dataset, and the corresponding clinicopathological factors of the patients were presented in Table 1. As shown in the table, with the exception of survival status and pathologic M stage, there was no significant difference in terms of all the other clinicopathological variables that were analyzed. Previous studies successfully compared the gene expression between the GTEx and TCGA datasets, and minimized the batch effects in data processing [14,17]. To assess the batch effects in this study, the variation of housekeeping genes expression between the TCGA and GTEx datasets was compared, and a high correlation between the PAAD and normal pancreas tissues was observed (Pearson R 0.89, p < 0.0001, Supplementary Figure S1), indicating minimal batch effects between the two datasets. Subsequently, we screened DEGs based on the standards of |logFC| > 1 and p < 0.05 between the 65 PAAD tissues vs. 165 normal pancreas tissues and between the 15 primary PAAD tissues with liver metastasis vs. 50 primary PAAD tissues without new tumor event. The results showed that 4,996 and 62 genes were differently expressed in PAAD tissues and PAAD tissues with liver metastasis, respectively. Thereafter, the Venn diagram web tool was used to get the intersection of the two groups of DEGs, and 45 liver-metastasis-related genes were identified. The study flow chart is shown in Figure 1.

Construction of the LM-PS in the TCGA Cohort
To build the LM-PS for forecasting the OS of PAAD patients, we included the expression of the 45 liver-metastasis-related genes into the univariate Cox regression analysis and identified 33 genes significantly associated with OS (Supplementary Table S1). Subsequently, based on the 33 prognostic genes, the LASSO Cox analysis (Figures 2A,B) was performed to construct a seven-gene LM-PS containing anoctamin-1 (ANO1), family with sequence similarity 83, member A (FAM83A), G proteincoupled receptor 87 (GPR87), integrin beta-6 (ITGB6), kallikrein-10 (KLK10), serine protease inhibitor, clade E member 1 (SERPINE1), and small integral membrane protein 32 (SMIM32). The detailed information and the prognostic ability of the seven liver-metastasis-related genes were presented in Table 2 and Figure 2C, respectively. Next, the risk score for predicting OS was calculated for each patient in the training dataset based on the expression and coefficients ( Figure 2D) of the seven liver-metastasis-related genes. Then the patients were separated into low-and high-risk subgroups based on the median value of the risk scores ( Figure 2E). The survival status of the patients in the high-and low-risk groups was visualized in Figure 2F showing that the mortality rate post R0 resection was higher in the high-risk group than in the lowrisk group. The heatmap showed that the expression of the risk genes (ANO1, FAM83A, GPR87, ITGB6, KLK10, and SERPINE1) were up-regulated, while the expression of the protective gene (SMIM32) was down-regulated ( Figure 2G) with the increasing risk score. In addition, the PCA was used to assess the biological difference between low-and high-risk groups based on the expression of the seven liver-metastasisrelated genes. The result demonstrated that the low-and highrisk patients were distributed separately in distinct directions ( Figure 2H).

Prognostic Analysis of the Seven Liver-Metastasis-Related Genes and the LM-PS
The patients in the training dataset were divided into two groups based on the median expression of each liver-metastasis-related gene. The Kaplan-Meier curves showed that high expression of ANO1, FAM83A, GPR87, ITGB6, KLK10, SERPINE1, and low expression of SMIM32 were significantly associated with worse patient survival ( Figures 3A-G). In addition, based on the two different risk groups distinguished by the LM-PS, the Kaplan-Meier curve also showed that patients with high-risk had a significantly lower OS rate than those with low-risk (p 6.093E-05, Figure 3H). The OS predictive ability of the LM-PS was confirmed by the ROC curves in the TCGA cohort with 1year AUC 0.864, 2-years AUC 0.899, and 3-years AUC 0.856 ( Figure 3I).
The distribution of clinicopathological factors with the risk score increasing is shown in Figure 3J. In addition, risk scores were observed to be higher in patients with G2 and G3/G4, pathologic stage II, and liver metastasis compared with the corresponding control groups but were not significantly associated with age, gender, pathologic T stage, or pathologic N stage ( Figures 3K-Q).

The Prognostic Ability of the LM-PS to Predict Liver Metastasis
As expected, the Kaplan-Meier curve showed that the OS rate was remarkably lower in PAAD patients with liver metastasis after R0 resection than those without new tumor events ( Figure 4A). This study tried to provide an insight into the factors associated with the liver metastasis of PAAD. The risk score, age, gender, histologic grade, pathologic stage, pathologic T stage, pathologic N stage, and the number of positive lymph nodes were included in univariate logistic regression model analysis. Interestingly, only risk score was  Table 3]. The ROC curve confirmed that the risk score harbored a good ability to predict liver metastasis of PAAD in the TCGA cohort (AUC 0.756, Figure 4B).

Validation of the LM-PS in the TCGA, GEO and ICGC Datasets
In the whole TCGA-PAAD cohort, the forest plot showed that high expression of ANO1, FAM83A, GPR87, ITGB6, KLK10, SERPINE1, and low expression of SMIM32 were significantly   associated with worse patient survival ( Figure 5A). Based on the LM-PS, risk scores were calculated for patients in the whole TCGA-PAAD and GSE57495 cohort and obtained from the SurvExpress website for the ICGC cohort, respectively. The clinicopathological data of the GSE57495 were presented in Supplementary Table S2. Patients in the three datasets were divided into high-and low-risk groups based on the median value of risk scores, respectively. The Kaplan-Meier curves showed that the high-risk patients had significantly lower OS rates than the low-risk patients in the whole TCGA-PAAD, GSE57495, and ICGC datasets ( Figures 5B-D). Risk scores and survival status distributions of GSE57495 and ICGC were presented in Supplementary Figures S2A-D.
Then the ROC curves were plotted to assess the prognostic values of the risk scores. The 1-, 2-, and 3-years AUCs were 0.766, 0.786, and 0.748 in the whole TCGA dataset, respectively ( Figure 5E); 0.601, 0.710, and 0.645 in the GSE57495 dataset, respectively (Supplementary Figure  S2E); and 0.719, 0.664, and 0.601 in the ICGC dataset, respectively (Supplementary Figure S2F). These results indicated that the LM-PS harbored a moderate OS predictive ability for PAAD patients.

The Independent Prognostic Value of LM-PS for PAAD Patients
Univariate and multivariate Cox regression analyses were used to assess the prognostic value of the LM-PS in the TCGA-PAAD dataset. In the training set, the risk score and clinicopathological factors of PAAD patients were included in univariate Cox analysis, and the results showed that risk score was significantly associated with OS [hazard ratio (HR) 1.132, 95% CI 1.077-1.190, p < 0.001; Table 4]. Furthermore, the multivariate Cox analysis demonstrated that risk score was an independent predictor of OS (HR 1.126, 95% CI: 1.060-1.197, p < 0.001; Table 4).
Univariate and multivariate Cox regression analyses were also performed in the whole TCGA-PAAD dataset. The univariate Cox regression analysis showed that risk score was significantly associated with OS (HR 2.668, 95% CI 1.761-4.042, p < 0.001; Table 5). The multivariate Cox regression analysis further indicated that risk score was an independent predictor of OS in the whole TCGA-PAAD dataset (HR 2.762, 95% CI: 1.680-4.541, p < 0.001; Table 5).

Construction and Validation of a LM-PS-Based Nomogram in the TCGA Dataset
In the training cohort, the risk score, number of positive lymph nodes, and histologic grade were included to establish a clinically applicable quantitative nomogram to predict the OS of PAAD patients ( Figure 6A). The calibration plot revealed that the predicted 1-, 2-and 3-years OS using nomogram agreed with the observed OS in the TCGA dataset ( Figure 6B). Furthermore, ROC curves were used to assess the prediction accuracy of the nomogram both in the training cohort and the whole TCGA-PAAD cohort. The AUCs values for 1-, 2-and 3-years in PAAD were 0.947, 0.913, and 0.940 in the training cohort, respectively   Then the ROC curves were also plotted to assess the prognostic values of the nomogram and other independent predictors (risk score, histologic grade, and the number of positive lymph nodes) in the whole TCGA-PAAD dataset. The AUCs of the risk score, histologic grade, number of positive lymph nodes, and the nomogram were 0.753, 0.619, 0.665, and 0.815, respectively, indicating that the inclusion of risk score added to the accuracy of the other two factors included in the nomogram ( Figure 6E).

The Landscape of Immune Infiltration in PAAD
The landscape of immune cell infiltration of PAAD in the TCGA cohort was explored between low-and high-risk groups and between liver-metastasis and no-new-tumor groups using the CIBERSORT algorithm. The violin plots showed that patients in the high-risk subgroup harbored significantly higher prevalence in M0 macrophages (p 0.004) and Eosinophils (p 0.003), but a lower prevalence in naive B cells (p 0.022) than those in the low-risk subgroup (Supplementary Figure S3A). However, no immune cell proportions were observed to be different between nonnew-tumor and liver-metastasis subgroups (Supplementary Figure S3B).

Transcription Factor -microRNA Coregulatory Network
The transcription factor-microRNA coregulatory network based on ANO1, FAM83A, GPR87, ITGB6, KLK10, SERPINE1, and SMIM32 was obtained from NetworkAnalyst. This network contains the seven genes of the LM-PS, 63 transcription factors, and 37 microRNAs, with 86 associations between the LM-PS genes and transcription factors, and 38 associations between the LM-PS genes and microRNAs. As shown in Figure 7, the transcription factors FOS, JUND, USF1, and MYC were observed to regulate three LM-PS genes, the microRNA hsa-miR-224 was observed to modulate two LM-PS genes, and SERPINE1 was the most frequently modulated LM-PS gene. The increase of ANO1, FAM83A, GPR87, ITGB6, and the decrease of SMIM32 were significantly associated with the increase of SERPINE1, while no significant correlation was observed between the expression of KLK10 and SERPINE1 in the TCGA dataset ( Supplementary Figures S4A-F).

Genomic Alterations in PAAD
We explored the genomic alteration for the TCGA-PAAD cohort in this study (data of 64 out of the 65 patients could be found on the cBioPortal website). In this study, 45.3% of the patients (29/ 64) had at least one LM-PS gene alteration with FAM83A, SERPINE1, and GPR87 being the most frequently altered genes (22%, 14%, and 11%, respectively; Supplementary Figure S5A). The oncoprints of low-and high-risk subgroups are shown in Figures 8A,B. Patients in the high-risk subgroup LM-PS, liver-metastasis-related prognostic signature; OS, overall survival; PAAD, pancreatic adenocarcinoma; TCGA, The Cancer Genome Atlas; HR, hazard ratio; CI, confidence interval.  harbored significantly higher genetic alteration frequency in FAM83A (p 0.002), GPR87 (p 0.037), but lower frequency in SMIM32 (p 0.024) than those in the low-risk subgroup (Supplementary Table S3). In total, no significant difference was observed in OS between the altered and non-altered subgroups (Supplementary Figure S5B). For the individual LM-PS gene alteration, compared to patients with no genetic alteration, patients with alteration of FAM83A and ITGB6 were significantly associated with lower OS (Supplementary Figures S5C,D), while those with alteration of SMIM32 were significantly associated with higher OS (Supplementary Figure S5E). The most frequent mutated genes between the patients with no LM-PS gene altered and those with at least one LM-PS gene altered were also explored. The results showed that the altered subgroup harbored higher enrichment of the V-  Figure S5F).

Functional Enrichment Analysis of DEGs
GO and KEGG enrichment analyses were performed for the 408 DEGs between high-and low-risk subgroups to identify the most relevant biological processes and pathways. The GO annotations showed that the DEGs enriched in GO terms mainly related to ECM, cell adhesion, and locomotion such as ECM organization, positive regulation of locomotion, cell-substrate adhesion, cell adhesion molecule binding, cell junction organization, collagen binding, regulation of cell adhesion, and ECM binding ( Figure 9A). The most enriched KEGG pathway was also ECM-receptor interaction. Additionally, the KEGG enrichment analysis showed that DEGs were enriched in several tumorrelated pathways such as proteoglycans in cancer, pathways in cancer, transcriptional misregulation in cancer, prostate cancer, and p53 signaling pathway ( Figure 9B).

DISCUSSION
The current study was mainly based on the TCGA dataset. DEGs between primary PAAD tissues with liver metastasis and those FIGURE 7 | Transcription factor (blue squares)-microRNA (green squares) coregulatory network for the seven liver-metastasis-related prognostic genes (red circles).
without new tumor events were intersected with DEGs between primary tumor and normal pancreatic tissues, and 45 livermetastasis-related genes were identified. Thirty-three out of the forty-five DEGs were identified to be significantly associated with the OS of PAAD patients, and seven of them (ANO1, FAM83A, GPR87, ITGB6, KLK10, SERPINE1, and SMIM32) were identified by the LASSO model to construct an LM-PS for predicting the OS of PAAD patients post R0 resection. The up-regulation of ANO1, FAM83A, GPR87, ITGB6, KLK10, and SERPINE1 was significantly associated with poor outcomes of PAAD patients, however, the up-regulation of SMIM32 was protective for the prognosis of PAAD patients. Based on the median value of risk scores generated from the LM-PS, PAAD patients were grouped into the low-and high-risk subgroups and the high-risk subgroup had a significantly lower OS rate than those in the low-risk subgroup. The prognostic value of the LM-PS in the TCGA dataset was validated in the GEO and ICGC datasets. Subsequently, the multivariate Cox analysis and logistic regression analysis confirmed that the LM-PS was a reliable predictor of OS and liver metastasis, respectively. Furthermore, a robust prognostic nomogram based on risk scores, the number of positive lymph nodes, and histologic grade were established to predict 1-, 2-, and 3-years OS for PAAD patients. In addition, immune infiltration analysis showed that only a few kinds of immune cell proportions were different between high-and lowrisk subgroups. A transcription factor-microRNA coregulatory network based on the seven LM-PS genes was constructed for viewing the potential regulatory mechanism of these genes. The cBioPortal was used to give an insight into the impact of genomic alterations on the prognosis of PAAD patients. The genetic alteration frequency in FAM83A, GPR87, and SMIM32 was significantly different between high-and low-risk groups. No significant difference was observed in patient survival between altered and non-altered subgroups. However, compared with those with non-alteration, patients with alterations of FAM83A and ITGB6 were significantly associated with lower OS, while patients with alterations of SMIM32 were significantly associated with higher OS. These results indicate that mutations of FAM83A, ITGB6, and SMIM32 may play important roles in tumor modulation. Additionally, the LM-PS gene altered subgroup was observed to harbor a gene mutation enrichment in KRAS, CDKN2A, and TP53. The regulatory mechanism of the KRAS, TP53, and CDKN2A has been well discussed. It has been widely acknowledged that KRAS mutation is associated with the poor prognosis of patients with pancreatic cancer [18]. KRAS mutates in approximately 90% of the PDAC cases and promotes the disease mainly by activating the RAS-RAF and PI3K-AKT pathways (an intracellular signaling pathway regulating the cell cycle) [10]. CDKN2A, which encodes protein p16, mutates in more than 90% of the PDAC cases, and the CDKN2A function loss could promote the cell cycle transition from the G1 to S phase [3,19]. TP53, whose mutation was observed in 50-70% PDAC, is one of the critical tumor suppressor genes mutating in the later stages of the disease. The aberrant regulation of TP53 promotes tumor progression by inhibition of apoptosis, regulation of cell cycle, and improvement in cell survival [3].
The modulation of most of the LM-PS genes in this study has been reported to be associated with the progression of pancreatic cancer. A Danish study reported that ANO1 (TMEM16A) was the main component of the Ca 2+ -activated Cl − channel, which was upregulated in PDAC cell lines. ANO1 was reported to be crucial for cell migration but was not significantly associated with cell proliferation in PDAC [20]. Crottès et al. further demonstrated that the high expression of ANO1 promoted pancreatic cancer cell migration by regulating liganddependent EGFR signaling pathway and resulted in a low probability of patient survival [21]. These results are consistent with our study, which indicates that ANO1 is a metastasis-related gene in pancreatic cancer.
As a member of the family with sequence similarity 83 (FAM83), FAM83A is, remarkably, up-regulated in PAAD and significantly associated with poor clinical outcomes. Overexpression of FAM83A markedly promotes cancer stem cell traits and chemoresistance by activating the Wnt/β-catenin signaling and TGF-β signaling in pancreatic cancer [22]. The high expression of FAM83A is also essential for the tumorigenesis and the maintaining of MEK/ERK signaling to prevent cell death in pancreatic cancer cells [23]. These results indicate that FAM83A is an important oncogene and a potential biomarker for predicting prognosis and therapeutic targets in pancreatic cancer.
As a G protein-coupled receptor, GPR87 is significantly overexpressed in PAAD tissues and is an independent risk predictor of OS. Elevated GPR87 expression promotes cancer stem cell expansion by regulating the JAK2/STAT3 pathway [24]. A study by Wang et al. demonstrated that up-regulation of GPR87 markedly promoted angiogenesis, proliferation, and resistance to chemotherapy-induced apoptosis of pancreatic cancer by activating the NF-κB signaling pathway [25].
The Kallikreins family are a series of serine proteases modulating the proteolysis scene. They have been reported to be closely correlated with angiogenesis and metastasis [26,27]. Cao et al. reported that the over-expression of KLK10 was observed in the PDAC, especially in those with lymphatic and distant metastasis, and was significantly associated with poor prognosis. The study further revealed that the KLK10 promoted the invasive and metastatic phenotype of PDAC by regulating EMT and FAK-SRC-ERK signaling [28].
As a crucial regulator in the plasminogen activator system, SERPINE1 encodes the plasminogen activator inhibitor and was reported to markedly modulate tumor invasion and proliferation and was negatively associated with the OS of PDAC patients [29,30]. Recently, the study of Akula et al. showed that SERPINE1 was mainly targeted by TP53/miR-34a axis in PDAC [31]. Studies about ITGB6, a member of the ITGB superfamily, in pancreatic carcinogenesis were limited. Only Zhuang et al. reported that the overexpression of ITGB6 in PAAD was significantly associated with the methylation level of CpGs (cg23008083) in promoter region [32]. There has been no study concerning the modulation of SMIM32 in pancreatic cancer, which is a potential protective biomarker and is worth exploring in the future.
The GO enrichment analysis showed that the DEGs were mainly enriched in GO terms associated with ECM, cell adhesion, and locomotion. Additionally, the most enriched KEGG pathway was also ECM-receptor interaction. As a reservoir of numerous signaling molecules, the ECM plays an essential role in the tumor microenvironment and has been found to promote metastasis by the degradation of 500-600 proteases [33][34][35]. In addition, several tumor-related pathways were also highlighted in the KEGG analysis. Therefore, we hypothesize that some phenotypes of pancreatic cancers harbor the ability to disorder normal ECM organization and cell-substrate adhesion in the tumor microenvironment to promote invasion and metastasis.
Recent studies have demonstrated that a low fraction of naive B cells and a high fraction of M0 macrophages were correlated with the decreased OS of PDAC patients [36,37]. These results were consistent with our results showing that naive B cells were significantly down-regulated, whereas M0 macrophage was significantly up-regulated in the high-risk subgroup. As crucial mediators of the tumor microenvironment, macrophages have been reported to promote angiogenesis, proliferation, and metastasis in solid tumors [38]. The M0 macrophages can be polarized into antitumoral M1 phenotype or protumoral M2 phenotype. The functions and phenotypes of tumor-associated macrophages have been reported to be similar to M2 mononuclear cells [39]. Ye et al. reported that tumor associated macrophages promoted pancreatic cancer by regulating the Warburg effect via the CCL18/NF-kB/VCAM-1 axis [40]. However, no significant difference in the proportion of M1 or M2 was observed between low-and high-risk subgroups in our study. Therefore, the mechanism of how M0 macrophages impact the prognosis of pancreatic cancer needs further exploration.
In addition, the transcription factors FOS, JUND, USF1, and MYC were found to regulate three LM-PS genes in the transcription factor-microRNA coregulatory network. FOS and JUND belong to the Activator Protein 1 family [41]. FOS, which encodes leucine zipper protein, was reported to be over-expressed in pancreatic cancer and was closely correlated with tumor proliferation, differentiation, and apoptosis [42]. JUND was also reported to regulate the progression of pancreatic cancer by activating the tumor suppressor gene RASSF10 [43]. The upregulation of USF1 has been reported to promote multiple solid tumors such as hepatocellular carcinoma, gastric carcinogenesis, and glioma [44][45][46]. However, there are few reports on how USF1 modulates the progression of pancreatic cancer, and further studies are needed. As an oncogene that is widely implicated in the pathogenesis of malignancies, MYC is reported to be indispensable in KRAS-driven pancreatic carcinogenesis, and the activation of MYC promotes sporadic liver metastasis in PDAC [47].
The limitations of our study are first, that it was a retrospective study and, therefore, no adjustment could be made for confounding factors that might have influenced the clinical outcomes. Second, batch effects between TCGA and GTEx datasets cannot be eliminated, though they have been minimized and assessed. Third, limited samples of PAAD patients were included in this study, and prospective large sample studies are needed in the future.
In conclusion, our study constructed a seven-gene signature with the prognostic ability to identify PAAD patients with a highrisk of death and liver metastasis after R0 resection. Furthermore, a clinically applicable nomogram incorporating genetic features and clinicopathological factors was established to predict the OS of PAAD patients. The nomogram may facilitate an individual therapeutic strategy, early intervention, and delayed cancer progression in clinical practice.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
DF and JL conceived and designed the study. YD, JT, and BY were involved in the acquisition, analysis, and interpretation of the data. YD and JT wrote the original draft of the manuscript. BY and KL prepared the figures and tables. DF and JL critically revised the manuscript for important intellectual content. All authors reviewed and approved the manuscript.