Mutation Spectrum of EGFR From 21,324 Chinese Patients With Non-Small Cell Lung Cancer (NSCLC) Successfully Tested by Multiple Methods in a CAP-Accredited Laboratory

Genotyping the epidermal growth factor receptor (EGFR) gene in patients with advanced non-small cell lung cancer (NSCLC) is essential for identifying the patients who may benefit from targeted therapies. Systematically evaluating the EGFR mutation detection rates of the different methods currently used in the clinical setting will provide valuable information to the clinicians and laboratory scientists who care for NSCLC patients. This study retrospectively reviewed the EGFR data obtained in our laboratory over the last 10 years. A total of 21,324 NSCLC cases successfully underwent EGFR genotyping for therapeutic purposes, including 5,244 cases tested by Sanger sequencing, 13,329 cases tested by real-time PCR, and 2,751 cases tested by next-generation sequencing (NGS). The average EGFR mutation rate was 45.1%: 40.3% by Sanger sequencing, 46.5% by real-time PCR and 47.5% by NGS. Of the cases with EGFR mutations, 93.3% harbored a single EGFR mutation (92.1% with 19del or L858R, and 7.9% with uncommon mutations) and 6.7% harbored complex EGFR mutations. Of the 72 distinct EGFR variants identified in this study, 15 (single or complex EGFR mutations) were newly identified in NSCLC. Of the EGFR-mutant cases tested by NGS, 65.3% also carried tumor-related variants in non-EGFR genes, and about one third of these cases were considered candidates for targeted drugs. The NGS method showed advantages over Sanger sequencing and real-time PCR, not only by providing the highest EGFR mutation detection rate but also by identifying actionable non-EGFR mutations for which targeted drugs are available in the clinical setting.


INTRODUCTION
Lung cancer is the leading cause of cancer-related mortality worldwide [1,2]. Approximately 610,000 lung cancer-related deaths were reported in China in 2015 [3]. Non-small cell lung cancer (NSCLC) is the most common histological subtype of lung cancer, accounting for approximately 80-85% of cases. Targeted therapy based on the identification of actionable genetic/genomic alterations has led to the integration of molecular testing into treatment planning for patients with advanced NSCLC [4,5].
Activating mutations in the epidermal growth factor receptor gene (EGFR) are the most frequent genetic alterations in NSCLC. EGFR mutations are detected in 15% of lung adenocarcinomas in Caucasian patients but in 40-50% of lung adenocarcinomas in East Asian patients. Exon-19 deletions (19del) and the L858R substitution in exon-21 are the 2 classical EGFR mutations that predict tumor responses to EGFR tyrosine kinase inhibitors (EGFR-TKIs) in NSCLC patients [6][7][8][9][10]. Other, uncommon EGFR mutations have also been found to confer sensitivity (e.g., exon-19 insertions, p.L861Q in exon-21, p.G719X in exon-18, and p.S768I in exon-20) or resistance (e.g., most exon-20 insertions) to EGFR-TKIs. The T790M substitution in exon-20 is a well-known acquired mutation that confers resistance to first- and second-generation EGFR-TKIs but remains sensitive to third-generation EGFR-TKIs. EGFR genotyping is currently recommended by both laboratory and clinical guidelines as evidence-based standard care for advanced NSCLC patients [11][12][13][14].
Various technical platforms are clinically available for EGFR genotyping, including commonly used methods, such as Sanger sequencing, real-time PCR and NGS, and occasionally used methods, such as denaturing high-performance liquid chromatography (DHPLC) and the Luminex liquid chip. Systematic comparisons of the EGFR mutation detection rates of these methods as routinely used in the clinical setting have rarely been reported [15,16]. In this study, we retrospectively reviewed the successful EGFR test results of 21,324 unselected Chinese NSCLC patients whose specimens were tested in our laboratory, a College of American Pathologists (CAP)-accredited reference laboratory providing EGFR mutation testing for NSCLC patients.

Patients
In total, formalin-fixed, paraffin-embedded (FFPE) tumor specimens from 21,324 NSCLC patients from 30 provinces of China were successfully tested for EGFR mutations in our laboratory between June 2009 and December 2018. The EGFR test results were reviewed retrospectively, and cases with failed EGFR testing were excluded. For cases tested more than once, only the result of the first successful test was counted. Patient ages ranged from 16 to 96 years (median: 63 years); 56.9% of patients were female, 41.5% were male, and gender information was unavailable for 1.6%. All EGFR tests were ordered by physicians for therapeutic purposes and were performed at a single testing center. This study was approved by the ethics review board of KingMed Diagnostics.

Identification of Epidermal Growth Factor Receptor Mutations
The numbers of cases tested by the different methods are shown in Supplementary Figure S1. According to the standard operating procedures (SOPs) validated in our laboratory, the tumor cell content (TCC) of each specimen was assessed by at least one experienced pathologist before EGFR mutation testing. A TCC of ≥20% was required for Sanger sequencing and recommended for real-time PCR and NGS. Specimens with 1-20% TCC were acceptable for real-time PCR or NGS but not for Sanger sequencing. DNA was extracted from the specimens using the QIAamp DNA FFPE Tissue Kit (Qiagen China, Shanghai, China) according to the manufacturer's protocol. For amplicon-based NGS testing, the MagMAX FFPE DNA/RNA Ultra Kit (Life Technologies Corporation, Austin, United States) was used to isolate both DNA and RNA from the same section of FFPE tissue. Technical parameters, such as sensitivity and specificity, were determined before each method was introduced into clinical use (data not shown).
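The TCC acceptance rules in the SOPs above can be summarized as a small decision function. This is a minimal sketch, not the laboratory's software: the function name `eligible_methods` and the method labels are hypothetical, a specimen at exactly 20% TCC is assumed to qualify for all three methods, and specimens below 1% TCC (a case the text does not describe) are assumed to be ineligible for every method.

```python
def eligible_methods(tcc_percent: float) -> set:
    """Methods the SOP rules allow for a specimen with the given tumor cell content (%).

    Hypothetical helper; the <1% branch is an assumption, not a stated rule.
    """
    if not 0 <= tcc_percent <= 100:
        raise ValueError("TCC must be a percentage between 0 and 100")
    methods = set()
    if tcc_percent >= 1:
        # 1-20% TCC: acceptable for real-time PCR or NGS (>=20% is recommended for both)
        methods |= {"realtime_pcr", "ngs"}
    if tcc_percent >= 20:
        # >=20% TCC is required for Sanger sequencing
        methods.add("sanger")
    return methods


print(sorted(eligible_methods(10)), sorted(eligible_methods(35)))
```

A 10% TCC specimen is routed only to real-time PCR or NGS, while a 35% TCC specimen qualifies for all three methods.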
Sanger sequencing for EGFR genotyping was introduced in 2009, and a total of 5,244 cases successfully tested by this technique were included in this study. EGFR exons-18 to -21 were amplified by polymerase chain reaction (PCR) and sequenced directly on an ABI 3730xl (Applied Biosystems, Foster City, United States).
Since 2013, a real-time PCR-based method using a commercial kit, the EGFR RGQ PCR Kit (Qiagen China, Shanghai, China), has been used in our laboratory for detecting EGFR mutations. A total of 13,329 cases successfully tested by this technique were included in this study. This assay covers 29 known mutations spanning exons-18 to -21, including exon-18 missense mutations at G719X (G719S, G719A and G719C), exon-19 deletions (19del), exon-20 missense mutations (S768I and T790M) and insertions (20ins), and exon-21 missense mutations (L858R and L861Q). NGS-based EGFR genotyping was introduced in 2016, and a total of 2,751 successfully tested cases were included in this study. For 1,089 of these cases, an amplicon-based panel covering [...] and TP53, as well as fusions of 4 genes (ALK, ROS1, RET and NTRK1), was used (ThermoFisher, Waltham, United States); sequencing was performed on an Ion Torrent PGM instrument, and data analysis was performed using Torrent Suite Software and Torrent Server. For the remaining 1,662 cases, we adopted a validated capture-based method for library preparation (Integrated DNA Technologies, Inc., Coralville, United States) and performed DNA sequencing on Illumina NextSeq 500 or NovaSeq 6000 systems (Illumina, San Diego, United States). After sequencing, a clinically validated bioinformatics pipeline was used to identify variants in the targeted genes. Sequence variants were interpreted and reported according to the guideline of the Association for Molecular Pathology (AMP) [17]. In brief, tier 1 (variants of strong clinical significance) and tier 2 (variants of potential clinical significance) variants were reported, whereas tier 3 (variants of unknown clinical significance) and tier 4 (benign or likely benign variants) variants were not. Detailed technical procedures for all 3 methods are provided in the Supplementary Material.
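The reporting rule above (tiers 1 and 2 reported, tiers 3 and 4 withheld) amounts to a filter applied after interpretation. The sketch below assumes each variant already carries an AMP tier assigned during review; the `Variant` class, the `reportable` function, the placeholder gene names and the example tier assignments are hypothetical illustrations, not output of the pipeline described here.

```python
from dataclasses import dataclass

REPORTED_TIERS = {1, 2}  # strong or potential clinical significance
VALID_TIERS = {1, 2, 3, 4}


@dataclass(frozen=True)
class Variant:
    """Hypothetical interpreted variant record."""
    gene: str
    change: str
    tier: int  # AMP tier assigned during interpretation (assumed input)


def reportable(variants):
    """Keep tier 1/2 variants; drop tier 3 (VUS) and tier 4 (benign/likely benign)."""
    for v in variants:
        if v.tier not in VALID_TIERS:
            raise ValueError(f"unknown tier {v.tier} for {v.gene} {v.change}")
    return [v for v in variants if v.tier in REPORTED_TIERS]


calls = [
    Variant("EGFR", "L858R", 1),
    Variant("TP53", "example_missense", 2),  # illustrative tier assignment
    Variant("GENE_X", "example_vus", 3),  # hypothetical VUS, not reported
    Variant("GENE_Y", "example_benign", 4),  # hypothetical benign variant, not reported
]
print([v.gene for v in reportable(calls)])
```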

Statistical Analysis
Chi-square tests were used to compare the EGFR mutation detection rates of the 3 methods. A p-value < 0.05 was considered statistically significant in all analyses. All statistical analyses were performed using IBM SPSS Statistics version 19.
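As a sketch of the test named above, the Pearson chi-square statistic for a contingency table can be computed directly from the counts. This assumes an uncorrected Pearson test (SPSS also prints a continuity-corrected value for 2×2 tables, which may differ slightly), and the p-value uses the closed-form chi-square tail for 1 or 2 degrees of freedom only, which covers the 2×2 and 2×3 tables in this study. The counts are the EGFR-positive and successfully tested cases per method reported in the Results.

```python
import math


def pearson_chi_square(table):
    """Uncorrected Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees_of_freedom, p_value). Only 1 or 2 degrees of
    freedom are supported, because those tails have simple closed forms.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2))  # P(chi2_1 > stat) = P(|Z| > sqrt(stat))
    elif df == 2:
        p = math.exp(-stat / 2)  # chi2 with 2 df is exponential with mean 2
    else:
        raise NotImplementedError("closed-form p-value only for 1 or 2 degrees of freedom")
    return stat, df, p


# Columns: Sanger sequencing, real-time PCR, NGS
positive = [2111, 6202, 1308]
tested = [5244, 13329, 2751]
table = [positive, [t - p for t, p in zip(tested, positive)]]  # rows: positive, negative

stat, df, p = pearson_chi_square(table)
print(f"chi2 = {stat:.1f}, df = {df}, p = {p:.1e}")
```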

Tumor Cell Content in Different Testing Groups
Our records showed that 5,244 NSCLC samples with 20-90% TCC were successfully tested by Sanger sequencing, 13,329 NSCLC samples with 1-95% TCC were tested by real-time PCR, and 2,751 NSCLC samples with 1-90% TCC were tested by NGS. In detail, 58.5% (12, [...]

Epidermal Growth Factor Receptor Mutation Rates
Of the 21,324 NSCLC samples successfully tested for EGFR, 9,621 carried somatic EGFR mutations, an average EGFR-positive detection rate of 45.1%: 40.3% (2,111/5,244) by Sanger sequencing, 46.5% (6,202/13,329) by real-time PCR, and 47.5% (1,308/2,751) by NGS (Figure 1A). The detection rates obtained by real-time PCR and NGS were significantly higher than that obtained by Sanger sequencing (p < 0.001).
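The rates and the comparisons against Sanger sequencing can be reproduced from the counts above. This sketch assumes pairwise uncorrected Pearson chi-square tests on 2×2 tables; the text does not state whether a continuity correction or a multiple-comparison adjustment was applied. With these counts, the real-time PCR vs NGS comparison is expected to be non-significant, in line with the comparable rates of the two methods noted in the Discussion.

```python
import math
from itertools import combinations

# (EGFR-positive, successfully tested) counts per method, from the text
COUNTS = {"Sanger": (2111, 5244), "real-time PCR": (6202, 13329), "NGS": (1308, 2751)}


def percent(positive, tested):
    """Detection rate as a percentage rounded to one decimal, as in the text."""
    return round(100 * positive / tested, 1)


def chi_square_2x2(pos1, n1, pos2, n2):
    """Uncorrected Pearson chi-square (df = 1) comparing two detection rates."""
    a, b = pos1, n1 - pos1  # positives / negatives, first method
    c, d = pos2, n2 - pos2  # positives / negatives, second method
    n = n1 + n2
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))


rates = {method: percent(*counts) for method, counts in COUNTS.items()}
pairwise = {
    (m1, m2): chi_square_2x2(*COUNTS[m1], *COUNTS[m2])
    for m1, m2 in combinations(COUNTS, 2)
}
for (m1, m2), (stat, p) in pairwise.items():
    print(f"{m1} vs {m2}: chi2 = {stat:.2f}, p = {p:.3g}")
```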
Because our SOPs allowed samples with 1-20% TCC to be tested by both real-time PCR and NGS, we compared the EGFR mutation detection rates of these two methods in samples with different levels of TCC (1, 5, 10, 15, and 20%). Detailed information is shown in Figure 1B. There were no significant differences in detection rates between the 2 methods in any of the 5 subgroups (p > 0.05), although real-time PCR showed higher detection rates in samples with relatively low TCC (1, 5, and 10%), whereas NGS showed higher detection rates in samples with relatively high TCC (15 and 20%).
Among the EGFR mutations identified in the 9,621 cases, a total of 72 distinct variants were recognized (different exon-19 deletions were classified as 19del, various exon-20 insertions were grouped as 20ins, and different G719, E709 and R776 mutations [...] Figure 3A) as well as MAP2K1 (0.1%), AKT1 (0.1%) and FGFR3 (0.1%) (data not shown in Figure 3A); 2) 18.3% (504/2,751) were found to have tumor initiation- and/or progression-related mutations in non-EGFR genes, such as TP53, SMAD4, FBXW7, CTNNB1 and NOTCH1; however, no targeted drugs or potential drugs are currently available for the products of these mutated genes; 3) 7.2% (197/2,751) were found to carry neither oncogenic mutations in EGFR nor tumor initiation- and/or progression-related mutations in non-EGFR genes.
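The counts for categories 2) and 3) above, together with the 1,443 EGFR-negative NGS cases reported in the Discussion, allow the size of category 1) (lost from the text here) to be derived by subtraction. This sketch assumes the three categories are mutually exclusive and together cover every EGFR-negative case tested by NGS; under that assumption, the derived share agrees with the 27% of NGS cases with actionable mutations stated in the Discussion.

```python
# Counts taken from the text; category_1 is derived here, not reported directly.
NGS_TESTED = 2751
EGFR_POSITIVE = 1308
NON_ACTIONABLE_ONLY = 504  # category 2): tumor-related non-EGFR mutations, no targeted drug
NO_TIER_1_2_FINDINGS = 197  # category 3): no reported EGFR or non-EGFR mutations

egfr_negative = NGS_TESTED - EGFR_POSITIVE
# Assumption: categories 1)-3) partition the EGFR-negative NGS cases
category_1 = egfr_negative - NON_ACTIONABLE_ONLY - NO_TIER_1_2_FINDINGS


def share(count, total=NGS_TESTED):
    """Percentage of all NGS-tested cases, rounded to one decimal."""
    return round(100 * count / total, 1)


print(egfr_negative, category_1, share(category_1))
```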
We compared the mutation rates of several driver genes, including EGFR, ALK, ROS1, RET and KRAS, between the two NGS platforms. The positive detection rates of amplicon-based vs capture-based sequencing were 47.8% ( [...]. Statistically, there were no significant differences in the detection rates of these driver gene mutations between the 2 NGS methods (p > 0.05), although the amplicon-based method showed relatively higher detection rates for ALK fusions than the capture-based method.

DISCUSSION
To our knowledge, this study represents the largest analysis of EGFR mutational status in Chinese patients with NSCLC using multiple platforms, and it provides several findings of value in the clinical setting.
NGS testing expanded the mutational spectrum of EGFR in NSCLC patients. Although only 2,751 of the 21,324 (12.9%) NSCLC specimens were tested by NGS, we identified 11 novel EGFR mutations (not previously reported in NSCLC) in exons-18 to -21 by NGS, whereas Sanger sequencing and real-time PCR identified only 2 and 3 novel EGFR mutations in this region, respectively (Supplementary Tables S1 and S2). We believe that, had all 21,324 NSCLC specimens been analyzed by NGS, all of the rare variants found by Sanger sequencing and real-time PCR would also have been identified. Although both NGS and Sanger sequencing cover exons-18 to -21 of EGFR, Sanger sequencing misses EGFR mutations with mutant allele frequencies below 20% (the cut-off determined in our validation data) because of its low technical sensitivity. NGS and real-time PCR showed similar technical sensitivity (1% mutant allele frequency), but the real-time PCR method was designed to detect only 29 hotspot mutations of EGFR and might have missed mutations outside these hotspots. Furthermore, real-time PCR cannot distinguish individual variants within the 19del, G719X and 20ins groups (for example, G719A vs G719C), although these variants may show different responses to EGFR-TKIs [18][19][20] (Table 3). In summary, the NGS method shows clear advantages over Sanger sequencing and real-time PCR for identifying novel actionable EGFR mutations.
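The loss of resolution described above can be illustrated by mapping specific EGFR calls, as sequencing would resolve them, onto the grouped results a hotspot assay reports. This is a simplified, hypothetical sketch based on the target list given in the Methods: the `EgfrCall` class and `pcr_readout` function are illustrative, and the sketch optimistically treats every exon-19 deletion and exon-20 insertion as covered, whereas the real assay detects only a defined subset of them within its 29 targets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EgfrCall:
    """A hypothetical, simplified EGFR variant call."""
    exon: int
    kind: str  # "missense", "deletion" or "insertion"
    protein: str  # protein-level change, e.g. "G719A"


# Missense targets named in the Methods, mapped to the label the assay reports
HOTSPOT_MISSENSE = {
    "G719S": "G719X", "G719A": "G719X", "G719C": "G719X",
    "S768I": "S768I", "T790M": "T790M",
    "L858R": "L858R", "L861Q": "L861Q",
}


def pcr_readout(call: EgfrCall) -> Optional[str]:
    """Return the grouped result a hotspot assay would report, or None if not targeted."""
    if call.kind == "missense":
        return HOTSPOT_MISSENSE.get(call.protein)
    # Simplification: assumes every exon-19 deletion / exon-20 insertion is a target
    if call.kind == "deletion" and call.exon == 19:
        return "19del"
    if call.kind == "insertion" and call.exon == 20:
        return "20ins"
    return None


calls = [
    EgfrCall(18, "missense", "G719A"),
    EgfrCall(18, "missense", "G719C"),
    EgfrCall(19, "deletion", "E746_A750del"),
    EgfrCall(18, "missense", "E709K"),
]
print([pcr_readout(c) for c in calls])
```

Here the two distinct calls G719A and G719C collapse to the same G719X result, and a variant outside the target list, such as E709K, returns None, meaning the assay would not report it.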
NGS identified a long list of non-EGFR mutations related to tumor initiation and progression, offering additional therapeutic opportunities and/or prognostic information for NSCLC patients. In this study, 52.5% (1,443/2,751) of the cases tested by NGS did not carry EGFR mutations, similar to the results reported in a previous Asian lung adenocarcinoma series [10]. Of the cases tested by NGS, 27% were found to have actionable mutations, including ALK (5%), ROS1 (1%), BRAF [...]. For ALK, ROS1 and BRAF V600E alterations, targeted drugs are available in the clinical setting. For the other genes in this list, potential targeted drugs that may provide therapeutic opportunities for NSCLC patients are emerging in international clinical trials (www.clinicaltrials.gov). In addition, 18.3% of the cases tested by NGS carried tumor-related mutations in non-EGFR genes currently classified as non-actionable, such as TP53, SMAD4, FBXW7, CTNNB1 and NOTCH1. However, pathogenic variants in these genes have been considered to have prognostic or predictive significance for NSCLC patients [21,22]. For the 7.2% of cases tested by NGS in which no tier 1 or tier 2 variants were found in either EGFR or non-EGFR genes, we think that the tier 3 variants (variants of unknown clinical significance, VUS; not reported) found in these cases might represent potentially useful biomarkers for monitoring treatment effects via liquid biopsy. Interestingly, of the 1,308 cases with EGFR mutations found by NGS, 65.3% (854/1,308) also harbored non-EGFR mutations in 18 tumor-related genes. In NSCLC, oncogenic driver mutations are typically mutually exclusive. However, cases with mutations in multiple driver genes are increasingly reported, including in our retrospective study. Co-alterations of EGFR and other driver genes, including KRAS, ALK, BRAF, NRAS and RET, likely account for a certain proportion of the cases with multiple mutations in NSCLC.
These co-alterations may have prognostic value or predict responses to EGFR-TKIs, as reported previously [23][24][25][26].
As shown in this study, NGS maximizes the detection of EGFR mutations. The overall EGFR mutation rate in our data was 45.1%, concordant with several previous studies of Chinese patients with NSCLC [10,16]. Among the 3 methods, the detection rate of Sanger sequencing (40.3%, 2,111/5,244) was significantly lower than those of real-time PCR (46.5%, 6,202/13,329) and NGS (47.5%, 1,308/2,751) (p < 0.001), indicating that Sanger sequencing missed some EGFR mutations that would have been identified had NGS or real-time PCR been applied. The higher detection rates of NGS and real-time PCR are mainly attributable to their higher sensitivities (1% mutant allele frequency) compared with Sanger sequencing (20% mutant allele frequency) (Table 3). As expected, real-time PCR and NGS yielded comparable EGFR-positive rates in the current study. When specimens were stratified by TCC levels from 1 to 20%, there were no significant differences in EGFR mutation rates between real-time PCR and NGS. This finding suggests that EGFR mutations in specimens with 1-20% TCC can be adequately identified by real-time PCR or NGS, giving these patients the opportunity of EGFR-TKI therapy. The EGFR mutation rates by Sanger sequencing in FFPE samples with <40% and ≥40% TCC were 30.8% (757/2,454) and 48.5% (1,354/2,790), respectively, indicating that the 3 methods yielded similar EGFR-positive rates in FFPE samples with ≥40% TCC. Based on these findings, we recommend EGFR mutation testing of specimens with ≥1% TCC by NGS or real-time PCR. Sanger sequencing could be considered for specimens with ≥40% TCC because of its low cost (Table 3).
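The TCC-stratified Sanger results above can be checked against the overall Sanger figures from the Results: the two strata should add up to 2,111 positives among 5,244 tested specimens. A minimal sketch; the dictionary layout and stratum labels are only an illustration.

```python
# Sanger sequencing results stratified by TCC: (EGFR-positive, tested)
SANGER_BY_TCC = {"<40%": (757, 2454), ">=40%": (1354, 2790)}

total_positive = sum(pos for pos, _ in SANGER_BY_TCC.values())
total_tested = sum(n for _, n in SANGER_BY_TCC.values())
# Per-stratum detection rate as a percentage, rounded to one decimal as in the text
rates = {stratum: round(100 * pos / n, 1) for stratum, (pos, n) in SANGER_BY_TCC.items()}

print(total_positive, total_tested, rates)
```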
However, EGFR variant allele frequencies (VAFs) overlapped greatly among groups with different TCC and varied widely, from 0.01 to 0.97 (EGFR VAF was as high as 0.675 in one sample with 1% TCC and as low as 0.01 in another sample with 20% TCC). Therefore, some EGFR mutations with low VAFs in specimens with high TCC could still have been missed by Sanger sequencing.
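The VAF argument above can be made concrete by comparing a variant's VAF with each method's approximate limit of detection, using the values quoted in the Discussion (about 20% mutant allele frequency for Sanger sequencing and 1% for real-time PCR and NGS). This is a deliberately simplified sketch with a hypothetical function name: it ignores target coverage (real-time PCR detects only its hotspot targets) and treats the limits as hard thresholds, which real assays are not.

```python
# Approximate mutant-allele-frequency detection limits quoted in the text
DETECTION_LIMIT = {"sanger": 0.20, "realtime_pcr": 0.01, "ngs": 0.01}


def methods_detecting(vaf: float) -> set:
    """Methods whose quoted detection limit is at or below the given VAF (0-1)."""
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("VAF must be a fraction between 0 and 1")
    return {method for method, limit in DETECTION_LIMIT.items() if vaf >= limit}


# VAFs quoted above: 0.01 (sample with 20% TCC) and 0.675 (sample with 1% TCC)
print(sorted(methods_detecting(0.01)), sorted(methods_detecting(0.675)))
```

The 0.01 VAF in a specimen with 20% TCC falls below the Sanger threshold even though the specimen itself meets the Sanger TCC requirement, which is the situation the paragraph describes.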
We acknowledge several limitations of this study: 1) Because some features, such as pathologic diagnosis (adenocarcinoma, squamous or other histological types), smoking status, and the grade and stage of the disease, were not fully described on the requisition forms when EGFR testing was ordered for therapeutic purposes, further stratified analyses were not performed. 2) Follow-up data were not available for assessing responses to TKI treatment in patients with uncommon EGFR mutations, complex EGFR mutations or co-existing mutations in EGFR and non-EGFR genes. Even among NSCLC patients with a classical EGFR mutation (L858R in exon-21 or 19del), outcomes after EGFR-TKI treatment have been suggested to differ [27]. 3) Although some of the novel EGFR and non-EGFR variants identified were considered relevant to the initiation and progression of NSCLC, further investigation of their functional consequences and pathogenicity is required.
In conclusion, we present a large dataset of EGFR mutations in Chinese NSCLC patients, including a list of novel EGFR mutations, providing valuable information for the diagnosis and treatment of the disease. The NGS method showed advantages over Sanger sequencing and real-time PCR, not only by providing the highest EGFR mutation detection rate but also by identifying actionable non-EGFR mutations for which targeted drugs are available in the clinical setting.

DATA AVAILABILITY STATEMENT
Data are available from the corresponding author on reasonable request.

ETHICS STATEMENT
This study was approved by the ethics review board of KingMed Diagnostics.

AUTHOR CONTRIBUTIONS
LM and SY conceptualized the study; XHO, YX, YT, XYO and CH performed experiments; WZ, XL, SZ, CZ, DZ, XD and PL performed the data analysis; LM and SY wrote the manuscript with input from all authors; all authors have read and approved the final version of the manuscript.