Development and internal validation of a model to predict type 2 diabetes complications after gestational diabetes


The data used for this study were anonymized, and the requirements for ethical review and informed consent were waived by the Institutional Review Board of the University of Montreal Hospital Center. All methods were carried out in accordance with the Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans.

Study population

We conducted a retrospective cohort study of women who gave birth in hospital in Quebec, Canada, between April 1989 and March 2016 (cohort entry). The women were then followed through March 2018 to identify outcomes [14]. The cohort was constructed from the Maintenance and Use of Data for the Study of Hospital Clientele (MED-ÉCHO) registry, which captures > 99% of deliveries in Quebec.

Women aged 18 to 45 years who had gestational diabetes mellitus (GDM) in at least one pregnancy were included. Cohort entry (t0) was the first pregnancy affected by GDM. GDM was defined as abnormal maternal glucose tolerance first identified during pregnancy. It was ascertained using International Classification of Diseases, 9th and 10th Revision (ICD-9 and ICD-10) diagnostic codes (Table S1). These codes have been validated previously and capture GDM diagnoses with > 90% specificity and > 80% positive predictive value [15,16]. Approaches to identifying GDM vary between centres (i.e., one-step or two-step approaches), but Diabetes Canada endorses both approaches [17].

Women who died during their first affected pregnancy and women with pre-existing diabetes or its complications were excluded (Fig. 1).
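The inclusion and exclusion rules above can be sketched as a filter over delivery records. This is a minimal sketch under several assumptions. The `Delivery` record and its field names are hypothetical, because the registry schema is not described here. The GDM flag is assumed to be precomputed from the ICD codes in Table S1. Age is assumed to be evaluated at the index delivery.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

ENTRY_START = date(1989, 4, 1)   # first eligible delivery (April 1989)
ENTRY_END = date(2016, 3, 31)    # last eligible delivery (March 2016)


@dataclass
class Delivery:
    # Hypothetical record layout, for illustration only.
    woman_id: str
    delivery_date: date
    maternal_age: int
    gdm: bool                   # assumed precomputed from ICD-9/ICD-10 codes (Table S1)
    preexisting_diabetes: bool  # diabetes or its complications before this pregnancy
    died_during_pregnancy: bool


def select_cohort(records: List[Delivery]) -> List[Delivery]:
    """Return one index delivery (t0) per eligible woman."""
    first_gdm: Dict[str, Delivery] = {}
    # Walk deliveries in date order so that each woman's first GDM pregnancy is kept.
    for r in sorted(records, key=lambda r: r.delivery_date):
        if not (ENTRY_START <= r.delivery_date <= ENTRY_END):
            continue
        if r.gdm and r.woman_id not in first_gdm:
            first_gdm[r.woman_id] = r

    cohort = []
    for r in first_gdm.values():
        if not 18 <= r.maternal_age <= 45:
            continue  # age criterion, applied at the index delivery (assumption)
        if r.died_during_pregnancy or r.preexisting_diabetes:
            continue  # exclusions summarized in Fig. 1
        cohort.append(r)
    return cohort
```

Because each woman keeps only her first GDM-affected delivery, later GDM pregnancies remain available as follow-up information rather than as new cohort entries.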

Figure 1. Development of the study cohort.


The primary outcome was hospitalization for type 2 diabetes complications within 10 years after the delivery of the first pregnancy affected by GDM. Type 2 diabetes complications were defined as a diagnosis of type 2 diabetes accompanied by one or more of the following: diabetic coma, acidosis, or renal, ophthalmic, neurological, circulatory, or other diabetes-related complications. These were identified using ICD-9 and ICD-10 codes previously validated with 99% specificity and > 80% positive predictive value (Table S1).

The secondary outcome was type 2 diabetes complications occurring at any time (up to 29 years) after the delivery of the first pregnancy affected by GDM.

Women were followed from cohort entry until the occurrence of an outcome, death, or the end of the study period (March 31, 2018), whichever came first.
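The follow-up rule can be expressed as a small function that returns each woman's time at risk and event indicator. This is a sketch under stated assumptions. The per-woman date fields are hypothetical. Years are counted as 365.25 days. An outcome and a death on the same day are counted as an outcome. Treating "within 10 years" as administrative censoring at 10 years is our reading of the primary-outcome definition.

```python
from datetime import date
from typing import Optional, Tuple

STUDY_END = date(2018, 3, 31)   # end of the study period
DAYS_PER_YEAR = 365.25


def follow_up(
    t0: date,
    outcome_date: Optional[date],
    death_date: Optional[date],
    horizon_years: Optional[float] = 10.0,
) -> Tuple[float, int]:
    """Return (follow-up time in years, event indicator) for one woman.

    Follow-up stops at the earliest of the outcome, death, or the end of the study.
    horizon_years=10 gives the primary outcome; None gives the secondary (any-time) outcome.
    """
    stops = [(STUDY_END, 0)]
    if outcome_date is not None:
        stops.append((outcome_date, 1))
    if death_date is not None:
        stops.append((death_date, 0))
    # Earliest stop wins; on a same-day tie the outcome (1) sorts before death or study end.
    end, event = min(stops, key=lambda s: (s[0], -s[1]))
    years = (end - t0).days / DAYS_PER_YEAR
    if horizon_years is not None and years > horizon_years:
        return horizon_years, 0   # censored at the prediction horizon
    return years, event
```

For example, a woman who enters in 2000 and is hospitalized for a complication in 2015 counts as censored at 10 years for the primary outcome. The same woman counts as an event for the secondary outcome.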

Statistical analyses

We developed Cox proportional hazards regression models to predict type 2 diabetes complications, following previously described steps [18,19]. We report the process according to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines (Table S2) [20].
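For reference, the standard form of the Cox proportional hazards model is written below, together with its partial likelihood and the predicted 10-year risk. The notation is ours. Here $h_0(t)$ is the baseline hazard, $\mathbf{x}_i$ is the predictor vector for woman $i$, $\delta_i$ is her event indicator, $R(t_i)$ is the set of women still under follow-up at time $t_i$, and $\hat{S}_0$ is the estimated baseline survival. The partial likelihood is shown for untied event times. The text does not state how ties or the baseline hazard were handled.

```latex
\begin{align*}
h_i(t) &= h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_i\right) \\
L(\boldsymbol{\beta}) &= \prod_{i:\,\delta_i = 1}
  \frac{\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_i\right)}
       {\sum_{j \in R(t_i)} \exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_j\right)} \\
\hat{F}_i(10) &= 1 - \hat{S}_0(10)^{\exp\left(\hat{\boldsymbol{\beta}}^{\top}\mathbf{x}_i\right)}
\end{align*}
```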

Candidate predictors, variable selection and coding

We considered demographic, reproductive, and clinical factors known to be associated with an increased risk of type 2 diabetes as candidate predictor variables [5,21]. These factors included:

- maternal age, substance use, and morbid obesity;
- socio-economic deprivation, measured using a composite score of neighborhood income, education, and employment [22];
- pregnancy factors such as parity and multifetal pregnancy;
- pregnancy complications such as hypertensive disorders of pregnancy (HDP), severe maternal morbidity (SMM) [23], stillbirth, preterm delivery, and low birth weight;
- admission to a neonatal intensive care unit (NICU) or adult intensive care unit (ICU).

All candidate predictors were measured at the index delivery (cohort entry).

Clinical variables with low incidence were combined with similar variables. For example, prior obstetric complications such as SMM, stillbirth, preterm delivery, low birth weight, NICU admission, or neonatal death were grouped together. History of obstetric complications was then combined with parity into three categories:

- prior obstetric complication (multiparous women);
- no prior obstetric complication (multiparous women);
- no prior obstetric complication (primiparous women).

When two variables were collinear (r > 0.5), the more clinically relevant variable was retained.
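The three-level parity and obstetric-history variable and the collinearity screen can be sketched as follows. This rests on assumptions. `prior_deliveries` is a hypothetical count of deliveries before the index pregnancy. Variables are assumed to be numerically coded. The screen uses the absolute Pearson correlation |r|, because the text gives only "r > 0.5". Choosing which flagged variable to keep is a clinical judgment and is not automated.

```python
from enum import Enum
from math import sqrt
from typing import Dict, List, Sequence, Tuple


class ObstetricHistory(Enum):
    MULTIPAROUS_PRIOR_COMPLICATION = "prior obstetric complication (multiparous)"
    MULTIPAROUS_NO_PRIOR_COMPLICATION = "no prior obstetric complication (multiparous)"
    PRIMIPAROUS = "no prior obstetric complication (primiparous)"


def obstetric_history(prior_deliveries: int, prior_complication: bool) -> ObstetricHistory:
    """Combine parity with history of obstetric complications (three levels)."""
    if prior_deliveries == 0:
        if prior_complication:
            raise ValueError("a primiparous woman cannot have a prior obstetric complication")
        return ObstetricHistory.PRIMIPAROUS
    if prior_complication:
        return ObstetricHistory.MULTIPAROUS_PRIOR_COMPLICATION
    return ObstetricHistory.MULTIPAROUS_NO_PRIOR_COMPLICATION


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collinear_pairs(
    columns: Dict[str, Sequence[float]], threshold: float = 0.5
) -> List[Tuple[str, str, float]]:
    """Flag variable pairs whose |r| exceeds the threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(columns[a], columns[b])
            if abs(r) > threshold:
                flagged.append((a, b, r))
    return flagged
```

Raising an error for a primiparous woman with a recorded "prior" complication is a design choice that surfaces data inconsistencies. A real pipeline might instead log and recode such records.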

Continuous candidate predictors (e.g., maternal age) were modeled using restricted cubic splines with three knots [19]. We assessed interaction terms and retained terms that were statistically significant (α = 0.10) [18]. The final model variables were selected using LASSO (Least Absolute Shrinkage and Selection Operator) regression [18].
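A restricted cubic spline with three knots adds one nonlinear term to the linear term. The curve is cubic between the knots and constrained to be linear beyond the outer knots. Below is a minimal sketch of that basis. Placing the knots at the 10th, 50th, and 90th percentiles is an assumption (a common default for three knots); the text does not report the knot locations. Dividing the nonlinear term by $(t_3 - t_1)^2$ rescales its coefficient but does not change the fitted curve.

```python
from typing import Sequence, Tuple


def quantile(sorted_values: Sequence[float], p: float) -> float:
    """Linear interpolation between order statistics (0 <= p <= 1)."""
    h = (len(sorted_values) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


def default_knots(values: Sequence[float], probs=(0.10, 0.50, 0.90)) -> Tuple[float, float, float]:
    """Assumed knot placement at the 10th, 50th and 90th percentiles."""
    s = sorted(values)
    t1, t2, t3 = (quantile(s, p) for p in probs)
    return t1, t2, t3


def rcs3(x: float, knots: Tuple[float, float, float]) -> Tuple[float, float]:
    """Three-knot restricted cubic spline basis: (linear term, nonlinear term).

    The nonlinear term is zero below the first knot and linear beyond the last knot.
    """
    t1, t2, t3 = knots
    if not t1 < t2 < t3:
        raise ValueError("knots must be strictly increasing")

    def pos3(u: float) -> float:
        return max(u, 0.0) ** 3  # truncated cubic (u)_+^3

    nonlinear = (pos3(x - t1)
                 - pos3(x - t2) * (t3 - t1) / (t3 - t2)
                 + pos3(x - t3) * (t2 - t1) / (t3 - t2))
    return x, nonlinear / (t3 - t1) ** 2
```

With three knots, each continuous predictor uses two degrees of freedom. Both spline columns would then enter the candidate set for LASSO selection.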

Model performance and internal validation

The predictive performance of the model was assessed in terms of discrimination, calibration, and risk stratification [18]. Discrimination was measured with the C-statistic, which is equivalent to the area under the receiver operating characteristic curve (AUROC) [19]. An AUROC ≥ 0.7 was interpreted as good discrimination. Values from 0.6 to 0.7 indicated lower discrimination, and values from 0.5 to < 0.6 indicated poor discrimination [24].
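For censored time-to-event data, a common form of the C-statistic is Harrell's concordance index. Whether the authors used Harrell's C or a time-specific AUROC is not stated, so the sketch below is illustrative only. For simplicity, it skips pairs with tied follow-up times.

```python
from typing import Sequence


def harrell_c(time: Sequence[float], event: Sequence[int], risk: Sequence[float]) -> float:
    """Harrell's concordance (C) index for right-censored data.

    A pair is usable when the woman with the shorter follow-up had the outcome.
    It is concordant when she also has the higher predicted risk (risk ties count 1/2).
    Pairs with tied follow-up times are skipped in this simplified version.
    """
    concordant = 0.0
    usable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # only women with the outcome can anchor a usable pair
        for j in range(len(time)):
            if time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

The double loop is O(n²) and serves only to show the definition. For a cohort of this size, an established library implementation would be the practical choice.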

Using a risk classification table, we examined the model's ability to stratify the population into low- and high-risk categories. We divided the population into four risk groups, with the highest calculated-risk group corresponding to the overall incidence of the outcome in the study population [25]. Likelihood ratios (LRs) were calculated to assess classification accuracy within each group [26]. For clinical use, positive LRs (LR+) > 5 or > 10 were interpreted as fair or good rule-in tests, respectively. Negative LRs (LR−) below the corresponding thresholds were interpreted as rule-out tests [24].
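The LR calculations can be sketched as follows. The stratum-specific LR for a risk group is the proportion of women with the outcome who fall in that group, divided by the proportion of women without the outcome who fall in that group. LR+ and LR− use a single cut-off, where "test positive" means being in the highest-risk group or above. This is a simplified sketch. It treats the outcome as binary and ignores censoring. The cut-offs passed to `assign_groups` are hypothetical, because the text does not report them.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def assign_groups(risks: Sequence[float], cutoffs: Sequence[float]) -> List[int]:
    """Map predicted risks to groups 0 (lowest) .. len(cutoffs) (highest)."""
    return [bisect_right(cutoffs, r) for r in risks]


def stratum_lrs(groups: Sequence[int], outcomes: Sequence[int]) -> Dict[int, float]:
    """LR for each group: P(group | outcome) / P(group | no outcome)."""
    ev = Counter(g for g, y in zip(groups, outcomes) if y)
    non = Counter(g for g, y in zip(groups, outcomes) if not y)
    n_ev, n_non = sum(ev.values()), sum(non.values())
    return {
        g: (ev[g] / n_ev) / (non[g] / n_non) if non[g] else float("inf")
        for g in sorted(set(groups))
    }


def binary_lrs(groups: Sequence[int], outcomes: Sequence[int], high: int) -> Tuple[float, float]:
    """LR+ and LR- when 'test positive' means being in group >= high."""
    tp = sum(1 for g, y in zip(groups, outcomes) if y and g >= high)
    fn = sum(1 for g, y in zip(groups, outcomes) if y and g < high)
    fp = sum(1 for g, y in zip(groups, outcomes) if not y and g >= high)
    tn = sum(1 for g, y in zip(groups, outcomes) if not y and g < high)
    sens, spec = tp / (tp + fn), tn / (tn + fp)
    lr_pos = sens / (1 - spec) if spec < 1 else float("inf")
    lr_neg = (1 - sens) / spec if spec > 0 else float("inf")
    return lr_pos, lr_neg
```

For example, with three hypothetical cut-offs such as `[0.01, 0.02, 0.05]`, `assign_groups` yields the four groups 0 to 3. A risk exactly equal to a cut-off falls into the higher group.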

Internal validity was assessed using bootstrap resampling with 200 iterations, and optimism (i.e., the degree to which the model is overfitted) was reported [18].
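Bootstrap optimism correction usually proceeds in three steps. First, refit the model on each bootstrap sample. Second, compare its performance on that sample with its performance on the original data. Third, subtract the average difference (the optimism) from the apparent performance. The sketch below implements that general scheme. The `fit` and `evaluate` callables are hypothetical placeholders, because the text does not detail the authors' exact steps or which performance measures were corrected.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")
Model = TypeVar("Model")


def optimism_corrected(
    data: Sequence[Row],
    fit: Callable[[Sequence[Row]], Model],
    evaluate: Callable[[Model, Sequence[Row]], float],
    n_boot: int = 200,
    seed: int = 1,
) -> Tuple[float, float, float]:
    """Bootstrap optimism correction for a performance measure (higher = better).

    Returns (apparent performance, average optimism, optimism-corrected performance).
    """
    rng = random.Random(seed)
    apparent = evaluate(fit(data), data)
    optimisms: List[float] = []
    for _ in range(n_boot):
        sample = rng.choices(data, k=len(data))   # resample with replacement
        model = fit(sample)                         # rerun the whole model-building process
        optimisms.append(evaluate(model, sample) - evaluate(model, data))
    optimism = mean(optimisms)
    return apparent, optimism, apparent - optimism
```

In this study's setting, `fit` would presumably repeat every modelling step (including LASSO selection) on each bootstrap sample, and `evaluate` would compute a measure such as the C-statistic.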

Secondary analyses

Using the same final variables, we also developed a model to predict type 2 diabetes complications up to 29 years postpartum and assessed its discriminatory performance.

Sample size

To avoid overfitting the model, we estimated the required sample size using the rule of thumb of 10 to 20 events per degree of freedom [19]. With a total of 1025 events during follow-up, the sample size was sufficient to consider up to 50 degrees of freedom for candidate predictors.
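The arithmetic behind this statement is shown below, where EPV denotes events per degree of freedom. At the stricter 20 EPV, 1025 events support about 51 degrees of freedom, which is consistent with the stated ceiling of 50. At 10 EPV, the ceiling would be about 102.

```latex
\mathrm{df}_{\max} = \frac{\text{number of events}}{\text{EPV}}, \qquad
\frac{1025}{20} = 51.25, \qquad
\frac{1025}{10} = 102.5
```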

Analyses were performed using R version 3.5.1 (The R Project for Statistical Computing).
