Reanalysis of cancer mortality in Japanese A-bomb survivors exposed to low doses of radiation: bootstrap and simulation methods

Dropkin, Greg

doi:10.1186/1476-069X-8-56

Research
Open access
Published: 09 December 2009

Reanalysis of cancer mortality in Japanese A-bomb survivors exposed to low doses of radiation: bootstrap and simulation methods

Greg Dropkin¹

Environmental Health volume 8, Article number: 56 (2009) Cite this article

5841 Accesses
5 Citations
2 Altmetric
Metrics details

Abstract

Background

The International Commission on Radiological Protection (ICRP) recommended annual occupational dose limit is 20 mSv. Cancer mortality in Japanese A-bomb survivors exposed to less than 20 mSv external radiation in 1945 was analysed previously, using a latency model with non-linear dose response. Questions were raised regarding statistical inference with this model.

Methods

Cancers with over 100 deaths in the 0 - 20 mSv subcohort of the 1950-1990 Life Span Study are analysed with Poisson regression models incorporating latency, allowing linear and non-linear dose response. Bootstrap percentile and Bias-corrected accelerated (BCa) methods and simulation of the Likelihood Ratio Test lead to Confidence Intervals for Excess Relative Risk (ERR) and tests against the linear model.

Results

The linear model shows significant large, positive values of ERR for liver and urinary cancers at latencies from 37 - 43 years. Dose response below 20 mSv is strongly non-linear at the optimal latencies for the stomach (11.89 years), liver (36.9), lung (13.6), leukaemia (23.66), and pancreas (11.86) and across broad latency ranges. Confidence Intervals for ERR are comparable using Bootstrap and Likelihood Ratio Test methods and BCa 95% Confidence Intervals are strictly positive across latency ranges for all 5 cancers. Similar risk estimates for 10 mSv (lagged dose) are obtained from the 0 - 20 mSv and 5 - 500 mSv data for the stomach, liver, lung and leukaemia. Dose response for the latter 3 cancers is significantly non-linear in the 5 - 500 mSv range.

Conclusion

Liver and urinary cancer mortality risk is significantly raised using a latency model with linear dose response. A non-linear model is strongly superior for the stomach, liver, lung, pancreas and leukaemia. Bootstrap and Likelihood-based confidence intervals are broadly comparable and ERR is strictly positive by bootstrap methods for all 5 cancers. Except for the pancreas, similar estimates of latency and risk from 10 mSv are obtained from the 0 - 20 mSv and 5 - 500 mSv subcohorts. Large and significant cancer risks for Japanese survivors exposed to less than 20 mSv external radiation from the atomic bombs in 1945 cast doubt on the ICRP recommended annual occupational dose limit.

Peer Review reports

Background

Analyses of cancer mortality in survivors of the atomic bombing of Japan in 1945 have played a central role in establishing risk estimates and radiation protection limits for workers and the wider public. While many people exposed at Hiroshima and Nagasaki received fairly high doses, the recommended occupational and environmental dose limits aim to control the risks from much lower doses. Typically, analyses of the full cohort of Japanese survivors, whose doses ranged from 0 - 4000 mSv or more, are extrapolated downwards to predict the risks from doses below 20 mSv [1]. However, even if a particular model gives a good overall account of the full range of data, fitting that model and extrapolating the results may misrepresent the risks at low doses where the dose response may be quite distinct.

My previous paper [2] considered the cancer mortality data for those survivors whose external (flash) dose from the A-bombs was below 20 mSv, the annual occupational dose limit recommended by the International Commission on Radiological Protection (ICRP). I applied several different models, each of which used as their exposure variable the radiation dose lagged by an unknown latency. One model adopted the standard assumption that Excess Relative Risk (ERR) is proportional to dose, in this case the lagged dose. Other models allowed that ERR might be a non-linear function of dose, as suggested by radiation cell biology studies of the bystander effect. All the models detected ERR values for a dose of 10 mSv which were far higher than predicted by extrapolating the standard analyses of the full cohort. The linear model found large, significant results for the liver, while the preferred non-linear model found large, significant results for the stomach, liver, and lung. These results occurred over a range of latencies.

The statistical method I used to assess the significance of the findings with non-linear models was criticised by Mark Little [3]. In this paper the data is re-analysed with other methods of statistical inference for the preferred non-linear model.

The models use Poisson regression to analyse the grouped data in the 0 - 20 mSv subcohort of the 1950-1990 Life Span Study LSS12 [4]. The models take the form λ = λ_o(1+ERR) where λ is the cancer risk, λ_o depends only on control variables, and ERR depends only on the lagged radiation dose. Profile Likelihood Confidence Intervals for ERR were based in [2] on the Likelihood Ratio Test (LRT) for comparisons between nested models. Once latency is fixed, the models depend smoothly on all other parameters. If Wilks Theorem [5] holds, the asymptotic null distribution of LRT is χ² on d degrees of freedom when the null hypothesis is specified by fixing the values of d parameters in the wider model. For example, the LRT comparison of the linear model ERR = βD with the control model ERR = 0 is asymptotically χ² on 1 d.f. if the control model holds. Here the null hypothesis is β = 0 and Wilks Theorem does apply to this nested pair. As Little [3] pointed out, the non-linear models are less well behaved. The transient model ERR = σDexp(-τD) and the two-phase model ERR = βD + σDexp(-τD) are indeterminate in τ when σ = 0, so one of the regularity conditions for Wilks Theorem fails. In any case, even if the asymptotic null distribution of LRT were known, the distribution for a given finite set of records may differ from the asymptotic case.

However, the actual distribution of LRT can be estimated by simulation before comparing two competing models or constructing confidence intervals. Bootstrap methods [6, 7] can also be used to construct confidence intervals more directly.

In response to questions raised during peer review I also consider whether the data justifies fitting the two-phase model, investigate uncertainties in the "optimal latency" as estimated from the data, extend the analysis to include the covariate "city", and compare results from the 0 - 20 mSv and 5 - 500 mSv dose ranges.

Methods

An initial analysis of latency is carried out for all cancer sites with at least 100 cases (deaths) in the 0 - 20 mSv subcohort. Sites which appear to have a raised dose response are then analysed by the bootstrap and the distribution of LRT is estimated by simulation. A fuller simulation is carried out for the stomach, the site with the largest number of cases in this subcohort. The analysis focuses on the behaviour of the two-phase model (defined below) and its relation to the linear model. It aims to compare confidence intervals obtained by bootstrap and LRT simulation methods and to discover without relying on Wilks Theorem, whether the apparent elevation and non-linearity of the dose response in the 0 - 20 mSv subcohort are statistically significant. Further analysis considers variation in latency, the impact of "city", male and female subcohorts, and the 5 - 500 mSv dose range. Computations use the statistical freeware R[8], giving a further check on previous results. Possible sources of error in the Japanese data itself and the potential for confounding by other covariates as discussed in [2], are not reconsidered here.

As previously source data for the 1950-1990 mortality cohort LSS12 was obtained from RERF [9] via the CEDR [10], and the 0 - 20 mSv subcohort defined by restricting the weighted adjusted colon dose. The subcohort comprises 3011 data cells with 1690391.75 p-y. The control model, linear model, and two-phase model are defined as in [2]. Briefly, the control model is log-linear in 14 indicator variables for 5 year age-at-exposure categories, an indicator for gender, and the numerical variable log mean attained age, each of which is well-defined for the data cells. The control model is specified as λ = λ₀ = exp(α+Σβ_jx_j) where α and the β_j are unknown parameters, x_j are the covariates, exp denotes exponential, and the Poisson parameter in cell i with T_i person-years is λ_iT_i where λ_i is evaluated using the covariates in cell i. At this stage, as in [2], "city" (Hiroshima or Nagasaki) is not included as a covariate.

Other models are defined by λ = λ₀(1+ERR) where ERR depends only on D_ϕ, the radiation dose lagged by a latency parameter ϕ. D_ϕ is defined as the weighted adjusted colon dose in 10 mSv units when time-since-exposure ≥ ϕ, and 0 otherwise.

For the linear model, ERR = βD_ϕ

For the two-phase model, ERR = βD_ϕ + σD_ϕexp(-τD_ϕ) where exp denotes exponential.

The models require 1+ERR > 0 in all cells so that the Poisson distributions are defined. For the two-phase model τ > 0, slightly modifying the previous approach where τ ≥ 0. However, τ = 0 reduces to the linear model as does σ = 0.

In fitting a model at latency ϕ by Poisson regression, the unknown parameters α,β_j and β, σ, τ (as appropriate to the model) are chosen to maximise the likelihood of the observed data assuming independent Poisson distributions in each cell. Equivalently, fitting minimises the sum over the 3011 data cells of E_i - O_iln(E_i), where O_i and E_i = λ_iT_i are the observed and expected number of cases in cell i.

Latency ϕ is fixed during the minimisation. The models depend smoothly on all other parameters. ERR_D,ϕ denotes the Excess Relative Risk evaluated at D for the fitted model with latency ϕ. Note ERR_1,ϕ is the estimated risk at 10 mSv lagged by ϕ years. For model I nested within model J, LRT_J-I,ϕ is the Likelihood Ratio Test computed at latency ϕ.

Code written in R emulates the previous model fitting with Excel-Solver, using the Newton-Raphson minimisation routine "optim" [11]. Results for the stomach at latency ϕ = 5, 6,... 44 are checked against the values obtained previously.

At a given cancer site, the control, linear, and two-phase models are fitted at latency ϕ = 5, 6,... 44 to give LRT_J-I,ϕ for each pair of nested models: linear against control, two-phase against control, and two-phase against linear. Sites are selected for further analysis if LRT_lin-con,ϕ > 4 for some ϕ, or if LRT_2p-lin,ϕ > 6 for some ϕ (not assuming that this criterion establishes statistical significance). For sites meeting the criterion for the linear model, Profile Likelihood Confidence Intervals for β = ERR_1,ϕ are computed using Wilks Theorem at integer values of ϕ. For sites meeting the criterion for the two-phase model, ϕ = ϕ_mabs is chosen to maximise LRT_2p-con,ϕ (not LRT_2p-lin,ϕ) so that fitting at ϕ_mabs gives the absolute Maximum Likelihood Estimate (MLE) for the model. The "optimal latency" ϕ_m is chosen to maximise LRT_2p-con,ϕ subject to the constraint ERR_1,ϕ ≥ 0. Bootstrap confidence intervals and LRT simulations are then computed at ϕ_m . Details are given in the next two more technical subsections, and examples with commentary are supplied with Additional File 1. The method for taking account of uncertainty in the "optimal latency" is outlined in a final subsection.

Bootstrap

Fitting the two-phase model at ϕ = ϕ_m gives a maximum likelihood estimate for the vector of all 20 model parameters and determines fitted values of λ_i in each data cell. The parametric bootstrap assumes the observed data arose from sampling this fitted model, and considers the variation arising if other data had been sampled from that same model.

Sampling the independent Poisson distributions with parameters λ_iT_i gives simulated observations in each data cell. Fitting the two-phase model to this simulated data gives simulated parameter values including , and , and therefore a simulated value of ERR = ERR_D,ϕ = ). The entire process is repeated B times to give the bootstrap replications ERR*.

There are many well-established methods to obtain confidence intervals from bootstrap replications [7, 12]. The two options chosen here are the original "percentile" method, which is easy to describe, and the "Bias-corrected accelerated" method which is much faster to compute, and in general more accurate.

The 1-2α "percentile" confidence interval has endpoints

where denotes the Bα^th ordered value of ERR*. With B = 1000 replications and α = 0.025, the endpoints of the 0.95 percentile interval are the 25^th and 975^th order statistics of ERR*.

For the Bias-corrected accelerated (BC_a) confidence interval (see [7] Chapter 5 pp. 203-207 and p.249) simulation is again carried out using λ_iT_i. However, the simulated observations are now used to fit the model to the "Least Favourable family" of distributions, obtained by restricting the parameter vector ψ to a line parametrised by ζ, passing through the MLE and defined by where

Here i^-1() is the inverse of the Fisher Information evaluated at and is the gradient of the function h which defines the variable of interest (in this case ERR) in terms of the model parameters ψ.

For each bootstrap replication, fitting the simulated data along the LF family determines ζ and thus the simulated value ERR = . If ERR*_LFdenotes the resulting set of B replications, the BC_a confidence interval has endpoints

where

Here Φ is the standard normal cumulative distribution and z^(α) is the 100αth percentile point of a standard normal distribution. The parameters w and a are estimated from the bootstrap replicates. The bias- correction w is computed as

i.e. the fraction of bootstrap replications along the LF family which are below the original fitted value of ERR, in normal units.

The acceleration a is computed as

where m₃ and m₂ are the 3^rd and 2^nd moments of , the partial derivative of the log likelihood of the simulated data with respect to ζ, evaluated at ζ = 0.

Code for this calculation is shown in Additional File 1, Code File AC1.txt, which calls Additional File 1, Data Files stomdat1.txt, lambda2.txt and theta2.txt. Additional File 1, Commentary File ACom1.doc discusses this code.

Despite the intricate formulas, computation is much faster for BC_a than for the simpler percentile method, because optimisation is now restricted to a line within the full 20-dimensional parameter space. The percentile CIs are computed at B = 1000, the recommended minimum number, while BC_a results are obtained at B = 5000.

Coverage for the percentile and BC_a methods is tested using the stomach data. For each of 400 sets of simulated observations generated by sampling the Poisson distributions determined by the original fitted parameters, the model is refitted to the simulated data. Confidence intervals for the simulated ERR are obtained by the bootstrap methods (B = 1000) using the refitted parameters and counted as 1 or 0 if they contain or exclude the "true" value of ERR from the original fitted parameters. To streamline these computations of coverage for the percentile method, data is restricted to the Hiroshima subcohort (N = 1536 cells) and the model simplified to use 10-year rather than 5-year age-at-exposure categories. Estimating coverage for the BC_a method uses the full 0 - 20 mSv dataset and model. The application of both methods to compute CIs from the observed data uses the full dataset and model.

Simulation of LRT

The Likelihood Ratio Test (LRT) for comparing nested models H₀ ≤ H₁ (H₀ contained within H₁) given the observed data is LRT = 2[ℒ(H₁, ) - ℒ(H₀, )] where ℒ(H_i, ) is the log likelihood of the observed data after fitting the model H_i. In well-behaved situations where Wilks theorem [5] applies, the asymptotic distribution of LRT is known, provided the data was drawn from a distribution which satisfies the null hypothesis H₀. In particular, an appropriate critical value t_0.95 can be chosen for rejecting H₀ if LRT > t_0.95 and this will result in a 5% error rate of rejecting H₀ when it is true. Typically H₀ fixes the values of d parameters in the full model H₁ which allows those parameters to vary freely in an open neighbourhood. In this case Wilks theorem says that LRT is asymptotically χ² distributed on d degrees of freedom if the observed data was generated by the model with those d parameters fixed as specified. On that basis, the null hypothesis can be rejected if LRT > t_0.95 = F^-1(0.95) where F is the cumulative of a χ² distribution on d degrees of freedom.

Wilks theorem depends on many regularity conditions for the models H₀ and H₁. If these fail, LRT may not be χ² distributed; its distribution may depend on H₀ rather than just on the number of specified parameters, and there may be no simple universal 95% critical value for rejecting H₀. Simulation aims to estimate the distribution of LRT and then to obtain conservative critical values which are sufficient to reject H₀. Confidence intervals for functions of the parameters are constructed using these critical values.

For the two-phase model is the Likelihood Ratio Test for the hypothesis H₀: (β, σ, τ) = (β₀,σ₀,τ₀) vs. the alternative (β, σ, τ) ≠ (β₀,σ₀,τ₀), i.e. H₁ is the union of H₀ with its alternative, and leaves (β, σ, τ) free to vary, subject only to the defining conditions of the two-phase model. For various choices of H₀ the null distribution of assuming H₀ holds is sampled by Monte Carlo simulation (see [7] Chapter 4; for another context see [13] p.84) as follows.

One) The model is fitted subject to H₀. The resulting parameter estimates give fitted values and therefore T_i for each cell, subject to H₀.

Two) Poisson distributions with parameters T_i are sampled independently to give simulated cases in each cell

Three) The model is fitted to the simulated data subject to H₀, yielding new fitted values for each cell. Put .

Four) The model is fitted to the simulated data with (β, σ, τ) allowed to vary freely within the parameter space, yielding new fitted values for each cell. Put K = . The simulated value of is defined as LRT* = K₀ - K. Note that up to a constant which depends on the simulated data but not on the model, K₀ = -2ℒ(H₀, ) while K = -2ℒ(H₁, ).

Five) Steps Two through Four are repeated 500 (or 1000) times to give the sample LRT*

An example of the required code is given in Additional File 1, Code File AC2.txt which calls Additional File 1, Data Files stomdat1.txt, lambda1.txt, theta1.txt, and Additional File 1, Code File AC2s.txt. Additional File 1, Commentary File ACom2.doc discusses this code.

While step Four is time-consuming, steps One and Three are quick as the model is log-linear in the remaining parameters. Step Two is trivial in R. Note that in step Four the possible values of (β, σ, τ) are still constrained by the requirement that 1+ERR > 0 in all cells. For computability, the constraint is set at 1+ERR ≥ 0.001.

To check whether Wilks theorem holds, LRT* is first tested against a χ² distribution on 3 d.f. by the Kolmogorov-Smirnov (one-sample) test. LRT* is then tested against gamma distributions with unknown shape and rate parameters. A γ distribution is fitted to LRT* using the "fitdistr" subroutine of R (MASS library) [14]. When testing LRT* against the fitted distribution with shape parameter and rate parameter , the Kolmogorov statistic D is obtained from the (one sample) K-S test but its p-value is determined by simulation (as in the Lilliefors test). When LRT* has 500 elements, 500 random deviates are taken from the fitted γ distribution with parameters and to form a sample S, a new gamma distribution Γ is fitted to S, the K-S statistic D* for testing S against Γ (one-sample) is computed, and the process is repeated 5000× to estimate the probability that D* > D when samples are drawn from γ(, ). R code was adapted from [15].

For the special case (β₀,σ₀,τ₀) = (0,0, arbitrary) the sample LRT* uses 1000 replications; note that in this case = LRT_2p-con. Likewise at the MLE estimate (β₀,σ₀,τ₀) = (), LRT* is produced during the (percentile) bootstrap with 1000 replications. In both cases the simulations required for the p-value of D use 1000 random deviates of the fitted γ distribution.

For the liver, lung, pancreas and leukaemia, LRT simulation (×1000) is carried out at (0,0, arbitrary) and (), and (×500) at 3 points chosen at random with σ₀ = 0 and β₀ in the intervals -0.5 < β₀ < 0, 0 < β₀ < 1, and 1 < β₀ < 10, and also at 4 points chosen at random in a neighbourhood N₁₀ of (). N₁₀ comprises those (β₀,σ₀,τ₀) for which the observed (not simulated) < 10. Each set of simulations gives a sample LRT _i* and corresponding fitted gamma distributions γ_i

For the stomach, simulations and corresponding fitted γ_i are carried out at the MLE and 50 other random points in the parameter space: (×500) at 11 points with σ₀ = 0, 4 points in a neighbourhood N₁₀ of (), 34 other points in the (β, σ, τ) parameter space, and (×1000) at (0,0, arbitrary) and ().

Estimates of a 95% critical value for the null distribution of LRT are then obtained in three ways. For each cancer site, the fitted gamma distributions and the order statistics of the simulated LRT _i* are used to estimate a global 95% critical value as = max(sup(t_.95,i), sup(s_.95,i)) where t_.95,i is the 95^th percentile of γ_i and s_.95,i is the 95^th percentile of LRT _i*. That is, at each value of (β₀,σ₀,τ₀) tested for a particular site, 95% of the simulated LRT values and the 95^th percentile of the fitted gamma distribution are both below . An overall conservative estimate t_.95 is chosen to exceed the corresponding maximum for all simulations and all cancer sites investigated. At each site, a refined estimate = max(sup(t_.95,j)) is obtained by considering only the fitted gamma distributions and restricting to simulations at points (β₀,σ₀,τ₀) within N₁₀. At each cancer site ≤ ≤ t_.95.

The neighbourhood N₁₀ of () is termed "appropriate" if the overall estimate t_.95 ≤ 10 so that for (β₀,σ₀,τ₀) outside N₁₀, > t_.95 . This is not a circular definition; t_.95 is computed from the behaviour of simulated LRT at points outside as well as inside N₁₀ for each site, while for (β₀,σ₀,τ₀) within N₁₀ and even at () simulation may, if N₁₀ is inappropriate, give LRT* with 95^th percentile > 10.

Profile Likelihood Confidence Intervals for ERR_1,ϕ and ERR_0.025,ϕ are then constructed using all three estimates for the critical value. With ϕ fixed, let U_ϕ be the set of those (β₀,σ₀,τ₀) which cannot be ruled out with 95% confidence as < t_.95(alternatively <, or <) Define the 95%CI for ERR_D,ϕ as the range of ERR_D,ϕ over U_ϕ.

When σ₀ = 0, is the Likelihood Ratio Test for the hypothesis H₀: β = β₀ and σ = 0 vs. the alternative β ≠ β₀ and σ = 0, while LRT_2p-lin is the Likelihood Ratio Test for the hypothesis σ = 0 vs. the alternative σ ≠ 0. Note LRT_2p-lin + = . Simulated null distributions of all three LRT's are obtained for points with σ = 0. Conservative, global, and refined estimates T_.95, and of the 95% critical value for LRT_2p-lin are then defined in the same fashion as t_.95, and for . For each cancer, is the supremum of the 95^th percentiles of γ distributions fitted to the simulations of LRT_2p-lin, is the supremum of 95^th percentiles of the fitted γ distributions and simulations of LRT_2p-lin, and T_.95 is an overall conservative value exceeding for all cancer sites considered. Likewise and are defined from the 99^th and 99.9^th percentiles of the fitted γ distributions. These various estimates are used as critical values for LRT_2p-lin when testing the linear model against the two-phase model, given the observed data.

Latency

For latency ϕ = 5, 6,... 44 BC_a 95%CIs for ERR_1,ϕ and ERR_0.025,ϕ are computed.

Since the observed data is regarded as a sample from an underlying distribution, the optimal latency ϕ_m as inferred from the data is an estimate, whose distribution is again obtained by simulation. The two-phase model is fitted at ϕ_m and the fitted model is resampled to provide simulated data, from which a new estimate is determined. To simplify, is restricted to integer values 5, 6,... 44. The process is repeated ×200 for the stomach, liver, lung, pancreas and leukaemia. The resulting set of is compared with the range of latencies for which BC_a 95%CIs are strictly positive, and likewise with the range for which LRT_2p-lin exceeds the estimated critical values.

To test whether the distributions of LRT_2p-con and LRT_2p-lin depend on latency, null distributions at (β₀,σ₀,τ₀) = (0,0, arbitrary) are simulated (×500) for the stomach with ϕ = 5, 10, 15,... 40 and compared with the null distribution from simulation (×1000) at ϕ_m.

City, dose range, and gender

The models are refitted with "city" as an added control covariate. For each of the 5 cancers, the optimal latency Φ_m (restricted to 5, 6,... 44) is determined along with latency ranges over which BC_a 95% and 90% CIs for ERR₁ are strictly positive.

The models including "city" are fitted to the 5 - 500 mSv subcohort. A limited analysis of the null distributions of LRT_2p-con and LRT_2p-lin including "city" is carried out on the 0 - 20 mSv and 5 - 500 mSv dose ranges. For each cancer at its Φ_m, H₀: (β₀,σ₀,τ₀) = (0,0, arbitrary) is used for simulation of LRT_2p-con and H₀: (β₀,σ₀,τ₀) = (,0, arbitrary) where is the fitted parameter for the linear model, is used for simulation of LRT_2p-lin. Gamma distributions fitted to the simulated LRT* are used to estimate 95% critical values and to assign p-values to the LRT_2p-con and LRT_2p-lin arising from the observed data.

The models including "city" are fitted separately to the male and female subcohorts in the 0 - 20 mSv dose range for the stomach, liver, and lung.

Finally, the linear model including "city" is fitted to the 0 - 5 mSv, 5 - 20 mSv, 0 - 0.5 mSv, and 0.5 - 20 mSv data for the liver and lung.

The very extensive simulations of LRT were shared across a number of PC's.

Results

Table 1 shows the individual cancer sites with over 100 cases (deaths) in the subcohort.

Table 1 Cancer sites with over 100 cases (deaths) in the 0-20 mSv subcohort

Full size table

To check R against Excel, the two-phase model is fitted to the stomach data for latencies ϕ = 5, 6,... 44 and LRT_2p-con,ϕ computed in both programmes. The difference (LRT_2p-con,ϕ,R) - (LRT_{2p-con, ϕ, Excel}) has a range of (-0.079, 0.000), mean = -0.004, s.d. = 0.014. At the optimal latency ϕ_m = ϕ_mabs = 11.89, LRT_{2p-con, ϕ} = 21.1903 by either method. The parameter estimates in R are = 0.4586, = 120.0076, = 82.6729, confirming the point estimates reported in [2]. On that basis, the methods are taken as comparable. Results are also comparable when computing LRT-based confidence intervals.

For the linear model ERR = βD_ϕ with ϕ = 5, 6,... 44 only the liver and urinary tract showed significant results. Figure 1 shows LRT_lin-con,ϕ against ϕ for these two cancers, while Figure 2 shows the fitted values ERR₁ = when ϕ ≥ 30 with 95%CIs (Profile Likelihood, assuming LRT ~ χ² on 1 d.f. for testing β = β₀ against β ≠ β₀). For the liver with this model, ϕ_m = ϕ_mabs = 38.58, at which = 0.69 (0.25, 1.26). Of the 540 cases in the subcohort, 171 have ϕ > 38.58. For the urinary tract, ϕ is locally optimal at 41.32 at which = 2.14 (0.51, 5.12). Of the 103 cases in the subcohort, 23 have ϕ > 41.32.

The stomach, liver, lung, pancreas, and leukaemia were selected for further analysis as LRT_2p-lin,ϕ > 6 for some ϕ, an indication that the dose response may be non-linear. For the pancreas and leukaemia, Figure 3 shows LRT_2p-lin,ϕ and LRT_2p-con,ϕ while Figure 4 shows ERR_1,ϕ = as estimated from fitting the two-phase model at latency ϕ. Note that may differ greatly from the value obtained by fitting the linear model at ϕ. Related graphs for the stomach, liver, and lung were shown in [2], though their statistical significance is reconsidered below.

Table 2 shows the latencies ϕ_mabs and ϕ_m which maximise LRT_2p-con,ϕ without and with the constraint ERR_1,ϕ ≥ 0, the values of LRT_2p-con,ϕ and LRT_2p-lin,ϕ at ϕ_m along with the fitted parameters and corresponding point estimates of ERR₁ and ERR_0.025 and their bootstrap confidence intervals by the BC_a (5000 replications) and percentile (1000 replications) methods. For the stomach, liver, and lung, these confirm the point estimates in [2].

Table 2 Latencies, LRT, fitted parameters, ERR and bootstrap confidence intervals for the two-phase model

Full size table

Coverage for the BC_a method is estimated for the stomach at 94% for ERR₁ (376 of 400) and 93% for ERR_0.025 (372 of 400). Coverage for the percentile method is estimated for the stomach using the 0 - 20 mSv Hiroshima subcohort with the simplified two-phase model (10 year age-at-exposure categories) giving 94.3% for ERR₁ (377 of 400) and 93% for ERR_0.025 (372 of 400). All CIs reported in Table 2 use the full model (5 year categories, both cities).

For the stomach with latency ϕ = 5, 6,... 21, Figure 5 shows the BC_a 95% confidence intervals (B = 5000) for ERR₁ and ERR_0.025. Each of these CIs is strictly positive. Since the minimum time-since-exposure in the subcohort is 6.08 years, for ϕ = 5 or 6 the model involves no correction for latency and ERR is a function of the unmodified colon dose.

More generally, for each site (stomach, liver, lung, pancreas, leukaemia) there is a neighbourhood surrounding ϕ_m over which BC_a 95% CIs for ERR₁ and ERR_0.025 are strictly positive, which was evaluated restricting to integer latencies and using B = 5000 bootstrap replications. For the stomach, both CIs are strictly positive when 5 ≤ ϕ ≤ 21. For the liver, both CIs are strictly positive when 32 ≤ ϕ ≤ 43. For the lung, both CIs are strictly positive when 7 ≤ ϕ ≤ 21. For the pancreas, both CIs are strictly positive when 5 ≤ ϕ ≤ 11. For leukaemia, the 95% CI for ERR₁ is strictly positive when 24 ≤ ϕ ≤ 26, but the 95% CI for ERR_0.025 is strictly positive when 19 ≤ ϕ ≤ 36.

For each site, simulating data (200 sets) from the fitted model at ϕ_m and re-determining the optimal latency (restricting to integer values) from the simulated data shows the uncertainty in ϕ_m itself. For the stomach 5 ≤ ≤ 21 for 196 out of 200 simulated datasets, while 5 ≤ ≤ 21 for 187 out of 200 simulated datasets. For the liver, 32 ≤ ≤ 43 for 200 out of 200 simulated datasets, while 32 ≤ ≤ 43 for 197 out of 200 simulated datasets. For the lung, 7 ≤ ≤ 21 for 184 of 200, while 7 ≤ ≤ 21 for 173 of 200 simulated datasets (nb for the lung ϕ_m ≠ ϕ_mabs). For the pancreas the results are weaker, as 5 ≤ ≤ 11 for 157 of 200, and 5 ≤ ≤ 11 for 151 of 200 simulated datasets. For leukaemia 24 ≤ ≤ 26 for 132 of 200 and 24 ≤ ≤ 26 for 130 of 200, but 19 ≤ ≤ 36 for 188 of 200 and 19 ≤ ≤ 36 for 184 of 200 simulated datasets.

Additional File 1, Tables S1, S2, and S3 give the simulated null distributions of LRT for the two-phase model with the null hypothesis H₀: (β, σ, τ) = (β₀,σ₀,τ₀) vs the alternative (β, σ, τ) ≠ (β₀,σ₀,τ₀), for the stomach, liver, lung, pancreas, and leukaemia. Additional File 1, Table S1 shows simulation for each cancer at the fitted values (β₀,σ₀,τ₀) = () and for randomly selected points in the neighbourhood N₁₀ of the fitted values (see Methods). Additional File 1, Table S2 shows simulation for each cancer with σ₀ = 0, where τ₀ is arbitrary. Additional File 1, Table S3 shows additional simulation for the stomach when σ₀ ≠ 0, but (β₀,σ₀,τ₀) is outside the neighbourhood N₁₀. For the simulations in S2, Additional File 1, Table S4 shows the results for the linear model and the comparison with the two-phase model. These simulations are carried out at the optimal latency for the specific cancer.

As the control parameters are optimised subject to H₀ before simulating LRT, perhaps other choices of control parameters while retaining H₀ might yield different estimates of the null distribution of LRT subject to H₀. However, four sets of simulations (×500) for the stomach with (β₀,σ₀,τ₀) = (0, 0, arbitrary) using very different choices of control parameters and total cases expected, yielded fairly similar LRT* and fitted gamma distributions as shown in Table 3. Optimised control parameters for the observed data are shown in the last row. This is evidence that the null distribution of is approximately independent of the parameters used to simulate the data, provided that H₀ holds. In the terminology of [7] (Chapter 4 p. 139) is an "approximate pivot" even though Wilks Theorem fails. In turn, this gives the basis for rejecting H₀ given the observed data.

Table 3 Null distribution of LRT_2p-con for the stomach with various control parameters

Full size table

The simulated null distributions of from Additional File 1, Tables S1, S2, and S3 are then applied to estimate t_.95, and as defined in Methods. All simulations showed LRT and its fitted gamma distribution had 95^th percentiles qlrt(0.95) < 10 and qgamma(0.95) < 10. Thus t_.95 = 10 is a conservative estimate throughout the parameter space, for all cancer sites considered here, and the choice of N₁₀ is appropriate (see Methods).

Table 4 shows Profile Likelihood 95% confidence intervals for ERR₁ and ERR_0.025 at optimal latency ϕ_m as computed with the various estimated critical values.

**Table 4 Profile Likelihood 95% confidence intervals for ERR at estimated critical values of**

Likewise from Additional File 1, Table S4 and using the notation in that file, all simulations of the null distribution of LRT_2p-lin showed qlrt2(0.95) < 8 and qgamma2(0.95) < 8 so T_.95 = 8 is a conservative estimate throughout the parameter space, for all cancer sites considered here. For any given site is the supremum of qlrt2(0.95) and qgamma2(0.95) over simulations for that site. is the supremum of qgamma2(0.95) over simulations for that site. Likewise and are the suprema of qgamma2(0.99) and qgamma2(0.999) over simulations for that site. Table 5 shows these critical values and the observed value of LRT_2p-lin (see Table 2) for each site.

Table 5 Critical values and observed value of LRT_2p-lin

Full size table

Null distributions of and LRT_2p-lin for the stomach with ϕ = 5, 10,... 40 show little variation with latency and are similar to those obtained at ϕ_m as shown in Table 6.

Table 6 LRT simulations for the stomach with (β₀,σ₀,τ₀) = (0, 0, arbitrary) and various latencies

Full size table

Simulation of the optimal latency as described earlier also has implications for non-linearity. Again restricting to integral latencies, for the stomach LRT_2p-lin > 9.4 when 5 ≤ ϕ ≤ 21 and 196 of 200 simulated datasets gave in this range. For the liver LRT_2p-lin > 16.7 when 34 ≤ ϕ ≤ 38 (198 of 200 simulations). For the lung LRT_2p-lin > 7.7 when 7 ≤ ϕ ≤ 21 (184 of 200 simulations). For leukaemia LRT_2p-lin > 9.4 when 17 ≤ ϕ ≤ 36 (191 of 200 simulations). Each of these results shows non-linearity (see Table 5) and for the stomach and liver the evidence is very strong. For the pancreas, LRT_2p-lin > 9.5 when 7 ≤ ϕ ≤ 11 but only 156 of 200 simulations give in this range.

The covariate "city" has little impact on the estimates of ERR when taken as an additional control. For the liver including "city", BC_a 95%CIs for ERR₁ and ERR_0.025 are strictly positive when 29 ≤ ϕ ≤ 43, again restricting to integral latencies. With "city" included in both models LRT_2p-con attains its maximum subject to ERR₁ ≥ 0 when ϕ_m = 37 and LRT_2p-con = 28.41. At ϕ_m = 37, ERR₁ = 1.26 with BC_a 95%CI (0.61, 2.14) and ERR_0.025 = 0.68 with BC_a 95%CI (0.18, 1.25). For comparison, without "city" the optimal integral latency is ϕ_m = 37, at which ERR₁ = 1.22 with BC_a 95%CI (0.59, 2.11) and ERR_0.025 = 0.74 with BC_a 95%CI (0.23, 1.43). As a test of non-linearity (with "city" included in both models) LRT_2p-lin > 6.8 when 29 ≤ ϕ ≤ 41, and LRT_2p-lin > 18.15 when 34 ≤ ϕ ≤ 38, whereas without "city" LRT_2p-lin > 6.86 when 32 ≤ ϕ ≤ 41 and LRT_2p-lin > 16.73 when 34 ≤ ϕ ≤ 38.

Table 7 includes results with "city" for the 0 - 20 mSv dose range, for all 5 cancers. It also shows that fitting the two-phase model with "city" to the 0 - 20 mSv and 5 - 500 mSv dose ranges gives comparable results for ERR₁ and latency, for the stomach, liver, lung, and leukaemia.

Table 7 Optimal (integral) latency, LRT and ERR, including "city", by dose range

Full size table

A limited analysis by simulation of LRT for the model with "city" was carried out over the 0 - 20 mSv and 5 - 500 mSv dose ranges, focused at the optimal (integral) latencies on the null distributions of LRT_2p-con and LRT_2p-lin at (β₀,σ₀,τ₀) = (0, 0, arbitrary) and (β₀,σ₀,τ₀) = (, 0, arbitrary) where is obtained by fitting the linear model. Results are shown in Table 8 and the fitted gamma distributions are used to assign p-values to the observed LRT in Table 7.

Table 8 simulated null distribution of LRT_2p-con and LRT_2p-lin including "city", by dose range

Full size table

From Table 8 (with "city") and Additional File 1, Table S4 (without "city"), the criterion LRT_2p-lin > 6 is a test of non-linearity at 90% level. Accordingly, Table 7 also shows the latency ranges surrounding Φ_m for which LRT_2p-lin > 6.

Other aspects of the analysis (simulated null distributions of LRT_2p-con and LRT_2p-lin across the parameter space and simulated variation in optimal latency ϕ_m) were not repeated with "city".

For the stomach, liver, and lung Table 9 shows the results when the two-phase model with "city" is fitted separately to the male and female 0 - 20 mSv subcohorts. For the pancreas and leukaemia, there were too few cases to split the data by gender.

Table 9 Male and Female subcohorts, with "city", 0-20 mSv, integral latency

Full size table

Table 10 shows the linear model with "city" fitted to the 0 - 20 mSv, 0 - 5 mSv, 5 - 20 mSv, 0 - 0.5 mSv, and 0.5 - 20 mSv subcohorts for the liver and lung.

Table 10 Linear model with "city" for liver and lung on partitions of 0-20 mSv, by latency.

Full size table

Discussion

This paper develops and partially corrects the approach in [2] and the two should be read in conjunction. I focus initially on the 0 - 20 mSv dose range, which provides over 60% of the person-years of observation in the Life Span Study 12 cohort. Over the full data range 0 - 8000 mSv and the truncated range 0 - 4000 mSv the dose response for cancer mortality is known to be approximately linear and risk estimates for 1000 mSv have been obtained through the ongoing Life Span Study project. The ICRP recommendations for radiation protection are then based on linear extrapolation, as the Excess Relative Risk from 10 mSv is presumed to be 0.01 times the ERR from 1000 mSv.

If we were confident that the approximate linearity of the dose response extends to the low dose region, then linear extrapolation would be justified. But this is not known, and the response may be approximately linear over a wide dose range but highly non-linear at lower doses. There are many examples of non-linear dose response in radiation cell biology [16, 17].

Thus, if the Japanese data is used to derive risk estimates for doses such as 10 mSv or 1mSv which are directly relevant to occupational and public exposure, it makes sense to consider the 0 - 20 mSv data in its own right and in relation to wider dose ranges, rather than to pool the data and apply linear extrapolation.

Secondly, a latency parameter ϕ is used to assess the delay between exposure and cancer mortality. Although it would be better to allow latency to modify dose by some smooth function rather than an abrupt switch from no effect to full effect when Time-Since-Exposure passes ϕ, this is much harder to compute and was not attempted here.

As pointed out during peer review, we should consider at the outset whether the data is sufficient for fitting a non-linear model based on mean dose. Figure 6 shows scatterplots of p-y observation against mean dose, for the ranges 0 - 500 mSv, 5 - 20 mSv, 0 - 5 mSv, and 0 - 0.5 mSv. Clearly the 0 - 500 mSv and 5 - 500 mSv data contains a wide variety of doses. The 5 - 20 mSv data shows a spread of mean doses. Whilst the 0 - 5 mSv data is concentrated below 1 mSv, there is a spread of mean doses in that region. Each dose range determines a very large dataset.

The fact that for the stomach, liver, lung, and leukaemia rather similar results for ERR₁ and latency are obtained from fitting the two-phase model across the 5 - 500 mSv dose range as from fitting across the 0 - 20 mSv dose range suggests that there is also sufficient information in the 0 - 20 mSv data to justify fitting this model. The stomach, with the largest number of cases, shows fairly tight confidence intervals for Excess Relative Risk over a broad range of latencies, which supports the validity of the model.

If the dose response were linear, similar estimates of its slope (β) should be obtained over different dose ranges. As the liver and lung data illustrate (Table 10), fitting the linear model separately to subcohorts of the 0 - 20 mSv dose range yields very different estimates of the dose response. For the liver, there are significant results at comparable latencies in the 0 - 20 mSv dose range and all the subranges, but the estimates of β from doses below 5 mSv (or below 0.5 mSv) are completely different from those obtained from doses above 5 mSv (or 0.5 mSv), in fact around 20 times higher when latency = 36 years. For the lung, the linear model detects some significant activity in restricted dose ranges, but the results are very different, and combining the two dose ranges yields insignificant results. By contrast, see Table 7, the two-phase model shows significant results for the lung with the 0 - 20 mSv data for latencies from 10 to 21 years.

It is not unusual for a model to achieve significance on portions of the data but fail to do so when they are combined. Even univariate Ordinary Least Squares regression on a bimodal independent variable may illustrate this, as in Figure 7 with synthetic data.

As discussed in [2] the graphs of ERR against latency show a clear separation into positive and negative regions, as is also apparent with the linear model over restricted dose ranges for the liver and lung. For any given model, the "optimal latency" ϕ_m identifies the time lag which maximises LRT within the region of non-negative ERR₁. The existence of a significant raised risk is of interest whether or not other latencies show significant decrease in risk, with possibly higher LRT. Frequently ϕ_m coincides with ϕ_mabs which maximises LRT without constraint, but the distinction is relevant for the lung (Table 2) and more generally when simulating the variation in latency.

The point estimates of optimal latency ϕ_m and ERR obtained previously are confirmed by Table 2. The next stage of analysis concerns confidence intervals for ERR at the optimal latency. Statistical inference in [2] was based on the Likelihood Ratio Test but it was wrongly assumed, as Little [3] pointed out, that Wilks theorem held for the non-linear models so that LRT would be asymptotically χ² distributed, if the null hypothesis (specifying parameters in the model) were true.

One of the regularity conditions required for Wilks Theorem is that distinct values of the model parameters produce distinct probability distributions. In the two-phase model when σ = 0, ERR = βD_ϕ irrespective of the value of τ. The asymptotic properties of models with an indeterminate parameter (which cannot be identified from the distribution) have been analysed [18–21] but I was unable to use these methods and therefore took an intensive computational approach, which applies to the given set of records rather than asymptotically.

The parametric bootstrap is an established technique for finding confidence intervals in the absence of any information on the distribution of LRT. The percentile method refits the model to simulated data generated by samples from the model whose parameter values were obtained by fitting to the actual data. The BC_a method adjusts for the fact that simulation may produce biased estimates of the statistic of interest (in this case ERR) and of its variance. Both methods gave good coverage with almost identical rates when tested on the stomach data, and comparable confidence intervals for all the sites analysed. The BC_a method is much faster and generally preferable.

However, the underlying assumption of the parametric bootstrap is open to question. Perhaps the observed data arose from sampling at parameter values other than those obtained by fitting the model to that data. Confidence Intervals based on LRT avoid this assumption but depend on the critical value t_.95 used to reject the null hypothesis if LRT > t_.95.

If Wilks Theorem had applied, then LRT for testing H₀: (β, σ, τ) = (β₀,σ₀,τ₀) vs. the alternative (β, σ, τ) ≠ (β₀,σ₀,τ₀) would have the asymptotic distribution LRT ~ χ² on 3 d.f. if H₀ holds, for any H₀. Thus t_.95 = F^-1(0.95) = 7.8147 where F is the cumulative χ² on 3 d.f. would be the appropriate critical value, rejecting H₀ if LRT > t_.95.

In the absence of any theoretical prediction, the null distribution of LRT can still be estimated by Monte Carlo simulation but may depend on the choice of H₀ as seen in Additional File 1, Tables S1, S2, S3 and S4. In fact Additional File 1, Tables S2 and S3 show the null distribution of departs from χ² on 3 d.f. as σ₀ → 0, while Additional File 1, Table S4 shows that the null distribution of LRT_2p-lin departs from χ² on 2 d.f. everywhere. Table 3 indicates that the simulated null distribution of is fairly independent of the parameters for the control covariates, and depends only on H₀: (β, σ, τ) = (β₀,σ₀,τ₀). The null distributions when σ₀ = 0 are quite different from those when σ₀ >> 0, as expected since it is σ₀ = 0 which causes the indeterminacy of the model and the failure of Wilks Theorem on this hyperplane. Gamma distributions give a reasonable fit to the simulated null distribution of in general (see Additional File 1, Tables S1, S2 and S3). When H₀ has σ₀ = 0, typical shape and rate parameters are s ~ 2.24 and r ~ 0.63, while for σ₀ >> 0 s ~ 1.5 and r ~ 0.5 as predicted by Wilks Theorem. Figure 8, the probablity plot for the stomach with (β₀,σ₀,τ₀) = (0,0, arbitrary) shows the failure of Wilks Theorem and the close fit of simulated = LRT_2p-con with a gamma distribution (s = 2.418, r = 0.604).

The Kolmogorov-Smirnov test against the fitted distribution is extremely sensitive and even when it distinguishes LRT from the fitted gamma, a probability plot may show close agreement. For example for the liver, with (β₀,σ₀,τ₀) = (7.297, 0, arbitrary), the K-S test has D = 0.058 with simulated p = 0.001 but the plot (Figure 9) is fairly close.

Likewise, fitted gamma distributions generally give a reasonable fit to the null distribution of LRT_2p-lin with typical parameters s ~ 1.79, r ~ 0.68.

For the linear model, Additional File 1, Table S4 shows simulation of the null distribution of generally conforms to a χ² distribution on 1 d.f. as expected from Wilks theorem. However as β₀ → -0.5, the null distribution approaches a γ distribution with shape = 0.5, rate = 1. Since β₀ = -0.5 specifies a boundary of the parameter space, the theory of mixture models [22] predicts, at the boundary, the null distribution of ~ 1/2 χ² (1 d.f.) = 1/2 γ (0.5,0.5) = γ (0.5,1). The confidence intervals shown in Figure 2 for liver and urinary cancers involve β values far from the boundary and Wilks theorem is valid in this region.

For all simulations and all cancer sites (Additional File 1, Tables S1, S2 and S3) the 95^th percentile of the fitted gamma distributions and the 95^th percentile of the simulated itself are all below 10. Additional File 1, Tables S1, S2, and S3 summarise nearly 50,000 simulations of LRT at a wide variety of randomly chosen values of H₀ . On that basis, t_.95 = 10 is chosen as a conservative estimate applying throughout the parameter space for all cancer sites considered here. H₀ can be rejected whenever > 10, i.e. whenever (β₀,σ₀,τ₀) is outside the neighbourhood N₁₀ of the fitted parameters (). N₁₀ is thus an appropriate neighbourhood, as defined in Methods.

Inside this neighbourhood Additional File 1, Table S1 shows gamma distributions generally fit the simulated very well, justifying the definition of from fitted gamma distributions within N₁₀. The intermediate choice is based on simulations inside and outside N₁₀ and uses both the simulations and the fitted distributions to adjust the critical value upwards as either may be underestimates.

For the stomach, liver, and lung, confidence intervals for ERR obtained by the bootstrap (Table 2) or from simulation of (Table 4) are broadly similar, whichever critical value or bootstrap method is chosen. LRT intervals using show closer agreement with the bootstrap methods. For these 3 cancers, all the LRT-based intervals for ERR₁ and ERR_0.025 exclude 0 as do the bootstrap intervals, which are tighter. This validates the claim in [2] that ERR₁ is significantly elevated for the stomach, liver, and lung.

All five cancers show significant non-linearity of the dose response by LRT comparison of the two-phase and linear models (Table 5). For the stomach, liver, and leukaemia the two-phase model is preferable with p << 0.001, while for the lung and pancreas p < 0.01.

The previous discussion concerns behaviour at the optimal latency ϕ_m, an unknown parameter estimated from the data. BC_a intervals, comparable at ϕ_m to CIs obtained by other methods, were used to investigate ERR across a range of latencies, restricted to integers 5, 6,... 44. Each cancer site has latency ranges in which BC_a CIs for ERR₁ and ERR_0.025 are strictly positive. Figure 5 illustrates this for the stomach and also shows that ERR is significantly raised when the dose is not modified by latency (ϕ = 6 is below the minimum time-since-exposure in the 0 - 20 mSv cohort). With the linear model, liver and urinary cancers are significantly elevated over a range of long latencies, showing similar variation (Figures 1 and 2).

Fitting the two-phase model at ϕ_m, simulating new data and finding the simulated optimal latency gives an estimate of the probability that the true value of ϕ_m lies in any given region. For the liver and stomach, this approach gives very strong evidence that ϕ_m lies in the regions where ERR₁ and ERR_0.025 are strictly positive, and where the two-phase model is clearly superior (by LRT) to the linear model. For the lung, the evidence on these points is still valid, though weaker. For leukaemia, there is strong evidence that ϕ_m lies in the region where the dose response is non-linear, and likewise in the region where the 95% CI for ERR_0.025 is strictly positive. However, with the variation in ϕ_m we cannot be confident that the 95% CI for ERR₁ is strictly positive.

Many results for the pancreas are considerably weaker than for other sites. Even at the optimal latency, the confidence intervals are extremely wide. With the BC_a method, the lower limit is attained as the minimum bootstrap value, and is assigned a probability > 0.025. The "bias" and "acceleration" of the BC_a bootstrap sample are both large, unlike those for other sites. When the uncertainty in the optimal latency is determined by simulation at ϕ_m we cannot be confident that the true optimal latency for the pancreas lies within the range where ERR > 0 or where non-linearity is detected by LRT_2p-lin.

Up to this point, the primary aim was to test the validity of the conclusions in [2], and accordingly "city" was not included as a control covariate and the dose range remained at 0 - 20 mSv. A limited reanalysis including "city" in the baseline model and comparing the 0 - 20 mSv and 5 - 500 mSv dose ranges is shown in Tables 7 and 8. The higher dose range was chosen to include 10 mSv, but to exclude the lowest dose category in the LSS12 dataset. Above 5 mSv the revised dosimetry DS02 generally agrees with the DS86 dosimetry used here (and in [2]). Comparing these ranges tests whether the results from 0 - 20 mSv simply reflect some anomaly in the lowest dose category. With the exception of the pancreas, there is a striking similarity between results from the 0 - 20 mSv and 5 - 500 mSv dose ranges, which show comparable estimates of the optimal latency, latency ranges, and the Excess Relative Risk from 10 mSv.

I do not know if any other approach to baseline risk may explain these effects. However, significant non-linearity and positive ERR are also found for the stomach, liver and lung when the data is restricted to male or female subcohorts, with "city" included in the model (Table 9). Any possible interaction effects involving gender would disappear with the all-male or all-female data. As mentioned in [2] the use of log mean attained age was an adequate alternative to the full set of attained age categories, and alternative controls affected the estimates of ERR by no more than a factor of 2.

The results at 0.25 mSv (ERR_0.025) show that the risk detected here arises at extremely low doses. If there are doubts about the accuracy of the DS86 dosimetry, the question remains as to why the DS86 dose is a significant predictor of risk.

Overall, the results have several possible interpretations. They may reflect confounding by other covariates not shown in the public dataset, which may be correlated with the external dose used here (and in LSS12) and may also interact with radiation and may vary over time. Differences between the rural and urban populations may contribute to the non-linear effects. Or, these may be caused by gross underestimates of the external dose (unlikely) or by internal doses arising, for example, from "black rain". Such interpretations would be specific to the Japanese cohort, and would therefore cast doubt on using this data to predict radiation response elsewhere.

However, the results may also indicate that low doses of radiation do impact on cancer mortality, provided that latency is included in the model and the dose response is not constrained to be linear. In radiation cell biology, non-linear dose response is known for the bystander effect using a variety of endpoints and the two-phase model was proposed as a simplified form of a model for the bystander effect [16].

This paper does not reconsider any of these possible interpretations [2]. However, I hope it establishes the statistical validity of this analysis of the available public data for the 1950-1990 mortality cohort.

Conclusion

This reanalysis validates the main conclusions of [2]. Bootstrap methods and Monte Carlo simulation of LRT show that Excess Relative Risk for cancer mortality in Japanese A-bomb survivors exposed to external radiation doses below 20 mSv is positive, large, and significant for various cancers, as detected by models incorporating latency. The dose response is highly non-linear for the stomach, liver, lung, pancreas, and leukaemia. In each case the two-phase model shows large Excess Relative Risk at 10 mSv external dose lagged by the optimal latency for the cancer. LRT-based 95% Confidence Intervals are strictly positive, except for leukaemia. Bootstrap BC_a 95% CIs are strictly positive for all five cancers over a range of latencies. Large, positive Excess Relative Risk is also found when the male and female data is analysed separately for the stomach, liver and lung.

When the optimal latency varies by simulation, the stomach, liver, lung and leukaemia still show non-linear dose response, and likewise ERR > 0 at 95% level for the stomach, liver, lung.

With "city" included as a control covariate in the two-phase model, similar estimates of latency and ERR at 10 mSv are obtained for the stomach, liver, lung, and leukaemia whether the dose range is 0 - 20 mSv or 5 - 500 mSv. Dose response for the liver, lung, and leukaemia is significantly non-linear in the 5 - 500 mSv range, particularly for the lung. Such results cannot be explained by any anomaly of the 0 - 5 mSv data.

The linear model finds significant results for the liver and urinary tract over a range of long latencies.

This analysis of cancer mortality in Japanese A-bomb survivors exposed to low doses of external radiation in 1945 shows significant non-linearity of dose response and significant large Excess Relative Risk over a range of latencies. These findings do not support the current ICRP recommended annual occupational dose limit of 20 mSv.

Abbreviations

BC_a :: Bias-corrected accelerated
CEDR:: Comprehensive Epidemiological Data Resource
CI:: Confidence Interval
DS02:: Dosimetry System 2002
DS86:: Dosimetry System 1986
ERR:: Excess Relative Risk
ICRP:: International Commission on Radiological Protection
K-S:: Kolmogorov-Smirnov
LF:: Least Favourable
LRT:: Likelihood Ratio Test
LSS:: Life Span Study
MLE:: Maximum Likelihood Estimate
mSv:: milliSievert
p-y:: person-years
RERF:: Radiation Effects Research Foundation.

References

ICRP: 1990 Recommendations of the International Commission on Radiological Protection Publication 60, Annals of the ICRP 21 (1-3). 1991, Oxford
Google Scholar
Dropkin G: Low dose radiation and cancer in A-bomb survivors. Environ Health. 2007, 6: 1-10.1186/1476-069X-6-1.
Article Google Scholar
Little Mark: Statistical problems with analysis of Dropkin. [http://www.ehjournal.net/content/6/1/1/comments#260548]
Pierce DA, Shimizu Y, Preston DL, Vaeth M, Mabuchi K: Studies of the Mortality of Atomic bomb survivors. Report 12, Part 1, Cancer: 1950-1990. Radiat Res. 1996, 146 (1): 1-27. 10.2307/3579391.
Article CAS Google Scholar
Dudley R: MIT OpenCourseWare | Mathematics | 18.466 Mathematical Statistics, Spring 2003 | Lecture Notes. [http://ocw.mit.edu/OcwWeb/Mathematics/18-466Mathematical-StatisticsSpring2003/LectureNotes/]
Efron B, Tibshirani RJ: An Introduction to the Bootstrap. 1993, New York: Chapman & Hall
Book Google Scholar
Davison AC, Hinkley DV: Bootstrap Methods and their Application. 1997, Cambridge: Cambridge University Press
Book Google Scholar
R Development Core Team: R: A language and environment for statistical computing. 2007, Vienna, Austria: R Foundation for Statistical Computing, [http://www.R-project.org]
Google Scholar
RERF (Radiation Effects Research Foundation). [http://www.rerf.or.jp/]
CEDR (Comprehensive Epidemiological Data Resource) Data File Set JALSSA03 File ID: R12CANC. [http://cedr.lbl.gov/cgi-bin/spiface/find/cedrfile/?FILE=1040], accessed 19 September 2002
R source code based on Nash JC: Compact Numerical Methods for Computers: Linear Algebra and Function Minimisation. 1990, CRC Press
Google Scholar
DiCiccio TJ, Efron B: Bootstrap Confidence Intervals. Stat Sci. 1996, 11 (3): 189-228. 10.1214/ss/1032280214. [http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.ss/1032280214]
Article Google Scholar
Pinheiro JC, Bates DM: Mixed-effects models in S and S-PLUS. 2000, New York: Springer
Book Google Scholar
Venables WN, Ripley BD: Modern Applied Statistics with S. 2002, New York: Springer, 4
Book Google Scholar
Geyer C: Kolmogorov-Smirnov and Lilliefors Tests. lecture notes Statistics. 5601, [http://www.stat.umn.edu/geyer/5601/examp/kolmogorov.html#one-test] , Univ of Minnesota
Google Scholar
Brenner DJ, Little JB, Sachs RK: The bystander effect in radiation oncogenesis: II. A quantitiative model. Radiat Res. 2001, 155 (3): 402-408. 10.1667/0033-7587(2001)155[0402:TBEIRO]2.0.CO;2.
Article CAS Google Scholar
Geras'kin SA, Oudalova AA, Kim JK, Dikarev VG, Dikareva NS: Cytogenetic effect of low dose γ-radiation in Hordeum vulgare seedlings: non-linear dose-effect relationship. Radiat Environ Biophys. 2007, 46: 31-41. 10.1007/s00411-006-0082-z.
Article Google Scholar
Davies RB: Hypothesis Testing When a Nuisance Parameter is Present Only Under the Alternative. Biometrika. 1977, 64 (2): 247-254. 10.2307/2335690.
Article Google Scholar
Davies RB: Hypothesis Testing When a Nuisance Parameter is Present Only Under the Alternatives. Biometrika. 1987, 74 (1): 33-43. 10.1093/biomet/74.1.33.
Google Scholar
Cheng RCH, Traylor R: Non-Regular Maximum Likelihood Problems. J R Stat Soc Series B Stat Methodol. 1995, 57 (1): 3-44. [http://www.jstor.org/pss/2346086]
Google Scholar
Andrews DWK, Ploberger W: Admissibility of the Likelihood Ratio Test When a Nuisance Parameter is Present Only Under the Alternative. Ann Stat. 1995, 23 (5): 1609-1629. 10.1214/aos/1176324316.
Article Google Scholar
Self GS, Liang KY: Asymptotic Properties of Maximum Likelihood Estimators and Likelihood Ratio Tests under Nonstandard Conditions. J Am Stat Assoc. 1987, 82 (398): 605-610. 10.2307/2289471.
Article Google Scholar

Download references

Acknowledgements

The very extensive simulations were shared out across a bank of PCs. Special thanks to Tom Bimpson, Phil Boyd, Jeremy Hawthorne, Peter Mayne, Paul Grunnill, Helen Marks, Paul Mayne, and Martin Grunnill for their help. Thanks to Chris Roberts for suggesting the use of R and the bootstrap, and to David Richardson for continuing correspondence and encouragement. Thanks to Duncan Thomas for pointing out the uncertainty in optimal latency, and for other criticisms during peer review. Thanks to Mark Little for criticisms of [2] which provoked this paper.

This work was not funded.

The data described in this report were obtained from the Radiation Effects Research Foundation (RERF) via the Comprehensive Epidemiologic Data Resource. RERF is a private foundation that is funded equally by the Japanese Ministry of Health and Welfare and the US Department of Energy through the US National Academy of Sciences. The conclusions reached in this report are mine and do not necessarily reflect the scientific judgement of RERF or its funding agencies.

Author information

Authors and Affiliations

Flat 5, 32 Sheil Road, Liverpool, L6 3AE, UK
Greg Dropkin

Authors

Greg Dropkin
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Greg Dropkin.

Additional information

Competing interests

The author declares that they have no competing interests.

Authors' contributions

GD conceived of and designed the study and carried out the statistical analysis.

Electronic supplementary material

12940_2009_275_MOESM1_ESM.ZIP

Additional file 1:Contains index file for downloading 4 Tables, 3 Code Files, 2 Commentary Files, and 5 Data Files.(ZIP 175 KB)

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Authors’ original file for figure 4

Authors’ original file for figure 5

Authors’ original file for figure 6

Authors’ original file for figure 7

Authors’ original file for figure 8

Authors’ original file for figure 9

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Dropkin, G. Reanalysis of cancer mortality in Japanese A-bomb survivors exposed to low doses of radiation: bootstrap and simulation methods. Environ Health 8, 56 (2009). https://doi.org/10.1186/1476-069X-8-56

Download citation

Received: 03 February 2009
Accepted: 09 December 2009
Published: 09 December 2009
DOI: https://doi.org/10.1186/1476-069X-8-56

Reanalysis of cancer mortality in Japanese A-bomb survivors exposed to low doses of radiation: bootstrap and simulation methods

Abstract

Background

Methods

Results

Conclusion

Background

Methods

Bootstrap

Simulation of LRT

Latency

City, dose range, and gender

Results

Discussion

Conclusion

Abbreviations

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Competing interests

Authors' contributions

Electronic supplementary material

Authors’ original submitted files for images

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Environmental Health

Contact us