COVID-19: Europe

Analysing the effect of population density on COVID-19 infection in Europe

Multiple Regression – COVID19 Confir./1000 inh. 04/24
Dependent variable: COVID19 Confir./1000 inh. 04/24
Independent variables:
density (per km2)
Area (km2)

Parameter            Estimate     Standard Error   T Statistic   P-Value
CONSTANT             0.790899     0.39803          1.98703       0.0580
density (per km2)    0.0057684    0.00246611       2.33907       0.0276

Analysis of Variance
Source           Sum of Squares   Df   Mean Square   F-Ratio   P-Value
Model            8.10235          1    8.10235       5.47      0.0276
Residual         37.0223          25   1.48089
Total (Corr.)    45.1246          26

R-squared = 17.9555 percent
R-squared (adjusted for d.f.) = 14.6737 percent
Standard Error of Est. = 1.21692
Mean absolute error = 0.928434
Durbin-Watson statistic = 1.77972 (P=0.2894)
Lag 1 residual autocorrelation = 0.108316

Stepwise regression
Method: forward selection
P-to-enter: 0.05
P-to-remove: 0.05
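The summary statistics above can all be derived from the ANOVA table. The short Python sketch below recomputes them from the reported sums of squares and degrees of freedom. It is an arithmetic check only, not a refit, since the underlying country data are not reproduced here. Because only density was retained by the forward selection, the F-ratio should also equal the square of the density slope's t statistic.

```python
import math


def anova_summary(ss_model, ss_resid, df_model, df_resid):
    """Recompute the summary statistics of a least-squares fit from its
    ANOVA table (sums of squares and degrees of freedom)."""
    ss_total = ss_model + ss_resid
    df_total = df_model + df_resid      # number of observations minus 1
    ms_model = ss_model / df_model
    ms_resid = ss_resid / df_resid
    r2 = ss_model / ss_total
    adj_r2 = 1 - (1 - r2) * df_total / df_resid
    return {
        "r2_pct": 100 * r2,
        "adj_r2_pct": 100 * adj_r2,
        "f_ratio": ms_model / ms_resid,
        "se_est": math.sqrt(ms_resid),  # standard deviation of the residuals
    }


# Figures from the ANOVA table for COVID19 Confir./1000 inh. 04/24
summary = anova_summary(8.10235, 37.0223, 1, 25)
# t statistic of the density slope: estimate / standard error
t_density = 0.0057684 / 0.00246611

for name, value in summary.items():
    print(f"{name}: {value:.4f}")
# With a single predictor, the F-ratio equals the squared t statistic
print(f"t = {t_density:.4f}, t^2 = {t_density ** 2:.4f}")
```

The recomputed R-squared, adjusted R-squared, F-ratio and standard error of the estimate match the printed output to the precision shown.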



COVID-19: Hospital Bed Requirements and Testing Strategy


This analysis is based on French data: COVID-19 cases in hospital on 14/04/2020, by “département”.

This analysis is based on public data and is subject to revision and to errors, including errors in processing.

Data sources: Géodes – données en Santé Publique, INSEE.


Because patients were transferred to hospitals in départements other than their département of origin, the number of occupied beds decreased artificially in some places (mainly in the east of France) and increased artificially in others (e.g. Loire?).

What is the point of publishing statistics by area (“départements”) if they do not reflect the cases originating in that area?

This mainly affects the relationship between population density and hospitalisation requirements. However, the effect seems negligible in the present analysis.

1. Hospital bed requirements

This analysis stresses the major role played by population density as a factor in COVID-19 transmission. It leads to a distinction between four areas, according to population density, with practical consequences for the demand for equipped beds. See Chart 1.

Numbers of beds are rough estimates based on the slope of the regression line.

1 – Areas with a population density below 1000 inhabitants per km² (1 km² ≈ 0.386 square mile): COVID-19 bed requirements: none.

2 – Areas with a population density between 1000 and 4000 inhabitants per km²: COVID-19 bed requirements: 0.5 per 1000 inhabitants.

3 – Areas with a population density between 4000 and 12000 inhabitants per km²: COVID-19 bed requirements: 1.5 per 1000 inhabitants.

4 – Areas with a population density above 12000 inhabitants per km²: COVID-19 bed requirements: 2.5 per 1000 inhabitants. This category includes cities such as Paris, New York and Wuhan.
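The four categories amount to a simple lookup from population density to beds per 1000 inhabitants. The sketch below encodes the thresholds and rates listed above. How a density falling exactly on a boundary (1000, 4000 or 12000 inhabitants per km²) is classified is an assumption, since the text does not specify it, and the population in the example is a hypothetical round figure.

```python
# Density thresholds (inhabitants per km²) and COVID-19 bed requirements
# (beds per 1000 inhabitants) for the four categories listed above.
# Assumption: each lower bound is inclusive, so exactly 1000/km² is category 2.
CATEGORIES = [
    (1, 0, 1000, 0.0),
    (2, 1000, 4000, 0.5),
    (3, 4000, 12000, 1.5),
    (4, 12000, float("inf"), 2.5),
]


def density_category(density_per_km2):
    """Return (category number, beds per 1000 inhabitants) for a density."""
    for number, low, high, beds_per_1000 in CATEGORIES:
        if low <= density_per_km2 < high:
            return number, beds_per_1000
    raise ValueError("density must be a non-negative number")


def bed_requirement(density_per_km2, population):
    """Rough number of COVID-19 beds needed for an area."""
    _, beds_per_1000 = density_category(density_per_km2)
    return beds_per_1000 * population / 1000


# Hypothetical area of 2,000,000 inhabitants at Paris-like density (20860/km²)
print(density_category(20860))            # (4, 2.5)
print(bed_requirement(20860, 2_000_000))  # 5000.0
```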

Of course, because the outbreak was sudden, or at least felt sudden, it overwhelmed the health systems of the largest cities in the world. But measures designed to slow the movement of people within these areas, and into and out of them, including total lockdowns, have been effective and have dramatically reduced the transmission rate of the virus, and hence hospitalisation requirements. However, because the average length of hospitalisation is above 10 days, this does not show immediately in either hospitalisation or death statistics.
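The delay between falling transmission and falling hospital statistics can be illustrated with a toy model in which every patient stays a fixed number of days, so that the beds occupied on a given day equal the admissions of the preceding stay-length days. Both the admission series and the 11-day stay below are hypothetical, chosen only to reflect the "above 10 days" average mentioned in the text; real stays vary from patient to patient.

```python
LENGTH_OF_STAY = 11  # assumed fixed stay in days ("above 10 days")

# Hypothetical daily admissions: rising, peaking, then falling after lockdown
admissions = [10, 20, 40, 80, 100, 80, 60, 40, 20, 10, 5, 5, 5, 5, 5, 5]


def occupancy(daily_admissions, stay):
    """Beds occupied each day if every patient stays exactly `stay` days."""
    occupied = []
    for day in range(len(daily_admissions)):
        start = max(0, day - stay + 1)  # oldest admissions still in hospital
        occupied.append(sum(daily_admissions[start:day + 1]))
    return occupied


beds = occupancy(admissions, LENGTH_OF_STAY)
peak_admissions_day = admissions.index(max(admissions))
peak_beds_day = beds.index(max(beds))
print(f"admissions peak on day {peak_admissions_day}, "
      f"bed occupancy peaks on day {peak_beds_day}")
```

With these inputs, admissions peak on day 4 but bed occupancy keeps rising until day 10, even though admissions have been falling for six days.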

2. Testing Strategy

In most countries, it seems impossible to test a large number of people over a short period of time, although doing so would give an accurate picture of the degree of infection across the entire population.

Alternatively, sampling techniques, such as those used in political polls, can be applied. The main difficulty with such a methodology is that valid statistical inference requires the sample to be drawn at random from the entire population, which is not cost-efficient. Therefore, most such surveys are in practice based on reasoned choices, i.e. quotas or ratios defining clusters drawn from the entire population which, in aggregate, are assumed to be valid representations of the entire population, just as, in politics, electoral constituencies are taken to represent the national choice.

For COVID-19 testing, I propose choosing sampling clusters according to the population density criterion, distinguishing the four density categories mentioned above, with 5000 tests chosen at random within each category. Of course, the estimate would become more accurate if the number of tests in each cluster were increased, and this would be easier in areas where the population is concentrated. Hence, a variation of the methodology could be:

10000 tests randomly chosen in areas of category 4

3000 tests randomly chosen in areas of category 3

2000 tests randomly chosen in areas of category 2

1000 tests randomly chosen in areas of category 1
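Once each density category has been tested, the four cluster results have to be recombined, weighting each by its share of the total population, to estimate national prevalence. The sketch below applies the standard stratified-sampling estimator to the second allocation (10000/3000/2000/1000 tests). It assumes simple random sampling within each category, and the population shares and positive counts are invented placeholders, not data.

```python
import math

# Hypothetical placeholder inputs, one row per density category:
# (category, share of national population, number of tests, positive tests)
STRATA = [
    (1, 0.60, 1000, 10),
    (2, 0.20, 2000, 40),
    (3, 0.12, 3000, 90),
    (4, 0.08, 10000, 400),
]


def stratified_prevalence(strata):
    """Population-weighted prevalence and its standard error, assuming a
    simple random sample within each category (no finite-population
    correction)."""
    estimate = 0.0
    variance = 0.0
    for _category, weight, n_tests, positives in strata:
        p = positives / n_tests
        estimate += weight * p
        # Each category adds weight^2 * p(1 - p) / n to the variance
        variance += weight ** 2 * p * (1 - p) / n_tests
    return estimate, math.sqrt(variance)


p_hat, se = stratified_prevalence(STRATA)
print(f"estimated prevalence: {100 * p_hat:.2f}% "
      f"(approx. 95% margin: ±{1.96 * 100 * se:.2f} points)")
```

Doubling the tests in every category, with the same positive rates, divides the standard error by √2, which is the accuracy gain from larger clusters described above.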

3. Analysis


Both population and population density are significant factors in explaining the COVID-19 outbreak in France. The adjusted R-squared for the current data (patients in hospital as of April 14th, 2020) is slightly above 80%, close to the 85% norm (see also the F and t statistics).

The slopes of the least-squares lines continue to rise, especially for population density. Population concentration is the main driver of these increases. The density coefficient has risen, so far, to 0.13. It should be stressed that the dependent variable represents accumulated hospital entries (net of outflows) between March 18 and April 14, which reflects the length of the average hospitalisation (more than 10 days). The coefficient represents the increase in hospital bed requirements when population density increases by one inhabitant per km². In other words, for every additional 10 inhabitants per km², there is a need for 1.3 additional hospital beds for COVID-19 patients in highly populated areas (above our previous estimate of 1). For example, with a density of 20860 inhabitants per km², Paris needs some 2712 additional beds as of April 14th.
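The Paris figure is the density coefficient multiplied by the density, since in the linear model the coefficient applies to density measured in inhabitants per km². The sketch below reproduces it. Note that 2712 comes from the coefficient rounded to 0.13; the unrounded estimate of 0.128287 from the regression output gives about 2676. In both cases this is only the density term of the fitted equation, without the constant or the population term.

```python
DENSITY_COEF_ROUNDED = 0.13      # coefficient as quoted in the text
DENSITY_COEF_FITTED = 0.128287   # estimate from the regression output
PARIS_DENSITY = 20860            # inhabitants per km², as quoted in the text


def density_term(coef, density):
    """Hospital beds attributed to population density alone."""
    return coef * density


print(round(density_term(DENSITY_COEF_ROUNDED, PARIS_DENSITY)))  # 2712
print(round(density_term(DENSITY_COEF_FITTED, PARIS_DENSITY)))   # 2676
# Marginal effect: 10 more inhabitants per km² -> about 1.3 more beds
print(density_term(DENSITY_COEF_ROUNDED, 10))
```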

Population remains the second factor; its coefficient has only slightly increased, from 0.3 per thousand in the previous estimate to 0.4.

3.2. Statistical modelling

Multiple Regression - COVID19 in hospitals 200414
Dependent variable: COVID19 in hospitals 200414
Independent variables:
Density
Population

Parameter    Estimate      Standard Error   T Statistic   P-Value
CONSTANT     -61.9653      38.9732          -1.58995      0.1151
Density      0.128287      0.0108429        11.8314       0.0000
Population   0.000468948   0.0000512331     9.15322       0.0000

Analysis of Variance
Source           Sum of Squares   Df   Mean Square   F-Ratio   P-Value
Model            2.15985E7        2    1.07993E7     203.49    0.0000
Residual         5.2008E6         98   53069.3
Total (Corr.)    2.67993E7        100

R-squared = 80.5936 percent
R-squared (adjusted for d.f.) = 80.1975 percent
Standard Error of Est. = 230.368
Mean absolute error = 154.588
Durbin-Watson statistic = 1.46077 (P=0.0031)
Lag 1 residual autocorrelation = 0.26664

The output shows the results of fitting a multiple linear regression model to describe the relationship between COVID19 in hospitals 200414 and 2 independent variables.

The equation of the fitted model is:

COVID19 in hospitals 200414 = -61.9653 + 0.128287*Density + 0.000468948*Population

Since the P-value in the ANOVA table is less than 0.05, there is a statistically significant relationship between the variables at the 95.0% confidence level. The R-squared statistic indicates that the model as fitted explains 80.5936% of the variability in COVID19 in hospitals 200414. The adjusted R-squared statistic, which is more suitable for comparing models with different numbers of independent variables, is 80.1975%. The standard error of the estimate shows the standard deviation of the residuals to be 230.368. The mean absolute error (MAE) of 154.588 is the average absolute value of the residuals. The Durbin-Watson (DW) statistic tests the residuals to determine whether there is any significant correlation based on the order in which they occur in the data file. Since its P-value (0.0031) is less than 0.05, there is an indication of possible serial correlation at the 95.0% confidence level.
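The fitted equation can be evaluated directly for any département. The sketch below does so and adds a rough band of two standard errors of the estimate (230.368) around the prediction. This band is only an approximation: it ignores the uncertainty in the coefficients, so it is not the same as proper prediction limits. The density and population in the example are hypothetical inputs, not an actual département.

```python
SE_EST = 230.368  # standard error of the estimate from the output


def predicted_hospital_cases(density, population):
    """COVID19 in hospitals 200414 as predicted by the fitted model."""
    return -61.9653 + 0.128287 * density + 0.000468948 * population


def rough_band(density, population, k=2.0):
    """Approximate interval: prediction +/- k residual standard deviations.
    Ignores coefficient uncertainty, so it only approximates prediction limits."""
    centre = predicted_hospital_cases(density, population)
    return centre - k * SE_EST, centre + k * SE_EST


# Hypothetical département: 500 inhabitants per km², 1,000,000 inhabitants
print(round(predicted_hospital_cases(500, 1_000_000), 1))
print(rough_band(500, 1_000_000))
```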

NB. The serial autocorrelation is mainly explained

In determining whether the model can be simplified, notice that the highest P-value on the independent variables is 0.0000, belonging to Population. Since this P-value is less than 0.05, that term is statistically significant at the 95.0% confidence level.
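The Durbin-Watson statistic compares successive residuals in the order in which the observations appear in the data file: DW = Σ(e_t − e_{t−1})² / Σ e_t². Values near 2 indicate no first-order autocorrelation, and values well below 2 indicate positive autocorrelation. It is approximately 2(1 − r₁), where r₁ is the lag 1 residual autocorrelation; with r₁ = 0.26664 this gives about 1.47, close to the reported 1.46077. The sketch below computes both quantities for a short residual series; the residuals are made up for illustration, since the département-level residuals are not reproduced here.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the residual
    sum of squares; residuals must be in data-file order."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def lag1_autocorrelation(residuals):
    """Lag 1 autocorrelation, assuming the residuals have mean zero
    (true for least squares with a constant term)."""
    num = sum(residuals[i] * residuals[i - 1]
              for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Made-up residuals (mean zero) with long runs of the same sign
residuals = [0.5, 2.0, 2.5, 1.0, -0.5, -2.0, -2.5, -1.0, 0.5, -0.5]
dw = durbin_watson(residuals)
r1 = lag1_autocorrelation(residuals)
print(f"DW = {dw:.3f}, r1 = {r1:.3f}, 2(1 - r1) = {2 * (1 - r1):.3f}")
```

Here DW is about 0.64, far below 2, because the residuals come in long same-sign runs. The small gap from 2(1 − r₁) ≈ 0.66 comes from the first and last residuals, whose influence shrinks as the number of observations grows.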

3.3. Charts

As the infection spreads, population density clearly appears to be the dominant factor, a lesson to be learned for future urban planning.

Chart 1: COVID-19 hospital cases by département on 14/04/2020 vs. population density (Dep200414density)

The above chart is the most interesting. It clearly shows the major effect of population density. After the exceptional cases in the départements of eastern France (which now play a diminishing role in the overall picture), the high density of Paris and the neighbouring départements (Hauts-de-Seine, Seine-Saint-Denis, Val-de-Marne…) is the driving force, followed by Bouches-du-Rhône (Marseille, the second most populous city in France) and Rhône (Lyon, the third most populous city).

Chart 2: COVID-19 hospital cases by département on 14/04/2020 vs. population (Dep200414population)

Chart 3: predicted values (Dep200414predicted)

Chart 4: residuals (Dep200414residuals)