France : COVID19 – Cases in Hospitals 29/03/2020

https://wp.me/p7ciWq-kj

Both population and population density are significant factors in explaining the COVID-19 outbreak in France. With an adjusted R-squared of 74.4%, the regression continues to gain in significance (see also the F and T statistics).

The slopes of the least-squares fitted lines continue to rise, especially for population density. In other words, the concentration of population is the main influence behind the expected increases. A decline in these slopes will signal that the peak of the pandemic has passed.

This analysis is based on public data and is subject to revisions and to errors, including processing errors.

Data sources: Géodes – données en Santé Publique, INSEE.

Analysis

Multiple Regression – COVID19 in hospitals 200329
Dependent variable: COVID19 in hospitals 200329
Independent variables:
Inhabitants per km²
Population

Parameter              Estimate       Standard Error    T Statistic    P-Value
CONSTANT               -26.9713       26.9506           -1.00077       0.3194
Inhabitants per km²    0.0768737      0.00749688        10.2541        0.0000
Population             0.000266677    0.000035429       7.52708        0.0000
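As a quick check on the table above, each T statistic should equal the estimate divided by its standard error. The short sketch below recomputes them from the published figures; the variable names are illustrative only and are not part of the original output.

```python
# Estimates and standard errors copied from the coefficient table (29/03/2020).
coefficients = {
    "CONSTANT": (-26.9713, 26.9506),
    "Inhabitants per km²": (0.0768737, 0.00749688),
    "Population": (0.000266677, 0.000035429),
}

# T statistic = estimate / standard error
t_statistics = {name: est / se for name, (est, se) in coefficients.items()}

for name, t in t_statistics.items():
    print(f"{name}: t = {t:.4f}")
```

The recomputed values should agree with the T Statistic column up to rounding of the published figures.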

Analysis of Variance
Source           Sum of Squares    Df     Mean Square    F-Ratio    P-Value
Model            7.43789E6         2      3.71894E6      146.61     0.0000
Residual         2.48582E6         98     25365.5
Total (Corr.)    9.92371E6         100

R-squared = 74.9507 percent
R-squared (adjusted for d.f.) = 74.4395 percent
Standard Error of Est. = 159.266
Mean absolute error = 97.8685
Durbin-Watson statistic = 1.31442 (P=0.0002)
Lag 1 residual autocorrelation = 0.33913
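The R-squared, adjusted R-squared, F-ratio and standard error of the estimate can all be recovered from the ANOVA sums of squares and degrees of freedom. The sketch below does this with the published figures, using the standard textbook formulas; the variable names are illustrative.

```python
import math

# Values copied from the ANOVA table (29/03/2020).
ss_model = 7.43789e6
ss_residual = 2.48582e6
df_model = 2
df_residual = 98

ss_total = ss_model + ss_residual
df_total = df_model + df_residual

# Share of total variability explained by the model.
r_squared = ss_model / ss_total
# Adjustment penalises the number of independent variables.
adj_r_squared = 1 - (1 - r_squared) * df_total / df_residual

ms_model = ss_model / df_model
ms_residual = ss_residual / df_residual
f_ratio = ms_model / ms_residual
std_error_est = math.sqrt(ms_residual)

print(f"R-squared          = {100 * r_squared:.4f} percent")
print(f"Adjusted R-squared = {100 * adj_r_squared:.4f} percent")
print(f"F-ratio            = {f_ratio:.2f}")
print(f"Std. error of est. = {std_error_est:.3f}")
```

The results should match the reported statistics up to rounding of the inputs.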

The StatAdvisor
The output shows the results of fitting a multiple linear regression model to describe the relationship between COVID19 in hospitals 200329 and 2 independent variables. The equation of the fitted model is

COVID19 in hospitals 200329 = -26.9713 + 0.0768737*Inhabitants per km² + 0.000266677*Population
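To illustrate how the fitted equation is used, here is a minimal sketch that evaluates it for one département. The function name and the input values (100 inhabitants per km², 500,000 inhabitants) are hypothetical and chosen only for illustration; they do not describe any real département.

```python
def predict_hospital_cases(density_per_km2, population):
    """Evaluate the fitted 29/03/2020 equation for one département."""
    return -26.9713 + 0.0768737 * density_per_km2 + 0.000266677 * population

# Hypothetical département: 100 inhabitants per km², 500,000 inhabitants.
example = predict_hospital_cases(100, 500_000)
print(f"Predicted COVID19 cases in hospitals: {example:.1f}")
```

With a standard error of the estimate of about 159, a single prediction of this kind carries a wide margin of uncertainty.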

Since the P-value in the ANOVA table is less than 0.05, there is a statistically significant relationship between the variables at the 95.0% confidence level.

The R-Squared statistic indicates that the model as fitted explains 74.9507% of the variability in COVID19 in hospitals 200329. The adjusted R-squared statistic, which is more suitable for comparing models with different numbers of independent variables, is 74.4395%. The standard error of the estimate shows the standard deviation of the residuals to be 159.266. This value can be used to construct prediction limits for new observations by selecting the Reports option from the text menu. The mean absolute error (MAE) of 97.8685 is the average absolute value of the residuals. The Durbin-Watson (DW) statistic tests the residuals to determine if there is any significant correlation based on the order in which they occur in your data file. Since the P-value is less than 0.05, there is an indication of possible serial correlation at the 95.0% confidence level. Plot the residuals versus row order to see if there is any pattern that can be seen.
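For readers unfamiliar with the Durbin-Watson statistic, the sketch below shows the usual definition (sum of squared successive differences of the residuals divided by their sum of squares) and the common approximation DW ≈ 2(1 − r1), where r1 is the lag 1 residual autocorrelation. The residual lists are hypothetical and serve only to show the behaviour of the statistic; they are not the residuals of this model.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals, for illustration only.
alternating = [1.0, -1.0, 1.0, -1.0]   # sign flips every row: DW well above 2
same_sign = [1.0, 1.0, 1.0, 1.0]       # no change from row to row: DW equals 0

print(durbin_watson(alternating))
print(durbin_watson(same_sign))

# Approximation from the reported lag 1 residual autocorrelation (29/03/2020).
lag1_autocorrelation = 0.33913
dw_approx = 2 * (1 - lag1_autocorrelation)
print(f"Approximate DW = {dw_approx:.4f} (reported: 1.31442)")
```

A value near 2 suggests no serial correlation; the reported value, well below 2, is consistent with the positive lag 1 autocorrelation. Since the rows are départements rather than time periods, the result depends on the row order of the data file.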

In determining whether the model can be simplified, notice that the highest P-value on the independent variables is 0.0000, belonging to Population. Since the P-value is less than 0.05, that term is statistically significant at the 95.0% confidence level. Consequently, you probably don’t want to remove any variables from the model.

Charts

Dep20329predicted

Dep20329density

Dep20329population

Dep20329residuals

France : COVID19, Cases in Hospitals 27/03/2020

https://wp.me/p7ciWq-k6

Please note that here density is expressed in thousands of inhabitants per km².

Both population and population density are significant factors in explaining the COVID-19 outbreak in France. The regression gains in significance (see the F and T statistics).

Notice that, as anticipated, the slopes of the least-squares fitted lines are rising. A decline in these slopes will signal that the peak of the pandemic has passed.

This analysis is based on public data and is subject to revisions and to errors, including processing errors.

Data sources: Géodes – données en Santé Publique, INSEE.

Analysis

Multiple Regression – COVID19 in hospitals 200327
Dependent variable: COVID19 in hospitals 200327
Independent variables:
Population
Th. Inhabitants per km²

Parameter                  Estimate       Standard Error    T Statistic    P-Value
CONSTANT                   -19.387        24.1899           -0.801452      0.4248
Population                 0.000213978    0.0000317997      6.72892        0.0000
Th. Inhabitants per km²    61.2939        6.72891           9.10904        0.0000

Analysis of Variance
Source           Sum of Squares    Df     Mean Square    F-Ratio    P-Value
Model            4.75232E6         2      2.37616E6      116.28     0.0000
Residual         2.00262E6         98     20434.9
Total (Corr.)    6.75494E6         100

R-squared = 70.3532 percent
R-squared (adjusted for d.f.) = 69.7482 percent
Standard Error of Est. = 142.951
Mean absolute error = 84.8147
Durbin-Watson statistic = 1.18944 (P=0.0000)
Lag 1 residual autocorrelation = 0.402385

The StatAdvisor
The output shows the results of fitting a multiple linear regression model to describe the relationship between COVID19 in hospitals 200327 and 2 independent variables. The equation of the fitted model is

COVID19 in hospitals 200327 = -19.387 + 0.000213978*Population + 61.2939*Th. Inhabitants per km²
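Because density is expressed in thousands of inhabitants per km² in this post, the density coefficient has to be divided by 1000 before it can be compared with the regressions of 25/03 and 29/03, which use inhabitants per km². A minimal sketch of that conversion, with illustrative variable names:

```python
# Density coefficient of the 27/03/2020 model, per thousand inhabitants per km².
slope_27_per_thousand = 61.2939
# Same coefficient expressed per inhabitant per km².
slope_27 = slope_27_per_thousand / 1000

# Density coefficients of the three département-level regressions,
# all expressed per inhabitant per km².
density_slopes = {
    "25/03/2020": 0.0469516,
    "27/03/2020": slope_27,
    "29/03/2020": 0.0768737,
}

for date, slope in density_slopes.items():
    print(f"{date}: {slope:.7f}")
```

On a common scale, the density slope rises from about 0.047 to 0.061 to 0.077 over the three dates.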

Since the P-value in the ANOVA table is less than 0.05, there is a statistically significant relationship between the variables at the 95.0% confidence level.

The R-Squared statistic indicates that the model as fitted explains 70.3532% of the variability in COVID19 in hospitals 200327. The adjusted R-squared statistic, which is more suitable for comparing models with different numbers of independent variables, is 69.7482%. The standard error of the estimate shows the standard deviation of the residuals to be 142.951. This value can be used to construct prediction limits for new observations by selecting the Reports option from the text menu. The mean absolute error (MAE) of 84.8147 is the average absolute value of the residuals. The Durbin-Watson (DW) statistic tests the residuals to determine if there is any significant correlation based on the order in which they occur in your data file. Since the P-value is less than 0.05, there is an indication of possible serial correlation at the 95.0% confidence level. Plot the residuals versus row order to see if there is any pattern that can be seen.

In determining whether the model can be simplified, notice that the highest P-value on the independent variables is 0.0000, belonging to Population. Since the P-value is less than 0.05, that term is statistically significant at the 95.0% confidence level. Consequently, you probably don’t want to remove any variables from the model.

Charts

Dep200327Predicted

Dep200327Pop

Dep200327Den

Dep200327Res

 

France : COVID19 – Cases in Hospitals 25/03/2020

https://wp.me/p7ciWq-jN

This new regression, on data for the French départements, confirms the intuition: both population and population density are significant factors in explaining the COVID-19 outbreak in France.

NB: With regional data, only density was retained as a significant independent variable, which mostly reflects the high degree of centralisation in France.

Notice that, for population, the slope of the least squares adjusted line is 0.00016163. It is expected to rise in the coming weeks.

This analysis is based on public data and is subject to revisions and to errors, including processing errors.

Data sources: Géodes, données en Santé Publique, INSEE.

Here, the analysis

Multiple Regression – COVID19 Cases in hospitals
Dependent variable: COVID19 Cases in hospitals
Independent variables:
Population
Density

Parameter     Estimate      Standard Error    T Statistic    P-Value
CONSTANT      -12.9895      20.4993           -0.633655      0.5278
Population    0.00016163    0.0000269478      5.99787        0.0000
Density       0.0469516     0.00570321        8.23248        0.0000

Analysis of Variance
Source           Sum of Squares    Df     Mean Square    F-Ratio    P-Value
Model            2.75644E6         2      1.37822E6      93.87      0.0000
Residual         1.43885E6         98     14682.1
Total (Corr.)    4.19529E6         100

R-squared = 65.7032 percent
R-squared (adjusted for d.f.) = 65.0033 percent
Standard Error of Est. = 121.17
Mean absolute error = 71.1496
Durbin-Watson statistic = 1.20888 (P=0.0000)
Lag 1 residual autocorrelation = 0.392756

COVID19 Cases in hospitals = -12.9895 + 0.00016163*Population + 0.0469516*Density

The prediction plot

Dep200325d

The components charts

Dep200325

Dep200325b

The residual plot

Dep200325c

The residual plot mainly reflects the incidence in the départements of the “Grand Est” région (and, to some extent, the Rhône). In these areas the outbreak started earlier and was linked to a specific origin.

Italy : COVID-19 16/03/2020

https://wp.me/p7ciWq-jk

This is a multilinear regression based on the data published by the Istituto Superiore di Sanità, “Epidemia COVID-19 Aggiornamento nazionale 16 marzo 2020 - ore 16:00”.

The data pattern and the analysis turn out to be quite different from those for France analysed here.

Lombardia stands apart from the other regions.

Emilia-Romagna appears to fall within the statistical range of the other regions.

Population plays the main role, in contrast to France, where population density dominates. I’m inclined to believe that after a delay (the next 2 to 3 weeks?), population itself will become the dominant factor in France as well. Still, the demographic structure of France, with a strong imbalance resulting from political centralization, has a specific effect.

I’m puzzled by the statistics for Italy. The correlation coefficient is weaker than in France, despite more degrees of freedom and more cases.

Hopefully some of my Italian friends will have an explanation or provide guidelines for a further investigation.

Italy 1603b

covid190324d_analysis4_graph1.png

France : COVID-19 24/03/2020

https://wp.me/p7ciWq-iT

This is a multilinear regression on the daily statistics of COVID-19, by region as published by the French public health agency “Santé Publique France”.

Density is the number of inhabitants per km². It is the dominant independent variable, possibly because we are still in the early stage of the epidemic. It shows the relevance of the urban incidence (compare with the Wuhan and New York data, for example).

The model is fitted without any transformation (log and Box-Cox transformations were tried and gave a lower R²).

The slope of the fitted line keeps rising: 0.0056 on March 20, 0.0077 on March 21, 0.0086 on March 22, 0.0097 on March 23 and 0.010 on March 24, indicating that the disease is expanding.
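The day-to-day change in the slope can be tabulated directly from the values quoted above. The sketch below is only an arithmetic illustration of that series; the variable names are illustrative.

```python
# Slope of the fitted line (regional data), as quoted in the text.
slopes = [
    ("March 20", 0.0056),
    ("March 21", 0.0077),
    ("March 22", 0.0086),
    ("March 23", 0.0097),
    ("March 24", 0.010),
]

# Change in slope from each day to the next.
daily_changes = [
    (later_day, later - earlier)
    for (_, earlier), (later_day, later) in zip(slopes, slopes[1:])
]

for day, change in daily_changes:
    print(f"{day}: {change:+.4f}")
```

Every change is positive, which is what "keeps rising" means here; the increments themselves vary from day to day.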

Notice that the fit is deteriorating slightly, which signals a change in data collection, a growing influence of outliers, or both.

covid190324c_analysis3_table1.png

COVID200324.gif

 

 

France : COVID-19 23/03/2020

https://wp.me/p7ciWq-iG

This is a multilinear regression on the daily statistics of COVID-19, by region as published by the French public health agency “Santé Publique France”.

Density is the number of inhabitants per km². It is the dominant explanatory factor, possibly because we are in the early stage of the epidemic. In any case, it shows the relevance of the urban incidence (compare with the Wuhan and New York data, for example).

The model is fitted without any transformation (log and Box-Cox transformations were tried and gave a lower R²).

The slope of the fitted line continues to rise, to almost 0.01, from 0.0073 the previous day. This indicates that the disease continues to expand.

covid190323b_analysis6_table1.png

covid190323b_analysis6_graph1.png