Calculating return level of extreme precipitation and temperature using Extreme Events Analysis.
Alexandre Martinez, 2018
In this activity, we have monthly climate variable data (precipitation and temperature). We want to perform extreme events analysis and calculate the return levels associated with 2, 10, 25, 50, and 100 years events. We learn how to fit a GEV distribution for extreme events analysis and how to calculate the confidence intervals of the analysis. We learn how to be critical about the results calculated, how to ensure the quality of the calculations performed and to put the results calculated with model in perspective to the empirical results. We will also use full and half datasets to reveal the importance of long climate record.
Learning Objectives
- Fitting a distribution to data for Extreme Events Analysis.
- Calculating return levels.
- Calculting confidence intervals for each return periods.
Data import
clear; close all; clc;
% IMPORT AND PLOTTING THE DATA
data = load('TempPrecip.txt');
First let’s see how is the data format. It is usually documented on the data providers but it is always good to check again.
disp(data(1:5,:))
1.0e+05 *
1.9010 | -0.0000 | 0.0002 |
1.9010 | -0.0001 | 0.0004 |
1.9010 | 0.0000 | 0.0007 |
1.9010 | 0.0001 | 0.0007 |
1.9010 | 0.0002 | 0.0007 |
disp(data(1,1))
190101
We can see that the data format is YYYYMM TEMP PRCP, we now need to convert it into a format easy to use and we cna plot the data to visually try to detect any major errors/bug/missing data.
monthV = 1901+1/12:1/12:2010;
figure()
subplot(2,1,1)
plot(monthV,data(:,2),'r');
ylabel('Temperature [C]');
title('Monthly Temperature Maxima');
subplot(2,1,2)
plot(monthV,data(:,3),'b');
xlabel('Year');
ylabel('Precipitation [mm]');
title('Monthly Precipitation Maxima');
We have monthly data and we need annual maxima data for both variables.
yearV = 1901:1:2008;
for iYear = 1:length(yearV)
temp(iYear) = max(data(1+12*(iYear-1):1+12*(iYear-1)+12,2));
prcp(iYear) = max(data(1+12*(iYear-1):1+12*(iYear-1)+12,3));
end
We can plot annual maxima.
figure();
subplot(2,1,1);
plot(yearV,temp);
title('Annual Monthly Temperature Maxima');
xlabel('Year');
ylabel('Temperature [C]');
grid on
subplot(2,1,2);
plot(yearV,prcp);
title('Annual Monthly Precipitation Maxima');
xlabel('Year');
ylabel('Precipitation [mm]');
grid on
Return periods
We want to know the return levels of the following return periods: 2, 10, 25, 50, 100 using a GEV distribution. First, we need to calculate the distribution parameters.
Return periods
We enter the return periods (in years) and calculate the equivalent probabilities.
returnT = [2;10;25;50;100];
returnP = 1-1./returnT;
Distribution fitting
Now, we want to fit the GEV distribution for Extreme Events Analysis. There is a Matlab function to get the parameters of the distribution.
parmhatP = gevfit(prcp);
parmhatT = gevfit(temp);
Return levels
Now we can calculate the return levels associated with each return periods and for both variables.
v1 = gevinv(returnP,parmhatT(1),parmhatT(2),parmhatT(3));
w1 = gevinv(returnP,parmhatP(1),parmhatP(2),parmhatP(3));
returnpV = 1.1:100;
returnTp = 1-1./returnpV;
Let’s plot the results
figure()
subplot(2,1,1)
returnLevel_T = gevinv(returnTp,parmhatT(1),parmhatT(2),parmhatT(3));
plot((returnpV),returnLevel_T,'k',returnT,v1,'k*');hold on
ylim([23 31]); grid on
title('Temperature Maxima');
xlabel('Return period [years]');
ylabel('Return level [deg F]');
subplot(2,1,2)
returnLevel_P = gevinv(returnTp,parmhatP(1),parmhatP(2),parmhatP(3));
plot(returnpV,returnLevel_P,'k',returnT,w1,'k*');hold on
ylim([150 500]);
title('Precipitation');
xlabel('Return period [years]');
ylabel('Return level [mm/mo]');
grid on
Discussion
Are we happy with the resutls? What is the return levels of an 80 years event for temperature or precipitation? Can we say it is 28 degree and 300 mn?Before rushing to the results, we should check our assumptions: we fit a distribution to some data. Is the choice of distribution correct? Is it fitting our data?
figure();
subplot(1,2,1);
histogram(prcp,0:50:400,'FaceColor',[.8 .8 1]);
xgrid = linspace(0,400,1000);
line(xgrid,5008*gevpdf(xgrid,parmhatP(1),parmhatP(2),parmhatP(3)));
xlabel('');
ylabel('');
title('Precipitation');
subplot(1,2,2);
histogram(temp,18:1:30,'FaceColor',[.8 .8 1]);
xgrid = linspace(18,30,100);
line(xgrid,109*gevpdf(xgrid,parmhatT(1),parmhatT(2),parmhatT(3)));
xlabel('');
ylabel('');
title('Temperature');
It is hard to say from the PDF, what about the CDF? Let’s plot in red the empirical CDF and in blue the modeled one (GEV).
[FP,yiP] = ecdf(prcp); % Empirical CDF
[FT,yiT] = ecdf(temp); % Empirical CDF
figure()
subplot(2,1,1)
stairs(yiP,FP,'r-');
hold on
plot(yiP,gevcdf(yiP,parmhatP(1),parmhatP(2),parmhatP(3)),'b-');
title('GEV fit for precipitation');
subplot(2,1,2)
stairs(yiT,FT,'r-');
hold on
plot(yiT,gevcdf(yiT,parmhatT(1),parmhatT(2),parmhatT(3)),'b-');
title('GEV fit for temperature');
We can see that most of the lines are superposed to each other but not all of them. Is it ok?
K-S test for distribution fitting
Instead of visual fitting, we should make a test of the distribution fit.Let’s make an hypothesis H0 that the GEV we fitted and the empirical data are from the same distribution. Let’s now test this hypothesis using a KS-test.
hP = kstest2(FP,gevcdf(yiP,parmhatP(1),parmhatP(2),parmhatP(3)))
hP = logical 0
Both tests are returning 0, which means that we CANNOT reject the Null Hypothesis H0. We should so accept the Null Hypothesis. We can now be sure that the GEV distribution we used is significantly fitting our experimental data.
Confidence Interval
Now that we are sure that our GEV function is the good distribution to choose, let’s check the confidence intervals. Let’s first recalculate GEV fit with the 95% confidence interval
[parmhatP,parmciP] = gevfit(prcp);
[parmhatT,parmciT] = gevfit(temp);
And let’s see the results
returnT = [2;10;25;50;100];
returnP = 1-1./returnT;
v1 = gevinv(returnP,parmhatT(1),parmhatT(2),parmhatT(3));
v2 = gevinv(returnP,parmciT(1,1),parmciT(1,2),parmciT(1,3));
v3 = gevinv(returnP,parmciT(2,1),parmciT(2,2),parmciT(2,3));
w1 = gevinv(returnP,parmhatP(1),parmhatP(2),parmhatP(3));
w2 = gevinv(returnP,parmciP(1,1),parmciP(1,2),parmciP(1,3));
w3 = gevinv(returnP,parmciP(2,1),parmciP(2,2),parmciP(2,3));
returnpV = 1.1:100;
returnTp = 1-1./returnpV;
And plot them again.
figure()
subplot(2,1,1)
returnLevel_T = gevinv(returnTp,parmhatT(1),parmhatT(2),parmhatT(3));
returnLevel_Tb = gevinv(returnTp,parmciT(1,1),parmciT(1,2),parmciT(1,3));
returnLevel_Tu = gevinv(returnTp,parmciT(2,1),parmciT(2,2),parmciT(2,3));
plot((returnpV),returnLevel_T,'k',returnT,v1,'k*');hold on
plot(returnpV,returnLevel_Tb,'b',returnT,v2,'b*');
plot(returnpV,returnLevel_Tu,'r',returnT,v3,'r*');
ylim([23 31]); grid on
title('Temperature Maxima');
xlabel('Return period [years]');
ylabel('Return level [deg F]');
subplot(2,1,2)
returnLevel_P = gevinv(returnTp,parmhatP(1),parmhatP(2),parmhatP(3));
returnLevel_Pb = gevinv(returnTp,parmciP(1,1),parmciP(1,2),parmciP(1,3));
returnLevel_Pu = gevinv(returnTp,parmciP(2,1),parmciP(2,2),parmciP(2,3));
plot(returnpV,returnLevel_P,'k',returnT,w1,'k*');hold on
plot(returnpV,returnLevel_Pb,'r',returnT,w2,'r*');
plot(returnpV,returnLevel_Pu,'b',returnT,w3,'b*');
ylim([150 500]);
title('Precipitation');
xlabel('Return period [years]');
ylabel('Return level [mm/mo]');
grid on
Now we can see that a 80 years event is between 26 to 30 degree and between 250 to 450 mm precipitation.How is uncertainty changing with the return period?
figure() % plot uncertainty vs return period.
plot(returnpV,100*(returnLevel_Pu-returnLevel_Pb)./returnLevel_P);
hold on
plot(returnpV,100*(returnLevel_Tu-returnLevel_Tb)./returnLevel_T);
title('Uncertainty on return level');
xlabel('Return period [years]');
ylabel('Relative error [%]');
legend('Precipitation','Temperature');
We can see that the relative uncertainty is increasing with time. Does it make sense? Why is uncertainty higher for precipitation than for temperature?
Using Half dataset
What if our dataset doesn’t have such a long record? Let’s do the same analysis.Let’s cut the dataset in two.
prcp = prcp(1:51);
temp = temp(1:51);
Let’s fit the GEV.
[parmhatP,parmciP] = gevfit(prcp);
[parmhatT,parmciT] = gevfit(temp);
And calculate the return levels.
returnT = [2;10;25;50;100];
returnP = 1-1./returnT;
v1 = gevinv(returnP,parmhatT(1),parmhatT(2),parmhatT(3));
v2 = gevinv(returnP,parmciT(1,1),parmciT(1,2),parmciT(1,3));
v3 = gevinv(returnP,parmciT(2,1),parmciT(2,2),parmciT(2,3));
w1 = gevinv(returnP,parmhatP(1),parmhatP(2),parmhatP(3));
w2 = gevinv(returnP,parmciP(1,1),parmciP(1,2),parmciP(1,3));
w3 = gevinv(returnP,parmciP(2,1),parmciP(2,2),parmciP(2,3));
returnpV = 1.1:100;
returnTp = 1-1./returnpV;
And plot them again.
figure()
subplot(2,1,1)
returnLevel_T = gevinv(returnTp,parmhatT(1),parmhatT(2),parmhatT(3));
returnLevel_Tb = gevinv(returnTp,parmciT(1,1),parmciT(1,2),parmciT(1,3));
returnLevel_Tu = gevinv(returnTp,parmciT(2,1),parmciT(2,2),parmciT(2,3));
plot((returnpV),returnLevel_T,'k',returnT,v1,'k*');hold on
plot(returnpV,returnLevel_Tb,'b',returnT,v2,'b*');
plot(returnpV,returnLevel_Tu,'r',returnT,v3,'r*');
ylim([23 35]); grid on
title('Temperature Maxima');
xlabel('Return period [years]');
ylabel('Return level [deg F]');
subplot(2,1,2)
returnLevel_P = gevinv(returnTp,parmhatP(1),parmhatP(2),parmhatP(3));
returnLevel_Pb = gevinv(returnTp,parmciP(1,1),parmciP(1,2),parmciP(1,3));
returnLevel_Pu = gevinv(returnTp,parmciP(2,1),parmciP(2,2),parmciP(2,3));
plot(returnpV,returnLevel_P,'k',returnT,w1,'k*');hold on
plot(returnpV,returnLevel_Pb,'r',returnT,w2,'r*');
plot(returnpV,returnLevel_Pu,'b',returnT,w3,'b*');
ylim([150 600]);
title('Precipitation');
xlabel('Return period [years]');
ylabel('Return level [mm/mo]');
grid on
Consequences
Now we see that the return levels for a 80 years event is between 26 to 33 degree and 200 to 600 mm precipitation. Does it make any sense to calculate a 80 years event with only 50 years of data?