This notebook deals with exploratory data analysis (EDA) of various meteorological parameters viz temperature, relative humidity, wind speed, sunshine hours, and solar radiation along with estimated reference evapotranspiration.
import numpy as np # To create and deal with arrays
import pandas as pd # To manage and create various datasets
import matplotlib.pyplot as plt # To plot graphs
import seaborn as sns # To plot scientific graphs
import pymannkendall as mk # To find out trend in time series parameters viz temperature, evapotranspiration
df = pd.read_excel("climate.xlsx")
df
J | Tmax | Tmin | RHmax | RHmin | Wind_speed | n | n/N | Rns | VPD | ETo | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 15.8 | 1.0 | 96 | 57.0 | 0.256 | 6.5 | 0.646 | 8.647 | 0.399 | 1.026 |
1 | 2 | 20.4 | 0.3 | 91 | 31.0 | 0.256 | 8.0 | 0.795 | 9.794 | 0.855 | 1.166 |
2 | 3 | 15.0 | 5.2 | 91 | 64.0 | 0.614 | 0.5 | 0.050 | 4.169 | 0.347 | 0.953 |
3 | 4 | 20.2 | 5.4 | 97 | 46.0 | 0.460 | 8.8 | 0.873 | 10.445 | 0.653 | 1.419 |
4 | 5 | 21.4 | 4.8 | 94 | 48.0 | 0.486 | 8.5 | 0.842 | 10.246 | 0.688 | 1.464 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
9850 | 361 | 19.6 | 4.8 | 97 | 50.0 | 0.384 | 7.3 | 0.728 | 9.171 | 0.583 | 1.248 |
9851 | 362 | 20.2 | 5.0 | 97 | 51.0 | 0.614 | 8.3 | 0.827 | 9.929 | 0.593 | 1.432 |
9852 | 363 | 18.8 | 4.6 | 97 | 45.0 | 0.563 | 8.6 | 0.857 | 10.168 | 0.610 | 1.381 |
9853 | 364 | 19.2 | 4.0 | 97 | 45.0 | 0.230 | 5.0 | 0.498 | 7.494 | 0.624 | 1.067 |
9854 | 365 | 19.2 | 4.8 | 97 | 44.0 | 0.563 | 7.4 | 0.736 | 9.306 | 0.636 | 1.369 |
9855 rows × 11 columns
The dataset contains total 11 columns which represents 10 features and 1 label (ETo). The indices corresponding to different features are represents in df.info() which are required further in this notebook. There are total 9855 rows which reprensts 27 years data from 1995-2021.
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 9855 entries, 0 to 9854 Data columns (total 11 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 J 9855 non-null int64 1 Tmax 9855 non-null float64 2 Tmin 9855 non-null float64 3 RHmax 9855 non-null int64 4 RHmin 9855 non-null float64 5 Wind_speed 9855 non-null float64 6 n 9855 non-null float64 7 n/N 9855 non-null float64 8 Rns 9855 non-null float64 9 VPD 9855 non-null float64 10 ETo 9855 non-null float64 dtypes: float64(9), int64(2) memory usage: 847.0 KB
J = Julian days
Tmax = Maximum Temperature in celcius
Tmin = Minimum Temperature in celcius
RHmax = Maximum Relative Humidity in percentage
RHmin = Minimum Relative Humidity in percentage
Wind_speed = Wind Speed in meter per seconds
n = Actual sunshine hours
n/N = Ratio of Actual and Maximum possible sunshine hours
Rns = Net Solar Radiation Mj/$m^2$/day
VPD = Vapour Pressure Deficit in kPa
ETo = Reference Evapotranspiration in mm
In this section, initial data analysis is carried out to analyse how varies features and label varies with time and their minimum, maximum and mean values over the time.
df.describe()
J | Tmax | Tmin | RHmax | RHmin | Wind_speed | n | n/N | Rns | VPD | ETo | |
---|---|---|---|---|---|---|---|---|---|---|---|
count | 9855.000000 | 9855.000000 | 9855.000000 | 9855.000000 | 9855.000000 | 9855.000000 | 9855.000000 | 9855.000000 | 9855.000000 | 9855.000000 | 9855.000000 |
mean | 183.000000 | 29.813807 | 17.490939 | 83.573617 | 48.862253 | 1.059630 | 7.438709 | 0.619921 | 13.509078 | 1.449226 | 3.767591 |
std | 105.371375 | 7.419522 | 8.128256 | 15.713658 | 19.592271 | 0.659309 | 3.658271 | 0.298696 | 4.931936 | 1.022751 | 1.991188 |
min | 1.000000 | 5.200000 | -1.600000 | 22.000000 | 3.000000 | 0.000000 | 0.000000 | 0.000000 | 3.713000 | 0.000000 | 0.580000 |
25% | 92.000000 | 24.000000 | 10.200000 | 78.000000 | 34.000000 | 0.614000 | 5.100000 | 0.423000 | 9.871000 | 0.728000 | 2.027000 |
50% | 183.000000 | 31.400000 | 18.000000 | 89.000000 | 47.000000 | 0.895000 | 8.400000 | 0.720000 | 13.432000 | 1.187000 | 3.556000 |
75% | 274.000000 | 35.000000 | 25.000000 | 95.000000 | 62.000000 | 1.356000 | 10.200000 | 0.861000 | 17.484500 | 1.814000 | 5.221000 |
max | 365.000000 | 46.600000 | 44.400000 | 100.000000 | 100.000000 | 7.419000 | 20.000000 | 1.566000 | 29.889000 | 5.692000 | 10.447000 |
fig, ax = plt.subplots(5,1, figsize=(14,16), dpi=250)
x = 365 # no. of days in year
# A function is defined to draw a vertical line after 365 points i.e. after a year for better understanding of parameters
for cor in range(27): # no. of years in current dataset
for i in range(5): # for all five subplots
ax[i].axvline(x=x, color='k')
x = x+365
ax[0].set_title('Daily Max and Min temperature from 1995 to 2021')
ax[0].plot(df.index,df['Tmax'])
ax[0].plot(df.index,df['Tmin'])
ax[0].axhline(df['Tmax'].mean(), color = 'navy') # A horizontal line for mean maximum temperature
ax[0].axhline(df['Tmin'].mean(), color = 'orange') # A horizontal line for mean minimum temperature
ax[0].axhline(df['Tmax'].max(), ls = '--') # A horizontal line for uppermost value maximum temperature
ax[0].axhline(df['Tmin'].min(), ls = '--') # A horizontal line for lowermost value minimum temperature
ax[1].set_title('Daily Avg temperature from 1995 to 2021')
ax[1].plot(df.index,df[['Tmax','Tmin']].mean(axis=1), 'r')
#ax[1].plot(df[['Tmax','Tmin']].mean(axis=1).rolling(7).mean(),'pink')
ax[2].set_title('Daily Max and Min RH from 1995 to 2021')
ax[2].plot(df.index,df['RHmax'],'g')
ax[2].plot(df.index,df['RHmin'],'y')
ax[2].axhline(df['RHmax'].mean(), color = 'blue')
ax[2].axhline(df['RHmin'].mean(), color = 'green')
ax[3].set_title('Daily Avg RH from 1995 to 2021')
ax[3].plot(df.index,df[['RHmax','RHmin']].mean(axis=1), 'grey')
#ax[3].plot(df[['RHmax','RHmin']].mean(axis=1).rolling(7).mean(), 'pink')
ax[4].set_title('Daily ET from 1995 to 2021')
ax[4].plot(df.index,df['ETo'])
ax[4].axhline(df['ETo'].mean())
ax[4].axhline(df['ETo'].max(), ls = '--')
ax[4].axhline(df['ETo'].min(), ls = '--')
plt.tight_layout()
plt.figure(figsize=(12,8), dpi=200)
sns.heatmap(df.drop('J',axis =1).corr(), linewidth=0.5, annot=True, cmap='viridis');
sns.pairplot(df.drop('J',axis =1),
kind ="reg", plot_kws={'line_kws':{'color':'tomato'}, 'color':'darkcyan'},
diag_kws={'color':'tomato'}, corner=True)
<seaborn.axisgrid.PairGrid at 0x1dcd6df21f0>
Adaptive boosting aka Adaboost which is a machine learning technique is employed to extract most important features to estimate reasonable accurate value of label which is ETo in our case.
X = df.drop(['J', 'ETo'], axis =1) # All general features used to estimate ETo are seperated and referred to X
y = df['ETo'] # label referred to y
from sklearn.model_selection import train_test_split # Dataset is splited into train (85%) and test (15%) data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=101)
from sklearn.ensemble import AdaBoostRegressor
An instance of Adaboost with one estimater and learning rate of 0.8 is created. Here in this, the base estimater is decision tree with one level.
feature1 = AdaBoostRegressor(n_estimators=1, learning_rate=0.8, random_state=101)
feature1.fit(X_train,y_train)
AdaBoostRegressor(learning_rate=0.8, n_estimators=1, random_state=101)
from sklearn.metrics import r2_score # R-square is used to analyze the performance of model
preds = feature1.predict(X_test)
print('R2 Value:',r2_score(y_test, preds))
R2 Value: 0.8594273982697587
feature1.feature_importances_
array([0.8443255 , 0. , 0. , 0. , 0.02269728, 0. , 0. , 0.13297722, 0. ])
feats = pd.DataFrame(index=X.columns,data=feature1.feature_importances_,columns=['Importance'])
imp_feats = feats[feats['Importance']>0].sort_values("Importance")
plt.figure(figsize=(4,3),dpi=100)
sns.barplot(data=imp_feats,x=imp_feats.index,y='Importance')
<AxesSubplot:ylabel='Importance'>
With one estimater, the maximum temperature turns out to be most important feature to estimate ETo followed by Rns and wind speed. It shows that only with these three features, ETo can be estimated fairly with R-square value of 0.86. Now, lets find out the optimum number of estimaters and corresponding number of features to estimate ETo with even higher accuracy.
error_rates = []
r2 = []
for n in range(1,20):
model = AdaBoostRegressor(n_estimators=n, random_state=33)
model.fit(X_train,y_train)
preds = model.predict(X_test)
err = (np.mean(np.abs((y_test - preds) / y_test)))
r2sc = r2_score(y_test, preds)
error_rates.append(err)
r2.append(r2sc)
fig,ax =plt.subplots(1,2, figsize=(10,6), dpi=100)
ax[0].plot(range(1,20),error_rates)
ax[1].plot(range(1,20),r2)
ax[0].axvline(x=11, color='r', linestyle='--')
ax[1].axvline(x=11, color='r', linestyle='--')
ax[0].set_title('Variation in error with number of estimators')
ax[0].set_xlabel('no. of estimators')
ax[0].set_ylabel('error')
ax[1].set_title('Variation in R-square with number of estimators')
ax[1].set_xlabel('no. of estimators')
ax[1].set_ylabel('R-square')
plt.tight_layout();
From the plots above, It can be depicted that minimum error and compartively high value of R-square can be achieved with 11 number of estimaters.
feature2 = AdaBoostRegressor(n_estimators=11, random_state=33)
feature2.fit(X_train,y_train)
AdaBoostRegressor(n_estimators=11, random_state=33)
pred = feature2.predict(X_test)
print('R2 Value:',r2_score(y_test, pred))
R2 Value: 0.9421388992848636
feature2.feature_importances_
array([0.35246599, 0.01852268, 0.07022322, 0.0026595 , 0.08157826, 0. , 0. , 0.28279395, 0.19175641])
feats2 = pd.DataFrame(index=X.columns,data=feature2.feature_importances_,columns=['Importance'])
imp_feats2 = feats2[feats2['Importance']>0].sort_values("Importance")
plt.figure(figsize=(4,3),dpi=100)
sns.barplot(data=imp_feats2,x=imp_feats2.index,y='Importance')
plt.xticks(rotation=90);
With 11 number of estimaters, R-square more than 0.94 can be achieved with these 7 features where Tmax contributes the most and RHmin least. Furthermore, we will see how top three features relates with ETo.
df.columns
Index(['J', 'Tmax', 'Tmin', 'RHmax', 'RHmin', 'Wind_speed', 'n', 'n/N', 'Rns', 'VPD', 'ETo'], dtype='object')
for i in ['Tmax','Rns','VPD']:
plt.figure(figsize=(12,8), dpi=150)
sns.jointplot(x=i,y='ETo',data=df);
<Figure size 1800x1200 with 0 Axes>
<Figure size 1800x1200 with 0 Axes>
<Figure size 1800x1200 with 0 Axes>
To analyze the trend of features and label with time, Mann-kendall method is adopted to observe increasind, decreasing or no trend in the dataset.
To do this analysis we need dataframe which represent each year separately for each month and feature. so for ease, I created two function to manage data. These function basically converted the dataframe into a three-dimensional array and we can create needful dataframe using numpy sclicing and pandas dataframe function.
annual_data()
: In this function, the required arguments are shape and df. Shape is basically the shape of 3d array which in this notebook is [365, 11, 27] where 365 is number of julian days in a year on axis 0, 11 is features and label on axis 1, and years on axis 3. Therefore a 3d array will be created with days on y-axis, features and label on x-axis, and years on z-axis or depthwise axis ( you can imagine years are stacked one after another).
month_data_avg()
: In this function, monthly average is calculated for each column (features and label) and a 3d array will be defined which takes shape and m as arguments where shape represents years, columns, and months on axis 0, 1, and 2, respectively.
def annual_data(shape=None, df = df):
temp = np.array(df)
annual = np.zeros((shape))
for i in range(shape[2]):
annual[:,:,i] = temp[i*shape[0]:((i+1)*shape[0])]
return annual
shape_annual = [365, 11, (2021-1995+1)] # As 365 julian days on dim[0], # features on dim[1] and # years on dim[2]
annual = annual_data(shape = shape_annual, df = df)
m = [0,31,59,90,120,151,181,212,243,273,304,334,365]
shape_monthly = [27,11,12] # dim0 : # years dim1: # features, dim2: # months
def month_data_avg(sh_mon =None, m = None):
temp = annual_data(shape = shape_annual, df = df)
monthly = np.zeros((sh_mon))
for i in range(sh_mon[0]):
k, l = 0, 1
for j in range(sh_mon[2]):
monthly[i,:,j] = np.average(temp[m[k]:m[l],:,i], axis = 0)
k +=1
l+=1
return monthly
monthly = month_data_avg(shape_monthly,m)
f = monthly[:,10,:] #calling slice of all years(27) for ET (avg) (as 10 index in second dim) for all months
months = ["Jan", "Feb", "March", "April", "May", "June", "July", "Aug", "Sept", "Oct", "Nov", "Dec"]
df_f = pd.DataFrame(f, index = list(range(1995,2022)), columns = months)
df_f
Jan | Feb | March | April | May | June | July | Aug | Sept | Oct | Nov | Dec | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1995 | 1.497742 | 2.143464 | 3.396290 | 5.337300 | 6.980581 | 7.414033 | 5.099871 | 3.517323 | 4.299800 | 3.401258 | 2.324367 | 1.617935 |
1996 | 1.563968 | 2.141536 | 3.512129 | 5.549233 | 6.801742 | 6.889333 | 5.295129 | 3.611484 | 4.291467 | 3.304258 | 2.353533 | 1.702581 |
1997 | 1.497194 | 2.537179 | 3.565742 | 5.150133 | 7.168677 | 5.891700 | 4.741710 | 4.110839 | 4.520833 | 2.723710 | 1.872300 | 0.983645 |
1998 | 1.546097 | 2.322321 | 3.332742 | 5.549800 | 7.017452 | 6.311700 | 4.819516 | 4.608226 | 3.957567 | 3.050290 | 2.150533 | 1.227355 |
1999 | 1.217806 | 2.310321 | 3.825226 | 5.927833 | 6.870968 | 6.357833 | 5.045548 | 4.616387 | 4.358367 | 3.247903 | 2.276100 | 1.374645 |
2000 | 1.475387 | 2.097036 | 3.790258 | 5.481900 | 6.665968 | 5.941967 | 4.373387 | 5.007710 | 4.386567 | 3.175226 | 1.855967 | 1.452677 |
2001 | 1.403065 | 2.478000 | 3.734387 | 5.234733 | 7.030161 | 5.463800 | 4.321839 | 4.794935 | 4.475767 | 2.887613 | 2.139167 | 1.391968 |
2002 | 1.499355 | 2.273179 | 3.749226 | 5.515867 | 6.910710 | 6.602633 | 5.585710 | 4.757097 | 4.060200 | 3.147774 | 1.971067 | 1.415677 |
2003 | 1.078419 | 2.234750 | 3.416839 | 5.455533 | 6.490097 | 6.140200 | 4.678548 | 4.467258 | 4.211300 | 3.185581 | 1.976533 | 1.377806 |
2004 | 1.251839 | 2.538464 | 4.011258 | 5.785400 | 6.607871 | 6.079933 | 5.732806 | 4.507161 | 4.467367 | 2.928548 | 1.998400 | 1.437161 |
2005 | 1.450903 | 1.924321 | 3.311677 | 5.550100 | 6.610290 | 6.845200 | 4.494806 | 4.971290 | 4.129367 | 3.175774 | 2.152400 | 1.380968 |
2006 | 1.739677 | 2.488321 | 3.597323 | 5.799500 | 7.027839 | 6.029133 | 4.676774 | 4.581097 | 4.003967 | 3.186839 | 1.953533 | 1.430806 |
2007 | 1.617290 | 1.955571 | 3.680903 | 5.786300 | 6.937806 | 6.233467 | 4.931548 | 4.406065 | 3.692767 | 3.232968 | 1.855333 | 1.363419 |
2008 | 1.514806 | 2.219750 | 3.838548 | 5.440600 | 5.894710 | 4.842933 | 4.754581 | 4.353548 | 4.166733 | 3.104355 | 2.019967 | 1.406226 |
2009 | 1.550677 | 2.514929 | 3.638194 | 5.886567 | 7.136065 | 7.076900 | 4.911968 | 4.482323 | 4.136133 | 3.140161 | 1.711367 | 1.310419 |
2010 | 1.145032 | 2.268714 | 3.810806 | 5.574067 | 6.637968 | 6.386200 | 4.053516 | 4.121645 | 3.644833 | 2.611677 | 1.922900 | 1.228548 |
2011 | 1.255613 | 1.973179 | 3.537774 | 5.150667 | 6.590258 | 5.667733 | 4.359000 | 3.941581 | 3.518067 | 3.164194 | 1.816267 | 1.386806 |
2012 | 1.283097 | 2.264071 | 3.590774 | 4.991200 | 6.632419 | 6.991933 | 5.337000 | 3.790742 | 3.975933 | 3.073032 | 1.884733 | 1.460000 |
2013 | 1.348903 | 2.020286 | 3.506968 | 5.135733 | 6.758484 | 6.083067 | 5.062032 | 3.960806 | 4.006767 | 2.482355 | 1.847067 | 1.308935 |
2014 | 1.315484 | 1.843143 | 3.335871 | 5.278733 | 6.376839 | 6.622633 | 5.231161 | 4.784194 | 3.803367 | 2.784419 | 2.092100 | 1.239129 |
2015 | 1.188645 | 2.071250 | 3.273710 | 5.245267 | 6.563032 | 6.364300 | 4.762935 | 4.259968 | 4.331767 | 3.093968 | 1.763933 | 1.366097 |
2016 | 1.126452 | 2.321750 | 3.454452 | 5.230500 | 6.608355 | 6.304333 | 4.480065 | 4.060677 | 3.700300 | 2.635129 | 1.917700 | 1.302387 |
2017 | 1.348258 | 2.293643 | 3.584710 | 5.840500 | 6.461097 | 5.991267 | 5.174258 | 4.423935 | 4.118567 | 2.903323 | 1.513767 | 1.316742 |
2018 | 1.476806 | 2.383071 | 3.793903 | 5.501733 | 6.465613 | 6.015867 | 4.535613 | 4.183742 | 3.864633 | 3.015194 | 1.912333 | 1.260548 |
2019 | 1.379097 | 1.921607 | 3.159355 | 5.150267 | 6.111903 | 7.160633 | 4.312903 | 4.342065 | 3.734133 | 2.758161 | 1.855500 | 1.035161 |
2020 | 1.318032 | 2.301500 | 3.000516 | 4.894567 | 6.206806 | 6.375333 | 5.183903 | 4.372548 | 4.349467 | 3.108161 | 1.807200 | 1.370452 |
2021 | 1.252290 | 2.244071 | 3.749000 | 5.163167 | 5.991548 | 5.799233 | 4.835871 | 4.747968 | 3.488267 | 3.306258 | 1.927867 | 1.208581 |
monthly_mk_test()
is a function to calculate Mann-kendall test for each month and return a dataframe which illustrates trend and other values for each month dor span of 27 years.
def month_mk_test(feature =None):
monthly_avg = month_data_avg(shape_monthly,m)
mon_mk = np.zeros((12,9), dtype='<U12')
for i in range(12):
mon_mk[i,:] = list(mk.original_test(monthly_avg[:,feature,i]))
df_mon_mk = pd.DataFrame(mon_mk, index =months,
columns = ["Trend", "Hypothesis", "p-value", "z-value", "Tau",
"s", "Var_s", "Slope",
"Intercept"])
return df_mon_mk
df_mon_mk_et = month_mk_test(10) #monthly ETo trend
df_mon_mk_et
Trend | Hypothesis | p-value | z-value | Tau | s | Var_s | Slope | Intercept | |
---|---|---|---|---|---|---|---|---|---|
Jan | no trend | False | 0.0729994301 | -1.792834256 | -0.247863247 | -87.0 | 2301.0 | -0.008207885 | 1.4857992831 |
Feb | no trend | False | 0.3812637215 | -0.875570218 | -0.122507122 | -43.0 | 2301.0 | -0.003699248 | 2.3121616541 |
March | no trend | False | 0.2602784927 | -1.125733137 | -0.156695156 | -55.0 | 2301.0 | -0.009354838 | 3.7063225806 |
April | no trend | False | 0.1131106461 | -1.584365157 | -0.219373219 | -77.0 | 2301.0 | -0.014511904 | 5.6441880952 |
May | decreasing | True | 0.0001251402 | -3.835831432 | -0.527065527 | -185.0 | 2301.0 | -0.028338709 | 7.0008225806 |
June | no trend | False | 0.7703961609 | -0.291856739 | -0.042735042 | -15.0 | 2301.0 | -0.004576190 | 6.3638238095 |
July | no trend | False | 0.6464984392 | -0.458632019 | -0.065527065 | -23.0 | 2301.0 | -0.006341397 | 4.9019543010 |
Aug | no trend | False | 0.5047075455 | -0.667101118 | -0.094017094 | -33.0 | 2301.0 | -0.007824596 | 4.5077842741 |
Sept | decreasing | True | 0.0045799907 | -2.835179754 | -0.390313390 | -137.0 | 2301.0 | -0.024231884 | 4.4335811594 |
Oct | no trend | False | 0.0665756440 | -1.834528076 | -0.253561253 | -89.0 | 2301.0 | -0.008916129 | 3.2202645161 |
Nov | decreasing | True | 0.0011454847 | -3.252117953 | -0.447293447 | -157.0 | 2301.0 | -0.014524074 | 2.1117129629 |
Dec | decreasing | True | 0.0097374242 | -2.585016835 | -0.356125356 | -125.0 | 2301.0 | -0.007996204 | 1.4744022770 |
fig, axes = plt.subplots(nrows=1,ncols=2, figsize = (15,8))
fig.suptitle('Average Monthly ET from 1995 to 2021')
axes[0].set_title('Avg. monthly ET of Jan and Feb from 1995 to 2021')
axes[0].set_xlabel('Year')
axes[0].set_ylabel('Avg. monthly ET')
axes[0].plot(df_f.index,df_f['Jan'],'b.-',ms=10, label = 'Jan')
axes[0].plot(df_f.index,df_f['Feb'],'y.-',ms=10, label = 'Feb')
axes[0].legend(loc=1)
axes[1].set_title('Avg. monthly ET of Nov and Dec from 1995 to 2021')
axes[1].set_xlabel('Year')
axes[1].set_ylabel('Avg. monthly ET')
axes[1].plot(df_f.index,df_f['Nov'],'g.-',ms=10, label = 'Nov')
axes[1].plot(df_f.index,df_f['Dec'],'r.-',ms=10, label = 'Dec')
axes[1].legend(loc=1);
#axes[1].axline((0, 2.1117129629), slope=-0.014524074, color='C0', label='by slope');
df_mon_mk_tmax = month_mk_test(1) #monthly Tmax trend
df_mon_mk_tmax
Trend | Hypothesis | p-value | z-value | Tau | s | Var_s | Slope | Intercept | |
---|---|---|---|---|---|---|---|---|---|
Jan | no trend | False | 0.7074787276 | -0.375244379 | -0.054131054 | -19.0 | 2301.0 | -0.014940577 | 17.594227504 |
Feb | no trend | False | 0.2972515252 | 1.0423454980 | 0.1452991452 | 51.0 | 2301.0 | 0.0432330827 | 20.923684210 |
March | no trend | False | 0.4784517390 | 0.7087949386 | 0.0997150997 | 35.0 | 2301.0 | 0.0276497695 | 26.808294930 |
April | no trend | False | 0.6168456628 | -0.500325839 | -0.071225071 | -25.0 | 2301.0 | -0.031333333 | 35.250666666 |
May | no trend | False | 0.4784517390 | -0.708794938 | -0.099715099 | -35.0 | 2301.0 | -0.020737327 | 39.353456221 |
June | no trend | False | 0.2783473662 | 1.0840393179 | 0.1509971509 | 53.0 | 2301.0 | 0.0474999999 | 36.929166666 |
July | no trend | False | 0.2972515252 | 1.0423454980 | 0.1452991452 | 51.0 | 2301.0 | 0.0196480938 | 33.751026392 |
Aug | no trend | False | 0.3375799754 | 0.9589578582 | 0.1339031339 | 47.0 | 2301.0 | 0.0176881720 | 33.328118279 |
Sept | no trend | False | 0.2875899350 | -1.063423511 | -0.148148148 | -52.0 | 2300.0 | -0.026904761 | 33.373095238 |
Oct | no trend | False | 0.4784517390 | -0.708794938 | -0.099715099 | -35.0 | 2301.0 | -0.024731182 | 32.173118279 |
Nov | no trend | False | 0.1961811327 | -1.292508417 | -0.179487179 | -63.0 | 2301.0 | -0.032156862 | 27.331372549 |
Dec | no trend | False | 0.6464984392 | -0.458632019 | -0.065527065 | -23.0 | 2301.0 | -0.010215053 | 20.716666666 |
df_Tmax = pd.DataFrame(monthly[:,1,:], index = list(range(1995,2022)), columns = months)
fig, axes = plt.subplots(nrows=1,ncols=2, figsize = (15,8))
fig.suptitle('Average Monthly Tmax from 1995 to 2021')
axes[0].set_title('Avg. monthly Tmax of Jan and Feb from 1995 to 2021')
axes[0].set_xlabel('Year')
axes[0].set_ylabel('Avg. monthly Tmax')
axes[0].plot(df_Tmax.index,df_Tmax['Jan'],'b.-',ms=10, label = 'Jan')
axes[0].plot(df_Tmax.index,df_Tmax['Feb'],'y.-',ms=10, label = 'Feb')
axes[0].legend(loc=1)
axes[1].set_title('Avg. monthly Tmax of Nov and Dec from 1995 to 2021')
axes[1].set_xlabel('Year')
axes[1].set_ylabel('Avg. monthly Tmax')
axes[1].plot(df_Tmax.index,df_Tmax['Nov'],'g.-',ms=10, label = 'Nov')
axes[1].plot(df_Tmax.index,df_Tmax['Dec'],'r.-',ms=10, label = 'Dec')
axes[1].legend(loc=1);
#axes[1].axline((0, 2.1117129629), slope=-0.014524074, color='C0', label='by slope');
plt.figure(figsize=(6,4), dpi = 100)
plt.xlabel('Reference Evapotranspiration (ETo)')
for i in months:
sns.kdeplot(df_f[i], label=i)
plt.legend(loc=1);
plt.figure(figsize=(6,4), dpi = 100)
plt.xlabel('Maximum Temperature (Tmax)')
for i in months:
sns.kdeplot(df_Tmax[i], label=i)
plt.legend(loc=1);
df_jTmax = pd.DataFrame(annual[:,1,:], columns=range(1995,2022))
df_jTmin = pd.DataFrame(annual[:,2,:], columns=range(1995,2022))
df_jRns = pd.DataFrame(annual[:,8,:], columns=range(1995,2022))
df_jVPD = pd.DataFrame(annual[:,9,:], columns=range(1995,2022))
df_jETo = pd.DataFrame(annual[:,10,:], columns=range(1995,2022))
df_jRHmax = pd.DataFrame(annual[:,3,:], columns=range(1995,2022))
df_jRHmin = pd.DataFrame(annual[:,4,:], columns=range(1995,2022))
df_jWS = pd.DataFrame(annual[:,5,:], columns=range(1995,2022))
ls=[1995,2000,2005,2010,2015,2020]
plt.figure(figsize=(12,6), dpi = 100)
plt.xlabel('Maximum Temperature (Tmax)')
for i in ls:
sns.kdeplot(df_jTmax[i], label=i)
plt.legend(loc=1);
ls=[1995,2000,2005,2010,2015,2020]
plt.figure(figsize=(12,6), dpi = 100)
plt.xlabel('Reference Evapotranspiration (ETo)')
for i in ls:
sns.kdeplot(df_jETo[i], label=i)
plt.legend(loc=1);
ix = [1,2,3,4,5,8,9,10]
cl = ['Tmax','Tmin','RHmax','RHmin','Wind Speed','Rns','VPD','ETo']
an_mk = np.zeros((len(ix),9), dtype='<U12')
for i in range(len(ix)):
an_mk[i,:] = list(mk.original_test(annual[:,ix[i],:].mean(axis=0)))
df_an_mk = pd.DataFrame(an_mk, index =cl,
columns = ["Trend", "Hypothesis", "p-value", "z-value", "Tau",
"s", "Var_s", "Slope",
"Intercept"])
For some selected parameters, Mann-kendal trend on annual average values from 1995 to 2021 is calculated, and it is found that Tmax, RHmin, and VPD has no trend; Tmin has increasing trend; and RHmax, Rns, wind speed, and ETo has decreasing trend at 5% confidence level.
df_an_mk
Trend | Hypothesis | p-value | z-value | Tau | s | Var_s | Slope | Intercept | |
---|---|---|---|---|---|---|---|---|---|
Tmax | no trend | False | 0.8024613596 | -0.250162919 | -0.037037037 | -13.0 | 2301.0 | -0.004257990 | 29.864121004 |
Tmin | increasing | True | 0.0009883866 | 3.2938117738 | 0.4529914529 | 159.0 | 2301.0 | 0.0421575342 | 16.920719178 |
RHmax | decreasing | True | 0.0097374242 | -2.585016835 | -0.356125356 | -125.0 | 2301.0 | -0.136073059 | 85.157990867 |
RHmin | no trend | False | 0.6464984392 | -0.458632019 | -0.065527065 | -23.0 | 2301.0 | -0.025022831 | 49.023926940 |
Wind Speed | decreasing | True | 0.0370973584 | -2.084690996 | -0.287749287 | -101.0 | 2301.0 | -0.004328584 | 1.1264058447 |
Rns | decreasing | True | 3.6918904438 | -4.628014011 | -0.635327635 | -223.0 | 2301.0 | -0.066842571 | 14.380558904 |
VPD | no trend | False | 0.2430379897 | 1.1674269578 | 0.1623931623 | 57.0 | 2301.0 | 0.0024511896 | 1.4102687815 |
ETo | decreasing | True | 0.0007322794 | -3.377199413 | -0.464387464 | -163.0 | 2301.0 | -0.010368150 | 3.8874626712 |
df_j = [df_jTmax,df_jTmin,df_jRHmax,df_jRHmin,df_jWS,df_jRns,df_jVPD,df_jETo]
color = ['coral','blue','salmon','lawngreen','purple','olive','tomato','darkcyan']
fig, ax = plt.subplots(4,2, figsize=(16,12), dpi=150)
for i in range(len(df_j)):
u =i//2
v = i%2
df_j[i].mean().plot(ax=ax[u][v],color=color[i], lw=2)
ax[u][v].set_title(f'Avg. annual {cl[i]} from 1995 to 2021')
ax[u][v].set_xlabel('Year')
ax[u][v].set_ylabel(f'Avg. annual {cl[i]}')
plt.tight_layout();
#Tmax cluster map
plt.figure(dpi=150, figsize=(4,4))
sns.clustermap(df_jTmax,row_cluster=False)
<seaborn.matrix.ClusterGrid at 0x1dcd853dca0>
<Figure size 600x600 with 0 Axes>
These cluster maps shows how similar are different years in terms of different parameters. The x-axis shows different years with dendrogram at the top. The smaller the length of dendrogram line more is the similarity between particular parameter values whereas y-axis depicts julian days in a year
#ETo cluster map
plt.figure(dpi=150, figsize=(4,4))
sns.clustermap(df_jETo,row_cluster=False)
<seaborn.matrix.ClusterGrid at 0x1dcd90ebcd0>
<Figure size 600x600 with 0 Axes>