Quantcast
Channel: Freakonometrics » ACT2040
Viewing all articles
Browse latest Browse all 57

Visualizing overdispersion (with trees)

$
0
0

This week, we started to discuss overdispersion when modeling claims frequency. In my previous post, I discussed computations of empirical variances with different exposure. But I did use only one factor to compute classes. Of course, it is possible to use much more factors. For instance, using cartesian products of factors,

> X=as.factor(paste(sinistres$carburant,sinistres$zone,
+ cut(sinistres$ageconducteur,breaks=c(17,24,40,65,101))))
> E=sinistres$exposition
> Y=sinistres$nbre
> vm=vv=ve=rep(NA,length(levels(X)))
>   for(i in 1:length(levels(X))){
+  	   ve[i]=Ei=E[X==levels(X)[i]]
+  	   Yi=Y[X==levels(X)[i]]
+   vm[i]=meani=weighted.mean(Yi/Ei,Ei)    # moyenne 
+   vv[i]=variancei=sum((Yi-meani*Ei)^2)/sum(Ei)    # variance
+  cat("Class ",levels(X)[i],"average =",meani," variance =",variancei,"\n")
+ }
Class D A (17,24]  average = 0.06274415  variance = 0.06174966 
Class D A (24,40]  average = 0.07271905  variance = 0.07675049 
Class D A (40,65]  average = 0.05432262  variance = 0.06556844 
Class D A (65,101] average = 0.03026999  variance = 0.02960885 
Class D B (17,24]  average = 0.2383109   variance = 0.2442396 
Class D B (24,40]  average = 0.06662015  variance = 0.07121064 
Class D B (40,65]  average = 0.05551854  variance = 0.05543831 
Class D B (65,101] average = 0.0556386   variance = 0.0540786 
Class D C (17,24]  average = 0.1524552   variance = 0.1592623 
Class D C (24,40]  average = 0.0795852   variance = 0.09091435 
Class D C (40,65]  average = 0.07554481  variance = 0.08263404 
Class D C (65,101] average = 0.06936605  variance = 0.06684982 
Class D D (17,24]  average = 0.1584052   variance = 0.1552583 
Class D D (24,40]  average = 0.1079038   variance = 0.121747 
Class D D (40,65]  average = 0.06989518  variance = 0.07780811 
Class D D (65,101] average = 0.0470501   variance = 0.04575461 
Class D E (17,24]  average = 0.2007164   variance = 0.2647663 
Class D E (24,40]  average = 0.1121569   variance = 0.1172205 
Class D E (40,65]  average = 0.106563    variance = 0.1068348 
Class D E (65,101] average = 0.1572701   variance = 0.2126338 
Class D F (17,24]  average = 0.2314815   variance = 0.1616788 
Class D F (24,40]  average = 0.1690485   variance = 0.1443094 
Class D F (40,65]  average = 0.08496827  variance = 0.07914423 
Class D F (65,101] average = 0.1547769   variance = 0.1442915 
Class E A (17,24]  average = 0.1275345   variance = 0.1171678 
Class E A (24,40]  average = 0.04523504  variance = 0.04741449 
Class E A (40,65]  average = 0.05402834  variance = 0.05427582 
Class E A (65,101] average = 0.04176129  variance = 0.04539265 
Class E B (17,24]  average = 0.1114712   variance = 0.1059153 
Class E B (24,40]  average = 0.04211314  variance = 0.04068724 
Class E B (40,65]  average = 0.04987117  variance = 0.05096601 
Class E B (65,101] average = 0.03123003  variance = 0.03041192 
Class E C (17,24]  average = 0.1256302   variance = 0.1310862 
Class E C (24,40]  average = 0.05118006  variance = 0.05122782 
Class E C (40,65]  average = 0.05394576  variance = 0.05594004 
Class E C (65,101] average = 0.04570239  variance = 0.04422991 
Class E D (17,24]  average = 0.1777142   variance = 0.1917696 
Class E D (24,40]  average = 0.06293331  variance = 0.06738658 
Class E D (40,65]  average = 0.08532688  variance = 0.2378571 
Class E D (65,101] average = 0.05442916  variance = 0.05724951 
Class E E (17,24]  average = 0.1826558   variance = 0.2085505 
Class E E (24,40]  average = 0.07804062  variance = 0.09637156 
Class E E (40,65]  average = 0.08191469  variance = 0.08791804 
Class E E (65,101] average = 0.1017367   variance = 0.1141004 
Class E F (17,24]  average = 0           variance = 0 
Class E F (24,40]  average = 0.07731177  variance = 0.07415932 
Class E F (40,65]  average = 0.1081142   variance = 0.1074324 
Class E F (65,101] average = 0.09071118  variance = 0.1170159

Again, one can plot the variance against the average,

> plot(vm,vv,cex=sqrt(ve),col="grey",pch=19,
+ xlab="Empirical average",ylab="Empirical variance")
> points(vm,vv,cex=sqrt(ve))
> abline(a=0,b=1,lty=2)

http://f.hypotheses.org/wp-content/blogs.dir/253/files/2013/02/Capture-d%E2%80%99e%CC%81cran-2013-02-13-a%CC%80-13.58.26.png

An alternative is to use a tree. The tree can be obtained from another variable (the insured had, or had not, a claim, during the period considered) but it should be rather close to the one we would like to model (the number of claims over the period considered). Here, I did use the whole database (with more that 600,000 lines)

> library(tree)
> T=tree((nombre>0)~as.factor(zone)+as.factor(puissance)+
+ as.factor(marque)+as.factor(carburant)+as.factor(region)+
+ agevehicule+ageconducteur,data=baseFREQ,
+ split =  "gini",minsize =25000)

The tree is the following

> plot(T)
> text(T)

http://f.hypotheses.org/wp-content/blogs.dir/253/files/2013/02/Capture-d%E2%80%99e%CC%81cran-2013-02-13-a%CC%80-13.55.13.png

Now, each knot defines a class, and it is possible to use it to define a class. Which is supposed to be homogeneous.

> X=as.factor(T$where)
> E=sinistres$exposition
> Y=sinistres$nbre
> vm=vv=ve=rep(NA,length(levels(X)))
>   for(i in 1:length(levels(X))){
+  	   ve[i]=Ei=E[X==levels(X)[i]]
+  	   Yi=Y[X==levels(X)[i]]
+   vm[i]=meani=weighted.mean(Yi/Ei,Ei)    # moyenne 
+   vv[i]=variancei=sum((Yi-meani*Ei)^2)/sum(Ei)    # variance
+  cat("Class ",levels(X)[i],"average =",meani," variance =",variancei,"\n")
+  }
Class  6 average =   0.04010406  variance = 0.04424163 
Class  8 average =   0.05191127  variance = 0.05948133 
Class  9 average =   0.07442635  variance = 0.08694552 
Class  10 average =  0.4143646   variance = 0.4494002 
Class  11 average =  0.1917445   variance = 0.1744355 
Class  15 average =  0.04754595  variance = 0.05389675 
Class  20 average =  0.08129577  variance = 0.0906322 
Class  22 average =  0.05813419  variance = 0.07089811 
Class  23 average =  0.06123807  variance = 0.07010473 
Class  24 average =  0.06707301  variance = 0.07270995 
Class  25 average =  0.3164557   variance = 0.2026906 
Class  26 average =  0.08705041  variance = 0.108456 
Class  27 average =  0.06705214  variance = 0.07174673 
Class  30 average =  0.05292652  variance = 0.06127301 
Class  31 average =  0.07195285  variance = 0.08620593 
Class  32 average =  0.08133722  variance = 0.08960552 
Class  34 average =  0.1831559   variance = 0.2010849 
Class  39 average =  0.06173885  variance = 0.06573939 
Class  41 average =  0.07089419  variance = 0.07102932 
Class  44 average =  0.09426152  variance = 0.1032255 
Class  47 average =  0.03641669  variance = 0.03869702 
Class  49 average =  0.0506601   variance = 0.05089276 
Class  50 average =  0.06373107  variance = 0.06536792 
Class  51 average =  0.06762947  variance = 0.06926191 
Class  56 average =  0.06771764  variance = 0.07122379 
Class  57 average =  0.04949142  variance = 0.05086885 
Class  58 average =  0.2459016   variance = 0.2451116 
Class  59 average =  0.05996851  variance = 0.0615773 
Class  61 average =  0.07458053  variance = 0.0818608 
Class  63 average =  0.06203737  variance = 0.06249892 
Class  64 average =  0.07321618  variance = 0.07603106 
Class  66 average =  0.07332127  variance = 0.07262425 
Class  68 average =  0.07478147  variance = 0.07884597 
Class  70 average =  0.06566728  variance = 0.06749411 
Class  71 average =  0.09159605  variance = 0.09434413 
Class  75 average =  0.03228927  variance = 0.03403198 
Class  76 average =  0.04630848  variance = 0.04861813 
Class  78 average =  0.05342351  variance = 0.05626653 
Class  79 average =  0.05778622  variance = 0.05987139 
Class  80 average =  0.0374993   variance = 0.0385351 
Class  83 average =  0.06721729  variance = 0.07295168 
Class  86 average =  0.09888492  variance = 0.1131409 
Class  87 average =  0.1019186   variance = 0.2051122 
Class  88 average =  0.05281703  variance = 0.0635244 
Class  91 average =  0.08332136  variance = 0.09067632 
Class  96 average =  0.07682093  variance = 0.08144446 
Class  97 average =  0.0792268   variance = 0.08092019 
Class  99 average =  0.1019089   variance = 0.1072126 
Class  100 average = 0.1018262   variance = 0.1081117 
Class  101 average = 0.1106647   variance = 0.1151819 
Class  103 average = 0.08147644  variance = 0.08411685 
Class  104 average = 0.06456508  variance = 0.06801061 
Class  107 average = 0.1197225   variance = 0.1250056 
Class  108 average = 0.0924619   variance = 0.09845582 
Class  109 average = 0.1198932   variance = 0.1209162

Here, when ploting the empirical variance (per knot) against the empirial average of claims, we get

http://f.hypotheses.org/wp-content/blogs.dir/253/files/2013/02/Capture-d%E2%80%99e%CC%81cran-2013-02-13-a%CC%80-14.05.08.png

Here, we can identify classes where remaining heterogeneity.

Arthur Charpentier

Arthur Charpentier, professor at UQaM in Actuarial Science. Former professor-assistant at ENSAE Paristech, associate professor at Ecole Polytechnique and assistant professor in Economics at Université de Rennes 1. Graduated from ENSAE, Master in Mathematical Economics (Paris Dauphine), PhD in Mathematics (KU Leuven), and Fellow of the French Institute of Actuaries.

More Posts - Website

Follow Me:
TwitterLinkedInGoogle Plus


Viewing all articles
Browse latest Browse all 57