Week 4: Two-Step Estimator

     This week I implemented the 2-Step estimator from the paper Nonparametric Conditional Density Estimation [1] and compared its results with the 1-Step estimator from the same paper, which I implemented last week. I also found a shortcut method for calculating E(Y|X=x) that is not only more accurate than the existing ProbSpace method but also much faster.

Monday:

    Studied the 2-Step estimator from the Nonparametric Conditional Density Estimation paper. Discussed future goals and objectives with Roger sir. Besides optimizing the RKHS method further with the Dual Tree method, we need to integrate the RKHS method into the ProbSpace module.

Implemented an optimization metric for the RKHS calculation phase. It lets us skip calculations that have very little effect on the answer: a sample is skipped when it lies farther than a bound (here 3*sigma) from the query point. This reduced the calculation time significantly:


  • “OR optimization”, bound = 3*sigma

    • Average Error: 1.3968811031293926 

    • Max error: 5.431449902740937 

    • Time: 153.31915044784546



  • “AND optimization”, bound = 3*sigma

    • Average Error: 1.561447953266133 

    • Max error: 5.426098406537719 

    • Time: 38.010313987731934



For reference:

  • No Optimization

    • Average Error: 1.3966743172847627 

    • Max error: 5.431449933619589 

    • Time: 719.6718242168427
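The sketch below makes the two filtering conditions concrete. It computes a product-Gaussian-kernel estimate of the joint density at a query point (x, y) and can optionally skip samples that fall outside a k*sigma bound. It is a minimal sketch with several assumptions: the function names, the unnormalized Gaussian kernel, and the single shared bandwidth sigma are illustrative, not the actual ProbSpace/RKHS code.

```python
import numpy as np


def gaussian(u, sigma):
    """Unnormalized Gaussian kernel exp(-u^2 / (2 sigma^2)) (illustrative choice)."""
    return np.exp(-0.5 * (u / sigma) ** 2)


def joint_kde(X, Y, x, y, sigma, mode=None, k=3.0):
    """Hypothetical product-kernel joint density estimate at (x, y).

    mode=None  -> use every sample (no optimization)
    mode="and" -> keep sample i only if BOTH |X_i - x| and |Y_i - y| <= k*sigma
    mode="or"  -> keep sample i if AT LEAST ONE coordinate is within k*sigma
    """
    bound = k * sigma
    near_x = np.abs(X - x) <= bound
    near_y = np.abs(Y - y) <= bound
    if mode is None:
        keep = np.ones(len(X), dtype=bool)
    elif mode == "and":
        keep = near_x & near_y
    elif mode == "or":
        keep = near_x | near_y
    else:
        raise ValueError("mode must be None, 'and' or 'or'")
    # Only the kept samples contribute; dropped terms are treated as zero.
    w = gaussian(X[keep] - x, sigma) * gaussian(Y[keep] - y, sigma)
    return w.sum() / len(X)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=2000)
    Y = X + rng.normal(scale=0.5, size=2000)
    for mode in (None, "or", "and"):
        print(mode, joint_kde(X, Y, 0.0, 0.0, sigma=0.3, mode=mode))
```

The AND filter keeps a subset of the samples that the OR filter keeps, so it skips more work and loses more accuracy. Each sample dropped by OR has both coordinates beyond 3*sigma, so its weight is below exp(-9). A sample dropped by AND only needs one coordinate beyond the bound, so its weight can be as large as exp(-4.5). This matches the pattern above, where the OR error is almost identical to the unoptimized error. This vectorized sketch still builds the masks over every sample. In a loop-based implementation, the saving would come from skipping the kernel evaluation itself.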


Tuesday & Wednesday:

Implemented the 2-Step estimator. Below is a comparison of the 2-Step and 1-Step methods. (Note: we use the “AND optimization” for filtering, i.e. the condition abs(r1.X[i] - x) <= r1bound and abs(r2.X[i] - y) <= r2bound.)

Method | Average Error | Max Error | Time taken
--- | --- | --- | ---
1-Step Estimator | 1.561447953266133 | 5.426098406537719 | 27.177362203598022
2-Step Estimator | 1.4623417529653346 | 6.610000000001165 | 133.49220895767212
As seen, the 2-Step estimator does give a lower average error, but at the cost of a higher max error and a much longer running time. Next I tried to produce the same comparison with the “OR optimization” (the same condition with "or" in place of "and"), but this takes far too long for the 2-Step estimator. I therefore settled on the “OR optimization” for the 1-Step estimator and the “AND optimization” for the 2-Step estimator.

Method | Average Error | Max Error | Time taken
--- | --- | --- | ---
1-Step (OR) | 1.3968811031293926 | 5.431449902740937 | 113.14020037651062
2-Step (AND) | 1.4623417529653346 | 6.610000000001165 | 131.32312035560608
As seen, the 1-Step estimator with OR optimization beats the 2-Step estimator on every measure: lower average error, lower max error, and less computation time. The 2-Step method's saving grace is that it may reach higher accuracy with some changes to the optimization metric, but its running time makes it hard to justify.
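The sketch below contrasts the two estimators as I understand their structure from Hansen's paper. The 1-Step estimator smooths Y directly around x. The 2-Step estimator first estimates the conditional mean m(x), then smooths the residuals e_i = Y_i - m(X_i) and shifts the result by m(x). Assumptions: Nadaraya-Watson weights, Gaussian kernels, and the bandwidths h1 and h2 are illustrative choices, and all function names are hypothetical. The RKHS implementation may differ in its kernels and regression step.

```python
import numpy as np


def gauss(u, h):
    """Normalized Gaussian kernel K_h(u) with bandwidth h."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2.0 * np.pi))


def nw_weights(X, x, h1):
    """Nadaraya-Watson weights w_i(x) = K(X_i - x) / sum_j K(X_j - x)."""
    k = gauss(X - x, h1)
    return k / k.sum()


def one_step_density(X, Y, x, ys, h1, h2):
    """1-Step sketch: f(y|x) ~= sum_i w_i(x) K_h2(y - Y_i) for every y in ys."""
    ys = np.atleast_1d(ys)
    w = nw_weights(X, x, h1)
    return gauss(ys[:, None] - Y[None, :], h2) @ w


def two_step_density(X, Y, x, ys, h1, h2):
    """2-Step sketch: regress first, then smooth the residuals."""
    ys = np.atleast_1d(ys)
    # Step 1: kernel-regression estimate of m(X_i) at every sample (O(n^2) work).
    m_at_X = np.array([nw_weights(X, xi, h1) @ Y for xi in X])
    resid = Y - m_at_X
    w = nw_weights(X, x, h1)
    m_x = w @ Y  # estimated conditional mean at the query point
    # Step 2: conditional density of residuals, evaluated at e = y - m(x).
    return gauss((ys[:, None] - m_x) - resid[None, :], h2) @ w
```

The extra first stage needs a regression fit at every training point. That is one plausible reason the 2-Step timings above are longer, though the measured numbers come from the real implementation, not from this sketch.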


Thursday:

Following a suggestion from Roger sir, we found a shortcut for calculating E(Y|X=x), taken from part of the 2-Step estimator process. It avoids the full procedure of first computing P(Y=y|X=x), which is what we were doing before. The results were excellent: this is the highest accuracy we have seen so far, and it is even faster than the ProbSpace method!


Method | Average Error | Max Error | Time taken
--- | --- | --- | ---
ProbSpace | 10.97397258530662 | 74.3476459734108 | 7.661393880844116
RKHS (sigma = 0.24) | 1.2084530592994442 | 19.74293374241948 | 2.8335468769073486
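The sketch below shows why such a shortcut can exist. It assumes a kernel-smoothed conditional density with symmetric, normalized kernels in y, which is an assumption about how the earlier path worked. Under that assumption, integrating y * f(y|x) over a grid of y values collapses to a kernel-weighted average of the Y_i. The reason is that the integral of y * K_h(y - Y_i) dy equals Y_i exactly. All names are hypothetical, and the actual shortcut in our code may differ in its details.

```python
import numpy as np


def gauss(u, h):
    """Normalized Gaussian kernel K_h(u) with bandwidth h."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2.0 * np.pi))


def nw_weights(X, x, h1):
    """Nadaraya-Watson weights w_i(x) = K(X_i - x) / sum_j K(X_j - x)."""
    k = gauss(X - x, h1)
    return k / k.sum()


def expectation_via_density(X, Y, x, h1, h2, ys):
    """Slow path: build f(y|x) on a uniform y-grid, then integrate y * f(y|x) dy."""
    w = nw_weights(X, x, h1)
    f = gauss(ys[:, None] - Y[None, :], h2) @ w
    dy = ys[1] - ys[0]
    return np.sum(ys * f) * dy


def expectation_shortcut(X, Y, x, h1):
    """Shortcut: E(Y|X=x) ~= sum_i w_i(x) * Y_i, with no density or y-grid."""
    return nw_weights(X, x, h1) @ Y
```

Both functions return the same value up to floating-point and grid error. The shortcut costs O(n) per query, while the density path costs O(n * grid size), so it plausibly explains the speedup. The timings in the table above come from the actual implementation, not from this sketch.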
Friday:

Started studying the Fast Nonparametric Conditional Density Estimation [2] paper in order to implement the dual-tree optimization technique.

Upcoming Week Plans:

Explore the Dual-Tree approximation from the Fast Nonparametric Conditional Density Estimation paper. 

References:

[1] Hansen, Bruce E. (2004) "Nonparametric Conditional Density Estimation", University of Wisconsin Department of Economics

[2] Holmes, Michael P.; Gray, Alexander G.; Isbell, Charles Lee Jr. (2007) "Fast Nonparametric Conditional Density Estimation", URL: https://arxiv.org/ftp/arxiv/papers/1206/1206.5278.pdf




