Week 9: Filter-RFF and FPROB

This week I tried extending the Filter-RKHS method to form a Filter-RFF method. Unfortunately, Filter-RFF does not produce the same quality of results as Filter-RKHS. Later, I implemented FPROB (Filter-Probability) and tested it against JPROB (Joint Probability, developed by Roger sir) and the existing ProbSpace method.


Monday:

Tried to extend the same principle behind Filter-RKHS to form Filter-RFF. Here is a comparison between Filter-RKHS vs. Filter-RFF vs. ProbSpace:



As shown, it does not work very well. As expected, it is faster than the RKHS variant (shown in the table below). But since filtration already leaves us with a small number of data points (<1000), the RFF method is ineffective here. My conclusion is that the RKHS method is the better choice at low data counts, and the RFF method is worth using only when we are dealing with massive data sizes and need to speed up calculations.


Conditional     Method        Time taken (s)   R2       Avg. error
P(Z|X,Y=0)      Filter-RFF    0.0034           0.2888   0.5022
                Filter-RKHS   0.2022           0.9898   0.0461
                ProbSpace     9.1249           0.9706   0.0856
P(Z|Y,X=0)      Filter-RFF    0.0042           0.5686   0.4588
                Filter-RKHS   0.3066           0.9988   0.0275
                ProbSpace     8.9933           0.9961   0.0403
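To illustrate why RFF trades accuracy for speed, here is a minimal sketch of the random Fourier features idea: approximating a Gaussian kernel evaluation with a dot product of D-dimensional feature maps. This is illustrative only; the names (`rff_features`, `sigma`, `D`) are mine and not taken from the Filter-RFF code.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_features(x, w, b, D):
    """Map scalar samples x to D-dimensional random Fourier features."""
    return np.sqrt(2.0 / D) * np.cos(np.outer(x, w) + b)

sigma = 1.0
D = 2000                                   # more features -> better approximation
w = rng.normal(0.0, 1.0 / sigma, size=D)   # frequencies drawn from N(0, 1/sigma^2)
b = rng.uniform(0.0, 2.0 * np.pi, size=D)  # random phase offsets

x = np.array([0.0, 0.5])
z = rff_features(x, w, b, D)
approx = z[0] @ z[1]                           # RFF estimate of k(0.0, 0.5)
exact = np.exp(-(0.5) ** 2 / (2 * sigma**2))   # true Gaussian kernel value
print(approx, exact)
```

The approximation error shrinks like 1/sqrt(D), which is why RFF pays off only once the number of data points is large enough that exact kernel evaluations dominate the runtime.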




Tuesday:

During Monday’s discussion, Roger sir demonstrated what he had been working on: the JPROB (Joint Probability) module, which can condition on multiple variables. My objective is to add FPROB (the Filter-RKHS method) to that script so that we can compare JPROB vs. ProbSpace vs. FPROB.

To that end, I studied the code behind cprobPlot2D.py, cprobPlot3D.py and cprobEval.py, ran each of them under different conditions, and gained an understanding of the overall functionality.
There is also an updated RKHS module, RKHSmod/rkhsMV.py (RKHS Multi-Variate), which, as the name implies, handles conditioning on multiple variables. It uses a multivariate Gaussian kernel instead of the regular Gaussian kernel we have been using for two variables. With just two variables, however, it behaves exactly like the RKHSmod/rkhs.py class we have been using. So going forward, I will use the newer class for all my RKHS methods.
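The reduction to the two-variable case is easy to see if the multivariate kernel is a product of per-dimension Gaussians. The sketch below assumes that form with one bandwidth per dimension; the function names are illustrative and not the actual rkhsMV.py API.

```python
import numpy as np

def gaussian_kernel_mv(x1, x2, sigmas):
    """Product of 1-D Gaussian kernels, one per dimension (assumed form)."""
    x1, x2, sigmas = map(np.asarray, (x1, x2, sigmas))
    z = (x1 - x2) / sigmas
    return float(np.prod(np.exp(-z**2 / 2.0) / (sigmas * np.sqrt(2.0 * np.pi))))

def gaussian_kernel_1d(x1, x2, sigma):
    """The univariate Gaussian kernel we have been using so far."""
    z = (x1 - x2) / sigma
    return np.exp(-z**2 / 2.0) / (sigma * np.sqrt(2.0 * np.pi))

# With a single dimension, the multivariate kernel collapses to the 1-D one:
print(gaussian_kernel_mv([0.3], [0.9], [0.5]))
print(gaussian_kernel_1d(0.3, 0.9, 0.5))
```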

Wednesday & Thursday:

Added my FPROB method to cprobPlot3D.py (here is my code). Here are the 3-D plots for two conditioning variables (of the form P(Z|X,Y)) with 1000 data points, 5 tries:


At low data counts, both JPROB and FPROB do better than ProbSpace. The R2 calculations and timing comparisons are in the table below. However, these numbers were taken before JPROB started using the shortcut kernel formula to find E(Z|Y=y,X=x); at the time it was still taking the mean over a range of P(Z=z|Y=y,X=x) calculations to obtain the expected value. I suspect that after the change, JPROB performs better than FPROB.


dims, datSize, tries = 3, 200, 5

Method       Average R2   Min. R2   Max. R2   Time taken (s)
JPROB        0.6747       0.5005    0.7789    0.0084
ProbSpace    0.5855       0.5069    0.6469    0.0076
FPROB        0.7681       0.7547    0.8046    0.0089

dims, datSize, tries = 3, 1000, 5

Method       Average R2   Min. R2   Max. R2   Time taken (s)
JPROB        0.8192       0.7567    0.8760    0.0085
ProbSpace    0.7697       0.7513    0.8117    0.0051
FPROB        0.8162       0.7846    0.8367    0.0033
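To make the shortcut concrete, here is a toy 1-D sketch of the two ways of computing the expected value mentioned above: the kernel-weighted average (shortcut) versus the mean over a grid of conditional probability values. The data, bandwidth, and function names are all my own illustration, not JPROB's actual code.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=2000)
Z = 2.0 * X + rng.normal(scale=0.1, size=2000)   # toy data: Z ~ 2X + noise
sigma = 0.2

def expectation_shortcut(x, X, Z, sigma):
    """Shortcut form: kernel-weighted average, sum(w_i * z_i) / sum(w_i)."""
    w = np.exp(-((X - x) / sigma) ** 2 / 2.0)
    return np.sum(w * Z) / np.sum(w)

def expectation_by_integration(x, X, Z, sigma):
    """Slower form: mean over a grid of P(Z=z | X=x) values."""
    w = np.exp(-((X - x) / sigma) ** 2 / 2.0)
    zs = np.linspace(Z.min(), Z.max(), 400)
    # Kernel density of Z at each grid point, weighted by closeness of X to x.
    pz = np.array([np.sum(w * np.exp(-((Z - z) / sigma) ** 2 / 2.0)) for z in zs])
    pz /= pz.sum()
    return np.sum(zs * pz)

# Both estimates of E(Z | X=0.5) should land near 2 * 0.5 = 1.0.
print(expectation_shortcut(0.5, X, Z, sigma))
print(expectation_by_integration(0.5, X, Z, sigma))
```

The two give essentially the same answer, but the shortcut is one pass over the data while the integration version repeats the kernel sum at every grid point, which is where the speedup comes from.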


Friday:

Started working on extending FPROB. Suppose we are conditioning on N variables: we might filter on N-1 variables and calculate a single conditional on the remaining variable, or, as Roger sir suggested, we could filter on all N variables and fit the remaining data points with a univariate kernel. It would be interesting to compare these results. Another interesting question is what percentage of the variables we should filter (performing JPROB on the rest) to obtain the best results.
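As a rough sketch of the "filter on all conditioning variables" variant described above: keep only the rows where every conditioning variable is within a tolerance of its target value, then estimate the conditional from the survivors. The tolerance, data, and function names are assumptions for illustration, not the actual FPROB code.

```python
import numpy as np

def filter_then_expect(data, cond, target_col, tol=0.1):
    """data: dict of column name -> np.array; cond: {column: value}.
    Filter rows where every conditioning column is within tol of its target,
    then return the mean of target_col over the surviving rows."""
    mask = np.ones(len(data[target_col]), dtype=bool)
    for col, val in cond.items():
        mask &= np.abs(data[col] - val) <= tol
    z = data[target_col][mask]
    if z.size == 0:
        return None          # nothing survived filtration; tol was too tight
    return float(np.mean(z))  # a univariate kernel over z could replace this

rng = np.random.default_rng(2)
n = 20000
X = rng.uniform(-1, 1, n)
Y = rng.uniform(-1, 1, n)
Z = X + 2.0 * Y + rng.normal(scale=0.05, size=n)
data = {"X": X, "Y": Y, "Z": Z}

# E(Z | X=0.2, Y=-0.4) should be near 0.2 + 2 * (-0.4) = -0.6
print(filter_then_expect(data, {"X": 0.2, "Y": -0.4}, "Z"))
```

Note the trade-off: each additional filtered variable shrinks the surviving sample multiplicatively, which is exactly why the best percentage of variables to filter is worth measuring.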


Upcoming week plans:

Create a new module, UPROB, with the same functionality as rkhsMV.py but encompassing both JPROB and FPROB, taking a parameter k such that we filter k (or k%) of the N variables and perform JPROB on the rest. So at k=0 it is just JPROB, and at k=N-1 it is the 2-D Filter-RKHS method I am currently using.
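The k-parameter semantics can be summarized in a small dispatch sketch. This is a hypothetical interface for the planned module (the function name and return strings are my assumptions about next week's design, not existing code):

```python
def uprob_condition(n_cond_vars: int, k: int) -> str:
    """Choose a strategy: filter k of the N conditioning variables,
    then run JPROB (multivariate RKHS) on the remaining N - k."""
    if not 0 <= k <= n_cond_vars - 1:
        raise ValueError("k must be in [0, N-1]")
    if k == 0:
        return "JPROB on all variables"            # pure JPROB
    if k == n_cond_vars - 1:
        return "2-D Filter-RKHS (current FPROB)"   # filter all but one
    return f"filter {k}, JPROB on {n_cond_vars - k}"

print(uprob_condition(3, 0))
print(uprob_condition(3, 2))
print(uprob_condition(3, 1))
```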






