Week 9: Filter-RFF and FPROB

This week I tried extending the Filter-RKHS method to form a Filter-RFF method. Unfortunately, Filter-RFF does not produce the same quality of results as Filter-RKHS. Later, I implemented FPROB (Filter-Probability) and tested it against JPROB (Joint Probability, developed by Roger sir) and the existing ProbSpace method.


Monday:

Tried to extend the same principle behind Filter-RKHS to form Filter-RFF. Here is a comparison between Filter-RKHS vs. Filter-RFF vs. ProbSpace:



As shown, it does not work very well. As expected, it is faster than the RKHS variant (shown in the table below). But since filtration already leaves us with a small number of data points (<1000), the RFF method is ineffective here. My conclusion is that the RKHS method is the better choice at low data counts, and the RFF method is worth using only when we are dealing with massive data sizes and need to speed up calculations.


Conditional     Method        Time taken (s)   R2       Avg. error
P(Z|X,Y=0)      Filter-RFF    0.0034           0.2888   0.5022
                Filter-RKHS   0.2022           0.9898   0.0461
                ProbSpace     9.1249           0.9706   0.0856
P(Z|Y,X=0)      Filter-RFF    0.0042           0.5686   0.4588
                Filter-RKHS   0.3066           0.9988   0.0275
                ProbSpace     8.9933           0.9961   0.0403
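To illustrate why RFF trades accuracy for speed, here is a minimal sketch of the random Fourier features idea: approximating a Gaussian kernel evaluation with a dot product of D-dimensional feature maps. This is illustrative only; the names (`rff_features`, `sigma`, `D`) are mine and not taken from the Filter-RFF code.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_features(x, w, b, D):
    """Map scalar samples x to D-dimensional random Fourier features."""
    return np.sqrt(2.0 / D) * np.cos(np.outer(x, w) + b)

sigma = 1.0
D = 2000                                   # more features -> better approximation
w = rng.normal(0.0, 1.0 / sigma, size=D)   # frequencies drawn from N(0, 1/sigma^2)
b = rng.uniform(0.0, 2.0 * np.pi, size=D)  # random phase offsets

x = np.array([0.0, 0.5])
z = rff_features(x, w, b, D)
approx = z[0] @ z[1]                           # RFF estimate of k(0.0, 0.5)
exact = np.exp(-(0.5) ** 2 / (2 * sigma**2))   # true Gaussian kernel value
print(approx, exact)
```

The approximation error shrinks like 1/sqrt(D), which is why RFF pays off only once the number of data points is large enough that exact kernel evaluations dominate the runtime.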




Tuesday:

During Monday’s discussion, Roger sir demonstrated what he had been working on: the JPROB (Joint Probability) module, which can condition on multiple variables. My objective is to add FPROB (the Filter-RKHS method) to that script so that we can compare JPROB vs. ProbSpace vs. FPROB.

To that end, I studied the code behind cprobPlot2D.py, cprobPlot3D.py and cprobEval.py, ran each of them under different conditions, and gained an understanding of the overall functionality.
There is also an updated RKHS module, RKHSmod/rkhsMV.py (RKHS Multi-Variate), which, as the name implies, handles conditioning on multiple variables. It uses a multivariate Gaussian kernel instead of the regular Gaussian kernel we have been using for two variables. With just two variables, however, it behaves exactly like the RKHSmod/rkhs.py class we have been using. So going forward, I will use the newer class for all my RKHS methods.
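The reduction to the two-variable case is easy to see if the multivariate kernel is a product of per-dimension Gaussians. The sketch below assumes that form with one bandwidth per dimension; the function names are illustrative and not the actual rkhsMV.py API.

```python
import numpy as np

def gaussian_kernel_mv(x1, x2, sigmas):
    """Product of 1-D Gaussian kernels, one per dimension (assumed form)."""
    x1, x2, sigmas = map(np.asarray, (x1, x2, sigmas))
    z = (x1 - x2) / sigmas
    return float(np.prod(np.exp(-z**2 / 2.0) / (sigmas * np.sqrt(2.0 * np.pi))))

def gaussian_kernel_1d(x1, x2, sigma):
    """The univariate Gaussian kernel we have been using so far."""
    z = (x1 - x2) / sigma
    return np.exp(-z**2 / 2.0) / (sigma * np.sqrt(2.0 * np.pi))

# With a single dimension, the multivariate kernel collapses to the 1-D one:
print(gaussian_kernel_mv([0.3], [0.9], [0.5]))
print(gaussian_kernel_1d(0.3, 0.9, 0.5))
```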

Wednesday & Thursday:

Added my FPROB method to cprobPlot3D.py (here is my code). Here are the 3-D plots for two conditioning variables (of the form P(Z|X,Y)) with 1000 data points, 5 tries:


At low data counts, both JPROB and FPROB do better than ProbSpace. The R2 calculations and timing comparisons are in the table below. However, these numbers were taken before JPROB started using the shortcut kernel formula to find E(Z|Y=y,X=x); at the time it was still taking the mean over a range of P(Z=z|Y=y,X=x) calculations to obtain the expected value. I suspect that after the change, JPROB performs better than FPROB.


dims, datSize, tries = 3, 200, 5

Method       Average R2   Min. R2   Max. R2   Time taken (s)
JPROB        0.6747       0.5005    0.7789    0.0084
ProbSpace    0.5855       0.5069    0.6469    0.0076
FPROB        0.7681       0.7547    0.8046    0.0089

dims, datSize, tries = 3, 1000, 5

Method       Average R2   Min. R2   Max. R2   Time taken (s)
JPROB        0.8192       0.7567    0.8760    0.0085
ProbSpace    0.7697       0.7513    0.8117    0.0051
FPROB        0.8162       0.7846    0.8367    0.0033
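To make the shortcut concrete, here is a toy 1-D sketch of the two ways of computing the expected value mentioned above: the kernel-weighted average (shortcut) versus the mean over a grid of conditional probability values. The data, bandwidth, and function names are all my own illustration, not JPROB's actual code.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=2000)
Z = 2.0 * X + rng.normal(scale=0.1, size=2000)   # toy data: Z ~ 2X + noise
sigma = 0.2

def expectation_shortcut(x, X, Z, sigma):
    """Shortcut form: kernel-weighted average, sum(w_i * z_i) / sum(w_i)."""
    w = np.exp(-((X - x) / sigma) ** 2 / 2.0)
    return np.sum(w * Z) / np.sum(w)

def expectation_by_integration(x, X, Z, sigma):
    """Slower form: mean over a grid of P(Z=z | X=x) values."""
    w = np.exp(-((X - x) / sigma) ** 2 / 2.0)
    zs = np.linspace(Z.min(), Z.max(), 400)
    # Kernel density of Z at each grid point, weighted by closeness of X to x.
    pz = np.array([np.sum(w * np.exp(-((Z - z) / sigma) ** 2 / 2.0)) for z in zs])
    pz /= pz.sum()
    return np.sum(zs * pz)

# Both estimates of E(Z | X=0.5) should land near 2 * 0.5 = 1.0.
print(expectation_shortcut(0.5, X, Z, sigma))
print(expectation_by_integration(0.5, X, Z, sigma))
```

The two give essentially the same answer, but the shortcut is one pass over the data while the integration version repeats the kernel sum at every grid point, which is where the speedup comes from.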


Friday:

Started working on extending FPROB. Suppose we are conditioning on N variables: we might filter on N-1 variables and calculate a single conditional on the remaining variable, or, as Roger sir suggested, we could filter on all N variables and fit the remaining data points with a univariate kernel. It would be interesting to compare these results. Another interesting question is what percentage of the variables we should filter (performing JPROB on the rest) to obtain the best results.
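As a rough sketch of the "filter on all conditioning variables" variant described above: keep only the rows where every conditioning variable is within a tolerance of its target value, then estimate the conditional from the survivors. The tolerance, data, and function names are assumptions for illustration, not the actual FPROB code.

```python
import numpy as np

def filter_then_expect(data, cond, target_col, tol=0.1):
    """data: dict of column name -> np.array; cond: {column: value}.
    Filter rows where every conditioning column is within tol of its target,
    then return the mean of target_col over the surviving rows."""
    mask = np.ones(len(data[target_col]), dtype=bool)
    for col, val in cond.items():
        mask &= np.abs(data[col] - val) <= tol
    z = data[target_col][mask]
    if z.size == 0:
        return None          # nothing survived filtration; tol was too tight
    return float(np.mean(z))  # a univariate kernel over z could replace this

rng = np.random.default_rng(2)
n = 20000
X = rng.uniform(-1, 1, n)
Y = rng.uniform(-1, 1, n)
Z = X + 2.0 * Y + rng.normal(scale=0.05, size=n)
data = {"X": X, "Y": Y, "Z": Z}

# E(Z | X=0.2, Y=-0.4) should be near 0.2 + 2 * (-0.4) = -0.6
print(filter_then_expect(data, {"X": 0.2, "Y": -0.4}, "Z"))
```

Note the trade-off: each additional filtered variable shrinks the surviving sample multiplicatively, which is exactly why the best percentage of variables to filter is worth measuring.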


Upcoming week plans:

Create a new module, UPROB, with the same functionality as rkhsMV.py but encompassing both JPROB and FPROB, taking a parameter k such that we filter k (or k%) of the N variables and perform JPROB on the rest. So at k=0 it is just JPROB, and at k=N-1 it is the 2-D Filter-RKHS method I am currently using.
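The k-parameter semantics can be summarized in a small dispatch sketch. This is a hypothetical interface for the planned module (the function name and return strings are my assumptions about next week's design, not existing code):

```python
def uprob_condition(n_cond_vars: int, k: int) -> str:
    """Choose a strategy: filter k of the N conditioning variables,
    then run JPROB (multivariate RKHS) on the remaining N - k."""
    if not 0 <= k <= n_cond_vars - 1:
        raise ValueError("k must be in [0, N-1]")
    if k == 0:
        return "JPROB on all variables"            # pure JPROB
    if k == n_cond_vars - 1:
        return "2-D Filter-RKHS (current FPROB)"   # filter all but one
    return f"filter {k}, JPROB on {n_cond_vars - k}"

print(uprob_condition(3, 0))
print(uprob_condition(3, 2))
print(uprob_condition(3, 1))
```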






