Posts

Showing posts from July, 2021

Week 7: Getting started with Filter-RKHS

This week I revisited the RKHS method to confirm that we are on the right track and got started with the implementation of the Filtered-RKHS method. Here is a day-wise summary of my progress throughout the week:

Monday & Tuesday: Analysed the RFF code to reduce the time taken for calculations. Implemented an RFF calculation method according to the formula described in this article, where z(x)^T z(y) is the equivalent calculation to k(x, y), the basic kernel function. The kernel techniques we employ use an iterative approach instead of being performed through matrix operations; as a result, they involve calling the k(x, y) function repeatedly over the length of the dataset to produce results. I could not figure out how replacing the RKHS kernel calculations with RFF calculations would speed things up. If anything, it is going to be slower, because while k(x, y) is a straightforward calculation, the RFF method iterates over R randomly selected features.
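For context, here is a minimal sketch of the RFF approximation for the Gaussian (RBF) kernel, where the inner product of the feature maps approximates the kernel value, i.e. z(x)^T z(y) ≈ k(x, y). This is only an illustration of the standard Rahimi-Rechtт-style construction; the parameter names (R, sigma) and the NumPy implementation are my assumptions, not the project's actual code.

```python
import numpy as np

def rff_features(X, R=500, sigma=1.0, seed=0):
    """Map samples X of shape (n, d) to R random Fourier features z(X) of shape (n, R)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Frequencies sampled from the spectral density of the Gaussian kernel
    W = rng.normal(0.0, 1.0 / sigma, size=(d, R))
    # Random phases, uniform on [0, 2*pi)
    b = rng.uniform(0.0, 2 * np.pi, size=R)
    return np.sqrt(2.0 / R) * np.cos(X @ W + b)

# z(x)^T z(y) approximates k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))
X = np.random.randn(100, 3)
Z = rff_features(X)
K_approx = Z @ Z.T   # (100, 100) approximate Gram matrix in one matrix product
```

The point of the approximation is that, once Z is computed, the whole Gram matrix comes from a single matrix product rather than n² kernel calls; that benefit only materialises if the surrounding code is vectorised rather than iterative, which is exactly the concern raised above.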

Week 6: Implementation of RFF

This week I implemented the RFF method and compared results with the RKHS and ProbSpace methods. Here is a day-wise summary of my progress throughout the week:

Monday: Discussed some possible methods of conditioning on multiple variables with Roger sir during our weekly meet. It is surprisingly hard to find papers on the topic, and after many failed attempts to find one that could help with what we are trying to achieve, it seems we will have to implement this feature ourselves. We can take inspiration from how ProbSpace does it and extend it to kernel methods if possible; otherwise we can try a hybrid of the two.

Roger sir was also looking into using RKHS to rebuild a probability distribution curve from the filtered data points (the ProbSpace method of computing conditional probabilities is essentially filtering the data repeatedly on each conditional variable value, as sketched below). This could help us solve the dwindling data points problem that occurs upon repeated filtering. I studied some
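To make the filtering idea concrete, here is a rough sketch of conditioning-by-filtering: each condition shrinks the remaining sample, which is exactly the dwindling data points problem mentioned above. The interface shown (a pandas DataFrame and a tolerance eps) is my assumption for illustration, not ProbSpace's actual API.

```python
import pandas as pd

def filter_conditions(df, conditions, eps=0.1):
    """Keep rows where each conditioned variable is within eps of its target value."""
    for var, val in conditions.items():
        df = df[(df[var] - val).abs() <= eps]   # each pass discards more rows
    return df

def cond_prob(df, target, target_val, conditions, eps=0.1):
    """Estimate P(target ~ target_val | conditions) from the filtered sample."""
    sub = filter_conditions(df, conditions, eps)
    if len(sub) == 0:
        return float("nan")   # repeated filtering exhausted the data
    return ((sub[target] - target_val).abs() <= eps).mean()
```

With several conditioning variables, the surviving subset can shrink to a handful of points; rebuilding a smooth distribution over that subset with RKHS is the remedy being considered.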

Week 5: Random Features and Multiple Conditionals

This week I digested multiple papers; they are not all directly related to one another, but I believe they were still required to understand how to move forward with the project. Here is a day-wise summary of my progress throughout the week:

Monday: Studied the Fast Nonparametric Conditional Density Estimation paper. The paper details a method to calculate the ideal bandwidths for the kernel methods (sigma in the case of the Gaussian kernel). They have made use of dual-tree recursion to reduce the time complexity of the operation to O(n²). The previous best method, calculating integrated squared errors (ISE), was O(n³), so this method saves a lot of time without compromising accuracy when dealing with large datasets. However, calculating bandwidths is not a priority right now, so after discussion with Roger sir, I have shifted my attention towards extending our kernel technique to condition on multiple variables.

Tuesday: Revisited the Nonparametric Conditional Density E
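For readers unfamiliar with the bandwidth being tuned here, this small sketch shows the role sigma plays in a Gaussian kernel density estimate. It is purely illustrative of why bandwidth selection matters; it is not the paper's dual-tree algorithm, and the function names are my own.

```python
import numpy as np

def gaussian_kernel(u, sigma):
    """Gaussian kernel with bandwidth sigma."""
    return np.exp(-u**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def kde(x, data, sigma):
    """Kernel density estimate at point x; larger sigma gives a smoother estimate."""
    return gaussian_kernel(x - data, sigma).mean()

data = np.random.randn(1000)
# The bandwidth controls the bias/variance trade-off of the estimate,
# which is why finding the "ideal" sigma is an O(n^2)-or-worse search problem.
print(kde(0.0, data, sigma=0.3))
```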