
Showing posts from June, 2021

Week 4: Two Step Estimator

This week I implemented the 2-step estimator from this paper (Nonparametric Conditional Density Estimation) and compared its results with the 1-step estimator I had implemented the previous week from the same paper. I also found a shortcut method for calculating E(Y|X=x) that is not only more accurate than the existing ProbSpace method, but also much faster.

Monday: Studied the 2-step estimator method from the Nonparametric Conditional Density Estimation paper. Had a discussion with Roger sir about future goals and objectives: apart from further optimizing the RKHS method using the Dual Tree method, we have to integrate the RKHS method into the ProbSpace module. Implemented an optimization during the RKHS calculation phase that lets us skip calculations that have very little effect on the answer, which reduced the calculation time significantly: "Or Optimization", bound = 3*sigma, Average Error: 1.3968811031293926, Max error …
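The cutoff idea above can be illustrated with a minimal sketch (not the project's actual code): with a Gaussian kernel, points farther than 3*sigma from the query point contribute almost nothing to the kernel sum, so skipping them barely changes the answer. The function names and the toy data here are my own for illustration.

```python
import numpy as np

def gaussian_kernel(u, sigma):
    """Standard Gaussian kernel with bandwidth sigma."""
    return np.exp(-0.5 * (u / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def kernel_sum_bounded(x, X, sigma, bound_mult=3.0):
    """Kernel sum that skips points beyond bound_mult * sigma of x,
    mirroring the 'bound = 3*sigma' cutoff described above."""
    d = np.abs(X - x)
    mask = d <= bound_mult * sigma  # keep only points inside the bound
    return gaussian_kernel(d[mask], sigma).sum()

# Toy data: 10,000 standard-normal samples (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, 10_000)

full = gaussian_kernel(np.abs(X - 0.5), 0.2).sum()   # exact sum
fast = kernel_sum_bounded(0.5, X, 0.2)               # bounded sum
print(abs(full - fast) / full)                       # relative error is tiny
```

Because the Gaussian kernel is strictly positive, the bounded sum can only undershoot the full sum, and only by the mass outside the 3-sigma band.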

Week 3: Conditional Probability Implementation

Implemented conditional probability calculations using RKHS and compared them to the existing ProbSpace method. On the bright side, the RKHS method is more accurate as we move farther from the mean, but it is also much slower in comparison. Here is a day-wise summary of my progress throughout the week:

Monday: Studied the paper on Nonparametric Conditional Density Estimation. This paper is much easier to digest than the previous one we were working on. Clarified some doubts about the terminology and discussed the formulae with Roger sir during the weekly meeting.

f(y | x) = f(y, x) / f(x) = Σᵢ K_h2(x − Xᵢ) · K_h1(y − Yᵢ) / Σᵢ K_h2(x − Xᵢ)

This is the main formula we will be working with to calculate E(y|x), and we will later extend it to calculate E(Y|X).

Tuesday: Implemented the equation; scaling issues aside, the results show promise of being better than the existing method in the Prob.py script. Generated a dataset with Y = X². Shown here are the results for P(y= …
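The conditional density formula above can be sketched directly in a few lines. This is a minimal illustration assuming Gaussian kernels for K_h1 and K_h2; the function names, bandwidth values, and toy Y = X² dataset are mine, not the project's Prob.py code:

```python
import numpy as np

def K(u, h):
    """Gaussian kernel with bandwidth h."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2 * np.pi))

def cond_density(y, x, X, Y, h1, h2):
    """f(y|x) = sum_i K_h2(x - X_i) K_h1(y - Y_i) / sum_i K_h2(x - X_i)."""
    w = K(x - X, h2)
    return (w * K(y - Y, h1)).sum() / w.sum()

def cond_expectation(x, X, Y, h2):
    """E[Y|X=x] from the same weights: a kernel-weighted average of Y."""
    w = K(x - X, h2)
    return (w * Y).sum() / w.sum()

# Toy Y = X^2 dataset with a little noise (illustrative only).
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, 5000)
Y = X ** 2 + rng.normal(0, 0.1, 5000)

print(cond_expectation(1.0, X, Y, h2=0.1))  # close to 1.0, since E[Y|X=1] = 1
```

Note how E[Y|X=x] drops out of the same denominator weights, which is what makes the shortcut cheap once the kernel evaluations K_h2(x − Xᵢ) are in hand.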

Week 2: Working with ProbSpace Module

This week I studied the existing Prob.py module, which handles the calculation of conditional probabilities, probability distributions, and other functionality. I also worked on understanding the paper on Hilbert Space Embeddings of Conditional Distributions. Here is a day-wise summary of my progress throughout the week:

Monday: Continued to experiment with the rkhsTest2.py script. Implemented a sawtooth kernel and evaluation function, and discussed the results with Roger sir in the weekly meeting. As it turns out, it doesn't matter much which kernel function we use: given enough data points and time, we can derive an accurate probability distribution curve. Choosing a kernel that fits the dataset just lets us reach the desired accuracy much faster and with far fewer data points.

Tuesday: Studied the paper (Hilbert Space Embeddings of Conditional Distributions with Applications to Dynamical Systems) to learn how to extract conditional probability d…
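The kernel-insensitivity observation above can be demonstrated with a small sketch: two kernels of quite different shape give nearly the same density estimate once enough samples are available. I use a triangular kernel as a stand-in for the sawtooth (the actual sawtooth kernel lives in rkhsTest2.py, which I don't reproduce here); the rest is illustrative:

```python
import numpy as np

def gaussian(u):
    return np.exp(-0.5 * u * u) / np.sqrt(2 * np.pi)

def triangular(u):
    # A differently shaped kernel, standing in for the sawtooth kernel.
    return np.maximum(1 - np.abs(u), 0)

def kde(x, data, h, kernel):
    """Kernel density estimate at point x with bandwidth h."""
    return kernel((x - data) / h).mean() / h

# 20,000 standard-normal samples; true density at 0 is 1/sqrt(2*pi) ≈ 0.3989.
rng = np.random.default_rng(2)
data = rng.normal(0, 1, 20_000)

est_g = kde(0.0, data, 0.2, gaussian)
est_t = kde(0.0, data, 0.2, triangular)
print(est_g, est_t)  # both land close to 0.3989
```

With fewer samples the two estimates diverge more, which matches the observation that a well-matched kernel mainly buys accuracy at small sample sizes.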

Week 1: RKHS Kernel Implementation

My first long-term goal is to improve the existing conditional probability calculation module by using an RKHS mapping to get rid of the curse-of-dimensionality problem that has been haunting the traditional method. Here is a day-wise summary of my progress throughout the week:

Monday: My internship officially started on this day. I had a one-on-one meeting with Roger sir about this week's assignment, which is to implement an RKHS kernel for predicting a probability distribution function, given just a set of data points (1000 in my case) drawn from that distribution. This will bolster my understanding of how RKHS works and will come in handy when implementing more complex kernels in the future.

Tuesday: Implemented the RKHS kernel and the evaluation function. A dataset for the arbitrary function X = logistic(-2,1) if choice([0,1]) else logistic(2,1) was created using the synthDataGen.py script and used as the basis for the evaluation function. Made …
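The evaluation setup above can be sketched without synthDataGen.py: sample 1000 points from the equal-weight mixture of logistic(-2,1) and logistic(2,1), fit a kernel density estimate, and compare against the known mixture density. The helper names and the Gaussian kernel choice here are my own assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
# X = logistic(-2,1) if choice([0,1]) else logistic(2,1), per the post.
comp = rng.integers(0, 2, n)
X = np.where(comp == 1, rng.logistic(-2, 1, n), rng.logistic(2, 1, n))

def kde_pdf(x, data, h=0.5):
    """Gaussian-kernel density estimate evaluated at each point of x."""
    u = (x - data[:, None]) / h
    return (np.exp(-0.5 * u * u) / (h * np.sqrt(2 * np.pi))).mean(axis=0)

def true_pdf(x):
    """Equal-weight mixture of the two logistic densities."""
    def logistic_pdf(x, mu, s):
        z = np.exp(-(x - mu) / s)
        return z / (s * (1 + z) ** 2)
    return 0.5 * logistic_pdf(x, -2, 1) + 0.5 * logistic_pdf(x, 2, 1)

# Evaluate the fit on a grid spanning both modes.
grid = np.linspace(-6, 6, 25)
err = np.abs(kde_pdf(grid, X) - true_pdf(grid)).mean()
print(err)  # small mean absolute error across the grid
```

This kind of grid comparison against the known generating density is one simple way to score the fitted curve; the project's actual evaluation function may differ.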

Week 0: An Introduction

What is Causality?

Causality (also referred to as causation, or cause and effect) is the influence by which one event, process, state or object (a cause) contributes to the production of another event, process, state or object (an effect), where the cause is partly responsible for the effect and the effect is partly dependent on the cause. That is the definition according to Wikipedia. In simpler terms, if a variable b is dependent on another variable a, then we can say a causes b, or that a is a causal factor of b.

Why Study Causality?

Traditional statistics deals only with the raw values of variables and attributes; it does not tell us about the causal relationships between factors in a large dataset. ML algorithms in their current state can be biased, suffer from a relative lack of explainability, and are limited in their ability to generalize the patterns they find in a training dataset across multiple applications. Current machine learning approaches also ten…