Week 0: An Introduction
What is Causality?
Causality (also referred to as causation, or cause and effect) is the influence by which one event, process, state or object (a cause) contributes to the production of another event, process, state or object (an effect), where the cause is partly responsible for the effect and the effect is partly dependent on the cause. That is the definition according to Wikipedia. In simpler terms, if a variable b depends on another variable a, in the sense that changing a would change b, then we can say that a causes b, or that a is a causal factor of b.
Why Study Causality?
Traditional statistics deals with the values of variables and the associations between them. On its own, it does not tell us about the causal relationships between factors in a large dataset. ML algorithms in their current state can be biased, offer relatively little explainability, and are limited in their ability to generalize the patterns they find in a training dataset to other applications.
Current machine learning approaches also tend to overfit the data: they try to learn the past perfectly instead of uncovering the real, causal relationships that will continue to hold over time. Because deep learning has focused on correlation rather than causation, the data alone cannot answer our questions once the problem moves beyond very narrow situations.
"Correlation does not imply causation" is a common phrase used to describe the pitfalls of regular statistics. In many use cases, correlation is enough. However, causal inference enables us to go one step further and work out what would happen if we changed some of the underlying assumptions in our model, for example by intervening to set a variable to a particular value. Understanding cause and effect would also make existing AI systems smarter and more efficient. For instance, a robot that understands that dropping things causes them to break would not need to toss dozens of vases onto the floor to see what happens to them.
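To make the phrase concrete, here is a small simulation. It is a hypothetical toy model of our own, not code from the Causality Bundle. A hidden variable Z drives both X and Y, while X has no effect on Y at all. In observational data X and Y come out strongly correlated. Under an intervention that sets X externally (often written do(X) in causal inference), that correlation disappears, because the only link between them ran through Z.

```python
import math
import random

def corr(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def simulate(n=20_000, intervene=False, seed=0):
    """Toy model (assumed for illustration): Z -> X and Z -> Y, with no X -> Y edge."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                 # hidden common cause
        if intervene:
            x = rng.gauss(0, 1)             # do(X): X is set externally and ignores Z
        else:
            x = 2.0 * z + rng.gauss(0, 1)   # observational regime: Z causes X
        y = 3.0 * z + rng.gauss(0, 1)       # Y depends only on Z, never on X
        xs.append(x)
        ys.append(y)
    return xs, ys

obs_corr = corr(*simulate())
intv_corr = corr(*simulate(intervene=True))
print(f"observational  corr(X, Y) = {obs_corr:.2f}")   # close to 6 / sqrt(50), about 0.85
print(f"interventional corr(X, Y) = {intv_corr:.2f}")  # close to 0
```

The observational value follows from the model: Cov(X, Y) = 2 · 3 · Var(Z) = 6, Var(X) = 5 and Var(Y) = 10. A purely correlational learner would conclude that changing X changes Y, which the interventional run shows is false.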
What are my objectives?
The objective of this project is to improve the HPCC Causality Bundle (GitHub repo). We will study research papers and implement the latest methods to build higher-performance causal techniques (such as causal scan, independence testing and conditionalization). These methods will be tested on synthetic datasets generated from established sample models. Once a sufficient level of accuracy is reached, real-world datasets will be put to the test to derive a causal model.
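As a rough idea of what "independence testing" and "conditionalization" mean in practice, the sketch below uses one common textbook technique: a partial-correlation test with Fisher's z-transform, which assumes roughly linear, Gaussian relationships. This is an assumption for illustration only and is not necessarily the method the Causality Bundle implements. On synthetic data where Z causes both X and Y, X and Y look dependent on their own but become independent once we condition on Z.

```python
import math
import random

def corr(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of a single variable z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: (partial) correlation = 0, conditioning on k variables."""
    stat = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - k - 3)
    return math.erfc(abs(stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|stat|))

# Hypothetical synthetic model: Z -> X and Z -> Y, no direct X-Y edge.
rng = random.Random(1)
n = 5_000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [2.0 * zi + rng.gauss(0, 1) for zi in z]
y = [3.0 * zi + rng.gauss(0, 1) for zi in z]

p_marginal = fisher_z_pvalue(corr(x, y), n, 0)              # X and Y: dependent
p_given_z = fisher_z_pvalue(partial_corr(x, y, z), n, 1)    # X and Y given Z
print(f"p-value, X vs Y:         {p_marginal:.3g}")
print(f"p-value, X vs Y given Z: {p_given_z:.3g}")
```

A tiny marginal p-value together with a large conditional one is the pattern a constraint-based causal discovery algorithm would read as "X and Y are linked only through Z".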
About HPCC Systems
HPCC Systems is an open source solution for big data insights that can be implemented by businesses of all sizes. With HPCC Systems, developers can design applications with Big Data at their core, enabling businesses to better analyze and understand data at scale, improving business time to results and decisions. HPCC Systems offers a consistent data-centric programming language, two processing platforms and a single, complete end-to-end architecture for efficient processing.