Predicting an answer requires historical data. An algorithm is applied to that data to identify patterns, and a mathematical formula then calculates the probability of a result based on the data provided. Using customer churn as our example (i.e., will a customer cancel: “yes” or “no”), we’ll walk through how an algorithm works and the process of creating it.
First, you load a set of data that contains the key information leading to a predicted answer (in this example, “did they cancel?”). You then identify which data most likely predicts the answer. Data that can cause noise in the calculation, such as name or email address, is removed from the data set, while data that might predict cancellation remains. This is typically key demographic or psychographic information: things like age, gender, annual income, and activity can all be used to predict cancellation.
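As a minimal sketch of this first step, the snippet below drops the noisy columns and separates the remaining predictors from the answer. The records, the field names (`logins_last_30d`, `cancelled`, and so on), and the helper `split_features_and_label` are all made up for illustration.

```python
# Hypothetical customer records; every field name and value is illustrative.
customers = [
    {"name": "Ana", "email": "ana@example.com", "age": 34,
     "annual_income": 52000, "logins_last_30d": 2, "cancelled": "yes"},
    {"name": "Ben", "email": "ben@example.com", "age": 51,
     "annual_income": 87000, "logins_last_30d": 19, "cancelled": "no"},
]

NOISE_COLUMNS = {"name", "email"}   # identifiers that add noise, not signal
LABEL_COLUMN = "cancelled"          # the answer we want to predict


def split_features_and_label(record):
    """Return (features, label) with noise columns removed and label as 0/1."""
    features = {
        key: value
        for key, value in record.items()
        if key not in NOISE_COLUMNS and key != LABEL_COLUMN
    }
    label = 1 if record[LABEL_COLUMN] == "yes" else 0
    return features, label


# Unzip the (features, label) pairs into two parallel tuples.
features, labels = zip(*(split_features_and_label(r) for r in customers))
```

Encoding “yes” and “no” as 1 and 0 is a common convention, not a requirement; it simply makes the later counting steps easier.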
Second, you train the algorithm with your data. You provide it with both the data that predicts the answer and the answer itself (such as whether the customer did in fact cancel). This data is randomly split between a training data set and a test data set. The training data set is used to calibrate the algorithm. The test data set verifies the accuracy of the algorithm.
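The random split described above can be sketched in a few lines. The 80/20 ratio, the fixed seed, and the function name `train_test_split` are assumptions chosen for the example; the right ratio depends on how much data you have.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Ten made-up (features, label) pairs standing in for real customer data.
rows = [({"age": 20 + i}, i % 2) for i in range(10)]
train_rows, test_rows = train_test_split(rows)
```

Keeping the test rows completely out of training is what makes the later accuracy check meaningful: the algorithm is graded on customers it has never seen.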
Third, you test the algorithm to verify the results. This means that you evaluate the results to determine accuracy, for example how many were false positives and how many were false negatives. A false positive is when you predicted a customer would cancel and they did not. A false negative is when you predicted a customer would not cancel and they did. A ‘good’ prediction typically has an overall accuracy of 80-90%, which means the algorithm isn’t going to be perfect and your process needs to allow for it being wrong for a fraction of the results.
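To make the terms concrete, here is a small sketch that counts false positives and false negatives and computes overall accuracy. The two lists are invented test-set results (1 means “cancelled”, 0 means “did not cancel”), and the helper names are illustrative.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives, where 1 means 'cancelled'."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            counts["tp"] += 1   # predicted cancel, did cancel
        elif p == 1 and a == 0:
            counts["fp"] += 1   # predicted cancel, did not cancel
        elif p == 0 and a == 0:
            counts["tn"] += 1   # predicted stay, did stay
        else:
            counts["fn"] += 1   # predicted stay, but cancelled
    return counts


def accuracy(counts):
    """Share of all predictions that were correct."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total


# Made-up outcomes for ten test-set customers.
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

counts = confusion_counts(actual, predicted)
print(counts)            # {'tp': 3, 'fp': 1, 'tn': 5, 'fn': 1}
print(accuracy(counts))  # 0.8
```

In this made-up case one customer was wrongly flagged (a false positive) and one cancellation was missed (a false negative), giving 8 correct predictions out of 10.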
Fourth, once the algorithm meets a high enough accuracy rate, you roll it out through an API that can be used to predict the answer so that you can act accordingly. For example, you might extend a special promotion to a customer who is predicted to cancel, offering a reduced price if they purchase within 24 hours.
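The promotion example might look something like the sketch below. Here `predict_cancel_probability` is only a stand-in for a call to the deployed prediction API, returning invented scores from an invented rule; the 0.5 threshold and the offer fields are likewise assumptions.

```python
from datetime import datetime, timedelta


def predict_cancel_probability(features):
    """Stand-in for the deployed model's API; returns a made-up score."""
    return 0.9 if features["logins_last_30d"] < 3 else 0.1


def build_offer(features, now, threshold=0.5):
    """Return a 24-hour reduced-price offer if cancellation looks likely."""
    probability = predict_cancel_probability(features)
    if probability < threshold:
        return None  # customer is not predicted to cancel; no promotion
    return {
        "type": "reduced_price",
        "expires_at": now + timedelta(hours=24),  # purchase window from the text
    }


now = datetime(2024, 1, 1, 9, 0)
at_risk_offer = build_offer({"logins_last_30d": 1}, now)
loyal_offer = build_offer({"logins_last_30d": 20}, now)
```

Because some predictions will be wrong, the threshold is a business decision: lowering it reaches more at-risk customers but also gives discounts to customers who would have stayed anyway.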