Business objective driven data mining

A typical data mining project seeks to answer questions such as “which prospects will respond to offer X?”, “which credit applicants will repay their loans?” or “who will defect to the competition?” on the basis of databases containing examples of previous interactions. These questions define classification problems: find a model that discriminates between the two types of case (e.g., between prospects that respond and those that do not).

While this seems a perfectly reasonable translation of a business objective, the resulting models often do not perform optimally. In fact, the true business objective is often not simply to discriminate between two types of behaviour, but to identify the top 10% most likely responders, or to select as many loan applications as possible up to a certain exposure of capital. Such objectives require more fine-grained models, particularly when the problem owner wants to analyse the effects of varying the rate of selection for a particular model.

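For this reason, data mining practitioners turn to methods such as ROC curves, which show how a model trades true positives against false positives across all selection thresholds, to assess their models.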
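To make the idea concrete, the following minimal Python sketch builds a ROC curve from model scores and also measures the hit rate within a chosen top fraction of cases, which is closer to an objective such as "the top 10% most likely responders". The function names and the toy scores and labels are invented for illustration only; they are not taken from the project.

```python
def rank_by_score(scores, labels):
    """Pair each case with its label and sort from highest to lowest score."""
    return sorted(zip(scores, labels), key=lambda pair: -pair[0])


def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points for every threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = rank_by_score(scores, labels)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all cases tied at this score have been counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve (general discrimination quality)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def hit_rate_in_top(scores, labels, fraction):
    """Share of positive cases among the top `fraction` of cases by score."""
    k = max(1, round(fraction * len(scores)))
    ranked = rank_by_score(scores, labels)
    return sum(label for _, label in ranked[:k]) / k


# Toy data (invented): ten prospects, 1 = responded, 0 = did not respond.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

curve = roc_points(scores, labels)
auc = area_under_curve(curve)
top_20 = hit_rate_in_top(scores, labels, 0.20)
top_50 = hit_rate_in_top(scores, labels, 0.50)
```

The area under the curve summarises discrimination over all thresholds, whereas the hit rate in a top fraction looks only at the part of the ranking that a selection campaign would actually use. Two models with the same area can differ considerably on the second measure.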

Many machine learning models, from decision trees to support vector machines (SVMs), can in principle deliver the kind of graded results this requires. However, the methods used to develop such models focus on discriminating as well as possible in general, not on particular business objectives such as cherry picking or a smooth gradient in a particular part of the ROC curve.

This project aims to develop optimisation methods for machine learning approaches such as SVMs that do allow for such objectives. Evolutionary Algorithms, for instance, are general purpose optimisers that offer a promising avenue of attack, but tailor-made methods that take the underlying mathematics into account will equally be considered.
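As a rough illustration of how an evolutionary algorithm can optimise a business objective directly, the sketch below uses a very simple (1+1) evolution strategy to tune the weights of a linear scoring model so that the hit rate in the top 10% of cases is as high as possible. Everything here is an assumption made for the example: the linear scorer stands in for a model such as an SVM, the data are synthetic, and the function names and parameter values are hypothetical. It is not the method developed in the project.

```python
import random


def top_fraction_hit_rate(weights, X, y, fraction=0.1):
    """Objective: share of positives among the top `fraction` of cases by score."""
    scores = [sum(w * x for w, x in zip(weights, row)) for row in X]
    k = max(1, round(fraction * len(X)))
    ranked = sorted(range(len(X)), key=lambda i: -scores[i])
    return sum(y[i] for i in ranked[:k]) / k


def evolve(X, y, generations=200, sigma=0.3, seed=0):
    """(1+1) evolution strategy: mutate the weights, keep the child if it is no worse."""
    rng = random.Random(seed)
    parent = [rng.gauss(0, 1) for _ in X[0]]
    best = top_fraction_hit_rate(parent, X, y)
    for _ in range(generations):
        # Gaussian mutation of every weight.
        child = [w + rng.gauss(0, sigma) for w in parent]
        fitness = top_fraction_hit_rate(child, X, y)
        if fitness >= best:
            parent, best = child, fitness
    return parent, best


# Synthetic data (invented): two features, a noisy linear rule decides the label.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
y = [1 if row[0] - 0.5 * row[1] + data_rng.gauss(0, 0.5) > 1.0 else 0 for row in X]

_, start_fitness = evolve(X, y, generations=0)    # random initial weights only
weights, end_fitness = evolve(X, y, generations=200)
```

Because the objective is evaluated only through the ranking it produces, it does not need to be differentiable, which is one reason general purpose optimisers of this kind are attractive for objectives defined on a selected fraction of cases. The selection step never accepts a worse child, so the final fitness cannot fall below the starting fitness.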

For a further description of the project, please visit