
Machine Learning at the Rubicon Project

Team member, Mohammad Sabah, discusses Machine Learning

Machine Learning is the branch of Computer Science, specifically Artificial Intelligence (AI), that studies systems that learn, i.e. systems that improve their performance with experience. It is about teaching a computer about the world: you observe the world, develop models that match your observations, teach a computer to learn these models, and finally the computer applies the learned models to the real world. Applications of Machine Learning (ML) are vast and diverse. Some examples include image recognition, fingerprint identification, weather prediction, medical diagnosis, game playing, text categorization, handwriting recognition, fraud detection, spam filtering, recommending articles/books/movies, solving calculus problems, driving a car … the list is endless. With the advent of the internet and internet advertising, ML is used to solve a whole slew of optimization problems that help improve the user, advertiser and publisher experience. ML draws techniques and algorithms from a diverse range of fields, including information theory, numerical optimization, control theory, natural language processing, neurobiology, computational complexity theory and linguistics.

HISTORY OF MACHINE LEARNING
The intellectual roots of ML (or more generally AI) go back a long time, with some concept of intelligent machines being found even in Greek mythology. However, it was the advent of computers after World War II that provided the real thrust and theoretical underpinnings to this rich field. I have listed below some pioneering events to help us get a good picture of how the field has evolved.
40’s: This decade marked the advent of computers and the foundations of formal decision-making theory laid down by von Neumann and Morgenstern.
50’s: John McCarthy coined the term ‘Artificial Intelligence’ in 1956. Also, Arthur Samuel came up with the first game-playing program, for checkers.
60’s: Pattern recognition became the prime emphasis in AI. In particular, instance-based methods, a.k.a. “this document has the same label as the most similar document”, became popular. The most important development, however, was the perceptron, or single-layer neural network; Marvin Minsky and Seymour Papert proved some important properties and limitations of the perceptron.
70’s: This decade saw the advent of expert systems (ad hoc rule-based systems) and decision trees (“I can classify a document by incrementally considering its properties”).
80’s: Advanced decision tree and rule learning methods were invented. The biggest hype was around Artificial (multi-layer) Neural Networks (ANN), supposed to be loosely based on networks of neurons in the brain. Loosely defined, an ANN is a method that extracts linear combinations of the inputs and models the output as a non-linear function of these combinations. Also, the focus in ML shifted to experimental methodology.
90’s: This decade marked major advances in all areas of ML. New techniques became popular, like Reinforcement Learning (RL), Inductive Logic Programming (ILP), and Bayesian Networks (BN). A new set of meta-algorithms, or ensembles, was developed that combine the results of multiple less-accurate models into a more accurate prediction, with boosting, bagging and stacking becoming the most popular methods. With the advent of the World Wide Web, text learning became a primary focus for ML.
2000’s: The pace of innovations and applications accelerated. Kernels and Support Vector Machines (SVM) became state-of-the-art methods to build highly accurate classifiers, regressors and rankers. Graphical models began to be used more widely. Applications of ML now extended to Transfer Learning, Sequence Labeling, Robotics and Computer Systems (debuggers, compilers). In a nutshell, this decade marked the solidifying of ML into an established science with a firm theoretical underpinning, and opened it up to unexplored areas.

DATA FLOW IN MACHINE LEARNING
As a machine learner, your goal is to use past data to make predictions or to summarize the data. But in order to do that effectively, the data has to pass through a number of stages. Here I have listed some high-level steps that most ML-based solutions take in order to achieve the desired outputs; a short code sketch after the list walks through several of them.
1) Data Preparation: This marks the beginning of the data flow. Raw data is collected from the domains of interest and ‘cleaned’ according to the problem at hand. A few questions need to be answered at this stage before proceeding:
a. Data-specific: Is the data lawfully at our disposal? Are there issues with user privacy?
b. Learner-specific: How do I de-noise the data? How do I scale the individual values?
c. Task-specific: How do I reconstruct hidden signals from observed ones?
2) Exploratory Data Analysis: This is carried out to learn about the data distribution, validate assumptions and formulate hypotheses. Examples of graphical techniques are box plots, histograms, scatter plots and Multi-Dimensional Scaling.
3) Feature Engineering: This is concerned with decomposing the observed signals into individual variables that are pertinent to the problem at hand. For example, in bio-informatics, you may want to decompose mass spectrometry signals into variables that are predictive for the detection of cancer, and that can be traced back to certain proteins.
4) Training: This is the most complicated step of ML. It involves coming up with the optimization objective for the problem at hand, and solving it using exact or approximate methods to attain the desired objective. Training assumes there is a loss function that measures system performance and is to be minimized. Based on the nature of the output, training is divided into two kinds:
a. Supervised: The goal is to learn patterns that reproduce a given output. In classification, the output is a categorical label and the loss function is based on the accuracy of prediction (e.g. spam vs. non-spam email). In regression, the output is a real number and the loss function is Mean Squared Error (e.g. predict tomorrow’s stock price).
b. Unsupervised: The goal is to look for structure in the data with no examples of the desired output, for example finding interesting groupings in the data or detecting outliers (a small clustering sketch follows the list).
5) Evaluation: This measures the performance of the system and provides an estimate of how the system will perform when deployed in the real world. Some example metrics are accuracy, sensitivity, specificity and Mean Squared Error.
6) Deployment: This uses the trained model to make predictions in the real world. Performance in the real world can be optionally fed back into the original data flow to tune the model parameters and improve overall performance.
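To make the flow above concrete, here is a minimal end-to-end sketch, assuming Python with scikit-learn and synthetic data; it is meant only to show how preparation, training, evaluation and deployment fit together, not to be a production pipeline.

# A minimal sketch of the data flow described above, on synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1) Data preparation: assemble a cleaned feature matrix X and labels y.
rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 5))                        # 1000 examples, 5 features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)         # hidden rule generating the labels

# 2/3) Exploration and feature engineering would happen here; as a stand-in,
#      standardize the features so they are on comparable scales.
scaler = StandardScaler().fit(X)
X = scaler.transform(X)

# 4) Training (supervised): hold out part of the data and fit a classifier.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# 5) Evaluation: estimate real-world performance on the held-out data.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 6) Deployment: apply the trained model to new, unseen examples.
new_example = scaler.transform(rng.normal(size=(1, 5)))
print("prediction:", model.predict(new_example)[0])

For the unsupervised case in step 4b, a similarly small sketch (continuing from the script above, again with synthetic data and with k-means as just one illustrative choice) groups unlabeled points into clusters and flags the points farthest from their cluster centers as candidate outliers.

from sklearn.cluster import KMeans

# Unsupervised: no labels, just look for structure (clusters) and outliers.
X_unlabeled = np.vstack([rng.normal(0, 1, size=(100, 2)),   # one blob of points
                         rng.normal(5, 1, size=(100, 2))])  # a second blob
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_unlabeled)
distances = np.linalg.norm(X_unlabeled - kmeans.cluster_centers_[kmeans.labels_], axis=1)
print("cluster sizes:", np.bincount(kmeans.labels_))
print("five most outlying points:", np.argsort(distances)[-5:])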

HOW ML IS USED AT RUBICON PROJECT
the Rubicon Project provides an excellent opportunity to apply ML. One, there is an enormous amount of rich data to analyze. Two, there are some really challenging problems around prediction and forecasting that need to be solved. I will mention just two such high-level areas to illustrate, but in reality the list is long and ever-expanding. One area is the prediction of the performance of advertiser campaigns, or click-through rate (CTR) prediction. The target/dependent variable is the performance of the advertisement, given features/independent variables such as the location on the page, the size of the advertisement and the content of the advertisement. This is a critical component required to set the right expectations around pricing and yield for advertiser campaigns. Another challenging area is inferring the demographics and interests of the user to improve the efficacy of targeting him/her with the right advertisements. For example, merely knowing the age and gender of the person goes a long way toward optimizing the performance of the ad campaign and improving the user experience. On top of that, if you can infer some interests of the user, e.g. “auto enthusiast” or “in the market for a Toyota Camry 2007”, it gives enormous insight into the prospective buying interest of the user and helps drive both user experience and ROI for the advertisers.
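To illustrate how a CTR model of this kind might be framed, here is a minimal, hypothetical sketch in Python with scikit-learn. The feature names (ad_size, page_position, site_category) and the handful of example impressions are invented purely for illustration and do not describe Rubicon Project’s actual features or models.

# Hypothetical sketch: CTR prediction as supervised classification.
# Features and data below are made up purely for illustration.
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Each impression is described by categorical features; the label is 1 if
# the advertisement was clicked, 0 otherwise.
impressions = [
    {"ad_size": "300x250", "page_position": "above_fold", "site_category": "news"},
    {"ad_size": "728x90",  "page_position": "below_fold", "site_category": "sports"},
    {"ad_size": "300x250", "page_position": "above_fold", "site_category": "auto"},
    {"ad_size": "160x600", "page_position": "below_fold", "site_category": "news"},
]
clicks = np.array([1, 0, 1, 0])

# One-hot encode the categorical features and fit a logistic regression;
# its predicted probability can be read as an estimated click-through rate.
vectorizer = DictVectorizer(sparse=False)
X = vectorizer.fit_transform(impressions)
model = LogisticRegression().fit(X, clicks)

new_impression = [{"ad_size": "300x250", "page_position": "above_fold", "site_category": "auto"}]
print("estimated CTR:", model.predict_proba(vectorizer.transform(new_impression))[0, 1])

In practice the feature set, the volume of data and the choice of model are all far richer, but the framing as supervised learning is the same.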

POPULAR EXAMPLES OF REAL WORLD MACHINE LEARNING
ML in popular culture has come a long way from the time that Arthur Samuel came up with the first game-playing program for checkers in the 50’s. Here are three popular examples:
1. IBM made history in the 90’s when Deep Blue (a chess-playing computer) beat the then World Champion (Garry Kasparov).
2. In 2005, Stanford’s autonomous vehicle (Stanley) won the DARPA Grand Challenge race for driver-less cars, passing through three narrow tunnels and navigating more than 100 sharp left and right turns.
3. More recently, in February 2011, IBM’s Jeopardy!-playing computer (Watson) defeated two of the show’s greatest (human) players handsomely, providing yet another example of how smart machines can be trained to perform seemingly impossible tasks.

FURTHER READING
Please feel free to contact me if you have any questions or comments. Here are two excellent resources to get you started on ML:
1. The Elements of Statistical Learning (Trevor Hastie, Robert Tibshirani, Jerome Friedman).
2. Machine Learning (Tom Mitchell).