We like to think of machine learning as this perfectly intelligent, efficient process. We put data in and quickly get precise insights out. In reality, machine learning is still driven by humans, and it’s an iterative process that can take longer than enterprises would like, providing results that are less than accurate.
This is especially true in the feature engineering stage. Before you can analyze a data set, you have to prepare it. All the tables have to be combined into a single table, called the feature matrix, that contains the training examples and their explanatory variables, also known as features. Feature engineering consists of identifying and extracting predictive features from the data, and it is traditionally done manually.
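To make the idea of a feature matrix concrete, here is a minimal sketch of the manual approach in plain Python. The tables, column names (customer_id, amount, churned) and the three hand-picked features are hypothetical and chosen only for illustration; they are not taken from any of the projects described in this article.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy tables; names and values are illustrative only.
customers = [
    {"customer_id": 1, "churned": 0},
    {"customer_id": 2, "churned": 1},
]
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
]


def build_feature_matrix(customers, transactions):
    """Combine both tables into one row per customer (one training example)."""
    # Group transaction amounts by the customer they belong to.
    amounts_by_customer = defaultdict(list)
    for txn in transactions:
        amounts_by_customer[txn["customer_id"]].append(txn["amount"])

    matrix = []
    for customer in customers:
        amounts = amounts_by_customer.get(customer["customer_id"], [])
        matrix.append({
            "customer_id": customer["customer_id"],
            # Hand-written features: each one is an idea someone had to
            # think of and then implement.
            "txn_count": len(amounts),
            "txn_total": sum(amounts),
            "txn_mean": mean(amounts) if amounts else 0.0,
            # The value the model should learn to predict.
            "label": customer["churned"],
        })
    return matrix


feature_matrix = build_feature_matrix(customers, transactions)
for row in feature_matrix:
    print(row)
```

Every extra feature in this style means another hand-written line of code, which is where the manual process starts to slow down.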
Because feature engineering requires domain expertise to brainstorm ideas and then technical expertise to implement them, it has become a bottleneck in the overall machine learning process. This is why automated feature engineering is exciting. Not only can it speed up your machine learning projects, but it can also make them more accurate.
Seeing automated feature engineering in action is the best way to understand its benefits. Here are three real-world examples of how automated feature engineering helps organizations improve machine learning initiatives and solve business problems.
Everyone who’s used a credit card has had that fear of an account being breached. Oftentimes, this comes in the form of a purchase being declined because of a suspected fraudulent transaction. This was the problem Spanish bank BBVA set out to solve with data science.
The company’s first attempt, using traditional, manual feature engineering, turned out to be no more accurate than what it had been doing previously. Using Featuretools, our open source framework for automated feature engineering, however, yielded more than 100 historical behavior variables that BBVA hadn’t recognized before. By adding these variables to its model, BBVA achieved a return on investment of €190,000 for every 2 million transactions.
One of the biggest, most time-consuming problems facing consultancies like Accenture is that issues with their software projects are often discovered after the fact. This means going back and performing intense, high-effort post-mortem investigations to figure out what caused the problems. These investigations take valuable time and resources away from the hundreds of other projects the company has going on at any one time.
Accenture decided to use machine learning to identify potential challenges proactively, dubbing the effort “AI project manager.” By analyzing historical project data, the company hoped to determine weeks in advance whether a problem was likely to occur, and head it off. Featuretools recommended 40,000 patterns in the feature engineering process, which domain experts were able to whittle down to the most promising 100. Accenture’s AI project manager now predicts red flags 80 percent of the time.
The Holy Grail of retailing is predicting what your customer wants to buy next. This can help with marketing, targeted coupons and all the things that keep customers coming back. This type of prediction should be possible. With every visit to your website, and every use of your products, customers leave valuable information about how they’ll behave in the future.
The key, of course, is organizing and sorting through this data quickly and efficiently. In this case, trying to do feature engineering manually is almost a non-starter. After all, if you have hundreds of customers, making thousands of purchases every day, this data set constantly grows and changes.
We used Featuretools to perform this task on a multi-table dataset with more than 3 million online grocery orders for Instacart. The results? We were able to auto-generate more than 150 features, and narrow them down to the 20 most important, which we then used for modeling. This model can be used again and again, for different time periods and datasets, helping you get one step closer to knowing your customers’ behavior – even before they do.
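The generate-then-narrow workflow described above can be sketched in a few lines of plain Python. This is not the Featuretools API and not the method used on the Instacart data: the aggregation list, the column names (items, total), the toy orders, and the choice of ranking by absolute Pearson correlation with the target are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean

# Aggregations applied mechanically to every numeric column (an assumed set).
AGGREGATIONS = {"sum": sum, "mean": mean, "max": max, "min": min}


def generate_features(orders, columns):
    """Build candidate features per customer by crossing aggregations with columns.

    Assumes every customer has at least one order.
    """
    features = {}
    for customer_id, rows in orders.items():
        feats = {"count(orders)": float(len(rows))}
        for col in columns:
            values = [row[col] for row in rows]
            for name, fn in AGGREGATIONS.items():
                feats[f"{name}({col})"] = float(fn(values))
        features[customer_id] = feats
    return features


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def top_k_features(features, target, k):
    """Keep the k features most correlated (in absolute value) with the target."""
    ids = sorted(features)
    names = list(features[ids[0]])
    ys = [target[i] for i in ids]
    scored = [(abs(pearson([features[i][n] for i in ids], ys)), n) for n in names]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical toy data: orders per customer and whether they reordered.
orders = {
    1: [{"items": 3, "total": 20.0}, {"items": 5, "total": 42.5}],
    2: [{"items": 1, "total": 7.0}],
    3: [{"items": 8, "total": 60.0}, {"items": 7, "total": 55.0},
        {"items": 9, "total": 80.0}],
    4: [{"items": 2, "total": 12.0}],
}
target = {1: 1, 2: 0, 3: 1, 4: 0}

candidates = generate_features(orders, ["items", "total"])
selected = top_k_features(candidates, target, k=3)
print(len(candidates[1]), "candidate features; kept:", selected)
```

Because the candidates come from crossing a fixed list of aggregations with whatever columns exist, the same code can be rerun unchanged when new orders arrive or a different time window is chosen.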
The biggest problem with machine learning
The three use cases above illustrate a fundamental point about machine learning. The biggest problem with machine learning isn’t that it doesn’t work; it’s that companies struggle to use it effectively.
Automated feature engineering can help organizations make better use of machine learning. Across very different industries and use cases, the benefits are often company-changing: more accurate results in a much shorter time period, with data scientists able to do more because they are freed from the tedious task of generating features manually.