I created Deep Feature Synthesis two years ago while I was a student at MIT. My intention from the very beginning was to one day share that technology with the world. That day has finally come, and Featuretools is now available for anyone to use for free.
Open-sourcing Featuretools will help fill a gap in the ecosystem for building end-to-end machine learning systems. Great tools such as Pandas enable data prep and ad hoc feature engineering, and tools such as scikit-learn allow for machine learning. But until now, there has been no structured process for converting raw data into machine-learning-ready data.
Although the importance of feature engineering has been acknowledged for years, Featuretools is the first open source library for performing automated feature engineering on relational and transactional datasets. With this release, we are not only making it easier than ever for newcomers to learn machine learning, but also increasing the productivity of data scientists tenfold.
Open-sourcing Featuretools has been a long time in the making. At first, the code base needed more time to mature. Over time, priorities shifted to establishing and growing Feature Labs. However, with successful deployments and more than two years of testing, we are ready to share our work with the community.
The code is now available on GitHub under the 3-Clause BSD License. In this initial release, you will find:
- Deep Feature Synthesis – an automated feature engineering algorithm
- Feature Primitives – reusable feature engineering functions
- Entity Sets – abstractions for representing structured datasets
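To make the three pieces above concrete, here is a minimal sketch of the underlying idea in plain pandas. It does not use the Featuretools API: the tables, the `PRIMITIVES` mapping, and the `aggregate_child` helper are hypothetical names invented for illustration. Two related tables stand in for an entity set, a few named aggregations stand in for feature primitives, and applying them across the parent-child relationship is a single-level stand-in for what Deep Feature Synthesis automates.

```python
import pandas as pd

# Two related tables: a toy stand-in for an entity set (hypothetical data).
customers = pd.DataFrame({"customer_id": [1, 2]})
transactions = pd.DataFrame({
    "transaction_id": [10, 11, 12, 13],
    "customer_id": [1, 1, 2, 2],
    "amount": [20.0, 30.0, 5.0, 15.0],
})

# Reusable aggregations: a toy stand-in for feature primitives.
PRIMITIVES = {"MEAN": "mean", "MAX": "max", "COUNT": "count"}


def aggregate_child(parent, child, key, value_col, child_name):
    """Apply every primitive to child[value_col], grouped by the parent key."""
    grouped = child.groupby(key)[value_col]
    features = pd.DataFrame({
        f"{name}({child_name}.{value_col})": grouped.agg(func)
        for name, func in PRIMITIVES.items()
    })
    # Attach the new feature columns to the parent table, matching on the key.
    return parent.join(features, on=key)


feature_matrix = aggregate_child(
    customers, transactions, "customer_id", "amount", "transactions"
)
print(feature_matrix)
```

Each row of `feature_matrix` describes one customer with columns such as `MEAN(transactions.amount)`. Writing this by hand for every table, column, and aggregation is the repetitive work that an automated approach is meant to remove.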
Featuretools has been a labor of love for everyone here at Feature Labs because it combines our decades of experience as data scientists. We’re excited because this is just the beginning of our effort to drastically improve feature engineering and, with it, the machine learning models built on top of it.
And we can’t wait to see what’s in store for the future!