Alteryx is excited to introduce the release of Featuretools version 1.0! After years of development, dozens of releases, hundreds of closed pull requests, millions of downloads, and numerous production deployments, we believe Featuretools has earned the “1.0” version label.
This release is a significant milestone in the evolution of Featuretools. It also brings a major update to this popular open-source library for automated feature engineering, making it easier for users to load data and interface with other libraries, such as EvalML.
In this post, we will discuss the improvements that have been made to Featuretools and the benefits users can expect by upgrading to version 1.0.
What is Featuretools?
Featuretools is an open-source Python library for performing automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning. Featuretools can automatically generate many new complex features from existing datasets, saving users time while improving the performance of many machine learning models. If you are new to Featuretools, check out the documentation for details on how to get started using Featuretools in your machine learning projects.
Why should I upgrade?
The most notable change in Featuretools 1.0 is that Woodwork is now used to manage all typing information throughout Featuretools, eliminating the custom type system that was used previously.
Existing Featuretools users should upgrade to take full advantage of the improved data typing capabilities now available. Users will now have improved type inference accuracy and expanded inference capability, which will continue to evolve and improve as Woodwork develops.
For users, this update provides several benefits:
- More column types can be inferred automatically with greater accuracy, requiring less user intervention when creating EntitySets. For example, Woodwork can automatically infer email addresses and IP addresses, types that were not previously inferred by Featuretools.
- With more accurate column typing information, the quality of the features generated by Featuretools will be improved.
- As EvalML also uses Woodwork for managing typing information, users can now learn one typing system and use it across both Featuretools and EvalML.
- The generated feature matrix now includes column typing information, along with a record of whether each feature was engineered by Featuretools or was present in the original data. Downstream processes can use all of this to make better use of the information contained in the feature matrix.
What has changed?
As part of this update, the old Featuretools Entity objects have been replaced with Woodwork dataframes, and the Variable class previously used to identify the column type has been removed.
EntitySets are now created by adding dataframes, with or without Woodwork initialized. If the user supplies a dataframe without Woodwork initialized, Featuretools will initialize Woodwork, kicking off the automatic type inference process. If a Woodwork-initialized dataframe is added to an EntitySet, Featuretools uses the typing information already present on the dataframe directly.
Several updates to the Featuretools API have been made in version 1.0. Many methods have been updated, moved or renamed. In addition, some parameter names have been modified to reflect the change from the old Entity objects to the new Woodwork dataframes.
Users who need to override the types inferred when adding a dataframe to an EntitySet must now specify the new types as valid Woodwork types, rather than as Featuretools Variable types. While some basic knowledge of Woodwork is beneficial in this regard, Featuretools users are not expected to need an in-depth understanding of Woodwork in order to use Featuretools.
New API Examples
To illustrate some of the changes to the API, we will show a few common operations that users typically perform with Featuretools:
Creating an EntitySet and adding a DataFrame
import featuretools as ft
import pandas as pd

# Create the EntitySet
es = ft.EntitySet(id="my_entityset")

# Create a DataFrame and add it to the EntitySet
df = pd.read_csv("transactions.csv")
es.add_dataframe(dataframe=df, dataframe_name="customers")
Generating a Feature Matrix
feature_matrix, features = ft.dfs(entityset=es, target_dataframe_name="customers")
Viewing Typing Information for a Dataframe
Column            Physical Type   Logical Type     Semantic Tag(s)
id                int64           Integer          ['index']
age               int64           Integer          ['numeric']
région_id         category        Categorical      ['foreign_key', 'category']
cohort            int64           Integer          ['foreign_key', 'numeric']
loves_ice_cream   bool            Boolean
favorite_quote    string          NaturalLanguage
signup_date       datetime64[ns]  Datetime         ['time_index']
upgrade_date      datetime64[ns]  Datetime
cancel_date       datetime64[ns]  Datetime
cancel_reason     category        Categorical      ['category']
engagement_level  category        Ordinal          ['category']
full_name         string          PersonFullName
email             string          EmailAddress
phone_number      string          PhoneNumber
date_of_birth     datetime64[ns]  Datetime         ['date_of_birth']
Where can I learn more?
Existing users of Featuretools can find much more detail on how to move to version 1.0 in the guide Transitioning to Featuretools 1.0. This guide provides an in-depth review of the API changes, comparing the previous approach to the new approach.
While not necessary, users are encouraged to learn the basics of Woodwork by reviewing Woodwork Typing in Featuretools to gain a basic understanding of how Woodwork typing information is used throughout Featuretools.
Finally, advanced users and developers can find the Featuretools source code on GitHub. User contributions to Featuretools code are always welcome, and any problems can be reported to the development team by submitting a GitHub issue or reaching out on our Slack channel.