As more organizations adopt ML models to solve their business problems, the metadata associated with the ML lifecycle has grown rapidly. As the work around modeling, experimentation, and data engineering expands, governance and intelligence capabilities become essential for long-term management and oversight.
A typical ML lifecycle would include the following:
- A business use-case would be associated with an ML project
- The data scientist would create experiments associated with the project
- Each experiment would involve multiple runs, each with different input data and models
- A model registry would have all the models created along with associated metadata such as tags and comments
- A feature store would hold all the feature data used, along with versioning and parts
- Each run would generate artifacts and model evaluations
As expected, the metadata associated with these models grows quickly, and the relationships between data and models become very difficult to manage. User management, with its permissions, ownership, and access control, adds another layer of complexity that needs to be managed.
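To make the lifecycle listed above more concrete, here is a minimal Python sketch of how its entities might nest. Every class, field, and function name here (`Project`, `Experiment`, `Run`, `count_metadata_records`) is a hypothetical illustration and not part of MLNinja; the sketch only shows how quickly metadata records accumulate under a single project.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Run:
    """One execution of an experiment with specific input data and a model."""
    run_id: str
    input_data: str   # hypothetical label for a dataset or feature version
    model_name: str
    artifacts: list[str] = field(default_factory=list)
    evaluations: dict[str, float] = field(default_factory=dict)


@dataclass
class Experiment:
    """A named group of runs created by a data scientist."""
    name: str
    runs: list[Run] = field(default_factory=list)


@dataclass
class Project:
    """An ML project tied to one business use case."""
    use_case: str
    experiments: list[Experiment] = field(default_factory=list)


def count_metadata_records(project: Project) -> int:
    """Count the project, its experiments, its runs, and each run's
    artifacts and evaluations as individual metadata records."""
    total = 1  # the project itself
    for experiment in project.experiments:
        total += 1
        for run in experiment.runs:
            total += 1 + len(run.artifacts) + len(run.evaluations)
    return total
```

Even in this toy model, one project with a single experiment and two runs, each producing one artifact and one evaluation, already yields eight separate records to track, before a model registry, feature store, or permissions are considered.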
MLNinja creates a top-down operating model that covers most ML lifecycles. The fully extensible model creates new data assets and relationships, allowing full data lineage with total transparency. This makes it possible to govern and track all ML changes over an extended period of time, with historical audit capability.
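As a rough illustration of what lineage over assets and relationships means, the sketch below stores "derived from" relationships in a small directed graph and walks it upstream. The `LineageGraph` class and its methods are assumptions made for this example only; they do not describe how MLNinja stores or queries lineage.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical lineage store: an edge source -> target means
    'target was derived from source'."""

    def __init__(self) -> None:
        # Maps each asset to the set of assets it was directly derived from.
        self._upstream = defaultdict(set)

    def add_relationship(self, source: str, target: str) -> None:
        """Record that `target` was derived from `source`."""
        self._upstream[target].add(source)

    def trace_upstream(self, asset: str) -> list:
        """Return every asset that `asset` depends on, directly or
        indirectly, in sorted order (breadth-first traversal)."""
        seen = set()
        queue = deque([asset])
        while queue:
            current = queue.popleft()
            for parent in self._upstream.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        return sorted(seen)
```

With relationships registered from a raw table to a feature set, from the feature set to a run, and from the run to a model, calling `trace_upstream` on the model returns all three upstream assets, which is the kind of question a lineage view is meant to answer.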
The main features of MLNinja are:
- New and extensible operating model with appropriate data assets for an ML lifecycle
- Predefined relationships between ML assets, allowing complete data lineage and tracing
- Ability to use workflows for model approval and final push to production
- Ability to track data sets and feature management and how they relate to model effectiveness
- Maintains all metadata associated with a model registry
- Ingestion of metadata from various external ML libraries, with the ability to create new ingestors as required
- Help in creating well-defined processes around ML data lineage, including external data sources
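The workflow-based model approval mentioned in the feature list can be pictured as a small state machine that records every change for later audit. The states, the `ALLOWED_TRANSITIONS` table, and the `ModelApproval` class below are all hypothetical and chosen for illustration; the actual workflow steps in MLNinja may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical approval states and the moves permitted between them.
ALLOWED_TRANSITIONS = {
    "draft": {"in_review"},
    "in_review": {"approved", "rejected"},
    "rejected": {"in_review"},
    "approved": {"production"},
    "production": set(),
}


@dataclass
class ModelApproval:
    """Tracks one model's approval state plus an audit trail of changes."""
    model_name: str
    state: str = "draft"
    history: list = field(default_factory=list)

    def transition(self, new_state: str, actor: str) -> None:
        """Move to `new_state` if permitted, recording who did it and when."""
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(
                f"cannot move from {self.state!r} to {new_state!r}"
            )
        timestamp = datetime.now(timezone.utc)
        self.history.append((timestamp, actor, self.state, new_state))
        self.state = new_state
```

A model would move from `draft` through `in_review` and `approved` before reaching `production`, and an attempt to skip a step raises `ValueError`. Keeping the history as an append-only list is one simple way to support the historical audit capability described earlier.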