Indico, a provider of Enterprise AI solutions for intelligent process automation, today announced the launch of a new open source project focused on enhancing the performance of machine learning for natural language processing. Named Finetune, the project offers users a single, general-purpose language model that can be easily tuned to solve a variety of tasks involved in text and document-based workflows.
‘Finetuning’ is a specific type of transfer learning that takes a model trained on one task and adapts it to solve a different but related task. Users can make small modifications to repurpose an existing model for the new problem, saving substantial time and effort while also improving accuracy.
“Most organizations have natural language processing problems, but few have the labeled data they need to solve them with machine learning,” said Madison May, Indico machine learning architect and co-founder. “Finetune lets them do more with less labeled training data. And it only requires a base level of IT experience.”
The Finetune project extends original research and development work completed by OpenAI to address a wider range of problems. OpenAI’s base project provides an illustrative model for increasing the accuracy and performance of machine learning models on natural language content and includes general capabilities for document classification, comparison, and multiple-choice question answering. The Finetune library packages those capabilities for easier use and supports additional tasks such as document annotation, regression, and multi-label classification.
Indico delivers Finetune in a format that mimics scikit-learn, a popular open source machine learning library, and documents it so users can write as little as five lines of code (vs. 200) to try out OpenAI’s research on their own data problems. The models in Finetune have also demonstrated statistically higher performance than traditional natural language processing approaches as users add more labeled training data. Finetune outperforms these methods with only a hundred labels, and that gap continues to widen as the available training data increases.
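For readers unfamiliar with the scikit-learn convention mentioned above, the following is a minimal sketch of that workflow: construct a model, fit it on labeled examples, then predict on new text. The class ToyTextClassifier and its word-counting logic are hypothetical stand-ins written for this illustration. They are not the Finetune model or its API, and the sample texts and labels are invented.

```python
from collections import Counter, defaultdict


class ToyTextClassifier:
    """Hypothetical stand-in estimator; NOT the Finetune model or its API."""

    def fit(self, texts, labels):
        # Count how often each word appears under each label.
        self.word_counts_ = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts_[label].update(text.lower().split())
        return self

    def predict(self, texts):
        # Pick the label whose training words overlap most with each text.
        predictions = []
        for text in texts:
            words = text.lower().split()
            scores = {
                label: sum(counts[word] for word in words)
                for label, counts in self.word_counts_.items()
            }
            predictions.append(max(scores, key=scores.get))
        return predictions


# The scikit-learn-style workflow: construct, fit, predict.
train_texts = ["great product, works well", "terrible support, very slow"]
train_labels = ["positive", "negative"]
model = ToyTextClassifier()
model.fit(train_texts, train_labels)
predictions = model.predict(["works great", "slow and terrible"])
```

The last few lines are the part that corresponds to the short usage the announcement describes; the class body only exists so the sketch runs without any external library.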
The Indico team is conducting empirical research to evaluate how the models behave on different datasets and machine learning tasks. The company also plans to incorporate Finetune into its commercial product to address specific customer use cases.
In June, Indico launched a related open source project named Enso, a library of standard interfaces and tools to streamline the benchmarking of embedding and transfer learning methods for a wide variety of natural language processing tasks. Enso was used to benchmark the improved performance achieved with Finetune before the project was launched.
“We have a vested interest in promoting the advantages of transfer learning, and giving back to the open source community is a really productive way for us to do that,” said Slater Victoroff, co-founder and CTO of Indico. “I also want to acknowledge the important research and development work done by the team at OpenAI and Alec Radford. They are driving huge innovations in machine learning that really help accelerate the progress of companies like Indico.”