
I recently had a discussion on this topic with Amith Nair, global vice president and general manager of AI service delivery for TELUS Digital, one of the leading global providers of AI infrastructure and services. Nair reaffirmed the importance of data: “Data is the core of everything that happens in AI, for all foundational model makers and anyone who’s building data applications for AI.”
“When it comes to AI, we can think about it like a layer cake,” Nair said with regard to infrastructure and its impact on data. “At the bottom there is a computational layer, such as the NVIDIA GPUs, anyone who provides the infrastructure for running AI. The next few layers are software-oriented, but they affect infrastructure as well. Then there’s security and the data that feeds the models and the applications. And on top of that, there’s the operational layer, which is how you enable data operations for AI. Data being so foundational means that whoever works with that layer is essentially holding the keys to the AI asset, so it’s imperative that anything you do around data has to have a level of trust and data neutrality.”
Data neutrality as a competitive necessity
Within this consolidating economy, data neutrality has evolved from a desirable attribute into an outright competitive imperative. For any organization building AI models, guarding business interests and maintaining model independence are critical to establishing and keeping a competitive edge. The risks of sharing data infrastructure, particularly with direct or indirect competitors, are significant. When proprietary training data is entrusted to a competitor’s platform or service, there is always an implicit, and often subtle, risk that proprietary insights, unique data patterns or even an enterprise’s operational data will be inadvertently shared.
This problem is not necessarily one of bad intentions but of the potential for such data, even in aggregated or anonymized form, to fuel or inform the development of competing models.
The implications of this extend throughout the entire life cycle of AI:
- Model creation: Non-neutral data sources risk injecting subtle biases into the data from which models are built, potentially skewing results in favor of the data provider.
- Training: The quality and efficiency of model training suffer when access to data or processing power is granted preferentially to certain companies.
- Deployment strategies: The ability to deploy models without concern for data provenance or the risk of intellectual property leakage is one of the main drivers of market trust and adoption.
Ultimately, data neutrality ensures that an organization’s proprietary AI models remain proprietary, trained only on its own data, thereby protecting its intellectual property and long-term market position.