Questions every VC should ask about the technological foundation of an AI startup

From fraud detection to agricultural crop monitoring, a new wave of technology companies has emerged, all armed with the conviction that their use of AI will address the challenges presented by the modern world.

However, as the AI landscape matures, a growing concern is coming to light: the models at the heart of many AI companies are rapidly becoming commodities. A notable lack of substantial differentiation between these models is beginning to raise questions about the sustainability of their competitive advantage.

While AI models remain fundamental components of these companies, a paradigm shift is occurring. The true value proposition of AI companies now lies not only in the models but predominantly in the data sets that underpin them. It is the quality, breadth, and depth of these data sets that allow the models to outshine their competitors.

However, in the rush to market, many AI-driven companies, including those venturing into the promising field of biotechnology, launch without a purpose-built technology stack that generates the data indispensable for solid machine learning. This oversight has substantial implications for the longevity of their AI initiatives.

As expert venture capitalists (VCs) well know, it is not enough to examine the superficial appeal of an AI model. Instead, a comprehensive assessment of the company's technology is needed to judge its fitness for purpose. The absence of a meticulously designed infrastructure for data acquisition and processing can be an early signal of the downfall of an otherwise promising company.

This article offers practical frameworks derived from experience in machine learning-enabled startups. They are not exhaustive, but they can provide an additional resource for those who have the difficult task of evaluating companies' data processes and the quality of the resulting data and, ultimately, determining whether they are set up for success.

From inconsistent data sets to noisy inputs, what could go wrong?

Before moving on to the frameworks, let's first consider the basic factors that come into play when evaluating data quality and, above all, what could go wrong if the data is not up to par.

Relevance

First, let's consider the relevance of the data sets. The data must intricately align with the problem an AI model is trying to solve. For example, an AI model developed to predict housing prices needs data covering economic indicators, interest rates, real incomes, and demographic changes.

Similarly, in the context of drug discovery, it is crucial that experimental data be as predictive as possible of effects in patients, which requires expert reflection on the relevant assays, cell lines, model organisms, and more.

Accuracy

Second, the data must be accurate. Even a small amount of inaccurate data can have a significant impact on the performance of an AI model. This is especially critical in medical diagnoses, where a small error in the data could lead to a misdiagnosis and potentially affect lives.
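To make this concrete, here is a minimal sketch, using scikit-learn on a stock dataset, of how even a modest fraction of mislabeled training data erodes accuracy on clean test data. The dataset, model, and noise rates are illustrative assumptions, not drawn from any particular diagnostic system:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

def accuracy_with_label_noise(noise_rate: float) -> float:
    """Flip a fraction of training labels, then measure accuracy on clean test data."""
    rng = np.random.default_rng(0)
    y_noisy = y_train.copy()
    flip = rng.random(len(y_noisy)) < noise_rate
    y_noisy[flip] = 1 - y_noisy[flip]  # binary labels: swap 0 and 1
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_noisy)
    return model.score(X_test, y_test)

for rate in (0.0, 0.05, 0.15, 0.30):
    print(f"label noise {rate:.0%} -> test accuracy {accuracy_with_label_noise(rate):.3f}")
```

Running this shows accuracy degrading as the noise rate rises, which is the dynamic a diligence team should worry about when a startup cannot explain how it keeps labels clean.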

Coverage

Third, data coverage is also essential. If the data is missing important information, the AI model will not be able to learn as effectively. For example, if an AI model is used to translate a particular language, it is important that the data include a variety of different dialects.

For language models, this distinction is known as a "low-resource" versus "high-resource" language dataset. Achieving good coverage also requires a complete understanding of the confounding factors affecting the outcome, which typically means collecting metadata alongside the data itself.
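As a rough illustration, a first-pass coverage audit can be as simple as counting examples per metadata tag. The `dialect` field and the 10% threshold below are hypothetical choices for the sketch:

```python
from collections import Counter

# Toy corpus: each example carries a dialect tag as metadata.
examples = [
    {"text": "...", "dialect": "es-MX"},
    {"text": "...", "dialect": "es-MX"},
    {"text": "...", "dialect": "es-MX"},
    {"text": "...", "dialect": "es-AR"},
    {"text": "...", "dialect": "es-ES"},
]

counts = Counter(ex["dialect"] for ex in examples)
total = sum(counts.values())
for dialect, n in counts.most_common():
    share = n / total
    flag = "  <- possibly under-covered" if share < 0.10 else ""
    print(f"{dialect}: {n} examples ({share:.0%}){flag}")
```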

Bias

Finally, data bias also deserves rigorous consideration. Data must be captured in an unbiased manner to avoid baking human or model bias into the results. For example, image recognition data should minimize stereotyping, and in drug discovery, data sets should cover both successful and unsuccessful molecules. In either case, biased data would likely lose its ability to support novel predictions.
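A first-pass bias check might simply look at the outcome distribution in the data set. The column names and the threshold in this sketch are hypothetical:

```python
import pandas as pd

# Toy drug-discovery records: outcomes per candidate molecule.
records = pd.DataFrame({
    "molecule_id": ["m1", "m2", "m3", "m4", "m5", "m6"],
    "outcome":     ["success", "success", "success",
                    "success", "success", "failure"],
})

shares = records["outcome"].value_counts(normalize=True)
print(shares)
if shares.min() < 0.2:  # illustrative imbalance threshold
    print("Warning: outcomes are heavily skewed; a model trained on this "
          "may mostly learn to reproduce past successes.")
```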

The impact of poor data should not be underestimated. At best, it results in an underperforming model; at worst, it renders the model completely ineffective. This can lead to financial loss, missed opportunities, and even physical harm.

Similarly, if the data is biased, the models will produce biased results, which can encourage discrimination and unfair practices. This has been a particular concern in the case of large language models, which have recently come under scrutiny for perpetuating stereotypes.

Compromised data quality also has the potential to erode effective decision making, which can ultimately result in poor business performance.

Framework 1: Technological pyramid for data generation

To avoid investing in ineffective AI startups, it is necessary to first evaluate the processes behind the data. A good starting point is to imagine a company's technological foundation as a pyramid, where the foundational layers tend to have the greatest impact on the predictive outcome. Without this solid base, even the best data analytics and machine learning models face significant limitations.

Here are some basic questions a VC might initially ask to determine whether a startup's data generation process can actually create usable results for AI:

  • Is data capture automated to allow for scaling?
  • Is data stored in secure cloud environments with automated backups?
  • How is access to relevant IT resources and infrastructure managed and ensured?
  • Are data processing workflows fully automated, with rigorous data quality checks in place to limit contamination from corrupted data points? (A sketch of such a check follows below.)
  • Is data easily accessible across the enterprise to power machine learning models and data-driven decisions?
  • How is data governance implemented?
  • Is there a data management strategy?
  • Are versions of data and ML models tracked and accessible, ensuring that ML models always work with the latest version of data?

Receiving solid answers to these questions can help determine a company's understanding of the underlying principles of its data pipelines. This understanding, in turn, will help measure the quality of the model output.
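To illustrate the kind of automated quality gate the questions above probe for, here is a sketch of a check a pipeline might run before a data batch reaches a model. The schema, column names, and thresholds are assumptions made for the example, not a prescribed standard:

```python
import pandas as pd

EXPECTED_COLUMNS = {"sample_id", "assay", "measured_value", "captured_at"}
MAX_NULL_FRACTION = 0.01  # illustrative tolerance for missing values

def validate_batch(df: pd.DataFrame) -> list[str]:
    """Return a list of quality problems; an empty list means the batch passes."""
    problems = []
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    else:
        # Worst per-column fraction of missing values.
        null_fraction = df[list(EXPECTED_COLUMNS)].isna().mean().max()
        if null_fraction > MAX_NULL_FRACTION:
            problems.append(f"null fraction {null_fraction:.2%} exceeds limit")
        if df["sample_id"].duplicated().any():
            problems.append("duplicate sample_ids found")
    return problems

batch = pd.DataFrame({
    "sample_id": [1, 2, 2],
    "assay": ["a", "a", "b"],
    "measured_value": [0.4, None, 1.2],
    "captured_at": ["2024-01-01"] * 3,
})
issues = validate_batch(batch)
print(issues or "batch passed all checks")
```

A startup with a mature pipeline should be able to describe checks like these, where they run, and what happens when a batch fails.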

Framework 2: The five Vs of data quality

Once a company's technological foundation is deemed suitable for AI, a VC also needs to carefully consider the quality of the resulting data used to train its models. A common framework for this is the five Vs of data quality, five key dimensions that venture capitalists should consider when evaluating AI startups:

  • Veracity: The data must be accurate and truthful.
  • Variety: Data should be diverse and representative of the real world.
  • Volume: The data must be large enough to train the AI model effectively.
  • Velocity: Data must be updated frequently enough to reflect changes in the world.
  • Value: The data must be useful so that the AI ​​model can learn from it.

Here are some introductory questions to help evaluate a company's data for the five Vs:

  • Does the startup have a good hypothesis about what data it needs to create to build a differentiated capability or useful model?
  • What data do they collect?
  • Do they also collect relevant metadata?
  • How do they ensure the accuracy and consistency of the data they collect?
  • How does the startup plan to deal with data bias?
  • Do they collect multiple examples for the same question or experiment?
  • How useful is this data to the product they are creating?
  • What is the reason behind collecting this data?
  • Do they have evidence that their predictions improve by collecting and using this data? If so, how does the amount of data correlate with prediction improvement?
  • How easy is it for a competitor to collect the same data?
  • How long would it take them and how much would it cost them to do it?
  • Specifically for a biotech, how well does the indicator they predict correlate with a clinically relevant endpoint? Is there evidence of this?
  • What is the startup's plan to ensure the quality of its data over time?
  • How does the startup plan to protect its data from unauthorized access?
  • How does the startup plan to comply with data privacy regulations?

By carefully considering the five Vs of data quality, venture capitalists can ensure they are investing in AI startups that have the data they need to succeed. If the startup can answer the above questions convincingly and its data scores highly on all five dimensions, it is a good sign that the team is serious about data quality and adequately equipped to deploy its AI models.
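One simple way a diligence team might summarize its findings is a weighted scorecard across the five Vs. The scores and weights below are entirely hypothetical inputs a reviewer might assign after interviews, not a standard rubric:

```python
# Reviewer-assigned scores, 1 (weak) to 5 (strong).
SCORES = {
    "veracity": 4,
    "variety": 3,
    "volume": 5,
    "velocity": 2,
    "value": 4,
}
# Weights reflect how much each dimension matters for this investment thesis.
WEIGHTS = {
    "veracity": 0.30, "variety": 0.20, "volume": 0.15,
    "velocity": 0.15, "value": 0.20,
}

overall = sum(SCORES[v] * WEIGHTS[v] for v in SCORES)
print(f"weighted data-quality score: {overall:.2f} / 5")
for v, s in SCORES.items():
    if s <= 2:
        print(f"follow up on {v}: scored {s}/5")
```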

Finally, venture capitalists should evaluate the startup's commitment to data security. This includes its data governance policies, its data quality assurance procedures, and its data breach response plans.

Question the market to find the winners

Amid the resounding buzz surrounding AI lately, the lure of substantial investment has attracted startup founders willing to overhype their infrastructure and inflate their capabilities in the search for capital.

Successful venture capitalists ask the right questions to interrogate these companies thoroughly, separating potential winners built on a solid foundation from hollow shells that are ultimately destined to fail.
