The Road to Enterprise AI

Artificial Intelligence (AI) offerings are at an all-time high, with promises of increased productivity, streamlined workflows, and growing revenue efficiency. But as AI adoption soars, so do the questions surrounding implementation.

To share what we've learned and elaborate on best practices, DataFox attended the 3rd annual AI Summit in New York, a conference dedicated to uncovering the practical implications of AI solutions. During his session, The Road to Enterprise AI, Bastiaan Janmaat, CEO of DataFox, highlighted the importance of Smart Data and outlined how the value of your AI deployments is entirely dependent on the quality of your input data.

We all know this to be true and have likely seen examples of it in our daily lives. If you've ever made an online purchase and then been recommended an assortment of random, unrelated items, the fault likely lies not with the algorithm but with the data collected. As the computer science saying goes, garbage in, garbage out: if flawed data is the input, flawed results are the only possible output. The quality of your AI output is determined by the quality of your input data, so impeccable data quality is a prerequisite.

Unfortunately, not enough enterprise organizations pay attention to data quality. In fact, critical CRM and ERP systems operate with 30-50% bad data. With long data refresh cycles and publicly sourced data that quickly goes stale, this isn't a shock. What does come as a shock is the lack of focus on Smart Data, especially in relation to its effect on AI solutions.

Smart Data is the single most important driver of successful AI outcomes. It's a complete view of your data that updates in real time and connects across systems, giving you a more precise understanding for smarter decision making. Knowing that Smart Data plays a vital role in AI-driven solutions, what does the road to Smart Data look like? How can we get there? Janmaat lays out a roadmap with the four most important considerations: be an expert in your domain, focus meticulously on precision, align humans and machines, and be relentless about data integrity.

1. Be An Expert In Your Domain

Not all data is created equal. While data quality is an obvious necessity, to derive actionable AI insights you need to spend time understanding exactly what type of data is important to your AI use case. Thus, it's not always about the volume of data, but rather about which specific data points are important to your AI outcomes.

At DataFox, we wanted to build an algorithm that dynamically identifies lookalike, or similar, companies. To build an algorithm that predicts company similarity, we first needed to identify which data inputs would produce the correct outcome. In discussion with data experts in our field, we settled on a few especially important inputs:

  • Company descriptions

  • News articles (companies getting co-mentioned together)

  • Website metadata (keywords)

This analysis will vary from one organization to another: the same group of data experts would not be equipped to build an algorithm for a healthcare solution. That's what makes domain expertise critical to the success of your AI solutions: you need the right expertise to identify the most influential data points.
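To make the idea of combining these inputs concrete, here is a minimal Python sketch of a lookalike score. It is not DataFox's actual algorithm: the Company fields, the Jaccard overlap measure, and the weights are all assumptions chosen purely for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Company:
    """Hypothetical record holding the three inputs listed above."""
    name: str
    description: str
    keywords: set[str] = field(default_factory=set)     # website metadata keywords
    co_mentions: set[str] = field(default_factory=set)  # companies named alongside it in news


def tokens(text: str) -> set[str]:
    """Lowercased word set; a real system would use a proper NLP pipeline."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap |a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Illustrative weights only; in practice domain experts would tune these.
WEIGHTS = {"description": 0.4, "keywords": 0.3, "news": 0.3}


def similarity(x: Company, y: Company) -> float:
    """Weighted blend of the three signals, in the range 0.0 to 1.0."""
    if y.name in x.co_mentions or x.name in y.co_mentions:
        news = 1.0  # the two companies are mentioned together directly
    else:
        news = jaccard(x.co_mentions, y.co_mentions)
    return (
        WEIGHTS["description"] * jaccard(tokens(x.description), tokens(y.description))
        + WEIGHTS["keywords"] * jaccard(x.keywords, y.keywords)
        + WEIGHTS["news"] * news
    )
```

Ranking every candidate by similarity against a target company would then give a simple lookalike list. Which signals to include, and how heavily to weight each, is exactly where domain expertise comes in.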

2. Focus Meticulously On Precision

Precision is all about the harmony between accuracy and focus. For AI deployments, this highlights the difference between generalized AI and applied AI.

"Generalized AI is intended to mimic human intelligence in solving all kinds of questions and problems, whereas applied AI involves algorithms designed to solve a specific type of problem. Generalized AI is far from matching human intelligence, so today's attempts at generalized AI often miss the mark with precision. Think of the last time you asked Siri or Alexa a few questions, I'm sure they made a mistake at some point. Now think of self-driving car technology (applied AI) and how perfect and error-proof the performance has to be in order to earn your trust." - Bastiaan Janmaat, CEO of DataFox

Rather than buying a general-purpose AI tool, which will inevitably be too generic and noisy to deliver precise results, focus on applied AI. Because it is tailored to a specific problem, it delivers precise results that can be relied upon for actionable insights.

3. Align Humans And Machines

One of the most important aspects of Smart Data is the critical partnership between humans and machines. "Ultimately, we're striving for the highest combination of scale and accuracy - that's why DataFox exists - because incumbent company intelligence providers were either low accuracy or low scale," says Janmaat.

While algorithms provide scale, the partnership between humans and machines keeps precision high. Here at DataFox, we refer to this as human-in-the-loop feedback, and Janmaat outlines exactly how it is implemented.

The Smart Data cycle begins with a training data set, which feeds the machine learning, natural language processing, and decision tree techniques used to build complex algorithms. The initial outputs will likely have a meager level of accuracy. This is where human-in-the-loop feedback becomes crucial. Rather than settling for this level of quality, human analysts check the algorithms' outputs, verifying data points and flagging irregularities. The false positives and false negatives they identify are fed back to retrain the algorithms, and very quickly we're operating above 90% precision. From there, the volume of Smart Data output grows steadily.

[Figure: AI + Humans In Action]

Above is an example of this data cycle in action. The algorithm, using natural language processing, highlights the sentence that describes what data point we are looking for. Then, human analysts check a sample of the algorithm's work and accurate results are displayed to our customers in near real-time.
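The measurement side of this feedback loop can be sketched in a few lines of Python. This is an illustrative sketch, not DataFox's implementation: the Review record and both function names are hypothetical. It assumes each analyst-checked item records what the algorithm predicted and what the analyst verified.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """One analyst-checked item (hypothetical structure)."""
    predicted: bool  # the algorithm claimed the data point is present
    verified: bool   # the analyst's judgment on the same item


def precision_recall(reviews: list[Review]) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(r.predicted and r.verified for r in reviews)
    fp = sum(r.predicted and not r.verified for r in reviews)
    fn = sum(not r.predicted and r.verified for r in reviews)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def corrections(reviews: list[Review]) -> list[Review]:
    """False positives and false negatives: the items where the analyst
    disagreed, which would be fed back as labels for retraining."""
    return [r for r in reviews if r.predicted != r.verified]
```

Tracking precision on each reviewed sample shows whether retraining on the corrections is actually moving the algorithm toward the target level.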

4. Be Relentless About Data Integrity

Smart Data is an ongoing process; you can't set it and forget it.

To continuously power Smart Data and derive actionable AI outcomes, one must equally pursue ongoing data integrity. Janmaat describes the necessary steps below:

  • Protect your points of entry: Don't let dirty data seep into your system.
    • For example, we curate a few thousand news sources we trust - we don't just pull data from every news source in the world.

  • Refresh regularly: Know how often certain data points need to be revisited.
    • For a company's industry, we check once a year. For its location, once a month. For growth signals, every minute of every day.

  • Identify data irregularities: Use statistics to identify anomalies.
    • For example, when the ratio of a company's revenue to its headcount seems off, it gets flagged and we inspect the data - either the revenue or the headcount might be wrong.
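The refresh and anomaly checks above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not DataFox's method. The refresh intervals mirror the cadences quoted above, but the field names are hypothetical. The outlier rule is a modified z-score based on the median absolute deviation of log revenue per employee, with the commonly used 3.5 cutoff; it is one of several reasonable statistical choices.

```python
import math
from datetime import datetime, timedelta
from statistics import median

# Cadences from the text above; the field names are hypothetical.
REFRESH_INTERVALS = {
    "industry": timedelta(days=365),
    "location": timedelta(days=30),
    "growth_signals": timedelta(minutes=1),
}


def is_stale(field_name: str, last_checked: datetime, now: datetime) -> bool:
    """True when the field is due to be revisited."""
    return now - last_checked >= REFRESH_INTERVALS[field_name]


def flag_ratio_outliers(
    companies: dict[str, tuple[float, int]], threshold: float = 3.5
) -> list[str]:
    """Return names whose revenue-to-headcount ratio looks anomalous.

    `companies` maps a name to (revenue, headcount). A flag means
    "inspect this record", not "this record is definitely wrong".
    """
    flagged: list[str] = []
    log_ratios: dict[str, float] = {}
    for name, (revenue, headcount) in companies.items():
        if revenue <= 0 or headcount <= 0:
            flagged.append(name)  # impossible values are flagged outright
        else:
            log_ratios[name] = math.log10(revenue / headcount)
    if not log_ratios:
        return flagged
    center = median(log_ratios.values())
    mad = median(abs(v - center) for v in log_ratios.values())
    if mad == 0:
        return flagged  # simplification: no spread to measure against
    for name, value in log_ratios.items():
        # 0.6745 scales the MAD to be comparable with a standard deviation.
        if 0.6745 * abs(value - center) / mad > threshold:
            flagged.append(name)
    return flagged
```

Using the median rather than the mean keeps a single wildly wrong record from distorting the baseline it is being compared against.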

Key Takeaways

Above all else, match your AI investments with your data investments. All the resources spent on data scientists, machine learning capabilities, and computing power need to match resources invested in Smart Data. Otherwise, poor data will doom every AI deployment.

The four key steps Janmaat discussed were:

  • Incorporate domain expertise when selecting input data sources

  • Seek applied AI, rather than generalized AI, for high precision

  • Use human-in-the-loop feedback to bootstrap algorithms quickly and accurately and get to results

  • Rinse and repeat - know how often datasets change and adjust your refresh rates accordingly

Follow these steps, and AI implementation will deliver practical value for your organization.