This year, DataTribe announced two winners of the 2021 Challenge. QuickCode.ai and ContraForce are both moving forward with the potential for $2 million in seed funding.

Read on to learn about 2021 DataTribe Challenge winner QuickCode.ai.

Q: Tell us about your backgrounds

Gary King, Co-Founder

Dr. Gary King is the Weatherhead University Professor at Harvard University (one of 25 with Harvard’s most distinguished faculty title). He also serves as Director of the Institute for Quantitative Social Science. An elected member of the National Academy of Sciences and winner of numerous awards and prizes for his research, teaching, software, and entrepreneurship, he and his research group develop and apply empirical methods in many areas of social science research. Dr. King is a co-founder and an inventor of the original technology for Crimson Hexagon (merged with Brandwatch, acquired by Cision), Learning Catalytics (acquired by Pearson), Perusall, Thresher, OpenScholar, and other firms. He has received 17 patents for these technologies. For more information, see GaryKing.org.

Becky Fair, Co-Founder

Ms. Fair is the CEO and co-founder of Thresher, a software company that combines unique data sets and machine learning to help decision makers in government and industry decode China, even when others intentionally manipulate the narrative. She spent a decade as a CIA officer in a variety of roles and brings a deep understanding of the intelligence community and program management within that community. She has also run her own management consulting practice for CEOs of mid-market companies. She started her career working in Russia at the International Finance Corporation, a division of the World Bank. She holds a BA from Middlebury College and an MBA from Dartmouth, graduating as a Tuck Scholar.

Shannon Hynds, CEO and Co-Founder  

A high school computer science course taught by a terrific and inspiring teacher set Shannon on the path to software engineering as a career. Her very first job out of college was at a startup, where she watched and learned what it took to build a business from the ground up. She spent many years after that leading software and non-software projects. Along the way, she discovered a talent for guiding teams from point A to point B.

A chance encounter with a group of VCs sparked Shannon’s interest in angel investing and entrepreneurship, a passion she still has today. When the startup software company Thresher came knocking, Shannon jumped at the chance to join an amazing team. She spent five years there as the product manager for QuickCode, supporting the needs of their analysts and other customers. QuickCode’s value as its own product separate from Thresher’s product line became apparent, and Thresher’s founders spun it out into its own company, QuickCode.ai, with Shannon as CEO.

Q: Tell us about your business/idea. 

For most organizations, there is a goldmine of internal knowledge contained in unstructured text data. This data, combined with the power of machine learning, has the potential to revolutionize and inform everything humans do—from operating a car to grocery shopping to communicating with our loved ones to receiving healthcare, and the list goes on and on. The problem is that there is so much unstructured text, it is hard to know where to start. When our team talks with data scientists about the power of QuickCode, all agree that the first question they have when solving a problem by using machine learning is “Can I get enough labeled training data?”

At QuickCode.ai, we see the problem as much bigger than that: it isn’t enough to just have mountains of labeled training data. Machine learning models need the labeled training data to be relevant and representative of what the ML tool will encounter when it is deployed in the real world. With a dataset of unstructured text, that means the tool must identify slang, codewords, misspellings, dialects, jargon, and other domain-specific terminology.

QuickCode software helps solve this problem of getting the right kind of labeled training data, resulting in more accurate and less-biased machine learning models. QuickCode is a human-in-the-loop (HITL) solution that enables data scientists to get their subject matter experts in front of the data earlier in the machine learning pipeline, allowing them to collect the right data for labeling, gain insights, and inform the development of labeling rules. QuickCode uses its own machine learning and natural language processing (NLP) algorithms to target the most relevant and representative text, creating training datasets that are narrow in use case but diverse in content.

Q: What was the original inspiration for your company/product? 

QuickCode was born out of a single use case, identified by a team of quantitative researchers at Harvard. Chinese netizens, trying to avoid government censors, swapped out the Chinese characters for “freedom” (自由) with the similar looking characters for “eyefield” (目田). The researchers attempted to reverse engineer Chinese censorship rules, suspecting netizens were using codewords but having no clue what the codewords were. Then the team had a breakthrough, allowing them to find codewords on Chinese social media using machine learning. QuickCode is the productized version of that technology.

Q: What’s your vision for the future: What will the market you are pursuing look like in 5-10 years?

It’s still early days for text-based machine learning and artificial intelligence, and data labeling remains a significant bottleneck in deploying production-quality machine learning models. The next 5-10 years will bring advances in perfecting labeling processes—be they human, machine, or a hybrid of both. The focus is likely to be on how to achieve higher quality models with fewer humans and fewer labels. In fact, there is a movement to do away with using labeled data altogether; we expect research and development into these methods will continue to bring about new, exciting technologies.

Beyond the details of how to label, the data-labeling industry also faces legitimate government and public concern surrounding ethics, trustworthiness, and risk. The International Organization for Standardization (ISO) and the Institute of Electrical and Electronics Engineers (IEEE), two of the most prominent standards bodies, are developing and recommending voluntary practices to address such issues. It is inevitable that certification or badging will be offered in the future to help allay concerns and boost confidence in AI solutions. QuickCode and other AI practitioners will have to work hard and continue to elevate our standards and practices to stay ahead of changing standards.

Q: How does your business address pressing cyber and data challenges for the commercial sector?

When it comes to deploying machine learning models that rely on unstructured text data, QuickCode is the place to start. We provide an iterative and easy (yes! easy!) way for an organization to extract value from their unstructured text data. First and foremost, QuickCode finds “more” of what a user is looking for, reducing bias and improving accuracy by introducing diverse concepts and terminology individual experts might have overlooked. This means that a user looking for tweets about COVID vaccine side-effects might be asked to consider including tweets that contain “arm soreness” or “facial paralysis” but exclude tweets that include “Bieber Fever.”
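To make the idea of finding "more" concrete, here is a minimal sketch of one simple way a tool could suggest related terms for a human expert to accept or reject. It is an illustration only and not QuickCode's actual algorithm, which the interview does not describe. The function names (`tokenize`, `suggest_terms`) and the smoothed log-ratio scoring are assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def suggest_terms(docs, seed_terms, top_n=5):
    """Rank terms that appear disproportionately in documents matching the seeds.

    Hypothetical helper: scores each non-seed term by the smoothed log-ratio of
    its document frequency inside vs. outside the seed-matching documents.
    """
    seeds = set(seed_terms)
    in_counts, out_counts = Counter(), Counter()
    n_in = n_out = 0
    for doc in docs:
        tokens = set(tokenize(doc))
        if tokens & seeds:
            in_counts.update(tokens - seeds)
            n_in += 1
        else:
            out_counts.update(tokens)
            n_out += 1

    scores = {}
    for term, count in in_counts.items():
        p_in = (count + 1) / (n_in + 2)            # add-one smoothing
        p_out = (out_counts[term] + 1) / (n_out + 2)
        scores[term] = math.log(p_in / p_out)

    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
```

In a human-in-the-loop workflow, the ranked terms would be shown to a subject matter expert, who decides which to include or exclude before the next iteration. Terms that only occur in unrelated documents never appear among the suggestions.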

QuickCode can provide the labeled training data for proof-of-concept machine learning projects. Often, a data scientist has a theory or idea about how they can solve a problem with machine learning, but they lack the resources and/or time to procure a set of labeled training data to test their theory. QuickCode can provide the first pass, allowing the data scientist to iterate rapidly without enlisting a group of outside labelers to assist.

In some cases, though, you need true experts to label data. In those cases, QuickCode is an ideal solution. Experts are expensive and their time is limited. When a labeling task requires an expert, such as a physician or an intelligence analyst, the organization wants to minimize how much of that expert’s time is tied up in tedious hand-labeling or sifting through irrelevant content. QuickCode helps organizations realize this goal of improved time efficiency without diminishing results. Our software reduces the overall volume of data sent for labeling while also increasing the number of positive targets in the training set. Literally better and faster.

Q: What attracted you to the DataTribe Foundry? Why did you choose to participate in the DataTribe Challenge? 

We’re founders looking for investment partners who will bring deep market understanding and experience—and the DataTribe team does not disappoint. We were impressed with DataTribe from our very first meeting. Natural language processing and machine learning technologies are complicated and not every investor gets it, but DataTribe did, right away.

It was also apparent that the DataTribe team takes a very active role in supporting their portfolio companies. Beyond the superpower expertise each member contributes, we see the skill and dedication they bring to picking up the phone and reaching out to potential partners and customers in their network. This is important to us, as we view our investors as traveling companions on the journey. That highest level of effort and attention sets DataTribe apart.

We’re excited to be a part of the DataTribe Challenge and honored to be considered as one of the three finalists. Our relationship with DataTribe will open doors, and we are very grateful for the opportunity.

Q: What’s your long-term vision for your business?

QuickCode benefits any organization that is solving a problem with machine learning and unstructured text; it is truly sector agnostic. That said, we know that every industry and organization is unique in the way that it collects, stores, and delivers data. Our goal is for QuickCode to become the training set preparation tool of choice, which means we will need to invest in API technology that will allow QuickCode to be used alongside a myriad of data platforms.

We’re also adding to the feedback loop that informs the decision-making processes by making investments in the technology that helps users guide QuickCode’s machine learning. These investments include visualizations, metrics, and enhanced recommending algorithms.

As we think about a long-term vision for the business itself, QuickCode will be used not only to improve efficiencies and accuracy in machine learning development pipelines, but also to identify and mitigate model drift in operational pipelines. Specifically, as language changes over time due to technology advances, terminology evolution, or slang and jargon changes, QuickCode can help identify cases that are missed by the now-outdated model.
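One very simple signal of this kind of language drift is the share of words in new text that never appeared in the training data. The sketch below illustrates that idea only; it is not a description of how QuickCode detects drift. The helper names (`oov_rate`, `flag_for_review`) and the 0.5 threshold are assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def oov_rate(doc, known_vocab):
    """Fraction of a document's tokens that are absent from the known vocabulary."""
    tokens = tokenize(doc)
    if not tokens:
        return 0.0
    unknown = sum(1 for t in tokens if t not in known_vocab)
    return unknown / len(tokens)


def flag_for_review(new_docs, training_docs, threshold=0.5):
    """Return new documents whose out-of-vocabulary rate meets the threshold.

    Hypothetical helper: flagged documents are candidates for human review and
    relabeling, since the deployed model has likely never seen their wording.
    """
    known_vocab = {t for d in training_docs for t in tokenize(d)}
    return [d for d in new_docs if oov_rate(d, known_vocab) >= threshold]
```

A production system would likely rely on richer signals, such as changes in model confidence or in embedding distributions, but the vocabulary check shows why new slang or jargon can slip past a model trained on older text.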