DataTribe Insights - Q4 2021: Cyber Is Eating the World
The DataTribe Team
Introduction
With COVID continuing to change the office landscape and cyberattacks reaching unprecedented numbers, 2021 was a tumultuous year in cybersecurity. With the industry still reeling from the discovery of the SolarWinds hack in December 2020, 2021 brought major hacks against Microsoft, Kaseya, and Colonial Pipeline, along with hits to the supply chain by ransomware actor REvil, and the year closed out with the Log4j vulnerabilities. While the threat landscape seems daunting, business opportunities abound, and early-stage venture capital investors are seizing them.
In this issue of Insights, we highlight industry trends such as more accessible cyber solutions for SMBs, security for machine learning models, and the continued focus on AppSec. In venture specifically, 2021 marked a record number of deals closed, at record valuations in seed, series A, and series B rounds.
2021 Wrap-Up & Q4 Cyber Deal Activity
Putting the current market in the perspective of the last decade shows that the pandemic had a temporary effect that really only impacted 2020. 2021 marked a record year for early-stage deal counts in all verticals. Because many deals are reported after a significant lag, we expect 2021’s lead over prior years to grow.

The news for venture capital in 2021 has been widely reported: enormous amounts of capital are competing for early-stage deals, driving up both valuations and round sizes. In Q4 of 2021, median seed-stage pre-money valuations for cybersecurity and for all verticals were $10m and $11m, respectively. In all verticals, that median climbed steadily from $5m in 2017 to $7m in Q1 2021, and has shot up since then.

Round size, the amount of capital invested, has gone up in a similar fashion. Median round size for seed-stage deals in all verticals climbed steadily from $1.5m in 2017 to $2m by Q3 of 2020, and then rose very quickly to the Q4 2021 peak of $3.4m.

A Race to Create the Cyber “Easy Button”
Each day, approximately 30,000 websites are hacked, and 64% of companies worldwide have experienced some form of cyberattack. As a result, the cybersecurity industry has produced thousands of products to identify, analyze and protect against all sorts of bad-actor strategies. To build a comprehensive defense, enterprises often piece together a mesh of products that is time-consuming and costly to maintain. Historically, this has limited many of the available cybersecurity solutions to larger enterprises.
Here are some examples of complex security operations solutions. Security information and event management (SIEM) platforms are difficult to set up and, once in place, require a team of analysts to make sense of the stream of events they relentlessly emit. To better handle the process of responding to alerts, security orchestration, automation and response (SOAR) products came to market. But they also are a beast to set up. Next up came extended detection and response (XDR), which added more intelligence to boil down activity to real-world problems that either can be handled automatically or can be addressed by teams more easily and with less analysis. And yet another solution, managed detection and response (MDR), adds an outsourced team to help handle responses that still require humans. While the latter two offerings have lowered overall solution costs, they still often are too bulky and complex for smaller organizations.
In the set of companies that applied to this year’s DataTribe Challenge, we saw a trend emerging around easy-to-use, next-generation security operations management solutions: solutions that can be installed quickly so that smaller companies with only one full-time, or even part-time, security team member on staff can operate or evaluate their security operations. Startups are taking a fresh look at workflow management, operations optimization and automation, and compliance mapping. ContraForce, a leader in this category and winner of the DataTribe Challenge, has a platform that enables users in just a few clicks to integrate with a variety of endpoint and network tools. Users then can set up automated playbooks in just a few more clicks.
As regulators, cyber insurance providers and customers have increased their demands on smaller companies to prove they are following security best practices, there is a great need to implement realistic cybersecurity solutions. While 45% of the cyberattacks in 2021 targeted small and midsize companies, only 14% of those companies were prepared to defend themselves. If you carry this study’s sample across the entire small and midsize business market, then only approximately 4.4 million of the country’s 31.7 million businesses were prepared, leaving roughly 27.3 million that were not. This lack of preparation must be addressed, and we think there is a group of companies bringing the right solutions to market just in time.
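As a quick check on that extrapolation, the arithmetic can be sketched in a few lines of Python. This assumes, as the paragraph does, that the study’s 14% preparedness rate applies uniformly to all 31.7 million businesses; the function and variable names are illustrative only.

```python
def split_by_preparedness(total_businesses: float, prepared_share: float) -> tuple[float, float]:
    """Return (prepared, unprepared) counts for a given preparedness rate."""
    prepared = total_businesses * prepared_share
    unprepared = total_businesses - prepared
    return prepared, unprepared


# Figures quoted in the text: 31.7 million businesses, 14% prepared.
prepared, unprepared = split_by_preparedness(31_700_000, 0.14)
print(f"prepared:   {prepared / 1e6:.1f} million")
print(f"unprepared: {unprepared / 1e6:.1f} million")
```

Under that assumption, 14% of 31.7 million comes to about 4.4 million prepared businesses, and the remainder is the unprepared group.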
If Half of All Airbags In Cars Were Defective…
… needed to be replaced, and those airbags could be remotely deployed at any moment by someone.
That’s like the Log4j fiasco.
Log4j is easily exploited, time-consuming to patch, and ubiquitous: it is used in virtually every Java-based system, from apps on your phone to your bank to power plants. These factors compound to make it one of the worst vulnerabilities ever discovered (the worst, according to Jen Easterly, Director of CISA). In some ways it’s both shocking and disturbing that such a broadly adopted, foundational open source utility had such a grave vulnerability. It shakes your confidence in the whole idea of open source.

The recent global scramble is a reminder of how important it is to “shift left” security into the core software development process. In fact, a Q4 report from Sonatype shows there has been a 650% year-over-year increase in attacks on common open source public libraries. We have long been strong proponents of innovation in application security that fixes software bugs before they can get into the wild and become vulnerabilities. Our investment in CodeDx (acquired in 2021 by Synopsys) was a prescient move in this category. Even more than three years after our original investment in CodeDx, we are still in the early days of building security into software from the start. We continue to see early startups with new ideas on how to make the software development process more secure.
When It Comes to ML Security, Robustness May Be As Important as Accuracy
What does it mean for machine learning (ML) models to be robust, and why should you care?
When building ML models, data scientists traditionally evaluate the accuracy of the model to determine if the trained model is “good.” Model accuracy is defined as the percentage of correct predictions made out of total predictions. More recently, model robustness has started to become an important attribute to track. Model robustness measures how well the model predicts for edge cases. This becomes quite important when considering the safety and security of the model.
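The difference between the two measures can be illustrated with a small sketch. Everything here is hypothetical: the toy threshold model, the data, and the choice to approximate robustness as accuracy on a hand-picked set of edge cases near the decision boundary. Real robustness evaluations use richer techniques, so treat this only as an illustration of the definitions above.

```python
from typing import Callable, Sequence


def accuracy(model: Callable[[float], int],
             inputs: Sequence[float],
             labels: Sequence[int]) -> float:
    """Fraction of correct predictions out of total predictions."""
    correct = sum(1 for x, y in zip(inputs, labels) if model(x) == y)
    return correct / len(labels)


def toy_model(x: float) -> int:
    """Hypothetical trained model: predicts class 1 above a 0.5 threshold."""
    return 1 if x > 0.5 else 0


# Typical inputs sit far from the decision boundary.
typical_inputs = [0.1, 0.2, 0.8, 0.9]
typical_labels = [0, 0, 1, 1]

# Edge cases sit close to the boundary; in this made-up data the true
# boundary is near 0.48, slightly off from the model's learned 0.5.
edge_inputs = [0.47, 0.485, 0.495, 0.52]
edge_labels = [0, 1, 1, 1]

print("accuracy on typical inputs:", accuracy(toy_model, typical_inputs, typical_labels))
print("accuracy on edge cases:    ", accuracy(toy_model, edge_inputs, edge_labels))
```

A model can look “good” on the typical set while getting a large share of the edge cases wrong, which is why the two numbers are worth tracking separately.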
A recent hot topic in the news, ML security, addresses common issues that occur with ML models such as unwanted biases, data-poisoning concerns, and interactive attacks on operationalized ML models. These ML model issues can be split into two common attack methods:
- ML training pipeline attacks: Retraining models on data that is either purposely poisoned or that accidentally introduces unwanted biases into the model
- Operational ML model attacks: Intentionally finding patterns to present to the model that are known to result in an incorrect prediction, often in situations where humans find the correct prediction obvious but the model gets things wrong

To help address both ML attack methods, data scientists need tools that give them easy insight into the robustness of a model and help reduce its vulnerability to both kinds of attack. We’ve seen a significant increase in the number of ML security startups forming. In fact, one of the DataTribe Challenge winners, QuickCode, has a platform to help improve training set quality, which can significantly improve model robustness and enable early discovery of bias formation in addition to data-poisoning attacks.
This is happening just in time. Gartner estimates that half of enterprises will have automated processes for ML model training and deployment by 2025. In just two years, up to 10% of those models could be poisoned by benign or malicious actors. The pressure for companies to have streamlined ML training pipelines that efficiently lead to robust models is coming not only from business requirements but also from regulators: 2021 saw a significant increase in regulation efforts targeting ML bias and transparency. In Q4 2021, the Defense Innovation Unit (DIU) of the U.S. Department of Defense (DOD) published a white paper outlining a set of highly prescriptive guidelines to be followed by contractors in order to “avoid unintended consequences” in ML systems. Also in Q4, New York City passed a new law banning companies in the city from using automated employment-decision tools to screen job candidates, unless the technology goes through a “bias audit.”
Overall, we expect these trends to continue, driving more innovation to streamline and harden the ML training pipelines and operational model processes.
Can We Win the War on Ransomware?
The statistics for ransomware are staggering: Since 2015, there’s been a 57x increase in annual ransomware damage, reaching $20 billion in 2021. Ransomware has become a rich ecosystem of business partners. Much of today’s ransomware activity is run on ransomware-as-a-service platforms operated by multinational ransomware gangs. These gangs set up revenue-share partnerships with a worldwide network of criminals who find every way possible to inject the enabling malware into organizations. These ransomware criminals seem to be winning, right? For that matter, looking even broader, cybercriminals using any value-extraction technique—ransomware, email compromise, identity theft, etc.—seem to be winning the war.
But there may be hope that law enforcement is starting to gain a foothold to push back on the exponential growth in cybercriminal activity seen to date. Establishing large multicountry communication channels has been key. In October, Europol reported the successful bust of multiple REvil affiliates, and South Korean and Kuwaiti authorities arrested a number of GandCrab affiliates throughout 2021—all part of Operation GoldDust, a joint law enforcement effort involving 17 countries.
International coordination hasn’t only led to arrests: The FBI, in conjunction with law enforcement in other countries, returned $2.3 million of the ransom collected in the DarkSide attack on Colonial Pipeline. Five months later, the FBI confiscated $6 million from two individuals associated with REvil.
The key to winning the war is to change the parameters of the game dynamics, and this increased law enforcement traction does that. The formula for the cybercrime game is simple:
Cost = (Risk + Effort)
(Cost/Reward) Goes Up = Good Guys Win
(Cost/Reward) Goes Down = Cybercriminals Win
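The formula can be made concrete with a small sketch. The numbers below are arbitrary, unitless placeholders chosen only to show which direction the ratio moves; they are not estimates of real-world risk, effort, or reward.

```python
def cost_reward_ratio(risk: float, effort: float, reward: float) -> float:
    """Cost/reward ratio for an attacker, with cost = risk + effort."""
    if reward <= 0:
        raise ValueError("reward must be positive")
    return (risk + effort) / reward


# Hypothetical baseline scenario.
baseline = cost_reward_ratio(risk=1.0, effort=2.0, reward=6.0)

# More arrests and seizures raise the attacker's risk: the ratio goes up.
more_enforcement = cost_reward_ratio(risk=3.0, effort=2.0, reward=6.0)

# Larger payouts raise the attacker's reward: the ratio goes down.
bigger_payouts = cost_reward_ratio(risk=1.0, effort=2.0, reward=12.0)

print(f"baseline:         {baseline:.2f}")
print(f"more enforcement: {more_enforcement:.2f}")
print(f"bigger payouts:   {bigger_payouts:.2f}")
```

Anything that raises risk or effort pushes the ratio up, and anything that raises reward pushes it back down, so gains on one side can be cancelled out by the other.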
Law enforcement traction increases risk, which is great. But implementing more international policy to further increase the risk for ransomware and other cybercriminals would be even better. In a recent article in Foreign Affairs, Dmitri Alperovitch does a great job of covering this need for increased cyber policy focus. But what can be done in industry? The majority of the cybersecurity solutions landscape focuses on improving cyber defense and, therefore, raises the effort required of cybercriminals. While it is encouraging that industry and law enforcement are working to increase the overall cost for criminals, we do not believe the current effort is quite enough to flip the incentives so that the “good guys” start winning the war.
As for the final parameter, reward, the $15 billion cyber insurance market is providing much-needed financial support for companies struggling to handle their risk of cyberattacks. However, this is also problematic, as it adds upward pressure on the reward parameter, thereby helping cybercriminals win. In addition, while insurance companies are encouraging cyber best practices among their customers, it’s unclear how much this actually leads to companies better protecting themselves from cybercriminals. We feel there may be interesting and significant opportunities for startups to change this cyber insurance dynamic: helping companies improve their cyber defense more directly and manage risk, while reducing the upward pressure insurance puts on the reward parameter.
Maybe we have rose-colored glasses on, but we’re hopeful that the law enforcement traction in 2021 is the first of a series of positive steps that will soon drive the (cost/reward) ratio up so the good guys start to win the war on ransomware. We expect to see interesting innovations come to market as part of this journey.
Other aspects of the Web3 architecture also need to be secured, even though their vulnerabilities don’t grab as many headlines as bridge hacks. This includes common decentralized application (aka dApp) components such as the user interface and services like RPC Hosts, IPFS gateways, and network indexing services. The attack surface of Web3 has become complex, and while some traditional cybersecurity solutions can be applied to this new architecture, there are many opportunities to build new solutions to solve Web3-unique challenges.
Prove Your Software Is Secure - Compliance Shifts Left
It is actually possible to formally prove your software is secure. Verification of software against a formal specification has been a field of research in computer science for decades. As software continues to “eat the world”, it is more important than ever to ensure that software does exactly what is wanted and has no security flaws. But typically, using proof assistants such as Agda or Coq to formally prove that even the simplest software matches a specification requires specialized, advanced math expertise and is extremely time-consuming. Recently, there have been a number of companies innovating in this area, mostly Web3 smart contract auditing companies like CertiK and Certora. They are making formal verification more accessible.
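To give a feel for what “proving software matches a specification” looks like, here is a deliberately tiny sketch in Lean 4, a proof assistant in the same family as Agda and Coq. The function `capLen` and its specification are hypothetical, invented for this illustration; they are not taken from any of the companies mentioned.

```lean
-- Hypothetical building block: cap a requested length at a fixed limit.
def capLen (limit n : Nat) : Nat := min limit n

-- Specification: for every possible input, the result never exceeds the limit.
-- The proof reuses the standard library lemma `Nat.min_le_left`.
theorem capLen_le_limit (limit n : Nat) : capLen limit n ≤ limit :=
  Nat.min_le_left limit n
```

Unlike a test, which checks a handful of inputs, the theorem covers all inputs at once. The hard part in practice is that real software and real security specifications are vastly larger than this example.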
The idea of “shift left compliance” means developers can verify that their software complies with a formal specification before it goes into production, rather than companies relying primarily on audits to confirm that their cybersecurity processes follow the best practices listed in a compliance framework.
Because formal verification can be so complex and time-consuming, one way to simplify it for an average development team is to pre-verify commonly used software building blocks. In Q4, we talked to an early-stage company working on research based on this concept. The company has a technology it calls “modular provable security.” We expect this type of approach will bring formal verification to the masses. Keep a lookout.