I wrote a blog post last fall about the grant that Hanzo received from Innovate UK’s Sustainable Innovation Fund and how we were planning to use the funds. These grants seek to support and rebuild businesses in the UK that have been affected by the COVID-19 pandemic. For our part, we’re looking for ways that we can extend Hanzo Hold, our purpose-built Slack ediscovery tool, to address the new workplace risks that accompany the abrupt transition to remote work.
This month, I want to give an update on the model we’re building to detect human resources risks on collaboration platforms.
Defining the Risk
Now that most knowledge workers are—for at least the foreseeable future—working from home, organisations need to devise new ways to supervise and monitor their working conditions. Companies can no longer count on the physical proximity of colleagues who might overhear threats, bullying, or discriminatory language or see uncomfortable interpersonal relationships. Instead, those interactions have, in large part, shifted onto collaboration platforms like Slack, where bad behaviour can fly under the radar. This can result in discrimination, harassment, bullying, and other policy violations that ultimately create an unpleasant, unproductive, or even hostile work environment. This can even present legal risks to an organisation.
Organisations need a way to identify those incidents so that they can, where appropriate, follow up with increased training, personnel reassignments, or other interventions. This is what Hanzo’s hoping to do. By analysing collaboration content over time, we’re training artificial intelligence (AI) models that can identify atypical or troubling patterns of behaviour. Organisations can then evaluate those “red flag” incidents to determine whether they represent a true risk and, if so, whether intervention would be appropriate. Here’s where we are so far.
Creating a Solution
At this point, we’ve trained AI models for various unwanted behaviours, including sexist language, racist and white-nationalist references, and offensive language such as profanity.
In the process, we’ve identified several spikes in activity representing “incidents” in our research sample set. One of those incidents, drawn from public IRC logs for the Ubuntu open-source Linux community, occurred when a firewall went down. In that case, users who had infiltrated the channel wrote a series of offensive messages. The incident is clearly visible as a spike on the graph of profanity incidents. That graph, seen below, reflects the fraction of messages per hour that score above 95 percent on our “profanity” scale; the blue band shows the normal baseline. Ordinarily, fewer than half a percent of messages score this high on the profanity scale, but during the firewall incident the percentage was much higher, which warrants human attention.
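As a rough sketch of this kind of flagging, the snippet below buckets messages by hour and flags hours where the fraction of high-scoring messages sits well above the normal baseline. The 95-percent cutoff and half-a-percent baseline come from the discussion above; the function name, the spike multiplier, and the sample data are illustrative assumptions, not Hanzo's actual implementation.

```python
from collections import defaultdict

BASELINE_FRACTION = 0.005   # "fewer than half a percent" of messages normally score this high
SCORE_CUTOFF = 0.95         # messages scoring above 95 percent on the profanity scale

def flag_spike_hours(messages, spike_multiplier=5):
    """messages: iterable of (hour_bucket, profanity_score) pairs.

    Returns the hours where the fraction of high-scoring messages is
    far above the normal baseline and so warrants human review.
    """
    totals = defaultdict(int)
    high = defaultdict(int)
    for hour, score in messages:
        totals[hour] += 1
        if score > SCORE_CUTOFF:
            high[hour] += 1
    return sorted(
        hour for hour, n in totals.items()
        if high[hour] / n > BASELINE_FRACTION * spike_multiplier
    )

# Example: one quiet hour near baseline, one "incident" hour
msgs = ([("09:00", 0.1)] * 199 + [("09:00", 0.99)]
        + [("10:00", 0.99)] * 10 + [("10:00", 0.2)] * 90)
print(flag_spike_hours(msgs))  # → ['10:00']
```

Only the 10:00 bucket, where a tenth of messages score above the cutoff, clears the flagging threshold; the single high-scoring message at 09:00 is consistent with the normal baseline.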
Note that the more you zoom in (see the second chart, 1_profanity_L1_f95), the more detail you can see, but the noisier the data gets. When we zoom in like this, three orders of magnitude closer than the previous graph, we become sensitive to the actual minute-by-minute patterns.
There’s a trade-off you have to make between being able to see the big patterns and being able to zoom in on the individual incidents. We’re working on tools that will allow the user to toggle between various levels of granularity from messages per day all the way to messages per minute.
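A minimal sketch of that granularity toggle, assuming timestamped messages (the bucketing scheme, function name, and sample data are mine, purely for illustration):

```python
from datetime import datetime

def bucket_counts(timestamps, granularity="hour"):
    """Count messages per time bucket at a chosen granularity.

    granularity: "day", "hour", or "minute". Coarse buckets show the
    big patterns; fine buckets expose individual incidents but are
    noisier -- the trade-off described above.
    """
    fmt = {
        "day": "%Y-%m-%d",
        "hour": "%Y-%m-%d %H:00",
        "minute": "%Y-%m-%d %H:%M",
    }[granularity]
    counts = {}
    for ts in timestamps:
        key = ts.strftime(fmt)
        counts[key] = counts.get(key, 0) + 1
    return counts

# 120 messages, one per minute, spanning two hours
ts = [datetime(2021, 3, 1, 9 + m // 60, m % 60) for m in range(120)]
print(bucket_counts(ts, "hour"))         # two hourly buckets of 60 messages each
print(len(bucket_counts(ts, "minute")))  # 120 one-message buckets
```

The same stream looks like two smooth hourly totals or 120 noisy per-minute counts, depending on which view the user selects.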
Now, it’s important to keep perspective on what these models that we’re building can and can’t do. A collaboration platform like Slack involves a huge and ever-evolving dataset.
As I’ve mentioned, the AI that we’re developing isn’t intended to make decisions about conduct within that dataset; it’s just designed to help human decision-makers narrow in on the interesting bits of conversation. Just because something scores high doesn’t mean there’s actually anything wrong with it.
In one example (not shown, given the profanity and negativity of the messages), we needed to look at the messages themselves to determine whether the spike of profanity reflected a false positive or a genuine incident that required a response. Reviewing the messages from the most active period of the firewall attack, it was immediately apparent that the users were engaging in intentional profanity, which disrupted the channel and triggered an emergency response.
Incidentally, this same firewall attack also scored highly on the racist/white supremacist AI model, as the next chart shows. This model isn’t as straightforward as the profanity model and, as a result, it’s still lacking in some nuance, but it’s getting better all the time. We generally expect fewer than one in a thousand messages to score highly in this model.
One quick aside: the models we’re developing now are still what we call “Tier 1” models that evaluate a single message at a time to discern whether it contains any concerning language. The final product that we develop will use a “Tier 2” model that incorporates multiple Tier 1 models as inputs and considers patterns over time. While end users won’t directly see these models, they form the foundation of how the AI ultimately operates, so it’s important that we invest now in understanding how our models are identifying messages of interest.
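To make the tiering concrete, here is a toy sketch: per-message Tier 1 scorers feeding a windowed Tier 2 pass. The two-tier structure is from the post; the stand-in keyword scorers, thresholds, and window logic are invented for illustration, and a real Tier 2 model would be learned rather than hand-coded.

```python
def tier1_scores(message, models):
    """Run each per-message ("Tier 1") model and collect its score."""
    return {name: model(message) for name, model in models.items()}

def tier2_windows(messages, models, window=50, threshold=0.02):
    """A toy "Tier 2" pass: slide a window over the message stream and
    flag windows where the fraction of messages scoring highly on ANY
    Tier 1 model exceeds a threshold."""
    flagged = []
    for start in range(0, len(messages), window):
        chunk = messages[start:start + window]
        hits = sum(
            1 for m in chunk
            if any(s > 0.95 for s in tier1_scores(m, models).values())
        )
        if chunk and hits / len(chunk) > threshold:
            flagged.append(start)
    return flagged

# Stand-in Tier 1 models: crude keyword scorers, purely for illustration
models = {
    "profanity": lambda m: 0.99 if "badword" in m else 0.01,
    "toxicity":  lambda m: 0.99 if "threat" in m else 0.01,
}
stream = ["hello"] * 50 + ["badword"] * 5 + ["hello"] * 45
print(tier2_windows(stream, models))  # → [50]: only the second window is flagged
```

The Tier 1 outputs never reach the end user directly; they exist only as inputs to the windowed pass, which is why understanding how they score individual messages matters now.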
Again, there’s noise in any dataset, so the challenge for data scientists is to design tools that find the clear outliers and flag them for attention. These models are designed to do just that: spot outliers and run them up the flagpole for human decision-makers to evaluate. Outliers might represent real risks, or they might be artifacts from noise fluctuations. Either way, that’s not up to the models to decide.
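One common way to separate clear outliers from ordinary noise fluctuations is a simple z-score test over the hourly fractions. This is a generic statistical sketch, not Hanzo's actual method; the three-standard-deviation cutoff and the sample data are assumptions for illustration.

```python
import statistics

def zscore_outliers(hourly_fractions, z_cutoff=3.0):
    """Flag hours whose high-score fraction sits far outside the
    normal band. Random noise produces small fluctuations around the
    baseline; only clear outliers (here, more than 3 standard
    deviations above the mean) are passed on for human review."""
    mean = statistics.mean(hourly_fractions)
    sd = statistics.stdev(hourly_fractions)
    if sd == 0:
        return []
    return [i for i, f in enumerate(hourly_fractions)
            if (f - mean) / sd > z_cutoff]

# A day of hourly fractions near the half-percent baseline, plus one spike
fractions = [0.004, 0.005, 0.003, 0.005] * 5 + [0.004, 0.06, 0.005, 0.004]
print(zscore_outliers(fractions))  # → [21]: the hour with 6% high-scoring messages
```

Everything the test flags still goes to a human: the model's job ends at "this hour is unusual", and the judgment call about whether it matters stays with the reviewer.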
Looking Ahead to What’s Next
One of our goals has been to identify incidents of bullying as well, but that one is a bit more difficult to pin down. Bullying can take many forms and can involve many subjects, so it’s hard to point to standard words or phrases that indicate someone’s being bullied. We’ve identified datasets with threatening language and we’re working closely with a psychology expert who’s going to help identify patterns of intimidating behaviour and markers of verbal toxicity.
In my next blog post, we’ll change course to take a closer look at the data leakage model we’re building. That model will detect disclosures of both personal information and organisational intellectual property, whether accidental or intentional. We have a great example involving the improper disclosure of … well, I’ll save that for next time!