On April 25, 2019, Orrick’s Successful Women in IP (SWIP) group and the newly launched AI Cross Practice Initiative joined forces to bring together industry-leading AI experts for a lively panel discussion on the data used to train the algorithms that drive machine learning. Orrick’s own IP partner Diana Rutowski moderated the panel and was joined by Anamita Guha, Global Product Lead for AI + Quantum at IBM; Sam Huang, Senior Associate at BMW i Ventures; Charlotte Lewis-Jones, Associate General Counsel at Facebook AR/VR; and Emily Schlesinger, Senior Attorney in Artificial Intelligence and Research at Microsoft.
The panel dissected hot-button issues around data acquisition and use, the need to make the AI black box more transparent, companies’ responsibility to control their data, and when and how bias should be removed from the machine learning process. This panel discussion is one of many that Orrick plans to host as part of its AI Speaker Series.
Key Takeaways and Event Soundbites
On conceptualizing the relationship between AI, machine learning, and data, Guha offers an analogy: if AI is a rocket ship, then data is the rocket fuel and the machine learning algorithm is the combustion chamber. To power the rocket, fuel is needed, and the combustion chamber must ignite it. Similarly, working AI requires data to train the algorithm that powers the system.
For Lewis-Jones, the number-one legal concern surrounding AI is privacy. AI systems can be accurate only if their training data is representative, which requires training the algorithms on real-world data. To acquire this data, entities must ensure they give proper notice to their data subjects. Has the person been told the purpose of the data collection? Have they been informed that the data could later be used for other purposes? Have they been notified that their data can be sold or licensed to third parties for additional uses? Lewis-Jones notes that it is easy for data acquirers to decide that consent is too difficult to obtain and to train only in a vacuum, but that this does not lead to reliable AI.
Schlesinger finds the uncharted legal landscape of AI both fascinating and challenging to navigate. Technology has outpaced the law, which means in-house counsel has a responsibility to define accountability. Part of that accountability means addressing privacy and ethical concerns throughout the entire AI development life cycle – data collection, model building, and post-deployment – rather than considering them only after the system is established.
From the investor perspective, Huang always asks whether the target entity owns its data and, if not, whether it will have future access to it. For Huang, the most significant red flag is an entity that neither owns its data nor will have future access to it: without that access, the algorithms cannot be further developed or corrected in the event of biased results.
Biased Data Leads to Biased Algorithms
On the issue of bias, Schlesinger notes that bias can creep in at any stage of development: if the underlying training data is biased, for instance, the resulting data sets and algorithm outcomes will be biased as well. Huang highlights an instance where she rode in an autonomous vehicle that recognized and stopped for lighter-skinned pedestrians but failed to recognize darker-skinned pedestrians. Left uncorrected, this failure would mean the vehicle reliably stops only for light-skinned pedestrians.
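The mechanism Schlesinger describes – skewed training data producing skewed outcomes – can be sketched with a deliberately simple toy model (the labels and numbers here are hypothetical illustrations, not from the panel):

```python
# Hypothetical imbalanced training sample: 95 examples from one group,
# only 5 from another. A real pedestrian-detection set is far richer,
# but the imbalance problem works the same way.
train = ["light"] * 95 + ["dark"] * 5

def fit_majority(labels):
    """A naive 'model' that simply learns to predict the most common label."""
    return max(set(labels), key=labels.count)

model = fit_majority(train)

# The model scores 95% accuracy on its own skewed data...
accuracy = sum(1 for y in train if y == model) / len(train)

# ...yet it never recognizes the under-represented group at all.
recall_dark = sum(1 for y in train if y == "dark" and model == "dark") / 5

print(model, accuracy, recall_dark)  # light 0.95 0.0
```

Real systems fail more subtly, but the arithmetic is the same: a model rewarded for overall accuracy on an imbalanced sample can ignore the under-represented group entirely, which is why the panelists emphasize representative data.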
According to Lewis-Jones, a car that cannot recognize pedestrians of a particular skin tone is the product of bad – and unsafe – data, and the remedy is training on a diverse and inclusive data set. In her work with Portal, for example, she stresses the importance of training its microphone on speakers with a range of accents. Guha points to a complementary solution: having diverse teams build the algorithms, because without diversity on the teams, their biases will be incorporated into the AI.
Guha also notes, however, that bias is not always a bad thing and can sometimes be reframed as expertise. A mental health application, for example, benefits from algorithms built by health care professionals who bring their domain knowledge to the task.