Impact of GitHub Copilot on code quality

Jared Bauer summarizes results of a study I suggested this spring. 202 developers took part: some were randomly assigned GitHub Copilot, while the others were instructed not to use AI tools. The participants were asked to complete a coding task. Developers with GitHub Copilot had a 56% greater likelihood of passing all unit tests. Other developers then evaluated the code to assess quality and readability. Code from developers with GitHub Copilot was rated better on readability, maintainability, and conciseness. All these differences were statistically significant.

What I’m working on

Many of my recent projects are confidential, and it’s not easy to provide public write-ups.  But I can summarize in general terms.  Selected recent matters:

  • Evaluating alternative remedies for proven violations of competition law.
  • Online forensics to determine whether a given publisher/partner is sending legitimate traffic versus malware, invisible traffic, fake clicks, and the like.
  • Measuring the incrementality of ad campaigns, to distinguish genuine incremental benefit from sales that would have occurred anyway.
  • Estimating the market value of IPv4 addresses, and evaluating the impact of rules and restrictions on their transfer.
  • Exploring Excel data glitches, including how data can become corrupted inadvertently and what can be learned from internal Excel data structures.

I’m enjoying combining software engineering, law, economics, and (often) a bit of gumshoe work.  And it’s a delight to always be learning!

My next chapter

I am delighted to announce that I’m returning to more frequent writing on this site.  Closely related, I’ve resumed multiple projects to hold tech goliaths accountable.  Expect future writings and projects exploring all manner of online malfeasance.

Last month I began to serve as an advisor at Geradin Partners, a European law firm best known for its leadership in matters adverse to big tech. Fully two decades ago, I was already flagging tensions between Main Street and Silicon Valley. Those ideas took off slowly in the US, but in Europe they moved faster, in no small part thanks to the attorneys now at Geradin Partners. I’m looking forward to working with them and their clients on all manner of projects with a locus in Europe.

More announcements to follow as to other affiliations.

The Effect of Microsoft Copilot in a Multi-lingual Context with Donald Ngwe

We tested Microsoft Copilot in multilingual contexts, examining how Copilot can facilitate collaboration between colleagues with different native languages.

First, we asked 77 native Japanese speakers to review a meeting recorded in English. Half the participants had to watch and listen to the video. The other half could use Copilot Meeting Recap, which gave them an AI meeting summary as well as a chatbot to answer questions about the meeting.

Then, we asked 83 other native Japanese speakers to review a similar meeting, following the same script, but this time held in Japanese by native Japanese speakers. Again, half of the participants had access to Copilot.

For the meeting in English, participants with Copilot answered 16.4% more multiple-choice questions about the meeting correctly, and they were more than twice as likely to get a perfect score. Moreover, in comparing accuracy between the two scenarios, people listening to a meeting in English with Copilot achieved 97.5% accuracy, slightly higher than the 94.8% achieved by people listening to a meeting in their native Japanese using standard tools. This is a statistically significant difference (p<0.05). The changes are small in percentage point terms because the baseline accuracy is so high, but Copilot closed 38.5% of the gap to perfect accuracy for those working in their native language (p<0.10) and closed 84.6% of the gap for those working in (non-native) English (p<0.05).
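The "share of the gap closed" figures can be read as the improvement over baseline divided by the distance from baseline to a perfect score. The Python sketch below illustrates that arithmetic. It is a back-of-envelope reading of the numbers quoted above, not the study's own code: the function names are hypothetical, and the implied values it computes are derived from rounded figures rather than reported in the source.

```python
def gap_closed(baseline: float, treated: float, ceiling: float = 100.0) -> float:
    """Share of the distance from baseline to the ceiling that the treated score covers."""
    if baseline >= ceiling:
        raise ValueError("baseline must be below the ceiling")
    return (treated - baseline) / (ceiling - baseline)


def implied_baseline(treated: float, share_closed: float, ceiling: float = 100.0) -> float:
    """Invert gap_closed: the baseline implied by a treated score and a share of gap closed."""
    if not 0.0 <= share_closed < 1.0:
        raise ValueError("share_closed must be in [0, 1)")
    return (treated - share_closed * ceiling) / (1.0 - share_closed)


def implied_treated(baseline: float, share_closed: float, ceiling: float = 100.0) -> float:
    """Treated score implied by a baseline and a share of gap closed."""
    return baseline + share_closed * (ceiling - baseline)


# English meeting: 97.5% accuracy with Copilot, 84.6% of the gap closed (both quoted above).
# The baseline below is an ASSUMPTION derived from those two figures, not a reported number.
english_baseline = implied_baseline(treated=97.5, share_closed=0.846)

# Japanese meeting: 94.8% accuracy with standard tools, 38.5% of the gap closed (quoted above).
# The Copilot accuracy below is likewise derived, not reported.
japanese_with_copilot = implied_treated(baseline=94.8, share_closed=0.385)

# Relative improvement for the English meeting, for comparison with the quoted 16.4%.
english_relative_gain = 97.5 / english_baseline - 1.0

print(f"Implied English baseline: {english_baseline:.1f}%")
print(f"Implied Japanese accuracy with Copilot: {japanese_with_copilot:.1f}%")
print(f"Implied English relative gain: {english_relative_gain:.1%}")
```

Under these assumptions, the implied English baseline comes out near 84%, and the relative gain it implies is consistent with the 16.4% figure quoted above. That consistency suggests, but does not confirm, that this is how the gap shares were computed.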

 

Summary from Jaffe et al., Generative AI in Real-World Workplaces, July 2024.

Impact of M365 Copilot on Legal Work at Microsoft

Teams at Microsoft often reflect on how Copilot helps. I try to help these teams by measuring Copilot usage both in the field (as they do their ordinary work) and in lab experiments (idealized versions of their tasks in environments where I can better isolate cause and effect). This month I ran an experiment with CELA, Microsoft’s in-house legal department. Hossein Nowbar, Chief Legal Officer and Corporate Vice President, summarized the findings in a post at LinkedIn:

Recently, we ran a controlled experiment with Microsoft’s Office of the Chief Economist, and the results are groundbreaking. In this experiment, we asked legal professional volunteers on our team to complete three realistic legal tasks and randomly granted Copilot to some participants. Individuals with Copilot completed the tasks 32% faster and with 20.3% greater accuracy!

Copilot isn’t just a tool; it’s a game-changer, empowering our team to focus on what truly matters by enhancing productivity, elevating work quality, and, most importantly, reclaiming time.

All findings are statistically significant at p<0.05.

Full results.