Academic On Spotlight – Dr. Jonathan Kummerfeld

Early Career Researcher

Your academic journey

I started here, at The University of Sydney, completing my Bachelor of Science (Advanced) (Honours) between 2006 and 2009. Through the Talented Student Program and Vacation Scholar Program I had the opportunity to do research with many faculty members, leading to seven publications. My honours advisor, James Curran, also taught my first programming course, which was really a data science course before the term existed.

After graduating, I went to the US for 12 years, first as a PhD student at the University of California, Berkeley (2010-2016), then as a Postdoc at the University of Michigan (2016-2022), and finally as a Visiting Scholar at Harvard University (2021-2022). Over that time, I worked with a range of leading researchers, establishing collaborations that continue today.

I returned to Sydney in 2022 to take up a position as Senior Lecturer in the School of Computer Science and received an ARC DECRA Fellowship (2023-2025).

What are your major research focus areas?

I work on Natural Language Processing, a huge field (it’s more than just ChatGPT!) that crosses over between Computer Science, Linguistics, and Statistics. My lab is particularly focused on research questions that explore new ways for people to use and collaborate with AI systems.

One theme of my work, and the focus of my DECRA, is systems that generate code. Today we interact with computers using our hands (keyboards + mice), eyes (screens), and ears (speakers). If we could also speak and have computers generate code in response, it would change what we can do and how quickly we can do it. For example, we could open a spreadsheet and create a complex visualisation by describing what we want, or we could get information from a database by asking questions.

Another focus area is developing methods to help people rapidly read and see patterns in large amounts of text. Unlike summarisation, which by necessity leaves out information, the idea here is to present all of the data in a way that makes it easy to see large and small trends. One application is understanding Large Language Models better by looking at sets of outputs. Another application is supporting qualitative coding, e.g., when analysing survey responses.

Beyond these two, I also work on: language in multi-agent systems; information extraction from scanned documents; NLP for clinical notes; and more efficient data annotation.

What current collaborations do you have?

The projects in the list for the previous question are with (1) Tianyi Zhang at Purdue University and Toby Li at the University of Notre Dame, (2) Associate Professor Ben Hachey in the Faculty of Medicine and Health, and (3) Elena Glassman at Harvard University, respectively. I’m also starting to collaborate with CSIRO researchers on code generation to aid scientific research, and I am wrapping up a DARPA-funded project on multi-agent collaboration that involved researchers at the University of Maryland, Princeton University, and the University of Southern California.

What do you enjoy most about your research? Can you share a memorable project or experience in your research career?

I really enjoy the moment when I am in a meeting with a student discussing ideas and together we come up with a new and elegant approach to a problem. One example of that is a project during my postdoc where we were trying to create more diverse datasets for training AI assistants. The idea we came up with was simple but powerful: get a person to paraphrase a request (e.g., “what will the weather be tomorrow?”) but with the constraint that they can’t use certain words.

What ethical challenges do you foresee in the realms of digital science and how do you think researchers should address them?

Academic work is often focused on a small set of metrics, e.g., accuracy on a certain benchmark dataset. That is convenient for measuring progress, but no metric is perfect and so we risk optimising our systems in ways that have unanticipated side-effects. One solution is to build systems and trial them with people in careful studies that can reveal issues and push us to think about the broader context our work fits within.

Where do you see the field of Digital Sciences and technology in five years?

It is very hard to predict because we are in the middle of a major technological shift. Will there be further rapid advances in what LLMs and AI more generally can do? Or is the neural revolution going to slow down the way the statistical revolution of the 90s and 00s did? Either way, Digital Sciences is going to grow as these technologies are more widely applied.

In your opinion, how can initiatives like DSI help ECRs?

DSI and similar initiatives can bring ECRs and more experienced researchers together to work on projects. That helps ECRs get going in the research grant ecosystem and can establish collaborations that enrich the student experience, which in turn makes Sydney a more desirable university for PhD applicants.