Languages, Systems, and Data Seminar (Winter 2026)

Time: Fridays, noon - 1:05pm (PT)
Location: The Internet / The LSD Lab (Engineering 2, Room 398)
Organizers: Lindsey Kuper, Tyler Sorensen, Reese Levine, and Achilles Benetopoulos


The Languages, Systems, and Data Seminar meets weekly to discuss interesting topics in the areas of programming languages, systems, databases, formal methods, security, software engineering, verification, architecture, and beyond. Our goal is to encourage interactions and discussions between students, researchers, and faculty with interests in these areas. The seminar is open to everyone interested. Participating UCSC students should register for the 2-credit course CSE 280O (let the organizers know if you’re an undergrad and need a permission code).

For winter 2026, we will continue to host the LSD Seminar in a hybrid fashion. Anyone can attend on Zoom, and local folks can gather in person in the lab. Speakers can join either in person or on Zoom, whichever is convenient.

Talks will be advertised on the ucsc-lsd-seminar-announce (for anyone) and lsd-group (for UCSC-affiliated people) mailing lists.

Date Speaker Title
Jan. 9 Shanto Rahman Reliable Software Testing Using LLM and Program Analysis
Jan. 16 Lef Ioannidis Structural temporal logic for mechanized program verification
Jan. 23 George Pîrlea Veil: Automated and Interactive Verification of Transition Systems
Jan. 30 Stephen Mell Designing Programming Languages for Compound AI Systems
Feb. 6 Martin Kleppmann Collaborative editors are distributed databases
Feb. 13 Alexandre Moine All for One and One for All: Program Logics for Exploiting Internal Determinism in Parallel Programs
Feb. 20 Cheng Gu Behind Bars: A Side-Channel Attack on NVIDIA MIG Cache Partitioning
Feb. 27 Shardul Chiplunkar Automatic Layout of Railroad Diagrams
March 6 Justine Frank Simulating Failure: Verifying Crash Recovery Systems
March 13 Shraddha Barke Toward Verification of AI Agents

Jan. 9

Speaker: Shanto Rahman

Title: Reliable Software Testing Using LLM and Program Analysis

Abstract: Software testing is essential for software reliability, yet modern test suites frequently suffer from broken tests caused by nondeterminism or code evolution. These failures mislead developers, reduce trust in testing, and allow real bugs to escape into production. In this talk, I present my work on making software testing reliable using program analysis and large language models. I introduce techniques to automatically identify and explain the root causes of flaky tests using context-aware attribution, enabling developers to understand why nondeterministic failures occur. I then present automated repair techniques, including FlakeSync for repairing asynchronous flaky tests and UTFix for repairing unit tests broken by code changes. These techniques achieve high repair success with low overhead and have been evaluated on large-scale, real-world datasets. I conclude by outlining a vision for nondeterminism-aware reliability in emerging domains such as cloud services, ML systems, and quantum software.

Bio: Shanto Rahman is a Ph.D. candidate in Electrical and Computer Engineering at The University of Texas at Austin, advised by Professor August Shi. Her research spans software engineering and AI, focusing on reliable software testing under nondeterminism and code evolution. She develops program-analysis- and LLM-based techniques to detect, explain, reproduce, and repair broken tests, with publications in ICSE, OOPSLA, ASE, and ICST. She has gained industry experience through internships at Google and Amazon Web Services (AWS). Her recognitions include MIT EECS Rising Stars, UC Berkeley NextProf Nexus, and multiple UT Austin fellowships and awards.

Jan. 16

Speaker: Lef Ioannidis

Title: Structural temporal logic for mechanized program verification

Abstract: Mechanized verification of liveness properties for infinite programs with effects and nondeterminism is challenging. Existing temporal reasoning frameworks operate at the level of models such as traces and automata. Reasoning happens at a very low level, requiring complex nested (co-)inductive proof techniques and familiarity with proof assistant mechanics (e.g., the guardedness checker). Further, reasoning at the level of models instead of program constructs creates a verification gap that loses the benefits of modularity and composition enjoyed by structural program logics such as Hoare Logic. To address this verification gap, and the lack of compositional proof techniques for temporal specifications, we propose Ticl, a new structural temporal logic. Using Ticl, we encode complex (co-)inductive proof techniques as structural lemmas and focus our reasoning on variants and invariants. We show that it is possible to perform compositional proofs of general temporal properties in a proof assistant, while working at a high level of abstraction. We demonstrate the benefits of Ticl by giving mechanized proofs of safety and liveness properties for programs with scheduling, concurrent shared memory, and distributed consensus, achieving a low proof-to-code ratio.

Bio: I work in Programming Languages (PL) and Verification (Proof Assistants) in Microsoft Research’s RiSE group. I completed my PhD at the University of Pennsylvania, advised by Sebastian Angel and Steve Zdancewic, where I developed Domain-Specific Programming Languages (DSLs) to improve the reliability, security, and privacy of computer systems, combining techniques from formal verification and cryptographic protocols. Before starting my PhD I worked in industry as a Software Engineer (Bridgewater Associates, Apple) and as a Tech Lead at UnifyID. I completed my Bachelor’s and Master’s at MIT, where my thesis advisors were Frans Kaashoek and Nickolai Zeldovich. I am from Thessaloniki, Greece, and started my academic career at the Aristotle University of Thessaloniki (AUTH).

Jan. 23

Speaker: George Pîrlea

Title: Veil: Automated and Interactive Verification of Transition Systems

Abstract: Veil is a foundational verification framework embedded in the Lean 4 proof assistant that makes distributed protocol verification both powerful and accessible. Improving on tools such as TLA+ and Ivy, Veil provides a unified environment where testing and automated push-button verification coexist with the full power of a modern proof assistant, all while maintaining foundational soundness guarantees.

Unlike traditional verifiers, which are implemented as standalone tools and (optionally) use proof assistants as backend certificate checkers, Veil is a Lean library, with all its functionality entirely embedded in the proof assistant. We believe this approach has significant advantages over traditional verifiers and that, in the near future, more verifiers will be built in this way.

In this talk, we describe our vision for building verifiers in Lean, share our experience of writing (and rewriting) Veil, and discuss the lessons we have learned along the way. We will also give a short tutorial on Veil, which you can try online at https://try.veil.dev/

Bio: George Pîrlea is a final year PhD student at the School of Computing, National University of Singapore, where he works on formal verification and testing of distributed systems. His research focuses on making formal methods practical and accessible, with a particular emphasis on combining automated and interactive approaches to verification. George is the lead developer of Veil.

Jan. 30

Speaker: Stephen Mell

Title: Designing Programming Languages for Compound AI Systems

Abstract: AI systems, such as LLMs, have proven successful at a wide range of tasks, from natural language processing to code generation. Recently, there has been a trend toward building compound AI systems, which combine one or more neural models with more traditional software components, such as search procedures and software APIs. Thus, an important question is how such systems should be built, and in particular how the programming languages used can be enhanced to support them. This talk addresses two facets of this question.

First, compound AI systems are glue scripts, in the sense that their performance bottleneck lies in waiting for external AI models to return rather than in the developer-written code, rendering traditional performance optimizations ineffective. To address this, we propose a novel opportunistic evaluation strategy for scripting languages based on a core lambda calculus, which automatically dispatches independent external calls in parallel and streams their results. We prove that this is sound, and we show that it yields large speedups (up to 6.2× in running time and up to 12.7× in latency) on five compound AI systems.

Second, LLM agents take actions by invoking tools; in many cases, this is done via code actions, where the agent generates code, typically in Python, which is then executed. Though LLMs are quite proficient at Python, it may not be the optimal language for actions. Instead, we propose Quasar, a language offering automated parallelization to improve performance, uncertainty quantification to improve reliability and mitigate hallucinations, and security features enabling the user to validate actions. The LLM writes code in a subset of Python, which is then transpiled to Quasar before execution.

Bio: Stephen is a final-year PhD student at the University of Pennsylvania, advised by Osbert Bastani and Steve Zdancewic, and is an incoming postdoc at CMU, advised by Zhihao Jia. He is interested in building better programming abstractions both for compound AI systems in particular and high-level scripting more generally.

Feb. 6

Speaker: Martin Kleppmann

Title: Collaborative editors are distributed databases

Abstract: Collaborative editors for text (e.g. Google Docs, Overleaf), spreadsheets (e.g. Excel in Microsoft 365), graphics (e.g. Figma), and other file types have become an essential part of how we work. However, their current architecture relies on centralised, trusted cloud services. This is risky for various reasons: What if the government forces the cloud provider to disclose your sensitive documents, or to stop serving you entirely? What if the provider is a startup that gets acquired and shut down?

In this talk I will provide an overview of our research around Automerge, a library for building decentralised, local-first collaboration software. It brings together ideas from CRDTs, distributed databases, column-oriented data formats, version control systems, and end-to-end encrypted messaging apps in order to enable collaboration without depending on any one trusted company.

Bio: Martin Kleppmann is an Associate Professor at the University of Cambridge, working on decentralised systems and cryptographic protocols. He works with the Ink & Switch research lab on local-first collaboration software, and advises the decentralised social network Bluesky. In a previous life he was a Silicon Valley software engineer and entrepreneur, cofounding and selling two startups and working on large-scale data infrastructure at LinkedIn. He is the author of the best-selling O’Reilly book Designing Data-Intensive Applications.

Feb. 13

Speaker: Alexandre Moine

Title: All for One and One for All: Program Logics for Exploiting Internal Determinism in Parallel Programs

Abstract: Nondeterminism makes parallel programs challenging to write and reason about. To avoid these challenges, researchers have developed techniques for internally deterministic parallel programming, in which the steps of a parallel computation proceed in a deterministic way. Internal determinism is useful because it lets a programmer reason about a program as if it executed in a sequential order. However, no verification framework exists to exploit this property and simplify formal reasoning.

To capture the essence of why internally deterministic programs should be easier to reason about, we define a property called schedule-independent safety. A program satisfies schedule-independent safety if, to show that the program is safe across all interleavings, it suffices to show that one terminating execution of the program is safe. We then present Musketeer, a separation logic for proving that a program satisfies schedule-independent safety. Once schedule independence is guaranteed, correctness can be shown in Angelic, a companion logic in which the user may interactively select one sequential interleaving and reason about it alone.

Using Musketeer, we prove the soundness of MiniDet, an affine type system for enforcing internal determinism. All results are mechanized in Rocq on top of the Iris separation logic framework.

This work was presented at POPL’26 and received a distinguished paper award.

Bio: Alexandre is a postdoctoral associate at New York University working with Joe Tassarotti on formal verification of parallel programs. Before that, he was a PhD student at Inria Paris (France) working with Arthur Charguéraud and François Pottier on formal verification of heap space bounds. Most of his work has a link to separation logic, resource consumption or parallelism.

Feb. 20

Speaker: Cheng Gu

Title: Behind Bars: A Side-Channel Attack on NVIDIA MIG Cache Partitioning

Abstract: NVIDIA Multi-Instance GPU (MIG) is a feature designed to enable isolation and secure multi-tenancy on large data center GPUs. MIG partitions a single GPU into multiple instances, each with dedicated hardware resources such as L2 cache slices. MIG is also documented to form the foundation of NVIDIA’s confidential computing stack by providing hardware-isolated trusted execution environments. However, the security claims of MIG deserve closer investigation, especially given the complexity of the GPU memory system and its many (sparsely documented) memory instructions.

In this work, we empirically examine the behavior of the GPU L2 cache with MIG enabled. We find that despite the partitioning design, cross-instance L2 cache interference still occurs. Specifically, memory barriers (membars) generated in one MIG instance have side effects that propagate across L2 partitions and affect the timing of certain load operations in other instances. We also find that these membars can be triggered by specific GPU activities, such as kernel launches. Building on these observations, we develop a new timing-based side-channel attack in which an attacker in one MIG instance can infer the kernel launch patterns of a victim in another instance. We show that this attack compromises the confidentiality of widely used GPU applications, such as large language model inference, because kernel launch patterns in these applications are correlated with sensitive information.

Bio: Cheng Gu is a first-year Ph.D. student at the University of Rochester, advised by Prof. Yanan Guo. He conducts research in system and hardware security, with a focus on GPU side-channel attacks and GPU memory safety. His work has been published at S&P and USENIX Security.

Feb. 27

Speaker: Shardul Chiplunkar

Title: Automatic Layout of Railroad Diagrams

Abstract: Railroad diagrams (also called “syntax diagrams”) are a common, intuitive visualization of grammars, but limited tooling and a lack of formal attention to their layout mostly confines them to hand-drawn documentation. We present the first formal treatment of railroad diagram layout along with a principled, practical implementation. We characterize the problem as compiling a diagram language (specifying conceptual components and how they connect and compose) to a layout language (specifying basic graphical shapes and their sizes and positions). We then implement a compiler that performs line wrapping to meet a target width, as well as vertical alignment and horizontal justification per user-specified policies. We frame line wrapping as optimization, where we describe principled dimensions of optimality and implement corresponding heuristics. For front-end evaluation, we show that our diagram language is well-suited for common applications by describing how regular expressions and Backus-Naur form can be compiled to it. For back-end evaluation, we argue that our compiler is practical by comparing its output to diagrams laid out by hand and by other tools.

This work will appear at ECOOP 2026. In the meantime, a preprint is available here.

Bio: Shardul Chiplunkar is a doctoral student in the Systems and Formalisms (SYSTEMF) lab at EPFL, Switzerland, working with Prof. Clément Pit-Claudel. His main research interest is in diagramming for programs and proofs: how might we enrich our interfaces with programming and theorem-proving environments with the diagrams that are otherwise so common on our whiteboards and slides? More broadly, he is interested in topics at the intersection of programming languages, formal methods, and human-computer interaction. Prior to EPFL, Shardul earned his Bachelor’s degree from MIT in 2022.

March 6

Speaker: Justine Frank

Title: Simulating Failure: Verifying Crash Recovery Systems

Abstract: Systems that need to be robust to crashes often implement complex recovery procedures so that they can return to well-behaved states without losing all progress. Work on verifying these systems has often relied on complex operational semantic models with pen-and-paper proofs of correctness that rest on assumptions which are challenging to state in theorem provers. Coinductive simulation relations are a common technique in mechanized frameworks for reasoning about the correctness of systems. They are promising for reasoning about systems with failures because they offer a more interactive way of exploring the complex control flows introduced by nondeterministically failing and running recovery procedures, providing a more intuitive path for constructing correctness proofs. However, a key challenge in using simulations to verify failing systems is handling the nondeterminism of the failures. A semantics of failures must treat possible failures as adversarial decisions, but prior work only handles systems where the simulating system can always decide which nondeterministic branch is taken. I will present my work on constructing simulations that properly account for nondeterministic failures in the simulating system.

Bio: Justine Frank is a PhD student studying Programming Languages in the PLUM lab at the University of Maryland, advised by Milijana Surbatovich. She is interested in developing techniques for streamlining the development of correctness proofs of programming language implementations and of systems built on non-traditional architectures.

March 13

Speaker: Shraddha Barke

Title: Toward Verification of AI Agents

Abstract: As AI agents are deployed in real workflows, verifying their behavior becomes a central challenge. These systems act through multi-step traces containing plans, tool calls, observations, and state updates, where small procedural mistakes can silently accumulate into larger failures. We propose agentic verification, a framework that derives checkable rules from instructions, tool schemas, policies, and task context, and uses them to verify execution traces step by step. The verifier checks behavioral compliance, tool-use correctness, argument validity, transition legality, workflow adherence, and task completion, and then uses the resulting evidence to explain violations and localize the first unrecoverable failure. This allows verification to cover both failed runs and apparently successful runs whose final outcome hides procedural non-compliance. The overall goal is an auditable and scalable foundation for evaluating and debugging tool-calling agents.

Bio: I am a Senior Researcher in the Research in Software Engineering (RiSE) group at Microsoft Research in Redmond, WA, USA. My research is at the intersection of software engineering and artificial intelligence and focuses on making AI trustworthy. I graduated with my Ph.D. in 2024 from the University of California, San Diego, advised by Dr. Nadia Polikarpova, where I worked on neurosymbolic program synthesis for domain-specific languages.

Archive