Languages, Systems, and Data Seminar (Fall 2025)

Time: Fridays, noon - 1:05pm (PT)
Location: The Internet / The LSD Lab (Engineering 2, Room 398)
Organizers: Lindsey Kuper, Tyler Sorensen, Reese Levine, and Achilles Benetopoulos

The Languages, Systems, and Data Seminar meets weekly to discuss interesting topics in the areas of programming languages, systems, databases, formal methods, security, software engineering, verification, architecture, and beyond. Our goal is to encourage interactions and discussions between students, researchers, and faculty with interests in these areas. The seminar is open to anyone interested. Participating UCSC students should register for the 2-credit course CSE 280O (let the organizers know if you’re an undergrad and need a permission code).

For fall 2025, we will continue to host the LSD Seminar in a hybrid fashion. Anyone can attend on Zoom, and local folks can gather in person in the lab. Speakers can join either in person or on Zoom, whichever is convenient.

Talks will be advertised on the ucsc-lsd-seminar-announce (for anyone) and lsd-group (for UCSC-affiliated people) mailing lists.

Date	Speaker	Title
Sept 26	NA	NA
Oct 3	Jessica Dagostini, Yanwen Xu, and Patrick Redmond	Conference Practice Talks
Oct 10	Reese Levine and Nathan Liittschwager	Conference Practice Talks
Oct 17 (Cancelled)	NA	NA
Oct 24	Tom Lyon	NFS Must Die! (and how to get Beyond File Sharing in the Cloud)
Oct 31	Mingwei Zheng	Semantic Bug Detection for Reliable Network Protocol Implementations
Nov 7	Tommy McMichen	Representing Data Collections for Analysis and Transformation
Nov 14	Lasse Møldrup	AWDIT: An Optimal Weak Database Isolation Tester
Nov 21	Eric Chan	Practical Distributed Computing Primitives for Heterogeneous Quorum Systems
Nov 28	NA	NA
Dec 5	Jayaprabhakar Kadarkarai	From Vibe Coding to Verified Systems: The Role of Formal Specification in the AI Era

Sept. 26

Social Hour!

Oct. 3

This week we will have practice talks for upcoming conference presentations.

Jessica Dagostini: miniGiraffe: A Pangenomic Mapping Proxy App, to appear at IISWC 2025.

Yanwen Xu: BetterTogether: An Interference-Aware Framework for Fine-grained Software Pipelining on Heterogeneous SoCs, to appear at IISWC 2025.

Patrick Redmond: Exploring the Theory and Practice of Concurrency in the Entity-Component-System Pattern, to appear at OOPSLA 2025.

Oct. 10

This week we will have practice talks for upcoming conference presentations.

Reese Levine: SafeRace: Assessing and Addressing WebGPU Memory Safety in the Presence of Data Races, to be presented at OOPSLA.

Nathan Liittschwager: CRDT Emulation, Simulation, and Representation Independence, to be presented at ICFP.

Oct. 17

Seminar cancelled because of ICFP/OOPSLA and SOSP.

Oct. 24

Speaker: Tom Lyon

Title: NFS Must Die! (and how to get Beyond File Sharing in the Cloud)

Abstract: One of the most important lessons learned in distributed computing and concurrency is that shared mutable data is a bad idea. What is the purpose of a network file system? To provide a shared mutable data space. There are many other problems with the NFS model at cloud scale. NFS remains popular because its killer feature is giving network-unaware applications access to large data sets without having to copy them first. Using existing file systems, OverlayFS, and NVMe-Over-Fabrics, we propose a new approach to achieve blazing-fast, highly scalable, and consistent access to dynamic data sets. We solicit collaborators.

Bio: Tom Lyon is a mostly retired computing systems architect, serial entrepreneur and UNIX Greybeard. His most recent startup was DriveScale, which created a disaggregated server management system, and was sold to Twitter in 2021. Prior to DriveScale, Tom was founder and Chief Scientist of Nuova Systems, a start-up that led a new architectural approach to systems and networking. Nuova was acquired in 2008 by Cisco, whose highly successful UCS servers and Nexus switches are based on Nuova’s technology. He was also founder and CTO of two other technology companies. Netillion, Inc. was an early promoter of memory-over-network technology. The Netillion team moved to Nuova Systems. At Ipsilon Networks, Tom invented IP Switching. Ipsilon was acquired by Nokia and provided IP routing and security technology for many operator and enterprise networks. As employee #8 at Sun Microsystems he contributed to the UNIX kernel, led many networking and storage projects, and was one of the NFS and SPARC architects. He started his Silicon Valley career at Amdahl Corp., where he was a software architect responsible for creating Amdahl’s UNIX for mainframes technology. Tom holds numerous US patents in system interconnects, memory systems, and storage. He received a BS in Electrical Engineering and Computer Science from Princeton University.

Oct. 31

Speaker: Mingwei Zheng

Title: Semantic Bug Detection for Reliable Network Protocol Implementations

Abstract: Countless devices around the world communicate through network protocols, forming the backbone of modern digital infrastructure. Ensuring the security and correctness of these protocol implementations is critical, as flaws can lead to service disruptions, security vulnerabilities, and data loss. While extensive research has focused on low-level reliability through techniques such as fuzzing and traditional program analysis, true robustness also depends on high-level semantic conformance to the behaviors prescribed by natural language protocol standards. This latter aspect, semantic correctness, remains under-explored in the domain of network protocol testing.

In this talk, I will present three complementary efforts aimed at detecting semantic bugs in network protocol implementations. First, I will introduce ParDiff, a static differential analysis framework that identifies silent parser bugs by comparing multiple independent implementations of the same protocol. ParDiff automatically extracts finite state machines (FSMs) from programs to model protocol message formats and employs bisimulation and SMT-based reasoning to reveal fine-grained semantic discrepancies. Second, I will discuss ParCleanse, which leverages advances in large language models to automatically extract message formats from RFCs and generate both positive and negative test cases to evaluate parser correctness. Finally, I will present RFCAudit, an LLM agent designed to align RFC documents with source code to detect functional bugs beyond parsers. RFCAudit integrates an indexing agent that performs semantic indexing of source code, and a detection agent that conducts retrieval-guided consistency checking to uncover specification violations. Across these efforts, our research has uncovered over 100 semantic bugs in widely used network protocol implementations, demonstrating the promise of semantic bug detection for building secure and trustworthy network software.

Bio: Mingwei Zheng is a Ph.D. candidate in Computer Science at Purdue University, advised by Prof. Xiangyu Zhang. Her research lies at the intersection of large language models (LLMs) and software engineering. She focuses on building efficient and effective LLM agents for automated software development tasks such as code generation, software testing, and program repair, with the broader goal of improving software correctness, robustness, and trustworthiness. Her work has been published in top-tier conferences including OOPSLA, ASE, ISSTA, S&P, CCS, and NeurIPS, and has been recognized with the ACM SIGPLAN Distinguished Paper Award (OOPSLA 2024) and a NeurIPS 2025 Spotlight. She has completed two research internships at Microsoft Research (RiSE Group) and is currently an Applied Science Intern at AWS AGI.

Nov. 7

Speaker: Tommy McMichen

Title: Representing Data Collections for Analysis and Transformation

Abstract: Compiler research and development has treated computation as the primary driver of performance improvements in C/C++ programs, leaving memory optimizations as a secondary consideration. Developers are currently handed the arduous task of describing both the semantics and layout of their data in memory, prematurely lowering high-level data collections to a low-level view of memory for the compiler. This forces an early commitment to low-level memory representations that obscures high-level structure and blocks memory layout optimizations.

In this talk I will describe MEMOIR: an SSA intermediate representation with data collections as first-class citizens. At its core, MEMOIR decouples the memory used to store data from the memory used to logically organize it. Through its SSA form, MEMOIR enables static analysis on collection elements and allows us to generalize traditional analyses and transformations to operate on these elements. Furthermore, preserving these high-level abstractions in the compiler allows us to automate memory optimizations that must be performed manually today.

Bio: Tommy McMichen is a final-year Ph.D. student at Northwestern University, advised by Simone Campanoni, where he created and leads the MEMOIR project: a compiler intermediate representation with data collections as first-class citizens. Tommy’s research focuses on developing language-agnostic intermediate representations that retain high-level semantic information to enable more precise static analysis and unlock automatic optimizations on data organization and representation. His work aims to bridge the gap between high-level programming abstractions and low-level performance optimization through automated compiler techniques.

Nov. 14

Speaker: Lasse Møldrup

Title: AWDIT: An Optimal Weak Database Isolation Tester

Abstract: In order to achieve low latency, high throughput, and partition tolerance, modern databases forgo strong transaction isolation for weak isolation guarantees. However, several production databases have been found to suffer from isolation bugs, breaking their data-consistency contract. Black-box testing is a prominent technique for detecting isolation bugs, by checking whether histories of database transactions adhere to a prescribed isolation level. The complexity of such testing has recently been shown to be polynomial for weak database isolation levels, but existing testers have a large polynomial complexity, restricting testing to workloads of only moderate size, which is not typical of large-scale databases.

In this work we develop AWDIT, a highly efficient and provably optimal tester for weak database isolation. Given a history H of size n and k sessions, AWDIT tests whether H satisfies the most common weak isolation levels, Read Committed (RC), Read Atomic (RA), and Causal Consistency (CC), in time O(n^(3/2)), O(n^(3/2)), and O(n·k), respectively, improving significantly over the state of the art. Moreover, we prove that AWDIT is essentially optimal, in the sense that there is a conditional lower bound of n^(3/2) for any weak isolation level between RC and CC. Our experiments show that AWDIT is significantly faster than existing, highly optimized testers; e.g., for the largest ~20% of histories, AWDIT obtains an average speedup of 245x, 193x, and 62x for RC, RA, and CC, respectively, over the best baseline.

Bio: Lasse Møldrup is a second-year Ph.D. student at Aarhus University, advised by Andreas Pavlogiannis. His research focuses on the intersection of algorithms, complexity theory, and programming languages, particularly in problems related to testing concurrent systems. Lasse’s work typically begins with a performance-critical problem, such as a particular testing task, and asks: what is the theoretical limit on algorithmic efficiency for this problem, and can we design an algorithm that matches it? His research has been accepted to POPL and PLDI, and he received a Distinguished Paper Award for his PLDI paper.

Nov. 21

Speaker: Eric Chan

Title: Practical Distributed Computing Primitives for Heterogeneous Quorum Systems

Abstract: Byzantine quorum systems replicate data in the presence of arbitrary and malicious parties. Traditionally, quorums were uniform across processes. Their modern incarnations incorporate personalized and heterogeneous trust, in which processes select their own, personal quorums. It has been previously shown that the quorum intersection and availability properties are necessary to support the reliable broadcast and consensus primitives. We show they are not sufficient. In response, we propose a new property called quorum subsumption, which, together with intersection and availability, is sufficient for implementing reliable broadcast and consensus. We present practical protocols for both in the heterogeneous setting.

Bio: Eric Chan is a sixth-year Ph.D. student at UC Riverside, advised by Mohsen Lesani. His research focuses on personalization and localization in distributed systems. His work covers both protocols and limitations for blockchains, heterogeneous quorum systems, and replication systems.

Nov. 28

No seminar (Thanksgiving break)

Dec. 5

Speaker: Jayaprabhakar Kadarkarai

Title: From Vibe Coding to Verified Systems: The Role of Formal Specification in the AI Era

Abstract: AI makes it easy to “vibe code” a quick demo, but the results are unpredictable and not something you want to ship to your customers. In this talk, I will show that formal specification is the missing foundation in the AI coding era. It lets us pin down system behavior, generate far more reliable code, and still ship quickly. I’ll also show how specification-driven development restores rigor to AI-generated code and illustrate the approach with FizzBee, a new specification language for practical, high-assurance software development.

Bio: With 20 years of experience building distributed systems at companies such as Google, Jayaprabhakar Kadarkarai has witnessed the challenges and costs of traditional validation methods, driving his focus on improving how engineers design and reason about complex systems. His current work aims to make rigorous system validation more practical and accessible for developers.