Languages, Systems, and Data Seminar (Winter 2025)

Time: Fridays, noon - 1:05pm (PT)
Location: The Internet / The LSD Lab (Engineering 2, Room 398)
Organizers: Lindsey Kuper, Tyler Sorensen, Reese Levine, and Gan Shen

The Languages, Systems, and Data Seminar meets weekly to discuss interesting topics in the areas of programming languages, systems, databases, formal methods, security, software engineering, verification, architecture, and beyond. Our goal is to encourage interactions and discussions between students, researchers, and faculty with interests in these areas. The seminar is open to everyone interested. Participating UCSC students should register for the 2-credit course CSE 280O (let the organizers know if you’re an undergrad and need a permission code).

For winter 2025, we will continue to host the LSD Seminar in a hybrid fashion. Anyone can attend on Zoom, and local folks can gather in person in the lab. Speakers can join either in person or on Zoom, whichever is convenient.

Talks will be advertised on the ucsc-lsd-seminar-announce (for anyone) and lsd-group (for UCSC-affiliated people) mailing lists.

Date	Speaker	Title
Jan. 10	Social Hour!	NA
Jan. 17	Mako Bates	Choreographic Programming in theory and practice
Jan. 24	Tim Goodwin	What if Kubernetes was a compiler target?
Jan. 31	Yan Tong	Cykas, a new protocol for sender-side enforcement of causal message delivery
Feb. 7	Roy Shadmon	From Coordination to Inference: A Probabilistic Approach to Byzantine Agreement
Feb. 14	Jubi Taneja	LLM-Vectorizer: LLM-Based Verified Loop Vectorizer
Feb. 21	Mahita Nagabhiru	Hardware support for lock-free programming
Feb. 28	Callista Le	Concurrent Data Structures Made Easy
March 7	NA	Canceled Due to the Stand Up For Science Rally
March 14	Abtin Molavi	TBD

Jan. 10

Social Hour!

Jan. 17

Speaker: Mako Bates

Title: Choreographic Programming in theory and practice

Abstract: Writing systems of concurrently executing programs is difficult: they can behave (and fail) in unexpected ways because of race conditions and deadlocks. When we talk about what we want these systems to do, we often tell “Alice & Bob” stories to describe the intended protocol. The fundamental idea of choreographic programming is that we can write real software that same way: as a single thread of activity described from a “3rd-party omniscient” perspective. Research on choreographic programming evolved out of formal protocol-specification methods over a decade ago, but systems that can actually be used to write software (e.g. Choral, HasChor, and MultiChor) are only a couple years old. My team and I have developed several such implementations with novel type-directed correctness guarantees. In this talk I will present a summary of our own work as well as a review of other relevant cutting-edge systems.

Bio: Mako Bates is a PhD candidate at the University of Vermont, advised by Joseph Near. He has done research and development of choreographic programming systems, secure multi-party computation, and differential privacy. Prior to starting his PhD, he worked in industry for several years.

Jan. 24

Speaker: Tim Goodwin

Title: What if Kubernetes was a compiler target?

Abstract: Multi-tier programming, a concept from the programming languages community, provides abstractions for building multiple components of a distributed application as parts of a single program. Modern software development, however, often moves in the opposite direction: applications are typically composed of independently developed programs that are later integrated through service APIs. While distributing an application into independent units has its benefits, this approach requires developers to write repetitive plumbing code to connect application components and forces them to navigate an error-prone configuration and deployment process to ultimately get things working. Multi-tier programming offers a solution to these challenges, yet it remains relatively unknown in industry. However, closely related topics—such as “monolith vs. microservice,” “monorepo or separate repos,” and even “common language or polyglot”—receive lots of attention and discussion in the software engineering community. In this talk, I will provide an overview of multi-tier programming and how it might simplify software development on Kubernetes, a popular platform for running containerized applications. I will also demonstrate a prototype “Kubernetes compiler” that can turn a monolithic codebase into a distributed application that runs on Kubernetes.

Bio: Tim Goodwin is a 3rd year PhD student at UC Santa Cruz, advised by Lindsey Kuper and Andrew Quinn. His research focuses on cloud-native programming models and the challenges they present to developers. Prior to grad school, he wrangled with microservice complexity in industry.

Jan. 31

Speaker: Yan Tong

Title: Cykas, a new protocol for sender-side enforcement of causal message delivery

Abstract: Protocols for causal message delivery are widely used in distributed systems. Traditionally, causal delivery can be enforced either on the message sender’s side or on the receiver’s side. The traditional sender-side approach avoids the message metadata overhead of the receiver-side approach, but it is also more conservative than the receiver-side approach, meaning that messages are delivered later than necessary. We present Cykas (“Can you keep a secret?”), a new protocol for sender-side enforcement of causal delivery that sidesteps the conservativeness of the traditional sender-side approach by allowing the eager sending of messages and constraining the behavior of their recipients. Our experiments show that for applications involving long-running jobs, Cykas has a performance advantage: Cykas lets long-running jobs start (and end) earlier, leading to a shorter overall execution time compared to the traditional sender-side approach. We implemented the Cykas protocol using the Stateright implementation-level model checker for Rust and checked the safety and liveness of our implementation using a custom-implemented property checker.

Bio: Yan Tong is a 2nd year PhD student at UC Santa Cruz, advised by Lindsey Kuper. His research focuses on distributed messaging protocol.

Feb. 7

Speaker: Roy Shadmon

Title: From Coordination to Inference: A Probabilistic Approach to Byzantine Agreement

Abstract: Multi-agent sensor networks and control systems such as distributed cyber-physical systems, smart power grids, and robot swarms make important control decisions using inputs from networked sensor (or sensor-derived) data streams, some of which may be faulty or compromised. These inputs are used to update models that represent the physical state of the system under control and drive decision procedures that seek to achieve a goal (e.g., drone swarms follow the leader), maintain system properties (e.g., keep temperature near 70F), or enforce safety requirements (e.g., shutdown power if current exceeds 3A). Most real-world examples of these systems are inherently edge-based, thus requiring increased tolerance to signal noise, crash and network faults, as well as malicious components. Current approaches often fall short due to centralized designs with a single point of failure, but applying traditional Byzantine fault-tolerant agreement protocols is challenging due to the noisy nature of the input data streams. Existing approximate agreement protocols overlook latent information present in the proposed values, but extracting that information is complicated by the possibility that some values are malicious.

In this talk, I will present Proximal Byzantine Agreement (PBA), a novel stochastic agreement protocol. PBA is based on Bayesian statistical inference driven by observations of replicated computation and provides statistically superior accuracy and reliability guarantees compared to current approximate agreement protocols. I will also talk about some of our ongoing work, including Centauri, a framework for organizing Byzantine fault-tolerant edge devices and sensor networks using PBA.

Bio: Roy Shadmon is a PhD candidate at UC Santa Cruz, advised by Owen Arden. His research focus comes at the intersection of multi-agent systems, Byzantine agreement, and Bayesian statistics.

Feb. 14

Speaker: Jubi Taneja

Title: LLM-Vectorizer: LLM-Based Verified Loop Vectorizer

Abstract: Compiler auto-vectorization promises performance gains through SIMD execution but often fails due to conservative heuristics and missed opportunities. Manually writing SIMD intrinsics, while effective, remains error-prone and inaccessible to most programmers. What if Large Language Models (LLMs) could generate optimized SIMD code automatically? And more importantly—how can we trust the correctness of AI-generated transformations? In this work, we explore the potential of Large Language Models (LLMs) to generate vectorized code from scalar loops and propose a finite-state multi-agent approach that combines LLMs with test-based feedback. Our results show speedups of 1.1x to 9.4x over state-of-the-art compilers (Intel Compiler, GCC, Clang). To ensure correctness, we integrate Alive2 for formal verification and introduce scalability techniques that improve verification coverage. Our approach successfully verifies 38.2% of vectorizations on the TSVC benchmark.

Bio: Jubi is a Research Engineer at Microsoft Research, where she explores the intersection of LLMs and compilers (LLMs for Compilers and Compilers for LLMs). Her broader research interests span compiler optimizations, formal verification, and program synthesis. She earned her Ph.D. from the University of Utah under the guidance of Prof. John Regehr, where her work leveraged formal methods to enhance compiler construction. Her contributions to testing LLVM’s static analyses earned her both the Best Paper Award and Best Student Presentation Award at CGO 2020. Beyond research, Jubi is deeply committed to mentorship. She has been a long-term mentor for SIGPLAN-M for the past five years, helping students navigate graduate school and supporting aspiring researchers in their academic journeys.

Feb. 21

Speaker: Mahita Nagabhiru

Title: Hardware support for lock-free programming

Abstract: Non-blocking algorithms, including lock-free programming, represent advanced concurrency techniques that go beyond traditional lock-based synchronization (e.g., mutexes or semaphores). Instead of relying on locks to protect shared resources, lock-free programs leverage hardware atomic primitives. This allows multiple threads to concurrently access shared data, effectively preventing issues like deadlocks and priority inversion. While powerful, lock-free programming is notoriously difficult to implement and verify. Its complexity demands a deep understanding of memory consistency models, atomic operations, memory ordering, and potential race conditions. However, the performance gains offered by lock-free techniques make them indispensable in high-performance applications such as databases, operating systems, and real-time systems. This work introduces lock-free programming, discusses the inherent challenges and then delves into practical examples of common lock-free data structures. Then explores advanced mechanisms such as multi-word compare-and-swap (MCAS) and lock-free hardware transactional memory (HTM), which simplify lock-free program development and offer stronger progress guarantees.

Bio: Mahita Nagabhiru is a CPU architect at Google. She works on Load/Store Unit- design and modelling, exploring performance features currently focused on store-to-load-forwarding. Her PhD was under Prof. Greg Byrd at NC State University, Raleigh in H/W support for lock-free programming. This talk is based on the research at NCSU.

Feb. 28

Speaker: Callista Le

Title: Concurrent Data Structures Made Easy

Abstract: Design of an efficient thread-safe concurrent data structure is a balancing act between its implementation complexity and performance. Lock-based concurrent data structures, which are relatively easy to derive from their sequential counterparts and to prove thread-safe, suffer from poor throughput under even light multi-threaded workload. At the same time, lock-free concurrent structures allow for high throughput, but are notoriously difficult to get right and require careful reasoning to formally establish their correctness. In this work, we explore a solution to this conundrum based on a relatively old idea of batch parallelism—an approach for designing high-throughput concurrent data structures via a simple insight: efficiently processing a batch of a priori known operations in parallel is easier than optimising performance for a stream of arbitrary asynchronous requests. Alas, batch-parallel structures have not seen wide practical adoption due to (i) the inconvenience of having to structure multi-threaded programs to explicitly group operations and (ii) the lack of a systematic methodology to implement batch-parallel structures as simply as lock-based ones. We present OBatcher, a Multicore OCaml library that streamlines the design, implementation, and usage of batch-parallel structures. OBatcher solves the first challenge (how to use) by suggesting a new lightweight implicit batching design pattern that is built on top of generic asynchronous programming mechanisms. The second challenge (how to implement) is addressed by identifying a family of strategies for converting common sequential structures into the corresponding efficient batch-parallel versions, and by providing a library of functors that embody those strategies. We showcase OBatcher with a diverse set of benchmarks ranging from Red-Black and AVL trees to van Emde Boas trees, skip lists, and a thread-safe implementation of a Datalog solver. Our evaluation of all the implementations on large asynchronous workloads shows that (a) they consistently outperform the corresponding coarse-grained lock-based implementations, the only ones available in OCaml to date, and that (b) their throughput scales reasonably with the number of processors.

This talk is based on work presented at OOPSLA 2024: https://doi.org/10.1145/3689775

Bio: Callista Le is currently a software engineer at Ahrefs, where she mainly works with OCaml. She graduated from Yale-NUS College in 2024 and wrote her final year thesis, Simple and Efficient Concurrent Data Structures via Batch Parallelism, under the supervision of Professor Ilya Sergey.

March 7

Canceled.

March 14

Speaker: Abtin Molavi

Title: TBD

Abstract: TBD

Bio: TBD