s+++ title = “Languages, Systems, and Data Seminar (Spring 2025)” date = 2025-03-30T19:11:27-08:00 math = false

+++

Time: Fridays, noon - 1:05pm (PT)
Location: The Internet / The LSD Lab (Engineering 2, Room 398)
Organizers: Lindsey Kuper, Tyler Sorensen, Reese Levine, and Gan Shen


The Languages, Systems, and Data Seminar meets weekly to discuss interesting topics in the areas of programming languages, systems, databases, formal methods, security, software engineering, verification, architecture, and beyond. Our goal is to encourage interactions and discussions between students, researchers, and faculty with interests in these areas. The seminar is open to everyone interested. Participating UCSC students should register for the 2-credit course CSE 280O (let the organizers know if you’re an undergrad and need a permission code).

For spring 2025, we will continue to host the LSD Seminar in a hybrid fashion. Anyone can attend on Zoom, and local folks can gather in person in the lab. Speakers can join either in person or on Zoom, whichever is convenient.

Talks will be advertised on the ucsc-lsd-seminar-announce (for anyone) and lsd-group (for UCSC-affiliated people) mailing lists.

Date Speaker Title
April 4 Shadaj Laddad Hydro: Modular, Correct, and Performant Distributed Systems
April 11 Eva Graversen A Theory of Choreographic Programming
April 18 Huaifeng Zhang The Hidden Bloat in Machine Learning Systems
April 25 Oliver Kennedy Draupnir: A toolkit for scalable, easy-to-write declarative compilers
May 2 TBD
May 9 TBD
May 16 Amanda Xu
May 23 Abtin Molavi
May 30 Achilles Benetopoulos
June 6 TBD

April 4

Speaker: Shadaj Laddad

Title: Hydro: Modular, Correct, and Performant Distributed Systems

Abstract: We’ve been stuck with programming models for distributed systems that force developers to individually program a fleet of sequential programs and manually reason about global correctness, or leverage frameworks that lock the developer into global scaling and fault tolerance decisions. Choreographic programming gives us a step in the right direction, by letting developers write distributed programs while avoiding opaque network boundaries. But it has significant limitations when it comes to reasoning about concurrent executions. In this talk, I’ll present the key ideas behind Hydro, a Rust framework for general-purpose distributed systems. I’ll show how Hydro leverages stream-choreographic programming to offer low-level control with strong correctness guarantees.

Bio: Shadaj is a final-year PhD student co-advised by Alvin Cheung and Joe Hellerstein in the Sky Computing Lab at UC Berkeley. He designs programming paradigms that make distributed systems modular, correct, accessible, and performant. His research spans foundational language design, program optimization, and formal methods. Shadaj is a co-organizer of the SF Systems Club, the largest systems-oriented meetup in the Bay Area, and loves to play the sitar in his spare time.

April 11

Speaker: Eva Graversen

Title: A Theory of Choreographic Programming

Abstract: Choreographic programming promises a simple approach to the coding of concurrent and distributed systems: write the collective communication behavior of a system of processes as a choreography, and then the programs for these processes are automatically compiled by a provably-correct procedure known as endpoint projection. While this promise prompted substantial research, a theory that can deal with realistic communication failures in a distributed network remains elusive.

In this talk, I will provide a theory of choreographic programming that addresses realistic communication failures taken from the literature of distributed systems: processes can send or receive fewer messages than they should (send and receive omission), and the network can fail at transporting messages (omission failure). This theory supports the programming of strategies for failure recovery, and a novel static analysis (called robustness) to check for delivery guarantees (at-most-once and exactly-once).

A key technical innovation is a deconstruction of the usual communication primitive in choreographies to allow for independent implementations of the send and receive actions of a communication, while still retaining the static guarantee that these actions will correlate correctly (the essence of choreographic programming). This has two main benefits. First, each side of a communication can adopt its own failure recovery strategy, as in realistic protocols. Second, initiating new communications does not require any (unrealistic) synchronization over unreliable channels: senders and receivers agree by construction on how each message should be identified.

Bio: Eva Graversen is a Postdoc at Tallinn University of Technology working on theoretical foundations of choreographic programming. Her PhD was under Iain Phillips and Nobuko Yoshida at Imperial College London, working on concurrent reversible computing.

April 18

Speaker: Huaifeng Zhang

Title: The Hidden Bloat in Machine Learning Systems

Abstract: In this talk, we explore the problem of bloat in machine learning (ML) systems and present key findings from our study. We begin by analyzing bloat in ML containers, which are widely used in modern cloud computing environments. Next, we examine code bloat in shared libraries of popular ML frameworks such as TensorFlow and PyTorch. We introduce our approaches for debloating both ML containers and shared libraries. Our study reveals that both ML containers and shared libraries include a substantial amount of unnecessary files and code, which bloat their size and lead to performance degradation, resource waste, and increased security vulnerabilities.

Bio: Huaifeng Zhang is a Ph.D. student in the Department of Computer Science at Chalmers University of Technology, advised by Prof. Ahmed Ali-Eldin Hassan. His research interests include software engineering, machine learning systems, cloud computing and blockchain. He also held positions at ByteDance and Google.

April 25

Speaker: Oliver Kennedy

Title: Draupnir: A toolkit for scalable, easy-to-write declarative compilers

Abstract: Compilers and program analysis tools are defined by sets of rules, or productions, that govern how code is translated, optimized, or analyzed. Production rules are compositional, allowing compiler authors to subdivide a global objective into smaller-scale local logical units that are easier to reason about. For example, a compiler author writing an optimizer might specify a rule that identifies if-then-else statements, where the condition is trivially tautological, and replaces it withthe body of the then clause. When compiling (resp., optimizing, analyzing) a block of code, the compiler (or optimizer, or program analysis tool) iteratively selects an appropriate production rule, applies it, and repeats.

The simplicity of local reasoning with production rules comes at the cost of redundant computation. To improve compile times, many production compilers adopt non-idiomatic design patterns, such as merging production rules or tracking non-local properties manually. While necessary for performance, such anti-patterns increase code complexity and make the code harder to maintain. In this talk, I will introduce Draupnir, a new compiler implementation framework based around a database engine. Draupnir allows compiler authors to continue expressing compiler logic through normal production rules, but approaches compilation, optimization, and analysis as forms of query processing. This shift in perspective allows Draupnir to leverage techniques from the database community, including Incremental View Maintenance (SIGMOD 2020), multi-query optimization (in-progress), optimization for the memory hierarchy (in-progress), and distribution (in-progress) to achieve orders of magnitude improvement in compile times, while simultaneously simplifying the logic of the compiler.

Bio: Oliver Kennedy is an associate professor at the University at Buffalo. He earned his PhD from Cornell University in 2011 and now leads the Online Data Interactions (ODIn) lab, which operates at the intersection of databases and programming languages. Oliver is the recipient of an NSF CAREER award, an IEEE Region 1 Technological Innovation Award, UB’s Exceptional Scholar Award, and several UB SEAS teaching awards. Oliver is also one of the founding board members of Breadcrumb Analytics. Several of Oliver’s papers have been invited to “Best of” compilations from SIGMOD and VLDB. The ODIn lab is currently exploring (i) how we can leverage database techniques like incremental view maintenance to make compilers faster, (ii) how to make it easier for data scientists to track how sources of uncertainty, ambiguity, and/or bias affect analyses, and (iii) how to streamline the interfaces — both human and software — between different tools for data science, like python, sql, and spreadsheets.

May 2

Speaker: TBD

Title: TBD

Abstract: TBD

Bio: TBD

May 9

Speaker: TBD

Title: TBD

Abstract: TBD

Bio: TBD

May 16

Speaker: Amanda Xu

Title: TBD

Abstract: TBD

Bio: TBD

May 23

Speaker: Abtin Molavi

Title: TBD

Abstract: TBD

Bio: TBD

May 30

Speaker: Achilles Benetopoulos

Title: TBD

Abstract: TBD

Bio: TBD

June 6

Speaker: TBD

Title: TBD

Abstract: TBD

Bio: TBD

Archive