Languages, Systems, and Data Seminar (Fall 2024)

Time: Fridays, noon - 1:05pm (PT)
Location: The Internet / The LSD Lab (Engineering 2, Room 398)
Organizers: Lindsey Kuper, Tyler Sorensen, Gan Shen, and Reese Levine

The Languages, Systems, and Data Seminar meets weekly to discuss interesting topics in the areas of programming languages, systems, databases, formal methods, security, software engineering, verification, architecture, and beyond. Our goal is to encourage interactions and discussions between students, researchers, and faculty with interests in these areas. The seminar is open to everyone interested. Participating UCSC students should register for the 2-credit course CSE 280O (let the organizers know if you’re an undergrad and need a permission code).

For fall 2024, we will continue to host the LSD Seminar in a hybrid fashion. Anyone can attend on Zoom, and local folks can gather in person in the lab. Speakers can join either in person or on Zoom, whichever is convenient.

Talks will be advertised on the ucsc-lsd-seminar-announce (for anyone) and lsd-group (for UCSC-affiliated people) mailing lists.

Date	Speaker	Title
Oct 4	Jinsheng Ba	Testing Database Engines via Query Plans
Oct 11	Jonathan Castello	Inductive diagrams for causal reasoning
Oct 18	Bastian Köpcke	Descend: A Safe Imperative GPU Programming Language
Oct 25	Haofan Zheng	Decentagram: Highly-Available Decentralized Publish/Subscribe Systems
Nov 1	Justin Lubin	Programming By Navigation
Nov 8	Bhakti Shah	Proof Visualization for Graphical Structures
Nov 15	Federico Mora	Automated Reasoning About Distributed Systems
Nov 22	Haining Tong	Towards Unified Analysis of GPU Consistency
Dec 6	Yanan Guo	Behind the Pixels: Unveiling GPU Hardware and Software Security Flaws

Oct 4

Speaker: Jinsheng Ba

Title: Testing Database Engines via Query Plans

Abstract: Database Management Systems (DBMSs) are fundamental software systems that store, maintain, and retrieve data. They are used in almost every personal computer, mobile device, and server. Therefore, it is important to find bugs before they incur severe consequences. Automatic testing is an efficient and effective technique to find crash bugs, which terminate DBMSs, but is struggling to detect logic bugs and performance issues. Logic bugs refer to incorrect results, while performance issues refer to unexpected slow performance. Unlike crash bugs, both categories of bugs do not terminate DBMSs and are hard to observe by existing automatic testing methods. Triggering them requires valid test cases, which are also challenging to generate automatically. In my PhD thesis, I advance automatic testing to efficiently find logic bugs and performance issues in DBMSs. My approaches are united by the idea of leveraging query plans, which are internal representations of how a DBMS executes a query, for automatically testing DBMSs. I put forward the following thesis statement: Query plans allow efficient and effective testing of DBMSs by providing internal execution information. I propose four research works to utilize query plans for testing. First, to detect performance issues, I propose Cardinality Estimation Restriction Testing (CERT), which inspects estimated cardinalities in query plans without measuring execution time. Second, to identify logic bugs, I propose Differential Query Plans (DQP), which inspects the result consistency of multiple query plans of the same query. Third, to generate diverse test cases for exploring target systems thoroughly, I propose Query Plan Guidance (QPG) for guiding the test case generation process towards diverse query plans. Last, observing that query plans cannot be conveniently used as they are exposed in various DBMS-specific representations, I propose a Unified query plan representation (Uplan) based on an empirical study aiming to reduce the effort of building applications based on query plans. Since most DBMSs—including commercial ones—expose query plans to the user, I consider CERT, DQP, QPG, and Uplan generally applicable, black-box approaches for finding logic bugs, performance issues, and building applications on query plans. These methods are effective as they found more than 100 unique and previously unknown bugs in several widely used DBMSs. I view these results as a step towards more reliable DBMSs, and expect this statement of utilizing query plans for testing can be widely adopted to tackle more problems, such as test suite evaluation, debugging deployed DBMSs, and optimization checking.

Bio: Jinsheng Ba is a postdoctoral researcher in the Advanced Software Technologies (AST) Lab at ETH Zurich, mentored by Prof. Zhendong Su. He received his Ph.D. degree in Computer Science from National University of Singapore (NUS) in 2024, advised by Dr. Manuel Rigger. His research interests include software testing, security, and software engineering. His work has been recognized with Distinguished Paper Award twice at ASE 2022 and ICSE 2023. He is also a recipient of NUS Dean’s Graduate Research Excellence Award.

Oct 11

Speaker: Jonathan Castello

Title: Inductive diagrams for causal reasoning

Abstract: Causality diagrams are used throughout academia and industry to informally reason about the possible executions of distributed and concurrent systems. Their formal counterparts, what we’ll call Lamport executions, have been used since at least Lamport’s 1978 introduction of logical clocks. However, Lamport executions are not quite as visceral a model for humans as causality diagrams. This mismatch is exacerbated in modern proof assistants, which are geared toward compositional reasoning over inductive data. In this talk, we’ll present a new formal representation of executions that is both inductively-structured and more similar to the informal diagrams we started with.

Bio: Jonathan is a third-year Ph.D. student at UC Santa Cruz. His research focuses on treating concurrency as something to be started from, rather than something to be introduced to an existing system. This perspective leads to interests in distributed systems, network protocols, and low-level hardware, among other threads(!). In his spare time, he nurtures a continued fascination with all things space.

Oct 18

Speaker: Bastian Köpcke

Title: Descend: A Safe Imperative GPU Programming Language

Abstract: Programming massively parallel hardware such as Graphics Processing Units (GPU) is challenging to get right. Programmers must correctly and efficiently coordinate thousands of threads and their accesses to various shared memory spaces. Existing mainstream GPU programming languages, such as CUDA and OpenCL, are based on C/C++ inheriting their fundamentally unsafe ways to access memory via raw pointers. This facilitates easy to make, but hard to detect bugs, such as data races and deadlocks.

In this talk, I will present Descend, our approach to a safe GPU programming language. In contrast to prior safe high-level GPU programming approaches, Descend is an imperative low-level GPU programming language in the spirit of Rust, enforcing safe CPU and GPU memory management in the type system by tracking Ownership and Lifetimes. Descend introduces a new holistic GPU programming model where computations are hierarchically scheduled over the GPU’s execution resources: grid, blocks, warps, and threads. Descend’s extended Borrow checking ensures that execution resources safely access memory regions without data races. For this, we introduced views describing safe parallel access patterns of memory regions, as well as atomic variables. For memory accesses that cannot be checked by our type system, users can annotate limited code sections as unsafe.

I will discuss the memory safety guarantees offered by Descend at the hands of examples, take a brief look at the formal type system, and show an evaluation of the current implementation using multiple benchmarks, demonstrating that the approach that we take with Descend is capable of expressing real-world GPU programs showing competitive performance compared to manually written CUDA programs that are lacking Descend’s safety guarantees.

This talk will be based on work that appeared at PLDI 2024: https://dl.acm.org/doi/10.1145/3656411

Bio: Bastian is a Computer Science PhD student in the Parallel and Distributed Systems research group at the University of Münster, working closely with Michel Steuwer’s programming language and compiler group at TU Berlin. He is interested in programming languages, verification and compiler techniques to enable safer and more productive programming of parallel hardware.

Oct 25

Speaker: Haofan Zheng

Title: Decentagram: Highly-Available Decentralized Publish/Subscribe Systems

Abstract: In this talk, we will present Decentagram, a decentralized framework for data dissemination using the publish/subscribe messaging model. Decentagram uses blockchain smart contracts to authenticate events that will be published using digital signatures or self-attestation certificates from code running in trusted execution environments (TEEs), both of which are verified on-chain. This approach permits any host with valid credentials to publish verified updates, increasing decentralization and availability of the system as a whole by simplifying compensation and incentivization, even for untrusted hosts running TEEs. Decentagram also supports on-chain subscribers where third-party contracts receive events immediately: within the same transaction as the published event. The same event will also be delivered to off-chain subscribing applications through an off-chain event broker. We provide an open-source implementation of Decentagram, and evaluate the gas cost of its on-chain components and the end-to-end latency of its off-chain component.

This presentation is based on the work by Haofan Zheng, Tuan Tran, Roy Shadmon, and Owen Arden, and was published at DSN 2024: https://doi.org/10.1109/DSN58291.2024.00037

Bio: Haofan is a Computer Science Ph.D. candidate at UC Santa Cruz, conducting research in the Decent Lab and the Languages, Systems, and Data (LSD) Lab. His work focuses on Trusted Execution Environments (TEEs) and blockchain technologies, exploring the intersection of these areas to enhance security in distributed systems. Haofan’s research aims to develop a framework for building secure distributed applications that ensures confidentiality, integrity, and availability by leveraging secure enclaves and blockchain.

Nov 1

Speaker: Justin Lubin

Title: Programming By Navigation

Abstract: Most commonly, a program synthesizer works by (1) requiring a specification from the user and (2) handing back a program that satisfies the specification, if one exists. Sometimes, this program is the smallest such program. Sometimes, it’s chosen via a user-defined cost function. And, sometimes, it’s just the one that came out of the synthesizer first!

But what should we do if we want to hand control back to the user and let the user pick from among all the valid solutions—what we call the Particular Program Task? Can we solve it without just returning a stream of all valid programs? Can we solve it even if there are infinitely many programs to consider?

To do so, we introduce the Programming By Navigation paradigm, which is based on a novel program synthesis problem that roughly states: Given a work-in-progress program, return all (and only) the valid next steps. We solve the Programming By Navigation synthesis problem with a new algorithm that turns information from a type inhabitation oracle (in the style of classical logic) into a concrete program (in the style of constructive logic). By structuring our algorithm as such, we can use an off-the-shelf Datalog engine as our type inhabitation oracle and reap the wealth of research and engineering efforts that have gone into modern Datalog engines for free. In the process, classic program synthesis optimizations such as observational equivalence pruning and memoization fall out naturally from existing Datalog optimizations. Based on preliminary results, our new algorithm solves the Particular Program Task faster than baselines.

More broadly, I’ll talk about our deep, ongoing collaboration with experimental biologists that led to the design of Programming By Navigation.

Bio: Justin is a PhD candidate in computer science advised by Sarah E. Chasins at UC Berkeley. His main research interest is co-designing programming systems with domain experts to empower them to write the code they need with autonomy. As part of this process, he collaborates closely with the Nuñez Lab at UC Berkeley, a group of experimental biologists who blend CRISPR-based genome and epigenome editing, functional genomics, cell biology, and biochemistry to understand the regulatory principles of the human genome.

Nov 8

Speaker: Bhakti Shah

Title: Proof Visualization for Graphical Structures

Abstract: Reasoning about graphical languages in a proof assistant can be hard. Canonical diagrammatic representations are optimized for readability, and may abstract away details irrelevant to a pen-and-paper proof, but these details are often important to a proof assistant. We develop a methodology and library for working with graphical languages associated with classes of categories, and equip it with an automated visualizer integrated with the Coq proof assistant, enabling diagrammatic reasoning. We instantiate this with the ZX-calculus, a language for quantum computation, as an example of specialized diagrammatic reasoning.

Bio: Bhakti Shah is a PhD student at the University of St. Andrews, advised by Edwin Brady, working on the interoperability of proof systems. She was previously at the University of Chicago, advised by Robert Rand, working on reasoning about diagrammatic languages in proof assistants. Her research interests broadly span the usability of interactive proof assistants.

Nov 15

Speaker: Federico Mora

Title: Automated Reasoning About Distributed Systems

Abstract: Designing and implementing distributed systems is still an enormous task for software engineers. Much of this challenge stems from the fact that bugs can arise from complex combinations of machine failures and message interleavings that are difficult for humans to reason about manually. As distributed systems become increasingly critical infrastructure, engineers will need more and more computational support to correctly build and deploy them.

In this talk, I will present three automated reasoning techniques that can help software engineers build and deploy correct distributed systems. First, a domain-specific verification engine for message-passing distributed systems. Second, a decision procedure for a relevant fragment of logic. Third, a new approach for semi-automatically modelling distributed systems in a formal language.

Bio: Federico Mora is a PhD candidate at the University of California, Berkeley, where he is advised by Sanjit A. Seshia. Federico’s research focuses on designing and implementing languages for automated reasoning. His studies have been supported by a Qualcomm Innovation Fellowship and three internships at Amazon Web Services, where he has applied many of the ideas described in his thesis.

Nov 22

Speaker: Haining Tong

Title: Towards Unified Analysis of GPU Consistency

Abstract: After more than 30 years of research, there is a solid understanding of the consistency guarantees given by CPU systems. Unfortunately, the same is not yet true for GPUs. The growing popularity of general purpose GPU programming has been a call for action which industry players like NVIDIA and Khronos have answered by formalizing their PTX and Vulkan consistency models. These models give precise answers to questions about program’s correctness. However, interpreting them still requires a level of expertise that escapes most developers, and the current tool support is insufficient. To remedy this, we translated and integrated the PTX and Vulkan models into the Dartagnan verification tool. This makes Dartagnan the first analysis tool for multiple GPU consistency models that can analyze real GPU code. During the validation of the translated models, we discovered two bugs in the original PTX and Vulkan consistency models.

Bio: Haining is a Computer Science Ph.D. student advised by Keijo Heljanko at the University of Helsinki. His work focuses on using formal verification techniques to analyze GPU memory consistency models.

Dec 6

Speaker: Yanan Guo

Title: Behind the Pixels: Unveiling GPU Hardware and Software Security Flaws

Abstract: Today’s computing systems face significant security challenges. While security researchers have extensively studied CPU vulnerabilities, GPUs–an increasingly essential component of modern computing systems–have received much less attention. In this talk, I will present two of my recent studies that help fill in this blank. In the first part, I will demonstrate how hardware designs in modern GPUs can enable practical and robust cache side channels. In the second part, I will discuss our findings on GPU memory management systems and their implications for GPU memory safety. Finally, I will briefly introduce countermeasures to these vulnerabilities and outline directions for future research in GPU security.

Bio: Yanan Guo joins us as an Assistant Professor. Her research interests lie in computer architecture and cybersecurity, with a goal of building secure, high-performance computing systems. Specifically, she studies microarchitectural side channels, memory exploitation, machine learning security, as well as GPU systems and architectures. She has multiple CPU/GPU side-channel attacks. Her study on prefetch-based side channels was nominated for the Top Picks in Hardware and Embedded Security 2023.