Languages, Systems, and Data Seminar (Fall 2022)

Time: Fridays, noon - 1:05pm (PT)
Location: The Internet / The LSD Lab (Engineering 2, Room 398)
Organizers: Lindsey Kuper and Tyler Sorensen

The Languages, Systems, and Data Seminar meets weekly to discuss interesting topics in the areas of programming languages, systems, databases, formal methods, security, software engineering, verification, architecture, and beyond. Our goal is to encourage interactions and discussions between students, researchers, and faculty with interests in these areas. The seminar is open to everyone interested. Participating UCSC students should register for the 2-credit course CSE 280O (let the organizers know if you’re an undergrad and need a permission code).

For fall 2022, we will continue to host the LSD Seminar in a hybrid fashion. Anyone can attend on Zoom, and local folks can gather in person in the lab. Speakers can join either in person or on Zoom, whichever is convenient.

Talks will be advertised on the ucsc-lsd-seminar-announce (for anyone) and lsd-group (for UCSC-affiliated people) mailing lists.

Date	Speaker	Title
Sept. 23	None	Lab social event and introductions
Sept. 30	Andrew Quinn	Debugging the OmniTable Way
Oct. 7	Liting Hu	Stream Processing Systems for Emerging Trends
Oct. 14	Gan Shen	Modeling and Testing for MicroBus
Oct. 21	Tania Lorido Botran	Powering data centers the smart way
Oct. 28	Priyanka Mondal	Applying consensus and replication securely with FLAQR
Nov. 4	Yahui Song	Automated Temporal Verification with Extended Regular Expressions
Nov. 11	No meeting (Veterans Day)
Nov. 18	Abel Nieto	Modular Verification of Op-Based CRDTs in Separation Logic
Nov. 25	No meeting (Thanksgiving)
Dec. 2	Ishita Chaturvedi	GhOST: a GPU Out-of-Order Scheduling Technique

Sept. 30

Speaker: Andrew Quinn

Title: Debugging the OmniTable way

Abstract: Debugging is time-consuming, accounting for roughly 50% of a developer’s time. To identify the cause of a failure, a developer usually tracks the state of their program as it executes on a failing input. Unfortunately, most debugging tools make it difficult for a developer to specify the program state that they wish to observe and computationally expensive to observe execution state. Moreover, existing work to improve our debugging tools often restrict the state that a developer can track by either exposing incomplete execution state or requiring manual instrumentation.

In this talk, I will describe our new debugging approach based on the OmniTable abstraction, which captures all state reached during an execution as a large queryable data table. Our query model built around the OmniTable abstraction supports SQL to simplify debugging without restricting the state that a developer can observe: I’ll show that OmniTable debugging queries are more succinct than equivalent logic specified using existing tools.

Then, I’ll describe how our prototype, SteamDrill, accelerates debugging by taking advantage of how an OmniTable decouples debugging logic from the original execution. Namely, the system employs lazy materialization: it uses deterministic record/replay to store the execution associated with each OmniTable and resolves queries by inspecting replay executions. It employs a novel multi-replay strategy that partitions query resolution across multiple replays and a parallel resolution strategy that simultaneously observes state at multiple points-in-time. I’ll show that SteamDrill queries are an order-of-magnitude faster than existing debugging tools.

Oct. 7

Speaker: Liting Hu

Title: Stream Processing Systems for Emerging Trends

Abstract: Stream processing is proposed and popularized as a “technology like Hadoop but can give you results faster”, which lets users query a continuous data stream and quickly get results within a very short time period from the time of receiving the data. For that reason, stream processing technology has become a critical building block of many applications, such as making business decisions from marketing streams, identifying spam campaigns from social network streams, predicting tornados and storms from radar streams, and analyzing genomes in different labs and countries to track the sources of a potential epidemic. However, state-of-art solutions have dominantly centered around stateless stream processing, leaving another urgent trend—stateful stream processing—much less explored. A driving need is that the future stream applications need to store and update state along with their processing, and process live data streams in a timely fashion from massive and geo-distributed data sets. In this talk, I will present a next-generation geo-distributed scalable stateful stream processing system. (1) At the architecture layer, I will introduce a decentralized “many masters/many workers” architecture that improves the scalability of stream processing systems. (2) At the mechanism layer, I will present a fragment-based parallel recovery mechanism that recovers large distributed states by leveraging erasure codes. Finally, I will outline the emerging trends for developing stream processing systems.

Bio: Liting Hu received her Ph.D. degree in computer science from Georgia Institute of Technology in 2016. She served as assistant professor of computer science at Virginia Tech in the 2021–22 academic year. Before that, she served as assistant professor in the school of computing and information sciences at Florida International University from 2017 to 2021. Liting conducts research on experimental computer systems, including stream processing systems, cloud and edge computing, distributed systems, and operating systems virtualization. Liting has received an NSF CAREER Award, an NSF SPX Award, an NSF OAC Award, a Meta Faculty Research Award, and a Cyber Florida Seed Award.

Oct. 14

Speaker: Gan Shen

Title: Modeling and Testing for MicroBus

Abstract: MicroBus is a newly developed replication service at AWS that provides consistency, durability, and fault tolerance guarantees. Both modeling and testing have been employed to ensure its correctness and have shown some success. Nevertheless, they all have shortcomings, with modeling not connected with the implementation and testing only covering limited scenarios. To solve this, we propose model-based testing as a hybrid approach that combines the best of the two worlds.

In this talk, I will first give a gentle introduction to the MicroBus protocol/implementation and the pros and cons of modeling and testing. Then I will explain the model-based testing approach and the benefits it brings on to the table.

(This project was done during my internship in the Automated Reasoning Group at AWS summer 2022)

Bio: Gan Shen is a 3rd-year PhD student at UC Santa Cruz, working with Prof. Lindsey Kuper in the LSD lab. His research area is programming language theory, in particular, how to design new programming tools/abstractions that facilitate software development.

Oct. 21

Speaker: Tania Lorido Botran

Title: Powering data centers the smart way

Abstract: Infrastructure powering current workloads is very heterogeneous: central data centers with various hardware accelerators, edge data centers with constraint resources and a variety of client devices with different capabilities. Orchestrating all these players can be a challenge and this talk will touch on how ML (or DL) can actually help. First, we will deep-dive into two projects: we will explore the use of Reinforcement learning for optimal container placement in edge data centers, and also visit an unsupervised approach for anomaly detection in containers, in particular, the noisy neighbor effect. After that, we will briefly learn about some on-going projects around autoscaling for ML pipelines, performance degradation detection in serverless functions, bottleneck detection with GANs and high-level sustainability scores for latency sensitive/critical tasks.

Bio: Dr. Tania Lorido-Botran is a Research Scientist at Roblox. Prior to that, she worked at Microsoft and the Pacific Northwest National Laboratory. During her PhD, she had the opportunity to spend one year at Rice University and also did two internships at VMware and HP Labs. Dr. Lorido Botran received her PhD from the University of Deusto in Spain with a Cum Laude distinction, and her master’s degree in Distributed systems from University of the Basque Country with a highest marks distinction. Her current research interests span across ML for systems, data center sustainability and fault-tolerance.

Oct. 28

Speaker: Priyanka Mondal

Title: Applying consensus and replication securely with FLAQR

Abstract: Availability is crucial to the security of distributed systems, but guaranteeing availability is hard, especially when participants in the system may act maliciously. Quorum replication protocols provide both integrity and availability: data and computation is replicated at multiple independent hosts, and a quorum of these hosts must agree on the output of all operations applied to the data. Unfortunately, these protocols have high overhead and can be difficult to calibrate for a specific application’s needs. Ideally, developers could use highlevel abstractions for consensus and replication to write fault tolerant code that is secure by construction. This paper presents Flow-Limited Authorization for Quorum Replication (FLAQR), a core calculus for building distributed applications with heterogeneous quorum replication protocols while enforcing end-to-end information security. Our type system ensures that well-typed FLAQR programs cannot fail (experience an unrecoverable error) in ways that violate their typelevel specifications. We present noninterference theorems that characterize FLAQR’s confidentiality, integrity, and availability in the presence of consensus, replication, and failures, as well as a liveness theorem for the class of majority quorum protocols under a bounded number of faults.

Bio: Priyanka Mondal is a PhD student at the University of California Santa Cruz, where she is supervised by Professor Owen Arden and Professor Ioannis Demertzis. Her research areas include Applied Cryptography, Programming Languages and Security in Distributed Systems. Prior to joining UCSC she worked as a software engineer at Citrix R&D Bangalore.

Nov. 4

Speaker: Yahui Song

Title: Automated Temporal Verification with Extended Regular Expressions

Abstract: Existing approaches to temporal verification have either sacrificed modularity in favor of achieving automation or vice-versa. To exploit the best of both worlds, we present a new framework to ensure temporal properties via Hoare-style verifiers and term rewriting systems (TRSs).

The leading technique of temporal verification is automata-based model checking, which has possible insufficiencies: (i) it requires a manual modeling stage and needs to be bounded when encountering non-terminating traces; (ii) to conveniently deploy existing inclusion-checkers for automata, the expressiveness power is limited by the finite-state automata; and (iii) there is always a gap between the verified logic and the actual implementation from the program.

To tackle these issues, we propose a framework that conducts local temporal verification, which leads to a modular and compositional verification strategy, where temporal reasoning can be combined to reason about the overall program. Meanwhile, we propose various effect logics to be the temporal specification languages, which are essentially extended regular expressions (REs), pushing the expressiveness boundary of the most deployed linear temporal logic (LTL). Furthermore, the proposed framework devises purely algebraic TRS to check the inclusions for the novel logics, avoiding the complex translation into automata.

We demonstrate the applicability of the proposed framework and various REs-based temporal logics in different domains; presents the corresponding prototype systems, case studies, experimental results, and necessary proofs.

Bio: Yahui Song is a 5th year PhD Student at the National University of Singapore (NUS), supervised by Associate Professor Chin Wei Ngan. Her interests are automated program verification and programming language design. Her current works are dedicated to more expressive and efficient temporal verification.

Nov. 18

Speaker: Abel Nieto

Title: Modular Verification of Op-Based CRDTs in Separation Logic

Abstract: Commutative Replicated Data Types (CRDTs) are a family of distributed data structures where all operations are designed to commute, so that replica states eventually converge. Additionally, CRDTs require that operations be propagated between replicas in causal order. This talk presents a framework for verifying safety properties of OCaml CRDT implementations using separation logic. The framework consists of two libraries. One implements a Reliable Causal Broadcast (RCB) protocol so that replicas can exchange messages in causal order. A second OpLib library then uses RCB to export a CRDT builder interface that simplifies the creation (and correctness proofs) of new CRDTs. OpLib allows clients to implement new CRDTs as purely-functional data structures, without having to reason about network operations, concurrency control and mutable state, and without having to re-implement causal broadcast each time. Using OpLib, we have implemented 12 sample CRDTs from the literature, including multiple versions of replicated registers and sets, as well as two CRDT combinators for products and maps. Our proofs are conducted in the Aneris distributed separation logic.

Bio: I’m a 3rd (and final) year PhD student at Aarhus University, working with Lars Birkedal on verification of distributed systems using the Aneris distributed separation logic. In particular, my work focuses on reasoning about systems that are highly-available but weakly consistent.

Dec. 2

Speaker: Ishita Chaturvedi

Title: GhOST: a GPU Out-of-Order Scheduling Technique

Abstract: Despite supporting fast context switching among many available threads to hide latency, GPU applications still suffer from stalls. To reduce these stalls, researchers have proposed several out-of-order (OoO) execution methods for GPUs. While these techniques improve performance, they use complex hardware to support register renaming, memory instruction reordering, or make invasive changes to the ISA to enable the compiler to provide hints to the hardware. In this talk, I will present GhOST, the first cost-effective OoO technique for GPUs. In an in-order GPU, a warp scheduler selects a single instruction to issue by considering only the oldest instruction from each context. GhOST introduces OoO execution by having the warp scheduler instead consider the instruction from each context’s instruction buffer deemed most likely to issue, while preserving sequential semantics. The architecture of GhOST allows it to elide register renaming and reordering memory instructions in the instruction buffer, without sacrificing performance. Thus, GhOST executes memory instructions sequentially, removing any need for dependence analysis. Since GhOST is a hardware only technique, which makes no modifications to the ISA, it can run binaries directly compiled using CUDA for all benchmarks. On the Nvidia Titan V (T) and RTX 2060 Super ® GPUs, GhOST delivers 20.0% (T) and 21.8% ® geomean speedup across 30 benchmarks, with a 0.05% increase in area. As future work, we will look at using the compiler to reorder instructions to further boost the OoO performance of GhOST.~

Bio: Ishita Chaturvedi is a third-year Ph.D. student at Princeton University, working with Prof. David August in the Liberty Research group. Her research interests lie in computer architecture and high-performance computing. More specifically, she is working on efficient out-of-order execution on GPUs.