Languages, Systems, and Data Seminar (Fall 2021)

Time: Fridays, noon - 1:15pm (PT)
Location: The Internet / The LSD Lab (Engineering 2, Room 398)
Organizers: Lindsey Kuper and Tyler Sorensen

The Languages, Systems, and Data Seminar meets weekly to discuss interesting topics in the areas of programming languages, systems, databases, formal methods, security, software engineering, verification, architecture, and beyond. Our goal is to encourage interactions and discussions between students, researchers, and faculty with interests in these areas. The seminar is open to everyone interested. Participating UCSC students should register for the 2-credit course CSE 280O (let the organizers know if you’re an undergrad and need a permission code).

For fall 2021, we will continue to host the LSD Seminar on Zoom, but local folks can gather in person in the lab to tune into the Zoom talks together.

Talks will be advertised on the ucsc-lsd-seminar-announce (for anyone) and lsd-group (for UCSC-affiliated people) mailing lists.

Date Speaker Title
Sept. 24 None Welcome!
Oct. 1 Maria Schett Cheaper (& correct) blockchain protocols and programs
Oct. 8 Tanvir Ahmed Khan Rescuing Data Center Processors
Oct. 15 Elizabeth Dinella Data Driven Program Merge
Oct. 22 Vaishnavi Sundararajan Better Safe than Sorry: Symbolic Verification for Security Protocols
Oct. 29 Jenna Wise TBD
Nov. 5 Thomas Koehler TBD
Nov. 12 Giulia Guidi TBD
Nov. 19 Hayley LeBlanc TBD
Dec. 3 Sydney Gibson TBD

Sept. 24


Oct. 1

Speaker: Maria Schett

Title: Cheaper (& correct) blockchain protocols and programs

Abstract: We will explore how we can make it cheaper to run blockchain protocols and programs—while guaranteeing correctness!

For cheaper protocols, we move from a blockchain to a blockgraph. Such a blockgraph, which we show to be a reliable point-to-point link, is built together by a set of servers. Thereby, the servers embed many parallel runs of a deterministic protocol into it. And because the protocol is deterministic, every server can locally replay the protocol—and no more messages need to be sent [1]!

For cheaper programs, we super-optimize bytecode of the Ethereum Virtual Machine, which runs on many servers to execute programs on the blockchain. Now, to avoid the halting problem, every executed bytecode costs money. So we have a clear optimization target—money—and we use an automated theorem prover to synthesize cheaper, observationally equivalent, programs [2].

Bio: Maria A Schett is currently finishing her PhD on “Cost Reduction With Guarantees: Formal Reasoning Applied To Blockchain Technologies” at University College London. She loves living in London and talking about blockchains, distributed systems, and compilers. She is conversational in term rewriting and qualitative research (from a previous life) and in Esperanto.

[1] “Embedding a Deterministic BFT Protocol in a Block DAG”, Maria A Schett and George Danezis, Proc. of the 2021 ACM Symposium on Principles of Distributed Computing (PODC) 2021

[2] “Synthesis of Super-Optimized Smart Contracts using Max-SMT”, Elvira Albert, Pablo Gordillo, Albert Rubio and Maria A Schett, Proc. of 32nd International Conference on Computer-Aided Verification (CAV) 2020

Oct. 8

Speaker: Tanvir Ahmed Khan

Title: Rescuing Data Center Processors

Abstract: To serve billions of users around the world, modern web applications that run across data centers access huge datasets and perform complex application logic. As a result, data center applications face two major challenges: (1) poor data access behavior and (2) poor instruction access behavior. In my research, I demonstrate that novel hardware-software codesign effectively solves both challenges. Specifically, I observe that both data and instruction accesses in data center applications follow a deeply repetitive pattern that can be efficiently optimized by profiling the application’s program flow behavior. In this talk, I will first present an overview of my techniques to improve data and instruction accesses. I will then describe two of my techniques in detail, showing how these techniques outperform prior proposals from Google.

Bio: Tanvir Ahmed Khan is a fifth-year Ph.D. candidate at the University of Michigan. His research interests lie at the intersection of computer architecture, compilers, and operating systems. He is interested in designing techniques at the boundary of hardware and software that enable software to better leverage hardware resources. He was a Facebook Fellowship (2020) finalist. His research on data center applications’ performance optimizations has appeared in top computer architecture and systems venues like ISCA, MICRO, PLDI, and OSDI. His work is being used by Intel and ARM to design the next-generation CPU architectures. Tanvir expects to graduate soon and after graduation, he is interested in tenure-track faculty positions.

Oct. 15

Speaker: Elizabeth Dinella

Title: Data Driven Program Merge

Abstract: In collaborative software development, program merging is the mechanism to integrate changes from multiple programmers. The most widespread merge algorithm, employed in modern version control systems, is an unstructured merge. Unstructured merge treats input programs as a sequence of characters and reports a conflict when changes interfere textually. On the other hand, structured approaches leverage knowledge of the underlying language to perform “intelligent” merges over the program’s AST. These approaches have shown significant advancements in automatic merging for Java. However, these methods have not found their way into modern version control systems as they are typically language dependent and do not generalize to dynamic languages such as JavaScript.

In a 1991 seminal paper on program merge [1], Westfechtel voiced: “A tool is urgently needed which automates the process of merging as much as possible.” Thirty years later, and there have not been significant advances in the prevailing merge algorithm - unstructured merge.

In this work, we take a fresh data-driven look at the problem of merge resolution. Guided by a multilingual dataset of over 200,000 merges, we explore deep learning techniques to resolve unstructured merge conflicts. Our key innovation is an edit-aware embedding of merge inputs.

Our resulting model achieves 63–68% accuracy of merge resolution synthesis, yielding nearly a 3x performance improvement over existing structured merge tools. Finally, we demonstrate that our approach is sufficiently flexible to work with source code files in Java, JavaScript, TypeScript, and C# programming languages, and can generalize zero-shot to unseen languages.

[1] Bernhard Westfechtel. 1991. Structure-oriented merging of revisions of software documents. In Proceedings of the 3rd international workshop on Software configuration management (SCM ‘91).

Bio: Elizabeth Dinella is a fourth-year PhD student at the University of Pennsylvania. Her research interests can be broadly described as “data-driven program reasoning.” Most recently, Elizabeth has worked on data-driven bug finding (Hoppity), program merge, and test oracle inference. She is an admirer of coffee and her chow chow puppy Cinnabon.

Oct. 22

Speaker: Vaishnavi Sundararajan

Title: Better Safe than Sorry: Symbolic Verification for Security Protocols

Abstract: A large part of our lives is spent online, creating, sharing, and consuming data in various forms. We use a profusion of security protocols to handle this more-often-than-not sensitive data, and connect to the internet and to the world at large. How secure are these communications? How private? These are fundamental questions to which we demand definitive answers. Even with highly sophisticated methods of testing, one cannot get unconditional answers, and this is where formal verification steps in. One can use logic or automata or similar formal methods to conclusively tell us if there are security flaws in these protocols. In this talk, we will provide a quick and dirty introduction to the world of symbolic verification, with a focus on security protocols. We will then look at a few ways in which the protocols of today’s day and age are modelled and verified.

Bio: Vaishnavi Sundararajan is a postdoc at UCSC. She completed her PhD at Chennai Mathematical Institute, with a thesis on formal aspects of certification in security protocols. She is interested in the areas of logic, security, proof theory and verification. Before UCSC, she was a postdoc at IRISA, Rennes, and a Research Associate at Ericsson Research, Bengaluru.

Oct. 29

Speaker: Jenna Wise

Title: TBD

Abstract: TBD

Nov. 5

Speaker: Thomas Koehler

Title: TBD

Abstract: TBD

Nov. 12

Speaker: Giulia Guidi

Title: TBD

Abstract: TBD

Nov. 19

Speaker: Hayley LeBlanc

Title: TBD

Abstract: TBD

Dec. 3

Speaker: Sydney Gibson

Title: TBD

Abstract: TBD