Languages, Systems, and Data Seminar (Spring 2021)

Time: Fridays, noon - 1:15pm (PT)
Location: The Internet
Organizers: Lindsey Kuper and Tyler Sorensen

The Languages, Systems, and Data Seminar meets weekly to discuss interesting topics in the areas of programming languages, systems, databases, formal methods, security, software engineering, verification, architecture, and beyond. Our goal is to encourage interactions and discussions between students, researchers, and faculty with interests in these areas. The seminar is open to everyone interested. Participating students should register for the 2-credit course CSE 280O.

For spring 2021, this seminar is completely virtual and will feature a mix of internal and external speakers.

Talks will be advertised on the ucsc-lsd-seminar-announce (for anyone) and lsd-group (for UCSC-affiliated people) mailing lists.

Date	Speaker	Title
April 2	Aldrin Montana	Running the Trails of Data Management
April 9	Michael Arntzenius	PL design and programming by voice
April 16	Achilles Benetopoulos	HARP: Eliminating Software Supply-Chain Vulnerabilities via Active Learning and Regeneration of String Processing Libraries
April 23	Konstantinos Kallas	PaSh: A parallelizing shell
April 30	Eric Atkinson	Programming and Reasoning with Partial Observability
May 7	Vadim Zaliva	HELIX: From Math to Verified Code
May 14	Benjamin Valpey	Discovering the Semantics of NVIDIA’s SASS ISA
May 21	Zeeshan Lakhani	Polarized Functional Programming v0.1
May 28	Philippa Cowderoy	Information Aware Type Systems and Telescopic Constraint Trees
June 4	Ranysha Ware	Battle for Bandwidth: Evaluating Congestion Control Deployability For The Internet

April 2

Speaker: Aldrin Montana

Title: Running the Trails of Data Management

Abstract: The demise of Moore’s law and Dennard scaling have swung a pendulum back towards computational storage–disk drives equipped with modest processors and working memory. As was the case when the earliest ``active drives” were developed over three decades ago, moving computational kernels closer to where the data is stored presents an opportunity to improve data processing and retrieval performance by alleviating bottlenecks at CPUs that can no longer promise exponential increases in performance over time. We explore this opportunity in a 3-way collaboration between industry and academia with the goal of showing the benefits provided by computational storage devices for scientific analysis workloads. The analysis workloads are based on use cases for the Human Cell Atlas (HCA) and the UCSC Genomics Institute, the computational storage devices are architected and designed by Seagate, and the system that brings these together is architected by our declarative programmable storage research group at UCSC. This talk uses trail running as a fun metaphor to detail the context and directions of our collaboration that spans data management, single-cell RNA sequencing, and a storage system that leverages computational storage devices.

Bio: Aldrin is a 3rd year PhD student at UC Santa Cruz advised by Peter Alvaro, but works extensively with Carlos Maltzahn (UCSC), Jeff LeFevre (UCSC), and Philip Kufeldt (Seagate). Aldrin received his B.S. and M.S. in computer science from Cal Poly, San Luis Obispo working with Alex Dekhtyar on a collaborative, microbial source tracking project with professors and students in the biology and biochemistry departments. Before joining UCSC’s PhD program, Aldrin has worked on data management of genomic variants at a biotech company, Personalis, Inc. His research interests are primarily in data management systems and bioinformatics, but also span programming languages, storage systems, and software engineering.

April 9

Speaker: Michael Arntzenius

Title: PL design and programming by voice

Abstract: Many programmers and other computer users suffer from repetitive strain injuries (RSI) of the hand that make using a keyboard for extended periods painful. For some kinds of RSI, therapy or surgery may resolve the problem, but in recalcitrant cases sometimes the only option is to reduce or cease typing. So what does a programmer do if they can’t type?One option is to use your voice. Speech recognition technology has advanced by leaps and bounds over the past decade, but most people’s ideas about voice-driven UI are based on convenient but limited voice assistants like Siri or Alexa, or at best dictation or prose-writing tools. What does a voice interface for expert users look like? A small but growing community of programmers, many affected by RSI, have been developing and using tools that explore this question.I suffer from RSI and use a voice control tool called Talon to program, control my computer, and write my thesis; but I’m also a programming languages researcher, and so I find it natural to wonder: how might PL research intersect with voice control? In this talk, I’ll try my best to answer that question, as well as a few others:

What is voice coding like? What unique challenges does it present compared with keyboard coding?
What can we learn from this about designing voice interfaces for expert users? Specifically, how can programming languages better support voice coding?
How does a voice control system work? What does it have in common with PL implementation?

Bio: Depending on how you count, Michael Arntzenius is somewhere between a 4th- and a 9th-year PhD student. He works with Neel Krishnaswami on Datafun, an attempt to combine the bottom-up deductive query language Datalog with higher-order typed functional programming. He is interested in monotonicity as a unifying and simplifying theme in incremental, concurrent, and distributed computation; and generally in the theory, design, and implementation of programming systems.

April 16

Speaker: Achilles Benetopoulos

Title: HARP: Eliminating Software Supply-Chain Vulnerabilities via Active Learning and Regeneration of String Processing Libraries

Abstract: Software supply-chain attacks are becoming an important security concern. These attacks focus on a dependency of the target rather than the target itself and often affect the target as well as other consumers of that dependency. We propose active library learning and regeneration (ALR) techniques for inferring and reconstructing the core functionality of a black-box software dependency. Using increasingly sophisticated rounds of exploration, ALR generates inputs, provides these inputs to the dependency, and observes the dependency’s outputs and interactions to infer a model of the dependency’s behavior captured in a domain-specific language. We present HARP, an ALR system for string processing dependencies. We apply HARP to successfully regenerate string-processing libraries in JavaScript, Python, and C/C++. In the majority of cases, HARP completes the regeneration in less than a minute, remains fully compatible with the original library, and achieves performance indistinguishable from the original library. We also demonstrate that HARP can eliminate vulnerabilities associated with libraries targeted in a number of highly visible security incidents of the JavaScript ecosystem.

Bio: Achilles Benetopoulos is a backend software engineer at Plum Fintech. He received a Masters degree in Engineering from the National Technical University of Athens. He is interested in exploring better ways of building robust and performant distributed systems. Starting in the fall, he will be joining UC Santa Cruz as a graduate student advised by Peter Alvaro, to work towards this goal. When he gets too frustrated by computers, he cooks.

April 23

Speaker: Konstantinos Kallas

Title: PaSh: A parallelizing shell

Abstract: In this talk I will present PaSh, a new shell that exposes data parallelism in POSIX shell scripts. To achieve that, PaSh proposes: (i) an order-aware dataflow model that captures a fragment of the shell, (ii) a set of dataflow transformations that extract parallelism and have been proven to be correct, (iii) a lightweight framework that captures the correspondence of shell commands and order-aware dataflow nodes, and (iv) a just-in-time compilation framework that allows for effective compilation despite the dynamic nature of the shell. PaSh is open-source and you can try it out today here: https://github.com/andromeda/pash. If you want to learn more about it you can read our paper that will appear in EuroSys 2021 (next week) here: https://arxiv.org/abs/2007.09436 and if you want to learn more about the dataflow model you can read the paper that introduces it and proves the correctness of the transformations here: https://arxiv.org/abs/2012.15422.

Bio: Konstantinos Kallas is a 3rd year PhD student at the University of Pennsylvania working with Rajeev Alur. They are broadly interested in all things PL, distributed systems, and greek folk mountain music. Recently, together with Nikos Vasilakis and several other amazing people they have been working on invigorating the research on the shell. They are also working on partial order semantics for distributed stream processing and programming models for serverless. You can find more information about them here: https://angelhof.github.io/.

April 30

Speaker: Eric Atkinson

Title: Programming and Reasoning with Partial Observability

Abstract: Computer programs are increasingly being deployed in partially-observable environments. A partially-observable environment is an environment whose state is not completely visible to the program, but from which the program receives partial observations. Developers typically deal with partial observability by writing a state estimator that, given observations, attempts to deduce the hidden state of the environment. In safety-critical domains, to formally verify safety properties developers may write an environment model. The model captures the relationship between observations and hidden states and is used to prove the software correct.

In my talk, I will present a new methodology for writing and verifying programs in partially observable environments. I will present belief programming, a programming methodology where developers write an environment model that the program runtime automatically uses to perform state estimation. A belief program dynamically updates and queries a belief state that captures the possible states the environment could be in. To enable verification, I will present Epistemic Hoare Logic, which reasons about the possible belief states of a belief program the same way that classical Hoare logic reasons about the possible states of a program. I will develop these concepts by explaining a semantics and a program logic for a simple core language called BLIMP. I will also discuss a case study in which we used BLIMP to implement a verified controller for the Mars Polar Lander, and I will evaluate the performance of a C implementation of BLIMP.

Paper link: https://dl.acm.org/doi/10.1145/3428268

Bio: Eric Atkinson is a 6th year PhD student at MIT advised by Michael Carbin. His primary research interests involve using programming language tools to help developers manage both probabilistic and nondeterministic uncertainty.

May 7

Speaker: Vadim Zaliva

Title: HELIX: From Math to Verified Code

Abstract: In this talk we will presents HELIX, a code generation and formal verification system with a focus on the intersection of high-performance and high-assurance numerical computing. This allowed us to build a system that could be fine-tuned to generate efficient code for a broad set of computer architectures while providing formal guarantees of such generated code’s correctness.

The method we used for high-performance code synthesis is the algebraic transformation of vector and matrix computations into a dataflow optimized for parallel or vectorized processing on target hardware. The abstraction used to formalize and verify this technique is an operator language used with semantics-preserving term-rewriting. We use sparse vector abstraction to represent partial computations, enabling us to use algebraic reasoning to prove parallel decomposition properties.

HELIX provides a formal verification foundation for rewriting-based algebraic code synthesis optimizations, driven by an external oracle. Presently HELIX uses SPIRAL as an oracle deriving the rule application order. The SPIRAL system was developed over the years and successfully applied to generate code for various numeric algorithms. Building on its sound algebraic foundation, we generalize and extend it in the direction of non-linear operators, towards a new theory of partial computations, applying formal language theory and formal verification techniques.

HELIX is developed and proven in Coq proof assistant and demonstrated on a real-life example of verified high-performance code generation of the dynamic window safety monitor for a cyber-physical robot system.

Bio: Vadim Zaliva works on formal verifications of computer programs, presently as a Senior Research Associate at Cambridge University and previously as part of his Ph.D. at Carnegie Mellon University. His diverse academic interests in addition to formal methods range from computer security to machine learning and gesture recognition. With 20+ years of experience in the design and implementation of commercial software, he has worked through the ranks of the software industry from software engineer to CTO, CEO, and company founder.

May 14

Speaker: Benjamin Valpey

Title: Discovering the Semantics of NVIDIA’s SASS ISA

Abstract: In this work, we specify a formal semantics of NVIDIA’s SASS instruction set, the ISA that is actually executed by NVIDIA GPUs. We introduce the ROCetta approach for discovering the semantics of a target language that requires the semantics a source language, and the ability to obtain a translation from the source to the target language. Using this approach, we have discovered and verified the semantics of 421 SASS instructions. In addition, our verification effort was able to uncover 18 bugs in the CUDA compiler’s translation of single-instruction PTX programs. In some cases, bugs were only caught during verification and were missed by the testing framework, which lacked a test for inputs that trigger the bug. Furthermore, verification of 3,581 programs took 14 minutes when run in parallel, while running the full test suite against hardware took upwards of 30 minutes. The low cost of verification, along with its superior ability to detect bugs, makes it an invaluable tool to detect bugs that occur in the translation of small programs.

May 21

Speaker: Zeeshan Lakhani

Title: Polarized Functional Programming v0.1

Abstract: When designing a new, real-world type system, where would you start? Looking for examples, you may notice that a language like Typescript has a myriad of advanced typing features, though they’re oriented toward trying to capture JavaScript’s many dynamic idioms and (now) lengthy history of existing programs. Furthermore, even with these advanced features, the language is not sound, i.e. certain operations cannot be known to be type-safe at compile-time. Looking elsewhere, like in the ML family of languages, you’ll see a powerful type inference mechanism helping to free programmers from the burden of numerous, explicit type annotations. Yet, as new research-driven type extensions have been incorporated, the need for explicit annotations or pre-processing steps has increased in certain situations. As the complexity of our programs and the properties about them that we’d like to statically guarantee grow, how far can just typechecking take us while remaining sound and without us having to associate proofs with our programs?

The answer is pretty far. Lightweight verification through refinement types is catching on. Most modern languages, e.g. Rust, OCaml, Dotty/Scala, come out of the box with sum types or enumerated variants and integrations for extended static code analysis. Our answer to these questions lies in the powerful relationship between static analysis and type theory and how to treat evaluation (essentially, eager versus lazy) as first-class.

In this talk, I will present our work-in-progress research on a polarized functional programming language based on the call-by-push-value (CBPV) λ-calculus, which allows for the mixing of both call-by-name and call-by-value evaluation inside of a single program via a kind-like operator, shifting between positive and negative types. We extend this polarized calculus further with a transparent and equirecursive interpretation of mixed-inductive-and-coinductive datatypes, allowing us to leverage strong structural subtyping and datasort refinements over sums and intersections. Additionally, typechecking works bidirectionally in our system, meaning that terms can be checked to a certain type or synthesized as a certain type, which scales well for adding various expressive type features to our language.

Bio: Zeeshan is the founder of Papers We Love and PWLConf, as well as an organizer. He’s also the Director of Research and Development at BlockFi working on various data and program analysis projects. And, he’s a PhD student at Carnegie Mellon University’s School of Computer Science, studying Programming Languages under Frank Pfenning.

May 28

Speaker: Philippa Cowderoy

Title: Information Aware Type Systems and Telescopic Constraint Trees

Abstract: What does conservation of information have to do with type systems? Are type checkers specialised operating systems? Can we understand dataflow in type systems or are we doomed to drown in it?

I’ll present a highly familiar type system in pursuit of an `Information Aware’ style, using information effects to reveal data flow and help in implementing a checker. I also calculate a general, scoped, constraint-based representation of typechecking problems from the typing rules.

And then, because why not? Let’s toy with substructural systems and maybe even incremental checking!

Warning: talk may contain willfully creative use of long-standing concepts…

Bio: Philippa Cowderoy is a professional procrastinator and amateur computer scientist, having dropped out of the University of Nottingham as an undergraduate in 2007.

Somehow this still hasn’t put her off being frustrated with her tools on a increasingly deeper level. When she is bored enough, she is now mining particularly meta veins of frustration shared by others in the PL community.

In her copious free (if not high-quality) time she plays too many games and spends too little time around cats.

June 4

Speaker: Ranysha Ware

Title: Battle for Bandwidth: Evaluating Congestion Control Deployability For The Internet

Abstract: The stability of the Internet relies on congestion control algorithms (CCAs) to efficiently and fairly share limited network resources. Dozens of congestion control algorithms have been proposed in the past 30 years, however, we still lack a good way to decide whether a new algorithm is reasonable to deploy on the Internet. As we will show in this talk, new algorithms can be dramatically unfair to widely-deployed legacy algorithms. For example, we prove that Google’s new CCA, BBR, will always consume a fixed fraction of the link when competing with any number of Cubic flows.

Given the threat of poor performance due to competing heterogeneous CCAs, in this talk we discuss considerations for deploying new CCAs on the Internet. While past efforts have focused on achieving fairness or friendliness between new algorithms and legacy algorithms, we instead advocate for an approach centered on quantifying and limiting harm caused by the new algorithm on the status quo. We argue that a harm-based approach is more practical, more future proof, and handles a wider range of quality metrics than traditional notions of fairness and friendliness.

Bio: Ranysha Ware is a PhD student in Carnegie Mellon University’s Computer Science Department, co-advised by Professor Justine Sherry and Professor Srinivasan Seshan. Her bailiwick is computer networking. Her current research focuses on challenges arising from the deployment of new transport protocols and congestion control algorithms in the Internet. Ranysha earned her M.S. (2015) from UMass Amherst and B.S (2013) from SUNY New Paltz. She is a recipient of the IRTF Applied Networking Research Prize, Facebook Emerging Scholar Award, and National GEM Consortium Fellowship.