Time: Fridays, noon - 1:05pm (PT)
Location: The Internet / The LSD Lab (Engineering 2, Room 398)
Organizers: Lindsey Kuper and Tyler Sorensen
The Languages, Systems, and Data Seminar meets weekly to discuss interesting topics in the areas of programming languages, systems, databases, formal methods, security, software engineering, verification, architecture, and beyond. Our goal is to encourage interactions and discussions between students, researchers, and faculty with interests in these areas. The seminar is open to everyone interested. Participating UCSC students should register for the 2-credit course CSE 280O (let the organizers know if you’re an undergrad and need a permission code).
For fall 2023, we will continue to host the LSD Seminar in a hybrid fashion. Anyone can attend on Zoom, and local folks can gather in person in the lab. Speakers can join either in person or on Zoom, whichever is convenient.
|Towards fine-grained compartmentalization of operating system kernels
|Compositional & Scalable Programming in Granite
|Advancing Property-Based Testing in Theory and Practice
|VyZX: Formal Verification of a Graphical Language
|Multiparty Session Type Projection and Subtyping with Automata
|GrayC: Greybox Fuzzing of Compilers and Analysers for C
|Suha S. Hussain
|MLFiles: Using Input-Handling Bugs to Inject Backdoors Into Machine Learning Pipelines
|Systems security in practice: threat modelling at Trail of Bits
|TULIP🌷 : Transcompilation Undergoes LLVM-IR for Interactive Parallelization
Speaker: Vikram Narayanan
Title: Towards fine-grained compartmentalization of operating system kernels
Abstract: Commodity operating systems execute core kernel subsystems in a single address space along with hundreds of dynamically loaded extensions and device drivers. Lack of isolation within the kernel implies that a vulnerability in any of the kernel subsystems or device drivers opens a way to mount a successful attack on the entire kernel. Historically, isolation within the kernel remained prohibitive due to the high cost of hardware isolation primitives. Recent CPUs, however, bring a new set of mechanisms. Extended page-table (EPT) switching with VM functions and memory protection keys (MPKs) provide memory isolation and invocations across boundaries of protection domains with overheads comparable to system calls.
I will talk about how we developed a collection of techniques for lightweight isolation of privileged kernel code(LXDs and LVDs) that demonstrates near-native performance of two (network, block) high-performance device drivers. I will present KSplit, a framework for isolating unmodified device drivers in a modern, full featured kernel that performs automated analyses on the unmodified source code of the kernel and the driver to identify shared state and synchronization requirements for efficient isolation.
Bio: Vikram recently graduated with a PhD from the University of Utah. His research interests are building secure operating system kernels, and heterogeneous systems.
Speaker: Nick Rioux
Title: Compositional & Scalable Programming in Granite
Abstract: Granite is a strict and untyped programming language under development with the goal of enhancing functional programming along a number of axes. It aims to provide a basis for general-purpose programming that is:
- Compositional: It is easy to extend (mutually) recursive functions to support new cases.
- Expressive: Programmers can work with both the rich data structures common in functional programming as well as cyclic data more naturally supported by Datalog.
- Scalable: Programs naturally incorporate parallelism in a manner reminiscent of LVars. This should also extend to eventually consistent-by-design distributed computation a la CRDTs.
Perhaps surprisingly, these three aspects of language design are related: Granite’s design is emerging from the observation that the key to achieving each of the above-mentioned goals is a join-semilattice. Granite makes semilattice structure a first-class language feature; we’ll see what this means by example.
Bio: Nick is a PhD student at the University of Pennsylvania. He applies mathematical concepts from abstract algebra to make it more practical for programmers to break their code into smaller reusable components while also scaling their programs across a distributed system.
Speaker: Harrison Goldstein
Title: Advancing Property-Based Testing in Theory and Practice
Abstract: Property-based testing (PBT) is a testing methodology that allows users to write executable specifications of programs and test those specifications with automatically generated program inputs. PBT is well-documented as a power-tool for bug-finding, with success stories at companies like DropBox, Volvo, and Amazon, but we think it has still not reached its full potential. Our group has taken a user-centered approach to advancing PBT, talking to real developers and discovering the challenges that they face when using it.
I present two new abstractions for random data generators—free generators and reflective generators—that help address some of these challenges. Free generators re-imagine generators as “parsers of randomness.” They show that a generator is really a parser for sequences of random choices, and this perspective suggests new generation techniques. Reflective generators build on free generators, adding support for “backward” execution in which a generated value can be turned back into the sequence of choices that produced it. Backward generator execution is the foundation for a wealth of new algorithms, including ones for test-case “shrinking,” test-case mutation, and example-based generator tuning.
Moving forward, we plan to focus on usability: free and reflective generators should be easy for developers to write, and they should make other parts of the PBT process easier as well. I discuss future work in this direction.
Bio: Harry Goldstein is a Ph.D. candidate at the University of Pennsylvania, advised by professor Benjamin Pierce. His work leverages techniques from both programming languages and human-computer interaction to make tools for software testing more powerful and usable. You can find his publication list, blog, and other information at https://harrisongoldste.in.
Speaker: Adrian Lehmann
Title: VyZX: Formal Verification of a Graphical Language
Abstract: Mathematical representations of graphs often resemble adjacency matrices or lists, representations that facilitate whiteboard reasoning and algorithm design. In the realm of proof assistants, inductive representations effectively define semantics for formal reasoning. This highlights a gap where algorithm design and proof assistants require a fundamentally different structure of graphs. To address this gap, we present VyZX, a verified library for reasoning about inductively defined graphical languages. These inductive constructs arise naturally from category theory definitions. A key goal for VyZX is to verify the ZX calculus, a graphical language for reasoning about categorical quantum mechanics. Used as a more general model than the standard quantum circuit model, the ZX calculus equips its construction “ZX-diagrams” with a collection of diagrammatic rewrite rules that preserve the graph’s semantic interpretation. The ZX calculus has been shown to be useful for building quantum error correction, quantum compilers, and for general graphical reasoning. In VyZX using an initial set of rules proven through their semantics, we proceed to prove facts about the ZX calculus by only appealing to statements about the ZX calculus using standard proof assistant techniques. VyZX integrates easily with the proof engineer’s workflow through visualization, automation and verified conversion of quantum circuits to ZX diagrams.
Bio: Adrian Lehmann is a PhD student at the University of Chicago adivsed by John Reppy working with Robert Rand. His interests lie in programming languages: compilation, verification, and using PL for better software engineering. He’s currently exploring applying these techniques to quantum computing.
Speaker: Elaine Li
Title: Multiparty Session Type Projection and Subtyping with Automata
Abstract: Multiparty session types (MSTs) are a type-based approach to verifying communication protocols. Central to MSTs is a projection operator: a partial function that maps protocols represented as global types to correct-by-construction implementations for each participant, represented as a communicating state machine (CSM). Existing projection operators are syntactic in nature, and trade efficiency for completeness. In the first part of the talk, I will present the first projection operator that is sound and complete. I will highlight the automata-theoretic nature of our projection operator, which separates synthesis from checking implementability. For synthesis, we use a simple subset construction; for checking implementability, we present succinct conditions that summarize insights into the property of implementability. We use these conditions to show that MST implementability is PSPACE-complete.
While our projection operator always computes a candidate implementation if one exists, it may not always be the best candidate. In the second part of the talk, I motivate the subtyping problem for MSTs through a series of examples. I then demonstrate how we can use our conditions from before to obtain decidability results for MST subtyping with unrestricted CSMs as implementation model.
Bio: Elaine Li is a fourth year PhD student at New York University, working with Thomas Wies. She is interested in both the theory and practice of protocol verification, and her PhD research focuses on the theory of multiparty session types.
Speaker: Karine Even-Mendoza
Title: GrayC: Greybox Fuzzing of Compilers and Analysers for C
Abstract: Fuzzing of compilers and code analysers has led to a large number of bugs being found and fixed in widely-used frameworks such as LLVM, GCC and Frama-C. Most such fuzzing techniques have taken a blackbox approach, with compilers and code analysers starting to become relatively immune to such fuzzers.
In this talk, I will introduce a novel coverage-directed, mutation-based approach for fuzzing C compilers and code analysers inspired by the success of greybox fuzzing in other application domains. I will discuss the main challenge of applying mutation-based fuzzing in this context and present our solution: GrayC, a novel Greybox fuzzer for C compilers and analysers.
Naive mutations often result in non-compilable programs, which hinders their ability to discover critical bugs affecting optimisation, analysis, and code generation routines. To address this, we have designed a novel greybox fuzzer for C compilers and analysers by developing a new set of mutations to target common C constructs, transforming fuzzed programs so that they produce meaningful output, allowing differential testing used as a test oracle, and paving the way for integrating fuzzer-generated programs into compiler and code analyser’s regression test suites.
We have implemented our approach in GrayC, an open-source LibFuzzer-based tool. Our experiments demonstrate its superior coverage, particularly in compilers and analysers’ middle and back-end stages, outperforming Clang-Fuzzer, PolyGlot, and similar LangFuzz techniques. Through GrayC, we have identified 30 confirmed bugs in compilers and code analysers and enriched the Clang/LLVM test suite with 24 simplified, coverage-enhancing the Clang/LLVM test suite, targeting 78 previously uncovered functions in the LLVM codebase.
Bio: Karine Even-Mendoza is a Lecturer in Systems and Programming Languages (CS) at King’s College London with a PhD in Computer Science from King’s College London. Before joining as a lecturer, she was a Research Associate in the Department of Computing at Imperial College London, where she worked in the Software Reliability Group (SRG) and Multicore Programming Group on compiler testing and software testing in general. Her PhD at King’s College London, in the SSY group, focused on model checking, SMT solving, and incremental verification for software. Before her PhD studies, she worked in several local and international software companies.
Speaker: Suha S. Hussain
Title: MLFiles: Using Input-Handling Bugs to Inject Backdoors Into Machine Learning Pipelines
Abstract: The widespread use of machine learning (ML), especially in safety-critical applications, necessitates robust security measures for ML pipelines. Prior research has demonstrated the existence of model vulnerabilities, including model backdoors that can compromise the integrity of ML pipelines. Although many backdoor attacks limit the attack surface to the model, ML models are not standalone objects. These models are embedded in ML pipelines that involve multiple interacting components and are built using a wide range of ML tools. In this talk, I will discuss our investigation of input-handling bugs in ML tools as a vector for injecting backdoors into ML pipelines. Input-handling bugs are central to the field of language-theoretic security (LangSec), which advocates for the treatment of inputs as a formal language in order to develop precise, minimalist input-handling code. Drawing from a LangSec taxonomy of input-handling bugs, we systematically identified and exploited vulnerabilities with ML model serialization in popular tools. This process enabled us to construct ML backdoors, substantiating our claim. In the process, we engineered malicious artifacts, including polyglot and ambiguous files, using ML model files; contributed to the fickling library; and formulated a series of guidelines to provide actionable steps to ameliorate this issue. Our investigation brings to light the risks posed by input-handling bugs in tools to the overall security of ML pipelines, arguing for an approach that concurrently addresses software security issues in tools and model vulnerabilities.
Bio: Suha S. Hussain is a security engineer on the machine learning assurance team at Trail of Bits. She has worked on projects such as the safetensors security audit and fickling. She received her BS in Computer Science (with threads in people and theory) from Georgia Tech where she also conducted research at the Institute for Information Security and Privacy. She previously worked at the NYU Center for Cybersecurity and Vengo Labs.
Speaker: Kelly Kaoudis
Title: Systems security in practice: threat modelling at Trail of Bits
Abstract: Every system user and engineer has a different threat model, and a different understanding of the systems and applications they use or work on. Failure to unify these bodies of knowledge leads to not sufficiently considering weaknesses of the system and threats to it; this leads to surprise when an attacker exploits these weaknesses, which leads to incident response (and sometimes also sadness). Holistic threat modelling informs and enables making good system-level security decisions to minimize potential attack vectors. During a threat modelling engagement, Trail of Bits aims to methodically enumerate as many in-scope, system-level risks and weaknesses as possible. “System-level” here means architectural, design-level, or operational gaps in the client’s security posture. We use concrete examples in the form of threat scenarios and findings to show the client (rather than tell them) the insufficiently applied security controls we have identified, and to illustrate the risk implications of the lack of those security controls.
In this talk, I will present some of the interesting findings we’ve uncovered during previous (published) threat modelling engagements. Using examples from engagement reports to motivate each step, I will walk through the threat modelling process at Trail of Bits, and also talk about how our process can be useful in academic security work.
Bio: Kelly Kaoudis is a senior security engineer in the Research practice at Trail of Bits. She is a tech lead for threat modelling engagements, and contributes to Trail’s academic and industry research projects including open source parser and file formats analysis tooling. Prior to Trail of Bits, Kelly was the tech lead for Twitter’s application security team, and a graduate student in the Networking and Security (NSR) group at University of Colorado Boulder with Prof. Eric Keller. She received an MS in computer science from University of Colorado - Boulder in 2015.
Speaker: Susan Tan
Title: TULIP🌷 : Transcompilation Undergoes LLVM-IR for Interactive Parallelization
Abstract: The exploration of heterogeneous systems has given rise to various parallel programming models (PPMs). However, the preference of hardware vendors for specific PPMs requires developers to manage multiple implementations for the same kernels. This not only consumes additional time but also impedes the seamless integration of new features and algorithm optimizations. This can be mitigated by a transcompiler. As it focuses its optimizations on the Abstract Syntax Tree (AST) level, a traditional transcompile falls short due to the preservation of syntax and semantics, limiting the compiler’s ability to reason about dependences. To address this, we introduce TULIP, a framework that transcompiles CUDA programs into OpenMP code—two widely adopted PPMs for prevalent heterogeneous CPU+GPU systems. Operating at the Intermediate Representation (IR) level instead, TULIP opens up opportunities to use state-of-the-art automatic parallelization frameworks for enhanced code performance.
Bio: I am a fifth-year Ph.D. student in the Liberty Research Group at Princeton University, working under the supervision of Prof. David I. August. My research focuses on automatic parallelization, with a particular interest in exploring how tools such as decompilers and transcompilers can derive benefits from and contribute to state-of-the-art parallelization frameworks. In my spare time, I engage in reading and photography, capturing moments for friends’ special occasions.