λk.(k blog): λk.(k blog)

A high-level summary and interpretation of ACM finances

2023-12-14T20:06:46Z

I’m in the middle of liaising with the UBC library to attempt to negotiate joining ACM Open. It’s not going well. While US universities appear to see cost increases of 2x—3x, Canadian universities are seeing costs of 10x—20x. And our budgets run much tighter, and with far less research funding, particularly for article processing and other publication costs. A 10x—20x increase in publication costs is hard or impossible to swallow.

So, I’ve been forced to look at publishing—where should I publish, how much will it cost, etc—and one question I asked was: why does ACM Open cost so much? It’s a lot more than other open access journals. So I started looking at ACM finances and.. well…

Here’s my dive into ACM finances, and questioning fundamental claims surrounding ACM publishing costs.

UPDATE, March 1 2024 After discussions with other ACM members and feedback on this article, I’ve realized there are several errors in the analysis and the interpretation. I’ll publish an updated version eventually, but leave this version for transparency and as a record.

In short, here are a list of errors in the below facts and analysis:

The data relied upon, FY 2022, is atypical due to including running conferences under COVID restritions. Conferences generally pay for themselves.
The $700 average cost of an article is justified by the Form 990s, despite the profit. This can be seen by cross-referencing the ACM publications finances report with the Form 990s, which is somewhat difficult to do because of the difference in categorizations and granularity of the different reports.
Eliminating membership fees would actually increase costs, since it would likely increase membership, and members receive various benefits which incur costs.
The ACM is probably not as bad as it seems at investing. The calculated return on investment appears to be atypical or artificially low. Further, for very good reasons, the ACM has a conservative investment strategy.
The assets, and income from assets, are at least partially restricted in various ways, both from donors (restricted or endowed funds), or because it is held for the SIGs, which limits how it could be spent down.

However, I still stand by my high-level interpretation of the high profit and high assets of the ACM.

Outdated, invalid information

A high-level summary of ACM finances, from an objective view point

The ACM publishes annual reports in which it summarizes financial information such as its revenue and costs etc. The most recent is available here: https://cacm.acm.org/magazines/2023/4/271230-acm-publications-finances-for–2021/fulltext

Others have relied on this to infer where the $700 APC is spent, and I will argue about that later.

But these reports are presented in the best possible light, to tell a story; they come from a very subjective view point. They’re also at a very low level, so I have a hard time making sense of them at first.

The ACM is also required to file annual IRS disclosures to maintain their non-profit status, which are available with convenient summaries here: https://projects.propublica.org/nonprofits/organizations/131921358.

These are incredibly boring pure figures, presented on PDFs of tables of rows of undigested data—IRS tax documents. Interpreting them can be difficult, for multiple reasons—tax accounting can be different from our intuitive classifications of things; can involve figures from the past and future, anticipated figures, and completely imaginary figures. However, they should be exact and objective.

Here are some salient facts about that ACM’s finances, as reported to the IRS and the public in these filings, and as somewhat inexpertly summarized to a few significant digits by me:

The ACM has been making, on average, ~$8,000,000 in annual “net income” for the past 10 years, and on average $10,000,000 in “net income” for the past 5 years.
The ACM currently has approximately $150,000,000 in investments, essentially all publicly traded securities.
From these investments, the ACM made approximately $3,000,000 in dividends and $3,000,000 in net gains—an approximately 4.4% return on their investment.
The ACM is currently sitting on $25,000,000 in uninvested, non-interest bearing, cash. (I assume it’s filling a swimming poll in Times Square.)
The ACM spent ~$3,300,000 on “good works” related to its mission—grants, student awards, travel awards, etc. This does not include any expenses spent on operating the ACM, publishing, running conference, running the digital library, salaries, etc.
The ACM’s total revenue was $66,000,000, with a breakdown of the major sources below:
- Publishing: $24,500,000
- Conferences: $18,000,000
- Membership fees: $7,500,000
- Ads: $1,000,000
- Investments: $6,500,000
  - Dividends: $3,300,000
  - Capital gains: $3,200,000
The ACM’s total expenses were $56,000,000, with a breakdown of major categories below. There are quite a few other smaller categories, all of which are pretty reasonable.
- Conferences: $20,000,000
- Salaries (non-executies): $7,200,000
- Publication production and services: $6,200,000
- IT: $5,000,000
- “Good works” (grants and awards etc): $3,300,000
- Executives pay: $1,500,000

This summary of the IRS filing is easy to misunderstand. For example, the “conferences” category is probably the most difficult to understand from the IRS filings. Some of this could involve some publication costs, since publications are associated with conferences. The publication costs probably looks artificially low, since the IT category is probably almost entirely related to the digital library, which we would consider part of publication costs, but the IRS considers as a separate line item. Much of the salaries are related to staff for support publication, but salaries are considered separately.

“Good works” is an ill-defined term that comes up in discussion about the ACM’s mission. In the IRS numbers, the things I’ve labeled “good works” are what the IRS calls grants. These are unambiguous expenditures of money by the ACM not on operating any business or service, but that it just gives away. However, note that running conferences, publishing, and maintaining the library are part of the ACM mission.

A subjective interpretation of the high-level ACM finances

Digital library subscriptions and APCs are the main source of revenue

The “publishing” revenue of $24,500,000 is the main source of revenue. We can ignore the conference revenue entirely. This is money that ACM collects ($18,000,000), and then immediately spends on conferences ($20,000,000). The ACM appear to marginally subsidize the cost of conferences, although this is deceptive given the membership fees and the possible misinterpretation of the conference expense category.

Given that they are the main source of revenue, it’s pretty understandable why any change to the structure of that revenue greatly concerns that ACM. A significant change in publication revenue could bankrupt the ACM…. although, it would take several years of literally $0 revenue, given the large pile of reserves the ACM is sitting on.

The ACM is sitting on a very large rainy day fund

The ACM has $25,000,000 in cash and $150,000,000 in investments. Given annual expenses of $50,000,000, the ACM could afford to take in 0 revenue for 3 years, and still wouldn’t be bankrupt.

This seems pretty excessive, but I’m not experienced enough as a financial analyst. The only point of comparison I could quickly find was arXiv, which in 2013 was aiming to cover half of their annual expenses in a rainy day fund: https://info.arxiv.org/about/reports/arXiv_Reserve_Funds_Policy.pdf.

The investments are a little different than the cash, since they could take a while to sell off, you might sell them at a loss, and selling them cuts of a small bit of future revenue. But we’re talking about emergencies and unlikely events here.

The ACM is very profitable, particularly for a non-profit

Normally, a for-profit business would call net income “profit”. The ACM is quite profitable, averaging $10,000,000 per year in profit. It’s not unusual or bad for a non-profit entity to have net income in a given year; it would be a very precarious entity that was constantly breaking even or running a deficit. But this long term trend of surplus is interesting, at least to me, a member, who is being charged various fees that form the revenue of the ACM.

The purpose of a non-profit is not to make profit, but to achieve its mission, which according to Schedule O is:

TO ADVANCING THE ART, SCIENCE, ENGINEERING, AND APPLICATION OF INFORMATION TECHNOLOGY, SERVING BOTH PROFESSIONAL AND PUBLIC INTERESTS BY FOSTERING THE OPEN INTERCHANGE OF INFORMATION AND BY PROMOTING THE HIGHEST PROFESSIONAL AND ETHICAL STANDARDS.

I do not think it is in the public or professional interest to charge unnecessarily high fees, taken out of research and educational funding, and to hold onto that surplus income as an increasingly large pile of assets.

The profit margin for the ACM is 15%. For context, Elsevier has profit margin of around 40%, while Springer has a adjusted operating margin (similar enough to profit margin for our purposes) of 26%. These two are notorious for-profit academic publishers, and should be considered outliers. They have absurdly large profit margins.

For a better view, we could look at various related industries. Given that this is essentially entirely from publishing, I’ll compare this to publishing companies: https://ised-isde.canada.ca/app/ixb/fpd-dpf/report.

From here, I generated the NAICS 513 - Publishing industries - Financial Performance Data, for year 2022, for companies with average revenue $5,000,000—$20,000,000. The publishing industry profit margin range is –8.8%—17.5%.

The ACM has a very high profit margin for the publishing industry.

The ACM is not spending $700 per article

The ACM is not spending $700 per article to publish and maintain the library, no matter how we look at it.

The easiest way to see this is to observe the approximately $10,000,000 in profit the ACM makes, on average, every year, which comes almost entirely from its publication licensing. The ACM could afford to cut the APC to $595, 85% of the APC, giving up its 15% profit margin, and benefiting its members at the same time.

Arguably, it should not give up all net income, as you want some saved for a rainy day. However, the ACM currently has as large rainy day fund.

The ACM funds no good works at all

The ACM funds essentially no good works at all. 5% of the ACM’s costs go to good works, which represents a margin of error in costs. The good works the ACM funds represent the same amount of money, about 3 million dollars, as it makes in annual dividends from its investments. Arguably, nothing the ACM does—collecting revenue from publishing, membership fees, conferences, or donations—funds the good works. It’s only the passive investment in publicly traded companies that fund the good works—the profit from the ACM’s private capital funds good works.

The ACM doesn’t need to charge membership fees

The ACM could cut its membership fees to 0 tomorrow and still be running a $2,500,000 surplus. In fact, probably more, because it could reduce expenses in maintaining infrastructure to collect membership fees, accounting, etc.

The ACM seems very bad at investing

The investment numbers can be very deceptive because its very easy, totally legal, and advisable to play various tax shenanigans with how investments are taxed, so the actual return may be higher.

But, taken at face value, and considering the ~$500,000 in investment management fees, the ACM is extremely bad at investing their money. If they stuck it in a CD right now, they could in principle receive about 5% return for 0 fees. My personal investments currently return about 4.5% dividends and total return of 13%.

If the ACM would like to hire me to manage their money, my rate is $499,000 per year, payable directly to my unrestricted account at UBC. Past performance is no indication of future success; investments may lose value; this does not constitute legal advice; please consult your doctor before starting any medication; etc.

Conclusions: What should I do with this information?

If you’re an ACM member, then there are several things you could call for, press for, attempt to change, etc.

A policy around limiting the net income, or at least directing the use of the net income, of the ACM could substantially reduce costs for members. This could be short-sighted: investing the net income does build an endowment and future income, could better reduce costs in the future. However, the ACM seems bad at investing, and tying the stability of the ACM to the stock market doesn’t seem any more stable than having a long-term balanced budget at lower costs.
A policy to spend down ACM assets. It’s reasonable to keep some cash in reserves, but it may be unreasonable to keep that much cash invested. On the other hand, all universities have large endowments that are similarly managed, so maybe you don’t want to spend it down.
Eliminate membership fees. They represent 75% of the profit.
Eliminate conferences. Conferences are a major expense for the ACM, and apparently a net cost. Removing them could reduce the obvious costs, but also some of the non-obvious indirect costs, and enable reducing publishing costs.
Fund more good works. Maybe we’re fine with the costs, but we just want to spend that profit on good works. We could afford to triple them, and still be profitable.
Do nothing. Maybe things are fine the way they are.

To draw any more conclusions, I think we’d need to dive into the low-level details of ACM’s budget, start comparing it to other publishers, other organizations, reconsidering the services we want the ACM to provide, etc. Some good, low-level information is available in the annual reports: https://dl.acm.org/doi/pdf/10.1145/3389687. One bit that sticks out to me is the cost of copyediting, which is about $1,000,000, and I’d really pay for them to stop destroying my work.

Academia Is a For-Profit Industry

2023-09-18T17:44:45Z

An ill-advised blog post in which I speculate that academia is actually for-profit, but not in the usual sense of “profit”.

Profit is often narrowly construed as about money. It’s not; it’s about value, particularly surplus value. Money is a proxy for value, and often a useful one.

In academia, however, money is not a good proxy for value. I’d argue the primary proxy for value is academia is paper and citation counts, although there are others. And by these measures, academia operates like a for-profit industry with many of the problems that entails: concentration of market power by accumulation of capital, the exploitation of labour, the externalizations of costs, all following the drive for ever higher surplus value. By disguising itself as non-profit, by eschewing money as value, academia disguises these issues.

If paper and citation counts represent value, whence profit? To profit is not to merely produce things of value, but to seek to produce surplus value, and use the capture of that surplus value to incentive production. If papers represent value, then surplus value would represent papers and citation counts beyond what is required to produce the embodied knowledge. If academia were for-profit, we’d expect structural incentives to produce more and more papers and citations, and to produce them more efficiently (or at lower cost anyway). We’d expect to see that academic institutions measure success by increasing profit (i.e., papers and citations). That, for example, academics are hired and retained based on their ability to bring profit to the employing institution. We would expect capital to accumulate that makes increasing profit easier, so some groups accumulate profit that they can reinvest into increasing numbers of papers and publications.

Some of these are easy to test; the others are testable with some straightforward data collection and analysis. I’m not going to do that now because (1) I’m bad at numbers (2) research is my job and no one is paying me for this.

Academics and universities do judge their success, at least in part, by the number of papers and citations. See for example https://csrankings.org for one formalization of this; it’s like the Dow of CS universities. Or something. Researchers are hired and rewarded based on publication count and citations. These go into the hiring packets for new faculty and faculty up for tenure. Some contracts have specific publications counts you have to hit to make tenure. This is analogous to high-level management positions with profit targets; missing them could well get you fired.

On it’s face, this isn’t necessarily a problem, and in some ways, is less problematic than a for-profit economy in general. At least papers represent knowledge, so we’re incentivizing the production of knowledge. The reasons I dwell are: (1) an incentive to optimize a proxy, a metric, is IMO bad; it leads to perverse incentive, i.e., incentives to optimize merely the metric at the expense of the thing measured. This might not be a problem but for: (2) while papers represent knowledge-as-value, this value can be exchanged for monetary gain, so exploitation of the paper metric is profitable in the usual sense, and at the expense of knowledge.

So, does the “profit” motive lead to exploitation? I’d say so, and in the same predictable ways as the usual profit motive. The easiest ways to increase production and decrease costs aren’t do hard work at making things more efficient, but to lie, cheat, and steal. We see these are work in academia. Made up data, plagiarized papers, etc. But let me zoom in on some higher level stuff.

Exploitation of labour happens, at least until tenure. Grad students are often exploited and receive only a part of the value of their labour, although interesting I’d argue that by the measure of paper and citation count, less exploited than other industries. Grad students usually receive an equal share of the paper count and the citations. However, they might put in a disproportionate share of the labour required to produce a paper. I’ve heard of cases where students are cut out of a paper they worked on, or a supervisor puts almost no work in but receives a paper and citations. Very capitalist—they provided the capital to produce the paper, even if they provide no labour, and receive a share of profit in return. Importantly, though, junior faculty (at least) are exploited as well. Here, it’s not by a direct superior, but indirectly by the institution, through a series of committees of “peers”. “Publish or perish” means they must produce value, papers, or be replaced by the many workers willing to step into the scarce high-demand faculty positions. Even highly respected workers who miss arbitrary targets are denied tenure, because they don’t produce enough profit. Exploitation of the individual worker only ceases if they get tenure, although systemic exploitation continues. In this view, I guess tenure is finally accumulating enough capital to become a capitalist, and live off the rents of capital accumulated? You never have to produce another paper again. Here, the exchange of “profit” for profit seems important.

The analogy to capital accumulation holds as well. The more publications and citations you get, the easier it is to get more. To produce more and more citable papers requires grant funding, top graduate students, and post-docs. To get these, you need to have sufficient grant funding, and sufficient reputation and renown to attract top workers. In a typical for-profit industry, those with the most profit can afford to recruit the best talent, by paying more. In academia, though, salaries as in money are relatively fixed by institutions, not individual advisors (exceptions exist). Instead, academics compete largely on reputations—who can train you to produce, and get you the most, high quality publications. That is, who can help you acquire the most capital-of-academia. Profit should also be translatable into capital, and this happens. Grants and awards look, in part, at publications and citations (and, ironically, other awards received) to decide who gets awards. These grants and award fund more papers and citations. Higher profits lead to the accumulation of capital, which leads to higher profits. Metrics like the H-index even ensure papers and citations depreciate, unless they generate ever higher citations over time.

Costs of production get externalized. The most notable might be, as in many industries, greenhouse gasses. To produce papers and make them well known to gain citations often requires travelling, often internationally, often frequently. One other I worry about, although it’s unclear whether it’s a real problem, is the training of large numbers (I’d argue, unnecessarily large numbers) of PhDs. I’d argue we produce more than academia needs, except that we need them to produce research cheaply. In some places, there is pressure to increase the size of the PhD program. There’s a straightforward way to reduce these costs, and like in other industries, there’s no way in hell we’re doing it: produce less, or produce slower, or both. Of course, like in other industries, there’s reason. If we do stop producing sufficient profit (papers), people will lose their livelihood. Grad students may not get jobs if they don’t produce enough papers presented at international venues. Junior faculty may not get tenure.

I don’t think any of this is a particularly novel observation, except the explicit analogy as profit. And even that I doubt.

And the analogy isn’t perfect, and it’s complicated by the intersection with the money economy. That’s okay; no model is perfect, but some are useful. I think this for-profit model of academia is useful.

What is a model?

2023-06-15T20:25:25Z

What is a model, particularly of a programming language? I’ve been struggling with this question a bit for some time. The word “model” is used a lot in my research area, and although I have successfully (by some metrics) read papers whose topic is models, used other peoples’ research on models, built models, and trained others to do all of this, I don’t really understand what a model is.

Before I get into a philosophical digression on what it even means to understand something, let’s ignore all that and try to discover what a model is from first principles.

Definitions of “model”

The apparent place to start to understand the meaning of a word is to read its definition. This is actually no help at all. There are lots of uses of the word “model”, with several definitions. Here are some.

Definition 0 In science and engineering, a model is “an abstract description of a concrete system using mathematical concepts and language”. See Wikipedia provides a nice introduction to this kind of model, and the Standard Encylopedia of Philosophy provides a nice explanation in the context of model theory, which will be relevant later in this post.

Definition 1 A syntactic model (of a type theory) is defined by Boulier, Pédrot, and Tabareau as a translation from one type theory into another that preserves typing, the definition of false, and definitional equivalence. This syntactic model enables the source type theory to inherit properties of the target type theory—such as consistency.

Definition 2 A model (of a vocabulary also called a language $\sigma$ ) in the sense of model theory (as defined by Elements of Finite Model Theory) is a $\sigma$ -structure (“also called a model”) defining a set A along with 3 sets providing interpretations of that vocabulary. These sets are $Ic_A$ , which interprets each constant in $\sigma$ as an element of $A$ , $IP_A$ , which interprets each n-ary predicate symbol or relation symbol from $\sigma$ as an n-ary (set-theoretic) relation between elements of $A$ , and $If_A$ , which interprets each n-ary function symbol in $\sigma$ as a (set-theoretic) function from n elements of $A$ to an element of $A$ .

Definition 3 The above definition is confusing, since it conflates structure and model, which the text later distinguishes with the following separate definition. A model (of a theory (over a vocabulary $\sigma$ )) is a structure (“also called a model”) of vocabulary $\sigma$ such that every sentence in the theory is interpreted in the structure to make the sentence true. (A theory is a set of sentences drawn from a vocabulary.) My rephrasing of the definition of model is intentionally confusing and difficult to parse, to make apparent the inherit confusingness created by the several layers of definitions and one definition that defines “model” using a second definition of “model”.

Definition 4 Nlab hosts an article with a much clarified definition, which distinguishes language, theory, structure, and model carefully. In particular, it is careful to only call structure the interpretation of the language (call vocabulary above), and only call model an interpretation that makes true the axioms composing the theory of the language.

Definition 5 Carlo Anguli once gave me the following definition of model:

A collection of interpretation functions that interpret every syntactic category in such that the original relationship is respected.

e.g.,
- interpret every context as a set,
- interpret every (non-dependent) type as a set, and
- interpret every term-of-a-type indexed-by-a-context as an element-of-the-interpretation-of-that-type indexed-by-elements-of-the-interpretation-of-that-context.

Implicit in this definition is that the interpretations must respect equality — because if you don’t respect equality of arguments then you’re not a function!

This definition seems to be close to Definition 2, as it doesn’t mention axioms and their interpretation. However, it might be Definition 3 instead, as there could be implicit in the definition of syntax an inclusion of all judgements of the programming language, and therefore in the phrase “such that the original relationship is respected” a requirement that axioms of those judgements become true.

Also implicit in this definition of model is what it is a model of. Perhaps a programming language, but it again depends on what syntax means and thus what “every syntactic category” refers to.

I’m interested in the requirement that the interpretation is a collection of functions, which seems to be missing or only implied in some model theory definitions of “model”.

So what is a model?

One of the first thing that jumps out to me after reviewing the above definitions is that to understand each definition, you have to reframe the definition of model into model [of what]. It really never makes sense to give a definition of merely “model”.

Definition 0 defines a model of a system [of the real world]. Definition 1 defines a model of a type theory. Definitions 2 and 3 give definitions of a model, in the sense of model theory, but of two different objects: model of a vocabulary (or language), which is more often called a structure, and model of a theory (which everyone seems to agree is a “model”). Definition 4 makes this distinction very clear. Definition 5 seems to use “model” in the model-theoretic sense, but has abstracted a bit away from a particular notion of theory and generalized to syntax.

What is a model of a programming language?

I’ve had two problems understanding the word “model” in the context of programming languages.

First, we use “model” in three different senses, and I have neither understood that nor understood the relationships between them.

Model in the sense of an abstract description of a system. This is Definition 0. This sense of “model” means something like “mathematical description”. What we want is a description in which we can work using math, so we can make predictions about the real world. Ideally, the predictions we make will be true.
Model in the strict sense of model theory. These are Definitions 2, 3, and 4. This sense of “model” is the closet to having a strict definition. It often carries a set-theoretic connotation, asking for a set defining the domain of values, and three interpretation functions that interpret specifics parts of a theory in specific ways.
Model in the generalized sense, inheriting from or related to model theory. I hesitate to even call this distinct from the second sense, but I will anyway. I also hesitate to speculate about history—perhaps this sense actually predates model theory. But I distinguish it from the second sense, because it frequently generalizes away from the strict 3 category “constant symbol”, “predicate symbol”, “function symbol” specification and doesn’t seem beholden to set theory. Definitions 1 and 5 use “model” in this sense.

In the second sense of “model”, the first sense of the word remains—we’re still interested in a description of some system (the theory), and of using the model to make predictions or reason. However, since the theory is also mathematical, we can be more rigid about our reasoning requirements—axioms of the theory must be true of the model, and relationships must be preserved in the model. This is rarely true of a model of the real world; e.g., the Newtonian model of gravity works pretty well, until it doesn’t, so it’s a model that doesn’t quite make all axioms true or preserve all relationships.

The third sense seems closer to the idea of semantics, in the mathematical logic sense of the word as assigning meaning or interpretation to syntax. In this sense, the word “model” frequently avoid committing to set theory as a formal foundation, generalizes away from the three interpretation functions, and focuses instead on the relationships between uninterpreted syntax being preserved by the interpretation. For example, in Definition 1, the relationships of interest are well-typedness, definitional equivalence, and falsehood, and the formal foundation is type theory. Category theory seems to come closest to a complete formalization of this sense of the word “model”, although I’ve had a hell of a time understanding that. Nlab articles don’t say this explicitly, but reading between the lines in articles linked to from the Nlab article on model theory for the words syntax and semantics implies that the idea of syntax, i.e., uninterpreted symbols with relationships between themselves and judgements about them, can be formalized in category theory, and then so can the idea of semantics, i.e., providing an interpretation in some other domain of those uninterpreted symbols; a domain in which one can use all the power of the other domain to reason about the judgements one wishes to make about the uninterpreted symbols.

The second problem with the word “model” is that we frequently work with two senses simultaneously.

When I write down a programming language, I’m often trying to model (in the first sense) a real programming language (or some feature of it), one actual software developers use to make real things happen in the real world. I am not merely describing a mathematical object for study. (Okay, sometimes I do that, but usually to the first end, eventually.) When I write down such a model, I may describe the abstract syntax, the typing judgement, and an abstract machines or reduction rules. These form a pretty good mathematical description of how a real language behaves. A compiler will reject syntactically invalid expressions. It may then type check the abstract syntax tree, and reject some possibly semantically invalid expressions. If judged well typed, the compiler may transform the tree into something that runs, and that run-time behaviour can be predicted using the reduction rules.

However, for much programming languages work, I’m not interested in merely predicting the behaviour of a single program. I might want to predict behaviour or properties of the entire language, or its typing judgement, etc. To reason about single programs, the model (in the first sense) may work well. But it might not work well for, say, trying to decide whether certain types can even be inhabited. To solve this, we might build a model (in the second or third sense). We interpret the abstract syntax tree and typing judgement in some other domain. That is, the AST and the typing judgement, being a model in the first sense, form a theory in the model theoretic sense. We can then construct a model (in the second sense) of a model (in the first sense). The Standard Encyclopedia of Philosophy article on model theory goes into this in detail in the context of model theory, which is great.

What’s more interesting is how these two senses of model interact in programming languages. If one is interested in a model, in the second sense, it may inform how one develops a model (in the first sense). If I know I will want to construct a model (in the second sense) to reason about the typing judgement, I may decide that single-step reduction rules are actually irrelevant; I only care that certain program equivalences hold, really, and any implementation that has those equivalences suffices. So rather than create a model (in the first sense) with an abstract machine or small-step operational semantics, I’ll specify an equivalence judgement. This might give less predictive power about a real world implementation, but allow the predictions I do make to apply to many implementations.

If you see these patterns, you may have some insight into how the author is approaching their work, and in what senses they are using the word “model”.

What is syntax?

2023-06-07T20:58:46Z

I’m in the middle of confronting my lack of knowledge about denotational semantics. One of the things that has confused me for so long about denotational semantics, which I didn’t even realize was confusing me, was the use of the word “syntax” (and, consequently, “semantics”).

For context, the contents of this note will be obvious to perhaps half of programming languages (PL) researchers. Perhaps half enter PL through math. That is not how I entered PL. I entered PL through software engineering. I was very interested in building beautiful software and systems; I still am. Until recently, I ran my own cloud infrastructure—mail, calendars, reminders, contacts, file syncing, remote git syncing. I still run some of it. I run secondary spam filtering over university email for people in my department, because out department’s email system is garbage. I am way better at building systems and writing software than math, but I’m interested in PL and logic and math nonetheless. Unfortunately, I lack lot of background and constantly struggle with a huge part, perhaps half, of PL research. The most advanced math course I took was Calculus 1. (well, I took a graduate recursion theory course too, but I think I passed that course because it was a grad course, not because I did well.)

So when I hear “syntax”, I think “oh sure. I know what that is. It’s the grammar of a programming language. The string, or more often the tree structure, used to represent the program text.”. And that led me to misunderstand half of programming languages research.

The First Meaning of Syntax

Syntax has two meanings in programming languages, and both meanings can frequently be found in the same paper.

The first meaning is the one I gave above. I could give a definition of the syntax (in the first sense) of the lambda-calculus as follows.

e ::= x | (lambda (x) e) | (e e)

Ah. Beautiful syntax.

If we were following a standard text, such as Harper’s Practical Foundation for Programming Languages (2nd ed), we might next define the “semantics” of this “syntax”. We might define the “static semantics”, i.e., the type system or binding rules, then the “dynamic semantics”, i.e., the rules governing the evaluation behaviour of the syntax. For example, I might write the following small-step operational semantics.

((lambda (x) e) e') -> e[x := e']

Ah. Beautiful semantics.

Except, everything I wrote above, reduction rule included, is also syntax and not semantics.

Historical Interlude

The words “syntax” and “semantics” come from mathematical logic.

In that context, “syntax” describes sentences, statements, symbols, formulas, etc, without respect to any meaning. You can write down a logical formula say as "∀ X.P(X, A)" (where “A” is a logical constant, “X” is a variable, “P” is a proposition), and it has no meaning; it’s mere syntax. It might be true, or might be false, depending on its interpretation of “P”, “A”, and "∀". I could say that it means “all leaves are green”, which would be false. A more relevant example for PL might be the syntax ((lambda (x) x+1) 2) = 3, which I would certainly like to be true, but it very much depends on what I mean. If + means string append as in JavaScript, then the statement is false since ''.concat(1, 2) = '12'. Wikipedia is a good start for trying to understand this history of the word “syntax”: https://en.wikipedia.org/wiki/Syntax_(logic)

By contrast, in that same context, “semantics” is the means by which syntax is given an interpretation. Perhaps the most widely used approach to providing an interpretation of syntax is model theory, which I never learned. In model theory, we start with a “syntax” (or “theory”). This theory is a collection of constants, function symbols, and predicate symbols. A model then is a map from the uninterpreted syntax to some interpretation that preserves relationships. I’ll say more of this in a later post, but for now, consider the following example. I might provide a model of our earlier example that interprets + as ''.concat, and = is mapped to, say ===. This preserves relationships, if all my constants are mapped to strings. Wikipedia is a good source for this history too: https://en.wikipedia.org/wiki/Semantics_of_logic.

When Semantics is the Syntax

What’s interesting about this history is how it was adopted in programming languages, and evolved in two different ways. On the one hand, a programming language grammar is syntax, in the sense of being uninterpreted statements. That syntax can be given a semantics, an interpretation, by using operation semantics (this is the sense in which operational semantics is a semantics). The operational semantics provides an interpretation to our grammar.

But, in another sense, the grammar, typing rules, and evaluation rules (the “syntax”, “static semantics”, and “dynamic semantics”) are mere syntax, in the older logical sense. They are a theory, in the model-theoretic sense. To see why, we must understand what the earlier example ((lambda (x) x+1) 2) = 3 means. Or in fact, realize that it doesn’t mean anything at all.

To write this down is to write down a proposition about the grammar: that one piece of the grammar is equal to another. Except I didn’t write a proposition that the two were equal. I wrote the uninterpreted proposition symbol =, the syntax =, next to two pieces of uninterpreted grammar, two other pieces of syntax. Every syntactic judgment about our grammar is itself syntax, in the model theoretic sense. At least, this is true if we follow the tradition of writing them down synthetically, axiomatically, about the grammar, as is done in standard programming languages textbooks such as Types and Programming Languages or Practical Foundations for Programming Languages.

In this view, the typing rules and reduction relations are syntax. This is a bizarre perspective from a software engineering perspective, but makes sense from the mathematical logic perspective.

With this perspective, it might make sense to call “operational semantics” “syntactic semantics”, or to imagine a tower of syntax and semantics where one level’s semantics become the next level’s syntax. This view finally helped me make sense of why we call “syntactic logical relations” syntactic, when they are clearly semantics. (A problem I danced around in my previous post on logical relations.)

This perspective is also useful, for two reasons. The first is that reasoning purely syntactically, while very general, prevents you from importing any other reasoning principles from any other domain. By viewing the typing system as syntax, and then building a model of it (and by necessity, the programming language terms) in, say, set theory, we can import all set-theoretic reasoning in our attempts to reason about our type system. But more than that, we can reinterpret the syntax freely, to prove general results. While I might have written a type system using syntax that looks like numbers, I could build a model that interprets that type system as over strings, and know that actually the entire system is safe for strings, too. Appropriately generalized, I wouldn’t need to do any additional proofs.

Unfortunately, this double meaning of the word syntax seems to be completely taken for granted by some. nLab is a good example of this. To quote from the introduction to the nLab model theory page:

On the one hand, there is syntax. On the other hand, there is semantics. Model theory is (roughly) about the relations between the two: model theory studies classes of models of theories, hence classes of “mathematical structures”.

What’s most interesting about this quote isn’t what it says, but what it links to. The link for “syntax” is to the page on the internal logic of a category. From the software perspective, this is not syntax, but semantics. How on earth could it be syntax? The link for “semantics” is to the page on structure, the idea of equipping a category with a particular functor. How on earth is that any more semantics than the original abstract nonsense version of syntax?

Before I understood “syntax”, I couldn’t make any sense of that, but now I’m beginning to understand. The internal logic of a category in some sense must be able to express the grammar of a language, and the judgments of a language, but in a purely syntactic way—in the same way that when I write down the grammar and typing rules of a language, I don’t refer to any interpretation of those symbols beyond the way I combine them on the page. Then the semantics or structure is a the particular functor over that category, providing an interpretation, a semantics, of that original category (the syntax).

Anyway, now I think I’m ready to understand what a model is.

In What Sense is WebAssembly Memory Safe?

2023-05-19T02:35:56Z

I’ve been trying to understand the semantics of memory in WebAssembly, and realized the “memory safety” doesn’t mean what I expect in WebAssembly.

What is memory safety?

Here are some definitions.

Memory safety is a feature of programming languages that prevents certain types of memory-access bugs, such as out-of-bounds reads and writes, and use-after-free bugs. In an app that manages a list of to-do items, for example, an out-of-bounds read could involve accessing the nonexistent sixth item in a list of five, while a use-after-free bug could involve accessing one of the items on an already deleted to-do list.

https://spectrum.ieee.org/memory-safe-programming-languages

Memory safety is the state of being protected from various software bugs and security vulnerabilities when dealing with memory access, such as buffer overflows and dangling pointers. For example, Java is said to be memory-safe because its runtime error detection checks array bounds and pointer dereferences.

https://en.wikipedia.org/wiki/Memory_safety

Memory (un)safety in Wasm

WebAssembly (Wasm) is a language that guarantees “type safety … [preventing] invalid calls or illegal accesses to locals, … memory safety, and … inaccessibility of code addresses or the call stack”.

(Technically, the Wasm paper describes Wasm as a binary code format, that happens to be presented as a language.)

Formally, a whole Wasm program that type checks is guaranteed to either be a well-typed value, or take an evaluation step to a well-typed program, or evaluate to the well-known dynamic error “trap”.

This is in contrast to an unsafe language like C. A well-typed C program might take a step to a well-typed program, or it might evaluate to a value of arbitrary type or no type. For example, a well-typed program of type char that reads from a buffer might evaluate to a well-typed char, or it might evaluate to an arbitrary integer that does not correspond to any character because you were reading uninitialized memory.

For example, consider the following C program.

// unsafe.c
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char** argv) {
  char* buf = malloc(0);
  memcpy(buf, "Hello world\n", 12);
  write(1, buf, 12);
  return 0;
}

(compiled with clang -o unsafe.exe unsafe.c; run with ./unsafe.exe)

This program creates a buffer of size 0, writes “Hello world\n” to it, and tries to print that to standard out. The program printed “Hello world” when I ran it, but it’s undefined behaviour, so anything could happen. I tried to writing a loop that mallocd lots of memory and wrote arbitrary numbers, but never managed to crash the program. Still, it’s not memory safe.

The equivalent Wasm program is below.

;; safe.wat
(module
 (import "wasi_unstable" "fd_write" (func $fd_write (param i32 i32 i32 i32) (result i32)))

 (memory 0)
 ;;(memory 1)
 (export "memory" (memory 0))

 (data (i32.const 0) "Hello World\n")

 (func $main (export "_start")
       (i32.store (i32.const 12) (i32.const 0))
       (i32.store (i32.const 16) (i32.const 12))
       (call $fd_write (i32.const 1) (i32.const 12) (i32.const 1) (i32.const 20))
       drop))

(run with wasmtime safe.wat)

In this example, we create a string “Hello World\n” at address 0 in the module’s memory. We then create (encode) a new iovs just after it, starting as address 12, with a pointer to address 0 and length 12. Then we call fd_write, from the wasi API.

Unfortunately, we declared the memory size to be 0, so trying to allocate this string fails, traps safely, and the process exits with an error message.

So wasm is memory safe right?

Well, sort of, but there’s a pretty key distinction here.

In C, we are creating a new pointer with malloc. We are allocating a new data structure, then using it (unsafely).

In Wasm, there is exactly one memory for the entire module. Inside that memory, we encode 2 data structures: our string, and the iovs structure used by fd_write. All access to the global memory are safe. But not all accesses to the encoded data structures are.

Most application will create data structure within the memory. That’s what our call to fd_write did. The two stores actually create an iovs structure in the global memory. We have no guarantees, within Wasm, about that data structure.

For example, here’s our Hello World program in Wasm which uses the memory safely and correctly, but creates an iovs whose length is claimed to be 100, larger than the actual string.

;; unsafe.wat
(module
 (import "wasi_unstable" "fd_write" (func $fd_write (param i32 i32 i32 i32) (result i32)))

 (memory 1)
 (export "memory" (memory 0))

 (data (i32.const 0) "Hello World\n")

 (func $main (export "_start")
       (i32.store (i32.const 12) (i32.const 0))
       (i32.store (i32.const 16) (i32.const 100))
       (call $fd_write (i32.const 1) (i32.const 12) (i32.const 1) (i32.const 20))
       drop))

(run with wastime unsafe.wat)

When I run this, I get “Hello world\nd” printed to stdout. I have no idea where that trailing d comes from, and it didn’t crash, suggesting it read uninitialized memory of some kind.

Arguable, this is cheating: Wasm does not and cannot make claims about external system functions, and wasi is unstable. But IMO the root of the error isn’t really about wasi.

Really, the root cause of this error is memory unsafety, but of a data structure encoded within a Wasm module. In a truly memory-safe language, if I try to access the 100th element of a 12-character long string, I get an error:

> racket
Welcome to Racket v8.9 [cs].
> (string-ref "Hello world\n" 100)
; string-ref: index is out of range
;   index: 100
;   valid range: [0, 11]
;   string: "Hello world\n"
; [,bt for context]

But that doesn’t happen in Wasm.

Wasm memory safety doesn’t apply to data structures implemented (encoded) within the memory. It only applies to the module’s memory, which is protected from other modules, even those running in the same process’s virtual address space.

This means Wasm modules are protected from each other, and so this kind of memory unsafety probably isn’t a security risk, only a cause of logic bugs.

In Wasm, data structures have to be encoded anyway, since Wasm doesn’t provide any kind of structure data primitives; you only have integers and some integers are interpreted as addresses into memory. But, when you encode such data structures in the memory and use them incorrectly, you have no guarantees about what happens. You could read some arbitrary data (from your own module), or read some uninitialized memory (from your own module). I.e., you get out-of-bounds reads and writes.

In another view of this, memory is the only data structure in Wasm, and it is memory safe. That’s all the language can be responsible for; if you go about encoding weird things inside that data structure, errors are likely. But this doesn’t seem like what people would expect when they hear “memory safe”. At least, it’s not what I expected at first.

What is logical relations?

2023-03-24T23:32:03Z

I have long struggled to understand what a logical relation is. This may come as a surprise, since I have used logical relations a bunch in my research, apparently successfully. I am not afraid to admit that despite that success, I didn’t really know what I was doing—I’m just good at pattern recognition and replication. I’m basically a machine learning algorithm.

So I finally decided to dive deep and figure it out: what is a logical relation?

As with my previous note on realizability, this is a copy of my personal notebook on the subject, which is NOT AUTHORITATIVE, but maybe it will help you.

Here’s my working definition of a logical relation:

A realizability semantic model,
built of predicates over syntax,
that reflects judgments and structures from semantics to syntax.

Point 1 is subtle; it implies that the logical relation is both a model, and a realizability semantics. Unfortunately, I still don’t know what a model is, so I’m going to have work with the following probably wrong oversimplification: the logical relation must take (syntactically) equal terms to (semantically) equal terms. Which notion of syntactic equality though? I’m not sure, and I’m going to ignore it for now.

Point 2 is actually more specific than necessary. We don’t need to predicates over syntax specifically, but really over some base model. It’s easier for me to think of this as “syntax”, though.

Point 3 is quite difficult to make precise without making a lot of this more precise in a mathematical framework. Jon Sterling gave me the following helpful definition:

A logical relation on a model M (viewed as a category) is then a model that is constructed in the following way:

Choose some functor R : M —> E where E is a sufficiently structured category (e.g. the category of sets, or something else!). The most basic example of a functor R is the “global sections functor” M —> Set, which sends every type in M to the set of closed elements of that type. This is exactly the usual “non-Kripke logical relations"; to get Kripke logical relations, you replace Set with a functor category (presheaf category) and choose a more interesting functor R.

Now define a new category G, as a category whose objects are pairs of an object A of M, together with a subobject of R(A). A morphism in G from (A,A’) to (B, B’) is given by a morphism (f : A -> B) that sends elements satisfying A’ to elements satisfying B’.

You have to show that the category G is actually a model of your language (e.g. show that it has function spaces, booleans, whatever). Doing so is the FTLR.

Note that there are some ways to generalize the situation above, but this is basically what logical relations are.

Point 3 is also is more specific than necessary; “syntax” can be generalized to be “base model”.

Despite the complexity, we can see point 3 in action in some examples below.

What are logical relations, historically?

tait1967 - Intensional interpretations of functionals of finite type I

Logical relations are sometimes called “Tait’s Method”, dating back to Tait, as far as I can tell.

In this paper, Tait proves that System T with bar induction is a conservative extension of intuitionistic analysis U_1, which is intuitionistic arithmetic plus quantification over functions plus the axiom (schema) of choice plus bar induction. This conservative extension property is the semantic property of interest. The proof starts with a proof that T without bar induction is a conservative extension of just intuitionistic arithmetic (no choice or bar induction).

To do this, Tait develops a type-indexed predicate over System T terms (without bar induction), providing a U_0 term for all T terms of each type. These predicates M_t, C_t, and E_t are (I think) what we refer to as a logical relation. In particular, the C_t relation provides the interpretation of T values of type t, M_t seems to deal with variables, and E_t seems to be a binary relation defining semantics (“weak α-definitional equality”) of terms.

Theorem V (page 205) uses this logical relation to prove that, for all semantics values at the same type, (weak α-) definitional equality is decidable: they either are or are not related in E_t. This seems to be the key point: the definitional equality is reflected out of the semantics of terms, so it can apply to the syntax of terms.

This use of logical relation seems to also be a realizability semantics, since it it assigns syntactic types to a collection of semantics terms, by induction over syntactic types, where the realizers are a subset of all possible semantics terms.

However, it seems to be more than a realizability semantics, too. What seems very important in this paper is that the semantics preserves structure, namely definitional equality. Perhaps implicitly though, other pieces are important. For example, T functions are interpreted as U functions, although it’s not clear to me that this is critical.

This is in contrast to Kleene’s (kleene1945) realizability, which did not seem concerned with structure, but only the existence of the realizers.

plotkin1973 - Lambda-definability and logical relations

Plotkin seems to be responsible for the name, and perhaps rediscovering logical relations in the context of programming languages.

Plotkin helpfully gives us a definition of “logical”, as well, and it seems quite importantly related to part 3 of my working definition. Plotkin defines a relation R as logical if it is:

a subset of any D_k from the carrier any D∞ model (this seems to correspond to “admissible relations” in modern logical relations parlance);
the relation is preserved by functions in D. That is, the relation holds on a function f in D_k iff for all arguments x, R(x) implies R (f x) (extended to the n-ary case for n-ary relations).

This suggests that it is important the logical relation is somehow interpreting syntactic structures as semantic structures, as in the case of Tait’s model interpreting syntactic functions as semantic functions. More generally, we likely want this property of all structures in the languages: syntactic pairs are interpreted as semantic pairs, etc. Jon’s category theoretical definition seems to generalize Plotkin’s definition nicely.

This denotational logical relation also shows us a logical relation that is not defined over syntax. Instead, it is a relation over some arbitrary non-trivial D∞ model. The author mentions that since they can interpret syntax in a D∞ model, they informally treat the logical relation as over syntax sometimes, which I suppose could be made formal easily enough.

How is “logical relations” used in PL?

ahmed2006 - Step-Indexed Syntactic Logical Relations for Recursive and Quantified Types

In this paper, Ahmed is concerned with syntactic logical relations for recursive and quantified types, in particular for reasoning about contextual equivalence. Likely due to Ahmed’s work, this kind of syntactic logical relation seems to be what most people mean or think when they say “logical relation”, although that may be changing.

The desired property of the logical relation then is that two related semantic terms should be contextually equivalent in the syntax. That is, the logical relation reflects (from semantics to syntax) equivalence.

Strangely (for a realizability model), this particular syntactic logical relation also reflects typing: semantic terms in the relation are also guaranteed to be well-typed in the syntax. In contrast, some uses of “logical relations” enable semantics terms to be syntactically ill-typed. Such logical relations might be better called realizability models, although they do something reflect some structure, so perhaps reflecting typability is not a critical point of reflecting structure.

Ahmed in the introduction points out an interesting distinction: that logical relations can be either denotational, or syntactic. Syntactic logical relations model syntax as sets of syntactic values such that some property holds over that syntax. By contrast, denotational logical relations model syntax as some denotational object, Syntactic logical relations are useful for proving properties of the operational semantics directly. Denotational models instead model syntax as denotational objects, such as, e.g., sets of set-theoretic functions over elements of a D∞ model in plotkin1973. This is useful for easily proving meta-theoretical properties by reflecting properties of the denotation into the syntax, but not necessarily about the operational semantics directly.

For example, Tait uses a “denotational logical relation” into intuitionistic analysis to prove that definitional equality of System T is decidable—the definition of definitional equality, in the model, and its proof of decidability, are reflected back into the syntax; this requires no operational semantics at all. Plotkin uses a denotational logical relation, into domain theory, to show that certain λ-calculus constructs are or are not definable—existence of a term in the logical relation is reflected into the syntax as a definable expression. Neither of these is a syntactic logical relation; the semantic values never mention syntactic values directly.

Ahmed uses a “syntactic logical relation” to prove something about the operational semantics, namely, to prove contextual equivalence (an operational notion), indirectly. Direct proofs of contextual equivalence are difficult. So instead, a semantic proof of equivalence is reflected back into the syntax as ta proof of contextual equivalence. This requires structuring the logical relation into a denotation of sets of syntactic terms that evaluate in the operational semantics, so that being in the relation tells us something about evaluation in the operational semantics, which tells us something about contextual equivalence.

abel2018 - Decidability of conversion for type theory in type theory

Abel et al. define a syntactic logical relation for typed, reducible (and equivalent) terms, to prove decidability of conversion for type theory. Here, the use of syntactic logical relation is important for proving a particular conversion algorithm over the syntax is decidable.

The interesting feature of this logical relation is the generalization from a model inductively defined over types, to inductively defined over judgments. This demonstrates a weakness in my working definition of logical relation and realizability, since I defined “realizability” in terms of models inductively defined over types.

timany2022 - A Logical Approach to Type Soundness

This paper is interesting because it uses a syntactic logical relation that intentionally does not reflect typing, as many syntactic logical relations cdo. Semantically valid terms are not necessarily syntactically valid. In other ways, it looks very much like a logical relation: syntactic pairs are semantic pairs, sums sums, functions functions, etc.

The key property this paper is interested in is type safety: all well-typed terms are well-defined in the operational semantics, i.e., they evaluate to values or well-defined errors or fail to terminate, but importantly, do not get stuck. “in the operational semantics” is important to understanding why this is a syntactic logical relation; it must model terms as sets of syntactic values to reason about the operational semantics given in the paper.

However, one could imagine proving a slightly different form of type safety with a denotational logical relation. Giving a logical relation into an arbitrary model with a well-defined notion of evaluation would be implicitly a proof of type safety: that there exists a model that is type safe. The ability to reflect from semantics to syntax provides a mechanism for constructing that evaluation over syntax. So while the denotational logical relation provides no direct proof about the operational semantics, it may provide a mechanism for a type-safe-by-construction operational semantics. (This reflecting evaluation out of the semantics seems very related to the idea of normalization-by-evaluation, but I’m not clear on this.)

What is realizability?

2022-10-05T21:54:39Z

I recently decided to confront the fact that I didn’t know what “realizability” meant. I see it in programming languages papers from time to time, and could see little rhyme or reason to how it was used. Any time I tried to look it up, I got some nonsense about constructive mathematics and Heyting arithmetic, which I also knew nothing about, and gave up.

This blog post is basically a copy of my personal notebook on the subject, which is NOT AUTHORITATIVE, but maybe it will help you.

My best understanding of realizability right now, in programming languages (PL) terms, is:

A technique for assigning each syntactic type to a collection of semantic terms;
By induction over syntactic types;
Where the semantic terms that are realizers—i.e., included in the collection related to some syntactic type—are a sub-collection of all possible terms in the semantic domain. That is, there are valid semantic terms not associated with any syntactic type.

I use the word “collection” rather than “set” to avoid invoking set theory.

Graphically, we can represent this as follows:

The point of the technique is that clause 2 gives us a proof technique by induction, and clause 3 means we can relate the collection of terms (or proofs) to some other well-known collection. This yields a proof technique for metatheoretic properties about the collection, such as that there are only terminating terms in the collection of realizers, or there are only recursive functions and therefore some classical things remain unprovable.

I’m not entirely sure that clause 2, induction, is necessary, and I can’t find anything explicit about clause 3, but they seem to be true historically and in many uses of the term.

Okay so how did I get to this understanding?

What is realizability, historically?

kleene1945 - On the Interpretation of Intuitionistic Number Theory

Realizability seems to come from Kleene’s paper “On the Interpretation of Intuitionistic Number Theory”. I say “seems to” as Kleene attributes the “detailed investigation of the notion of realizability” to David Nelson, attributes several of the results in the paper to Nelson, and claims that the main results of the paper are joint work with Nelson. But the paper only has Kleene’s name on it, and Kleene claims in the first footnote that they introduced the idea of realizability to Nelson in a seminar. So anyway, realizability seems to come from Kleene, and this is the canonical paper cited for the technique.

In this paper, realizability is quite specific. It’s a technique that takes an intuitionistic first-order logic formula about Peano arithmetic (Heyting arithmetic) and constructs a natural number from it, representing the (constructive) proof of that formula. Only provable formulas are realized. The point of this exercise is to prove various metatheorems about the realized language: is it consistent, and what are provable/unprovable in the intuitionistic formulae.

Intuitively, something is unprovable if there exists a formula, but there does not exist a realization of it. This can be shown by connecting the formula to the set of realizers (in this case, natural numbers), but showing that there cannot exist a related natural number (or, more often, function on natural numbers represented by its Gödel number) with the properties required of the realizability interpretation. The simplest example: since “false” is unprovable (it has no realization, by construction), the intuitionistic logic is consistent.

This also lets us prove something about the class of all provable statements. Since we have a method for constructing something from any provable (or true) statement, we can say something about the set of all provable statement in relation to the realizers. Kleene mentions one consequence is that the intuitionistic calculus cannot prove the existance of any function other than a general recursive function, since those are the only functions constructed in the realizability interpretation. This tells us, for example, that the intuitionistic calculus is different from classical set theory, which contains other functions.

An important detail in this paper that clarifies the distinction between the intuitionistic and the classical happens in Clause 6, on page 113. This is the definition of the realizability interpretation for existential quantification ∃x.A(x). This has a realization if, for some x, A(x) has a realization. It’s important to notice that this second “for some x” quantification happens in the metalanguage, namely, classical set theory, and therefore could be choosen by Choice. Kleene discusses this on page 118, where he uses the word “classically” as a modifier on various quantifiers to remind us that, when working with the quantification and realizers directly, we are working in a classical system in which intuitionistic proofs also exist.

What seems to be going on here is that the realizers are something like the intuitionistic subset of classical set theory. I think that statement isn’t exactly true; Kleene uses classical choice when working with the realizers to show there are unprovable theorems. For example, a realizer parameterized over (classically) all variables may not correspond to an intuitionistic formula. So it’s not that the realizers are only intuitionistic, I think. But any particular realizer is (must be)? The important point may be the realizers are a subset of the whole system, and thus we can prove interesting metatheorems that rely on distinguishing the realizers (and therefore, the formulae they realizer) from all the things in the full system.

amadio1998 - Domains and Lambda-Calculi, Chapter 15

Chapter 15 of Amadio and Currien’s book “Domains and Lambda-Calculus” introduces realizability in its historical context. The introduction formalizes Kleene’s work as an example, and discusses its use.

They emphasize two things, which seem to confirm some of my understanding:

The realizability relation is defined inductively over formulas, and relates formulas to proofs.
The use lets us reason about all proofs in the system.

This is the best definition of realizability I’ve seen, and applies both to Kleene’s original, but also to uses in PL.

The authors point out that Kleene’s original goal was to prove consistency. They then confirm my above intuitions, that the realizability interpretation also lets us prove metatheorems about what is provable/unprovable in the realized system. However, they note that one application of this is to find unprovable true statements, which can be consistently axiomatized back into the original system. There are proofs in the set of realizers, i.e., true statements, that are never constructed by the realizability interpretation. These could be added back to the original system to enrich it.

This latter use seems to confirm one feature of realizability that isn’t explicit stated anywhere, but seems to be true of all realizability interpretations I’ve seen: that the realizers are a strict subsystem of some larger formal system.

How is “realizability” used in PL?

In programming languages, we’re not often concerned with intuitionistic vs classical logic; we’re working constructively by default. In fact, many of the uses of “realizability” in PL don’t seem to be related to logic at all, but to modeling well-typed programs. And while, sure, these are related by Curry-Howard, the difference seems important to me. So what does realizability mean in this context?

In most uses in PL, the important feature seems to be clause 3 in my definition above: the collection of all values is larger than the set of realizers. In PL, this suggests that we’re ascribing types to “untyped” terms, and the realizers are those that are semantically well typed, but not necessarily syntactically well typed. The full collection contains also untyped terms, and we can therefore prove through realizability that the type system rules out ill-typed terms.

There do seem to be some examples in PL that are explicitly relating classical and intuitionistic ideas, namely those trying to import constructive interpretations of classic logic. I’m not really interested in those, and I think the connection to realizability is much more clear in those applications, so I’ll ignore that area.

Let’s look at some examples.

benton2010 - Realizability and Compositional Compiler Correctness for a Polymorphic Language

In “Realizability and Compositional Compiler Correctness for a Polymorphic Language”, Benton and Hur define a “realizability” interpretation of System F types realized by terms in low-level language, for proving some compiler correctness properties. The terms realize the types, and this lets us talk about which low-level programs are valid to link with, without restricting the set of linkable programs to only those generated by the compiler.

This has lost all connection to intuitionistic vs classical logic, but I suppose it keeps the key features of the technique: types (formula) of one language are realized by terms in another, and there is some concern that the realizers should be a subset of all terms. Not all low-level programs should be valid, but some set of them should be.

nakano2000 - A Modality for Recursion

“A Modality for Recursion” was actually the start of my realizability journey. This paper starts by defining a collection of models (β-models) of the untyped λ-calculus. It then defines the class of realizability models, in terms of β-models, for an extrinsically typed λ-calculus with equi-recursive types. A realizability model is parameterized by a β-model, and is a relation inductively defined over types to their realizers, which are values drawn from the β-model.

So why is this realizability? Well, I don’t see anything to do with intuitionistic vs classical. But, the set of all values is larger than the set of realizers, which seems to be important to all uses of “realizability”, and important for this result in particular. In this paper, this is used to show that the dot modality rules out some valid β-model terms, namely those that would correspond to non-terminating λ terms.

Later in the paper, they define a “realizability interpretation”. This seems to be distinct from the collection of all realizability models in that they pick a particular set of realizers? So, it ought to be a realizability model, I guess? But they don’t say so explicitly. The interpretation is still quite heavily parameterized, but it does seem to fix or restrict the set of realizers. Anyway, this interpretation includes all the features of my definition above: it’s inductively defined over types, relating types to (a semantic model of) untyped λ terms, for the purposes of proving something about the collection of realizers as they related to the collection of all untyped λ terms.

The A Means A

2022-06-30T17:25:55Z

I have argued about the definition of “ANF” many times. I have looked at the history and origins, and studied the translation, and spoken to the authors. And yet people insist I’m “quacking” because I insist that “ANF” means “A-normal form”, where the “A” only means “A”.

Here, I write down the best version of my perspective so far, so I can just point people to it.

I want to answer three question: what does the A mean, why does the A matter, and where does the A come from.

What does the A mean?

The “A” in “A-normal form” refers to a particular formal object, named “A” (not “administrative”), with respect to which there is a normal form with certain useful properties. This form is “A normal”—none of the A reductions apply to terms in this form—hence, A-normal form.

While it’s true that the history of ANF is concerned with “administrative reductions” in CPS, this is an informal concept, modeled by the formal object “A”.

In truth, “A” is several formal objects, defined somewhat differently in at least 3 different papers. Only one of these is arguably called “administrative”, but is about CPS, and not what we now call ANF.

“A” appears in “The Essence of Compiling with Continuations”, page 5. Under the discussion of the CPS, optimization, and un-CPS diagram, the authors observe that this diagram begs for a completion, some direct process, “A”, that simply normalizes a term within the same language. This diagram is reproduced below: $\begin{array}{ccc} e & \overset{CPS}{\to} & e' \\ \overset{A}{\downarrow} && \overset{\beta}{\downarrow} \\ e_A & \overset{unCPS}{\leftarrow} & e_O \end{array}$ They ask, what are some set of reductions, call this set A, such that normalizing with respect to A would produce a normal form, A-normal form, that characterizes the use of CPS in practice.

The same pattern appears in "Reasoning about Programs in Continuation-Passing Style, page 1:

Thus, we refine this question as follows: Is there a set of axioms, A, that extend the call-by-value λ-calculus such that: …

The authors go on to define the set A, never naming it by some administrative reductions, but deriving A instead from the inverse CPS translation.

We could argue that Sabry’s thesis, Chapter 3, Section 1, “Administrative Source Reduction: The A-Reductions”, names the A-reductions “administrative”. He goes on to analyse those reductions considered to be the administrative ones, defining βlift and βflat in terms of CPS. He then defines in Definition 3.1, the administrative source reduction (A-reductions). However, these refer to reductions over CPS terms, and are distinct from the reductions considered for ANF. While they are the origin of ANF, they do not produce terms in what we now call ANF. A term in “administrative normal form” with respect to that set of reductions would actually be in CPS. That’s not what we mean when we say ANF; we mean normal with respect to the set A defined in “The Essence of Compiling with Continuations”.

Maintaining this distinction between the formal object A and the informal notion of administrative reductions is important for two reasons. First, it helps remind us that ANF is a form ultimately about normalizing a specific set of reductions, not the output of a particular translation, which is important in practice. Implementations often relax ANF until code generation, by omitting some of the A reductions, typically, A2 in “The Essence of Compiling with Continuations”—even that paper relaxes A2 in their implementation in the appendix, because A2 leads to exponential code duplication or requires object-language continuations (“join points”). It’s hard to even formally discuss this relaxation if we do not have the set of normalized reductions in mind. Second, the idea that ANF is free of “administrative” redexes is absurd, since the idea of the administrative redex is an informal concept: a reduction that isn’t really necessary but merely an artifact of the translation. It is easy to introduce such administrative redexes in ANF; e.g., let x = y in x contains an extra unnecessary ζ redex, but it is in ANF. It is, however, free of A reductions.

Why does the A matter?

I don’t actually care what the “A” means, or what the authors intended it to mean. I care that we think about ANF as a normal form, normal with respect to a specific set of reductions.

This most recent rant was triggered by a conversation with a reviewer, who, after observing that the “A” actually stood for “administrative”, asked whether our ANF translation could be decomposed into two translations, one that did everything but normalize the ifs (handling if is annoying in ANF, as it either requires being clever or causes code duplication), and then separately handle if.

The answer is completely obvious… if you think about ANF in terms of a normal form with respect to a set of reductions, and not as merely the output of some translation process, nor “CPS but like without adminsitrative redexes”. Since ANF is a normal form with respect to “A”, we can easily decompose it into multiple normal forms, thus deriving several decomposed translations: remove the A reduction that normalizes if, and you get another normal form. Remove the rules that normalize if and nested let, and you get monadic form.

But all of this is much more complicated to explain if you think of ANF as a particular translation or particular syntactic form, and not a normal form with respect to the set A. And this seems to be very likely how you will think of you think A means administrative.

Where does the A come from??

Incidentally, I spoke with Amr after he read this blog post. The origin of the “A” comes from a result by Curry, who proves some theorems about any combinatory logic extended by a set A of ground equations: https://staff.fnwi.uva.nl/p.h.rodenburg/Varia/RelCLlam.pdf

This led Matthias to ask Amr to create a set A, such that bla bla bla.

Amr admits he may have intended a pun between A and administrative, but doesn’t remember.

What is peer reviewing?

2022-04-26T23:56:04Z

I’ve been doing, and experiencing, a lot of peer reviewing lately. I’ve been ranting about it on Twitter as I get reviews that don’t help me and, in many ways, hurt me, and lauding reviews that provide useful constructive feedback (even if I disagree with it or the decisions). I’ve been trying to figure out how to provide good reviews and avoid negative aspects of reviewing.

I need to get the thoughts out of my head. These are not declarations of what peer reviewing is or should be, but my attempt to work through those questions.

The Scientific Aspect of Reviewing

scientific | adjective: based on or characterized by the methods and principles of science.

Science | noun: a systematically organized body of knowledge.

Mostly we think about the scientific aspect of peer reviewing. If we’ve written a scientific piece, we have tried to answer a question, scientifically. We have posed a research question, hypothesized some answer, tried to evaluate the answer objectively, and present all of that clearly to advance the state of knowledge.

The point of peer review, then, is to check that we have indeed done some science. To check our question is reasonable, that our evaluation is not flawed, and that we have indeed advanced the state of knowledge (and communicated that knowledge clearly, at least relatively to some community).

This interpretation of peer review is intuitive, but is it real?

Here are the evaluation criteria from the call for papers for some major programming languages conferences I’ve been involved in. They’re all SIGPLAN conferences, so they are all fairly similar:

POPL 2022:

The Review Committee will evaluate the technical contribution of each submission as well as its accessibility to both experts and the general POPL audience. All papers will be judged on significance, originality, relevance, correctness, and clarity. Each paper must explain its scientific contribution in both general and technical terms, identifying what has been accomplished, explaining why it is significant, and comparing it with previous work.
PLDI 2022:

Reviewers will evaluate each contribution for its accuracy, significance, originality, and clarity. Submissions should be organized to communicate clearly to a broad programming-language audience as well as to experts on the paper’s topics. Papers should identify what has been accomplished and how it relates to previous work.
ICFP 2022:

Submissions will be evaluated according to their relevance, correctness, significance, originality, and clarity. Each submission should explain its contributions in both general and technical terms, clearly identifying what has been accomplished, explaining why it is significant, and comparing it with previous work. The technical content should be accessible to a broad audience.

Correctness

Accuracy or correctness appear in all of these. This makes sense; one major aspect of reviewing is to make sure the scientific work hasn’t made any mistakes. Of course, the reviewing itself might make mistakes. The goal isn’t to do a perfect job, but to try. And if we try, keep doing science, keep checking each others’ work, keep asking questions and even re-asking questions, eventually we’ll approach something resembling the truth. I like this criterion and think it’s probably the most important one.

Originality

Originality appears in all three as well. This one is a little odd. If the point of science is to advance the state of knowledge, it makes sense that scientific work should be original, i.e., new, novel, producing something that was previously unknown. But, it also seems a little at odds with the previous criteria. One great way for checking correctness is reproducing or replicating prior results, double-checking existing work and making sure we get the same answer. The emphasis on originality or novelty seems to be at odds this goal. We could interpret it a little more generously, by considering replication and reproduction as new in the sense that they are new evaluations of an old question, so it is still original work. That’s okay with me. But it does require a little care in the interpretation of the reviewing criteria....

Clarity

Clarity appears in all three as well. This is interesting as it maybe seems irrelevant to the idea of a rigorous and objective evaluation of a research question.

It may even seem.. not true of essentially any research paper. If you’ve ever tried to read one of my research papers, and you’re not literally in my field, and even for many people who are, you’d probably say they’re hard to read. Maybe not very clear, maybe lots of obtuse notation, vocabulary, methodologies, ideas, ... But the reviews have judged them to be clear enough for accepting.

So clarity seems to be quite ambiguous, quite subjective. At the very least it’s relative. Let’s say it’s relative to the community reviewing and calling for papers.

But why is it important? As long as the work is correct! Well, the point of science isn’t merely evaluating a research question, but advancing the state of knowledge. We can’t very well advance knowledge if no one, or a vast majority of a community of interest, can understand what you’ve done.

So it’s important for the work to be not only well evaluated, but clearly (relatively) well evaluated, and well communicated. This way it advance knowledge for many people, and not just the authors.

So far, so scientific. All these criteria directly relate to the original scientific goal.

Relevance

POPL and ICFP include "relevance". I take this to mean relevance to the field. For example, submitting a machine learning paper to POPL is probably not relevance, even if it is of the highest quality science.

This is related to the advancement of knowledge, since only reviewers who are familiar with the area, methodologies, state of knowledge etc are going to be capable of assessing the other criteria.

This leads to some problems at the borders between areas. What if it is relevant, but heavily in another field, and as a result, there isn’t very much expertise to review the paper? But one hears stories about these papers having a hard time finding a place in any venue. I guess I should take a charitable view of such papers, but then, that may require sacrificing review quality.

Or what about work that is revolutionary, in the sense that it is creating completely new models, methodologies? Such work would argue that it is relevant, but it would be difficult for the reviewers to judge it so. It might seem completely irrelevant— no one has used it yet, perfectly good techniques (which will, by nature, be more clear and easier to judge correctness of) will apply in many of the examples.

I don’t think there’s a good solution here. Revolutions are hard. Founding new fields, subfields, etc, is hard.

Significance

The last criteria for PLDI is significance, which also appears for POPL and ICFP.

I have no idea what this means. The dictionary provides:

significance | noun [mass noun]: the quality of being worthy of attention; importance

This seems like a very suspect criterion. How do we know, apriori, which ideas are important, are worthy of attention? This requires us to understand, in advance, what impact the work could have? Are we psychic?

I suppose, in some cases, it might be somewhat clear that work is not important. Sam TH, who I love for pushing against my (and others’) simple takes to advocate for complexity, recommended this paper: https://doi.org/10.1007/s11245-006-0005-2. It introduces a strawman in the context of philosphy, of people inventing and then studying a game called "chmess". One could investigate all of these questions scientifically, but they would all be insignificant, because the made up game affects no one in the world.

In this case, it’s clear that the questions are not significant, but it introduces the key problem with significance— the answer to the question is a relative one (like clarity). Maybe a question is not significant to one field, but is to another, because that field knows how to apply that kind of question to some other problem that is significant, and so on.

I don’t know how to detect significance then.

The chmess paper suggests asking whether you could explain the research question to an outside audience and convince them of its importance. This seems like a low bar, however. Almost every paper I read begins with a motivation section, which does exactly that: here’s some interesting the theoretical problem and how it could, in principle, be used to solve or address or make progress on some real world problem. Perhaps that just means we’re all good at (convincing ourselves that we’re) working on significant problems.

Supposing I could accurately judge significance by this definition, how does it relate to the original goal: ensuring that a scientific work advances knowledge? Well, if the result is unimportant, one might argue (if one were a pragmatist, in the sense of philosophy sense) that unimportant knowledge is not useful, and therefore, not knowledge. But it still seems quite difficult to judge the utility of knowledge.

It reminds me of number theory, which was considered quite without practical application. Until we invented cryptography.

While trying to understand significance on Twitter, others proposed an alternative definition: that the research question is large enough (to be important?). This seems like an even more suspect definition. For one, what is large enough? Completely subjective. The prior definition of significance is relative to a community, but this is relative to an individual’s expectations. For two, it has perverse results in practice. In an effort to ensure a result is big enough, an author is incentivized to make a result larger (or appear larger), perhaps artificially. The knowledge is withheld from the community and society until it reaches some arbitrary bar. That bar is unavoidably pushed higher, as each year new scientists joining the field compare to the current bar, new authors strive to beat that bar (from the prior incentive) and the baseline is reset.

And for what? What part of our original scientific goal is achieved by ensuring a result is "large enough"? It doesn’t help advance knowledge to withhold a result for being small, if it is correct, and clear, and original.

I see none.

At best, this definition seems to be a response to something unscientific... see the next section.

The Social Aspect of Reviewing

social | adjective: relating to society or its organization.

There’s another aspect to peer review, which I will call a social aspect. Science is, of course, a social process, so these are not unrelated. But I want to separate them.

The previous aspects to reviewing dealt explicity with scientific mission— the advancement of knowledge. But it is us humans that are attempting to do that, in a large context with many systems and pressures that we interact with and within.

For example, to advnace knowledge, we must keep up with the state of knowledge. One purpose of reviewing I’ve heard advocate is to act in defense of human attention, so that we can focus on the advancement that are relevant and important. A peer reviewers job is, in part, to reject "noise" from the process, so it is even possible to know what the state of knowledge is.

This seems related to the second definition of "significance" in the prior section. If a result is "too small", perhaps it becomes a distraction. New authors spend too much time learning it and pushing it to a state of being useful, taking more time than it would have taken the original authors, thus wasting time. Or perhaps realizing the result does not scale to enough settings as originally assumed, but only after wasting a great deal of time, attention, and effort.

But this requires us to make the call on what others will want to and/or should pay attention to. That’s a hard call to make; I’m not sure whether I can make or should make it.

It could also be a reaction to gaming the system that employs scientists and funds research. A bunch of small publication is still a large publication count, which can (through the magic of relying on metrics) convince people to provide funding or jobs to one person, but not another. This, in turn, can prevent others from doing useful work to advance knowledge; they could use made use of those jobs or funding.

I’m not going to rant here about the Tyranny of Metrics, for which there’s a whole book I heartily recommend.

Metrics exist, and we have to work with them, so it’s worth paying attention to them. We don’t want people gaming our system, even if our system is stupid. We should try to change it, but that’s not always possible.

Still I think it’s worth being careful of doing more harm than good when trying to prevent people from gaming the system. I’m not sure how easy it is to recognize "salami slicing" a work in to lots of little pieces of work, apriori. Science is incremental.

On the other side, does a venue have a responsibility to game unjust systems? Some systems exist evaluate venues based on inclusion in certain indexes, acceptance reates, classificaiton as journal or conference, etc, and peoples jobs’ rely on these things. Should a venue do things like turn itself into a journal (as PACMPL has done) to game an unjust evaluation system? What about targeting a certain acceptance ratio, which POPL explicitly does?

https://www.sigplan.org/Conferences/POPL/Principles/

This acceptance ratio is important for maintaing a high ranking in the CORE ranking system.

See ICFP’s downgrade from A* to A, which cites the acceptance ratio of 33% being too high: http://portal.core.edu.au/core/media/justification/CORE2021/4612ICFP.pdf

Do reviewers have a responsibility to maintain a high ranking, to the benefit of the venue and the people submitting to their venue, by rejecting a certain proportion of papers? Without doing so, we risk the scientific endeavour— the venue and its community may no longer be able to effectively advance knowledge.

I dunno man I’m not a moral philosopher.

The Syllabus

2022-04-15T19:55:39Z

I wrote the follow at some semester prior to today, about no one in particular, while updating syllabus.

Please read the syllabus. The answer to your question is on the syllabus. If you review the syllabus you will find answers to this question and others. As I have said in many other places many times before, you can find the answer to your questions regarding course policies answered on the syllabus. The syllabus is where we describe the answer to course policies, grading, instructions on what to do in various circumstances, and answer frequently asked questions. Like this one. The syllabus is a useful tool for discovering the answer to questions about the course. Questions we have answered on the syllabus should not be asked because we have answered them on the syllabus. Piazza, email, canvas messages, office hours, after class ambushing of the professor and TAs, and lab are not the places to ask questions that are answered on the syllabus. so many times but it is not getting to me. Even if you think your situation is unusual, we have almost certainly run into it before and answered it on the syllabus. You will never make me crack. We have been teaching for decades, and get guidance from Above, and we answer the questions and put information on the syllabus. Every time you ask a question that is answered on the syllabus, the unholy child weeps the blood of virgins, and Russian hackers subtract points from your final grade. Asking questions that are answered on the syllabus summons tainted souls into the realm of the living. Questions answered on the syllabus go together like love, marriage, and ritual infanticide. The syllabus cannot hold it is too late. The force of questions already answered on the syllabus in the same conceptual space will destroy your mind like so much watery putty. If you ask questions answered on the syllabus you are giving in to Them and their blasphemous ways which doom us all to inhuman toil for the One whose Name cannot be expressed in the Basic Multilingual Plane, he comes. The endless questions already answered on the syllabus will liquify the nerves of the sentient whilst you observe, your psyche withering in the onslaught of horror. ((Questions-answered-on-Syll̿̔̉us are the cancer that is killing the Piazza it is too late it is too late we cannot be saved the transgression of a chi͡ld ensures syllabus will consume all living tissue (except for questions already answered there which it cannot, as they have been answered on the syllabus) dear lord help us how can anyone survive this scourge asking questions answered on the syllabus has doomed humanity to ( an eternity of dread torture and time wasted that could have spent teaching as questions already answered on the syllabus establishes a breach between this world and the dread realm of c͒ͪo͛ͫrrupt entities (like Grading Scheme, but more corrupt) a mere glimpse of the world of questions answered on the syllabus will instantly transport a students’ consciousness into a world of ceaseless screaming, he comes, the pestilent slithy syllabus-infection will devour your questions, application and existence for all time like The Design Recipe only worse he comes he comes do not fight he com̡e̶s, ̕h̵is un̨ho͞ly radiańcé destro҉ying all enli̍̈́̂̈́ghtenment, syllabus questions lea͠ki̧n͘g fr̶ǫm ̡yo͟ur eye͢s̸ ̛l̕ik͏e liquid pain, the song of re̸gular waitlist will extinguish the voices of mortal man from the sphere I can see it can you see ̲͚̖͔̙î̩́t̲͎̩̱͔́̋̀ it is beautiful the final snuffing of the lies of Man ALL IS LOŚ͖̩͇̗̪̏̈́T ALL IS LOST the pon̷y he comes he c̶̮omes he comes the ichor permeates all MY FACE MY FACE ᵒh god no NO NOO̼OO NΘ stop the an*̶͑̾̾̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e not rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ (((

)

What is the Point of a Final Exam

2022-01-07T04:11:16Z

It’s weird, but professors are almost never taught how to teach, how to design a course, how to assess students, how to design an exam or what the point of an exam even is. We’re just expected to pick this up on our own, I guess. It’s not as nonsensical as it sounds, since we are trained how to do research and communicate that research, and there is some overlap. But still.

If my experience is any indication, we just pick up an existing course structure and more or less follow that. Oh, the last person who taught this course used this material, and this syllabus, and these exams, so I’ll just do more or less that for now. If we’re ambitious and/or want to shoot our tenure track in the foot, we might try to innovate soon after. Otherwise, we might innovate later.

Anyway I’m not good at doing things just because that’s how they’ve always been done; I need first principles. After designing, administering, grading, invigilating several of them, I was struggling to figure out what the point of a final exam is. So I had a bunch of conversations on Twitter and now I’m collecting my thoughts on what the point of a final exam is and how it might, or might not, serve a purpose in my course.

Warning: I am not an education researcher, and this is not research, and it’s got a lot of stream-of-consciousness.

What is the point of a final?

Obviously, a final exam is a part of a course, so what is the point of a course? Presumably, to teach students. If only we lived in such a utopia.

I’d start by assuming the purpose of a university course is:

to teach students; and
to credential students.

I consider credentialing to be less important than teaching, but acknowledge that there is a use in credentialing, so I should do it and do it well.

Credentialing

A final exam can very easily accomplish credentialing: you put everything the students should have learned in the course on the final exam, and you measure how well they do, giving them a grade (credential).

But is a final exam the best way to accomplish credentialing? What goes into credentialing? I think the main principle is that we’d like the credential to be close to ground truth. A high mark should indicate high mastery, and vice versa. How do we know if an exam is measuring mastery well?

After a bunch of discussions, I think there are a few threats that could cause measurement error in any assessment:

cheating;
student performance error (e.g., stress, anxiety, illness);
design error (e.g., confusing instructions or questions, too little time to complete assessment, questions that don’t measure learning objectives);
attribution error (e.g., measuring a group assessment that doesn’t reflect an individual’s mastery);
sampling error (e.g., a grade on an assessment early in the semester does not actually indicate lack of mastery if the student learns later but has no opportunity to demonstrate improvement later; or, an assessment in one context may not be representative in another context, so another assessment in another context is useful).

I’m going to mostly ignore (3). It’s a research area of its own (e.g., Item Response Theory, Concept Inventory), for one, and I’m no expert in that area. But design errors in an exam are as likely (or unlikely) as in any other assessment, or perhaps less because course staff can spend more time polishing a single exam than multiple assessments.

Exams, and final exams, reduce (1) and (4) very well. Exams are typically very structured making cheating difficult compared to homework’s, projects, etc, and to properly measure a single individual. Take home or online exam don’t have this benefit, so must be designed to be more difficult, complex, or required invasive invigilation technology.

Final exams in particular mitigate (5), since they occur at the very end of the semester. Exams in general also provide an opportunity to revisit assessing several learning objectives in another, broader, perhaps integrated context.

Exams seem very susceptible to (2). Universities typically have policies in place to support accessibility and deal with illness, to help address some of (2). But many students find a single, highly weighted exam very stressful, and they may fail to perform on this one, very important assessment, despite having mastered material and demonstrated that mastery throughout the semester. On the other hand, some students find exams far less stressful, and far less time consuming, than many intermediate assessments throughout a semester. Students who find planning or long term focus challenging, or have chronic accessibility issues, might find an exam much easier than a project, for example.

So exams are pretty good for credentialing, in that they avoid sources of measurement error, as long as they’re well designed. But they’re very stressful. So if you need a credentialing assessment that reduce (1), (4), and (5), a final exam seems like a good choice, if you can find a way to reduce stress and anxiety. Maybe novel exam structures could do that; however, novel structures might cause more of (1) and (4), and be more complex to design and thus making (3) even trickier.

Some learning objectives might also be difficult to assess with other forms of assessment than exams. For example, in my project-based compilers course, some of my learning objectives don’t fit into the course-long “write a compiler” project. They are latent skills I’d like students to learn from building a compiler, but I can’t figure out how to explicitly measure that learning in the project. This could be a failure of my ability, but it suggests the structure of exams is simpler to employ.

Teaching

I consider credentialing less important than teaching, yet I covered it first because exams seem most related to credentialing.

Some of my colleagues pointed out that you can make the exam part of the teaching process. This is very counter to how I’ve seen exams.

An exam, particularly a final exam, can be an opportunity for students to reflect on all the material students have seen. They could be presented a new challenge or material that they should be able to learn from if they’ve mastered course material. They could also be given a chance for explicit transfer from one context to another, related to point (5) in the previous section about measuring skills in different contexts.

Given the stress caused by the assessment function of an exam, particularly a final exam, I’m not sure how useful this is compared to other methods of teaching. If you want to give students the opportunity to see something in a new context, why not multiple (smaller) projects, or multiple homeworks? Novel exam structures, or lower stakes can mitigate this, as discussed above, but come with design and measurement challenges. And at what point is a less stressful, novel structured “exam” really an exam, and not something else?

Another way the exam, and final exam, serve teaching is actually taking advantage of the high stakes of the exam. By giving students adequate time and material to review for the exam, they’re able to review all the material in the course in preparation for a high-stakes exam. This review is useful for learning, even if the exam itself is not.

This seems at odds with research on teaching as it relates to grades, though. I’d suggest that it’s the grade, not the exam, that causes students to review. Teaching More by Grading Less (or Differently) notes that grades motivate students perversely, lowering interest in learning but causing anxiety and interest in avoiding a bad grade. This suggests exams as motivator for reviewing material are not a good way to motivate learning.

Final exams in particular seem to next to useless for teaching since students also don’t get very much qualitative feedback. Even if they do, they have very little reason, method, or even time to review that feedback and apply it. So even if the exam is cleverly structured to enable learning in a new context, or lets students integrate disparate lessons, will they know whether they successfully integrated lessons or transfered skills to a new context? Even if the instructor spent the time to provide qualitative feedback (which, in my experience, is not what happens with a final exam), the incentives are against the student using it.

Is a final exam right for my course?

For Teaching

I’m not really convinced by the value of the exam for teaching, in itself. But, I could see two reasons:

your course doesn’t provide another final opportunity for students to integrate disparate lessons into a whole;
your course must introduce a variety of disparate skills that students have little opportunity to revisit, but that a final exam encourages them to review.

This requires mitigating stress causing performance problems on the exam, or destroying the intrinsic motivations for learning. I don’t think stress is a problem in itself, if it can be managed. I’m more concerned about replacing intrinsic motivation with extrinsic.

For Credentialing

Final exams seem very useful for courses with group work, or where detecting cheating is difficult. I’d hate to optimize for trying to catch cheating, but I think detecting it is necessary, particularly since credentialing is part of the goal of a university course. In-person exams are a much less invasive and less time-consuming mode of invigilating.

Similarly, with heavy group work, it is difficult to assess individual mastery. Alternatives include oral assessment or review of the group work. This is resource intensive, but also enables more qualitative feedback so could have additional value for teaching.

An Example

In my case, I’m trying to decide whether exams are right for my compilers course. The course is largely structured around a single semester-long project where the students implement a compiler (a big software project). The first two weeks milestones are completed individually, but the rest of the semester is completed in a group. Students who complete and contribute to the project can be reasonably assumed to have mastered a large portion of the material and most of the learning objectives. The project is not graded until the end of the semester, and students are provided considerable intermediate feedback, so have a lot of opportunity to learn from feedback, improve, and demonstrate improvement.

Cheating is a minor concern, but shirking is a bigger concern. A student could easily coast on their group.

I think an exam has some minor effect for discouraging shirking, and at least lets us catch it. I think performing oral code reviews and some form of survey of group contribution would be a more direct way of identifying this and providing more qualitative feedback. However, I don’t really have the resources for more than about 1 code review. We could perhaps implement it stochasticly.

An exam provides some opportunity for learning, in itself, but at a high cost. I can use it to have students apply the same material in a new context. However, in such a time constrained and high-stakes setting that I’m not convinced it is particularly effect or is worth the cost. The project itself is cumulative, and provides all the opportunity and material needed to review the course, so encouraging review isn’t necessary.

Compared to an exam, providing more, low-stakes opportunities to apply lessons in a new context seems like a better approach, since it mitigates (2) and provide more opportunity for feedback, learning from the feedback, and demonstrating learning. Making these exercises individual would address cheating and shirking, but might require them to be higher stakes, at least cumulatively. This would have a stronger effect for detecting shirking early. However, this would disadvantage students who find multiple assessments problematic. Such students are already at a disadvantage because of the time investment the project requires.

The exam does provide an opportunity to assess some learning objectives that I can’t figure out how to assess in the project. Perhaps I could integrate these into the aforementioned lower stakes exercises, or figure out how to integrate them into the project. Or perhaps these learning objectives aren’t really important, if I think the project is most important.

So it seems like the primary purpose the exam is serving in my course is credentialing. It helps us avoid attribution error, and to a lesser extent sampling error, at the cost of normal performance errors. It serves little, but some, learning function.

The learning function might be better served by multiple excesses outside the project. There’s a tension in using them: they need to be high stakes to serve the credentialing function and avoid sampling error, but low stakes to serve the learning functions. Sampling error is not a big problem in this course, so maybe I should favour low stakes exercises.

Attribution error is a problem, but stochastic code review and group work surveys might be a better way to address this than an exam.

If the exam is not in-person, the increased possibility of cheating negates some effect on attribution error, although code similarity detection tools make detecting cheating easier than detecting attribution error.

The exam is perhaps a simpler mechanism to employ than a combination of more small exercises and stochastic code review, but could result in more performance error.

Is an exam right? I don’t think it’s a wrong choice, but I think there are better choices particularly in favour of learning, and now I have a better idea of what the trade-offs are.

Enabling CORS for nginx WebDAV and CalDAV reverse-proxy

2021-05-13T20:08:26Z

The past few weeks I’ve been learning to develop and deploy a Progress Web App (PWA) that can communicate with my WebDAV and CalDAV servers. Unfortunately, while these are on the same domain, they are on different sub-domains, and this causes the requests to be considered cross-origin requests. For security reasons, cross-origin requests are blocked by most browsers by default unless the server explicitly allows cross-origin resource sharing (CORS). This is pretty easy to set up for static resources or scripts, if they use default headers and GET and POST methods. However, it’s particularly complicated for WebDAV, CalDAV, and other protocols that use additional headers or methods.

1 TLDR

2 CORS Requests and Responses

2.1 Preflight

2.1.1 Preflight Request

2.1.2 Preflight Response

2.2 Cross-Origin Requests

3 Configuring nginx

3.1 Configure Valid Cross-Origin Hosts

3.2 Configure CORS Headers

3.3 Process CORS Requests

4 Conclusion and Debugging

1 TLDR

Copy/paste/modify the below snippets into your nginx.conf in the correct places. You’ll need to add the map declarations to http context, and merge the two server declarations into your WebDAV and CalDAV server configuration blocks. You’ll also need to customize the safelist that sets $cors_origin_header, and possibly the $cors_expose_headers and $cors_allow_headers variables.

cors-nginx.conf

http {
  # .. in http context ..
  # Declare the safe cross-origin hosts
  map $http_origin $cors_origin_header {
    hostnames;
    default "https://example.com";
    "https://example.com" "$http_origin";
    "https://www.example.com" "$http_origin";
  }
  # Declare CORS exposed response headers
  map $host $std_response_headers {
    default "Content-Type, Content-Range, Content-Language, Date, Content-Length, Content-Encoding";
  }
  map $host $cache_control_response_headers  {
    default "Etag, Last-Modified";
  }
  map $host $dav_response_headers {
    default "Dav";
  }
  map $host $cors_expose_headers {
    default "${dav_response_headers}, ${std_response_headers}, ${cache_control_response_headers}";
  }
  # Declare CORS allowed request headers
  map $host $std_request_headers {
    default "Authorization, Origin, X-Requested-With, Range, Accept-Encoding, Content-Length, Content-Type";
  }
  map $host $dav_request_headers {
    default "If-Match, If-None-Match, If-Modified-Since, Depth";
  }
  map $host $cors_allow_headers {
    default "${dav_request_headers}, ${std_request_headers}";
  }
  # Detect a preflight request
  map $http_access_control_request_headers $preflight_h {
    default "true";
    "" "false";
  }
  map $http_access_control_request_method $preflight_m {
    default "true";
    "" "false";
  }
  map $request_method $preflight {
    default "false";
    "OPTIONS" "${preflight_h}${preflight_m}true";
  }
  # Configure WebDAV
  server {
    listen       443 ssl http2;
    listen       [::]:443 ssl http2;
    server_name  webdav.example.com;

    location /.well-known/ {
      root /srv/http/www;
    }

    # Advertise CORS access controls.
    add_header "Access-Control-Allow-Origin" "$cors_origin_header" always;
    add_header "Access-Control-Allow-Credentials" "true" always;
    add_header "Access-Control-Expose-Headers" "$cors_expose_headers" always;

    location / {
      # Handle preflight request
      if ($preflight = "truetruetrue"){
         add_header "Access-Control-Allow-Origin" "$cors_origin_header";
         add_header "Access-Control-Allow-Headers" "$cors_allow_headers";
         add_header "Access-Control-Allow-Methods" "PROPFIND, COPY, MOVE, MKCOL, CONNECT, DELETE, DONE, GET, HEAD, OPTIONS, PATCH, POST, PUT";
         add_header "Access-Control-Max-Age" 1728000;
         add_header "Content-Type" "text/plain charset=UTF-8";
         add_header "Content-Length" 0;
         return 204;
      }

      auth_basic "Not currently available";
      auth_basic_user_file /etc/nginx/htpasswd;
      root /srv/http/webdav/data;
      client_body_temp_path /tmp/nginx-webdav;
      client_max_body_size 0;

      dav_methods PUT DELETE MKCOL COPY MOVE;
      dav_ext_methods PROPFIND OPTIONS;

      create_full_put_path on;
      dav_access user:rw group:r;

      autoindex on;
    }
  }

  # CalDAV and CardDAV
  server {
    listen       443 ssl http2;
    listen       [::]:443 ssl http2;
    server_name  caldav.example.com carddav.example.com;

    location /.well-known/ {
      root /srv/http/www;
    }

    location /.well-known/caldav {
      return 301 https://caldav.example.com/;
    }

    location /.well-known/carddav {
      return 301 https://carddav.example.com/;
    }

    add_header "Access-Control-Allow-Origin" "$cors_origin_header" always;
    add_header "Access-Control-Allow-Credentials" "true" always;
    add_header "Access-Control-Expose-Headers" "$cors_expose_headers" always;

    location / {
      if ($preflight = "truetruetrue"){
         add_header "Access-Control-Allow-Origin" "$cors_origin_header";
         add_header "Access-Control-Allow-Headers" "$cors_allow_headers";
         add_header "Access-Control-Allow-Methods" "REPORT, PROPFIND, COPY, MOVE, MKCOL, CONNECT, DELETE, DONE, GET, HEAD, OPTIONS, PATCH, POST, PUT";
         add_header "Access-Control-Max-Age" 1728000;
         add_header "Content-Type" "text/plain charset=UTF-8";
         add_header "Content-Length" 0;
         return 204;
      }

      auth_basic "Not currently available";
      auth_basic_user_file /etc/nginx/caldav/htpasswd;
      proxy_set_header  X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_pass_header Authorization;
      proxy_pass        http://127.0.0.1:5232/;
    }
  }
}

2 CORS Requests and Responses

2.1 Preflight

When a script running in a secure browser attempts to make a cross-origin request, the browser first send a preflight request (for non-trivial requests), and then sends the actual request if the server advertises that CORS is enabled for that request. A preflight request might be skipped for an HTTP GET method request, because this is considered harmless.

2.1.1 Preflight Request

Essentially, a preflight request is the browser asking the server for permission to make a request of a certain METHOD and then share the data and certain headers with a third party. The preflight request is an HTTP OPTIONS method request with the following headers set:

Access-Control-Request-Headers, which declares the headers that the cross-origin script is requesting.
Access-Control-Request-Method, which declares the method of the request that the cross-origin script wants to send.
Origin, which declares the domain of the origin of the script that wants to make a cross-origin request

For HTTP servers serving static content or scripts that don’t use OPTIONS, it’s enough to detect an OPTIONS request, set the above headers, and return a 204 status code. For some HTTP servers, like WebDAV and CalDAV, the OPTIONS request has another use, and we really have to detect a preflight reuqest by detecting both an OPTIONS request and the preflight request headers.

2.1.2 Preflight Response

To respond to a preflight request, the server is expected to reply with an empty content response, HTTP status code 204, and the following headers:

Access-Control-Allow-Origin, which declares the hostnames that are allowed to make cross-origin requests. This ought to include the Origin of preflight request for the preflight request to succeed, and can be a wildcard value *.
Access-Control-Allow-Headers, which declares which headers are allowed to be part of the cross-origin request. These are all be HTTP request headers.
Access-Control-Allow-Methods, which declares which HTTP methods are allowed as part of cross-origin request.
Access-Control-Max-Age, an optional header that declares how long this response to a preflight request can be cached;

The 204 status code declares a success with no content. An HTTP request header is one that originates from the client and is part of a request from the client.

See https://developer.mozilla.org/en-US/docs/Glossary/Request_header for more.

Some browsers (such as Firefox, and Chromium) will consider the preflight request as succeeding if the above headers are present, even if the status code is not 204, and even if the request contains other data.

Sme headers are part of the Access-Control-Allow-Headers by default, as they are considered safe.

See https://developer.mozilla.org/en-US/docs/Glossary/CORS-safelisted_response_header for more details.

Figuring out exactly which headers to list in Access-Control-Allow-Headers is a little annoying. For my WebDAV (nginx) and CalDAV (radicale) servers, the following list seemed sufficient for my uses:

The following standard headers: Authorization, Origin, X-Requested-With, Range, Accept-Encoding, Content-Length, Content-Type;
The following DAV headers: If-Match, If-None-Match, If-Modified-Since, Depth

This will depend on exactly what web app is communicating with the server and what it relies on, what the underlying server is. You may need to do a bunch of testing in the web developer’s console to figure it out.

Similarly, figuring out exactly which methods to list in Access-Control-Allow-Methods depends on the app and server (but not the browser). These methods are probably more well-specified. For WebDAV and CalDAV, the following were sufficient: REPORT, PROPFIND, COPY, MOVE, MKCOL, CONNECT, DELETE, DONE, GET, HEAD, OPTIONS, PATCH, POST, PUT.

2.2 Cross-Origin Requests

After a preflight request, the browser will start sending cross-origin HTTP requests. These will be normal HTTP requests, but the browser will expect the following additional headers in the response:

Access-Control-Allow-Origin, which declares the hostnames of cross-origin scripts that this response can be shared with.
Access-Control-Allow-Credentials, which is either "true" or "false", and declares whether the authorization information in this response can be shared with cross-origin scripts.
Access-Control-Expose-Headers, which declares which HTTP response headers can be exposed to the cross-origin script.

An HTTP response header is one that originates from the server and is part of a response from the server.

See https://developer.mozilla.org/en-US/docs/Glossary/Response_header for more.

For my WebDAV and CalDAV servers, I needed to expose via Access-Control-Expose-Headers the following for my uses:

The following standard response headers: Content-Type, Content-Range, Content-Language, Date, Content-Length, Content-Encoding;
The following response headers that have to do with cache control: Etag, Last-Modified. You may want to add Pragma if you support HTTP 1.0, and Cache-Control and Expires if your server needs to direct your app about cache expiration. Etag and Last-Modified were sufficient for detecting changes between the local and remote versions of DAV files in my app.
The following DAV-specific headers: DAV.

3 Configuring nginx

Configuring nginx correctly is tricky due to the design of the nginx configuration language. It is a declarative language, but can look imperative and trip us up. We have to be careful in how we conditionally add headers and process requests.

nginx also doesn’t allow us to use set to create variables in all contexts, so we have to be a little clever at times.

3.1 Configure Valid Cross-Origin Hosts

To limit which domains can issue a cross-origin request, we create a safelist and set a variable based on the Origin header of the request. We use map to declare the variable $cors_origin_header to be the origin, if the origin is on the safelist.

See http://nginx.org/en/docs/http/ngx_http_map_module.html#map for more.

map $http_origin $cors_origin_header {
  hostnames;
  default "https://example.com";
  "https://example.com" "$http_origin";
  "https://www.example.com" "$http_origin";
}

In this safelist, we allow cross-origin requests from https://examples.com and https://www.examples.com, but no other hosts. We could use the wildcard "*" to allow requests from anyone.

3.2 Configure CORS Headers

In http context, I use the following maps to declares the CORS request and response headers. This is an abuse of map to give us the ability do define variable in http context, since set doesn’t work in http context.

You’re free to inline these header values later, but separating them out into these variables made them easier to reuse in both the WebDAV and CalDAV servers.

# Declare allowed CORS Expose Headers; each is an HTTP response header.
map $host $std_response_headers {
  default "Content-Type, Content-Range, Content-Language, Date, Content-Length, Content-Encoding";
}
map $host $cache_control_response_headers  {
  default "Etag, Last-Modified";
}
map $host $dav_response_headers {
  default "DAV";
}
map $host $cors_expose_headers {
default "${dav_response_headers}, ${std_response_headers}, ${cache_control_response_headers}";
}

# Declare allowed CORS Request Headers; each is an http request header.
map $host $std_request_headers {
  default "Authorization, Origin, X-Requested-With, Range, Accept-Encoding, Content-Length, Content-Type";
}
map $host $dav_request_headers {
  default "If-Match, If-None-Match, If-Modified-Since, Depth";
}
map $host $cors_allow_headers {
  default "${dav_request_headers}, ${std_request_headers}";
}

3.3 Process CORS Requests

Next, we need to detect a preflight request. We might be tempted to use if, but remember: If is Evil, so we want to avoid it.

Instead, we’re going to use map to create a variable that is equal to "truetruetrue" if and only if we detect a preflight request. This time, we’re using map as intended, to conditionally define variables.

map $http_origin $cors_origin_header {
  hostnames;
  default "https://example.com";
  "https://example.com" "$http_origin";
  "https://www.example.com" "$http_origin";
}

map $http_access_control_request_headers $preflight_h {
  default "true";
  "" "false";
}
map $http_access_control_request_method $preflight_m {
  default "true";
  "" "false";
}
map $request_method $preflight {
  default "false";
  "OPTIONS" "${preflight_h}${preflight_m}true";
}

We set the value of $preflight to "truetruetrue" when we detect a (non-empty) Access-Control-Request-Headers header, a (non-empty) Access-Control-Request-Method, and the request method is OPTIONS. We set the variables through string concatination to emulate boolean and, since nginx does not support nested conditions or boolean arithmetic.

To actually detect and process a preflight request, we add the following code in location context in the server on which you want to enable CORS. I add it in the location / block of both my WebDAV and CalDAV server blocks.

if ($preflight = "truetruetrue"){
   add_header "Access-Control-Allow-Origin" "$cors_origin_header";
   add_header "Access-Control-Allow-Headers" "$cors_allow_headers";
   add_header "Access-Control-Allow-Methods" "REPORT, PROPFIND, COPY, MOVE, MKCOL, CONNECT, DELETE, DONE, GET, HEAD, OPTIONS, PATCH, POST, PUT";
   add_header "Access-Control-Max-Age" 1728000;
   add_header "Content-Type" "text/plain charset=UTF-8";
   add_header "Content-Length" 0;
   return 204;
}

Note that due to limitations on add_header, this if block must appear in location context. Note that we also cannot move any add_header command outside the if. The add_header commands are not executed in a sequential order, but all of them are "executed" simultaneously as a block at the current level.

See http://nginx.org/en/docs/http/ngx_http_headers_module.html#add_header.

Note also that this if must end in return 204. This is part of the preflight request response (although some browsers will let you get away without it), and necessary for if to behave correctly, since If is Evil.

You can customize the Access-Control-Allow-Methods header depending on the server and your app to provide the least privilege.

Finally, we add the headers for other cross-origin requests. We add the following in any valid context, except the if body for the preflight request. I added them in server context.

add_header "Access-Control-Allow-Origin" "$cors_origin_header" always;
add_header "Access-Control-Allow-Credentials" "true" always;
add_header "Access-Control-Expose-Headers" "$cors_expose_headers" always;

Note that the always argument is required for non-preflight requests, since the HTTP response codes for successful requests will be variously 207, 200, and 304 (maybe others), and the add_header does not actually add a header for responses with some of these status codes.

See http://nginx.org/en/docs/http/ngx_http_headers_module.html#add_header for more details.

4 Conclusion and Debugging

Now, if you look in the Network Monitor of your browser (Ctrl+Shift+E), and click "XHR", you should see some successful cross-origin requests from your web app. If you see they’re being rejected, try anaylzing the request, and changing the above configurations with additional headers or safelisted origins.

A Suitable Cutlery Tray

2021-01-18T17:59:36Z

This post is a transcription of a thread that happened live on Twitter on June 30, 2019, in response to some anger at my lack of a cutlery tray. I was in a mood following a previous cutlery incident.

IT NEEDS A TRAY

You’re right. The current state of this drawer is unacceptable. Sure, I can grab a knife, spoon, or fork with relative ease, and the rubberized mat makes cleaning the drawer simple. But look at those irregular lines. There is no order. This must be fixed!

The problem I have is in finding a suitable tray. I have, on occasion, been called obsessive, but really I just live my life in the way that make me happy. I like each item that I own to meet that goal. I don’t just want a utilitarian cutlery tray; I want one that will spark joy.

When I look at cutlery trays, the first criteria, obviously, is a perfect fit within the drawer. It must fit flushl edge-to-edge, at least to the left and right edges, and against the bottom edge. Against the top edge would be ideal, but I suppose I can live with extra top space.

The height is also a concern. The side walls of the tray and of the dividers must come exactly to the height of the drawer. This, I’m sure, is obvious to you all.

Next, the tray slots must divided evenly amongst the spoons, forks, and knives (in that order, of course, for obvious reasons). Any drawer of cutlery would have equal numbers of each, and it would be unjust to have a tray inequality divided.

At this point, most cutlery trays are already out of the running, and yet I have more (obvious and sensible) requirements. The tray must be made of wood. Plastic is abhorrent. You would not believe the difficulty in locating a wooden cutlery tray by itself.

But one that meets all the other requirements? Impossible. I’ve looked. I’ve been to Amazon, to Bed Bath and Beyond, to TJ Max. They simply don’t exist. It’s like a conspiracy. My only recourse, I fear, is to design my own.

I could, I suppose, start a company, whose sole purpose is to design the perfect cutlery tray. I could seek out designers and material engineers, source responsibly forested wood. Teak, ideally. I could find a way to construct it without the use of glue or staples or nails.

This would probably not be sufficiently profitable to stay in business long, and I’m likely to move soon and need a new design. To keep the business afloat, I should diversify. Military contracting is always profitable; perhaps I could re-purpose my engineers.

My team, experienced in perfect design, responsible materials and supply chain management, would be suited to completely dominate this new market. Soon, I would hold all military contracts, patents for unimaginable weapons—because no one before had dared imagine.

Using the proceeds of my new monopoly, I would quietly return to the cutlery market, taking over or putting out of business any company who dared manufacture “cutlery trays”… an insult to the very idea…

This, I fear, would not satisfy me, though. Obviously, every cutlery drawer should come equipped with the perfect cutlery tray. I would have to move in to the interior design market. All drawers and cabinets would be manufactured according to my perfect ideal, my grand design.

As I begin to corner this new market, the FTC becomes concerned at the grand and beautiful hegemony I bring to each new market, but I cannot let them stop me. I perform my biggest hostile take over yet: the United States of America. I naturally held on to my best weapons.

I rename it to the Grand Hegemony of Perfect Cutlery Design, and shift the focus of the economy into the design and production of cutlery trays that suite my tastes. This destroys several Chinese companies dedicated to mass production of plastic trays, and sparks a trade war.

As China is my main supplier of teak, this is a true threat to the Grand Hegemony. I am forced in to military action.

While I expect a swift victory due to my superior design in weapons, I did not count on the effectiveness of Chinese corporate espionage, nor on the retaliatory strikes from Russia and North Korea. The nuclear fallout forces the remains of my government into underground bunkers.

We eventually win. We will have complete control of the teak supply. Our cutlery tray empire is secure. The nuclear fallout will clear eventually. Until then, I have the perfect cutlery tray built in to my cutlery drawer to keep me at peace.

What is the optimal arrangment of cutlery in a drawer?

2021-01-18T17:20:32Z

This post is a transcription of a thread that happened live on Twitter on June 30, 2019, in response to someone claiming there was a correct way to arrange cutlery in a drawer. The statement caused me to momentarily lose my mind. The thread has since become difficult to find, so I’m reproducing it here.

it’s fairly obvious. due to shape, spoons go in the wide compartment. then the other two are the usual table arrangement: forks left, knives right. Spoons go left of the two because they’re used less, and you often want to quickly grab a knife to do something, and right is easy.

Yes, this was my thought process when moving in to my apartment. I thought “What is the optimal arrangement of cutlery in this drawer? Ah, first, we need a measure against which to optimize. I choose time spent fetching items from the drawer. Next, how do I set a table…”

I then went to the whiteboard with my compass and protractor, and calculated which items best fit in which part of the drawer. I debated how the odd items, such as chopsticks and pearing knives fit in my scheme. I judged the relative merits of Western vs Eastern table settings.

After several hours of this, my movers insisted I finish inspecting the things they’ve unpacked and sign some paper work, but I ignored them. After all, I must optimize time spent fetching items from the drawer; I have no time for unpacking my apartment, paper work, etc.

Upon realizing I have a collection of non-uniform cutlery, which threw my calculations into disarray, I threw them all out and proceeded to locate new, regular items. I calculated optimal size of spoon, fork, and knife relative to placement in the drawer and time spent fetching.

Minimizing weight was an obvious concern, since additional weight increased energy expenditure in fetching items, which in turn increases caloric requirements, which would increase time spent fetching item of cutlery from the drawer. I decided to decrease all physical activity.

Years go by, and my muscles are atrophied, my heart unable to beat blood through my veins. This is no problem; I had contractors install the cutlery drawer into my bed frame. I can fetch items of cutlery at a rate of 5 items per second, and a cost of 200 calories per day.

I have done it. I have won. I have maximized my cutlery fetching efficiency.

Locking down your browser to defend yourself from rickroll

2020-11-14T02:24:29Z

Recently, a UBC SPL grad student offered a bounty to anyone who could Rickroll me. This has resulted in an arms race. I have increased my browser security to prevent Rickrolling entirely on most of my machines. This isn’t fool proof, of course, but hopefully it will help defend against low-tier attempts.

This page rendered in error. Please let me know what browser you’re using.

Setting up your backup service

2020-06-30T20:54:20Z

I just ran the command rm -rf ~, deleting all my personal files in the process. This was not the first time, and it was no big deal, because I back up my files with automatic rolling backups. My backup system is secure, redundant, and has low resources requirements. The backup repository is encrypted, deduplicated, compressed, and mirrored across multiple machines. You can choose to use any or none of these features while following this guide.

In this guide, I describe how to set up a secure and robust backup service yourself, which runs on Linux, macOS, and Windows via WSL 2. I provide my own scripts, config files, and workflows for maintaining, validating, and restoring the backups. This is all setup using free software, supports multiple configurations with varying degrees of security and redundancy, and scales well to more backup clients.

If you’d prefer to not set this up yourself and you run macOS or Windows, I recommend Backblaze:

https://www.backblaze.com/cloud-backup.html#af9v9g

They automatically handle everything, including most of the features I want in a backup service and some I could never implement myself, for $6/m per machine (USD).

1 Introduction

2 Install Prerequisite Software

2.1 Backup Software

2.2 Optional GUI for Client

2.3 Mirror Software

3 Initialize the Backup Repository

3.1 Setup Server Environment

3.2 Setup Client-Only Environment

3.3 Create the Encrypted Repository

3.4 Mirror the Client-Only Repository Offsite

4 Configure the Backup Client

4.1 Install Backup Script

4.2 Exclude Extraneous Files From Backup

4.3 Configure Access to the Backup Repository

4.3.1 Client-only Repository Folder

4.3.2 Backup Server via SSH

4.3.3 Least Priviledge for Client SSH Key

5 Configure Mirrors

5.1 Least Priviledge for Mirrors

6 Monitor and Check Backups

6.1 Check Backups are Happening

6.2 Integrity Check the Repository

6.3 Prune Expired Snapshots

6.4 Finding Large Extraneous Files in the Repository

7 Restore from Backups

1 Introduction

This guide will help you set up a backup system that automatically records hourly snapshots, compresses, deduplicates, and encrypts them, enabling a very robust and secure backup system that takes up very little drive space. For example, I four machines backed up with 2.5TB of snapshots stored in 21GB of space, mirrored on machines in multiple locations. It would take an extraordinary event for me to lose data. I’ve successfully recovered GBs of data usually resulting from my own stupidity, and occasionally the result of various tools corrupting files or the whole filesystem.

I describe two main configuration options: (1) client-only, which requires only a single machine but relies on an external service for saving the backups offsite; or (2) a client/server approach that requires access to an always-on server but offers more redundancy. Within these two main configurations, I describe additional configuration measures, such as setting up offsite mirrors for the backup repository, implementing principles of least priviledge to restrict remote access while still automating backups.

At the end, you too will be able to (but probably shouldn’t) use rm -rf without fear, among other benefits.

2 Install Prerequisite Software

2.1 Backup Software

The main backup software is borg.

https://borgbackup.readthedocs.io/en/stable/index.html

borg features automatic compression, deduplication, encryption. It also supports an on-demand backup server via SSH, useful file exclusion methods, and filtering/recreating backup archives for when you realize you backed up something that you didn’t need to and it’s taking up too much space. These features and its superb documentation and easy of use have made it better than every other tool I’ve tried.

Install this on the server and all clients.

For example, on Arch:

pacman -S borg

Or macOS:

brew cask install borgbackup

2.2 Optional GUI for Client

borg has an optional, third-party (still free software) GUI you can install called vorta.

https://vorta.borgbase.com/

If you’re uncomfortable with commandline nonsense, you can to use this on the clients to configure most of what I describe about below. I haven’t used it myself, so you’ll need to figure out the translation from each concept and my scripts to the equivalent in the GUI. The GUIs looks pretty discoverable, though, so this shouldn’t be hard.

2.3 Mirror Software

To make redundant mirrors of your backup repository offsite, you’ll need a tool to synchronize the repository to the mirrors. I own several machines, and treat all of them as mirrors for maximum redundancy without relying on cloud services.

I recommend rclone for this, but alternatives like rsync or unison work well too.

https://rclone.org/

rclone provides rsync like capabilities, but also performs local caching to speed up the computing the delta to be transfered, supports various cloud storage backends, in case you want to sync to ~the cloud~.

Install this on all mirrors.

Arch:

pacman -S rclone

macOS:

brew install rclone

If you’re using a client-only configuration, you can also install this on the client if you wish to synchronize the local repository to a cloud service or secondary machine. However, unless your cloud service features strong and easy to use version control, I recommend installing git instead, as there are some downsides to a client automatically synchronizing a local backup repository without version control. I discuss this in Mirror the Client-Only Repository Offsite.

3 Initialize the Backup Repository

3.1 Setup Server Environment

For the client/server model, the backup server needs:

A name or fixed IP address. I call this backup-server.tld.
An SSH daemon.
A user with SSH access, permission to execute borg, and shell access. I’ll call this user backupd.
A folder this user owns to store the backup repository. I call this folder ~/backups (meaning ~backupd/backups).

3.2 Setup Client-Only Environment

For the client-only model, you only need a folder that the client has read/write access to. I’ll call this folder ~/backups, and call client user client-user.

3.3 Create the Encrypted Repository

Next we need to initialize the backup repository with an encryption key. The backup repository is encrypted at-rest.

Run the following command.

borg init -e repokey ~/backups

You’ll be prompted for a password.

I strongly recommend storing the password in a password manager. borg can automatically read from the password manager using the environment variable BORG_PASSCOMMAND. For example, I use pass as my password manager, and set BORG_PASSCOMMAND="pass show backup-server.tld/borg", which in turn causes gpg-agent to query me or my login keychain for the master password.

You can also set the password as a string the environment variable BORG_PASSPHRASE. For example, if you’re password is "password", you can set BORG_PASSPHRASE="password". You should not do this if the environment variable is stored in a plaintext file.

There are several other initialization options which you can explore if you want to customize encryption levels, disable encryption (don’t do it!), or optimize for hardware acceleration, but I’m happy with the default.

borg init --help

3.4 Mirror the Client-Only Repository Offsite

If you do not have a backup server, we need to set up at least one mirror. We need to make sure the local backup repository is stored somewhere else in the event of a total data loss locally (e.g., a stolen laptop), or a partial data loss that affects the backup repository itself (e.g., a corrupted drive).

Bad solutions include using a file synchronization service such as Dropbox, Google Drive, or OneDrive as a mirror; or automatically synchronizing via rsync, unison, or rclone to a secondary machine. In the event of data loss, an automatic synchronization service could overwrite the remote copy with a completely empty backup repository, totally destroying your backups. Some file-sync services will allow you to restore older versions of a file, which mitigates some of this risk. This is not a good solution unless you’re really sure of the version control.

An acceptable solution is to use a version-controlled file hosting service like GitHub or GitLab to host your backup repository. You can set up a cron job to automatically commit and push the backup repository regularly, tagging each commit in the same way as the archives are tagged. Ideally, the repository should be private, but since it’s encrypted, this is not strictly required. This exposes your data to more risk, as with sufficient resources, a dedicated attacker (such as a corporation or government) could break the encryption. However, such attackers probably aren’t targeting you, and if they are, you might have bigger problems.

To use my suggested method, first make ~/backups a git repo. Run the following commands.

cd ~/backups

git init

git checkout -b main

git add -A

git commit -m "Initilize repo"

Next, add the remote repository:

git remote add -m main origin git@git-repo.tld:client-user/backup-repo.git

Now add a cron job. Run crontab -e and add the following line.

@hourly /home/client-user/bin/sync-local-borg-repo.sh

Finally, install the following script in ~/bin/ for the client:

sync-local-borg-repo.sh

#!/bin/sh

cd ~/backups
git add -A
git commit --fixup HEAD
git tag `hostname`+`date +"%Y-%m-%dT%H_%M_%S"`
git push origin main

And make it executable: chmod +x ~/bin/sync-local-borg-repo.sh.

This method will use considerable client disk space, which is split between the client and server in the client/server configuration. I recommend your regularly prune the git repo, but only do so manually after checking your backups (see Monitor and Check Backups). Setting up an automatic job to prune it risks deleting your backup repository in the event of a data loss. The commit option --fixup HEAD in line 5 makes this easy with the following commands:

env EDITOR=true git rebase --root --autosquash -i

git gc

git push -f origin main

This will squash the entire history of the repo and force push to the remote. Losing the history is not a big deal, since the backup repository is actually keeping hourly snapshots. The git history is only for preventing synchronization from losing data if an automatic push happens after a data loss.

4 Configure the Backup Client

Each backup client needs:

A user with read access to all files included in the backup. I call this user client-user. For me, this is my username on the client machine. In some circumstances, I create a group, backupg, to give this user read access to special files.
A cron daemon of some kind.

To start the backup system, we need to add a script to run automatically backing up files, and exclude any extraneous files. I take the approach of including everything by default, and then manually inspecting archives from time to time for large extraneous files and folders.

4.1 Install Backup Script

I use the following script, which I set to run every hour. Add the following cron job to client-users’s crontab by running crontab -e, and adding:

@hourly /home/client-user/bin/borg-backup.sh

Then install the following script in ~/bin/ for client-user.

borg-backup.sh

#!/bin/sh

## borg-backup.sh

## Usage:
# run `borg-backup.sh`
#
# Optional environment variable inputs:
# - TAG     By default, the tag for the archive is set using the hostname of the
#           client machine. To manually set a tag, set the environment variable
#           `TAG` prior to running, e.g., `env TAG="manual-tag+"
#           borg-backup.sh`.
# - WAIT    The wait time in seconds to obtain a write lock on the repository from
#           the server. By default, 600 seconds (10 minutes).

## Configuration

# Set to the location of the backup repository.
# Can be a remote directory, using SSH, or a local directory.
# Make sure the SSH agent and/or SSH key is readable by the backup daemon,
# and the remote location is accessible by a key in the ssh-agent or configured
# in .ssh/config.
#
# Example: REPO="backupd@backup-server.tld:backups"
# Example: REPO="~/backups"
REPO="borg-server:backups"

# Set the password or passcommand for encrypted repositories.
export BORG_PASSCOMMAND='pass show backup-server.tld/borg'

## Create auxiliary files to be part of the backup.

# Export the installed package list from the package manager, so it can be backed up.
mkdir -p /tmp/pacman-local/
echo "# Pipe to pakku -S to reinstall" > /tmp/pacman-local/pacman.lst
pacman -Qenq >> /tmp/pacman-local/pacman.lst
pacman -Qemq >> /tmp/pacman-local/pacman.lst

## Create a new backup archive.
# Add additional files to backup as needed.
borg create \
     -C lzma,9 \
     -c 60 \
     --exclude-from ~/borg-exclude \
     --exclude-if-present '.borg-ignore' \
     --lock-wait ${WAIT:-600} \
     $REPO::'{hostname}+'${TAG:-}'{now:%Y-%m-%dT%H:%M:%S}' \
     /tmp/pacman-local/ \
     /etc/sysctl.d \
     /etc/modprobe.d \
     /etc/makepkg.conf \
     /etc/pacman.conf \
     /etc/fstab \
     /etc/X11 \
     ~/

Make it executable with chmod +x ~/bin/borg-backup.sh.

There are two necessary configuration steps:

Change the REPO variable to point to your backup repository. If you’re using a client-only model, this is the path to the backup repository ~/backups. If you’re using a server, you can enter the SSH address and path, or configure the .ssh/config file as discussed later.
Change the export BORG_PASSCOMMAND to export your password manager command, or change the line to export BORG_PASSPHRASE to export the password string as described earlier. You really shouldn’t use BORG_PASSPHRASE since this stores the password in plaintext, but I suppose if your hard drive is encrypted, and the backup script is only stored on the client, it’s probably fine. Ish.

You’ll probably also want to change the list of files that are included in the snapshot. I include my list for reference, which assumes an Arch Linux machine and includes some of my customized root config files.

The script is documented with its major features, but I’ll explain the borg command in more detail.

The option -C lzma,9 enables LZMA compression level 9 (maximum compression). This slows down archive creation but decreases the archive size substantially. In my experience, my snapshots take about a minute to create and upload to the server, so I’m fine with max compression.
The option -c 60 tells borg to create a checkpoint every 60 seconds, saving a partial backup if the backup process is interrupted. This can happen if you’re running on a laptop that goes to sleep in the middle of the backup, for example. I choose 60 seconds since most of my snapshots only take that long, so any longer might indicate a real change to keep track of.
The option --exclude-from ~/borg-exclude excludes any files that match the pattern specification found in the file ~/borg-exclude. I use this file to filter common files, such as compiler generated files. I share this file in Exclude Extraneous Files From Backup.
The option --exclude-if-present '.borg-ignore' excludes the directory from the backup if there is a file named .borg-ignore in that directory. I use this for excluding directories that don’t neatly fit some pattern in borg-exclude, such as large git repos that I contribute to infrequently but don’t manage, or cache or temporary directories.
The option --lock-wait specifies how long to wait for a lock. Only one client can write to the backup repository at a time. I use 10 minutes as a default; my clients usually only take a minute or so to finish running a backup, so waiting 10 minutes should be enough for all clients to finish if there’s contention.
Line 47, $REPO::'{hostname}+' ..., tells borg where the backup repository is located (before the ::), and what the backup archive should be named. I name the archive using the hostname of the client, followed by + as a delimiter, followed optionally by some tag, followed by a timestamp. This naming scheme makes it easy to sort and filter backups when validating backups or searching for a restore point.
The remaining lines are files or directories to include in the backup archive. All files and sub-directories, recursively, are includes, unless excluded by one of the above exclude options.

4.2 Exclude Extraneous Files From Backup

My ~/borg-exclude file is below. Install this file in ~/ on the client; it only needs read permissions for client-user.

borg-exclude

re:/\.ssh
re:/\.bash_history
.zsh_*
re:/no-backup/
re:/\.junk/
re:/\.cron/
re:workspace/aur4/.*/pkg
re:workspace/aur4/.*/src
re:compiled/
*.tar.xz
*.tar.gz
*/.emacs.d
*/.unison/fp*
*/.unison/ar*
*/.vim/bundle
*~
.*.trash
*.aux
*.log
*.out
*.toc
*.fls
*.swp
*.class
*.pyc
*.fdb_latexmk
*.o
*.out
*.xpi
*.zo
*.dep
*.vo
*.glob
*.bbl
*.safe
*.agdai
*.hi
*.tdo
re:\.mutt/cache
re:\.mutt/sent
re:workspace/.*/paper.pdf
re:workspace/.*/techrpt.pdf
re:workspace/.*/final.pdf
*/retex-cache/*
re:\.gnupg/S\..*
re:\.~lock.*\.odp#
y
re:/Pictures/.*/\._
re:/Pictures/.*/\.comments
*.DS_Store

This configuration file accepts exclude patterns, one per line. Each exclude pattern can be either a shell glob or regexp pattern prefixed by re:. I exclude lots of generated files patterns, certain mail folders, and files or folders that are tracked by other systems. Some depend on my workflows and naming conventions, so they might not be relevant to you.

If I want to exclude some folder that doens’t neatly fit a pattern, I run touch path/to/folder/.borg-ignore, and borg will automatically begin ignoring it due to the --exclude-if-present option in borg-backup.sh.

Be sure to run touch ~/backups/.borg-ignore. This will prevent you from DOSing yourself if either you use a client-only configuration, or if your clients are also mirrors.

4.3 Configure Access to the Backup Repository

Finally, we need to make sure the backup script has uninterrupted access to the backup repository.

4.3.1 Client-only Repository Folder

If you’re using a client-only configuration, you’re done!

4.3.2 Backup Server via SSH

If you’re running a separate server, we’ll configure SSH access. Ideally, we don’t even want to be prompted for an SSH key password to ensure backups are running uninterrupted. (Although, I do deal with this on one of my clients, because I haven’t configured the keychain to cache the SSH key while logged in.)

I recommend configuring access through the .ssh/config file, and either a keychain that caches your SSH key that you use everywhere (probably acceptable security), or a fresh passwordless SSH key the provides client-user restricted access to borg as the backupd user on backup-server.tld (better practice security).

I’ll assume you have a fresh passwordless private key called ~/.ssh/id_rsa-borg-client paired with the public key ~/.ssh/id_rsa-borg-client.pub on the client machines. You can generate a fresh passwordless key-pair with:

ssh-keygen -t rsa -b 4096 -C "borg client" -f /home/client-user/.ssh/id_rsa-borg-client -P ""

Make sure to set the permissions correctly, restricting access to the private key.

chmod 600 ~/.ssh/id_rsa-borg-client

Add the following snippet to your .ssh/config, and the borg-backup.sh will automatically use the SSH key ~/.ssh/id_rsa-borg-client on the client machine when connecting as backupd to the backup-server.tld.

Host borg-server

Hostname backup-server.tld

IdentityFile ~/.ssh/id_rsa-borg-client

User backupd

ForwardAgent no

4.3.3 Least Priviledge for Client SSH Key

If you want to follow better practice security, you should restrict access for the id_rsa-borg-client key so it has only the permission it needs: to communicate with the borg server. Add the following line to ~/.ssh/authorized_keys for backupd on the server, replacing <id_rsa-borg-client.pub> by the contents of the public key ~/.ssh/id_rsa-borg-client.pub from the client.

command="/home/backupd/.ssh/ssh-borg-serve.sh",no-pty,no-agent-forwarding,no-port-forwarding <id_rsa-borg-client.pub>

Next, install the following file in ~/.ssh/ on the server and give it execute permissions with chmod +x ~/.ssh/ssh-borg-serve.sh.

ssh-borg-serve.sh

#!/bin/sh

set -f

case "$SSH_ORIGINAL_COMMAND" in
    "borg serve"*)
        exec $SSH_ORIGINAL_COMMAND
        ;;
#   "/usr/lib/ssh/sftp-server")
#       exec /usr/lib/ssh/sftp-server -R
#       ;;
    *)
        echo "Invalid command $SSH_ORIGINAL_COMMAND"
        exit 1
        ;;
esac

This will allow the key id_rsa-borg-client to run only a command starting with borg serve, which launches the borg server. If an attacker gets your id_rsa-borg-client key, they can launch the borg server, but without the backup repository password, they won’t be able to do anything.

The second, commented out, command would enable the client to launch a read-only SFTP server. This is useful for making all clients mirrors. However, allowing the client key to also use the SFTP server violates the principle of least privilege, and you should instead configure a separate mirror key as described in Configure Mirrors. An attacker with SFTP access would be able to download the encrypted repository, and possibly read other files on the server.

5 Configure Mirrors

Having backups stored offsite is good, but what if the server goes down, or is struck by a meteor? It’s best to have not only offsite backups, but redundant offsite backups. Thankfully, this is easy to support. Particularly, if you, like me, have too many computers: a laptop, a desktop, a media server, a VPS, and a work computer... mirrors galore!

On each mirror, we configure rclone with the server as a remote. Add the following to ~/.config/rclone/rclone.conf on the mirror.

[borg-server]

type = sftp

host = backup-server.tld

user = backupd

port =

pass =

key_file = id_rsa-borg-mirror

md5sum_command = md5sum

sha1sum_command = sha1sum

This tells rclone how to connect to the server via SFTP. Following principle of least privilege, we’ll need a new key pair for the mirror.

ssh-keygen -t rsa -b 4096 -C "borg mirror" -f /home/client-user/.ssh/id_rsa-borg-mirror -P ""

chmod 600 ~/.ssh/id_rsa-borg-mirror

And we need to install and restrict the key on the server. Add the following line to the ~/.ssh/authorized-keys file on the server.

command="/home/backupd/.ssh/ssh-borg-mirror.sh",no-pty,no-agent-forwarding,no-port-forwarding <id_rsa-borg-mirror.pub>

Next, install the following file ~/.ssh/ on the server and give it execute permissions with chmod +x ~/.ssh/ssh-borg-mirror.sh.

ssh-borg-mirror.sh

#!/bin/sh

set -f

case "$SSH_ORIGINAL_COMMAND" in
    "/usr/lib/ssh/sftp-server")
        exec /usr/lib/ssh/sftp-server -R
        ;;
    *)
        echo "Invalid command $SSH_ORIGINAL_COMMAND"
        exit 1
        ;;
esac

This restricts the mirror’s key so it can only be used to launch the SFTP server in read-only mode.

Finally, set up a cron job to mirror the repository. Run crontab -e on the mirror and enter:

@hourly rclone sync borg-server:backups ~/backups

rclone will perform a one-way sync from the server to the mirror every hour. rclone uses a delta transfer algorithm with caching. It’s faster than rsync, but with the same low-bandwidth transfer. It also supports more backends than rsync, so you can set up additional mirrors to cloud services like Dropbox, Google Drive, etc, if you want.

Now when a meteor strikes your server just after a burglar stole your laptop, you’ll still have your data. Setup LOTS of mirrors for extra redundancy.

5.1 Least Priviledge for Mirrors

I know it seems like we already did this with the whole read-only SFTP server, but that’s not enough. Right now, an attacker compromising the mirror key can read any file that backupd has access to. That’s no good. Better security practice would be to configure the SSH daemon to chroot the mirror to the ~/backups directory, so they can only read this folder. Recall this folder is encrypted, so an attacker compromising the mirror SSH key still has to break the encryption to get anything.

Unfortunately, this requires root access on the server, reconfiguring the SSH daemon, and creating and managing multiple user and group permissions, which you may be unable or unwilling to do.

To chroot the mirror, we need a second user on the server, which I’ll call mirrord. The ssh-borg-mirror.sh script and addition to authorized_keys we added to backupd above should be thrown out, as we require a different configuration to chroot.

Next, we need a new group, mirrorg, to provide mirrord read access to the directory ~backupd/backups, owned by backupd.

groupadd mirrorg

gpasswd -a mirrord mirrorg

Now we set the group on ~/backups to mirrorg, and provide the group read access. As user backupd, run the following commands.

chgrp ~backupd/backups

chmod g+r -R ~backupd/backups

We need to modify the ssh-borg-serve.sh script (owned by backupd) to maintain the group-read permission. Change the file using the following diff.

- exec $SSH_ORIGINAL_COMMAND

+ exec borg serve --umask=027

This will force the borg server to provide read permissions to mirrorg when writing to the backup repository.

Now, modify the SSH daemon to chroot the mirrord user. As root on the server, add the following to /etc/ssh/sshd_config.

Match User mirrord

ChrootDirectory ~backups/backupd

ForceCommand internal-sftp -R

AllowTcpForwarding no

X11Forwarding no

PasswordAuthentication no

Finally, add the following line to ~/.ssh/authorized_keys for mirrord.

<id_rsa-borg-mirror.pub>

Note that we do not require any restrictions, since the SSH daemon is already restricting mirrord.

Now you have a pretty secure mirror.

6 Monitor and Check Backups

6.1 Check Backups are Happening

Backups are no good if you can’t restore from them. I have a weekly reminder to check on my backups. To check, I run borg list -P machine-name+ on the repository machine (server, or client-only), which lists the backups for the machine with hostname "machine-name". I check to see that hourly backups are being created for each client. If they aren’t, the daemon on that client may not be working for some reason.

6.2 Integrity Check the Repository

Every month of so, I run borg check ~/backups. This runs some integrity checks on the whole repository, and can take a while. I recommend running it in a screen session so you can disconnect and check back on it later. I’ve never had any integrity problems.

6.3 Prune Expired Snapshots

I don’t want to keep hourly snapshots forever. I have a policy for expiring backups, and a script for doing it. I keep hourly snapshots for the last 24 hours, daily snapshots for the last week, weekly snapshots for the last month, and monthly snapshots forever. With deduplication and my workload, this strikes a good balance between data recovery and minimizing the repository size.

Each week after checking my backups, I run the following script to prune any expired snapshots:

borg-prune.sh

#!/bin/sh

# borg-prune.sh

## Usage
# - borg-prune.sh machine-name           Perform a pruning dry-run, seeing what
#                                        would be pruned.
# - borg-prune.sh machine-name --wet     Perform a non-dry run.

REPO=$HOME/backups

DRY_RUN="-n"
if [[ "$2" == "--wet" ]]; then
    echo "Pruning..."
    DRY_RUN=""
fi

borg prune --list $REPO --prefix "$1+" \
     --keep-hourly 24 \
     --keep-daily 7 \
     --keep-weekly 4 \
     --keep-monthly -1 \
     --keep-yearly -1 \
     $DRY_RUN \
     -v

6.4 Finding Large Extraneous Files in the Repository

Sometimes, a large file will get backed up and make the repository unnecessary large. A few times, I’ve accidental backed up the entire repository in itself, DOSing my VPS by filling the drive.

borg makes it sort of easy to find these mistakes.

On the repository machine, run borg info -P machine-name+ to get a print out of the size of each archive for machine-name. When one of the archives prints out as suddenly larger, that’s usually a good target. Copy that archive name; I’ll call it $archive_name.

Next, we mount the archive to see what files are too large. Run the following commands on repository machine.

mkdir -p /tmp/borg

borg mount ~/backups::$archive_name

Now we can explore the mounted archive to find large files. I run the command the following command, which I alias as ducks in my shell.

du -sch * .* | sort -rn | head

This will print out a list of the 10 largest files or folders in the current directory. You might need to exclude the .* pattern if there are no hidden files.

I then follow the large directories until I find a likely looking file; call it /path/to/large-unnecessary-file.

Once we find a file, we want to exclude it from further backups and remove it from existing backups. I add it to the borg-exclude patterns or add a .borg-ignore file as appropriate. Then, I run the following loop to recreate and filter all archives. This loop is in fish syntax; you’ll need to figure out loops in your shell on your own, because I’ve never figured out how to write a shell loop properly.

I’ve never had any problems, but you should backup your repository before running borg recreate. Use rclone to put it anywhere else, at least temporarily.

for archive in (borg list --lock-wait 600 -P machine-name+ ~/backups | cut -f 1 -d ' ')

yes YES | borg recreate --lock-wait 600 -C lzma,9 -s --exclude "/path/to/large-unnecessary-file" backups::$archive

end

This is considered experimental, so it requires that you confirm each recreation by typing "YES". I just pipe yes YES because I like to live on the edge, and have mirrors of this repository if I break something.

borg recreate can take multiple --exclude flags if you find multiple files you want removed. It will also recompress the archive, so you can specify new and different compression options with -C, if you want to change the compression algorithm.

Now the file should be excluded from all existing archives.

7 Restore from Backups

In the likely event that you need to restore from backups, run borg list -P machine-name+ to list the archives available for machine-name. This will give you a list of archive names on the left, with some metadata on the right. Copy and paste the name for the archive you want to restore from; I’ll call this $archive_name.

Next, we mount that archive. Running the following commands, which will create a temporary mount point and mount the archive.

mkdir -p /tmp/borg

borg mount ~/backups::$archive_name

You can now see all your backed-up files in /tmp/borg.

Next, from the client, copy over your files:

rsync -avz --progress backupd@backup-server.tld:/tmp/borg/ /

A Summary of Discussions on Virtual Conferences

2020-06-20T02:44:34Z

After a successful virtual PLDI, some of us expressed support for more virtual conferences in the future, and some expressed dissent and concerns. The result was an interesting discussion on Twitter, which is essentially impossible to follow. I summarize the discussions here, and include some of my own editorializing.

Please forgive the typos; my fingers quickly get out of sync with my brain, especially at 1am while following multiple twitter discussions.

Changelog

9:51PM PDT June 19, 2020, added “on the nature of networking” section
12:10AM PDT June 20, 2020, enabled inline expansion of twitter links
1:10AM PDT June 20, 2020, added time zone issue that I missed
1:02PM PDT June 20, 2020, added summary and link to Neel’s post on barrier to conversion
12:16PM PDT June 21, 2020, added additional section on addition barriers the came up in later discussion
12:16PM PDT June 21, 2020, moved Neel’s post to new section
12:47PM PDT June 21, 2020, added table of contents

Virtual Conferences Lower Barriers
What about Networking
Virtual Conferences Impose New Barriers
On the Nature of ~~Networking~~ Community Building
Additional Barriers in Virtual Conferences

1. Virtual Conferences Lower Barriers

Compared to physical conferences, virtual conferences:

cost less in travel expenses;
cost less in travel time;
cost less in environmental impact;
are more accessible to nations with visa requirements.

This is a huge win, but the first and fourth points are especially important for accessibility reasons. People from first-world nations with easy access to international travel (don’t require visas, e.g.) disproportionately benefit from physical conferences at the expense of people from all other nations.

Threads:

https://twitter.com/hisham_hm/status/1274162301013286912

https://twitter.com/wilbowma/status/1274081046439424000

2. What about Networking

One of the biggest problems with virtual conferences is how it may affect junior academics who will not get the same networking experience the rest of us got. It’s unarguable that networking has affected on our careers, although it’s not clear exactly what those effects are, whether they are good or bad, or whether they could be replicated or improved by other mechanisms.

In essence, a shift to virtual conferences shifts a big risk on to junior academics who would be asked to play by a new set of rules, different from those the rest of us have been playing by.

Threads:

https://twitter.com/alpha_convert/status/1274090654503706629

https://twitter.com/mgrnbrg/status/1274086476934836224

https://twitter.com/wilbowma/status/1274134782364925952

As it would be unfair to ask junior folks to bear that risk, we who want to support virtual conference oughts to donate our time to mitigating that risk:

Thread:

https://twitter.com/neurocy/status/1274149265451909127

It seems that in-person networking is useful, but I ask: why? What is the exact thing, the purpose, the role, the benefit that in-person networking is providing? If we knew that, we could ideally replicate it.

Threads:

https://twitter.com/wilbowma/status/1274139720205733888

https://twitter.com/samth/status/1274161227892170752

https://twitter.com/samth/status/1274164897211965440

On the other hand, both junior and senior folks report more engagement than they’ve experience at physical conferences. This suggests that while the rules of networking in academia might change, it might be for the better.

Threads:

https://twitter.com/RanjitJhala/status/1274166327704301568

https://twitter.com/alpha_convert/status/1274173646622334978

https://twitter.com/TheAviralGoel/status/1274172740413665280

3. Virtual Conferences Impose New Barriers

One barrier is just relating to your fellow humans, which is easier for some when people are willing to open up and get personal, but not everyone does that online:

Threads:

https://twitter.com/johnregehr/status/1274090536148692992

https://twitter.com/aatxe/status/1274163928734535681

Escaping obligations to attend the conference is harder when you’re at home and you can’t just say “Sorry, I’m in London I literally can’t do that”. Carving out time is no longer baked into the structure of the conference but is shifted into an individual responsibility for every attendee.

Threads:

https://twitter.com/eeide/status/1274076671092506624

https://twitter.com/samth/status/1274080233793228802

https://twitter.com/mcarbin/status/1274082318773338116

Virtual attendance also presents barriers to people with disabilities, since it relies so heavily on single-mode information processing. A talk on Zoom is essentially audio-only, with no visual cues from the presenter, no ability to lip-read for many talks, and no captions (since Zoom lacks this support). Slack is visual/text-only.

This was also a problem in many physical conference, but is made worse in virtual.

Thread:

https://twitter.com/TaliaRinger/status/1274173079078367232

Finally, time zones make it particularly difficult for some to attend.

https://twitter.com/stevemblackburn/status/1274093646036430848

4. On the Nature of Networking Community-Building

After much discussion on the nature of “networking”, we approached some specific goals that it achieves and how some virtual mechanisms address it.

First, some argue that “networking” should be rephrased in terms of community and relationship building. The “professional” connotation of “networking” disguises its true function and brings with it ideas of nepotism.

Thread:

https://twitter.com/rg9119/status/1274198703088078849

There several goals of community-building:

help disseminate institutional knowledge;
meet people outside your own community or circle relationships;
help build a map of who works on what, who knows what, to facilitate future contact, questions, and collaboration;
help disseminate your own work and receive feedback;
make friends and build relationship.

Threads:

https://twitter.com/wilbowma/status/1274187581035446272

https://twitter.com/mkolosick/status/1274191817903427584

https://twitter.com/TaliaRinger/status/1274191225340358656

Arguably, point (1) should be a non-issue because our scientific process would preserve and disseminate this in our work. Alas, this isn’t the case.

In-person conference have at least three advantages in addressing the above goals.

First, the default behavior at a conference encourages random encounters. There is less individual action required; the system of the in-person conference makes it the default. This builds point (2) into the system by design, easily facilitating all the others. This low-effort randomness is extremely important for breaking out of your own existing circle and into others, and is not well-supported online.

Threads:

https://twitter.com/wilbowma/status/1274191122806407168

https://twitter.com/mkulkarni/status/1274192113056571393

https://twitter.com/alpha_convert/status/1274194153560313856

https://twitter.com/neurocy/status/1274192121348739072

Second, in person, avoiding confrontations is simple. Like it or not, it is true that not everyone in our field is perfect. Some are actively hostile or hurtful. In-person, it’s easy to see them and avoid them. Online mechanisms like assigned break-out rooms do not support this as easily.

Threads:

https://twitter.com/alpha_convert/status/1274194153560313856

Third, building relationships take unstructured time and communication. This time is easy to achieve in person over dinners, coffee, long talks in the hallway track. Online, this is not the case. Conversations are more structured, and shorter. We’re not eating together, spending time after dinner, etc—we meeting, maybe talk, then close the browser tab or video conferencing window. The online form of communication also makes communicating harder, and requires more effort than we may be willing to spend. Chatting over a coffee is hardly any effort at all and an extremely high bandwidth, multi-mode form of communication.

Threads:

https://twitter.com/alpha_convert/status/1274189661787828225

https://twitter.com/mkulkarni/status/1274192983575990274

https://twitter.com/TaliaRinger/status/1274191558569418752

https://twitter.com/RanjitJhala/status/1274194633673682944

Thankfully, there are some good online alternatives to address some of these goals. Disco rooms for adding random interaction to PLMW seem to have succeeded, and are being looked at to port to whole conferences. Several people find Twitter a great place for community building, extending random conversations, which serve to address some missing aspects of physical conferences.

Threads:

https://twitter.com/mkulkarni/status/1274194554900697089

https://twitter.com/mkulkarni/status/1274192113056571393

https://twitter.com/mkulkarni/status/1274196522830368773

https://twitter.com/TaliaRinger/status/1274196354605019136

Additional Barriers in Virtual Conferences

As the conversation, several barriers not explicitly mentioned in Section 3 above came up.

In short:

It’s harder to have intellectual conversations.
Online, approaching others to talk may be further exclusionary by favoring the bold and confident.
There are particular barrier for caregivers, which disproportionately affect women.
There are barriers for faculty introducing their students into the community.

Neel wrote about his experience in which he argues that the whole point of a conference is talking to people, and this is more difficult in an virtual setting. The talks themselves are there to facilitate intellectual discussion, not merely to be consumed. Virtual conferences impose a huge barrier to this conversation, largely defeating the whole point. So far the only mitigation to this is intentional, individual action, which is not a good substitute for a design that encourages conversation by default as in a physical conference.

Neel’s post:

https://semantic-domain.blogspot.com/2020/06/pldi-2020-conference-report.html

A private discussion pointed out that the focus on randomness in the above about benefits of physical conference can exclude those not bold enough to approach and open up a conversation. This made worse in an online setting when must initiate conversation.

A private discussion focused on the particular hardship caregivers, who are disproportionately women, have in attending virtual conferences. Our field is already predominately male, and without addressing this, virtual conferences could exacerbate the problem. This was already a problem in physical conferences, but may be worse in virtual conferences where individual action is required rather than a systemic solution.

One additional problem related to community-building is that advisors cannot easily introduce their students to the community. In physical conferences, this is pretty easy. An advisor may be looking out for their student, or a student may be shyly hanging around their advisor, and during any of the breaks you’re likely to run into someone and introduce them. Alternatively, an advisor may make a point of introducing their student to someone, and track each of them down during a break. The structure of a virtual conference makes the former impossible, and makes it easy to forget to or disengage too much to do the latter.

Threads:

https://twitter.com/lindsey/status/1274251270035927040

https://twitter.com/mcarbin/status/1274404826579951616

https://twitter.com/jfdm/status/1274253717529903104

Copy/pasting your password into the Runescape Client

2020-04-27T09:25:58Z

In a fit of nostalgia, I wanted to play some Runescape this weekend. I discovered that Runescape forbids copy and pasting your password into the client, for bogus security reasons. This poses a problem for me, since my password is a very long randomly generated string. Normally, I would copy and paste it from my password manager.

Thankfully, a little Powershell scripting solves the problem. The script below will, upon execution, switch to the Runescape client and type your password. You need to configure one variable, $password, which should be set using a command the reads your password from your password manager (or, if you don’t care about security, set to your password as a string literal). The default uses my configuration, fetching the password from pass via WSL.

Be careful not to run the script while you’re already logged in, or it might enter your password in chat. It shouldn’t, and it won’t hit enter, but… use at your own risk.

runescaope-login.ps1

## --------------------------------------------------------------------
## Instructions:
# Launch Runescape then run this script while on the login page.
#
# You may need to switch Runescape between windowed and full screen 
# after, as alt-tabbing or this script sometimes screws up full screen.

## --------------------------------------------------------------------
## Configure:

# Your runescape password
# $password = "my hard coded password"
# $password = get-password-command
$password = (wsl /usr/bin/pass show runescape.com `| head -n 1)

# Delay.
# How long to wait between grabbing Runescape window and starting to type.
$delay = 1

## --------------------------------------------------------------------

function Show-Process($Process) {
  $sig = '
    [DllImport("user32.dll")] public static extern bool ShowWindowAsync(IntPtr hWnd, int nCmdShow);
    [DllImport("user32.dll")] public static extern int SetForegroundWindow(IntPtr hwnd);
  '
  
  $type = Add-Type -MemberDefinition $sig -Name WindowAPI -PassThru
  $hwnd = $process.MainWindowHandle
  $null = $type::ShowWindowAsync($hwnd, 5)
  $null = $type::SetForegroundWindow($hwnd) 
}

Show-Process (Get-Process -Name rs2client)

timeout $delay

Add-Type -AssemblyName System.Windows.Forms 
$password.ToCharArray() | ForEach-Object {[System.Windows.Forms.SendKeys]::SendWait($_)}

Running a public server from WSL 2

2020-04-25T23:32:30Z

This week, for ReAsOnS, I wanted to run a server on WSL 2 that was accessible from the internet. This was surprisingly involved and requires lots of hard-to-find tricks to forward ports through 4 different layers of network abstractions and firewalls.

In WSL, make sure your server is using IPv4. I spent a hell of a long time just trying to figure out why I couldn’t access the server from localhost. I had successfully run a handful of local http servers from WSL that were accessible from the Windows host, so I wasn’t sure what the problem was. It turns out this server, written in Java, wouldn’t work until I added -Djava.net.preferIPv4Stack=true to the java options. It appears that Java was defaulting to IPv6, and WSL doesn’t forward IPv6 properly, or something.
In WSL, make sure you allow the port through your WSL firewall, if you’re using one. Using a WSL firewall might be redundant, but you might be using one. I usually use ufw in my linux machines, so run I’d run ufw allow $PORT in WSL.
In Windows, forward your port from the public IP port to the WSL port using netsh interface portproxy add v4tov4 listenport=$PORT listenaddress=0.0.0.0 connectport=$PORT connectaddress=127.0.0.1 in a Powershell with admin rights. This is one of the hard-to-find but necessary WSL specific bits. It look like Windows creates a virtual adapter that isn’t properly bridged with your internet network adapter. I tried playing various bridging tricks, but in the end, I had to manually create a portproxy rule using Windows’ network shell netsh. This listens on all addresses and forwards the connection to the localhost, which seems to be automatically bridged with WSL. You can also try to manually forward it to the WSL adapter. Use ipconfig to find it. However, the WSL IP changes from time to time, so I recommend using local host instead. It might also be wise to listen explicitly on your internet facing IP instead of 0.0.0.0, but this seemed to work.
In Windows, allow the port through the Windows firewall explicitly by adding a new Inbound Rule using the Windows Defender Firewall with Advanced Security administrative tool. This is accessible as WF.msc in cmd and Powershell. Select Inbound Rule, and click New rule... in the action menu to the right, and work your way through the menu to allow the port explicitly. Normally, Windows asks if you want to allow applications through the firewall. This doesn’t seem to happen with WSL servers, so we have to manually add a rule.
In your router, setup port forwarding for the port.

One inside perspective on the graduate student application process

2020-02-08T19:58:07Z

Around this time of year (graduate student recruiting season), I see lots of:

Stress from students who are unsure about the graduate recruiting process and how their application is viewed.
Reassurance from people who have been through the process, e.g.,:

https://twitter.com/TaliaRinger/status/1225981382708514817

What I don’t see much from professors explaining WTF.

I’ve now been on both sides of this process and wants to give a peak behind the mysterious curtain in an attempt to reduce stress from students currently going through this process, and hopefully help future students with their applications.

This started as a tweet thread; you can view the original here:

https://twitter.com/wilbowma/status/1226023671946330113

In this post, I maintain that thread and elaborate on it.

Disclaimer

These are only my views and observations on the process at one university. I can’t speak for all, nor for all universities.

Original Thread

I’ve been on the grad admissions committee two years now. So far, almost all applications I’ve reviewed fall into one of two piles:

People who probably want to do a course masters and don’t seem to realize this is a research masters program.
Really good applications.

Of the people in pile 2, most “get rejected” because the professor they want to work with, or who has the best research fit with the interests expressed in the SOI, isn’t recruiting.

The rest “get rejected” because there are n slots and m > n candidates. IMHO, there’s very little rhyme or reason at this stage.

Sometimes there’s a “clear” choice: “this student has 5 POPL papers and that one only has straight As”, but not usually, and even that isn’t “clear”. I’ve seen undergrads with multiple publications “get rejected” over students with none.

The problem is there are an absolute ton of variables for what goes into a good grad student, and the application tells us next to nothing about any of them, and many professors appear to be risk averse.

So I put “get rejected” in scare quotes this whole time because I don’t feel like I’ve rejected almost any applications. I’ve merely been unable to accept many. Unfortunately, these are observationally equivalent to the students.

To all the students I’m about to “reject”, just know: I don’t reject you. It’s just that all my pigeon holes are currently full of students.

And I say all this remembering how I felt like it was somehow my fault for the MANY schools I got rejected from. Now, on the other side, I wish I could go back in time and tell myself “There was no reason. It’s no one’s fault.”.

My Process and Advice

Below, I describe my filtering process, and what I’ve observed of others. I sprinkle it with some concrete advice about how an application can best communicate to me the information I need to make a good decision accurately, and changes that we could make to our application process to make it kinder and more efficient.

Filter 0

Surely, Prof. Bowman, you’re just being overly positive here. There must exist terrible students. What about those who failed all their classes, don’t know anything about research, and don’t pass the GRE? Surely they exist!

I guess in principle such an applicant could exist, and I would reject them. But so far, I’ve not seen any application like that.

My guess would be that the barrier to applying is so high—between gathering the required documents, getting reference letters, taking tests, paying application fees—that such students self-select out.

In fact, the complete lack of applications that I might reject on merit suggests the barrier is much too high. I would guess that many great students, who think they might not be good enough, filter themselves out.

This worries me. Students, particularly from underrepresented backgrounds, maybe be more likely to underestimate themselves and never apply.

If you think you’re interested in research, you should talk to faculty members you know about research. If they encourage you to apply, listen to them. This doesn’t mean you’ll get accepted, but it does mean you are good enough.

Based on my samples, if you’ve applied, then you are good enough to get accepted.

Filter 1

In my initial tweet, I divided the most applications into two piles: “course masters” applications, and “really good” applications. This is a subjective call, made hastily while trying to make a first pass through the ~300 applications for the software practices lab (SPL) at UBC. It could be that really good students end up in the first pile.

I make this call by looking first at the statement of purpose. I don’t read it closely; I look for 2 things:

Do you clearly and concisely express interest in some research area?
Do you seem to have any clue what that research area involves?

Honestly, I’m looking for two sentence here. If I find them, then I look at the application more closely.

For many applications, the answer is “no”. These statements almost all says something to the effect of “I want a masters so I can get a good job in the tech industry”. I take that to mean “you want a course masters”.

If you’re a student applying to a research graduate program, put your research interests front and center. I don’t care about when you were 8 and you discovered computers, I don’t care about the classes you’ve taken (yet), I don’t want any flowery language about how beautiful you find your research area. I want to see what research area, something specific about that research area.

It’s very easy to be wrong when making this call. I’m basing this entirely on a statement of purpose, which is sometimes called statement of intent, or “personal statement”. I wager that I’m wrong some of the time because students are unsure about what we’re looking for in the statement of purpose, and we could do a better job instructing students on what we’re looking for in the application process.

For the student applying for a course masters, I really wish the application system had some way of telling them “you really need to have a real research interest for this program or you’re wasting your time and money”.

If an application is not accepted at this stage, it’s not because “you’re not good enough”. It was due some massive failure of communicate about what the applicant and the school are looking for.

Filter 2

For the rest of the applications for which the answer is “yes”, they passed the single biggest filter. Pile 1 is in the hundreds, while pile 2 is in the tens.

Unfortunately, “tens” is still greater than the number of positions available. This year, for example, we’re admitting about 10 for all of the SPL—software engineering and programming languages, combined. 10 out of 300.

The next major filter is: does this application fit with any of the research projects of any professor currently recruiting. A lot don’t. I not-accepted at least 10 great applications this year because their expressed research interest matched one of faculty who has no more slots. Some applied expressing interest with something no faculty is currently working on.

This filter is especially disappointing because it seems so solvable. If we could somehow collect this information from faculty and give it to students before they hit submit on that application, we ought to be able to cut this number down to 0.

If you’re a student and you really want to work with a particular faculty, or on a particular project, you might just send them an email. Make it short. Something like “Are you recruiting graduate students this year for this project; would it be a waste of time applying at all?”. Some faculty might be interested in hearing more, but some will just want to see your application materials in the system. Some faculty might want an email explaining why you’re a good fit. Personally, I get so many of those that are clearly spam that don’t know my research area that I delete almost all of them unread. Keeping it short enough to fit in the subject and/or message preview gives you a good chance of getting read.

If an application is not accepted at this stage, it’s not because “you’re not good enough” It’s because (we think that) there is no research fit. There is no one to advise you to do what (we think based on application materials) you want to do.

Filter 3

Now we’re at a relatively small pile indeed, but still more applicants than there are slots. At this stage, judging applications gets pretty random.

I start looking at the following:

I want to know what have you done related to the research area you’re interested in, and what do you see yourself doing in the future. I don’t expect you to have done a lot, and I don’t expect your predictions about the future to be completely precise or accurate. But I do want to know what your thoughts are, so I can better judge whether you’re a good fit with some faculty member.
Do you have good grades? I don’t put too much stock in this, because it’s so hard to compare grades across schools (and countries). I mostly look for outliers. Did you get a low grade in a math or CS course? That’s not a deal breaker if you can explain why, what you found challenging about that in particular, or what you learned from the experience—preferably in your statement, but I might call you and ask if the application is otherwise promising.
What do the letters say? Many letters are pretty useless and say “they did well in my class”, or “they were a good team player at their internship”, so I skim these, looking for employers who give concrete examples, or faculty with whom the applicant worked closely as an RA or TA.
Do you have software engineering experience? This is pretty valuable for some research areas, so I like to see date ranges for all industry work in your resume.
Do you have research experience? This is kind of rare for undergrads, although it seems to be less and less rare. If you do have it, I want see in the statement a brief explanation of the project, what you contributed, and a little about the bigger picture of the research.

It’s a pretty subjective and random process at this point. I don’t put more or less weight on any of these things. Personally, I’m trying to figure if this is a person I can work with and train on problems that I care about.

The answer to that question isn’t about you, but about us. You might be a great applicant for someone else, but not for me. And I might be completely wrong about my assessment.

It’s hard to give concrete advice at this stage, because it seems that faculty (at least at UBC, and myself included) are now optimizing for things outside of “merit”. All things being equal, they seem to choose applicants that they think are more sure of having a good research fit, more likely will stick around, and more likely to work well with their advising style.

This is pretty reasonable, I think. I don’t want to work with someone for the next 6 years who isn’t actually interested in what I’m (and therefore, they) are working on; they’re likely to switch advisors, or switch programs. At the very least, they’ll be less than happy with the work. That would be a waste of everyone’s time and money and happiness. I don’t want to work with someone I feel ill-equipped to advise; that wouldn’t be good for either of us.

The only thing I can recommend here is try to communicate your interests and needs clearly.

If an application is not accepted at this stage, it’s not because “you’re not good enough”. It’s because (we think, based on the application materials, that) we’re not a good fit for what is a serious commitment.

Filter 4

After that, there remains a final filter: there are still only so many spots. We only have a limited amount of funding, and a limited amount of time. We literally cannot accept everyone.

At least 5 great applicants expressed interest in a faculty member who had 1 slot. That faculty member turned down multiple applications with top-tier conference publications as undergraduate students, straight As, great references, and great research fit, who (based on the application materials, believed) they were willing and able to advise. Not because they weren’t good, but because 4 of 5 people literally could not be accepted.

If an application is not accepted at this stage, it’s not because “you’re not good enough”. There’s just a limited numbers of spots.

Conclusion

Throughout this process, I haven’t rejected anyone. But after this stage, all applications not accepted get exactly the same “rejection” letter. Some admin staff and crappy IT systems are responsible for this.

There’s no way for me to tell you which stage you made it to, to provide some transparency on the process. Given the number of applications I have to look at it, I’m not sure I would or could anyway. That would require a lot of emotional drain and time. I’m also not sure how many students would want that kind of feedback.

So let me just reiterate: If an application is not accepted at any stage, it’s not because “you’re not good enough”

λk.(k blog): λk.(k blog)

A high-level summary and interpretation of ACM finances

A high-level summary of ACM finances, from an objective view point

A subjective interpretation of the high-level ACM finances

Digital library subscriptions and APCs are the main source of revenue

The ACM is sitting on a very large rainy day fund

The ACM is very profitable, particularly for a non-profit

The ACM is not spending $700 per article

The ACM funds no good works at all

The ACM doesn’t need to charge membership fees

The ACM seems very bad at investing

Conclusions: What should I do with this information?

Academia Is a For-Profit Industry

What is a model?

Definitions of “model”

So what is a model?

What is a model of a programming language?

What is syntax?

The First Meaning of Syntax

Historical Interlude

When Semantics is the Syntax

In What Sense is WebAssembly Memory Safe?

What is memory safety?

Memory (un)safety in Wasm

What is logical relations?

What are logical relations, historically?

tait1967 - Intensional interpretations of functionals of finite type I

plotkin1973 - Lambda-definability and logical relations

How is “logical relations” used in PL?

ahmed2006 - Step-Indexed Syntactic Logical Relations for Recursive and Quantified Types

abel2018 - Decidability of conversion for type theory in type theory

timany2022 - A Logical Approach to Type Soundness

What is realizability?

What is realizability, historically?

kleene1945 - On the Interpretation of Intuitionistic Number Theory

amadio1998 - Domains and Lambda-Calculi, Chapter 15

How is “realizability” used in PL?

benton2010 - Realizability and Compositional Compiler Correctness for a Polymorphic Language

nakano2000 - A Modality for Recursion

The A Means A

What does the A mean?

Why does the A matter?

Where does the A come from??

What is peer reviewing?

The Scientific Aspect of Reviewing

Correctness

Originality

Clarity

Relevance

Significance

The Social Aspect of Reviewing

The Syllabus

What is the Point of a Final Exam

What is the point of a final?

Credentialing

Teaching

Is a final exam right for my course?

For Teaching

For Credentialing

An Example

Enabling CORS for nginx WebDAV and CalDAV reverse-proxy

Table of Contents

1 TLDR

2 CORS Requests and Responses

2.1 Preflight

2.1.1 Preflight Request

2.1.2 Preflight Response

2.2 Cross-Origin Requests

3 Configuring nginx

3.1 Configure Valid Cross-Origin Hosts

3.2 Configure CORS Headers

3.3 Process CORS Requests

4 Conclusion and Debugging

A Suitable Cutlery Tray

What is the optimal arrangment of cutlery in a drawer?

Locking down your browser to defend yourself from rickroll

Setting up your backup service

Table of Contents

1 Introduction

2 Install Prerequisite Software