Blog

The METR Task Standard

2024-03-02

At METR, I've helped define a standard for tasks that evaluate language model agents for autonomous capabilities.

Dangerous capabilities evaluations for AI

2023-12-02

A talk I gave at meetups of Toronto AI Safety and the Wisconsin AI Safety Initiative.

SSH multiplexing gotchas

2023-11-30

By default, you can only multiplex so many sessions. And what happens if you change sshd_config?

I'm joining ARC Evals

2023-11-03

Why I think this role at this organization will let me meaningfully reduce AI x-risk.

Pharmacies are restricting the Bridge Access Program to those without insurance

2023-10-08

I hoped to receive a free COVID vaccine. Now, that looks less likely.

Reproducing ARC Evals' recent report on language model agents

2023-09-01

I was able to build an agent roughly as capable as ARC Evals'.

I’m leaving my job. Next, AI x-risk

2023-08-16

Why I’m leaving and why I want to help reduce the risk of humanity going extinct because of AI.

Creating an AI safety chatbot using LangChain and GPT-3

2023-03-27

chat-langchain and ChatGPT made it easy.

100 days of learning Vietnamese

2023-03-17

The hard parts for me: pronunciation, tones, and vocabulary.

Practicing for systems design interviews with ChatGPT

2022-12-09

It turns out that ChatGPT isn't a bad interviewer.

Kotlin's in and out keywords

2022-10-23

Writing down my understanding of these keywords so I don't forget later.

Focusing is debugging for the brain

2022-05-20

Both involve gathering data, generating hypotheses, and seeing how one fits the other.

Kill the Newsletter

2022-04-24

I found a tool that converts email newsletters into RSS feeds.

Mastering Workman

2022-02-12

Reflecting on the past 16 months of using the Workman keyboard layout.

Editing inline code blocks

2022-01-11

Why I find it easier in some productivity apps than others.

Predictions on Twitch

2021-01-25

Recording my initial thoughts on Twitch's new Channel Point Predictions feature.

Subtracting from the blob

2021-01-02

Creative work is subtractive. How does this affect my career development?

Misleading with percent changes of percentages

2020-10-12

If I say unemployment increased by 2%, is that a big deal or not?

Jumping in the deep end with Workman

2020-10-09

Documenting how I started to use the Workman keyboard layout full-time.

Understanding quine-central: the source code

2020-09-27

An explanation of how quine-central generates quine loops.

RC day 5

2020-09-26

Reflecting on my fifth and final day at RC.

RC day 4

2020-09-24

Reflecting on my fourth day at RC.

Understanding quine-central: how do quine loops work?

2020-09-24

An explanation of how the quine loops generated by quine-central work.

RC day 3

2020-09-23

Reflecting on my third day at RC.

RC day 2

2020-09-22

Reflecting on my second day at RC.

RC day 1

2020-09-21

Reflecting on my first day at RC.

My goals for RC

2020-09-20

I'm participating in a Recurse Center mini-batch. Here are my goals.

Brushing my teeth

2020-09-07

Committing to taking control of my oral hygiene and focusing on reality instead of feeling sorry for myself.

PipedInputStream and PipedOutputStream gotchas

2020-07-18

A couple of gotchas I encountered while using Java's PipedInputStream and PipedOutputStream classes.

Grayscale screens

2020-07-11

My experience using grayscale mode on my phone and laptop.

Using Workman

2020-07-10

An update on my progress towards using the Workman keyboard layout full-time.

This blog has an RSS feed

2020-06-27

Detailing the script I wrote to generate an RSS feed for my blog and why I wrote it.

Project retrospective: EDM Scraper

2020-06-13

Looking back on EDM Scraper, a service that sends me a daily email of nearby concerts.

Learning Workman

2020-06-06

My experience after two weeks of learning to type using the Workman keyboard layout.

Unlocking SSH keys using pass

2020-05-29

My setup for automatically unlocking SSH keys with passwords stored in pass, the standard Unix password manager.

Rate-limit right before allocating

2020-05-26

A principle for when in a request’s lifecycle to rate-limit, so that the rate limit is effective and doesn't cause poor user experiences.

A small mindfulness win

2020-05-25

How I used mindfulness in one particular situation to counteract social anxiety about meeting new people.

What would we do if we didn't do code review?

2020-04-24

Grouping code review's benefits into three categories and suggesting that another software development practice could partially replace it.