Blog
I've moved to Berkeley
A quick life update.
Vivaria: METR's platform for evaluating AI agents
METR open-sourced its platform for evaluating AI agents. It's called Vivaria.
The METR Task Standard
At METR, I've helped define a standard for tasks that evaluate language model agents for autonomous capabilities.
Dangerous capabilities evaluations for AI
A talk I gave at meetups of Toronto AI Safety and the Wisconsin AI Safety Initiative.
SSH multiplexing gotchas
By default, you can only multiplex so many sessions. And what happens if you change sshd_config?
I'm joining ARC Evals
Why I think this role at this organization will let me meaningfully reduce AI x-risk.
Pharmacies are restricting the Bridge Access Program to those without insurance
I hoped to receive a free COVID vaccine. Now, that looks less likely.
Reproducing ARC Evals' recent report on language model agents
I was able to build an agent roughly as capable as ARC Evals'.
I’m leaving my job. Next, AI x-risk
Why I’m leaving and why I want to help reduce the risk of humanity going extinct because of AI.
Creating an AI safety chatbot using LangChain and GPT-3
chat-langchain and ChatGPT made it easy.
100 days of learning Vietnamese
The hard parts for me: pronunciation, tones, and vocabulary.
Practicing for systems design interviews with ChatGPT
It turns out that ChatGPT isn't a bad interviewer.
Kotlin's in and out keywords
Writing down my understanding of these keywords so I don't forget later.
Focusing is debugging for the brain
Both involve gathering data, generating hypotheses, and seeing how one fits the other.
Kill the Newsletter
I found a tool that converts email newsletters into RSS feeds.
Mastering Workman
Reflecting on the past 16 months of using the Workman keyboard layout.
Editing inline code blocks
Why I find it easier in some productivity apps than others.
Predictions on Twitch
Recording my initial thoughts on Twitch's new Channel Point Predictions feature.
Subtracting from the blob
Creative work is subtractive. How does this affect my career development?
Misleading with percent changes of percentages
If I say unemployment increased by 2%, is that a big deal or not?
Jumping in the deep end with Workman
Documenting how I started to use the Workman keyboard layout full-time.
Understanding quine-central: the source code
An explanation of how quine-central generates quine loops.
RC day 5
Reflecting on my fifth and final day at RC.
RC day 4
Reflecting on my fourth day at RC.
Understanding quine-central: how do quine loops work?
An explanation of how the quine loops generated by quine-central work.
RC day 3
Reflecting on my third day at RC.
RC day 2
Reflecting on my second day at RC.
RC day 1
Reflecting on my first day at RC.
My goals for RC
I'm participating in a Recurse Center mini-batch. Here are my goals.
Brushing my teeth
Committing to taking control of my oral hygiene and focusing on reality instead of feeling sorry for myself.
PipedInputStream and PipedOutputStream gotchas
A couple of gotchas I encountered while using Java's PipedInputStream and PipedOutputStream classes.
Grayscale screens
My experience using grayscale mode on my phone and laptop.
Using Workman
An update on my progress towards using the Workman keyboard layout full-time.
This blog has an RSS feed
Detailing the script I wrote to generate an RSS feed for my blog and why I wrote it.
Project retrospective: EDM Scraper
Looking back on EDM Scraper, a service that sends me a daily email of nearby concerts.
Learning Workman
My experience after two weeks of learning to type using the Workman keyboard layout.
Unlocking SSH keys using pass
My setup for automatically unlocking SSH keys with passwords stored in pass, the standard Unix password manager.
Rate-limit right before allocating
A principle for when in a request’s lifecycle to rate-limit, both ensuring the rate limit is effective and preventing poor user experiences.
A small mindfulness win
How in one particular situation I used mindfulness to counteract social anxiety about meeting new people.
What would we do if we didn't do code review?
Grouping code review's benefits into three categories and suggesting that another software development practice could partially replace it.