What kind of AI papers do I like to read?
Some time ago I was a guest on our company's podcast, and before going on I wanted to refresh my memory on what kind of papers I've been reading recently. And while I haven't abandoned my goal of reading the Sutskever 30, I noticed that none of those papers were on the list. I think this is an interesting measure of what I would like to do versus what I actually do.
A few general (obvious) background notions
The field of AI moves very fast, especially if you look at anything even remotely adjacent to LLMs. This has a few obvious implications:
- There is a lot of fresh, fertile research ground, meaning a lot of publications.
- There is a strong incentive to publish fast, as a paper on GPT-3.5 draws a much smaller audience in 2025 than it would have in 2022.
- The gap between theoretical foundations and the cutting edge is vast at the moment. All we get are glimpses.
What this implies is that there is no chance for me to read even a reasonable fraction of the papers in the fields that interest me. Nor does all relevant new research come in the form of a (slow) peer-reviewed article; a lot of fresh insight appears first in blogs and corporate whitepapers. So what I often do nowadays, when I have a set of papers on roughly the same theme that I want to understand, is shove them all into Google's NotebookLM to hear a comparative summary. Then I pick out the ones most relevant to my work for actual study. Below is a list of some of the papers that have survived this process of elimination.
Papers I've read in more or less detail
This is the main paper I keep in mind (and re-read at times) when working with LLMs. It doesn’t mean that GPT is shit, but that it is very different from humans. And different does not mean worse or bad. But it does mean that you can’t use LLMs as you would humans.
Large Language Models lack essential metacognition for reliable medical reasoning
Lack of metacognition is one of the (many) defining properties of LLMs. In particular, LLMs have somewhat sycophantic tendencies and “want” to give you an answer even if they know it’s made up. This paper is also crucial for my work on the healthcare side of AI. Again, not worse, but different.
Tracing the thoughts of a large language model
This one made waves. Similar things have been studied since the first Transformers, but a company as big as Anthropic doing it on massive LLM models makes for a very interesting read. It's less immediately applicable, but it can help shape one's conceptual space. Their “Transformer circuits” series is also really good for in-depth dives into the internal world of transformers.
By happenstance, when this came out I had just ended up reading about Representation Engineering, especially Linear Artificial Tomography.
Improving Alignment and Robustness with Circuit Breakers
This is the paper that led me to reading about Representation Engineering in the first place. The idea is that using, e.g., Representation Engineering, we can “cut off” bad behaviour. This shuts down, e.g., most adversarial attacks. More applicable, though not usually implementable by individuals.
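To give a rough feel for what a representation-level intervention looks like, here is a toy sketch of the simplest flavour: estimate a direction associated with unwanted behaviour as the difference of mean activations between two prompt sets, then project that direction out of a hidden state. This is my own illustration with made-up data, not the circuit-breakers method itself (which, as I understand it, actually retrains the model's representations to "reroute" the harmful ones).

```python
import numpy as np

# Toy stand-ins for hidden states collected at some layer of an LLM.
# In practice these would come from running the model on curated prompt sets;
# here they are just random vectors for illustration.
rng = np.random.default_rng(0)
d_model = 16
harmful_acts = rng.normal(size=(100, d_model)) + 2.0   # activations on "bad" prompts
harmless_acts = rng.normal(size=(100, d_model))        # activations on benign prompts

# One simple representation-engineering recipe: the difference of means
# gives a candidate direction associated with the unwanted behaviour.
direction = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
direction /= np.linalg.norm(direction)

def ablate(hidden_state: np.ndarray) -> np.ndarray:
    """Remove the component of a hidden state along the unwanted direction."""
    return hidden_state - np.dot(hidden_state, direction) * direction

# A new activation that leans towards the unwanted behaviour...
h = rng.normal(size=d_model) + 3.0 * direction
print(np.dot(h, direction))           # component along the direction before the intervention
print(np.dot(ablate(h), direction))   # essentially zero after projecting it out
```

Difference-of-means plus projection is the cheapest possible version; the reason circuit breakers aren't something an individual can easily replicate at home is precisely that the real method operates on the model's training, not on a single vector like this.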
Papers I've skimmed through or summarized with NotebookLM
LLM research also gives rise to better SLMs. Here they introduce ModernBERT. I am very much a fan of the BERT architecture and its variants.
Titans: Learning to Memorize at Test Time
A Google Research paper on better LLM memory management. Many different groups are trying to solve this at the moment. The core issue, to me, is that with transformer-based LLMs we are using the working memory (or "RAM", i.e. the context window) for all memory purposes. There are many approaches to what the corresponding long-term memory might be; arguably RAG is one way to go, but no single approach has become dominant.
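As a concrete illustration of the "long-term memory outside the context window" idea, here is a minimal RAG-style sketch: notes live outside the model, the most relevant ones are retrieved for a query, and only those are loaded into the prompt (the "RAM"). The note texts and prompt format are made up, and TF-IDF stands in for a real embedding model purely to keep the example self-contained.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# "Long-term memory": documents that live outside the model's context window.
notes = [
    "Titans proposes a neural long-term memory module used at test time.",
    "Transformer context windows behave like working memory, or RAM.",
    "RAG retrieves external documents and pastes them into the prompt.",
]

# TF-IDF here is just a cheap stand-in for a proper embedding model.
vectorizer = TfidfVectorizer()
note_vectors = vectorizer.fit_transform(notes)

def recall(query: str, k: int = 2) -> list[str]:
    """Fetch the k notes most similar to the query."""
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, note_vectors)[0]
    top = scores.argsort()[::-1][:k]
    return [notes[i] for i in top]

# Only the retrieved notes get loaded into the context window ("RAM").
query = "How do transformers handle memory?"
prompt = "Context:\n" + "\n".join(recall(query)) + f"\n\nQuestion: {query}"
print(prompt)
```

As I read it, the point of approaches like Titans is to move this kind of memory inside the model itself rather than bolt it on from the outside like this.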
A Robust Semantics-based Watermark for Large Language Models against Paraphrasing
A theoretical idea for how to watermark LLM-generated text. I find it fascinating that this might even be possible. I think I came across this in an ACX post?
Language Models Use Trigonometry to Do Addition
LLMs do math very differently than humans.
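To illustrate the flavour of the result (this is my loose geometric sketch, not the authors' code or the model's literal mechanism): if numbers are placed as angles on a circle, addition modulo some base becomes composition of rotations, so "adding" can be done with trigonometric identities instead of digit-by-digit carrying.

```python
import numpy as np

# Represent an integer as an angle on a circle with `base` positions.
def to_angle(n: int, base: int = 100) -> float:
    return 2 * np.pi * n / base

def from_angle(theta: float, base: int = 100) -> int:
    return round(theta * base / (2 * np.pi)) % base

# Adding numbers becomes adding angles (composing rotations on the circle),
# which is where trigonometric identities like
# cos(a + b) = cos a cos b - sin a sin b enter the picture.
def circular_add(a: int, b: int, base: int = 100) -> int:
    theta = to_angle(a, base) + to_angle(b, base)
    return from_angle(theta, base)

print(circular_add(27, 48))   # 75
print(circular_add(80, 45))   # 25, i.e. (80 + 45) mod 100
```

The detail that sticks with me is less the specific mechanism and more that the internal algorithm looks nothing like the column addition humans are taught.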
Connecting themes
It's easy to claim that "I value X", as talk is a very cheap signal. A much stronger signal is the "proof of work" kind: what I've actually spent time on. So looking back at what I actually read versus what I say I would like to read, we can probably draw some conclusions about what I actually want.
- I seem to really enjoy reading about the details of how LLMs (and SLMs) do things, about what's under the hood. This is probably related to my mathematician's background.
- The idea of the otherness of LLMs fascinates me. This does not come as a surprise: most of my LLM-related talks spend at least a bit of time on the "masked Shoggoth on the right" terminology.