Talk Title: From kernel machines to the linear representation hypothesis for monitoring and steering LLMs
Abstract: A trained Large Language Model (LLM) contains much of human knowledge. Yet, it is difficult to gauge the extent or accuracy of that knowledge, as LLMs do not always "know what they know" and may even be unintentionally or actively misleading. In this talk I will discuss feature learning and introduce Recursive Feature Machines, a powerful generalization of classical kernel methods designed for extracting relevant features from tabular data. I will demonstrate how this technique enables us to detect LLM behaviors and precisely guide them toward almost any desired concept by manipulating a fixed vector in the LLM activation space. I will also discuss how the same method allows probing whether an LLM exhibits motivated reasoning.
Misha Belkin
Mikhail Belkin is a Professor at the Halicioglu Data Science Institute and the Computer Science and Engineering Department at UC San Diego, and an Amazon Scholar. Previously, he was a Professor in the Department of Computer Science and Engineering and the Department of Statistics at The Ohio State University. He received his Ph.D. in Mathematics from the University of Chicago, where he was advised by Partha Niyogi. His research broadly spans the theory and applications of machine learning, deep learning, and data analysis.