
About me
- I am a postdoctoral researcher at Stanford CS, supervised by Prof. Percy Liang. I will be joining Tel Aviv University CS as a faculty member in Fall 2024.
- Until recently, I conducted research around large language models as co-Chief Scientist at AI21 Labs.
- I earned my Ph.D. at the School of Computer Science and Engineering at the Hebrew University of Jerusalem, under the supervision of Prof. Amnon Shashua.
- I completed my M.Sc. degree in the Condensed Matter Physics Department at The Weizmann Institute of Science, under the supervision of Prof. Yuval Oreg.
- I have a double B.Sc. in Physics and Electrical Engineering from Tel Aviv University.
Research
News
- Sep’2023: Our paper on PAC learnability for in-context learning was accepted to NeurIPS 2023.
- Aug’2023: In-Context Retrieval-Augmented Language Models was accepted for journal publication at TACL.
- July’2023: We released a paper proposing a new automatic method for evaluating an LLM’s propensity to generate correct facts from a given corpus.
- May’2023: We released a paper concluding the largest Turing-style experiment to date, with over 10 million conversations in which humans tried to determine whether their partner was a human or a bot.
- May’2023: Parallel Context Windows was accepted to ACL 2023.
- Apr’2023: We released a theoretical paper proposing a model of adversarial prompting and misalignment attacks on LLMs.
- Apr’2023: I am excited to announce that I’ll be joining the Blavatnik School of Computer Science at Tel Aviv University as a faculty member in Fall 2024!
- Mar’2023: We released a theoretical paper proposing a PAC learning framework for in-context learning in LLMs.
- Feb’2023: We released In-Context Retrieval-Augmented Language Models, a paper studying the benefits of including retrieved documents in the LLM context window. Code is available here.
- Jan’2023: Our paper on the theoretical benefits of sub-task decomposition in seq-to-seq models was accepted to ICLR 2023.
- Dec’2022: We released Parallel Context Windows, a technique for enlarging an LLM’s context window without training. Code is available here.
- Jul’2022: Our paper on frozen large language models as readers for open domain question answering was accepted for presentation at the ICML 2022 Workshop on Knowledge Retrieval and Language Models.
- May’22: We released a paper which proposes a modular, neuro-symbolic architecture that combines large language models, external knowledge sources, and discrete reasoning.
- Apr’22: We released Standing on the Shoulders of Giant Frozen LMs, a paper which proposes using a huge LM as a backbone surrounded by supporting networks that are 1000x smaller (BERT scale) and that specialize it externally, without sacrificing performance and without changing its weights.
- Apr’22: We released a paper which proves a first positive theoretical result for learning with intermediate supervision. It theoretically motivates trending approaches such as Chain of Thought Prompting, which tackle compound problems in natural language by introducing intermediate supervision in a seq2seq manner.
- Jan’22: Our paper on the in-context learning bias in Transformer pretraining was accepted as a spotlight paper at ICLR 2022.
- Oct’21: We released a paper that theoretically establishes an in-context learning bias in Transformer architectures, and proposes kNN-Pretraining: a new paradigm for pretraining-example design.
- May’21: Our paper on expressivity bottlenecks in self-attention and their impact on Transformer architecture design across data modalities was accepted to ICML 2021.
- Apr’21: I am a recipient of the Blavatnik Prize for PhD students.
- Jan’21: Our paper on PMI-Masking was accepted as a spotlight paper at ICLR 2021.
- Sep’20: Our paper on the depth-to-width trade-off in self-attention was accepted to NeurIPS 2020.
- June’20: We released a paper shedding light on the interplay between depth and width in self-attention architectures. See the blog post for an overview of the results.
- Apr’20: Our SenseBERT paper was accepted to ACL 2020.
- Jan’20: Deep Autoregressive Models for the Efficient Variational Simulation of Many-Body Quantum Systems was accepted to Physical Review Letters.
- Aug’19: We released our SenseBERT paper, which introduces information on word senses into BERT’s pretraining.
- Feb’19: We released a paper developing specialized deep autoregressive models for the efficient simulation of quantum systems.
- Jan’19: Quantum Entanglement in Deep Learning Architectures was accepted to Physical Review Letters.
- Mar’18: We released a paper showing that prominent deep learning architectures can efficiently represent highly entangled quantum wave-functions.
- Mar’18: I am a recipient of the Adams Fellowship for Doctoral Students of the Israel Academy of Sciences and Humanities.
- Jan’18: Deep Learning and Quantum Entanglement: Fundamental Connections with Implications to Network Design was accepted to ICLR 2018.
- Jan’18: Benefits of Depth for Long-Term Memory of Recurrent Networks was accepted to an ICLR 2018 workshop.
- Oct’17: We released a paper showing that deep recurrent networks have an exponential advantage in long-term memory capacity relative to shallow ones.
- Apr’17: We released a paper connecting quantum wave-functions and convolutional networks, proposing a quantum-physics-inspired, principled approach to deep network design.
