Bookmarks Menu
Mozilla Firefox
- Get Help
- Customize Firefox
- Get Involved
- About Us
Bookmarks Toolbar
- arxiv-sanity
- Modal Mapping / Learn Vimscript the Hard Way
foundations
- Applied Linear Algebra Lecture Series – LessWrong
- Six (and a half) intuitions for SVD – LessWrong
- Some ML-Related Math I Now Understand Better – LessWrong
- Streamlit
- Linear Algebra for Programmers | Build a foundation for Machine Learning, AI and other areas
- Maximum likelihood estimation for the beer example model - YouTube
- Why is a likelihood not a probability distribution? - YouTube
- QuantFinance/notebooks/Module 2.3.1 - Statistical Significance.ipynb at master · PythonCharmers/QuantFinance · GitHub
- [1604.06737] Entity Embeddings of Categorical Variables
- Mode Collapse and the Norm One Principle – LessWrong
- Basic Facts about Language Model Internals – LessWrong
- Residual stream norms grow exponentially over the forward pass – LessWrong
- [1706.02515] Self-Normalizing Neural Networks
- [2303.01610] Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers
- Shtetl-Optimized » Blog Archive » The First Law of Complexodynamics
- Taking the parameters which seem to matter and rotating them until they don't – LessWrong
- A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features'
- [2403.20284v1] LayerNorm: A key component in parameter-efficient fine-tuning
- Superposition, Memorization, and Double Descent
- Softmax Linear Units
- Toy Models of Superposition
- [1805.07091] Tropical Geometry of Deep Neural Networks
- Singular Learning Theory - LessWrong
- [2008.02217] Hopfield Networks is All You Need
- arxiv-sanity
- Ch 9: Convolutional Networks - YouTube
- 2003.04887v2.pdf
- [1603.07285] A guide to convolution arithmetic for deep learning
- Jensen–Shannon Divergence - Notes on AI
- Kaiji Opening 01 HD - Mirai wa Bokura no Te no Naka - YouTube
- GitHub - sshuttle/sshuttle: Transparent proxy server that works as a poor man's VPN. Forwards over ssh. Doesn't require admin. Works with Linux and MacOS. Supports DNS tunneling.
- I Wrote a Faster Sorting Algorithm | Probably Dance
- Deconvolution and Checkerboard Artifacts
- fast.ai – AdamW and Super-convergence is now the fastest way to train neural nets
- https://www.youtube.com/watch?feature=shared&t=1542&v=0_BBRNYInx8
- 1902.05522v2.pdf
advanced transformer
- A Walkthrough of A Mathematical Framework for Transformer Circuits - YouTube
- [1711.03953] Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
- Softmax Bottleneck Makes Language Models Unable to Represent Multi-mode Word Distributions - ACL Anthology
- [2404.07647] Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck
- Basic Facts about Language Model Internals – LessWrong
- LLM.int8() and Emergent Features – Tim Dettmers
- Because of LayerNorm, Directions in GPT-2 MLP Layers are Monosemantic – LessWrong
- [2203.11355] Origami in N dimensions: How feed-forward networks manufacture linear separability
- [1805.06576] Mad Max: Affine Spline Insights into Deep Learning
- [1609.09106] HyperNetworks
- [1607.06450] Layer Normalization
- Estimating efficiency improvements in LLM pre-training – LessWrong
- [2302.01834] Coinductive guide to inductive transformer heads
- Re-Examining LayerNorm – LessWrong
- Interpreting Neural Networks through the Polytope Lens – LessWrong
- Branch Specialization
- Activation Atlas
- What are the tradeoffs of increasing the depth of a neural network? - Quora
- Intro to Optimization in Deep Learning: Busting the Myth About Batch Normalization
- Keeping up with AGI
- GradIEEEnt half decent
- deep learning - Why do ResNets avoid the vanishing gradient problem? - Artificial Intelligence Stack Exchange
- 1711.00165v3.pdf
- A Visual Exploration of Gaussian Processes
- [1806.07572] Neural Tangent Kernel: Convergence and Generalization in Neural Networks
- [2012.00152] Every Model Learned by Gradient Descent Is Approximately a Kernel Machine
- [2212.13881] Mechanism of feature learning in deep fully connected networks and kernel machines that recursively learn features
- [2111.00034] Neural Networks as Kernel Learners: The Silent Alignment Effect
- OyedotunAl IsmaeilAouada_NeurocomputingJournal.pdf
- Extending Context is Hard...but not Impossible† - Kaio
- 1906.00904v2.pdf
- Tropical Geometry of Deep Neural Networks - 1805.07091v1.pdf
- 2211.12312v1.pdf
- Keeping NN Simple by Minimizing the Description Length of the Weights - colt93.pdf
- Work dumber not smarter – LessWrong
- 1603.05027v3.pdf
- Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice - NIPS-2017-resurrecting-the-sigmoid-in-deep-learning-through-dynamical-isometry-theory-and-practice-Paper.pdf
- Lecture 4: Signal Propagation and Dynamical Isometry in Deep Neural Networks (English) - YouTube
- [2210.01245] Random orthogonal additive filters: a solution to the vanishing/exploding gradient of deep neural networks
- 2402.03332v1.pdf
counterintuitive
- The Bitter Lesson
- The Scaling Hypothesis · Gwern.net
- Deep Double Descent: Where Bigger Models and More Data Hurt
- Single Headed Attention RNN: Stop Thinking With Your Head
- Stanford CS25: V1 I Transformer Circuits, Induction Heads, In-Context Learning - YouTube
- The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
- Uncovering Mesa-Optimization Algorithms in Transformers
- On Emergent Abilities, Scaling Architectures and Large Language Models – Yi Tay
- Are Emergent Abilities of Large Language Models a Mirage?
- Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models
- Language Models Represent Space and Time
- Invitation to Imitation
- Energy Based Models
- GitHub - PythonCharmers/QuantFinance: Top training materials in quantitative finance
- othello_world/mechanistic_interpretability/tl_initial_exploration.py at master · likenneth/othello_world · GitHub
- [2404.05729] Finding Visual Task Vectors
- [2404.05405] Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
- [2106.03216] On Memorization in Probabilistic Deep Generative Models
- [2404.06773] Adapting LLaMA Decoder to Vision Transformer
- The British-American coup that ended Australian independence | John Pilger | The Guardian
- megamined/deepseek-math-7b-rl-8.0bpw-exl2 at main
- [2404.12358] From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function
- 2207.00729.pdf
- 2305.12766.pdf
- 2305.12907.pdf
- 2404.07544.pdf
- 2306.04637.pdf
- Why is in-context learning lower quality than fine-tuning? And…what if it wasn't? · Hazy Research
- On Average, You're Using the Wrong Average: Geometric & Harmonic Means in Data Analysis | by Daniel McNichol | Towards Data Science
- the Stolarsky mean - YouTube
- Riemannian manifold - Wikipedia
- Fréchet mean - Wikipedia
- Root mean square - Wikipedia
- Mean - Wikipedia
- Lesson 8 - Practical Deep Learning for Coders 2022 - YouTube
- [2404.03626v1] Training LLMs over Neurally Compressed Text
- [2404.14408] SpaceByte: Towards Deleting Tokenization from Large Language Modeling
- GitHub - neelnanda-io/TransformerLens: A library for mechanistic interpretability of GPT-style language models
- main_notes.pdf
- 6.2 Logistic Regression and the Cross Entropy Cost
- Detexify LaTeX handwritten symbol recognition
TRANSFORMERS TRANSFORMERS TRANSFORMERS
- Welcome To Colab - Colab
- Let's build GPT: from scratch, in code, spelled out. - YouTube
- The spelled-out intro to neural networks and backpropagation: building micrograd - YouTube
- gpt-dev.ipynb - Colab
- GitHub - fastai/fastbook: The fastai book, published as Jupyter Notebooks
- 10_nlp.ipynb - Colab
- 12_nlp_dive.ipynb - Colab
- 13_convolutions.ipynb - Colab
- Because of LayerNorm, Directions in GPT-2 MLP Layers are Monosemantic – LessWrong
- A Mathematical Framework for Transformer Circuits
- Transformer Circuits Thread
- What is a Transformer? (Transformer Walkthrough Part 1/2) - YouTube
- Implementing GPT-2 From Scratch (Transformer Walkthrough Part 2/2) - YouTube
- Interpreting Neural Networks through the Polytope Lens – LessWrong
- Some ML-Related Math I Now Understand Better – LessWrong
- Re-Examining LayerNorm – LessWrong
- Mechanistic Interpretability Quickstart Guide – Neel Nanda
- Concrete Steps to Get Started in Transformer Mechanistic Interpretability – Neel Nanda
- main_notes.pdf
- 6.2 Logistic Regression and the Cross Entropy Cost
- Stanford CS25: V2 I Introduction to Transformers w/ Andrej Karpathy - YouTube
- But what is a GPT? Visual intro to transformers | Chapter 5, Deep Learning - YouTube
- Discord | Your Place to Talk and Hang Out
- NumPy quickstart – NumPy v1.26 Manual
- Exploring Low Rank Training of Deep Neural Networks - 2209.13569.pdf
- Increasing Learning Efficiency of Self-Attention Networks through Direct Position Interactions, Learnable Temperature, and Convoluted Attention - 2020.coling-main.324.pdf
- GitHub - EGjoni/DRUGS: Stop messing around with finicky sampling parameters and just use DRµGS!
TRANSFORMERS2
- Let's build GPT: from scratch, in code, spelled out. - YouTube
- 14_resnet.ipynb - Colab
- 15_arch_details.ipynb - Colab
- 16_accel_sgd.ipynb - Colab
- 18_CAM.ipynb - Colab
- 10_nlp.ipynb - Colab
- 12_nlp_dive.ipynb - Colab
- Because of LayerNorm, Directions in GPT-2 MLP Layers are Monosemantic – LessWrong
- A Mathematical Framework for Transformer Circuits
- What is a Transformer? (Transformer Walkthrough Part 1/2) - YouTube
- Implementing GPT-2 From Scratch (Transformer Walkthrough Part 2/2) - YouTube
- Einops tutorial, part 1: basics - Einops
- Concrete Steps to Get Started in Transformer Mechanistic Interpretability – Neel Nanda
- Streamlit
- [1.1] Transformer from Scratch (exercises).ipynb - Colab
- main_notes.pdf
- 6.2 Logistic Regression and the Cross Entropy Cost
- pyvene/tutorials/advanced_tutorials/DAS_Main_Introduction.ipynb at main · stanfordnlp/pyvene · GitHub
- 5613__what_learning_algorithm_is_in.pdf
- MATS Program
- ANT-intro - Dropbox
- 2112.04426v3.pdf
- The Illustrated Transformer – Jay Alammar – Visualizing machine learning one concept at a time.
- Kullback-Leibler Divergence Explained – Count Bayesie
- 2012.14913v2.pdf
- [2404.15758] Let's Think Dot by Dot: Hidden Computation in Transformer Language Models
- Neural Networks – PyTorch Tutorials 2.3.0+cu121 documentation
- Transformer Inference Arithmetic | kipply's blog
- GitHub - konstmish/prodigy: The Prodigy optimizer and its variants for training neural networks.
- A basic introduction to NumPy's einsum โ ajcr โ Haphazard investigations
- [2404.12224v1] Length Generalization of Causal Transformers without Position Encoding
- You should (usually) log transform your positive data | Statistical Modeling, Causal Inference, and Social Science
- Simple Linear Regression
- 2404.17625v1.pdf
- Xournal++ - Xournal++
- Differential Manifolds (Pure and Applied Mathematics, Volume 138): Kosinski, Antoni A.: 9780124218505: Amazon.com: Books
- kipply's blog
- 2311.01460v1.pdf
- 2203.00555v1.pdf
- NeurIPS-2023-vanillanet-the-power-of-minimalism-in-deep-learning-Paper-Conference.pdf
- [2302.10322] Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation
- [2402.16788] Why Transformers Need Adam: A Hessian Perspective
- [2305.02790] BranchNorm: Robustly Scaling Extremely Deep Transformers
- 2203.03466v2.pdf
- A Comprehensive Mechanistic Interpretability Explainer & Glossary – Neel Nanda
- Transformer Inference Arithmetic | kipply's blog
- Untitled13.ipynb - Colab
- Linear Transformers Are Secretly Fast Weight Memory Systems (Machine Learning Paper Explained) - YouTube
- Deep Networks Are Kernel Machines (Paper Explained) - YouTube
- Predictive Coding Approximates Backprop along Arbitrary Computation Graphs (Paper Explained) - YouTube
- Yann LeCun - Self-Supervised Learning: The Dark Matter of Intelligence (FAIR Blog Post Explained) - YouTube
- ⚡Lightning AI - Pricing
- Intrinsic dimension of data representations in deep neural networks - NeurIPS-2019-intrinsic-dimension-of-data-representations-in-deep-neural-networks-Paper.pdf
- Welcome to Spinning Up in Deep RL! – Spinning Up documentation
- [Prob&Stats] 3. Expected Value, Variance, and Standard Deviation | by jun94 | jun-devpBlog | Medium
- Why Momentum Really Works
- DL-Tutorial-NIPS2015.pptx - DL-Tutorial-NIPS2015.pdf
- machine learning - Why do we have to divide by 2 in the ML squared error cost function? - Data Science Stack Exchange
- Writing Custom Datasets, DataLoaders and Transforms – PyTorch Tutorials 2.3.0+cu121 documentation
- 2304.14802v1.pdf
- Bayes theorem, the geometry of changing beliefs - YouTube
- Bayes Theorem & Confusion Matrix. Bayes' theorem is crucial for… | by Max Kleiner | Medium
- Ben Lambert - YouTube
- A Student's Guide to Bayesian Statistics - YouTube
llms
- MiniCPM: Unveiling the Potential of End-side Large Language Models
- Stop Shuffling
- [2402.01878] LiPO: Listwise Preference Optimization through Learning-to-Rank
- [2402.13753] LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
- OLMO
- Yi Pretraining
- Question · Issue #6 · pratyushasharma/laser · GitHub
- main : add Self-Extend support by ggerganov · Pull Request #4815 · ggerganov/llama.cpp · GitHub
- Quiet-STaR
- AstraMindAI/AstraQuasar-4B · Hugging Face
- Deepseek Pretraining
- [2312.04455] Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use
- [2305.10429] DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
- [2402.03620] Self-Discover: Large Language Models Self-Compose Reasoning Structures
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases - 2402.14905.pdf
- NEFTune
- A Review of Public Japanese Training Sets · AUGMXNT/shisa Wiki · GitHub
- Apple Researchers Unveil DeepPCR: A Novel Machine Learning Algorithm that Parallelizes Typically Sequential Operations in Order to Speed Up Inference and Training of Neural Networks : LocalLLaMA
- [2311.04254] Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation
- Sparsetral Paper
- Sparsetral Unsloth
- Layer removal
- [2403.11901] Larimar: Large Language Models with Episodic Memory Control
- [2311.10770] Exponentially Faster Language Modelling
- Tool Generalization
- [2401.07004] Extending LLMs' Context Window with 100 Samples
bestmedia
- Antimemetics Division Hub - SCP Foundation
- Starting Over: Chapters 1 to 23 (by Sugaru Miaki) - vgperson's Novel Translations
- About the Reckless Girl who kept Challenging a Reborn Man like Me | Inlitify OA
- Animal Farm
- Episode 1 | The Ember Knight
- Lelouch Checkmate's Schneizel - Code Geass [HD] - YouTube
- Code Geass: The Evil Paradox - YouTube
- How the US Government took control of the Internet | The Hacker Crackdown - YouTube
- DEF CON 22 - Don't Fuck it Up! - YouTube
- When Cybercriminals with Good OpSec Attack - YouTube
- Paul Le Roux - Wikipedia
- Surprisingly Turing-Complete · Gwern.net
- Death Note: L, Anonymity & Eluding Entropy · Gwern.net
- proceedings.pdf
- qmailsec.dvi - qmailsec-20071101.pdf
- The Melancholy of Subculture Society · Gwern.net
- Laws of Tech: Commoditize Your Complement · Gwern.net
- Timing Technology: Lessons From The Media Lab · Gwern.net
- The Monkey On Wall Street - Watch BEFORE You Invest - YouTube
- Don't Talk to the Police - YouTube
- The Rules for Rulers - YouTube
- How to be a Pirate Quartermaster. - YouTube
- Apple's $3500 Nightmare - YouTube
- Why Have Modular Smartphones Failed? - YouTube
- Why we hate engineers - YouTube
- What if Japan became Christian in 1637? - YouTube
- postpostmodernism - YouTube
- How To Criticize Computer Scientists
- This Blog is Systematic: If you're so smart, how come you're not SBF? The Kelly criterion and choice of expectations and utility function when bet sizing
- The Microprice: Estimating the fair price, given the state of the order book. - YouTube
- The Financial Mathematics of Market Liquidity: From Optimal Execution to ... - Olivier Gueant - Google Books
- Usability for Nerds - Wikibooks, open books for an open world
- Pluralistic: Tiktok's enshittification (21 Jan 2023) – Pluralistic: Daily links from Cory Doctorow
- The Only Unbreakable Law - YouTube
- George Hotz | Researching | documenting the AMD 7900XTX so we can understand why it crashes | RDNA 3 - YouTube
- William F. Baker: "On the Harmony of Theory and Practice in the Design of Tall Buildings" - YouTube
- Growtopia | Official Website
- Hi Score Girl - Wikipedia
- novena by cecile richard
- Shattered Pixel Dungeon - Shattered Pixel
- Nebulous.io - Apps on Google Play
- The Blockheads | Exploration, creation and survival sandbox game for iOS and Android
- Rikei ga Koi ni Ochita no de Shoumei shitemita. - MyAnimeList.net
startup
- Avoid blundering: 80% of a winning strategy
- comma ai | Three Stories - The Past, Future, and Present | George Hotz Announcing comma 3X COMMA_CON - YouTube
- Sam Altman - How to Succeed with a Startup - YouTube
- Mark Zuckerberg at Startup School 2013 - YouTube
- Lecture 3 - Before the Startup (Paul Graham) - YouTube
- Lean Game Server
- 31 nooby C++ habits you need to ditch - YouTube
- Stanford CS 25 | Transformers United
- Apart Sprints
- Representation Engineering: A Top-Down Approach to AI Transparency
- 2311.05908.pdf
- Three interpretations of matrix products
- Smart animate layers between frames – Figma Learn - Help Center
- GitHub - jacobhilton/deep_learning_curriculum: Language model alignment-focused deep learning curriculum
- A Comprehensive Mechanistic Interpretability Explainer & Glossary - Dynalist
- Mechanistic Interpretability Quickstart Guide – Neel Nanda
- Concrete Steps to Get Started in Transformer Mechanistic Interpretability – Neel Nanda
- ThursdAI - The top AI news from the past week | Alex Volkov | Substack
- Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU - 2403.06504.pdf
- Mathematics for Machine Learning - mml-book.pdf
- GitHub - center-for-humans-and-machines/transformer-heads: Toolkit for attaching, training, saving and loading of new heads for transformer models
- Chess-GPT's Internal World Model | Adam Karvonen
- GitHub - adamkarvonen/chess_gpt_eval: A repo to evaluate various LLM's chess playing abilities.
- Manipulating Chess-GPT's World Model | Adam Karvonen
- Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
- 2309.08600.pdf
- [2309.05858] Uncovering mesa-optimization algorithms in Transformers
- 2311.03658.pdf
- Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes - 2402.05406.pdf
- 2309.16318.pdf
- bringing down the cost of pretraining llama to $200 - Pastebin.com
- RLbook2020.pdf
- There is a Narrative Trick in this Story | Inlitify OA
rej
- WTF Happened In 1971?
- [2403.03507v1] GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
- 2307.05695.pdf
- ReLoRA - what we've learned ยท Issue #320 ยท allenai/OLMo ยท GitHub
- Financial Engineering Playground: Signal Processing, Robust Estimation, Kalman, Optimization - YouTube
- Transformers Learn In-Context by Gradient Descent - 2212.07677.pdf
- 7195_one_step_of_gradient_descent_i.pdf
- In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization - 2402.14951.pdf
- 6735_in_context_convergence_of_tran.pdf
- [2310.05249] In-Context Convergence of Transformers
- [2306.09927] Trained Transformers Learn Linear Models In-Context
- MATHAI_29_paper.pdf
- Stanford Seminar - Generalized Reversible Computing and the Unconventional Computing Landscape - YouTube
- Perspectives on diffusion – Sander Dieleman
- Permission is hereby granted | Suno
- 2404.03626.pdf
- 1906.01820.pdf
- kleene56representation.pdf
- 2404.02060.pdf
- DiJiang: Efficient Large Language Models through Compact Kernelization - 2403.19928.pdf
- AI Mathematical Olympiad - Progress Prize 1 | Kaggle
- [2311.04897] Future Lens: Anticipating Subsequent Tokens from a Single Hidden State
- [2403.16952] Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
- 2403.08763.pdf
- Can language models explore in context?
- In-Context Convergence of Transformers
- Privileged Bases in the Transformer Residual Stream
ml stash
- https://www.youtube.com/watch?v=NfnWJUyUJYU&feature=youtu.be
- https://www.youtube.com/playlist?list=PL6EF60E1027E1A10B
- [2109.12927] Faking Brownian motion with continuous Markov martingales
- [2302.01834v1] Coinductive guide to inductive transformer heads
- hywu/Camelidae-8x7B · Hugging Face
- hywu/Qwen2idae-16x14B-v1.0 · Hugging Face
- GitHub - mathllm/MathCoder: Family of LLMs for mathematical reasoning.
- GitHub - hendrycks/math: The MATH Dataset (NeurIPS 2021)
- nvidia/OpenMath-Mistral-7B-v0.1-hf · Hugging Face
- Reverse Training to Nurse the Reversal Curse
- Common 7b models already have strong math
- CoT without prompting
- Contrastive Decoding
- ReFT
- [2404.02060] Long-context LLMs Struggle with Long In-context Learning
- Mixture of Depths
- SSC-CoT
- Formalized CoT with theorem proving
- Horspool's algorithm at DuckDuckGo