An Ultra-Low Energy Internally Analog, Externally Digital Vector-Matrix Multiplier Based on NOR Flash Memory Technology_ M..> | 1.7 MiB | 2023-Jul-25 17:17 |
Are Emergent Abilities of Large Language Models a Mirage_ arxiv2304.15004.pdf | 1.8 MiB | 2023-May-07 01:08 |
Byte Latent Transformer_ Patches Scale Better Than Tokens_ A Pagnoni_ R Pasunuru_ R Rodriguez_ J Nguyen_ B Muller_ M Li_ C..> | 2.2 MiB | 2024-Dec-14 04:20 |
Climbing towards Natural Language Understanding_ On Meaning Form and Understanding in the Age of Data_ Emily M Bender- Ale..> | 472.2 KiB | 2023-May-08 03:18 |
Deep neural networks are robust to weight binarization and other non-linear distortions_ arxiv1606.01981.pdf | 828.6 KiB | 2024-Mar-02 16:08 |
DeepSeek-R1_ Incentivizing Reasoning Capability in LLMs via Reinforcement Learning_ arxiv2501.12948v1.pdf | 1.3 MiB | 2025-Jan-24 01:16 |
Efficient streaming language models with attention sinks_ arxiv2309.17453.pdf | 11.8 MiB | 2023-Oct-02 17:46 |
Exponentially Faster Language Modeling_ arxiv2311.10770.pdf | 230.5 KiB | 2023-Nov-27 05:35 |
Extending Context Window of Large Language Models via Positional Interpolation_ arxiv2306.15595.pdf | 733.6 KiB | 2023-Jun-29 02:06 |
Fourier Position Embedding_ Enhancing Attention’s Periodic Extension for Length Generalization_ arxiv2412.17739v1.pdf | 793.3 KiB | 2024-Dec-26 05:15 |
GLU Variants Improve Transformer_arxiv2002.05202.pdf | 106.6 KiB | 2023-May-02 21:23 |
gpt4-maybe-leaked-details-sort-of-again.txt | 7.9 KiB | 2023-Jul-11 03:35 |
GQA_ Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints_ arxiv2305.13245.pdf | 248.2 KiB | 2023-Sep-01 22:34 |
How Good Are Low-bit Quantized LLAMA3 Models_ An Empirical Study_ arxiv2404.14047v1.pdf | 260.0 KiB | 2024-Apr-26 20:16 |
Is Cosine-Similarity of Embeddings Really About Similarity_ arxiv2403.05440.pdf | 1.6 MiB | 2024-Mar-12 04:07 |
Landmark Attention_ Random-Access Infinite Context Length for Transformers_ arxiv2305.16300.pdf | 500.2 KiB | 2023-May-28 17:34 |
Llama 2_ Open Foundation and Fine-Tuned Chat Models_ arxiv2307.09288.pdf | 13.0 MiB | 2023-Aug-30 23:29 |
LLaMA_ Open and Efficient Foundation Language Models_ arxiv2302.13971.pdf | 709.5 KiB | 2023-May-13 17:45 |
Mixtral of Experts_ arxiv2401.04088.pdf | 2.4 MiB | 2024-Jan-09 03:21 |
Photonic Matrix Computing_ From Fundamentals to Applications_ Junwei Cheng_ Hailong Zhou_ Jianji Dong_ Nanomaterials 2021...> | 3.5 MiB | 2023-Jul-25 17:13 |
RoFormer_ Enhanced Transformer with Rotary Position Embedding_ arxiv2104.09864v4.pdf | 572.6 KiB | 2023-Apr-21 00:35 |
RULER_ What’s the Real Context Size of Your_ arxiv2404.06654v2.pdf | 642.6 KiB | 2024-Jul-30 02:47 |
SentencePiece_ A simple and language independent subword tokenizer and detokenizer for Neural Text Processing_ arxiv1808.0..> | 206.7 KiB | 2023-May-13 17:44 |
SmoothQuant_ Accurate and Efficient Post-Training Quantization for Large Language Models_ arxiv2211.10438.pdf | 5.1 MiB | 2023-Dec-11 23:12 |
Stay on topic with Classifier-Free Guidance_ arxiv2306.17806.pdf | 1.9 MiB | 2023-Sep-30 04:35 |
Steering Llama 2 via Contrastive Activation Addition_ arxiv2312.06681.pdf | 27.3 MiB | 2023-Dec-13 04:54 |
The case for 4-bit precision_ k-bit Inference Scaling Laws_ arxiv2212.09720.pdf | 884.7 KiB | 2023-Aug-28 18:57 |
The Curse of Recursion_ Training on Generated Data Makes Models Forget_ arxiv2305.17493.pdf | 2.2 MiB | 2023-Aug-24 19:28 |
The Poison of Alignment_ arxiv2308.13449.pdf | 185.3 KiB | 2023-Aug-30 14:18 |
The Transformer Model in Equations_ John Thickstun_ 2023.pdf | 191.0 KiB | 2023-Jun-24 02:24 |
Ties-Merging_ Resolving Interference When Merging Models_ arxiv2306.01708v2.pdf | 1.1 MiB | 2024-Nov-04 23:54 |
Train Short, Test Long_ Attention with Linear Biases Enables Input Length Extrapolation_arxiv2108.12409.pdf | 741.2 KiB | 2023-Jun-17 23:34 |
Unigram Algorithm_ Subword Regularization_ Improving Neural Network Translation Models with Multiple Subword Candidates_ a..> | 321.8 KiB | 2023-May-13 17:38 |