Type, "/@say/Your message here." after the end of the URL and hit enter to leave a comment. Type, "/find-search terms here.search" to do a full text (document contents) search of any *sub*-directory. If you want to mirror the entire server contact me first; it'll save you time.

Index of /library/Computing/transformers/

Name  Size  Date
Parent directory  -  -
Unigram Algorithm_ Subword Regularization_ Improving Neural Network Translation Models with Multiple Subword Candidates_ a..>  321.8 KiB  2023-May-13 17:38
Train Short, Test Long_ Attention with Linear Biases Enables Input Length Extrapolation_arxiv2108.12409.pdf  741.2 KiB  2023-Jun-17 23:34
Ties-Merging_ Resolving Interference When Merging Models_ arxiv2306.01708v2.pdf  1.1 MiB  2024-Nov-04 23:54
The Transformer Model in Equations_ John Thickstun_ 2023.pdf  191.0 KiB  2023-Jun-24 02:24
The Poison of Alignment_ arxiv2308.13449.pdf  185.3 KiB  2023-Aug-30 14:18
The Curse of Recursion_ Training on Generated Data Makes Models Forget_ arxiv2305.17493.pdf  2.2 MiB  2023-Aug-24 19:28
The case for 4-bit precision_ k-bit Inference Scaling Laws_ arxiv2212.09720.pdf  884.7 KiB  2023-Aug-28 18:57
Steering Llama 2 via Contrastive Activation Addition_ arxiv2312.06681.pdf  27.3 MiB  2023-Dec-13 04:54
Stay on topic with Classifier-Free Guidance_ arxiv2306.17806.pdf  1.9 MiB  2023-Sep-30 04:35
SmoothQuant_ Accurate and Efficient Post-Training Quantization for Large Language Models_ arxiv2211.10438.pdf  5.1 MiB  2023-Dec-11 23:12
SentencePiece_ A simple and language independent subword tokenizer and detokenizer for Neural Text Processing_ arxiv1808.0..>  206.7 KiB  2023-May-13 17:44
RULER_ What’s the Real Context Size of Your_ arxiv2404.06654v2.pdf  642.6 KiB  2024-Jul-30 02:47
RoFormer_ Enhanced Transformer with Rotary Position Embedding_ arxiv2104.09864v4.pdf  572.6 KiB  2023-Apr-21 00:35
Photonic Matrix Computing_ From Fundamentals to Applications_ Junwei Cheng_ Hailong Zhou_ Jianji Dong_ Nanomaterials 2021...>  3.5 MiB  2023-Jul-25 17:13
Mixtral of Experts_ arxiv2401.04088.pdf  2.4 MiB  2024-Jan-09 03:21
LLaMA_ Open and Efficient Foundation Language Models_ arxiv2302.13971.pdf  709.5 KiB  2023-May-13 17:45
Llama 2_ Open Foundation and Fine-Tuned Chat Models_ arxiv2307.09288.pdf  13.0 MiB  2023-Aug-30 23:29
Landmark Attention_ Random-Access Infinite Context Length for Transformers_ arxiv2305.16300.pdf  500.2 KiB  2023-May-28 17:34
Is Cosine-Similarity of Embeddings Really About Similarity_ arxiv2403.05440.pdf  1.6 MiB  2024-Mar-12 04:07
How Good Are Low-bit Quantized LLAMA3 Models_ An Empirical Study_ arxiv2404.14047v1.pdf  260.0 KiB  2024-Apr-26 20:16
GQA_ Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints_ arxiv2305.13245.pdf  248.2 KiB  2023-Sep-01 22:34
gpt4-maybe-leaked-details-sort-of-again.txt  7.9 KiB  2023-Jul-11 03:35
GLU Variants Improve Transformer_arxiv2002.05202.pdf  106.6 KiB  2023-May-02 21:23
Fourier Position Embedding_ Enhancing Attention’s Periodic Extension for Length Generalization_ arxiv2412.17739v1.pdf  793.3 KiB  2024-Dec-26 05:15
Extending Context Window of Large Language Models via Positional Interpolation_ arxiv2306.15595.pdf  733.6 KiB  2023-Jun-29 02:06
Exponentially Faster Language Modeling_ arxiv2311.10770.pdf  230.5 KiB  2023-Nov-27 05:35
Efficient streaming language models with attention sinks_ arxiv2309.17453.pdf  11.8 MiB  2023-Oct-02 17:46
DeepSeek-R1_ Incentivizing Reasoning Capability in LLMs via Reinforcement Learning_ arxiv2501.12948v1.pdf  1.3 MiB  2025-Jan-24 01:16
Deep neural networks are robust to weight binarization and other non-linear distortions_ arxiv1606.01981.pdf  828.6 KiB  2024-Mar-02 16:08
Climbing towards Natural Language Understanding_ On Meaning Form and Understanding in the Age of Data_ Emily M Bender- Ale..>  472.2 KiB  2023-May-08 03:18
Byte Latent Transformer_ Patches Scale Better Than Tokens_ A Pagnoni_ R Pasunuru_ R Rodriguez_ J Nguyen_ B Muller_ M Li_ C..>  2.2 MiB  2024-Dec-14 04:20
Are Emergent Abilities of Large Language Models a Mirage_ arxiv2304.15004.pdf  1.8 MiB  2023-May-07 01:08
An Ultra-Low Energy Internally Analog, Externally Digital Vector-Matrix Multiplier Based on NOR Flash Memory Technology_ M..>  1.7 MiB  2023-Jul-25 17:17

generated at 12:00:32, Wed Apr 16, 2025 UTC


Terms of Use:

You may not access or use the site superkuh.com if you are not over 90 years of age. If you do not agree, then you must leave now.

The US Dept. of Justice has determined that violating a website's terms of service is a felony under CFAA 1030(a)(2)(C). Absurd, isn't it?