Type "/@say/Your message here." after the end of the URL and hit enter to leave a comment. Type "/find-search terms here.search" to do a full-text (document contents) search of any *sub*directory. If you want to mirror the entire server, contact me first; it'll save you time.
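As a sketch of how those two URL conventions compose, assuming this listing lives under the base URL shown below (hypothetical; adjust to the actual page you are viewing, and percent-encode spaces before fetching with curl or a browser):

```shell
#!/usr/bin/env bash
# Hypothetical base URL for this directory listing.
base='https://superkuh.com/library/Computing/transformers'

# Comment URL: append "/@say/Your message here." to the directory URL.
say_url="$base/@say/Your message here."

# Search URL: "/find-<terms>.search" runs a full-text search of this subdirectory.
search_url="$base/find-rotary position.search"

# Spaces in the path must be percent-encoded before the request is sent.
echo "${say_url// /%20}"
echo "${search_url// /%20}"
```

Either encoded URL can then be fetched directly, e.g. `curl "$(...)"` or pasted into a browser's address bar.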

Index of /library/Computing/transformers/

Name | Size | Date
Parent directory | -- | --
RoFormer_ Enhanced Transformer with Rotary Position Embedding_ arxiv2104.09864v4.pdf | 572.6 KiB | 2023-Apr-21 00:35
GLU Variants Improve Transformer_arxiv2002.05202.pdf | 106.6 KiB | 2023-May-02 21:23
Are Emergent Abilities of Large Language Models a Mirage_ arxiv2304.15004.pdf | 1.8 MiB | 2023-May-07 01:08
Climbing towards Natural Language Understanding_ On Meaning Form and Understanding in the Age of Data_ Emily M Bender- Ale..> | 472.2 KiB | 2023-May-08 03:18
Unigram Algorithm_ Subword Regularization_ Improving Neural Network Translation Models with Multiple Subword Candidates_ a..> | 321.8 KiB | 2023-May-13 17:38
SentencePiece_ A simple and language independent subword tokenizer and detokenizer for Neural Text Processing_ arxiv1808.0..> | 206.7 KiB | 2023-May-13 17:44
LLaMA_ Open and Efficient Foundation Language Models_ arxiv2302.13971.pdf | 709.5 KiB | 2023-May-13 17:45
Landmark Attention_ Random-Access Infinite Context Length for Transformers_ arxiv2305.16300.pdf | 500.2 KiB | 2023-May-28 17:34
Train Short, Test Long_ Attention with Linear Biases Enables Input Length Extrapolation_arxiv2108.12409.pdf | 741.2 KiB | 2023-Jun-17 23:34
The Transformer Model in Equations_ John Thickstun_ 2023.pdf | 191.0 KiB | 2023-Jun-24 02:24
Extending Context Window of Large Language Models via Positional Interpolation_ arxiv2306.15595.pdf | 733.6 KiB | 2023-Jun-29 02:06
gpt4-maybe-leaked-details-sort-of-again.txt | 7.9 KiB | 2023-Jul-11 03:35
Photonic Matrix Computing_ From Fundamentals to Applications_ Junwei Cheng_ Hailong Zhou_ Jianji Dong_ Nanomaterials 2021...> | 3.5 MiB | 2023-Jul-25 17:13
An Ultra-Low Energy Internally Analog, Externally Digital Vector-Matrix Multiplier Based on NOR Flash Memory Technology_ M..> | 1.7 MiB | 2023-Jul-25 17:17
The Curse of Recursion_ Training on Generated Data Makes Models Forget_ arxiv2305.17493.pdf | 2.2 MiB | 2023-Aug-24 19:28
The case for 4-bit precision_ k-bit Inference Scaling Laws_ arxiv2212.09720.pdf | 884.7 KiB | 2023-Aug-28 18:57
The Poison of Alignment_ arxiv2308.13449.pdf | 185.3 KiB | 2023-Aug-30 14:18
Llama 2_ Open Foundation and Fine-Tuned Chat Models_ arxiv2307.09288.pdf | 13.0 MiB | 2023-Aug-30 23:29
GQA_ Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints_ arxiv2305.13245.pdf | 248.2 KiB | 2023-Sep-01 22:34
Stay on topic with Classifier-Free Guidance_ arxiv2306.17806.pdf | 1.9 MiB | 2023-Sep-30 04:35
Efficient streaming language models with attention sinks_ arxiv2309.17453.pdf | 11.8 MiB | 2023-Oct-02 17:46
Exponentially Faster Language Modeling_ arxiv2311.10770.pdf | 230.5 KiB | 2023-Nov-27 05:35
SmoothQuant_ Accurate and Efficient Post-Training Quantization for Large Language Models_ arxiv2211.10438.pdf | 5.1 MiB | 2023-Dec-11 23:12
Steering Llama 2 via Contrastive Activation Addition_ arxiv2312.06681.pdf | 27.3 MiB | 2023-Dec-13 04:54
Mixtral of Experts_ arxiv2401.04088.pdf | 2.4 MiB | 2024-Jan-09 03:21
Deep neural networks are robust to weight binarization and other non-linear distortions_ arxiv1606.01981.pdf | 828.6 KiB | 2024-Mar-02 16:08
Is Cosine-Similarity of Embeddings Really About Similarity_ arxiv2403.05440.pdf | 1.6 MiB | 2024-Mar-12 04:07
How Good Are Low-bit Quantized LLAMA3 Models_ An Empirical Study_ arxiv2404.14047v1.pdf | 260.0 KiB | 2024-Apr-26 20:16
RULER_ What’s the Real Context Size of Your_ arxiv2404.06654v2.pdf | 642.6 KiB | 2024-Jul-30 02:47
Ties-Merging_ Resolving Interference When Merging Models_ arxiv2306.01708v2.pdf | 1.1 MiB | 2024-Nov-04 23:54
Byte Latent Transformer_ Patches Scale Better Than Tokens_ A Pagnoni_ R Pasunuru_ R Rodriguez_ J Nguyen_ B Muller_ M Li_ C..> | 2.2 MiB | 2024-Dec-14 04:20
Fourier Position Embedding_ Enhancing Attention’s Periodic Extension for Length Generalization_ arxiv2412.17739v1.pdf | 793.3 KiB | 2024-Dec-26 05:15
DeepSeek-R1_ Incentivizing Reasoning Capability in LLMs via Reinforcement Learning_ arxiv2501.12948v1.pdf | 1.3 MiB | 2025-Jan-24 01:16

documents added in the last 7 days

generated at 12:00:30, Thu Apr 17, 2025 UTC

Did you write a response to something in this directory listing? What's the URL?

Terms of Use:

You may not access or use the site superkuh.com if you are not over 90 years of age. If you do not agree, then you must leave now.

The US Dept. of Justice has determined that violating a website's terms of service is a felony under CFAA 1030(a)(2)(C). Absurd, isn't it?