Aoban Paper
Abstract
Aoban 2.7-117M-HeavyL is a transformer-based language model developed by AobanZ. The model focuses on high-speed processing of simple questions, paleontology, and general conversational text while maintaining a compact parameter footprint. Aoban 2.7-117M-HeavyL was trained on a diverse dataset of texts from Microsoft Copilot and hand-written conversations.
Aoban 2.7-117M-HeavyL is designed to prioritize adaptability, expressive generation, and real-time interaction rather than strict factual reasoning. By leveraging a moderately deep transformer architecture with optimized attention mechanisms, the model aims to balance performance, efficiency, and creative flexibility.
Model Release
We officially announce the release of Aoban 2.7-117M-HeavyL, a 117-million-parameter Heavy Layer (HeavyL) model, which is now available on Hugging Face.
Architecture
Aoban 2.7-117M-HeavyL is built upon the Transformer architecture introduced in “Attention Is All You Need”. The model relies entirely on self-attention mechanisms, allowing it to capture long-range dependencies without recurrence or convolution.
The architecture consists of 16 transformer layers, each configured with 12 self-attention heads and a 768-dimensional hidden representation, a depth chosen to support coherent generation and comprehension. This design enables parallel processing of tokens and efficient utilization of attention bandwidth across different semantic subspaces.
The designation HeavyL reflects the model’s emphasis on denser internal representations per layer rather than extreme depth. This approach favors fast inference and expressive internal states over very deep stacking.
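The scaled dot-product self-attention underlying each of the layers described above can be illustrated in a few lines. This is a minimal, dependency-free sketch of the general mechanism from "Attention Is All You Need", not Aoban's actual implementation; the function names and the small example vectors are ours, and a real layer would also include learned Q/K/V projections split across the 12 heads (768 / 12 = 64 dimensions per head).

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    # scaled dot-product attention over lists of vectors:
    # each query attends to every key, and the resulting weights
    # form a convex combination of the value vectors
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out

# With identical keys, every value is weighted equally,
# so the output is the mean of the value vectors.
print(attention([[1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]], [[2.0, 0.0], [4.0, 0.0]]))
```

Because every query attends to every key in parallel, this mechanism captures long-range dependencies without the sequential bottleneck of recurrence.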
Tokenization
The model uses Byte Pair Encoding (BPE) for tokenization. BPE allows Aoban 2.7-117M-HeavyL to flexibly represent slang, abbreviations, creative spellings, emojis, and mixed-language input without requiring an excessively large vocabulary.
This tokenization strategy is particularly effective for informal and internet-based text, where strict word boundaries are often inconsistent or intentionally violated.
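The core BPE training loop can be sketched briefly: count symbol pairs across a frequency-weighted vocabulary, then merge the most frequent pair into a single symbol, and repeat. This is a minimal illustration of the general algorithm (in the style of Sennrich et al.), not Aoban's actual tokenizer; the function names and the toy vocabulary are ours.

```python
from collections import Counter

def most_frequent_pair(words):
    # words: dict mapping space-separated symbol sequences to frequencies
    pairs = Counter()
    for word, freq in words.items():
        syms = word.split()
        for a, b in zip(syms, syms[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(pair, words):
    # replace every adjacent occurrence of the pair with a merged symbol
    a, b = pair
    return {word.replace(f"{a} {b}", a + b): freq for word, freq in words.items()}

vocab = {"l o l": 5, "l o w": 2, "h e l l o": 1}
best = most_frequent_pair(vocab)   # ('l', 'o'), seen 8 times
vocab = merge_pair(best, vocab)
print(vocab)                       # 'lo' is now a single symbol
```

Repeating this merge step until a target vocabulary size is reached is what lets frequent fragments of slang, emoji sequences, and creative spellings become single tokens while rare strings still decompose into smaller units.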
Training Philosophy
Aoban 2.7-117M-HeavyL was trained with a focus on accuracy and coherency in handling basic greetings, arithmetic operations, and general message processing at high speed.
As a result, the model generates responses quickly and with less creative variation, and tends to perform best on the specific tasks it was trained for.
Capabilities & Limitations
The model excels at handling basic greetings, arithmetic operations, and general message processing at high speed. However, it may struggle with conversational grounding tasks such as intent clarification or strict instruction following.
For these reasons, lighter Aoban models (such as Aoban 1.1) may be better suited for faster interaction pipelines, while Aoban 2.7 is intended for richer information processing and more complex conversational tasks.
Conclusion
Aoban 2.7-117M-HeavyL represents a significant step in our research.