DeepCoder: Distilling Multiple Large-Scale Models for Efficient and High-Quality Code Generation
Skytells AI Research, Inc.
Abstract
DeepCoder is a novel AI model designed specifically for efficient, high-quality code generation. Created at Skytells AI Research, Inc., DeepCoder integrates knowledge distilled from three state-of-the-art large language models (DeepSeek R1, LLaMA 70B, and Qwen 72B). By merging these distinct teacher models into a single, resource-friendly system, DeepCoder achieves fast inference without sacrificing code-generation quality. During fine-tuning, DeepCoder was trained on over 130k curated code samples spanning multiple programming languages. Although the model remains proprietary, it powers the SaaS platform DeepCoder.co, where developers can use its features to scaffold, develop, and deploy full-stack applications.
Introduction
Large Language Models (LLMs) have shown considerable success in a variety of natural language processing tasks, including code generation, completion, and software design. However, deploying these models in real-world environments is often challenging due to their size and computational demands. Distillation and model compression techniques can help alleviate these constraints by transferring knowledge from one or more large-scale teacher models to a smaller student model.
DeepCoder combines expertise from three specialized models:
- DeepSeek R1: Recognized for strong contextual understanding and fine-grained code completions.
- LLaMA 70B: A general-purpose language model that handles diverse coding tasks robustly.
- Qwen 72B: Specialized in long-context reasoning for large-scale software development.
Key Contributions
- Multi-teacher distillation approach merging multiple large-scale language models into a single, resource-efficient student model
- Training pipeline leveraging over 130k curated code samples across multiple programming languages
- Full-stack development support through the DeepCoder.co SaaS platform
Architecture & Training
DeepCoder is built on a transformer-based decoder architecture modified for efficient parameter usage. The training process involves three main stages:
- Logit Alignment: The student learns to match the probability distributions produced by the teacher ensemble (a sketch of this and the next stage follows the list)
- Layer Feature Matching: Intermediate hidden states from the teachers guide the student's internal representations
- Task-Specific Fine-tuning: Post-distillation optimization on curated code samples for improved generation quality
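Skytells has not published DeepCoder's training code, so the following is only a minimal PyTorch sketch of how logit alignment against a teacher ensemble and intermediate feature matching are typically combined. The uniform averaging of teacher distributions, the assumption of a shared vocabulary across teachers, the linear projection between hidden sizes, and all tensor names are illustrative assumptions, not DeepCoder's actual implementation.

```python
import torch
import torch.nn.functional as F

def distillation_losses(student_logits, teacher_logits_list,
                        student_hidden, teacher_hidden, proj, T=2.0):
    """Illustrative multi-teacher distillation losses (not DeepCoder's code).

    student_logits:      (batch, seq, vocab) logits from the student
    teacher_logits_list: list of (batch, seq, vocab) logits, one per teacher,
                         assumed to be mapped to a shared vocabulary
    student_hidden:      (batch, seq, d_student) intermediate student states
    teacher_hidden:      (batch, seq, d_teacher) states from a chosen teacher layer
    proj:                torch.nn.Linear mapping d_student -> d_teacher
    T:                   softmax temperature for logit alignment
    """
    # Stage 1 -- logit alignment: KL divergence between the student's
    # distribution and a uniform average of the teachers' distributions.
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)
    student_log_probs = F.log_softmax(student_logits / T, dim=-1)
    logit_loss = F.kl_div(student_log_probs, teacher_probs,
                          reduction="batchmean") * (T * T)

    # Stage 2 -- layer feature matching: mean-squared error between
    # projected student hidden states and the teacher's hidden states.
    feature_loss = F.mse_loss(proj(student_hidden), teacher_hidden)

    return logit_loss, feature_loss
```

In practice the two losses would be weighted and summed, and the third stage would continue with standard next-token fine-tuning on the curated code corpus.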
Training Details
| Parameter | Value |
| --- | --- |
| Batch size | 256 sequences |
| Sequence length | 1024 tokens |
| Initial learning rate | 1 × 10⁻⁴ |
| Distillation epochs | 15 |
| Fine-tuning epochs | 3–5 |
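For convenience, these hyperparameters can be collected into a single configuration object. The sketch below simply mirrors the table; the dataclass name, field names, and the concrete fine-tuning epoch count are illustrative assumptions rather than DeepCoder's actual training setup.

```python
from dataclasses import dataclass

@dataclass
class DistillationConfig:
    """Hyperparameters from the table above; names are illustrative."""
    batch_size: int = 256        # sequences per batch
    seq_length: int = 1024       # tokens per sequence
    learning_rate: float = 1e-4  # initial learning rate
    distill_epochs: int = 15     # multi-teacher distillation epochs
    finetune_epochs: int = 4     # task-specific fine-tuning (reported range: 3-5)

config = DistillationConfig()
```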
DeepCoder.co Platform
The DeepCoder model powers our SaaS platform, offering:
- AI Assistant Integration: Chat interface for code generation, debugging, and code reviews (a hypothetical request sketch follows this list)
- Full-Stack Code Generation: Automated project scaffolding for both front-end and back-end development
- In-Browser IDE: Real-time previews, integrated terminal, and environment configuration
- Git Integration: Seamless version control system integration
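DeepCoder.co's programmatic interface is not publicly documented, so the snippet below is purely a hypothetical illustration of what a chat-style code-generation request might look like; the endpoint URL, request fields, and authentication header are assumptions, not a documented API.

```python
import json
import urllib.request

# Hypothetical endpoint and payload -- not a documented DeepCoder.co API.
API_URL = "https://api.deepcoder.co/v1/generate"  # assumed URL

payload = {
    "prompt": "Create an Express endpoint that returns the current server time.",
    "language": "javascript",   # assumed field
    "max_tokens": 512,          # assumed field
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer <YOUR_API_KEY>",  # placeholder credential
    },
)

with urllib.request.urlopen(request) as response:
    result = json.loads(response.read())
    print(result.get("code", ""))  # assumed response field
```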
Future Work
- Additional Programming Languages: Expanding coverage to specialized or low-level languages
- Multimodal Integration: Incorporating design artifacts for holistic code generation
- Contextual Retrieval: Leveraging public and private code repositories
- Reinforcement Learning from Human Feedback (RLHF): Iterative improvements based on developer interactions
Licensing and Contact
While DeepCoder itself is not open-sourced, Skytells AI Research, Inc. offers various tools and utilities via the skytells-research GitHub organization.
For research collaborations, API access, or enterprise inquiries, please contact us.