Zero Budget, Full Stack: Building with Only Free LLMs

by ai-intensify

# Introduction

Remember when building full-stack applications required expensive cloud credits, costly API keys, and a team of engineers? Those days are over. In 2026, developers can build, deploy, and scale production-ready applications using nothing but free tools, including the large language models (LLMs) that power their intelligence.

The landscape has changed dramatically. Open-source models now challenge their commercial counterparts. Free AI coding assistants have evolved from simple autocomplete tools to full coding agents that can architect entire features. And perhaps most importantly, you can run cutting-edge models without spending a dime, either locally or through the generous free tiers.

In this comprehensive article, we will build a real-world application – an AI meeting notes summarizer. Users will upload voice recordings, and our app will transcribe them, extract key points and action items, and display everything in a clean dashboard, all using completely free tools.

Whether you’re a student, a bootcamp graduate, or an experienced developer looking to prototype an idea, this tutorial will show you how to take advantage of the best free AI tools available. Let's start by understanding why free LLMs work so well today.

# Understanding why free large language models work now

Just two years ago, building an AI-powered app meant budgeting for OpenAI API credits or renting expensive GPU instances. The economics have changed fundamentally.

The distinction between commercial and open-source LLMs has almost disappeared. Models like GLM-4.7-Flash from Zhipu AI show that open-source models can achieve state-of-the-art performance while remaining completely free to use. Similarly, LFM2-2.6B-Transcript was designed specifically for summarizing meetings and runs on-device with cloud-level quality.

What this means for you is that you are no longer limited to a single vendor. If one model doesn’t work for your use case, you can switch to another without changing your infrastructure.

// Joining the self-hosted movement

There is a growing preference for running AI models locally on your own hardware rather than sending data to the cloud. It’s not just about cost; it’s about privacy, latency, and control. With tools like Ollama and LM Studio, you can run powerful models on a laptop.

// Adopting the “bring your own key” model

A new category of tools has emerged: open-source applications that are free but require you to provide your own API key. This gives you ultimate flexibility. You can use the Google Gemini API (which offers hundreds of free requests per day) or run a completely local model with zero running costs.
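
Here is a minimal sketch of how a bring-your-own-key setup might pick a backend at startup: use a cloud provider if the user supplied a key, otherwise fall back to a local server. The `ZHIPU_API_KEY` variable name, the `llama3.2` model tag, and the `pick_provider` helper are illustrative assumptions; the local URL assumes an Ollama server on its default port.

```python
import os
from typing import Mapping, NamedTuple, Optional


class Provider(NamedTuple):
    name: str
    base_url: str
    model: str
    api_key: str


def pick_provider(env: Optional[Mapping[str, str]] = None) -> Provider:
    """Prefer a cloud provider when the user supplied a key, else go local."""
    env = os.environ if env is None else env
    key = env.get("ZHIPU_API_KEY", "").strip()  # hypothetical variable name
    if key:
        return Provider(
            "zhipu", "https://open.bigmodel.cn/api/paas/v4/", "glm-4-flash", key
        )
    # Assumes a local Ollama server. Local servers ignore the key, but
    # OpenAI-style clients still expect a non-empty string.
    return Provider("ollama", "http://localhost:11434/v1", "llama3.2", "not-needed")


print(pick_provider({}).name)
```

Since both endpoints are used through an OpenAI-style client in this sketch, the returned `base_url`, `model`, and `api_key` can feed the same client code, which is what makes switching providers cheap.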

# Choosing Your Free Artificial Intelligence Stack

Let's break down the best free options for each component of our application, selecting tools that balance performance with ease of use.

// Transcription Layer: Speech-to-Text

To convert audio to text, we have excellent free speech-to-text (STT) tools.

| Tool | Type | Free tier | Best for |
|---|---|---|---|
| OpenAI Whisper | Open-source model | Unlimited (self-hosted) | Accuracy, multiple languages |
| whisper.cpp | Privacy-focused implementation | Unlimited (open source) | Privacy-sensitive scenarios |
| Gemini API | Cloud API | 60 requests/minute | Rapid prototyping |

For our project, we will use Whisper, which you can run locally or through free hosted options. It supports roughly 100 languages and produces high-quality transcripts.

// Summarization and Analysis: Large Language Model

This is where you have the most options. All the options below are completely free:

| Model | Provider | Type | Specialty |
|---|---|---|---|
| GLM-4.7-Flash | Zhipu AI | Cloud (free API) | General purpose, coding |
| LFM2-2.6B-Transcript | Liquid AI | Local/on-device | Meeting summarization |
| Gemini 1.5 Flash | Google | Cloud API | Long context, free tier |
| GPT-OSS Swallow | Tokyo Tech | Local/self-hosted | Japanese/English reasoning |

For summarizing our meetings, the LFM2-2.6B-Transcript model is particularly interesting: it was trained for virtually this exact use case and runs in less than 3GB of RAM.
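
As a rough back-of-the-envelope check on that memory figure, the sketch below estimates the size of the weights alone for a 2.6-billion-parameter model at different precisions. It assumes the parameter count implied by the "2.6B" in the model name and ignores activations, the KV cache, and runtime overhead, so real usage will be higher. The `weight_gb` helper is illustrative.

```python
def weight_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the model weights in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 2.6e9  # assumed from the "2.6B" in the model name

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_gb(PARAMS, bits):.1f} GB")
```

Under these assumptions, staying below 3GB implies weights stored at roughly 8 bits or fewer per parameter. 16-bit weights alone would need about 5.2 GB, which is worth keeping in mind when choosing a precision for local inference.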

// Accelerating Development: AI Coding Assistants

Before we write a single line of code, consider the tools that help us build more efficiently within an integrated development environment (IDE):

| Tool | Free tier | Type | Key feature |
|---|---|---|---|
| humor | Completely free | VS Code extension | niche-driven, multi-agent |
| Codeium | Unlimited free | IDE extension | 70+ languages, fast inference |
| Cline | Free (BYOK) | VS Code extension | Autonomous file editing |
| Continue | Fully open source | IDE extension | Works with any LLM |
| bolt.diy | Self-hosted | Browser IDE | Full-stack generation |

Our recommendation: for this project, we will use Codeium for its unlimited free tier and speed, and keep Continue as a backup for when we need to switch between different LLM providers.

// Reviewing the Traditional Free Stack

  • Frontend: React (free and open source)
  • Backend: FastAPI (Python, free)
  • Database: SQLite (file-based, no server required)
  • Deployment: Vercel (generous free tier) + Render (for the backend)

# Reviewing the project plan

The application workflow looks like this:

  1. User uploads an audio file (meeting recording, voice memo, lecture)
  2. Backend receives the file and sends it to Whisper for transcription
  3. The transcript is sent to an LLM for summarization
  4. The LLM extracts key discussion points, action items, and decisions
  5. Results are stored in SQLite
  6. User sees a clean dashboard with transcript, summary and action items

Business Flowchart Diagram with Seven Sequential Steps | Image by author
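
The workflow above can be sketched as a small pipeline of plain functions. The `transcribe` and `summarize` callables here are stand-ins (assumptions for illustration), so the flow can be exercised without Whisper or an LLM. The real implementations are built in the steps that follow.

```python
import json
import sqlite3
from typing import Callable, Dict


def process_recording(
    audio_path: str,
    transcribe: Callable[[str], str],
    summarize: Callable[[str], Dict],
    conn: sqlite3.Connection,
) -> Dict:
    """Run steps 2-5 of the workflow: transcribe, summarize, store."""
    transcript = transcribe(audio_path)  # step 2
    result = summarize(transcript)       # steps 3-4
    conn.execute(
        "CREATE TABLE IF NOT EXISTS meetings ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, filename TEXT, "
        "transcript TEXT, summary TEXT, action_items TEXT)"
    )
    cur = conn.execute(                  # step 5
        "INSERT INTO meetings (filename, transcript, summary, action_items) "
        "VALUES (?, ?, ?, ?)",
        (audio_path, transcript, result["summary"],
         json.dumps(result["action_items"])),
    )
    conn.commit()
    return {"id": cur.lastrowid, "transcript": transcript, **result}


# Stand-in implementations so the sketch runs on its own
def fake_transcribe(path: str) -> str:
    return "Alice will send the budget by Friday."


def fake_summarize(text: str) -> Dict:
    return {"summary": text[:40], "action_items": ["Send the budget"]}


record = process_recording(
    "standup.mp3", fake_transcribe, fake_summarize, sqlite3.connect(":memory:")
)
print(record["id"], record["action_items"])
```

Passing the transcription and summarization steps in as functions keeps the pipeline independent of any one model, which matches the goal of swapping providers freely.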

// Prerequisites

  • Python 3.9+ installed
  • Node.js and npm installed
  • Basic familiarity with Python and React
  • A code editor (VS Code recommended)

// Step 1: Setting up the backend with FastAPI

First, create our project directory and set up a virtual environment:

mkdir meeting-summarizer
cd meeting-summarizer
python -m venv venv

Activate the virtual environment:

# On Windows 
venv\Scripts\activate

# On Linux/macOS
source venv/bin/activate

Install required packages:

pip install fastapi uvicorn python-multipart openai-whisper transformers torch openai

Now, create a main.py file for our FastAPI application and add this code:

from fastapi import FastAPI, File, UploadFile, HTTPException
from fastapi.middleware.cors import CORSMiddleware
import whisper
import sqlite3
import json
import os
from datetime import datetime

app = FastAPI()

# Enable CORS for React frontend
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_methods=["*"],
    allow_headers=["*"],
)

# Initialize Whisper model - using "tiny" for faster CPU processing
print("Loading Whisper model (tiny)...")
model = whisper.load_model("tiny")
print("Whisper model loaded!")

# Database setup
def init_db():
    conn = sqlite3.connect('meetings.db')
    c = conn.cursor()
    c.execute('''CREATE TABLE IF NOT EXISTS meetings
                 (id INTEGER PRIMARY KEY AUTOINCREMENT,
                  filename TEXT,
                  transcript TEXT,
                  summary TEXT,
                  action_items TEXT,
                  created_at TIMESTAMP)''')
    conn.commit()
    conn.close()

init_db()

async def summarize_with_llm(transcript: str) -> dict:
    """Placeholder for LLM summarization logic"""
    # This will be implemented in Step 2
    return {"summary": "Summary pending...", "action_items": []}

@app.post("/upload")
async def upload_audio(file: UploadFile = File(...)):
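    # Keep only the base name so a crafted filename cannot escape the working directory
    file_path = f"temp_{os.path.basename(file.filename or 'upload')}"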
    with open(file_path, "wb") as buffer:
        content = await file.read()
        buffer.write(content)
    
    try:
        # Step 1: Transcribe with Whisper
        result = model.transcribe(file_path, fp16=False)
        transcript = result["text"]
        
        # Step 2: Summarize (To be filled in Step 2)
        summary_result = await summarize_with_llm(transcript)
        
        # Step 3: Save to database
        conn = sqlite3.connect('meetings.db')
        c = conn.cursor()
        c.execute(
            "INSERT INTO meetings (filename, transcript, summary, action_items, created_at) VALUES (?, ?, ?, ?, ?)",
            (file.filename, transcript, summary_result["summary"],
             json.dumps(summary_result["action_items"]), datetime.now().isoformat())
        )
        conn.commit()
        meeting_id = c.lastrowid
        conn.close()
        
        os.remove(file_path)
        return {
            "id": meeting_id,
            "transcript": transcript,
            "summary": summary_result["summary"],
            "action_items": summary_result["action_items"]
        }
    except Exception as e:
        if os.path.exists(file_path):
            os.remove(file_path)
        raise HTTPException(status_code=500, detail=str(e))
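
The dashboard also needs to read stored meetings back. Here is a minimal sketch of a read helper, assuming the `meetings` table schema above. The `get_meeting` function is an illustrative addition, not part of the original code, and could be wrapped in a GET route.

```python
import json
import sqlite3
from typing import Optional


def get_meeting(conn: sqlite3.Connection, meeting_id: int) -> Optional[dict]:
    """Fetch one meeting row and decode its JSON-encoded action items."""
    cur = conn.cursor()
    cur.row_factory = sqlite3.Row  # lets us access columns by name
    row = cur.execute(
        "SELECT id, filename, transcript, summary, action_items, created_at "
        "FROM meetings WHERE id = ?",
        (meeting_id,),
    ).fetchone()
    if row is None:
        return None
    meeting = dict(row)
    # action_items was stored with json.dumps, so decode it back into a list
    meeting["action_items"] = json.loads(meeting["action_items"] or "[]")
    return meeting
```

Returning `None` for a missing row lets the calling route decide how to respond, for example with a 404.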

// Step 2: Integrating Open Large Language Models

Now, let’s implement the summarize_with_llm() function. We will show two approaches:

Option A: Using the GLM-4.7-Flash API (cloud, free)

from openai import OpenAI

async def summarize_with_llm(transcript: str) -> dict:
    client = OpenAI(api_key="YOUR_FREE_ZHIPU_KEY", base_url="https://open.bigmodel.cn/api/paas/v4/")
    
    response = client.chat.completions.create(
        model="glm-4-flash",
        messages=[
            {"role": "system", "content": (
                "Summarize the following meeting transcript. Respond in JSON with "
                'two keys: "summary" (a string) and "action_items" (a list of strings).'
            )},
            {"role": "user", "content": transcript}
        ],
        response_format={"type": "json_object"}
    )

    # The reply content is a JSON string; parse it into a dict
    return json.loads(response.choices[0].message.content)
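
Model output is not guaranteed to match the shape the upload handler expects, so a defensive parsing step can help. This is a sketch under assumptions: the key names `summary` and `action_items` mirror the placeholder from Step 1, and `normalize_llm_output` is a hypothetical helper.

```python
import json
from typing import Any, Dict


def normalize_llm_output(raw: str) -> Dict[str, Any]:
    """Coerce an LLM reply into {"summary": str, "action_items": [str, ...]}."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Not JSON at all: treat the whole reply as the summary
        return {"summary": raw.strip(), "action_items": []}
    if not isinstance(data, dict):
        return {"summary": str(data), "action_items": []}
    items = data.get("action_items", [])
    if isinstance(items, str):
        items = [items]  # a single item returned as a bare string
    elif not isinstance(items, list):
        items = []
    return {
        "summary": str(data.get("summary", "")),
        "action_items": [str(item) for item in items],
    }


print(normalize_llm_output('{"summary": "Budget review", "action_items": "Email Bob"}'))
```

Calling this on `response.choices[0].message.content` instead of a bare `json.loads` means a malformed reply degrades to an empty action list rather than a 500 error.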

Option B: Using LFM2-2.6B-Transcript (local, completely free)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

async def summarize_with_llm_local(transcript):
    model_name = "LiquidAI/LFM2-2.6B-Transcript"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16,
        device_map="auto"
    )
    
    prompt = f"Analyze this transcript and provide a summary and action items:\n\n{transcript}"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=500)
    
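    # Decode only the newly generated tokens, not the echoed prompt
    generated = outputs[0][inputs["input_ids"].shape[1]:]
    text = tokenizer.decode(generated, skip_special_tokens=True)
    # Return the same shape the upload handler expects
    return {"summary": text, "action_items": []}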
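
Option B as written reloads the model on every request, which is slow for a multi-gigabyte checkpoint. One way around this is to load once and reuse the result, sketched below with a generic cache. The `get_cached` helper and the stand-in loader are illustrative assumptions; in the app, the loader would wrap the `from_pretrained` calls shown above.

```python
from typing import Any, Callable, Dict

_CACHE: Dict[str, Any] = {}


def get_cached(name: str, loader: Callable[[str], Any]) -> Any:
    """Return the object for `name`, calling `loader` only on first use."""
    if name not in _CACHE:
        _CACHE[name] = loader(name)
    return _CACHE[name]


# Stand-in loader that records how often it is called
load_calls = []


def fake_loader(name: str) -> str:
    load_calls.append(name)
    return f"model<{name}>"


first = get_cached("LiquidAI/LFM2-2.6B-Transcript", fake_loader)
second = get_cached("LiquidAI/LFM2-2.6B-Transcript", fake_loader)
print(first is second, len(load_calls))
```

The same effect can be had by loading the model at module import time, as Step 1 does for Whisper. The cache simply defers the cost until the first request that needs it.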

// Step 3: Creating the React Frontend

Build a simple React frontend to interact with our API. In a new terminal, create a React app:

npx create-react-app frontend
cd frontend
npm install axios

Replace the contents of src/App.js with:

import React, { useState } from 'react';
import axios from 'axios';
import './App.css';

function App() {
  const [file, setFile] = useState(null);
  const [uploading, setUploading] = useState(false);
  const [result, setResult] = useState(null);
  const [error, setError] = useState('');

  const handleUpload = async () => {
    if (!file) { setError('Please select a file'); return; }
    setUploading(true);
    const formData = new FormData();
    formData.append('file', file);

    try {
      const response = await axios.post('http://localhost:8000/upload', formData);
      setResult(response.data);
    } catch (err) {
      setError('Upload failed: ' + (err.response?.data?.detail || err.message));
    } finally { setUploading(false); }
  };

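  return (
    <div className="App">
      <input
        type="file"
        accept="audio/*"
        onChange={(e) => { setFile(e.target.files[0]); setError(''); }}
      />
      <button onClick={handleUpload} disabled={uploading}>
        {uploading ? 'Processing...' : 'Upload'}
      </button>
      {error && <p className="error">{error}</p>}
      {result && (
        <div>
          <h2>Summary</h2>
          <p>{result.summary}</p>
          <h2>Action Items</h2>
          <ul>
            {result.action_items.map((it, i) => <li key={i}>{it}</li>)}
          </ul>
        </div>
      )}
    </div>
  );
}

export default App;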

// Step 4: Running the Application

  • Start the backend: in the project root, with your virtual environment active, run uvicorn main:app --reload
  • Start the frontend: in a new terminal, in the frontend directory, run npm start
  • Open http://localhost:3000 in your browser and upload a test audio file

Dashboard interface showing summary results. Image by author

# Deploying the app for free

Once your app is working locally, it’s time to deploy it worldwide – still for free. Render offers a generous free tier for web services. Push your code to a GitHub repository, create a new web service on Render, and use these settings:

  • Environment: Python 3
  • Build Command: pip install -r requirements.txt
  • Start Command: uvicorn main:app --host 0.0.0.0 --port $PORT

Create a requirements.txt file:

fastapi
uvicorn
python-multipart
openai-whisper
transformers
torch
openai

Note: Whisper and Transformers require significant disk space. If you reach the free tier limit, consider using a cloud API for transcription instead.

// Deploying the frontend on Vercel

Vercel is the easiest way to deploy React apps:

  • Install Vercel CLI: npm i -g vercel
  • In your frontend directory, run vercel
  • Update the API URL in App.js to point to your Render backend
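
Once the frontend is served from a Vercel domain, the backend's CORS allow-list from Step 1 must include that origin, or the browser will block the requests. Here is a minimal sketch, assuming a hypothetical `ALLOWED_ORIGINS` environment variable that holds a comma-separated list of origins:

```python
import os
from typing import List, Mapping, Optional


def allowed_origins(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Parse a comma-separated origin list, defaulting to the local dev server."""
    env = os.environ if env is None else env
    raw = env.get("ALLOWED_ORIGINS", "http://localhost:3000")
    # Drop whitespace, empty entries, and trailing slashes (origins have no path)
    return [o.strip().rstrip("/") for o in raw.split(",") if o.strip()]


print(allowed_origins({"ALLOWED_ORIGINS": "https://example.vercel.app/, http://localhost:3000"}))
```

The result can be passed as `allow_origins=allowed_origins()` in the `app.add_middleware` call, with the variable set in the Render dashboard. The `example.vercel.app` domain is a placeholder for your own deployment URL.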

// Exploring local deployment options

If you want to avoid cloud hosting altogether, you can run both the frontend and backend on your own machine and use a tool like ngrok to expose your local server temporarily.

# Conclusion

We’ve built a production-ready AI application using only free tools. Let us recap what we achieved:

  • Transcription: Used OpenAI’s Whisper (free, open-source)
  • Summarization: Leveraged GLM-4.7-Flash or LFM2-2.6B-Transcript (both completely free)
  • Backend: Built with FastAPI (free)
  • Frontend: Built with React (free)
  • Database: Used SQLite (free)
  • Deployment: Deployed to Vercel and Render (free tiers)
  • Development: Accelerated with free AI coding assistants like Codeium

The landscape for free AI development has never been more promising. Open-source models now compete with commercial offerings. Local AI tools give us privacy and control. And the generous free tiers of providers like Google and Zhipu AI let us prototype without financial risk.

Shittu Olumide is a software engineer and technical writer who is passionate about leveraging cutting-edge technologies to craft compelling narratives, with a keen eye for detail and the ability to simplify complex concepts. You can also find Shittu on Twitter.
