Run AI models on your own machine: the stupid-simple guide

TL;DR

· Ollama is the easiest way to run open models locally. One install, one command per model, and a built-in API on port 11434.
· Memory decides everything: Gemma 4 E2B runs on 8 GB of RAM, GPT-OSS 20B wants a 16 GB GPU, and the big flagships need server hardware.
· For Arabic OCR, Qari-OCR v0.3 is the newest open Arabic-trained model. It is a 2B model, so it runs on a modest GPU, and you can send it your first document with a short Python script.

Running an AI model on your own machine sounds like a weekend project for specialists. It is not. In 2026 the tooling has become genuinely simple: one installer, one command to download a model, and a local API your code can call. This guide goes from an empty machine to a tested model, then to reading your first Arabic document. Nothing here needs the cloud, and nothing leaves your computer.

First: how much memory do you have?

One number decides which model you can run: memory. On a PC with a graphics card, that means GPU memory (VRAM). On a Mac with Apple Silicon, memory is shared between the CPU and GPU, so the total RAM is what counts. Find your model in the cards below, check that it fits, and note its command. These are the current versions as of June 2026: Google's Gemma 4 family (released April 2026), OpenAI's GPT-OSS open-weight models, and Qari-OCR v0.3 for Arabic documents.

Gemma 4 E2B

Memory (VRAM/RAM): 4 GB VRAM / 8 GB RAM

Download: 7.2 GB

ollama run gemma4:e2b

Runs on almost anything, even a Raspberry Pi 5.

Gemma 4 E4B

Memory (VRAM/RAM): 6 GB VRAM / 12 GB RAM

Download: 9.6 GB

ollama run gemma4:e4b

The sweet spot for ordinary laptops.

Gemma 4 26B (MoE)

Memory (VRAM/RAM): 16 GB VRAM

Download: 18 GB

ollama run gemma4:26b

Near-flagship quality; activates only 3.8B parameters per token.

Gemma 4 31B

Memory (VRAM/RAM): 20 GB VRAM

Download: 20 GB

ollama run gemma4:31b

The strongest Gemma; #3 open model on Chatbot Arena.

GPT-OSS 20B

Memory (VRAM/RAM): 16 GB VRAM or RAM

Download: 13 GB

ollama run gpt-oss:20b

OpenAI's open model; o3-mini-class reasoning.

GPT-OSS 120B

Memory (VRAM/RAM): 80 GB GPU memory

Download: 60+ GB

ollama run gpt-oss:120b

Server hardware only; o4-mini-class reasoning.

Qari-OCR v0.3 (Arabic OCR)

Memory (VRAM/RAM): ~6 GB VRAM (8-bit)

Download: ~5 GB

Model ID (Hugging Face): NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct

Newest open Arabic OCR; setup in the last section.
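The card lookup can also be written as a few lines of Python. This is only a sketch: the figures are copied from the chat-model cards above, it looks at the VRAM column only, and it treats "largest memory requirement that still fits" as a stand-in for "best choice", which is an assumption rather than a ranking from this guide. Qari-OCR is left out because it serves a different task.

```python
# Card data copied from the cards above: (name, command, GB of GPU memory needed).
# Sketch only: uses the VRAM figure and ignores the separate system-RAM figures.
CARDS = [
    ("Gemma 4 E2B", "ollama run gemma4:e2b", 4),
    ("Gemma 4 E4B", "ollama run gemma4:e4b", 6),
    ("Gemma 4 26B (MoE)", "ollama run gemma4:26b", 16),
    ("GPT-OSS 20B", "ollama run gpt-oss:20b", 16),
    ("Gemma 4 31B", "ollama run gemma4:31b", 20),
    ("GPT-OSS 120B", "ollama run gpt-oss:120b", 80),
]


def models_that_fit(memory_gb):
    """Return (name, command) for every card whose requirement is at or below memory_gb."""
    return [(name, cmd) for name, cmd, need in CARDS if need <= memory_gb]


def largest_fit(memory_gb):
    """Return the most demanding card that still fits, or None if nothing fits."""
    fitting = [card for card in CARDS if card[2] <= memory_gb]
    return max(fitting, key=lambda card: card[2]) if fitting else None
```

Calling models_that_fit(8) lists the two small Gemma models, and largest_fit(2) returns None because no card fits in 2 GB.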

Step 1: install Ollama

Ollama is a free tool that downloads, runs, and serves open models with zero configuration. On macOS and Windows, download the installer from ollama.com and run it. On Linux, it is one line in the terminal. You want version 0.22 or newer for Gemma 4 support.

Linux install

curl -fsSL https://ollama.com/install.sh | sh

# check it worked
ollama --version
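If you want a script to confirm the 0.22 minimum rather than reading the number by eye, something like the following could work. It is a sketch under one assumption: that the output of ollama --version contains a dotted version number somewhere. The parser simply takes the first one it finds, so it does not depend on the exact wording around it.

```python
import re
import subprocess

MIN_VERSION = (0, 22)  # from the text: 0.22 or newer for Gemma 4 support


def parse_version(text):
    """Extract the first dotted number in text as a tuple of ints, or None."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    return tuple(int(part) for part in match.group(1).split(".")) if match else None


def is_new_enough(text, minimum=MIN_VERSION):
    """True if text contains a version number at or above the minimum."""
    version = parse_version(text)
    return version is not None and version >= minimum


def installed_version_output():
    """Run `ollama --version` and return whatever it prints (stdout and stderr)."""
    result = subprocess.run(["ollama", "--version"], capture_output=True, text=True)
    return result.stdout + result.stderr
```

On a machine with Ollama installed, is_new_enough(installed_version_output()) gives a yes or no answer.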

Step 2: download and run your first model

Pick the model that fits your memory from the cards above. For a typical laptop, start with Gemma 4 E4B. The first run downloads the model (9.6 GB, so give it a few minutes), then drops you into a chat in the terminal.

Download + chat

ollama run gemma4:e4b

>>> اشرح لي ما هو النموذج اللغوي في ثلاث جمل.
# prompt: "Explain to me what a language model is, in three sentences."
# the model answers in Arabic, locally, with no internet needed
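How long the first run takes depends almost entirely on your connection. The arithmetic is simple enough to sketch; the three line speeds below are illustrative assumptions, not measurements, and the size is the 9.6 GB figure from the Gemma 4 E4B card.

```python
def download_minutes(size_gb, speed_mbit_s):
    """Minutes needed to fetch size_gb gigabytes at speed_mbit_s megabits per second."""
    megabits = size_gb * 8 * 1000  # 1 GB = 8,000 megabits in decimal units
    return megabits / speed_mbit_s / 60


# Hypothetical connection speeds, for illustration only.
for speed in (50, 100, 500):
    print(f"{speed} Mbit/s: {download_minutes(9.6, speed):.1f} min")
```

At 100 Mbit/s the download comes to roughly 13 minutes, so "a few minutes" holds for fast connections and stretches on slower ones.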

Step 3: test it from your own code

Ollama automatically serves an API on port 11434. Any language that can send an HTTP request can now use your local model. This is the moment it stops being a toy: your scripts, your internal tools, and your automations can call it like they would call a cloud API, except the data never leaves the machine.

Call the local API

curl http://localhost:11434/api/generate -d '{
  "model": "gemma4:e4b",
  "prompt": "Summarise this contract clause in one sentence: ...",
  "stream": false
}'
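The same request can be sent from Python with nothing but the standard library. This sketch mirrors the curl call above and assumes the server is running on the default port; the function names are illustrative, not part of Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="gemma4:e4b"):
    """Build the same POST request as the curl example, without sending it."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="gemma4:e4b", timeout=120):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": false the whole answer arrives in a single "response" field.
    return body["response"]
```

With the server running, print(generate("Summarise this contract clause in one sentence: ...")) prints the model's answer. Splitting the request builder from the sender keeps the payload easy to inspect and test without a live server.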

Step 4: set up Arabic OCR and read your first document

For documents, you want a model trained specifically on Arabic. The newest open one is Qari-OCR v0.3 (released on Hugging Face by NAMAA-Space), built on Qwen2-VL-2B and tuned to keep document structure: headings, tables, and layout survive into the output. An earlier Qari-OCR release holds the best published accuracy among open-source Arabic OCR models (6.1% character error rate), and v0.3 adds the structural understanding. At 2B parameters it runs on a single modest GPU. One tip from its model card: use 8-bit precision, not 4-bit, because OCR needs the fine detail.
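For readers unfamiliar with the metric, character error rate is usually defined as the edit distance between the model's output and the correct text, divided by the length of the correct text. The sketch below implements that common definition. Whether the published 6.1% figure applies extra normalisation first (for example to diacritics or whitespace) is not stated here, so treat this as the general idea rather than the benchmark's exact procedure.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum inserts, deletes, and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by the length of the reference (correct) text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, a 6.1% rate means about six character-level corrections per hundred characters of ground-truth text.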

Install the tools

python3 -m venv ocr && source ocr/bin/activate
pip install torch transformers accelerate qwen-vl-utils pillow
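Before running the OCR script, it can help to confirm that the virtual environment really has everything. This small check is a sketch that uses the import names of the packages above; note that two of them differ from their pip names (qwen-vl-utils imports as qwen_vl_utils, and pillow imports as PIL).

```python
import importlib.util

# Import names for the packages installed above.
REQUIRED = ("torch", "transformers", "accelerate", "qwen_vl_utils", "PIL")


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from missing_modules() means the environment is ready; anything else names what still needs installing.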

Read a document (first run downloads the model)

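from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

MODEL = "NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct"

# torch_dtype="auto" keeps the precision stored in the checkpoint rather than the
# float32 default of older transformers releases, which uses more memory.
# For 8-bit loading, from_pretrained also accepts
# quantization_config=BitsAndBytesConfig(load_in_8bit=True); that path needs the
# bitsandbytes package, which is not in the pip line above.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL)

# One user turn: the page image plus the instruction "Read the text in this image."
messages = [{"role": "user", "content": [
    {"type": "image", "image": "invoice-page1.jpg"},
    {"type": "text", "text": "اقرأ النص في هذه الصورة."},
]}]

# Build the chat prompt, load the image, and move the tensors to the model's device.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
images, _ = process_vision_info(messages)
inputs = processor(text=[text], images=images, return_tensors="pt").to(model.device)

# Generate, then decode only the new tokens (everything after the prompt).
out = model.generate(**inputs, max_new_tokens=2000)
print(processor.batch_decode(out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0])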

Point it at a scanned invoice, a contract page, or a customs form. The output is the Arabic text of the page, structure included. From here, the same pattern scales: swap the test image for a folder of scans, pipe the output into validation, and you have the skeleton of a real document pipeline.
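The folder version of that pattern might look like the sketch below. It assumes you have wrapped the OCR script in a function that takes an image path and returns text; that function is passed in as ocr_page, a hypothetical name, so the loop itself stays independent of the model. The empty-output check is a placeholder for whatever validation your documents need.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def ocr_folder(folder, ocr_page, out_dir):
    """Run ocr_page on every image in folder and save one .txt file per image.

    ocr_page is any callable that takes an image path (str) and returns text.
    Returns a dict mapping file name to the text, or None when output was empty.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = {}
    for image in sorted(Path(folder).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that is not a scan
        text = ocr_page(str(image))
        # Minimal validation hook: flag empty output instead of saving it silently.
        if not text.strip():
            results[image.name] = None
            continue
        (out / f"{image.stem}.txt").write_text(text, encoding="utf-8")
        results[image.name] = text
    return results
```

Any entry that comes back as None is a page to re-scan or review by hand.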

From laptop to production

Everything above runs on one machine. Moving it onto a server that survives reboots is its own short step, covered in the guide on deploying agents on a server. A production system then adds the parts that make it trustworthy: validation against your master data, human approval gates, and an audit trail. The models are the easy 20%. The engineering around them is the other 80%.
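To make those three parts concrete, here is a deliberately small sketch of how they might fit together for an invoice. Everything specific in it is hypothetical: the field names (vendor, total), the approval limit rule, and the in-memory list standing in for an audit store. A real system would use its own schema, policies, and storage.

```python
from datetime import datetime, timezone


def review_document(fields, master_vendors, approval_limit, audit_log):
    """Validate extracted fields, decide if a human must approve, and log the decision."""
    problems = []
    # Validation against master data: the vendor must already be known.
    if fields.get("vendor") not in master_vendors:
        problems.append("unknown vendor")
    total = fields.get("total")
    if not isinstance(total, (int, float)) or total < 0:
        problems.append("invalid total")

    # Human approval gate: anything with problems, or above the limit, waits for a person.
    needs_human = bool(problems) or total > approval_limit
    decision = {"needs_human": needs_human, "problems": problems}

    # Audit trail: an append-only record of what was decided and when.
    audit_log.append(
        {"at": datetime.now(timezone.utc).isoformat(), "fields": dict(fields), **decision}
    )
    return decision
```

A clean invoice under the limit passes straight through, while an unknown vendor or a large total is held for review. Either way the decision lands in the log.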
