
LEANWARE TEAM
1 x Senior Full Stack Developer
COMPANY
AI Grader for Crowns
SERVICE
Custom Software Development
COUNTRY
United States
ENGAGEMENT MODEL
Fixed Scope
CLIENT OVERVIEW
Evan Menke, from the University of Colorado (United States), was developing a platform to enable AI-powered evaluation of dental crown preparations. What began as a tool for his personal use quickly grew into a broader idea:
“What if this could support students across multiple dental schools?”
He partnered with Leanware to build a backend service capable of analyzing images of crown preparations with high accuracy. The goal was to help students self-evaluate their work and compare results alongside instructor feedback — improving learning outcomes and saving time for both students and faculty.
Tech Stack Involved
Backend: FastAPI with LangGraph for LLM workflow orchestration, PostgreSQL
Deployment: GitHub Actions → Render.com via Docker images
Version Control: GitHub
Project Management: Fixed-scope tracking with regular check-ins
Backend Implementation
Leanware delivered a backend system that integrates with multiple AI providers — including OpenRouter, OpenAI, Google, and Anthropic. The architecture allows the client to switch models seamlessly and compare accuracy across providers.
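As a rough illustration of this pattern, a small provider registry can make the model choice a configuration value rather than a code change. The sketch below is an assumption on our part, not the project's actual code; the provider names, the ModelConfig fields, and evaluate_image() are all illustrative.

```python
# Minimal sketch of a provider-agnostic model registry (illustrative only).
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ModelConfig:
    provider: str            # e.g. "openai", "anthropic", "google", "openrouter"
    model_name: str          # provider-specific model identifier
    temperature: float = 0.0

# Each provider registers a callable taking (config, prompt, image_url) and
# returning the raw evaluation text. A real implementation would wrap the
# provider SDKs or an OpenRouter-compatible HTTP client.
PROVIDERS: Dict[str, Callable[[ModelConfig, str, str], str]] = {}

def register_provider(name: str):
    def decorator(fn):
        PROVIDERS[name] = fn
        return fn
    return decorator

@register_provider("openai")
def _call_openai(config: ModelConfig, prompt: str, image_url: str) -> str:
    # Placeholder: the actual OpenAI API call is omitted in this sketch.
    return f"[openai:{config.model_name}] evaluation of {image_url}"

def evaluate_image(config: ModelConfig, prompt: str, image_url: str) -> str:
    """Route an evaluation request to whichever provider the config selects."""
    try:
        call = PROVIDERS[config.provider]
    except KeyError:
        raise ValueError(f"Unknown provider: {config.provider}")
    return call(config, prompt, image_url)
```

With this kind of indirection, comparing accuracy across providers is simply a matter of running the same prompt and image through different configurations.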
Highly Configurable Evaluation System
The backend enables full customization of:
Base prompts
Rubrics
Evaluation criteria
Score weights
This level of flexibility allows the client to adjust the grading system, fine-tune outcomes, and expand the solution as requirements evolve.
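A minimal sketch of what such a configuration could look like, assuming Pydantic models on the FastAPI side; the field names and the weighted_score() helper are illustrative assumptions, not the production schema.

```python
# Hypothetical configuration schema for the grading pipeline.
from typing import Dict, List
from pydantic import BaseModel, Field

class Criterion(BaseModel):
    name: str                    # e.g. "margin continuity"
    description: str             # rubric text included in the prompt
    weight: float = Field(gt=0)  # relative importance in the final score

class EvaluationConfig(BaseModel):
    base_prompt: str             # base/system prompt sent to the LLM
    rubric_version: str
    criteria: List[Criterion]

def weighted_score(config: EvaluationConfig, raw_scores: Dict[str, float]) -> float:
    """Combine per-criterion scores (0-100) into a single weighted total."""
    total_weight = sum(c.weight for c in config.criteria)
    return sum(
        raw_scores.get(c.name, 0.0) * c.weight for c in config.criteria
    ) / total_weight
```

Because prompts, rubrics, criteria, and weights live in data rather than code, the grading behavior can be tuned without redeploying the service.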
The backend service can also return a set of percentage-based coordinate highlights for each evaluated image, helping visualize the feedback provided.
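Returning the coordinates as percentages keeps the highlights independent of image resolution, so the front end can overlay them on any rendered size. A hypothetical shape for this data, with a pixel-conversion helper, might look like the following (an assumption, not the actual response format):

```python
# Illustrative shape for percentage-based highlight coordinates.
from typing import Tuple
from pydantic import BaseModel

class Highlight(BaseModel):
    label: str        # e.g. "insufficient occlusal reduction"
    x_pct: float      # left edge, as % of image width (0-100)
    y_pct: float      # top edge, as % of image height (0-100)
    width_pct: float
    height_pct: float

def to_pixels(h: Highlight, image_w: int, image_h: int) -> Tuple[int, int, int, int]:
    """Convert a percentage-based highlight into pixel coordinates for overlay rendering."""
    return (
        round(h.x_pct / 100 * image_w),
        round(h.y_pct / 100 * image_h),
        round(h.width_pct / 100 * image_w),
        round(h.height_pct / 100 * image_h),
    )
```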
Integration with Base44
The backend was successfully integrated with Base44, the service hosting the AI Grader’s front-end UI. This ensured secure, reliable communication between the UI and the evaluation engine.
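One common way to secure this kind of UI-to-backend communication is a shared API key validated on every request. The sketch below assumes FastAPI and a hypothetical GRADER_API_KEY environment variable; it is not a description of Base44's actual integration mechanism.

```python
# Minimal sketch of API-key protection for the evaluation endpoint.
import os
from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()

def require_api_key(x_api_key: str = Header(...)) -> None:
    """Reject requests that don't carry the key shared with the front-end host."""
    if x_api_key != os.environ.get("GRADER_API_KEY"):
        raise HTTPException(status_code=401, detail="Invalid API key")

@app.post("/evaluations", dependencies=[Depends(require_api_key)])
def create_evaluation(payload: dict) -> dict:
    # Placeholder: run the LangGraph evaluation workflow here.
    return {"status": "accepted"}
```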
Initial Model Evaluation & Reporting
To determine which models performed best, Leanware conducted initial accuracy tests across different providers. A comparative report was delivered outlining performance, limitations, and recommended configurations.
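A comparison like this can be scripted as a loop over providers against a small set of instructor-graded samples. The sketch below is illustrative only; run_evaluation stands in for a real model call, and mean absolute error is just one of several possible metrics.

```python
# Sketch of a cross-provider accuracy comparison against instructor scores.
from statistics import mean
from typing import Callable, Dict, List, Tuple

LabeledSample = Tuple[str, float]   # (image_url, instructor_score)

def compare_providers(
    samples: List[LabeledSample],
    run_evaluation: Callable[[str, str], float],   # (provider, image_url) -> model score
    providers: List[str],
) -> Dict[str, float]:
    """Return mean absolute error per provider against instructor scores."""
    report = {}
    for provider in providers:
        errors = [
            abs(run_evaluation(provider, url) - expected)
            for url, expected in samples
        ]
        report[provider] = mean(errors)
    return report

if __name__ == "__main__":
    fake = lambda provider, url: 78.0   # stand-in for a real model call
    print(compare_providers([("img1.png", 80.0)], fake, ["openai", "anthropic"]))
```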
Documentation & Code Quality
The solution includes clear configuration documentation and clean, maintainable code. This ensures future scalability, easier onboarding, and support for additional features or potential model training.
SERVICES PROVIDED
UX & UI DESIGN



RESULTS
From Blueprint to Delivery
Leanware successfully delivered a backend service capable of evaluating images of crown preparations with high reliability. Evan has been able to run evaluations, store feedback for comparison, and iterate on his rubrics as new data is collected.
The collaboration may continue as the client gathers more evaluation samples, enabling refinements to the scoring criteria and, potentially, the training of a custom model specialized in assessing dental rubrics. This would further increase accuracy and consistency for students across dental programs.
FAQ
Frequently Asked Questions
How do I benchmark AI model performance for image-based grading tasks?
Use a labeled dataset that mirrors real student submissions, test across varied lighting and resolution conditions, track rubric adherence, break down error categories, and compare cost per evaluation across model providers.
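As a small illustration, the error breakdown and cost-per-evaluation figures can be tallied directly from benchmark records; the record fields below are assumptions, not a fixed format.

```python
# Illustrative tally of error categories and cost per evaluation.
from collections import Counter
from typing import Dict, List

def summarize_run(records: List[Dict]) -> Dict:
    """Each record: {"error_category": str | None, "cost_usd": float}."""
    errors = Counter(r["error_category"] for r in records if r["error_category"])
    total_cost = sum(r["cost_usd"] for r in records)
    return {
        "error_breakdown": dict(errors),
        "cost_per_evaluation": total_cost / len(records) if records else 0.0,
    }
```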
How do I reduce AI API costs for an educational assessment platform?
Apply hybrid routing using cheaper models first, batch requests, cache results, compress images where allowed, and regularly audit model behavior to eliminate unnecessary expensive calls.
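A hedged sketch of the caching-plus-routing idea, with placeholder model names and a stand-in escalation heuristic:

```python
# Minimal sketch of result caching plus "cheap model first" routing.
import hashlib
from typing import Callable, Dict

_CACHE: Dict[str, str] = {}

def _cache_key(prompt: str, image_bytes: bytes) -> str:
    return hashlib.sha256(prompt.encode() + image_bytes).hexdigest()

def needs_escalation(result: str) -> bool:
    # Placeholder heuristic: escalate when the cheap model flags low confidence.
    return "low_confidence" in result

def evaluate_with_budget(
    prompt: str,
    image_bytes: bytes,
    call_model: Callable[[str, str, bytes], str],   # (model_name, prompt, image) -> result
) -> str:
    """Try a cheap model first, escalate only when needed, cache everything."""
    key = _cache_key(prompt, image_bytes)
    if key in _CACHE:                       # identical resubmissions cost nothing
        return _CACHE[key]
    result = call_model("cheap-vision-model", prompt, image_bytes)
    if needs_escalation(result):
        result = call_model("premium-vision-model", prompt, image_bytes)
    _CACHE[key] = result
    return result
```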
What QA processes are required to test AI evaluation consistency?
Expect benchmark datasets, cross-provider comparisons, image-quality variance testing, regression testing after model updates, and documented accuracy thresholds.
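Regression checks of this kind are straightforward to automate. The pytest sketch below assumes a hypothetical grade_image() pipeline entry point, a fixtures-based benchmark loader, and a documented agreement threshold:

```python
# Sketch of a regression test guarding against silent provider/model updates.
import pytest

ACCURACY_THRESHOLD = 0.85   # documented minimum agreement with instructor grades

def load_benchmark():
    # Placeholder: load (image_path, instructor_score) pairs from fixtures.
    return [("fixtures/prep_01.png", 80.0), ("fixtures/prep_02.png", 70.0)]

def grade_image(path: str) -> float:
    # Placeholder: in the real suite this calls the evaluation pipeline.
    return 75.0

@pytest.mark.parametrize("tolerance", [10.0])
def test_model_agreement_above_threshold(tolerance):
    samples = load_benchmark()
    within = sum(
        1 for path, expected in samples
        if abs(grade_image(path) - expected) <= tolerance
    )
    assert within / len(samples) >= ACCURACY_THRESHOLD
```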
What IP ownership clauses are essential when hiring a dev shop for AI projects?
Secure full ownership of prompts, rubric logic, evaluation workflows, architecture design, and all data pipelines to ensure long-term independence from the agency.
What portfolio evidence actually proves a dev shop has built AI evaluation systems before?
Look for real model comparison logs, accuracy audit reports, prompt-testing tools, vision pipeline artifacts, and screenshots of feedback overlays instead of generic marketing slides.
Should I hire an agency or build an in-house team for my AI EdTech platform?
Start with an agency for speed and breadth of expertise. Move in-house once your architecture is stable and you need full-time ownership and iteration.
Should I build custom AI assessment software or use existing platforms like Gradescope?
Use an existing platform if your rubrics are standard and don’t require visual overlays. Build custom if you need configurable scoring, specialized image logic, or multi-model evaluation pipelines.
What hidden costs should I plan for with AI assessment platforms?
Plan for API usage, storage for images and evaluation logs, compute for model testing, and ongoing re-evaluation whenever model providers push updates that may change output behavior.
What should I budget for AI-powered assessment software for medical or professional education?
Expect $150K–$400K due to precision requirements, specialized domain logic, advanced rubric structures, and potential custom model tuning.
Can I start with a paid pilot to test AI model accuracy before a full commitment?
Yes. A 3–4 week pilot costing $10K–$25K lets you validate model accuracy, rubric performance, image variability handling, and the team's ability to deliver consistent evaluation.
How do I assess if a dev shop's team truly has AI/ML expertise vs just API integration experience?
Ask them to walk through their prompt-testing methodology, accuracy measurement process, error classification framework, and approach to switching or comparing models. Real AI engineers can explain these clearly.
What does a realistic timeline look like from contract signing to MVP launch for AI assessment tools?
A strong team delivers an MVP in 10–16 weeks, covering model trials, rubric and prompt design, image-processing pipelines, multi-provider orchestration, and a basic instructor dashboard.
How do I evaluate if a dev shop has real experience with AI image analysis for education?
Ask for architecture diagrams, model comparison data, annotated output samples, and reproducibility testing workflows. Teams relying only on plug-and-play APIs won’t have this depth of evidence.
What's the cost difference between hiring an agency vs building an in-house team for AI grading system projects?
Agencies typically run $20K–$60K per month with immediate senior talent, while an in-house team often exceeds $35K per month for just two AI engineers—before adding recruiting time, benefits, and management overhead.
How much does it cost to hire a development team to build AI-powered educational assessment software?
Most MVPs range from $80,000 to $250,000, driven by the complexity of image analysis, rubric flexibility, and whether your system needs multi-model comparison or visual feedback overlays.
We love to take on new challenges. Tell us yours.
We'll get back to you in 1 business day, tops.

