Convert a PDF into question JSON

Friendly mode: we process up to 6 pages per run and show page-by-page progress.

PDF file

Start page

End page (optional, capped at 6)

This tool extracts what’s visible in the PDF (text + math). Images/diagrams are currently ignored.

Output

Run a conversion to see JSON here.

This section explains the JSON fields, what null means, how questions are classified, and how to use the output in your own tools.

What does each JSON field mean?

items: array of extracted content blocks (usually questions).
page: the PDF page number where the item was found.
kind: mcq, frq, or other.
number: question number/label as text (e.g., "7"). Can be null.
prompt_html: question prompt stored as HTML. Math is TeX inside \( \) or \[ \].
options: MCQ options list (label + html). null if not an MCQ.
answer: correct answer (if an answer key is visible on the page). Otherwise null.
solution_html: worked solution/explanation (only if visible on the page). Otherwise null.
page_assets: currently reserved for asset hints; images/diagrams are not extracted yet.
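
To make the field list concrete, here is a minimal sketch of reading one extracted item in Python. The sample document is hypothetical: the values are invented, and placing page on each item is an assumption based on the descriptions above. page_assets is left out because its shape is not described.

```python
import json

# Hypothetical sample output. Values are invented for illustration, and the
# per-item placement of "page" is an assumption, not a documented guarantee.
SAMPLE = r"""
{
  "items": [
    {
      "page": 3,
      "kind": "mcq",
      "number": "7",
      "prompt_html": "<p>Solve \\(x^2 = 9\\) for positive x.</p>",
      "options": [
        {"label": "A", "html": "\\(x = 3\\)"},
        {"label": "B", "html": "\\(x = 9\\)"}
      ],
      "answer": null,
      "solution_html": null
    }
  ]
}
"""

data = json.loads(SAMPLE)
first = data["items"][0]

print(first["kind"], first["number"])   # mcq 7
print(first["prompt_html"])             # TeX stays inside \( \) delimiters
print(first["answer"] is None)          # JSON null becomes Python None -> True
```

Note that number is a string ("7"), not an integer, so compare it as text.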

What does null mean?

null means the value was not present on the page, or was not clear enough to extract confidently.

Example: if the PDF page contains only questions (no answer key), then answer and solution_html will stay null.
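
When consuming the output, it can help to treat null as "unknown" rather than as an empty string. The sketch below separates items that carry an answer from those that do not. The helper name split_by_answer and the sample items are hypothetical.

```python
def split_by_answer(items):
    """Separate items with an extracted answer from those where it is null."""
    answered = [it for it in items if it.get("answer") is not None]
    unanswered = [it for it in items if it.get("answer") is None]
    return answered, unanswered


# Hypothetical items; JSON null is represented as None after parsing.
items = [
    {"number": "1", "answer": "B", "solution_html": None},
    {"number": "2", "answer": None, "solution_html": None},
]

answered, unanswered = split_by_answer(items)
print([it["number"] for it in answered])    # ['1']
print([it["number"] for it in unanswered])  # ['2']
```

The unanswered list is a natural candidate for a second run that includes the answer-key pages.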

How does kind get decided (mcq vs frq vs other)?

mcq: options are clearly present (A/B/C/D/E...).
frq: it’s still a question, but no MCQ options are present (free response / multi-part).
other: headers, cover pages, instructions, section titles, etc.
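
The rules above imply a simple consistency check you can run on your side: an mcq item should have options, and any other kind should have options set to null. This is a sketch of a consumer-side sanity check, not the extractor's own classification logic. The function name kind_problems is hypothetical.

```python
VALID_KINDS = {"mcq", "frq", "other"}


def kind_problems(item):
    """Return a list of inconsistencies between an item's kind and options."""
    problems = []
    kind = item.get("kind")
    options = item.get("options")
    if kind not in VALID_KINDS:
        problems.append(f"unknown kind: {kind!r}")
    if kind == "mcq" and not options:
        problems.append("mcq item has no options")
    if kind != "mcq" and options is not None:
        problems.append("non-mcq item has options")
    return problems


ok_item = {"kind": "frq", "options": None}
bad_item = {"kind": "mcq", "options": None}
print(kind_problems(ok_item))   # []
print(kind_problems(bad_item))  # ['mcq item has no options']
```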

How do I make sure answers and solutions get extracted?

The extractor reads what is visible in the PDF. If you want answer and solution_html, include pages with the answer key or worked solutions.

Does the tool read diagrams/images?

Not yet. The tool extracts text + math into structured JSON. Images/diagrams are currently ignored.

Need help building digital practice tools?

Copy the JSON (or download it) and paste it into your own repository/database, or feed it into an AI tool to transform it into your platform’s schema.
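
If you prefer to map the output yourself, a small script is often enough. The sketch below converts extracted items into rows for a hypothetical target schema. The target field names (source_page, stem_html, choices, and so on) are invented for illustration and should be replaced with your platform's own.

```python
def to_platform_rows(data):
    """Map extractor output to rows in a hypothetical platform schema."""
    rows = []
    for item in data.get("items", []):
        if item.get("kind") == "other":
            continue  # skip headers, cover pages, instructions, section titles
        is_mcq = item.get("kind") == "mcq"
        rows.append({
            "source_page": item.get("page"),
            "label": item.get("number"),
            "type": "multiple_choice" if is_mcq else "free_response",
            "stem_html": item.get("prompt_html"),
            # options is null for non-MCQ items, so fall back to an empty list
            "choices": {o["label"]: o["html"] for o in (item.get("options") or [])},
            "correct": item.get("answer"),              # stays None when null
            "explanation_html": item.get("solution_html"),
        })
    return rows


# Hypothetical extractor output.
data = {
    "items": [
        {"page": 1, "kind": "other", "number": None, "prompt_html": "<p>Unit 2</p>",
         "options": None, "answer": None, "solution_html": None},
        {"page": 1, "kind": "frq", "number": "1", "prompt_html": "<p>Explain.</p>",
         "options": None, "answer": None, "solution_html": None},
    ]
}

rows = to_platform_rows(data)
print(len(rows), rows[0]["type"])  # 1 free_response
```

Keeping null values as None, rather than converting them to empty strings, preserves the distinction between "no answer on the page" and "an empty answer".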

For integrations and custom output formats, email support@pdftocode.com.

Workflow preview

A quick walkthrough of the extractor: upload a PDF, get clean, reusable output you can paste directly into your database or AI tooling.

✅ Converts PDFs into structured JSON (questions, options, answers, solutions).
✅ Distinguishes MCQ from FRQ (a question with no options is classified as FRQ).
✅ Reads solutions automatically when they are present in the PDF text.
✅ Output is ready for digital practice tools, question banks, analytics, and LMS workflows.
