
NDJSON and JSON Lines: Streaming Large JSON Datasets


The Problem with Large JSON Arrays

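Most JSON parsers, including JavaScript's JSON.parse and Python's json.load, read the entire document into memory before they return anything. A 2 GB array of log entries needs at least 2 GB of RAM just to hold the raw text, and typically more once it is parsed into objects, before you see a single record. For large datasets, this is a dealbreaker.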

json
[
  { "id": 1, "event": "page_view", "url": "/home" },
  { "id": 2, "event": "click", "url": "/pricing" }
]

Because the records are wrapped in [ and ], the text is not a complete JSON value until the closing ] arrives, so a conventional parser cannot hand you any record until it has read the very last byte.
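A quick check with Python's standard json module shows the effect. The sketch below uses a shortened version of the array above (try_parse is just an illustrative helper name) and cuts it off before the closing bracket. The parser rejects the whole document even though both records arrived intact.

```python
import json

# A shortened version of the array above, then the same text cut off before "]"
complete = '[{"id": 1, "event": "page_view"}, {"id": 2, "event": "click"}]'
truncated = complete[:-1]


def try_parse(text):
    """Return the parsed value, or None if text is not one complete JSON document."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


print(len(try_parse(complete)))  # 2
print(try_parse(truncated))      # None: both records are intact, but nothing is returned
```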

What is NDJSON?

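NDJSON (Newline Delimited JSON), also called JSON Lines, is a simple convention: one complete, valid JSON value per line, with lines separated by the newline character (\n). No wrapping array. No commas between records. Each value must be serialized on a single line, so records cannot be pretty-printed; a newline inside a string value is written as the escape sequence \n.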

{ "id": 1, "event": "page_view", "url": "/home" }
{ "id": 2, "event": "click", "url": "/pricing" }
{ "id": 3, "event": "purchase", "amount": 49.99 }

Each line is independently parseable. You can process record 1 before reading record 2. You can seek to any byte offset in the file, skip ahead to the next newline, and resume reading from there. You can append records without rewriting the file.
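Resuming from the middle of a file works because every raw newline byte is a record boundary: NDJSON escapes newlines inside strings, and in UTF-8 the byte 0x0A never occurs inside a multi-byte character. Here is a minimal Python sketch, assuming a UTF-8 NDJSON file; read_from_offset and sample.ndjson are hypothetical names used only for illustration. It seeks to a byte offset, discards the partial line it landed in, and then parses normally.

```python
import json


def read_from_offset(path, offset):
    """Yield every record whose line starts at or after byte `offset`."""
    with open(path, "rb") as f:  # binary mode, so seek() accepts any byte offset
        if offset > 0:
            # Step back one byte and discard through the next newline. If that
            # byte is the newline ending the previous record, only it is
            # discarded, so a record starting exactly at `offset` is kept.
            f.seek(offset - 1)
            f.readline()
        for line in f:
            if line.strip():
                yield json.loads(line)  # json.loads accepts UTF-8 encoded bytes


# A tiny sample file; each record line here is 10 bytes long
with open("sample.ndjson", "wb") as f:
    f.write(b'{"id": 1}\n{"id": 2}\n{"id": 3}\n')

# Offset 5 lands inside the first record, so reading resumes with id 2
print([r["id"] for r in read_from_offset("sample.ndjson", 5)])  # [2, 3]
```

Stepping back one byte before discarding is the design choice that keeps the function correct when the offset already points at the start of a line.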

File Extension and MIME Type

  • Extension: .ndjson or .jsonl
  • MIME type: application/x-ndjson is widely used but not IANA-registered; jsonlines.org suggests application/jsonl
  • The JSON Lines site (jsonlines.org) uses .jsonl; the NDJSON spec uses .ndjson. Both describe effectively the same format

Reading NDJSON in Node.js

Line by line without loading the full file:

javascript
import { createReadStream } from "fs";
import { createInterface } from "readline";

const rl = createInterface({
  input: createReadStream("events.ndjson"),
  crlfDelay: Infinity,
});

for await (const line of rl) {
  if (!line.trim()) continue;
  const record = JSON.parse(line);
  console.log(record.event); // process each record as it arrives
}

Memory usage is bounded by the longest line plus a small read buffer, not by the file size.

Writing NDJSON in Node.js

javascript
import { createWriteStream } from "fs";

const out = createWriteStream("events.ndjson");
const records = [
  { id: 1, event: "page_view" },
  { id: 2, event: "click" },
];

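for (const record of records) {
  // JSON.stringify emits a single line unless given an indent argument.
  // For very large outputs, check write()'s return value and wait for "drain".
  out.write(JSON.stringify(record) + "\n");
}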
out.end();
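For comparison, here is the same writer sketched in Python. json.dumps returns a single line when no indent is given, and it escapes any newline inside a string value, so each record stays on one line. The compact separators and ensure_ascii=False are optional choices made for this sketch, not requirements of the format.

```python
import json

records = [
    {"id": 1, "event": "page_view"},
    {"id": 2, "event": "click", "note": "line one\nline two"},  # embedded newline
]

# newline="\n" disables Windows' "\n" -> "\r\n" translation on write
with open("events.ndjson", "w", encoding="utf-8", newline="\n") as out:
    for record in records:
        # No indent argument, so dumps returns a single line; the newline inside
        # the note value is written as the two-character escape \n
        out.write(json.dumps(record, ensure_ascii=False, separators=(",", ":")) + "\n")
```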

Reading NDJSON in Python

python
import json

with open("events.ndjson", "r", encoding="utf-8") as f:
    for line in f:  # the file object yields one line at a time
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        print(record["event"])
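Because every line stands alone, one corrupt record does not have to sink the whole file. The sketch below wraps the loop above in a reusable generator. iter_ndjson and its strict flag are hypothetical names chosen for illustration. The generator yields (line_number, record) pairs and either raises with the offending line number or skips the bad line.

```python
import json


def iter_ndjson(lines, strict=True):
    """Yield (line_number, record) pairs from an iterable of NDJSON lines.

    `lines` can be an open text file or any iterable of strings. With
    strict=False, malformed lines are skipped instead of raising an error.
    """
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # blank lines carry no record
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            if strict:
                raise ValueError(f"line {number}: {err.msg}") from err
            continue  # lenient mode: drop the bad line and keep reading
        yield number, record


sample = ['{"id": 1}\n', '{"id": 2,\n', "\n", '{"id": 3}\n']
print(list(iter_ndjson(sample, strict=False)))  # [(1, {'id': 1}), (4, {'id': 3})]
```

With a real file, you would loop over `iter_ndjson(f)` inside the `with open(...)` block instead of iterating `f` directly.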

Appending Records

One of NDJSON's biggest advantages is that you can append new records without touching the existing ones:

javascript
import { appendFileSync } from "fs";

function logEvent(event) {
  // Each call opens the file, appends one line, and closes it again
  appendFileSync("events.ndjson", JSON.stringify(event) + "\n");
}

This is why NDJSON is popular for log files, event streams, and audit trails.
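In Python, the equivalent relies on opening the file in append mode ("a"). This is a minimal sketch. log_event is a hypothetical name that mirrors logEvent above. Whether concurrent writers from several processes can interleave lines depends on the operating system and filesystem, so treat that as something to verify for your setup.

```python
import json


def log_event(event, path="events.ndjson"):
    """Append one event to `path` as a single NDJSON line."""
    line = json.dumps(event) + "\n"
    # Mode "a" sends every write to the current end of the file, so existing
    # records are never read or rewritten
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)


log_event({"id": 4, "event": "signup"})
log_event({"id": 5, "event": "logout"})
```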

NDJSON in HTTP APIs

You can stream NDJSON over HTTP by setting the Content-Type to application/x-ndjson and flushing each record as it's ready — the client processes records as they arrive instead of waiting for the full response.

javascript
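import express from "express";

const app = express();

// Hypothetical stand-in for your real event source: any async generator works
async function* getEventStream() {
  for (let id = 1; id <= 3; id++) yield { id, event: "tick" };
}

// Express.js streaming endpoint; the handler is async so it can use for await
app.get("/events/stream", async (req, res) => {
  res.setHeader("Content-Type", "application/x-ndjson");
  // No Content-Length is set, so Node uses chunked transfer encoding automatically

  const events = getEventStream(); // async generator
  for await (const event of events) {
    res.write(JSON.stringify(event) + "\n"); // one record per line, sent as it's ready
  }
  res.end();
});

app.listen(3000); // port chosen for illustration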
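On the receiving side, network chunks do not line up with record boundaries, and one chunk may end halfway through a record. A client therefore buffers incoming bytes and parses only up to the last newline it has seen. Here is a minimal Python sketch of that buffering logic, independent of any HTTP library. NdjsonDecoder is a hypothetical class name. You feed it whatever byte chunks your client yields.

```python
import json


class NdjsonDecoder:
    """Turn arbitrary byte chunks into parsed NDJSON records as they arrive."""

    def __init__(self):
        self._buffer = b""  # bytes received after the last newline

    def feed(self, chunk):
        """Add one chunk and return every record it completes."""
        self._buffer += chunk
        # Everything before the last b"\n" is complete; the tail may be partial
        *complete, self._buffer = self._buffer.split(b"\n")
        return [json.loads(line) for line in complete if line.strip()]

    def close(self):
        """Parse a final record that had no trailing newline, if there is one."""
        tail, self._buffer = self._buffer, b""
        return [json.loads(tail)] if tail.strip() else []


decoder = NdjsonDecoder()
# The first record is split across two chunks, as network reads often are
for chunk in [b'{"id": 1, "ev', b'ent": "click"}\n{"id"', b": 2}\n"]:
    for record in decoder.feed(chunk):
        print(record)
print(decoder.close())
```

With the requests library, for example, the chunks could come from a streamed response's iter_content(). requests also provides iter_lines(), which performs the line splitting itself.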

When to Use NDJSON vs Regular JSON

  • Use regular JSON for API responses, config files, and small-to-medium payloads where the entire document is meaningful as a unit.
  • Use NDJSON for log files, event streams, data exports, ETL pipelines, and any dataset where records are independent and the total size is unpredictable.

Real-World Uses

  • Elasticsearch bulk API uses NDJSON for indexing multiple documents in one request
  • Docker's default json-file logging driver stores each container log line as a JSON object on its own line
  • GitHub Archive stores all public GitHub events as compressed NDJSON files
  • OpenAI's fine-tuning data format is NDJSON
  • BigQuery load jobs accept newline-delimited JSON (source format NEWLINE_DELIMITED_JSON)
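To make the Elasticsearch entry in the list above concrete: a bulk request body alternates an action line with a document line, and the body must end with a newline. The sketch below only builds the body and does not send it. build_bulk_body and the index name "events" are assumptions made for illustration. Such a body would be POSTed to the _bulk endpoint with Content-Type: application/x-ndjson.

```python
import json


def build_bulk_body(index, docs):
    """Build an Elasticsearch _bulk body: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        # Action metadata: index this document under its own id
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc["id"])}}))
        lines.append(json.dumps(doc))  # the document itself, on the next line
    return "\n".join(lines) + "\n"  # the bulk API requires a final newline


body = build_bulk_body("events", [{"id": 1, "event": "page_view"}, {"id": 2, "event": "click"}])
print(body)
```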

