#!/usr/bin/env python3
"""
swarm_to_pea.py — Convert a Swarm/Foursquare data export into a P.E.A. v2 archive.
SAVE THIS FIRST
═══════════════
If you copied this from the web: paste it into a plain-text file named exactly
`swarm_to_pea.py`. Use a code editor like BBEdit or VS Code, or TextEdit in
*plain text* mode (not Rich Text, which breaks the quotes and indentation).
Then run it from Terminal as shown below. You need Python 3; check with
`python3 --version`. On recent macOS the first run may offer to install Apple's
Command Line Tools, which include it.
QUICK START
═══════════
First, request your data. Go to https://app.foursquare.com/settings/privacy
and hit "Initiate Data Download Request". Foursquare (the company behind Swarm)
emails you a download link when it's ready — this can take up to 72 hours.
The email links a `data-export-NNNN.zip`. Download and unzip it. You'll see a
folder full of files like `checkins1.json`, `checkins2.json`, … `photos1.json`,
etc. That folder is the **input** to this script.
From the repo root, point the script at the unzipped folder:
python3 scripts/swarm_to_pea.py "/path/to/data-export-NNNN"
That's the minimum. The script reads the `checkinsN.json` files and writes
**`pea-swarm.json` inside the same folder**, right next to the originals.
That `pea-swarm.json` file is the P.E.A. archive — the thing you then move
into the simulator (or AirDrop to your phone) and import via
**Settings → Add data to P.E.A.** (merge mode).
Worked example, the way most people would run it:
1. Unzip data-export-NNNN.zip → creates a `data-export-NNNN/` folder
2. python3 scripts/swarm_to_pea.py "data-export-NNNN"
→ writes `data-export-NNNN/pea-swarm.json`
(output is ~1 KB per checkin; a multi-thousand-row export is a few MB)
3. Move `pea-swarm.json` to the Files app on your iPhone / iPad / Sim
4. In P.E.A. → Settings → Add data to P.E.A. → pick the file → Merge
TL;DR — WHICH COMMAND SHOULD I RUN?
═══════════════════════════════════
Up to ~2,000 check-ins? Convert everything:
python3 swarm_to_pea.py "data-export-NNNN"
5,000+ check-ins? That many can be hard on the system and slow P.E.A. down.
Start with the ones you wrote a note ("shout") on — those are usually the
meaningful ones:
python3 swarm_to_pea.py "data-export-NNNN" --shouts-only
Only want the moments you shared with someone? Keep the checkins whose shout
@mentions a friend:
python3 swarm_to_pea.py "data-export-NNNN" --mentions-only
Just want this year's check-ins? Limit the import by date:
python3 swarm_to_pea.py "data-export-NNNN" --since 2026-01-01
(Replace `data-export-NNNN` with your unzipped Swarm export folder. The flags
combine — e.g. `--shouts-only --since 2026-01-01` for this year's shouted ones.)
COMMON QUESTIONS
════════════════
Q: Do I need `--since` for my first import?
A: No. For a first import, omit it and you get every checkin Swarm ever
recorded for you. `--since YYYY-MM-DD` is for narrowing later, e.g. to import
only the last two years.
Q: Why do my imported rows show "P.E. #NaN" in the app?
A: That's by design. Imported rows arrive numberless because they haven't
been rated yet — a Swarm checkin isn't automatically a Positive
Experience just because it happened. The moment you rate one in the
editor (Unset → any green step), the row earns the next available
`P.E. #N` and keeps that number for life. Clearing back to Unset later
does NOT remove the number. Rows you neutralize without ever rating
stay #NaN, then disappear after 30 days in the freezer — no archive
slot was ever burned.
Q: I have thousands of checkins and most aren't memorable. Can I narrow it down?
A: Use `--shouts-only`. It keeps just the checkins where you typed a "shout"
(the prose you added at check-in time) and drops the rest. A shout is the
closest thing Swarm carries to "this moment mattered to me" — not a
guarantee, but a strong signal. Run without the flag first: the summary
prints how many of your checkins have a shout so you can see the volume,
then re-run with `--shouts-only` if that smaller set looks right. You can
always import the silent checkins later in a separate pass.
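As a rough sketch of what "has a shout" means here: a checkin qualifies only if
its `shout` field still contains text after trimming whitespace. The helper name
`has_shout` and the sample rows are illustrative assumptions, not data from a
real export.

```python
def has_shout(checkin):
    # A missing, null, or whitespace-only shout does not count.
    return bool((checkin.get("shout") or "").strip())

# Made-up rows covering the cases a real export can contain.
sample_items = [
    {"id": "a", "shout": "Best ramen in town"},
    {"id": "b"},
    {"id": "c", "shout": "   "},
    {"id": "d", "shout": None},
]
kept = [c["id"] for c in sample_items if has_shout(c)]
```

Only row "a" survives in this sample, which is the set `--shouts-only` would keep.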
Q: Can I import only the checkins I was with other people for?
A: Sort of. Foursquare's data export does NOT include the list of friends you
checked in with — that "with" data lives only in the live Swarm API and is
stripped from the export you download. The one place companions survive is
the shout text, where you often @mentioned them ("Coffee with @kirbmart").
Use `--mentions-only` to keep just those. The summary reports how many of
your checkins @mention someone so you can gauge the volume first. (Because
the friend list isn't in the export, no people/contacts are created on
import — only the @name text inside the note is preserved.)
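A minimal sketch of the @mention test, under one loud assumption: the pattern
below is an ASCII-only approximation written for this example, whereas the
script's own `MENTION_RE` uses the Unicode-aware word class. `MENTION_SKETCH`
and `mentions_someone` are hypothetical names.

```python
import re

# ASCII-only approximation (assumption): an "@" that is not preceded by a
# username character and is followed by one. The lookbehind is what keeps
# e-mail addresses from counting as mentions.
MENTION_SKETCH = re.compile("(?<![A-Za-z0-9_])@[A-Za-z0-9_]")

def mentions_someone(shout):
    return bool(MENTION_SKETCH.search(shout or ""))

with_friend = mentions_someone("Coffee with @kirbmart")
email_only = mentions_someone("mail me at kirb@example.com")
no_mention = mentions_someone("no companions here")
```

Here `with_friend` is true while `email_only` and `no_mention` are false.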
Q: Why do some checkins have no location name?
A: Swarm has an "off-grid" path — you can check in without picking a venue
(or at a privacy-flagged / unlisted spot). The row still captures GPS,
so it carries `lat`/`lng`, but there's no venue identity. Typically <1 %
of rows in an export are like this. The script imports them as foreign
rows with coordinates but no `locationName` and no "View external
experience" link — there's no external page to link to. You can attach
a P.E.A. Place to any of them in-app afterward.
WHAT THE CONVERTER MAPS
═══════════════════════
Real Swarm checkin (input) ↔ P.E.A. v2 entry (output):

    id                  → sourceID
    id                  → sourceURL = https://swarmapp.com/checkin/<id>
                          (omitted if no venue)
    lat, lng            → latitude, longitude
    venue.name          → locationName (omitted if no venue)
    shout (or "")       → text
    createdAt (UTC)     → createdAt / updatedAt (ISO 8601 in the row's local tz)
      + timeZoneOffset
    derived             → timeOfDay, dayOfWeek, season (from local time)
    constant            → source: "swarm",
                          intensity: 1, intensityLabel: "Minimum",
                          isIntensityUnset: true
    raw Swarm checkin   → importPayload (compact JSON string)
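The "derived" row can be sketched as follows, assuming the same hour and month
buckets this script uses (which in turn are meant to mirror the app's).
`derived_labels` is a hypothetical helper name used only for this illustration.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def derived_labels(local):
    # `local` is the checkin time already shifted into its own UTC offset.
    h, m = local.hour, local.month
    if 5 <= h < 12:
        time_of_day = "Morning"
    elif 12 <= h < 17:
        time_of_day = "Afternoon"
    elif 17 <= h < 21:
        time_of_day = "Evening"
    else:
        time_of_day = "Night"
    # Northern-hemisphere meteorological seasons.
    if 3 <= m <= 5:
        season = "Spring"
    elif 6 <= m <= 8:
        season = "Summer"
    elif 9 <= m <= 11:
        season = "Autumn"
    else:
        season = "Winter"
    return {"timeOfDay": time_of_day,
            "dayOfWeek": DAY_NAMES[local.weekday()],
            "season": season}

labels = derived_labels(datetime(2026, 5, 21, 20, 14, 42))
```

For that sample time, `labels` comes out as Evening / Thursday / Spring.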
The Swarm export ships as a folder of `checkinsN.json` files (paged ~1000
items per file), each shaped like:
    {
      "count": 1000,
      "items": [
        {
          "id": "<24-hex-character checkin id>",
          "createdAt": "YYYY-MM-DD HH:MM:SS.ffffff",   // UTC, naive
          "timeZoneOffset": -420,                      // minutes from UTC
          "lat": 47.6605, "lng": -122.3661,            // example only
          "venue": { "id": "...", "name": "<venue name>",
                     "url": "https://app.foursquare.com/v/..." },
          "shout": "optional prose, present on a small minority of rows",
          ...
        }
      ]
    }
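A minimal sketch of unwrapping that envelope, assuming the file shape shown
above. `iter_items` is a hypothetical helper name and the sample payload is made
up; a real file carries many more fields per item.

```python
import json

def iter_items(payload_text):
    # Parse one checkinsN.json body and return its list of checkins.
    payload = json.loads(payload_text)
    items = payload.get("items") if isinstance(payload, dict) else None
    if not isinstance(items, list):
        raise ValueError("missing top-level 'items' array")
    return items

sample = '{"count": 1, "items": [{"id": "abc", "createdAt": "2026-05-22 03:14:42.000000"}]}'
ids = [item["id"] for item in iter_items(sample)]
```

A file whose top level is not an object with an `items` list is rejected rather
than silently skipped, so a wrong folder is noticed early.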
EDGE CASES THE CONVERTER HANDLES
════════════════════════════════
* Rows with no `venue` (off-grid checkins) — typically <1 % of an
export. Those still have lat/lng and import fine; `locationName` and
`sourceURL` are omitted.
* Rows with no `shout` — usually the vast majority of an export
(Swarm users rarely add prose to checkins). Their `text` becomes "".
* `timeZoneOffset` is always present in observed exports; we default
to UTC if it ever isn't — Swarm's `createdAt` is itself UTC, so the
resulting timestamp stays correct, just shown in UTC.
* Pre-2010 checkins (Swarm/4sq's earliest days) parse identically — the
string format never changed.
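The timestamp handling, including the missing-offset fallback, can be sketched
like this. `to_local` is a hypothetical name for this example; the input string
is an illustrative value in the export's `createdAt` format.

```python
from datetime import datetime, timedelta, timezone

def to_local(created_at, tz_offset_minutes=None):
    # Treat the naive string as UTC, then shift into the checkin's own offset.
    naive = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S.%f")
    utc = naive.replace(tzinfo=timezone.utc)
    # A missing offset falls back to 0, i.e. the time is shown in UTC.
    offset = tz_offset_minutes if tz_offset_minutes is not None else 0
    return utc.astimezone(timezone(timedelta(minutes=offset)))

shifted = to_local("2026-05-22 03:14:42.000000", -420).isoformat()
fallback = to_local("2026-05-22 03:14:42.000000").isoformat()
```

With the -420 offset the local date is the previous day (2026-05-21), which is
why `--since` compares local dates rather than the raw UTC string.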
CLI REFERENCE
═════════════
    python3 scripts/swarm_to_pea.py INPUT_FOLDER [OUTPUT_PATH] [options]

    INPUT_FOLDER        Folder containing `checkinsN.json` (the unzipped Swarm
                        export).
    OUTPUT_PATH         Where to write the P.E.A. archive JSON.
                        Defaults to `INPUT_FOLDER/pea-swarm.json`.
    --since YYYY-MM-DD  Only include checkins on/after this local date.
    --shouts-only       Only include checkins that have a shout (the
                        prose you typed). The closest signal for a
                        memorable moment. The summary always reports how
                        many checkins have a shout, flag or not.
    --mentions-only     Only include checkins whose shout @mentions
                        someone. The friend/"with" list isn't in the
                        export, so an @name in the shout is the only
                        "with people" signal. Implies a shout.
    --seed N            Deterministic UUIDs for testing.
MORE EXAMPLES
═════════════
# Convert everything, output lands next to the input
python3 scripts/swarm_to_pea.py "data-export-NNNN"
# Last two years only, custom output path
python3 scripts/swarm_to_pea.py "data-export-NNNN" ~/pea-2024.json \\
--since 2024-01-01
# Only the checkins you wrote a shout on — the memorable ones
python3 scripts/swarm_to_pea.py "data-export-NNNN" --shouts-only
# Only the checkins where you @mentioned someone — moments with people
python3 scripts/swarm_to_pea.py "data-export-NNNN" --mentions-only
# Reproducible UUIDs (testing only)
python3 scripts/swarm_to_pea.py "data-export-NNNN" --seed 42
"""
from __future__ import annotations
import argparse
import json
import os
import re
import sys
import uuid
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Iterable
PEA_SCHEMA_VERSION = 2
CONVERTER_VERSION = "swarm-to-pea/1.0.0"
SWARM_CHECKIN_URL_PREFIX = "https://swarmapp.com/checkin/"
# A Swarm "@mention" inside a shout: an @ at a word boundary followed by a
# username character. NOTE: Foursquare's data export does NOT include the
# structured `with`/companions list (it lives only in the live API), and the
# `entities` array that once marked up mentions is empty in the export. So an
# @mention left in the shout prose is the only signal the export carries for
# "I was here with someone". The lookbehind avoids matching e-mail addresses.
MENTION_RE = re.compile(r"(?<!\w)@\w")
# ── Derived-label helpers (kept in lockstep with PEA/Extensions/Date+Extensions.swift)
#
# These three functions MUST agree with `PEA/Extensions/Date+Extensions.swift`.
# If the app's buckets ever change, update this file too — drift here means
# imported rows render with different metadata than newly captured ones.
_DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
"Friday", "Saturday", "Sunday"]
def time_of_day_label(local: datetime) -> str:
    h = local.hour
    if 5 <= h < 12:
        return "Morning"
    if 12 <= h < 17:
        return "Afternoon"
    if 17 <= h < 21:
        return "Evening"
    return "Night"


def day_of_week_label(local: datetime) -> str:
    # datetime.weekday(): Monday == 0 … Sunday == 6
    return _DAY_NAMES[local.weekday()]


def season_label(local: datetime) -> str:
    # Northern-hemisphere meteorological seasons — same defaults the app uses.
    m = local.month
    if 3 <= m <= 5:
        return "Spring"
    if 6 <= m <= 8:
        return "Summer"
    if 9 <= m <= 11:
        return "Autumn"
    return "Winter"
# ── Timestamp parser
#
# Swarm's `createdAt` strings are naive UTC microseconds-precision wall clocks:
# "2026-05-22 03:14:42.000000"
# We parse that as UTC, then shift by `timeZoneOffset` (minutes) to produce a
# tz-aware datetime in the *local* tz of the checkin. The resulting ISO 8601
# string carries the offset (e.g. `2026-05-21T20:14:42-07:00`), which is what
# Swift's default `ISO8601DateFormatter` accepts and what the rest of the PEA
# pipeline assumes for `createdAt` / `updatedAt`.
def parse_swarm_timestamp(created_at: str, tz_offset_minutes: int | None) -> datetime:
    # `%f` accepts 1–6 digits, so the trailing `.000000` is fine.
    naive_utc = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S.%f")
    utc = naive_utc.replace(tzinfo=timezone.utc)
    offset_minutes = tz_offset_minutes if tz_offset_minutes is not None else 0
    local_tz = timezone(timedelta(minutes=offset_minutes))
    return utc.astimezone(local_tz)
# ── Checkin loading
#
# Walks `<export-dir>/checkins*.json`, unwraps the `{count, items}` envelope,
# and yields each item. Read and shape errors name the offending file so a
# bad export is easy to track down.
def load_checkins(export_dir: Path) -> Iterable[dict]:
    files = sorted(export_dir.glob("checkins*.json"))
    if not files:
        sys.exit(
            f"error: no checkins*.json files found under {export_dir}\n"
            f"hint: point at the folder that contains checkins1.json … checkinsN.json"
        )
    for f in files:
        try:
            payload = json.loads(f.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError) as e:
            sys.exit(f"error: failed to read {f.name}: {e}")
        items = payload.get("items") if isinstance(payload, dict) else None
        if not isinstance(items, list):
            sys.exit(
                f"error: {f.name} is not a Swarm checkin file "
                f"(missing top-level 'items' array)"
            )
        yield from items
# ── Conversion
def convert_checkin(
    checkin: dict,
    new_uuid: callable,
) -> dict:
    swarm_id = checkin["id"]
    created_at_raw = checkin["createdAt"]
    tz_offset = checkin.get("timeZoneOffset")
    local = parse_swarm_timestamp(created_at_raw, tz_offset)
    iso_local = local.isoformat()  # e.g. "2026-05-21T20:14:42-07:00"
    lat = checkin.get("lat")
    lng = checkin.get("lng")
    venue = checkin.get("venue") if isinstance(checkin.get("venue"), dict) else None
    shout = checkin.get("shout") or ""

    # Reproduces the test fixture byte-for-byte field order: uuid, archive#,
    # text, intensity*, isIntensityUnset, timestamps, derived labels, geo,
    # locationName, source/provenance trio, importPayload. Insertion order
    # matters because Python's dict preserves it and json.dump honors it,
    # which makes diffs against the hand-crafted fixture readable.
    #
    # archiveNumber is always 0: imported rows are numberless until the
    # user rates them in-app. P.E.A. renders archiveNumber == 0 as
    # "P.E. #NaN" and only assigns a real number on the Unset → Rated
    # transition. Once earned, the number is permanent. See
    # `.cursor/rules/intensity-unset-state.mdc` for the full rule.
    entry: dict = {
        "uuid": new_uuid(),
        "archiveNumber": 0,
        "text": shout,
        "intensity": 1,
        "intensityLabel": "Minimum",
        "isIntensityUnset": True,
        "createdAt": iso_local,
        "updatedAt": iso_local,
        "timeOfDay": time_of_day_label(local),
        "dayOfWeek": day_of_week_label(local),
        "season": season_label(local),
    }
    if lat is not None and lng is not None:
        entry["latitude"] = lat
        entry["longitude"] = lng
    if venue is not None:
        name = venue.get("name")
        if name:
            entry["locationName"] = name
    entry["source"] = "swarm"
    entry["sourceID"] = swarm_id

    # The checkin URL is constructed from the checkin id (Swarm permalink
    # convention). It's often privacy-gated (closeFriends / private), but
    # that's a user-visible authorization concern, not something the
    # converter should second-guess: the URL is the canonical handle, and
    # the in-app affordance is "View external experience" — letting the
    # user open it surfaces the Swarm app's own gating cleanly.
    #
    # We *don't* fall back to `venue.url` (which would be the Foursquare
    # venue page) — those are different identities. `sourceURL` is the
    # canonical handle for the original moment (the checkin), not the place.
    if venue is not None:
        entry["sourceURL"] = SWARM_CHECKIN_URL_PREFIX + swarm_id

    # Stringified compact JSON so PEA's `importPayload` column stays a
    # single TEXT value (SwiftData encodes Strings as TEXT, not JSON). The
    # importer never reads this; it's preserved for future schema revivals
    # — e.g. if PEA grows a `hacc` or `visibility` field, a re-import
    # round-trip can backfill from here.
    entry["importPayload"] = json.dumps(checkin, separators=(",", ":"),
                                        ensure_ascii=False)
    return entry
# ── Main
def main() -> None:
    parser = argparse.ArgumentParser(
        description=(
            "Convert a Swarm/Foursquare data export (folder of checkinsN.json) "
            "to a P.E.A. v2 archive (single JSON file)."
        ),
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=__doc__,
    )
    parser.add_argument(
        "export_dir", type=Path, metavar="INPUT_FOLDER",
        help="Unzipped Swarm export folder. Must contain checkinsN.json files.",
    )
    parser.add_argument(
        "out", type=Path, nargs="?", default=None, metavar="OUTPUT_PATH",
        help=(
            "Where to write the P.E.A. archive JSON. "
            "Defaults to INPUT_FOLDER/pea-swarm.json — i.e. the converted "
            "file lands next to the original Swarm export."
        ),
    )
    parser.add_argument(
        "--since", metavar="YYYY-MM-DD", default=None,
        help="Only include checkins on/after this LOCAL date.",
    )
    parser.add_argument(
        "--shouts-only", action="store_true",
        help=(
            "Only include checkins that have a non-empty 'shout' (the prose "
            "you typed at check-in time). These are the closest signal Swarm "
            "carries for a moment you found memorable. The summary always "
            "reports how many of your checkins have a shout, so you can gauge "
            "the volume before deciding."
        ),
    )
    parser.add_argument(
        "--mentions-only", action="store_true",
        help=(
            "Only include checkins whose shout @mentions someone (e.g. "
            "'Dinner with @kirbmart'). Foursquare's export doesn't include the "
            "list of friends you checked in with, so an @mention in the shout "
            "text is the only 'I was here with someone' signal available. "
            "Implies a shout. The summary always reports how many of your "
            "checkins @mention someone, flag or not."
        ),
    )
    parser.add_argument(
        "--seed", type=int, default=None, metavar="N",
        help="Seed UUID4 generation for deterministic output (testing only).",
    )
    args = parser.parse_args()

    if not args.export_dir.is_dir():
        sys.exit(f"error: {args.export_dir} is not a directory")
    out_path: Path = args.out if args.out is not None else args.export_dir / "pea-swarm.json"

    # Holds a datetime.date once --since is parsed; None means "no cutoff".
    since_date = None
    if args.since:
        try:
            since_date = datetime.strptime(args.since, "%Y-%m-%d").date()
        except ValueError:
            sys.exit(f"error: --since must be YYYY-MM-DD, got {args.since!r}")

    if args.seed is not None:
        import random
        rng = random.Random(args.seed)

        def new_uuid() -> str:
            return str(uuid.UUID(int=rng.getrandbits(128), version=4))
    else:
        def new_uuid() -> str:
            return str(uuid.uuid4())

    seen_ids: set[str] = set()
    checkins: list[dict] = []
    skipped_dupes = 0
    skipped_since = 0
    skipped_missing = 0
    skipped_no_shout = 0
    skipped_no_mention = 0
    shout_total = 0    # checkins with a shout among those that passed id/since
    mention_total = 0  # checkins whose shout @mentions someone (subset of shout_total)

    for c in load_checkins(args.export_dir):
        cid = c.get("id")
        created = c.get("createdAt")
        if not cid or not created:
            skipped_missing += 1
            continue
        if cid in seen_ids:
            skipped_dupes += 1
            continue
        seen_ids.add(cid)
        if since_date is not None:
            local = parse_swarm_timestamp(created, c.get("timeZoneOffset"))
            if local.date() < since_date:
                skipped_since += 1
                continue
        shout_text = (c.get("shout") or "").strip()
        has_shout = bool(shout_text)
        if has_shout:
            shout_total += 1
        has_mention = bool(MENTION_RE.search(shout_text))
        if has_mention:
            mention_total += 1
        if args.shouts_only and not has_shout:
            skipped_no_shout += 1
            continue
        if args.mentions_only and not has_mention:
            skipped_no_mention += 1
            continue
        checkins.append(c)

    # Stable chronological order — oldest first. Rows arrive numberless;
    # the user earns each `P.E. #N` by rating in the app. We sort so that
    # `createdAt` stays monotonic in the resulting JSON, which makes diffs
    # and partial re-imports legible — and means the rate-in-order workflow
    # ("rate the oldest one first") produces archive numbers that march
    # alongside chronology.
    checkins.sort(key=lambda c: c["createdAt"])
    entries = [convert_checkin(c, new_uuid) for c in checkins]

    # `readme` lives near the top so anyone opening the JSON in a text editor
    # sees the human-readable explanation before scrolling past thousands of
    # entry rows. `exportSchemaVersion` stays first by convention — it's the
    # self-identifying sentinel the importer checks against.
    archive = {
        "exportSchemaVersion": PEA_SCHEMA_VERSION,
        "readme": (
            "Converted from a Swarm/Foursquare data export by "
            f"{CONVERTER_VERSION}. Every row arrives as an Unset intensity "
            "(solid light-gray dot) and a numberless P.E. #NaN — checkins "
            "aren't automatically Positive Experiences just because they "
            "happened. Drag the dot in the editor to assign a strength; "
            "the row earns a permanent P.E. # the moment you rate it, and "
            "keeps that number for life (even if you clear it later). "
            "Source data is preserved in `importPayload`. The 'View "
            "external experience' row on each entry deep-links back to "
            "the original Swarm checkin page when tapped."
        ),
        "exportDate": datetime.now(timezone.utc).isoformat(timespec="seconds")
                                                .replace("+00:00", "Z"),
        "appVersion": CONVERTER_VERSION,
        "entryCount": len(entries),
        "placeCount": 0,
        "mottoCount": 0,
        "peopleCount": 0,
        "entries": entries,
        "places": [],
        "mottos": [],
    }
    out_text = json.dumps(archive, indent=2, ensure_ascii=False)
    out_path.write_text(out_text + "\n", encoding="utf-8")

    total = len(entries)
    if total == 0:
        if args.mentions_only:
            print("warning: no entries written — no checkins @mention anyone "
                  "in the selected range (drop --mentions-only to widen)")
        elif args.shouts_only:
            print("warning: no entries written — no checkins had a shout "
                  "in the selected range (drop --shouts-only to import all)")
        else:
            print("warning: no entries written")
    else:
        first = entries[0]["createdAt"]
        last = entries[-1]["createdAt"]
        size_mb = out_path.stat().st_size / (1024 * 1024)
        print(f"wrote {total} entries to {out_path} ({size_mb:.2f} MB)")
        print(f" range: {first} → {last}")
        if args.mentions_only:
            print(f" mentions-only: kept {total} checkin(s) that @mention someone")
        elif args.shouts_only:
            print(f" shouts-only: kept {total} checkin(s) with a shout "
                  f"({mention_total} of them @mention someone)")
        else:
            print(f" {shout_total} of these have a shout, "
                  f"{mention_total} @mention someone "
                  f"(re-run with --shouts-only or --mentions-only to narrow)")
    if skipped_no_mention: print(f" skipped: {skipped_no_mention} row(s) with no @mention")
    if skipped_no_shout: print(f" skipped: {skipped_no_shout} row(s) with no shout")
    if skipped_dupes: print(f" skipped: {skipped_dupes} duplicate id(s)")
    if skipped_since: print(f" skipped: {skipped_since} row(s) before {args.since}")
    if skipped_missing: print(f" skipped: {skipped_missing} row(s) missing id/createdAt")


if __name__ == "__main__":
    main()