# `PhoenixKitCatalogue.Schemas.PdfPageContent`
[🔗](https://github.com/BeamLabEU/phoenix_kit_catalogue/blob/0.8.0/lib/phoenix_kit_catalogue/schemas/pdf_page_content.ex#L1)

Content-addressed cache of PDF page text.

Keyed by `content_hash` (SHA-256 hex of the page's normalized text).
Same page text appearing in multiple PDFs (cross-referenced supplier
catalogues, shared boilerplate, repeated legal disclaimers) is stored
once.
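
A minimal sketch of how such a key could be derived, using only the Erlang/Elixir standard library. The `ContentHashSketch` module and its `normalize/1` rules (trim, then collapse whitespace runs) are hypothetical, as is the lowercase hex casing; the library's actual normalization is not specified here.

```elixir
defmodule ContentHashSketch do
  # Hypothetical normalization: trim the ends and collapse whitespace runs
  # into a single space. The real rules used by the library may differ.
  def normalize(text) do
    text
    |> String.trim()
    |> String.replace(~r/\s+/u, " ")
  end

  # SHA-256 of the normalized text, hex-encoded (lowercase is an assumption).
  def content_hash(text) do
    digest = :crypto.hash(:sha256, normalize(text))
    Base.encode16(digest, case: :lower)
  end
end

a = ContentHashSketch.content_hash("Terms  and\nconditions apply. ")
b = ContentHashSketch.content_hash("Terms and conditions apply.")

IO.puts(a == b)            # prints "true": both normalize to the same text
IO.puts(String.length(a))  # prints "64": 32 digest bytes as hex characters
```

Under these assumptions, two pages that differ only in whitespace map to the same key and therefore to the same cache row.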

The GIN trigram index lives on `text` in this table. Duplicated page
text is indexed only once, so the index grows with the number of
distinct pages rather than with the total page count.

Rows are write-once: each page either references an existing row or
inserts a new one (insert-on-conflict-do-nothing). Orphaned rows (those
with no `pdf_pages` row referencing them) are removed by a
catalogue-side GC helper, not by FK cascade. The foreign key on
`pdf_pages.content_hash` is declared `ON DELETE RESTRICT`, which keeps
the cache stable during normal upload/delete cycles.
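
The write-once and orphan-GC behaviour can be modelled in memory as a rough illustration. This is not the library's database code: `ContentCacheSketch`, `put/3`, and `gc/2` are hypothetical names, and a plain map stands in for the content table.

```elixir
defmodule ContentCacheSketch do
  # cache is a map of content_hash => text, standing in for the content table.

  # Write-once insert: if the hash already exists, the stored text is left
  # untouched, mirroring insert-on-conflict-do-nothing.
  def put(cache, content_hash, text) do
    Map.put_new(cache, content_hash, text)
  end

  # Orphan GC: keep only rows whose hash is still referenced by some page.
  # referenced_hashes is a list of hashes currently used by pages.
  def gc(cache, referenced_hashes) do
    Map.take(cache, referenced_hashes)
  end
end

cache =
  %{}
  |> ContentCacheSketch.put("h1", "Shared disclaimer")
  |> ContentCacheSketch.put("h1", "Shared disclaimer")
  |> ContentCacheSketch.put("h2", "Unique page")

IO.inspect(map_size(cache))                     # prints 2
IO.inspect(ContentCacheSketch.gc(cache, ["h1"])) # prints %{"h1" => "Shared disclaimer"}
```

With Ecto, the insert side of this pattern is commonly expressed through the `on_conflict: :nothing` option of `Repo.insert/2`; whether the library does exactly that is an assumption, not something this page states.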

# `t`

```elixir
@type t() :: %PhoenixKitCatalogue.Schemas.PdfPageContent{
  __meta__: term(),
  content_hash: term(),
  inserted_at: term(),
  text: term()
}
```

# `changeset`

```elixir
@spec changeset(
  t()
  | %PhoenixKitCatalogue.Schemas.PdfPageContent{
      __meta__: term(),
      content_hash: term(),
      inserted_at: term(),
      text: term()
    },
  map()
) :: Ecto.Changeset.t(t())
```

---

*Consult [api-reference.md](api-reference.md) for the complete listing.*
