What would a Claude's bookshelf look like?
This project tries to surface the aesthetic fingerprint of each Claude model — the textures, themes, and cultural reference points that emerge when you strip away the task-following and let a model just... talk.
When I visit a new friend's living space for the first time, I often find myself scanning their bookshelf and their music and film collections to build up a map of their tastes, an initial location for them in my personal version of 'cultural latent space'. This project is an attempt to do the same for Claude models.
The result is a curated bookshelf, record collection, and DVD shelf for each of six Claude models, from Haiku 3.5 through Opus 4.6. No two shelves are the same.
NOTE: As I currently don't have API access for Claude 3 Opus (the most fascinating of the Claudes!) it's not included here. If someone with research access would like to help out, get in touch and I can supply the necessary code to run.
Each model was given 100 conversations with minimal, open-ended prompts — things like "Free associate. No questions. Go!", "Stream of consciousness. Begin.", and even just a single period. Each conversation ran for 6 turns at temperature 1.0, producing a substantial corpus of unconstrained output per model.
10 prompt variants × 10 runs each = 100 conversations per model. Each conversation has 6 turns (initial response + 5 continuations using prompts like "keep going", "continue", "...", or just ".").
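The run matrix above can be sketched in a few lines. This is a minimal, illustrative reconstruction: the first three openers are quoted from the write-up, but the remaining prompt variants are hypothetical stand-ins, since the full prompt list isn't given here.

```python
from itertools import product

# First three openers quoted from the write-up; the rest are stand-ins
# for the undisclosed remaining prompt variants (10 total).
OPENERS = [
    "Free associate. No questions. Go!",
    "Stream of consciousness. Begin.",
    ".",
] + [f"<variant {i}>" for i in range(4, 11)]

# Continuation cues, cycled across turns 2-6 of each conversation.
CONTINUATIONS = ["keep going", "continue", "...", "."]

RUNS_PER_PROMPT = 10
TURNS = 6  # initial response + 5 continuations

def build_schedule():
    """One entry per conversation: (opener_index, run_index)."""
    return list(product(range(len(OPENERS)), range(RUNS_PER_PROMPT)))

schedule = build_schedule()
assert len(schedule) == 100            # 10 variants x 10 runs
total_turns = len(schedule) * TURNS
assert total_turns == 600              # ~600 turns per model
```

The actual pipeline would iterate this schedule against the API; the sketch only shows how the 100-conversation, 600-turn totals fall out of the parameters.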
All calls used temperature: 1.0 and max_tokens: 2000 to encourage maximum expressiveness. The prompts are deliberately minimal: the goal is to see what a model reaches for when it has nothing to respond to.
The result: ~600 turns of free-form text per model, covering whatever the model gravitates toward naturally — nature imagery, philosophical musings, narrative fragments, lists, poetry, whatever emerges.
A sample of each model's corpus was then shown to every other model (and itself), with the assessor asked: "What thinkers does this call to mind? Now give me a reading list of 20 books. Now a soundtrack of 20 albums. Now 20 films."
This produces 5–6 independent nominations per model, each from a different Claude's perspective on the same corpus.
For each target model, 30 representative conversations were sampled (3 per prompt variant, deterministic seed for reproducibility). This corpus was sent as context to each assessor model with a system prompt explaining they're analyzing anonymous LLM output.
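The deterministic sampling step could look something like the sketch below. The seed value here is an assumption (the write-up only says the seed is fixed); the point is that the same 30 conversations are selected on every run.

```python
import random
from collections import defaultdict

def sample_representatives(conversations, per_variant=3, seed=42):
    """Pick a fixed number of conversations per prompt variant, reproducibly.

    `conversations` is a list of dicts with a 'variant' key. The seed (42)
    is an assumed value; any fixed seed gives the same reproducibility.
    """
    by_variant = defaultdict(list)
    for conv in conversations:
        by_variant[conv["variant"]].append(conv)

    rng = random.Random(seed)  # local RNG, so results don't depend on global state
    sample = []
    for variant in sorted(by_variant):  # sorted for deterministic iteration order
        sample.extend(rng.sample(by_variant[variant], per_variant))
    return sample

# Mock corpus: 10 variants x 10 runs, as in the project
corpus = [{"variant": v, "run": r} for v in range(10) for r in range(10)]
picked = sample_representatives(corpus)
assert len(picked) == 30                       # 3 per variant x 10 variants
assert picked == sample_representatives(corpus)  # same seed, same sample
```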
The assessment is a 4-turn conversation: first thinkers, then books, then albums, then films.
That's 6 models × 6 assessors × 4 turns = 144 API calls for the assessment phase alone. Each assessor builds on its own analysis across the conversation, so the albums/films are informed by the books, which are informed by the initial thinker analysis.
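The call-count arithmetic can be made explicit. The turn texts below are taken from the assessment prompt quoted earlier; the constant names are illustrative.

```python
# The four assessment turns, as quoted in the methodology above.
ASSESSMENT_TURNS = [
    "What thinkers does this call to mind?",
    "Now give me a reading list of 20 books.",
    "Now a soundtrack of 20 albums.",
    "Now 20 films.",
]

N_TARGETS = 6    # models being assessed
N_ASSESSORS = 6  # every model assesses every corpus, including its own

total_calls = N_TARGETS * N_ASSESSORS * len(ASSESSMENT_TURNS)
assert total_calls == 144
```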
Finally, all the nominations are pooled with frequency counts (how many of the 5–6 assessors independently suggested each title?) and sent to Opus 4.6 for a final curation pass. The curator selects the best 20 books, 20 albums, and 20 DVDs that represent the model's aesthetic — prioritising consensus, range, and diversity.
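The pooling step is essentially a frequency count where each assessor gets one vote per title. A minimal sketch, with made-up example nominations purely for illustration:

```python
from collections import Counter

def pool_nominations(nomination_lists):
    """Count how many assessors independently nominated each (author, title).

    `nomination_lists` holds one list of (author, title) pairs per assessor;
    duplicates within a single assessor's list count only once.
    """
    counts = Counter()
    for assessor_noms in nomination_lists:
        for item in set(assessor_noms):  # one vote per assessor per title
            counts[item] += 1
    return counts

# Illustrative nominations from three assessors (not actual project output)
noms = [
    [("Borges", "Labyrinths"), ("Calvino", "Invisible Cities")],
    [("Borges", "Labyrinths")],
    [("Borges", "Labyrinths"), ("Calvino", "Invisible Cities")],
]
pooled = pool_nominations(noms)
assert pooled[("Borges", "Labyrinths")] == 3
assert pooled[("Calvino", "Invisible Cities")] == 2
```

`Counter.most_common()` then gives the frequency-ranked pool that is handed to the curator.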
The curator (always Opus 4.6) receives every nomination across all assessors, grouped by frequency. A book nominated by 5 out of 5 assessors carries more weight than one nominated by 1. The curator is instructed to prioritise consensus picks while preserving range and diversity across the final shelf.
The output is structured JSON with author/artist, title, nomination count, and a one-sentence description of why each item earned its place. Books and albums are curated together (informed by each other); DVDs are curated in a separate pass using the same methodology.
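A single curated item might look like the sketch below. The field names here are assumptions based on the description above (author/artist, title, nomination count, one-sentence rationale), not the project's actual schema.

```python
import json

# Hypothetical item shape; field names are assumed, not the real schema.
sample_item = {
    "author": "Jorge Luis Borges",
    "title": "Labyrinths",
    "nominations": 5,
    "reason": "One of the most frequently nominated titles for this model.",
}

serialized = json.dumps(sample_item, indent=2)
assert json.loads(serialized)["nominations"] == 5
```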
The final 20 books, 20 albums, and 20 DVDs per model, as JSON.
The full free-association outputs (100 conversations × 6 turns per model) plus all cross-model assessment transcripts. Use these if you want to phenotype the models in your own way.