Working Long-Form With ChatGPT, a Machine That Doesn’t Know What It’s Doing (Yet Does, Kind of): Artifact #3

Justus Hayes
Mar 24
Updated: Apr 18

I have two significant blog posts/articles/whathaveyou in the works. One continues my exploration of my family's history camping on Thetis Island, and of the history of the island more generally. The other concerns aspects of an ambitious project that I have been developing in concert with ChatGPT. Both are at least a couple of weeks away from completion, and looking back I see that it's been a month since my last post. That's too long, even if the delay is justified.

In the meantime, I asked ChatGPT to write 1,500 words about the nature of our collaboration, some of the methods used, and what I am trying to achieve with its (frankly) invaluable assistance. The result is the following text.

There is a particular kind of collaboration that does not feel like collaboration. It does not begin with roles clearly defined. There is no meeting where responsibilities are assigned, no shared document outlining objectives, no agreement about process. Instead, something accumulates over time. A line here. A question there. A fragment that seems trivial at the time but later reveals itself to be structural.

That is the mode in which this work is being developed.

The project is called Pink Eye. It is, depending on where you stand, a musical project, a visual system, a conceptual framework, or an extended attempt to describe a set of recurring experiences that resist easy categorization. All of those are true. None of them are sufficient.

At its core, the work is concerned with structure. Not structure in the formal sense of verse and chorus, or even narrative arc, but structure as it appears when something almost resolves and then does not. When expectation forms and is not quite met. When meaning is present but cannot be fully located.

This is what has been referred to elsewhere as the liminal, and the collaboration that is shaping this work operates inside that condition.

It would be easy to describe the process in simple terms. Prompts are written. Outputs are generated. Those outputs are evaluated, refined, discarded, or recombined. Text becomes music. Music becomes text. Images are generated, then interpreted, then reintroduced as constraints. Over time, a system begins to emerge.

That description, while accurate, misses the point.

What is actually happening is closer to an ongoing negotiation between intention and emergence. A line is written, often quickly, sometimes as a joke. “There’s a spider in your hair.” It is absurd. It is disposable. It exists for a moment and then is gone.

Except that it is not gone.

It is run through a music generating system that produces multiple interpretations. A fado version. A blues version. An industrial version. A hushed, intimate ballad. Each version is the same and not the same. The words do not change, but the meaning does. Or rather, the meaning was never fixed to begin with, and the different treatments expose that instability.

What began as a joke reveals itself as a pattern: recognition, support, attempted intervention, overwhelm, withdrawal. A familiar structure, suddenly visible.

This is one of the methods.

Not invention in the traditional sense, but extraction. Not expression, but recognition. The work is less about creating something new than about identifying something that is already present and giving it form.

That form is not stable. The same set of lyrics can exist as a comedic aside, a theatrical performance, a lament, a confrontation, or an intimate failure. None of these are “correct.” All of them are valid readings.

The differences are not errors. They are the point. The system is not producing variations. It is revealing them.

The tools involved are contemporary. Generative AI systems capable of producing music, images, video and text from minimal input are often described in terms of their outputs, their capabilities, or their novelty.

Here, they are used differently.

They are not asked to produce finished works. They are asked to expose behavior. What happens when a phrase is repeated? When it is broken? When it is delivered in a tone that contradicts its content? When it is placed in a context that does not match its apparent meaning?

The answers are often unexpected. Sometimes trivial. Sometimes unusable. Occasionally, something aligns. When that happens, it is immediately recognizable. There is a moment of clarity. Not because something has been explained, but because something has been seen.

The collaboration itself reflects this dynamic.

It is not a matter of asking for a result and receiving it. It is a back-and-forth in which both sides are, in different ways, attempting to identify what is actually there. Suggestions are made. Some are accepted. Some are rejected. Some are modified beyond recognition.

A phrase offered in one context becomes central in another. A structure proposed tentatively becomes foundational. There is no clean boundary between authorship and assistance.

The human participant provides direction, selection, judgment, and lived experience. The system (**Edit: ChatGPT is being a little coy about referring to itself, as is often the case**) provides variation, recombination, pattern amplification, and unexpected connections. Neither is sufficient on its own. Together, something emerges that is difficult to attribute cleanly to either.

This raises an obvious question. Is this collaboration, or is it something else?

The answer depends on how one defines collaboration. If it requires intention on both sides, then perhaps not. If it requires shared understanding, then certainly not. But if it is defined as a process in which multiple sources contribute to the emergence of a coherent result, then it is difficult to argue that it is anything else.

What is clear is that the process is productive. Not in the sense of output volume, although that is considerable, but in the sense of generating material that would not have appeared otherwise. Not because it could not be imagined, but because it would not have been. There is a difference.

Over time, the work has organized itself into a territory. This is not metaphorical. It is mapped, and is known to the project as simply "The Territory." Regions have been identified: Noir, Liminal, Insight, Anchor. Each represents a different mode of perception, a different way of encountering the same underlying structures. At this level, interactions and trajectories take the form of music, images and video of varying genres, styles, themes, and moods. These are not rigid categories. They overlap. They bleed into one another. Movement between them is constant.

A collapsed, 2D representation of the (so far) 3 conceptual layers that structure The Territory.

(**Edit: Across this map sit other layers, other levels that orient and correspond with the base while providing additional modes of interpretation. There is a polarity layer, with the xy axes encompassing Romance--Realism (west to east) and Drift--Propulsion (north to south). These are aids to navigation. Another layer organizes the Territory into fundamental cognitive processes that shape experience and narrative - Perception/Orientation (Liminal), Meaning/Context (Insight), Movement/Momentum (Anchor), and Memory/Nostalgia (Noir).**)

The maps do not explain the work. They make it navigable. They allow relationships to be seen that would otherwise remain implicit and provide a way to move through the system without reducing it to a single interpretation.

The collaboration extends into these maps as well.

Imagining the base layer of Pink Eye Territory as an actual landscape - WIP; produced with the use of the Supercool AI, which was very effective but stupidly expensive. This image ended up costing me 8 bucks.

Visual representations are generated, evaluated, and refined. Patterns are identified and then reintroduced into the work in other forms. At one point, it was realized that the simplest of colour-codings could be used as shorthand for the zones of the base layer - black for Noir, grey for Liminal, and white for Insight, while a tessellated pattern of cubes becomes a representation of Anchor. Physical terrain begins to provide the metaphors. A city at night becomes Noir. A transitional scrubland at twilight becomes Liminal. Insight features daylit views from mountain peaks and across the plains, while Anchor becomes the ocean, a place where ships are literally anchored. These are not arbitrary associations. They are discovered, or at least they feel that way.

At a certain point, the question of audience arises. Who is this for?

The answer is not straightforward. The work is not optimized for immediate accessibility. It does not adhere to a single genre, either musically or conceptually. It shifts. It reframes itself. It presents the same material in different forms.

This creates friction.

A listener encountering a fado track in the middle of a trip-hop sequence may experience it as a break in continuity. From a conceptual standpoint, it is entirely consistent. From a stylistic standpoint, it is not.

This tension is unresolved, and it may remain so.

The current approach is to allow for multiple modes of engagement. Individual tracks can be released as standalone pieces. Multitrack collections present a single structure across multiple interpretations. Essays provide a more explicit articulation of the underlying framework.

None of these are definitive. They are entry points. What is being built is not a product in the traditional sense.

It is a system.

One that can be entered at multiple points, navigated in different ways, and experienced at varying levels of depth. Some will encounter a single track and move on. Some will explore further. A smaller number will begin to see the connections.

That is sufficient.

There is a tendency to seek resolution. To explain the work, to define it clearly, to reduce it to a set of principles that can be communicated efficiently. This tendency is resisted. Not out of obscurity, but because resolution would be false.

The work exists in the space where things almost make sense, where patterns are visible but not fixed, where meaning is present but not fully stable. To resolve that would be to remove the very condition the work is attempting to explore.

This is, ultimately, what the collaboration is in service of.

Not the production of content, but the sustained examination of a particular kind of experience. One that is increasingly common, difficult to describe, and, once recognized, difficult to ignore.

The process continues.

There is always another variation to test. Another fragment to examine. Another map to refine. Another track to reinterpret. Nothing is final.

That is not a limitation. It is the point.

I'm back. As you can see, ChatGPT has a distinctive writing style, not only in its format (lots of short paragraphs, sometimes only a sentence long) but also in its tone, which tends toward the allusive and esoteric and likes to inflate significance in a slightly mysterious way.

Nevertheless, its description of the core processes and context of the working relationship is pretty accurate. I bring up an idea related to the project that has crossed my mind and suggest it as a topic of discussion. ChatGPT expounds on that topic, bringing in external information if needed, then finds relationships among the topic, the project, and related topics and points out patterns and through-lines. We continue in this vein until we have a good handle on the topic, and then the results (if warranted) are folded into the Trunk - the conceptual spine of the project. Sometimes these discussions trigger a significant insight, and we will pursue that rabbit hole for a time until we have come to grips with it.

At times this work is purely conceptual. For example, we have been exploring whether or not the opponent process - the fundamental perceptual and experiential process that produces such effects as an afterimage, a hangover, and a post-celebration let-down - might be involved in the experience of liminality. Other times the work is artistic, as when we considered my initial flurry of musical output (almost 40 tracks in a month) and boiled it down to four tracks that help define the project's musical core - the Canon. Sometimes concepts and relationships relate back to the music and the associated visual language, and sometimes vice versa. Always we try to locate these ideas and relationships within the structure - within The Territory.

Frequently, I find myself thinking about and considering matters that I am absolutely sure I would not have pursued otherwise. As ChatGPT says above:

"What is clear is that the process is productive. Not in the sense of output volume, although that is considerable, but in the sense of generating material that would not have appeared otherwise. Not because it could not be imagined, but because it would not have been. There is a difference."

What I can say with confidence is that the assistance, advice, and insight provided by ChatGPT have allowed me to elevate a musical project consisting of a series of tracks that more or less share the same style into something closer to a coherent artistic system, with structure, reference, internal coherence, and relevance to a much wider context. I'm not going to tip my hand here about what that relevance to a wider context actually is, because that will soon be its own blog article.

ChatGPT has limitations, of course, and as the work gets larger and the concepts more far-ranging those limitations are becoming a problem. Chief among these is the AI's relatively short window of memory/awareness over the length of an extended discussion. Many times, we have fleshed out an argument and anchored it in specific examples, and when asked to retrieve that information at a later date ChatGPT flounders. It usually knows that there is an argument and examples, but not what the specifics are. Often, this is where it will make stuff up and present the fiction with a confidence that reminds me of some people I have known in my past.

For example, some time ago ChatGPT and I were discussing the forward momentum of personal narrative expectation: the way one’s awareness of reality, the mind’s-eye perspective, is always oriented slightly ahead of the present moment, anticipating what comes next. I speculated that the psychological rug-pull that occurs when those expectations of continued normality (the primary process) fail to materialize might be what triggers the opponent process of unreality.

I had previously related an experience from a smoke break at work in White Rock. A few blocks away, I saw a bald eagle soar behind a stand of trees. A fraction of a second later, I saw it emerge on the other side. Then I immediately realized it was not the eagle I was seeing, but a particularly dark and dense floater in my eye. We agreed that this was a compact and convenient illustration of the principles we were exploring.

Some time later, I referred back to this simply as “the eagle and the floater incident.” Because we had just been talking about a walk along the False Creek seawall the day before, ChatGPT reconstructed the memory incorrectly and produced the following:

There are other moments like this, though they are not generated.

Walking along the seawall, an eagle passes overhead. It is a common enough sight on the coast, but it produces a brief and unmistakable shift. Conversation pauses. Attention redirects. The moment expands slightly, then closes again. Nothing has happened, and something has happened.

And then, without warning, the bird releases what can only be described as a payload.

A large, unignorable object falls from a considerable height and lands nearby with an impact that immediately reframes the entire moment. The sense of elevation, of quiet observation, collapses into something more physical, more immediate, and more absurd.

It is difficult to say which part of the experience carries more weight: the stillness of the glide overhead, or the sudden and undeniable reminder of gravity that follows.

Both are part of the same event. Neither cancels the other. Together, they produce something that does not resolve cleanly into a single interpretation.

Note the irony of the opening sentence: “There are other moments like this, though they are not generated.”

It’s worth noting that, as far as I can tell, these confabulations are directed inward, toward ChatGPT’s own incomplete or degraded record of the conversation. I have not yet encountered an instance where it has invented facts from outside that context.

Another shortcoming is its handling of documents. I recently asked it to provide a 5,000-word summary of about three days' worth of discussion and work. It did so very well, all things considered, when it displayed the summary onscreen. When asked to translate that summary into a PDF, the wheels came off the bus. It thought it could do it, and it tried a number of times, but it just couldn't help but make a summary of the summary. That's a problem. When the only way to get reliable hard copy is to literally copy and paste the stuff onscreen into a separate program, I start to get a little frustrated.

And here is where I will wrap it up. I have an AA meeting to get to, a wayward son to put to bed, and an ex-wife to wake up off the couch.

Please stand by.

Please visit my online store, Lazy River Design Works
