Preparing Your Interview Transcripts

This section guides you through the process of transforming traditional interview transcripts into structured CSV files that can be used with the Oral History as Data platform.

Key Steps

Clean your transcript text (remove formatting artifacts, normalize for web display)
Create a CSV with columns: speaker, words, tags, timestamp
- The only required column is words
Each row = one paragraph on the page
- Break up overly long rows!
Save as CSV, name to match the objectid in your metadata
- A mismatch between filename and objectid is the most common problem users run into! More below

Transcript CSV Structure

Your transcript CSV files can include the following columns (in any order). Note that the only required field is words!

speaker: The person speaking in this segment (optional)
words: The text of what was said
tags: Your thematic codes for this segment (optional, but enables visualization)
timestamp: Timecode reference for audio/video syncing (optional)

The first row of your CSV file should contain these column headers.

Example Transcript Format

Here’s how a typical transcript segment looks:

Speaker	Words	Tags	Timestamp
Interviewer	Could you tell me about your early experiences?	background	00:00
Interviewee	Yes, I grew up in a small town in the 1950s. It was quite different from today.	childhood; rural life	24:15
Interviewer	What was school like for you?	education	48:32
Interviewee	School was challenging but rewarding. I particularly enjoyed mathematics and science classes.	education; academic interests	01:03:45

Timestamp Format

The timestamp field accepts several formats:

00:00 - Minutes:Seconds
0:00:00 - Hours:Minutes:Seconds
00:00.00 - Minutes:Seconds.Milliseconds

The timestamp field enables synchronization between transcript text and audio/video recordings. When properly formatted, users can click on a timestamp to jump to that point in the recording.

CSV Naming Convention

Name your transcript CSV files to match the objectid in your metadata spreadsheet. For example, if your metadata contains an interview with objectid smith_john, name the transcript file smith_john.csv.

This naming convention is crucial because it connects your transcript files to the CollectionBuilder-CSV metadata system, allowing the platform to automatically link the right transcript with the right metadata entry.

Sample Files

For sample transcript files that demonstrate the proper format, see our examples

More Help

Collection Metadata Overview Setup Overview