Tutorial - Transcript Preparation

Complete Tutorial: Converting a Transcript to CSV Format

This beginner-friendly tutorial guides you through converting a standard interview transcript into the structured CSV format used by Oral History as Data. Follow this step-by-step process to prepare your first transcript.

What You’ll Create

By the end of this tutorial, you’ll have:

A properly formatted transcript CSV file
Content ready for upload to your OHD site
A file that can be visualized and searched

Before You Begin

Gather these materials:

A raw interview transcript (Word document, text file, etc.)
Access to Google Sheets (recommended) or another spreadsheet program
30-45 minutes of time (less for subsequent transcripts)

Tutorial Steps

Step 1: Create Your Spreadsheet

Open Google Sheets in your web browser
- Go to sheets.google.com
- Click “Blank” to create a new spreadsheet
Set up your column headers
- In cell A1, type: speaker
- In cell B1, type: words
- In cell C1, type: tags
- In cell D1, type: timestamp
Tip: Alternatively, use our template spreadsheet and click “Make a copy”.

Step 2: Clean Your Transcript Text

Open your transcript document
- Open your interview transcript in Word, Google Docs, or another program
- Select all text (Ctrl+A or Cmd+A) and copy it (Ctrl+C or Cmd+C)
Use the text cleaner tool
- Paste your text in the top box below
- Click the “Clean Text” button
- The cleaned version will appear in the bottom box

Copy the cleaned text
- Click in the bottom box
- Select all text (Ctrl+A or Cmd+A)
- Copy it (Ctrl+C or Cmd+C)

Step 3: Structure Your Transcript

Paste text into your spreadsheet
- If you don’t have any Speaker’s noted in the text, Click in cell B2 (under the “words” column)
- If you do have Speaker’s noted, click in cell A2 (under the “speaker” column)
- Paste your cleaned text (Ctrl+V or Cmd+V)
- The text will appear in a single cell
Separate text by speaker
- If your text appears in a single cell, with the Speaker’s name starting each cell like so “Speaker:” follow these steps:
  - Select the cell with your text
  - Using the menu at top, click Data → Split text to columns
  - In the separator options, choose “Custom” and enter a colon mark –> “:”
  - Click “Split”
- Speaker’s should now be separated from the text that they speak!
  - Note: if you have colons in the rest of your text, this will also be split out into new cells; you can paste that content back into the preceding cells, or just replace the current words column with the cleaned transcript you had from before, then do a find/replace function to remove any mention of speakers
If not Speakers are listed, Identify speakers
- In column A, add the speaker name for each segment of text
- For example:
  - “Interviewer” for questions
  - “John Smith” for the interviewee’s responses
- Be consistent with names throughout

Step 4: Add Optional Information

Add timestamps (if you have audio/video)
- In column D, add timestamps for key segments
- Format as MM:SS (e.g., 01:45) or HH:MM:SS (e.g., 1:12:30)
- Match to the corresponding points in your recording
Add topic tags (for visualization)
- In column C, add relevant topic keywords
- Use semicolons between multiple tags (e.g., “education; family; career”)
- Be consistent with terminology across segments

See our How To for more information on creating Tags!

Step 5: Finalize and Save

Review your spreadsheet
- Ensure each row has a speaker identified
- Check that text is properly separated (one speaker segment per row)
- Verify any timestamps or tags are correctly formatted
Save as CSV file
- Click File → Download → Comma-separated values (.csv)
- Find the downloaded file (don’t open it!!) where it’s been downloaded
- Rename your file to match the objectid in your metadata
- Example: If your interview has objectid “smith_john” in metadata, name the file “smith_john.csv”
Upload to your repository
- Place the CSV file in the _data/transcripts/ folder
- For GitHub users:
  - Navigate to the _data/transcripts/ folder in your repository
  - Click “Add file” → “Upload files”
  - Drag your CSV file or use the file selector
  - Commit the changes

Example of Completed Transcript

Here’s how a properly formatted transcript CSV should look (note: this is overly tagged to serve as an example … )

Speaker	Words	Tags	Timestamp
Interviewer	What was your first teaching job?	career; education	00:15
John Smith	“I started at Lincoln Elementary in 1972. It was a challenging environment but rewarding.”	education; career; 1970s	00:22
Interviewer	What subject did you teach?	education	01:05
John Smith	“I taught sixth grade math and science, though I preferred the science lessons.”	education; science; mathematics	01:12

Other Publishing Options Tutorial - Metadata Preparation