Tutorial - Transcript Preparation
Complete Tutorial: Converting a Transcript to CSV Format
This beginner-friendly tutorial guides you through converting a standard interview transcript into the structured CSV format used by Oral History as Data. Follow this step-by-step process to prepare your first transcript.
What You’ll Create
By the end of this tutorial, you’ll have:
- A properly formatted transcript CSV file
- Content ready for upload to your OHD site
- A file that can be visualized and searched
Before You Begin
Gather these materials:
- A raw interview transcript (Word document, text file, etc.)
- Access to Google Sheets (recommended) or another spreadsheet program
- 30-45 minutes (subsequent transcripts will take less time)
Tutorial Steps
Step 1: Create Your Spreadsheet
- Open Google Sheets in your web browser
- Go to sheets.google.com
- Click “Blank” to create a new spreadsheet
- Set up your column headers
- In cell A1, type: speaker
- In cell B1, type: words
- In cell C1, type: tags
- In cell D1, type: timestamp
Tip: Alternatively, use our template spreadsheet and click “Make a copy”.
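If you'd rather start the file outside a spreadsheet, you can write the same header row with a short Python script. This is only a sketch using Python's standard `csv` module. `write_header` and the file name you pass it are illustrative choices, not part of OHD. The column order mirrors cells A1 to D1 above.

```python
import csv

# Column headers, in the same order as cells A1-D1 in Step 1.
HEADERS = ["speaker", "words", "tags", "timestamp"]


def write_header(path: str) -> None:
    """Create a new transcript CSV containing only the header row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(HEADERS)
```

For example, `write_header("smith_john.csv")` creates a file whose first line is `speaker,words,tags,timestamp`.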
Step 2: Clean Your Transcript Text
- Open your transcript document
- Open your interview transcript in Word, Google Docs, or another program
- Select all text (Ctrl+A or Cmd+A) and copy it (Ctrl+C or Cmd+C)
- Use the text cleaner tool
- Paste your text into the tool's top box
- Click the “Clean Text” button
- The cleaned version will appear in the bottom box
- Copy the cleaned text
- Click in the bottom box
- Select all text (Ctrl+A or Cmd+A)
- Copy it (Ctrl+C or Cmd+C)
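This tutorial doesn't spell out exactly what the text cleaner changes. The sketch below is purely an illustration. It assumes that "cleaning" means trimming each line, collapsing repeated spaces, tabs, and non-breaking spaces, and dropping blank lines, so that each paragraph ends up in its own spreadsheet row. `clean_text` is a hypothetical name, and the real tool may do more or less than this.

```python
import re


def clean_text(raw: str) -> str:
    """Hypothetical cleaner: one trimmed, single-spaced paragraph per line."""
    cleaned_lines = []
    for line in raw.splitlines():
        # Collapse runs of spaces, tabs, and non-breaking spaces into one space.
        line = re.sub(r"[ \t\u00a0]+", " ", line).strip()
        if line:  # skip blank lines between paragraphs
            cleaned_lines.append(line)
    return "\n".join(cleaned_lines)
```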
Step 3: Structure Your Transcript
- Paste text into your spreadsheet
- If your text does not name any speakers, click in cell B2 (under the “words” column)
- If your text does name speakers, click in cell A2 (under the “speaker” column)
- Paste your cleaned text (Ctrl+V or Cmd+V)
- The text will appear in a single column, with each paragraph in its own cell
- Separate text by speaker
- If each cell begins with the speaker’s name followed by a colon, like “Speaker:”, follow these steps:
- Select the cells containing your text
- Using the menu at top, click Data → Split text to columns
- In the separator options, choose “Custom” and enter a colon (“:”)
- Click “Split”
- Speaker names should now be separated from their words!
- Note: any other colons in the text (for example, in a time such as “10:30”) will also be split into extra cells. Either paste that content back into the preceding cells, or replace the words column with your cleaned transcript from Step 2 and use Find and replace to remove the speaker names
- If no speakers are listed, identify them
- In column A, add the speaker name for each segment of text
- For example:
- “Interviewer” for questions
- “John Smith” for the interviewee’s responses
- Be consistent with names throughout
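The colon note in this step arises because Split text to columns breaks at every colon. If you're comfortable running a script, you can avoid the problem by splitting each line only at its first colon. This is a minimal sketch that assumes each cleaned line looks like `Name: words`. `split_speakers` is a hypothetical helper. Lines with no colon get a blank speaker, which you can then fill in by hand.

```python
def split_speakers(cleaned: str) -> list[list[str]]:
    """Turn 'Name: words' lines into [speaker, words, tags, timestamp] rows."""
    rows = []
    for line in cleaned.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # partition() splits at the FIRST colon only, so later colons
        # (e.g. "at 8:00 a.m.") stay inside the words.
        speaker, sep, words = line.partition(":")
        if not sep:
            # No colon at all: leave the speaker blank to fill in by hand.
            speaker, words = "", line
        # tags and timestamp start empty (they are optional, see Step 4).
        rows.append([speaker.strip(), words.strip(), "", ""])
    return rows
```

One caveat: some lines have no speaker label but still contain a colon (for example, “We left at 10:30”). The script would split such a line as if “We left at 10” were a speaker, so scan the speaker column afterwards.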
Step 4: Add Optional Information
- Add timestamps (if you have audio/video)
- In column D, add timestamps for key segments
- Format as MM:SS (e.g., 01:45) or HH:MM:SS (e.g., 01:12:30)
- Match to the corresponding points in your recording
- Add topic tags (for visualization)
- In column C, add relevant topic keywords
- Use semicolons between multiple tags (e.g., “education; family; career”)
- Be consistent with terminology across segments
See our How To for more information on creating tags!
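You can also check the timestamp and tag conventions above mechanically. The sketch below is an illustration rather than an OHD feature. `timestamp_ok` accepts three cases: a blank cell (timestamps are optional), MM:SS, or a time with a leading hour such as 1:12:30 or 01:12:30. `parse_tags` splits a tags cell on semicolons. Lowercasing tags is an assumption made here to enforce consistent terminology. Remove `.lower()` if capitalization matters in your project.

```python
import re

# MM:SS (01:45), or with a leading hour: H:MM:SS (1:12:30) / HH:MM:SS (01:12:30).
# Minutes and seconds must be two digits from 00 to 59.
TIMESTAMP = re.compile(r"(?:\d{1,2}:)?[0-5]\d:[0-5]\d")


def timestamp_ok(value: str) -> bool:
    """Blank is allowed (timestamps are optional); otherwise match the pattern."""
    value = value.strip()
    return value == "" or TIMESTAMP.fullmatch(value) is not None


def parse_tags(cell: str) -> list[str]:
    """Split a tags cell on semicolons, trimming and lowercasing each tag."""
    return [t.strip().lower() for t in cell.split(";") if t.strip()]
```

Because of the lowercasing step, “Education” and “education” count as the same tag.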
Step 5: Finalize and Save
- Review your spreadsheet
- Ensure each row has a speaker identified
- Check that text is properly separated (one speaker segment per row)
- Verify any timestamps or tags are correctly formatted
- Save as CSV file
- Click File → Download → Comma-separated values (.csv)
- Find the downloaded file, but don’t open it! Opening and re-saving it in another program may alter its formatting
- Rename your file to match the objectid in your metadata
- Example: If your interview has objectid “smith_john” in metadata, name the file “smith_john.csv”
- Upload to your repository
- Place the CSV file in the _data/transcripts/ folder
- For GitHub users:
- Navigate to the _data/transcripts/ folder in your repository
- Click “Add file” → “Upload files”
- Drag your CSV file or use the file selector
- Commit the changes
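Before uploading, you could automate part of the review checklist. This is a hypothetical helper, not something OHD provides. `check_transcript` confirms the header row, flags rows with no speaker, and checks that the file name matches the `objectid` you pass in, which must come from your own metadata. Row numbers in its messages match spreadsheet row numbers.

```python
import csv
from pathlib import Path

EXPECTED_HEADER = ["speaker", "words", "tags", "timestamp"]


def check_transcript(path: str, objectid: str) -> list[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    p = Path(path)
    # The file name (minus .csv) should match the objectid in your metadata.
    if p.suffix != ".csv" or p.stem != objectid:
        problems.append(f"file should be named {objectid}.csv, not {p.name}")
    with p.open(newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    if not rows or rows[0] != EXPECTED_HEADER:
        problems.append("first row should be: " + ",".join(EXPECTED_HEADER))
    # Row 1 is the header, so data rows start at spreadsheet row 2.
    for n, row in enumerate(rows[1:], start=2):
        if not row or not row[0].strip():
            problems.append(f"row {n}: no speaker")
    return problems
```

For example, `check_transcript("smith_john.csv", "smith_john")` returns an empty list when it finds no problems.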
Example of Completed Transcript
Here’s how a properly formatted transcript CSV should look (note: this example is deliberately over-tagged for illustration):
speaker | words | tags | timestamp |
---|---|---|---|
Interviewer | What was your first teaching job? | career; education | 00:15 |
John Smith | I started at Lincoln Elementary in 1972. It was a challenging environment but rewarding. | education; career; 1970s | 00:22 |
Interviewer | What subject did you teach? | education | 01:05 |
John Smith | I taught sixth grade math and science, though I preferred the science lessons. | education; science; mathematics | 01:12 |
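Inside the downloaded file, the table above is plain comma-separated text. The sketch below writes the same rows with Python's `csv` module so you can see what that looks like. The names `ROWS` and `example_csv` are just for illustration. A value containing a comma, such as the last answer, is wrapped in double quotes so that the comma isn't mistaken for a column break.

```python
import csv
import io

HEADER = ["speaker", "words", "tags", "timestamp"]
ROWS = [
    ["Interviewer", "What was your first teaching job?", "career; education", "00:15"],
    ["John Smith",
     "I started at Lincoln Elementary in 1972. It was a challenging environment but rewarding.",
     "education; career; 1970s", "00:22"],
    ["Interviewer", "What subject did you teach?", "education", "01:05"],
    ["John Smith",
     "I taught sixth grade math and science, though I preferred the science lessons.",
     "education; science; mathematics", "01:12"],
]


def example_csv() -> str:
    """Return the example transcript as CSV text."""
    buf = io.StringIO()
    # lineterminator="\n" keeps printed output tidy; the csv default is "\r\n".
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(ROWS)  # fields containing commas are quoted automatically
    return buf.getvalue()
```

Running `print(example_csv())` shows five lines: the header and four rows. Only the last `words` value is quoted, because it is the only one containing a comma.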