Size and discourse types - Corpus of Spoken Greek IMGS

Size and discourse types of the Institute's Corpus of Spoken Greek

The Corpus of Spoken Greek is a set of digital files that is updated and enriched according to the research project’s affordances and needs (Pavlidou 2016). The Corpus consists of four components (Pavlidou 2024):

1. Audiovisual component: It comprises audio and video recordings of naturally occurring talk, together with their metadata.

2. Transcribed component: This is the subset of the recordings in component 1 that has been transcribed according to the transcription conventions of Conversation Analysis. It comprises talk from different discourse types, with varying degrees of formality:

  • everyday conversations among friends and relatives (sample)
  • telephone calls (sample)
  • classroom interaction (sample)
  • television news (sample)
  • television interviews with politicians (sample)
  • interviews/discussions with Greeks of the diaspora (example)
  • other

The transcribed part of the Corpus exceeds 2.3 million words. It should be noted that the transcribed texts vary in the detail and quality of their transcription.

3. Online component: This is the subset of component 2 that can be used freely online for word searches, frequency counts, etc. It currently consists of:

  • 40 everyday conversations among family and friends
  • 145 telephone calls
  • 17 television interviews with politicians
  • 26 diasporic interviews/discussions

4. Annotated component: Part of the online material (see 3.), specifically the 145 telephone calls, has been manually annotated for parts of speech and independent interrogative clauses. The results are summarized in two Excel files.