Features of the Corpus of Spoken Greek

The need for a corpus of spoken Greek arises primarily from the priority that modern linguistics generally gives to spoken over written discourse. Findings from sociolinguistics, however, indicate that the study of spoken discourse should be grounded in language material drawn from naturally occurring communicative situations, in which speech is produced spontaneously and without constraint. As a consequence, compiling a corpus of spoken discourse poses a number of challenges for researchers, ranging from overcoming the so-called ‘observer’s paradox’ to obtaining participants’ consent to tape or video recording. These challenges do not arise with corpora of written discourse, especially corpora of published texts.

The Corpus of Spoken Greek was originally designed for the qualitative analysis of language and linguistic communication, especially from the perspective of Conversation Analysis. Consequently, particular emphasis is placed on transcribing the tape- and video-recorded material so that the transcripts represent the recorded sound as accurately as possible.

For Conversation Analysis, transcription is neither a mechanical procedure (as performed by commercially available transcription software) nor limited to representing content (as in interviews published in the print press). On the contrary, ‘translating’ sound into writing presupposes theoretical processing and analysis as well as appropriate training, and it requires multiple rounds of ‘correction’ by different individuals.

As a result, the transcribed texts of the Corpus of Spoken Greek depart from the standard orthographic representation of spoken discourse: additional symbols are used to mark overlaps, pauses, intonation and other features of spoken discourse (see Transcription symbols). The texts also differ from one another in the level of detail and the quality of their transcription.