The Intonational Variation in Arabic (IVAr) corpus uses a multi-layered set of data collection instruments, following the model of the Intonational Variation in English (IViE) project. A range of tools is used to collect speech recordings while systematically varying certain variables of interest and controlling others. The table below lists the tools used and the variables and/or speech style that each is designed to yield.

Data in the IVAr database

scripted dialogue
  The scripted dialogue yields multiple read-speech realisations of different utterance types, in a controlled dialogue context designed to elicit the intended meaning:
  • broad focus declarative (dec)
  • wh-question (whq)
  • yes-no-question (ynq)
  • coordinated question (coo)
  • information focus declarative (inf)
  • identification focus declarative (idf)
  • confirmation focus declarative (con)
  The position of the stressed syllable in the last lexical item in each sentence is systematically varied (final/penult/antepenult). The last lexical item in each sentence is (near-)identical in all dialects, permitting comparison of nuclear accent contours across utterance types and dialects.

narrative
  A read narrative yields data in which different speakers of the same dialect all produce the same sentences, within a narrative sequence.
  Later, each speaker is asked to tell the story again from memory. The retold narrative yields at least some instances of the same or similar sentences produced semi-spontaneously by different speakers of the same dialect.

map task
  The map task yields semi-spontaneous realisations of different utterance types; mismatches are included in the maps in order to generate questions naturally in the conversation. The names of landmarks on the map contain mostly sonorant speech sounds, and the position of the stressed syllable is systematically varied in the final word of each landmark name.

Sense Relation Network[2]
  This tool collects local variants of vocabulary items known to vary across Arabic dialects. The data permits independent confirmation of which dialect is spoken by the participants.

free conversation
  Free conversation between two participants, on one or more of the following topics: what is shared/unique about your dialect of Arabic, cooking/food, fashion, cars or sport.
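
As an illustration only, the scripted dialogue design described above can be sketched as a grid of utterance type by stress position. The fully crossed layout and the Python names used here (UTTERANCE_TYPES, STRESS_POSITIONS, design_cells) are assumptions made for this sketch; they do not describe the actual IVAr scripts or file naming.

```python
from itertools import product

# Utterance-type codes as listed for the scripted dialogue.
UTTERANCE_TYPES = {
    "dec": "broad focus declarative",
    "whq": "wh-question",
    "ynq": "yes-no-question",
    "coo": "coordinated question",
    "inf": "information focus declarative",
    "idf": "identification focus declarative",
    "con": "confirmation focus declarative",
}

# Position of the stressed syllable in the last lexical item.
STRESS_POSITIONS = ("final", "penult", "antepenult")


def design_cells():
    """Return (utterance code, stress position) pairs.

    Assumption: every utterance type is crossed with every stress
    position; the real scripts may not be fully crossed.
    """
    return list(product(UTTERANCE_TYPES, STRESS_POSITIONS))


if __name__ == "__main__":
    for code, stress in design_cells():
        print(f"{code}\t{stress}\t{UTTERANCE_TYPES[code]}")
```

Under this assumption the grid has 7 x 3 = 21 cells, each of which would correspond to at least one target sentence per speaker.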

Additional data
To support our analysis of the core database materials, we also collected read-speech experimental sentences designed to elicit phonetic variables, and a short passage in Modern Standard Arabic. Participants were also invited, optionally, to provide recordings in English, for use in our work on second language acquisition of phonology. Finally, we collected data with 2–4 speakers of each dialect using an Arabic version of a Dialogue Completion Task tool[1], based on the tools used in prior work on Spanish and Portuguese. A subset of this additional data will be made available to researchers on request after the IVAr database is launched.

