AdapterHub - Explore

Afrikaans

cc100 af/cc100

Language modeling for the Afrikaans language on the CC-100 corpus. Afrikaans

Albanian

cc100 sq/cc100

Language modeling for the Albanian language on the CC-100 corpus. Albanian

Amharic

cc100 am/cc100

Language modeling for the Amharic language on the CC-100 corpus. Amharic

wiki am/wiki

Language modeling for the Amharic language on Wikipedia. Amharic

Arabic

cc100 ar/cc100

Language modeling for the Arabic language on the CC-100 corpus. Arabic

wiki ar/wiki

Language modeling for the Arabic language on Wikipedia. Arabic is a Semitic language that first emerged in the 1st to 4th centuries CE. It is now the lingua franca of the Arab world. Arabic

Armenian

cc100 hy/cc100

Language modeling for the Armenian language on the CC-100 corpus. Armenian

wiki hy/wiki

Language modeling for the Armenian language on Wikipedia. Armenian

Azerbaijani

cc100 az/cc100

Language modeling for the Azerbaijani language on the CC-100 corpus. Azerbaijani

Basque

cc100 eu/cc100

Language modeling for the Basque language on the CC-100 corpus. Basque

wiki eu/wiki

Language modeling for the Basque language on Wikipedia. Basque

Belarusian

cc100 be/cc100

Language modeling for the Belarusian language on the CC-100 corpus. Belarusian

Bengali

cc100 bn/cc100

Language modeling for the Bengali language on the CC-100 corpus. Bengali

wiki bn/wiki

Language modeling for the Bengali language on Wikipedia. Bengali

Bhojpuri

wiki bh/wiki

Language modeling for the Bhojpuri language on Wikipedia. Bhojpuri

Bulgarian

cc100 bg/cc100

Language modeling for the Bulgarian language on the CC-100 corpus. Bulgarian

Burmese

cc100 my/cc100

Language modeling for the Burmese language on the CC-100 corpus. Burmese

wiki my/wiki

Language modeling for the Burmese language on Wikipedia. The Burmese language is a Sino-Tibetan language spoken in Myanmar where it is an official language and the language of the Bamar people,... Burmese

Buryat

wiki bxr/wiki

Language modeling for the Buryat language on Wikipedia. Buryat

Cantonese

wiki zh_yue/wiki

Language modeling for the Cantonese language on Wikipedia. Cantonese

Catalan

cc100 ca/cc100

Language modeling for the Catalan language on the CC-100 corpus. Catalan

Central Khmer

cc100 km/cc100

Language modeling for the Central Khmer language on the CC-100 corpus. Central Khmer

Chinese

cc100 zh/cc100

Language modeling for the Chinese (traditional) language on the CC-100 corpus. Chinese

wiki zh/wiki

Language modeling for the Chinese language on Wikipedia. Chinese is a family of East Asian analytic languages that form the Sinitic branch of the Sino-Tibetan languages. Chinese

Croatian

cc100 hr/cc100

Language modeling for the Croatian language on the CC-100 corpus. Croatian

Czech

cc100 cs/cc100

Language modeling for the Czech language on the CC-100 corpus. Czech

wiki cs/wiki

Language modeling for the Czech language on Wikipedia. Czech

Danish

cc100 da/cc100

Language modeling for the Danish language on the CC-100 corpus. Danish

Dutch

cc100 nl/cc100

Language modeling for the Dutch language on the CC-100 corpus. Dutch

English

cc100 en/cc100

Language modeling for the English language on the CC-100 corpus. English

wiki en/wiki

Language modeling for the English language on Wikipedia. English is a West Germanic language that was first spoken in early medieval England and eventually became a global lingua franca. English

Erzya

wiki myv/wiki

Language modeling for the Erzya language on Wikipedia. Erzya

Esperanto

cc100 eo/cc100

Language modeling for the Esperanto language on the CC-100 corpus. Esperanto

Estonian

cc100 et/cc100

Language modeling for the Estonian language on the CC-100 corpus. Estonian

wiki et/wiki

Language modeling for the Estonian language on Wikipedia. Estonian is a Uralic language of the Finnic branch spoken in Estonia. Estonian

Finnish

cc100 fi/cc100

Language modeling for the Finnish language on the CC-100 corpus. Finnish

wiki fi/wiki

Language modeling for the Finnish language on Wikipedia. Finnish, whose endonym is suomi or suomen kieli, is a Uralic language of the Finnic branch spoken by the majority of the population in... Finnish

French

cc100 fr/cc100

Language modeling for the French language on the CC-100 corpus. French

wiki fr/wiki

Language modeling for the French language on Wikipedia. French, or la langue française, is a Romance language of the Indo-European family. French

Galician

cc100 gl/cc100

Language modeling for the Galician language on the CC-100 corpus. Galician

Georgian

cc100 ka/cc100

Language modeling for the Georgian language on the CC-100 corpus. Georgian

wiki ka/wiki

Language modeling for the Georgian language on Wikipedia. Georgian

German

cc100 de/cc100

Language modeling for the German language on the CC-100 corpus. German

wiki de/wiki

Language modeling for the German language on Wikipedia. German is a West Germanic language that is mainly spoken in Central Europe. German

Greek

cc100 el/cc100

Language modeling for the Greek language on the CC-100 corpus. Greek

wiki el/wiki

Language modeling for the Greek language on Wikipedia. Greek is an independent branch of the Indo-European family of languages, native to Greece, Cyprus, Albania and other parts of the Eastern... Greek

Guarani

wiki gn/wiki

Language modeling for the Guarani language on Wikipedia. Guarani, specifically the primary variety known as Paraguayan Guarani, is an indigenous language of South America that belongs to the... Guarani

Gujarati

cc100 gu/cc100

Language modeling for the Gujarati language on the CC-100 corpus. Gujarati

Haitian Creole

wiki ht/wiki

Language modeling for the Haitian Creole language on Wikipedia. Haitian Creole is a French-based creole language spoken by 10–12 million people worldwide, and the only language of most Haitians. Haitian Creole

Hausa

cc100 ha/cc100

Language modeling for the Hausa language on the CC-100 corpus. Hausa

Hebrew

cc100 he/cc100

Language modeling for the Hebrew language on the CC-100 corpus. Hebrew

Hindi

cc100 hi/cc100

Language modeling for the Hindi language on the CC-100 corpus. Hindi

wiki hi/wiki

Language modeling for the Hindi language on Wikipedia. Hindi, or more precisely Modern Standard Hindi, is an Indo-Aryan language spoken in India. Hindi

Hungarian

cc100 hu/cc100

Language modeling for the Hungarian language on the CC-100 corpus. Hungarian

wiki hu/wiki

Language modeling for the Hungarian language on Wikipedia. Hungarian

Icelandic

cc100 is/cc100

Language modeling for the Icelandic language on the CC-100 corpus. Icelandic

wiki is/wiki

Language modeling for the Icelandic language on Wikipedia. Icelandic is a North Germanic language spoken by about 314,000 people, the vast majority of whom live in Iceland where it is the national... Icelandic

Ilocano

wiki ilo/wiki

Language modeling for the Ilocano language on Wikipedia. Ilocano (also Ilokano) is an Austronesian language spoken in the Philippines. Ilocano

Indonesian

cc100 id/cc100

Language modeling for the Indonesian language on the CC-100 corpus. Indonesian

wiki id/wiki

Language modeling for the Indonesian language on Wikipedia. Indonesian is the official language of Indonesia. It is a standardised variety of Malay, an Austronesian language that has been used as... Indonesian

Irish

cc100 ga/cc100

Language modeling for the Irish language on the CC-100 corpus. Irish

Italian

cc100 it/cc100

Language modeling for the Italian language on the CC-100 corpus. Italian

wiki it/wiki

Language modeling for the Italian language on Wikipedia. Italian, or lingua italiana, is a Romance language of the Indo-European language family. Italian descended from the Vulgar Latin of the... Italian

Japanese

cc100 ja/cc100

Language modeling for the Japanese language on the CC-100 corpus. Japanese

wiki ja/wiki

Language modeling for the Japanese language on Wikipedia. Japanese is an East Asian language spoken by about 128 million people, primarily in Japan, where it is the national language. Japanese

Javanese

wiki jv/wiki

Language modeling for the Javanese language on Wikipedia. Javanese is the language of the Javanese people from the central and eastern parts of the island of Java, in Indonesia. Javanese

Kannada

cc100 kn/cc100

Language modeling for the Kannada language on the CC-100 corpus. Kannada

Kazakh

cc100 kk/cc100

Language modeling for the Kazakh language on the CC-100 corpus. Kazakh

Kirghiz

cc100 ky/cc100

Language modeling for the Kirghiz language on the CC-100 corpus. Kirghiz

Komi

wiki kv/wiki

Language modeling for the Komi language on Wikipedia. Komi

Korean

cc100 ko/cc100

Language modeling for the Korean language on the CC-100 corpus. Korean

wiki ko/wiki

Language modeling for the Korean language on Wikipedia. The Korean language is an East Asian language spoken by about 77 million people. Korean

Kurdish

cc100 ku/cc100

Language modeling for the Kurdish language on the CC-100 corpus. Kurdish

Lao

cc100 lo/cc100

Language modeling for the Lao language on the CC-100 corpus. Lao

Latin

cc100 la/cc100

Language modeling for the Latin language on the CC-100 corpus. Latin

wiki la/wiki

Language modeling for the Latin language on Wikipedia. Latin

Latvian

cc100 lv/cc100

Language modeling for the Latvian language on the CC-100 corpus. Latvian

wiki lv/wiki

Language modeling for the Latvian language on Wikipedia. Latvian

Lithuanian

cc100 lt/cc100

Language modeling for the Lithuanian language on the CC-100 corpus. Lithuanian

Macedonian

cc100 mk/cc100

Language modeling for the Macedonian language on the CC-100 corpus. Macedonian

Malay

cc100 ms/cc100

Language modeling for the Malay language on the CC-100 corpus. Malay

Malayalam

cc100 ml/cc100

Language modeling for the Malayalam language on the CC-100 corpus. Malayalam

Marathi

cc100 mr/cc100

Language modeling for the Marathi language on the CC-100 corpus. Marathi

Meadow Mari

wiki mhr/wiki

Language modeling for the Meadow Mari language on Wikipedia. Meadow Mari, also called Meadow-Eastern Mari or Eastern Mari, is a standardised dialect of the Mari language used by about half a million people... Meadow Mari

Min Dong

wiki cdo/wiki

Language modeling for the Min Dong language on Wikipedia. Eastern Min or Min Dong, is a branch of the Min group of Sinitic languages of China. Min Dong

Mingrelian

wiki xmf/wiki

Language modeling for the Mingrelian language on Wikipedia. Mingrelian, also spelled Megrelian, is a Kartvelian language spoken in Western Georgia (regions of Samegrelo and Abkhazia), primarily by the Mingrelians. Mingrelian

Mongolian

cc100 mn/cc100

Language modeling for the Mongolian language on the CC-100 corpus. Mongolian

Māori

wiki mi/wiki

Language modeling for the Māori language on Wikipedia. Māori, also known as te reo ('the language'), is an Eastern Polynesian language spoken by the Māori people, the indigenous population of New Zealand. Māori

Nepali

cc100 ne/cc100

Language modeling for the Nepali language on the CC-100 corpus. Nepali

Northern Sami

wiki se/wiki

Language modeling for the Northern Sami language on Wikipedia. Northern Sami

Norwegian

cc100 no/cc100

Language modeling for the Norwegian language on the CC-100 corpus. Norwegian

Oriya

cc100 or/cc100

Language modeling for the Oriya language on the CC-100 corpus. Oriya

Pashto

cc100 ps/cc100

Language modeling for the Pashto language on the CC-100 corpus. Pashto

Persian

cc100 fa/cc100

Language modeling for the Persian language on the CC-100 corpus. Persian

wiki fa/wiki

Language modeling for the Persian language on Wikipedia. Persian

Polish

cc100 pl/cc100

Language modeling for the Polish language on the CC-100 corpus. Polish

Portuguese

cc100 pt/cc100

Language modeling for the Portuguese language on the CC-100 corpus. Portuguese

wiki pt/wiki

Language modeling for the Portuguese language on Wikipedia. Portuguese

Punjabi

cc100 pa/cc100

Language modeling for the Punjabi language on the CC-100 corpus. Punjabi

Quechua

wiki qu/wiki

Language modeling for the Quechua language on Wikipedia. Quechua, usually called Runasimi ("people's language") in Quechuan languages, is an indigenous language family spoken by the Quechua... Quechua

Romanian

cc100 ro/cc100

Language modeling for the Romanian language on the CC-100 corpus. Romanian

Russian

cc100 ru/cc100

Language modeling for the Russian language on the CC-100 corpus. Russian

wiki ru/wiki

Language modeling for the Russian language on Wikipedia. Russian is an East Slavic language, which is an official language in Russia, Belarus, Kazakhstan, Kyrgyzstan, as well as being widely used... Russian

Sanskrit

cc100 sa/cc100

Language modeling for the Sanskrit language on the CC-100 corpus. Sanskrit

Serbian

cc100 sr/cc100

Language modeling for the Serbian language on the CC-100 corpus. Serbian

Sinhala

cc100 si/cc100

Language modeling for the Sinhala language on the CC-100 corpus. Sinhala

Slovak

cc100 sk/cc100

Language modeling for the Slovak language on the CC-100 corpus. Slovak

Slovenian

cc100 sl/cc100

Language modeling for the Slovenian language on the CC-100 corpus. Slovenian

Somali

cc100 so/cc100

Language modeling for the Somali language on the CC-100 corpus. Somali

Spanish

cc100 es/cc100

Language modeling for the Spanish language on the CC-100 corpus. Spanish

wiki es/wiki

Language modeling for the Spanish language on Wikipedia. Spanish, or Castilian, is a Romance language that originated in the Iberian Peninsula and today is a global language with more than 483... Spanish

Swahili

cc100 sw/cc100

Language modeling for the Swahili language on the CC-100 corpus. Swahili

wiki sw/wiki

Language modeling for the Swahili language on Wikipedia. Swahili, also known by its native name Kiswahili, is a Bantu language and the first language of the Swahili people. Swahili

Swedish

cc100 sv/cc100

Language modeling for the Swedish language on the CC-100 corpus. Swedish

Tagalog

cc100 tl/cc100

Language modeling for the Tagalog language on the CC-100 corpus. Tagalog

Tamil

cc100 ta/cc100

Language modeling for the Tamil language on the CC-100 corpus. Tamil

wiki ta/wiki

Language modeling for the Tamil language on Wikipedia. Tamil is a Dravidian language predominantly spoken by the Tamil people of India and Sri Lanka, and by the... Tamil

Telugu

cc100 te/cc100

Language modeling for the Telugu language on the CC-100 corpus. Telugu

Thai

cc100 th/cc100

Language modeling for the Thai language on the CC-100 corpus. Thai

wiki th/wiki

Language modeling for the Thai language on Wikipedia. Thai, or Central Thai, is the official language of Thailand and the first language of the Central Thai people. Thai

Turkish

cc100 tr/cc100

Language modeling for the Turkish language on the CC-100 corpus. Turkish

wiki tr/wiki

Language modeling for the Turkish language on Wikipedia. Turkish, also referred to as Istanbul Turkish or Turkey Turkish, is the most widely spoken of the Turkic languages, with around 70 to 80... Turkish

Turkmen

wiki tk/wiki

Language modeling for the Turkmen language on Wikipedia. Turkmen is an Oghuz Turkic language spoken by the Turkmens of Central Asia and the official language of Turkmenistan. Turkmen

Ukrainian

cc100 uk/cc100

Language modeling for the Ukrainian language on the CC-100 corpus. Ukrainian

Urdu

cc100 ur/cc100

Language modeling for the Urdu language on the CC-100 corpus. Urdu

Uzbek

cc100 uz/cc100

Language modeling for the Uzbek language on the CC-100 corpus. Uzbek

Vietnamese

cc100 vi/cc100

Language modeling for the Vietnamese language on the CC-100 corpus. Vietnamese

wiki vi/wiki

Language modeling for the Vietnamese language on Wikipedia. Vietnamese is an Austroasiatic language that originated in Vietnam, where it is the national and official language. Spoken natively by... Vietnamese