OSCAR

References:

unshuffled_deduplicated_af

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_af')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under a Creative Commons CC0 license ("no rights reserved"), including the related or neighboring rights to OSCAR. This work is published from: France.

    If you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split     Examples
'train'   130640
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
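Every configuration in this catalog uses the same naming pattern, `huggingface:oscar/unshuffled_deduplicated_<code>`, where `<code>` is the language code in the section heading. The sketch below only builds that string; `oscar_tfds_name` is a hypothetical helper (not part of TFDS), and the check that a code is two or three lowercase letters is an assumption based on the codes listed here. It does not verify that the configuration actually exists.

```python
import re

# Assumed shape of a language code, based on the codes in this catalog
# ("af", "als", "arz", ...).
_LANG_CODE = re.compile(r"^[a-z]{2,3}$")


def oscar_tfds_name(lang_code: str) -> str:
    """Build the TFDS name of a deduplicated OSCAR config (hypothetical helper).

    The result is meant to be passed to tfds.load(...); existence of the
    config is not checked here.
    """
    if not _LANG_CODE.match(lang_code):
        raise ValueError(f"unexpected language code: {lang_code!r}")
    return f"huggingface:oscar/unshuffled_deduplicated_{lang_code}"


if __name__ == "__main__":
    # Real loading needs tensorflow_datasets and network access, e.g.:
    #   import tensorflow_datasets as tfds
    #   ds = tfds.load(oscar_tfds_name("af"), split="train")
    print(oscar_tfds_name("af"))
```

Passing `split="train"` returns a single dataset instead of a dictionary keyed by split name, which is convenient here because every configuration has only a `train` split.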

unshuffled_deduplicated_als

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_als')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under a Creative Commons CC0 license ("no rights reserved"), including the related or neighboring rights to OSCAR. This work is published from: France.

    If you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split     Examples
'train'   4518
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
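The feature specification above declares two scalar `Value` features: `id` (`int64`) and `text` (`string`). As an illustration, the sketch below parses that JSON with the standard library and checks whether a plain Python record matches it. The mapping of `int64` to `int` and `string` to `str` is an assumption for decoded records (raw TFDS output may deliver `text` as bytes), and `load_schema` / `matches_schema` are hypothetical names.

```python
import json

# Feature spec copied from the section above.
FEATURES_JSON = """
{
    "id": {"dtype": "int64", "id": null, "_type": "Value"},
    "text": {"dtype": "string", "id": null, "_type": "Value"}
}
"""

# Assumed mapping from the spec's dtype names to Python types.
_PY_TYPES = {"int64": int, "string": str}


def load_schema(features_json: str) -> dict:
    """Return {feature_name: python_type} for every "Value" feature."""
    spec = json.loads(features_json)
    schema = {}
    for name, feature in spec.items():
        if feature.get("_type") != "Value":
            raise ValueError(f"unsupported feature type for {name!r}")
        schema[name] = _PY_TYPES[feature["dtype"]]
    return schema


def matches_schema(record: dict, schema: dict) -> bool:
    """True if the record has exactly the schema's keys with matching types."""
    if set(record) != set(schema):
        return False
    for name, py_type in schema.items():
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if py_type is int and isinstance(value, bool):
            return False
        if not isinstance(value, py_type):
            return False
    return True


schema = load_schema(FEATURES_JSON)
print(schema)                                                # {'id': <class 'int'>, 'text': <class 'str'>}
print(matches_schema({"id": 0, "text": "Hallo"}, schema))    # True
print(matches_schema({"id": "0", "text": "Hallo"}, schema))  # False
```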

unshuffled_deduplicated_arz

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_arz')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under a Creative Commons CC0 license ("no rights reserved"), including the related or neighboring rights to OSCAR. This work is published from: France.

    If you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split     Examples
'train'   79928
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_an

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_an')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under a Creative Commons CC0 license ("no rights reserved"), including the related or neighboring rights to OSCAR. This work is published from: France.

    If you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split     Examples
'train'   2025
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_ast

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ast')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under a Creative Commons CC0 license ("no rights reserved"), including the related or neighboring rights to OSCAR. This work is published from: France.

    If you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split     Examples
'train'   5343
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_ba

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ba')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under a Creative Commons CC0 license ("no rights reserved"), including the related or neighboring rights to OSCAR. This work is published from: France.

    If you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split     Examples
'train'   27050
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_am

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_am')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under a Creative Commons CC0 license ("no rights reserved"), including the related or neighboring rights to OSCAR. This work is published from: France.

    If you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split     Examples
'train'   43102
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_as

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_as')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under a Creative Commons CC0 license ("no rights reserved"), including the related or neighboring rights to OSCAR. This work is published from: France.

    If you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split     Examples
'train'   9212
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_azb

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_azb')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under a Creative Commons CC0 license ("no rights reserved"), including the related or neighboring rights to OSCAR. This work is published from: France.

    If you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split     Examples
'train'   9985
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_be

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_be')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under a Creative Commons CC0 license ("no rights reserved"), including the related or neighboring rights to OSCAR. This work is published from: France.

    If you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split     Examples
'train'   307405
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_bo

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bo')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under a Creative Commons CC0 license ("no rights reserved"), including the related or neighboring rights to OSCAR. This work is published from: France.

    If you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split     Examples
'train'   15762
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_bxr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bxr')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under a Creative Commons CC0 license ("no rights reserved"), including the related or neighboring rights to OSCAR. This work is published from: France.

    If you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split     Examples
'train'   36
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_ceb

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ceb')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under a Creative Commons CC0 license ("no rights reserved"), including the related or neighboring rights to OSCAR. This work is published from: France.

    If you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split     Examples
'train'   26145
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_az

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_az')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under a Creative Commons CC0 license ("no rights reserved"), including the related or neighboring rights to OSCAR. This work is published from: France.

    If you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split     Examples
'train'   626796
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_bcl

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bcl')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under a Creative Commons CC0 license ("no rights reserved"), including the related or neighboring rights to OSCAR. This work is published from: France.

    If you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split     Examples
'train'   1
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_cy

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_cy')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under a Creative Commons CC0 license ("no rights reserved"), including the related or neighboring rights to OSCAR. This work is published from: France.

    If you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split     Examples
'train'   98225
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_dsb

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_dsb')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under a Creative Commons CC0 license ("no rights reserved"), including the related or neighboring rights to OSCAR. This work is published from: France.

    If you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split     Examples
'train'   37
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_bn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bn')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under a Creative Commons CC0 license ("no rights reserved"), including the related or neighboring rights to OSCAR. This work is published from: France.

    If you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split     Examples
'train'   1114481
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
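When a loaded split is iterated with `tfds.as_numpy`, string features such as `text` typically arrive as UTF-8 encoded `bytes`, and `id` as a NumPy integer. The sketch below assumes records of that shape and converts them to plain Python values, which matters for non-Latin scripts such as Bengali. `decode_record` is a hypothetical helper, and the sample record is made up for illustration rather than taken from the corpus.

```python
def decode_record(record: dict) -> dict:
    """Convert one numpy-style record to plain Python types (hypothetical helper).

    Assumes record["text"] is UTF-8 bytes (or already a str) and
    record["id"] is an integer-like value such as numpy.int64.
    """
    text = record["text"]
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    return {"id": int(record["id"]), "text": text}


# Made-up record shaped like one element of tfds.as_numpy(ds).
sample = {"id": 7, "text": "বাংলা".encode("utf-8")}
print(decode_record(sample))  # {'id': 7, 'text': 'বাংলা'}

# Real usage (needs tensorflow_datasets and network access):
#   ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bn', split='train')
#   for record in tfds.as_numpy(ds.take(3)):
#       print(decode_record(record))
```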

unshuffled_deduplicated_bs

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bs')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under a Creative Commons CC0 license ("no rights reserved"), including the related or neighboring rights to OSCAR. This work is published from: France.

    If you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split     Examples
'train'   702
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_ce

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ce')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under a Creative Commons CC0 license ("no rights reserved"), including the related or neighboring rights to OSCAR. This work is published from: France.

    If you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split     Examples
'train'   2984
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_cv

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_cv')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under a Creative Commons CC0 license ("no rights reserved"), including the related or neighboring rights to OSCAR. This work is published from: France.

    If you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split     Examples
'train'   10130
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_diq

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_diq')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under a Creative Commons CC0 license ("no rights reserved"), including the related or neighboring rights to OSCAR. This work is published from: France.

    If you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split     Examples
'train'   1
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
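Every config in this listing follows the same `huggingface:oscar/unshuffled_deduplicated_<lang>` naming pattern and exposes only a `train` split. The sketch below uses a hypothetical helper, `oscar_name`, to build the name, and TFDS's split-slicing syntax (for example `train[:100]`) to read only the first examples of a large config; tiny configs such as this one, with a single example, can simply be loaded whole.

```python
from typing import Optional


def oscar_name(lang: str) -> str:
    """Build the TFDS name of a deduplicated OSCAR config (hypothetical helper)."""
    return f"huggingface:oscar/unshuffled_deduplicated_{lang}"


def head_split(n: Optional[int] = None) -> str:
    """Return 'train', or a TFDS slice string selecting the first n examples."""
    return "train" if n is None else f"train[:{n}]"


print(oscar_name("diq"))  # huggingface:oscar/unshuffled_deduplicated_diq
print(head_split(100))    # train[:100]

# Usage (requires tensorflow_datasets and network access):
# import tensorflow_datasets as tfds
# ds = tfds.load(oscar_name("et"), split=head_split(1000))
```

Passing `split=` makes `tfds.load` return a single `tf.data.Dataset`; without it, as in the command above, the result is a dictionary keyed by split name, so the examples are under `ds['train']`.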

unshuffled_deduplicated_eml

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_eml')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split    Examples
'train'  80
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_et

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_et')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split    Examples
'train'  1172041
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
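Split sizes translate directly into training-loop arithmetic. As a worked sketch, assuming an arbitrary batch size of 1024 and keeping the final partial batch, the 1172041 `train` examples of this config give ceil(1172041 / 1024) = 1145 steps per epoch, with 585 examples in the last batch. The `drop_remainder` flag mirrors the parameter of the same name on `tf.data.Dataset.batch`; `steps_per_epoch` itself is a hypothetical helper:

```python
def steps_per_epoch(num_examples: int, batch_size: int, drop_remainder: bool = False):
    """Return (steps, size_of_last_batch) for one pass over the data."""
    full, rest = divmod(num_examples, batch_size)
    if rest and not drop_remainder:
        # One extra step consumes the leftover partial batch.
        return full + 1, rest
    # Either the data divides evenly or the partial batch is dropped.
    return full, (batch_size if full else 0)


print(steps_per_epoch(1172041, 1024))                       # (1145, 585)
print(steps_per_epoch(1172041, 1024, drop_remainder=True))  # (1144, 1024)
```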

unshuffled_deduplicated_bg

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bg')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split    Examples
'train'  3398679
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
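When a config is iterated with `tfds.as_numpy`, string features arrive as `bytes` and the `int64` `id` as a NumPy integer. A minimal sketch of a decoding step (the `decode_example` name is hypothetical), assuming the text is UTF-8 encoded; it is exercised here on a hand-built dictionary so it runs without TFDS installed:

```python
def decode_example(example: dict) -> dict:
    """Convert a NumPy-style OSCAR example into plain Python types."""
    text = example["text"]
    if isinstance(text, bytes):
        # Invalid byte sequences become U+FFFD instead of raising an error.
        text = text.decode("utf-8", errors="replace")
    return {"id": int(example["id"]), "text": text}


sample = {"id": 7, "text": "Здравей, свят".encode("utf-8")}
print(decode_example(sample))  # {'id': 7, 'text': 'Здравей, свят'}

# With TFDS installed, roughly:
# for ex in tfds.as_numpy(ds["train"].take(3)):
#     print(decode_example(ex))
```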

unshuffled_deduplicated_bpy

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bpy')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split    Examples
'train'  1770
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_ca

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ca')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split    Examples
'train'  2458067
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
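Large configs such as this one (2458067 examples) are awkward to hold in memory, so corpus statistics are best computed in a single streaming pass over the `text` field. A sketch, with the hypothetical `text_stats` helper accepting any iterable of decoded records (for example a generator over a TFDS split):

```python
from typing import Iterable


def text_stats(records: Iterable[dict]) -> dict:
    """One pass: document count, total characters, mean and max document length."""
    count = total = longest = 0
    for rec in records:
        n = len(rec["text"])
        count += 1
        total += n
        longest = max(longest, n)
    mean = total / count if count else 0.0
    return {"docs": count, "chars": total, "mean_chars": mean, "max_chars": longest}


docs = [{"id": 0, "text": "Bon dia"}, {"id": 1, "text": "Bona nit"}]
print(text_stats(docs))  # {'docs': 2, 'chars': 15, 'mean_chars': 7.5, 'max_chars': 8}
```

Lengths here are counted in Unicode code points, not bytes or tokens.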

unshuffled_deduplicated_ckb

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ckb')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split    Examples
'train'  68210
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_ar

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ar')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split    Examples
'train'  9006977
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
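The `train` sizes listed so far span several orders of magnitude, from a single example to about nine million. Collecting them into a dictionary (values copied from the listings in this section; the selection is simply the configs shown here, not all of OSCAR) makes it easy to keep only configs above a size threshold. The `configs_at_least` helper is hypothetical:

```python
# train-split sizes copied from the listings above
TRAIN_SIZES = {
    "als": 10130, "diq": 1, "eml": 80, "et": 1172041, "bg": 3398679,
    "bpy": 1770, "ca": 2458067, "ckb": 68210, "ar": 9006977,
}


def configs_at_least(sizes: dict, min_examples: int) -> list:
    """Language codes with at least min_examples train examples, largest first."""
    keep = [lang for lang, n in sizes.items() if n >= min_examples]
    return sorted(keep, key=sizes.get, reverse=True)


print(configs_at_least(TRAIN_SIZES, 1_000_000))  # ['ar', 'bg', 'ca', 'et']
```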

unshuffled_deduplicated_av

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_av')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split    Examples
'train'  360
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_bar

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bar')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split    Examples
'train'  4
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
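Config names have to match exactly what the loader expects; a misspelled name such as `uncuffled_deduplicated_bar` is not a valid config. A minimal pre-flight check is sketched below. The pattern is an approximation inferred from the configs on this page (`unshuffled_deduplicated_` followed by a lowercase language code, optionally with an underscore-separated suffix), not an official list of OSCAR configs:

```python
import re

# Approximate pattern inferred from the config names on this page.
CONFIG_RE = re.compile(r"unshuffled_deduplicated_[a-z]+(?:_[a-z]+)?")


def is_valid_config(name: str) -> bool:
    """True if name looks like a deduplicated OSCAR config name."""
    return CONFIG_RE.fullmatch(name) is not None


for name in ["unshuffled_deduplicated_bar", "uncuffled_deduplicated_bar",
             "unshuffled_deduplicated_bh", "uncuffled_deduplated_bh"]:
    print(name, is_valid_config(name))
```

A passing check only means the name is well formed; whether the config actually exists is still decided by the loader.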

unshuffled_deduplicated_bh

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bh')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split    Examples
'train'  82
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_br

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_br')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split    Examples
'train'  14724
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_cbk

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_cbk')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split    Examples
'train'  1
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_da

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_da')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split    Examples
'train'  4771098
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_dv

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_dv')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split    Examples
'train'  17024
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_eo

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_eo')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split    Examples
'train'  84752
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_fa

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_fa')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split    Examples
'train'  8203495
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
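Each config ships only a `train` split, so any validation set has to be carved out by the user. One common approach, sketched here under the assumption that the `id` field is a stable integer per example within a config, is to hash each `id` into 100 buckets and send a fixed fraction of buckets to validation. The assignment is then reproducible across runs without storing an index; `is_validation` is a hypothetical helper:

```python
import hashlib


def is_validation(example_id: int, percent: int = 5) -> bool:
    """Deterministically assign roughly `percent`% of ids to validation."""
    digest = hashlib.sha256(str(example_id).encode("ascii")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in [0, 100)
    return bucket < percent


held_out = sum(is_validation(i) for i in range(100_000))
print(held_out)  # close to 5000, but not exactly
```

Hashing, rather than taking `id % 100` directly, avoids tying the held-out set to whatever ordering the ids happen to follow.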

unshuffled_deduplicated_fy

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_fy')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split    Examples
'train'  20661
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_gn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_gn')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split    Examples
'train'  68
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_cs

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_cs')
  • Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version: 1.0.0

  • Splits:

Split    Examples
'train'  12308039
  • Features:
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
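For multi-worker input pipelines, `tf.data.Dataset.shard(num_shards, index)` keeps every `num_shards`-th element starting at position `index`. The same round-robin rule in plain Python shows how the 12308039 `train` examples of this config would be divided among, say, 8 workers (an arbitrary count): since 12308039 = 8 × 1538504 + 7, the first 7 shards receive 1538505 examples and the last receives 1538504. Both helpers are hypothetical illustrations of the rule, not TFDS functions:

```python
def shard_sizes(num_examples: int, num_shards: int) -> list:
    """Examples per shard when element j goes to shard j % num_shards."""
    base, extra = divmod(num_examples, num_shards)
    # The first `extra` shards each receive one additional example.
    return [base + (1 if i < extra else 0) for i in range(num_shards)]


def shard_indices(num_examples: int, num_shards: int, index: int) -> range:
    """Element positions kept by shard `index`."""
    return range(index, num_examples, num_shards)


print(shard_sizes(12308039, 8))
print(list(shard_indices(10, 3, 1)))  # [1, 4, 7]
```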

uncuffled_deduplated_hi

TFDS এ এই ডেটাসেট লোড করতে নিম্নলিখিত কমান্ডটি ব্যবহার করুন:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_hi')
  • বর্ণনা :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split   Examples
'train' 1909387
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
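Each example therefore carries an `int64` `id` and a string `text`. Here is a minimal record check. It assumes records arrive as plain Python dicts, for instance after iterating with `tfds.as_numpy`, where string features typically come back as UTF-8 `bytes`. The names `SCHEMA` and `validate_record` are hypothetical.

```python
import numbers

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
SCHEMA = {"id": "int64", "text": "string"}  # from the Features spec above


def validate_record(record: dict) -> dict:
    """Check a record against SCHEMA and return it with `text` as str."""
    if set(record) != set(SCHEMA):
        raise ValueError(f"expected keys {sorted(SCHEMA)}, got {sorted(record)}")
    doc_id, text = record["id"], record["text"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(doc_id, bool) or not isinstance(doc_id, numbers.Integral):
        raise TypeError(f"id must be an integer, got {type(doc_id).__name__}")
    if not INT64_MIN <= int(doc_id) <= INT64_MAX:
        raise ValueError(f"id {doc_id} does not fit in int64")
    # String features read through NumPy usually arrive as UTF-8 bytes.
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    if not isinstance(text, str):
        raise TypeError(f"text must be str or bytes, got {type(text).__name__}")
    return {"id": int(doc_id), "text": text}
```

`numbers.Integral` also accepts NumPy integer scalars such as `numpy.int64`, so no NumPy import is needed for the check.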

unshuffled_deduplicated_hu

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_hu')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split   Examples
'train' 6582908
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
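Every config on this page follows the same naming pattern: `unshuffled_deduplicated_<language code>`, loaded through the `huggingface:oscar/` namespace. The small sketch below builds and parses these names so they are not mistyped. It assumes the pattern holds for any language code you pass in. The helper names `tfds_name` and `lang_code_of` are hypothetical.

```python
PREFIX = "huggingface:oscar/"
CONFIG_STEM = "unshuffled_deduplicated_"


def tfds_name(lang_code: str) -> str:
    """Build the tfds.load name for a language code such as 'hu'."""
    if not lang_code.isalpha() or not lang_code.islower():
        raise ValueError(f"unexpected language code: {lang_code!r}")
    return f"{PREFIX}{CONFIG_STEM}{lang_code}"


def lang_code_of(name: str) -> str:
    """Inverse of tfds_name: recover the language code from a full name."""
    stem = PREFIX + CONFIG_STEM
    if not name.startswith(stem):
        raise ValueError(f"not an OSCAR unshuffled_deduplicated name: {name!r}")
    return name[len(stem):]


print(tfds_name("hu"))  # huggingface:oscar/unshuffled_deduplicated_hu
```

The resulting string is the argument you would pass to `tfds.load`, as in the command above.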

unshuffled_deduplicated_ie

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ie')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split   Examples
'train' 11
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
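Every config here ships only a single 'train' split. If you need a held-out set, one option is to assign each example deterministically by hashing its `id`. This is not part of OSCAR or TFDS. The assignment stays stable across runs and machines. The sketch below assumes `id` values are unique within a config. The 10% fraction and the SHA-256 bucketing are arbitrary choices. For very small configs such as `ie` (11 examples), the held-out set may contain only one or two documents, or none.

```python
import hashlib


def heldout_bucket(doc_id: int, buckets: int = 100) -> int:
    """Map an example id to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(str(doc_id).encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def is_heldout(doc_id: int, heldout_percent: int = 10) -> bool:
    """True if the example falls in the held-out portion."""
    return heldout_bucket(doc_id) < heldout_percent


ids = range(10_000)
heldout = [i for i in ids if is_heldout(i)]
print(len(heldout) / len(ids))  # close to 0.10; the exact value depends on the hash
```

Inside a `tf.data` pipeline, the same rule would have to be expressed with TensorFlow ops (or `tf.py_function`) in a `Dataset.filter` call. The pure-Python version keeps the sketch self-contained.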

unshuffled_deduplicated_fr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_fr')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split   Examples
'train' 59448891
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
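Split sizes differ by several orders of magnitude across configs: `ie` has 11 training examples, while `fr` has 59448891. The sketch below records a few counts from this page and picks out configs below a size threshold. The threshold is an illustrative choice, and the helper name `configs_below` is hypothetical.

```python
# 'train' example counts copied from the listings on this page.
TRAIN_EXAMPLES = {
    "hi": 1909387,
    "hu": 6582908,
    "ie": 11,
    "fr": 59448891,
}


def configs_below(counts: dict[str, int], min_examples: int) -> list[str]:
    """Return config codes with fewer than min_examples, smallest first."""
    small = [code for code, n in counts.items() if n < min_examples]
    return sorted(small, key=counts.__getitem__)


total = sum(TRAIN_EXAMPLES.values())
print(total)  # 67941197
print(configs_below(TRAIN_EXAMPLES, 1000))  # ['ie']
```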

unshuffled_deduplicated_gd

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_gd')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split   Examples
'train' 3883
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
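As the config names say, these corpora are unshuffled, so documents arrive in extraction order. With `tf.data`, the usual remedy is `ds.shuffle(buffer_size)`. It fills a fixed-size buffer and then emits randomly chosen buffered elements, replacing each with the next incoming one. The pure-Python sketch below illustrates that buffered shuffle. It is not the TensorFlow implementation.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int, seed: Optional[int] = None) -> Iterator[T]:
    """Approximate shuffle using a fixed-size buffer, in the style of tf.data's shuffle."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be >= 1")
    rng = random.Random(seed)
    buffer: list = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # fill the buffer first
            continue
        # Emit a random buffered element and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Drain whatever remains, in random order.
    rng.shuffle(buffer)
    yield from buffer
```

With a buffer at least as large as the split (3883 examples for `gd`), every element is buffered before any is emitted, so the result is a full shuffle. A smaller buffer only mixes nearby documents.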

unshuffled_deduplicated_gu

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_gu')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split   Examples
'train' 169834
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
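Some splits, such as `fr`, hold tens of millions of documents, so corpus statistics are best computed in a single streaming pass rather than by materializing the split. The sketch below computes the document count and the mean `text` length in characters. It works over any iterable of records shaped like the Features spec. The function name `text_length_stats` is hypothetical, and length is measured in Unicode code points.

```python
from typing import Iterable, Mapping


def text_length_stats(records: Iterable[Mapping[str, object]]) -> tuple[int, float]:
    """Return (document count, mean text length in characters) in one pass."""
    count = 0
    mean = 0.0
    for record in records:
        text = record["text"]
        if isinstance(text, bytes):  # strings read via NumPy/TF are usually UTF-8 bytes
            text = text.decode("utf-8")
        count += 1
        # Incremental mean: no need to keep every length in memory.
        mean += (len(text) - mean) / count
    return count, mean


sample = [{"id": 0, "text": "abc"}, {"id": 1, "text": "defgh"}]
print(text_length_stats(sample))  # (2, 4.0)
```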

unshuffled_deduplicated_hsb

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_hsb')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split   Examples
'train' 3084
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_ia

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ia')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split   Examples
'train' 529
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_io

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_io')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split   Examples
'train' 617
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_jbo

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_jbo')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split   Examples
'train' 617
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_km

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_km')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split   Examples
'train' 108346
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
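The `text` field holds documents in many scripts (Khmer, for this config). When exporting records to JSON Lines, `json.dumps(..., ensure_ascii=False)` writes non-Latin text as readable UTF-8 instead of `\uXXXX` escapes. Below is a minimal standard-library sketch. The layout of one `{"id", "text"}` object per line is an assumption, not an OSCAR or TFDS format, and it expects `text` already decoded to `str`.

```python
import io
import json
from typing import Iterable, Mapping, TextIO


def write_jsonl(records: Iterable[Mapping[str, object]], out: TextIO) -> int:
    """Write one JSON object per line; return the number of lines written."""
    n = 0
    for record in records:
        line = json.dumps({"id": int(record["id"]), "text": record["text"]}, ensure_ascii=False)
        out.write(line + "\n")
        n += 1
    return n


def read_jsonl(src: TextIO) -> list[dict]:
    """Parse a JSON Lines stream back into dicts, skipping blank lines."""
    return [json.loads(line) for line in src if line.strip()]


buf = io.StringIO()
write_jsonl([{"id": 7, "text": "សួស្តី"}], buf)
print(buf.getvalue().strip())  # {"id": 7, "text": "សួស្តី"}
```

When writing to a real file, open it with `encoding="utf-8"` so the unescaped text is stored correctly regardless of the platform default.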

unshuffled_deduplicated_ku

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ku')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split   Examples
'train' 29054
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_la

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_la')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split   Examples
'train' 18808
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_lmo

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_lmo')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split   Examples
'train' 1374
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_lv

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_lv')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split   Examples
'train' 843195
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_min

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_min')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split   Examples
'train' 166
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_mr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mr')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split   Examples
'train' 212556
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_mwl

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mwl')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split   Examples
'train' 7
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_nah

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_nah')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split   Examples
'train' 58
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_new

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_new')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 2126
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
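Every config in this listing follows the same naming pattern, `unshuffled_deduplicated_<code>`, loaded through the `huggingface:oscar/` prefix. A small helper can build the load string from a language code. This is a hypothetical convenience function, not part of TFDS, and the two-to-three-letter format check is an assumption based only on the codes that appear here.

```python
import re


def oscar_dataset_name(code: str) -> str:
    """Return the TFDS name of the deduplicated OSCAR config for `code`.

    Hypothetical helper: it only formats the string and does not check
    that such a config actually exists.
    """
    code = code.strip().lower()
    if not re.fullmatch(r"[a-z]{2,3}", code):
        raise ValueError(f"not a 2-3 letter language code: {code!r}")
    return f"huggingface:oscar/unshuffled_deduplicated_{code}"


print(oscar_dataset_name("ko"))  # huggingface:oscar/unshuffled_deduplicated_ko
```

Assuming `tensorflow_datasets` is installed, the result can be passed straight to the loader: `import tensorflow_datasets as tfds` followed by `ds = tfds.load(oscar_dataset_name("ko"))`.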

unshuffled_deduplicated_oc

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_oc')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
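The notice asks for three pieces of information in a takedown request. If you track such requests programmatically, for example in an internal tool, one minimal sketch is a record with one field per required item plus a completeness check. The class and field names are hypothetical; OSCAR does not define any request format.

```python
from dataclasses import dataclass, fields


@dataclass
class TakedownRequest:
    """Hypothetical record mirroring the three items the notice asks for."""
    contact_details: str      # address, telephone number or email address
    copyrighted_work: str     # the work claimed to be infringed
    infringing_material: str  # enough detail to locate the material

    def missing_items(self) -> list:
        """Names of required items that are empty or whitespace-only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


req = TakedownRequest("jane@example.org", "My novel (2015)", "")
print(req.missing_items())  # ['infringing_material']
```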

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 6485
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_pam

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_pam')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 1
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_ps

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ps')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 67921
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
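The description says OSCAR was obtained by language classification and filtering of Common Crawl with the goclassy architecture. The toy sketch below illustrates only the general classify-then-bucket idea: `toy_classify` is a hypothetical stand-in for a real language identifier, and the 0.8 confidence threshold is an arbitrary assumption. It is not goclassy's implementation.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# A classifier maps a line of text to (language_code, confidence in [0, 1]).
Classifier = Callable[[str], Tuple[str, float]]


def bucket_by_language(
    lines: Iterable[str], classify: Classifier, min_confidence: float = 0.8
) -> Dict[str, List[str]]:
    """Group lines into per-language buckets, dropping low-confidence lines."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        lang, confidence = classify(line)
        if confidence >= min_confidence:
            buckets[lang].append(line)
    return dict(buckets)


def toy_classify(line: str) -> Tuple[str, float]:
    """Hypothetical stand-in: a word-list lookup, not a real model."""
    words = set(line.lower().split())
    if words & {"ciao", "grazie"}:
        return "it", 0.95
    if words & {"bonjour", "merci"}:
        return "fr", 0.9
    return "unknown", 0.1


print(bucket_by_language(["Ciao a tutti", "", "merci beaucoup", "???"], toy_classify))
# {'it': ['Ciao a tutti'], 'fr': ['merci beaucoup']}
```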

unshuffled_deduplicated_it

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_it')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 28522082
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
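Each config here exposes only a `train` split. If you need a held-out set, TFDS lets you slice a split with strings such as `split='train[:99%]'`, but the exact boundaries it computes depend on its own rounding rules, so treat the arithmetic below as an approximation. This sketch computes train/validation sizes for a given count, assuming floor rounding and at least one validation example whenever there are two or more examples.

```python
def holdout_sizes(num_examples: int, val_fraction: float) -> tuple:
    """Split a count into (train, validation) sizes.

    Assumptions: validation size is floor(num_examples * val_fraction),
    bumped to 1 when that would be 0 but at least two examples exist.
    """
    if num_examples < 0 or not 0.0 <= val_fraction < 1.0:
        raise ValueError("need num_examples >= 0 and 0 <= val_fraction < 1")
    val = int(num_examples * val_fraction)
    if val == 0 and num_examples >= 2 and val_fraction > 0:
        val = 1
    return num_examples - val, val


# 'train' sizes taken from this listing.
print(holdout_sizes(28522082, 0.01))  # it: (28236862, 285220)
print(holdout_sizes(17, 0.1))         # scn: (16, 1)
print(holdout_sizes(1, 0.1))          # pam: (1, 0)
```

For very small configs such as `pam` (one example), a held-out set is not meaningful at all.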

unshuffled_deduplicated_ka

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ka')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 372158
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
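The config names say `deduplicated`. As an illustration only, here is what exact line-level deduplication can look like: keep the first occurrence of each line and compare lines by a fixed-size hash of their normalized text, so memory grows with the number of distinct lines rather than their length. This listing does not state how OSCAR normalized or deduplicated its text; the whitespace-stripping normalization below is an assumption.

```python
import hashlib
from typing import Iterable, Iterator


def dedup_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line the first time it is seen, preserving order."""
    seen = set()
    for line in lines:
        # Assumed normalization: strip surrounding whitespace only.
        key = hashlib.sha256(line.strip().encode("utf-8")).digest()
        if key in seen:
            continue
        seen.add(key)
        yield line


print(list(dedup_lines(["a", "b", "a ", "c", "b"])))  # ['a', 'b', 'c']
```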

unshuffled_deduplicated_ro

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ro')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 5044757
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_scn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_scn')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 17
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_ko

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ko')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 3675420
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
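These are the `unshuffled` variants, so examples arrive in corpus order. A common remedy is a bounded shuffle buffer, which is what `tf.data.Dataset.shuffle(buffer_size)` provides in TensorFlow. The pure-Python sketch below shows the idea by analogy; it is not TensorFlow's implementation, and a buffer much smaller than the dataset only shuffles locally.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Approximately shuffle a stream using a fixed-size buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be >= 1")
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # fill the buffer first
            continue
        # Emit a random buffered element and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Drain whatever is left, in random order.
    rng.shuffle(buffer)
    yield from buffer


out = list(buffered_shuffle(range(10), buffer_size=4))
```

With `buffer_size=1` the output keeps the input order, while a buffer at least as large as the stream gives a full random permutation; memory use grows with the buffer, which is the trade-off for large configs such as `ko`.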

unshuffled_deduplicated_kw

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_kw')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 68
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_lez

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_lez')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 1381
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_lrc

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_lrc')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 72
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_mg

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mg')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 13343
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_ml

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ml')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 453904
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_ms

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ms')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 183443
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_myv

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_myv')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 5
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_nds

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_nds')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 8714
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_nn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_nn')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 109118
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_os

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_os')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 2559
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_pms

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_pms')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply to legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   2859
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
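The configurations differ only in the language-code suffix of the TFDS name. The sketch below assumes `tensorflow-datasets` is installed (and, presumably, the Hugging Face `datasets` package that the `huggingface:` namespace relies on). It builds that name and prints the first few records of the `train` split. `oscar_config_name` and `preview` are hypothetical helpers, and the lowercase-letters check is an assumption about the codes listed here, not a documented rule. TFDS is imported inside `preview` so the name builder works without it.

```python
PREFIX = "huggingface:oscar/unshuffled_deduplicated_"


def oscar_config_name(lang_code: str) -> str:
    """Return the TFDS name for a deduplicated OSCAR config, e.g. 'pms'."""
    # Assumption: every code in this catalog is lowercase ASCII letters.
    if not (lang_code.isascii() and lang_code.isalpha() and lang_code.islower()):
        raise ValueError(f"unexpected language code: {lang_code!r}")
    return PREFIX + lang_code


def preview(lang_code: str, n: int = 3) -> None:
    """Print id and the start of text for the first n 'train' records."""
    import tensorflow_datasets as tfds  # lazy import; downloads data on first use

    ds = tfds.load(oscar_config_name(lang_code), split="train")
    for example in tfds.as_numpy(ds.take(n)):
        # 'text' is a string feature, so NumPy conversion yields UTF-8 bytes.
        print(int(example["id"]), example["text"].decode("utf-8")[:80])
```

For example, `preview("pms")` would fetch and prepare the 2,859-example `pms` configuration listed above before printing three records.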

unshuffled_deduplicated_qu

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_qu')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   411
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_sa

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sa')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   7121
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_sk

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sk')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   2820821
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_sh

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sh')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   17610
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_so

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_so')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   42
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_sr

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sr')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   645747
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_ta

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ta')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   833101
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_tk

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_tk')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   4694
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_tyv

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_tyv')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   24
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_uz

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_uz')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   15074
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_wa

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_wa')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   677
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_xmf

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_xmf')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   2418
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_sv

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sv')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   11014487
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_tg

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_tg')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   56259
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_de

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_de')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   62398034
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_tr

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_tr')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   11596446
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_el

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_el')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   6521169
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_uk

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_uk')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   7782375
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_vi

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_vi')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   9897709
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_wuu

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_wuu')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 64
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_yo

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_yo')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 49
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
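Each example has an `int64` feature `id` and a `string` feature `text`. When a TFDS dataset is converted to NumPy (for example with `tfds.as_numpy(ds['train'])`), string features arrive as `bytes`, so the text usually needs decoding before further processing. The sketch below assumes examples shaped like that. The `OscarRecord` class and the `to_record` helper are hypothetical names. Because the helper works on plain dicts, you can try it without downloading anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OscarRecord:
    """Plain-Python view of one example: the 'id' and 'text' features."""
    id: int
    text: str


def to_record(example: dict) -> OscarRecord:
    """Convert one example dict into an OscarRecord.

    Assumes 'text' is either bytes (as produced when a string feature is
    converted to NumPy) or already a str. Bytes are decoded as UTF-8, and
    undecodable sequences become U+FFFD instead of raising an error.
    """
    text = example["text"]
    if isinstance(text, bytes):
        text = text.decode("utf-8", errors="replace")
    # int() also accepts NumPy integer scalars such as numpy.int64.
    return OscarRecord(id=int(example["id"]), text=text)
```

For instance, `to_record({"id": 0, "text": "Ẹ kú àárọ̀".encode("utf-8")})` yields a record whose `text` is the decoded Yoruba string. That sample is made up for illustration and does not come from the corpus.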

unshuffled_original_als

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_als')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 7324
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_arz

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_arz')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 158113
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_az

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_az')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 912330
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_bcl

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_bcl')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 1
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_bn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_bn')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 1675515
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
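Every entry repeats the same features block: an `int64` field `id` and a `string` field `text`, both of `_type` `Value`. If you process OSCAR text outside TFDS (for example, after exporting it), a small check against that schema can catch records that have lost a field or changed type. The sketch below is illustrative only. Its mapping from dtype names to Python types is a simplification chosen for this example, not the validation logic of TFDS or Hugging Face `datasets`.

```python
import json

# Same content as the features block above, condensed onto fewer lines.
# json.loads turns the JSON null values into Python None.
FEATURES_JSON = """
{
    "id": {"dtype": "int64", "id": null, "_type": "Value"},
    "text": {"dtype": "string", "id": null, "_type": "Value"}
}
"""

INT64_MIN, INT64_MAX = -(2 ** 63), 2 ** 63 - 1


def check_value(dtype: str, value) -> bool:
    """Simplified, assumed mapping from a dtype name to acceptable Python values."""
    if dtype == "int64":
        # bool is a subclass of int in Python, so exclude it explicitly;
        # Python ints are unbounded, so also check the int64 range.
        return (isinstance(value, int) and not isinstance(value, bool)
                and INT64_MIN <= value <= INT64_MAX)
    if dtype == "string":
        return isinstance(value, (str, bytes))
    raise ValueError(f"dtype not handled by this sketch: {dtype}")


def validate(example: dict, features: dict) -> list:
    """Return a list of problems; an empty list means the example matches."""
    problems = []
    for name, spec in features.items():
        if name not in example:
            problems.append(f"missing feature {name!r}")
        elif not check_value(spec["dtype"], example[name]):
            problems.append(f"{name!r} is not a valid {spec['dtype']}")
    for name in example.keys() - features.keys():
        problems.append(f"unexpected feature {name!r}")
    return problems


features = json.loads(FEATURES_JSON)
```

For example, `validate({"id": 1, "text": "hi"}, features)` returns an empty list, while a record whose `id` is the string `"1"` is reported as not a valid `int64`.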

unshuffled_original_bs

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_bs')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 2143
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_ce

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_ce')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 4042
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_cv

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_cv')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 20281
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_diq

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_diq')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 1
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_eml

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_eml')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 84
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_et

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_et')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 2093621
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
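This page lists both `unshuffled_original_*` and `unshuffled_deduplicated_*` configurations. As a rough illustration of what a deduplicated variant removes, the sketch below drops exact repeated lines across a stream of documents and keeps the first occurrence of each line. This is a simplified assumption about the idea, not the procedure OSCAR actually used; the OSCAR documentation describes the real pipeline. `dedupe_lines` is a hypothetical helper name.

```python
from typing import Iterable, Iterator


def dedupe_lines(documents: Iterable[str]) -> Iterator[str]:
    """Yield each document with exact duplicate lines removed.

    A line is dropped if the same line (after stripping surrounding
    whitespace) was already seen in this or any earlier document.
    Blank lines are dropped too, and documents left with no lines
    are skipped entirely.
    """
    seen = set()
    for doc in documents:
        kept = []
        for line in doc.splitlines():
            key = line.strip()
            if not key or key in seen:
                continue
            seen.add(key)
            kept.append(line)
        if kept:
            yield "\n".join(kept)
```

The helper is a generator and keeps only the set of lines seen so far. On corpora the size of the larger configurations, that set can itself grow large, which is why real pipelines often store hashes instead of full lines.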

unshuffled_deduplicated_zh

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_zh')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 41708901
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
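The `train` split sizes in the entries above range from a single example (`unshuffled_original_bcl`, `unshuffled_original_diq`) to 41,708,901 (`unshuffled_deduplicated_zh`). When you test a pipeline end to end, it can help to start with the smallest configurations. The sketch below copies the counts listed so far into a dictionary and selects the configurations at or below a size threshold. The threshold of 100 in the usage note is an arbitrary example value, and `smallest_configs` is a hypothetical helper.

```python
# 'train' split sizes copied from the entries above.
TRAIN_EXAMPLES = {
    "unshuffled_deduplicated_vi": 9897709,
    "unshuffled_deduplicated_wuu": 64,
    "unshuffled_deduplicated_yo": 49,
    "unshuffled_original_als": 7324,
    "unshuffled_original_arz": 158113,
    "unshuffled_original_az": 912330,
    "unshuffled_original_bcl": 1,
    "unshuffled_original_bn": 1675515,
    "unshuffled_original_bs": 2143,
    "unshuffled_original_ce": 4042,
    "unshuffled_original_cv": 20281,
    "unshuffled_original_diq": 1,
    "unshuffled_original_eml": 84,
    "unshuffled_original_et": 2093621,
    "unshuffled_deduplicated_zh": 41708901,
}


def smallest_configs(counts: dict, max_examples: int) -> list:
    """Return config names with at most max_examples, smallest first.

    Ties in size are broken alphabetically by config name.
    """
    small = [(n, name) for name, n in counts.items() if n <= max_examples]
    return [name for n, name in sorted(small)]
```

`smallest_configs(TRAIN_EXAMPLES, 100)` returns `bcl` and `diq` (one example each), followed by `yo`, `wuu`, and `eml`. To load one of the results, prefix its name with `huggingface:oscar/` and pass it to `tfds.load`.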

unshuffled_original_an

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_an')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 2449
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_ast

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_ast')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 6999
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_ba

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_ba')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 42551
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
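Each configuration in this section exposes only a `train` split. If you need a validation set, one common approach is to carve it out deterministically from the `id` feature, so that the same examples land in validation on every run. The sketch assumes that approach and also assumes `id` values are unique within a configuration. The helper name `is_validation`, the 1% default fraction, and the salt string are illustrative choices, not part of OSCAR or TFDS. TFDS split slicing (for example `split='train[:1%]'`) is an alternative, but it selects a contiguous range of examples rather than a hashed sample.

```python
import hashlib


def is_validation(example_id: int, fraction: float = 0.01,
                  salt: str = "oscar-holdout") -> bool:
    """Deterministically assign roughly `fraction` of ids to validation.

    The salted id is hashed with SHA-256, which gives the same result in
    every process (unlike Python's built-in hash() for strings, which is
    randomized per process). The hash is then mapped to a float in [0, 1).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    digest = hashlib.sha256(f"{salt}:{example_id}".encode("utf-8")).digest()
    # Keep the top 53 bits so the division is exact in a float and the
    # result is strictly less than 1.0.
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2 ** 53
    return bucket < fraction
```

The predicate does not depend on TFDS. You could apply it to NumPy-converted examples, sending those with `is_validation(int(ex["id"]))` to validation and the rest to training. Changing the salt produces a different but equally stable split.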

unshuffled_original_bg

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_bg')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 5869686
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_bpy

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_bpy')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 6046
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_ca

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_ca')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply to legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 4390754
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
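Every entry in this catalog uses the same naming pattern, `huggingface:oscar/unshuffled_<variant>_<language code>`, where the variant is either `deduplicated` or `original`. A small helper can build and sanity-check that string before it is passed to `tfds.load`. This is a sketch inferred from the names listed here; `oscar_tfds_name` is a hypothetical helper, not part of TFDS, and it does not check whether a given language code actually exists.

```python
VARIANTS = ("deduplicated", "original")


def oscar_tfds_name(variant: str, lang: str) -> str:
    """Build a TFDS name such as 'huggingface:oscar/unshuffled_original_ca'."""
    if variant not in VARIANTS:
        raise ValueError(f"variant must be one of {VARIANTS}, got {variant!r}")
    # Codes in this catalog (ca, ckb, gom, krc, ...) are short lowercase ASCII letters.
    if not (lang.isascii() and lang.isalpha() and lang.islower()):
        raise ValueError(f"expected a lowercase language code, got {lang!r}")
    return f"huggingface:oscar/unshuffled_{variant}_{lang}"


print(oscar_tfds_name("original", "ca"))
# huggingface:oscar/unshuffled_original_ca
```

The result would then be used as `ds = tfds.load(oscar_tfds_name("original", "ca"))`.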

unshuffled_original_ckb

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_ckb')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 103639
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_es

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_es')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 56326016
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
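Split sizes are useful for planning a training pass. For example, `unshuffled_deduplicated_es` has 56,326,016 `'train'` examples. With a batch size of 1,024 (chosen only for illustration), one epoch needs ⌈56,326,016 / 1,024⌉ = 55,006 batches, and the last batch holds 56,326,016 mod 1,024 = 896 examples. The `drop_remainder` flag below mirrors the argument of the same name on `tf.data.Dataset.batch`. The helper itself is a plain-arithmetic sketch, not a TFDS function.

```python
def batches_per_epoch(num_examples: int, batch_size: int, drop_remainder: bool = False) -> int:
    """Number of batches in one pass over num_examples records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    full, rest = divmod(num_examples, batch_size)
    # Keeping the partial final batch adds one more batch when there is a remainder.
    return full if (drop_remainder or rest == 0) else full + 1


N_ES = 56_326_016  # 'train' examples for unshuffled_deduplicated_es (from the table above)
print(batches_per_epoch(N_ES, 1024))                       # 55006
print(batches_per_epoch(N_ES, 1024, drop_remainder=True))  # 55005
print(N_ES % 1024)                                         # 896
```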

unshuffled_original_da

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_da')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 7664010
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_dv

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_dv')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 21018
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_eo

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_eo')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 121168
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_fi

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_fi')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 5326443
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
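These OSCAR configs expose only a `'train'` split. If you need a validation set, one option is a deterministic hold-out keyed on the integer `id` feature, so the same records land in the same subset on every run. This is our own suggestion and the dataset does not provide it. The sketch hashes each id with SHA-256 instead of using `id % 100` directly, on the assumption that ids may follow crawl order and a plain modulus could correlate with it. `in_holdout` and the synthetic records are hypothetical.

```python
import hashlib


def in_holdout(example_id: int, holdout_percent: int = 1) -> bool:
    """Deterministically assign roughly holdout_percent % of ids to the hold-out set."""
    digest = hashlib.sha256(str(example_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in [0, 100)
    return bucket < holdout_percent


# Synthetic records shaped like the feature spec ({"id": int64, "text": string}).
records = [{"id": i, "text": f"document {i}"} for i in range(10_000)]
val = [r for r in records if in_holdout(r["id"], 5)]
train = [r for r in records if not in_holdout(r["id"], 5)]
print(len(train) + len(val) == len(records))  # True
```

The same predicate can be applied per record while iterating over the real split. The hold-out fraction is only approximately 5% because it depends on the hash values.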

unshuffled_deduplicated_ga

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ga')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 46493
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_gom

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_gom')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 484
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_hr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_hr')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 321484
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_hy

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_hy')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 396093
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_ilo

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ilo')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 1578
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_fa

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_fa')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 13704702
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_fy

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_fy')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 33053
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_gn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_gn')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 106
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_hi

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_hi')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 3264660
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
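Each record contains only an `id` and a `text` string, so basic corpus statistics need no special tooling. The sketch below computes the document count, the total number of characters, and the mean characters per document over any iterable of records shaped like the feature spec. The sample records are made up for illustration, and `text_stats` is a hypothetical helper. Python's `len` counts Unicode code points, not user-perceived characters, and the difference matters for scripts such as Devanagari.

```python
from typing import Iterable, Mapping


def text_stats(records: Iterable[Mapping]) -> dict:
    """Count documents and Unicode code points in each record's 'text' field."""
    n_docs = 0
    n_chars = 0
    for rec in records:
        n_docs += 1
        n_chars += len(rec["text"])  # code points, not grapheme clusters
    mean = n_chars / n_docs if n_docs else 0.0
    return {"docs": n_docs, "chars": n_chars, "mean_chars": mean}


sample = [{"id": 0, "text": "abc"}, {"id": 1, "text": "defgh"}]  # illustrative only
print(text_stats(sample))  # {'docs': 2, 'chars': 8, 'mean_chars': 4.0}
```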

unshuffled_original_hu

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_hu')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 11197780
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_ie

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_ie')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 101
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_ja

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ja')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 39496439
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
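For very large configs such as this one (39,496,439 examples), it is worth inspecting a few records before starting a full pass. Called without `split=`, `tfds.load` returns a dict keyed by split name, so the `tf.data` form of this check would be `ds['train'].take(n)`. The plain-Python sketch below shows the same lazy idea with `itertools.islice` over a generator, so only `n` records are ever produced. The `fake_records` generator is a hypothetical stand-in for the real split.

```python
import itertools
from typing import Dict, Iterator, List


def fake_records(total: int) -> Iterator[Dict]:
    """Hypothetical stand-in for a large split: yields records lazily."""
    for i in range(total):
        yield {"id": i, "text": f"document {i}"}


def peek(records: Iterator[Dict], n: int = 3) -> List[Dict]:
    """Materialize only the first n records; the rest are never generated."""
    return list(itertools.islice(records, n))


first = peek(fake_records(39_496_439), n=3)
print([r["id"] for r in first])  # [0, 1, 2]
```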

unshuffled_deduplicated_kk

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_kk')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • সংস্করণ : 1.0.0

  • বিভাজন :

বিভক্ত উদাহরণ
'train' 338073
  • বৈশিষ্ট্য :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
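Every entry in this catalog uses the same naming pattern, `huggingface:oscar/unshuffled_deduplicated_<code>`. A minimal sketch of a hypothetical helper that builds that name from a language code and rejects codes not listed in this part of the catalog; `DOCUMENTED_CODES` is an assumption copied by hand from the headings here and is deliberately incomplete.

```python
# Language codes of the configs documented in this part of the catalog only;
# OSCAR has many more configs, so this set is deliberately incomplete.
DOCUMENTED_CODES = frozenset({
    "kk", "krc", "ky", "li", "lt", "mhr", "mn", "mt", "mzn", "ne", "no",
    "pa", "pnb", "rm", "sah", "si", "sq", "sw", "th", "tt", "ur", "vo",
})


def oscar_tfds_name(lang: str, known: frozenset = DOCUMENTED_CODES) -> str:
    """Build the name to pass to tfds.load() for a deduplicated OSCAR config."""
    code = lang.strip().lower()
    if code not in known:
        raise ValueError(f"no unshuffled_deduplicated config listed for {lang!r}")
    return f"huggingface:oscar/unshuffled_deduplicated_{code}"


print(oscar_tfds_name("kk"))  # huggingface:oscar/unshuffled_deduplicated_kk
```

With this helper, `ds = tfds.load(oscar_tfds_name("kk"))` is equivalent to the command shown for this config. Checking against an explicit set makes a typo fail immediately, before `tfds.load` tries to resolve the name.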

unshuffled_deduplicated_krc

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_krc')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  1377
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_ky

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ky')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  86561
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_li

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_li')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  118
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_lt

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_lt')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  1737411
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
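For a large config like this one, TFDS's split-slicing syntax (for example `split='train[:1%]'`) limits how many examples are read during quick experiments. The sketch below is a hypothetical helper, `percent_slice`, that turns a rough target number of examples into the smallest whole-percent slice expected to cover it, given the example count from the table above. TFDS applies its own rounding to percent boundaries, so the number of examples actually returned may differ slightly from the estimate.

```python
def percent_slice(target: int, num_examples: int) -> str:
    """Smallest whole-percent 'train' slice expected to hold `target` examples."""
    if target < 1 or num_examples < 1:
        raise ValueError("target and num_examples must be positive")
    # Ceiling division in integers avoids floating-point rounding surprises.
    pct = -(-100 * target // num_examples)
    # TFDS percent slices run from 1% to 100%, so clamp to that range.
    pct = max(1, min(100, pct))
    return f"train[:{pct}%]"


# unshuffled_deduplicated_lt lists 1737411 training examples.
print(percent_slice(20_000, 1_737_411))  # train[:2%]
```

Usage, under the same assumptions: `ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_lt', split=percent_slice(20_000, 1_737_411))`.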

unshuffled_deduplicated_mhr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mhr')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  2515
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_mn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mn')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  197878
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_mt

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mt')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  16383
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_mzn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mzn')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  917
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_ne

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ne')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  219334
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_no

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_no')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  3229940
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_pa

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_pa')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  87235
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_pnb

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_pnb')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  3463
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_rm

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_rm')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  34
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
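Percent slices are coarse for very small configs: this one lists only 34 training examples, so 1% of the split is less than one example. An absolute slice such as `train[:20]` is a more predictable choice here. The sketch below is a hypothetical helper, `absolute_slice`, that caps the request at the documented example count, so a script written for a large config does not ask a tiny one for more examples than the table lists.

```python
def absolute_slice(requested: int, num_examples: int) -> str:
    """A 'train[:n]' split string with n capped at the documented example count."""
    if requested < 1 or num_examples < 1:
        raise ValueError("requested and num_examples must be positive")
    return f"train[:{min(requested, num_examples)}]"


# unshuffled_deduplicated_rm lists 34 training examples.
print(absolute_slice(100, 34))  # train[:34]
```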

unshuffled_deduplicated_sah

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sah')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  8555
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_si

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_si')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  120684
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_sq

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sq')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  461598
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_sw

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sw')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  24803
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_th

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_th')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  3749826
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
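When a split is iterated with `tfds.as_numpy`, the `string` feature arrives as raw UTF-8 `bytes` and the `int64` feature as a NumPy integer, so text in non-Latin scripts such as Thai needs an explicit decode before use. A minimal sketch, assuming each record is a dict with the `id` and `text` keys listed above; `decode_records` is a hypothetical helper, and the `sample` records only imitate what `tfds.as_numpy` yields. Skipping whitespace-only texts is an optional choice made here, not something the catalog prescribes.

```python
from typing import Iterable, Iterator


def decode_records(records: Iterable[dict]) -> Iterator[tuple[int, str]]:
    """Yield (id, text) pairs, decoding UTF-8 bytes and skipping blank texts."""
    for rec in records:
        text = rec["text"]
        if isinstance(text, bytes):
            # errors="replace" keeps iteration going if a crawled page has bad bytes.
            text = text.decode("utf-8", errors="replace")
        text = text.strip()
        if text:
            # int() accepts both Python ints and NumPy integer scalars.
            yield int(rec["id"]), text


sample = [
    {"id": 0, "text": "สวัสดี\n".encode("utf-8")},
    {"id": 1, "text": b"   "},
    {"id": 2, "text": b"\xffbroken"},
]
print(list(decode_records(sample)))
```

With a real split, the same helper would be applied as `decode_records(tfds.as_numpy(tfds.load('huggingface:oscar/unshuffled_deduplicated_th', split='train')))`.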

unshuffled_deduplicated_tt

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_tt')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  82738
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_ur

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ur')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 428674
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
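The features block describes each example as an `int64` `id` plus a string `text`. A minimal sketch, assuming records arrive as plain Python dicts (for instance after `tfds.as_numpy`, where strings come back as `bytes`), that checks a record against that spec. `check_record` and the sample record are hypothetical.

```python
import json

# Feature spec copied from the "Features" block above.
FEATURES_JSON = """
{
    "id": {"dtype": "int64", "id": null, "_type": "Value"},
    "text": {"dtype": "string", "id": null, "_type": "Value"}
}
"""

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def check_record(record: dict, features: dict) -> list:
    """Return a list of problems; an empty list means the record matches the spec."""
    problems = []
    for name, spec in features.items():
        if name not in record:
            problems.append(f"missing field {name!r}")
            continue
        value = record[name]
        if spec["dtype"] == "int64":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not isinstance(value, int) or isinstance(value, bool):
                problems.append(f"{name!r} should be an integer, got {type(value).__name__}")
            elif not INT64_MIN <= value <= INT64_MAX:
                problems.append(f"{name!r} is outside the int64 range")
        elif spec["dtype"] == "string":
            # Accept str, or UTF-8 bytes as produced by numpy-style iteration.
            if not isinstance(value, (str, bytes)):
                problems.append(f"{name!r} should be text, got {type(value).__name__}")
    for name in record:
        if name not in features:
            problems.append(f"unexpected field {name!r}")
    return problems


FEATURES = json.loads(FEATURES_JSON)

if __name__ == "__main__":
    print(check_record({"id": 0, "text": "sample"}, FEATURES))  # prints []
```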

unshuffled_deduplicated_vo

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_vo')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 3317
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_xal

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_xal')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 36
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
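TFDS accepts slice syntax in the `split` argument, for example `split='train[:100]'`. With a config as small as this one (36 training examples), it is easy to ask for more examples than exist. A hypothetical helper `head_split` that caps the slice at the size listed in the split table; whether TFDS itself truncates or rejects an over-long slice is not stated on this page, so the cap is a defensive assumption.

```python
# Train-split sizes copied from the split tables in this section (a small subset).
TRAIN_EXAMPLES = {
    "unshuffled_deduplicated_tt": 82738,
    "unshuffled_deduplicated_xal": 36,
    "unshuffled_deduplicated_yue": 7,
}


def head_split(config: str, n: int) -> str:
    """Build a split string for the first n train examples, capped at the listed size."""
    if n <= 0:
        raise ValueError("n must be positive")
    total = TRAIN_EXAMPLES[config]
    return f"train[:{min(n, total)}]"


if __name__ == "__main__":
    print(head_split("unshuffled_deduplicated_xal", 100))
```

The result would be passed as `split=` alongside the config name in `tfds.load`.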

unshuffled_deduplicated_yue

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_yue')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 7
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_am

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_am')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 83663
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_as

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_as')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 14985
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_azb

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_azb')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 15446
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
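When a dataset is iterated with `tfds.as_numpy`, string features come back as `bytes` rather than `str`. A minimal sketch, assuming a record shaped like the features above, that converts it to plain Python types and decodes `text` as UTF-8. `to_python_record` is a hypothetical name, and `errors="replace"` is a choice made here so that one invalid byte sequence does not abort a long pass over the data.

```python
def to_python_record(record: dict) -> dict:
    """Convert a numpy-style OSCAR record into plain Python types."""
    text = record["text"]
    if isinstance(text, bytes):
        # Substitute U+FFFD for undecodable bytes instead of raising (an assumption).
        text = text.decode("utf-8", errors="replace")
    # int() also accepts numpy integer scalars such as numpy.int64.
    return {"id": int(record["id"]), "text": text}


if __name__ == "__main__":
    print(to_python_record({"id": 5, "text": "سلام".encode("utf-8")}))
```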

unshuffled_original_be

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_be')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 586031
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_bo

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_bo')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 26795
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
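Many configs here use non-Latin scripts (Tibetan for `bo`, for instance), where UTF-8 needs several bytes per character, so the example count alone says little about storage or sequence-length budgets. A small sketch that sums character and UTF-8 byte counts over an iterable of texts; `corpus_lengths` is a hypothetical helper, and it assumes the texts have already been decoded to `str`.

```python
from typing import Iterable, Tuple


def corpus_lengths(texts: Iterable[str]) -> Tuple[int, int]:
    """Sum character counts and UTF-8 byte counts over an iterable of strings."""
    chars = 0
    nbytes = 0
    for text in texts:
        chars += len(text)
        nbytes += len(text.encode("utf-8"))
    return chars, nbytes


if __name__ == "__main__":
    # U+0F40 (Tibetan letter KA) is one character but three bytes in UTF-8.
    print(corpus_lengths(["ab", "\u0f40"]))
```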

unshuffled_original_bxr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_bxr')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 42
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_ceb

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_ceb')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 56248
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_cy

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_cy')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 157698
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_dsb

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_dsb')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 65
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_fr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_fr')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 96742378
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
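The French config lists 96,742,378 training examples, so collecting it into an in-memory list is impractical. With `tf.data` the usual way to look at a prefix is `ds.take(n)`; the pure-Python sketch below shows the same idea with `itertools.islice` over any iterable of records. `preview_texts` is a hypothetical helper, and the endless `fake_records` generator is a stand-in for the real dataset, not TFDS output.

```python
import itertools
from typing import Iterable, Iterator, List


def preview_texts(records: Iterable[dict], n: int, max_chars: int = 80) -> List[str]:
    """Return up to n truncated texts, consuming at most n records."""
    head = itertools.islice(records, n)
    return [record["text"][:max_chars] for record in head]


def fake_records() -> Iterator[dict]:
    # Stand-in for a very large dataset: an endless stream of records.
    for i in itertools.count():
        yield {"id": i, "text": f"document {i}"}


if __name__ == "__main__":
    print(preview_texts(fake_records(), 3))  # prints ['document 0', 'document 1', 'document 2']
```

Because `islice` stops after `n` items, the remaining records are never read, which is what keeps this usable on a stream of this size.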

unshuffled_original_gd

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_gd')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 5799
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_gu

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_gu')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 240691
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_hsb

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_hsb')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 7959
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_ia

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_ia')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 1040
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_io

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_io')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 694
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_jbo

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_jbo')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   832
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
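Because every config in this section shares the same two-field schema, a record can be sanity-checked before use. The sketch below is a hypothetical helper, not a TFDS API; it assumes records are dicts whose `id` is a Python or NumPy integer and whose `text` is `str` or UTF-8 `bytes`:

```python
import numbers

# Python types accepted for each dtype name used in the Features spec.
# Assumptions: int64 values may be Python ints or NumPy integer scalars
# (both count as numbers.Integral); string values may be str or UTF-8 bytes.
DTYPE_TO_PYTHON = {
    "int64": (numbers.Integral,),
    "string": (str, bytes),
}

# The schema shared by every config in this section.
SCHEMA = {"id": "int64", "text": "string"}


def validate_record(record: dict, schema: dict = SCHEMA) -> list:
    """Return a list of problems; an empty list means the record matches."""
    problems = []
    for name, dtype in schema.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], DTYPE_TO_PYTHON[dtype]):
            actual = type(record[name]).__name__
            problems.append(f"{name}: expected {dtype}, got {actual}")
    # Report extra fields in a stable (sorted) order.
    for name in sorted(set(record) - set(schema)):
        problems.append(f"unexpected field: {name}")
    return problems


print(validate_record({"id": 0, "text": "coi"}))  # []
print(validate_record({"id": "0"}))  # ['id: expected int64, got str', 'missing field: text']
```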

unshuffled_original_km

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_km')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   159363
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
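When a loaded split is iterated with `tfds.as_numpy`, string features typically arrive as `bytes`, so `text` needs decoding before string processing. A minimal sketch with a hypothetical helper `decode_example`, applied here to a hand-built record rather than real OSCAR data:

```python
def decode_example(example: dict) -> dict:
    """Convert a NumPy-style record into plain Python values.

    Assumes `text` is UTF-8 bytes (as tfds.as_numpy yields for string
    features) or already a str; `id` goes through int() so that NumPy
    integer scalars become Python ints.
    """
    text = example["text"]
    if isinstance(text, bytes):
        # errors="replace" keeps iteration going if a crawled line is malformed.
        text = text.decode("utf-8", errors="replace")
    return {"id": int(example["id"]), "text": text}


record = {"id": 7, "text": "ភាសាខ្មែរ".encode("utf-8")}
print(decode_example(record))
```

With real data this would be applied inside a loop such as `for ex in tfds.as_numpy(ds['train']): ...`, since `tfds.load` without a `split` argument returns a dict keyed by split name.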

unshuffled_original_ku

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_ku')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   46535
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
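Every config name in this catalog follows the pattern `huggingface:oscar/unshuffled_<variant>_<lang>`, where `<variant>` is either `original` or `deduplicated`. A hypothetical helper that builds the name string passed to `tfds.load`; the check that language codes are purely alphabetic is an assumption based on the names shown here:

```python
VARIANTS = ("original", "deduplicated")


def oscar_config(lang: str, variant: str = "original") -> str:
    """Build a TFDS name such as 'huggingface:oscar/unshuffled_original_ku'.

    `lang` is a language code as used in the config names (e.g. 'ku', 'la',
    'lmo'); only the two variants seen in this catalog are accepted.
    """
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    # Assumption: codes in these config names contain letters only.
    if not lang or not lang.isalpha():
        raise ValueError(f"invalid language code {lang!r}")
    return f"huggingface:oscar/unshuffled_{variant}_{lang}"


print(oscar_config("ku"))  # huggingface:oscar/unshuffled_original_ku
```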

unshuffled_original_la

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_la')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   94588
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
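Each config lists only a `train` split. If a held-out set is needed, TFDS's percent-slicing syntax (for example `'train[:90%]'`) can carve one out of `train`. A sketch that builds the two slice strings; the helper name and the 90/10 default are illustrative choices, and restricting to whole percentages is a conservative assumption:

```python
def train_val_splits(val_percent: int = 10) -> tuple:
    """Return (train_split, val_split) strings in TFDS percent-slicing syntax.

    Both slices come from the single 'train' split and do not overlap.
    """
    if not isinstance(val_percent, int) or not 0 < val_percent < 100:
        raise ValueError("val_percent must be an integer between 1 and 99")
    cut = 100 - val_percent
    return f"train[:{cut}%]", f"train[{cut}%:]"


train_split, val_split = train_val_splits()
print(train_split, val_split)  # train[:90%] train[90%:]
```

The two strings can then be passed as `split=[train_split, val_split]` to `tfds.load`, which returns one dataset per requested split.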

unshuffled_original_lmo

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_lmo')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   1401
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
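Train-split sizes in this section vary enormously, from 3 examples (`pam`) to 9,387,265 (`ro`). The sketch below copies some of the counts listed here and selects configs above a minimum size; the threshold of 1,000 is an arbitrary illustration, not an OSCAR recommendation:

```python
# 'train' example counts copied from the split tables in this section.
TRAIN_EXAMPLES = {
    "unshuffled_original_jbo": 832,
    "unshuffled_original_km": 159363,
    "unshuffled_original_ku": 46535,
    "unshuffled_original_la": 94588,
    "unshuffled_original_lmo": 1401,
    "unshuffled_original_lv": 1593820,
    "unshuffled_original_min": 220,
    "unshuffled_original_mr": 326804,
    "unshuffled_original_mwl": 8,
    "unshuffled_original_nah": 61,
}


def configs_with_at_least(min_examples: int, counts: dict = TRAIN_EXAMPLES) -> list:
    """Return config names whose train split has >= min_examples, largest first."""
    selected = [name for name, n in counts.items() if n >= min_examples]
    return sorted(selected, key=lambda name: counts[name], reverse=True)


print(configs_with_at_least(1000))
```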

unshuffled_original_lv

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_lv')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   1593820
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_min

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_min')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   220
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_mr

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_mr')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   326804
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_mwl

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_mwl')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   8
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_nah

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_nah')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   61
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_new

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_new')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   4696
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_oc

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_oc')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   10709
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_pam

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_pam')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   3
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_ps

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_ps')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   98216
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_ro

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_ro')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   9387265
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
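A config as large as `ro` (9,387,265 training examples) is impractical to shuffle fully in memory. TensorFlow's `tf.data.Dataset.shuffle(buffer_size)` randomizes order using a fixed-size buffer instead. The pure-Python function below is a simplified sketch of that buffered-shuffle idea, not TensorFlow's actual implementation:

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Yield every item exactly once in a partially randomized order.

    Memory use is bounded by buffer_size; a larger buffer mixes more thoroughly.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be >= 1")
    rng = random.Random(seed)
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            # Move a randomly chosen element to the end, then emit it.
            idx = rng.randrange(buffer_size)
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Drain the remaining items once the input is exhausted.
    rng.shuffle(buffer)
    yield from buffer


print(list(buffered_shuffle(range(10), buffer_size=4, seed=42)))
```

With TensorFlow itself, the corresponding step would be `ds['train'].shuffle(buffer_size)` on the dataset returned by `tfds.load`.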

unshuffled_original_scn

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_scn')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   21
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_sk

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_sk')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   5492194
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_sr

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_sr')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   1013619
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_ta

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_ta')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   1263280
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
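For large configs such as `ta` (1,263,280 training examples), text is usually processed in fixed-size batches rather than one record at a time. `tf.data.Dataset.batch` does this inside TensorFlow; the hypothetical helper below shows the same grouping for any Python iterable of records, such as the output of `tfds.as_numpy`:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to batch_size items; the final batch may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    iterator = iter(items)
    while True:
        # islice consumes at most batch_size items from the shared iterator.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


texts = [f"line {i}" for i in range(5)]
print([len(b) for b in batched(texts, 2)])  # [2, 2, 1]
```

On Python 3.12 and later, `itertools.batched` provides similar behavior but yields tuples instead of lists.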

unshuffled_original_tk

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_tk')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split     Examples
'train'   6456
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_tyv

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_tyv')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split      Examples
'train'    34
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
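Each entry lists only a `'train'` split, so a held-out set has to be carved out of it. TFDS accepts percent-slicing split strings such as `'train[:90%]'`. The sketch below only builds those strings and gives a rough estimate of how many of the listed examples fall on each side. The helper names and the floor-based estimate are assumptions: TFDS applies its own rounding at percent boundaries, so real counts may be off by one, which matters for a config as small as this 34-example one.

```python
def holdout_splits(validation_percent: int) -> tuple:
    """Build TFDS split strings that carve a validation slice off 'train'."""
    if not 0 < validation_percent < 100:
        raise ValueError("validation_percent must be between 1 and 99")
    cut = 100 - validation_percent
    return f"train[:{cut}%]", f"train[{cut}%:]"


def approx_sizes(total: int, validation_percent: int) -> tuple:
    """Rough (train, validation) sizes; TFDS rounding may shift these slightly."""
    validation = total * validation_percent // 100
    return total - validation, validation


if __name__ == "__main__":
    train_split, val_split = holdout_splits(10)
    print(train_split, val_split)
    print(approx_sizes(34, 10))
```

With TFDS installed, both strings can be passed together as `split=[train_split, val_split]` to `tfds.load`, which then returns one dataset per string.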

unshuffled_original_uz

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_uz')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split      Examples
'train'    27537
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_wa

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_wa')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split      Examples
'train'    1001
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_xmf

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_xmf')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split      Examples
'train'    3783
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_it

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_it')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split      Examples
'train'    46981781
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
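With 46,981,781 training examples, this config is large enough that a first look is easier through an absolute slice such as `'train[:1000]'`, a form TFDS split strings support. The sketch below clamps a requested preview size to the `'train'` count listed for a config. The `SPLIT_SIZES` table is copied from a few of the Splits tables in this section, and the helper `preview_split` is hypothetical.

```python
# 'train' sizes copied from the Splits tables in this section.
SPLIT_SIZES = {
    "unshuffled_original_it": 46981781,
    "unshuffled_original_ko": 7345075,
    "unshuffled_original_tyv": 34,
    "unshuffled_original_myv": 6,
}


def preview_split(config: str, n: int) -> str:
    """Return a TFDS split string selecting at most n leading 'train' examples."""
    if n <= 0:
        raise ValueError("n must be positive")
    total = SPLIT_SIZES[config]  # KeyError for configs not in the table
    return f"train[:{min(n, total)}]"


if __name__ == "__main__":
    print(preview_split("unshuffled_original_it", 1000))
    print(preview_split("unshuffled_original_myv", 1000))
```

A possible call is `tfds.load('huggingface:oscar/unshuffled_original_it', split=preview_split('unshuffled_original_it', 1000))`, which should fetch only the leading slice's examples rather than iterating the full split.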

unshuffled_original_ka

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_ka')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split      Examples
'train'    563916
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_ko

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_ko')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split      Examples
'train'    7345075
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
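When a TFDS dataset is iterated through `tfds.as_numpy`, `string` features arrive as `bytes` and `int64` features as NumPy integers, so text in non-Latin scripts such as this Korean config has to be decoded before use. The sketch below shows that conversion on a plain dictionary standing in for one example. The helper `to_python` is hypothetical, and replacing undecodable bytes instead of raising is a design choice for this sketch, not a TFDS default.

```python
def to_python(example: dict) -> tuple:
    """Convert one {'id', 'text'} example into a plain (int, str) pair."""
    doc_id = int(example["id"])  # accepts Python ints and NumPy int64 scalars
    raw = example["text"]
    # String features come back as bytes; decode as UTF-8, replacing bad bytes.
    if isinstance(raw, bytes):
        text = raw.decode("utf-8", errors="replace")
    else:
        text = str(raw)
    return doc_id, text


if __name__ == "__main__":
    sample = {"id": 7, "text": "안녕하세요".encode("utf-8")}
    print(to_python(sample))
```

With TFDS available, a loop such as `for ex in tfds.as_numpy(ds['train'].take(3)): print(to_python(ex))` would apply it to real records, since `tfds.load` without a `split` argument returns a dictionary keyed by split name.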

unshuffled_original_kw

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_kw')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split      Examples
'train'    203
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_lez

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_lez')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split      Examples
'train'    1485
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_lrc

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_lrc')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split      Examples
'train'    88
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_mg

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_mg')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split      Examples
'train'    17957
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_ml

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_ml')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split      Examples
'train'    603937
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_ms

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_ms')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split      Examples
'train'    534016
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_myv

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_myv')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split      Examples
'train'    6
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_nds

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_nds')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split      Examples
'train'    18174
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_nn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_nn')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split      Examples
'train'    185884
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_os

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_os')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split      Examples
'train'    5213
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_pms

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_pms')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split      Examples
'train'    3225
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_qu

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_qu')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split      Examples
'train'    452
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_sa

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_sa')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 14291
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
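
Every load string on this page follows the same pattern: `huggingface:oscar/unshuffled_<variant>_<language code>`, where the variant is `original` or `deduplicated`. The helpers below build and parse such names. Their names are hypothetical, and the pattern is inferred from this page only. They do not check whether a particular combination is actually published, so confirm that in the catalog.

```python
import re

# Naming pattern inferred from the load strings on this page.
CONFIG_RE = re.compile(r"huggingface:oscar/unshuffled_(original|deduplicated)_([a-z]+)")


def oscar_name(lang, deduplicated=False):
    """Build the name passed to tfds.load for one language (hypothetical helper)."""
    if not re.fullmatch(r"[a-z]+", lang):
        raise ValueError(f"unexpected language code: {lang!r}")
    variant = "deduplicated" if deduplicated else "original"
    return f"huggingface:oscar/unshuffled_{variant}_{lang}"


def parse_oscar_name(name):
    """Split a dataset name into (variant, language code)."""
    match = CONFIG_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not an OSCAR config name: {name!r}")
    return match.group(1), match.group(2)


print(oscar_name("sa"))  # huggingface:oscar/unshuffled_original_sa
print(parse_oscar_name("huggingface:oscar/unshuffled_deduplicated_en"))  # ('deduplicated', 'en')
```

The resulting string can be passed directly to `tfds.load`, for example `tfds.load(oscar_name("sa"))`.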

unshuffled_original_sh

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_sh')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 36700
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_so

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_so')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 156
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_sv

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_sv')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 17395625
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
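
The `'train'` count listed for each config determines how many batches one pass produces. The sketch below mirrors the `drop_remainder` argument of `tf.data.Dataset.batch`, under which a trailing partial batch is either kept or discarded. The helper name is hypothetical, and the batch size of 1024 is only an example.

```python
def num_batches(num_examples, batch_size, drop_remainder=False):
    """Number of batches in one pass over a split of num_examples items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    full, rest = divmod(num_examples, batch_size)
    # A trailing partial batch counts only when it is kept.
    return full if drop_remainder or rest == 0 else full + 1


# 'train' size of unshuffled_original_sv from the listing above.
print(num_batches(17395625, 1024))                       # 16988
print(num_batches(17395625, 1024, drop_remainder=True))  # 16987
```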

unshuffled_original_tg

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_tg')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 89002
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_tr

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_tr')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 18535253
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
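
The `unshuffled_` prefix suggests the examples are not shuffled. Taking the first N (for example with `ds.take(N)`) would therefore likely give an unrepresentative slice of a large config such as this one. A common alternative is reservoir sampling (Algorithm R). It draws a uniform sample of k items in one pass without knowing the split size in advance. The helper name `reservoir_sample` is illustrative.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Draw a uniform sample of k items from a stream of unknown length (Algorithm R)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # the first k items fill the reservoir
        else:
            j = rng.randint(0, i)  # uniform over the i + 1 items seen so far
            if j < k:
                reservoir[j] = item  # item i survives with probability k / (i + 1)
    return reservoir


print(reservoir_sample(range(5), 10))  # [0, 1, 2, 3, 4]  (fewer items than k)
```

Calling `tfds.load` without a `split` argument returns a dict keyed by split name. A sample of texts might therefore be drawn with `reservoir_sample((ex["text"] for ex in tfds.as_numpy(ds["train"])), 1000, seed=0)`. This bounds memory at k items, but it still reads the entire split.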

unshuffled_original_uk

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_uk')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 12973467
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_vi

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_vi')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 14898250
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_wuu

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_wuu')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 214
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_yo

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_yo')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 214
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_zh

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_zh')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 60137667
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
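
Each config lists only a `'train'` split. One way to carve out a validation set that stays stable across runs is to hash each example's `id` into buckets. This sketch assumes `id` values are unique within a config and stable across downloads, which this page does not state. The function name, the salt, and the 0.01% bucket granularity are arbitrary choices.

```python
import hashlib


def is_heldout(example_id, heldout_percent=1.0, salt="oscar-heldout"):
    """Deterministically place an example in a held-out bucket by hashing its id."""
    if not 0 <= heldout_percent <= 100:
        raise ValueError("heldout_percent must be between 0 and 100")
    digest = hashlib.sha256(f"{salt}:{example_id}".encode("utf-8")).digest()
    # First 8 bytes -> bucket in [0, 10000), i.e. 0.01% granularity.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < heldout_percent * 100


ids = range(100_000)
heldout = sum(is_heldout(i, heldout_percent=5) for i in ids)
print(heldout / len(ids))  # roughly 0.05
```

Unlike a random split, the same `id` always lands on the same side. The held-out set therefore does not change when the pipeline is rerun.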

unshuffled_deduplicated_en

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_en')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 304230423
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
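
This page lists both `unshuffled_original_*` and `unshuffled_deduplicated_*` configs but does not describe how deduplication was done. Purely as an illustration, the idea can be sketched by assuming exact line-level deduplication across a corpus. `dedup_lines` is a hypothetical helper. For a config of this size (304230423 examples), an in-memory set of digests would likely be too large, so a real pipeline would need sharding or an external store.

```python
import hashlib
from typing import Iterable, Iterator


def dedup_lines(documents: Iterable[str]) -> Iterator[str]:
    """Yield documents with exact duplicate lines removed across the whole stream.

    Only a 16-byte BLAKE2b digest of each line is kept in memory, not the line.
    Documents left with no lines are skipped.
    """
    seen = set()
    for doc in documents:
        kept = []
        for line in doc.split("\n"):
            key = hashlib.blake2b(line.encode("utf-8"), digest_size=16).digest()
            if key not in seen:
                seen.add(key)
                kept.append(line)
        if kept:
            yield "\n".join(kept)


docs = ["a\nb", "b\nc", "a\nb", "d"]
print(list(dedup_lines(docs)))  # ['a\nb', 'c', 'd']
```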

unshuffled_deduplicated_eu

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_eu')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 256513
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_frr

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_frr')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 7
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
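
Config sizes on this page range from 7 examples (`unshuffled_deduplicated_frr`) to 304230423 (`unshuffled_deduplicated_en`). One common approach to mixing several configs, not something this page prescribes, is exponent-smoothed sampling with p_i ∝ n_i^α for 0 ≤ α ≤ 1. With α = 1, configs are sampled in proportion to size. With α = 0, every config is sampled equally. Values in between upweight small configs. The counts below are copied from this page; the α values and the helper name are only examples.

```python
# 'train' example counts copied from the deduplicated listings on this page.
TRAIN_EXAMPLES = {
    "unshuffled_deduplicated_en": 304230423,
    "unshuffled_deduplicated_eu": 256513,
    "unshuffled_deduplicated_frr": 7,
    "unshuffled_deduplicated_gl": 284320,
}


def sampling_probs(counts, alpha):
    """Per-config probabilities p_i proportional to n_i ** alpha."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha is expected to lie in [0, 1]")
    weights = {name: n ** alpha for name, n in counts.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


for alpha in (1.0, 0.3):
    p = sampling_probs(TRAIN_EXAMPLES, alpha)
    print(alpha, f"{p['unshuffled_deduplicated_frr']:.2e}")
```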

unshuffled_deduplicated_gl

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_gl')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 284320
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_he

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_he')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 2375030
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_ht

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ht')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 9
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_id

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_id')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 9948521
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_is

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_is')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 389515
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_jv

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_jv')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 1163
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
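Every configuration on this page is loaded with the same name pattern, differing only in the language-code suffix. The helper below is hypothetical (it is not part of TFDS) and simply rebuilds that string from a code such as `jv`:

```python
# Hypothetical helper, not a TFDS API: rebuilds the dataset name used in
# the load commands on this page.
PREFIX = "huggingface:oscar/unshuffled_deduplicated_"

def oscar_tfds_name(lang_code: str) -> str:
    """Return the TFDS dataset name for an OSCAR deduplicated config."""
    code = lang_code.strip().lower()
    # Codes on this page are short alphabetic strings (jv, kn, mrj, ...).
    if not code.isalpha():
        raise ValueError(f"unexpected language code: {lang_code!r}")
    return PREFIX + code

print(oscar_tfds_name("jv"))  # huggingface:oscar/unshuffled_deduplicated_jv
```

Assuming `tensorflow_datasets` is installed, the result could be passed to `tfds.load(...)` in place of the hard-coded string.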

unshuffled_deduplicated_kn

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_kn')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 251064
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
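When records are read through `tfds.as_numpy`, `string` features arrive as UTF-8 encoded `bytes` and `int64` features as NumPy integers. The sketch below uses a hand-made stand-in record rather than real data and converts one record into plain Python values:

```python
def decode_example(example: dict) -> tuple:
    """Convert one record into an (id, text) pair of plain Python values.

    Assumes `example` looks like what tfds.as_numpy yields for this schema:
    `id` as an integer (e.g. numpy.int64) and `text` as UTF-8 bytes.
    """
    text = example["text"]
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    return int(example["id"]), text

# Stand-in record; real records would come from iterating over
# tfds.as_numpy(ds["train"]).
sample = {"id": 0, "text": "ಕನ್ನಡ".encode("utf-8")}
print(decode_example(sample))
```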

unshuffled_deduplicated_kv

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_kv')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 924
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
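Each configuration ships only a `'train'` split, so a held-out set has to be carved out by the user. TFDS split slicing (for example `split='train[:95%]'` and `split='train[95%:]'`) is one option. Another, sketched below, is a deterministic rule on the `id` field; it assumes ids are non-negative integers spread evenly across the split, which this page does not guarantee:

```python
def is_validation(example_id: int, holdout_percent: int = 5) -> bool:
    """Deterministically send roughly `holdout_percent`% of records to validation.

    Illustrative scheme only; assumes evenly spread non-negative integer ids.
    """
    if not 0 <= holdout_percent <= 100:
        raise ValueError("holdout_percent must be between 0 and 100")
    return example_id % 100 < holdout_percent

# kv lists 924 training examples; ids 0..923 are an assumption.
ids = range(924)
val = [i for i in ids if is_validation(i)]
print(len(val))  # 50
```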

unshuffled_deduplicated_lb

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_lb')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 21735
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_lo

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_lo')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 32652
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_mai

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mai')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 25
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_mk

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mk')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 299457
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_mrj

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mrj')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 669
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_my

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_my')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 136639
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_nap

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_nap')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 55
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_nl

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_nl')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 20812149
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_or

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_or')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 44230
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
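Because the schema is so small, a record can also be checked by hand. This hypothetical validator expects plain Python values (for example after converting NumPy scalars with `int()` and decoding `bytes` text) and reports any mismatch with the `id`/`text` schema listed above:

```python
# Bounds of a signed 64-bit integer, matching the "int64" dtype of `id`.
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

def validate_record(record: dict) -> list:
    """Return a list of problems found when checking a record against the schema."""
    problems = []
    if set(record) != {"id", "text"}:
        problems.append(f"unexpected keys: {sorted(record)}")
    rid = record.get("id")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(rid, int) or isinstance(rid, bool):
        problems.append("id is not an integer")
    elif not INT64_MIN <= rid <= INT64_MAX:
        problems.append("id is outside the int64 range")
    if not isinstance(record.get("text"), str):
        problems.append("text is not a string")
    return problems

print(validate_record({"id": 1, "text": "ଓଡ଼ିଆ"}))  # []
```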

unshuffled_deduplicated_pl

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_pl')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 20682611
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_pt

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_pt')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 26920397
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_ru

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ru')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 115954598
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
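At 115954598 training examples, the `ru` configuration is far too large to hold in a Python list. On the TensorFlow side, `tf.data.Dataset.batch` groups elements lazily; the pure-Python sketch below does the same for any iterable of decoded texts, using a small generator as a stand-in for the real stream:

```python
from itertools import islice
from typing import Iterable, Iterator, List

def batched_texts(texts: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield lists of up to `batch_size` texts without materializing the corpus."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(texts)
    while True:
        batch = list(islice(it, batch_size))  # pull at most batch_size items
        if not batch:
            return
        yield batch

# Stand-in for a stream of decoded `text` values from the 'train' split.
docs = (f"doc {i}" for i in range(7))
print([len(b) for b in batched_texts(docs, 3)])  # [3, 3, 1]
```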

unshuffled_deduplicated_sd

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sd')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 33925
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_sl

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sl')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 886223
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_su

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_su')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 511
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_te

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_te')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 312644
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_tl

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_tl')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 294132
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_ug

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ug')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"), http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  15503
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
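Each config shares the same two-field schema: an `int64` `id` and a `string` `text`. As a minimal sketch, the schema JSON can be turned into a field-to-type map and used to sanity-check hand-built records; the `DTYPE_TO_PY` mapping and both function names are assumptions for illustration, not part of TFDS.

```python
import json

# The "Features" schema shown for each OSCAR config on this page (reformatted, same content).
FEATURES_JSON = """
{
    "id":   {"dtype": "int64",  "id": null, "_type": "Value"},
    "text": {"dtype": "string", "id": null, "_type": "Value"}
}
"""

# Assumed mapping from the schema's dtype names to plain Python types.
DTYPE_TO_PY = {"int64": int, "string": str}


def field_types(features_json: str) -> dict:
    """Return {field_name: python_type} for each "Value" feature in the schema."""
    schema = json.loads(features_json)  # JSON null becomes None
    types = {}
    for name, spec in schema.items():
        if spec.get("_type") != "Value":
            raise ValueError(f"unsupported feature kind for {name!r}: {spec.get('_type')!r}")
        types[name] = DTYPE_TO_PY[spec["dtype"]]
    return types


def validate_record(record: dict, types: dict) -> bool:
    """True if `record` has exactly the schema's fields, each of the exact expected type."""
    if set(record) != set(types):
        return False
    # `type(...) is` rather than isinstance so that True/False are not accepted as int64.
    return all(type(record[name]) is expected for name, expected in types.items())


if __name__ == "__main__":
    types = field_types(FEATURES_JSON)
    print(types)  # {'id': <class 'int'>, 'text': <class 'str'>}
    print(validate_record({"id": 0, "text": "Kumusta"}, types))  # True
```

When records come from TFDS rather than from a hand-built dict, string features typically arrive as `bytes` (for example after `tfds.as_numpy`), so decode `text` as UTF-8 first; this sketch's strict type check would reject `bytes`.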

unshuffled_deduplicated_vec

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_vec')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  64
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_war

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_war')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  9161
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_deduplicated_yi

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_yi')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  32919
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_af

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_af')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  201117
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
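Afrikaans appears in this catalog in both variants: `unshuffled_deduplicated_af` lists 130640 training examples and `unshuffled_original_af` lists 201117. The short calculation below, a sketch with a hypothetical helper name, shows the deduplicated config has 70477 fewer examples, roughly 35% of the original count. It compares example counts only, not characters or tokens.

```python
def dedup_reduction(original: int, deduplicated: int) -> tuple:
    """Return (examples removed, fraction removed) between two configs of one language."""
    if not 0 < deduplicated <= original:
        raise ValueError("expected 0 < deduplicated <= original")
    removed = original - deduplicated
    return removed, removed / original


if __name__ == "__main__":
    # Afrikaans 'train' sizes: unshuffled_original_af vs unshuffled_deduplicated_af.
    removed, fraction = dedup_reduction(201117, 130640)
    print(removed, f"{fraction:.1%}")  # 70477 35.0%
```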

unshuffled_original_ar

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_ar')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  16365602
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_av

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_av')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  456
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_bar

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_bar')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  4
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_bh

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_bh')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  336
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_br

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_br')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  37085
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_cbk

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_cbk')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  1
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_cs

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_cs')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  21001388
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_de

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_de')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  104913504
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
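Splits such as the 104913504-example `unshuffled_original_de` training split are large enough that it can help to start with a slice. TFDS accepts percent-slicing strings like `train[:1%]` as the `split` argument of `tfds.load`. The helpers below (hypothetical names) build such a string and estimate how many examples it covers. The estimate is approximate because TFDS applies its own rounding to slice boundaries, and slicing limits what is read rather than, typically, what is downloaded and prepared.

```python
def percent_split(split: str, percent: int) -> str:
    """Build a TFDS slicing string that selects the first `percent`% of a split."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return f"{split}[:{percent}%]"


def approx_examples(total: int, percent: int) -> int:
    """Rough size of a percent slice; TFDS applies its own boundary rounding."""
    return round(total * percent / 100)


if __name__ == "__main__":
    total = 104913504  # 'train' size of unshuffled_original_de from this page
    print(percent_split("train", 1))  # train[:1%]
    print(approx_examples(total, 1))  # 1049135
```

The resulting string would be used as, for example, `tfds.load('huggingface:oscar/unshuffled_original_de', split='train[:1%]')`. With `split` given, `tfds.load` returns a single dataset instead of a dict of splits.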

unshuffled_original_el

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_el')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  10425596
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_es

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_es')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  88199221
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_fi

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_fi')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  8557453
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_ga

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_ga')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  83223
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_gom

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_gom')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  640
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
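Split sizes on this page span several orders of magnitude: `unshuffled_original_cbk` lists a single training example and `unshuffled_original_bar` only 4. A simple filter over the sizes listed here can flag configs that may be too small for a given use. The subset of sizes below is copied from this page, while the helper name and the threshold of 1000 are arbitrary choices for illustration.

```python
# 'train' sizes copied from entries on this page (a subset, not the full catalog).
TRAIN_SIZES = {
    "unshuffled_deduplicated_vec": 64,
    "unshuffled_deduplicated_war": 9161,
    "unshuffled_original_av": 456,
    "unshuffled_original_bar": 4,
    "unshuffled_original_bh": 336,
    "unshuffled_original_cbk": 1,
    "unshuffled_original_gom": 640,
}


def configs_below(sizes: dict, threshold: int) -> list:
    """Names of configs with fewer than `threshold` train examples, smallest first."""
    small = [name for name, count in sizes.items() if count < threshold]
    # Sort by size, breaking ties by name so the order is deterministic.
    return sorted(small, key=lambda name: (sizes[name], name))


if __name__ == "__main__":
    for name in configs_below(TRAIN_SIZES, 1000):
        print(name, TRAIN_SIZES[name])
```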

unshuffled_original_hr

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_hr')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split    Examples
'train'  582219
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_hy

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:oscar/unshuffled_original_hy')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 659430
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
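All of these entries pass a name of the form `huggingface:oscar/unshuffled_original_<code>` to `tfds.load`, where `<code>` is the language code in the entry heading (the catalog also lists `unshuffled_deduplicated_<code>` variants). A small hypothetical helper, `oscar_name`, which is not a TFDS function, that builds these names:

```python
# Hypothetical helper: builds the dataset names used in the tfds.load calls in this catalog.
PREFIX = "huggingface:oscar/unshuffled_"


def oscar_name(lang_code: str, deduplicated: bool = False) -> str:
    """Return e.g. 'huggingface:oscar/unshuffled_original_hy' for lang_code='hy'."""
    if not lang_code.isalpha() or not lang_code.islower():
        raise ValueError(f"unexpected language code: {lang_code!r}")
    variant = "deduplicated" if deduplicated else "original"
    return f"{PREFIX}{variant}_{lang_code}"


print([oscar_name(code) for code in ("hy", "ilo")])
# ['huggingface:oscar/unshuffled_original_hy', 'huggingface:oscar/unshuffled_original_ilo']

# In an environment with tensorflow_datasets installed (not run here):
#   import tensorflow_datasets as tfds
#   ds = tfds.load(oscar_name("hy"), split="train")
```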

unshuffled_original_ilo

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_ilo')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"; http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 2638
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_ja

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_ja')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"; http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 62721527
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
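The `ja` configuration is by far the largest in this section, with 62,721,527 `train` examples. As a quick arithmetic sketch (the batch size of 1024 is an arbitrary assumption), the number of batches in one pass over the split is a ceiling division. The `drop_remainder` flag here is meant to mirror the argument of the same name on `tf.data.Dataset.batch`, which discards a final partial batch.

```python
def batches_per_epoch(num_examples: int, batch_size: int, drop_remainder: bool = False) -> int:
    """Number of batches in one pass; a partial final batch counts unless dropped."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    full, remainder = divmod(num_examples, batch_size)
    return full if drop_remainder or remainder == 0 else full + 1


JA_TRAIN = 62721527  # 'train' split size reported in the entry above
print(batches_per_epoch(JA_TRAIN, 1024))                       # 61252
print(batches_per_epoch(JA_TRAIN, 1024, drop_remainder=True))  # 61251
```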

unshuffled_original_kk

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_kk')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"; http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 524591
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_krc

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_krc')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"; http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 1581
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_ky

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_ky')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"; http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 146993
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_li

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_li')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"; http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 137
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_lt

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_lt')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"; http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 2977757
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_mhr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_mhr')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"; http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 3212
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_mn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_mn')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"; http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 395605
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_mt

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_mt')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"; http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 26598
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_mzn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_mzn')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"; http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 1055
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_ne

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_ne')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"; http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 299938
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_no

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_no')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"; http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 5546211
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_pa

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_pa')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"; http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 127467
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_pnb

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_pnb')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"; http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 4599
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_rm

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_rm')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"; http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 41
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
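Several configurations are tiny (`rm` has only 41 `train` examples, `li` has 137) and, like every entry here, ship only a `train` split. If a held-out set is needed, one option is to carve it out of `train` yourself. The sketch below computes integer boundaries for a 90/10 division; the ratio and the helper name `holdout_bounds` are assumptions, and TFDS's own percent-slicing syntax (for example `split="train[:90%]"`) may round differently.

```python
from typing import Tuple


def holdout_bounds(num_examples: int, train_percent: int = 90) -> Tuple[int, int]:
    """Return (num_train, num_holdout), flooring the training share."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 1 and 99")
    num_train = num_examples * train_percent // 100
    return num_train, num_examples - num_train


for code, n in {"rm": 41, "li": 137}.items():
    print(code, holdout_bounds(n))
# rm (36, 5)
# li (123, 14)
```

With only a few dozen examples, the held-out portion is very small, so metrics computed on it are likely to be noisy.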

unshuffled_original_sah

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_sah')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"; http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 22301
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_si

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_si')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"; http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 203082
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_sq

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_sq')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"; http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 672077
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
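When a TFDS dataset is iterated through `tfds.as_numpy`, `tf.string` features such as `text` typically arrive as UTF-8 `bytes` rather than Python `str`. A minimal sketch of a summary pass that handles both forms; the function name `summarize` is hypothetical, and the hand-made records stand in for real OSCAR data so the snippet runs without downloading anything.

```python
from typing import Iterable, Mapping, Union


def summarize(examples: Iterable[Mapping[str, Union[int, bytes, str]]]) -> dict:
    """Count examples and Unicode characters, decoding UTF-8 bytes text when needed."""
    count = 0
    chars = 0
    for ex in examples:
        text = ex["text"]
        if isinstance(text, bytes):
            text = text.decode("utf-8")
        count += 1
        chars += len(text)
    return {"examples": count, "characters": chars}


# Stand-in records: "bot\u00eb" is 4 characters but 5 bytes once UTF-8 encoded.
fake = [{"id": 0, "text": "bot\u00eb".encode("utf-8")}, {"id": 1, "text": "hello"}]
print(summarize(fake))  # {'examples': 2, 'characters': 9}

# With tensorflow_datasets installed (not run here):
#   import tensorflow_datasets as tfds
#   ds = tfds.load("huggingface:oscar/unshuffled_original_sq", split="train")
#   print(summarize(tfds.as_numpy(ds.take(100))))
```

Decoding before measuring matters for non-Latin and accented scripts, where byte length overstates character count.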

unshuffled_original_sw

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_sw')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 41986
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_th

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_th')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 6064129
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_tt

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_tt')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 135923
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_ur

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_ur')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 638596
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_vo

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_vo')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 3366
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_xal

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_xal')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 39
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_yue

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_yue')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 11
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_en')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 455994980
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_eu

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_eu')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 506883
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_frr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_frr')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 7
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
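Split sizes in this section range from 7 examples (`frr`) to 455,994,980 (`en`), which matters when batching. The sketch below does only the arithmetic. It assumes the behaviour of the `drop_remainder` argument of `tf.data.Dataset.batch`: with `drop_remainder=True`, a split smaller than the batch size yields no batches at all. The `TRAIN_SIZES` values are copied from the listings here, and `batches_per_epoch` is a hypothetical helper.

```python
# 'train' split sizes copied from the config listings in this section.
TRAIN_SIZES = {
    "frr": 7,
    "yue": 11,
    "ht": 13,
    "xal": 39,
    "en": 455_994_980,
}


def batches_per_epoch(num_examples: int, batch_size: int, drop_remainder: bool) -> int:
    """Count the batches produced from `num_examples` items.

    Mirrors the documented behaviour of tf.data.Dataset.batch: the last,
    smaller batch is kept unless drop_remainder is True.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if drop_remainder:
        return num_examples // batch_size
    # Integer ceiling division, avoiding float rounding on large counts.
    return -(-num_examples // batch_size)


for lang, size in TRAIN_SIZES.items():
    full = batches_per_epoch(size, 32, drop_remainder=True)
    kept = batches_per_epoch(size, 32, drop_remainder=False)
    print(f"{lang}: {full} full batches, {kept} batches if the remainder is kept")
```

For example, with a batch size of 32 and `drop_remainder=True`, `frr`, `yue` and `ht` produce zero batches. `xal` produces one batch and drops 7 of its 39 examples.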

unshuffled_original_gl

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_gl')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 544388
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_he

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_he')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 3808397
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_ht

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_ht')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 13
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_id

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_id')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 16236463
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_is

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_is')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 625673
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_jv

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_jv')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 1445
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_kn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_kn')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 350363
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_kv

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_kv')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 1549
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_lb

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_lb')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 34807
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_lo

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_lo')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 52910
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
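Because each configuration is produced by automatic language classification of Common Crawl, one quick sanity check is to measure how much of a document is written in the expected script. The sketch below computes the share of letters and combining marks that fall inside a given Unicode block, using the Lao block (U+0E80 to U+0EFF) as the example. `script_fraction` is a hypothetical helper, not part of TFDS or OSCAR.

```python
import unicodedata

LAO_BLOCK = (0x0E80, 0x0EFF)  # Unicode "Lao" block


def script_fraction(text: str, block: tuple) -> float:
    """Return the share of letters/marks in `text` whose code point lies in `block`.

    Digits, punctuation and whitespace are ignored; input with no letters
    or marks yields 0.0.
    """
    start, end = block
    # General categories starting with "L" are letters, "M" are combining marks.
    chars = [ch for ch in text if unicodedata.category(ch)[0] in ("L", "M")]
    if not chars:
        return 0.0
    inside = sum(1 for ch in chars if start <= ord(ch) <= end)
    return inside / len(chars)


print(script_fraction("\u0eaa\u0eb0\u0e9a\u0eb2\u0e8d\u0e94\u0eb5", LAO_BLOCK))
print(script_fraction("hello", LAO_BLOCK))
```

A threshold on this fraction (the value to use is a judgment call) can flag documents whose content does not match the configuration's language.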

unshuffled_original_mai

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_mai')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 123
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
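Small configurations such as this one (123 training examples) make it easy to reason about batching. The sketch below counts the batches produced by one pass over the data. It mirrors the `drop_remainder` flag of `tf.data.Dataset.batch`: with `False` the final partial batch is kept, and with `True` it is discarded. `num_batches` is a hypothetical helper, and the batch size of 32 is only an example.

```python
def num_batches(num_examples: int, batch_size: int, drop_remainder: bool = False) -> int:
    """Number of batches produced from `num_examples` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if drop_remainder:
        # The incomplete final batch is discarded.
        return num_examples // batch_size
    # Integer ceiling division: the final batch may be smaller.
    return (num_examples + batch_size - 1) // batch_size


print(num_batches(123, 32))
print(num_batches(123, 32, drop_remainder=True))
```

With 123 examples and a batch size of 32, this gives 4 batches (the last one holding 27 examples), or 3 when the remainder is dropped.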

unshuffled_original_mk

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_mk')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 437871
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
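The configurations on this page expose only a `train` split. TFDS can carve out a held-out portion with its split-slicing syntax (for example `split='train[:90%]'` and `split='train[90%:]'`). An alternative that does not depend on example order is to assign each example using a hash of its `id` field. The sketch below does this with MD5 from the standard library. `is_validation` and the 10% ratio are illustrative assumptions.

```python
import hashlib


def is_validation(example_id: int, validation_percent: int = 10) -> bool:
    """Deterministically assign an example to the validation set by its id.

    The same id always lands in the same bucket, regardless of the order
    in which examples are read.
    """
    digest = hashlib.md5(str(example_id).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # roughly uniform bucket in [0, 100)
    return bucket < validation_percent


held_out = sum(is_validation(i) for i in range(10_000))
print(held_out)
```

For a 10% split, roughly one id in ten is held out. Changing `validation_percent` only moves examples between the two sets and never reshuffles them.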

unshuffled_original_mrj

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_mrj')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 757
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
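With only 757 examples, a configuration like this fits comfortably in memory, so simple corpus statistics can be computed in a single pass. The sketch below summarises the `text` field of records shaped like the feature schema above (`id` as an integer, `text` as a string). `text_length_stats` is hypothetical, and the sample records are invented for illustration.

```python
def text_length_stats(records):
    """Return (count, mean length, max length) of the `text` field, in characters."""
    lengths = [len(r["text"]) for r in records]
    if not lengths:
        return 0, 0.0, 0
    return len(lengths), sum(lengths) / len(lengths), max(lengths)


# Invented records following the {"id": int64, "text": string} schema.
sample = [
    {"id": 0, "text": "first document"},
    {"id": 1, "text": "second"},
]
print(text_length_stats(sample))
```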

unshuffled_original_my

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_my')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 232329
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
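When TFDS examples are consumed as NumPy values (for instance through `tfds.as_numpy`), a `string` feature such as `text` typically arrives as UTF-8 encoded `bytes` rather than a Python `str`. The helper below is a small sketch, assuming that behaviour, that normalises either form to `str`. `as_text` is a hypothetical name. Passing `errors="replace"` substitutes U+FFFD for invalid byte sequences instead of raising an error.

```python
def as_text(value) -> str:
    """Convert a `text` feature value (bytes or str) to str."""
    if isinstance(value, bytes):
        # Invalid UTF-8 sequences become U+FFFD instead of raising.
        return value.decode("utf-8", errors="replace")
    if isinstance(value, str):
        return value
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(as_text(b"caf\xc3\xa9"))
print(as_text("already text"))
```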

unshuffled_original_nap

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_nap')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 73
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
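The `Features` block above is a JSON description of each record: an `int64` field `id` and a `string` field `text`. As a sketch, the schema can be parsed with the standard `json` module and then used to check records before further processing. The mapping from the listed `dtype` strings to Python types (`int64` to `int`, `string` to `str`) is an assumption made for this illustration.

```python
import json

FEATURES_JSON = """
{
    "id": {"dtype": "int64", "id": null, "_type": "Value"},
    "text": {"dtype": "string", "id": null, "_type": "Value"}
}
"""

# Assumed mapping from the schema's dtype names to Python types.
DTYPE_TO_PY = {"int64": int, "string": str}


def load_schema(raw: str) -> dict:
    """Return {field name: Python type} for every field in the schema."""
    spec = json.loads(raw)
    return {name: DTYPE_TO_PY[field["dtype"]] for name, field in spec.items()}


def validate(record: dict, schema: dict) -> bool:
    """True if `record` has exactly the schema's fields, with matching types."""
    if set(record) != set(schema):
        return False
    return all(isinstance(record[name], py_type) for name, py_type in schema.items())


schema = load_schema(FEATURES_JSON)
print(validate({"id": 0, "text": "hello"}, schema))
print(validate({"id": "0", "text": "hello"}, schema))
```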

unshuffled_original_nl

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_nl')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 34682142
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
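At roughly 34.7 million training examples, this configuration is too large to materialise as a Python list in most setups, so downstream processing usually works on fixed-size chunks of a stream. The generator below groups any iterable into lists of at most `size` items using `itertools.islice`. It is a generic sketch, not a TFDS API. On Python 3.12 and later, `itertools.batched` performs the same grouping but yields tuples.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(iterable: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of up to `size` items from `iterable`."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return  # source exhausted
        yield chunk


for chunk in chunked(range(7), 3):
    print(chunk)
```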

unshuffled_original_or

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_or')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 59463
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
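The configurations in this section are `unshuffled_original_*`, while OSCAR also publishes `unshuffled_deduplicated_*` variants such as `unshuffled_deduplicated_als`. The sketch below shows one simple form of exact, order-preserving line deduplication across a stream of documents. It illustrates the general idea only and is not claimed to reproduce the procedure used to build OSCAR's deduplicated configurations. `dedup_lines` is a hypothetical helper.

```python
from typing import Iterable, Iterator


def dedup_lines(documents: Iterable[str]) -> Iterator[str]:
    """Yield each document with lines already seen (in any document) removed.

    Lines are compared exactly after stripping surrounding whitespace; empty
    lines are dropped, and documents left with no lines are skipped.
    Memory grows with the number of distinct lines kept in `seen`.
    """
    seen = set()
    for doc in documents:
        kept = []
        for line in doc.splitlines():
            key = line.strip()
            if key and key not in seen:
                seen.add(key)
                kept.append(line)
        if kept:
            yield "\n".join(kept)


docs = ["a\nb\na", "b\nc", "a"]
print(list(dedup_lines(docs)))
```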

unshuffled_original_pl

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_pl')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 35440972
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
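For parallel processing of a large split like this one (about 35.4 million examples), each worker can take a disjoint shard. `tf.data.Dataset.shard(num_shards, index)` does this by keeping every `num_shards`-th element starting at `index`. The pure-Python sketch below reproduces that round-robin assignment on any iterable, so the logic can be checked without TensorFlow. `shard` is a hypothetical helper.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def shard(iterable: Iterable[T], num_shards: int, index: int) -> Iterator[T]:
    """Yield the elements whose position i satisfies i % num_shards == index."""
    if num_shards <= 0 or not 0 <= index < num_shards:
        raise ValueError("need num_shards > 0 and 0 <= index < num_shards")
    # islice(start, stop, step) visits positions index, index + num_shards, ...
    return islice(iterable, index, None, num_shards)


print(list(shard(range(10), 3, 0)))
print(list(shard(range(10), 3, 2)))
```

Because the shards are disjoint and together cover every position, no example is processed twice or skipped.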

unshuffled_original_pt

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_pt')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 42114520
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_ru

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_ru')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 161836003
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
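This split holds about 161.8 million examples, and the configuration is `unshuffled`, so the first few records read may not be representative of the whole. Reservoir sampling (Algorithm R) draws a uniform random sample of fixed size `k` in a single pass, without knowing the total length in advance. The sketch below is a generic implementation that uses a seeded `random.Random` for reproducibility. `reservoir_sample` is a hypothetical helper.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(iterable: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return up to k items chosen uniformly at random from `iterable` (Algorithm R)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(iterable):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = item  # keep item i with probability k / (i + 1)
    return reservoir


print(reservoir_sample(range(1_000), 5, seed=42))
```

If the stream has fewer than `k` items, all of them are returned.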

unshuffled_original_sd

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_sd')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 44280
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
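Sindhi is commonly written in a Perso-Arabic script that runs right to left, and web text in such configurations is often mixed with left-to-right fragments such as URLs. The sketch below uses `unicodedata.bidirectional` from the standard library to estimate what fraction of strongly directional characters are right-to-left (bidi classes `R` and `AL`). `rtl_fraction` is a hypothetical helper. Ignoring weak and neutral characters (digits, punctuation, spaces) is an assumption of this sketch.

```python
import unicodedata


def rtl_fraction(text: str) -> float:
    """Share of strong-direction characters that are right-to-left.

    Bidi class 'L' is strong left-to-right; 'R' (e.g. Hebrew) and 'AL'
    (Arabic letters) are strong right-to-left. All other classes are ignored.
    """
    classes = [unicodedata.bidirectional(ch) for ch in text]
    strong = [c for c in classes if c in ("L", "R", "AL")]
    if not strong:
        return 0.0
    return sum(c in ("R", "AL") for c in strong) / len(strong)


print(rtl_fraction("\u0633\u0646\u068c\u064a"))
print(rtl_fraction("abc"))
```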

unshuffled_original_sl

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_sl')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 1746604
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
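Web text can encode the same accented letter in more than one way. The Slovene letter `č`, for example, may appear as the single code point U+010D or as `c` followed by the combining caron U+030C. Normalising to NFC with `unicodedata.normalize` makes such strings compare equal before deduplication or counting. This is a general preprocessing sketch, not a step documented in the OSCAR description.

```python
import unicodedata

precomposed = "\u010d"  # LATIN SMALL LETTER C WITH CARON
decomposed = "c\u030c"  # 'c' followed by COMBINING CARON

# The raw strings differ even though they render identically.
print(precomposed == decomposed)
# NFC composes 'c' + caron into the single code point U+010D.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```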

unshuffled_original_su

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_su')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 805
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_te

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_te')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 475703
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_tl

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_tl')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 458206
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_ug

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_ug')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 22255
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_vec

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_vec')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 73
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_war

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_war')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 9760
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unshuffled_original_yi

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:oscar/unshuffled_original_yi')
  • Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
  • License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.

    Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:

    • Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
    • Clearly identify the copyrighted work claimed to be infringed.
    • Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.

    We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

  • Version : 1.0.0

  • Splits :

Split Examples
'train' 59364
  • Features :
{
    "id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}