irc_disentanglement

Description:

The IRC Disentanglement dataset contains 77,563 messages from the Ubuntu IRC channel.

Features include the message id, the message text, and the timestamp. The target is the list of messages that the current message replies to. Each record contains the list of messages from one day of IRC chat.
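To make the target concrete, here is a minimal sketch of how the reply lists for one day could be turned into reply links and then grouped into conversations. The helper names (`reply_links`, `conversations`) and the sample ids are hypothetical. The sketch also assumes that a message starting a conversation has an empty reply list, which may not match how the dataset actually encodes conversation starts.

```python
from collections import defaultdict


def reply_links(ids, parents):
    """Return (child_id, parent_id) pairs for one day of chat."""
    return [(child, parent)
            for child, plist in zip(ids, parents)
            for parent in plist]


def conversations(ids, parents):
    """Group message ids into conversations.

    A conversation is a connected component of the reply graph,
    found here with a small union-find structure.
    """
    root = {m: m for m in ids}

    def find(m):
        # Follow parent pointers to the representative, halving paths as we go.
        while root[m] != m:
            root[m] = root[root[m]]
            m = root[m]
        return m

    for child, parent in reply_links(ids, parents):
        if parent in root:  # ignore links to messages outside this day
            root[find(child)] = find(parent)

    groups = defaultdict(list)
    for m in ids:
        groups[find(m)].append(m)
    return sorted(groups.values())


# Hypothetical ids; real records use the dataset's own id strings.
ids = ["m1", "m2", "m3", "m4"]
parents = [[], ["m1"], [], ["m3"]]
print(reply_links(ids, parents))
print(conversations(ids, parents))
```

Union-find keeps the grouping close to linear in the number of reply links, which matters because a single record holds a full day of messages.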

Homepage: https://jkk.name/irc-disentanglement
Source code: tfds.datasets.irc_disentanglement.Builder
Versions:
- 2.0.0 (default): No release notes.
Download size: 113.53 MiB
Dataset size: 26.59 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	10
`'train'`	153
`'validation'`	10

Feature structure:

FeaturesDict({
    'day': Sequence({
        'id': Text(shape=(), dtype=string),
        'parents': Sequence(Text(shape=(), dtype=string)),
        'text': Text(shape=(), dtype=string),
        'timestamp': Text(shape=(), dtype=string),
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
day	Sequence
day/id	Text		string
day/parents	Sequence(Text)	(None,)	string
day/text	Text		string
day/timestamp	Text		string
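Because `day` is a Sequence of dictionaries, a loaded example is stored column-wise: one array of ids, one of texts, and so on. The sketch below shows one way to load a split and turn each day into a list of per-message dictionaries. `tfds.load` and `tfds.as_numpy` are real TFDS calls, but the helper names are hypothetical. The handling of `parents` is an assumption: the nested sequence may arrive as a ragged tensor or as plain nested arrays, depending on the TFDS version.

```python
def to_text(value):
    """Decode bytes (as returned for string features) to str."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def day_to_messages(day):
    """Convert the column-wise 'day' dict into a list of per-message dicts."""
    parents = day["parents"]
    # Assumption: a ragged tensor exposes to_list(); otherwise iterate as-is.
    if hasattr(parents, "to_list"):
        parents = parents.to_list()
    return [
        {
            "id": to_text(msg_id),
            "parents": [to_text(p) for p in plist],
            "text": to_text(text),
            "timestamp": to_text(ts),
        }
        for msg_id, plist, text, ts in zip(
            day["id"], parents, day["text"], day["timestamp"]
        )
    ]


def load_days(split="train"):
    """Yield each day of a split as a list of message dicts.

    Downloads and prepares the dataset on first use.
    """
    # Imported lazily so the helpers above work without TFDS installed.
    import tensorflow_datasets as tfds

    ds = tfds.load("irc_disentanglement", split=split)
    for example in tfds.as_numpy(ds):
        yield day_to_messages(example["day"])
```

Calling `next(load_days("validation"))` would return the messages of the first validation day. The pure-Python helpers can also be exercised on hand-built dictionaries without downloading anything.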

Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.

Citation:

@InProceedings{acl19disentangle,
  author    = {Jonathan K. Kummerfeld and Sai R. Gouravajhala and Joseph Peper and Vignesh Athreya and Chulaka Gunasekara and Jatin Ganhotra and Siva Sankalp Patel and Lazaros Polymenakos and Walter S. Lasecki},
  title     = {A Large-Scale Corpus for Conversation Disentanglement},
  booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
  location  = {Florence, Italy},
  month     = {July},
  year      = {2019},
  doi       = {10.18653/v1/P19-1374},
  pages     = {3846--3856},
  url       = {https://aclweb.org/anthology/papers/P/P19/P19-1374/},
  arxiv     = {https://arxiv.org/abs/1810.11118},
  software  = {https://jkk.name/irc-disentanglement},
  data      = {https://jkk.name/irc-disentanglement},
}
