conceptnet5

References:

conceptnet5

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:conceptnet5/conceptnet5')
  • Description:
This dataset is designed to provide training data
for common sense relationships pulled together from various
sources.

The dataset is multilingual. See language codes and language info
here: https://github.com/commonsense/conceptnet5/wiki/Languages


This dataset provides an interface for the conceptnet5 csv file, and
some (but not all) of the raw text data used to build conceptnet5:
omcsnet_sentences_free.txt, and omcsnet_sentences_more.txt.

One use of this dataset would be to learn to extract the conceptnet
relationship from the omcsnet sentences.

Conceptnet5 has 34,074,917 relationships. Of those, 2,176,099
(about 2M) come with a surface text sentence.

omcsnet_sentences_free has 898,161 lines. omcsnet_sentences_more has
2,001,736 lines.

Original downloads are available here:
https://github.com/commonsense/conceptnet5/wiki/Downloads. For more
information, see: https://github.com/commonsense/conceptnet5/wiki

The omcsnet data comes with the following warning from the authors of
the above site: 

Remember: this data comes from various forms of
crowdsourcing. Sentences in these files are not necessarily true,
useful, or appropriate.
  • License: This work includes data from ConceptNet 5, which was compiled by the Commonsense Computing Initiative. ConceptNet 5 is freely available under the Creative Commons Attribution-ShareAlike license (CC BY SA 3.0) from http://conceptnet.io.

The included data was created by contributors to Commonsense Computing projects, contributors to Wikimedia projects, DBPedia, OpenCyc, Games with a Purpose, Princeton University's WordNet, Francis Bond's Open Multilingual WordNet, and Jim Breen's JMDict.

Various other licenses also apply. See: https://github.com/commonsense/conceptnet5/wiki/Copying-and-sharing-ConceptNet

  • Version: 5.7.0
  • Splits:
Split	Examples
'train'	34074917
  • Features:
{
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "full_rel": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "rel": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "arg1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "arg2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "lang": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "extra_info": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "weight": {
        "dtype": "float32",
        "id": null,
        "_type": "Value"
    }
}
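
The description suggests learning to extract the relationship from the sentences. A minimal sketch of preparing such pairs from records shaped like the schema above follows. It assumes that `arg1` and `arg2` hold ConceptNet-style URIs of the form `/c/<lang>/<term>[/<pos>]`, that `lang` is the language of the edge, and that `sentence` is empty when no surface text exists. The helper names and the sample records are hypothetical and only illustrate the shape of the data.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Optional, Tuple


class Concept(NamedTuple):
    lang: str
    term: str
    pos: Optional[str]


def parse_concept(uri: str) -> Concept:
    """Split a URI such as '/c/en/dog/n' into language, term and optional POS."""
    parts = uri.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "c":
        raise ValueError(f"not a concept URI: {uri!r}")
    pos = parts[3] if len(parts) > 3 else None
    return Concept(lang=parts[1], term=parts[2], pos=pos)


def extraction_pairs(
    records: Iterable[Dict], lang: str = "en"
) -> Iterator[Tuple[str, Tuple[str, str, str]]]:
    """Yield (sentence, (rel, arg1 term, arg2 term)) for edges with surface text."""
    for rec in records:
        # Skip other languages and edges without a surface sentence.
        if rec["lang"] != lang or not rec["sentence"]:
            continue
        target = (
            rec["rel"],
            parse_concept(rec["arg1"]).term,
            parse_concept(rec["arg2"]).term,
        )
        yield rec["sentence"], target


# Hypothetical records; the values are illustrative, not copied from the dataset.
SAMPLE = [
    {
        "sentence": "[[a dog]] is a kind of [[animal]]",
        "full_rel": "/a/[/r/IsA/,/c/en/dog/,/c/en/animal/]",
        "rel": "/r/IsA",
        "arg1": "/c/en/dog",
        "arg2": "/c/en/animal",
        "lang": "en",
        "extra_info": "{}",
        "weight": 1.0,
    },
    {
        "sentence": "",
        "full_rel": "/a/[/r/Synonym/,/c/fr/chien/n/,/c/en/dog/]",
        "rel": "/r/Synonym",
        "arg1": "/c/fr/chien/n",
        "arg2": "/c/en/dog",
        "lang": "fr",
        "extra_info": "{}",
        "weight": 1.0,
    },
]

pairs = list(extraction_pairs(SAMPLE))
print(pairs)
```

Only the first sample record survives the filter, since the second has no surface sentence and a different language code. With the real dataset, the same functions could be applied to records decoded from the 'train' split.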

omcs_sentences_free

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:conceptnet5/omcs_sentences_free')
  • Description:
This dataset is designed to provide training data
for common sense relationships pulled together from various
sources.

The dataset is multilingual. See language codes and language info
here: https://github.com/commonsense/conceptnet5/wiki/Languages


This dataset provides an interface for the conceptnet5 csv file, and
some (but not all) of the raw text data used to build conceptnet5:
omcsnet_sentences_free.txt, and omcsnet_sentences_more.txt.

One use of this dataset would be to learn to extract the conceptnet
relationship from the omcsnet sentences.

Conceptnet5 has 34,074,917 relationships. Of those, 2,176,099
(about 2M) come with a surface text sentence.

omcsnet_sentences_free has 898,161 lines. omcsnet_sentences_more has
2,001,736 lines.

Original downloads are available here:
https://github.com/commonsense/conceptnet5/wiki/Downloads. For more
information, see: https://github.com/commonsense/conceptnet5/wiki

The omcsnet data comes with the following warning from the authors of
the above site: 

Remember: this data comes from various forms of
crowdsourcing. Sentences in these files are not necessarily true,
useful, or appropriate.
  • License: This work includes data from ConceptNet 5, which was compiled by the Commonsense Computing Initiative. ConceptNet 5 is freely available under the Creative Commons Attribution-ShareAlike license (CC BY SA 3.0) from http://conceptnet.io.

The included data was created by contributors to Commonsense Computing projects, contributors to Wikimedia projects, DBPedia, OpenCyc, Games with a Purpose, Princeton University's WordNet, Francis Bond's Open Multilingual WordNet, and Jim Breen's JMDict.

Various other licenses also apply. See: https://github.com/commonsense/conceptnet5/wiki/Copying-and-sharing-ConceptNet

  • Version: 5.7.0
  • Splits:
Split	Examples
'train'	898160
  • Features:
{
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "raw_data": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "lang": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
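
Because the sentences are crowdsourced and multilingual, a first pass over this config often just drops blank entries and tallies what is left per language. The sketch below does that for records shaped like the schema above. The function name and the sample records are hypothetical, and the contents of `raw_data` are left empty because its exact format is not described here.

```python
from collections import Counter
from typing import Dict, Iterable


def sentences_by_language(records: Iterable[Dict[str, str]]) -> Counter:
    """Count non-blank sentences per language code."""
    counts: Counter = Counter()
    for rec in records:
        # Whitespace-only sentences carry no usable text, so skip them.
        if rec["sentence"].strip():
            counts[rec["lang"]] += 1
    return counts


# Hypothetical records; the values are illustrative only.
SAMPLE_SENTENCES = [
    {"sentence": "A dog is an animal.", "raw_data": "", "lang": "en"},
    {"sentence": "Birds can fly.", "raw_data": "", "lang": "en"},
    {"sentence": "   ", "raw_data": "", "lang": "en"},
    {"sentence": "Un chien est un animal.", "raw_data": "", "lang": "fr"},
]

counts = sentences_by_language(SAMPLE_SENTENCES)
print(counts.most_common())
```

Checks for truthfulness or appropriateness, which the authors' warning calls for, would need to be added on top of this and are outside the scope of the sketch.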

omcs_sentences_more

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:conceptnet5/omcs_sentences_more')
  • Description:
This dataset is designed to provide training data
for common sense relationships pulled together from various
sources.

The dataset is multilingual. See language codes and language info
here: https://github.com/commonsense/conceptnet5/wiki/Languages


This dataset provides an interface for the conceptnet5 csv file, and
some (but not all) of the raw text data used to build conceptnet5:
omcsnet_sentences_free.txt, and omcsnet_sentences_more.txt.

One use of this dataset would be to learn to extract the conceptnet
relationship from the omcsnet sentences.

Conceptnet5 has 34,074,917 relationships. Of those, 2,176,099
(about 2M) come with a surface text sentence.

omcsnet_sentences_free has 898,161 lines. omcsnet_sentences_more has
2,001,736 lines.

Original downloads are available here:
https://github.com/commonsense/conceptnet5/wiki/Downloads. For more
information, see: https://github.com/commonsense/conceptnet5/wiki

The omcsnet data comes with the following warning from the authors of
the above site: 

Remember: this data comes from various forms of
crowdsourcing. Sentences in these files are not necessarily true,
useful, or appropriate.
  • License: This work includes data from ConceptNet 5, which was compiled by the Commonsense Computing Initiative. ConceptNet 5 is freely available under the Creative Commons Attribution-ShareAlike license (CC BY SA 3.0) from http://conceptnet.io.

The included data was created by contributors to Commonsense Computing projects, contributors to Wikimedia projects, DBPedia, OpenCyc, Games with a Purpose, Princeton University's WordNet, Francis Bond's Open Multilingual WordNet, and Jim Breen's JMDict.

Various other licenses also apply. See: https://github.com/commonsense/conceptnet5/wiki/Copying-and-sharing-ConceptNet

  • Version: 5.7.0
  • Splits:
Split	Examples
'train'	2001735
  • Features:
{
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "raw_data": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "lang": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
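
The three configs listed on this page share one naming pattern in the load commands, so a small wrapper can select between them. This is a sketch only: `tfds_name` and `load_train` are hypothetical helper names, and `load_train` assumes that `tensorflow_datasets` (with Hugging Face dataset support) is installed and that the data can be downloaded. The import is done lazily so the name helper can be used without TensorFlow.

```python
CONFIGS = ("conceptnet5", "omcs_sentences_free", "omcs_sentences_more")


def tfds_name(config: str) -> str:
    """Build the TFDS name used on this page for one of the three configs."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    return f"huggingface:conceptnet5/{config}"


def load_train(config: str):
    """Load the 'train' split of a config (needs tensorflow_datasets and network access)."""
    import tensorflow_datasets as tfds  # imported lazily on purpose

    return tfds.load(tfds_name(config), split="train")


print(tfds_name("omcs_sentences_free"))
```

Calling `load_train("omcs_sentences_free")` would then fetch the smallest of the three configs, which is a reasonable place to start before loading the 34,074,917-example conceptnet5 config.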