How-to guide

To get started, you will need concepCy installed, along with a pre-trained spaCy model.

To do so, run the following:

pip3 install concepcy
python3 -m spacy download en_core_web_sm
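If you want to confirm from Python that both installs succeeded, a small standard-library check is enough. It relies only on the fact that spacy download installs the model as an importable package named en_core_web_sm; the helper name missing_packages is our own and not part of concepCy or spaCy.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found on this interpreter."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["spacy", "concepcy", "en_core_web_sm"])
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All set!")
```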

Now that we are all set, let’s get down to the nitty-gritty!

Basic usage

In this first example, we use spaCy's en_core_web_sm model and the default configuration of the concepCy pipeline component.

[1]:

import spacy
import concepcy

[2]:

nlp = spacy.load("en_core_web_sm")

# let us add the concepCy pipe to the current pipeline
nlp.add_pipe("concepcy");

Let us check that the ConcepCyComponent has been successfully added to our pipeline:

[3]:

print(nlp.pipe_names)

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner', 'concepcy']

Success! We are now ready to enrich our documents with semantic information!

[4]:

doc = nlp("Joe Manchin announces surprise deal on climate, health care and tax package")

Let us explore RelatedTo, a fairly general relation, across the whole document. doc._.relatedto maps each enriched word to its list of relations.

[5]:

for word, relations in doc._.relatedto.items():
    print("\n----------------------------")
    print(f"Word: '{word}'")
    for rel in relations:
        print(rel["text"])


----------------------------
Word: 'surprise'
[[surprise]] is related to [[shock]]
[[surprise]] is related to [[party]]
[[surprise]] is related to [[unexpected]]
[[shock]] is related to [[surprise]]
[[surprise]] is related to [[birthday]]
[[surprise]] is related to [[birthday party]]
[[party]] is related to [[surprise]]
[[surprise]] is related to [[emotion]]

----------------------------
Word: 'deal'
[[deal]] is related to [[cards]]
[[deal]] is related to [[agreement]]
[[deal]] is related to [[bargain]]
[[offer]] is related to [[deal]]
[[deal]] is related to [[transaction]]

----------------------------
Word: 'climate'
[[weather]] is related to [[climate]]

----------------------------
Word: 'health'
[[health]] is related to [[being]]
[[health]] is related to [[well]]
None

----------------------------
Word: 'care'
[[care]] is related to [[love]]
[[care]] is related to [[loving]]
[[care]] is related to [[concern]]
[[care]] is related to [[after]]
[[care]] is related to [[tend]]
[[care]] is related to [[tender]]
None

----------------------------
Word: 'tax'
[[tax]] is related to [[government]]
[[tax]] is related to [[money]]
[[tax]] is related to [[payment]]
[[tax]] is related to [[income]]
[[tax]] is related to [[fee]]
[[tax]] is related to [[irs]]
[[tax]] is related to [[april]]
[[tax]] is related to [[revenue]]
[[tax]] is related to [[levy]]
[[tax]] is related to [[charge]]
[[tax]] is related to [[government money]]
[[tax]] is related to [[pay]]
[[tax]] is related to [[government payment]]
[[tax]] is related to [[government fee]]

----------------------------
Word: 'package'
[[parcel]] is related to [[package]]

We retrieve the most relevant relations for each word in our text. The None lines correspond to relations that come without a text description.
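Each relation exposes a "text" field in which the concepts are wrapped in double square brackets. A short sketch can therefore turn a word's relations into a deduplicated list of neighbouring concepts, skipping relations whose text is None. The helper related_concepts is hypothetical (not part of concepCy), it assumes each relation is a dict with a "text" key, as in the loop above, and the sample data is copied from the output above.

```python
import re

# Concepts appear between double square brackets, e.g. "[[tax]] is related to [[levy]]".
CONCEPT = re.compile(r"\[\[(.+?)\]\]")


def related_concepts(word, relations):
    """Return the concepts linked to `word`, in order of first appearance."""
    seen = []
    for rel in relations:
        text = rel["text"]
        if text is None:  # some relations come without a text description
            continue
        for concept in CONCEPT.findall(text):
            if concept != word and concept not in seen:
                seen.append(concept)
    return seen


surprise = [
    {"text": "[[surprise]] is related to [[shock]]"},
    {"text": "[[surprise]] is related to [[party]]"},
    {"text": "[[surprise]] is related to [[unexpected]]"},
    {"text": "[[shock]] is related to [[surprise]]"},
    {"text": "[[surprise]] is related to [[birthday]]"},
    {"text": "[[surprise]] is related to [[birthday party]]"},
    {"text": "[[party]] is related to [[surprise]]"},
    {"text": "[[surprise]] is related to [[emotion]]"},
]
print(related_concepts("surprise", surprise))
```

With a real document, you would call it as related_concepts(str(word), relations) inside the loop over doc._.relatedto.items(); str() makes the comparison work whether the keys are strings or tokens.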

You may notice that some words are missing. Some words might not be related to any other node in the ConceptNet knowledge base. In addition, to reduce noise, stop words (such as "on" and "and"), punctuation and named entities (such as "Joe Manchin") are not enriched with semantic information.
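The filtering rule described above can be sketched with the token attributes spaCy provides for it (is_stop, is_punct and ent_type_). To keep the example self-contained, the Tok class below is a hypothetical stand-in for spacy.tokens.Token whose flags are set by hand for our headline. It reconstructs the rule as stated in this guide, not concepCy's actual code.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    """Hypothetical stand-in for spacy.tokens.Token, with only the attributes we need."""
    text: str
    is_stop: bool = False
    is_punct: bool = False
    ent_type_: str = ""  # empty string when the token is not part of a named entity


def enrichable(tokens):
    """Keep tokens that are neither stop words, punctuation nor named entities."""
    return [t.text for t in tokens if not (t.is_stop or t.is_punct or t.ent_type_)]


headline = [
    Tok("Joe", ent_type_="PERSON"), Tok("Manchin", ent_type_="PERSON"),
    Tok("announces"), Tok("surprise"), Tok("deal"), Tok("on", is_stop=True),
    Tok("climate"), Tok(",", is_punct=True), Tok("health"), Tok("care"),
    Tok("and", is_stop=True), Tok("tax"), Tok("package"),
]
print(enrichable(headline))
```

Under this rule, "announces" would be the only remaining candidate that received no RelatedTo relation in the output above.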

Custom configuration

The ConcepcyComponent lets you select only the relations you are interested in (see available relations here) and filter out edges that are not trustworthy.

Let's assume that we are interested in retrieving causal relations (Causes) and that we only consider edges trustworthy when their weight clears a threshold of 1 (filter_edge_weight).

[6]:

import spacy
import concepcy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(
    "concepcy",
    config={
        "relations_of_interest": ["Causes"],
        "filter_missing_text": True,
        "filter_edge_weight": 1,
    }
);
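To make the three options more concrete, here is a standard-library sketch of the selection they describe, applied to plain dictionaries. The edge fields (relation, weight, text), the helper select_edges and the inclusive threshold (weight >= min_weight) are assumptions for illustration, not concepCy's API; the edges themselves are toy data, not real ConceptNet results.

```python
def select_edges(edges, relations_of_interest, min_weight, filter_missing_text):
    """Keep edges of the requested relations that are heavy enough and, optionally, have text."""
    kept = []
    for edge in edges:
        if edge["relation"] not in relations_of_interest:
            continue
        if edge["weight"] < min_weight:  # assumed inclusive threshold
            continue
        if filter_missing_text and edge["text"] is None:
            continue
        kept.append(edge)
    return kept


# Toy edges for illustration only; they are not real ConceptNet data.
edges = [
    {"relation": "Causes", "weight": 2.0, "text": "[[a]] causes [[b]]"},
    {"relation": "Causes", "weight": 0.5, "text": "[[c]] causes [[d]]"},
    {"relation": "Causes", "weight": 1.0, "text": None},
    {"relation": "RelatedTo", "weight": 3.0, "text": "[[e]] is related to [[f]]"},
]
kept = select_edges(edges, ["Causes"], min_weight=1, filter_missing_text=True)
print([e["text"] for e in kept])
```

Turning filter_missing_text off in this sketch would also keep the third edge, whose weight sits exactly on the threshold.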

Let us reuse the same document, this time accessing the semantic information at the word level.

[7]:

doc = nlp("Joe Manchin announces surprise deal on climate, health care and tax package")

[8]:

for word in doc:
    print("\n----------------------------")
    print(f"Word: '{word}'")
    for rel in word._.causes:
        print(rel["text"])


----------------------------
Word: 'Joe'

----------------------------
Word: 'Manchin'

----------------------------
Word: 'announces'

----------------------------
Word: 'surprise'
The effect of [[opening a gift]] is [[surprise]].
Sometimes [[seeing something new]] causes [[is surprise]]

----------------------------
Word: 'deal'

----------------------------
Word: 'on'

----------------------------
Word: 'climate'

----------------------------
Word: ','

----------------------------
Word: 'health'
Something that might happen as a consequence of [[eating vegetables]] is [[health]]
The effect of [[cleaning]] is [[health]]

----------------------------
Word: 'care'

----------------------------
Word: 'and'

----------------------------
Word: 'tax'

----------------------------
Word: 'package'

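In the three sentence templates returned here, the first bracketed concept is the cause and the second one is the effect. Assuming that ordering holds (it does for these examples, but we have not checked every ConceptNet template), the texts can be turned into (cause, effect) pairs. cause_effect_pairs is a hypothetical helper, and the sample texts are copied from the output above.

```python
import re

CONCEPT = re.compile(r"\[\[(.+?)\]\]")


def cause_effect_pairs(texts):
    """Turn Causes sentences into (cause, effect) tuples, assuming the cause comes first."""
    pairs = []
    for text in texts:
        if text is None:
            continue
        concepts = CONCEPT.findall(text)
        if len(concepts) == 2:  # skip anything that doesn't match the expected shape
            pairs.append((concepts[0], concepts[1]))
    return pairs


texts = [
    "The effect of [[opening a gift]] is [[surprise]].",
    "Sometimes [[seeing something new]] causes [[is surprise]]",
    "Something that might happen as a consequence of [[eating vegetables]] is [[health]]",
    "The effect of [[cleaning]] is [[health]]",
]
for cause, effect in cause_effect_pairs(texts):
    print(f"{cause} -> {effect}")
```

With a real document, the input list could be built as [rel["text"] for token in doc for rel in token._.causes].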
And that’s a wrap!

If you have ideas on how to improve the user experience, or other features that would be nice to have, feel free to open an issue.