wukong documentation

wukong offers an ORM query engine for Solr and Solr Cloud.

Get Started

Define your document class

from wukong.models import SolrDoc

class YourDocClass(SolrDoc):
    solr_hosts = "localhost:8080"
    collection_name = "my_solr_collection"

Fetch documents

docs = YourDocClass.documents.filter(firstname__eq="james").all()

Update documents

docs[0].firstname = "Jim"
docs[0].index() # single update

docs[1].firstname = "Smith"
docs.index() # batch update

Delete documents

docs[0].delete() # single delete
docs.delete() # batch delete

Create documents

YourDocClass.documents.create(id=1, firstname="james", lastname="bond")

Documentation

Solr Document Class

To connect to your Solr collection, extend the base class SolrDoc. You can specify four attributes in your class.

  • solr_hosts: the host name(s) of your Solr servers.
  • zookeeper_hosts: the host name(s) of the ZooKeeper hosts that monitor Solr. (optional)
  • collection_name: the name of your collection in Solr.
  • request_timeout: the number of seconds after which a request to Solr is dropped. (optional, defaults to 15)

For example, if you have a collection named users, you can do the following.

from wukong.models import SolrDoc

class User(SolrDoc):
    solr_hosts = "localhost:8080"
    zookeeper_hosts = "localhost:2181"
    collection_name = "users"

    def validate_schema_fields(self, fields):
        pass

    def get_data_for_solr(self):
        pass

You can override existing methods, such as validate_schema_fields and get_data_for_solr, to fit your business logic.

  • validate_schema_fields: returns a boolean indicating whether the current document is consistent with the Solr schema.
  • get_data_for_solr: returns the document data, in JSON format, to send to Solr for indexing.
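As a rough illustration of overriding these two hooks, the sketch below uses a small stand-in base class instead of the real wukong SolrDoc, so that it runs without a Solr server. The StubSolrDoc class, the REQUIRED_FIELDS set, and the way fields are stored are assumptions made for this example; only the two method names and their signatures come from the text above.

```python
class StubSolrDoc:
    """Stand-in for wukong's SolrDoc (hypothetical, not the real base class)."""

    def __init__(self, **fields):
        self.fields = dict(fields)

    def validate_schema_fields(self, fields):
        # Default behaviour assumed for the sketch: accept everything.
        return True

    def get_data_for_solr(self):
        # Default behaviour assumed for the sketch: send fields unchanged.
        return dict(self.fields)


class User(StubSolrDoc):
    REQUIRED_FIELDS = {"id", "name"}  # assumed business rule

    def validate_schema_fields(self, fields):
        # Reject documents that are missing a required field.
        return self.REQUIRED_FIELDS.issubset(fields)

    def get_data_for_solr(self):
        # Normalise the name before it is sent for indexing.
        data = super().get_data_for_solr()
        data["name"] = data["name"].strip().title()
        return data


user = User(id=1, name="  james bond ")
is_valid = user.validate_schema_fields(user.fields)
payload = user.get_data_for_solr()
```

With the real library, the same overrides would presumably go on a class that extends SolrDoc directly.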

If you have multiple collections, you can define solr_hosts and zookeeper_hosts in a base class and have each subclass specify only its collection_name.

from wukong.models import SolrDoc

class BaseDoc(SolrDoc):
    solr_hosts = "localhost:8080"
    zookeeper_hosts = "localhost:2181"

class User(BaseDoc):
    collection_name = "users"

class Car(BaseDoc):
    collection_name = "cars"

Document Retrieval

Once you define your document class, you can use it to fetch documents from Solr.

Filtering

# fetch all documents whose name is james
User.documents.filter(name__eq="james").all()

# fetch all documents whose name is not james
User.documents.filter(name__ne="james").all()

# fetch all documents whose name contains james as a substring
User.documents.filter(name__wc="james").all()

# fetch all documents whose name does not contain james as a substring
User.documents.filter(name__nwc="james").all()

# fetch all documents whose age is greater than 30
User.documents.filter(age__g=30).all()

# fetch all documents whose age is less than 30
User.documents.filter(age__l=30).all()

# fetch all documents whose age is greater than or equal to 30
User.documents.filter(age__ge=30).all()

# fetch all documents whose age is less than or equal to 30
User.documents.filter(age__le=30).all()

# fetch all documents whose city is either Ottawa or New York
User.documents.filter(city__in=['Ottawa', 'New York']).all()

# fetch all documents whose city is neither Ottawa nor New York
User.documents.filter(city__nin=['Ottawa', 'New York']).all()

# fetch all documents that have a zip field
User.documents.filter(zip__ex=True).all()

# fetch all documents that do not have a zip field
User.documents.filter(zip__nex=True).all()

# fetch all documents whose age is less than 30 and whose city is Ottawa
User.documents.filter(age__l=30, city__eq="Ottawa").all()

# fetch all documents whose age is less than 30 or whose city is Ottawa
User.documents.filter(OR(age__l=30, city__eq="Ottawa")).all()

# fetch all documents whose age is less than 30 or whose city is Ottawa, and that also have a zip field
User.documents.filter(AND(OR(age__l=30, city__eq="Ottawa"), zip__ex=True)).all()

# fetch all documents whose age is less than 30 and whose city is Ottawa, using chained filters
User.documents.filter(age__l=30).filter(city__eq="Ottawa").all()
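To make the lookup suffixes more concrete, here is a minimal sketch of how each `field__op=value` pair could map onto standard Solr query syntax. It is not wukong's actual implementation: the helper names (quote, lookup_to_solr, filters_to_solr) and the TEMPLATES table are hypothetical, and the query strings the library really generates may differ. The sketch only relies on general Solr syntax: `[a TO b]` for inclusive ranges, `{a TO b}` for exclusive ranges, `field:[* TO *]` for "field exists", and a leading `-` for negation.

```python
def quote(value):
    # Quote strings so multi-word values stay a single term; leave numbers bare.
    return '"{}"'.format(value) if isinstance(value, str) else str(value)


# Hypothetical templates: {f} is the field name, {v} the formatted value.
TEMPLATES = {
    "eq": "{f}:{v}",
    "ne": "-{f}:{v}",
    "g": "{f}:{{{v} TO *}}",   # exclusive lower bound
    "ge": "{f}:[{v} TO *]",    # inclusive lower bound
    "l": "{f}:{{* TO {v}}}",   # exclusive upper bound
    "le": "{f}:[* TO {v}]",    # inclusive upper bound
}


def lookup_to_solr(lookup, value):
    """Translate one `field__op=value` lookup into a Solr query clause."""
    field, op = lookup.rsplit("__", 1)
    if op in ("wc", "nwc"):
        clause = "{}:*{}*".format(field, value)
        return clause if op == "wc" else "-" + clause
    if op in ("in", "nin"):
        clause = "{}:({})".format(field, " OR ".join(quote(v) for v in value))
        return clause if op == "in" else "-" + clause
    if op in ("ex", "nex"):
        clause = "{}:[* TO *]".format(field)
        return clause if op == "ex" else "-" + clause
    return TEMPLATES[op].format(f=field, v=quote(value))


def filters_to_solr(**lookups):
    # Several lookups in one filter() call are combined with AND.
    return " AND ".join(lookup_to_solr(k, v) for k, v in lookups.items())


query = filters_to_solr(age__l=30, city__eq="Ottawa")
```

Under these assumptions, `query` holds a single string that joins the age range clause and the city clause with AND.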

Sorting

# fetch all documents sorted by age in ascending order
User.documents.sort_by('age').all()

# fetch all documents whose name is james, sorted by age in descending order
User.documents.filter(name__eq="james").sort_by('-age').all()
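The leading `-` convention can be pictured as a translation into Solr's `sort` parameter, which takes clauses such as `age desc`. The helper below is a hypothetical sketch of that mapping, not code taken from wukong. Accepting several fields at once is an assumption of the helper and says nothing about what sort_by itself accepts.

```python
def sort_to_solr(*fields):
    """Turn field names like 'age' or '-age' into a Solr sort string."""
    clauses = []
    for field in fields:
        if field.startswith("-"):
            # A leading minus means descending order.
            clauses.append(field[1:] + " desc")
        else:
            clauses.append(field + " asc")
    return ", ".join(clauses)


ascending = sort_to_solr("age")
descending = sort_to_solr("-age")
```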

Grouping

# group all documents by `gender` and fetch the groups
User.documents.group_by('gender').groups()

# group all documents by `gender` and `city` and fetch the groups
User.documents.group_by(['gender', 'city']).groups()

# group all documents by `gender` and `city` and get 3 documents in each group
User.documents.group_by(['gender', 'city'], group_limit=3).groups()

Faceting

# facet all documents by `gender` and fetch the facets
User.documents.facet_by('gender').facets()

# facet all documents by `gender` and `city` and fetch the facets
User.documents.facet_by(['gender', 'city']).facets()

# facet all documents by `gender` and `city` and fetch only the facets that have at least 10 docs
User.documents.facet_by(['gender', 'city'], mincount=10).facets()
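For readers who know Solr's HTTP API, a facet query like the one above plausibly corresponds to the standard `facet`, `facet.field`, and `facet.mincount` request parameters. The sketch below builds such a parameter list with a hypothetical facet_params helper. It illustrates the Solr parameters only and is not a description of what wukong sends internally.

```python
from urllib.parse import urlencode


def facet_params(fields, mincount=None):
    """Build Solr facet parameters for one or more fields (illustrative helper)."""
    if isinstance(fields, str):
        fields = [fields]
    params = [("facet", "true")]
    # Solr accepts the facet.field parameter once per faceted field.
    params += [("facet.field", field) for field in fields]
    if mincount is not None:
        params.append(("facet.mincount", mincount))
    return params


params = facet_params(["gender", "city"], mincount=10)
query_string = urlencode(params)
```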

Return Fields

# fetch only the id and name fields for each document
User.documents.only('id', 'name').all()

Raw Documents

# fetch all documents matching `james bond` as a list of raw JSON rather than a SolrDoc list
User.documents.search('james bond').raw()

Chained Query

# fetch the documents matching `james bond` with age greater than 30, and get 100 documents starting from offset 200
User.documents.search('james bond').filter(age__g=30).offset(200).limit(100).all()
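A chained query of this kind can be thought of as one request to Solr's `/select` endpoint, where the search text goes in `q`, the offset in `start`, and the limit in `rows`. The following sketch assembles such a URL by hand. The build_select_url helper is hypothetical, and placing the age filter in `fq` is an assumption, since the text does not say how wukong combines search and filter clauses.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(host, collection, q, fq=None, start=0, rows=10):
    """Assemble a Solr /select URL from the pieces of a chained query."""
    params = [("q", q)]
    if fq:
        params.append(("fq", fq))
    params += [("start", start), ("rows", rows), ("wt", "json")]
    return "http://{}/solr/{}/select?{}".format(host, collection, urlencode(params))


url = build_select_url(
    "localhost:8080",
    "users",
    q="james bond",
    fq="age:{30 TO *}",  # age greater than 30, exclusive range
    start=200,           # offset(200)
    rows=100,            # limit(100)
)

# Decode the query string again to inspect the individual parameters.
decoded = parse_qs(urlsplit(url).query)
```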

Document Creation

# Create a document in Solr
User.documents.create(id=12345, name="James Bond", city="London")

# Batch create within one request to Solr
docs = [
    User(id=12345, name="James Bond", city="London"),
    User(id=12346, name="Kate", city="New York"),
    ...
]
docs = SolrDocs(docs)
docs.index()

Document Update

doc = User.documents.create(id=12345, name="James Bond", city="London")

# Update a document in Solr
doc.name = "Jim Bond"
doc.city = "Ottawa"
doc.index()

Document Delete

doc = User.documents.create(id=12345, name="James Bond", city="London")

# Delete a document from Solr
doc.delete()

# Batch delete within one request to Solr
docs = [
    User(id=12345, name="James Bond", city="London"),
    User(id=12346, name="Kate", city="New York"),
    ...
]
docs = SolrDocs(docs)
docs.delete()

Complex Query

# You can always use `User.solr.select` to build your custom query
User.solr.select({
    "q": "it is complex",
    ...,
    ...,
    ...
})