Cohort Search API

Constructing a request payload for the Cohort Search API


Contents of the API Request Payload

In general, the API request payload is a JSON object containing an array of individual filter definitions. A filter definition can contain one or more filters, each specifying one concept to filter on. Filters can be combined using both AND and OR logic, so a single query can include multiple criteria for a patient.

Each individual filter configuration has three components:

  • column - the element to be filtered on
  • operator - the comparison operator for the statement
  • value - the value to search for in that element

Acceptable values for column and operator are listed below.

Within a single request, multiple filters can be applied to entities at the same time. Filters within the same config block are applied at the row level. The operation used to combine these filters is specified in the operator field that sits outside of the config array.

For example, the following config segment looks for patients whose record contains a diagnosis of "COPD" that also has a clinical status of "active":

{
  "config": [
    {
      "column": "concept_name",
      "operator": "=",
      "value": "COPD"
    },
    {
      "column": "clinical_status",
      "operator": "=",
      "value": "active"
    }
  ],
  "operator": "and",
  "type": "include"
}
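As a minimal sketch, the same block can be assembled in Python before being serialized. The helper names make_condition and make_filter_block are illustrative assumptions, not part of the API; only the field names come from the example above.

```python
import json

def make_condition(column, operator, value):
    """Return one filter: the column, the comparison operator, and the value."""
    return {"column": column, "operator": operator, "value": value}

def make_filter_block(conditions, operator="and", filter_type="include"):
    """Wrap filters in a config block; `operator` combines them at the row level."""
    return {"config": list(conditions), "operator": operator, "type": filter_type}

# Hypothetical helpers used to rebuild the COPD / active example shown above.
copd_active = make_filter_block([
    make_condition("concept_name", "=", "COPD"),
    make_condition("clinical_status", "=", "active"),
])

print(json.dumps(copd_active, indent=2))
```

Building the block from small functions keeps the three-part filter structure (column, operator, value) in one place, which helps when many blocks are generated programmatically.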

Multiple filters can also be applied across rows by providing multiple config blocks within a single filters array. By default, these blocks are combined with AND. The JSON below shows a filter definition for a patient who has both "COPD" and "Asthma" identified as concepts, where both have a clinical_status of "active".

{
  "filters": [
    {
      "config": [
        {
          "column": "concept_name",
          "operator": "=",
          "value": "COPD"
        },
        {
          "column": "clinical_status",
          "operator": "=",
          "value": "active"
        }
      ],
      "operator": "and",
      "type": "include"
    },
    {
      "config": [
        {
          "column": "concept_name",
          "operator": "=",
          "value": "Asthma"
        },
        {
          "column": "clinical_status",
          "operator": "=",
          "value": "active"
        }
      ],
      "operator": "and",
      "type": "include"
    }
  ]
}

If an OR operation is required across rows, users can make use of the "parent filter" capability within the API. To do this, first define a parent filter by assigning it a UUID and an empty config array (in Python, uuid.uuid4() can generate one). Then, define two or more additional filters within the filters array, each with parent_filter_uuid set to the UUID you chose for the parent filter. Finally, set the operator field in the parent filter definition to "or". The child filters will then be combined with the OR operator. The example below shows a query that looks for patients with either "COPD" or "Asthma" listed as an active condition.

{
  "filters": [
    {
      "uuid": "1dcf1e11-dd5d-4753-b3bd-a1c4a0b73585",
      "config": [],
      "operator": "or",
      "type": "include"
    },
    {
      "uuid": "3bbe5b9b-496d-44de-a38e-a77cec60b937",
      "parent_filter_uuid": "1dcf1e11-dd5d-4753-b3bd-a1c4a0b73585",
      "config": [
        {
          "column": "concept_name",
          "operator": "=",
          "value": "COPD"
        },
        {
          "column": "clinical_status",
          "operator": "=",
          "value": "active"
        }
      ],
      "operator": "and",
      "type": "include"
    },
    {
      "uuid": "fe04ff8f-e018-41dc-997b-d711d7c595c7",
      "parent_filter_uuid": "1dcf1e11-dd5d-4753-b3bd-a1c4a0b73585",
      "config": [
        {
          "column": "concept_name",
          "operator": "=",
          "value": "Asthma"
        },
        {
          "column": "clinical_status",
          "operator": "=",
          "value": "active"
        }
      ],
      "operator": "and",
      "type": "include"
    }
  ]
}
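The parent-filter steps above can be sketched in Python as follows. The function or_group is a hypothetical helper, not an API call; it only generates UUIDs and wires each child block to the parent through parent_filter_uuid, as in the JSON example.

```python
import uuid

def or_group(child_blocks, filter_type="include"):
    """Return a parent filter followed by children that reference its UUID.

    The parent has an empty config array and operator "or"; each child gets
    its own UUID plus parent_filter_uuid pointing at the parent.
    """
    parent_uuid = str(uuid.uuid4())
    parent = {"uuid": parent_uuid, "config": [], "operator": "or", "type": filter_type}
    children = []
    for block in child_blocks:
        child = dict(block)  # shallow copy so the caller's block is not mutated
        child["uuid"] = str(uuid.uuid4())
        child["parent_filter_uuid"] = parent_uuid
        children.append(child)
    return [parent] + children

def active_condition(name):
    """Block matching a concept name with an active clinical status."""
    return {
        "config": [
            {"column": "concept_name", "operator": "=", "value": name},
            {"column": "clinical_status", "operator": "=", "value": "active"},
        ],
        "operator": "and",
        "type": "include",
    }

payload = {"filters": or_group([active_condition("COPD"), active_condition("Asthma")])}
```

Generating the UUIDs in code avoids copy-paste mismatches between the parent's uuid and the children's parent_filter_uuid.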

Defining Filters

Within the API, filters may be specified using any of the following:

  • concept_name - the UMLS display name of a concept
  • timestamp [timestamp] - the time at which the data was entered into the source system
  • cui - the UMLS CUI for a concept
  • semantic_group - the semantic group of a concept defined in the UMLS
  • semantic_type - the semantic type of a concept defined in the UMLS
  • clinical_status - the status of a concept from a clinical perspective
  • verification_status - status indicating whether the concept was verified
  • parent_concept_cui - the CUI of a concept's parent in the UMLS
  • parent_concept_name - the name of a concept's parent in the UMLS
  • parent_semantic_group - the semantic group of a concept's parent in the UMLS
  • parent_semantic_types [array] - the semantic type(s) of a concept's parent in the UMLS
  • second_parent_concept_cui - the CUI of the parent two levels above a concept in the UMLS
  • second_parent_concept_name - the name of the parent two levels above a concept in the UMLS
  • second_parent_semantic_group - the semantic group of the parent two levels above a concept in the UMLS
  • second_parent_semantic_types [array] - the semantic type(s) of the parent two levels above a concept in the UMLS

The list of valid operators for any single filter definition is as follows:

  • "=" - equal to
  • "is_a" - equal to or child of (applies specifically to CUI column search)
  • "<" - less than
  • ">" - greater than
  • "<=" - less than or equal to
  • ">=" - greater than or equal to
  • "!=" - not equal to
  • "in" - is included in (list)
  • "not in" - not included in (list)
  • "ilike" - is like pattern (pattern can be an exact string, or can include % and _ wildcards to specify character matches)

Note that all filter values should be provided as strings unless otherwise specified in the list above. Timestamps should follow the format "YYYY-MM-DD" and should be wrapped in double quotes in the payload.
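A client-side check against the two lists above can catch misspelled columns or operators before a request is sent. The sketch below is an assumption about how one might validate locally; it does not describe the API's own validation, and the rule that "in" and "not in" take a list is inferred from the "(list)" annotation in the operator list.

```python
from datetime import date

VALID_COLUMNS = {
    "concept_name", "timestamp", "cui", "semantic_group", "semantic_type",
    "clinical_status", "verification_status",
    "parent_concept_cui", "parent_concept_name",
    "parent_semantic_group", "parent_semantic_types",
    "second_parent_concept_cui", "second_parent_concept_name",
    "second_parent_semantic_group", "second_parent_semantic_types",
}
VALID_OPERATORS = {"=", "is_a", "<", ">", "<=", ">=", "!=", "in", "not in", "ilike"}

def check_condition(condition):
    """Return a list of problems found in one filter (empty list means OK)."""
    problems = []
    if condition.get("column") not in VALID_COLUMNS:
        problems.append("unknown column: %r" % condition.get("column"))
    operator = condition.get("operator")
    if operator not in VALID_OPERATORS:
        problems.append("unknown operator: %r" % operator)
    value = condition.get("value")
    if operator in ("in", "not in"):
        if not isinstance(value, list):  # assumed: list operators take a JSON array
            problems.append("value should be a list for %r" % operator)
    elif not isinstance(value, str):
        problems.append("value should be a string")
    return problems

# date.isoformat() yields the "YYYY-MM-DD" form described above.
since = {"column": "timestamp", "operator": ">=", "value": date(2023, 1, 1).isoformat()}
```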

Sample Payload

The sample payload below requests a cohort of patients with both COPD and Asthma listed as active conditions.

{
  "page": 0,
  "limit": 10,
  "concepts_page": 0,
  "concepts_limit": 10,
  "filters": [
    {
      "config": [
        {
          "column": "concept_name",
          "operator": "=",
          "value": "COPD"
        },
        {
          "column": "clinical_status",
          "operator": "=",
          "value": "active"
        }
      ],
      "operator": "and",
      "type": "include"
    },
    {
      "config": [
        {
          "column": "concept_name",
          "operator": "=",
          "value": "Asthma"
        },
        {
          "column": "clinical_status",
          "operator": "=",
          "value": "active"
        }
      ],
      "operator": "and",
      "type": "include"
    }
  ]
}
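For illustration, the sample payload can be produced programmatically and serialized to a JSON string. This sketch deliberately stops short of sending the request, since the endpoint URL and authentication are not covered in this section. The names build_payload and active_condition_block are hypothetical, and the paging fields are simply copied from the sample with its values as defaults.

```python
import json

def active_condition_block(concept_name):
    """Config block: concept_name equals the given name AND status is active."""
    return {
        "config": [
            {"column": "concept_name", "operator": "=", "value": concept_name},
            {"column": "clinical_status", "operator": "=", "value": "active"},
        ],
        "operator": "and",
        "type": "include",
    }

def build_payload(filter_blocks, page=0, limit=10, concepts_page=0, concepts_limit=10):
    """Assemble the top-level request body in the same shape as the sample."""
    return {
        "page": page,
        "limit": limit,
        "concepts_page": concepts_page,
        "concepts_limit": concepts_limit,
        "filters": list(filter_blocks),
    }

payload = build_payload([
    active_condition_block("COPD"),
    active_condition_block("Asthma"),
])
body = json.dumps(payload)  # string ready to be used as a request body
```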

Clinical text often varies in how terms are documented and in how specifically a physician chooses to write a given term. For example, a physician may simply refer to Type 2 Diabetes Mellitus as "T2DM" in a clinical note. Alternatively, when referencing a specific manifestation of uncontrolled diabetes, they may write something as specific as "Neuropathic ulcer of midfoot due to type 2 diabetes mellitus", which in the UMLS is a child term of "Diabetes Mellitus Type 2". When searching for all patients with Type 2 Diabetes Mellitus, the API can surface every level of specificity in the response, giving a more complete result. This capability is referred to as Hierarchical Search.

To include Hierarchical Search in a query, use the "is_a" operator when specifying the CUI to search for. A sample query is shown below. Note that C0011860 is the UMLS CUI for Type 2 Diabetes Mellitus.

{
  "page": 0,
  "limit": 10,
  "concepts_page": 0,
  "concepts_limit": 10,
  "filters": [
    {
      "type": "include",
      "operator": "and",
      "config": [
        {
          "column": "cui",
          "operator": "is_a",
          "value": "C0011860"
        },
        },
        {
          "column": "clinical_status",
          "operator": "=",
          "value": "active"
        }
      ]
    }
  ]
}
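A small helper can build the hierarchical block and reject values that are clearly not CUIs, such as an abbreviation typed by mistake. This is a sketch under stated assumptions: hierarchical_block is a hypothetical name, and the format check relies on UMLS CUIs being the letter C followed by seven digits.

```python
import re

# UMLS CUIs take the form "C" followed by seven digits, e.g. C0011860.
CUI_PATTERN = re.compile(r"C\d{7}")

def hierarchical_block(cui, clinical_status=None):
    """Build a config block matching the given CUI or any of its child concepts."""
    if not CUI_PATTERN.fullmatch(cui):
        raise ValueError("not a UMLS CUI: %r" % cui)
    config = [{"column": "cui", "operator": "is_a", "value": cui}]
    if clinical_status is not None:
        config.append(
            {"column": "clinical_status", "operator": "=", "value": clinical_status}
        )
    return {"type": "include", "operator": "and", "config": config}

t2dm_block = hierarchical_block("C0011860", clinical_status="active")
```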

This type of search can be combined with any of the other capabilities described earlier in this documentation, giving users full flexibility in how they search.

Note: Verto has worked hard to constrain the UMLS hierarchy to terms that are useful in a clinical context. However, there are cases where the set of child concepts for a given CUI is broader than a use case demands. In these cases, it is recommended that the user request all child terms from the API and then filter the returned result for the specific terms of interest, based on concept name, CUI, or a combination of the two.
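The post-filtering step recommended above might look like the following sketch. The response format is not described in this section, so the record shape used here (dictionaries with "cui" and "concept_name" keys) is purely an assumption, as is the helper name keep_concepts. Every CUI in the sample data other than C0011860 is a made-up placeholder, not a real UMLS lookup.

```python
def keep_concepts(concepts, wanted_cuis=(), name_contains=()):
    """Keep records whose CUI is wanted or whose name contains a wanted phrase."""
    wanted = set(wanted_cuis)
    needles = [phrase.lower() for phrase in name_contains]
    kept = []
    for record in concepts:
        name = record.get("concept_name", "").lower()
        if record.get("cui") in wanted or any(n in name for n in needles):
            kept.append(record)
    return kept

# Hypothetical records standing in for concepts returned by an "is_a" query.
returned = [
    {"cui": "C0011860", "concept_name": "Diabetes Mellitus Type 2"},
    {"cui": "C9999991", "concept_name": "Neuropathic ulcer of midfoot due to type 2 diabetes mellitus"},
    {"cui": "C9999992", "concept_name": "Some broader child term not needed here"},
]

of_interest = keep_concepts(
    returned,
    wanted_cuis=["C0011860"],
    name_contains=["neuropathic ulcer"],
)
```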
