Here are some ideas on how we use our ontology / conceptual model within our methodology.
An ontology-based approach to qualitative data analysis, such as we advocate, calls for expressing the implied conceptualisation of a field in terms of a formal specification. In our case the field is archaeological digital practice, including the aspects of cognition and motivation of actors that shape action in the practice, as well as the role of diverse mediating tools, both conceptual and physical/digital. The specification will capture: (a) the main entities in the field, at the level of granularity that matches our intuitions on how the field is structured, and (b) the properties and relationships of these entities (and the properties of those relationships) that are relevant to our conception of the field. Relationships include ontological ones (mainly specialisation/generalisation, through a class hierarchy, as well as mereological ones through a part-of hierarchy if relevant) and semantic ones related to the specifics of our field (e.g. Activity has_actor Actor, Document has_format Format). This formal representation should allow simple things (evidence presented in an interview, an archaeological activity described or observed) to be expressed simply and more complex things to be represented in a correspondingly more complex manner, and it should allow us to pose useful questions and make useful inferences relevant to the kinds of questions we are asking.
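As a rough illustration of what such a specification amounts to, here is a minimal sketch in Python of a class hierarchy plus typed semantic relationships. Only Activity, Actor, Document, Format, has_actor and has_format come from the text above; the root class Entity, the subclass Person and the helper names are assumptions made purely for the example, not part of our model.

```python
# Hypothetical class hierarchy: child -> parent (None marks the root).
# "Entity" and "Person" are illustrative assumptions.
SUBCLASS_OF = {
    "Entity": None,
    "Actor": "Entity",
    "Person": "Actor",
    "Activity": "Entity",
    "Document": "Entity",
    "Format": "Entity",
}

# Semantic relationships as property -> (domain class, range class).
PROPERTIES = {
    "has_actor": ("Activity", "Actor"),
    "has_format": ("Document", "Format"),
}


def is_a(cls, ancestor):
    """True if cls is ancestor itself or specialises it via the hierarchy."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF[cls]
    return False


def is_valid(subject_cls, prop, object_cls):
    """Check a statement against the domain and range declared for prop."""
    domain, range_ = PROPERTIES[prop]
    return is_a(subject_cls, domain) and is_a(object_cls, range_)
```

With these definitions, `is_valid("Activity", "has_actor", "Person")` holds because Person specialises Actor, whereas `is_valid("Document", "has_actor", "Person")` does not, since Document is outside the domain of has_actor.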
While an ontology can be perfectly well defined (in tools such as Protege) as a class hierarchy, with the semantic relationships buried within the properties in the property sheet of each class, most ontology builders will present the important conceptual structures captured by the ontology in one or more "top level" diagrams. Examples are the "activities as meetings between actors, objects, place and timespans" diagram of CIDOC CRM, or the main structure and the agency, procedure etc. perspective diagrams of Scholarly Ontology.
Using the model
The use of the model within E-CURATORS is twofold:
(a) It is a starting point for defining the QDA code system, i.e. a simple regular hierarchy (taxonomy) of terms (codes, in QDA parlance). Typically, the main entity class names will become top-level codes in the hierarchy, and then subclass names *as well as* types become subcodes. The challenge here is to capture at top level the significant facets of entities in the domain, and also to capture relationships (e.g. with goals and motives) that may be hidden far down in the specialisation hierarchy.
(b) It is the reference model for constructing a schema for the (graph) database that we want to create using Neo4J, in order to export and further process the coded-annotated data from MaxQDA. There are two main considerations here:
Firstly, what kind of expressiveness can we expect from the coded qualitative data? That is, what kinds of relationships between, say, Actors, Activities etc. will be retrievable from the (multiple) codings of segments of data, so that we can export them to an Excel spreadsheet and import them to automatically generate graph data within Neo4J? We can use complex code searches for this, e.g. find the (overlapping, identical, contiguous) segments that are coded both with a specific code (and its subcodes) and with another code (and its subcodes), and surmise that the two are linked through a relationship in the conceptual model. Some experimentation with a coded paragraph of text, examining the resulting xls output, may help to determine the possibilities. One idea is to define a "container" code, e.g. "Context", which we can use to highlight/code a longer continuous segment of text that refers to the same activity, or topic, in a participant's interview. We would then simply create an output xls with the codes from different code facets/hierarchies (e.g. actors (people and collectivities), activities, archaeological entities, tools etc.) that belong to each such context, and recreate a separate graph for each individual context by imputing the relationships between entities postulated by our conceptual model.
Secondly, what kinds of questions/queries do we envisage interrogating the database with? These should be derived from our interest in substantive questions within E-CURATORS: on the structure of the digitally-mediated / archaeology-related activities people engage in; on the identities, motivations and affiliations of people; on the goals stated for specific things they do; on the use of methods, routines, procedural knowledge etc. in accomplishing specific tasks; on the use and role of particular digital technologies, devices, tools, online infrastructures etc.; on the problems and issues encountered by people, and the ways, ideas, approaches and solutions they are considering or employing in their activity; etc. But each dataset (from a different E-CURATORS case study) may present us with further, as yet unanticipated dimensions. What we know already is more or less captured in the texts circulated within our workspace so far, and remains at a general level, hopefully shared across cases. A useful approach would be to scan the ADS interviews and reverse-engineer a set of queries that are implicit in the material (i.e. questions that this evidence attends to), and then subset entities, relationships and properties from the conceptual model into Neo4J property lists/schemas for individual kinds of objects, which will then be used to convert MaxQDA outputs in .xls into graph database structures.
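The "Context" container idea described above could be prototyped along the following lines. This is only a sketch under stated assumptions: the row layout (context id, facet, code) is a hypothetical simplification and not the actual MaxQDA export format, the sample codes are invented, and of the two relations only has_actor comes from the conceptual model as discussed here (uses_tool is a placeholder).

```python
from collections import defaultdict
from itertools import product

# Relationships postulated by the model, keyed by (source facet, target facet).
# "uses_tool" is a hypothetical placeholder, not a property of our model.
MODEL_RELATIONS = {
    ("Activity", "Actor"): "has_actor",
    ("Activity", "Tool"): "uses_tool",
}


def impute_edges(rows):
    """Group codings by context, then link every pair of codes whose facets
    are related in the model. rows: iterable of (context_id, facet, code)."""
    by_context = defaultdict(lambda: defaultdict(set))
    for context_id, facet, code in rows:
        by_context[context_id][facet].add(code)

    edges = set()
    for context_id, facets in by_context.items():
        for (src_facet, dst_facet), relation in MODEL_RELATIONS.items():
            sources = facets.get(src_facet, ())
            targets = facets.get(dst_facet, ())
            for src, dst in product(sources, targets):
                edges.add((context_id, src, relation, dst))
    return sorted(edges)


# Invented sample rows, standing in for a spreadsheet export.
rows = [
    ("ctx1", "Activity", "Recording"),
    ("ctx1", "Actor", "Field team"),
    ("ctx1", "Tool", "Tablet"),
    ("ctx2", "Activity", "Archiving"),
    ("ctx2", "Actor", "Archivist"),
]
edges = impute_edges(rows)
```

Each resulting tuple (context, source, relation, target) would still need to be loaded into Neo4J as nodes and a relationship. Note that this imputes a link between every activity and every actor or tool in the same context, so a context containing several activities will over-generate relationships; this is one of the things the experimentation with a coded paragraph should tell us.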
Specific comments and questions from the April 15, 2019 workshop
- Perhaps we don't need SI Observable Entity - in general, much of that expansion of CIDOC CRM is really relevant to the processing of scientific instrumentation data, which is rather irrelevant in our case (we may encounter cases where people talk about capturing digital data, but we will not process such data in our study).
- Do we need Creation and Assignment as subclasses of Activity? Activities can implicate objects in different relationships, and from our point of view it is important to differentiate between the relationship to a (direct) object and that to a means / tool that contributes to the activity, and between the "input" and "output" of an activity. So the Creation vs. Assignment subclasses differentiate between activities that point to an entire object (and one could also think of Destruction as a relevant complement to Creation) and those that modify an object by changing one or more of its properties (where, apart from knowledge-related changes, we should also consider physical modification of the Object as long as it retains its identity). The question is: do we need subclasses here, or would it be adequate to just note Activity types that would include Creation, Modification, Destruction etc.?
- We surely need a specialisation of Activity for Assignments (Assertions), to use the CRMinf notion. What people assert in general can be traced through kinds of Conceptual objects (e.g. Propositional Objects, Statements); the question of subclassing Activities (rather than just using Activity types) boils down to whether we need the additional properties to reconstitute what is assigned, the property it concerns etc.
- The subclasses of Assignment (Argumentation, with Belief adoption and Inference making below it, and Evaluation, Simulation/prediction and Hypothesis building one level further down) could be interesting and useful to convert into QDA codes. The same things are represented in SO as Statements, which are kinds of Conceptual objects; do we need to account for the *activity* of, say, evaluating, so that we recover in the Neo4J database the relationship to the thing that is evaluated, who evaluated it etc.? Or is it enough to just note evaluating as an activity type?
Note: assuming that we seek to "code" kinds of activities such as evaluating, stating a hypothesis etc., we should consider how we could automatically extract the relationships of such an activity from QDA codings. One option would be to replicate in the code system (and codings) the kinds of conceptual objects that are the output of each of the Argumentation subclasses, e.g. Evaluative statement, Hypothesis. Then we could create a query to find collocations between Argumentation (and subcodes) and Statement (and subcodes), and reconstitute the relationship in graphs within Neo4J.
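To make the collocation query concrete, here is a minimal sketch that works on a list of coded segments. Everything about the data layout is an assumption for illustration: the Coding record, the use of character offsets, the "Parent/Child" code paths and the sample segments are hypothetical and do not reflect how MaxQDA actually exports codings.

```python
from typing import NamedTuple


class Coding(NamedTuple):
    document: str
    start: int  # offset where the coded segment begins
    end: int    # offset where the coded segment ends (exclusive)
    code: str   # full code path, e.g. "Argumentation/Evaluation"


def under(code, root):
    """True if code is root itself or one of its subcodes."""
    return code == root or code.startswith(root + "/")


def overlaps(a, b):
    """True if two segments in the same document share at least one position."""
    return a.document == b.document and a.start < b.end and b.start < a.end


def collocations(codings, root_a, root_b):
    """Pairs of overlapping segments coded under root_a and root_b."""
    left = [c for c in codings if under(c.code, root_a)]
    right = [c for c in codings if under(c.code, root_b)]
    return [(a.code, b.code, a.document)
            for a in left for b in right if overlaps(a, b)]


# Invented sample codings.
codings = [
    Coding("interview_01", 100, 180, "Argumentation/Evaluation"),
    Coding("interview_01", 150, 220, "Statement/Evaluative statement"),
    Coding("interview_01", 400, 450, "Statement/Hypothesis"),
    Coding("interview_02", 100, 180, "Statement/Hypothesis"),
]
pairs = collocations(codings, "Argumentation", "Statement")
```

Here only the first two segments are paired, since they overlap within the same interview. The sketch treats strict overlap only; segments that are merely contiguous would need a looser test (for example, allowing a small gap between one segment's end and the next one's start).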
Note: Goals are kinds of statements, and methods are kinds of conceptual objects; what they are *for the activity* is expressed as properties of the relationships.
More specific decisions/considerations will emerge as we talk about the provisional code system we are building for Qualitative Data Analysis, have some experience using it, and consolidate decisions on it.