# ODMDB Natural Language Query PoC

This is a **Proof of Concept (PoC)** that demonstrates the conversion of natural language queries into ODMDB search queries using OpenAI's structured output API.

## Current Status

✅ **Complete Multi-Schema Implementation**: Supports **all ODMDB object types** including seekers, jobads, recruiters, persons, and sirets. The system intelligently detects the target object from natural language queries and generates appropriate ODMDB DSL queries.

## Features

- **Multi-Object Natural Language Processing**: Intelligently detects target object (seekers, jobads, recruiters, persons, sirets) from natural language queries
- **Real ODMDB Schema Integration**: Dynamically loads actual schema files for all object types with verified accuracy
- **Comprehensive Field Mapping**: Uses real schema definitions with proper access rights for recruiter-readable fields
- **Index-Aware Query Generation**: Leverages actual ODMDB indexes for optimal query performance
- **Schema Mapping Manager**: Centralized system reading real schema files and generating comprehensive field synonyms
- **Multi-Object Query Support**: Handles queries across all ODMDB object types with object-specific optimizations
- **OpenAI Structured Output**: Dynamic JSON schema generation for any target object type
- **Real Data Validation**: Verified against actual ODMDB schema properties and index registers
- **Prepared Query Demos**: Ready-to-use example queries for all supported object types

## Prerequisites

- Node.js (v16 or higher)
- OpenAI API key

## Installation

1. Make sure you have the complete ODMDB data structure available:

   ```
   ../smatchitObjectOdmdb/
   ├── schema/
   │   ├── seekers.json      # Seeker schema (62 properties, 27 readable fields)
   │   ├── jobads.json       # Job advertisement schema
   │   ├── recruiters.json   # Recruiter schema
   │   ├── persons.json      # Person schema
   │   ├── sirets.json       # Company/Siret schema
   │   └── *.json            # Additional schema files
   └── objects/
       ├── seekers/
       │   ├── idx/          # Index files (lst_alias, seekstatus_alias, etc.)
       │   └── itm/          # Individual seeker JSON files
       ├── jobads/
       │   ├── idx/          # Job ad indexes
       │   └── itm/          # Job ad data files
       ├── recruiters/
       │   ├── idx/          # Recruiter indexes
       │   └── itm/          # Recruiter data files
       ├── persons/
       │   └── itm/          # Person data files
       └── sirets/
           └── itm/          # Company data files
   ```

2. Install dependencies:

   ```bash
   npm install
   ```

3. Set your OpenAI API key:

   ```bash
   export OPENAI_API_KEY=sk-your-api-key-here
   ```

## Usage

### Running the PoC

**Interactive Demo (Recommended):**

```bash
node demo.js
```

This runs the comprehensive demo with prepared queries for all object types and shows real-time query generation.

**Main PoC (Query Generation Only):**

```bash
npm start
```

**Main PoC with Query Execution:**

```bash
EXECUTE_QUERY=true npm start
```

This will process the hardcoded natural language query and output the generated ODMDB query in JSON format. When `EXECUTE_QUERY=true`, it will also execute the query against the ODMDB server.

### Changing the Query

To test different natural language queries, edit the `NL_QUERY` constant in `poc.js`:

```javascript
// Line 16 in poc.js
const NL_QUERY = "your natural language query here";
```

The system will automatically detect which object type you're asking about and generate the appropriate query.

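
To make the detection step more concrete, here is a minimal sketch of keyword-based object detection. It is an illustration only: the `OBJECT_KEYWORDS` table and the `detectObject` function are hypothetical names that do not come from `poc.js`, and the actual PoC may well leave detection to the OpenAI model rather than to a lookup like this. The keywords are taken from the trigger words listed under "Supported Query Types".

```javascript
// Hypothetical keyword table (not taken from poc.js). Each ODMDB object type
// is paired with lowercase phrases that hint at it in a natural language query.
const OBJECT_KEYWORDS = {
  seekers: ["job seeker", "seeker", "candidate"],
  jobads: ["job ad", "job posting", "position", "job"],
  recruiters: ["recruiter", "hiring manager"],
  persons: ["person", "people", "profile"],
  sirets: ["company", "companies", "employer", "organization"],
};

// Returns the best-matching object type, or null when no keyword matches.
// Each matching keyword adds its length to the score, so a specific phrase
// such as "job seeker" outweighs the generic "job".
function detectObject(nlQuery) {
  const text = nlQuery.toLowerCase();
  let best = { object: null, score: 0 };
  for (const [object, keywords] of Object.entries(OBJECT_KEYWORDS)) {
    const score = keywords
      .filter((keyword) => text.includes(keyword))
      .reduce((sum, keyword) => sum + keyword.length, 0);
    if (score > best.score) {
      best = { object, score };
    }
  }
  return best.object;
}
```

For example, `detectObject("get job ads in Paris or Lyon")` resolves to `jobads`, while a query with no known keyword yields `null`, which a caller could treat as "ask the model" or "ask the user". Substring matching is deliberately naive here; ties go to the first object type in the table.
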
### Example Queries by Object Type

#### Seekers (Job Seekers)

**Status-based queries:**

- `"show me seekers with status startasap and their email and experience"`
- `"find seekers looking for jobs urgently with their skills and salary expectations"`
- `"get seekers who are not looking with their employment status"`

**Skills & experience:**

- `"find seekers with technical skills and years of experience"`
- `"show me seekers with language abilities and personality profiles"`
- `"get seekers with specific know-how and job radar interests"`

**Location & preferences:**

- `"show me seekers in Paris with remote work preferences"`
- `"find seekers available to work in multiple countries"`
- `"get seekers with specific location and salary requirements"`

#### Job Ads

**Job search queries:**

- `"show me recent job postings in technology"`
- `"find job ads with high salary ranges"`
- `"get job advertisements posted this week"`

**Company & location:**

- `"show me jobs at specific companies"`
- `"find remote job opportunities"`
- `"get job ads in Paris or Lyon"`

#### Recruiters

**Recruiter information:**

- `"show me active recruiters and their specializations"`
- `"find recruiters from specific companies"`
- `"get recruiter contact information and experience"`

#### Persons

**General person queries:**

- `"show me person profiles with their roles"`
- `"find persons by their experience or background"`

#### Companies (Sirets)

**Company information:**

- `"show me companies in the technology sector"`
- `"find companies by size or location"`
- `"get company details and contact information"`

### Supported Query Types

**Multi-Object Intelligence:** The system automatically detects which object you're asking about:

- Mentions of "seekers", "candidates", "job seekers" → seekers object
- Mentions of "jobs", "positions", "job ads" → jobads object
- Mentions of "recruiters", "hiring managers" → recruiters object
- Mentions of "persons", "people", "profiles" → persons object
- Mentions of "companies", "employers", "organizations" → sirets object

**Filter Types:**

- **Status filtering**: Object-specific status fields
- **Date filtering**: Creation dates, update dates with date ranges
- **Index optimization**: Uses real ODMDB indexes for efficient queries
- **Field-specific**: Searches within specific properties

## Schema Mapping System

The PoC uses a sophisticated schema mapping system located in `schema-mappings/`:

### Architecture

- **ODMDBMappingManager**: Central manager that loads and caches schema mappings
- **Base Mapping**: Core field synonym generation and mapping logic
- **Object-Specific Mappings**: Individual mapping files for each object type
- **Real Schema Integration**: Direct reading from actual ODMDB schema files

### Verified Schema Coverage

**Seekers Object:**

- 62 total schema properties mapped
- 27 recruiter-readable fields identified
- 3 indexes available (lst_alias, seekstatus_alias, alias)
- 206+ field synonyms generated from real schema definitions

**All Objects:**

- Dynamic schema loading for any ODMDB object type
- Access rights properly extracted from apxaccessrights structure
- Index definitions read from actual idx directories
- Field synonyms generated from real property definitions

### Field Mapping Examples

The system provides comprehensive natural language to field mappings:

**Contact & Identity:**

- `email`, `contact`, `mail` → `email`
- `id`, `username`, `alias` → `alias`
- `bio`, `description`, `summary` → `shortdescription`

**Work Experience & Status:**

- `experience`, `years of experience`, `career length` → `seekworkingyear`
- `job titles`, `positions`, `roles`, `work history` → `seekjobtitleexperience`
- `status`, `availability`, `urgency` → `seekstatus`

**Location & Geography:**

- `location`, `where`, `work location` → `seeklocation`
- `countries`, `work countries` → `countryavailabletowork`

**Skills & Competencies:**

- `skills`, `competencies`, `abilities` → `skills`
- `languages`, `language skills` → `languageskills`
- `knowledge`, `expertise`, `know-how` → `knowhow`

_(Plus hundreds more mappings for all object types)_

## Output Format

The PoC generates ODMDB queries in this format:

```json
{
  "object": "seekers",
  "condition": ["prop.dt_create(>=:2025-10-06)"],
  "fields": ["alias", "email", "seekworkingyear"]
}
```

## ODMDB DSL Support

The PoC understands and generates these ODMDB DSL patterns:

- **Property queries**: `prop.(operator:value)`
- **Index queries**: `idx.(value)`
- **Join queries**: `join(remoteObject:localKey:remoteProp:operator:value)`

## Demo & Testing Tools

**Interactive Demo:**

```bash
node demo.js
```

**Live PoC demonstration** featuring:

- Real query generation from natural language using OpenAI
- Multi-object detection and schema loading
- Prepared queries for all supported object types
- Real-time field mapping and validation
- Current ODMDB data status display

**Demo Features:**

- **Prepared Queries**: 4 example queries per object type (20 total)
- **Schema Validation**: Shows actual field counts and mappings
- **Real-time Generation**: Demonstrates actual OpenAI API integration
- **Multi-Object Support**: Covers seekers, jobads, recruiters, persons, sirets

## Environment Variables

- `OPENAI_API_KEY` - Your OpenAI API key (required)
- `EXECUTE_QUERY` - Set to "true" to execute queries against ODMDB (default: false)
- `EXECUTE_DEMO` - Set to "true" to execute demo queries with real generation
- `ODMDB_BASE_URL` - ODMDB server URL (default: http://localhost:3000)
- `ODMDB_TRIBE` - ODMDB tribe name (default: smatchit)
- `OPENAI_MODEL` - OpenAI model to use (default: gpt-4o)

## System Validation

The mappings have been thoroughly validated to ensure they:

- ✅ **Read actual ODMDB schema files** - Not hardcoded mappings
- ✅ **Access real index registers** - Uses actual idx directory files
- ✅ **Extract proper access rights** - Reads apxaccessrights.recruiters.R structure
- ✅ **Generate comprehensive synonyms** - 200+ field mappings per object
- ✅ **Support all object types** - Dynamic loading for any ODMDB schema

## Technical Architecture

### Core Components

1. **poc.js**: Main PoC engine with multi-object support
2. **demo.js**: Comprehensive demonstration with prepared queries
3. **schema-mappings/**: Real schema integration system
4. **package.json**: Dependencies and execution scripts

### Schema Integration Flow

1. **Schema Loading**: ODMDBMappingManager reads actual schema files
2. **Field Extraction**: Extracts properties and access rights from real schemas
3. **Index Integration**: Reads index definitions from idx directories
4. **Synonym Generation**: Creates comprehensive field mappings
5. **Query Generation**: Uses OpenAI with dynamic schema for target object
6. **Validation**: Ensures generated queries match schema constraints

### Data Flow

```
Natural Language Query
    ↓
Object Detection (seekers/jobads/recruiters/persons/sirets)
    ↓
Schema Loading (real ODMDB schema files)
    ↓
Field Mapping (comprehensive synonym matching)
    ↓
OpenAI Structured Output (dynamic JSON schema)
    ↓
ODMDB DSL Query (validated against real schema)
```

## Limitations

- **Local schema files required**: Needs access to actual ODMDB schema structure
- **OpenAI API dependency**: Requires valid API key and credits
- **Performance considerations**: Schema loading and mapping generation take time
- **Single query per run**: No interactive conversation mode (yet)

## Next Steps

- [ ] Interactive CLI for multiple queries in conversation
- [ ] Enhanced query execution with real ODMDB server integration
- [ ] Query result processing and formatting improvements
- [ ] Advanced multi-object join queries
- [ ] Performance optimizations for schema loading
- [ ] User interface for non-technical users

## Files

**Core Implementation:**

- `poc.js` - Main PoC engine supporting all ODMDB object types
- `demo.js` - Comprehensive demo with real query generation
- `package.json` - Dependencies and scripts

**Schema System:**

- `schema-mappings/` - Complete schema mapping system
  - `odmdb-mapping-manager.js` - Central mapping coordinator
  - `base-mapping.js` - Core mapping logic and synonym generation
  - `seekers-mapping.js`, `jobads-mapping.js`, etc. - Object-specific mappings

**Data Integration:**

- `../smatchitObjectOdmdb/schema/*.json` - Real ODMDB schema files
- `../smatchitObjectOdmdb/objects/*/idx/` - Index definition files
- `../smatchitObjectOdmdb/objects/*/itm/` - Data files for all object types

## Verification

The system has been validated against real ODMDB data:

- **Schema Properties**: All properties correctly read from actual schema files
- **Index Access**: Confirmed access to real index files (lst_alias, seekstatus_alias, etc.)
- **Access Rights**: Proper extraction of recruiter-readable fields
- **Field Mappings**: Comprehensive synonym generation from actual definitions
- **Multi-Object Support**: Verified functionality across all object types

This ensures the PoC works with **actual ODMDB schema properties** and **accesses real index registers** as required for production readiness.
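
The access-rights check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `schema-mappings/`: it assumes `apxaccessrights.recruiters.R` is an array of property names and that the schema lists its properties under a `properties` key. The helper names `readableFields` and `checkQueryFields`, the toy schema, and the `internalnote` property are all hypothetical.

```javascript
// Assumption: schema.apxaccessrights.<role>.R is an array of readable property
// names. Returns an empty array when the structure is missing or has another shape.
function readableFields(schema, role = "recruiters") {
  const rights = schema?.apxaccessrights?.[role]?.R;
  return Array.isArray(rights) ? rights : [];
}

// Checks the "fields" list of a generated query against the schema.
// "unknown" holds fields absent from the schema; "forbidden" holds fields that
// exist but are not readable for the given role.
function checkQueryFields(query, schema, role = "recruiters") {
  const allowed = new Set(readableFields(schema, role));
  const known = new Set(Object.keys(schema.properties ?? {}));
  const unknown = query.fields.filter((field) => !known.has(field));
  const forbidden = query.fields.filter(
    (field) => known.has(field) && !allowed.has(field)
  );
  return {
    ok: unknown.length === 0 && forbidden.length === 0,
    unknown,
    forbidden,
  };
}

// Toy schema for illustration only; "internalnote" is a made-up property.
const toySchema = {
  properties: { alias: {}, email: {}, seekworkingyear: {}, internalnote: {} },
  apxaccessrights: { recruiters: { R: ["alias", "email", "seekworkingyear"] } },
};

const generatedQuery = {
  object: "seekers",
  condition: ["prop.dt_create(>=:2025-10-06)"],
  fields: ["alias", "email", "internalnote", "salary"],
};

console.log(checkQueryFields(generatedQuery, toySchema));
```

With the toy data, the result reports `salary` as unknown and `internalnote` as forbidden, so `ok` is `false`. A check of this kind could run after the OpenAI response and before any query is sent to the ODMDB server.
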