# ODMDB Natural Language Query PoC

This is a **Proof of Concept (PoC)** that demonstrates the conversion of natural language queries into ODMDB search queries using OpenAI's structured output API.

## Current Status

⚠️ **Partial Implementation**: Currently only the **seekers** object mapping is implemented. This PoC focuses on demonstrating the natural language to DSL query conversion for seeker-related searches.

## Features

- **Natural Language Processing**: Converts human questions into structured ODMDB queries
- **Real ODMDB Integration**: Works with actual ODMDB data from `../smatchitObjectOdmdb/`
- **Schema-Based Mapping**: Uses actual seekers.json schema for accurate field mapping (62 properties)
- **Local Data Execution**: Processes queries against local seeker files in `objects/seekers/itm/`
- **OpenAI Structured Output**: Ensures reliable JSON query generation
- **Query Validation**: Validates generated queries against real ODMDB schema rules
- **jq Integration**: Powerful result processing, filtering, and CSV export capabilities

## Prerequisites

- Node.js (v16 or higher)
- OpenAI API key

## Installation

1. Make sure you have the ODMDB data structure available:

   ```
   ../smatchitObjectOdmdb/
   ├── schema/
   │   └── seekers.json          # Seeker schema (62 properties)
   └── objects/
       └── seekers/
           └── itm/              # Individual seeker JSON files
   ```

2. Install dependencies:

   ```bash
   npm install
   ```

3. Set your OpenAI API key:

   ```bash
   export OPENAI_API_KEY=sk-your-api-key-here
   ```

## Usage

### Running the PoC

**Query Generation Only (Default):**

```bash
npm start
```

**Query Generation + Execution:**

```bash
EXECUTE_QUERY=true npm start
```

This will process the hardcoded natural language query and output the generated ODMDB query in JSON format. When `EXECUTE_QUERY=true`, it will also execute the query against the ODMDB server.
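
The two run modes differ only in the `EXECUTE_QUERY` environment flag. A minimal sketch of how such a flag could be read in Node.js is shown here; `shouldExecute` is a hypothetical helper that is not taken from `poc.js`, and the strict comparison against the string `"true"` is an assumption based on the documented default of `false`.

```javascript
// Hypothetical helper: poc.js may read the flag differently.
function shouldExecute(env = process.env) {
  // Environment variables are always strings, so compare against "true"
  // instead of relying on truthiness (the string "false" is truthy).
  return env.EXECUTE_QUERY === "true";
}

console.log(shouldExecute() ? "generate + execute" : "generate only");
```

Passing the environment as a parameter keeps the check easy to exercise without touching the real process environment.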

### Changing the Query

To test different natural language queries, edit the `NL_QUERY` constant in `poc.js`:

```javascript
// Line 16 in poc.js
const NL_QUERY = "your natural language query here";
```

### Example Queries

**Status-based queries:**

- `"show me seekers with status startasap and their email and experience"`
- `"find seekers looking for jobs urgently with their skills and salary expectations"`
- `"get seekers who are not looking with their employment status"`

**Date-based queries:**

- `"give me new seekers since last week with email and experience"`
- `"show me seekers from yesterday with their location and availability"`
- `"find recently updated seekers with their job preferences"`

**Comprehensive field queries:**

- `"show me seeker contact info and work experience"`
- `"find seekers with personality types and language skills"`
- `"get seeker salary expectations and preferred working hours"`
- `"show me seeker education and training preferences"`
- `"find seekers with their job applications and saved jobs"`

**Location & preferences:**

- `"show me seekers in Paris with remote work preferences"`
- `"find seekers available to work in multiple countries"`
- `"get seekers with specific location and salary requirements"`

**Skills & competencies:**

- `"find seekers with technical skills and years of experience"`
- `"show me seekers with language abilities and personality profiles"`
- `"get seekers with specific know-how and job radar interests"`

**Job search activity:**

- `"show me seekers who applied to jobs recently"`
- `"find seekers with saved jobs and their preferences"`
- `"get seekers who were invited to apply with their status"`

**Notifications & communication:**

- `"show me seekers with email preferences and notification settings"`
- `"find seekers who receive weekly reports and interview tips"`

**Supported filter types:**

- **Status filtering**: `seekstatus` (startasap, norush, notlooking)
- **Date filtering**: `dt_create`, `dt_update`,
`matchinglastdate` with date ranges
- **Index optimization**: Uses ODMDB indexes (`lst_alias`, `seekstatus_alias`) for efficient queries

### Demo & Testing Tools

**Interactive Demo:**

```bash
node demo.js
```

**Live PoC demonstration** that uses the query generation functionality to show:

- Real query generation from natural language using OpenAI
- ODMDB schema loading and field mapping
- Current ODMDB data status and sample data

**Demo with Query Execution:**

```bash
EXECUTE_DEMO=true node demo.js
```

Runs the demo with actual query execution against real seeker data files.

**jq Playground:**

```bash
node experiment-jq-playground.js
```

A playground for experimenting with jq commands. It is not vital to the PoC but is useful for learning jq syntax. It demonstrates various jq operations, including:

- Basic data formatting and field selection
- CSV conversion from JSON
- Advanced filtering and transformations
- Statistical summaries and aggregations

## Environment Variables

- `OPENAI_API_KEY` - Your OpenAI API key (required)
- `EXECUTE_QUERY` - Set to "true" to execute queries against ODMDB (default: false)
- `ODMDB_BASE_URL` - ODMDB server URL (default: http://localhost:3000)
- `ODMDB_TRIBE` - ODMDB tribe name (default: smatchit)
- `OPENAI_MODEL` - OpenAI model to use (default: gpt-5)

## Output Format

**Query Generation:**

The PoC generates ODMDB queries in this format:

```json
{
  "object": "seekers",
  "condition": ["prop.dt_create(>=:2025-10-06)"],
  "fields": ["alias", "email", "seekworkingyear"]
}
```

## ODMDB DSL Support

The PoC understands and generates these ODMDB DSL patterns:

- **Property queries**: `prop.<field>(operator:value)`
- **Index queries**: `idx.<indexName>(value)`
- **Join queries**: `join(remoteObject:localKey:remoteProp:operator:value)`

## Comprehensive Field Mappings

Supports extensive natural language mapping for **all 62 seeker properties**:

**Contact & Identity:**

- `email`, `contact`, `mail` → `email`
- `id`, `username`, `alias` → `alias`
- `bio`, `description`,
`summary` → `shortdescription`

**Work Experience & Status:**

- `experience`, `years of experience`, `career length` → `seekworkingyear`
- `job titles`, `positions`, `roles`, `work history` → `seekjobtitleexperience`
- `status`, `availability`, `urgency` → `seekstatus`
- `employment`, `work status`, `job status` → `employmentstatus`

**Location & Geography:**

- `location`, `where`, `work location` → `seeklocation`
- `countries`, `work countries` → `countryavailabletowork`
- `current location`, `last location` → `lastlocation`

**Salary & Compensation:**

- `salary`, `pay`, `compensation`, `wage` → `salaryexpectation`
- `currency`, `salary currency` → `salarydevise`
- `salary unit`, `pay period` → `salaryunit`

**Skills & Competencies:**

- `skills`, `competencies`, `abilities` → `skills`
- `languages`, `language skills` → `languageskills`
- `knowledge`, `expertise`, `know-how` → `knowhow`

**Personality & Preferences:**

- `personality`, `MBTI`, `type` → `mbti`
- `likes`, `interests`, `preferences` → `thingsilike`
- `dislikes`, `avoid`, `not interested` → `thingsidislike`

**Job Search Activity:**

- `applied jobs`, `applications` → `jobadapply`
- `saved jobs`, `bookmarked jobs` → `jobadsaved`
- `viewed jobs`, `job views` → `jobadview`
- `invitations`, `invited to apply` → `jobadinvitedtoapply`

**Availability & Schedule:**

- `working hours`, `preferred hours`, `schedule` → `preferedworkinghours`
- `unavailable`, `blocked times` → `notavailabletowork`

**Dates & Activity:**

- `created`, `new`, `recent`, `since` → `dt_create`
- `updated`, `modified`, `last update` → `dt_update`
- `last matching`, `matching date` → `matchinglastdate`

_Plus comprehensive mappings for education, notifications, training, and system fields._

## Schema Context

The PoC can optionally load schema files for context:

- `main.json` - Combined schema definitions
- `lg.json` - Localization/language mappings

## Limitations

- **Seekers only**: Other ODMDB objects (jobads, recruiters, etc.)
are not yet implemented
- **Local execution only**: Works with file-based data, not the live ODMDB server API
- **Hardcoded query**: Single query per run (no interactive mode)
- **Performance limit**: Processes only the first 50 seeker files to keep PoC runs fast
- **Simplified DSL**: Basic condition parsing (date ranges, status filtering)

## Next Steps

- [ ] Add support for other ODMDB objects (jobads, recruiters, etc.)
- [ ] Interactive CLI for multiple queries
- [ ] Integration with actual ODMDB backend
- [ ] Enhanced field mapping and validation
- [ ] Multi-turn conversation support

## Files

**Core Implementation:**

- `poc.js` - Main PoC implementation with full ODMDB integration
- `package.json` - Dependencies and scripts

**Demo & Testing:**

- `demo.js` - **Live PoC demo** that generates and executes queries using real ODMDB data
- `experiment-jq-playground.js` - jq learning playground (optional, not vital to PoC)

**Data & Schema:**

- `main.json` - Optional consolidated schema context (if available)
- `../smatchitObjectOdmdb/schema/seekers.json` - Real seekers schema (62 properties)
- `../smatchitObjectOdmdb/objects/seekers/itm/` - Individual seeker data files
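
## Appendix: Condition Parsing Sketch

To illustrate the simplified condition parsing noted under Limitations, the sketch below parses a property condition of the form shown in the Output Format section (`prop.dt_create(>=:2025-10-06)`) and applies it to in-memory records. The function names, the operator set, and the sample records are assumptions made for this example. They are not taken from `poc.js` or from the ODMDB implementation.

```javascript
// Hypothetical sketch: names, operators, and sample data are assumptions.

// Parse "prop.<field>(<operator>:<value>)" into its three parts.
function parsePropCondition(condition) {
  const match = /^prop\.([A-Za-z_]\w*)\((>=|<=|!=|=|>|<):(.*)\)$/.exec(condition);
  if (!match) throw new Error(`Unsupported condition: ${condition}`);
  const [, field, operator, value] = match;
  return { field, operator, value };
}

// String comparisons only: fine for ISO 8601 dates and status values,
// but not suitable for numeric fields ("10" sorts before "9").
const OPERATORS = {
  ">=": (a, b) => a >= b,
  "<=": (a, b) => a <= b,
  ">": (a, b) => a > b,
  "<": (a, b) => a < b,
  "=": (a, b) => a === b,
  "!=": (a, b) => a !== b,
};

// Keep items that satisfy every condition, then project the requested fields.
function runQuery(items, query) {
  const parsed = query.condition.map(parsePropCondition);
  return items
    .filter((item) =>
      parsed.every(
        ({ field, operator, value }) =>
          item[field] !== undefined &&
          OPERATORS[operator](String(item[field]), value)
      )
    )
    .map((item) => Object.fromEntries(query.fields.map((f) => [f, item[f]])));
}

// Made-up records standing in for files under objects/seekers/itm/.
const seekers = [
  { alias: "alice", email: "alice@example.com", seekworkingyear: 5, dt_create: "2025-10-08T09:00:00Z" },
  { alias: "bob", email: "bob@example.com", seekworkingyear: 2, dt_create: "2025-09-01T12:00:00Z" },
];

const query = {
  object: "seekers",
  condition: ["prop.dt_create(>=:2025-10-06)"],
  fields: ["alias", "email", "seekworkingyear"],
};

console.log(runQuery(seekers, query));
```

With these sample records, only the record created on or after 2025-10-06 passes the filter, and the result contains just the three requested fields. Index and join patterns are deliberately left out of this sketch.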