Files
Poc-dashboard/README.md
Eliyan 663cf45704 feat: Enhance ODMDB query handling with multi-schema support and intelligent routing
- Updated `poc.js` to support queries for multiple object types (seekers, jobads, recruiters, etc.) with intelligent routing based on natural language input.
- Implemented a query validation mechanism to prevent excessive or sensitive requests.
- Introduced a mapping manager for dynamic schema handling and object detection.
- Enhanced the response schema generation to accommodate various object types and their respective fields.
- Added a new script `verify-mapping.js` to verify and display the mapping details for the seekers schema, including available properties, indexes, access rights, and synonyms.
2025-10-15 13:54:24 +02:00

380 lines
13 KiB
Markdown

# ODMDB Natural Language Query PoC
This is a **Proof of Concept (PoC)** that demonstrates the conversion of natural language queries into ODMDB search queries using OpenAI's structured output API.
## Current Status
**Complete Multi-Schema Implementation**: Supports **all ODMDB object types** including seekers, jobads, recruiters, persons, and sirets. The system intelligently detects the target object from natural language queries and generates appropriate ODMDB DSL queries.
## Features
- **Multi-Object Natural Language Processing**: Intelligently detects target object (seekers, jobads, recruiters, persons, sirets) from natural language queries
- **Real ODMDB Schema Integration**: Dynamically loads actual schema files for all object types with verified accuracy
- **Comprehensive Field Mapping**: Uses real schema definitions with proper access rights for recruiter-readable fields
- **Index-Aware Query Generation**: Leverages actual ODMDB indexes for optimal query performance
- **Schema Mapping Manager**: Centralized system reading real schema files and generating comprehensive field synonyms
- **Multi-Object Query Support**: Handles queries across all ODMDB object types with object-specific optimizations
- **OpenAI Structured Output**: Dynamic JSON schema generation for any target object type
- **Real Data Validation**: Verified against actual ODMDB schema properties and index registers
- **Prepared Query Demos**: Ready-to-use example queries for all supported object types
## Prerequisites
- Node.js (v16 or higher)
- OpenAI API key
## Installation
1. Make sure you have the complete ODMDB data structure available:
```
../smatchitObjectOdmdb/
├── schema/
│ ├── seekers.json # Seeker schema (62 properties, 27 readable fields)
│ ├── jobads.json # Job advertisement schema
│ ├── recruiters.json # Recruiter schema
│ ├── persons.json # Person schema
│ ├── sirets.json # Company/Siret schema
│ └── *.json # Additional schema files
└── objects/
├── seekers/
│ ├── idx/ # Index files (lst_alias, seekstatus_alias, etc.)
│ └── itm/ # Individual seeker JSON files
├── jobads/
│ ├── idx/ # Job ad indexes
│ └── itm/ # Job ad data files
├── recruiters/
│ ├── idx/ # Recruiter indexes
│ └── itm/ # Recruiter data files
├── persons/
│ └── itm/ # Person data files
└── sirets/
└── itm/ # Company data files
```
2. Install dependencies:
```bash
npm install
```
3. Set your OpenAI API key:
```bash
export OPENAI_API_KEY=sk-your-api-key-here
```
## Usage
### Running the PoC
**Interactive Demo (Recommended):**
```bash
node demo.js
```
This runs the comprehensive demo with prepared queries for all object types and shows real-time query generation.
**Main PoC (Query Generation Only):**
```bash
npm start
```
**Main PoC with Query Execution:**
```bash
EXECUTE_QUERY=true npm start
```
This will process the hardcoded natural language query and output the generated ODMDB query in JSON format. When `EXECUTE_QUERY=true`, it will also execute the query against the ODMDB server.
### Changing the Query
To test different natural language queries, edit the `NL_QUERY` constant in `poc.js`:
```javascript
// Line 16 in poc.js
const NL_QUERY = "your natural language query here";
```
The system will automatically detect which object type you're asking about and generate the appropriate query.
### Example Queries by Object Type
#### Seekers (Job Seekers)
**Status-based queries:**
- `"show me seekers with status startasap and their email and experience"`
- `"find seekers looking for jobs urgently with their skills and salary expectations"`
- `"get seekers who are not looking with their employment status"`
**Skills & experience:**
- `"find seekers with technical skills and years of experience"`
- `"show me seekers with language abilities and personality profiles"`
- `"get seekers with specific know-how and job radar interests"`
**Location & preferences:**
- `"show me seekers in Paris with remote work preferences"`
- `"find seekers available to work in multiple countries"`
- `"get seekers with specific location and salary requirements"`
#### Job Ads
**Job search queries:**
- `"show me recent job postings in technology"`
- `"find job ads with high salary ranges"`
- `"get job advertisements posted this week"`
**Company & location:**
- `"show me jobs at specific companies"`
- `"find remote job opportunities"`
- `"get job ads in Paris or Lyon"`
#### Recruiters
**Recruiter information:**
- `"show me active recruiters and their specializations"`
- `"find recruiters from specific companies"`
- `"get recruiter contact information and experience"`
#### Persons
**General person queries:**
- `"show me person profiles with their roles"`
- `"find persons by their experience or background"`
#### Companies (Sirets)
**Company information:**
- `"show me companies in the technology sector"`
- `"find companies by size or location"`
- `"get company details and contact information"`
### Supported Query Types
**Multi-Object Intelligence:**
The system automatically detects which object you're asking about:
- Mentions of "seekers", "candidates", "job seekers" → seekers object
- Mentions of "jobs", "positions", "job ads" → jobads object
- Mentions of "recruiters", "hiring managers" → recruiters object
- Mentions of "persons", "people", "profiles" → persons object
- Mentions of "companies", "employers", "organizations" → sirets object
**Filter Types:**
- **Status filtering**: Object-specific status fields
- **Date filtering**: Creation dates, update dates with date ranges
- **Index optimization**: Uses real ODMDB indexes for efficient queries
- **Field-specific**: Searches within specific properties
## Schema Mapping System
The PoC uses a sophisticated schema mapping system located in `schema-mappings/`:
### Architecture
- **ODMDBMappingManager**: Central manager that loads and caches schema mappings
- **Base Mapping**: Core field synonym generation and mapping logic
- **Object-Specific Mappings**: Individual mapping files for each object type
- **Real Schema Integration**: Direct reading from actual ODMDB schema files
### Verified Schema Coverage
**Seekers Object:**
- 62 total schema properties mapped
- 27 recruiter-readable fields identified
- 3 indexes available (lst_alias, seekstatus_alias, alias)
- 206+ field synonyms generated from real schema definitions
**All Objects:**
- Dynamic schema loading for any ODMDB object type
- Access rights properly extracted from apxaccessrights structure
- Index definitions read from actual idx directories
- Field synonyms generated from real property definitions
### Field Mapping Examples
The system provides comprehensive natural language to field mappings:
**Contact & Identity:**
- `email`, `contact`, `mail` → `email`
- `id`, `username`, `alias` → `alias`
- `bio`, `description`, `summary` → `shortdescription`
**Work Experience & Status:**
- `experience`, `years of experience`, `career length` → `seekworkingyear`
- `job titles`, `positions`, `roles`, `work history` → `seekjobtitleexperience`
- `status`, `availability`, `urgency` → `seekstatus`
**Location & Geography:**
- `location`, `where`, `work location` → `seeklocation`
- `countries`, `work countries` → `countryavailabletowork`
**Skills & Competencies:**
- `skills`, `competencies`, `abilities` → `skills`
- `languages`, `language skills` → `languageskills`
- `knowledge`, `expertise`, `know-how` → `knowhow`
_(Plus hundreds more mappings for all object types)_
## Output Format
The PoC generates ODMDB queries in this format:
```json
{
"object": "seekers",
"condition": ["prop.dt_create(>=:2025-10-06)"],
"fields": ["alias", "email", "seekworkingyear"]
}
```
## ODMDB DSL Support
The PoC understands and generates these ODMDB DSL patterns:
- **Property queries**: `prop.<field>(operator:value)`
- **Index queries**: `idx.<indexName>(value)`
- **Join queries**: `join(remoteObject:localKey:remoteProp:operator:value)`
## Demo & Testing Tools
**Interactive Demo:**
```bash
node demo.js
```
**Live PoC demonstration** featuring:
- Real query generation from natural language using OpenAI
- Multi-object detection and schema loading
- Prepared queries for all supported object types
- Real-time field mapping and validation
- Current ODMDB data status display
**Demo Features:**
- **Prepared Queries**: 4 example queries per object type (20 total)
- **Schema Validation**: Shows actual field counts and mappings
- **Real-time Generation**: Demonstrates actual OpenAI API integration
- **Multi-Object Support**: Covers seekers, jobads, recruiters, persons, sirets
## Environment Variables
- `OPENAI_API_KEY` - Your OpenAI API key (required)
- `EXECUTE_QUERY` - Set to "true" to execute queries against ODMDB (default: false)
- `EXECUTE_DEMO` - Set to "true" to execute demo queries with real generation
- `ODMDB_BASE_URL` - ODMDB server URL (default: http://localhost:3000)
- `ODMDB_TRIBE` - ODMDB tribe name (default: smatchit)
- `OPENAI_MODEL` - OpenAI model to use (default: gpt-4o)
## System Validation
The mappings have been thoroughly validated to ensure they:
✅ **Read actual ODMDB schema files** - Not hardcoded mappings
✅ **Access real index registers** - Uses actual idx directory files
✅ **Extract proper access rights** - Reads apxaccessrights.recruiters.R structure
✅ **Generate comprehensive synonyms** - 200+ field mappings per object
✅ **Support all object types** - Dynamic loading for any ODMDB schema
## Technical Architecture
### Core Components
1. **poc.js**: Main PoC engine with multi-object support
2. **demo.js**: Comprehensive demonstration with prepared queries
3. **schema-mappings/**: Real schema integration system
4. **package.json**: Dependencies and execution scripts
### Schema Integration Flow
1. **Schema Loading**: ODMDBMappingManager reads actual schema files
2. **Field Extraction**: Extracts properties and access rights from real schemas
3. **Index Integration**: Reads index definitions from idx directories
4. **Synonym Generation**: Creates comprehensive field mappings
5. **Query Generation**: Uses OpenAI with dynamic schema for target object
6. **Validation**: Ensures generated queries match schema constraints
### Data Flow
```
Natural Language Query
Object Detection (seekers/jobads/recruiters/persons/sirets)
Schema Loading (real ODMDB schema files)
Field Mapping (comprehensive synonym matching)
OpenAI Structured Output (dynamic JSON schema)
ODMDB DSL Query (validated against real schema)
```
## Limitations
- **Local schema files required**: Needs access to actual ODMDB schema structure
- **OpenAI API dependency**: Requires valid API key and credits
- **Performance considerations**: Schema loading and mapping generation takes time
- **Single query per run**: No interactive conversation mode (yet)
## Next Steps
- [ ] Interactive CLI for multiple queries in conversation
- [ ] Enhanced query execution with real ODMDB server integration
- [ ] Query result processing and formatting improvements
- [ ] Advanced multi-object join queries
- [ ] Performance optimizations for schema loading
- [ ] User interface for non-technical users
## Files
**Core Implementation:**
- `poc.js` - Main PoC engine supporting all ODMDB object types
- `demo.js` - Comprehensive demo with real query generation
- `package.json` - Dependencies and scripts
**Schema System:**
- `schema-mappings/` - Complete schema mapping system
- `odmdb-mapping-manager.js` - Central mapping coordinator
- `base-mapping.js` - Core mapping logic and synonym generation
- `seekers-mapping.js`, `jobads-mapping.js`, etc. - Object-specific mappings
**Data Integration:**
- `../smatchitObjectOdmdb/schema/*.json` - Real ODMDB schema files
- `../smatchitObjectOdmdb/objects/*/idx/` - Index definition files
- `../smatchitObjectOdmdb/objects/*/itm/` - Data files for all object types
## Verification
The system has been validated against real ODMDB data:
- **Schema Properties**: All properties correctly read from actual schema files
- **Index Access**: Confirmed access to real index files (lst_alias, seekstatus_alias, etc.)
- **Access Rights**: Proper extraction of recruiter-readable fields
- **Field Mappings**: Comprehensive synonym generation from actual definitions
- **Multi-Object Support**: Verified functionality across all object types
This ensures the PoC works with **actual ODMDB schema properties** and **accesses real index registers** as required for production readiness.