Add Knowledge
Learn how to create and configure knowledge bases in Vectense Platform to provide contextual information for your AI workflows.
Overview
Adding knowledge to your workspace involves:
- Creating a Knowledge Base: Set up the knowledge container
- Configuring Sources: Connect to your data sources
- Processing Content: Index and prepare content for AI use
- Testing Retrieval: Validate that knowledge works correctly
Prerequisites
Before creating a knowledge base:
- Active Workspace: Access to a workspace with knowledge creation permissions
- AI Model: At least one configured model for embedding generation
- Content Sources: Documents, files, or data sources to index
- Understanding: Clear idea of what knowledge you want to capture
Creating Your First Knowledge Base
Step 1: Navigate to Knowledge Creation
- Go to Knowledge: Click "Knowledge" in the main navigation
- Create New: Click the "Create a new knowledge" button
- Start Configuration: Begin the knowledge setup wizard
Step 2: Configure Basic Information
Knowledge Name
- Choose a descriptive name that reflects the content
- Examples: "Product Documentation", "Company Policies", "Customer Support KB"
- Can be changed later if needed
Description
- Optional but recommended description
- Helps team members understand the knowledge purpose
- Include information about content scope and intended use
AI Model Selection
- Choose which model will generate embeddings for this knowledge
- The model affects search quality and language understanding
- Use the same model you plan to use in workflows for best results
Step 3: Choose Knowledge Source Type
Select the type of data source for your knowledge base:
File Bucket (Recommended for Beginners)
What it is: Direct file upload through the web interface
Best for:
- Company documents and policies
- Product manuals and documentation
- Training materials and guides
- Small to medium document collections
Supported Formats:
- PDF: Portable Document Format files
- Word: Microsoft Word documents (.docx, .doc)
- Excel: Spreadsheets and data files (.xlsx, .xls, .csv)
- Text: Plain text files (.txt)
- RTF: Rich Text Format documents
- Markdown: Markdown formatted files (.md)
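Before uploading a large batch, it can help to check locally which files fall outside the supported formats. The following is a minimal sketch, not part of the platform: the helper name `split_by_support` is hypothetical, and `.pdf` and `.rtf` are assumed to be the usual extensions for the PDF and RTF entries above.

```python
from pathlib import Path

# Extensions taken from the "Supported Formats" list; ".pdf" and ".rtf" are
# assumptions, since the list names those formats without an extension.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".xlsx", ".xls", ".csv", ".txt", ".rtf", ".md",
}


def split_by_support(filenames):
    """Return (supported, unsupported), comparing extensions case-insensitively."""
    supported, unsupported = [], []
    for name in filenames:
        if Path(name).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(name)
        else:
            unsupported.append(name)
    return supported, unsupported


ok, skipped = split_by_support(["manual.PDF", "notes.md", "logo.png", "README"])
print("upload:", ok)
print("convert or skip:", skipped)
```

Files in the second list would need converting to a supported format before upload.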
Configuration:
- No additional configuration required
- Files are uploaded after knowledge creation
- Automatic format detection and processing
- Simple drag-and-drop interface
How to Use:
- Select "File Upload" as source type
- Complete knowledge creation
- Upload files using drag-and-drop interface
- Wait for automatic processing and indexing
Local Filesystem
What it is: Connect to local directories and network drives
Best for:
- Large document repositories
- Shared network drives
- Version-controlled documentation
- Automatically updated content
Configuration Options:
Source Path
- Local Directory: /path/to/documents
- Network Share: //server/share/documents
- Mounted Drive: /mnt/shared/knowledge
File Pattern (Glob)
- All Files: **/* (everything recursively)
- PDF Only: **/*.pdf
- Documentation: **/*.{md,txt,pdf}
- Exclude Folders: **/*.pdf,!**/archive/**
Pattern Examples:
**/*.pdf # All PDF files recursively
docs/**/*.md # Markdown files in docs folder
*.{txt,md} # Text and markdown in root only
**/*,!**/temp/** # Everything except temp folders
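To check a pattern before saving it, the examples above can be approximated locally. The sketch below is an illustration only, because the exact glob dialect the platform uses is not specified here. It assumes that `**/` matches zero or more directories, `*` stays within one path segment, `{a,b}` is an alternation, and a comma-separated entry starting with `!` is an exclusion. The function names are hypothetical.

```python
import re


def split_top_level(spec):
    """Split a pattern list on commas that are not inside {...}."""
    parts, depth, current = [], 0, ""
    for ch in spec:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return [p.strip() for p in parts if p.strip()]


def glob_to_regex(pattern):
    """Translate one glob into a compiled regex under the assumed semantics."""
    out, i = "", 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out += "(?:.*/)?"          # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out += ".*"                # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out += "[^/]*"             # anything within one path segment
            i += 1
        elif pattern[i] == "{":
            end = pattern.index("}", i)
            options = pattern[i + 1:end].split(",")
            out += "(?:" + "|".join(re.escape(o) for o in options) + ")"
            i = end + 1
        else:
            out += re.escape(pattern[i])
            i += 1
    return re.compile(out)


def matches(spec, path):
    """True if path matches an include pattern and no '!' exclude pattern."""
    patterns = split_top_level(spec)
    includes = [p for p in patterns if not p.startswith("!")]
    excludes = [p[1:] for p in patterns if p.startswith("!")]
    included = any(glob_to_regex(p).fullmatch(path) for p in includes)
    excluded = any(glob_to_regex(p).fullmatch(path) for p in excludes)
    return included and not excluded


for candidate in ["reports/q1.pdf", "archive/old.pdf", "notes.md"]:
    print(candidate, matches("**/*.pdf,!**/archive/**", candidate))
```

Paths are relative to the source path and use forward slashes. If the platform's matcher behaves differently, its result takes precedence over this approximation.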
How to Configure:
- Select "Local Filesystem" as source type
- Enter the source path to monitor
- Set file pattern to filter files
- Configure update frequency (if supported)
- Test connection and file access
Web Content
What it is: Crawl and index web pages and documentation
Best for:
- Product documentation websites
- Internal wikis and knowledge bases
- News and blog content
- Public information sources
Configuration Options:
Start URL
- Documentation Site: https://docs.yourcompany.com
- Wiki: https://wiki.internal.com/products
- Blog Section: https://blog.company.com/category/product
Crawl Depth
- 1: Only the starting page
- 2: Starting page + directly linked pages
- 3+: Multiple levels of links (be careful with large sites)
Max Pages
- Limit total pages crawled
- Prevent excessive resource usage
- Typical values: 50-500 pages
Advanced Options:
- URL Patterns: Restrict crawling to specific URL patterns
- Exclude Patterns: Skip certain pages or sections
- Update Schedule: How often to refresh content
How to Configure:
- Select "Web Content" as source type
- Enter the starting URL
- Set crawl depth (usually 2-3 levels)
- Set maximum pages to crawl
- Configure any URL restrictions
- Test the crawl with a small depth first
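One way to reason about Crawl Depth and Max Pages together is as limits on a breadth-first traversal of links. The sketch below is not the platform's crawler. It runs over an in-memory link map instead of fetching pages, and `plan_crawl` and the example URLs are hypothetical. Depth is counted as in the list above, where 1 means only the starting page.

```python
from collections import deque


def plan_crawl(links, start_url, max_depth, max_pages):
    """Return URLs in the order a breadth-first crawl would collect them.

    `links` maps each URL to the URLs it links to (a stand-in for fetching).
    """
    visited = [start_url]
    seen = {start_url}
    queue = deque([(start_url, 1)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # pages at the depth limit are kept, but not followed
        for target in links.get(url, []):
            if len(visited) >= max_pages:
                return visited  # page budget exhausted
            if target not in seen:
                seen.add(target)
                visited.append(target)
                queue.append((target, depth + 1))
    return visited


site = {
    "/": ["/install", "/api"],
    "/install": ["/install/linux", "/"],
    "/api": ["/api/auth"],
}
print(plan_crawl(site, "/", max_depth=1, max_pages=100))
print(plan_crawl(site, "/", max_depth=2, max_pages=100))
print(plan_crawl(site, "/", max_depth=3, max_pages=4))
```

The last call shows why Max Pages matters on larger sites: the depth limit alone would allow five pages, but the budget stops the crawl at four.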
Step 4: Create and Process
Create Knowledge Base
- Review all configuration settings
- Click "Create" to create the knowledge base
- Wait for initial setup to complete
- Monitor processing status
Content Processing
- File Upload: Upload files through the interface
- Filesystem: Automatic scanning and indexing begins
- Web Crawl: Crawling starts immediately
- Progress Monitoring: Track processing in the Jobs section
Content Management
File Upload Management
Upload Files
- Navigate to your knowledge base
- Go to the "Edit" tab
- Use drag-and-drop or the "Select Files" button
- Wait for upload and processing completion
Supported Upload Methods:
- Drag and Drop: Drag files directly to the upload area
- File Browser: Click "Select Files" to browse
- Bulk Upload: Select multiple files at once
File Management:
- View Uploaded Files: See all files in the knowledge base
- Delete Files: Remove individual files
- Replace Files: Upload newer versions
- Monitor Processing: Track indexing progress
Filesystem Management
Monitoring
- Files are automatically detected and processed
- New files are indexed when added
- Modified files are re-processed
- Deleted files are removed from the index
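The monitoring behavior described above amounts to comparing two snapshots of the source directory. As a rough illustration, assuming each snapshot maps a relative path to a content hash (the platform's actual change detection is not documented here, and `diff_snapshots` is a hypothetical name):

```python
def diff_snapshots(previous, current):
    """Compare two {path: content_hash} snapshots and group the changes."""
    added = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    modified = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    return {"index": added, "reprocess": modified, "remove": deleted}


before = {"a.md": "h1", "b.pdf": "h2", "c.txt": "h3"}
after = {"a.md": "h1", "b.pdf": "h2-changed", "d.md": "h4"}
changes = diff_snapshots(before, after)
print(changes)
```

Unchanged files such as `a.md` appear in none of the three groups, so they need no reprocessing.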
File Filters
- Use glob patterns to control which files are included
- Exclude temporary or system files
- Focus on specific file types or directories
Update Frequency
- Changes are detected in real time or near real time
- Large directories may have some processing delay
- Monitor the Jobs section for processing status
Web Content Management
Content Updates
- Web content can be refreshed manually or automatically
- Set up refresh schedules for regularly updated sites
- Monitor for broken links or access issues
Content Quality
- Review extracted content for quality
- Some web pages may not extract cleanly
- Adjust crawl settings if needed
Content Processing Details
Text Extraction Process
Document Processing
- Format Detection: Automatic detection of file type
- Text Extraction: Pull text content from documents
- Structure Preservation: Maintain headings and organization
- Metadata Extraction: Capture file properties and information
Content Chunking
- Size Optimization: Break large documents into manageable pieces
- Context Preservation: Maintain document structure and relationships
- Overlap Strategy: Ensure continuity between chunks
- Quality Assurance: Validate chunk quality and content
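The overlap strategy can be illustrated with a simple word-based splitter. This is a minimal sketch under assumed settings: the platform's real chunk size, overlap, and splitting unit are not stated here, and `chunk_words` with its defaults is hypothetical.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words, so a sentence that straddles
    a boundary still appears whole in at least one chunk.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = " ".join(str(n) for n in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
# → ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

Larger overlaps preserve more context at chunk boundaries, at the cost of more chunks to embed and store.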
Vector Generation
- Embedding Creation: Convert text to numerical representations
- Semantic Indexing: Enable meaning-based search
- Optimization: Optimize for search performance
- Storage: Store embeddings in vector database
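To make meaning-based search more concrete, here is a toy version of the embed-and-compare step. It is a teaching sketch only: `toy_embed` uses word counts as a stand-in for the dense vectors a configured AI model would produce, and the in-memory list stands in for a vector database. All names are hypothetical.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in embedding: lowercase word counts (a real model returns floats)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, chunks, top_k=2):
    """Return up to top_k chunks with non-zero similarity, best first."""
    q = toy_embed(query)
    scored = sorted(((cosine(q, toy_embed(c)), c) for c in chunks), reverse=True)
    return [chunk for score, chunk in scored[:top_k] if score > 0]


chunks = [
    "Reset your password from the login page",
    "The return policy allows refunds within 30 days",
    "Configure SSL in the server settings",
]
print(search("How do I reset my password?", chunks, top_k=1))
```

Unlike this word-count stand-in, a real embedding model can also match paraphrases that share no words with the query, which is why the choice of model affects search quality.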
Quality Assurance
Content Validation
- Verify that all uploaded files are processed successfully
- Check for any processing errors or warnings
- Review extracted text quality
- Validate that content is searchable
Common Issues and Solutions:
Files Not Processing
- Check that the file format is supported
- Verify that the file is not corrupted
- Ensure that file permissions allow reading
- Check system resources and processing queue
Poor Text Extraction
- Try different file formats (e.g., export PDF to Word)
- Check for password-protected files
- Verify file encoding for text files
- Review extracted content in processing logs
Missing Content
- Verify file patterns include desired files
- Check directory permissions for filesystem sources
- Validate web URLs are accessible
- Review crawl logs for errors
Testing Your Knowledge Base
After content processing is complete:
Manual Testing
- Navigate to Knowledge: Go to your knowledge base
- Test Tab: Click on "Test" or "Manual Run"
- Enter Query: Type a question related to your content
- Review Results: Check that relevant content is returned
- Refine if Needed: Adjust configuration based on results
Test Query Examples
- Product Info: "What are the system requirements?"
- Process Questions: "How do I reset my password?"
- Policy Queries: "What is the return policy?"
- Technical Questions: "How do I configure SSL?"
Quality Evaluation
- Relevance: Results should be relevant to the query
- Completeness: Important information should be findable
- Accuracy: Retrieved content should be correct
- Coverage: Test various types of queries
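The criteria above can be turned into a repeatable check by keeping a small set of test queries with a phrase each one is expected to surface. The sketch below assumes a `retrieve(query)` callable that returns a list of text results. That callable is hypothetical, as no platform API is documented here, and `fake_retrieve` only simulates it.

```python
def hit_rate(retrieve, cases, top_k=3):
    """Return (fraction of cases that hit, list of queries that missed).

    A case hits when its expected phrase appears in any of the top_k results.
    """
    misses = []
    for query, expected in cases:
        results = retrieve(query)[:top_k]
        if not any(expected.lower() in r.lower() for r in results):
            misses.append(query)
    rate = 1 - len(misses) / len(cases) if cases else 0.0
    return rate, misses


# Simulated retrieval results, standing in for a real knowledge base query.
canned = {
    "What are the system requirements?": ["Requires 8 GB RAM and 4 CPU cores"],
    "How do I reset my password?": ["Use the Forgot Password link"],
    "What is the return policy?": ["Shipping takes 3 to 5 days"],
    "How do I configure SSL?": ["Upload the SSL certificate in Settings"],
}


def fake_retrieve(query):
    return canned.get(query, [])


cases = [
    ("What are the system requirements?", "8 GB RAM"),
    ("How do I reset my password?", "forgot password"),
    ("What is the return policy?", "refund"),
    ("How do I configure SSL?", "certificate"),
]
rate, missed = hit_rate(fake_retrieve, cases)
print(rate, missed)
```

Rerunning the same cases after changing sources or settings shows whether retrieval improved or regressed.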
Integration with Workflows
Using Knowledge in Workflows
Context Injection
- Knowledge is automatically available to AI steps
- AI models can query knowledge during processing
- Relevant context is provided based on workflow needs
Manual Knowledge Queries
- Use Knowledge Retrieval action in workflows
- Explicitly search for specific information
- Control what context is provided to AI models
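Controlling the context passed to a model usually means choosing which retrieved chunks to include and how much of them. A minimal sketch of that idea follows, assuming the chunks arrive ordered from most to least relevant. The function name, prompt layout, and character budget are illustrative assumptions, not the platform's Knowledge Retrieval action.

```python
def build_prompt(question, chunks, max_context_chars=500):
    """Assemble a prompt from the leading chunks that fit the character budget."""
    selected, used = [], 0
    for chunk in chunks:  # assumed to be ordered most relevant first
        if used + len(chunk) > max_context_chars:
            break
        selected.append(chunk)
        used += len(chunk)
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(selected, 1))
    return f"Context:\n{context}\n\nQuestion: {question}"


prompt = build_prompt(
    "What is the return policy?",
    ["Returns are accepted within 30 days.", "Refunds are issued to the original payment method."],
    max_context_chars=60,
)
print(prompt)
```

With the 60-character budget only the first chunk fits, so the second is left out of the prompt.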
Best Practices
- Design knowledge structure to match workflow needs
- Test knowledge retrieval with actual workflow scenarios
- Monitor knowledge usage and performance
- Keep content updated and relevant
Monitoring and Maintenance
Usage Monitoring
- Query Volume: Track how often knowledge is accessed
- Popular Content: Identify most-accessed information
- Performance: Monitor search response times
- Quality Metrics: Track search result relevance
Content Maintenance
- Regular Updates: Keep content current and accurate
- Quality Reviews: Periodically review content quality
- Cleanup: Remove outdated or irrelevant content
- Optimization: Optimize search performance
Performance Optimization
- Index Maintenance: Regular optimization of search indexes
- Content Curation: Focus on high-quality, relevant content
- Resource Monitoring: Track processing and storage usage
- Cost Management: Monitor embedding generation costs
Troubleshooting
Common Issues
Knowledge Creation Fails
- Check user permissions for knowledge creation
- Verify workspace license supports knowledge bases
- Ensure AI model is configured and accessible
- Review error messages for specific issues
Content Not Processing
- Verify source accessibility (filesystem permissions, web URLs)
- Check that file formats are supported
- Monitor processing jobs for errors
- Review system resources and capacity
Poor Search Results
- Review content quality and organization
- Test with different query phrasings
- Check if content was processed correctly
- Consider adjusting content chunking settings
Performance Issues
- Monitor system resources during processing
- Consider processing content in smaller batches
- Review concurrent processing limits
- Optimize file organization and structure
Getting Help
- Documentation: Reference specific source type guides
- Job Logs: Review processing logs for detailed error information
- Community: Ask questions in user forums
- Support: Contact technical support for complex issues
Best Practices
Content Organization
- Clear Structure: Organize content logically and consistently
- Quality Focus: Prioritize high-quality, relevant content
- Regular Updates: Keep content current and accurate
- Documentation: Document content sources and organization
Performance Optimization
- Batch Processing: Process large amounts of content in batches
- Incremental Updates: Only process changed content when possible
- Resource Management: Monitor and optimize resource usage
- Index Maintenance: Regularly optimize search indexes
Security and Privacy
- Access Control: Limit access to sensitive knowledge bases
- Content Review: Review content for sensitive information
- Audit Logging: Track knowledge access and usage
- Compliance: Ensure knowledge handling meets regulatory requirements
Your knowledge base is now ready to provide intelligent context to your AI workflows. Continue to Test Knowledge to validate retrieval quality, then Refresh Knowledge to learn about maintenance and updates.