
Add Knowledge

Learn how to create and configure knowledge bases in Vectense Platform to provide contextual information for your AI workflows.

Overview

Adding knowledge to your workspace involves:

  1. Creating a Knowledge Base: Set up the knowledge container
  2. Configuring Sources: Connect to your data sources
  3. Processing Content: Index and prepare content for AI use
  4. Testing Retrieval: Validate that knowledge works correctly

Prerequisites

Before creating a knowledge base:

  • Active Workspace: Access to a workspace with knowledge creation permissions
  • AI Model: At least one configured model for embedding generation
  • Content Sources: Documents, files, or data sources to index
  • Understanding: A clear idea of what knowledge you want to capture

Creating Your First Knowledge Base

Step 1: Navigate to Knowledge Creation

  1. Go to Knowledge: Click "Knowledge" in the main navigation
  2. Create New: Click the "Create a new knowledge" button
  3. Start Configuration: Begin the knowledge setup wizard

Step 2: Configure Basic Information

Knowledge Name

  • Choose a descriptive name that reflects the content
  • Examples: "Product Documentation", "Company Policies", "Customer Support KB"
  • Can be changed later if needed

Description

  • Optional but recommended
  • Helps team members understand the knowledge purpose
  • Include information about content scope and intended use

AI Model Selection

  • Choose which model will generate embeddings for this knowledge
  • The model affects search quality and language understanding
  • Use the same model you plan to use in workflows for best results

Step 3: Choose Knowledge Source Type

Select the type of data source for your knowledge base:

File Upload

What it is: Direct file upload through the web interface

Best for:

  • Company documents and policies
  • Product manuals and documentation
  • Training materials and guides
  • Small to medium document collections

Supported Formats:

  • PDF: Portable Document Format files
  • Word: Microsoft Word documents (.docx, .doc)
  • Excel: Spreadsheets and data files (.xlsx, .xls, .csv)
  • Text: Plain text files (.txt)
  • RTF: Rich Text Format documents
  • Markdown: Markdown formatted files (.md)
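
Before uploading a large batch, it can help to sort files by extension locally. The following is a minimal sketch, not part of Vectense Platform: `split_by_support` is a hypothetical helper, and the `.pdf` and `.rtf` extensions are assumed from the format names because the list above gives no extension for them.

```python
from pathlib import Path

# Extensions from the Supported Formats list. ".pdf" and ".rtf" are assumed
# from the format names; the list does not state them explicitly.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".xlsx", ".xls", ".csv", ".txt", ".rtf", ".md",
}


def split_by_support(filenames):
    """Return (supported, unsupported) lists, judged by file extension only."""
    supported, unsupported = [], []
    for name in filenames:
        extension = Path(name).suffix.lower()
        if extension in SUPPORTED_EXTENSIONS:
            supported.append(name)
        else:
            unsupported.append(name)
    return supported, unsupported


ok, rejected = split_by_support(["Manual.PDF", "notes.md", "diagram.png"])
print(ok)        # ['Manual.PDF', 'notes.md']
print(rejected)  # ['diagram.png']
```

This checks names only. A file with a supported extension can still fail processing if it is corrupted or password-protected.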

Configuration:

  • No additional configuration required
  • Files are uploaded after knowledge creation
  • Automatic format detection and processing
  • Simple drag-and-drop interface

How to Use:

  1. Select "File Upload" as source type
  2. Complete knowledge creation
  3. Upload files using drag-and-drop interface
  4. Wait for automatic processing and indexing

Local Filesystem

What it is: Connect to local directories and network drives

Best for:

  • Large document repositories
  • Shared network drives
  • Version-controlled documentation
  • Automatically updated content

Configuration Options:

Source Path

  • Local Directory: /path/to/documents
  • Network Share: //server/share/documents
  • Mounted Drive: /mnt/shared/knowledge

File Pattern (Glob)

  • All Files: **/* (everything recursively)
  • PDF Only: **/*.pdf
  • Documentation: **/*.{md,txt,pdf}
  • Exclude Folders: **/*.pdf,!**/archive/**

Pattern Examples:

**/*.pdf            # All PDF files recursively
docs/**/*.md        # Markdown files in docs folder
*.{txt,md}          # Text and markdown in root only
**/*,!**/temp/**    # Everything except temp folders
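
To preview which files a pattern would select before pointing a knowledge base at a directory, a local approximation can be useful. The sketch below is an assumption about how such patterns behave, not the platform's actual matcher: it treats commas outside braces as separators, a leading `!` as an exclusion, and `{a,b}` as alternatives. The helper names are hypothetical.

```python
import re
from pathlib import Path


def expand_braces(pattern):
    """Expand {a,b} groups into separate patterns (pathlib has no brace support)."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if not match:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for alternative in match.group(1).split(","):
        expanded.extend(expand_braces(head + alternative + tail))
    return expanded


def split_spec(spec):
    """Split a pattern spec on commas that are not inside braces."""
    parts, depth, current = [], 0, ""
    for char in spec:
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
        if char == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += char
    parts.append(current)
    return [part.strip() for part in parts if part.strip()]


def preview_matches(root, spec):
    """Return sorted relative paths of files selected by an include/exclude spec."""
    root = Path(root)
    included, excluded = set(), set()
    for part in split_spec(spec):
        is_exclusion = part.startswith("!")
        target = excluded if is_exclusion else included
        for pattern in expand_braces(part[1:] if is_exclusion else part):
            if pattern.endswith("**"):
                pattern += "/*"  # make a trailing ** match files on all Python versions
            target.update(p for p in root.glob(pattern) if p.is_file())
    return sorted(p.relative_to(root).as_posix() for p in included - excluded)
```

For example, `preview_matches("/path/to/documents", "**/*.pdf,!**/archive/**")` would list every PDF under the directory except those inside any `archive` folder, under the assumptions stated above.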

How to Configure:

  1. Select "Local Filesystem" as source type
  2. Enter the source path to monitor
  3. Set file pattern to filter files
  4. Configure update frequency (if supported)
  5. Test connection and file access

Web Content

What it is: Crawl and index web pages and documentation

Best for:

  • Product documentation websites
  • Internal wikis and knowledge bases
  • News and blog content
  • Public information sources

Configuration Options:

Start URL

  • Documentation Site: https://docs.yourcompany.com
  • Wiki: https://wiki.internal.com/products
  • Blog Section: https://blog.company.com/category/product

Crawl Depth

  • 1: Only the starting page
  • 2: Starting page + directly linked pages
  • 3+: Multiple levels of links (the page count can grow quickly on large sites, so combine with Max Pages)

Max Pages

  • Limit total pages crawled
  • Prevent excessive resource usage
  • Typical values: 50-500 pages
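
How Crawl Depth and Max Pages interact can be illustrated with a breadth-first traversal over an in-memory link graph. This is a sketch under stated assumptions, not the platform's crawler: `plan_crawl` and the `links` dictionary are hypothetical, no network requests are made, and depth 1 means only the starting page, as described above.

```python
from collections import deque


def plan_crawl(links, start_url, crawl_depth, max_pages):
    """Return pages in visit order, limited by crawl depth and page count.

    `links` maps each URL to the URLs it links to (a stand-in for real pages).
    """
    if max_pages < 1 or crawl_depth < 1:
        return []
    visited = [start_url]
    seen = {start_url}
    queue = deque([(start_url, 1)])  # the starting page is level 1
    while queue:
        url, level = queue.popleft()
        if level >= crawl_depth:
            continue  # do not follow links beyond the configured depth
        for linked in links.get(url, []):
            if linked in seen:
                continue
            if len(visited) >= max_pages:
                return visited  # page limit reached
            seen.add(linked)
            visited.append(linked)
            queue.append((linked, level + 1))
    return visited


site = {"/": ["/a", "/b"], "/a": ["/c"], "/b": [], "/c": ["/d"]}
print(plan_crawl(site, "/", crawl_depth=1, max_pages=50))  # ['/']
print(plan_crawl(site, "/", crawl_depth=2, max_pages=50))  # ['/', '/a', '/b']
print(plan_crawl(site, "/", crawl_depth=3, max_pages=2))   # ['/', '/a']
```

The last call shows Max Pages cutting the crawl short even though the depth setting would allow more pages.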

Advanced Options:

  • URL Patterns: Restrict crawling to specific URL patterns
  • Exclude Patterns: Skip certain pages or sections
  • Update Schedule: How often to refresh content
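
The effect of URL Patterns and Exclude Patterns can be sketched as a simple filter. The pattern syntax the platform accepts is not specified here, so this example assumes shell-style wildcards via Python's `fnmatch` module; `url_allowed` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def url_allowed(url, include_patterns, exclude_patterns=()):
    """Return True if the URL passes the include list and no exclude pattern.

    An empty include list is treated as "allow everything not excluded".
    """
    if any(fnmatchcase(url, pattern) for pattern in exclude_patterns):
        return False
    if not include_patterns:
        return True
    return any(fnmatchcase(url, pattern) for pattern in include_patterns)


include = ["https://docs.yourcompany.com/*"]
exclude = ["*/changelog/*"]
print(url_allowed("https://docs.yourcompany.com/guide/install", include, exclude))   # True
print(url_allowed("https://docs.yourcompany.com/changelog/2020", include, exclude))  # False
```

Exclusions are checked first so that a page matching both lists is skipped. Note that `?` and `[` are wildcard characters in this syntax, which matters for URLs with query strings.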

How to Configure:

  1. Select "Web Content" as source type
  2. Enter the starting URL
  3. Set crawl depth (usually 2-3 levels)
  4. Set maximum pages to crawl
  5. Configure any URL restrictions
  6. Test the crawl with a small depth first

Step 4: Create and Process

Create Knowledge Base

  1. Review all configuration settings
  2. Click "Create" to create the knowledge base
  3. Wait for initial setup to complete
  4. Monitor processing status

Content Processing

  • File Upload: Upload files through the interface
  • Filesystem: Automatic scanning and indexing begins
  • Web Crawl: Crawling starts immediately
  • Progress Monitoring: Track processing in the Jobs section

Content Management

File Upload Management

Upload Files

  1. Navigate to your knowledge base
  2. Go to the "Edit" tab
  3. Use drag-and-drop or "Select Files" button
  4. Wait for upload and processing completion

Supported Upload Methods:

  • Drag and Drop: Drag files directly to the upload area
  • File Browser: Click "Select Files" to browse
  • Bulk Upload: Select multiple files at once

File Management:

  • View Uploaded Files: See all files in the knowledge base
  • Delete Files: Remove individual files
  • Replace Files: Upload newer versions
  • Monitor Processing: Track indexing progress

Filesystem Management

Monitoring

  • Files are automatically detected and processed
  • New files are indexed when added
  • Modified files are re-processed
  • Deleted files are removed from the index
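
The three kinds of change listed above (new, modified, deleted) can be pictured as a comparison of two directory snapshots. The sketch below is illustrative only and does not describe how the platform detects changes internally: `diff_snapshots` is a hypothetical function, and each snapshot is assumed to map a file path to a modification time or content hash.

```python
def diff_snapshots(previous, current):
    """Compare two {path: fingerprint} snapshots and classify the changes."""
    previous_paths, current_paths = previous.keys(), current.keys()
    return {
        "added": sorted(current_paths - previous_paths),
        "modified": sorted(
            path for path in current_paths & previous_paths
            if current[path] != previous[path]
        ),
        "deleted": sorted(previous_paths - current_paths),
    }


before = {"guide.md": 100, "faq.md": 100, "old.pdf": 90}
after = {"guide.md": 100, "faq.md": 140, "new.pdf": 150}
print(diff_snapshots(before, after))
# {'added': ['new.pdf'], 'modified': ['faq.md'], 'deleted': ['old.pdf']}
```

Added and modified files would be indexed or re-processed, and deleted files removed from the index, matching the behavior described in the list.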

File Filters

  • Use glob patterns to control which files are included
  • Exclude temporary or system files
  • Focus on specific file types or directories

Update Frequency

  • Changes are detected in real time or near real time
  • Large directories may have some processing delay
  • Monitor the Jobs section for processing status

Web Content Management

Content Updates

  • Web content can be refreshed manually or automatically
  • Set up refresh schedules for regularly updated sites
  • Monitor for broken links or access issues

Content Quality

  • Review extracted content for quality
  • Some web pages may not extract cleanly
  • Adjust crawl settings if needed

Content Processing Details

Text Extraction Process

Document Processing

  1. Format Detection: Automatic detection of file type
  2. Text Extraction: Pull text content from documents
  3. Structure Preservation: Maintain headings and organization
  4. Metadata Extraction: Capture file properties and information

Content Chunking

  1. Size Optimization: Break large documents into manageable pieces
  2. Context Preservation: Maintain document structure and relationships
  3. Overlap Strategy: Ensure continuity between chunks
  4. Quality Assurance: Validate chunk quality and content
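
The size and overlap steps can be illustrated with a word-based sliding window. This is a minimal sketch of one common chunking approach, not the platform's implementation: the function name, the default sizes, and the choice to split on whitespace are all assumptions.

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into chunks of `chunk_size` words, each sharing `overlap`
    words with the previous chunk so context carries across boundaries."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the final words are already covered by this chunk
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_text(sample, chunk_size=4, overlap=1):
    print(chunk)
# w0 w1 w2 w3
# w3 w4 w5 w6
# w6 w7 w8 w9
```

Each chunk begins with the last word of the previous one, which is the continuity the overlap step refers to. Production chunkers often split on headings or sentences instead of raw word counts in order to preserve structure.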

Vector Generation

  1. Embedding Creation: Convert text to numerical representations
  2. Semantic Indexing: Enable meaning-based search
  3. Optimization: Tune the index for search performance
  4. Storage: Store embeddings in vector database
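
To make the idea of embeddings and meaning-based search concrete, the sketch below ranks chunks by cosine similarity. It deliberately uses a toy word-count vector in place of a real embedding model, so it only captures word overlap, not meaning; the configured AI model produces far richer vectors. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a sparse word-count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, chunks, top_k=2):
    """Return the top_k (score, chunk) pairs, most similar first."""
    query_vector = embed(query)
    scored = [(cosine(query_vector, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


chunks = [
    "Reset your password from the login page.",
    "Returns are accepted within 30 days.",
    "Configure SSL in the server settings.",
]
best_score, best_chunk = search("How do I reset my password?", chunks, top_k=1)[0]
print(best_chunk)  # Reset your password from the login page.
```

The same ranking step applies with real embeddings: the query and each stored chunk are converted to vectors, and the closest vectors are returned as context.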

Quality Assurance

Content Validation

  • Verify that all uploaded files are processed successfully
  • Check for any processing errors or warnings
  • Review extracted text quality
  • Validate that content is searchable

Common Issues and Solutions:

Files Not Processing

  • Check file format is supported
  • Verify file is not corrupted
  • Ensure file permissions allow reading
  • Check system resources and processing queue

Poor Text Extraction

  • Try different file formats (e.g., export PDF to Word)
  • Check for password-protected files
  • Verify file encoding for text files
  • Review extracted content in processing logs

Missing Content

  • Verify file patterns include desired files
  • Check directory permissions for filesystem sources
  • Validate web URLs are accessible
  • Review crawl logs for errors

Testing Your Knowledge Base

After content processing is complete:

Manual Testing

  1. Navigate to Knowledge: Go to your knowledge base
  2. Test Tab: Click on "Test" or "Manual Run"
  3. Enter Query: Type a question related to your content
  4. Review Results: Check that relevant content is returned
  5. Refine if Needed: Adjust configuration based on results

Example Test Queries

  • Product Info: "What are the system requirements?"
  • Process Questions: "How do I reset my password?"
  • Policy Queries: "What is the return policy?"
  • Technical Questions: "How do I configure SSL?"

Quality Evaluation

  • Relevance: Results should directly address the query
  • Completeness: Important information should be findable
  • Accuracy: Retrieved content should be correct
  • Coverage: Test various types of queries
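
The criteria above can be tracked with a small, repeatable check: a list of test queries, each paired with a phrase that a good result should contain. The sketch below assumes you can obtain result passages for a query by some means (for example, copied from the Test tab); `retrieve` is a placeholder for that step, and `fake_retrieve` with its canned passages is made up purely for illustration.

```python
def hit_rate(test_cases, retrieve, top_k=3):
    """Fraction of queries whose top_k results contain the expected phrase."""
    if not test_cases:
        return 0.0
    hits = 0
    for query, expected_phrase in test_cases:
        results = retrieve(query)[:top_k]
        if any(expected_phrase.lower() in passage.lower() for passage in results):
            hits += 1
    return hits / len(test_cases)


def fake_retrieve(query):
    """Stand-in retriever with invented passages; replace with real results."""
    canned = {
        "What is the return policy?": ["Returns are accepted within 30 days."],
        "How do I configure SSL?": ["Upload your logo in the branding settings."],
    }
    return canned.get(query, [])


cases = [
    ("What is the return policy?", "30 days"),
    ("How do I configure SSL?", "certificate"),
]
print(hit_rate(cases, fake_retrieve))  # 0.5
```

Re-running the same cases after changing sources or settings gives a rough before-and-after comparison. A phrase match is only a proxy for relevance, so spot-check the returned passages as well.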

Integration with Workflows

Using Knowledge in Workflows

Context Injection

  • Knowledge is automatically available to AI steps
  • AI models can query knowledge during processing
  • Relevant context is provided based on workflow needs

Manual Knowledge Queries

  • Use Knowledge Retrieval action in workflows
  • Explicitly search for specific information
  • Control what context is provided to AI models

Best Practices

  • Design knowledge structure to match workflow needs
  • Test knowledge retrieval with actual workflow scenarios
  • Monitor knowledge usage and performance
  • Keep content updated and relevant

Monitoring and Maintenance

Usage Monitoring

  • Query Volume: Track how often knowledge is accessed
  • Popular Content: Identify most-accessed information
  • Performance: Monitor search response times
  • Quality Metrics: Track search result relevance

Content Maintenance

  • Regular Updates: Keep content current and accurate
  • Quality Reviews: Periodically review content quality
  • Cleanup: Remove outdated or irrelevant content
  • Optimization: Optimize search performance

Performance Optimization

  • Index Maintenance: Regular optimization of search indexes
  • Content Curation: Focus on high-quality, relevant content
  • Resource Monitoring: Track processing and storage usage
  • Cost Management: Monitor embedding generation costs

Troubleshooting

Common Issues

Knowledge Creation Fails

  • Check user permissions for knowledge creation
  • Verify workspace license supports knowledge bases
  • Ensure AI model is configured and accessible
  • Review error messages for specific issues

Content Not Processing

  • Verify source accessibility (filesystem permissions, web URLs)
  • Check file formats are supported
  • Monitor processing jobs for errors
  • Review system resources and capacity

Poor Search Results

  • Review content quality and organization
  • Test with different query phrasings
  • Check if content was processed correctly
  • Consider adjusting content chunking settings

Performance Issues

  • Monitor system resources during processing
  • Consider processing content in smaller batches
  • Review concurrent processing limits
  • Optimize file organization and structure

Getting Help

  • Documentation: Refer to the guide for your specific source type
  • Job Logs: Review processing logs for detailed error information
  • Community: Ask questions in user forums
  • Support: Contact technical support for complex issues

Best Practices

Content Organization

  • Clear Structure: Organize content logically and consistently
  • Quality Focus: Prioritize high-quality, relevant content
  • Regular Updates: Keep content current and accurate
  • Documentation: Document content sources and organization

Performance Optimization

  • Batch Processing: Process large amounts of content in batches
  • Incremental Updates: Only process changed content when possible
  • Resource Management: Monitor and optimize resource usage
  • Index Maintenance: Regularly optimize search indexes

Security and Privacy

  • Access Control: Limit access to sensitive knowledge bases
  • Content Review: Review content for sensitive information
  • Audit Logging: Track knowledge access and usage
  • Compliance: Ensure knowledge handling meets regulatory requirements

Your knowledge base is now ready to provide intelligent context to your AI workflows. Continue to Test Knowledge to validate retrieval quality, then Refresh Knowledge to learn about maintenance and updates.