Manual Installation (Preferred)

Setting up SurfSense manually for customized deployments.

This guide provides step-by-step instructions for setting up SurfSense without Docker. This approach gives you more control over the installation process and allows for customization of the environment.

Prerequisites

Before beginning the manual installation, ensure you have the following installed and configured:

Required Software

  • Python 3.12+ - Backend runtime environment
  • Node.js 20+ - Frontend runtime environment
  • PostgreSQL 14+ - Database server
  • PGVector - PostgreSQL extension for vector similarity search (see the database setup sketch after this list)
  • Redis - Message broker for Celery task queue
  • Git - Version control (to clone the repository)
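
If PostgreSQL and PGVector are already installed, you can prepare the database up front. A minimal sketch, assuming a local server, the default postgres superuser, and the surfsense database name used in the DATABASE_URL example later in this guide:

# Create the application database (adjust the user and database name to your setup)
createdb -U postgres surfsense

# Enable the pgvector extension inside that database
psql -U postgres -d surfsense -c "CREATE EXTENSION IF NOT EXISTS vector;"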

Required Services & API Keys

Complete the setup for the services and API keys you plan to use, including:

  • Authentication Setup (choose one):
    • Google OAuth credentials (for AUTH_TYPE=GOOGLE)
    • Local authentication setup (for AUTH_TYPE=LOCAL)
  • File Processing ETL Service (choose one):
    • Unstructured.io API key (supports 34+ formats)
    • LlamaCloud API key (enhanced parsing, supports 50+ formats)
    • Docling (local processing, no API key required, supports PDF, Office docs, images, HTML, CSV)
  • Other API keys as needed for your use case

Backend Setup

The backend is the core of SurfSense. Follow these steps to set it up:

1. Environment Configuration

First, create and configure your environment variables by copying the example file:

Linux/macOS:

cd surfsense_backend
cp .env.example .env

Windows (Command Prompt):

cd surfsense_backend
copy .env.example .env

Windows (PowerShell):

cd surfsense_backend
Copy-Item -Path .env.example -Destination .env

Edit the .env file and set the following variables:

ENV VARIABLE | DESCRIPTION
DATABASE_URL | PostgreSQL connection string (e.g., postgresql+asyncpg://postgres:postgres@localhost:5432/surfsense)
SECRET_KEY | JWT secret key for authentication (should be a secure random string)
NEXT_FRONTEND_URL | URL where your frontend application is hosted (e.g., http://localhost:3000)
BACKEND_URL | (Optional) Public URL of the backend for OAuth callbacks (e.g., https://api.yourdomain.com). Required when running behind a reverse proxy with HTTPS. Used to set correct OAuth redirect URLs and secure cookies.
AUTH_TYPE | Authentication method: GOOGLE for OAuth with Google, LOCAL for email/password authentication
GOOGLE_OAUTH_CLIENT_ID | (Optional) Client ID from Google Cloud Console (required if AUTH_TYPE=GOOGLE)
GOOGLE_OAUTH_CLIENT_SECRET | (Optional) Client secret from Google Cloud Console (required if AUTH_TYPE=GOOGLE)
ELECTRIC_DB_USER | (Optional) PostgreSQL username for the Electric-SQL connection (default: electric)
ELECTRIC_DB_PASSWORD | (Optional) PostgreSQL password for the Electric-SQL connection (default: electric_password)
EMBEDDING_MODEL | Name of the embedding model (e.g., sentence-transformers/all-MiniLM-L6-v2, openai://text-embedding-ada-002)
RERANKERS_ENABLED | (Optional) Enable or disable document reranking for improved search results (TRUE or FALSE, default: FALSE)
RERANKERS_MODEL_NAME | Name of the reranker model (e.g., ms-marco-MiniLM-L-12-v2) (required if RERANKERS_ENABLED=TRUE)
RERANKERS_MODEL_TYPE | Type of reranker model (e.g., flashrank) (required if RERANKERS_ENABLED=TRUE)
TTS_SERVICE | Text-to-Speech API provider for podcasts (e.g., local/kokoro, openai/tts-1). See supported providers
TTS_SERVICE_API_KEY | API key for the Text-to-Speech service (optional if local)
TTS_SERVICE_API_BASE | (Optional) Custom API base URL for the Text-to-Speech service
STT_SERVICE | Speech-to-Text API provider for audio files (e.g., local/base, openai/whisper-1). See supported providers
STT_SERVICE_API_KEY | API key for the Speech-to-Text service (optional if local)
STT_SERVICE_API_BASE | (Optional) Custom API base URL for the Speech-to-Text service
FIRECRAWL_API_KEY | (Optional) API key for the Firecrawl web crawling service
ETL_SERVICE | Document parsing service: UNSTRUCTURED (supports 34+ formats), LLAMACLOUD (supports 50+ formats including legacy document types), or DOCLING (local processing, supports PDF, Office docs, images, HTML, CSV)
UNSTRUCTURED_API_KEY | API key for the Unstructured.io document parsing service (required if ETL_SERVICE=UNSTRUCTURED)
LLAMA_CLOUD_API_KEY | API key for the LlamaCloud document parsing service (required if ETL_SERVICE=LLAMACLOUD)
CELERY_BROKER_URL | Redis connection URL for the Celery broker (e.g., redis://localhost:6379/0)
CELERY_RESULT_BACKEND | Redis connection URL for the Celery result backend (e.g., redis://localhost:6379/0)
SCHEDULE_CHECKER_INTERVAL | (Optional) How often to check for scheduled connector tasks. Format: <number><unit> where unit is m (minutes) or h (hours). Examples: 1m, 5m, 1h, 2h (default: 1m)
REGISTRATION_ENABLED | (Optional) Enable or disable new user registration (TRUE or FALSE, default: TRUE)
PAGES_LIMIT | (Optional) Maximum pages limit per user for ETL services (default: 999999999, i.e., unlimited in the OSS version)
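
For a quick local setup, a minimal .env sketch using the example values from the table above might look like this (treat it as a starting point; swap in your own secret, models, and service choices):

DATABASE_URL=postgresql+asyncpg://postgres:postgres@localhost:5432/surfsense
SECRET_KEY=change-me-to-a-secure-random-string
NEXT_FRONTEND_URL=http://localhost:3000
AUTH_TYPE=LOCAL
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
ETL_SERVICE=DOCLING
TTS_SERVICE=local/kokoro
STT_SERVICE=local/base
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/0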

Google Connector OAuth Configuration:

ENV VARIABLE | DESCRIPTION
GOOGLE_CALENDAR_REDIRECT_URI | (Optional) Redirect URI for Google Calendar connector OAuth callback (e.g., http://localhost:8000/api/v1/auth/google/calendar/connector/callback)
GOOGLE_GMAIL_REDIRECT_URI | (Optional) Redirect URI for Gmail connector OAuth callback (e.g., http://localhost:8000/api/v1/auth/google/gmail/connector/callback)
GOOGLE_DRIVE_REDIRECT_URI | (Optional) Redirect URI for Google Drive connector OAuth callback (e.g., http://localhost:8000/api/v1/auth/google/drive/connector/callback)

Connector OAuth Configurations (Optional):

ENV VARIABLE | DESCRIPTION
AIRTABLE_CLIENT_ID | (Optional) Airtable OAuth client ID from Airtable Developer Hub
AIRTABLE_CLIENT_SECRET | (Optional) Airtable OAuth client secret
AIRTABLE_REDIRECT_URI | (Optional) Redirect URI for Airtable connector OAuth callback (e.g., http://localhost:8000/api/v1/auth/airtable/connector/callback)
CLICKUP_CLIENT_ID | (Optional) ClickUp OAuth client ID
CLICKUP_CLIENT_SECRET | (Optional) ClickUp OAuth client secret
CLICKUP_REDIRECT_URI | (Optional) Redirect URI for ClickUp connector OAuth callback (e.g., http://localhost:8000/api/v1/auth/clickup/connector/callback)
DISCORD_CLIENT_ID | (Optional) Discord OAuth client ID
DISCORD_CLIENT_SECRET | (Optional) Discord OAuth client secret
DISCORD_REDIRECT_URI | (Optional) Redirect URI for Discord connector OAuth callback (e.g., http://localhost:8000/api/v1/auth/discord/connector/callback)
DISCORD_BOT_TOKEN | (Optional) Discord bot token from Developer Portal
ATLASSIAN_CLIENT_ID | (Optional) Atlassian OAuth client ID (for Jira and Confluence)
ATLASSIAN_CLIENT_SECRET | (Optional) Atlassian OAuth client secret
JIRA_REDIRECT_URI | (Optional) Redirect URI for Jira connector OAuth callback (e.g., http://localhost:8000/api/v1/auth/jira/connector/callback)
CONFLUENCE_REDIRECT_URI | (Optional) Redirect URI for Confluence connector OAuth callback (e.g., http://localhost:8000/api/v1/auth/confluence/connector/callback)
LINEAR_CLIENT_ID | (Optional) Linear OAuth client ID
LINEAR_CLIENT_SECRET | (Optional) Linear OAuth client secret
LINEAR_REDIRECT_URI | (Optional) Redirect URI for Linear connector OAuth callback (e.g., http://localhost:8000/api/v1/auth/linear/connector/callback)
NOTION_CLIENT_ID | (Optional) Notion OAuth client ID
NOTION_CLIENT_SECRET | (Optional) Notion OAuth client secret
NOTION_REDIRECT_URI | (Optional) Redirect URI for Notion connector OAuth callback (e.g., http://localhost:8000/api/v1/auth/notion/connector/callback)
SLACK_CLIENT_ID | (Optional) Slack OAuth client ID
SLACK_CLIENT_SECRET | (Optional) Slack OAuth client secret
SLACK_REDIRECT_URI | (Optional) Redirect URI for Slack connector OAuth callback (e.g., http://localhost:8000/api/v1/auth/slack/connector/callback)
TEAMS_CLIENT_ID | (Optional) Microsoft Teams OAuth client ID
TEAMS_CLIENT_SECRET | (Optional) Microsoft Teams OAuth client secret
TEAMS_REDIRECT_URI | (Optional) Redirect URI for Teams connector OAuth callback (e.g., http://localhost:8000/api/v1/auth/teams/connector/callback)

(Optional) Backend LangSmith Observability:

ENV VARIABLE | DESCRIPTION
LANGSMITH_TRACING | Enable LangSmith tracing (e.g., true)
LANGSMITH_ENDPOINT | LangSmith API endpoint (e.g., https://api.smith.langchain.com)
LANGSMITH_API_KEY | Your LangSmith API key
LANGSMITH_PROJECT | LangSmith project name (e.g., surfsense)
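
If you want LangSmith tracing, a minimal sketch of these four variables (with a placeholder API key and the example project name from the table above) is:

LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT=https://api.smith.langchain.com
LANGSMITH_API_KEY=your-langsmith-api-key
LANGSMITH_PROJECT=surfsense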

(Optional) Uvicorn Server Configuration

ENV VARIABLE | DESCRIPTION | DEFAULT VALUE
UVICORN_HOST | Host address to bind the server | 0.0.0.0
UVICORN_PORT | Port to run the backend API | 8000
UVICORN_LOG_LEVEL | Logging level (e.g., info, debug, warning) | info
UVICORN_PROXY_HEADERS | Enable/disable proxy headers | false
UVICORN_FORWARDED_ALLOW_IPS | Comma-separated list of allowed IPs | 127.0.0.1
UVICORN_WORKERS | Number of worker processes | 1
UVICORN_ACCESS_LOG | Enable/disable access log (true/false) | true
UVICORN_LOOP | Event loop implementation | auto
UVICORN_HTTP | HTTP protocol implementation | auto
UVICORN_WS | WebSocket protocol implementation | auto
UVICORN_LIFESPAN | Lifespan implementation | auto
UVICORN_LOG_CONFIG | Path to logging config file or empty string |
UVICORN_SERVER_HEADER | Enable/disable Server header | true
UVICORN_DATE_HEADER | Enable/disable Date header | true
UVICORN_LIMIT_CONCURRENCY | Max concurrent connections |
UVICORN_LIMIT_MAX_REQUESTS | Max requests before worker restart |
UVICORN_TIMEOUT_KEEP_ALIVE | Keep-alive timeout (seconds) | 5
UVICORN_TIMEOUT_NOTIFY | Worker shutdown notification timeout (sec) | 30
UVICORN_SSL_KEYFILE | Path to SSL key file |
UVICORN_SSL_CERTFILE | Path to SSL certificate file |
UVICORN_SSL_KEYFILE_PASSWORD | Password for SSL key file |
UVICORN_SSL_VERSION | SSL version |
UVICORN_SSL_CERT_REQS | SSL certificate requirements |
UVICORN_SSL_CA_CERTS | Path to CA certificates file |
UVICORN_SSL_CIPHERS | SSL ciphers |
UVICORN_HEADERS | Comma-separated list of headers |
UVICORN_USE_COLORS | Enable/disable colored logs | true
UVICORN_UDS | Unix domain socket path |
UVICORN_FD | File descriptor to bind to |
UVICORN_ROOT_PATH | Root path for the application |

Refer to the .env.example file for all available Uvicorn options and their usage. Uncomment and set them in your .env file as needed.
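
For example, to run the API behind a reverse proxy with a couple of workers, you might uncomment and set something like the following (illustrative values only; every variable is described in the table above):

UVICORN_HOST=0.0.0.0
UVICORN_PORT=8000
UVICORN_WORKERS=2
UVICORN_PROXY_HEADERS=true
UVICORN_FORWARDED_ALLOW_IPS=127.0.0.1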

For more details, see the Uvicorn documentation.

2. Install Dependencies

Install the backend dependencies using uv:

Linux/macOS:

# Install uv if you don't have it
curl -fsSL https://astral.sh/uv/install.sh | bash

# Install dependencies
uv sync

Windows (PowerShell):

# Install uv if you don't have it
iwr -useb https://astral.sh/uv/install.ps1 | iex

# Install dependencies
uv sync

Windows (Command Prompt):

# Install dependencies with uv (after installing uv)
uv sync

3. Start Redis Server

Redis is required for the Celery task queue. Start the Redis server:

Linux:

# Start Redis server
sudo systemctl start redis

# Or if using Redis installed via package manager
redis-server

macOS:

# If installed via Homebrew
brew services start redis

# Or run directly
redis-server

Windows:

# Option 1: If using Redis on Windows (via WSL or Windows port)
redis-server

# Option 2: If installed as a Windows service
net start Redis

Alternative for Windows - Run Redis in Docker:

If you have Docker Desktop installed, you can run Redis in a container:

# Pull and run Redis container
docker run -d --name redis -p 6379:6379 redis:latest

# To stop Redis
docker stop redis

# To start Redis again
docker start redis

# To remove Redis container
docker rm -f redis

Verify Redis is running by connecting to it:

redis-cli ping
# Should return: PONG

4. Start Celery Worker

In a new terminal window, start the Celery worker to handle background tasks:

If using uv:

# Make sure you're in the surfsense_backend directory
cd surfsense_backend

# Start Celery worker
uv run celery -A celery_worker.celery_app worker --loglevel=info --concurrency=1 --pool=solo

If using pip/venv:

# Make sure you're in the surfsense_backend directory
cd surfsense_backend

# Activate virtual environment
source .venv/bin/activate  # Linux/macOS
# OR
.venv\Scripts\activate     # Windows

# Start Celery worker
celery -A celery_worker.celery_app worker --loglevel=info --concurrency=1 --pool=solo

Optional: Start Flower for monitoring Celery tasks:

In another terminal window:

# If using uv
uv run celery -A celery_worker.celery_app flower --port=5555

# If using pip/venv (activate venv first)
celery -A celery_worker.celery_app flower --port=5555

Access Flower at http://localhost:5555 to monitor your Celery tasks.

5. Start Celery Beat (Scheduler)

In another new terminal window, start Celery Beat to enable periodic tasks (like scheduled connector indexing):

If using uv:

# Make sure you're in the surfsense_backend directory
cd surfsense_backend

# Start Celery Beat
uv run celery -A celery_worker.celery_app beat --loglevel=info

If using pip/venv:

# Make sure you're in the surfsense_backend directory
cd surfsense_backend

# Activate virtual environment
source .venv/bin/activate  # Linux/macOS
# OR
.venv\Scripts\activate     # Windows

# Start Celery Beat
celery -A celery_worker.celery_app beat --loglevel=info

Important: Celery Beat is required for the periodic indexing functionality to work. Without it, scheduled connector tasks won't run automatically. The schedule interval can be configured using the SCHEDULE_CHECKER_INTERVAL environment variable.
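
For example, to have the scheduler check for due connector tasks every five minutes instead of every minute, set this in the backend .env (see the variable description in the table above):

SCHEDULE_CHECKER_INTERVAL=5m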

6. Run the Backend

Start the backend server:

If using uv:

# Run without hot reloading
uv run main.py

# Or with hot reloading for development
uv run main.py --reload

If using pip/venv:

# Activate virtual environment if not already activated
source .venv/bin/activate  # Linux/macOS
# OR
.venv\Scripts\activate     # Windows

# Run without hot reloading
python main.py

# Or with hot reloading for development
python main.py --reload

If everything is set up correctly, you should see output indicating the server is running on http://localhost:8000.
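
To double-check from another terminal, you can probe the API directly. This assumes the default port and that FastAPI's interactive docs are enabled (they are by default at /docs):

# Should return an HTTP 200 status
curl -I http://localhost:8000/docs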

Frontend Setup

1. Environment Configuration

Set up the frontend environment:

Linux/macOS:

cd surfsense_web
cp .env.example .env

Windows (Command Prompt):

cd surfsense_web
copy .env.example .env

Windows (PowerShell):

cd surfsense_web
Copy-Item -Path .env.example -Destination .env

Edit the .env file and set:

ENV VARIABLE | DESCRIPTION
NEXT_PUBLIC_FASTAPI_BACKEND_URL | Backend URL (e.g., http://localhost:8000)
NEXT_PUBLIC_FASTAPI_BACKEND_AUTH_TYPE | Same value as the backend AUTH_TYPE, i.e., GOOGLE for OAuth with Google, LOCAL for email/password authentication
NEXT_PUBLIC_ETL_SERVICE | Document parsing service (should match the backend ETL_SERVICE): UNSTRUCTURED, LLAMACLOUD, or DOCLING. Affects the supported file formats in the upload interface
NEXT_PUBLIC_ELECTRIC_URL | URL for the Electric-SQL service (e.g., http://localhost:5133)
NEXT_PUBLIC_ELECTRIC_AUTH_MODE | Electric-SQL authentication mode (default: insecure)
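
Putting it together, a local-development frontend .env matching a backend configured with AUTH_TYPE=LOCAL and ETL_SERVICE=DOCLING might look like this (example values from the table above):

NEXT_PUBLIC_FASTAPI_BACKEND_URL=http://localhost:8000
NEXT_PUBLIC_FASTAPI_BACKEND_AUTH_TYPE=LOCAL
NEXT_PUBLIC_ETL_SERVICE=DOCLING
NEXT_PUBLIC_ELECTRIC_URL=http://localhost:5133
NEXT_PUBLIC_ELECTRIC_AUTH_MODE=insecure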

2. Install Dependencies

Install the frontend dependencies:

Linux/macOS:

# Install pnpm if you don't have it
npm install -g pnpm

# Install dependencies
pnpm install

Windows:

# Install pnpm if you don't have it
npm install -g pnpm

# Install dependencies
pnpm install

3. Run the Frontend

Start the Next.js development server:

Linux/macOS/Windows:

pnpm run dev

The frontend should now be running at http://localhost:3000.

Browser Extension Setup (Optional)

The SurfSense browser extension allows you to save any webpage, including those protected behind authentication.

1. Environment Configuration

Linux/macOS:

cd surfsense_browser_extension
cp .env.example .env

Windows (Command Prompt):

cd surfsense_browser_extension
copy .env.example .env

Windows (PowerShell):

cd surfsense_browser_extension
Copy-Item -Path .env.example -Destination .env

Edit the .env file:

ENV VARIABLE | DESCRIPTION
PLASMO_PUBLIC_BACKEND_URL | SurfSense backend URL (e.g., http://127.0.0.1:8000)
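
For a local setup this typically reduces to a single line pointing at the backend you started earlier:

PLASMO_PUBLIC_BACKEND_URL=http://127.0.0.1:8000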

2. Build the Extension

Build the extension for your browser using the Plasmo framework.

Linux/macOS/Windows:

# Install dependencies
pnpm install

# Build for Chrome (default)
pnpm build

# Or for other browsers
pnpm build --target=firefox
pnpm build --target=edge

3. Load the Extension

Load the extension in your browser's developer mode and configure it with your SurfSense API key.

Verification

To verify your installation:

  1. Open your browser and navigate to http://localhost:3000
  2. Sign in with your Google account (or with your email and password if you set AUTH_TYPE=LOCAL)
  3. Create a search space and try uploading a document
  4. Test the chat functionality with your uploaded content

Troubleshooting

  • Database Connection Issues: Verify your PostgreSQL server is running and pgvector is properly installed (see the check after this list)
  • Redis Connection Issues: Ensure Redis server is running (redis-cli ping should return PONG). Check that CELERY_BROKER_URL and CELERY_RESULT_BACKEND are correctly set in your .env file
  • Celery Worker Issues: Make sure the Celery worker is running in a separate terminal. Check worker logs for any errors
  • Authentication Problems: Check your Google OAuth configuration and ensure redirect URIs are set correctly
  • LLM Errors: Confirm your LLM API keys are valid and the selected models are accessible
  • File Upload Failures: Validate your ETL service API key (Unstructured.io or LlamaCloud) or ensure Docling is properly configured
  • Windows-specific: If you encounter path issues, ensure you're using the correct path separator (\ instead of /)
  • macOS-specific: If you encounter permission issues, you may need to use sudo for some installation commands
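
For database issues, one quick check that the connection works and that pgvector is actually installed (assuming the postgres user and the surfsense database from the DATABASE_URL example):

# Prints a row with the extension name and version if pgvector is installed
psql -U postgres -d surfsense -c "SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';"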

Next Steps

Now that you have SurfSense running locally, you can explore its features:

  • Create search spaces for organizing your content
  • Upload documents or use the browser extension to save webpages
  • Ask questions about your saved content
  • Explore the advanced RAG capabilities

For production deployments, consider setting up:

  • A reverse proxy like Nginx (see the sketch after this list)
  • SSL certificates for secure connections
  • Proper database backups
  • User access controls
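
As an illustration of the reverse-proxy piece, a minimal Nginx server block for the backend might look like the sketch below. It assumes the backend runs on 127.0.0.1:8000 and is served at api.yourdomain.com (matching the BACKEND_URL example); TLS and the equivalent frontend block are omitted, and you would also set UVICORN_PROXY_HEADERS=true and BACKEND_URL accordingly:

server {
    listen 80;
    server_name api.yourdomain.com;

    location / {
        # Forward requests to the SurfSense backend
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}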
