Every video that notewise processes goes through the same sequence of steps. Understanding the pipeline helps you predict output behavior, tune concurrency, and debug problems.

Pipeline flowchart

URL / Video ID


1. Parse & Validate URL


2. Fetch Video Metadata ──── cache hit? ──► VIDEO_SKIPPED
      │                                         │
      ▼                                   output exists &
3. Fetch Transcript                       no --force? → skip


4. Chunk & Generate

      ├── chapters present & duration > 1 hr?
      │         ▼ YES
      │   4a. Chapter-level generation (concurrent)
      │         └── per chapter: chunk → generate → combine

      └── NO: 4b. Flat generation (chunk → generate → combine)


5. Write output artifacts
      ├── Standard video    → <title>.md
      ├── Chapter-aware     → directory with 01_<chapter>.md …
      ├── --quiz            → <title>_quiz.md
      └── --export-transcript → <title>_transcript.txt/.json


6. Persist to SQLite cache


7. Emit VIDEO_SUCCESS / VIDEO_FAILED

Step details

1 · Parse & Validate URL

Module: youtube/parser.py · parse_youtube_url()
Accepts any of these as input:
  • Full YouTube watch URL (https://youtube.com/watch?v=...)
  • Short URL (https://youtu.be/...)
  • Embed URL or Shorts URL
  • Bare 11-character video ID
  • Bare playlist ID (PL, UU, LL, FL, RD, UL, WL, OLAK5uy_)
  • Playlist URL (?list=...)
  • Path to a .txt batch file (one URL per line)
Returns a ParsedURL domain object with url_type ("video" or "playlist"), video_id, and/or playlist_id. For playlists, extract_playlist_videos() expands into individual video IDs before the pipeline starts.
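The dispatch order can be sketched as follows. This is a simplified stand-in, not the real parser: the ParsedURL dataclass here is minimal, and the regexes cover only the common URL shapes listed above.

```python
import re
from dataclasses import dataclass
from typing import Optional

VIDEO_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")
PLAYLIST_PREFIXES = ("PL", "UU", "LL", "FL", "RD", "UL", "WL", "OLAK5uy_")

@dataclass
class ParsedURL:
    url_type: str                      # "video" or "playlist"
    video_id: Optional[str] = None
    playlist_id: Optional[str] = None

def parse_youtube_url(raw: str) -> ParsedURL:
    raw = raw.strip()
    # Bare 11-character video ID
    if VIDEO_ID_RE.match(raw):
        return ParsedURL("video", video_id=raw)
    # Bare playlist ID
    if raw.startswith(PLAYLIST_PREFIXES):
        return ParsedURL("playlist", playlist_id=raw)
    # A ?list= parameter makes the input a playlist URL
    m = re.search(r"[?&]list=([A-Za-z0-9_-]+)", raw)
    if m:
        return ParsedURL("playlist", playlist_id=m.group(1))
    # Watch / short / embed / Shorts URLs all carry an 11-char video ID
    m = re.search(r"(?:v=|youtu\.be/|embed/|shorts/)([A-Za-z0-9_-]{11})", raw)
    if m:
        return ParsedURL("video", video_id=m.group(1))
    raise ValueError(f"Unrecognized YouTube URL: {raw}")
```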
2 · Fetch Video Metadata

Module: youtube/metadata.py · get_video_metadata()
Retrieves title, duration (seconds), and chapter list. Two early-exit checks:
  • Cache hit — if the video ID is in SQLite cache and --force is not set → emit VIDEO_SKIPPED.
  • Output path check — if notes already exist in the output directory and --force is not set → skip.
3 · Fetch Transcript

Module: youtube/transcript.py · fetch_transcript()
Tries languages from the preference list in order (default: ["en"]), falling back to the next if one is unavailable. If a cookie_file is configured, it is used for all requests, enabling private video access.
  • Retries: up to 3 attempts with backoff on network errors.
  • Returns a VideoTranscript with a list of TranscriptSegment (text, start time, duration).
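The retry-and-fallback behavior can be illustrated with a small sketch; fetch here is a hypothetical callable standing in for the real transcript request, and the exception types are assumptions:

```python
import time

def fetch_with_retries(fetch, languages=("en",), attempts=3,
                       base_delay=1.0, sleep=time.sleep):
    """Try each preferred language in order; retry network errors with backoff.

    `fetch(lang)` is assumed to raise ConnectionError on a network failure
    and KeyError when the language has no transcript.
    """
    last_err = None
    for lang in languages:
        for attempt in range(attempts):
            try:
                return lang, fetch(lang)
            except KeyError:
                break                                 # language missing: try the next one
            except ConnectionError as err:
                last_err = err
                sleep(base_delay * 2 ** attempt)      # exponential backoff
    raise RuntimeError("transcript unavailable") from last_err
```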
4 · Chunk & Generate

Module: pipeline/generation.py · StudyMaterialGenerator
Token counting uses LiteLLM's token_counter with the configured model. If the transcript exceeds 4,000 tokens (DEFAULT_CHUNK_SIZE), it is split into chunks with a 200-token overlap, breaking on sentence boundaries where possible.

4a — Chapter-level generation
Activated when the video has chapters and its duration exceeds 3,600 seconds (1 hour).
  • Chapters processed concurrently, up to 3 at a time.
  • Each chapter is independently chunked and reduced to a Markdown document.
  • Events: CHAPTER_GENERATING → CHAPTER_CHUNK_GENERATING → CHAPTER_COMBINING → CHAPTER_COMPLETE
4b — Flat generation
Used for short videos or videos without chapter data.
  • Single-pass for small transcripts, multi-chunk with a combine step for large ones.
  • Events: GENERATION_START → CHUNK_GENERATING → GENERATION_COMBINING → GENERATION_COMPLETE
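The overlap-based splitting in step 4 can be sketched on a pre-tokenized transcript. The real generator counts tokens with LiteLLM's token_counter and prefers sentence boundaries; this illustration just slides a fixed window:

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 4000,
                 overlap: int = 200) -> list[list[str]]:
    """Split a token list into chunks of at most `chunk_size`, where each
    chunk repeats the last `overlap` tokens of the previous one so context
    carries across chunk boundaries."""
    if len(tokens) <= chunk_size:
        return [tokens]                 # small transcript: single-pass
    chunks, start = [], 0
    step = chunk_size - overlap         # advance leaves `overlap` tokens shared
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        start += step
    return chunks
```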
5 · Write output artifacts

Module: pipeline/_artifacts.py

Condition                   Output
Standard video              <OUTPUT_DIR>/<sanitized title>.md
Chapter-aware video         <OUTPUT_DIR>/<title>/01_<chapter>.md, 02_<chapter>.md, …
--quiz                      <title>_quiz.md alongside notes
--export-transcript txt     <title>_transcript.txt
--export-transcript json    <title>_transcript.json
6 · Persist to SQLite cache

Module: storage/repository.py · DatabaseRepository.aupsert_video_cache()
After a successful run, the following are written to SQLite:
  • Video metadata (id, title, duration)
  • Raw transcript text and language
  • Token usage (prompt, completion, total)
  • Cost estimate (USD)
  • Timing (transcript fetch seconds, generation seconds)
  • Model name
This powers skip-if-already-processed logic and the stats / history commands.
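A minimal sketch of the upsert, assuming a plausible schema (the column names here are illustrative, not the real table definition):

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS video_cache (
    video_id          TEXT PRIMARY KEY,
    title             TEXT,
    duration          INTEGER,
    transcript        TEXT,
    language          TEXT,
    prompt_tokens     INTEGER,
    completion_tokens INTEGER,
    total_tokens      INTEGER,
    cost_usd          REAL,
    fetch_secs        REAL,
    gen_secs          REAL,
    model             TEXT
);
"""

def upsert_video_cache(conn: sqlite3.Connection, row: dict) -> None:
    # ON CONFLICT makes a --force re-run overwrite the previous record
    cols = ", ".join(row)
    placeholders = ", ".join(f":{k}" for k in row)
    updates = ", ".join(f"{k}=excluded.{k}" for k in row if k != "video_id")
    conn.execute(
        f"INSERT INTO video_cache ({cols}) VALUES ({placeholders}) "
        f"ON CONFLICT(video_id) DO UPDATE SET {updates}", row)
    conn.commit()
```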
7 · Emit VIDEO_SUCCESS / VIDEO_FAILED

PipelineEvent objects are emitted via the on_event callback throughout the run. The CLI subscribes via PipelineDashboard to update the live terminal UI. At the end, CorePipeline.run() returns a PipelineResult with success_count, failure_count, total_count, errors, and metrics.

See Pipeline Events for the full event type reference.
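Subscribing via on_event can be as simple as this sketch, which tallies events by type the way a dashboard might; PipelineEvent here is a minimal stand-in for the real event object:

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class PipelineEvent:
    """Minimal stand-in: a type name plus an arbitrary payload."""
    type: str                     # e.g. "VIDEO_SUCCESS", "CHUNK_GENERATING"
    payload: dict[str, Any] = field(default_factory=dict)

def make_counter() -> tuple[Callable[[PipelineEvent], None], dict[str, int]]:
    """Build an on_event callback that counts events by type."""
    counts: dict[str, int] = {}
    def on_event(event: PipelineEvent) -> None:
        counts[event.type] = counts.get(event.type, 0) + 1
    return on_event, counts
```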

Concurrency model

notewise uses asyncio throughout.
Semaphore                   Default    Config key
Video-level concurrency     5          MAX_CONCURRENT_VIDEOS
Chapter-level concurrency   3          Code default only
YouTube request rate        10 / min   YOUTUBE_REQUESTS_PER_MINUTE
In batch runs, videos are processed concurrently up to MAX_CONCURRENT_VIDEOS. Within a single long video with chapters, chapters are processed concurrently up to DEFAULT_MAX_CONCURRENT_CHAPTERS. A PipelineSharedState object is passed into CorePipeline during batch runs so all video instances share the same semaphores.
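The video-level semaphore can be sketched as follows; process_one is a hypothetical per-video coroutine, and a single shared Semaphore plays the role PipelineSharedState fills in the real batch runner:

```python
import asyncio

MAX_CONCURRENT_VIDEOS = 5   # config key: MAX_CONCURRENT_VIDEOS

async def process_batch(video_ids, process_one):
    """Run process_one(video_id) for each video, at most
    MAX_CONCURRENT_VIDEOS at a time; results come back in input order."""
    sem = asyncio.Semaphore(MAX_CONCURRENT_VIDEOS)
    async def bounded(vid):
        async with sem:             # shared across all video instances
            return await process_one(vid)
    return await asyncio.gather(*(bounded(v) for v in video_ids))
```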