Methodology & Codebook
How the dataset was collected, how sentiments were classified, and known limitations.
1. Dataset Scope
This dashboard analyses YouTube comments related to Canadian immigration collected between 2013 and 2026. The corpus consists of:
- 113 channels — major Canadian news outlets (CBC, CTV, Global News), international news (CNN, TRT World, Firstpost, WION), and immigration-focused creators.
- 362 videos — selected by keyword search and channel expansion from seeds including: Canadian immigration, Canadian visas, deportation, immigrant settlement, anti-immigration, Indians in Canada, Punjabis in Canada.
- 508,577 top-level comments and 87,965 replies from unique commenter accounts.
- Temporal distribution is heavily skewed toward 2023–2025, reflecting YouTube's engagement patterns and the collection window.
Important limitation: This corpus is a purposive sample, not a representative sample of all YouTube discourse on Canadian immigration. Channel expansion may have introduced non-immigration content. Findings should be interpreted as patterns within this collected corpus, not as representative of all Canadian public opinion.
2. Data Collection Pipeline
Stage A — Keyword seed: YouTube search API queries using immigration-related keywords. Matching video URLs saved as seed list.
Stage B — Channel expansion: For each video in the seed, the publishing channel was identified and additional videos from that channel collected. This captures channel-level discourse but may include non-immigration content.
Stage C — Comment & reply collection: YouTube Data API v3 was used to collect all top-level comment threads, including replies, for each video. Comments are paginated and stored in full.
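The pagination in Stage C can be sketched as follows. This is a minimal illustration, not the production collector: the `fetch_page` callable is injected so the sketch runs without credentials, and in production it would wrap a call such as `youtube.commentThreads().list(...)` from `google-api-python-client`.

```python
def iter_comment_pages(fetch_page):
    """Yield every top-level comment thread for a video, following
    pageToken pagination as the YouTube Data API v3 returns it.

    `fetch_page` takes a page token (or None for the first page) and
    returns a dict with "items" and an optional "nextPageToken".
    """
    token = None
    while True:
        page = fetch_page(token)
        yield from page.get("items", [])
        token = page.get("nextPageToken")
        if token is None:
            break

# Fake two-page response illustrating the shape of API results.
pages = {
    None: {"items": [{"id": "c1"}, {"id": "c2"}], "nextPageToken": "p2"},
    "p2": {"items": [{"id": "c3"}]},
}
comments = list(iter_comment_pages(lambda tok: pages[tok]))
```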
Stage D — Incremental refresh: Subsequent ETL runs use per-video checkpoints (etl_comment_checkpoints) to fetch only comments published after the last collection run, avoiding duplication.
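The checkpoint filter in Stage D amounts to keeping only comments published after the stored timestamp. A sketch of that logic, assuming ISO-8601 `publishedAt` strings as the API returns them (the helper name is ours, not the ETL code's):

```python
from datetime import datetime

def comments_since(comments, checkpoint):
    """Keep only comments published strictly after the per-video
    checkpoint, mirroring what etl_comment_checkpoints enables."""
    def ts(value):
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    if checkpoint is None:
        return list(comments)  # first run: take everything
    cutoff = ts(checkpoint)
    return [c for c in comments if ts(c["publishedAt"]) > cutoff]

batch = [
    {"id": "a", "publishedAt": "2025-01-01T00:00:00Z"},
    {"id": "b", "publishedAt": "2025-06-01T00:00:00Z"},
]
fresh = comments_since(batch, "2025-03-01T00:00:00Z")
```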
Stage E — LLM classification: Comment text was batch-classified (500 comments per batch) using an LLM classifier (Gemini / DeepSeek / OpenAI-compatible endpoint) into the 25 sentiment categories described in Section 4. Note: the comment_llm_labels table (which stores richer per-comment labels including identity_frame, migration_stage, stance, and toxicity) is currently empty — advanced LLM labelling has not yet been run on the full corpus.
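The batching step in Stage E is a straightforward chunking of the comment list into groups of 500 per classification request. A minimal sketch (the function name is illustrative):

```python
def batched(items, size=500):
    """Split a list of comments into fixed-size batches for LLM
    classification (500 per request, as in Stage E); the final
    batch holds whatever remains."""
    return [items[i:i + size] for i in range(0, len(items), size)]

batches = batched(list(range(1201)), size=500)
```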
3. Sentiment Classification Model
Sentiment was assigned using a multi-label LLM classifier. Each comment may receive one or more category labels from the 25-category taxonomy (Section 4). Category assignments are stored in the comment_sentiment_categories join table — the current corpus has 1,033,643 category assignments across 469,744 classified comments.
Polarity (Positive / Negative / Ambiguous) is a property of the category, not independently assigned per comment: the corpus-level polarity distribution follows directly from the counts of category assignments.
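Because polarity is fixed per category, a corpus-level polarity distribution can be derived by joining assignments to the taxonomy. A sketch of that derivation, where the category names are illustrative stand-ins, not the actual 25-category taxonomy:

```python
from collections import Counter

# Hypothetical excerpt of the taxonomy: polarity is a fixed
# property of each category, not assigned per comment.
CATEGORY_POLARITY = {
    "gratitude": "Positive",
    "hostility": "Negative",
    "policy_debate": "Ambiguous",
}

def polarity_distribution(assignments):
    """Count polarities over (comment_id, category) rows, as stored
    in the comment_sentiment_categories join table. A multi-label
    comment contributes once per assigned category."""
    return Counter(CATEGORY_POLARITY[cat] for _, cat in assignments)

dist = polarity_distribution([
    ("c1", "gratitude"),
    ("c1", "hostility"),
    ("c2", "policy_debate"),
])
```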
Rationale: Each sentiment record in comment_sentiment includes an LLM-generated rationale explaining why the category was assigned. These are accessible via the Close Reading page.
Reliability note: LLM classifiers at this scale (batch 500) are not manually validated. Category assignments should be treated as exploratory indicators, not ground truth. Researchers conducting publishable analysis should validate a sample of assignments against human coders.
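One common way to validate a sample against human coders is chance-corrected agreement such as Cohen's kappa. A self-contained sketch (the dashboard itself does not currently compute this; category labels shown are illustrative):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two coders (e.g. LLM vs.
    human) on the same sample of comments."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Agreement on 4 comments: 3 matches, one disagreement.
kappa = cohens_kappa(
    ["pos", "neg", "pos", "neg"],
    ["pos", "neg", "neg", "neg"],
)
```

Values near 1 indicate near-perfect agreement; values near 0 indicate agreement no better than chance.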
4. Sentiment Category Codebook
The 25 categories were designed to capture the emotional and discursive register of immigration-related public commentary. Categories are grouped by polarity.
Ambiguous polarity
Negative polarity
Positive polarity
5. Citing This Dataset
When citing findings derived from this dashboard in academic work, record the date of access and the row counts relevant to the cited statistics (visible on the Overview page).
Note: as an evolving live database, row counts and sentiment distributions will change as new data is ingested, so cited figures may not match later states of the corpus.
6. Known Limitations
- Purposive sample: Channels and videos were chosen via keyword search and channel expansion — this is not a random sample of YouTube content.
- Geo NLP noise: Place-name extraction uses NLP entity recognition which produces false positives (e.g., "Quran", "West", "Danielle" appear as city entities). Geographic data should be used indicatively, not as precise counts.
- LLM classification reliability: Category assignments were generated at scale without human validation. Treat as exploratory.
- comment_llm_labels is empty: The richer per-comment LLM label table (with identity_frame, migration_stage, stance, toxicity) has not yet been populated. These fields are reserved for a future classification run.
- Platform bias: YouTube comment sections attract particular demographics and discourse styles. Comments are not representative of the general Canadian public.
- Missing subscriber/view counts: The channels table does not currently store subscriber counts or channel-level view counts — only comment-derived metrics are available.
- ETL errors: Recent ingestion runs (March 2026) have returned errors. Check the Django Admin ETL log for current pipeline status.
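The geo-extraction noise noted above can be mitigated with a simple blocklist filter applied before aggregating geographic counts. A sketch, where the blocklist is built from the false positives mentioned in this section and a production list would be curated against the actual entity output:

```python
# Illustrative blocklist of known-noisy "place" entities produced
# by the NER step (examples taken from the limitations above).
GEO_BLOCKLIST = {"quran", "west", "danielle"}

def filter_place_entities(entities):
    """Drop blocklisted entity strings before counting places,
    leaving plausible city names intact."""
    return [e for e in entities if e.lower() not in GEO_BLOCKLIST]

cleaned = filter_place_entities(["Toronto", "Quran", "Brampton", "West"])
```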