Title: “Kakek, Cucu, dan Kenangan 01.3gp” (“Grandfather, Grandchild, and Memories”)


3.1 Choose a sampling strategy

| Strategy | When to use |
|----------|-------------|
| Every N seconds (e.g., 1 fps) | Quick overview, low storage |
| Every N‑th frame (e.g., 30) | Fixed‑rate extraction, good for motion analysis |
| Scene‑change based (via PySceneDetect) | Only keep distinct shots |
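To make the first two rows concrete, here is a minimal sketch of which frame indices each strategy would keep. The function names and the example numbers are illustrative assumptions, and the rounding rule is a simplification rather than a description of how ffmpeg selects frames internally.

```python
def every_n_seconds(total_frames: int, fps: float, interval_s: float = 1.0) -> list[int]:
    """Indices of the frames nearest to t = 0, interval_s, 2 * interval_s, ..."""
    if fps <= 0 or interval_s <= 0:
        raise ValueError("fps and interval_s must be positive")
    indices: list[int] = []
    k = 0
    while True:
        idx = round(k * interval_s * fps)  # nearest frame to the target timestamp
        if idx >= total_frames:
            break
        # Very short intervals can round to the same frame twice; keep it once
        if not indices or idx != indices[-1]:
            indices.append(idx)
        k += 1
    return indices


def every_nth_frame(total_frames: int, n: int = 30) -> list[int]:
    """Indices 0, n, 2n, ... regardless of the video's frame rate."""
    if n <= 0:
        raise ValueError("n must be positive")
    return list(range(0, total_frames, n))
```

For a hypothetical 100-frame clip at 25 fps, `every_n_seconds(100, 25.0)` keeps four frames, while `every_nth_frame(100, 30)` keeps four different ones. The two strategies only coincide when N equals the frame rate.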

Below is a simple 1‑fps extractor using ffmpeg (fast, no Python loop):

ffmpeg -i "Kakek, Cucu, dan Kenangan 01.3gp" \
       -vf "fps=1" -qscale:v 2 frames/frame_%04d.jpg
  • -vf "fps=1" → 1 frame per second
  • -qscale:v 2 → high‑quality JPEG (lower = better)

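All frames are written to frames/. Create that directory first (mkdir -p frames), because ffmpeg does not create missing output directories.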
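If the extraction has to be driven from Python, the same command can be wrapped with `subprocess`. This is a minimal sketch: the helper names `build_extract_cmd` and `extract_frames` and the `input.3gp` path in the usage note are hypothetical, and it assumes `ffmpeg` is on the `PATH`.

```python
import subprocess
from pathlib import Path


def build_extract_cmd(video_path: str, out_dir: str = "frames",
                      fps: float = 1.0, quality: int = 2) -> list[str]:
    """Build the ffmpeg argument list without running it."""
    pattern = str(Path(out_dir) / "frame_%04d.jpg")
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps:g}",       # sampling rate in frames per second
        "-qscale:v", str(quality),   # JPEG quality, lower = better
        pattern,
    ]


def extract_frames(video_path: str, out_dir: str = "frames", fps: float = 1.0) -> list[Path]:
    """Run ffmpeg and return the extracted frame paths in order."""
    # ffmpeg does not create the output directory, so create it here
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # Passing a list (no shell) keeps spaces and commas in file names safe
    subprocess.run(build_extract_cmd(video_path, out_dir, fps), check=True)
    return sorted(Path(out_dir).glob("frame_*.jpg"))
```

Calling `extract_frames("input.3gp")` would then return the sorted list of JPEG paths. Keeping command construction separate from execution makes the arguments easy to inspect or log before anything runs.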

5.3 Speech‑to‑Text (optional)

OpenAI Whisper (or any other ASR) works well even on low‑quality audio.

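import librosa
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration
# Use the GPU when available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = WhisperProcessor.from_pretrained("openai/whisper-base")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base").to(device)
def transcribe(wav_path):
    # The processor expects a 16 kHz mono waveform, not a file path
    waveform, _ = librosa.load(wav_path, sr=16000, mono=True)
    # Input is padded or truncated to Whisper's 30-second window;
    # longer recordings need to be split into chunks first
    features = processor(waveform, sampling_rate=16000, return_tensors="pt").input_features.to(device)
    predicted_ids = model.generate(features, max_new_tokens=200)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
transcript = transcribe("audio.wav")
print("=== Transcript ===")
print(transcript[:1000])   # first 1000 chars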

You now have a plain‑text representation that can be:

  • Indexed for keyword search
  • Run through sentiment / topic models
  • Aligned with visual frames (e.g., using forced alignment tools like aeneas)
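As an illustration of the first bullet, a transcript can be turned into a small inverted index that maps each word to the positions where it occurs. This is a minimal sketch using only the standard library; `build_keyword_index` and `search` are hypothetical helper names, and the tokenisation (lowercased runs of word characters) is a deliberate simplification.

```python
import re
from collections import defaultdict


def build_keyword_index(transcript: str) -> dict[str, list[int]]:
    """Map each lowercased word to the word positions where it appears."""
    index: defaultdict[str, list[int]] = defaultdict(list)
    for position, match in enumerate(re.finditer(r"\w+", transcript.lower())):
        index[match.group()].append(position)
    return dict(index)


def search(index: dict[str, list[int]], keyword: str) -> list[int]:
    """Return the word positions of keyword, or an empty list if absent."""
    return index.get(keyword.lower(), [])
```

For real collections a dedicated search library is likely a better fit, but this is enough to look up where a term occurs in a single transcript.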

Chapter 3 – Why Did Grandfather Keep a Secret?

After watching, Bima looked at his grandfather in confusion. “Grandpa, why did you never tell me about Manda? Who is she, really?”

Raden let out a heavy sigh. He looked into his grandson's innocent eyes and said:

“I shut my eyes to it because, back then, our hearts were still wounded. Manda's mother died of an illness that could not be treated, and her father, my brother, disappeared after a quarrel that was never settled. I was afraid to carry the weight of a past that was too heavy, so I kept all of it in the video, where it could no longer disturb the life I have now.”

But he added, “You, Bima, belong to a different generation. We no longer live in the shadow of old grudges. We can draw lessons from the past instead of shutting it away.”


1️⃣ Install the required software

# Core video/audio utilities (Debian/Ubuntu; on macOS or Windows install ffmpeg and MediaInfo via a package manager or the pre‑built binaries)
sudo apt-get update && sudo apt-get install -y ffmpeg mediainfo
# Python environment (recommended: conda)
conda create -n video_feat python=3.11 -y
conda activate video_feat
# Core Python libraries
pip install opencv-python-headless tqdm \
            numpy pandas h5py \
            ffmpeg-python \
            librosa soundfile pydub \
            torch torchvision torchaudio \
            facenet-pytorch \
            scenedetect \
            moviepy \
            "transformers[torch]"  # for Whisper/other ASR models; quoted so the shell does not expand the brackets

Tip: If you only need a tiny subset (e.g., just metadata) you can skip the heavy deep‑learning packages.
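To see which parts of the stack are present before running anything, a quick check like the one below can help. It is a minimal sketch: `check_environment` is a hypothetical helper, and the default tool and module names are assumptions based on the install commands above (note that `opencv-python-headless` is imported as `cv2`).

```python
import importlib.util
import shutil


def check_environment(
    tools: tuple[str, ...] = ("ffmpeg", "mediainfo"),
    modules: tuple[str, ...] = ("cv2", "numpy", "librosa", "torch", "transformers"),
) -> dict[str, bool]:
    """Report which command-line tools and top-level Python modules are available."""
    report: dict[str, bool] = {}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH
        report[tool] = shutil.which(tool) is not None
    for module in modules:
        # find_spec locates a top-level module without importing it
        report[module] = importlib.util.find_spec(module) is not None
    return report
```

Printing `check_environment()` gives a name-to-bool mapping, so a metadata-only workflow can confirm that `ffmpeg` and `mediainfo` are present without importing any deep‑learning package.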