XBMC Library Cleanup Part I - Missing and Duplicate Movies

Detect missing or duplicate movies in the XBMC database with SQL queries and some Python

Table of Contents

SQL Walkthrough
Step 1: Connect to the Video Library
Step 2: Find the idPath – Reveals Missing Movies
Step 3: Find the idFiles – Reveals Duplicate Entries
Automate the Procedure with Python
Code
Example Output

XBMC is an outstanding media player with an attractive and fast user interface. The software manages your media collection almost perfectly, nevertheless, from time to time movies are not properly added one-to-one. The two types of misrelation are:

XBMC does not add the movie to the library at all.
XBMC adds multiple instances of the same movie.

The second error is quite apparent when your browse through the movie collection (see for instance the featured image, where the movie The World’s End appears twice). The first type is harder to notice, especially if you add movies in bulk. There is an excellent add-on, however, called “Missing Movie Scanner” that lists missing media within XBMC.

But even though both types of errors can be detected from within XBMC, you might want to directly query the XBMC video database for the following reasons:

you get a report of errors almost instantly, rather than having to browse the library and record errors separately
you can detect errors without launching XBMC
you can easily automate and extend the procedure, e.g., to automatically delete movies with duplicate entries

The tutorial focuses on how to query the XBMC video database to find out how many times a particular movie will appear in XBMC. Given those SQL queries it’s easy to build a script that generates a report for an entire movie collection; a small proof of concept Python code to do so is shown at the end.

SQL Walkthrough

Step 1: Connect to the Video Library

The database is named MyVideos**.db, where ** denotes the version number. On Windows you usually find the file under C:\Users\USERNAME\AppData\Roaming\XBMC\userdata\Database</code> (see the entry on the XBMC wiki for more information).

To query the database, you’ll need a SQLite interface. Some popular choices on Windows are SQLite3Explorer{.highlight}, the SQLite Manager Extension for Firefox{.highlight} and the plain command-line shell{.highlight} – which I’m going to use in the following.
For the command-line program simply provide the database as the first argument to connect:
>sqlite3.exe <path_to_db>\MyVideos75.db 
SQLite version 3.8.2 2013-12-06 14:53:30 
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite>
Step 2: Find the idPath – Reveals Missing Movies
Let’s assume your movies are in separate folders that match the movie title, e.g.,
X:\Movies\Con Air (1997)\
    Con Air 1997.mk
    extrathumbs\
        thumb1.jpg
        thumb2.jpg
    extrafanart\
        fanart1.jpg
The starting point is the folder that contains the video (X:\Movies\Con Air (1997)). The table path maps the path to a unique idPath:
SELECT idPath FROM path WHERE strPath="X:\Movies\Con Air (1997)\";
/* example result: 483 */
If there isn’t an entry for your movie path, then the movie won’t show up in XBMC. To fix this, make sure the folder name corresponds to the movie title. If that doesn’t help, add a text file movie.nfo to the folder with the following content:
<movie>
    <title>Con Air (1997)</title>
    <sorttitle>Con Air (1997)</sorttitle>
</movie>

http://akas.imdb.com/title/tt0118880/
Remark: If you have DVDs that you store as is, and if keep the VIDEO_TS folder, then add the following to your WHERE-clause to account for both cases:
... OR strPath="X:\Movies\Con Air (1997)\VIDEO_TS\";
Step 3: Find the idFiles – Reveals Duplicate Entries
Given the idPath (for example 483) one can now query the table files to get a list of all movie files:
SELECT strFilename FROM  files WHERE idPath=483;
/* example result: Con Air 1997 (dvdrip).avi */
Even if the movie consists of multiple parts (file stacking{.highlight}) or if your storing raw DVD, you should only get one entry for each movie. Let’s say your movie is split into two parts Gomorra (2008) disc 1.avi and Gomorra (2008) disc 2.avi respectively. XBMC won’t add two entries to the files database like might assume, but rather combines both files into a comma separated list prefixed by ‘stack://’. Here’s the example as it should be stored in the database:
SELECT strFilename FROM files 
WHERE idPath=(
    SELECT idPath FROM path WHERE strPath="X:\Movies\Gomorra (2008)\"
);
/* example result:
   stack://X:\Movies\Gomorra (2008)\Gomorra (2008) disc 1.avi , X:\Movies\Gomorra (2008)\Gomorra (2008) disc 2.avi */
If you get more than one entry for a given path then all those entries will show up as individual movies in XBMC. For instance:
SELECT strFilename FROM files 
WHERE idPath=(
    SELECT idPath FROM path WHERE strPath="X:\Movies\The World's End (2013)"
);
X:\Movies\The World's End (2013)\first part.avi
X:\Movies\The World's End (2013)\second part.avi
indicates a failed file stacking resulting in two database entries and twice the same movie in XBMC:
Automate the Procedure with Python
The following code iterates over all movies in a movie folder and executes the above SQL queries for each one of them. It then reports missing movies or movies added more than once.
Code
import argparse
import sqlite3 as lite
import os
 
def _open_db(db):
    try:
        con = lite.connect(db)
        cur = con.cursor()
        return (cur,con)
    except lite.Error as e:
        print("Could not open database %s: %s" % (db,e))
        quit(1)
 
def find_flaws(db, movie_folder):
    """ find missing or duplicate movies in video library
    
        lookup all movies from a folder in video db
        and find the following flaws:
       
        (1) the movie is not in the libary (won't show up in XBMC)
        (2) the movie has multiple entries (will show up multiple times)
        
        Args:
            db: path to 'MyVideos**.db' (XBMC video library)
            movie_folder: path to directory with movies in separate folders
            
        Returns:
            flaws: dictionary of flaws.
                   key: movie name
                   value: flaw name
    """
 
    (cur,con) = _open_db(db)
    flaws = dict()
    movies = os.listdir(movie_folder)
 
    # iterate over all subdirectories of movies, i.e., over all movies on disk
    for i, movie in enumerate(movies):
        # show progress
        print "\r{0:>3}/{1:<3} {2:<100}:".format(i, len(movies), movie),
        path = os.path.join(movie_folder, movie)
                
        # 1) check table 'path' for movies without 'idPath' (movie is missing)
        #    also checks 'VIDEO_TS' subfolder for DVDs
        cur.execute('SELECT idPath FROM path WHERE strPath=? or ' +
                'strPath like ?', (path + os.sep, 
                 os.path.join(path,"VIDEO_TS") + os.sep))                
        res = cur.fetchone()
        
        if not res:
            flaws[movie] = "missing"
            continue
        else:
            idPath = res[0]
            
        # 2) check table 'files' for multiple entry (movie duplicates)        
        cur.execute('SELECT idFile FROM files WHERE idPath=?', (idPath,))
        idsFile = [tmp[0] for tmp in cur.fetchall()]        
        if len(idsFile) > 1:
            flaws[movie] = "{0} duplicate(s)".format(len(idsFile)-1)            
        elif len(idsFile) == 0:
            flaws[movie] = "missing"
        
    # clean progress line
    print "\r" + 100*" " + "\r",
    con.close()
    return flaws
    
if __name__=="__main__":
    parser = argparse.ArgumentParser(description='find flaws in XBMC video database')
    parser.add_argument('db', help='path to myvideos**.db library')
    parser.add_argument('movie_folder', help='path to movie folder')   
    args = parser.parse_args()
    
    flaws = find_flaws(args.db, args.movie_folder)
    for movie, flaw in sorted(flaws.items()):
        print("{0:<15}: {1} ".format(flaw, movie))
Link to Gist
Example Output
>missing_or_duplicate_movies.py ~/.xbmc/userdata/Database/MyVideos75.db ~/Movies
1 duplicate(s) : A Very Harold & Kumar 3D Christmas (2011)                                                    :
1 duplicate(s) : Devil Seed (2012)
1 duplicate(s) : Hera Pheri (2000)
3 duplicate(s) : Iron Man 3 (2013)
1 duplicate(s) : The Devil's Double (2011)
1 duplicate(s) : The French Connection (1971)
missing        : The Vanishing (1988)
1 duplicate(s) : The World's End (2013)
Archived Comments
Note: I removed the Disqus integration in an effort to cut down on bloat. The following comments were retrieved with the export functionality of Disqus. If you have comments, please reach out to me by Twitter or email.
James Oct 29, 2014 21:06:17 UTCYou pick and choose nice subject in your blog.Your work technique of writing the web log is fantastic
Josef Dec 27, 2014 03:03:31 UTCFantastic article!
I am using mysql as DB backend in kodi....are you considering a part II of your article? maybe dealing with mysql?
Guest Jan 22, 2015 16:13:18 UTCTo list all duplicates in one line:
SELECT c00, c22, count(c00) FROM movie GROUP BY c00 HAVING count(c00) > 1;
Marco Fluffy Hffh Jan 22, 2015 16:23:19 UTCTo list all duplicates in one line:
SELECT c00, c22, count(c00) FROM movie GROUP BY c00, c07 HAVING count(c00) > 1;
fledgeless Jan 26, 2015 20:39:24 UTCWell done. You just have to be careful of non-ascii characters in your directory/file naming as that will cause the script to fail.
tyler durden May 05, 2015 23:53:17 UTCthe python script is not working
getting this error
line 40
print "\r{0:>3}/{1:<3} {2:<100}:".format(i, len(movies), movie),
Johannes May 06, 2015 04:17:01 UTCThat's probably because you use Python 3. Change the line to print("...", end="")
tyler durden May 06, 2015 05:41:59 UTCI don't know how to fix this properly..
So you are saying replacing entire line
-print "\r{0:>3}/{1:<3} {2:<100}:".format(i, len(movies), movie),
with
-print("...", end=""),
Can you post an updated file please.
Also how to use the command in windows 7?
Gleb Zverinskiy Nov 13, 2015 03:28:33 UTCprint ("\r{0:>3}/{1:<3} {2:<100}:").format(i, len(movies), movie),
Benjamin Stürmer Sep 04, 2015 19:30:26 UTCwrote a script to delete duplicates without parsing a local directory.
https://gist.github.com/the...