MusicBrainz andmebaasi importimine PostgreSQL-i28. May '14

Käesolevas artiklis püüame importida MusicBrainz andmebaasi oma isiklikus käsutuses olevasse Ubunu/Debian masinasse.

Teeme väljavõtte MusicBrainz serveri Git repositooriumist:

git clone https://github.com/metabrainz/musicbrainz-server/

Loome mbrainz nimelise kasutaja Postgre andmebaasi, selleks kõigepealt logida administratiivkasutajana andmebaasi:

psql postgres postgres

Loome MusicBrainz andmebaasi ning lisame selle kasutamise jaoks ka rolli:

-- Loo roll nimega mbrainz ning sea selle parooliks salakala
CREATE ROLE mbrainz LOGIN PASSWORD 'salakala';

-- Loo andmebaas musicbrainz
CREATE DATABASE musicbrainz;

-- Anna mbrainz rollile kõik õigused (INSERT, SELECT, jms) selles andmebaasis
GRANT ALL ON DATABASE musicbrainz TO mbrainz;
GRANT ALL ON SCHEMA musicbrainz TO mbrainz;

-- Lülitu andmebaasilt postgres üle andmebaasile musicbrainz
\connect musicbrainz;

-- Luba musicbrainz andmebaasis cube laiendus
CREATE EXTENSION cube;

-- Loome nimeruumi musicbrainz kuhu istutatakse hiljem mõned
-- funktsioonid
CREATE SCHEMA musicbrainz;

-- Seame musicbrainz andmebaasil vaikimisi nimeruumi
-- otsingusse musicbrainz schema
ALTER DATABASE musicbrainz SET search_path TO musicbrainz,public;

-- Loome tabelid musicbrainz-server repositooriumist
\i musicbrainz-server/admin/sql/CreateTables.sql
\q

Väljume musicbrainz-server kataloogist:

cd ..

Laadime alla MusicBrainz andmebaasi dump-i. Tegu on sisuliselt hunniku kokku pakitud CSV failidega:

wget http://ftp.musicbrainz.org/pub/musicbrainz/data/fullexport/20130713-003116/mbdump.tar.bz2

Pakime lahti:

tar xvjf mbdump.tar.bz2
cd mbdump

Impordime:

mkdir -p ../done
for t in * ; do
    echo `date` $t ; echo "\\copy $t from ./$t" | \
        psql musicbrainz postgres && mv $t ../done/ ;
done ;
echo `date` Done

Võtame lahti uuesti Postgres käsurea:

cd musicbrainz-server
psql musicbrainz postgres

Lisame primaarvõtmed ning indeksid, muidu on üleüldse lootusetu nendest andmekogustest midagi välja filtreerida:

-- Lisa primaarvõtmed
\i admin/sql/CreatePrimaryKeys.sql

-- Lisa funktsioonid
\i admin/sql/CreateFunctions.sql

-- Kommenteeri välja järgnevas failis indeksid mis kasutavad
-- paigaldamata laiendusi musicbrainz_collate, page_index jms
\i admin/sql/CreateIndexes.sql

-- Lisa välised võtmed
\i admin/sql/CreateFKConstraints.sql

Nüüd on kõik andmebaasi kitsendused paigas, kuid kettal valitseb paras segadus, selle vastu aitab VACUUM käsk:

VACUUM ANALYZE;

Kui osaliselt on opereeritud postgres kasutajana, võib juhtuda, et mõni tabel pole kättesaadav kasutajale mbrainz:

GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA musicbrainz TO mbrainz;

Nüüdseks on kõige hullem möödas ja võib ennast lõbustada päringute tegemisega sellel andmebaasil

Leiame artistide nimede tabelist vaste bändile Queen:

SELECT
    id, comment
FROM artist
WHERE name = (
    SELECT id
    FROM artist_name
    WHERE NAME LIKE 'Queen');

Käitatud päring annab artistide nimede tabelist vasteks rea identifikaatoriga 192:

   id   |                 gid                  |    comment
--------+--------------------------------------+---------------
 701492 | 420ca290-76c5-41af-999e-564d7c71f1a7 | 板野友美
    192 | 0383dadf-2a4e-4d10-a46a-e9e041da8eb3 | UK rock group
 992994 | 5eecaf18-02ec-47af-a4f2-7831db373419 | US rapper
(3 rows)

Siinkohal kommentaariks, et saadud nimekiri on sisuliselt sama kui kasutaja kirjutab MusicBrainz-i lehel otsingusse 'Queen' ning peale seda valib otsingutulemustest meelepärase kirje:

Järgnevalt otsime artistide tabelist vasteid kus oleks viide samale nimele:

SELECT
    release_name.name AS album_name
FROM release
JOIN release_name
    ON release_name.id = release.name
WHERE release.artist_credit = 192
LIMIT 20;

Albumite nimekiri:

      album_name
----------------------
 The Works
 Jazz
 The Game
 A Kind of Magic
 A Night at the Opera
 A Day at the Races
 News of the World
 Made in Heaven
 The Game
 Flash Gordon
 Jazz
 Sheer Heart Attack
 Live Killers
 The Works
 The Game
 Live at Wembley ’86
 Flash Gordon
 Greatest Hits II
 Made in Heaven
 Headlong
(20 rows)

Keerukama ühendpäringuga saab klapitada kokku lugude nimed ja albumite nimed:

SELECT DISTINCT
    release.id AS "Release mbid",
    release_name.name AS "Album name",
    medium.position AS "Disc number",
    track.position AS "Track number",
    track_name.name AS "Track name"
FROM release
JOIN release_name
    ON release_name.id = release.name
JOIN medium
    ON medium.release = release.id
JOIN track
    ON track.medium = medium.id
JOIN track_name
    ON track.name = track_name.id
WHERE release.artist_credit = 192
ORDER BY
    release_name.name, release.id, medium.position, track.position
LIMIT 18;

Annab vasteks:

 Release mbid |            Album name            | Disc number | Track number |             Track name
--------------+----------------------------------+-------------+--------------+-------------------------------------
       727641 | 18 Greatest Hits Live From Queen |           1 |            1 | Killer Queen
       727641 | 18 Greatest Hits Live From Queen |           1 |            2 | Now I'm Here
       727641 | 18 Greatest Hits Live From Queen |           1 |            3 | In the Lap of the Gods... Revisited
       727641 | 18 Greatest Hits Live From Queen |           1 |            4 | Bohemian Rhapsody
       727641 | 18 Greatest Hits Live From Queen |           1 |            5 | Love of My Life
       727641 | 18 Greatest Hits Live From Queen |           1 |            6 | Somebody to Love
       727641 | 18 Greatest Hits Live From Queen |           1 |            7 | We Will Rock You
       727641 | 18 Greatest Hits Live From Queen |           1 |            8 | We Are the Champions
       727641 | 18 Greatest Hits Live From Queen |           1 |            9 | Crazy Little Thing Called Love
       727641 | 18 Greatest Hits Live From Queen |           1 |           10 | Another One Bites the Dust
       727641 | 18 Greatest Hits Live From Queen |           1 |           11 | Play the Game
       727641 | 18 Greatest Hits Live From Queen |           1 |           12 | Save Me
       727641 | 18 Greatest Hits Live From Queen |           1 |           13 | Under Pressure
       727641 | 18 Greatest Hits Live From Queen |           1 |           14 | Radio Ga Ga
       727641 | 18 Greatest Hits Live From Queen |           1 |           15 | I Want to Break Free
       727641 | 18 Greatest Hits Live From Queen |           1 |           16 | One Vision
       727641 | 18 Greatest Hits Live From Queen |           1 |           17 | A Kind of Magic
       727641 | 18 Greatest Hits Live From Queen |           1 |           18 | Who Wants to Live Forever
(18 rows)

Kasutades gruppeerimist ning aggregaatfunktsioone:

SELECT
    max(track.position) as "Lugude arv",
    track.medium as "Plaadi id",
    max(release_name.name) as "Album"
FROM track
JOIN medium
    ON medium.id = track.medium
JOIN release
    ON release.id = medium.release
JOIN release_name
    ON release_name.id = release.name
WHERE track.artist_credit=192
GROUP BY medium
ORDER BY max(track.position) desc
LIMIT 20;

Saame tabeli kus on albumid millel on kõige enim lugusid ning mille esitajate hulgas on Queen:

 Lugude arv | Plaadi id |                        Album
------------+-----------+-----------------------------------------------------
         38 |    848290 | School Disco
         37 |    789068 | After the Goldrush
         32 |    496633 | Extraordinary
         29 |   1008179 | James Hyman: DJ Mission Vol 007 (License To Thrill)
         29 |    810285 | Highlander
         28 |    810284 | Highlander
         27 |   1252727 | Hungarian Rhapsody: Queen Live in Budapest ’86
         26 |   1139952 | England Expects
         26 |   1225309 | Rock Montreal
         25 |    460813 | We Will Rock You (Deutsche Originalaufnahme)
         25 |    858280 | Wedding in a Box
         25 |   1243577 | We Will Rock You (Deutsche Originalaufnahme)
         24 |    923754 | 70 Greatest Songs (disc 1)
         24 |   1010885 | Queen II / Sheer Heart Attack
         24 |    922278 | Hangmania
         24 |   1330320 | Hungarian Rhapsody: Live in Budapest
         24 |   1257319 | 1979-01-24: Deutschlandhalle, Berlin, Germany
         23 |   1238682 | Jock Rock, Volume 2
         23 |   1272429 | Hungarian Rhapsody: Queen Live in Budapest ’86
         23 |   1257306 | 1977-03-13: Seattle Center Arena, Seattle, WA, USA
(20 rows)

Python-iga liidestamine on üsna lihtne kasutades openpyxl teeki Excel 2007 vormingus failide lugemiseks/kirjutamiseks ning psycopg2 teeki PostgreSQL andmebaasiga liidestamiseks.

#!/usr/bin/python
# coding: utf-8
# Sõltuvuste paigaldamiseks:
# sudo apt-get install python-openpyxl python-psycopg2

QUERY_QUEEN_ALBUMS = """
SELECT DISTINCT
    release.gid AS "Release mbid",
    release_name.name AS "Album name",
    medium.position AS "Disc number",
    track.position AS "Track number",
    track_name.name AS "Track name"
FROM release
JOIN release_name
    ON release_name.id = release.name
JOIN medium
    ON medium.release = release.id
JOIN track
    ON track.medium = medium.id
JOIN track_name
    ON track.name = track_name.id
WHERE release.artist_credit = 192
ORDER BY
    release_name.name, release.gid, medium.position, track.position
"""

# Ühendume PostgreSQL andmebaasi
import psycopg2
conn = psycopg2.connect(dbname="musicbrainz", user="mbrainz")
cur = conn.cursor()
cur.execute(QUERY_QUEEN_ALBUMS)

# Loome uue Exceli workbook-i
from openpyxl import Workbook
workbook = Workbook()
worksheet = wb.get_active_sheet()

# Kirjutame päise
for index, header in enumerate(cur.description):
    worksheet.cell(column=index, row=0).value = header.name

# Kirjutame andmed
row_index = 1
while True:
    row = cur.fetchone()
    if not row:
        break
    print "Kirjutan rea:", row
    for column_index, cell in enumerate(row):
        ws.cell(column=column_index, row=row_index).value = cell
    row_index += 1

# Sulgeme ühenduse PostgreSQL andmebaasi
cur.close()
conn.close()

# Salvestame Exceli workbook-i kettale
workbook.save("albumid.xlsx")

Käesolev juhend on kokku lapitud mitmest allikast tuginedes peaasjalikult MusicBrainzi vikile 1.

1

https://wiki.musicbrainz.org/History:Database_Installation

Postgres MusicBrainz SQL