25 Years of Programming
An open source source for C, C++, OWL, BASIC, MDB, XLS, DOT, and more...
Home   Projects   Up   Sitemap   Search   Blog   Forum+Chat   About Us   Privacy   Terms of Use   Feedback   FAQ   Images   Services   Payments   Humor   Music

Microsoft Access database for a classical music LP record collection, with automated text import routines (RECORDS.MDB, RECORDS.DOC)

The download links, including the zipped data-only table neatly formatted into an HTML file you can view with your browser, are at the bottom of this page.

This project has a lot of macro code in Access Basic and Visual Basic. Links to the pages showing those code listings are farther down this page.

If this database looks like a lot more than you need for your purpose, there is a smaller and simpler database called MyRecords.mdb.

Overview

RECORDS.MDB can be used by itself as a database for records or CDs, as described below, but it is also the basis of a surprisingly successful artificial intelligence or learning systems project that uses Microsoft Access, Microsoft Word, and SQL via dynamic data exchange (DDE) to automatically extract data items from free-form text describing the contents of an LP, intelligently determine which database field each data item belongs in, assemble each set of fields into a database record, and transfer the new record into the RECORDS.MDB database.

In other words, it analyzes the text scanned from an LP album cover and assists with transferring the information to the database.

It is far from being completely automated: the text requires some user-preprocessing, and the program can make mistakes which the user must correct, and it will often leave behind data which the user must assign manually to fields (using procedures that are automated by toolbar buttons).

But it is highly successful at reducing the number of keystrokes required to get the data into the database and reducing the amount of repetitive work.

The Database

As a database, RECORDS.MDB was created for organizing a collection of mostly classical music, which is reflected in its fields and in the functions it performs. Its many queries and one comprehensive user-interactive form show their usefulness best when managing a large collection.

Each record in the database corresponds to one track on a record (or more accurately, one composition on a record). This makes it possible, for example, to create a query showing every performance you have of a particular Beethoven composition or to create a list of everything you have that is by Beethoven. But you don't need to create these queries because the master form will do it for you with a button click.

The database download includes all the table data, not just the structures. It might be fun for browsing, and it is definitely useful for seeing how the form and queries work to manage a large amount of data.

In both versions, if you plan to use the automated routines, RECORDS.MDB must keep its name because the MSWord macros in RECORDS.DOC refer to it by name.

Version differences in the databases

Access 2.0

The database was developed in Access 2.0, so that was where I got the bugs worked out and was satisfied with its performance. Development on this version ended sometime before 2004.

Access 2003

The 2.0 version was imported into Access 2003, where I had to rebuild the ComposersAll form from scratch. I've been using it ever since, and its functionality seems fine.

The Access 2003 version of the Tracks table has a new field, and queries were revised to use it, and there may be other minor differences, but as a stand-alone database, both the Access 2.0 and Access 2003 versions are entirely functional, are very similar to each other, and it makes little difference which you choose.

The Tracks Table

Its fields include:

ID Unique track ID, assigned by Access
MaxID Record number assigned when it was purchased, its order in the collection
RCID Record Company ID#, the product number assigned to this disc in its music catalog
Title Title of the entire record or album
Composer Composer, author, or writer of this composition or track
Composition Title of composition (symphony, concerto, etc.) or title of the track
Orchestra Orchestra
Conductor Conductor
Soloists All soloists and their instruments, concatenated into one field.
RPM RPM (33, 45, 78)
Kp Rating (keep code): 0=not heard, 1=best to 3=average to 5=worst
Opinion My opinion about this track, comments on the composition, performance, etc.
Memo Any other misc information that didn't fit elsewhere.

 

The ComposersAll Form

This form, based on a query which in turn is based on a crosstab query, is a sort of master control panel that shows statistics about the composers represented in the database and provides buttons to easily access the most commonly used functions.

For each composer in the database, the form displays the number of tracks by that composer, the number I've heard, the number I haven't heard, my average rating of the tracks, and the number of tracks that received each rating 1-5. This makes it easy, for example, to find tracks I haven't heard by composers I like, or to avoid composers I don't like. With nearly 600 composers in the database, a database is a lot more reliable than my memory.

Screenshot of the ComposersAll form

The form's buttons:

Composer Detail Place the cursor on any composer's name and then press this button to open a query a showing all the tracks in the database by that composer, sorted by the name of the composition so multiple performances will be grouped together.
By partial name Press this button to enter the partial name of a composer and open a query showing all database records that match the string, sorted by the composer name and then by composition name. E.g. Bach finds all tracks by all the Bachs, plus any other references to Bach.
Find in .DOC Place the cursor on any composer's name and then click this button to open RECORDS.DOC (the Word document that contains the text of records that have been scanned but not yet transferred to the database) and find the first occurrence of the selected composer within the document. The MSWord Repeat Find command will find subsequent occurrences. Before you use "Find in .DOC", you will probably have to edit that button's event code to supply it with the actual location of RECORDS.DOC on your computer.
See HeldOuts I pulled some highly rated records out of the collection and put them on a convenient shelf for easy access. This opens the table containing the ID numbers of those records. The button can be revised to open any other table or query more meaningful to you.
See 1 Record Press this button to enter the MaxID of a particular album and open a query showing all its tracks.
Compare Records Same as "See 1 Record", but for two or more records. This opens a query in design view in which you enter the MaxID's of the albums you want to view. This was useful when I wanted to hear a composition of which I had multiple performances; I could choose the one most likely to be good, based on the other contents of the albums.
Open RECORDS.DOC This hyperlink only exists on the Access 2003 form, and does what it says, opens the document.

The form also has some filters:

If you enable the lines of the the AutoExec macro in the database, the ComposersAll form will be opened automatically when you open the database.

The automated data import and transfer methods project

The easiest way to describe this project in greater detail, how it works, and thus how to use it, is to explain how it evolved:

When I became the temporary custodian of a collection of over 3000 classical music LPs, it was a good opportunity to undertake a systematic study of classical music. The collection was thoroughly indexed and cross-referenced on paper in a set of ring binders. But I also wanted to record details about the compositions, notes about them, provide my own ratings and opinions, and be able to search the collection using search keys for which paper indexes hadn't been created: the data needed to be on the computer! But the indexes had been typed on a manual typewriter using a cloth mesh ribbon, and in numerous experiments they proved completely unreadable by OCR. So I turned to the only automated method I could think of for getting the data, scanning the album covers. I eventually wound up with an enormous text file containing the record data, but it was all free-form text, with no regular pattern at all, and the task of cutting and pasting individual data items into the database manually promised to be even more difficult than typing it all in by hand to start with. Thus the automation began.

The document RECORDS.DOC contains the free-form text scanned from the records. At the top of the document is a table containing all the field names from the RECORDS.MDB database, in the same order that they appear in the database. If you append a row to this table and fill it with the appropriate data (by dragging text into its cells), you can then cut the row, task-switch to the .MDB, and Paste Append the data to the end of the Tracks table.

RECORDS.DOC ready to process a record:

Screenshot of Records.doc

Creating a toolbar button for each of the cells (fields) allowed me to select text and merely click the appropriate button (e.g. Composer), to have that text moved into the corresponding field in the table.

One piece of data (MaxID, the record's number assigned to it when it was bought) always appeared in the same location in the file (on the first line), so I created a macro ProcessOneRecord, invoked by pressing the Start button. And the first thing it did was grab the MaxID from the first line and move it to the table without my telling it to. Other items proved equally easy to parse ("78rpm", etc.), and they became automated. Other items were almost universal defaults (33 rpm), and could be auto-entered into the table even if that specific text wasn't present at all, as long as contradictory data (78rpm) hadn't been found.

Other data (RCID, the record company album ID numbers) had a predictable format (e.g. "M-1392" = M-nnnn) that MSWord could easily identify with a pattern-matching search, so I added a section to the macro to identify and move the most commonly occurring of these patterns into the RCID field.

There are two common formats for listing the contents of a classical LP, and I gave each a name:

Tab1C (a 1-column table) =

Composer
Composition1
Composition2

Tab2C (a 2-column table) =

Composer1 : Composition
Composer2 : Composition

It takes many keystrokes to transfer this data into the table at the top of the document. So I created a macro for each format. All you have to do is select the text to process (see instructions below), press the appropriate toolbar button, and the macro will create a small table, transfer the selected text into it, auto-fill any repeating entries (the composer name in Tab1C), and transfer this table into the correct columns of the table at the top of the document!

Many LP records have multiple compositions on them, which means they generate multiple database records. In this system, the database records are built as multiple rows in the table at the top of the document. Some of the data extracted from the text (RCID, RPM, etc.) applies to all of the rows. Once it has been entered into the top row, it can be auto-filled into all the rows below it. Other data (Composer, Conductor, etc.) often but not always applies to multiple rows, and can likewise be auto-filled into all the rows below it, but only until it hits a non-empty cell, at which point it must auto-fill that new data into all the rows below that point. All this is the function of the FILLDOWN button, which is the last button you press before your final review. Once you are certain that the data in the table is correct, you press the EXPORT button, which transfers the table's rows one at a time (this proved to be easiest and most reliable) over to the RECORDS.MDB database using Dynamic Data Exchange commands submitted to MSAccess.

Backing up a bit...

At one point I realized that I had used MSWord's pattern matching abilities a lot to identify the RCID's, as described above, and every new one I added required changes to the macro code. It made more sense to create a database table of these patterns, to which I could more easily add new entries as I encountered them. All I had to do was revise the macro to retrieve these entries one at a time (by DDE) to determine if each appears in the text, and if it does, to transfer it to the RCID field.

But once that was done, I realized there were LOTS more patterns that could be matched, and they were not all RCID's. For example, given the text "The ... Orchestra", it is near certain that this constitutes the name of an orchestra, which belongs in the Orchestra field. So I added a column to the database table PatternMatchStrings which indicated what database field that text should be transferred into when it matches the corresponding pattern. And the macro was revised to make use of this information. But there's more! The phrase "Conducted by" is extremely common and is near-certain followed by the name of the conductor (usually two names, first and last), which belongs in the Conductor field. But in this case the problem is that the phrase "Conducted by" is garbage that you don't want in the database. So I added to PatternMatchStrings yet another column, which indicates what portion of the matched phrase is garbage and should be discarded before the rest is transferred to the field table at the top of the document. And the macro was revised to make use of this information. Examining the PatternMatchStrings table, you'll find many more entries that serve as reliable tags about the informational content of various phrases.

Screenshot of the PatternMatch table in Records.mdb

This was all very exciting and useful, but at this point the procedures demonstrate a lot of automation but not much intelligence, and there was a huge source of information that I had not tapped at all, the database itself as it was being built!

Each time the macros processed a record, they left behind in the text information that it had not been able to recognize, and that I had to post to the table myself. Example: move "Bruno Walter" to the Conductor field. Once I've done that, and once the record has been transferred to RECORDS.MDB, Bruno Walter has been categorized, as a conductor. The database has learned something! If only the macros could make use of this! But they can, and it was easy! Just write a macro that retrieves the name of each conductor in the database, and search for it in the text. If found, move it to the Conductor field. So I only have to manually process Bruno Walter once. After that, he will always be recognized automatically.

But if it works for conductors, why not orchestras? Why not composers? Why not soloists? Why not compositions? Yes, yes, yes, to all of them. Every piece of information in the database is categorized simply because it's in the field that it's in. And as a general rule, if you ever encounter that same information again, it most often belongs in that same field.

But implementing this method for conductors, orchestras, composers, soloists, and compositions was where it got complicated.

The basic problem is this: "Bruno Walter" is a conductor, reliably. But in the database, which by this time had grown quite large, he was in some places listed as "Walter". If the search for conductors was done first, this would sometimes work out fine, as long as no one else named Walter appeared anywhere in the text. But "Walter" would also match Walter Gieseking, who is a pianist, and it would move Walter to the conductor field, leaving Gieseking behind. And the same sorts of abbreviations as "Walter" had been used for many other people and orchestras in the database, in all fields, all over the place. If the Soloist search was instead done first, it might pull out "Gieseking" as the pianist, leaving behind Walter to be miscategorized as the conductor.

One approach might be to devise some automated procedures to regularize and standardize the existing database entries so that they would be reliable, and enforce that policy from that time on. But the purpose of this project was to create a flexible input method that could operate with reasonable accuracy even with varying or incomplete data. That is a substantial part of the problem of extracting data from free-form text, the problem I was trying to solve, so that solution would definitely be a step in the wrong direction.

So I started with this assumption: the longest matching phrase found is probably the most reliable. If Walter Gieseking is in the text, it will match Walter and Gieseking and Walter Gieseking, but the one you should extract is always Walter Gieseking because it won't leave anything behind to be misconstrued. If only Gieseking is in the text, you can extract that. But if only Walter is in the text, it certainly doesn't refer to Gieseking, anyway.

The implementation of this strategy resulted in a monster union query (often thousands of characters long) built by a Word macro, cut into 255-character chunks, and submitted to MSAccess by DDE. The structure for this query can be found as UnionAllByLoc in RECORDS.MDB. (It's not a monster there because it uses a variable Txt. When built by the Word macro, Txt consists of the entire document text being searched for matches, and it's used several times in different parts of the query.) This query identifies ALL the phrases in the database that appear in the text being searched. It then orders them by Location (how early they appear in the text), and by Length of the matching phrase, returning the longest phrase first.

The Word macro thus receives by DDE a list of all the matching phrases, and for each, the name of the database field that the text should be posted to, based on the field in which it was found in the database. The first phrase returned is the one that begins earliest in the text. The macro cuts out and moves that text to the appropriate field in the table. It then retrieves the next phrase, but it will ignore it if its starting point was within the text that just got cut and moved. If two matching phrases start at the same location, the first one that the macro receives will be the longest; that's the one that will be cut, and the subsequent match(es) that started at that same point will be ignored because they occur within the text that was just cut. So as an example, if "Bruno Walter" is in the text, "Bruno Walter" is the match that will be returned first, and "Bruno Walter" will be cut out of the text, and when it is retrieved, the next match "Walter" will be ignored. In this fashion, the macro proceeds through the text until all matches have been either processed or ignored. At that point, the only thing remaining is text that you, the user, must process manually. And anything that you handle manually, such as moving Toscanini to Conductor, will subsequently be recognized and you won't have to do it again.

I call it a learning system because each time you manually assign a phrase to a field, you are teaching that categorization to the database. It becomes part of the database's knowledge, and it subsequently uses its own growing body of knowledge to improve its performance.

This project also demonstrates a principle that I like a lot and have used elsewhere: whenever possible, link together the most capable applications you have available, each assigned to do the things it does best. In this case, MSWord and MSAccess work together to accomplish a task that would be virtually impossible for either to do alone.

The above description is only an overview. Implementing each feature usually required the creation of one or more new macros, sections of a macro, or functions, and figuring out how to deal with multiple troublesome details, many of which you will find documented as you browse the source code which in places has more documentation than code.

Version differences in the data import macros

The automated data import procedures were designed with Access 2.0, Word 6.0 using WordBasic, under Windows For Workgroups 3.11, and it is only with those that the import procedures are tested and are in their best form. However, the Access 2003 project does contain everything that is necessary for the automated data import procedures, and I have been using them with some success, fixing problems as I encounter them. It is this process of encountering problems and fixing them that is not finished. I'm not sure I've found all the places where the old methods don't translate correctly to Office 2003.

One important procedural change had to be made: MSWord 2003 does not seem able to automatically move its assembled data (at the top of the document) directly to MSAccess 2003 by DDE (maybe due to security restrictions), so instead it moves it to a separate Word document (RECORDSTRANSFERTABLE.DOC). As you process a (phonograph) record, the (database) records for it are assembled at the top of the document (as before), but when it comes time to move the data to the database, Word transfers it to the second Word document, appending it to a large table there. At the end of your session, you copy or cut that table from the document, and manually PasteAppend the whole thing into the RECORDS.MDB Tracks table at one time.

The problems of doing a good quality conversion to Office 2003:

  1. Microsoft Word 2003 converts WordBasic statements to Visual Basic by prefixing each statement with "WordBasic.", invoking the outdated but thankfully still present WordBasic engine. This works for most of the converted statements, but apparently not all. But even if it results in working code for the time being, the project cannot be realistically extended using an outdated language. The entire project needs to be converted to Visual Basic.
  2. The security enhancements of Windows XP and Microsoft Access 2003 and Word 2003 appear to make it significantly more difficult for one application to control another using Dynamic Data Exchange. Furthermore, DDE itself, as much as I liked it, appears to be considered inherently insecure and its use discouraged; somewhere I got the idea (maybe erroneous) that it would be phased out, replaced by I don't know what. Either way, there is probably a better technology for extending the project.

I do expect to eventually convert this project to Microsoft Office 2003 under Windows XP, or at least to something more modern. Any suggestions about what newer technologies would be most appropriate for the conversion and extension of the project would be appreciated. Although the methods used in this LP records project are specific to this one use, I believe that once they have been more fully developed into a less experimental form, it should be possible to generalize them to other applications.

View the macro code for the automated routines:

Macro code in AI.DOT for Word 6.0 (and Access 2.0)

Macro code in AI.DOT for Word 2003 (and Access 2003)

How to use the automated data import and transfer methods - procedure

These macros help you analyze scanned (or otherwise non-organized) text, extract the data from it into an organized data table, and export the table entries into a MSAccess database.

  1. Scan text from records into RECORDS.DOC. The first line of each entry must be the MaxID (ID number of the records in the collection). Entries (phonograph records) are separated in the document by Section Breaks. See the existing RECORDS.DOC for the required format. Included in this project are some Microsoft Word files (SCANRECS.DOT, SCANONE.DOC, and SCANNED.DOC) that may help automate the scanning procedure. Instructions for using them are in the SCANRECS.DOT file.)
  2. In Word, RECORDS.DOC, use View | Toolbars to display both the Phonograph toolbars.
  3. Visually review the first record below the data table (Section 2 of the document) and fix any obvious spelling or organization problems.
  4. For each item you know may be mishandled by the automated procedures, select its text and assign it manually to a category by pressing that category's toolbar button. Soloists are a common trouble spot. It takes much less time to anticipate and prevent a mistake than to fix one that the automated procedures have already made. It is important that the full text of each data item be together on one physical line. So if the text says:

Conducted by
Leonard
Bernstein,

you must manually pull it all onto one line. At some times, you will see the automated procedures temporarily alter the zoom setting to allow the maximum amount of text on each line.

  1. Press the START button to begin automated processing of the record. The program will analyze the text and attempt to move the data items it recognizes to the proper columns in the data table.
  2. Manually process any text items left behind. This usually includes these:

Tab1C and Tab2C are the "officially-supported" formats for Composers and Compositions, for which automated procedures exist. If the program fully recognizes a Tab2C line, it processes it automatically. Otherwise, it leaves it behind, selected. It often recognizes only a portion of a Tab1C block, so it always leaves selected the portion it recognized, for you to process.

Tab2C means "Table - 2 Columns", and its format is

Composer : Composition (colon required; if not present, insert it manually)

To process it manually, place the cursor on the given line, or select the multiple sequential line(s) that have the format, and press the Tab2C button (or Ctrl+Shift+2).

Tab1C means "Table - 1 Column", and its format is:

Composer
1st Composition
2nd Composition
etc.

To process it manually, place the cursor on the Composition line (if there is only 1), or select all the Compositions (but not the Composer), and press the Tab1C button (or Ctrl+Shift+1).

  1. Fix in the data table any mistakes made by the automated procedures.
  2. Press FILLDOWNS button to auto-fill repeating entries in the data table and to do some data integrity checks.
  3. Press EXPORT button to transfer the data table lines to the database. (As noted above, in the 2003 version it transfers the data to a table in a 2nd Word document.)
  4. Ensure that the selected (highlighted) text remaining in the record you just processed contains only leftover garbage, and delete it. The next record now automatically becomes the current record. The breaks between records are section breaks, and the "current" record is always Section 2 of the document.

Tips

Procedure for pre-processing the text after scanning:

  1. Scan as many records as possible into records.doc in each session so the following steps only have to be done once:
  2. Delete all garbage and indecipherable or unneeded info.
  3. Change to "title" capitalization.
  4. Spell check until all words in the file are legitimate.

To do

Ensure that the main union query orders its results by frequency of occurrence within a particular category: Leonard Bernstein appears as conductor, composer, and soloist (piano); whenever he is found, the default assignment should be conductor because that's the category where he has the most existing entries. It may already do so, or I may have tried it and found it impossible. Also, revise the method so that the text to search is posted to a control on a form (but what's the maximum length? If 255, use a Memo field in a Transfer table); then you can modify the Union query so it refers to that control (or field) instead of repeated identical instances of the text; then you can run the Union query itself (as it exists in MDB, instead of by poking pieces through DDE), and most important, you can then base a new query on the Union query, e.g. to reorder the records it returns. Also consider how to modify the Union query (or more likely duplicate it in a module) so that it searches all fields, getting their names from the table definition. This would be not for actual use (unnecessarily slow in this application), but for theory, to prove that you can do this same sort of search, in any database, extracting and categorizing any data items without knowing in advance which fields to search: complete generalizability.


Project Downloads

Both packages contain the complete projects: Access database, Word templates, associated documents, and usage notes.

RECORDS2003MDB.ZIP for MSAccess 2003 and Word 2003
About 1032 KB. The most recent version. After download, unzip to a directory of your choosing.
 
RECORDS20MDB.ZIP for MSAccess 2.0 and Word 6.0
About 520 KB. After downloading using a computer that is more recent than Windows 3.11, unzip it to a directory of your choosing and copy the files to your Windows 3.11 computer. If you need a version that is directly downloadable to Windows 3.11, please ask and I can construct one.
 

Related web pages (on this site)

recordsmdbtxt.htm
A series of web pages containing data (about 4,000 database records) from RECORDS.MDB, as plain text. If you're interested in the project, you shouldn't have any need for this page. If you download the database, it has all the data, and it's a lot easier to navigate.

Neatly formatted data-only table

tracksforbrowsingandsearching.zip
Only about 298 KB, but it unzips into a 1.8 MB .htm web page file. It contains the same data as the recordsmdbtxt.htm series of web pages, but formatted into a giant table that you can view with your browser. The data it displays is the output from the database's TracksForBrowsingAndSearching query, which sorts by Composer, Composition, so for example all Beethoven entries are together, and within those all his Symphony 1 entries are together. After downloading this file, unzip it to wherever you like, and Open it with your web browser.

To import the data-only into an application other than Access

tracksforbrowsingandsearchingtab.zip
About 240 KB. If you want to import the data-only into your own database application or Microsoft Excel, this file should allow it. It contains the same 4,000 database entries, with fields separated by TABs. There is no text delimiter (no quotes) because some of the database entries themselves contain quotation marks and others contain apostrophes. After downloading this file, unzip it to wherever you like, and Open it with Excel or Import it into your database application.

Music-related web sites

The Classical Music Archives is a terrific site with a huge collection of classical music files you can listen to.



 

Valid HTML 4.01 Transitional Valid CSS
Yahoo! Search
Search the web Search this site
View content labeling at ICRA.