Given a somewhat peculiar logfile, represented by the following snippet:
FILE (insert): file=Templates\xyz_EN_0615.pdf key=KEY_EN_AP_PAID
FILE (insert): file=Templates\xyz_DE_0615.pdf key=KEY_DE_STD_PAID
FILE (insert): file=Templates\xyz_DE_0615_free.pdf key=KEY_DE_STD_FREE
FILE (insert): file=Templates\xyz_IT_0615.pdf key=KEY_IT_STD_PAID
FILE (insert): file=Templates\xyz_IT_0615_free.pdf key=KEY_IT_STD_FREE
DEBUG: Opening Migration\abc_1.pdf
DEBUG: Opening Templates\xyz_DE_0615_kostenlos.pdf
Jul 31, 2015 5:07:54 PM java.util.prefs.WindowsPreferences <init>
WARNUNG: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
Jul 31, 2015 5:07:55 PM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
WARNUNG: Using fallback font ArialMT for base font ZapfDingbats
DEBUG: Writing Migration\abc_1-migrated.pdf
PERFORMANCE: [OVERALL completed in 2303ms]
DEBUG: Opening Migration\abc_2_DE.pdf
DEBUG: Opening Templates\xyz_DE_0615_free.pdf
Field not available: Reset_1
Field not available: Print
DEBUG: Writing Migration\abc_2_DE-migrated.pdf
PERFORMANCE: [OVERALL completed in 756ms]
DEBUG: Opening Migration\abc_3_DE.pdf
DEBUG: Opening Templates\xyz_DE_0615_free.pdf
DEBUG: Writing Migration\abc_3-migrated.pdf
PERFORMANCE: [OVERALL completed in 660ms]
DEBUG: Opening Migration\abc_4.pdf
DEBUG: Opening Templates\xyz_EN_0615_free.pdf
null
DEBUG: Opening Migration\abc_5.pdf
DEBUG: Opening Templates\xyz_EN_0615_free.pdf
null
DEBUG: Opening Migration\abc_6_DE.pdf
DEBUG: Opening Templates\xyz_DE_0615_free.pdf
Field not available: Text6
Field not available: Text7
Field not available: Text8
Field not available: Text9
Field not available: Text10
Field not available: Text11
DEBUG: Writing Migration\abc_6-migrated.pdf
PERFORMANCE: [OVERALL completed in 686ms]
null
%EOF
For an analysis of how accurately an automated PDF form field transformation service runs, I need to filter out and count all occurrences of the following 4-tuple:
DEBUG: Opening Migration\abc_1.pdf
DEBUG: Opening Templates\xyz_DE_0615_kostenlos.pdf
DEBUG: Writing Migration\abc_1-migrated.pdf
PERFORMANCE: [OVERALL completed in 2303ms]
Any number of other lines can appear between the lines of such a 4-tuple; these can either be skipped or added to the list of invalid log entries. The simple selection criteria are hard-coded in the code below.
Next, the logfile should be split into valid and invalid entries, keeping the original line numbers. Run against the above example, the current program outputs:
Statistics: Valid[tuples]=4 Valid[lines]=16 Invalid[lines]=8 Skipped[lines]=17 Total[lines]=41
----------------------[VALID]----------------------
key=6 value=DEBUG: Opening Migration\abc_1.pdf
key=7 value=DEBUG: Opening Templates\xyz_DE_0615_kostenlos.pdf
key=12 value=DEBUG: Writing Migration\abc_1-migrated.pdf
key=13 value=PERFORMANCE: [OVERALL completed in 2303ms]
key=14 value=DEBUG: Opening Migration\abc_2_DE.pdf
key=15 value=DEBUG: Opening Templates\xyz_DE_0615_free.pdf
key=18 value=DEBUG: Writing Migration\abc_2_DE-migrated.pdf
key=19 value=PERFORMANCE: [OVERALL completed in 756ms]
key=20 value=DEBUG: Opening Migration\abc_3_DE.pdf
key=21 value=DEBUG: Opening Templates\xyz_DE_0615_free.pdf
key=22 value=DEBUG: Writing Migration\abc_3-migrated.pdf
key=23 value=PERFORMANCE: [OVERALL completed in 660ms]
key=30 value=DEBUG: Opening Migration\abc_6_DE.pdf
key=31 value=DEBUG: Opening Templates\xyz_DE_0615_free.pdf
key=38 value=DEBUG: Writing Migration\abc_6-migrated.pdf
key=39 value=PERFORMANCE: [OVERALL completed in 686ms]
----------------------[VALID]----------------------
----------------------[INVALID]----------------------
key=24 value=DEBUG: Opening Migration\abc_4.pdf
key=25 value=DEBUG: Opening Templates\xyz_EN_0615_free.pdf
key=26 value=null
key=27 value=DEBUG: Opening Migration\abc_5.pdf
key=28 value=DEBUG: Opening Templates\xyz_EN_0615_free.pdf
key=29 value=null
key=40 value=null
key=41 value=%EOF
----------------------[INVALID]----------------------
Here is my approach:
import org.testng.annotations.Test;
import java.io.*;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
public class AnalyseMigrationLog {
public class RingMap<K, V> extends LinkedHashMap<K, V> {
private int cacheSize;
public RingMap(int cacheSize) {
super(cacheSize);
this.cacheSize = cacheSize;
}
@Override
protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
return size() > cacheSize;
}
}
@Test
public void doAnalysis() throws IOException {
final String logfile = "./run-simple.log";
final int ringSize = 4;
int lc = 0;
int skipped = 0;
Long count;
String line;
Map<Integer, String> circularFifo = new RingMap<>(ringSize);
Map<Integer, String> validTuples = new LinkedHashMap<>();
Map<Integer, String> invalidTuples = new LinkedHashMap<>();
FileReader fre = new FileReader(logfile);
BufferedReader bre = new BufferedReader(fre);
while ((line = bre.readLine ()) != null) {
lc++;
if (line.matches("^(FILE \\(insert\\):|WARNUNG|Field not available).*") || line.endsWith("<init>")) {
skipped++;
continue;
}
circularFifo.put(lc, line);
if (circularFifo.size() < ringSize)
continue;
count = circularFifo.values().stream().
filter(p -> p.matches("^(DEBUG: Opening|DEBUG: Writing|PERFORMANCE:).*")).count();
// Get the last (most recently inserted) entry in the circular fifo
List<Map.Entry<Integer, String>> entryList = new ArrayList<>(circularFifo.entrySet());
Map.Entry<Integer, String> lastEntry = entryList.get(entryList.size() - 1);
if (count == ringSize && lastEntry.getValue().startsWith("PERFORMANCE:")) {
validTuples.putAll(circularFifo);
// Remove already pushed entries from invalidTuples list to avoid duplicate entries
circularFifo.forEach((key, value) -> invalidTuples.remove(key));
circularFifo.clear();
} else {
invalidTuples.putAll(circularFifo);
}
}
// Put in the last entries that didn't fill up the circular fifo anymore.
invalidTuples.putAll(circularFifo);
bre.close();
fre.close();
System.out.printf("Statistics: Valid[tuples]=%s Valid[lines]=%s Invalid[lines]=%s Skipped[lines]=%s Total[lines]=%s%n",
validTuples.size()/ringSize, validTuples.size(), invalidTuples.size(), skipped, lc);
System.out.printf("----------------------[VALID]----------------------%n");
validTuples.forEach((key, value) -> System.out.printf("key=%s value=%s%n", key, value));
System.out.printf("----------------------[VALID]----------------------%n");
System.out.printf("----------------------[INVALID]----------------------%n");
invalidTuples.forEach((key, value) -> System.out.printf("key=%s value=%s%n", key, value));
System.out.printf("----------------------[INVALID]----------------------%n");
}
}
The basic trick was to introduce a circular fifo for this task. While the code is short, fast and works perfectly well, I was wondering whether it could be expressed more idiomatically with Java 8 features, such as NIO2 and appropriate streaming techniques. I don't want to use Guava or any other over-engineered library for such a simple task.
Now, I specifically do not fancy the way the last (most recently inserted) entry of the fifo is retrieved above. How could I extend the inner class with something along the lines of:
public class RingMap<K, V> extends LinkedHashMap<K, V> {
private int cacheSize;
public RingMap(int cacheSize) {
super(cacheSize);
this.cacheSize = cacheSize;
}
@Override
protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
return size() > cacheSize;
}
//TODO: how exactly would this work?
public <K, V> Map.Entry<K,V> getLast(LinkedHashMap<K, V> map) {
Map.Entry<K, V> result = null;
for (Map.Entry<K, V> kvEntry : map.entrySet()) {
result = kvEntry;
}
return result;
}
}
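A minimal sketch of how the TODO might be resolved, assuming the helper should operate on the ring itself rather than on a map passed in: the method-level `<K, V>` in the attempt above shadows the class's own type parameters, so dropping both those and the `map` argument lets the method iterate over `entrySet()` directly. The no-argument `getLast()` signature is an assumption for illustration; it returns `null` for an empty ring and walks all entries, which is cheap for a ring of size 4.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: bounded, insertion-ordered map with a helper for the newest entry.
class RingMap<K, V> extends LinkedHashMap<K, V> {
    private final int cacheSize;

    RingMap(int cacheSize) {
        super(cacheSize);
        this.cacheSize = cacheSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the oldest entry once the ring grows beyond its capacity.
        return size() > cacheSize;
    }

    // Hypothetical helper: returns the most recently inserted entry, or null if empty.
    // Uses the class-level K and V, so no extra type parameters are declared here.
    public Map.Entry<K, V> getLast() {
        Map.Entry<K, V> result = null;
        for (Map.Entry<K, V> entry : entrySet()) {
            result = entry; // insertion order: the final iteration holds the newest entry
        }
        return result;
    }
}
```

With this in place, the `entryList` copy in `doAnalysis()` could be replaced by a call such as `circularFifo.getLast()`, provided `circularFifo` is declared as `RingMap<Integer, String>` rather than `Map<Integer, String>`.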
Next, I would really like to make use of NIO2 features, but I do not understand how best to integrate them into my solution. Something along the lines of:
@Test
public void doAnalysisNIO2() throws IOException {
final String logfile = "./run-simple.log";
Path path = Paths.get(logfile);
try (Stream<String> filteredLines = Files.lines(path, StandardCharsets.UTF_8)
.onClose(() -> System.out.println("Stream has been closed!"))
.filter(s -> !(s.matches("^(FILE \\(insert\\):|WARNUNG|Field not available).*") ||
s.endsWith("<init>")))) {
// Do the same thing as in the other code
filteredLines.forEach((l) -> System.out.printf("line = %s%n", l));
}
}
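One way the "same thing as in the other code" step might be filled in, as a rough sketch rather than a definitive design: keep `Files.lines` for reading, but count line numbers and do the skipping inside a stateful consumer, because filtering the stream first would lose the original line numbers. The class name `MigrationLogSketch`, its fields and the `ArrayDeque` window (used here in place of `RingMap`) are all assumptions for illustration. The window logic is intended to mirror the loop in `doAnalysis()`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper class; all names are illustrative.
class MigrationLogSketch {
    private static final int RING_SIZE = 4;
    private static final Pattern SKIP =
            Pattern.compile("^(FILE \\(insert\\):|WARNUNG|Field not available).*");
    private static final Pattern TUPLE_LINE =
            Pattern.compile("^(DEBUG: Opening|DEBUG: Writing|PERFORMANCE:).*");

    final Map<Integer, String> valid = new LinkedHashMap<>();
    final Map<Integer, String> invalid = new LinkedHashMap<>();
    int skipped;
    int total;
    private final Deque<Map.Entry<Integer, String>> window = new ArrayDeque<>(RING_SIZE);

    // Reads the file lazily via NIO.2; try-with-resources closes the stream.
    static MigrationLogSketch analyse(Path logfile) throws IOException {
        try (Stream<String> lines = Files.lines(logfile, StandardCharsets.UTF_8)) {
            return analyse(lines);
        }
    }

    static MigrationLogSketch analyse(Stream<String> lines) {
        MigrationLogSketch result = new MigrationLogSketch();
        lines.forEachOrdered(result::accept); // sequential and ordered, so state is safe
        // Entries left in the window never completed a tuple.
        result.window.forEach(e -> result.invalid.put(e.getKey(), e.getValue()));
        result.window.clear();
        return result;
    }

    private void accept(String line) {
        total++;
        if (SKIP.matcher(line).matches() || line.endsWith("<init>")) {
            skipped++;
            return;
        }
        if (window.size() == RING_SIZE) {
            window.removeFirst(); // drop the eldest entry, like removeEldestEntry
        }
        window.addLast(new SimpleImmutableEntry<>(total, line));
        if (window.size() < RING_SIZE) {
            return;
        }
        boolean allTupleLines = window.stream()
                .allMatch(e -> TUPLE_LINE.matcher(e.getValue()).matches());
        if (allTupleLines && line.startsWith("PERFORMANCE:")) {
            window.forEach(e -> {
                valid.put(e.getKey(), e.getValue());
                invalid.remove(e.getKey()); // undo earlier provisional "invalid" marks
            });
            window.clear();
        } else {
            window.forEach(e -> invalid.put(e.getKey(), e.getValue()));
        }
    }
}
```

A call such as `MigrationLogSketch.analyse(Paths.get("./run-simple.log"))` would then expose `valid`, `invalid`, `skipped` and `total` for printing the statistics. Like the original loop, this sketch only checks that the window holds four tuple-type lines ending in `PERFORMANCE:`; it does not verify the order of the first three.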